Methodology
Scope
CAIM documents AI-related harm and the conditions that produce it through two types of records:
- Incident records document discrete events where an AI system's development, deployment, or use produced harm or near-harm.
- Hazard records document structural conditions that create realistic pathways to harm, whether or not harm has already occurred.
These are different kinds of things: a hazard is an ongoing condition; an incident is a discrete event. A single hazard can produce many incidents over time, and the hazard persists even after incidents occur, just as a dangerous intersection remains a hazard after each collision. The `materialized_from` link on incident records captures this relationship: incidents are evidence that a hazard is real. This follows the model used in aviation safety, where crash investigations and voluntary hazard reports feed the same safety objective.
What systems are in scope
An AI system is one that uses machine learning, neural networks, foundation models, or systems built on these. This includes statistical models trained on data, deep learning, generative models, and hybrid systems incorporating such components.
Systems whose behaviour is fully specified by human-authored rules (deterministic scoring instruments, structured questionnaires, rule-based automation, data extraction tools) are out of scope, even when described as "algorithmic" or "AI."
This definition will be revised as the technology evolves.
What is out of scope
- Rule-based systems, deterministic scoring instruments, structured questionnaires, and data extraction tools, even when deployed at scale in consequential decisions
- Simple automation (e.g., mail merge, spreadsheet macros, basic workflow triggers) where the system has no decision-making function and no plausible pathway to harm
- Purely theoretical risks with no documented evidence of a precursor condition
- Events where AI is mentioned incidentally but played no material role in the pathway to harm
Incidents
An event or series of events in which an AI system's development, deployment, or use is plausibly implicated in harm or a near-harm outcome. This includes materially AI-enabled misuse.
Hazards
A credible precursor condition, precursor failure, or near-miss pattern indicating a realistic pathway to harm, even if harm was prevented or has not yet been observed. Hazards are included because near-misses are often the most informative cases for prevention.
Hazard records require documented evidence of the precursor condition: a regulatory finding, an investigation, a published technical assessment, or equivalent. A policy gap alone is not sufficient; there must be evidence that the gap has created conditions where harm is plausible and proximate.
Material AI involvement
A case is in scope when an AI system is a meaningful factor in the pathway to harm, not merely incidental. The test: would this harm have occurred without the learned or emergent behaviour of the AI component? If the same outcome would result from a fixed rule, a lookup table, or a manual process, the AI is incidental.
Borderline cases, worked examples:
| Scenario | In scope? | Reasoning |
|---|---|---|
| AI-generated deepfake used to impersonate a CEO and authorize a fraudulent wire transfer | Yes | AI capability (voice cloning / image synthesis) is the enabling factor; the fraud could not have occurred at this fidelity without it |
| A phishing email written with ChatGPT | Generally no | AI improved the email's grammar but a human conceived and executed the fraud. AI is incidental to the pathway to harm |
| A hospital deploys an AI triage tool that delays care for a patient who is later harmed | Yes | The AI system's classification directly influenced the clinical decision pathway |
| A hospital's electronic health record system crashes, delaying care | No | Software failure, but no AI or automated decision-making component in the pathway to harm |
| An employer uses an AI resume screener that systematically disadvantages candidates with disabilities | Yes | The AI system's learned biases are the mechanism of discrimination |
| An employer's HR department applies a manual policy that disadvantages candidates with disabilities | No | Discrimination occurred but no AI or automated decision-making system was involved |
| A government agency uses a fixed-score questionnaire to assess risk, and the tool produces biased outcomes | No | A deterministic scoring instrument with human-authored rules. The harm is real, but the system is not AI; its behaviour is fully specified by its design |
| A government benefits system uses an ML model trained on historical data, and the model's learned correlations systematically disadvantage Indigenous applicants | Yes | The model's learned bias is the material factor; a rule-based system applying the same eligibility criteria would not produce the same discriminatory pattern |
| Published evaluations demonstrate that a foundation model can provide viable synthesis routes for a controlled biological agent, with accuracy comparable to expert knowledge | Yes (hazard) | Documented evidence of a precursor condition. AI capability materially lowers the barrier to CBRN harm. No incident has occurred, but the hazard is documentable with primary sources (model evaluations, regulatory assessments) |
Canada nexus
A case has a Canada nexus if one or more of the following applies:
- People, institutions, or infrastructure in Canada were materially affected
- A Canadian organization developed, deployed, or operated the AI system
- An international event has documented implications for Canadian systems, populations, or governance
Severity calibration
CAIM uses an ordinal severity scale. To ensure consistency across editors and over time, each level is anchored with operational criteria and reference examples.
| Level | Criteria | Reference examples |
|---|---|---|
| Minor | Limited, easily reversible harm affecting a small number of individuals. No lasting consequences. Quickly corrected. | A chatbot gives incorrect but non-dangerous information; a recommendation system briefly shows irrelevant results |
| Moderate | Meaningful harm that is recoverable but requires effort to correct. Affected a defined group or created measurable costs. | AI hiring tool screens out qualified candidates from a batch; autonomous vehicle testing proceeds without a comprehensive safety framework |
| Significant | Substantial harm that is difficult to reverse. Affected a large group, created systemic risks, or triggered regulatory intervention. | Facial recognition deployed covertly at population scale; AI-generated deepfake disinformation targets election integrity |
| Severe | Serious harm to many individuals or institutions. Documented financial, psychological, or rights impacts at scale. Required major institutional response. | AI-generated CSAM at volume requiring law enforcement response; AI chatbot failures causing documented psychological harm at scale |
| Critical | Widespread, potentially irreversible harm. Loss of life, large-scale rights violations, or systemic institutional failure. | Autonomous weapons causing civilian casualties; AI system failure causing critical infrastructure collapse (no Canadian examples to date) |
When severity is uncertain, records use "unknown" rather than guessing. Severity may be upgraded or downgraded as new information emerges; all changes are documented in the changelog.
Reach
Reach describes the scale of people or entities affected. Like severity, it uses an ordinal scale with "unknown" permitted.
| Level | Criteria | Reference examples |
|---|---|---|
| Individual | One or a small number of identified individuals directly affected. | A single person denied a benefit by an automated system; a chatbot giving harmful advice to one user |
| Group | A defined group of people affected, typically dozens to hundreds sharing a common characteristic or context. | Applicants screened out by a biased hiring tool in a single recruitment round; patients at one hospital affected by a diagnostic AI error |
| Organization | An entire organization's operations or workforce materially affected. | A company's AI system breach exposing all employee data; an agency's automated workflow failing department-wide |
| Sector | Systemic impact across an industry or government sector, affecting multiple organizations or the sector's operational norms. | AI hiring tools creating sector-wide discrimination patterns; regulatory gaps affecting all health AI deployments nationally |
| Population | Society-wide impact: affecting a large portion of the Canadian population, or creating conditions that could do so. | Mass AI surveillance without legal authority; AI-generated election disinformation at national scale |
The pipeline
CAIM operates through a seven-stage pipeline combining automated monitoring with human editorial review. Automated stages use large language models (LLMs) for triage and structured extraction; every record is reviewed by a human editor before publication.
1. Monitor
Automated polling of Canadian sources in English and French: news media, federal regulatory notices, legal databases, technology publications, and international incident databases. Sources are categorized by function:
- Detection sources (scanned continuously): Canadian media, regulatory notices, court databases, incident databases (AIID)
- Corroboration sources (pulled on demand during drafting): OPC decisions, Hansard, provincial commissioners
- Context sources (referenced during drafting): OECD AIM, AIAAIC, AISI publications
In addition to automated monitoring, reports can be submitted directly through structured submissions or a confidential channel for sensitive cases requiring source protection or coordinated disclosure.
2. Filter
A keyword pre-filter (broad regex, deliberately over-inclusive, bilingual) reduces the volume of monitored items. Items passing the filter are enriched with full article text. This stage is deterministic and auditable — every keyword match is logged.
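A minimal sketch of such a pre-filter, assuming Python; the keyword list, logger name, and function shape are illustrative, not CAIM's actual configuration:

```python
import re
import logging

# Deliberately over-inclusive, bilingual keyword list (illustrative sample).
KEYWORDS = [
    r"artificial intelligence", r"intelligence artificielle",
    r"machine[- ]learning", r"apprentissage automatique",
    r"deep[- ]learning", r"chatbot",
    r"facial recognition", r"reconnaissance faciale",
]
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)

logger = logging.getLogger("caim.filter")

def passes_filter(item_id: str, text: str) -> bool:
    """Deterministic keyword gate; every match is logged for auditability."""
    match = PATTERN.search(text)
    if match:
        logger.info("item=%s matched=%r span=%s",
                    item_id, match.group(0), match.span())
    return match is not None
```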
3. Triage
Each filtered item is assessed by an LLM for:
- Scope: Does it involve material AI involvement and a Canada nexus?
- Classification: Is it best treated as an incident or a hazard?
- Deduplication: Does this relate to an existing record, or to other items from the same monitoring run?
All triage decisions are logged with reasoning. When multiple articles describe the same event, triage clusters them as multiple observations of a single incident rather than separate records.
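As an illustration, a triage verdict could be captured as a small structured object; the field names below are hypothetical, chosen to mirror the three assessments above:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class TriageVerdict:
    """Structured output of the LLM triage step; logged in full for audit."""
    item_id: str
    in_scope: bool                                        # material AI involvement + Canada nexus
    record_type: Optional[Literal["incident", "hazard"]]  # None if out of scope
    duplicate_of: Optional[str]                           # slug of an existing record, if any
    cluster_id: Optional[str]                             # groups same-event items within a run
    reasoning: str                                        # free-text justification, retained
```

Clustering by `cluster_id` is what lets several articles about one event become multiple observations on a single record rather than separate records.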
4. Extract and resolve
For items that pass triage, the pipeline extracts a full structured record conforming to the CAIM schema and resolves all entity and system mentions against the existing registry. Entity and system resolution — matching mentions like "the federal privacy watchdog" to the correct registry entry — uses LLM world knowledge in place of traditional lookup infrastructure. Each resolution carries a confidence level and auditable reasoning.
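A sketch of a single resolution decision as structured data; the slugs, confidence vocabulary, and field names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class EntityResolution:
    """One resolved mention, as the review stage might receive it."""
    mention: str                                  # e.g. "the federal privacy watchdog"
    resolved_slug: str                            # e.g. "office-of-the-privacy-commissioner"
    confidence: Literal["high", "medium", "low"]
    reasoning: str                                # auditable justification from the LLM

def needs_manual_review(r: EntityResolution) -> bool:
    """High-confidence resolutions are pre-filled; others are flagged (see stage 6)."""
    return r.confidence != "high"
```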
5. Analyze
Three analytical assessments are produced, each structurally separate from the factual record (see EC-1):
- Classification: taxonomy tags along standard axes (domain, harm type, AI pathway, lifecycle phase, systemic risk factor)
- Control structure: governance adequacy assessment at each relevant tier (organizational, sectoral, provincial, federal, international)
- Trajectory (hazards only): assessment of how the risk condition is evolving
6. Review
A human reviewer examines each draft record, including:
- The full structured record with all fields populated
- Source observations with full text
- Entity and system resolution decisions (pre-filled for high confidence, flagged for medium or low)
- Draft analyses (classification, control structure, trajectory)
- Automated QA results from caim-eval, a mandatory quality gate that checks factuality, attribution, calibration, and internal consistency
No record is published without human editorial review. The reviewer can approve, revise, reject, or defer each item. For security-sensitive cases, a safety reviewer handles redaction decisions and coordinated disclosure.
7. Publish
Approved records are inserted into the database with all entity, system, and observation references. New entities and systems confirmed during review are added to the registry. The public site is rebuilt.
Records are published with a verification label, sources, taxonomy tags, and a version number. All subsequent changes produce new versions with a visible changelog. Where responsible publication requires delay, CAIM withholds the record until publication is safe, and publishes high-level defensive guidance in the interim where possible.
Corrections
If a record is materially inaccurate, it is corrected promptly and transparently. If core claims cannot be supported, records may be retracted with an explanation and preserved tombstone metadata. Appeals focus on factual accuracy and responsible publication.
The record format
Every published record has three layers, with a hard boundary between factual documentation and editorial analysis. The narrative describes what happened; analytical judgments (causal pathways, governance significance, risk assessment) are structurally separated in the assessment fields. This follows the model used in aviation safety investigation, where factual reports and causal analysis are distinct documents, and prevents editorial interpretation from becoming embedded in the factual record. Most AI incident databases do not enforce this separation.
Narrative layer
A human-readable, compact account:
- What happened (or almost happened)
- Key dates and jurisdiction(s)
- Who was affected (classes of stakeholders)
- AI system context (what can be responsibly supported)
- Observed harms or near-harms
- What is known vs. alleged vs. uncertain
Evidence layer
Transparent sourcing:
- Source list with dates and type (media, official, court, disclosure, academic)
- Where useful, a claims table mapping specific claims to supporting sources and confidence notes
Structure layer
Taxonomy tags enabling search, filtering, and analysis:
- Domain: finance, health, public services, education, critical infrastructure, elections/information integrity, etc.
- Harm type: fraud/impersonation, privacy/data exposure, discrimination/rights impacts, safety failure, cyber incident, misinformation, operational failure, non-consensual imagery, etc.
- AI involvement type: development flaw, deployment failure, misuse, supply-chain/tooling, human oversight breakdown, monitoring gap, etc.
- Lifecycle phase: design, training, evaluation, deployment, monitoring, incident response
- Severity and reach: ordinal scales calibrated with reference examples (see above); explicit "unknown" permitted
- Jurisdiction level: federal, provincial/territorial, municipal, or multi-level, identifying which level of government has primary regulatory authority over the system or domain
- Canada nexus basis: which nexus criteria are met
Policy recommendations
Where external authorities have proposed relevant measures, records include attributed policy recommendations tied to the specific case. Each recommendation cites its source. CAIM does not issue its own prescriptions.
Verification ladder
Records carry a verification status so readers can always assess how certain the information is.
| Status | Meaning |
|---|---|
| Reported | Credible initial reporting; claims not yet independently corroborated |
| Corroborated | Supported by multiple independent credible sources |
| Confirmed | Supported by primary documentation or exceptionally strong corroboration |
| Contested | Credible dispute exists about core claims |
| Retracted | Core claims cannot be supported; withdrawn with explanation |
CAIM distinguishes between what is known, what is alleged, and what remains uncertain. Where information is incomplete, CAIM publishes what is supportable and explicitly marks what is unknown.
Verification of hazard records
For incident records, the verification ladder assesses how well-established the facts of the event are. For hazard records, it assesses how well-documented the precursor condition is:
- Reported: A credible source has identified the condition, but it has not been independently examined, e.g., a media report describing a regulatory gap.
- Corroborated: Multiple independent credible sources document the condition, e.g., both a regulatory body and independent researchers have identified the same gap or precursor failure.
- Confirmed: A primary authority has formally documented the condition, e.g., an official investigation, audit, or regulatory finding establishes the precursor condition as fact.
The verification status reflects the strength of evidence for the precursor condition itself, not a prediction of future harm. The risk assessment (how plausible the pathway to harm is and how severe the consequences could be) is a separate editorial judgment, stated transparently in the narrative, grounded in evidence where possible, and revisable as new information emerges. A hazard can be "confirmed" (the underlying condition is well-documented) while the severity of the risk remains uncertain or contested.
Taxonomy
CAIM's taxonomy is designed to be stable, interpretable, and interoperable. Records are coded along the dimensions described above (domain, harm type, AI involvement type, lifecycle phase, severity, jurisdiction level, nexus basis). The taxonomy is published and versioned; changes are documented.
Where feasible, CAIM aligns its fields with international incident-reporting frameworks, particularly the OECD AI Incidents Monitor and the AI Incident Database (AIID), to support comparability and institutional adoption.
Data model
CAIM's data model separates observations (base-level) from classification (taxonomy layers). A record is publishable with no taxonomy applied. All classification can evolve independently of the underlying observations.
Entity role primitives
Every entity referenced on a record carries one or more role primitives, a small, durable set of organizational relationships:
- Developer: built, trained, or created the AI system
- Deployer: put the system into operational use
- Regulator: investigated, audited, or issued findings
- Affected party: experienced harm or was subject to the system's decisions
- Reporter: disclosed, documented, or reported the incident or hazard
These enable structured queries ("all deployers," "all incidents with regulator involvement") without relying on taxonomy. Entity pages aggregate records grouped by role.
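As a sketch, such queries reduce to a filter over role annotations; the record shape here is a hypothetical rendering, not the published schema:

```python
# Hypothetical in-memory corpus: each record lists (entity, role) pairs.
corpus = [
    {"slug": "incident-001",
     "entities": [{"entity": "acme-hr", "role": "deployer"},
                  {"entity": "opc", "role": "regulator"}]},
    {"slug": "hazard-002",
     "entities": [{"entity": "modelco", "role": "developer"}]},
]

def records_with_role(records: list[dict], role: str) -> list[str]:
    """Slugs of all records where any entity carries the given role primitive."""
    return [r["slug"] for r in records
            if any(e["role"] == role for e in r["entities"])]

print(records_with_role(corpus, "deployer"))   # ['incident-001']
print(records_with_role(corpus, "regulator"))  # ['incident-001']
```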
Shared objects
The schema defines five shared objects that are referenced across incidents and hazards:
- Observation: a piece of external evidence — a news article, government report, court filing, academic paper. The monitor curates these but does not create them. Each observation records its publisher, publication date, source type, and optionally what specific claim it supports.
- Entity: an actor involved in AI risk situations — a company, regulator, court, institution, or individual. Shared across records.
- System: an AI system — a model, product, or deployed service — linked to its developer entity. Shared across records.
- Response: a governance action taken in relation to AI risk — legislation, investigation, court decision, guidance, enforcement. Each response records the actor, jurisdiction, date, status, and description.
- Analysis: an analytical assessment attached to a record. Analyses are structurally separate from the factual record — they represent editorial judgment, not evidence.
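For concreteness, two of these objects rendered as dataclasses, with fields taken from the descriptions above; the types and defaults are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    """External evidence; the monitor curates observations, it never authors them."""
    publisher: str
    published: str                        # ISO publication date
    source_type: str                      # media, official, court, disclosure, academic
    supports_claim: Optional[str] = None  # specific claim this source supports, if noted

@dataclass
class Response:
    """A governance action taken in relation to AI risk."""
    actor: str                            # entity slug
    jurisdiction: str
    date: str                             # ISO date
    status: str
    description: str
```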
Analysis approaches
Each record can have multiple analyses using different approaches. The approach determines what the analysis contains:
- Classification: categorizes the record along taxonomy axes — domain, harm type, AI pathway, lifecycle phase, systemic risk factor. Each entry is tagged `known` or `potential`. Powers cross-record queries and risk clustering.
- Control structure: CAIM's core analytical contribution. Evaluates governance adequacy at each relevant tier (organizational, sectoral, provincial, federal, international). Each tier is assessed as `absent`, `partial`, `adequate`, or `overwhelmed`, with reasoning. Each governance level appears at most once per analysis.
- Trajectory: tracks how a risk condition is evolving over time. Typically attached to hazards. Addresses current status (active, mitigated, escalating, retired), supporting evidence, confidence, and triggers or mitigating factors observed.
Response tracking
Records track the governance feedback loop: what was done in response, by whom, and with what result. Each response entry includes the actor (linked to an entity), jurisdiction, date, action type, current status, and description. On incidents, this tracks investigation, enforcement, policy change, and litigation. On hazards, it tracks governance attention: reports published, consultations launched, legislation introduced.
This makes CAIM useful for policy analysis: which incidents led to regulatory action? What's the response rate? Which sectors have governance gaps?
Constraints and invariants
The schema enforces structural integrity through formal constraints:
- EC-1: Fact and analysis are structurally separate. Updating an analysis never modifies the factual fields or observations.
- EC-2: Materialization preserves both objects. When a hazard produces an incident, the hazard persists with its own identity and assessments unchanged.
- EC-3: Approach-specific constraints — every classification entry carries a confidence tag; each governance level appears at most once per control structure analysis.
And invariants that hold across all state transitions:
- INV-1: Verification must be supported by evidence. `corroborated` requires ≥2 observations from different publishers; `confirmed` requires ≥1 observation with an official, court, regulatory, or disclosure source type.
- INV-2: Evidence is append-only. Observations are not removed except in cases of fabrication or legal liability.
- INV-3: Published records are never deleted. They may be marked redacted but their identity and existence remain visible.
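For illustration, INV-1 and EC-3 lend themselves to mechanical checks at build time; the record and analysis shapes below are assumptions, while the thresholds follow the definitions above:

```python
def check_inv1(record: dict) -> list[str]:
    """INV-1: verification status must be supported by the evidence layer."""
    errors = []
    obs = record["observations"]
    if record["verification"] == "corroborated":
        if len({o["publisher"] for o in obs}) < 2:
            errors.append(f"{record['slug']}: corroborated requires >=2 publishers")
    if record["verification"] == "confirmed":
        primary = {"official", "court", "regulatory", "disclosure"}
        if not any(o["source_type"] in primary for o in obs):
            errors.append(f"{record['slug']}: confirmed requires a primary source type")
    return errors

def check_ec3(analysis: dict) -> list[str]:
    """EC-3: each governance level at most once per control structure analysis."""
    levels = [t["level"] for t in analysis["tiers"]]
    dupes = {l for l in levels if levels.count(l) > 1}
    return [f"duplicate governance level: {l}" for l in sorted(dupes)]
```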
One-sided links
All relationships are declared on one side only. When an incident is linked to a hazard, the `materialized_from` reference is declared on the incident. The build step computes reverse lookups; the hazard page shows its linked incidents without storing them. This eliminates consistency rot as the record corpus grows.
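A sketch of that build-time computation, assuming incidents are dictionaries with an optional `materialized_from` key; the slugs are invented:

```python
from collections import defaultdict

def reverse_links(incidents: list[dict]) -> dict[str, list[str]]:
    """Compute hazard -> incident lookups from one-sided declarations."""
    by_hazard: dict[str, list[str]] = defaultdict(list)
    for inc in incidents:
        hazard = inc.get("materialized_from")
        if hazard:
            by_hazard[hazard].append(inc["slug"])
    return dict(by_hazard)

incidents = [
    {"slug": "incident-001", "materialized_from": "hazard-002"},
    {"slug": "incident-003", "materialized_from": "hazard-002"},
    {"slug": "incident-004"},  # no hazard link
]
print(reverse_links(incidents))  # {'hazard-002': ['incident-001', 'incident-003']}
```

Because the hazard never stores the list, there is no second copy to fall out of sync; the lookup is recomputed on every build.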
Build-time integrity
The build step validates the entire record graph: slug references, taxonomy values, bilingual parity, assessment ordering, and relationship consistency. Broken references are build errors. Missing translations are warnings. No record can reference a nonexistent entity, system, or record.
Systemic risk analysis
CAIM's most distinctive analytical contribution is a methodology for connecting deployment-level incident patterns to catastrophic risk trajectories. This is the bridging analysis.
Systemic risk factors
Every record is tagged with zero or more systemic risk factors: structural properties of the failure that are relevant across risk scales. These are the dimensions that connect what happened in a specific Canadian AI deployment to what could happen at higher capability levels:
| Factor | What it reveals |
|---|---|
| Loss of human control | System operated beyond human oversight capacity |
| Unexpected capability | System demonstrated behaviour outside design expectations |
| Resistance to correction | Institutional or technical barriers made correction difficult |
| Opacity | Decision process not interpretable by affected parties or overseers |
| Autonomous scope expansion | System's influence expanded beyond intended boundaries |
| Cascade propagation | Failure triggered failures elsewhere |
| Governance gap | No mechanism existed to prevent, detect, or respond |
| Accountability void | No entity bore clear responsibility |
| Concentration of power | Incident reflected or increased power asymmetry |
| Epistemic degradation | Incident undermined collective ability to assess truth or risk |
The editorial question for each factor is: "Does this record demonstrate this structural property?", not "Is this the root cause?" Systemic risk factors describe what the record reveals about structural conditions.
Trajectory analysis
Hazard records carry trajectory analyses that track how the risk condition is evolving. Each trajectory assessment addresses:
- Current status: active, mitigated, escalating, or retired
- Supporting evidence: what observations support the status assessment
- Confidence: how certain the assessment is (low, moderate, high)
- Triggers and mitigating factors: what could cause the risk to escalate or recede
Whether a hazard has produced incidents is captured separately through the `materialized_from` links on incident records; a hazard's trajectory describes the state of the underlying condition, not whether incidents have occurred.
Derived quantities
CAIM computes aggregate patterns across the entire record corpus:
- Governance gap distribution: from control structure analyses, how many records have each adequacy value at each governance level
- Materialization rate: fraction of hazards referenced by at least one incident's `materialized_from` link
- Response latency: time between the anchor date (when harm occurred or was identified) and the first linked governance response
- Risk clusters: from classification analyses, records grouped by shared classification, control gap, entity, or system
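Two of these reduce to short computations over the record graph; a sketch assuming ISO-format dates and the link fields described above:

```python
from datetime import date

def materialization_rate(hazards: list[dict], incidents: list[dict]) -> float:
    """Fraction of hazards referenced by at least one materialized_from link."""
    if not hazards:
        return 0.0
    referenced = {i.get("materialized_from") for i in incidents} - {None}
    return len(referenced & {h["slug"] for h in hazards}) / len(hazards)

def response_latency_days(record: dict) -> int:
    """Days from the anchor date to the first linked governance response."""
    anchor = date.fromisoformat(record["anchor_date"])
    first = min(date.fromisoformat(r["date"]) for r in record["responses"])
    return (first - anchor).days
```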
Interoperability
OECD alignment
CAIM maintains two classification layers on every record: a CAIM native taxonomy (primary, richer, optimized for Canadian policy users) and an OECD AIM interoperability layer (optional, populated during editorial tagging). The two layers coexist without flattening; neither is redundant. This follows the pattern used in aviation safety, where national authorities maintain detailed classification systems while mapping to international codes for reporting.
Data exports include an OECD-compatible view that maps CAIM fields to the OECD schema. CAIM's editorial metadata (verification ladder, versioning, redaction flags, bilingual labels) is preserved in an extension namespace.
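A sketch of what that export view might look like; the OECD-side field names are placeholders rather than the actual OECD AIM schema, and the extension key is an assumption:

```python
# Placeholder mapping: left side is CAIM, right side stands in for OECD AIM fields.
CAIM_TO_OECD = {
    "title": "incident_title",
    "anchor_date": "occurrence_date",
    "domain": "industry",
    "harm_type": "harm_type",
}

def to_oecd_view(record: dict) -> dict:
    out = {oecd: record[caim]
           for caim, oecd in CAIM_TO_OECD.items() if caim in record}
    # Editorial metadata survives in an extension namespace, not flattened away.
    out["ext:caim"] = {
        "verification": record.get("verification"),
        "version": record.get("version"),
        "redacted": record.get("redacted"),
    }
    return out
```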
AIID alignment
CAIM adopts the AIID's conceptual split between incidents (canonical events) and reports (individual source documents). Records include optional AIID cross-reference identifiers where matches exist. CAIM's taxonomy provides a crosswalk to AIID taxonomy sets, with local tags (bilingual labels, Canada nexus, editorial metadata) maintained separately.
API
CAIM provides machine-readable access to the full record corpus, aggregate statistics, systemic risk analysis, taxonomy definitions, and a JSON Feed. All endpoints are static JSON, CORS-enabled, with no authentication required.
For full endpoint documentation, see the API reference.
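For illustration, because endpoints are static JSON with CORS enabled, a client is a plain HTTP GET with no authentication; the base URL and route below are placeholders (the real routes are in the API reference):

```python
import json
import urllib.request

BASE = "https://example.org/api"  # placeholder, not the real host

with urllib.request.urlopen(f"{BASE}/records.json") as resp:  # hypothetical route
    records = json.load(resp)

incidents = [r for r in records if r.get("record_type") == "incident"]
print(f"{len(incidents)} incident records")
```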
Privacy and responsible publication
Naming policy
CAIM documents systemic patterns of AI harm. The analytical unit is the system, deployment, and governance gap, not the individuals affected. The identity of harmed individuals adds no analytical value; the circumstances of their harm do. Accordingly:
- Private individuals who were harmed or affected are never named. They are described by role and relevant demographic context: "an Ontario recruiter," "a PEI retiree," "a 14-year-old user." This applies even when the individual's name is available in public reporting. Consent to a media interview is not consent to permanent inclusion in a structured, machine-readable incident database.
- Organizations, AI systems, and public officials acting in official capacity are named, because institutional accountability requires it.
- Professionals acting in professional capacity (lawyers, researchers, commissioners) may be named when their role is relevant to the record.
- Criminal defendants named in public court proceedings are handled case-by-case. Minors are never named.
- Civil litigants: case names (e.g., Moffatt v. Air Canada) may be referenced for legal traceability, but the individual is not profiled beyond what the legal record requires.
The test is: could this record, as a permanent structured entry in a public database, cause harm to the individual it describes? Sensitive personal details (mental health history, financial losses, disability status) are especially likely to follow a named individual. CAIM does not contribute to that outcome.
Responsible publication safeguards
Beyond the naming policy, CAIM follows additional safeguards:
- Cases involving minors receive heightened protection; names are never published, and identifying details are minimized
- Records avoid reproducing harmful content unnecessarily and use victim-centered language
- For security-sensitive cases, CAIM follows coordinated disclosure norms: it prioritizes mitigation and safety, publishes high-level learning and defensive guidance, and withholds enabling details until risk is reduced
- CAIM does not become a platform for harassment or reputational attacks; records are sourced, cautious in language, and focused on what happened and what can be learned