Methodology
Scope
CAIM documents AI-related harm and the conditions that produce it through two types of records:
- Incident records document discrete events where an AI system's development, deployment, or use produced harm or near-harm.
- Hazard records document structural conditions that create realistic pathways to harm, whether or not harm has already occurred.
These are different kinds of things: a hazard is an ongoing condition; an incident is a discrete event. A single hazard can produce many incidents over time, and the hazard persists even after incidents occur, just as a dangerous intersection remains a hazard after each collision. The `materialized_from` link on incident records captures this relationship: incidents are evidence that a hazard is real. This follows the model used in aviation safety, where crash investigations and voluntary hazard reports feed the same safety objective.
What systems are in scope
An AI system is one that uses machine learning, neural networks, foundation models, or systems built on these. This includes statistical models trained on data, deep learning, generative models, and hybrid systems incorporating such components.
Systems whose behaviour is fully specified by human-authored rules (deterministic scoring instruments, structured questionnaires, rule-based automation, data extraction tools) are out of scope, even when described as "algorithmic" or "AI."
This definition will be revised as the technology evolves.
What is out of scope
- Rule-based systems, deterministic scoring instruments, structured questionnaires, and data extraction tools, even when deployed at scale in consequential decisions
- Simple automation (e.g., mail merge, spreadsheet macros, basic workflow triggers) where the system has no decision-making function and no plausible pathway to harm
- Purely theoretical risks with no documented evidence of a precursor condition
- Events where AI is mentioned incidentally but played no material role in the pathway to harm
Incidents
An event or series of events in which an AI system's development, deployment, or use is plausibly implicated in harm or a near-harm outcome. This includes materially AI-enabled misuse.
Hazards
A credible precursor condition, precursor failure, or near-miss pattern indicating a realistic pathway to harm, even if harm was prevented or has not yet been observed. Hazards are included because near-misses are often the most informative cases for prevention.
Hazard records require documented evidence of the precursor condition: a regulatory finding, an investigation, a published technical assessment, or equivalent. A policy gap alone is not sufficient; there must be evidence that the gap has created conditions where harm is plausible and proximate.
Material AI involvement
A case is in scope when an AI system is a meaningful factor in the pathway to harm, not merely incidental. The test: would this harm have occurred without the learned or emergent behaviour of the AI component? If the same outcome would result from a fixed rule, a lookup table, or a manual process, the AI is incidental.
Borderline cases, worked examples:
| Scenario | In scope? | Reasoning |
|---|---|---|
| AI-generated deepfake used to impersonate a CEO and authorize a fraudulent wire transfer | Yes | AI capability (voice cloning / image synthesis) is the enabling factor; the fraud could not have occurred at this fidelity without it |
| A phishing email written with ChatGPT | Generally no | AI improved the email's grammar but a human conceived and executed the fraud. AI is incidental to the pathway to harm |
| A hospital deploys an AI triage tool that delays care for a patient who is later harmed | Yes | The AI system's classification directly influenced the clinical decision pathway |
| A hospital's electronic health record system crashes, delaying care | No | Software failure, but no AI or automated decision-making component in the pathway to harm |
| An employer uses an AI resume screener that systematically disadvantages candidates with disabilities | Yes | The AI system's learned biases are the mechanism of discrimination |
| An employer's HR department applies a manual policy that disadvantages candidates with disabilities | No | Discrimination occurred but no AI or automated decision-making system was involved |
| A government agency uses a fixed-score questionnaire to assess risk, and the tool produces biased outcomes | No | A deterministic scoring instrument with human-authored rules. The harm is real, but the system is not AI; its behaviour is fully specified by its design |
| A government benefits system uses an ML model trained on historical data, and the model's learned correlations systematically disadvantage Indigenous applicants | Yes | The model's learned bias is the material factor; a rule-based system applying the same eligibility criteria would not produce the same discriminatory pattern |
| Published evaluations demonstrate that a foundation model can provide viable synthesis routes for a controlled biological agent, with accuracy comparable to expert knowledge | Yes (hazard) | Documented evidence of a precursor condition. AI capability materially lowers the barrier to CBRN harm. No incident has occurred, but the hazard is documentable with primary sources (model evaluations, regulatory assessments) |
Canada nexus
A case has a Canada nexus if one or more of the following applies:
- People, institutions, or infrastructure in Canada were materially affected
- A Canadian organization developed, deployed, or operated the AI system
- An international event has documented implications for Canadian systems, populations, or governance
Severity calibration
CAIM uses an ordinal severity scale. To ensure consistency across editors and over time, each level is anchored with operational criteria and reference examples.
| Level | Criteria | Reference examples |
|---|---|---|
| Minor | Limited, easily reversible harm affecting a small number of individuals. No lasting consequences. Quickly corrected. | A chatbot gives incorrect but non-dangerous information; a recommendation system briefly shows irrelevant results |
| Moderate | Meaningful harm that is recoverable but requires effort to correct. Affected a defined group or created measurable costs. | AI hiring tool screens out qualified candidates from a batch; autonomous vehicle testing proceeds without a comprehensive safety framework |
| Significant | Substantial harm that is difficult to reverse. Affected a large group, created systemic risks, or triggered regulatory intervention. | Facial recognition deployed covertly at population scale; AI-generated deepfake disinformation targets election integrity |
| Severe | Serious harm to many individuals or institutions. Documented financial, psychological, or rights impacts at scale. Required major institutional response. | AI-generated CSAM at volume requiring law enforcement response; AI chatbot failures causing documented psychological harm at scale |
| Critical | Widespread, potentially irreversible harm. Loss of life, large-scale rights violations, or systemic institutional failure. | Autonomous weapons causing civilian casualties; AI system failure causing critical infrastructure collapse (no Canadian examples to date) |
When severity is uncertain, records use "unknown" rather than guessing. Severity may be upgraded or downgraded as new information emerges; all changes are documented in the changelog.
Reach
Reach describes the scale of people or entities affected. Like severity, it uses an ordinal scale with "unknown" permitted.
| Level | Criteria | Reference examples |
|---|---|---|
| Individual | One or a small number of identified individuals directly affected. | A single person denied a benefit by an automated system; a chatbot giving harmful advice to one user |
| Group | A defined group of people affected, typically dozens to hundreds sharing a common characteristic or context. | Applicants screened out by a biased hiring tool in a single recruitment round; patients at one hospital affected by a diagnostic AI error |
| Organization | An entire organization's operations or workforce materially affected. | A company's AI system breach exposing all employee data; an agency's automated workflow failing department-wide |
| Sector | Systemic impact across an industry or government sector, affecting multiple organizations or the sector's operational norms. | AI hiring tools creating sector-wide discrimination patterns; regulatory gaps affecting all health AI deployments nationally |
| Population | Society-wide impact: affecting a large portion of the Canadian population, or creating conditions that could do so. | Mass AI surveillance without legal authority; AI-generated election disinformation at national scale |
The pipeline
CAIM operates through a seven-stage pipeline combining automated monitoring with human editorial review. Automated stages use large language models (LLMs) for triage and structured extraction; every record is reviewed by a human editor before publication.
1. Monitor
Automated polling of Canadian sources in English and French: news media, federal regulatory notices, legal databases, technology publications, and international incident databases. Sources are categorized by function:
- Detection sources (scanned continuously): Canadian media, regulatory notices, court databases, incident databases (AIID)
- Corroboration sources (pulled on demand during drafting): OPC decisions, Hansard, provincial commissioners
- Context sources (referenced during drafting): OECD AIM, AIAAIC, AISI publications
In addition to automated monitoring, reports can be submitted directly through structured submissions or a confidential channel for sensitive cases requiring source protection or coordinated disclosure.
2. Filter
A keyword pre-filter (broad regex, deliberately over-inclusive, bilingual) reduces the volume of monitored items. Items passing the filter are enriched with full article text. This stage is deterministic and auditable — every keyword match is logged.
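A minimal sketch of such a pre-filter, assuming Python; the keyword list, logger name, and function shape are illustrative, not CAIM's actual configuration:

```python
import re
import logging

# Deliberately over-inclusive, bilingual keyword list (illustrative sample).
KEYWORDS = [
    r"artificial intelligence", r"intelligence artificielle",
    r"machine[- ]learning", r"apprentissage automatique",
    r"deep[- ]learning", r"chatbot",
    r"facial recognition", r"reconnaissance faciale",
]
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)

logger = logging.getLogger("caim.filter")

def passes_filter(item_id: str, text: str) -> bool:
    """Deterministic keyword gate; every match is logged for auditability."""
    match = PATTERN.search(text)
    if match:
        logger.info("item=%s matched=%r span=%s",
                    item_id, match.group(0), match.span())
    return match is not None
```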
3. Triage
Each filtered item is assessed by an LLM for:
- Scope: Does it involve material AI involvement and a Canada nexus?
- Classification: Is it best treated as an incident or a hazard?
- Deduplication: Does this relate to an existing record, or to other items from the same monitoring run?
All triage decisions are logged with reasoning. When multiple articles describe the same event, triage clusters them as multiple observations of a single incident rather than separate records.
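As an illustration, a triage verdict could be captured as a small structured object; the field names below are hypothetical, chosen to mirror the three assessments above:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class TriageVerdict:
    """Structured output of the LLM triage step; logged in full for audit."""
    item_id: str
    in_scope: bool                                        # material AI involvement + Canada nexus
    record_type: Optional[Literal["incident", "hazard"]]  # None if out of scope
    duplicate_of: Optional[str]                           # slug of an existing record, if any
    cluster_id: Optional[str]                             # groups same-event items within a run
    reasoning: str                                        # free-text justification, retained
```

Clustering by `cluster_id` is what lets several articles about one event become multiple observations on a single record rather than separate records.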
4. Extract and resolve
For items that pass triage, the pipeline extracts a full structured record conforming to the CAIM schema and resolves all entity and system mentions against the existing registry. Entity and system resolution — matching mentions like "the federal privacy watchdog" to the correct registry entry — uses LLM world knowledge in place of traditional lookup infrastructure. Each resolution carries a confidence level and auditable reasoning.
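A sketch of a single resolution decision as structured data; the slugs, confidence vocabulary, and field names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class EntityResolution:
    """One resolved mention, as the review stage might receive it."""
    mention: str                                  # e.g. "the federal privacy watchdog"
    resolved_slug: str                            # e.g. "office-of-the-privacy-commissioner"
    confidence: Literal["high", "medium", "low"]
    reasoning: str                                # auditable justification from the LLM

def needs_manual_review(r: EntityResolution) -> bool:
    """High-confidence resolutions are pre-filled; others are flagged (see stage 6)."""
    return r.confidence != "high"
```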
5. Analyze
Three analytical assessments are produced, each structurally separate from the factual record (see EC-1):
- Classification: taxonomy tags along standard axes (domain, harm type, AI pathway, lifecycle phase, systemic risk factor)
- Control structure: governance adequacy assessment at each relevant tier (organizational, sectoral, provincial, federal, international)
- Trajectory (hazards only): assessment of how the risk condition is evolving
6. Review
A human reviewer examines each draft record, including:
- The full structured record with all fields populated
- Source observations with full text
- Entity and system resolution decisions (pre-filled for high confidence, flagged for medium or low)
- Draft analyses (classification, control structure, trajectory)
- Automated QA results from caim-eval, a mandatory quality gate that checks factuality, attribution, calibration, and internal consistency
No record is published without human editorial review. The reviewer can approve, revise, reject, or defer each item. For security-sensitive cases, a safety reviewer handles redaction decisions and coordinated disclosure.
7. Publish
Approved records are inserted into the database with all entity, system, and observation references. New entities and systems confirmed during review are added to the registry. The public site is rebuilt.
Records are published with a verification label, sources, taxonomy tags, and a version number. All subsequent changes produce new versions with a visible changelog. Where responsible publication requires delay, CAIM withholds the record until publication is safe, and publishes high-level defensive guidance in the interim where possible.
Corrections
If a record is materially inaccurate, it is corrected promptly and transparently. If core claims cannot be supported, records may be retracted with an explanation and preserved tombstone metadata. Appeals focus on factual accuracy and responsible publication.
The record format
Every published record has three layers, with a hard boundary between factual documentation and editorial analysis. The narrative describes what happened; analytical judgments (causal pathways, governance significance, risk assessment) are structurally separated in the assessment fields. This follows the model used in aviation safety investigation, where factual reports and causal analysis are distinct documents, and prevents editorial interpretation from becoming embedded in the factual record. Most AI incident databases do not enforce this separation.
Narrative layer
A human-readable, compact account:
- What happened (or almost happened)
- Key dates and jurisdiction(s)
- Who was affected (classes of stakeholders)
- AI system context (what can be responsibly supported)
- Observed harms or near-harms
- What is known vs. alleged vs. uncertain
Evidence layer
Transparent sourcing:
- Source list with dates and type (media, official, court, disclosure, academic)
- Where useful, a claims table mapping specific claims to supporting sources and confidence notes
Structure layer
Taxonomy tags enabling search, filtering, and analysis:
- Domain: finance, health, public services, education, critical infrastructure, elections/information integrity, etc.
- Harm type: fraud/impersonation, privacy/data exposure, discrimination/rights impacts, safety failure, cyber incident, misinformation, operational failure, non-consensual imagery, etc.
- AI involvement type: development flaw, deployment failure, misuse, supply-chain/tooling, human oversight breakdown, monitoring gap, etc.
- Lifecycle phase: design, training, evaluation, deployment, monitoring, incident response
- Severity and reach: ordinal scales calibrated with reference examples (see above); explicit "unknown" permitted
- Jurisdiction level: federal, provincial/territorial, municipal, or multi-level, identifying which level of government has primary regulatory authority over the system or domain
- Canada nexus basis: which nexus criteria are met
Policy recommendations
Where external authorities have proposed relevant measures, records include attributed policy recommendations tied to the specific case. Each recommendation cites its source. CAIM does not issue its own prescriptions.
Verification ladder
Records carry a verification status so readers can always assess how certain the information is.
| Status | Meaning |
|---|---|
| Reported | Credible initial reporting; claims not yet independently corroborated |
| Corroborated | Supported by multiple independent credible sources |
| Confirmed | Supported by primary documentation or exceptionally strong corroboration |
| Contested | Credible dispute exists about core claims |
| Retracted | Core claims cannot be supported; withdrawn with explanation |
CAIM distinguishes between what is known, what is alleged, and what remains uncertain. Where information is incomplete, CAIM publishes what is supportable and explicitly marks what is unknown.
Verification of hazard records
For incident records, the verification ladder assesses how well-established the facts of the event are. For hazard records, it assesses how well-documented the precursor condition is:
- Reported: A credible source has identified the condition, but it has not been independently examined, e.g., a media report describing a regulatory gap.
- Corroborated: Multiple independent credible sources document the condition, e.g., both a regulatory body and independent researchers have identified the same gap or precursor failure.
- Confirmed: A primary authority has formally documented the condition, e.g., an official investigation, audit, or regulatory finding establishes the precursor condition as fact.
The verification status reflects the strength of evidence for the precursor condition itself, not a prediction of future harm. The risk assessment (how plausible the pathway to harm is and how severe the consequences could be) is a separate editorial judgment, stated transparently in the narrative, grounded in evidence where possible, and revisable as new information emerges. A hazard can be "confirmed" (the underlying condition is well-documented) while the severity of the risk remains uncertain or contested.
Taxonomy
CAIM's taxonomy is designed to be stable, interpretable, and interoperable. Records are coded along the dimensions described above (domain, harm type, AI involvement type, lifecycle phase, severity, jurisdiction level, nexus basis). The taxonomy is published and versioned; changes are documented.
Where feasible, CAIM aligns its fields with international incident-reporting frameworks, particularly the OECD AI Incidents Monitor and the AI Incident Database (AIID), to support comparability and institutional adoption.
Data model
CAIM's data model separates observations (base-level) from classification (taxonomy layers). A record is publishable with no taxonomy applied. All classification can evolve independently of the underlying observations.
Entity role primitives
Every entity referenced on a record carries one or more role primitives, a small, durable set of organizational relationships:
- Developer: built, trained, or created the AI system
- Deployer: put the system into operational use
- Regulator: investigated, audited, or issued findings
- Affected party: experienced harm or was subject to the system's decisions
- Reporter: disclosed, documented, or reported the incident or hazard
These enable structured queries ("all deployers," "all incidents with regulator involvement") without relying on taxonomy. Entity pages aggregate records grouped by role.
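As a sketch, such queries reduce to a filter over role annotations; the record shape here is a hypothetical rendering, not the published schema:

```python
# Hypothetical in-memory corpus: each record lists (entity, role) pairs.
corpus = [
    {"slug": "incident-001",
     "entities": [{"entity": "acme-hr", "role": "deployer"},
                  {"entity": "opc", "role": "regulator"}]},
    {"slug": "hazard-002",
     "entities": [{"entity": "modelco", "role": "developer"}]},
]

def records_with_role(records: list[dict], role: str) -> list[str]:
    """Slugs of all records where any entity carries the given role primitive."""
    return [r["slug"] for r in records
            if any(e["role"] == role for e in r["entities"])]

print(records_with_role(corpus, "deployer"))   # ['incident-001']
print(records_with_role(corpus, "regulator"))  # ['incident-001']
```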
Shared objects
The schema defines five shared objects that are referenced across incidents and hazards:
- Observation: a piece of external evidence — a news article, government report, court filing, academic paper. The monitor curates these but does not create them. Each observation records its publisher, publication date, source type, and optionally what specific claim it supports.
- Entity: an actor involved in AI risk situations — a company, regulator, court, institution, or individual. Shared across records.
- System: an AI system — a model, product, or deployed service — linked to its developer entity. Shared across records.
- Response: a governance action taken in relation to AI risk — legislation, investigation, court decision, guidance, enforcement. Each response records the actor, jurisdiction, date, status, and description.
- Analysis: an analytical assessment attached to a record. Analyses are structurally separate from the factual record — they represent editorial judgment, not evidence.
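For concreteness, two of these objects rendered as dataclasses, with fields taken from the descriptions above; the types and defaults are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    """External evidence; the monitor curates observations, it never authors them."""
    publisher: str
    published: str                        # ISO publication date
    source_type: str                      # media, official, court, disclosure, academic
    supports_claim: Optional[str] = None  # specific claim this source supports, if noted

@dataclass
class Response:
    """A governance action taken in relation to AI risk."""
    actor: str                            # entity slug
    jurisdiction: str
    date: str                             # ISO date
    status: str
    description: str
```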
Analysis approaches
Each record can have multiple analyses using different approaches. The approach determines what the analysis contains:
- Classification: categorizes the record along taxonomy axes — domain, harm type, AI pathway, lifecycle phase, systemic risk factor. Each entry is tagged `known` or `potential`. Powers cross-record queries and risk clustering.
- Control structure: CAIM's core analytical contribution. Evaluates governance adequacy at each relevant tier (organizational, sectoral, provincial, federal, international). Each tier is assessed as `absent`, `partial`, `adequate`, or `overwhelmed`, with reasoning. Each governance level appears at most once per analysis.
- Trajectory: tracks how a risk condition is evolving over time. Typically attached to hazards. Addresses current status (active, mitigated, escalating, retired), supporting evidence, confidence, and triggers or mitigating factors observed.
Response tracking
Records track the governance feedback loop: what was done in response, by whom, and with what result. Each response entry includes the actor (linked to an entity), jurisdiction, date, action type, current status, and description. On incidents, this tracks investigation, enforcement, policy change, and litigation. On hazards, it tracks governance attention: reports published, consultations launched, legislation introduced.
This makes CAIM useful for policy analysis: which incidents led to regulatory action? What's the response rate? Which sectors have governance gaps?
Constraints and invariants
The schema enforces structural integrity through formal constraints:
- EC-1: Fact and analysis are structurally separate. Updating an analysis never modifies the factual fields or observations.
- EC-2: Materialization preserves both objects. When a hazard produces an incident, the hazard persists with its own identity and assessments unchanged.
- EC-3: Approach-specific constraints — every classification entry carries a confidence tag; each governance level appears at most once per control structure analysis.
And invariants that hold across all state transitions:
- INV-1: Verification must be supported by evidence. `corroborated` requires ≥2 observations from different publishers; `confirmed` requires ≥1 observation with an official, court, regulatory, or disclosure source type.
- INV-2: Evidence is append-only. Observations are not removed except in cases of fabrication or legal liability.
- INV-3: Published records are never deleted. They may be marked redacted but their identity and existence remain visible.
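For illustration, INV-1 and EC-3 lend themselves to mechanical checks at build time; the record and analysis shapes below are assumptions, while the thresholds follow the definitions above:

```python
def check_inv1(record: dict) -> list[str]:
    """INV-1: verification status must be supported by the evidence layer."""
    errors = []
    obs = record["observations"]
    if record["verification"] == "corroborated":
        if len({o["publisher"] for o in obs}) < 2:
            errors.append(f"{record['slug']}: corroborated requires >=2 publishers")
    if record["verification"] == "confirmed":
        primary = {"official", "court", "regulatory", "disclosure"}
        if not any(o["source_type"] in primary for o in obs):
            errors.append(f"{record['slug']}: confirmed requires a primary source type")
    return errors

def check_ec3(analysis: dict) -> list[str]:
    """EC-3: each governance level at most once per control structure analysis."""
    levels = [t["level"] for t in analysis["tiers"]]
    dupes = {l for l in levels if levels.count(l) > 1}
    return [f"duplicate governance level: {l}" for l in sorted(dupes)]
```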
One-sided links
All relationships are declared on one side only. When an incident is linked to a hazard, the `materialized_from` reference is declared on the incident. The build step computes reverse lookups; the hazard page shows its linked incidents without storing them. This eliminates consistency rot as the record corpus grows.
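A sketch of that build-time computation, assuming incidents are dictionaries with an optional `materialized_from` key; the slugs are invented:

```python
from collections import defaultdict

def reverse_links(incidents: list[dict]) -> dict[str, list[str]]:
    """Compute hazard -> incident lookups from one-sided declarations."""
    by_hazard: dict[str, list[str]] = defaultdict(list)
    for inc in incidents:
        hazard = inc.get("materialized_from")
        if hazard:
            by_hazard[hazard].append(inc["slug"])
    return dict(by_hazard)

incidents = [
    {"slug": "incident-001", "materialized_from": "hazard-002"},
    {"slug": "incident-003", "materialized_from": "hazard-002"},
    {"slug": "incident-004"},  # no hazard link
]
print(reverse_links(incidents))  # {'hazard-002': ['incident-001', 'incident-003']}
```

Because the hazard never stores the list, there is no second copy to fall out of sync; the lookup is recomputed on every build.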
Build-time integrity
The build step validates the entire record graph: slug references, taxonomy values, bilingual parity, assessment ordering, and relationship consistency. Broken references are build errors. Missing translations are warnings. No record can reference a nonexistent entity, system, or record.
Systemic risk analysis
CAIM's most distinctive analytical contribution is a methodology for connecting deployment-level incident patterns to catastrophic risk trajectories. This is the bridging analysis.
Systemic risk factors
Every record is tagged with zero or more systemic risk factors: structural properties of the failure that are relevant across risk scales. These are the dimensions that connect what happened in a specific Canadian AI deployment to what could happen at higher capability levels:
| Factor | What it reveals |
|---|---|
| Loss of human control | System operated beyond human oversight capacity |
| Unexpected capability | System demonstrated behaviour outside design expectations |
| Resistance to correction | Institutional or technical barriers made correction difficult |
| Opacity | Decision process not interpretable by affected parties or overseers |
| Autonomous scope expansion | System's influence expanded beyond intended boundaries |
| Cascade propagation | Failure triggered failures elsewhere |
| Governance gap | No mechanism existed to prevent, detect, or respond |
| Accountability void | No entity bore clear responsibility |
| Concentration of power | Incident reflected or increased power asymmetry |
| Epistemic degradation | Incident undermined collective ability to assess truth or risk |
The editorial question for each factor is: "Does this record demonstrate this structural property?", not "Is this the root cause?" Systemic risk factors describe what the record reveals about structural conditions.
Trajectory analysis
Hazard records carry trajectory analyses that track how the risk condition is evolving. Each trajectory assessment addresses:
- Current status: active, mitigated, escalating, or retired
- Supporting evidence: what observations support the status assessment
- Confidence: how certain the assessment is (low, moderate, high)
- Triggers and mitigating factors: what could cause the risk to escalate or recede
Whether a hazard has produced incidents is captured separately through the `materialized_from` links on incident records; a hazard's trajectory describes the state of the underlying condition, not whether incidents have occurred.
Derived quantities
CAIM computes aggregate patterns across the entire record corpus:
- Governance gap distribution: from control structure analyses, how many records have each adequacy value at each governance level
- Materialization rate: fraction of hazards referenced by at least one incident's `materialized_from` link
- Response latency: time between the anchor date (when harm occurred or was identified) and the first linked governance response
- Risk clusters: from classification analyses, records grouped by shared classification, control gap, entity, or system
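Two of these reduce to short computations over the record graph; a sketch assuming ISO-format dates and the link fields described above:

```python
from datetime import date

def materialization_rate(hazards: list[dict], incidents: list[dict]) -> float:
    """Fraction of hazards referenced by at least one materialized_from link."""
    if not hazards:
        return 0.0
    referenced = {i.get("materialized_from") for i in incidents} - {None}
    return len(referenced & {h["slug"] for h in hazards}) / len(hazards)

def response_latency_days(record: dict) -> int:
    """Days from the anchor date to the first linked governance response."""
    anchor = date.fromisoformat(record["anchor_date"])
    first = min(date.fromisoformat(r["date"]) for r in record["responses"])
    return (first - anchor).days
```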
Interoperability
OECD alignment
CAIM maintains two classification layers on every record: a CAIM native taxonomy (primary, richer, optimized for Canadian policy users) and an OECD AIM interoperability layer (optional, populated during editorial tagging). The two layers coexist without flattening; neither is redundant. This follows the pattern used in aviation safety, where national authorities maintain detailed classification systems while mapping to international codes for reporting.
Data exports include an OECD-compatible view that maps CAIM fields to the OECD schema. CAIM's editorial metadata (verification ladder, versioning, redaction flags, bilingual labels) is preserved in an extension namespace.
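A sketch of what that export view might look like; the OECD-side field names are placeholders rather than the actual OECD AIM schema, and the extension key is an assumption:

```python
# Placeholder mapping: left side is CAIM, right side stands in for OECD AIM fields.
CAIM_TO_OECD = {
    "title": "incident_title",
    "anchor_date": "occurrence_date",
    "domain": "industry",
    "harm_type": "harm_type",
}

def to_oecd_view(record: dict) -> dict:
    out = {oecd: record[caim]
           for caim, oecd in CAIM_TO_OECD.items() if caim in record}
    # Editorial metadata survives in an extension namespace, not flattened away.
    out["ext:caim"] = {
        "verification": record.get("verification"),
        "version": record.get("version"),
        "redacted": record.get("redacted"),
    }
    return out
```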
AIID alignment
CAIM adopts the AIID's conceptual split between incidents (canonical events) and reports (individual source documents). Records include optional AIID cross-reference identifiers where matches exist. CAIM's taxonomy provides a crosswalk to AIID taxonomy sets, with local tags (bilingual labels, Canada nexus, editorial metadata) maintained separately.
API
CAIM provides machine-readable access to the full record corpus, aggregate statistics, systemic risk analysis, taxonomy definitions, and a JSON Feed. All endpoints are static JSON, CORS-enabled, with no authentication required.
For full endpoint documentation, see the API reference.
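For illustration, because endpoints are static JSON with CORS enabled, a client is a plain HTTP GET with no authentication; the base URL and route below are placeholders (the real routes are in the API reference):

```python
import json
import urllib.request

BASE = "https://example.org/api"  # placeholder, not the real host

with urllib.request.urlopen(f"{BASE}/records.json") as resp:  # hypothetical route
    records = json.load(resp)

incidents = [r for r in records if r.get("record_type") == "incident"]
print(f"{len(incidents)} incident records")
```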
Privacy and responsible publication
Naming policy
CAIM documents systemic patterns of AI harm. The analytical unit is the system, deployment, and governance gap, not the individuals affected. The identity of harmed individuals adds no analytical value; the circumstances of their harm do. Accordingly:
- Private individuals who were harmed or affected are never named. They are described by role and relevant demographic context: "an Ontario recruiter," "a PEI retiree," "a 14-year-old user." This applies even when the individual's name is available in public reporting. Consent to a media interview is not consent to permanent inclusion in a structured, machine-readable incident database.
- Organizations, AI systems, and public officials acting in official capacity are named, because institutional accountability requires it.
- Professionals acting in professional capacity (lawyers, researchers, commissioners) may be named when their role is relevant to the record.
- Criminal defendants named in public court proceedings are handled case-by-case. Minors are never named.
- Civil litigants: case names (e.g., Moffatt v. Air Canada) may be referenced for legal traceability, but the individual is not profiled beyond what the legal record requires.
The test is: could this record, as a permanent structured entry in a public database, cause harm to the individual it describes? Sensitive personal details (mental health history, financial losses, disability status) are especially likely to follow a named individual. CAIM does not contribute to that outcome.
Responsible publication safeguards
Beyond the naming policy, CAIM follows additional safeguards:
- Cases involving minors receive heightened protection; names are never published, and identifying details are minimized
- Records avoid reproducing harmful content unnecessarily and use victim-centered language
- For security-sensitive cases, CAIM follows coordinated disclosure norms: it prioritizes mitigation and safety, publishes high-level learning and defensive guidance, and withholds enabling details until risk is reduced
- CAIM does not become a platform for harassment or reputational attacks; records are sourced, cautious in language, and focused on what happened and what can be learned