Pilot phase: CAIM is under construction. Records are provisional, based on public sources, and have not yet been peer-reviewed. Feedback welcome.
Status: Active · Severity: Significant · Confidence: Medium

Foundation models trained on scraped Canadian data create permanent, uncorrectable records and generate false claims about real people, harms that Canadian privacy law does not currently address.

Identified: April 1, 2023 · Last assessed: March 8, 2026

Foundation models are trained on data scraped from the internet, including the personal information of millions of Canadians, published without their knowledge, consent, or meaningful opt-out. The Office of the Privacy Commissioner of Canada and its provincial counterparts have launched a joint investigation into OpenAI's ChatGPT, examining whether the company's training data practices violate Canadian privacy law and whether the generation of false biographical information about identifiable Canadians constitutes a privacy violation.

The structural challenge extends beyond any single company. Large language models embed personal information in model parameters during training in a way that makes targeted deletion technically infeasible with current methods. Traditional privacy remedies — the right to access, correct, or delete personal information — cannot be meaningfully exercised against information encoded in model weights. PIPEDA and provincial privacy legislation were designed for databases, not neural networks.

The jurisdictional dimension compounds the challenge. Foundation model training happens extraterritorially, primarily in the United States. Canadian privacy authorities can investigate and issue findings, but enforcement against foreign companies operating through cloud services requires international cooperation that current frameworks do not adequately support. This is not an edge case — it is the default condition for all Canadians whose information appears in foundation model training data.

Materialized Incidents

Harms

Foundation models trained on internet-scraped data include personal information of millions of Canadians — published without knowledge, consent, or meaningful opt-out. Once embedded in model weights, this data cannot be fully removed or corrected.

Harm type: Privacy & Data Exposure · Severity: Significant · Scope: Population

AI models generate false biographical information about identifiable Canadians, presenting fabricated claims as factual. The Google AI Overview defamation case (MacIsaac v. Google) demonstrates that AI-generated false statements cause reputational harm with no effective correction mechanism.

Harm types: Privacy & Data Exposure, Misinformation · Severity: Moderate · Scope: Population

Evidence

3 reports

  1. Regulatory — Office of the Privacy Commissioner of Canada (Jan 25, 2024)

    Privacy commissioners investigating whether OpenAI violated Canadian privacy law through data scraping and confabulation

  2. Academic — SSRN (Sep 1, 2023)

    Technical infeasibility of targeted data deletion from model weights with current methods

  3. Official — Office of the Privacy Commissioner of Canada (Feb 7, 2024)

    Privacy Commissioner's analysis of generative AI challenges for Canadian privacy law

Record details

Responses & Outcomes

Office of the Privacy Commissioner of Canada · Investigation · Active

Launched joint investigation with provincial privacy commissioners into OpenAI's ChatGPT

Policy Recommendations (assessed)

Privacy framework adapted for foundation model training, addressing extraterritorial data collection and the technical infeasibility of traditional remedies

Office of the Privacy Commissioner of Canada (Feb 7, 2024)

Right to effective correction of AI-generated false biographical information

Office of the Privacy Commissioner of Canada (Jan 25, 2024)

Transparency requirements for training data provenance and composition

Office of the Privacy Commissioner of Canada (Feb 7, 2024)

Jurisdictional enforcement capacity against foreign AI developers operating in Canada

Office of the Privacy Commissioner of Canada (Feb 7, 2024)

Editorial Assessment (assessed)

Foundation models trained on data scraped from the internet include personal information of Canadians. Once embedded in model weights, this data cannot be selectively removed or corrected. The OPC and provincial counterparts have launched a joint investigation into OpenAI's data practices. Existing privacy legislation was designed for traditional data collection and storage, and its application to foundation model training presents unresolved legal and technical questions.

Entities Involved

AI Systems Involved

ChatGPT

LLM trained on internet-scraped data including Canadian personal information; generates false biographical claims about identifiable individuals

Related Records

Taxonomy (assessed)

Domain: Telecommunications, Public Services
Harm type: Privacy & Data Exposure, Misinformation
AI pathway: Training Data Origin, Confabulation
Lifecycle phase: Data Collection, Training, Deployment

Changelog

Version · Date · Change
v1 · Mar 8, 2026 · Initial publication
