Status: Active · Confidence: Medium · Potential severity: Significant · Version: 1

Foundation models trained on scraped Canadian personal data create permanent records that cannot be corrected, generate false biographical claims about identifiable individuals, and operate beyond the effective reach of Canadian privacy law — representing a structural challenge to privacy rights that existing legislation was not designed to address.

Identified: April 1, 2023 · Last assessed: March 8, 2026

Description

Foundation models are trained on data scraped from the internet including personal information of millions of Canadians — published without their knowledge, consent, or meaningful opt-out. The Office of the Privacy Commissioner of Canada and provincial counterparts have launched a joint investigation into OpenAI’s ChatGPT, examining whether the company’s training data practices violate Canadian privacy law and whether the generation of false biographical information about identifiable Canadians constitutes a privacy violation.

The structural challenge extends beyond any single company. Large language models embed personal information in model parameters during training in a way that makes targeted deletion technically infeasible with current methods. Traditional privacy remedies — the right to access, correct, or delete personal information — cannot be meaningfully exercised against information encoded in model weights. PIPEDA and provincial privacy legislation were designed for databases, not neural networks.
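The gap between database-style deletion and parameter-level removal can be illustrated with a deliberately toy sketch (pure Python, invented numbers, no resemblance to any production model). Here a one-parameter model is "trained" on public data plus one scraped personal record; even an approximate unlearning step, here gradient ascent on the forgotten record, produces parameters that differ from those of a model retrained from scratch without the record, so there is no way to confirm the record was truly removed:

```python
def sgd_fit(data, epochs=200, lr=0.01, w=0.0):
    """Fit y = w*x by stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# Public data plus one scraped "personal record" (values invented).
public = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
personal = (2.0, 5.0)

w_full = sgd_fit(public + [personal])   # model trained on everything
w_retrain = sgd_fit(public)             # gold standard: retrain without the record

# Approximate "unlearning": gradient *ascent* on the forgotten record,
# pushing the parameter away from fitting it.
w_unlearn = w_full
for _ in range(50):
    grad = 2 * (w_unlearn * personal[0] - personal[1]) * personal[0]
    w_unlearn += 0.001 * grad

# The unlearned parameter does not match the retrained one: the record's
# influence is attenuated, not provably deleted.
print(w_full, w_retrain, w_unlearn)
```

Scaled from one parameter to billions, this is the core technical obstacle the document describes: deletion requests cannot be verified against model weights the way they can against a database row.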

The jurisdictional dimension compounds the challenge. Foundation model training happens extraterritorially, primarily in the United States. Canadian privacy authorities can investigate and issue findings, but enforcement against foreign companies operating through cloud services requires international cooperation that current frameworks do not adequately support. This is not an edge case — it is the default condition for all Canadians whose information appears in foundation model training data.

Risk Pathway

Foundation models are trained on data scraped from the internet including personal information of Canadians — published without knowledge, consent, or meaningful opt-out. Once embedded in model weights, this data cannot be fully removed or corrected. The models then generate false biographical information about identifiable Canadians, presenting fabrications as fact. PIPEDA and provincial privacy legislation were not designed for this paradigm: the data collection happens extraterritorially, the "processing" is inseparable from the model itself, and traditional privacy remedies (deletion, correction) are technically infeasible for information encoded in model parameters. Canadian privacy authorities have limited enforcement capacity against foreign AI developers.

Assessment History

Status: Active · Confidence: Medium · Severity: Significant

The Office of the Privacy Commissioner of Canada and provincial counterparts launched a joint investigation into OpenAI's ChatGPT examining whether training data practices violate Canadian privacy law and whether the generation of false biographical information about identifiable Canadians constitutes a privacy violation. The investigation is ongoing. No regulatory finding has been issued. The structural challenge — extraterritorial data collection embedded in model weights beyond effective domestic remedy — applies to all foundation model developers, not only OpenAI.

Initial assessment. The investigation is ongoing; the risk remains active pending regulatory findings.

Triggers

  • Increasing scale and comprehensiveness of training datasets
  • New foundation models trained on ever-larger data collections
  • Growing public reliance on LLMs for information about individuals
  • AI companies asserting broad fair use or legitimate interest defenses

Mitigating Factors

  • Joint privacy investigation creating regulatory scrutiny
  • EU AI Act and GDPR creating international pressure for training data transparency
  • Growing technical research on machine unlearning
  • Public awareness of AI confabulation risks

Risk Controls

  • Privacy framework adapted for foundation model training, addressing extraterritorial data collection and the technical infeasibility of traditional remedies
  • Right to effective correction of AI-generated false biographical information
  • Transparency requirements for training data provenance and composition
  • Jurisdictional enforcement capacity against foreign AI developers operating in Canada
  • Consent or legitimate interest requirements for inclusion of personal data in training datasets
  • Technical standards for machine unlearning to enable meaningful data deletion
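The transparency control above could, in principle, be operationalized as machine-readable provenance records attached to each training-data source. The schema below is purely hypothetical; every field name is invented for illustration, and no current standard or Canadian regulation mandates this shape:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class DatasetProvenance:
    """Hypothetical provenance record for one training-data source."""
    source_url: str
    collected_on: str             # ISO 8601 date of the crawl
    legal_basis: str              # e.g. "consent" or "legitimate_interest"
    contains_personal_info: bool
    jurisdictions: list = field(default_factory=list)  # affected regions
    opt_out_honoured: bool = False

# Example record for a single crawled source (all values illustrative).
record = DatasetProvenance(
    source_url="https://example.ca/forum",
    collected_on="2023-04-01",
    legal_basis="legitimate_interest",
    contains_personal_info=True,
    jurisdictions=["CA"],
)

# A regulator-facing export would simply serialize the record.
print(json.dumps(asdict(record), indent=2))
```

A disclosure regime built on records like this would let a regulator audit which sources claimed which legal basis, without requiring access to the model weights themselves.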

Affected Populations

  • Canadians whose personal information was scraped for model training
  • Individuals about whom models generate false biographical information
  • Public figures disproportionately affected by AI-generated false claims

Entities Involved

OpenAI (developer)

Developed ChatGPT, the subject of a joint privacy investigation by federal and provincial commissioners.

Office of the Privacy Commissioner of Canada (regulator)

Leading the joint investigation into whether OpenAI violated Canadian privacy law through its training data practices and the generation of confabulated personal information.

AI Systems Involved

ChatGPT

LLM trained on internet-scraped data including Canadian personal information; generates false biographical claims about identifiable individuals

Responses

Office of the Privacy Commissioner of Canada

Launched joint investigation with provincial privacy commissioners into OpenAI's ChatGPT

Taxonomy

Domain
Telecommunications · Public Services
Harm type
Privacy & Data Exposure · Misinformation
AI involvement
Training Data Issue · Model Confabulation
Lifecycle phase
Data Collection · Training · Deployment

Sources

  1. Joint investigation of ChatGPT by the Privacy Commissioner of Canada and provincial counterparts. Regulatory, Office of the Privacy Commissioner of Canada (Jan 25, 2024).
  2. Privacy in the Age of Generative AI. Official, Office of the Privacy Commissioner of Canada (Feb 7, 2024).
  3. Machine Unlearning: A Survey. Academic, SSRN (Sep 1, 2023).

Changelog

Version | Date | Change
v1 | Mar 8, 2026 | Initial publication