Purpose
Ensure every dataset used for AI is handled lawfully, ethically, and transparently by codifying requirements for consent, minimization, retention, and cross-border transfer.
Scope
This policy covers all personal, sensitive, or proprietary data sourced for AI training, validation, testing, or inference, including third-party data feeds and synthetic datasets derived from regulated sources. In-scope data includes:
- Customer and employee personal data (PII/PHI/PCI)
- Supplier and partner data ingested into AI pipelines
- Telemetry or behavioral data captured via products
- Third-party datasets purchased or licensed for AI
Definitions
- Lawful Basis: The legal grounds authorizing data processing (e.g., consent, contract, legal obligation, or legitimate interests under GDPR Article 6).
- Data Use Case Register: The inventory that maps each dataset to its AI use cases and lawful basis.
- Data Minimization: Collecting and processing only the data necessary to achieve a defined AI purpose.
Policy
All AI data usage must have a documented lawful basis, an approved retention schedule, and a completed minimization assessment. Data may not be repurposed for new AI initiatives without reassessing the lawful basis and, where required, notifying affected parties. Sensitive data must be masked or pseudonymized before it is used for training unless an exemption is formally approved.
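As one illustration of the masking and pseudonymization requirement, the sketch below replaces direct identifiers with keyed HMAC-SHA256 tokens and masks payment card numbers. The field lists (`DIRECT_IDENTIFIERS`, `MASKED_FIELDS`) and the helper names are hypothetical; keyed hashing is one common pseudonymization technique rather than a method this policy mandates, and the key would need to be stored separately from the training data.

```python
import hashlib
import hmac

# Hypothetical field classifications; real lists would come from the register entry.
DIRECT_IDENTIFIERS = {"email", "phone"}
MASKED_FIELDS = {"card_number"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed HMAC-SHA256 token (64 hex characters).

    The same value and key always give the same token, so records stay joinable,
    but tokens cannot be linked back to values without access to the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'.

    Values no longer than `visible` are fully masked.
    """
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def prepare_for_training(record: dict, key: bytes) -> dict:
    """Return a copy of `record` with identifiers pseudonymized and card data masked."""
    prepared = {}
    for name, value in record.items():
        if name in DIRECT_IDENTIFIERS:
            prepared[name] = pseudonymize(str(value), key)
        elif name in MASKED_FIELDS:
            prepared[name] = mask(str(value))
        else:
            prepared[name] = value  # attributes not classified as identifiers pass through
    return prepared
```

For example, `prepare_for_training({"email": "a@example.com", "card_number": "4111111111111111", "age": 42}, key)` returns the email as a 64-character token, the card number as `************1111`, and `age` unchanged.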
Roles and Responsibilities
The Data Protection Officer (DPO) approves data use cases and maintains the Data Use Case Register. Data Stewards document data lineage and consent records. Engineering implements and enforces minimization and masking controls. Legal reviews cross-border transfers and contractual restrictions.
Procedures
Teams must complete the following steps before ingesting data into AI workloads:
- Submit a Data Use Case Register entry specifying the purpose, lawful basis, data categories, and retention period.
- Complete a minimization checklist documenting why each attribute is required for the stated purpose.
- Attach consent records or contract clauses that evidence authorization for the processing.
- Run a privacy risk assessment for sensitive datasets, including masking or pseudonymization plans.
- Obtain DPO approval before data leaves the originating jurisdiction or is shared with vendors.
- Review and refresh approvals annually, or sooner when the purpose or applicable requirements change.
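The pre-ingestion steps above can be expressed as an automated gate. This is a minimal sketch assuming a hypothetical register schema (`UseCaseEntry`) and a hypothetical helper `ingestion_blockers`; the field names, the boolean flags standing in for DPO and assessment sign-offs, and the 365-day approval window are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class UseCaseEntry:
    """Hypothetical Data Use Case Register entry."""
    purpose: str
    lawful_basis: str                          # e.g. "consent", "contract", "legitimate_interests"
    data_categories: set[str]
    retention_days: int
    attribute_justifications: dict[str, str]   # minimization checklist: attribute -> reason
    authorization_evidence: list[str]          # IDs of consent records or contract clauses
    sensitive: bool = False
    privacy_assessment_done: bool = False
    cross_border_or_vendor: bool = False
    dpo_transfer_approval: bool = False
    last_approved: Optional[date] = None


def ingestion_blockers(entry: UseCaseEntry, attributes: set[str], today: date) -> list[str]:
    """Return reasons the dataset may not be ingested; an empty list means all checks pass."""
    problems = []
    if not entry.purpose or not entry.lawful_basis or entry.retention_days <= 0:
        problems.append("register entry incomplete")
    # Minimization: every attribute to be ingested needs a documented justification.
    unjustified = attributes - set(entry.attribute_justifications)
    if unjustified:
        problems.append(f"no minimization justification for: {sorted(unjustified)}")
    if not entry.authorization_evidence:
        problems.append("no consent record or contract clause attached")
    if entry.sensitive and not entry.privacy_assessment_done:
        problems.append("privacy risk assessment missing")
    if entry.cross_border_or_vendor and not entry.dpo_transfer_approval:
        problems.append("DPO approval for transfer or vendor sharing missing")
    # Assumed annual refresh window of 365 days.
    if entry.last_approved is None or today - entry.last_approved > timedelta(days=365):
        problems.append("approval missing or older than one year")
    return problems
```

Returning a list of reasons, rather than a single boolean, lets the requesting team see every outstanding step at once.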
Exceptions
Emergency analytics needs may receive provisional approval from the DPO for up to 30 days, provided masking is enabled and no production deployment occurs until the full review is complete.
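The provisional-approval conditions can be checked mechanically. This minimal sketch uses a hypothetical helper, `provisional_use_allowed`, and interprets "up to 30 days" as days 0 through 29 after the grant date; production use is always refused because it requires the full review rather than a provisional approval.

```python
from datetime import date

PROVISIONAL_MAX_DAYS = 30  # limit stated in the Exceptions section


def provisional_use_allowed(granted_on: date, today: date,
                            masking_enabled: bool, for_production: bool) -> bool:
    """Return True if a provisionally approved dataset may be used on `today`."""
    if for_production:
        return False  # production deployment waits for the full review
    if not masking_enabled:
        return False  # masking is a condition of provisional approval
    # Assumed interpretation: valid from the grant date through day 29.
    return 0 <= (today - granted_on).days < PROVISIONAL_MAX_DAYS
```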
Review Cadence
Data use cases are re-certified annually to confirm that the lawful basis, retention period, and minimization justification remain valid. Metrics, such as the number of expired approvals and of unauthorized datasets detected, are reported to the privacy steering committee.
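The two metrics named above could be computed from the register roughly as follows. The sketch assumes the register is available as a mapping from dataset ID to last re-certification date and that pipeline scans yield a set of dataset IDs in use; both shapes, the 365-day period, and the helper `review_metrics` are hypothetical.

```python
from datetime import date, timedelta

RECERTIFICATION_PERIOD = timedelta(days=365)  # assumed length of the annual cycle


def review_metrics(register: dict[str, date], datasets_in_use: set[str],
                   today: date) -> dict[str, int]:
    """Count expired approvals and datasets in use that are absent from the register."""
    expired = sum(
        1 for certified_on in register.values()
        if today - certified_on > RECERTIFICATION_PERIOD
    )
    unauthorized = len(datasets_in_use - set(register))
    return {"expired_approvals": expired, "unauthorized_datasets": unauthorized}
```

For example, with `register = {"crm": date(2023, 1, 1), "tickets": date(2024, 5, 1)}`, `datasets_in_use = {"crm", "tickets", "clickstream"}`, and `today = date(2024, 6, 1)`, the result is `{"expired_approvals": 1, "unauthorized_datasets": 1}`.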
References
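- GDPR Article 5 (Principles relating to processing of personal data, including lawfulness, fairness, transparency, purpose limitation, data minimisation, and storage limitation) and Article 6 (Lawfulness of processing)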
- EU AI Act Article 10 (Data and data governance)
- Internal documents: Data Governance Standard, Consent Management SOP, Cross-Border Data Transfer Policy