1. Purpose
This policy establishes the controls for handling sensitive data in AI systems at [Organization Name]. It defines data classification levels, specifies the security measures required for each level, and ensures that sensitive information is protected throughout the AI lifecycle — from training data ingestion through model inference and output.
2. Scope
This policy applies to:
- All data classified as Confidential or Restricted that is used in or generated by AI systems.
- All personally identifiable information (PII) processed by AI systems.
- All special category data (health, biometric, financial, etc.) in AI contexts.
- All environments: development, testing, staging, and production.
- All employees, contractors, and third-party vendors handling sensitive AI data.
3. Data classification levels
| Level | Definition | Examples in AI context |
|---|---|---|
| Public | Information intended for public disclosure. No restrictions on access. | Published model cards, public documentation, anonymized benchmarks. |
| Internal | Information for internal use. Low risk if disclosed but not intended for public. | Non-sensitive training metrics, internal experiment logs, model architecture notes. |
| Confidential | Sensitive business or personal information. Disclosure could cause harm. | Customer data used for training, PII in inference inputs, proprietary model weights, business-sensitive predictions. |
| Restricted | Highly sensitive information. Disclosure could cause severe harm or regulatory breach. | Health records, biometric data, financial account data, credit scoring inputs, data covered by legal privilege. |
All datasets used in AI systems must be classified before use. Classification is performed by the Data Owner and reviewed by the Data Privacy Officer for datasets containing personal data.
4. Protection requirements by classification
| Control | Public | Internal | Confidential | Restricted |
|---|---|---|---|---|
| Encryption at rest | Optional | Recommended | Required (AES-256) | Required (AES-256) |
| Encryption in transit | Recommended | Required (TLS 1.2+) | Required (TLS 1.2+) | Required (TLS 1.3) |
| Access control | Open | Role-based | Role-based + approval | Named individuals + MFA |
| Audit logging | Optional | Recommended | Required | Required + real-time alerting |
| Data masking/anonymization | Not required | Not required | Required for non-production | Required for all environments |
| Retention review | Annual | Annual | Quarterly | Monthly |
| DLP monitoring | Not required | Recommended | Required | Required |
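The controls matrix above can also be expressed as machine-readable configuration so that tooling (for example, a pipeline admission check) can enforce it automatically. The sketch below is illustrative only; the level names follow this policy, but the control keys and schema are assumptions, not a mandated format.

```python
# Illustrative machine-readable form of the protection-requirements matrix.
# Keys and values are a sketch; adapt to your tooling's actual schema.
CONTROLS = {
    "Public":       {"encrypt_at_rest": False, "min_tls": None,  "mfa": False, "audit": False, "retention_review_months": 12},
    "Internal":     {"encrypt_at_rest": False, "min_tls": "1.2", "mfa": False, "audit": False, "retention_review_months": 12},
    "Confidential": {"encrypt_at_rest": True,  "min_tls": "1.2", "mfa": False, "audit": True,  "retention_review_months": 3},
    "Restricted":   {"encrypt_at_rest": True,  "min_tls": "1.3", "mfa": True,  "audit": True,  "retention_review_months": 1},
}

def required_controls(level: str) -> dict:
    """Return the minimum controls for a given classification level."""
    return CONTROLS[level]
```

A pipeline gate could call `required_controls` on a dataset's recorded classification and refuse to proceed if the environment does not meet every listed control.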
5. PII handling in AI systems
5.1 Discovery
Before data enters any AI pipeline, it must be scanned for PII using automated discovery tools. PII categories include but are not limited to: names, email addresses, phone numbers, national ID numbers, financial account numbers, health records, biometric identifiers, and location data.
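As a minimal sketch of automated discovery, a scanner can match candidate PII patterns before data enters the pipeline. The regular expressions below are illustrative examples only and will miss many real-world PII formats; production discovery should use a dedicated tool.

```python
import re

# Illustrative PII patterns — a real scanner needs far broader coverage
# (names, addresses, biometric identifiers cannot be caught by regex alone).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
    "national_id": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(text: str) -> dict:
    """Return detected PII values grouped by category (empty dict if clean)."""
    return {name: pat.findall(text)
            for name, pat in PII_PATTERNS.items()
            if pat.findall(text)}
```

A dataset that returns a non-empty result would then be routed to the minimization steps in section 5.2 before use.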
5.2 Minimization
AI systems must use the minimum amount of PII necessary. Techniques to reduce PII exposure:
- Masking: Replace PII with functional placeholders (e.g., [EMAIL], [NAME]) that preserve data structure without exposing actual values.
- Redaction: Permanently remove PII fields that are not necessary for the AI task.
- Tokenization: Replace PII with reversible tokens stored in a secure vault, accessible only by authorized systems.
- Anonymization: Irreversibly transform data so individuals cannot be re-identified. Preferred for training data when personal identification is not required.
- Synthetic data: Generate artificial data that preserves statistical properties without containing real PII. Preferred for development and testing environments.
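The masking technique above can be sketched as a simple substitution pass: detected PII is replaced with functional placeholders so downstream processing sees the structure without the values. Patterns and placeholder names here are illustrative assumptions.

```python
import re

# Sketch of PII masking: replace matches with structural placeholders.
# Patterns are examples only; production masking should reuse the same
# detection engine as the discovery step so coverage stays consistent.
MASK_RULES = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[NATIONAL_ID]"),
]

def mask_pii(text: str) -> str:
    """Replace PII matches with placeholders, preserving surrounding text."""
    for pattern, placeholder in MASK_RULES:
        text = pattern.sub(placeholder, text)
    return text
```

Unlike tokenization, this transformation is not reversible; choose it when no authorized system needs to recover the original values.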
5.3 AI guardrails
Runtime guardrails must be configured to scan AI inputs and outputs for PII leakage. Guardrail actions:
- Block: Reject the request if PII is detected in input or output.
- Mask: Replace detected PII with placeholders before forwarding.
- Alert: Log the detection and notify the security team without blocking.
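The three guardrail actions can be sketched as a dispatch around whatever PII detector the guardrail product provides. The function names (`detect_pii`, `mask_fn`, `notify_fn`) are placeholders for your actual detector, masker, and alerting hook, not a real product API.

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    ALERT = "alert"

def apply_guardrail(text, action, detect_pii, mask_fn, notify_fn):
    """Apply one of the three guardrail actions to an AI input or output.

    detect_pii, mask_fn, notify_fn are injected so this sketch stays
    independent of any particular guardrail product.
    """
    findings = detect_pii(text)
    if not findings:
        return text                       # clean: pass through unchanged
    if action is Action.BLOCK:
        raise PermissionError("PII detected; request rejected")
    if action is Action.MASK:
        return mask_fn(text)              # forward a masked copy
    notify_fn(findings)                   # ALERT: log and notify, no block
    return text
```

The same dispatch can run on both the request and the response path, so model outputs are screened with the same policy as user inputs.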
6. AI model and output security
- Proprietary model weights are classified as Confidential and must be encrypted at rest and access-controlled.
- Model outputs containing Confidential or Restricted data must be handled at the same classification level as the input data.
- AI-generated content must not be stored in systems with lower classification than the source data.
- Model extraction and inversion attacks must be considered in the threat model for high-value models.
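The rule that outputs inherit the classification of their inputs can be reduced to a small helper: output data is classified at the highest level among its inputs. This is a sketch of the policy rule above, using this document's own four-level scale.

```python
# Classification levels in ascending order of sensitivity (section 3).
LEVELS = ["Public", "Internal", "Confidential", "Restricted"]

def output_classification(input_levels):
    """Output data inherits the highest classification of its inputs."""
    return max(input_levels, key=LEVELS.index)
```

For example, a model output derived from one Internal and one Restricted source must be handled as Restricted, so it cannot be stored in a lower-classification system.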
7. Development and testing environments
- Production data classified as Confidential or Restricted must not be used in development or testing without anonymization, masking, or use of synthetic data.
- Development environments must enforce access controls equivalent to those required for the classification level of the data they handle.
- Test datasets must be documented with their classification level and any transformations applied.
8. Third-party data handling
- Third-party AI providers handling Confidential or Restricted data must demonstrate equivalent security controls.
- Data Processing Agreements must specify classification handling requirements.
- Providers must not use sensitive data for their own model training.
- Data residency and sub-processor restrictions must be contractually enforced.
9. Incident response for sensitive data
If sensitive data is exposed through an AI system (e.g., prompt leakage, model memorization, unauthorized access):
- The incident must be reported immediately to the Security team and Data Privacy Officer.
- The AI system must be suspended pending investigation if the exposure is ongoing.
- Personal data breaches must be notified to the supervisory authority within 72 hours (GDPR Article 33).
- Affected individuals must be notified if the breach is likely to result in high risk to their rights (GDPR Article 34).
- Root cause analysis and remediation must be completed and documented.
10. Roles and responsibilities
| Role | Responsibilities |
|---|---|
| Data Owner | Classifies data, approves access, reviews retention, ensures classification is maintained. |
| Model Owner | Ensures AI system handles data at or above its classification level, configures guardrails. |
| Security | Implements encryption, DLP, access controls, and monitors for unauthorized access. |
| Data Privacy Officer | Reviews classification for personal data, advises on anonymization, handles breach notifications. |
| All employees | Handle data according to classification, report suspected data exposure. |
11. Regulatory alignment
- GDPR: Articles 5 (principles), 25 (privacy by design), 32 (security of processing), 33-34 (breach notification).
- EU AI Act: Article 10 (data governance), Article 15 (accuracy and robustness).
- ISO/IEC 27001: Annex A controls for access control, cryptography, and operations security.
- ISO/IEC 42001: Annex B (B.7 — data for AI systems).
12. Review
This policy is reviewed annually or sooner when triggered by data breaches, new data classification requirements, regulatory changes, or changes to AI processing activities.
Document control
| Field | Value |
|---|---|
| Policy owner | [CISO / Data Privacy Officer] |
| Approved by | [AI Governance Committee] |
| Effective date | [Date] |
| Next review date | [Date + 12 months] |
| Version | 1.0 |
| Classification | Internal |