Purpose
Provide a consistent assurance layer that demonstrates AI models meet functional, ethical, and safety expectations before they are exposed to customers or internal decision-makers. This policy ensures that QA activities produce audit-ready evidence consumable by Engineering, Compliance, and regulators.
Scope
Applies to every AI model, model update, prompt library, or inference service entering pre-production or production environments, whether the asset is built in-house or sourced from a vendor. In scope:
- Regression releases, fine-tuned checkpoints, and re-trained models
- Prompt libraries used in conversational assistants
- Real-time and batch inference services
- Shadow deployments used for A/B or canary experiments
Definitions
- Quality Assurance (QA): Independent verification activities that evaluate whether acceptance criteria have been met.
- Test Harness: Automated or semi-automated tooling used to execute evaluation suites.
- Quality Gate: Required metric or artifact threshold that must be satisfied before engineering can request release approval (a minimal sketch follows these definitions).
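For illustration, a quality gate can be expressed as a declarative threshold that tooling evaluates before release approval. The sketch below is a minimal Python example; the gate names, metrics, and thresholds are hypothetical, not the official gate catalogue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityGate:
    """A required metric threshold that must hold before release approval."""
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_satisfied(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold

# Example gates mirroring the policy's coverage areas; values are illustrative.
GATES = [
    QualityGate("functional", "test_pass_rate", 0.95),
    QualityGate("performance", "p99_latency_ms", 250.0, higher_is_better=False),
]

def unmet_gates(observed: dict[str, float]) -> list[str]:
    """Name every gate whose observed metric is missing or misses its threshold."""
    return [
        g.name for g in GATES
        if g.metric not in observed or not g.is_satisfied(observed[g.metric])
    ]
```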
Policy
All AI releases must satisfy predefined quality gates covering functionality, data integrity, safety, fairness, resiliency, and guardrail behavior. QA evidence must be stored in the model inventory with traceability to dataset versions, prompts, and evaluator configurations. Releases missing QA evidence are automatically rejected by the deployment pipeline (illustrated below).
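The automatic rejection can be pictured as a pre-deployment evidence check against the model inventory. The sketch below assumes a hypothetical inventory record shaped as a plain dictionary; the required field names are illustrative, not an existing internal API.

```python
# Hypothetical pre-deployment evidence check; field names are assumptions.
REQUIRED_EVIDENCE = {
    "model_version",
    "dataset_ids",
    "prompt_versions",
    "evaluator_config",
    "qa_attestation",
}

def release_is_deployable(inventory_record: dict) -> bool:
    """Reject any release whose model-inventory record lacks QA evidence."""
    missing = REQUIRED_EVIDENCE - inventory_record.keys()
    if missing:
        print(f"Release rejected; missing QA evidence: {sorted(missing)}")
        return False
    return True
```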
Roles and Responsibilities
- QA Lead: owns the test strategy, publishes coverage targets, and signs the QA attestation.
- Engineering: builds and maintains test harnesses.
- Responsible AI: supplies fairness and safety evaluation playbooks.
- Product Owner: defines acceptance criteria and acknowledges any residual risk.
Procedures
QA must include the following components, tailored to risk classification:
- Functional testing: deterministic unit and integration tests with a ≥95% pass rate (a gate-check sketch follows this list).
- Performance and resiliency testing: latency, throughput, and stress tests aligned to SLOs.
- Safety and ethics evaluations: bias/fairness probes, red-team adversarial tests, prompt-injection defenses.
- Data quality checks: dataset drift analysis, schema validation, contamination and PII sweeps (see the schema/PII sketch after this list).
- Traceability package: link test outputs to model version, dataset IDs, evaluator configs, and commit hashes (see the manifest sketch after this list).
- Sign-off workflow: the QA Lead signs the QA attestation, the Responsible AI team signs the fairness attestation, and the Product Owner acknowledges acceptance or raises a remediation ticket.
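As referenced in the functional-testing item above, a minimal pass-rate gate check might look like the following. Treating zero executed tests as a failure is an assumption consistent with the policy's evidence requirements rather than a stated rule.

```python
def functional_gate_passes(passed: int, executed: int,
                           threshold: float = 0.95) -> bool:
    """Functional-testing gate: deterministic tests must pass at >= 95%."""
    if executed == 0:
        return False  # no executed tests means no evidence, so the gate fails
    return passed / executed >= threshold

# 188 of 200 passing (94%) misses the gate; 191 of 200 (95.5%) clears it.
assert not functional_gate_passes(188, 200)
assert functional_gate_passes(191, 200)
```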
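For the data quality checks, a schema validation and PII sweep could start from something like the sketch below. The expected schema and the email regex are placeholders; production sweeps would use the full data-quality tooling.

```python
import re

# Assumed record schema and a deliberately crude email pattern as one PII example.
EXPECTED_SCHEMA = {"prompt": str, "response": str, "label": int}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def validate_record(record: dict) -> list[str]:
    """Return human-readable findings for one dataset record."""
    findings = []
    for field_name, field_type in EXPECTED_SCHEMA.items():
        if field_name not in record:
            findings.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], field_type):
            findings.append(
                f"wrong type for {field_name}: {type(record[field_name]).__name__}")
    for field_name in ("prompt", "response"):
        value = record.get(field_name)
        if isinstance(value, str) and EMAIL_RE.search(value):
            findings.append(f"possible PII (email) in {field_name}")
    return findings
```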
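The traceability package can likewise be captured as a small manifest that binds test outputs to their inputs. The field names below are illustrative, not a mandated schema, and the content digest is one possible tamper-evidence mechanism.

```python
import hashlib
import json

def build_traceability_manifest(model_version: str, dataset_ids: list[str],
                                evaluator_config: dict, commit_hash: str,
                                test_results_path: str) -> dict:
    """Bundle the identifiers that link test outputs back to their inputs."""
    manifest = {
        "model_version": model_version,
        "dataset_ids": sorted(dataset_ids),
        "evaluator_config": evaluator_config,  # must be JSON-serializable
        "commit_hash": commit_hash,
        "test_results_path": test_results_path,
    }
    # A content digest lets auditors detect post-hoc edits to the manifest.
    manifest["digest"] = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode()).hexdigest()
    return manifest
```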
Exceptions
Only the Head of QA may approve a reduction in coverage or deferred test execution. Each exception must include compensating controls (e.g., enhanced monitoring, tighter rollback triggers) and an expiration date not exceeding one release cycle (a record sketch follows).
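A compliant exception could be recorded as a structured object that enforces both requirements at creation time, as sketched below; the 14-day release cycle is an assumed value for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RELEASE_CYCLE = timedelta(days=14)  # assumption: one release cycle = 14 days

@dataclass
class QAException:
    approver: str                     # must be the Head of QA per this policy
    compensating_controls: list[str]  # e.g. enhanced monitoring
    granted_on: date
    expires_on: date

    def __post_init__(self):
        if not self.compensating_controls:
            raise ValueError("exception requires compensating controls")
        if self.expires_on > self.granted_on + RELEASE_CYCLE:
            raise ValueError("expiration exceeds one release cycle")
```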
Review Cadence
QA playbooks and coverage targets are reviewed quarterly. Metrics (failed gates, exceptions, post-release incidents linked to QA gaps) feed into continuous improvement actions maintained by the QA Council.
References
- ISO/IEC 42001:2023 Clause 8.1 (Operational planning and control)
- NIST AI RMF Measure/Manage functions
- Internal documents: Model Validation & Testing SOP, Responsible AI Evaluation Playbook, Regression Automation Standards