Purpose
Provide a consistent assurance layer that demonstrates AI models meet functional, ethical, and safety expectations before they are exposed to customers or internal decision-makers. This policy ensures that QA activities produce audit-ready evidence consumable by Engineering, Compliance, and regulators.
Scope
Applies to every AI model, model update, prompt library, or inference service entering pre-production or production environments, whether the asset is built in-house or sourced from a vendor. In scope:
- Regression releases, fine-tuned checkpoints, and re-trained models
- Prompt libraries used in conversational assistants
- Real-time and batch inference services
- Shadow deployments used for A/B or canary experiments
Definitions
- Quality Assurance (QA): Independent verification activities that evaluate whether acceptance criteria have been met.
- Test Harness: Automated or semi-automated tooling used to execute evaluation suites.
- Quality Gate: Required metric or artifact threshold that must be satisfied before engineering can request release approval (a minimal sketch follows these definitions).
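For illustration, a quality gate can be expressed as a declarative threshold that tooling evaluates before release approval. The sketch below is a minimal Python example; the gate names, metrics, and thresholds are hypothetical, not the official gate catalogue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualityGate:
    """A required metric threshold that must hold before release approval."""
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True

    def is_satisfied(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.threshold
        return observed <= self.threshold

# Example gates mirroring the policy's coverage areas; values are illustrative.
GATES = [
    QualityGate("functional", "test_pass_rate", 0.95),
    QualityGate("performance", "p99_latency_ms", 250.0, higher_is_better=False),
]

def unmet_gates(observed: dict[str, float]) -> list[str]:
    """Name every gate whose observed metric is missing or misses its threshold."""
    return [
        g.name for g in GATES
        if g.metric not in observed or not g.is_satisfied(observed[g.metric])
    ]
```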
Policy
All AI releases must satisfy predefined quality gates covering functionality, data integrity, safety, fairness, resiliency, and guardrail behavior. QA evidence must be stored in the model inventory with traceability to dataset versions, prompts, and evaluator configurations. Releases missing QA evidence are automatically rejected by the deployment pipeline (illustrated below).
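The automatic rejection can be pictured as a pre-deployment evidence check against the model inventory. The sketch below assumes a hypothetical inventory record shaped as a plain dictionary; the required field names are illustrative, not an existing internal API.

```python
# Hypothetical pre-deployment evidence check; field names are assumptions.
REQUIRED_EVIDENCE = {
    "model_version",
    "dataset_ids",
    "prompt_versions",
    "evaluator_config",
    "qa_attestation",
}

def release_is_deployable(inventory_record: dict) -> bool:
    """Reject any release whose model-inventory record lacks QA evidence."""
    missing = REQUIRED_EVIDENCE - inventory_record.keys()
    if missing:
        print(f"Release rejected; missing QA evidence: {sorted(missing)}")
        return False
    return True
```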
Roles and Responsibilities
- QA Lead: owns the test strategy, publishes coverage targets, and signs the QA attestation.
- Engineering: builds and maintains test harnesses.
- Responsible AI: supplies fairness and safety evaluation playbooks.
- Product Owner: defines acceptance criteria and acknowledges any residual risk.
Procedures
QA must include the following components, tailored to risk classification:
- Functional testing: deterministic unit and integration tests with a ≥95% pass rate (a gate-check sketch follows this list).
- Performance and resiliency testing: latency, throughput, and stress tests aligned to SLOs.
- Safety and ethics evaluations: bias/fairness probes, red-team adversarial tests, prompt-injection defenses.
- Data quality checks: dataset drift analysis, schema validation, contamination and PII sweeps (see the schema/PII sketch after this list).
- Traceability package: link test outputs to model version, dataset IDs, evaluator configs, and commit hashes (see the manifest sketch after this list).
- Sign-off workflow: the QA Lead signs the QA attestation, the Responsible AI team signs the fairness attestation, and the Product Owner acknowledges acceptance or raises a remediation ticket.
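As referenced in the functional-testing item above, a minimal pass-rate gate check might look like the following. Treating zero executed tests as a failure is an assumption consistent with the policy's evidence requirements rather than a stated rule.

```python
def functional_gate_passes(passed: int, executed: int,
                           threshold: float = 0.95) -> bool:
    """Functional-testing gate: deterministic tests must pass at >= 95%."""
    if executed == 0:
        return False  # no executed tests means no evidence, so the gate fails
    return passed / executed >= threshold

# 188 of 200 passing (94%) misses the gate; 191 of 200 (95.5%) clears it.
assert not functional_gate_passes(188, 200)
assert functional_gate_passes(191, 200)
```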
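For the data quality checks, a schema validation and PII sweep could start from something like the sketch below. The expected schema and the email regex are placeholders; production sweeps would use the full data-quality tooling.

```python
import re

# Assumed record schema and a deliberately crude email pattern as one PII example.
EXPECTED_SCHEMA = {"prompt": str, "response": str, "label": int}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def validate_record(record: dict) -> list[str]:
    """Return human-readable findings for one dataset record."""
    findings = []
    for field_name, field_type in EXPECTED_SCHEMA.items():
        if field_name not in record:
            findings.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], field_type):
            findings.append(
                f"wrong type for {field_name}: {type(record[field_name]).__name__}")
    for field_name in ("prompt", "response"):
        value = record.get(field_name)
        if isinstance(value, str) and EMAIL_RE.search(value):
            findings.append(f"possible PII (email) in {field_name}")
    return findings
```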
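The traceability package can likewise be captured as a small manifest that binds test outputs to their inputs. The field names below are illustrative, not a mandated schema, and the content digest is one possible tamper-evidence mechanism.

```python
import hashlib
import json

def build_traceability_manifest(model_version: str, dataset_ids: list[str],
                                evaluator_config: dict, commit_hash: str,
                                test_results_path: str) -> dict:
    """Bundle the identifiers that link test outputs back to their inputs."""
    manifest = {
        "model_version": model_version,
        "dataset_ids": sorted(dataset_ids),
        "evaluator_config": evaluator_config,  # must be JSON-serializable
        "commit_hash": commit_hash,
        "test_results_path": test_results_path,
    }
    # A content digest lets auditors detect post-hoc edits to the manifest.
    manifest["digest"] = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode()).hexdigest()
    return manifest
```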
Exceptions
Only the Head of QA may approve a reduction in coverage or deferred test execution. Each exception must include compensating controls (e.g., enhanced monitoring, tighter rollback triggers) and an expiration date not exceeding one release cycle (a record sketch follows).
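A compliant exception could be recorded as a structured object that enforces both requirements at creation time, as sketched below; the 14-day release cycle is an assumed value for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RELEASE_CYCLE = timedelta(days=14)  # assumption: one release cycle = 14 days

@dataclass
class QAException:
    approver: str                     # must be the Head of QA per this policy
    compensating_controls: list[str]  # e.g. enhanced monitoring
    granted_on: date
    expires_on: date

    def __post_init__(self):
        if not self.compensating_controls:
            raise ValueError("exception requires compensating controls")
        if self.expires_on > self.granted_on + RELEASE_CYCLE:
            raise ValueError("expiration exceeds one release cycle")
```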
Review Cadence
QA playbooks and coverage targets are reviewed quarterly. Metrics (failed gates, exceptions, post-release incidents linked to QA gaps) feed into continuous improvement actions maintained by the QA Council.
References
- ISO/IEC 42001:2023 Clause 8.1 (Operational planning and control)
- NIST AI RMF Measure/Manage functions
- Internal documents: Model Validation & Testing SOP, Responsible AI Evaluation Playbook, Regression Automation Standards