Purpose
Prevent prompt-based attacks (injection, data exfiltration, jailbreaks) by defining standards for prompt design, sanitization, and runtime guardrails across conversational and generative AI applications.
Scope
Applies to all prompt-driven interfaces exposed to employees, partners, or customers, as well as internal agent frameworks that rely on prompts to orchestrate actions.
- Customer support chatbots and knowledge assistants
- Internal copilots and code-generation tools
- Agent frameworks interacting with external APIs or tools
- Shared prompt templates and prompt libraries
Definitions
- Prompt Injection: Attack where user input manipulates model instructions to disclose secrets or perform unintended actions.
- System Prompt: Non-user visible instruction set controlling model behaviour.
- Guardrail Prompt: Supplemental prompt that enforces safety boundaries or refusal logic.
Policy
All prompts must pass a security review prior to deployment. User inputs must be sanitized, sensitive instructions must be isolated, and guardrail prompts must be applied for every interaction. Runtime monitoring must detect and block malicious prompt activity.
Roles and Responsibilities
Application Security Lead curates prompt security standards and approves reviews. Engineering implements sanitization libraries and integrates guardrail services. Responsible AI defines behavioural boundaries and escalation criteria. Security Operations monitors alerts and coordinates incident response when violations occur.
Procedures
Prompt hardening must include:
- Prompt design review documenting goals, constraints, and disallowed behaviours.
- Input sanitation pipeline removing embedded instructions, HTML/Markdown exploits, and sensitive data patterns.
- Isolation of system prompts and secrets in secure storage instead of embedding them in user-visible prompts.
- Automated red teaming against prompt injection, jailbreaks, and context-hijacking scenarios.
- Runtime guardrails that inspect inputs/outputs and enforce refusal or rollback when violations are detected.
- Audit logging of prompt interactions for forensic review.
Exceptions
Prototype prompts may run with reduced guardrails inside sandbox environments only. Production rollout requires full control coverage.
Review Cadence
Prompt libraries undergo quarterly reviews to remove obsolete prompts, incorporate new intelligence, and verify guardrail effectiveness.
References
- OWASP LLM Top 10 (Prompt Injection)
- NIST AI RMF Govern/Manage functions
- Internal documents: Prompt Hardening Guide, Guardrail Service Runbook, Secure Coding Standard