Prompt Security and Prompt Hardening Policy

Purpose

Prevent prompt-based attacks (injection, data exfiltration, jailbreaks) by defining standards for prompt design, sanitization, and runtime guardrails across conversational and generative AI applications.

Scope

Applies to all prompt-driven interfaces exposed to employees, partners, or customers, as well as internal agent frameworks that rely on prompts to orchestrate actions.

Customer support chatbots and knowledge assistants
Internal copilots and code-generation tools
Agent frameworks interacting with external APIs or tools
Shared prompt templates and prompt libraries

Definitions

Prompt Injection: Attack where user input manipulates model instructions to disclose secrets or perform unintended actions.
System Prompt: Non-user visible instruction set controlling model behaviour.
Guardrail Prompt: Supplemental prompt that enforces safety boundaries or refusal logic.

Policy

All prompts must pass a security review prior to deployment. User inputs must be sanitized, sensitive instructions must be isolated, and guardrail prompts must be applied for every interaction. Runtime monitoring must detect and block malicious prompt activity.

Roles and Responsibilities

Application Security Lead curates prompt security standards and approves reviews. Engineering implements sanitization libraries and integrates guardrail services. Responsible AI defines behavioural boundaries and escalation criteria. Security Operations monitors alerts and coordinates incident response when violations occur.

Procedures

Prompt hardening must include:

Prompt design review documenting goals, constraints, and disallowed behaviours.
Input sanitation pipeline removing embedded instructions, HTML/Markdown exploits, and sensitive data patterns.
Isolation of system prompts and secrets in secure storage instead of embedding them in user-visible prompts.
Automated red teaming against prompt injection, jailbreaks, and context-hijacking scenarios.
Runtime guardrails that inspect inputs/outputs and enforce refusal or rollback when violations are detected.
Audit logging of prompt interactions for forensic review.

Exceptions

Prototype prompts may run with reduced guardrails inside sandbox environments only. Production rollout requires full control coverage.

Review Cadence

Prompt libraries undergo quarterly reviews to remove obsolete prompts, incorporate new intelligence, and verify guardrail effectiveness.

References

OWASP LLM Top 10 (Prompt Injection)
NIST AI RMF Govern/Manage functions
Internal documents: Prompt Hardening Guide, Guardrail Service Runbook, Secure Coding Standard