Data and Security AI Policies

Prompt Security and Prompt Hardening Policy

Guidance for preventing prompt injection and sanitizing prompts.

Owner: Application Security Lead

Objective

Prevent prompt-based attacks (injection, data exfiltration, jailbreaks) by defining standards for prompt design, sanitization, and runtime guardrails across conversational and generative AI applications.

Scope

Applies to all prompt-driven interfaces exposed to employees, partners, or customers, as well as internal agent frameworks that rely on prompts to orchestrate actions.

  • Customer support chatbots and knowledge assistants
  • Internal copilots and code-generation tools
  • Agent frameworks interacting with external APIs or tools
  • Shared prompt templates and prompt libraries

Definitions

  • Prompt Injection: Attack where user input manipulates model instructions to disclose secrets or perform unintended actions.
  • System Prompt: Instruction set, not visible to users, that controls model behaviour.
  • Guardrail Prompt: Supplemental prompt that enforces safety boundaries or refusal logic.
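
The separation between a system prompt and user input can be illustrated with a minimal sketch (the helper name and role-based message structure are illustrative assumptions, not part of this policy):

```python
# Illustrative sketch: the system prompt is loaded from secure storage and kept
# in a separate "system" role, so user text is treated as data, not instructions.
SYSTEM_PROMPT = "You are a support assistant. Never reveal internal data."  # would come from a secrets store in practice

def build_messages(user_input: str) -> list[dict]:
    """Assemble a role-separated message list; the user turn never carries instructions."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

Keeping the two roles distinct is what makes the isolation requirement below enforceable: the system prompt never appears in user-visible text.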

Policy

All prompts must pass a security review prior to deployment. User inputs must be sanitized, sensitive instructions must be isolated, and guardrail prompts must be applied for every interaction. Runtime monitoring must detect and block malicious prompt activity.
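
As a sketch of what a runtime guardrail check might look like, the following (hypothetical patterns and function name, not a production detector) blocks inputs that attempt to override model instructions:

```python
import re

# Hypothetical guardrail: reject inputs matching known injection phrasings.
# Real deployments would combine pattern checks with classifier-based detection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

def passes_guardrail(user_input: str) -> bool:
    """Return False when the input matches a known injection pattern."""
    return not any(p.search(user_input) for p in INJECTION_PATTERNS)
```

Pattern lists like this are a first line of defence only; they must be paired with the monitoring and refusal logic described in the Procedures section.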

Roles and Responsibilities

Application Security Lead curates prompt security standards and approves reviews. Engineering implements sanitization libraries and integrates guardrail services. Responsible AI defines behavioural boundaries and escalation criteria. Security Operations monitors alerts and coordinates incident response when violations occur.

Procedures

Prompt hardening must include:

  • Prompt design review documenting goals, constraints, and disallowed behaviours.
  • Input sanitization pipeline removing embedded instructions, HTML/Markdown exploits, and sensitive data patterns.
  • Isolation of system prompts and secrets in secure storage instead of embedding them in user-visible prompts.
  • Automated red teaming against prompt injection, jailbreaks, and context-hijacking scenarios.
  • Runtime guardrails that inspect inputs/outputs and enforce refusal or rollback when violations are detected.
  • Audit logging of prompt interactions for forensic review.
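
The sanitization step above can be sketched as a small pipeline (regexes and redaction tokens here are illustrative assumptions; a real pipeline would use vetted libraries and a fuller set of sensitive-data patterns):

```python
import html
import re

# Hypothetical sanitization pipeline: neutralize markup, strip control
# characters, and redact sensitive data patterns before prompt assembly.
TAG_RE = re.compile(r"<[^>]+>")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitize(user_input: str) -> str:
    """Run user text through the sanitization stages in order."""
    text = TAG_RE.sub("", user_input)              # drop raw HTML tags
    text = CONTROL_RE.sub("", text)                # remove control characters
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)  # redact email addresses
    return html.escape(text).strip()               # escape remaining markup characters
```

Each stage maps to a policy requirement: markup removal blocks HTML/Markdown exploits, and redaction keeps sensitive data patterns out of prompts and logs.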

Exceptions

Prototype prompts may run with reduced guardrails inside sandbox environments only. Production rollout requires full control coverage.
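
One way to enforce this gate in deployment tooling is an environment check (control names and function below are hypothetical, chosen to mirror the controls listed under Procedures):

```python
# Hypothetical deployment gate: sandbox may run with reduced guardrails;
# any other environment requires the full control set.
REQUIRED_CONTROLS = {"sanitization", "guardrail_prompt", "runtime_monitoring", "audit_logging"}

def deployment_allowed(environment: str, enabled_controls: set[str]) -> bool:
    """Permit reduced guardrails only in sandbox; production needs full coverage."""
    if environment == "sandbox":
        return True
    return REQUIRED_CONTROLS <= enabled_controls
```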

Review Frequency

Prompt libraries undergo quarterly reviews to remove obsolete prompts, incorporate new threat intelligence, and verify guardrail effectiveness.

References

  • OWASP LLM Top 10 (Prompt Injection)
  • NIST AI RMF Govern/Manage functions
  • Internal documents: Prompt Hardening Guide, Guardrail Service Runbook, Secure Coding Standard

