Data and Security AI Policies

Prompt Security and Prompt Hardening Policy

Guidance for preventing prompt injection and sanitizing prompts.

Owner: Application Security Lead

Objective

Prevent prompt-based attacks (injection, data exfiltration, jailbreaks) by defining standards for prompt design, sanitization, and runtime guardrails across conversational and generative AI applications.

Scope

Applies to all prompt-driven interfaces exposed to employees, partners, or customers, as well as internal agent frameworks that rely on prompts to orchestrate actions.

  • Customer support chatbots and knowledge assistants
  • Internal copilots and code-generation tools
  • Agent frameworks interacting with external APIs or tools
  • Shared prompt templates and prompt libraries

Definitions

  • Prompt Injection: Attack where user input manipulates model instructions to disclose secrets or perform unintended actions.
  • System Prompt: Instruction set, not visible to users, that controls model behaviour.
  • Guardrail Prompt: Supplemental prompt that enforces safety boundaries or refusal logic.
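
The separation between a system prompt and user input can be illustrated with a minimal sketch (the helper name and role-based message structure are illustrative assumptions, not part of this policy):

```python
# Illustrative sketch: the system prompt is loaded from secure storage and kept
# in a separate "system" role, so user text is treated as data, not instructions.
SYSTEM_PROMPT = "You are a support assistant. Never reveal internal data."  # would come from a secrets store in practice

def build_messages(user_input: str) -> list[dict]:
    """Assemble a role-separated message list; the user turn never carries instructions."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

Keeping the two roles distinct is what makes the isolation requirement below enforceable: the system prompt never appears in user-visible text.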

Policy

All prompts must pass a security review prior to deployment. User inputs must be sanitized, sensitive instructions must be isolated, and guardrail prompts must be applied for every interaction. Runtime monitoring must detect and block malicious prompt activity.
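
As a sketch of what a runtime guardrail check might look like, the following (hypothetical patterns and function name, not a production detector) blocks inputs that attempt to override model instructions:

```python
import re

# Hypothetical guardrail: reject inputs matching known injection phrasings.
# Real deployments would combine pattern checks with classifier-based detection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

def passes_guardrail(user_input: str) -> bool:
    """Return False when the input matches a known injection pattern."""
    return not any(p.search(user_input) for p in INJECTION_PATTERNS)
```

Pattern lists like this are a first line of defence only; they must be paired with the monitoring and refusal logic described in the Procedures section.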

Roles and Responsibilities

Application Security Lead curates prompt security standards and approves reviews. Engineering implements sanitization libraries and integrates guardrail services. Responsible AI defines behavioural boundaries and escalation criteria. Security Operations monitors alerts and coordinates incident response when violations occur.

Procedures

Prompt hardening must include:

  • Prompt design review documenting goals, constraints, and disallowed behaviours.
  • Input sanitization pipeline removing embedded instructions, HTML/Markdown exploits, and sensitive data patterns.
  • Isolation of system prompts and secrets in secure storage instead of embedding them in user-visible prompts.
  • Automated red teaming against prompt injection, jailbreaks, and context-hijacking scenarios.
  • Runtime guardrails that inspect inputs/outputs and enforce refusal or rollback when violations are detected.
  • Audit logging of prompt interactions for forensic review.
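
The sanitization step above can be sketched as a small pipeline (regexes and redaction tokens here are illustrative assumptions; a real pipeline would use vetted libraries and a fuller set of sensitive-data patterns):

```python
import html
import re

# Hypothetical sanitization pipeline: neutralize markup, strip control
# characters, and redact sensitive data patterns before prompt assembly.
TAG_RE = re.compile(r"<[^>]+>")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitize(user_input: str) -> str:
    """Run user text through the sanitization stages in order."""
    text = TAG_RE.sub("", user_input)              # drop raw HTML tags
    text = CONTROL_RE.sub("", text)                # remove control characters
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)  # redact email addresses
    return html.escape(text).strip()               # escape remaining markup characters
```

Each stage maps to a policy requirement: markup removal blocks HTML/Markdown exploits, and redaction keeps sensitive data patterns out of prompts and logs.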

Exceptions

Prototype prompts may run with reduced guardrails inside sandbox environments only. Production rollout requires full control coverage.
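
One way to enforce this gate in deployment tooling is an environment check (control names and function below are hypothetical, chosen to mirror the controls listed under Procedures):

```python
# Hypothetical deployment gate: sandbox may run with reduced guardrails;
# any other environment requires the full control set.
REQUIRED_CONTROLS = {"sanitization", "guardrail_prompt", "runtime_monitoring", "audit_logging"}

def deployment_allowed(environment: str, enabled_controls: set[str]) -> bool:
    """Permit reduced guardrails only in sandbox; production needs full coverage."""
    if environment == "sandbox":
        return True
    return REQUIRED_CONTROLS <= enabled_controls
```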

Review Frequency

Prompt libraries undergo quarterly reviews to remove obsolete prompts, incorporate new threat intelligence, and verify guardrail effectiveness.

References

  • OWASP LLM Top 10 (Prompt Injection)
  • NIST AI RMF Govern/Manage functions
  • Internal documents: Prompt Hardening Guide, Guardrail Service Runbook, Secure Coding Standard

