MCP guardrails
Configure PII detection, content filtering, and prompt injection rules for MCP tool inputs.
Overview
MCP guardrails scan tool inputs before the gateway forwards them to the backend server, so PII, prohibited content, and prompt injection attempts are caught before they reach your tools.
You'll find them at AI Gateway > MCP Gateway > Guardrails.
These guardrails are separate from the AI Gateway's LLM guardrails. LLM guardrails scan chat completion requests to LLM providers. MCP guardrails scan tool invocation inputs sent by agents to MCP servers.
Rule list
The main view lists all configured guardrail rules. A summary line at the top shows the total rule count and how many are active (e.g., "3 rules configured, 2 active").
Each rule row displays:
- Name: The name you gave the rule.
- Rule type chip: Shows "PII", "Content filter", or "Prompt injection" with a color-coded badge.
- Action chip: Shows "Block" or "Mask".
- Scope: Where the rule applies (currently "tool_input").
- Tool scope: Either "Applies to all tools" or a list of specific tool names.
- Tool chips: When scoped to specific tools, small chips show each tool name.
- Active toggle: Turn the rule on or off without deleting it.
Inactive rules appear dimmed (60% opacity). Click any row to edit it.
Rule types
Three types of guardrail rules are available:
| Rule type | Chip color | What it does |
|---|---|---|
| PII detection | Blue (info) | Scans tool inputs for personally identifiable information: email addresses, phone numbers, credit card numbers, SSNs, etc. Uses Presidio-based detection. |
| Content filter | Amber (warning) | Checks tool inputs against a list of keywords or regex patterns that you define. Useful for blocking profanity, competitor names, or domain-specific terms. |
| Prompt injection | Green (success) | Detects attempts to manipulate the tool's behavior through injected instructions in the input data. |
Actions
Each rule has an action that determines what happens when a match is found:
- Block: Rejects the entire tool call. The agent receives a JSON-RPC error with code `-32003` and a message explaining the guardrail violation. The call is logged as "blocked" in the audit trail.
- Mask: Replaces the matched content with placeholders before forwarding the call to the server. The tool still executes, but with sanitized input.
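On the agent side, a blocked call can be told apart from other failures by its error code. The following is a minimal sketch, assuming the gateway returns a standard JSON-RPC 2.0 error object. The helper name `is_guardrail_block` and the sample message text are illustrative, not taken from the gateway.

```python
import json

# Error code documented for guardrail "Block" actions.
GUARDRAIL_BLOCK_CODE = -32003


def is_guardrail_block(response_text: str) -> bool:
    """Return True if a JSON-RPC response carries the guardrail block code."""
    response = json.loads(response_text)
    error = response.get("error")
    return isinstance(error, dict) and error.get("code") == GUARDRAIL_BLOCK_CODE


# Hypothetical response body; the real message wording may differ.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "error": {"code": -32003, "message": "Guardrail violation: PII detected"},
})

if is_guardrail_block(sample_response):
    print("Tool call was blocked by a guardrail; do not retry with the same input.")
```

An agent that distinguishes this code from transport or server errors can avoid retrying a call that will be rejected again.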
Creating a guardrail rule
Click Add guardrail in the top-right corner. The modal has these fields:
| Field | Required | Description |
|---|---|---|
| Name | Yes | A descriptive name for the rule (e.g., "Block PII in database queries"). Max 255 characters. |
| Rule type | Yes | Select PII detection, Content filter, or Prompt injection. |
| Action | Yes | Select Block or Mask. |
| Scope | Yes | Where to apply the rule. Currently only "Tool input" is available. |
| Applies to tools | No | Comma-separated tool names. Leave empty to apply to all MCP tools. |
| Config (JSON) | No | Optional JSON object for advanced rule settings. Format depends on the rule type. |
| Active | Yes | Toggle to enable or disable the rule immediately. |
Scoping rules to specific tools
By default, a rule applies to all MCP tools. You can restrict it to specific tools by entering a comma-separated list of tool names in the "Applies to tools" field.
Examples:
- Leave empty: the rule runs on every tool call.
- Enter `run_query, search_db`: the rule only checks calls to those two tools.
- Enter `delete_record`: the rule only checks calls to `delete_record`.
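The scoping behavior can be pictured with a short sketch. It assumes that whitespace around each name is ignored and that tool names are compared exactly (case-sensitive). Neither detail is confirmed by this page, and both function names are hypothetical.

```python
def parse_tool_scope(raw: str) -> list[str]:
    """Split a comma-separated "Applies to tools" value into tool names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def rule_applies(raw_scope: str, tool_name: str) -> bool:
    """An empty scope means the rule applies to every tool."""
    tools = parse_tool_scope(raw_scope)
    return not tools or tool_name in tools


print(rule_applies("", "delete_record"))
print(rule_applies("run_query, search_db", "search_db"))
print(rule_applies("run_query, search_db", "delete_record"))
```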
Config JSON
The Config field accepts a JSON object for advanced rule settings. The format depends on the rule type:
PII detection config

    {
      "entities": {
        "EMAIL_ADDRESS": "mask",
        "PHONE_NUMBER": "block",
        "CREDIT_CARD": "block"
      }
    }

Define which PII entity types to detect and how to handle each one. If no config is provided, the default detection settings are used.
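Before pasting a PII config into the Config field, it can help to check its shape locally. The sketch below assumes that each entity maps to either "mask" or "block", as in the example above. The validator itself is hypothetical and not part of the gateway.

```python
import json

# Per-entity actions seen in the example config; assumed to be the full set.
ALLOWED_ACTIONS = {"mask", "block"}

PII_CONFIG = """
{
  "entities": {
    "EMAIL_ADDRESS": "mask",
    "PHONE_NUMBER": "block",
    "CREDIT_CARD": "block"
  }
}
"""


def validate_pii_config(raw: str) -> dict[str, str]:
    """Parse a PII config and check that every entity has a known action."""
    config = json.loads(raw)
    if not isinstance(config, dict) or not isinstance(config.get("entities"), dict):
        raise ValueError('expected a JSON object with an "entities" object')
    for entity, action in config["entities"].items():
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"{entity}: unsupported action {action!r}")
    return config["entities"]


entities = validate_pii_config(PII_CONFIG)
print(sorted(entities))
```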
Content filter config

    {
      "keywords": ["password", "secret", "confidential"],
      "patterns": ["\\b\\d{3}-\\d{2}-\\d{4}\\b"]
    }

Define keywords (exact match) and regex patterns to flag in tool inputs.
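To preview what a content filter config would flag, the matching can be approximated locally. This sketch assumes keywords are matched as literal, case-sensitive substrings and patterns follow Python's `re` syntax. The gateway's actual matching semantics and regex dialect may differ, and `find_matches` is a hypothetical helper.

```python
import json
import re

# Raw string so the JSON-level escapes (\\b, \\d) are preserved as written.
CONTENT_FILTER_CONFIG = r"""
{
  "keywords": ["password", "secret", "confidential"],
  "patterns": ["\\b\\d{3}-\\d{2}-\\d{4}\\b"]
}
"""


def find_matches(config_json: str, text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for every keyword or pattern found in text."""
    config = json.loads(config_json)
    hits = []
    for keyword in config.get("keywords", []):
        if keyword in text:  # assumption: literal, case-sensitive substring
            hits.append(("keyword", keyword))
    for pattern in config.get("patterns", []):
        if re.search(pattern, text):
            hits.append(("pattern", pattern))
    return hits


for kind, value in find_matches(CONTENT_FILTER_CONFIG, "my secret is 123-45-6789"):
    print(kind, value)
```

Note the double backslashes in the JSON: after JSON decoding, the pattern becomes `\b\d{3}-\d{2}-\d{4}\b`, which matches strings shaped like a US Social Security number.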
Editing a rule
Click any rule row or the pencil icon to open the edit modal. All fields are editable. Click Save changes to apply. Changes take effect immediately for the next tool call.
Enabling and disabling rules
Each rule has an active toggle. Disabled rules are skipped during tool input scanning. This is useful for testing: create a rule, enable it, verify that it behaves as expected on the next tool call, and disable it again if it needs adjusting.
Deleting a rule
Click the trash icon on a rule row. A confirmation modal appears: "This action takes effect immediately. MCP tool invocations will no longer be checked against this rule."
Deletion is permanent, but you can re-create the rule at any time.
How guardrails execute
When an agent calls a tool through the gateway, the guardrail evaluation happens in this order:
1. The gateway authenticates the agent key and resolves the tool.
2. ACLs and rate limits are checked.
3. If the tool requires approval, the approval flow takes priority (guardrails run after approval).
4. All active guardrail rules that apply to this tool are evaluated against the input.
5. If any rule triggers a "block" action, the call is rejected immediately.
6. If any rule triggers a "mask" action, the matched content is replaced before forwarding.
7. The (possibly modified) input is forwarded to the backend MCP server.
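The guardrail portion of this sequence (rule evaluation, block, mask, forward) can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: each rule's detector is reduced to a single regex, the `[MASKED]` placeholder is invented, and block rules are checked before any masking is applied. Authentication, ACLs, rate limits, and approval are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    action: str                      # "block" or "mask"
    pattern: str                     # stand-in detector: one regex per rule
    tools: list[str] = field(default_factory=list)  # empty = all tools
    active: bool = True


class GuardrailBlocked(Exception):
    """Raised when a block rule matches; the call is not forwarded."""


def apply_guardrails(rules: list[Rule], tool_name: str, text: str) -> str:
    """Return the input to forward, or raise GuardrailBlocked."""
    # Only active rules scoped to this tool (or to all tools) are evaluated.
    applicable = [
        r for r in rules
        if r.active and (not r.tools or tool_name in r.tools)
    ]
    # Assumption: a block match rejects the call before any masking happens.
    for rule in applicable:
        if rule.action == "block" and re.search(rule.pattern, text):
            raise GuardrailBlocked(rule.name)
    # Mask rules replace matched content with a hypothetical placeholder.
    for rule in applicable:
        if rule.action == "mask":
            text = re.sub(rule.pattern, "[MASKED]", text)
    return text


rules = [
    Rule("mask emails", "mask", r"[\w.+-]+@[\w-]+\.[\w.]+"),
    Rule("block drops", "block", r"(?i)\bdrop\s+table\b", tools=["run_query"]),
    Rule("disabled rule", "block", r".", active=False),
]

print(apply_guardrails(rules, "search_db", "find bob@example.com"))
```

In this sketch the disabled rule never runs, and the block rule only affects calls to `run_query`, which mirrors the active toggle and tool scoping described earlier.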
Empty state
When no rules are configured, three tips explain the feature:
- Scan tool inputs before execution: Rules are evaluated against tool input data before the tool runs.
- Scope rules to specific tools: Apply guardrails globally or restrict to specific MCP tools.
- Multiple rule types: Choose from PII detection, content filtering, or prompt injection detection.
Permissions
Creating, editing and deleting guardrail rules requires the Admin role. All authenticated users can view the rule list.