AI Gateway

Prompts

Create versioned prompt templates with variables, test them with streaming responses, and bind them to endpoints.

Overview

A prompt is a versioned list of messages (system, user, assistant) with {{variable}} placeholders. Bind a prompt to an endpoint and the gateway resolves the template before every request, so you can change instructions without touching application code.

The editor is a 50/50 split: messages on the left, a test chat on the right. Fill in variables, pick an endpoint, and see streaming responses before you publish anything.

Creating a prompt

  1. Go to AI Gateway > Prompts in the sidebar
  2. Click Create prompt
  3. Enter a name (e.g., "Customer support agent"). A URL-safe slug is generated from the name automatically.
  4. Optionally add a description (it helps your team find the right prompt later)
  5. Click Create. You'll land in the editor.
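The docs don't specify the slug algorithm, but a typical URL-safe slug derivation can be sketched like this (the gateway's exact rules may differ):

```python
import re

def slugify(name: str) -> str:
    """Sketch of a typical URL-safe slug: lowercase, alphanumerics only,
    hyphen-separated. Illustrative only; not the gateway's actual code."""
    slug = name.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of other chars into one hyphen
    return slug.strip("-")

slugify("Customer support agent")  # → "customer-support-agent"
```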

Editing messages

The left panel shows your message blocks. Each block has a role (system, user, or assistant) and a text area. Here's what you can do:

  • Add messages: Click "+ Add message" below the last block to append a new user message.
  • Reorder messages: Drag the grip handle on the left side of any message block to reorder them.
  • Delete messages: Click the trash icon in the block header. At least one message must remain.
  • Change the role: Use the dropdown in the block header to switch between SYSTEM, USER, and ASSISTANT roles.

Template variables

Write {{variableName}} anywhere in a message to create a placeholder. The editor detects variables automatically and shows them as chips below the message blocks. Variable names can contain letters, numbers, and underscores.
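Detection can be pictured as a regular expression scan for names made of letters, numbers, and underscores. A minimal sketch (not the editor's actual parser; whether whitespace is allowed inside the braces is an assumption):

```python
import re

VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")

def detect_variables(messages: list[str]) -> list[str]:
    """Return unique variable names across all message bodies, in order of appearance."""
    seen: dict[str, None] = {}
    for text in messages:
        for name in VAR_PATTERN.findall(text):
            seen.setdefault(name)
    return list(seen)

detect_variables(["You are a {{tone}} assistant for {{company}}.",
                  "Greet the user of {{company}}."])
# → ["tone", "company"]
```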

Variable resolution
When testing in the editor, you fill in variable values manually in the right panel. When a prompt is bound to an endpoint in production, variables are resolved from the request metadata.
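Substitution from request metadata might look like this sketch (leaving unknown placeholders intact is an assumption here, not documented gateway behavior):

```python
import re

def resolve_template(text: str, metadata: dict[str, str]) -> str:
    """Replace each {{name}} with its value from request metadata.
    Unknown placeholders are left as-is rather than failing the request
    (an assumption for illustration)."""
    def sub(match: re.Match) -> str:
        return metadata.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}", sub, text)

resolve_template("Hello {{user_name}}, welcome to {{product}}!",
                 {"user_name": "Ada"})
# → "Hello Ada, welcome to {{product}}!"
```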

Model and parameters

Each version can store a model and parameter overrides. Pick a model from the dropdown at the top of the editor, then click the gear icon to configure:

  • Temperature: Controls randomness (0.0 to 2.0). Lower values produce more focused, deterministic output. Higher values produce more creative, varied output. Default is 1.0.
  • Max tokens: Maximum number of tokens to generate in the response. Higher values allow longer outputs but increase cost and latency.
  • Top P: Nucleus sampling parameter (0.0 to 1.0). The model considers only tokens within the top_p cumulative probability mass. Adjust either temperature or top P, not both.

Versioning

Every Save draft click creates a new version. Versions are append-only, numbered v1, v2, v3, etc. Each one captures the full message list, detected variables, model, and parameters.

  • Draft: A saved version that is not yet active. Drafts can be loaded into the editor and tested.
  • Published: The active version used by bound endpoints. Only one version can be published at a time.

Click the clock icon in the top right to open the Version history drawer. From there you can load any version into the editor or publish it directly.

Publishing
When you publish a version, all other versions are automatically set back to draft. The published version is immediately used by any endpoints bound to this prompt.
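The publish rule, exactly one published version with every other version demoted to draft, can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Version:
    number: int            # v1, v2, v3, ... (append-only)
    status: str = "draft"  # "draft" or "published"

def publish(versions: list[Version], number: int) -> None:
    """Publish one version and demote all others to draft,
    so at most one version is ever published. A sketch, not the gateway's code."""
    for v in versions:
        v.status = "published" if v.number == number else "draft"

vs = [Version(1, "published"), Version(2), Version(3)]
publish(vs, 3)
# vs[0] and vs[1] are now drafts; vs[2] is published
```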

Testing prompts

The right panel is a live test chat. To try your prompt:

  1. Pick an endpoint from the Test endpoint dropdown. This controls which provider and API key get used.
  2. If your prompt has variables, fill in their values in the fields that appear below the dropdown.
  3. Type a message and press Enter (or click send).
  4. The response streams in. Once it finishes, you'll see latency, token count, and cost above the input.
Test requests use real endpoints
Test requests go through the real proxy flow: guardrails scan the input, rate limits apply, and spend is logged. This costs real budget, same as a production request.

Binding prompts to endpoints

Once you've published a version, you can bind the prompt to an endpoint. The gateway then resolves the published messages and prepends them to every request that goes through that endpoint.

The resolution priority is:

  1. If the endpoint has a prompt_id, resolve the published version and prepend its messages (with variables substituted from request metadata).
  2. If no published version exists or resolution fails, fall back to the endpoint's system prompt field.
  3. If neither is set, use the request messages as-is.
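The three-step fallback above can be sketched as follows (representing the endpoint and prompt stores as plain dicts is an assumption for illustration):

```python
import re

def resolve_messages(endpoint: dict, request_messages: list[dict],
                     metadata: dict, published_prompts: dict) -> list[dict]:
    """Apply the documented resolution priority. Sketch only; stores are dicts here."""
    # 1. Endpoint bound to a prompt with a published version: prepend its
    #    messages, substituting {{variables}} from request metadata.
    prompt_id = endpoint.get("prompt_id")
    published = published_prompts.get(prompt_id) if prompt_id else None
    if published:
        rendered = [
            {"role": m["role"],
             "content": re.sub(r"\{\{(\w+)\}\}",
                               lambda mt: metadata.get(mt.group(1), mt.group(0)),
                               m["content"])}
            for m in published
        ]
        return rendered + request_messages
    # 2. No published version (or resolution failed): endpoint's system prompt field.
    if endpoint.get("system_prompt"):
        return [{"role": "system", "content": endpoint["system_prompt"]}] + request_messages
    # 3. Neither set: pass the request messages through as-is.
    return request_messages
```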

Deleting prompts

Click the trash icon on any row in the list. This removes the prompt and all its versions. Any endpoints that were using this prompt get unlinked (prompt_id set to null). Requests already in flight aren't affected, but new requests will fall back to the endpoint's system prompt or pass through with no template.

Permissions

All users can view prompts and their versions. Creating, editing, publishing, and deleting prompts requires the Admin role.
