AI Gateway

Endpoints

Configure LLM provider endpoints with model selection, API keys, and system prompts.

Overview

Each endpoint maps a URL-safe slug to a provider, model, and API key. Applications call the slug, not the model directly, so you can swap models without changing any application code.
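The slug indirection amounts to a lookup table. A minimal sketch of the idea (the registry and field names here are illustrative, not the gateway's actual schema):

```python
# Hypothetical in-memory registry illustrating the slug -> endpoint mapping.
# Field names are assumptions for illustration only.
ENDPOINTS = {
    "prod-gpt4o": {
        "provider": "openai",
        "model": "openai/gpt-4o",
        "api_key_ref": "key-1",
    },
}

def resolve(slug: str) -> dict:
    """Return the provider/model/key config that a slug routes to."""
    try:
        return ENDPOINTS[slug]
    except KeyError:
        raise LookupError(f"unknown endpoint slug: {slug}")

# Swapping the model only changes the registry entry;
# applications keep calling the same slug.
ENDPOINTS["prod-gpt4o"]["model"] = "anthropic/claude-sonnet-4-20250514"
```

Because callers only ever reference the slug, the model swap above requires no application change.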

Creating an endpoint

  1. Click "Add endpoint" in the top right.
  2. Enter a display name (e.g., "Production GPT-4o"). A URL-safe slug is generated automatically.
  3. Select a model from the dropdown. Models are grouped by provider.
  4. Select an API key to authenticate with the provider. If no keys are available, add one in Settings first.
  5. Optionally set max tokens, temperature, system prompt, and rate limit (RPM).
  6. Optionally select a fallback endpoint (used if the primary provider is down).
  7. Click "Create endpoint".

System prompts
If you set a system prompt on an endpoint, it is prepended to every request. This is useful for enforcing consistent behavior or response formatting without changing your application.

Endpoint slugs

Each endpoint has a unique slug (e.g., "prod-gpt4o") that is used to route requests. Slugs must be lowercase alphanumeric with hyphens only. They are auto-generated from the display name but can be customized.
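Auto-generation from a display name can be approximated like this (a sketch; the gateway's exact rules may differ):

```python
import re

def slugify(display_name: str) -> str:
    """Lowercase the name, collapse runs of non-alphanumerics into a
    single hyphen, and trim leading/trailing hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower())
    return slug.strip("-")
```

For example, "Production GPT-4o" becomes "production-gpt-4o", which satisfies the lowercase-alphanumeric-with-hyphens rule.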

Managing endpoints

The endpoint list shows each endpoint with its provider icon, display name, model, associated API key, guardrail status, and active/inactive state. From the list you can:

  • Delete an endpoint. Requests using its slug will then fail.
  • View guardrail status, which shows how many guardrail rules are currently active across all endpoints.
  • Check the RPM badge, which shows the rate limit in requests per minute, if one is configured.
  • Check the fallback badge, which shows "has fallback" when a fallback endpoint is configured.

Rate limiting

Set "Rate limit (RPM)" on an endpoint to cap requests per minute. When the limit is hit, additional requests return HTTP 429. The limit uses a Redis sliding window, so it resets continuously (not on a fixed clock boundary). Leave the field empty for no limit.

Fallback endpoints

Select a fallback endpoint when creating or editing an endpoint. If the primary provider returns an error (timeout, 500, rate limit from the provider side), the gateway automatically retries the request using the fallback endpoint. This works for both non-streaming and streaming requests.

Fallback chain
The fallback endpoint can itself have a fallback, forming a chain. The gateway follows the chain until it finds a working endpoint or runs out of fallbacks. Guardrails run on each attempt in the chain. Avoid circular fallback chains (A to B to A) as they will loop.
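The chain traversal can be sketched as follows. This is a hypothetical, defensive version that stops on a repeated slug rather than looping (the guide warns that circular chains will loop, so don't rely on the gateway doing this for you):

```python
def attempt_with_fallbacks(endpoints: dict, slug: str, send) -> str:
    """Try an endpoint, then walk its fallback chain until a call succeeds.

    `endpoints` maps slug -> {"fallback": slug-or-None}; `send(slug)` raises
    on provider errors (timeouts, 5xx, provider-side rate limits).
    Hypothetical API for illustration.
    """
    seen = set()
    while slug is not None and slug not in seen:
        seen.add(slug)
        try:
            return send(slug)  # guardrails run on each attempt in the chain
        except Exception:
            slug = endpoints[slug].get("fallback")
    raise RuntimeError("all endpoints in the fallback chain failed")
```

With a chain A → B, a timeout on A retries transparently on B; with the circular chain A → B → A, this sketch fails fast instead of looping.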

Role-based access

Each endpoint has an allowed-roles list, which defaults to all four roles: Admin, Reviewer, Editor, and Auditor. When listing endpoints, users only see endpoints their role has access to. This applies to the Endpoints page, the Playground dropdown, and any API call that lists endpoints.
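The listing behavior is effectively a filter on the allowed-roles list (field names here are illustrative):

```python
def visible_endpoints(endpoints: list, role: str) -> list:
    """Return only the endpoints whose allowed-roles list includes `role`.

    Illustrative sketch of the role filter applied wherever endpoints
    are listed (Endpoints page, Playground dropdown, list APIs).
    """
    return [ep for ep in endpoints if role in ep["allowed_roles"]]
```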

Supported providers

The model dropdown includes popular models from these providers. You can also type a custom model string in LiteLLM format:

  • Direct providers: OpenAI, Anthropic, Google Gemini, Mistral, xAI, Cohere
  • Cloud providers: AWS Bedrock, Azure OpenAI, Google Vertex AI
  • Aggregators: OpenRouter, Together AI
  • Self-hosted: Ollama, vLLM, NVIDIA NIM

Model format
Models follow the LiteLLM format: "provider/model-name" (e.g., "openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"). For OpenRouter, use "openrouter/provider/model" (e.g., "openrouter/meta-llama/llama-3.3-70b-instruct").
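Splitting the provider prefix off a model string looks like this (a sketch of the format itself, not LiteLLM's own parser):

```python
def split_model(model: str) -> tuple:
    """Split "provider/model-name" into (provider, model-name).

    Aggregators like OpenRouter nest a second provider segment, so only
    the first "/" separates the routing provider from the model identifier.
    """
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected provider/model-name, got: {model!r}")
    return provider, name
```

Note that for "openrouter/meta-llama/llama-3.3-70b-instruct", everything after the first slash stays together as the model identifier.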