Endpoints
Configure LLM provider endpoints with model selection, API keys, and system prompts.
Overview
Each endpoint maps a URL-safe slug to a provider, model, and API key. Applications call the slug, not the model directly, so you can swap models without changing any application code.
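The indirection can be pictured as a lookup table keyed by slug. The sketch below is illustrative only; the field names (`provider`, `model`, `api_key_id`) are assumptions, not the gateway's actual schema:

```python
# Hypothetical endpoint registry: applications reference the slug,
# while the provider/model/key binding lives in the gateway config.
ENDPOINTS = {
    "prod-gpt4o": {"provider": "openai", "model": "gpt-4o", "api_key_id": "key-1"},
}

def resolve(slug: str) -> dict:
    """Map a URL-safe slug to its provider/model/key binding."""
    return ENDPOINTS[slug]

# Swapping the model is a config change, not an application change:
ENDPOINTS["prod-gpt4o"]["model"] = "gpt-4o-mini"
```

Because callers only ever see the slug, the model swap on the last line requires no change to application code.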
Creating an endpoint
- Click "Add endpoint" in the top right.
- Enter a display name (e.g., "Production GPT-4o"). A URL-safe slug is generated automatically.
- Select a model from the dropdown. Models are grouped by provider.
- Select an API key to authenticate with the provider. If no keys are available, add one in Settings first.
- Optionally set max tokens, temperature, system prompt, and rate limit (RPM).
- Optionally select a fallback endpoint (used if the primary provider is down).
- Click "Create endpoint".
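The fields collected by the steps above amount to one record per endpoint. This dataclass is a sketch of that record for illustration; the field names are assumptions, not the gateway's real schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Endpoint:
    display_name: str                    # e.g. "Production GPT-4o"
    slug: str                            # URL-safe, auto-generated from display_name
    model: str                           # e.g. a LiteLLM model string
    api_key_id: str                      # references a key added in Settings
    max_tokens: Optional[int] = None     # optional generation settings
    temperature: Optional[float] = None
    system_prompt: Optional[str] = None
    rpm_limit: Optional[int] = None      # None means no rate limit
    fallback_slug: Optional[str] = None  # used if the primary provider is down

ep = Endpoint("Production GPT-4o", "prod-gpt4o", "openai/gpt-4o", "key-1")
```

Note that everything after the API key is optional, matching the optional steps in the list above.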
Endpoint slugs
Each endpoint has a unique slug (e.g., "prod-gpt4o") that is used to route requests. Slugs must be lowercase alphanumeric with hyphens only. They are auto-generated from the display name but can be customized.
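The auto-generation rule (lowercase alphanumeric with hyphens only) can be sketched in a few lines; this is an illustrative implementation, not necessarily the gateway's exact one:

```python
import re

def slugify(display_name: str) -> str:
    """Generate a URL-safe slug: lowercase alphanumeric with hyphens only."""
    slug = display_name.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of other characters
    return slug.strip("-")                   # no leading/trailing hyphens

slugify("Production GPT-4o")  # "production-gpt-4o"
```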
Managing endpoints
The endpoint list shows each endpoint with its provider icon, display name, model, associated API key, guardrail status, and active/inactive state. From this list you can:
- Delete: removes the endpoint; any request that uses its slug will then fail.
- View guardrail status: shows how many guardrail rules are currently active across all endpoints.
Each row also carries informational badges:
- RPM badge: the configured rate limit in requests per minute, if any.
- Fallback badge: shows "has fallback" when a fallback endpoint is configured.
Rate limiting
Set "Rate limit (RPM)" on an endpoint to cap requests per minute. When the limit is hit, additional requests return HTTP 429. The limit uses a Redis sliding window, so it resets continuously (not on a fixed clock boundary). Leave the field empty for no limit.
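The continuous-reset behavior of a sliding window can be sketched in plain Python. The real gateway keeps this state in Redis; the in-memory version below is only an illustration of the algorithm:

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """In-memory sketch of a per-endpoint RPM check (illustrative only;
    the gateway itself stores the window in Redis)."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.hits: deque = deque()  # timestamps of accepted requests

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop hits older than the 60-second window. The window slides
        # continuously, so there is no fixed clock-boundary reset.
        while self.hits and now - self.hits[0] >= 60:
            self.hits.popleft()
        if len(self.hits) >= self.rpm:
            return False  # caller would return HTTP 429 here
        self.hits.append(now)
        return True
```

With `rpm=2`, a third request one second after the first two is rejected, but a request 61 seconds after the first succeeds because the oldest hit has aged out of the window.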
Fallback endpoints
Select a fallback endpoint when creating or editing an endpoint. If the primary provider returns an error (timeout, 500, rate limit from the provider side), the gateway automatically retries the request using the fallback endpoint. This works for both non-streaming and streaming requests.
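The retry flow can be sketched as follows. The callables and exception type here are hypothetical stand-ins for real provider calls, purely for illustration:

```python
class ProviderError(Exception):
    """Stand-in for a provider-side failure: timeout, 500, or rate limit."""

def call_with_fallback(primary, fallback, request):
    """Sketch of the fallback logic: try the primary endpoint, and on a
    provider error retry the same request via the fallback endpoint."""
    try:
        return primary(request)
    except ProviderError:
        if fallback is None:
            raise  # no fallback configured: surface the error
        return fallback(request)
```

If the primary call succeeds, the fallback is never invoked; only provider-side errors trigger the retry.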
Role-based access
Each endpoint has an allowed roles list (by default, all four roles: Admin, Reviewer, Editor, Auditor). When listing endpoints, users only see endpoints their role has access to. This applies to the Endpoints page, the Playground dropdown, and any API call that lists endpoints.
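The filtering behavior amounts to keeping only the endpoints whose allowed roles list contains the caller's role. A minimal sketch, assuming a hypothetical `allowed_roles` field:

```python
ALL_ROLES = {"Admin", "Reviewer", "Editor", "Auditor"}

def visible_endpoints(endpoints, user_role):
    """Return only the endpoints the given role may see
    (the 'allowed_roles' field shape is an assumption)."""
    return [ep for ep in endpoints if user_role in ep["allowed_roles"]]

endpoints = [
    {"slug": "prod-gpt4o", "allowed_roles": ALL_ROLES},     # default: all roles
    {"slug": "internal-eval", "allowed_roles": {"Admin"}},  # restricted
]
```

An Auditor listing these endpoints would see only `prod-gpt4o`; an Admin would see both.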
Supported providers
The model dropdown includes popular models from these providers. You can also type a custom model string in LiteLLM format:
- Direct providers: OpenAI, Anthropic, Google Gemini, Mistral, xAI, Cohere
- Cloud providers: AWS Bedrock, Azure OpenAI, Google Vertex AI
- Aggregators: OpenRouter, Together AI
- Self-hosted: Ollama, vLLM, NVIDIA NIM
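Custom model strings follow LiteLLM's `provider/model` convention, where the text before the first slash names the provider. A sketch of how such a string splits (the specific model strings in the comments are illustrative examples, not an exhaustive list):

```python
from typing import Optional, Tuple

def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Split a LiteLLM-style model string at the first slash into
    (provider_prefix, model_name); no slash means no explicit prefix."""
    if "/" in model:
        prefix, name = model.split("/", 1)
        return prefix, name
    return None, model

# Illustrative examples of the format:
#   "ollama/llama3"            -> self-hosted Ollama model
#   "openrouter/openai/gpt-4o" -> aggregator route; rest of the string
#                                 (including further slashes) is the model name
```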