Smart Fallbacks

ScaleLLM automatically routes to backup models when your primary model is unavailable.

How It Works

  1. You send a request with your primary model
  2. If it fails (timeout, rate limit, or server error), we try each fallback in order
  3. First successful response is returned
  4. Everything happens server-side, so your client makes a single request with no extra round trips
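The retry loop above can be sketched as follows. This is a client-side illustration, not the actual server implementation; `call_model` and the simulated provider are hypothetical stand-ins.

```python
# Sketch of the server-side fallback loop described in the steps above.
RETRYABLE = {"timeout", "rate_limit", "server_error", "model_unavailable"}

def complete_with_fallbacks(primary, fallbacks, call_model):
    """Try the primary model, then each fallback; return the first success."""
    last_error = None
    for model in [primary, *fallbacks]:
        status, response = call_model(model)
        if status == "ok":
            return model, response           # first successful response wins
        if status not in RETRYABLE:
            raise ValueError(f"non-retryable error from {model}: {status}")
        last_error = status                  # retryable: move on to the next model
    raise RuntimeError(f"all models failed, last error: {last_error}")

# Simulated provider: the primary times out, the first fallback succeeds.
def fake_call(model):
    if model == "claude-sonnet-4.5":
        return "timeout", None
    return "ok", f"hello from {model}"

model_used, _ = complete_with_fallbacks(
    "claude-sonnet-4.5", ["gemini-3-pro-preview", "claude-haiku-4.5"], fake_call
)
```

Note that non-retryable errors surface immediately rather than consuming the fallback chain.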

Configuring Fallbacks

Via Request Body

curl https://api.scalellm.dev/v1/chat/completions \
  -H "Authorization: Bearer sk_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "fallback_models": ["gemini-3-pro-preview", "claude-haiku-4.5"]
  }'

Via Header

curl https://api.scalellm.dev/v1/chat/completions \
  -H "Authorization: Bearer sk_your_key" \
  -H "X-Fallback-Models: gemini-3-pro-preview,claude-haiku-4.5" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Primary               Fallback 1            Fallback 2
claude-opus-4.5       claude-sonnet-4.5     gemini-3-pro-preview
claude-sonnet-4.5     gemini-3-pro-preview  claude-haiku-4.5
claude-haiku-4.5      gemini-3-flash        claude-sonnet-4.5
gemini-3-pro-preview  claude-sonnet-4.5     gemini-3-flash
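Expressed as data, the chains tabulated above might look like this. A sketch only: ScaleLLM does not expose such a structure, and `models_to_try` is a hypothetical helper.

```python
# The fallback chains from the table above, as a lookup table.
CHAINS = {
    "claude-opus-4.5":      ["claude-sonnet-4.5", "gemini-3-pro-preview"],
    "claude-sonnet-4.5":    ["gemini-3-pro-preview", "claude-haiku-4.5"],
    "claude-haiku-4.5":     ["gemini-3-flash", "claude-sonnet-4.5"],
    "gemini-3-pro-preview": ["claude-sonnet-4.5", "gemini-3-flash"],
}

def models_to_try(primary, overrides=None):
    """Full try-order: primary first, then explicit overrides or the table's chain.

    An explicit empty list disables fallbacks entirely, matching the
    "Disabling Fallbacks" behavior later in this page.
    """
    fallbacks = overrides if overrides is not None else CHAINS.get(primary, [])
    return [primary, *fallbacks]
```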

When Fallbacks Trigger

  • Timeout: Primary doesn’t respond in time
  • Rate Limit: Provider returns 429
  • Server Error: Provider returns 5xx
  • Model Unavailable: The provider reports the model as unavailable

Client errors (4xx other than 429) do not trigger fallbacks; they indicate a problem with the request itself, which a different model cannot fix.
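The trigger rules above reduce to a small status check. A sketch, with `should_fallback` as a hypothetical helper name:

```python
def should_fallback(status_code=None, timed_out=False):
    """Return True when the rules above say to try the next model."""
    if timed_out:                       # Timeout: primary didn't respond in time
        return True
    if status_code is None:
        return False
    if status_code == 429:              # Rate limit
        return True
    if 500 <= status_code < 600:        # Server error / model unavailable
        return True
    return False                        # Other 4xx: problem with the request itself
```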

Response Metadata

{
  "model": "gemini-3-pro-preview",
  "x_scalellm": {
    "primary_model": "claude-sonnet-4.5",
    "model_used": "gemini-3-pro-preview",
    "fallback_reason": "timeout"
  }
}
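A client can detect whether a fallback was used by comparing the metadata fields shown above. A minimal sketch; `fallback_info` is a hypothetical helper:

```python
def fallback_info(response):
    """Return (fallback_used, reason) from the x_scalellm metadata, if present."""
    meta = response.get("x_scalellm", {})
    used = meta.get("model_used")
    primary = meta.get("primary_model")
    if used and primary and used != primary:
        return True, meta.get("fallback_reason")
    return False, None

# The example response from above: primary timed out, fallback answered.
resp = {
    "model": "gemini-3-pro-preview",
    "x_scalellm": {
        "primary_model": "claude-sonnet-4.5",
        "model_used": "gemini-3-pro-preview",
        "fallback_reason": "timeout",
    },
}
```

This is useful for logging or alerting when your primary model degrades.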

Disabling Fallbacks

{
  "model": "claude-sonnet-4.5",
  "fallback_models": []
}