Smart Fallbacks
ScaleLLM automatically routes to backup models when your primary model is unavailable.
How It Works
- You send a request with your primary model
- If it fails (timeout, rate limit, server error), we try your fallbacks in order
- The first successful response is returned
- Everything happens server-side, so your client makes no extra round trips
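The steps above can be sketched as a simple try-in-order loop. This is only an illustration of the idea, not ScaleLLM's server-side implementation: `route_with_fallbacks`, `ModelCallError`, and `fake_call` are hypothetical names invented for this example, and the provider call is simulated.

```python
from typing import Callable, Optional, Sequence


class ModelCallError(Exception):
    """Hypothetical error for a failed model call (timeout, 429, 5xx)."""


def route_with_fallbacks(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model, response) for the first success."""
    last_error: Optional[ModelCallError] = None
    for model in models:
        try:
            return model, call_model(model)
        except ModelCallError as err:
            # Remember the failure and move on to the next model in the chain.
            last_error = err
    raise RuntimeError("all models failed") from last_error


def fake_call(model: str) -> str:
    # Stand-in for a real provider call: the primary is simulated as down.
    if model == "claude-opus-4.5":
        raise ModelCallError("simulated timeout")
    return f"response from {model}"


used, reply = route_with_fallbacks(
    ["claude-opus-4.5", "claude-sonnet-4.5", "gemini-3-pro-preview"],
    fake_call,
)
print(used, "->", reply)
```

In this sketch the simulated primary fails, so the response comes from the first fallback. If every model in the chain fails, the caller gets a single error instead of a response.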
Configuring Fallbacks
Via Request Body
Via Header
Recommended Fallbacks
| Primary | Fallback 1 | Fallback 2 |
|---|---|---|
| claude-opus-4.5 | claude-sonnet-4.5 | gemini-3-pro-preview |
| claude-sonnet-4.5 | gemini-3-pro-preview | claude-haiku-4.5 |
| claude-haiku-4.5 | gemini-3-flash | claude-sonnet-4.5 |
| gemini-3-pro-preview | claude-sonnet-4.5 | gemini-3-flash |
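If you keep these pairings in client code, one option is a plain lookup table. The sketch below copies the rows above into a dictionary. `RECOMMENDED_FALLBACKS` and `model_chain` are hypothetical names for illustration, not part of any ScaleLLM SDK.

```python
# Recommended fallbacks from the table above, keyed by primary model.
RECOMMENDED_FALLBACKS: dict[str, list[str]] = {
    "claude-opus-4.5": ["claude-sonnet-4.5", "gemini-3-pro-preview"],
    "claude-sonnet-4.5": ["gemini-3-pro-preview", "claude-haiku-4.5"],
    "claude-haiku-4.5": ["gemini-3-flash", "claude-sonnet-4.5"],
    "gemini-3-pro-preview": ["claude-sonnet-4.5", "gemini-3-flash"],
}


def model_chain(primary: str) -> list[str]:
    """Return the primary followed by its recommended fallbacks, if any."""
    return [primary, *RECOMMENDED_FALLBACKS.get(primary, [])]


print(model_chain("claude-haiku-4.5"))
```

A primary that is not in the table simply yields a chain containing only itself.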
When Fallbacks Trigger
- Timeout: Primary doesn’t respond in time
- Rate Limit: Provider returns 429
- Server Error: Provider returns 5xx
- Model Unavailable: Model is down
Client errors (4xx except 429) don’t trigger fallbacks, because they indicate a problem with your request.
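The trigger rules above can be written as a small predicate. This is a sketch under stated assumptions: `should_fallback` is a hypothetical helper, and it assumes an unavailable model shows up either as a 5xx status or as no response at all (`status_code=None`), which the rules above do not specify.

```python
from typing import Optional


def should_fallback(status_code: Optional[int], timed_out: bool = False) -> bool:
    """Return True if a failed attempt should be retried on a fallback model."""
    if timed_out or status_code is None:
        # Timeout, or no response at all (assumed to mean the model is down).
        return True
    if status_code == 429:
        # Rate limit: the one 4xx code that triggers a fallback.
        return True
    if 500 <= status_code <= 599:
        # Provider-side server error.
        return True
    # Success, or a client error that a different model would not fix.
    return False


for code in (200, 400, 404, 429, 500, 503):
    print(code, should_fallback(code))
```

Keeping 429 as an explicit special case mirrors the rule that other 4xx responses point to a problem in the request itself.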