Chat

Create chat completions using the OpenAI-compatible API format.

Endpoint

POST https://api.scalellm.dev/v1/chat/completions

Examples

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scalellm.dev/v1",
    api_key="sk_your_key"
)

response = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID (e.g., `claude-sonnet-4.5`) |
| `messages` | array | Yes | Array of message objects |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `temperature` | float | No | Sampling randomness (0-2, default 1) |
| `top_p` | float | No | Nucleus sampling cutoff (0-1) |
| `stream` | boolean | No | Stream the response as server-sent events |
| `stop` | string or array | No | Stop sequences |
| `fallback_models` | array | No | Fallback model IDs |
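Optional parameters are passed alongside the required fields in the same request body. A minimal sketch of a helper that assembles and validates such a body client-side (the helper name and validation rules are illustrative, not part of the API):

```python
# Illustrative helper; not part of the SDK or the ScaleLLM API.
def build_chat_request(model, messages, **options):
    """Assemble a /v1/chat/completions request body.

    `model` and `messages` are required; the optional parameters
    from the table above pass through as keyword arguments.
    """
    allowed = {"max_tokens", "temperature", "top_p",
               "stream", "stop", "fallback_models"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not messages:
        raise ValueError("messages must be a non-empty array")
    return {"model": model, "messages": messages, **options}


body = build_chat_request(
    "claude-sonnet-4.5",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    max_tokens=256,
)
```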

Message Object

| Field | Type | Description |
|---|---|---|
| `role` | string | One of `system`, `user`, or `assistant` |
| `content` | string | Message content |

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 12,
    "total_tokens": 32
  }
}
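The reply text and token counts can be read straight out of the parsed JSON. For example, using the sample payload above:

```python
import json

# Sample response body, abridged from the example above.
raw = '''
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "claude-sonnet-4.5",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 12, "total_tokens": 32}
}
'''

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
used = resp["usage"]["total_tokens"]
print(reply)  # Hello! How can I help you today?
print(used)   # 32
```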

Streaming

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scalellm.dev/v1",
    api_key="sk_your_key"
)

stream = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Streamed response (SSE):
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"}}]}

data: [DONE]
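If you are not using the SDK, the raw SSE stream can be parsed by hand: each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. A sketch, assuming the chunk shape shown above:

```python
import json

def collect_sse(lines):
    """Accumulate delta content from `data:` lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)


# The sample events from above.
sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"}}]}',
    "",
    "data: [DONE]",
]
print(collect_sse(sample))  # Hello!
```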

With Fallbacks

from openai import OpenAI

client = OpenAI(
    base_url="https://api.scalellm.dev/v1",
    api_key="sk_your_key"
)

response = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "fallback_models": ["gemini-3-pro-preview", "claude-haiku-4.5"]
    }
)

print(response.choices[0].message.content)
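Passing `fallback_models` lets the server handle failover. If you ever need the same behavior client-side, the equivalent is a loop that tries each model in turn; a sketch, where `call_model` stands in for the actual API call:

```python
def first_successful(models, call_model):
    """Try each model in order; return (model, result) for the
    first call that succeeds. `call_model` is any callable that
    takes a model ID and either returns a response or raises.
    """
    last_err = None
    for model in models:
        try:
            return model, call_model(model)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all models failed: {last_err}")


# Stubbed call for illustration: the first model "fails".
def fake_call(model):
    if model == "claude-sonnet-4.5":
        raise TimeoutError("upstream timeout")
    return f"reply from {model}"


model, reply = first_successful(
    ["claude-sonnet-4.5", "gemini-3-pro-preview"], fake_call
)
print(model, "->", reply)
```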

Available Models

| Model | Description |
|---|---|
| `claude-opus-4.5` | Most capable; best for complex tasks |
| `claude-sonnet-4.5` | Balanced performance and speed |
| `claude-haiku-4.5` | Fastest; best for simple tasks |
| `gemini-3-pro-preview` | Google's latest, with a 1M token context |
| `gemini-3-flash` | Ultra-fast and cost-effective |

Headers

| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer sk_your_key` |
| `Content-Type` | Yes | `application/json` |
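With these two headers you can call the endpoint without the SDK at all. A standard-library sketch; the request is constructed but, for brevity, not sent here:

```python
import json
import urllib.request

body = json.dumps({
    "model": "claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode("utf-8")

req = urllib.request.Request(
    "https://api.scalellm.dev/v1/chat/completions",
    data=body,
    headers={
        "Authorization": "Bearer sk_your_key",
        "Content-Type": "application/json",
    },
    method="POST",
)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.get_method(), req.get_header("Authorization"))
```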