POST /v1/chat/completions
Chat Completions
Send a list of messages and receive a model-generated reply. This is the primary endpoint for working with chat models; it supports both synchronous responses and SSE streaming.

Authentication: send your API key in the Authorization header, e.g. Authorization: Bearer ms-YOUR_KEY

Parameters

model
string
required
Model ID to use for the request. Use GET /v1/models to retrieve available IDs (e.g., gpt-4o, claude-sonnet-4).
messages
array
required
Array of message objects forming the conversation. Each object requires:
  • role — one of "system", "user", or "assistant"
  • content — message text as a string
temperature
number
default:"1"
Sampling temperature between 0 and 2. Higher values produce more random output; lower values are more deterministic. Do not use alongside top_p.
max_tokens
integer
Maximum number of tokens to generate in the response. Defaults to the model’s context limit.
stream
boolean
default:"false"
When true, the response is delivered as a series of server-sent events (SSE). Each chunk contains a partial delta. The stream ends with data: [DONE].
top_p
number
default:"1"
Nucleus sampling threshold between 0 and 1. The model considers only the tokens comprising the top top_p probability mass. Do not use alongside temperature.
frequency_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens based on how often they’ve already appeared in the response, reducing repetition.
presence_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that have appeared at all, encouraging the model to explore new topics.
stop
string or array
Up to 4 sequences where the model will stop generating. The stop string itself is not included in the output.
n
integer
default:"1"
Number of completion choices to generate for the input messages. Note that completion-token usage is multiplied by n; prompt tokens are counted once.

Examples

curl https://modelswitch.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ms-YOUR_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is ModelSwitch?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": false
  }'
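The same request can be made from Python's standard library alone. The sketch below mirrors the curl example; the helper names (build_payload, chat) are illustrative, not part of any SDK, and the actual network call is left commented out since it requires a valid key.

```python
import json
import urllib.request

API_URL = "https://modelswitch.io/v1/chat/completions"

def build_payload(model, messages, **options):
    """Assemble a request body; keyword options (temperature, max_tokens, ...) pass through."""
    payload = {"model": model, "messages": messages}
    payload.update(options)
    return payload

def chat(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the same request shown in the curl example (no network call yet):
payload = build_payload(
    "gpt-4o",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ModelSwitch?"},
    ],
    temperature=0.7,
    max_tokens=500,
    stream=False,
)
# result = chat(payload, "ms-YOUR_KEY")  # requires a valid API key
```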

Response

Non-streaming response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "ModelSwitch is a unified AI API gateway..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
id
string
Unique identifier for this completion request.
object
string
Always "chat.completion" for non-streaming responses.
created
integer
Unix timestamp of when the completion was created.
model
string
The model that generated the response.
choices
array
Array of generated completions. Contains one item unless n > 1.
usage
object
Token counts for this request.
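Pulling the reply text and token counts out of a parsed response is straightforward. A minimal sketch, using the sample response above as input:

```python
import json

# Sample non-streaming response, abridged from the example above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "ModelSwitch is a unified AI API gateway..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]   # the assistant's text
finish = resp["choices"][0]["finish_reason"]       # e.g. "stop" or "length"
usage = resp["usage"]                              # token counts for billing
```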

Streaming response

When stream: true, the API returns newline-delimited SSE events. Each event is a JSON object with a delta instead of a message:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{
    "delta": {"content": "Model"},
    "index": 0,
    "finish_reason": null
  }]
}
Assemble the full response by concatenating choices[0].delta.content from each chunk. The stream ends with:
data: [DONE]
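The assembly step can be sketched in plain Python without an SDK. The helper name below is illustrative, and the synthetic events mirror the chunk format shown above (a first chunk may carry only a role delta, with no content).

```python
import json

def assemble_stream(lines):
    """Concatenate delta content from raw SSE lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Synthetic stream mirroring the chunk format above:
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}, "index": 0, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "Model"}, "index": 0, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "Switch"}, "index": 0, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
text = assemble_stream(events)
```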
When using the OpenAI Python or TypeScript SDK, streaming is handled transparently with stream=True / stream: true. The SDK iterates over chunks and exposes .choices[0].delta.content on each.