POST /v1/chat/completions
Chat Completions
Send a list of messages and receive a model-generated reply. This is the primary endpoint for working with chat models; it supports both synchronous responses and SSE streaming.

Authentication: send your API key in the Authorization header, e.g. Authorization: Bearer ms-YOUR_KEY

Parameters

model
string
required
Model ID to use for the request. Use GET /v1/models to retrieve available IDs (e.g., gpt-4o, claude-sonnet-4).
messages
array
required
Array of message objects forming the conversation. Each object requires:
  • role — one of "system", "user", or "assistant"
  • content — message text as a string
temperature
number
default:"1"
Sampling temperature between 0 and 2. Higher values produce more random output; lower values are more deterministic. Do not use alongside top_p.
max_tokens
integer
Maximum number of tokens to generate in the response. Defaults to the model’s context limit.
stream
boolean
default:"false"
When true, the response is delivered as a series of server-sent events (SSE). Each chunk contains a partial delta. The stream ends with data: [DONE].
top_p
number
default:"1"
Nucleus sampling threshold between 0 and 1. The model considers only the tokens comprising the top top_p probability mass. Do not use alongside temperature.
frequency_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens based on how often they’ve already appeared in the response, reducing repetition.
presence_penalty
number
default:"0"
Number between -2.0 and 2.0. Positive values penalize tokens that have appeared at all, encouraging the model to explore new topics.
stop
string or array
Up to 4 sequences where the model will stop generating. The stop string itself is not included in the output.
n
integer
default:"1"
Number of completion choices to generate for the input messages. Note that completion-token usage is multiplied by n; prompt tokens are counted once.

Examples

curl https://modelswitch.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ms-YOUR_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is ModelSwitch?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": false
  }'
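The same request can be made from Python's standard library alone. The sketch below mirrors the curl example; the helper names (build_payload, chat) are illustrative, not part of any SDK, and the actual network call is left commented out since it requires a valid key.

```python
import json
import urllib.request

API_URL = "https://modelswitch.io/v1/chat/completions"

def build_payload(model, messages, **options):
    """Assemble a request body; keyword options (temperature, max_tokens, ...) pass through."""
    payload = {"model": model, "messages": messages}
    payload.update(options)
    return payload

def chat(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the same request shown in the curl example (no network call yet):
payload = build_payload(
    "gpt-4o",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is ModelSwitch?"},
    ],
    temperature=0.7,
    max_tokens=500,
    stream=False,
)
# result = chat(payload, "ms-YOUR_KEY")  # requires a valid API key
```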

Response

Non-streaming response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "ModelSwitch is a unified AI API gateway..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
id
string
Unique identifier for this completion request.
object
string
Always "chat.completion" for non-streaming responses.
created
integer
Unix timestamp of when the completion was created.
model
string
The model that generated the response.
choices
array
Array of generated completions. Contains one item unless n > 1.
usage
object
Token counts for this request.
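Pulling the reply text and token counts out of a parsed response is straightforward. A minimal sketch, using the sample response above as input:

```python
import json

# Sample non-streaming response, abridged from the example above.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "ModelSwitch is a unified AI API gateway..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]   # the assistant's text
finish = resp["choices"][0]["finish_reason"]       # e.g. "stop" or "length"
usage = resp["usage"]                              # token counts for billing
```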

Streaming response

When stream: true, the API returns newline-delimited SSE events. Each event is a JSON object with a delta instead of a message:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{
    "delta": {"content": "Model"},
    "index": 0,
    "finish_reason": null
  }]
}
Assemble the full response by concatenating choices[0].delta.content from each chunk. The stream ends with:
data: [DONE]
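The assembly step can be sketched in plain Python without an SDK. The helper name below is illustrative, and the synthetic events mirror the chunk format shown above (a first chunk may carry only a role delta, with no content).

```python
import json

def assemble_stream(lines):
    """Concatenate delta content from raw SSE lines until the [DONE] sentinel."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Synthetic stream mirroring the chunk format above:
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}, "index": 0, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "Model"}, "index": 0, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "Switch"}, "index": 0, "finish_reason": "stop"}]}',
    "data: [DONE]",
]
text = assemble_stream(events)
```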
When using the OpenAI Python or TypeScript SDK, streaming is handled transparently with stream=True / stream: true. The SDK iterates over chunks and exposes .choices[0].delta.content on each.