ModelSwitch supports streaming via the stream: true parameter on chat completions. Tokens are delivered as Server-Sent Events (SSE) as soon as the model generates them, so your UI can start rendering immediately instead of waiting for the full response. Streaming works with every model that supports it and is compatible with the OpenAI SDK's built-in streaming support.

Code examples

from openai import OpenAI

client = OpenAI(api_key="ms-YOUR_KEY", base_url="https://modelswitch.io/v1")

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)
for chunk in stream:
    # delta.content is None on role-only and final chunks, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

SSE format

Each chunk arrives as a line with the prefix data: followed by a JSON object. The stream ends with the sentinel value data: [DONE].
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",...}

data: [DONE]
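The framing above is simple to parse by hand if you are not using an SDK. A minimal sketch (the parse_sse_lines helper is illustrative, not part of any SDK) that turns raw SSE lines into decoded chunk objects:

```python
import json

DONE_SENTINEL = "[DONE]"

def parse_sse_lines(lines):
    """Yield decoded chunk dicts from raw SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == DONE_SENTINEL:
            return  # sentinel: end of stream
        yield json.loads(payload)

raw = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk"}',
    "",
    "data: [DONE]",
]
chunks = list(parse_sse_lines(raw))
```

Note that anything after the [DONE] sentinel is ignored, and blank separator lines between events are skipped rather than treated as errors.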

Chunk structure

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{
    "delta": { "content": "Hello" },
    "index": 0,
    "finish_reason": null
  }]
}
The delta.content field contains the new token(s) for that chunk. When finish_reason is set (e.g., "stop"), the model has finished generating and no more content chunks will follow.
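The accumulation logic can be sketched in a few lines. The chunk dicts below are hand-written samples mirroring the structure above, not captured API output:

```python
def assemble(chunks):
    """Concatenate delta.content across chunks until finish_reason is set."""
    parts = []
    for chunk in chunks:
        choice = chunk["choices"][0]
        content = choice["delta"].get("content")
        if content is not None:
            parts.append(content)
        if choice["finish_reason"] is not None:
            break  # model finished; no more content chunks follow
    return "".join(parts)

sample = [
    {"choices": [{"delta": {"content": "Hello"}, "index": 0, "finish_reason": None}]},
    {"choices": [{"delta": {"content": ", world"}, "index": 0, "finish_reason": None}]},
    {"choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}]},
]
print(assemble(sample))  # → Hello, world
```

The final chunk carries finish_reason: "stop" with an empty delta, which is why the loop checks delta.get("content") rather than indexing it directly.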