How a request is charged
Hold placed
When you send a request, ModelSwitch places a hold on your balance for the estimated cost. This prevents your balance from going negative under concurrent requests.
Cost settled
Once the provider returns the actual token counts, the hold is adjusted to match the real cost.
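The two steps above can be illustrated with a minimal in-memory sketch. It is not ModelSwitch's actual implementation: the `Account` class, its method names, and the rule that a hold is refused when the estimate exceeds the available balance are all assumptions made for illustration.

```python
from decimal import Decimal
import itertools


class InsufficientBalance(Exception):
    """Raised when the estimated cost exceeds the available balance."""


class Account:
    """Hypothetical toy account showing the hold-then-settle pattern."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self.holds = {}  # hold id -> estimated cost currently reserved
        self._ids = itertools.count(1)

    @property
    def available(self) -> Decimal:
        # Funds not reserved by any in-flight request.
        return self.balance - sum(self.holds.values(), Decimal("0"))

    def place_hold(self, estimated_cost: Decimal) -> int:
        # Step 1: reserve the estimate so concurrent requests
        # cannot spend the same funds twice.
        if estimated_cost > self.available:
            raise InsufficientBalance("estimated cost exceeds available balance")
        hold_id = next(self._ids)
        self.holds[hold_id] = estimated_cost
        return hold_id

    def settle(self, hold_id: int, actual_cost: Decimal) -> None:
        # Step 2: release the hold and charge the real cost instead.
        self.holds.pop(hold_id)
        self.balance -= actual_cost


account = Account(Decimal("1.00"))
hold = account.place_hold(Decimal("0.01"))   # estimate reserved
account.settle(hold, Decimal("0.00325"))     # real cost charged
```

While the hold is open, `account.available` is lower than `account.balance`; after settling, the two match again and only the actual cost has been deducted.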
Cost formula
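The cost of a request is calculated separately for prompt (input) and completion (output) tokens, then summed:

cost = (prompt_tokens / 1,000,000) × input_price_per_1M + (completion_tokens / 1,000,000) × output_price_per_1M

Example: GPT-4o
GPT-4o is priced at $2.50 per 1M input tokens and $10.00 per 1M output tokens. For a request with 500 prompt tokens and 200 completion tokens:

| Component | Calculation | Cost |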
|---|---|---|
| Prompt | (500 / 1,000,000) × $2.50 | $0.00125 |
| Completion | (200 / 1,000,000) × $10.00 | $0.00200 |
| Total | | $0.00325 |
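The same arithmetic can be reproduced in a few lines of Python. This is a sketch for checking estimates on your side; the function name `request_cost` and its parameters are illustrative and not part of any ModelSwitch SDK. `Decimal` is used to avoid floating-point rounding on small amounts.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_1m: Decimal,
    output_price_per_1m: Decimal,
) -> Decimal:
    """Return the request cost in dollars, given per-1M-token prices."""
    prompt_cost = Decimal(prompt_tokens) / TOKENS_PER_UNIT * input_price_per_1m
    completion_cost = (
        Decimal(completion_tokens) / TOKENS_PER_UNIT * output_price_per_1m
    )
    return prompt_cost + completion_cost


# The GPT-4o example from the table: 500 prompt and 200 completion tokens.
total = request_cost(500, 200, Decimal("2.50"), Decimal("10.00"))
```

With the table's inputs, `total` equals `Decimal("0.00325")`.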
Markup
ModelSwitch adds a small markup over provider prices to cover infrastructure costs: servers, network, monitoring, and platform development. The markup is a fixed percentage already included in the prices shown in the model catalog.

- No hidden fees, surcharges, or inactivity charges.
- Live prices are always available via `GET /v1/models`.
When your balance runs out
If your balance reaches zero, API requests return `403 Forbidden`. Top up your balance to resume.
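On the client side, it can help to turn this status into a distinct error so the application can prompt for a top-up instead of retrying. The sketch below uses only Python's standard library; the `BalanceExhausted` exception and `send_request` helper are hypothetical names, and the code assumes that a 403 on your account means an empty balance, which may not hold if the API uses 403 for other conditions as well.

```python
import urllib.error
import urllib.request


class BalanceExhausted(Exception):
    """Hypothetical error: the API answered 403, assumed to mean zero balance."""


def send_request(request, opener=urllib.request.urlopen) -> bytes:
    """Send a prepared request and return the raw response body.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # Assumption: 403 signals an exhausted balance, as described above.
            raise BalanceExhausted("balance exhausted; top up to resume") from err
        raise  # any other HTTP error is passed through unchanged
```

Callers can then catch `BalanceExhausted` separately from other HTTP failures and skip retries until the balance is topped up.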
Next steps
Top up balance
Add funds by card or invoice
Auto top-up
Set up recurring top-ups
Documents
Invoices and accounting docs for legal entities