Enter your token counts and request volume to see all models side by side with per-request, daily, and monthly cost. Prices are editable.
| {{ __t('model') }} | {{ __t('th_in') }} $/Mtok | {{ __t('th_out') }} $/Mtok | {{ __t('th_cache_read') }} $/Mtok | {{ __t('th_per_request') }} $ | {{ __t('th_per_day') }} $ | {{ __t('th_per_month') }} $ |
|---|---|---|---|---|---|---|
| {{ m.name }} {{ m.vendor }} | | | | {{ calcRequest(m).toFixed(5) }} | {{ calcDay(m).toFixed(2) }} | {{ calcMonth(m).toFixed(2) }} |
Providers bill per million tokens (Mtok), with separate prices for input (prompt, system prompt, and context) and output (the model's reply). Many add a caching tier: stable context can be cached and read again at a fraction of the input price. When many requests share the same system prompt, this typically cuts the input portion of the bill by 50–90%.
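The billing model described above can be sketched as a small cost function. This is only an illustration under stated assumptions: the interfaces, field names, function names (`requestCost`, `dailyCost`, `monthlyCost`), the 30-day month, and the example prices are all hypothetical and are not the calculator's actual implementation or any provider's real price list.

```typescript
// Hypothetical data shapes; field names are assumptions for this sketch.
interface ModelPrice {
  inPerMtok: number;        // $ per million uncached input tokens
  outPerMtok: number;       // $ per million output tokens
  cacheReadPerMtok: number; // $ per million input tokens read from cache
}

interface Usage {
  inputTokens: number;    // input tokens per request (prompt + system + context)
  cachedTokens: number;   // part of inputTokens served from cache
  outputTokens: number;   // output tokens per request
  requestsPerDay: number;
}

const MTOK = 1e6;

// Cost of one request: uncached input, cached input, and output are billed separately.
function requestCost(m: ModelPrice, u: Usage): number {
  const cached = Math.min(u.cachedTokens, u.inputTokens);
  const fresh = u.inputTokens - cached;
  return (
    (fresh * m.inPerMtok +
      cached * m.cacheReadPerMtok +
      u.outputTokens * m.outPerMtok) / MTOK
  );
}

function dailyCost(m: ModelPrice, u: Usage): number {
  return requestCost(m, u) * u.requestsPerDay;
}

// Assumes a 30-day month unless told otherwise.
function monthlyCost(m: ModelPrice, u: Usage, daysPerMonth = 30): number {
  return dailyCost(m, u) * daysPerMonth;
}

// Illustrative prices only, not taken from any provider.
const examplePrice: ModelPrice = { inPerMtok: 3, outPerMtok: 15, cacheReadPerMtok: 0.3 };
const exampleUsage: Usage = {
  inputTokens: 10000,
  cachedTokens: 8000,
  outputTokens: 500,
  requestsPerDay: 1000,
};

console.log(requestCost(examplePrice, exampleUsage).toFixed(5));
console.log(dailyCost(examplePrice, exampleUsage).toFixed(2));
console.log(monthlyCost(examplePrice, exampleUsage).toFixed(2));
```

With these illustrative numbers, caching 8,000 of the 10,000 input tokens lowers the input portion from 30,000 to 8,400 price units, a reduction of 72%, which falls inside the 50–90% range quoted above. Some providers also charge for cache writes; that is left out of this sketch.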
For classification, summarization, and simple extraction, small models (Haiku, GPT-4o mini, Gemini Flash, DeepSeek) are often 20–100× cheaper than top tier and cover roughly 80% of workloads. Reasoning, code, and multi-hop tasks benefit from Opus, GPT-4o, or Gemini Pro. Tip: build a router that sends easy tasks to cheap models and hard ones to top tier; this typically saves 40–70%.
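A minimal sketch of such a router, assuming the task type is already known for each request. The task labels, the tier names, and the `blendedSavings` helper are hypothetical; a real router would also need some way to classify incoming requests, which is not shown here.

```typescript
// Hypothetical task labels, taken from the task types named in the text.
type TaskKind =
  | "classification"
  | "summarization"
  | "extraction"
  | "reasoning"
  | "code"
  | "multi-hop";

type Tier = "small" | "top";

// Static routing table: easy tasks go to a small model, hard ones to top tier.
const TIER_BY_TASK: Record<TaskKind, Tier> = {
  classification: "small",
  summarization: "small",
  extraction: "small",
  reasoning: "top",
  code: "top",
  "multi-hop": "top",
};

function routeTier(kind: TaskKind): Tier {
  return TIER_BY_TASK[kind];
}

// Fraction saved compared with sending every request to the top tier.
// Assumes equal token volume per request; easyShare is the fraction of
// traffic routed to the small model, priceRatio is how many times cheaper
// the small model is.
function blendedSavings(easyShare: number, priceRatio: number): number {
  const relativeCost = (1 - easyShare) + easyShare / priceRatio;
  return 1 - relativeCost;
}

console.log(routeTier("summarization"));
console.log(routeTier("multi-hop"));
console.log(blendedSavings(0.6, 20).toFixed(2));
```

Under these assumptions, routing 60% of traffic to a model that is 20× cheaper saves about 57% overall, which is consistent with the 40–70% range above. The estimate ignores misrouted requests and any cost of the routing step itself.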