Which LLM is cheapest for your use case?

Enter your token counts and request volume; all models appear side by side with per-request, daily and monthly cost. Prices are editable.

How LLM API costs are structured

Providers bill per million tokens (Mtok), with separate rates for input (prompt, system message and context) and output (the model's reply). Many add a caching tier: stable context can be cached and re-read at a fraction of the input price. When many requests share the same system prompt, this typically cuts the input portion of the bill by 50–90%.
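The billing structure above can be sketched as a small calculation. This is a minimal illustration, not any provider's official formula: the `Pricing` class, the `cost_per_request` function and all prices are hypothetical placeholders, the month is assumed to have 30 days, and any surcharge for writing to the cache is ignored.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in USD per million tokens (Mtok). Hypothetical values only."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float


def cost_per_request(p: Pricing, input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0) -> float:
    """Cost of one request in USD.

    cached_tokens is the part of input_tokens served from the cache;
    it is billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * p.input_per_mtok
             + cached_tokens * p.cache_read_per_mtok
             + output_tokens * p.output_per_mtok)
    return total / 1_000_000


# Placeholder prices, not a real price list.
example = Pricing(input_per_mtok=3.00, output_per_mtok=15.00,
                  cache_read_per_mtok=0.30)

uncached = cost_per_request(example, input_tokens=10_000, output_tokens=500)
cached = cost_per_request(example, input_tokens=10_000, output_tokens=500,
                          cached_tokens=8_000)

requests_per_day = 1_000          # assumed volume
daily = cached * requests_per_day
monthly = daily * 30              # assumes a 30-day month

print(f"per request: {uncached:.5f} uncached, {cached:.5f} cached")
print(f"per day: {daily:.2f}, per month: {monthly:.2f}")
```

With these placeholder numbers, caching 8,000 of the 10,000 input tokens takes the input portion from $0.0300 to $0.0084 per request, while the output portion stays the same.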

Practice: how to pick the cheapest viable model

For classification, summarization and simple extraction, small models (Haiku, GPT-4o mini, Gemini Flash, DeepSeek) are often 20–100× cheaper than the top tier and cover roughly 80% of workloads. Reasoning, code and multi-hop tasks benefit from Opus, GPT-4o or Gemini Pro. Tip: build a router that sends easy tasks to cheap models and hard ones to the top tier; savings of 40–70% are typical.
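A router like the one described can start out as a simple lookup by task type. The sketch below is one possible shape, not a recommended production design: the task labels, the tier names and the per-request costs are all hypothetical, and a real router would likely need a better difficulty signal than a fixed label.

```python
# Task types assumed to be easy enough for a small model (hypothetical labels).
CHEAP_TASKS = {"classification", "summarization", "extraction"}


def pick_tier(task_type: str) -> str:
    """Return 'small' for easy task types and 'top' for everything else."""
    return "small" if task_type in CHEAP_TASKS else "top"


def total_cost(tasks: list[str], cost_small: float, cost_top: float,
               routed: bool = True) -> float:
    """Sum per-request costs, either routed by tier or all sent to the top tier."""
    if not routed:
        return len(tasks) * cost_top
    return sum(cost_small if pick_tier(t) == "small" else cost_top
               for t in tasks)


# Assumed mix: 6 easy requests and 4 hard ones; costs are placeholders in USD.
tasks = ["classification"] * 6 + ["code"] * 4
all_top = total_cost(tasks, cost_small=0.001, cost_top=0.02, routed=False)
with_router = total_cost(tasks, cost_small=0.001, cost_top=0.02)
savings = 1 - with_router / all_top

print(f"all top tier: {all_top:.3f}, routed: {with_router:.3f}, "
      f"savings: {savings:.0%}")
```

The savings depend mostly on the share of easy requests, because the hard ones still pay the top-tier price.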

Tips to lower your bill

  • Use prompt caching: mark system prompts and large constant contexts as cache blocks. Anthropic, OpenAI and Google all support it.
  • Shorten output: output tokens usually cost 3–5× the input price. Prefer structured JSON output over long prose.
  • Use batch APIs: asynchronous batches cost 50% less than live requests at Anthropic and OpenAI.
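To see how the output and batch tips combine, here is a rough comparison under assumed numbers. The `monthly_cost` helper, the prices ($1 input and $4 output per Mtok), the volume and the 30-day month are all hypothetical; only the 50% batch discount comes from the text above.

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 batch_discount: float = 0.0, days: int = 30) -> float:
    """Monthly cost in USD; prices are per million tokens (Mtok).

    batch_discount is the fraction taken off the bill, e.g. 0.5 for 50%.
    """
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days * (1 - batch_discount)


# Baseline: long prose answers, live requests.
baseline = monthly_cost(1_000, input_tokens=2_000, output_tokens=800,
                        input_price=1.0, output_price=4.0)

# Shorter structured output: 200 output tokens instead of 800.
shorter = monthly_cost(1_000, input_tokens=2_000, output_tokens=200,
                       input_price=1.0, output_price=4.0)

# Same short output, sent through a batch API at 50% off.
batched = monthly_cost(1_000, input_tokens=2_000, output_tokens=200,
                       input_price=1.0, output_price=4.0, batch_discount=0.5)

print(f"baseline: {baseline:.2f}, shorter output: {shorter:.2f}, "
      f"shorter + batch: {batched:.2f}")
```

Because output is priced several times higher than input in this example, trimming output saves more than trimming the same number of input tokens would.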