Quickly estimate token counts for Claude, GPT, and Gemini: all models side by side, with a context window check.
| {{ __t('th_model') }} | {{ __t('th_tokens') }} | {{ __t('th_context') }} | {{ __t('th_usage') }} | {{ __t('th_status') }} |
|---|---|---|---|---|
| {{ row.name }} | {{ row.tokens.toLocaleString() }} | {{ row.context.toLocaleString() }} | {{ row.pct.toFixed(1) }}% | {{ __t('status_ok') }} {{ __t('status_tight') }} {{ __t('status_over') }} |
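As a rough illustration of how the usage and status columns above could be derived, here is a minimal sketch. The `usage_row` helper, the `tight_at` threshold of 80%, and the model name and numbers are assumptions made for the example, not values taken from this tool.

```python
def usage_row(name: str, tokens: int, context: int, tight_at: float = 80.0) -> dict:
    """Build one table row: usage percentage plus a status label.

    tight_at is a hypothetical threshold (in percent) above which the
    prompt is considered close to the context limit.
    """
    pct = tokens / context * 100
    if pct > 100:
        status = "over"    # does not fit into the context window
    elif pct >= tight_at:
        status = "tight"   # fits, but leaves little room for the reply
    else:
        status = "ok"
    return {"name": name, "tokens": tokens, "context": context,
            "pct": pct, "status": status}


# Hypothetical model with a 200,000-token context window.
row = usage_row("example-model", 150_000, 200_000)
print(f"{row['name']}: {row['tokens']:,} / {row['context']:,} "
      f"= {row['pct']:.1f}% ({row['status']})")
```

Keeping the threshold as a parameter makes it easy to reserve more or less headroom for the model's reply.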
Tokens are the building blocks an LLM breaks text into before processing it. A token is usually a word fragment: for English text, roughly 4 characters or ¾ of a word; for Chinese or Japanese, often 1–2 tokens per character. Tokens drive both cost (providers bill per million tokens) and capacity (the context window is measured in tokens).
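The rules of thumb above translate into a quick back-of-the-envelope estimate. The sketch below assumes the 4-characters-per-token heuristic for English and a midpoint of 1.5 tokens per character for Chinese/Japanese. The function names and the price used in the example are hypothetical, and a real tokenizer will give different counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate for English text: about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_cjk(text: str, tokens_per_char: float = 1.5) -> int:
    """Rough estimate for Chinese/Japanese: about 1-2 tokens per character."""
    return math.ceil(len(text) * tokens_per_char)


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Cost when the provider bills per million tokens."""
    return tokens / 1_000_000 * price_per_million


prompt = "a" * 400                      # stand-in for 400 characters of English
tokens = estimate_tokens(prompt)        # 400 / 4 = 100 tokens
# 3.0 per million tokens is a made-up price, used only for illustration.
print(tokens, estimate_cost(tokens, price_per_million=3.0))
```

Rounding up with `math.ceil` errs on the side of overestimating, which is the safer direction when checking against a context window.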
Each provider trains its own tokenizer on its own corpus. GPT-4o uses o200k_base (a vocabulary of roughly 200k tokens); older GPT models such as GPT-4 and GPT-3.5 Turbo use cl100k_base (roughly 100k). Claude and Gemini have their own tokenizers with different subword splits. The same word, for example "internationalization", can be split into 4, 6, or more tokens depending on the tokenizer. Differences are largest for non-Latin scripts (Arabic, Thai, CJK).
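To see why the vocabulary matters, consider a toy sketch. It uses greedy longest-match splitting rather than the byte-pair-encoding merges that real tokenizers apply, and both vocabularies are invented for the example. Neither corresponds to any provider's actual tokenizer.

```python
def greedy_split(word: str, vocab: set) -> list:
    """Split a word by repeatedly taking the longest matching vocabulary entry.

    Falls back to a single character when nothing in the vocabulary matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Two hypothetical vocabularies, standing in for two different tokenizers.
VOCAB_A = {"inter", "national", "iz", "ation"}
VOCAB_B = {"int", "ern", "ation", "al", "iz"}

word = "internationalization"
print(greedy_split(word, VOCAB_A))  # ['inter', 'national', 'iz', 'ation']
print(greedy_split(word, VOCAB_B))  # ['int', 'ern', 'ation', 'al', 'iz', 'ation']
```

The same 20 characters become 4 pieces under one vocabulary and 6 under the other, which is why token counts for identical text differ between models.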