Predict what your LLM project will actually cost

Estimate monthly, yearly, and per-request costs across Claude, GPT, Gemini, and OpenRouter models. Model the impact of prompt caching, compare options side-by-side, and count tokens from real text.

Pricing data current as of April 2026. Verify official rates before final budgeting.

Single-model estimator

Configure your workload and see costs at every time horizon.

Start with a common workload

Choose a realistic starter template, then fine-tune the sliders for your own traffic and prompt sizes.

Input tokens per request: 2,000 (prompt + system + RAG context sent to the model)

Output tokens per request: 500 (response generated by the model)

Requests per day: 1,000

Cache hit rate: 0% (percent of input tokens served from prompt cache, typically at a ~90% discount)

Projected cost
Per request: $0.00
Per day: $0.00
Per month: $0.00
Per year: $0.00

Monthly breakdown

The breakdown itemizes input tokens billed at full price, input tokens served from cache, output tokens, input cost, output cost, and savings vs no caching.
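
The math behind these projections is simple enough to sketch. Here is a minimal TypeScript version, assuming the slider semantics above; the rates ($3 / $0.30 / $15 per million tokens) and the 30-day month are illustrative assumptions, not the tool's live data:

```ts
interface Workload {
  inputTokens: number;    // input tokens per request
  outputTokens: number;   // output tokens per request
  requestsPerDay: number;
  cacheHitRate: number;   // fraction of input tokens served from cache (0..1)
}

// Per-million-token rates in USD. Values below are illustrative, not live pricing.
interface Rates {
  inputPerM: number;
  cachedInputPerM: number; // ~10% of full price on Anthropic, ~50% on OpenAI
  outputPerM: number;
}

function costPerRequest(w: Workload, r: Rates): number {
  const cachedTokens = w.inputTokens * w.cacheHitRate;
  const freshTokens = w.inputTokens - cachedTokens;
  return (
    (freshTokens * r.inputPerM +
      cachedTokens * r.cachedInputPerM +
      w.outputTokens * r.outputPerM) / 1_000_000
  );
}

const w: Workload = { inputTokens: 2000, outputTokens: 500, requestsPerDay: 1000, cacheHitRate: 0 };
const r: Rates = { inputPerM: 3.0, cachedInputPerM: 0.3, outputPerM: 15.0 };

const perRequest = costPerRequest(w, r);       // $0.0135
const perDay = perRequest * w.requestsPerDay;  // $13.50
const perMonth = perDay * 30;                  // $405.00 (30-day month assumed)
const perYear = perDay * 365;                  // $4,927.50
```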

Compare models side-by-side

Same workload, different models. Monthly cost using your estimator inputs above.

Table columns: Model, Provider, Input $/M, Output $/M, Monthly cost, vs cheapest.
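
The comparison reuses the same per-request arithmetic across a rate table. A self-contained sketch under the same assumptions as above; the model names and rates here are examples, not live pricing data:

```ts
// Fixed workload: 2,000 in / 500 out per request, 1,000 requests/day, 30-day month, no caching.
const workload = { inputTokens: 2000, outputTokens: 500, requestsPerDay: 1000 };

// Hypothetical catalog entries; swap in real rates from your provider's pricing page.
const catalog = [
  { model: "example-sonnet", provider: "Anthropic", inputPerM: 3.0, outputPerM: 15.0 },
  { model: "example-mini", provider: "OpenAI", inputPerM: 0.15, outputPerM: 0.6 },
  { model: "example-flash", provider: "Google", inputPerM: 0.1, outputPerM: 0.4 },
];

const rows = catalog
  .map((m) => ({
    ...m,
    monthly:
      ((workload.inputTokens * m.inputPerM + workload.outputTokens * m.outputPerM) / 1_000_000) *
      workload.requestsPerDay * 30,
  }))
  .sort((a, b) => a.monthly - b.monthly);

const cheapest = rows[0].monthly;
for (const row of rows) {
  console.log(`${row.model} (${row.provider}): $${row.monthly.toFixed(2)}/mo, ${(row.monthly / cheapest).toFixed(1)}x cheapest`);
}
// example-flash (Google): $12.00/mo, 1.0x cheapest
// example-mini (OpenAI): $18.00/mo, 1.5x cheapest
// example-sonnet (Anthropic): $405.00/mo, 33.8x cheapest
```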

Token counter

Paste real prompt text to get a token estimate. Uses character- and word-based heuristics — accurate to roughly ±15% for English prose. For exact counts, use your provider's tokenizer.

Characters: 0
Words: 0
Estimated tokens: 0
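
If you want the same ballpark estimate in code, a blend of the two rules of thumb works. The exact weighting this tool uses is an assumption here; the sketch simply averages the character-based and word-based estimates:

```ts
// Heuristic token estimate for English prose: average the ~4-characters-per-token
// and ~4/3-tokens-per-word rules of thumb. Expect roughly +/-15% error on prose;
// code and non-English text can drift much further.
function estimateTokens(text: string): number {
  const chars = text.length;
  const words = text.trim() === "" ? 0 : text.trim().split(/\s+/).length;
  const byChars = chars / 4;
  const byWords = words * (4 / 3);
  return Math.round((byChars + byWords) / 2);
}

estimateTokens("Hello, world!"); // 3 by this heuristic; a real tokenizer says ~4
```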

FAQ

What is a token?

A token is a chunk of text an LLM processes — roughly 4 characters or ¾ of a word for English. "Hello, world!" is about 4 tokens. Providers charge separately for input tokens (your prompt) and output tokens (the response), and output tokens typically cost 4–5× as much as input tokens.

How accurate is the token counter?

The counter uses a character/word heuristic (~4 chars per token for English). It's accurate within roughly 15% for English prose, but code, non-English languages, and unusual formatting can skew it. For production budgeting, use the official tokenizer from your provider or log actual usage from a pilot.

How does prompt caching work?

When you send the same prefix (system prompt, long context, tool definitions) across many requests, providers can cache it. Cached input tokens typically cost ~10% of the standard rate on Anthropic and ~50% on OpenAI. If 70% of your input is a stable system prompt sent 1,000 times a day, caching that 70% saves you ~63% on input costs.
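
That 63% figure falls out of a one-line formula: pay the cached rate on the cached fraction and full price on the rest. A sketch, assuming Anthropic's ~10% cached rate (swap in 0.5 for OpenAI):

```ts
// Effective input-cost multiplier with prompt caching.
// cacheHitRate: fraction of input tokens served from cache (0..1)
// cachedRate: cached-token price as a fraction of full price (e.g. ~0.1 on Anthropic)
function effectiveInputMultiplier(cacheHitRate: number, cachedRate: number): number {
  return cacheHitRate * cachedRate + (1 - cacheHitRate);
}

effectiveInputMultiplier(0.7, 0.1); // 0.37 -> you pay 37% of list price, a ~63% saving
effectiveInputMultiplier(0.7, 0.5); // 0.65 -> ~35% saving at OpenAI's ~50% cached rate
// Note: ignores the cache-write surcharge some providers add on the first request.
```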

What's the Batch API discount?

Anthropic, OpenAI, and Google all offer a Batch API that processes requests asynchronously, typically within 24 hours, at 50% off both input and output tokens. It's ideal for non-interactive workloads: nightly data processing, bulk classification, evals, offline summarization. On Anthropic the discount stacks with prompt caching; on OpenAI, cached-input pricing generally does not apply to batch jobs, so check your provider's docs before counting both.
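
Where the discounts do stack, the multipliers simply multiply. A worked example under the Anthropic-style assumptions from the caching answer above:

```ts
// 70% of input cached at ~10% of list price, then a 50% batch discount on top.
const cachedInputMultiplier = 0.7 * 0.1 + 0.3;  // 0.37 of list price for input
const batchInput = cachedInputMultiplier * 0.5; // 0.185 -> ~18.5% of list price
const batchOutput = 1.0 * 0.5;                  // output tokens get only the batch discount
```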

Where's the pricing data from?

Pricing is sourced from each provider's official pricing page as of April 2026. Model prices do change — verify against the provider before committing to a budget. For OpenRouter models, use the "Custom pricing" option and paste the rates directly from openrouter.ai/models.

What's not modeled?

This estimator handles base token costs, prompt caching, and batch discounts. It does not currently model: extended thinking tokens (billed as output), tool use overhead (typically a few hundred extra tokens per call), image/audio inputs, fine-tuning, data-residency surcharges, or long-context premiums that kick in above 200K tokens on some models. For those, add a 10–30% buffer to your estimate.

Is my data sent anywhere?

No. Everything runs in your browser. No data, prompts, or estimates leave your device.