Predict what your LLM project will actually cost

Estimate monthly, yearly, and per-request costs across Claude, GPT, Gemini, and OpenRouter models. Model the impact of prompt caching, compare options side-by-side, and count tokens from real text.

Pricing data current as of April 2026. Verify official rates before final budgeting.

Single-model estimator

Configure your workload and see costs at every time horizon.

Start with a common workload

Choose a realistic starter template, then fine-tune the sliders for your own traffic and prompt sizes.

Input tokens per request: 2,000 (prompt + system + RAG context sent to the model)

Output tokens per request: 500 (response generated by the model)

Requests per day: 1,000

Cache hit rate: 0% (percent of input tokens served from prompt cache, typically at a ~90% discount)

Projected cost
Per request: $0.00
Per day: $0.00
Per month: $0.00
Per year: $0.00

Monthly breakdown

The breakdown itemizes input tokens billed at full price, input tokens served from cache, output tokens, input cost, output cost, and savings vs no caching.
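
The math behind these projections is simple enough to sketch. Here is a minimal TypeScript version, assuming the slider semantics above; the rates ($3 / $0.30 / $15 per million tokens) and the 30-day month are illustrative assumptions, not the tool's live data:

```ts
interface Workload {
  inputTokens: number;    // input tokens per request
  outputTokens: number;   // output tokens per request
  requestsPerDay: number;
  cacheHitRate: number;   // fraction of input tokens served from cache (0..1)
}

// Per-million-token rates in USD. Values below are illustrative, not live pricing.
interface Rates {
  inputPerM: number;
  cachedInputPerM: number; // ~10% of full price on Anthropic, ~50% on OpenAI
  outputPerM: number;
}

function costPerRequest(w: Workload, r: Rates): number {
  const cachedTokens = w.inputTokens * w.cacheHitRate;
  const freshTokens = w.inputTokens - cachedTokens;
  return (
    (freshTokens * r.inputPerM +
      cachedTokens * r.cachedInputPerM +
      w.outputTokens * r.outputPerM) / 1_000_000
  );
}

const w: Workload = { inputTokens: 2000, outputTokens: 500, requestsPerDay: 1000, cacheHitRate: 0 };
const r: Rates = { inputPerM: 3.0, cachedInputPerM: 0.3, outputPerM: 15.0 };

const perRequest = costPerRequest(w, r);       // $0.0135
const perDay = perRequest * w.requestsPerDay;  // $13.50
const perMonth = perDay * 30;                  // $405.00 (30-day month assumed)
const perYear = perDay * 365;                  // $4,927.50
```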

Compare models side-by-side

Same workload, different models. Monthly cost using your estimator inputs above.

Table columns: Model, Provider, Input $/M, Output $/M, Monthly cost, vs cheapest.
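
The comparison reuses the same per-request arithmetic across a rate table. A self-contained sketch under the same assumptions as above; the model names and rates here are examples, not live pricing data:

```ts
// Fixed workload: 2,000 in / 500 out per request, 1,000 requests/day, 30-day month, no caching.
const workload = { inputTokens: 2000, outputTokens: 500, requestsPerDay: 1000 };

// Hypothetical catalog entries; swap in real rates from your provider's pricing page.
const catalog = [
  { model: "example-sonnet", provider: "Anthropic", inputPerM: 3.0, outputPerM: 15.0 },
  { model: "example-mini", provider: "OpenAI", inputPerM: 0.15, outputPerM: 0.6 },
  { model: "example-flash", provider: "Google", inputPerM: 0.1, outputPerM: 0.4 },
];

const rows = catalog
  .map((m) => ({
    ...m,
    monthly:
      ((workload.inputTokens * m.inputPerM + workload.outputTokens * m.outputPerM) / 1_000_000) *
      workload.requestsPerDay * 30,
  }))
  .sort((a, b) => a.monthly - b.monthly);

const cheapest = rows[0].monthly;
for (const row of rows) {
  console.log(`${row.model} (${row.provider}): $${row.monthly.toFixed(2)}/mo, ${(row.monthly / cheapest).toFixed(1)}x cheapest`);
}
// example-flash (Google): $12.00/mo, 1.0x cheapest
// example-mini (OpenAI): $18.00/mo, 1.5x cheapest
// example-sonnet (Anthropic): $405.00/mo, 33.8x cheapest
```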

Token counter

Paste real prompt text to get a token estimate. Uses character- and word-based heuristics — accurate to roughly ±15% for English prose. For exact counts, use your provider's tokenizer.

Characters: 0
Words: 0
Estimated tokens: 0
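
If you want the same ballpark estimate in code, a blend of the two rules of thumb works. The exact weighting this tool uses is an assumption here; the sketch simply averages the character-based and word-based estimates:

```ts
// Heuristic token estimate for English prose: average the ~4-characters-per-token
// and ~4/3-tokens-per-word rules of thumb. Expect roughly +/-15% error on prose;
// code and non-English text can drift much further.
function estimateTokens(text: string): number {
  const chars = text.length;
  const words = text.trim() === "" ? 0 : text.trim().split(/\s+/).length;
  const byChars = chars / 4;
  const byWords = words * (4 / 3);
  return Math.round((byChars + byWords) / 2);
}

estimateTokens("Hello, world!"); // 3 by this heuristic; a real tokenizer says ~4
```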

FAQ

What is a token?

A token is a chunk of text an LLM processes — roughly 4 characters or ¾ of a word for English. "Hello, world!" is about 4 tokens. Providers charge separately for input tokens (your prompt) and output tokens (the response), and output tokens typically cost 4–5× as much as input tokens.

How accurate is the token counter?

The counter uses a character/word heuristic (~4 chars per token for English). It's accurate within roughly 15% for English prose, but code, non-English languages, and unusual formatting can skew it. For production budgeting, use the official tokenizer from your provider or log actual usage from a pilot.

How does prompt caching work?

When you send the same prefix (system prompt, long context, tool definitions) across many requests, providers can cache it. Cached input tokens typically cost ~10% of the standard rate on Anthropic and ~50% on OpenAI. If 70% of your input is a stable system prompt sent 1,000 times a day, caching that 70% saves you ~63% on input costs.
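
That 63% figure falls out of a one-line formula: pay the cached rate on the cached fraction and full price on the rest. A sketch, assuming Anthropic's ~10% cached rate (swap in 0.5 for OpenAI):

```ts
// Effective input-cost multiplier with prompt caching.
// cacheHitRate: fraction of input tokens served from cache (0..1)
// cachedRate: cached-token price as a fraction of full price (e.g. ~0.1 on Anthropic)
function effectiveInputMultiplier(cacheHitRate: number, cachedRate: number): number {
  return cacheHitRate * cachedRate + (1 - cacheHitRate);
}

effectiveInputMultiplier(0.7, 0.1); // 0.37 -> you pay 37% of list price, a ~63% saving
effectiveInputMultiplier(0.7, 0.5); // 0.65 -> ~35% saving at OpenAI's ~50% cached rate
// Note: ignores the cache-write surcharge some providers add on the first request.
```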

What's the Batch API discount?

Anthropic, OpenAI, and Google all offer a Batch API that processes requests asynchronously, typically within 24 hours, at 50% off both input and output tokens. It's ideal for non-interactive workloads: nightly data processing, bulk classification, evals, offline summarization. On Anthropic the discount stacks with prompt caching; on OpenAI, cached-input pricing generally does not apply to batch jobs, so check your provider's docs before counting both.
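
Where the discounts do stack, the multipliers simply multiply. A worked example under the Anthropic-style assumptions from the caching answer above:

```ts
// 70% of input cached at ~10% of list price, then a 50% batch discount on top.
const cachedInputMultiplier = 0.7 * 0.1 + 0.3;  // 0.37 of list price for input
const batchInput = cachedInputMultiplier * 0.5; // 0.185 -> ~18.5% of list price
const batchOutput = 1.0 * 0.5;                  // output tokens get only the batch discount
```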

Where's the pricing data from?

Pricing is sourced from each provider's official pricing page as of April 2026. Model prices do change — verify against the provider before committing to a budget. For OpenRouter models, use the "Custom pricing" option and paste the rates directly from openrouter.ai/models.

What's not modeled?

This estimator handles base token costs, prompt caching, and batch discounts. It does not currently model: extended thinking tokens (billed as output), tool use overhead (typically a few hundred extra tokens per call), image/audio inputs, fine-tuning, data-residency surcharges, or long-context premiums that kick in above 200K tokens on some models. For those, add a 10–30% buffer to your estimate.

Is my data sent anywhere?

No. Everything runs in your browser. No data, prompts, or estimates leave your device.