Tokens are the currency of AI. Understanding how they're priced — and how to spend them wisely — is the single biggest lever in building cost-effective AI systems.
Every interaction with an LLM is measured and billed in tokens — the smallest units of text your model reads and writes.
Prices change frequently. Use this as a relative guide — the cost ratios between tiers matter more than absolute figures.
| Tier | Example Models | Input $/1M | Output $/1M | Cost per 100k tokens | Best for |
|---|---|---|---|---|---|
| ⚡ Nano | GPT Nano, Gemini Flash Lite, open-source hosted | $0.05–$0.20 | $0.20–$1.00 | $0.03–$0.12 | Classification, routing, filtering |
| 🔵 Mini | Claude Haiku, GPT Mini, Gemini Flash | $0.20–$1.00 | $1.00–$5.00 | $0.12–$0.60 | Parsing, summaries, extraction |
| 🟣 Mid-tier | Claude Sonnet, GPT-5 mid-tier, Gemini Pro | $2–$5 | $10–$20 | $1.20–$2.50 | Coding, agents, workflows |
| 🌟 Frontier | Claude Opus, GPT-5.4+, Gemini Ultra | $10–$20 | $50–$100 | $6.00–$12.00 | Deep reasoning, architecture |
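The table's per-call economics are easy to reproduce with a small helper. A minimal sketch, using illustrative mid-range prices picked from the tiers above (not live rates — check your provider's pricing page):

```python
# Rough per-call cost estimator. Prices are illustrative mid-range values
# per million tokens drawn from the table above -- not live provider rates.
TIER_PRICES = {           # (input $/1M tokens, output $/1M tokens)
    "nano":     (0.10, 0.50),
    "mini":     (0.50, 2.50),
    "mid":      (3.00, 15.00),
    "frontier": (15.00, 75.00),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call, given input and output token counts."""
    in_price, out_price = TIER_PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Same workload (80k input / 20k output tokens) across every tier:
for tier in TIER_PRICES:
    print(f"{tier:>8}: ${estimate_cost(tier, 80_000, 20_000):.2f}")
```

Running the same token counts through each tier makes the cost ratios concrete: the frontier call costs two orders of magnitude more than the nano call for an identical workload.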
Don't use one model for everything. Route tasks to the cheapest capable model, escalate only when needed.
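A minimal sketch of that routing pattern, assuming a hypothetical client that returns an answer plus a self-reported confidence score (the tier names, client signature, and confidence heuristic are all stand-ins for your own stack):

```python
# Hypothetical tiered-routing sketch: try the cheapest capable tier first
# and escalate only when the cheaper model signals low confidence.
from typing import Callable

def route(task: str,
          tiers: list[str],
          call_model: Callable[[str, str], tuple[str, float]],
          min_confidence: float = 0.8) -> str:
    """Walk tiers cheapest-first; return the first confident answer."""
    answer = ""
    for tier in tiers:
        answer, confidence = call_model(tier, task)
        if confidence >= min_confidence:
            return answer        # cheap model was good enough -- stop here
    return answer                # fell through: last tier's best effort

# Toy stub standing in for a real LLM client: only "mid" is confident.
def fake_client(tier: str, task: str) -> tuple[str, float]:
    return f"{tier} answer", 0.9 if tier == "mid" else 0.5

print(route("summarize this doc", ["nano", "mini", "mid"], fake_client))
```

In a real system the confidence signal might come from a logprob threshold, a validation check on the output, or a cheap classifier — the point is that most tasks never reach the expensive tiers.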
Model tier selection is the biggest lever — but these techniques compound to drive costs down further.
These four patterns account for the majority of preventable AI cost overruns in production systems.
Once you're in production, cost management becomes operational. These guardrails prevent the anti-patterns from creating real damage.
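One such guardrail is a hard spend cap. A minimal sketch (class and method names are hypothetical) that tracks cumulative spend and refuses calls that would blow through the budget:

```python
# Minimal spend-guardrail sketch: track cumulative spend and block any
# call that would push the total past a hard budget cap.
class BudgetGuard:
    def __init__(self, daily_cap_usd: float):
        self.cap = daily_cap_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record a call's cost; return False if it would exceed the cap."""
        if self.spent + cost_usd > self.cap:
            return False          # caller should degrade, queue, or alert
        self.spent += cost_usd
        return True

guard = BudgetGuard(daily_cap_usd=50.0)
assert guard.charge(49.0)         # within budget -- allowed
assert not guard.charge(2.0)      # would exceed the $50 cap -- blocked
```

A production version would persist spend per tenant, reset on a schedule, and alert before the cap is hit rather than only refusing at it — but the core mechanism is this simple.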
Winning AI systems default to cheap models, escalate only when needed, and design every workflow for token efficiency.
From model selection to cost-aware agent architecture — get the full playbook for building AI systems that scale without breaking your budget.