Token Cost Calculator
How much does inference actually cost?
Configure your workload to see the estimated cost per million tokens across 5,213 GPU instances, then compare against API pricing to find the cheapest path.
Utilization: 60% (slider range 10% to 95%; Low / Typical / High)
Hours per month: 730h
Token throughput is estimated from GPU TFLOPS. Real-world throughput depends on serving framework, batching, context length, and model architecture.
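As a rough illustration of how an hourly GPU price can be turned into a cost per 1M tokens, here is a minimal sketch. The page does not publish its formula, so the relationship used here (hourly price divided by utilization-adjusted tokens per hour) is an assumption, and the function name, parameters, and the 1,000 tokens/s throughput are hypothetical values chosen only for the example.

```python
def cost_per_million_tokens(price_per_hour: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Estimate USD per 1M tokens (assumed formula, not the page's own).

    price_per_hour    -- instance rental price in USD per hour
    tokens_per_second -- peak token throughput of the deployment
    utilization       -- fraction of each hour spent serving tokens, in (0, 1]
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually produced in one billed hour.
    effective_tokens_per_hour = tokens_per_second * 3600 * utilization
    return price_per_hour / effective_tokens_per_hour * 1_000_000


# Hypothetical throughput of 1,000 tokens/s; the price and utilization
# are the values shown on the page.
example = cost_per_million_tokens(price_per_hour=0.0067,
                                  tokens_per_second=1000.0,
                                  utilization=0.60)
print(f"${example:.4f} per 1M tokens")
```

Under this assumption, lower utilization raises the cost per token because idle hours are still billed, which is why the utilization setting matters as much as the hourly price.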
Cheapest cost per 1M tokens
$0.0002
per 1M output tokens · 8B model · FP16 · 60% util
on RTX5070 · Vast.ai · $0.0067/hr
vs API Pricing (per 1M output tokens)
Best GPU (RTX5070): $0.0002
GPT-4o: $2.50
GPT-4o mini: $0.15
Claude 3.5 Sonnet: $3.00
Llama 3 70B (Groq): $0.59
Self-hosted saves up to 99.99% vs GPT-4o ($0.0002 vs $2.50 per 1M output tokens)
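The savings figure can be checked directly from the listed prices. The sketch below assumes savings are computed as the relative difference between the API price and the self-hosted price; the function name is hypothetical. For GPT-4o the exact result is about 99.99%, which rounds to 100% when shown without decimals.

```python
def savings_percent(self_hosted_cost: float, api_cost: float) -> float:
    """Relative saving of self-hosting versus an API price, in percent."""
    if api_cost <= 0:
        raise ValueError("api_cost must be positive")
    return (api_cost - self_hosted_cost) / api_cost * 100


# Prices per 1M output tokens as listed on the page.
self_hosted = 0.0002
api_prices = {
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
    "Claude 3.5 Sonnet": 3.00,
    "Llama 3 70B (Groq)": 0.59,
}

for name, price in api_prices.items():
    print(f"{name}: {savings_percent(self_hosted, price):.2f}% saved")
```

Note that this compares token prices only; it does not account for engineering time, model quality differences, or idle capacity beyond the utilization setting.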
Top 5 Cheapest by Token Cost
8B · FP16
Estimated Monthly Cost (730h · 60% util)
#1 RTX5070: $3 (Vast.ai · $0.0067/hr)
#2 RTX5090: $41 (Vast.ai · $0.0940/hr)
#3 RTX5090: $44 (Vast.ai · $0.100/hr)
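The monthly figures listed here are consistent with multiplying the hourly price by 730 hours and by the 60% utilization setting. The sketch below reproduces them under that assumption; the function name and defaults are hypothetical, and the page itself does not state the formula.

```python
def monthly_cost(price_per_hour: float,
                 hours: float = 730,
                 utilization: float = 0.60) -> float:
    """Estimated monthly spend in USD: hourly price x hours x utilization."""
    return price_per_hour * hours * utilization


# (GPU, hourly price in USD) for the three listings shown on the page.
listings = [("RTX5070", 0.0067), ("RTX5090", 0.0940), ("RTX5090", 0.100)]

for rank, (gpu, price) in enumerate(listings, start=1):
    print(f"#{rank} {gpu}: ${monthly_cost(price):.0f}")
```

Rounded to whole dollars this gives $3, $41, and $44, matching the listed estimates.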