Token Cost Calculator
How much does inference actually cost?
Configure your workload. See the estimated cost per million tokens across 5,326 GPU instances, then compare against API pricing to find the cheapest path.
Utilization: 60% (range 10% to 95%; Low, Typical, High)
Hours per month: 730h
Token throughput is estimated from GPU TFLOPS. Real-world throughput depends on serving framework, batching, context length, and model architecture.
Cheapest cost per 1M tokens
$0.0006
per 1M output tokens · 8B model · FP16 · 60% util
on RTXPRO6000 · GCP · $0.152/hr
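
The page does not publish the formula behind this figure, so the following is only a minimal sketch of how a TFLOPS-based estimate could be turned into a cost per 1M tokens. It assumes the common rule of thumb of roughly 2 FLOPs per parameter per generated token, and it assumes utilization scales the number of tokens produced per billed hour. The function names, the `efficiency` parameter, and the 100 TFLOPS input are hypothetical placeholders, not the calculator's internals or the specification of any GPU listed here.

```python
def estimate_tokens_per_second(tflops, params_billion, efficiency=1.0):
    """Rough throughput estimate from raw compute.

    Assumption: about 2 FLOPs per model parameter per generated token.
    `efficiency` (0..1) is a hypothetical knob for how much of the peak
    TFLOPS a serving stack actually achieves.
    """
    flops_per_token = 2.0 * params_billion * 1e9
    return tflops * 1e12 * efficiency / flops_per_token


def cost_per_million_tokens(hourly_usd, tokens_per_second, utilization):
    """Dollars per 1M tokens for a GPU billed by the hour.

    Assumption: the GPU is billed for the full hour but only produces
    tokens during the utilized fraction of it.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_usd / tokens_per_hour * 1e6


# Illustrative inputs only: a placeholder 100 TFLOPS GPU serving an 8B model
# at the $0.152/hr price and 60% utilization shown above.
tps = estimate_tokens_per_second(tflops=100, params_billion=8)
print(f"{tps:.0f} tokens/s")
print(f"${cost_per_million_tokens(0.152, tps, 0.60):.4f} per 1M tokens")
```

Because real throughput depends on serving framework, batching, context length, and model architecture, a measured tokens-per-second figure can be passed to `cost_per_million_tokens` directly in place of the estimate.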
vs API Pricing (per 1M output tokens)
Best GPU (RTXPRO6000)
$0.0006
GPT-4o
$2.50
GPT-4o mini
$0.15
Claude 3.5 Sonnet
$3.00
Llama 3 70B (Groq)
$0.59
Self-hosted saves nearly 100% (about 99.98%) vs GPT-4o
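
The savings figure follows from the prices listed above. As a small sketch, assuming savings are defined as one minus the ratio of self-hosted cost to API cost (the helper name `savings_pct` is hypothetical):

```python
def savings_pct(self_hosted_usd, api_usd):
    """Percentage saved by self-hosting relative to an API price (per 1M tokens)."""
    return (1.0 - self_hosted_usd / api_usd) * 100.0


# Prices per 1M output tokens as listed above.
best_gpu = 0.0006
api_prices = {
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
    "Claude 3.5 Sonnet": 3.00,
    "Llama 3 70B (Groq)": 0.59,
}

for name, price in api_prices.items():
    print(f"{name}: {savings_pct(best_gpu, price):.2f}% saved")
```

The comparison uses only the estimated GPU token cost, so it inherits every assumption behind that estimate.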
Top 5 Cheapest by Token Cost
8B · FP16
Estimated Monthly Cost (730h · 60% util)
#1 RTXPRO6000
$67
GCP · $0.152/hr
#2 RTX5090
$70
Vast.ai · $0.160/hr
#3 RTXPRO6000
$73
GCP · $0.167/hr
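
The monthly figures above are consistent with multiplying the hourly price by the hours per month and the utilization. The page does not state this formula, so treat the sketch below as an inference from the displayed numbers; the function name `monthly_cost` is hypothetical.

```python
def monthly_cost(hourly_usd, hours=730, utilization=0.60):
    """Estimated monthly spend: hourly price x hours x utilized fraction."""
    return hourly_usd * hours * utilization


# Hourly prices as listed in the ranking above.
listings = [
    ("RTXPRO6000", "GCP", 0.152),
    ("RTX5090", "Vast.ai", 0.160),
    ("RTXPRO6000", "GCP", 0.167),
]

for gpu, provider, hourly in listings:
    print(f"{gpu} on {provider}: ${monthly_cost(hourly):.0f}/month")
```

Note that scaling cost by utilization implies the instance is only billed while in use. An always-on instance would instead cost the full hourly price times 730 hours.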