Token Cost Calculator

How much does inference actually cost?

Configure your workload. See the real cost per million tokens across 5,213 GPU instances — then compare against API pricing to find the cheapest path.

Utilization: 60% (range 10% to 95%; Low, Typical, High)
Hours per month: 730h

Token throughput is estimated from GPU TFLOPS. Real-world throughput depends on serving framework, batching, context length, and model architecture.
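As a rough illustration of how a TFLOPS-based estimate could work, the sketch below uses the common rule of thumb that a dense decoder-only model needs about 2 FLOPs per parameter for each generated token. The calculator's actual formula is not stated on this page, so the function name, the `efficiency` factor, and the 2-FLOPs rule are all assumptions.

```python
def estimate_tokens_per_second(tflops: float, params: float, efficiency: float = 1.0) -> float:
    """Compute-bound throughput estimate (hypothetical helper, not the calculator's code).

    tflops     -- peak GPU throughput in TFLOPS at the chosen precision
    params     -- model parameter count, e.g. 8e9 for an 8B model
    efficiency -- assumed fraction of peak FLOPS actually achieved (0 < efficiency <= 1)
    """
    if tflops <= 0 or params <= 0 or not (0 < efficiency <= 1):
        raise ValueError("tflops and params must be positive; efficiency must be in (0, 1]")
    # Assumption: roughly 2 FLOPs per parameter per generated token.
    flops_per_token = 2.0 * params
    return tflops * 1e12 * efficiency / flops_per_token
```

In practice, decoding is often limited by memory bandwidth rather than compute, which is one reason real throughput can fall well below an estimate of this kind.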

Cheapest cost per 1M tokens

$0.0002

per 1M output tokens · 8B model · FP16 · 60% util

on RTX5070 · Vast.ai · $0.0067/hr
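The headline figure can be approximately reproduced from the hourly price and the throughput estimate listed for this instance (~11250 tok/s). The sketch below is a guess at the arithmetic, not the calculator's published formula; the function name and the optional `utilization` parameter are hypothetical.

```python
def cost_per_million_tokens(price_per_hour: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Dollars per 1M tokens for a GPU rented by the hour (illustrative only)."""
    # Tokens produced in one billed hour, optionally scaled by utilization.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return price_per_hour / tokens_per_hour * 1_000_000

# RTX5070 on Vast.ai, using the figures shown on this page.
full_rate = cost_per_million_tokens(0.0067, 11250)
print(f"${full_rate:.4f}")  # $0.0002

# Same instance if throughput is scaled by the 60% utilization setting.
with_util = cost_per_million_tokens(0.0067, 11250, utilization=0.60)
print(f"${with_util:.4f}")  # $0.0003
```

Dividing by full-rate throughput matches the displayed $0.0002, whereas scaling throughput by 60% gives about $0.0003, so it is unclear from the page how utilization enters this figure.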

vs API Pricing (per 1M output tokens)

Best GPU (RTX5070): $0.0002
GPT-4o: $2.50
GPT-4o mini: $0.15
Claude 3.5 Sonnet: $3.00
Llama 3 70B (Groq): $0.59

Self-hosted saves up to 99.99% vs GPT-4o
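The savings figure is a simple ratio of the two prices. A minimal sketch using the prices listed above follows; the function name is hypothetical. Against GPT-4o the result is about 99.99%, which rounds to 100% when shown as a whole percent.

```python
def savings_percent(self_hosted: float, api_price: float) -> float:
    """Percentage saved by self-hosting relative to an API price (same unit for both)."""
    if api_price <= 0:
        raise ValueError("api_price must be positive")
    return (1 - self_hosted / api_price) * 100

# Prices per 1M tokens as listed on this page.
api_prices = {
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
    "Claude 3.5 Sonnet": 3.00,
    "Llama 3 70B (Groq)": 0.59,
}

for name, price in api_prices.items():
    print(f"{name}: {savings_percent(0.0002, price):.2f}% saved")
```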

Top 5 Cheapest by Token Cost

8B · FP16
#1 RTX5070 (Blackwell) · Vast.ai · 12GB · ~11250 tok/s · $0.0067/hr · $0.0002 /1M tokens
#2 RTX5090 (Blackwell) · Vast.ai · 32GB · ~33750 tok/s · $0.0940/hr · $0.001 /1M tokens
#3 RTX5090 (Blackwell) · Vast.ai · 32GB · ~33750 tok/s · $0.100/hr · $0.001 /1M tokens
#4 RTX5070 (Blackwell) · Vast.ai · 12GB · ~11250 tok/s · $0.0400/hr · $0.001 /1M tokens
#5 RTX5090 (Blackwell) · Vast.ai · 32GB · ~33750 tok/s · $0.120/hr · $0.001 /1M tokens
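A ranking like the one above can be sketched by computing a cost per 1M tokens for each instance and sorting. The `Instance` type and the cost arithmetic (hourly price divided by full-rate hourly throughput) are assumptions for illustration; the prices and throughput estimates are the ones listed on this page.

```python
from typing import NamedTuple

class Instance(NamedTuple):
    gpu: str
    price_per_hour: float      # dollars per hour
    tokens_per_second: float   # estimated throughput

def cost_per_million_tokens(inst: Instance) -> float:
    # Assumed arithmetic: hourly price over tokens generated in one hour.
    return inst.price_per_hour / (inst.tokens_per_second * 3600) * 1_000_000

# Deliberately unsorted input.
instances = [
    Instance("RTX5090", 0.120, 33750),
    Instance("RTX5070", 0.0067, 11250),
    Instance("RTX5090", 0.0940, 33750),
    Instance("RTX5070", 0.0400, 11250),
    Instance("RTX5090", 0.100, 33750),
]

ranked = sorted(instances, key=cost_per_million_tokens)
for rank, inst in enumerate(ranked, start=1):
    cost = cost_per_million_tokens(inst)
    print(f"#{rank} {inst.gpu} ${inst.price_per_hour}/hr -> ${cost:.4f} /1M tokens")
```

Under this arithmetic the $0.0400/hr RTX5070 and the $0.120/hr RTX5090 cost the same per token (one third the price at one third the throughput), so their relative order is effectively arbitrary.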

Estimated Monthly Cost (730h · 60% util)

#1 RTX5070 · Vast.ai · $0.0067/hr · $3
#2 RTX5090 · Vast.ai · $0.0940/hr · $41
#3 RTX5090 · Vast.ai · $0.100/hr · $44
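The monthly estimates are consistent with multiplying the hourly price by 730 hours and the 60% utilization setting. A minimal sketch of that arithmetic follows; the function name and defaults are hypothetical, and the formula is inferred from the displayed numbers rather than documented by the calculator.

```python
def monthly_cost(price_per_hour: float, hours: float = 730, utilization: float = 0.60) -> float:
    """Estimated monthly spend, assuming billing only for utilized hours."""
    return price_per_hour * hours * utilization

for price in (0.0067, 0.0940, 0.100):
    print(f"${price}/hr -> ${monthly_cost(price):.0f}/month")
```

For the three instances listed this gives about $3, $41, and $44 per month.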