Token Cost Calculator

How much does inference actually cost?

Configure your workload. See the real cost per million tokens across 5,326 GPU instances — then compare against API pricing to find the cheapest path.

Utilization: 60% (range 10% to 95%; presets: Low, Typical, High)
Hours per month: 730h

Token throughput is estimated from GPU TFLOPS. Real-world throughput depends on serving framework, batching, context length, and model architecture.
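As a rough illustration of how throughput might be derived from TFLOPS, the sketch below uses the common rule of thumb of about 2 FLOPs per model parameter per generated token. This is an assumption, not the calculator's documented method; the function name, the `efficiency` parameter, and the 1000 TFLOPS figure are hypothetical.

```python
def estimate_tokens_per_second(tflops: float, params_billions: float,
                               efficiency: float = 1.0) -> float:
    """Compute-bound estimate of tokens/s.

    Assumes ~2 FLOPs per parameter per generated token (a rule of thumb),
    scaled by a hypothetical efficiency factor between 0 and 1.
    """
    flops_per_token = 2.0 * params_billions * 1e9
    return tflops * 1e12 * efficiency / flops_per_token


# Hypothetical GPU with 1000 TFLOPS of FP16 compute serving an 8B model.
print(round(estimate_tokens_per_second(1000.0, 8.0)))       # 62500
print(round(estimate_tokens_per_second(1000.0, 8.0, 0.5)))  # 31250
```

An estimate like this ignores memory bandwidth, batching, and context length, which is why real-world throughput can differ substantially.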

Cheapest cost per 1M tokens

$0.0006

per 1M output tokens · 8B model · FP16 · 60% util

on RTXPRO6000 · GCP · $0.152/hr
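The headline figure can be approximated by dividing the hourly price by the number of tokens generated per hour. The sketch below is an assumption about the arithmetic, not the calculator's published formula; the function name and the `utilization` parameter are illustrative. It uses the ~67088 tok/s estimate listed for the RTXPRO6000 at $0.152/hr.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Dollar cost per 1M tokens for a GPU billed by the hour.

    `utilization` scales the tokens actually produced per billed hour;
    whether the calculator applies it at this step is an assumption.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return price_per_hour / tokens_per_hour * 1_000_000


# Using the listed throughput as-is gives roughly the $0.0006 headline.
print(round(cost_per_million_tokens(0.152, 67088), 4))       # 0.0006
# Scaling the same throughput by 60% utilization gives roughly $0.001.
print(round(cost_per_million_tokens(0.152, 67088, 0.6), 4))  # 0.001
```

Under these assumptions the result is sensitive to how utilization is applied, so it is worth checking which convention a given figure uses before comparing it to API prices.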

vs API Pricing (per 1M output tokens)

Best GPU (RTXPRO6000): $0.0006
GPT-4o: $2.50
GPT-4o mini: $0.15
Claude 3.5 Sonnet: $3.00
Llama 3 70B (Groq): $0.59

Self-hosted saves nearly 100% (about 99.98%) vs GPT-4o
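The savings figure follows from comparing the two per-token prices. A minimal sketch, assuming savings are computed as the relative price difference (the function name is illustrative) and using the prices listed in the comparison:

```python
def savings_percent(self_hosted_price: float, api_price: float) -> float:
    """Percentage saved by self-hosting relative to an API price."""
    return (1 - self_hosted_price / api_price) * 100


# Prices per 1M tokens as listed in the comparison.
api_prices = {
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
    "Claude 3.5 Sonnet": 3.00,
    "Llama 3 70B (Groq)": 0.59,
}
for name, price in api_prices.items():
    print(f"{name}: {savings_percent(0.0006, price):.2f}%")
```

This compares raw token prices only; it does not account for engineering time, idle capacity, or differences in model quality between an 8B self-hosted model and the listed APIs.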

Top 5 Cheapest by Token Cost

8B · FP16
1. RTXPRO6000 (Blackwell) · GCP · 96GB · ~67,088 tok/s · $0.152/hr · $0.001 /1M tokens
2. RTX5090 ×2 (Blackwell) · Vast.ai · 64GB · ~67,500 tok/s · $0.160/hr · $0.001 /1M tokens
3. RTXPRO6000 (Blackwell) · GCP · 96GB · ~67,088 tok/s · $0.167/hr · $0.001 /1M tokens
4. RTXPRO6000 (Blackwell) · GCP · 96GB · ~67,088 tok/s · $0.167/hr · $0.001 /1M tokens
5. RTXPRO6000 (Blackwell) · GCP · 96GB · ~67,088 tok/s · $0.182/hr · $0.001 /1M tokens

Estimated Monthly Cost (730h · 60% util)

#1 RTXPRO6000 · GCP · $0.152/hr · $67
#2 RTX5090 · Vast.ai · $0.160/hr · $70
#3 RTXPRO6000 · GCP · $0.167/hr · $73
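The monthly estimates are consistent with multiplying the hourly price by 730 hours and by the 60% utilization setting. A short sketch of that arithmetic, with an illustrative function name:

```python
def monthly_cost(price_per_hour: float, hours: float = 730,
                 utilization: float = 0.6) -> float:
    """Estimated monthly spend: hourly price x hours x utilization."""
    return price_per_hour * hours * utilization


for label, price in [("#1 RTXPRO6000", 0.152),
                     ("#2 RTX5090", 0.160),
                     ("#3 RTXPRO6000", 0.167)]:
    print(f"{label}: ${round(monthly_cost(price))}")
# #1 RTXPRO6000: $67
# #2 RTX5090: $70
# #3 RTXPRO6000: $73
```

Note that this treats utilization as the fraction of hours billed; an instance that is reserved for the full month would cost the full 730 hours regardless of load.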
