Token Cost Calculator

How much does inference actually cost?

Configure your workload to see the estimated cost per million tokens across 5,213 GPU instances, then compare against API pricing to find the cheapest path.

Model Size (Parameters): 8B
Quantization: FP16
GPU Utilization: 60% (slider range 10% to 95%: Low, Typical, High)
Monthly Hours: 730h

Token throughput is estimated from GPU TFLOPS. Real-world throughput depends on serving framework, batching, context length, and model architecture.
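The page does not state the exact formula it uses to turn TFLOPS into tokens per second. A minimal sketch of one common rule of thumb is shown here, assuming a dense transformer needs roughly 2 FLOPs per parameter per generated token. The function name, the 100 TFLOPS figure, and the way utilization is applied are illustrative assumptions, not the calculator's implementation.

```python
def estimate_tokens_per_second(
    peak_tflops: float,
    params_billions: float,
    utilization: float = 0.60,
) -> float:
    """Rough compute-bound throughput estimate (an assumption, not the site's formula)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if peak_tflops <= 0 or params_billions <= 0:
        raise ValueError("peak_tflops and params_billions must be positive")

    # Rule of thumb: about 2 FLOPs per parameter for each generated token.
    flops_per_token = 2.0 * params_billions * 1e9
    # Only a fraction of peak compute is assumed to be usable.
    usable_flops_per_second = peak_tflops * 1e12 * utilization
    return usable_flops_per_second / flops_per_token


# Hypothetical GPU with 100 TFLOPS peak, serving an 8B model at 60% utilization.
print(f"{estimate_tokens_per_second(100.0, 8.0):,.0f} tok/s")
```

Because this ignores batching, context length, and memory bandwidth, it is best read as an optimistic estimate rather than a measured figure.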

Cheapest cost per 1M tokens

$0.0002

per 1M output tokens · 8B model · FP16 · 60% util

on RTX5070 · Vast.ai · $0.0067/hr
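The headline figure can be reproduced from the hourly price and the throughput listed for this instance (~11250 tok/s). The sketch below assumes the listed throughput is already the effective rate at the chosen utilization, which is an inference from the displayed numbers; the function name is hypothetical.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Cost in dollars to generate one million tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# Values taken from the page: RTX5070 on Vast.ai at $0.0067/hr, ~11250 tok/s.
cost = cost_per_million_tokens(0.0067, 11_250)
print(f"${cost:.4f} per 1M tokens")  # $0.0002 per 1M tokens
```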

vs API Pricing (per 1M output tokens)

Best GPU (RTX5070) · $0.0002
GPT-4o · $2.50
GPT-4o mini · $0.15
Claude 3.5 Sonnet · $3.00
Llama 3 70B (Groq) · $0.59

Self-hosted saves up to 99.99% vs GPT-4o
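The savings figure follows from the listed prices. As a minimal sketch, assuming savings are computed as one minus the ratio of self-hosted cost to API price (the function name is hypothetical), the comparison against each API looks like this:

```python
def savings_percent(self_hosted_price: float, api_price: float) -> float:
    """Percentage saved by self-hosting relative to an API price (same units)."""
    if api_price <= 0:
        raise ValueError("api_price must be positive")
    return (1.0 - self_hosted_price / api_price) * 100.0


# Prices per 1M tokens as listed on the page.
SELF_HOSTED = 0.0002
api_prices = {
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
    "Claude 3.5 Sonnet": 3.00,
    "Llama 3 70B (Groq)": 0.59,
}

for name, price in api_prices.items():
    print(f"{name}: {savings_percent(SELF_HOSTED, price):.2f}% cheaper")
```

With these inputs the GPT-4o comparison comes out just under 100%, which is why a whole-number display rounds it up.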

Top 5 Cheapest by Token Cost

8B · FP16

#1 RTX5070 · Blackwell
Vast.ai · 12GB · ~11,250 tok/s · $0.0067/hr
$0.0002 /1M tokens

#2 RTX5090 · Blackwell
Vast.ai · 32GB · ~33,750 tok/s · $0.0940/hr
$0.001 /1M tokens

#3 RTX5090 · Blackwell
Vast.ai · 32GB · ~33,750 tok/s · $0.100/hr
$0.001 /1M tokens

#4 RTX5070 · Blackwell
Vast.ai · 12GB · ~11,250 tok/s · $0.0400/hr
$0.001 /1M tokens

#5 RTX5090 · Blackwell
Vast.ai · 32GB · ~33,750 tok/s · $0.120/hr
$0.001 /1M tokens
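The ranking above can be reproduced by computing the cost per 1M tokens for each listing and sorting on it. This is a minimal sketch using the five listings shown on the page; the `Instance` class and its field names are hypothetical, and the listed throughput is assumed to be the effective rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    gpu: str
    provider: str
    tokens_per_second: float
    price_per_hour: float

    @property
    def cost_per_million(self) -> float:
        # Dollars per hour divided by tokens per hour, scaled to 1M tokens.
        return self.price_per_hour / (self.tokens_per_second * 3600) * 1_000_000


# Listings from the page, deliberately given out of order.
instances = [
    Instance("RTX5090", "Vast.ai", 33_750, 0.120),
    Instance("RTX5070", "Vast.ai", 11_250, 0.0400),
    Instance("RTX5090", "Vast.ai", 33_750, 0.100),
    Instance("RTX5070", "Vast.ai", 11_250, 0.0067),
    Instance("RTX5090", "Vast.ai", 33_750, 0.0940),
]

# Cheapest cost per token first.
ranked = sorted(instances, key=lambda inst: inst.cost_per_million)

for rank, inst in enumerate(ranked, start=1):
    print(
        f"#{rank} {inst.gpu} · {inst.provider} · "
        f"${inst.price_per_hour:.4f}/hr · ${inst.cost_per_million:.4f} /1M tokens"
    )
```

The last two listings work out to essentially the same cost per token, so their relative order depends on tie-breaking rather than on price.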


Estimated Monthly Cost (730h · 60% util)

#1 RTX5070

$3

Vast.ai · $0.0067/hr

#2 RTX5090

$41

Vast.ai · $0.0940/hr

#3 RTX5090

$44

Vast.ai · $0.100/hr
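The monthly figures shown for the RTX5090 listings are consistent with hourly price × monthly hours × utilization. A minimal sketch under that assumption follows; the formula is inferred from the displayed numbers and the function name is hypothetical.

```python
def monthly_cost(price_per_hour: float, hours: float = 730, utilization: float = 0.60) -> float:
    """Estimated monthly spend, assuming cost scales with utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return price_per_hour * hours * utilization


# Hourly prices from the three listings above, at 730h and 60% utilization.
for price in (0.0067, 0.0940, 0.100):
    print(f"${price:.4f}/hr -> ${monthly_cost(price):.0f}/month")
```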