We track 78 distinct GPU models across 54 cloud providers in real time, with prices updating every 6 hours. This reference covers the minimum, median, and maximum price for every model with at least 3 live instances, giving you ground truth for what GPU compute actually costs.
Note: maximums are often multi-GPU configurations or long-term reserved blocks, not single-GPU on-demand rates. For single-GPU on-demand, the median is a better reference.
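The aggregation behind these tables is straightforward. Below is a minimal sketch, assuming instance prices arrive as (gpu_model, hourly_price) records; the field names and sample values are illustrative, not the actual feed schema.

```python
from statistics import median
from collections import defaultdict

# Hypothetical input: one (gpu_model, hourly_price) record per live instance.
instances = [
    ("RTX 3090", 0.44), ("RTX 3090", 0.053), ("RTX 3090", 1.32),
    ("T4", 0.94), ("T4", 0.068),
]

prices = defaultdict(list)
for model, price in instances:
    prices[model].append(price)

# Report min/median/max only for models with at least 3 live instances,
# mirroring the inclusion rule used in the tables below.
for model, ps in sorted(prices.items()):
    if len(ps) < 3:
        continue
    print(f"{model}: min=${min(ps):.3f} median=${median(ps):.2f} max=${max(ps):.2f}")
```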
Budget Tier: Under $1/hr Median
| GPU | Memory | Min ($/hr) | Median ($/hr) | Max ($/hr) | Instances |
|---|---|---|---|---|---|
| RTX 3090 | 24 GB | $0.053 | $0.44 | $1.32 | 58 |
| A16 | 4×16 GB | $0.059 | $0.47 | $7.53 | 57 |
| P100 | 16 GB HBM2 | $0.153 | $0.73 | $2.55 | 106 |
| A40 | 48 GB | $0.075 | $0.80 | $3.60 | 59 |
| RTX 4090 | 24 GB | $0.107 | $0.80 | $3.12 | 105 |
| T4 | 16 GB | $0.068 | $0.94 | $13.30 | 1511 |
Mid Tier: $1–$5/hr Median
| GPU | Memory | Min ($/hr) | Median ($/hr) | Max ($/hr) | Instances |
|---|---|---|---|---|---|
| RTX A6000 | 48 GB | $0.490 | $1.17 | $4.68 | 34 |
| RTX 6000 Ada | 48 GB | $0.289 | $1.17 | $6.61 | 30 |
| L4 | 24 GB | $0.159 | $1.26 | $22.69 | 370 |
| RTX 5090 | 32 GB GDDR7 | $0.134 | $1.59 | $7.12 | 100 |
| A10 | 24 GB | $0.084 | $1.60 | $13.04 | 274 |
| A10G | 24 GB | $0.307 | $1.98 | $27.69 | 134 |
| L40S | 48 GB | $0.260 | $2.96 | $445.00 | 215 |
| A100 80GB | 80 GB HBM2e | $1.080 | $3.28 | $32.78 | 66 |
HPC Tier: $5–$15/hr Median
| GPU | Memory | Min ($/hr) | Median ($/hr) | Max ($/hr) | Instances |
|---|---|---|---|---|---|
| RTX PRO 6000 | 96 GB | $0.719 | $5.75 | $39.60 | 155 |
| H100 | 80 GB HBM3 | $0.533 | $7.34 | $97.44 | 256 |
| A100 | 40/80 GB HBM2e | $0.123 | $7.78 | $65.54 | 363 |
| H200 | 141 GB HBM3e | $1.428 | $9.16 | $50.44 | 104 |
| V100 | 16/32 GB HBM2 | $0.048 | $9.17 | $33.55 | 527 |
| B200 | 180 GB HBM3e | $1.669 | $14.97 | $90.22 | 92 |
Key Pricing Observations
The T4 Is Everywhere But Poorly Priced
The T4 is the most common GPU in the cloud, with 1,511 instances, or 29.5% of all tracked compute. Its median price is $0.94/hr, but it has only 16GB of VRAM, slow memory bandwidth, and no FP8 support. The L4 at a median of $1.26/hr gives 50% more VRAM (24GB vs. 16GB) and roughly 2× the throughput, making the T4 a poor choice for any modern LLM workload; the quick math below bears this out.
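A back-of-the-envelope $/GB-hour comparison at the medians quoted above, sketched in Python (the figures come straight from the tables; no benchmark data is assumed):

```python
# Rough dollars-per-GB-of-VRAM-per-hour at the median prices above.
t4_median, t4_vram = 0.94, 16   # T4: $0.94/hr, 16 GB
l4_median, l4_vram = 1.26, 24   # L4: $1.26/hr, 24 GB

print(f"T4: ${t4_median / t4_vram:.4f} per GB-hour")  # ~$0.059
print(f"L4: ${l4_median / l4_vram:.4f} per GB-hour")  # ~$0.053
```

Even before throughput enters the picture, the L4 is cheaper per gigabyte of VRAM; factoring in the ~2× throughput roughly doubles the gap.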
The V100 Is the Biggest Value Trap
Despite being a 2017 GPU, the V100 remains the second most common cloud GPU, with 527 instances and a median price of $9.17/hr. That's higher than the H100's median of $7.34/hr. The V100 has no FP8, no NVLink 3.0, and 2–3× lower throughput on modern workloads. Avoid it unless your software specifically requires the Volta architecture.
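Normalizing price by relative throughput makes the gap concrete. A minimal sketch, taking the 2–3× throughput claim above at face value and using 2.5× as an illustrative midpoint (not a benchmark result):

```python
# Effective cost per unit of work: median price divided by relative
# throughput. The 2.5x factor is an illustrative midpoint of the 2-3x
# range claimed above, not a measured benchmark.
v100_median, h100_median = 9.17, 7.34
h100_relative_throughput = 2.5  # vs. V100 = 1.0

print(f"V100: ${v100_median / 1.0:.2f} per throughput-unit-hour")
print(f"H100: ${h100_median / h100_relative_throughput:.2f} per throughput-unit-hour")
# The H100 comes out roughly 3x cheaper per unit of work at these medians.
```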
H200 Minimums Now Undercut Many H100 On-Demand Rates
The H200 (141GB HBM3e) has a median price of $9.16/hr versus $7.34/hr for the H100, so the H100 is still cheaper at median. But the H200 minimum ($1.43/hr) is below many H100 on-demand rates and offers 1.75× the VRAM. For memory-bound 70B-parameter inference, the H200 is increasingly the better choice.
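Why memory-bound 70B inference favors the H200 comes down to simple arithmetic on weight footprints. A sketch, ignoring KV cache and activations (which add more on top):

```python
# Back-of-the-envelope weight footprint for a 70B-parameter model.
# KV cache and activation memory come on top of these figures.
params = 70e9
bytes_per_param = {"fp16/bf16": 2, "fp8/int8": 1}

for fmt, nbytes in bytes_per_param.items():
    print(f"{fmt}: ~{params * nbytes / 1e9:.0f} GB of weights")

# fp16 -> ~140 GB: far beyond one 80 GB H100, and it nearly fills a
# 141 GB H200 before any KV cache. fp8 -> ~70 GB leaves the H200 with
# ~70 GB of headroom for long-context KV cache.
```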
RTX 4090 Remains the Best $/Token GPU for Small Models
The RTX 4090 at $0.80/hr median (min: $0.11/hr on spot) continues to be the most cost-efficient GPU for models up to 13B parameters. It outperforms the A100 on cost-per-token for Llama 3 8B, despite having no HBM memory or NVLink.
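For cost-per-token comparisons, the formula is hourly price divided by token throughput. A sketch with deliberately hypothetical decode rates, to be replaced with your own Llama 3 8B benchmarks; only the hourly medians come from the tables above:

```python
# Cost per million output tokens from an hourly price and a sustained
# decode rate. The throughputs below are placeholders, not measurements.
def cost_per_mtok(price_per_hr: float, tokens_per_sec: float) -> float:
    return price_per_hr / (tokens_per_sec * 3600) * 1e6

print(f"RTX 4090:  ${cost_per_mtok(0.80, 100):.2f}/Mtok")  # $0.80/hr median
print(f"A100 80GB: ${cost_per_mtok(3.28, 140):.2f}/Mtok")  # $3.28/hr median
# Even granting the A100 a higher decode rate, the 4090's lower hourly
# price dominates for models that fit in 24 GB.
```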