GPU cloud prices have collapsed 40-60% in the last 12 months. The H100 went from $3.50/hr to $1.29/hr. The A100 went from $1.80/hr to $0.34/hr on spot. The RTX 4090 went from $0.74/hr to $0.19/hr on spot. And based on the supply-side data we track, prices are going to fall another 30-40% by the end of 2026. Here is why, and here is what it means for your GPU budget timing.
## What Caused the 2025 Price Crash
Three forces converged:
- Massive H100 supply came online. NVIDIA shipped over 3.5 million H100 equivalents in 2024-2025. Every hyperscaler, every GPU cloud, every startup with a data center budget ordered them. The result: supply caught up with demand for the first time since GPT-4 launched.
- B200 cannibalized H100 demand. Enterprise buyers waiting for Blackwell delayed H100 orders, creating a demand gap. Existing H100 inventory had to be sold at lower prices to maintain utilization.
- Marketplace competition intensified. The number of GPU cloud providers we track grew from 12 to 18+ in 12 months. More competition = lower margins = lower prices. Vast.ai and TensorDock drove consumer GPU prices to commodity levels.
## The 12-Month Price Trajectory
| GPU | Feb 2025 ($/hr) | Feb 2026 ($/hr) | Change | Predicted Dec 2026 ($/hr) |
|---|---|---|---|---|
| H100 SXM (on-demand) | $3.50 | $1.29 | -63% | $0.80-1.00 |
| H100 SXM (spot) | $1.87 | $0.73 | -61% | $0.40-0.60 |
| A100 80GB (on-demand) | $1.80 | $1.10 | -39% | $0.70-0.90 |
| A100 80GB (spot) | $0.80 | $0.34 | -58% | $0.15-0.25 |
| RTX 4090 (on-demand) | $0.74 | $0.39 | -47% | $0.25-0.35 |
| RTX 4090 (spot) | $0.44 | $0.19 | -57% | $0.10-0.15 |
| H200 (on-demand) | $4.00+ | $1.49 | -63% | $0.90-1.20 |
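The "Change" column follows directly from the two price columns. A quick sketch to reproduce it (prices copied from the table; note the table rounds to whole percentages):

```python
# Reproduce the table's "Change" column from its Feb 2025 / Feb 2026 prices.
prices = {
    "H100 SXM (on-demand)":  (3.50, 1.29),
    "H100 SXM (spot)":       (1.87, 0.73),
    "A100 80GB (on-demand)": (1.80, 1.10),
    "A100 80GB (spot)":      (0.80, 0.34),
    "RTX 4090 (on-demand)":  (0.74, 0.39),
    "RTX 4090 (spot)":       (0.44, 0.19),
    "H200 (on-demand)":      (4.00, 1.49),
}

def pct_change(old, new):
    """Percent change from old to new, to one decimal place."""
    return round((new - old) / old * 100, 1)

for gpu, (old, new) in prices.items():
    print(f"{gpu}: {pct_change(old, new)}%")
```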
## Why Prices Will Keep Falling
The forces pushing prices down have not stopped:
- B200/B300 deployment accelerates H100 oversupply. As enterprises upgrade to Blackwell, their existing H100 clusters get redeployed to cloud providers or marketplace resellers. This creates a cascade of used H100 supply at discount prices.
- RTX 5090 enters the cloud market. At 1,800 FP8 TFLOPS for $2,000, the 5090 will put massive downward pressure on H100 pricing for inference workloads. Why rent an H100 at $1.29/hr when a 5090 at $0.70/hr does the same inference at 91% of the speed?
- Inference efficiency keeps improving. Speculative decoding, KV cache quantization, and framework optimizations mean you get more tokens per GPU-hour every quarter. Same hardware, more output = lower effective price.
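Both cost claims above reduce to the same arithmetic: divide the hourly price by throughput. A sketch using the article's figures; the $0.70/hr 5090 rate is the article's projection, not a quoted price, and the tokens-per-second numbers are illustrative assumptions:

```python
# 1) Cost per H100-equivalent hour: hourly price divided by relative speed.
h100_price, h100_speed = 1.29, 1.00        # $/hr, relative inference throughput
rtx5090_price, rtx5090_speed = 0.70, 0.91  # projected rate, 91% of H100 speed

h100_cost = h100_price / h100_speed
rtx5090_cost = rtx5090_price / rtx5090_speed
savings = 1 - rtx5090_cost / h100_cost
print(f"5090: ${rtx5090_cost:.2f} per H100-equivalent hour "
      f"({savings:.0%} cheaper than renting an H100)")

# 2) Effective $/token: same hardware, more tokens per hour = lower price.
def price_per_mtok(hourly_price, tokens_per_second):
    """Dollars per million tokens at a given hourly price and throughput."""
    return hourly_price / (tokens_per_second * 3600) * 1e6

# Illustrative: 2,500 tok/s today vs 3,000 tok/s after a quarter of
# speculative-decoding and KV-cache gains, at the same $1.29/hr.
print(f"${price_per_mtok(1.29, 2500):.3f}/Mtok -> "
      f"${price_per_mtok(1.29, 3000):.3f}/Mtok")
```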
## The Timing Decision: When to Lock In
**If you need GPUs today:** run on spot instances, or on-demand with auto-shutdown, and do not sign long-term commitments. Prices will be 30-40% lower in six months.
**Wait 3-6 months:** If you can wait, RTX 5090 cloud instances will create a new pricing floor for inference workloads, and H100 spot prices should drop below $0.60/hr.
**Do not:** Sign 1-year reserved-instance contracts at current prices. You will be overpaying by Q4 2026. The only exception is a provider that offers price-match guarantees.
**Track the price drops:** Our GPU price tracker updates every 6 hours. Bookmark it and check back; you will see the prices falling in real time.