Every week someone asks: "what GPU should I rent?" And every week they get the same useless answer: "it depends on your workload." That is technically true and practically worthless. So here is what nobody will commit to: a definitive tier list of every cloud GPU, ranked by price-to-performance, based on real pricing data from 5,000+ instances we track across 18 providers. I am going to assign a letter grade to every GPU. You are going to disagree with at least three of them. That is the point.
The methodology: for each GPU, I took the median on-demand price across all providers (not the cheapest — that is usually one outlier), the FP16 TFLOPS, the VRAM, and computed a composite score: (TFLOPS * VRAM) / (price/hr * 1000). Then I adjusted for real-world factors: driver maturity, actual inference throughput (not theoretical), multi-GPU scaling efficiency, and availability. The result is a ranking that reflects what you actually experience, not what a spec sheet promises.
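If you want to sanity-check a grade or re-rank with your own weightings, the raw score is one line of arithmetic. A minimal sketch follows; the spec and price inputs are illustrative only (the medians in our dataset are higher than the "from" prices quoted below and shift week to week), and the real-world adjustments are judgment-call multipliers applied afterwards:

```python
# Minimal sketch of the raw composite score described above. The real-world
# adjustments (driver maturity, measured throughput, scaling, availability)
# are applied afterwards as multipliers and are not shown here.

def composite_score(fp16_tflops: float, vram_gb: float, median_price_hr: float) -> float:
    """(TFLOPS * VRAM) / (price/hr * 1000) -- higher is better."""
    return (fp16_tflops * vram_gb) / (median_price_hr * 1000)

# Illustrative inputs only; substitute the current median prices you observe.
print(composite_score(fp16_tflops=989, vram_gb=80, median_price_hr=2.50))   # H100 SXM
print(composite_score(fp16_tflops=312, vram_gb=80, median_price_hr=1.60))   # A100 80GB
```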
S Tier: The No-Brainers
H100 SXM (80GB) — from $1.29/hr on-demand
The king. 1,979 FP8 TFLOPS, 80GB HBM3, 3.35 TB/s bandwidth. Nothing touches it for large model training and high-throughput inference. Run Llama 3 70B in FP8 on a single card and you still have roughly 10GB to spare for KV cache; in FP16 the weights alone are ~140GB, so no 80GB card holds them unquantized. The price has dropped 47% in 12 months and is still falling. At $1.29/hr on smaller providers, the performance-per-dollar is unmatched for any workload that needs 48GB+ VRAM.
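The "fits on one card" math is worth being able to redo yourself: weights cost parameter count times bytes per parameter, and the KV cache scales with layers, KV heads, head size, context length, and batch. A rough sketch using the published Llama 3 70B shapes (80 layers, 8 KV heads via GQA, head dim 128); runtime overhead and activations are ignored:

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1B params at 1 byte ~= 1 GB)."""
    return params_billion * bytes_per_param

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, batch: int, bytes_per_elem: int) -> float:
    """KV cache size: two tensors (K and V) per layer, per token, per sequence."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * batch / 1e9

# Llama 3 70B on an 80GB H100:
fp16 = weights_gb(70, 2)   # ~140 GB -> does not fit on 80 GB
fp8 = weights_gb(70, 1)    # ~70 GB  -> fits with ~10 GB to spare
kv = kv_cache_gb(80, 8, 128, context_len=8192, batch=1, bytes_per_elem=2)  # ~2.7 GB per 8k-token sequence
print(f"FP16 weights {fp16:.0f} GB | FP8 weights {fp8:.0f} GB | KV cache {kv:.1f} GB")
```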
RTX 4090 (24GB) — from $0.19/hr spot, $0.39/hr on-demand
The inference king for models that fit in 24GB. 330 FP16 tensor TFLOPS (roughly double that in FP8) in a consumer card. Runs Llama 3 8B quantized at 80+ tokens/sec, SDXL at 1.2 sec/image. At $0.39/hr on-demand, the cost per token is lower than an A100's. The catch: 24GB VRAM means you are limited to roughly 13B quantized or 8B in FP16. But for those workloads, nothing beats it.
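Cost per token is the number that actually matters here, and it is easy to reproduce once you have throughput for your own model: hourly price divided by tokens generated per hour. A minimal sketch; the 80 tok/s figure is from this article, while the A100 throughput is an assumption you should replace with your own benchmark:

```python
def usd_per_million_tokens(price_per_hr: float, tokens_per_sec: float) -> float:
    """Hourly price spread over the tokens you generate in that hour."""
    return price_per_hr / (tokens_per_sec * 3600) * 1_000_000

print(usd_per_million_tokens(0.39, 80))    # RTX 4090 on-demand: ~$1.35 per 1M tokens
print(usd_per_million_tokens(1.10, 130))   # A100 80GB on-demand, assumed throughput
```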
H200 (141GB) — from $1.49/hr
141GB HBM3e with 4.8 TB/s bandwidth. Nearly 2x the VRAM of the H100 at similar or lower prices on some providers. For 70B models the extra memory goes straight into KV cache and batch size: you can serve in FP8 with enormous headroom where the H100 forces tighter quantization or smaller batches, which often makes the H200 cheaper per token despite the higher hourly rate. The bandwidth advantage means 1.5-1.9x faster inference on memory-bound workloads. Still limited availability, but S tier when you can get it.
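Why bandwidth dominates here: during single-stream decoding, every generated token has to stream the full weights through the memory bus once, so the ceiling on tokens/sec is roughly bandwidth divided by model size. A back-of-the-envelope sketch; it ignores KV cache reads and batching, so treat it as an upper bound (the larger measured gaps come from the batch sizes the extra VRAM allows):

```python
def decode_ceiling_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model."""
    return bandwidth_gb_s / model_gb

model_gb = 70  # Llama 3 70B in FP8, roughly 70 GB of weights
print(decode_ceiling_tok_per_s(3350, model_gb))  # H100: ~48 tok/s ceiling
print(decode_ceiling_tok_per_s(4800, model_gb))  # H200: ~69 tok/s ceiling, ~1.4x from bandwidth alone
```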
A Tier: Excellent Value
L40S (48GB) — from $0.26/hr spot, $0.69/hr on-demand
The most underrated GPU in the cloud. 48GB GDDR6, 362 FP8 TFLOPS, Ada Lovelace architecture. Handles 13B models in FP16 with room for KV cache. At $0.69/hr on-demand, it is half the price of an A100 80GB with better FP8 performance. The downside: GDDR6 bandwidth is 864 GB/s vs. the A100's 2 TB/s, so memory-bound inference is slower. For compute-bound workloads like batch inference and fine-tuning, it punches way above its price class.
A100 80GB SXM — from $0.34/hr spot, $1.10/hr on-demand
The workhorse. Everyone knows it. 312 FP16 TFLOPS, 80GB HBM2e, 2 TB/s bandwidth. At $1.10/hr on-demand it is overpriced for what you get: the H100 at $1.29/hr is only 17% more expensive and roughly 3x faster once you use FP8, which the A100 does not support at all. But at $0.34/hr spot, the A100 is an absolute steal. The only reason it is A tier instead of S: on-demand pricing has not dropped fast enough to keep up with the H100 price collapse.
RTX 3090 (24GB) — from $0.07/hr spot
A consumer card from 2020 at $0.07/hr. At that price, who cares that it is old? Runs 7B quantized models at 40+ tok/s. Stable Diffusion 1.5 at sub-second generation times. For hobbyists and small experiments, nothing in the cloud is cheaper per TFLOP. The catch: limited availability, no FP8, and only 24GB VRAM. But for the price, A tier all day.
B Tier: Solid but Situational
A6000 (48GB) — from $0.47/hr
48GB VRAM at a low price. Great for fine-tuning 13B models and inference up to 30B quantized. But the GDDR6 bandwidth (768 GB/s) makes it memory-bound sooner than an A100. If your workload is compute-bound, the A6000 is great. If it is memory-bound, you will notice the bottleneck. B tier because the L40S does almost everything better for a similar price.
A10G (24GB) — from $0.20/hr
AWS's workhorse inference GPU. 24GB VRAM, 125 FP16 tensor TFLOPS. Cheap, widely available, and reliable. But it is Ampere-generation with no FP8 support. For 7B inference and image generation, it works fine. For anything bigger, you need something else. The reason it is B tier: the RTX 3090 at $0.07/hr on marketplace providers gives you more TFLOPS for roughly a third of the price.
MI300X (192GB) — from $3.45/hr
192GB HBM3 and roughly 2,600 dense FP8 TFLOPS (over 5,200 with sparsity). On paper, this destroys the H100. In practice, the software ecosystem (ROCm vs. CUDA) still makes it a gamble. If your framework supports it well and you need 192GB for massive models, B tier becomes A tier. For most people, the ecosystem friction pushes it down.
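If you do rent an MI300X, the first thing to check is whether the PyTorch build on the box is actually a ROCm build rather than a CUDA one. A quick sketch (on ROCm builds the CUDA API is backed by HIP, so `torch.cuda.is_available()` still returns True):

```python
import torch

# torch.version.hip is a version string on ROCm builds and None on CUDA builds;
# torch.version.cuda is the reverse.
print("accelerator available:", torch.cuda.is_available())
print("ROCm/HIP version:", torch.version.hip)
print("CUDA version:", torch.version.cuda)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```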
C Tier: Overpriced or Outdated
V100 (16/32GB) — from $0.10/hr
Old. No FP8, no TF32 in hardware, max 32GB VRAM. At $0.10/hr it is cheap, but an RTX 3090 at $0.07/hr is faster and has more VRAM than the common 16GB variant. The V100 is only C tier because it still exists in cloud catalogs and people still rent it out of habit.
T4 (16GB) — from $0.06/hr
Turing architecture, 65 FP16 TFLOPS, 16GB VRAM. The T4 was great for inference in 2020. In 2026, it cannot even run a 7B model in FP16 (the weights alone need 14GB, leaving no room for KV cache). Useful only for tiny models or image classification. At $0.06/hr it is cheap, but so are your results.
F Tier: Avoid
K80 (12GB) — from $0.03/hr
Kepler architecture. From 2014. 6 TFLOPS. 24GB of VRAM split across two GPU dies, so 12GB usable per GPU. Cannot run any modern LLM. Cannot run Stable Diffusion. Can barely run a CNN from 2018. The fact that AWS still lists this GPU is an indictment of cloud provider catalogs. It costs $0.03/hr because nobody should pay more.
P100 (16GB) — from $0.05/hr
Pascal architecture. 18 FP16 TFLOPS. The P100 is the GPU equivalent of a flip phone. It works for trivially small models and basic PyTorch training on tiny datasets. For AI workloads in 2026, it is effectively a paperweight you are paying for by the hour.
The Quick Reference Chart
| Tier | GPU | VRAM | From $/hr | Best For |
|---|---|---|---|---|
| S | H100 SXM | 80GB | $1.29 | Everything |
| S | RTX 4090 | 24GB | $0.19 | Inference under 13B |
| S | H200 | 141GB | $1.49 | 70B+ models |
| A | L40S | 48GB | $0.26 | Mid-range everything |
| A | A100 80GB | 80GB | $0.34 | Spot deals, training |
| A | RTX 3090 | 24GB | $0.07 | Budget inference |
| B | A6000 | 48GB | $0.47 | Fine-tuning 13B |
| B | A10G | 24GB | $0.20 | Light inference |
| B | MI300X | 192GB | $3.45 | Huge models (ROCm) |
| C | V100 | 16/32GB | $0.10 | Nothing (use 3090) |
| C | T4 | 16GB | $0.06 | Tiny models only |
| F | K80 | 12GB | $0.03 | Nostalgia |
| F | P100 | 16GB | $0.05 | Nostalgia |
Disagree? Check the latest real-time prices yourself on our GPU comparison page — prices change every 6 hours, and some of these tier rankings will shift as prices keep falling.
Live Pricing Pages for Top-Ranked GPUs