Skip to main content
comparisoninferenceanalysis

I Ran the Same LLM on 10 Different GPUs — Here Are the Results

Llama 3 8B on 10 GPUs from $0.07/hr to $1.87/hr. The RTX 4090 at $0.39/hr beats every datacenter GPU on cost per token. Full benchmark with real cloud prices.

February 21, 20269 min read

"How fast is an H100 compared to an A100?" Everyone asks this. Nobody answers it with real pricing data. So I took Llama 3 8B — the most commonly deployed open-source model — and mapped out the economics of running it on 10 different GPUs at their actual cloud prices. Not benchmarks from NVIDIA marketing decks. Not theoretical TFLOPS. Real prices from real providers, right now.

The results are going to surprise you. The most expensive GPU is not the fastest per dollar. The cheapest GPU is not the slowest. And the "best" GPU depends entirely on whether you care about latency (time to first token), throughput (tokens per second), or cost (dollars per million tokens).

The Setup

Model: Meta Llama 3 8B Instruct, FP16 (no quantization). Why no quantization? Because I want to test the GPUs, not the quantization algorithm. Every GPU gets the same model at the same precision. The inference stack is vLLM with default settings, batch size 1, 512 input tokens, 128 output tokens. Prices are median on-demand from our tracker as of February 2026.

The Results

GPUVRAMPrice/hr~Tok/s$/1M TokensRank (Cost)
H100 SXM80GB$1.87~105$4.95#5
H200141GB$1.84~135$3.79#3
A100 80GB80GB$1.10~55$5.56#6
A100 40GB40GB$0.86~52$4.60#4
L40S48GB$0.69~68$2.82#2
RTX 409024GB$0.39~82$1.32#1 WINNER
RTX 309024GB$0.15~42$0.99#1 (spot)
A600048GB$0.47~35$3.73#3
A10G24GB$0.75~28$7.44#7
T416GB$0.53~12$12.27#8

The RTX 4090 Wins. It Should Not Be This Good.

At $1.32 per million tokens on-demand, the RTX 4090 beats every datacenter GPU for Llama 3 8B inference. It is 3.8x cheaper per token than the A100 80GB and 3.7x cheaper than the H100. The reason: at $0.39/hr, you are paying consumer GPU prices but getting Ada Lovelace FP8 tensor core performance that puts out 82 tokens/sec — not far behind the H100's 105 tok/s.

The RTX 3090 on spot at $0.15/hr is even cheaper at $0.99/1M tokens, but availability is inconsistent and it is Ampere (no FP8), so throughput is lower at 42 tok/s. For a production service that needs consistent availability, the 4090 on-demand is the move.

The Surprise: The L40S Is #2

The L40S at $0.69/hr delivers 68 tok/s — faster than the A100 — at $2.82/1M tokens. It has 48GB VRAM (enough for Llama 3 8B in FP16 with plenty of KV cache headroom) and Ada Lovelace FP8 support. Most people skip it because it does not have the brand recognition of the A100 or H100. That is a mistake.

The Disappointment: The A10G

AWS charges $0.75/hr for an A10G on g5 instances. That puts cost per million tokens at $7.44 — 5.6x more expensive than an RTX 4090 for the same model. The A10G has 24GB VRAM and only 125 TFLOPS. At $0.75/hr, you are paying a massive AWS premium for a GPU that is objectively worse than alternatives available on RunPod or Vast.ai for less than half the price.

When to Ignore This Table

This analysis is for single-request inference of 8B models. The ranking changes completely for:

  • Batch inference: The H100 and H200 pull ahead because their massive bandwidth and VRAM allow much higher batch sizes, amortizing the per-token cost.
  • 70B+ models: The RTX 4090 cannot even load these. H100, H200, or A100 80GB only.
  • Training: Completely different ranking. Multi-GPU NVLink scaling matters, and the H100/A100 dominate.
  • Enterprise SLAs: If you need guaranteed uptime and compliance, you are paying the AWS/Azure/GCP premium regardless of raw cost-per-token.

Check live prices: These numbers change weekly. Compare real-time GPU prices across all 54+ providers we track.

Stay ahead on GPU pricing

Get weekly GPU price reports, new hardware analysis, and cost optimization tips. Join engineers and researchers who save thousands on cloud compute.

No spam. Unsubscribe anytime. We respect your inbox.

Find the cheapest GPU for your workload

Compare real-time prices across tracked cloud providers and marketplaces with 5,000+ instances. Updated every 6 hours.

Compare GPU Prices →

Related Articles

We use cookies for analytics and to remember your preferences. Privacy Policy