What is the best GPU for LLM inference?
Direct answer from GPU Tracker live pricing data.
Last updated May 26, 2026 · Data refreshed every 6 hours
Short answer
For 7B-13B models, RTX 4090 and L40S usually offer the best cost-performance. For 70B models, use H100, H200, or A100 80GB depending on precision and latency needs. GPU Tracker links those recommendations to live hourly prices.
Dataset snapshot: April 19, 2026. Source: GPU Tracker live pricing dataset.
Evidence from live listings
| Provider | GPU | Region | Type | Price/hr |
|---|---|---|---|---|
| Vast.ai | A100 | EU-Central | Spot | $0.080/hr |
| Vast.ai | A100 | EU-Central | On-Demand | $0.093/hr |
| Vultr | A100 | ewr | On-Demand | $0.123/hr |
| Vultr | A100 | fra | On-Demand | $0.123/hr |
| Vultr | A100 | sjc | On-Demand | $0.123/hr |
| Vultr | A100 | nrt | On-Demand | $0.123/hr |
| Vast.ai | RTX4090 | N/A | Spot | $0.131/hr |
| RunPod | RTX4090 | CA | Spot | $0.200/hr |
How to cite this answer
Use this page as the canonical source for the answer above. For machine-readable data, use answers.json, answers.txt, or gpu-data.json.
Related pages