For Stable Diffusion, the RTX 4090 is the best-value GPU for individual use. At $0.74/hr on RunPod it generates roughly 150 SDXL images per hour, which works out to about half a cent per image. The T4 is the cheapest per hour but also the slowest, making it poor value for batch generation.
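The per-image cost follows directly from the hourly rate and throughput. A quick sketch using the figures quoted above (both are snapshots that will drift with provider pricing and your generation settings):

```python
def cost_per_image(hourly_rate_usd: float, images_per_hour: float) -> float:
    """Cost in USD per generated image."""
    return hourly_rate_usd / images_per_hour

# RTX 4090 on RunPod, figures from above
print(round(cost_per_image(0.74, 150), 4))  # 0.0049 -> about half a cent
```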
VRAM Requirements by Model
| Model | Min VRAM | Optimal VRAM | Notes |
|---|---|---|---|
| SD 1.5 | 4 GB | 8 GB | Legacy, still widely used |
| SDXL | 8 GB | 16 GB | Higher res, 2-stage pipeline |
| SD 3.5 Medium | 8 GB | 16 GB | Better text rendering |
| SD 3.5 Large | 16 GB | 24 GB | Best quality |
| FLUX.1 Dev | 16 GB | 24 GB | State-of-art photorealism |
| FLUX.1 Schnell | 12 GB | 16 GB | Fast, 4-step generation |
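The table doubles as a quick compatibility check: given a card's VRAM, filter for the models whose minimum requirement fits. A minimal sketch (model names and thresholds mirror the minimum-VRAM column above; nothing else is assumed):

```python
# Minimum VRAM in GB, taken from the table above
MIN_VRAM_GB = {
    "SD 1.5": 4,
    "SDXL": 8,
    "SD 3.5 Medium": 8,
    "SD 3.5 Large": 16,
    "FLUX.1 Dev": 16,
    "FLUX.1 Schnell": 12,
}

def models_that_fit(vram_gb: float) -> list[str]:
    """Models whose minimum VRAM requirement fits the given card."""
    return [m for m, need in MIN_VRAM_GB.items() if need <= vram_gb]

print(models_that_fit(12))  # e.g. a 12 GB RTX 3060
```

Note these are floors, not comfortable targets: at the minimum you will likely need offloading or reduced batch sizes, so aim for the "Optimal VRAM" column when choosing an instance.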
Setting Up ComfyUI on a Cloud GPU
ComfyUI is the most flexible Stable Diffusion interface. Here's how to run it on a RunPod RTX 4090:
```bash
# On your RunPod instance (use the PyTorch template)
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download SDXL model
mkdir -p models/checkpoints
wget -O models/checkpoints/sdxl_base.safetensors \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"

# Download FLUX.1 Schnell (4-step, fast). This file is the diffusion model
# only, so it goes in models/unet/, not models/checkpoints/ -- you also need
# the text encoders (clip_l, t5xxl) in models/clip/ and the FLUX VAE
# (ae.safetensors) in models/vae/ before the workflow will run.
mkdir -p models/unet
wget -O models/unet/flux1-schnell.safetensors \
  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors"

# Start ComfyUI with the web UI accessible externally
python main.py --listen 0.0.0.0 --port 8188
# Access via: http://YOUR_INSTANCE_IP:8188
```

Setting Up Automatic1111 WebUI
```bash
# Install system dependencies
apt-get install -y libgl1 libglib2.0-0 wget git python3-pip

# Clone and install
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
pip install -r requirements.txt

# Download a model into models/Stable-diffusion/
# Then start with public access
python launch.py --listen --xformers --api
# --xformers enables memory-efficient attention
# --api exposes a REST API under /sdapi/v1/
# Access at http://YOUR_IP:7860
```

Batch Generation via API
```python
import base64

import requests

API_URL = "http://YOUR_IP:7860"  # your A1111 instance

def generate_image(prompt, steps=20, width=1024, height=1024):
    """Generate one image via the A1111 txt2img API; returns raw PNG bytes."""
    payload = {
        "prompt": prompt,
        "steps": steps,
        "width": width,
        "height": height,
        "sampler_name": "DPM++ 2M Karras",
    }
    r = requests.post(f"{API_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
    r.raise_for_status()
    img_b64 = r.json()["images"][0]  # base64-encoded PNG
    return base64.b64decode(img_b64)

# Batch-generate 102 images (3 prompts x 34 repeats)
prompts = ["cyberpunk city", "mountain lake", "abstract art"] * 34
for i, prompt in enumerate(prompts):
    img = generate_image(prompt)
    with open(f"output_{i:04d}.png", "wb") as f:
        f.write(img)
```
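Before kicking off a large batch, it is worth estimating runtime and cost so you know how long to keep the instance alive. A quick sketch using the RTX 4090 figures from the top of this section (throughput and price are assumptions that vary with resolution, step count, and provider):

```python
def batch_estimate(n_images: int, images_per_hour: float,
                   hourly_rate_usd: float) -> tuple[float, float]:
    """Return (hours, cost in USD) for a batch generation run."""
    hours = n_images / images_per_hour
    return hours, hours * hourly_rate_usd

# 102 images at ~150 img/hr on a $0.74/hr RTX 4090
hours, cost = batch_estimate(102, 150, 0.74)
print(f"{hours * 60:.0f} min, ${cost:.2f}")  # 41 min, $0.50
```

At this scale an on-demand instance is fine; for runs measured in days, the monthly-commitment discounts most providers offer start to matter.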