FLUX.1 by Black Forest Labs (a company founded by the original Stable Diffusion researchers) is among the strongest open-weight image generation models available. It renders text inside images far better than SDXL, produces photorealistic results, and the Schnell variant generates an image in just 4 steps. With FP8 quantization you can run it locally on 12 GB of VRAM.
## Requirements
| Component | Schnell | Dev |
|---|---|---|
| VRAM | 12 GB (FP8) / 16 GB (BF16) | 16 GB (FP8) / 24 GB (BF16) |
| RAM | 24 GB | 32 GB |
| GPU | RTX 3080 12GB, RTX 4070 | RTX 3090, RTX 4080/4090 |
| Storage | 24 GB (model files) | 24 GB |
| Python | 3.10+ | 3.10+ |
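To see which column of the table applies to your machine, you can query the GPU before downloading anything. A small sketch that degrades to an informative message if PyTorch or a CUDA GPU is absent:

```python
# Report GPU name and VRAM so you can match it against the table above.
# Degrades to a message if PyTorch or a CUDA device is missing.
import importlib.util

if importlib.util.find_spec("torch") is None:
    report = "PyTorch not installed yet"
else:
    import torch
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report = f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM"
    else:
        report = "No CUDA GPU visible"
print(report)
```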
## Method 1: Diffusers (Python)
```bash
pip install diffusers transformers accelerate sentencepiece protobuf

# For FP8 quantization (saves VRAM):
pip install optimum-quanto
```
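Before loading the model, a quick sanity check confirms the dependencies are importable (a small convenience snippet, not part of diffusers):

```python
# Verify that each dependency from the pip install above is importable.
import importlib.util

packages = ("torch", "diffusers", "transformers", "accelerate", "sentencepiece")
missing = [p for p in packages if importlib.util.find_spec(p) is None]
print("All set" if not missing else f"Missing: {', '.join(missing)}")
```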
```python
from diffusers import FluxPipeline
import torch

# Load FLUX.1 Schnell (4-step, fast, Apache 2.0)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # Optional: offload to CPU when not in use

image = pipe(
    prompt="A photorealistic cat astronaut on the moon, 8K, detailed",
    num_inference_steps=4,  # Schnell only needs 4 steps
    height=1024,
    width=1024,
    guidance_scale=0.0,  # Schnell uses 0 guidance
    generator=torch.Generator("cpu").manual_seed(42)
).images[0]
image.save("flux_output.png")
print("Saved flux_output.png")
```

### FP8 Quantization (Fits in 12 GB VRAM)
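A back-of-envelope look at why FP8 helps: FLUX's transformer has roughly 12 billion parameters (approximate figure), so weight memory alone works out to:

```python
# Rough weight-memory estimate for FLUX's ~12B-parameter transformer.
# Parameter count is approximate; text encoders, VAE, and activations add more.
PARAMS = 12e9

for name, bytes_per_param in (("BF16", 2), ("FP8", 1)):
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: {gib:.1f} GiB of weights")
# → BF16: 22.4 GiB of weights
# → FP8: 11.2 GiB of weights
```

Halving the transformer weights is where most of the ~40% total-VRAM saving in the code below comes from; the text encoders stay at higher precision.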
```python
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize
import torch

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)

# Quantize transformer to FP8 (reduces VRAM by ~40%)
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
pipe.to("cuda")

image = pipe(
    prompt="A neon-lit Tokyo street at night, cinematic",
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024,
).images[0]
image.save("flux_dev_output.png")
```

## Method 2: ComfyUI (No-Code GUI)
```bash
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI && pip install -r requirements.txt

# GGUF checkpoints load via the ComfyUI-GGUF custom node
git clone https://github.com/city96/ComfyUI-GGUF custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

# Download FLUX.1 Schnell GGUF (smaller, runs on 8 GB VRAM)
mkdir -p models/unet
wget -O models/unet/flux1-schnell-q8_0.gguf \
  "https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q8_0.gguf"

# Download text encoders
mkdir -p models/clip
wget -O models/clip/t5xxl_fp16.safetensors \
  "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors"
wget -O models/clip/clip_l.safetensors \
  "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors"

# Download VAE
mkdir -p models/vae
wget -O models/vae/ae.safetensors \
  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors"

# Start ComfyUI
python main.py --listen
# Load the FLUX workflow JSON from comfyanonymous/ComfyUI_examples
```

## Cloud Option: When 12 GB VRAM Is Not Enough
For FLUX.1 Dev at full BF16 precision you need 24 GB of VRAM. An RTX 4090 at $0.74/hr on RunPod handles this well. With Schnell's 4-step generations taking roughly 5 seconds each, a single 4090 can turn out several hundred 1024x1024 images per hour, which makes cloud batch generation practical.
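Concretely, the throughput math under those assumptions (5 s/image and $0.74/hr; sustained throughput will be lower once model loading and I/O overhead are included):

```python
# Idealized throughput and cost estimate for Schnell on a rented RTX 4090.
SECONDS_PER_IMAGE = 5    # ~4-step 1024x1024 generation (assumption from above)
COST_PER_HOUR = 0.74     # RunPod RTX 4090 rate quoted above

images_per_hour = 3600 // SECONDS_PER_IMAGE
cost_per_image = COST_PER_HOUR / images_per_hour
print(f"{images_per_hour} images/hr at ${cost_per_image:.4f} per image")
# → 720 images/hr at $0.0010 per image
```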