FLUX.1 by Black Forest Labs (a company founded by the original Stable Diffusion researchers) is among the strongest open-weight image generation models available. It renders text inside images far better than SDXL, produces photorealistic results, and the Schnell variant generates an image in just 4 steps. With FP8 quantization you can run it locally on 12 GB of VRAM.
## Requirements
| Component | Schnell | Dev |
|---|---|---|
| VRAM | 12 GB (FP8) / 16 GB (BF16) | 16 GB (FP8) / 24 GB (BF16) |
| RAM | 24 GB | 32 GB |
| GPU | RTX 3080 12GB, RTX 4070 | RTX 3090, RTX 4080/4090 |
| Storage | 24 GB (model files) | 24 GB |
| Python | 3.10+ | 3.10+ |
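To see which column of the table applies to your machine, you can query the GPU before downloading anything. A small sketch that degrades to an informative message if PyTorch or a CUDA GPU is absent:

```python
# Report GPU name and VRAM so you can match it against the table above.
# Degrades to a message if PyTorch or a CUDA device is missing.
import importlib.util

if importlib.util.find_spec("torch") is None:
    report = "PyTorch not installed yet"
else:
    import torch
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report = f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM"
    else:
        report = "No CUDA GPU visible"
print(report)
```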
## Method 1: Diffusers (Python)
```bash
pip install diffusers transformers accelerate sentencepiece protobuf

# For FP8 quantization (saves VRAM):
pip install optimum-quanto
```
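Before loading the model, a quick sanity check confirms the dependencies are importable (a small convenience snippet, not part of diffusers):

```python
# Verify that each dependency from the pip install above is importable.
import importlib.util

packages = ("torch", "diffusers", "transformers", "accelerate", "sentencepiece")
missing = [p for p in packages if importlib.util.find_spec(p) is None]
print("All set" if not missing else f"Missing: {', '.join(missing)}")
```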
```python
from diffusers import FluxPipeline
import torch

# Load FLUX.1 Schnell (4-step, fast, Apache 2.0)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # Optional: offload to CPU when not in use

image = pipe(
    prompt="A photorealistic cat astronaut on the moon, 8K, detailed",
    num_inference_steps=4,  # Schnell only needs 4 steps
    height=1024,
    width=1024,
    guidance_scale=0.0,  # Schnell uses 0 guidance
    generator=torch.Generator("cpu").manual_seed(42)
).images[0]
image.save("flux_output.png")
print("Saved flux_output.png")
```

### FP8 Quantization (Fits in 12 GB VRAM)
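A back-of-envelope look at why FP8 helps: FLUX's transformer has roughly 12 billion parameters (approximate figure), so weight memory alone works out to:

```python
# Rough weight-memory estimate for FLUX's ~12B-parameter transformer.
# Parameter count is approximate; text encoders, VAE, and activations add more.
PARAMS = 12e9

for name, bytes_per_param in (("BF16", 2), ("FP8", 1)):
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: {gib:.1f} GiB of weights")
# → BF16: 22.4 GiB of weights
# → FP8: 11.2 GiB of weights
```

Halving the transformer weights is where most of the ~40% total-VRAM saving in the code below comes from; the text encoders stay at higher precision.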
```python
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize
import torch

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)

# Quantize transformer to FP8 (reduces VRAM by ~40%)
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
pipe.to("cuda")

image = pipe(
    prompt="A neon-lit Tokyo street at night, cinematic",
    num_inference_steps=28,
    guidance_scale=3.5,
    height=1024,
    width=1024,
).images[0]
image.save("flux_dev_output.png")
```

## Method 2: ComfyUI (No-Code GUI)
```bash
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI && pip install -r requirements.txt

# GGUF checkpoints load via the ComfyUI-GGUF custom node
git clone https://github.com/city96/ComfyUI-GGUF custom_nodes/ComfyUI-GGUF
pip install --upgrade gguf

# Download FLUX.1 Schnell GGUF (smaller, runs on 8 GB VRAM)
mkdir -p models/unet
wget -O models/unet/flux1-schnell-q8_0.gguf \
  "https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q8_0.gguf"

# Download text encoders
mkdir -p models/clip
wget -O models/clip/t5xxl_fp16.safetensors \
  "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors"
wget -O models/clip/clip_l.safetensors \
  "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors"

# Download VAE
mkdir -p models/vae
wget -O models/vae/ae.safetensors \
  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors"

# Start ComfyUI
python main.py --listen
# Load the FLUX workflow JSON from comfyanonymous/ComfyUI_examples
```

## Cloud Option: When 12 GB VRAM Is Not Enough
For FLUX.1 Dev at full BF16 precision you need 24 GB of VRAM. An RTX 4090 at $0.74/hr on RunPod handles this well. With Schnell's 4-step generations taking roughly 5 seconds each, a single 4090 can turn out several hundred 1024x1024 images per hour, which makes cloud batch generation practical.
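Concretely, the throughput math under those assumptions (5 s/image and $0.74/hr; sustained throughput will be lower once model loading and I/O overhead are included):

```python
# Idealized throughput and cost estimate for Schnell on a rented RTX 4090.
SECONDS_PER_IMAGE = 5    # ~4-step 1024x1024 generation (assumption from above)
COST_PER_HOUR = 0.74     # RunPod RTX 4090 rate quoted above

images_per_hour = 3600 // SECONDS_PER_IMAGE
cost_per_image = COST_PER_HOUR / images_per_hour
print(f"{images_per_hour} images/hr at ${cost_per_image:.4f} per image")
# → 720 images/hr at $0.0010 per image
```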