Fine-tuning Llama gives you a model that follows your domain-specific format, style, or task. With QLoRA (Quantized Low-Rank Adaptation), you can fine-tune Llama 3.1 8B on a single A100 for under $5. This guide uses Axolotl, a fine-tuning toolkit that handles QLoRA configuration out of the box.
Step 1: Provision a Cloud GPU
For Llama 3.1 8B fine-tuning: an A100 40GB ($1.19/hr on RunPod) is the minimum. For 70B: use A100 80GB ($1.89/hr on Lambda). SSH into your instance and verify CUDA:
# Verify GPU and CUDA
nvidia-smi
nvcc --version
# Update and install basics
apt-get update && apt-get install -y git python3-pip
pip install -U pip
Step 2: Format Your Training Data
Axolotl supports multiple formats. The simplest is Alpaca-style JSON:
# data.jsonl — one example per line
{"instruction": "Summarize this support ticket", "input": "Customer reports login fails on mobile app after update 3.2.1", "output": "Login regression introduced in 3.2.1 on mobile. Priority: high. Assign to mobile team."}
{"instruction": "Summarize this support ticket", "input": "User asks how to export data to CSV", "output": "Feature request for CSV export. Priority: low. Add to backlog."}
# Minimum viable dataset: 200+ examples
# Recommended: 1,000–5,000 high-quality examples
# More is not always better: quality > quantity
Step 3: Install Axolotl
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
pip install packaging ninja
pip install -e '.[flash-attn,deepspeed]'
# Log in to Hugging Face (to download Llama weights)
huggingface-cli login
# Paste your HF token from huggingface.co/settings/tokens
Step 4: Create Training Config
# config.yaml
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_4bit: true        # quantize base weights to 4-bit (the "Q" in QLoRA)
strict: false
datasets:
  - path: data.jsonl
    type: alpaca
dataset_prepared_path: ./prepared_data
val_set_size: 0.05        # hold out 5% for validation
output_dir: ./output
sequence_len: 4096
sample_packing: true      # pack short examples into full sequences
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
bf16: true
tf32: true
gradient_checkpointing: true
micro_batch_size: 2
gradient_accumulation_steps: 4   # effective batch size = 2 * 4 = 8
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false    # mask the prompt; compute loss only on responses
logging_steps: 10
save_steps: 100
Step 5: Run Training
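Before launching, it helps to sanity-check how long the run will be. With the config above, the effective batch size is micro_batch_size × gradient_accumulation_steps = 2 × 4 = 8; a rough optimizer-step count (ignoring sample packing, which reduces it further) can be sketched as:

```python
def estimate_steps(num_examples, micro_batch_size=2, grad_accum=4, epochs=3, val_fraction=0.05):
    """Rough optimizer-step count for the config above (sample packing ignored)."""
    effective_batch = micro_batch_size * grad_accum          # 2 * 4 = 8
    train_examples = int(num_examples * (1 - val_fraction))  # 5% held out for validation
    steps_per_epoch = -(-train_examples // effective_batch)  # ceiling division
    return steps_per_epoch * epochs

print(estimate_steps(1000))  # 357 steps for a 1k-example dataset
```

At save_steps: 100, that 1k-example run produces three intermediate checkpoints plus the final one.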
# Start fine-tuning
accelerate launch -m axolotl.cli.train config.yaml
# Monitor GPU usage in another terminal
watch -n 1 nvidia-smi
# Training for 1k examples takes ~30-60 min on A100
# Checkpoint saved every 100 steps in ./output/
Step 6: Merge Adapter and Export
# Merge QLoRA adapter into base model
python -m axolotl.cli.merge_lora config.yaml
# Convert to GGUF for local inference with Ollama/llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
python convert_hf_to_gguf.py ../output/merged --outfile my-model.gguf
# Create Ollama model from GGUF
cat > Modelfile << 'EOF'
FROM ./my-model.gguf
SYSTEM "You are a helpful customer support agent."
EOF
ollama create my-fine-tuned-model -f Modelfile
ollama run my-fine-tuned-model
Cost Summary
At $1.19/hr for an A100 40GB on RunPod, a one-hour Llama 3.1 8B fine-tune costs about $1.19. A four-hour 70B run on an A100 80GB ($1.89/hr) is about $7.56. Always enable gradient checkpointing to reduce VRAM use, and consider spot instances: since training saves a checkpoint every 100 steps, a preempted run can resume where it left off.
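The arithmetic above, with a small buffer added for setup, merging, and export time (the 0.25 h overhead is an assumption, and the quoted hourly rates will drift), can be sketched as:

```python
def training_cost(hourly_rate, hours, overhead_hours=0.25):
    """Estimated cloud bill: training time plus a buffer for setup/merge/export."""
    return round(hourly_rate * (hours + overhead_hours), 2)

print(training_cost(1.19, 1.0))  # 8B run:  ~$1.49 including overhead
print(training_cost(1.89, 4.0))  # 70B run: ~$8.03 including overhead
```

Even with the buffer, both runs stay well under typical managed fine-tuning API prices for comparable token volumes.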