How to Run Qwen 3 Locally with Ollama

All Qwen 3 variants from 4B to 72B. Ollama commands, tool calling, and thinking mode examples.

April 10, 2026 · 7 min read
Qwen 3 Quick Reference

Model      | VRAM  | Best for
Qwen3 0.6B | 1 GB  | Edge / IoT
Qwen3 4B   | 3 GB  | Laptop
Qwen3 8B   | 6 GB  | Daily driver
Qwen3 14B  | 10 GB | RTX 3080
Qwen3 32B  | 22 GB | RTX 3090/4090
Qwen3 72B  | 48 GB | Multi-GPU / Cloud
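
The VRAM figures above follow a rough rule of thumb for 4-bit-quantized weights: roughly 0.6 GB per billion parameters, plus a couple of GB for KV cache and runtime overhead. A quick sketch of that estimate (my own approximation, not an official formula; the constants are tunable assumptions):

```python
def approx_vram_gb(params_b: float, gb_per_b_params: float = 0.6, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a Q4-quantized model: weights plus KV-cache/overhead."""
    return params_b * gb_per_b_params + overhead_gb

# Rough estimates, compare against the table above
for size in (4, 8, 14, 32):
    print(f"Qwen3 {size}B: ~{approx_vram_gb(size):.0f} GB")
```

The estimates land within a gigabyte or two of the table; longer context windows push KV-cache usage well past the fixed overhead assumed here.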

Qwen 3 is Alibaba's latest open-weight model family. In Alibaba's published benchmarks, Qwen3 8B matches or beats GPT-4o on HumanEval coding tasks, and it runs on a single consumer GPU. It supports 128K context, 29 languages, and tool calling out of the box.

Step 1: Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version

# Start Ollama server (runs in background automatically)
ollama serve &

Step 2: Pull Your Qwen 3 Variant

# For most developers: 8B is the sweet spot
ollama pull qwen3:8b

# Pull a specific quantization
ollama pull qwen3:14b-q4_K_M   # 9 GB VRAM
ollama pull qwen3:14b-q8_0     # 15 GB VRAM (better quality)

# List what you have downloaded
ollama list
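
Which tag to pull comes down to free VRAM. A hypothetical helper that encodes the choices above (the tag list and VRAM requirements mirror this article's table; extend it for other variants):

```python
# (tag, min_vram_gb) pairs, largest first. Values follow the pulls above.
QWEN3_TAGS = [
    ("qwen3:14b-q8_0", 15),
    ("qwen3:14b-q4_K_M", 9),
    ("qwen3:8b", 6),
    ("qwen3:4b", 3),
]

def pick_tag(free_vram_gb: float) -> str:
    """Return the largest listed tag that fits in the given free VRAM."""
    for tag, need in QWEN3_TAGS:
        if free_vram_gb >= need:
            return tag
    return "qwen3:0.6b"  # smallest variant; fits almost anywhere

print(pick_tag(10))  # → qwen3:14b-q4_K_M (fits Q4 14B, not Q8)
```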

Step 3: Run Qwen 3

# Interactive chat
ollama run qwen3:8b

# Single prompt
ollama run qwen3:8b "Write a Python function to parse CSV files"

# With extended context (requires more VRAM): set num_ctx inside the session
ollama run qwen3:14b
>>> /set parameter num_ctx 32768

# Thinking mode (built-in CoT, like DeepSeek R1)
ollama run qwen3:8b "/think Prove that 0.999... = 1"
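
In thinking mode, Qwen 3 wraps its chain of thought in <think>...</think> tags before the final answer. If you only want the answer, strip that block; a minimal sketch (the sample string is made up for illustration):

```python
import re

def strip_thinking(text: str) -> str:
    """Remove Qwen 3's <think>...</think> reasoning block, keeping the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>0.999... is a geometric series summing to 1.</think>\nTherefore 0.999... = 1."
print(strip_thinking(raw))  # → Therefore 0.999... = 1.
```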

Using Qwen 3 as an API

# Python with OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

response = client.chat.completions.create(
    model="qwen3:8b",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a binary search in Python"}
    ]
)
print(response.choices[0].message.content)

# Tool calling (Qwen3 supports function calling natively)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
    }
}]
response = client.chat.completions.create(
    model="qwen3:8b",
    messages=[{"role":"user","content":"What's the weather in Tokyo?"}],
    tools=tools
)
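
When the model decides to call a tool, the response carries tool_calls instead of plain content; your code runs the function and sends the JSON result back in a "tool" role message. A minimal dispatcher sketch, using plain dicts for clarity (the real SDK returns objects with .function.name / .function.arguments attributes, and get_weather's return value here is a made-up stub):

```python
import json

# Hypothetical local implementation of the get_weather tool declared above.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub; a real tool would hit a weather API

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Execute one tool call and return its JSON result,
    ready to append as a {"role": "tool", ...} message."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(fn(**args))

call = {"function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}}
print(dispatch(call))  # → {"city": "Tokyo", "forecast": "sunny"}
```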

Performance vs Cloud Cost

Setup                 | Tokens/sec | Cost
RTX 4090 local (14B)  | ~80 t/s    | $0 (owned)
RTX 4090 RunPod (14B) | ~80 t/s    | $0.74/hr
A100 Lambda (72B Q4)  | ~45 t/s    | $1.89/hr
CPU only (8B)         | ~8 t/s     | $0 (laptop)
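
Since local and rented 4090s deliver the same tokens/sec, the choice is purely financial: divide the card's purchase price by the hourly rental rate to get break-even hours. A quick sketch, assuming a hypothetical $1,800 RTX 4090 street price against the $0.74/hr RunPod rate above:

```python
def breakeven_hours(gpu_price_usd: float, cloud_rate_per_hr: float) -> float:
    """Hours of use at which buying the GPU becomes cheaper than renting it."""
    return gpu_price_usd / cloud_rate_per_hr

hours = breakeven_hours(1800, 0.74)
print(f"~{hours:.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```

At these assumed numbers the card pays for itself after roughly 2,400 hours, i.e. a few months of continuous inference; occasional use favors renting.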
