GPU cloud sounds intimidating, but the modern platforms have simplified it significantly. You don't need to understand networking, storage systems, or cloud infrastructure. RunPod is the most beginner-friendly option — it has pre-built templates, a web terminal, and costs start at $0.11/hr for a T4.
What GPU Do You Need?
| Use Case | GPU | Cost |
|---|---|---|
| Learning / experiments | RTX 3080 (10 GB) | $0.22/hr |
| Run 7B LLMs, small diffusion | RTX 4090 (24 GB) | $0.74/hr |
| Fine-tuning 7B–13B models | A100 40GB | $1.19/hr |
| Training, large LLMs | A100 80GB / H100 | $1.89–2.49/hr |
For a first instance, start with the RTX 4090 at $0.74/hr. It is fast, has 24 GB VRAM, and handles everything from running LLMs to generating images. You can always switch GPU types between sessions.
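As a rule of thumb, the model weights dominate VRAM use: parameter count times bytes per parameter, plus a few gigabytes of overhead for the CUDA context and activations. A rough estimator (the 2 GB overhead figure here is an assumption, not a measured value):

```python
# Rough VRAM estimate: model weights plus a fixed overhead allowance.
# bytes_per_param: 2 for fp16/bf16, 1 for 8-bit quantized, 4 for fp32.
def vram_needed_gb(params_billions: float,
                   bytes_per_param: int = 2,
                   overhead_gb: float = 2.0) -> float:
    weights_gb = params_billions * 1e9 * bytes_per_param / 1024**3
    return weights_gb + overhead_gb

# A 7B model in fp16 comes to roughly 15 GB, which is why it fits
# comfortably in the RTX 4090's 24 GB but not on a 10 GB card.
print(f"{vram_needed_gb(7):.1f} GB")
```

The same arithmetic shows why fine-tuning pushes you up the table: optimizer states and gradients multiply the per-parameter memory well beyond the inference figure.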
Step 1: Create a RunPod Account
Go to runpod.io, sign up, and add a payment method. RunPod charges per minute of use. Add $10 to start — that gives you ~13 hours of RTX 4090 time.
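Per-minute billing makes costs easy to estimate: minutes divided by 60, times the hourly rate. A quick sketch using the $0.74/hr RTX 4090 rate from the table above:

```python
# Estimate session cost under per-minute billing.
RATE_PER_HOUR = 0.74  # RTX 4090 on-demand rate from the table above

def session_cost(minutes: float, rate_per_hour: float = RATE_PER_HOUR) -> float:
    return round(minutes / 60 * rate_per_hour, 2)

print(session_cost(90))              # a 90-minute session costs $1.11
print(round(10 / RATE_PER_HOUR, 1))  # $10 buys about 13.5 hours of 4090 time
```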
Step 2: Launch a Pod
Click "Deploy" in the RunPod dashboard. Select these settings:
- GPU: RTX 4090 (Community Cloud for lowest price, Secure Cloud for reliability)
- Template: "RunPod PyTorch" — pre-installs Python, PyTorch, CUDA
- Container disk: 20 GB (enough for most models)
- Volume disk: 50 GB (persists between sessions — costs ~$0.07/GB/month)
Click "Deploy On-Demand". The pod will start in 30–120 seconds.
Step 3: Connect to Your Instance
Option A: Use the built-in web terminal (no setup needed — click "Connect" in the dashboard). Option B: SSH from your local machine:
```shell
# RunPod gives you an SSH command — it looks like:
ssh root@YOUR_POD_IP -p YOUR_PORT -i ~/.ssh/id_rsa

# First time? Add your SSH key in RunPod Settings → SSH Keys
# Generate a key if you don't have one:
ssh-keygen -t ed25519 -C "your@email.com"
cat ~/.ssh/id_ed25519.pub  # Copy this into RunPod settings

# Verify you're on the GPU instance
nvidia-smi
# Should show your RTX 4090
```

Step 4: Run Your First Model
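nvidia-smi can also be queried from Python, which is handy inside scripts. A small sketch using nvidia-smi's CSV query mode (the fallback strings are my own, not RunPod's):

```python
# Sanity check from Python: ask nvidia-smi for the GPU name and total VRAM.
# Degrades gracefully on machines without an NVIDIA driver.
import shutil
import subprocess

def gpu_info() -> str:
    if shutil.which("nvidia-smi") is None:
        return "no NVIDIA driver found"
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return out.stdout.strip() or "no GPU visible"

print(gpu_info())  # e.g. "NVIDIA GeForce RTX 4090, 24564 MiB" on the pod
```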
```shell
# Install Ollama on the instance
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run Llama 3.1 8B (takes ~2 min to download)
ollama run llama3.1:8b
```

Or run a Python script with transformers:

```shell
pip install transformers accelerate
python3 - << 'EOF'
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
result = pipe("What is the capital of France?", max_new_tokens=50)
print(result[0]["generated_text"])
EOF
```

Step 5: Start a Jupyter Notebook
```shell
# Install and start Jupyter on the pod
pip install jupyter

# Start with no browser (we'll access via port forwarding)
jupyter notebook --no-browser --port=8888 --ip=0.0.0.0 --allow-root
```

Then, in another terminal on your LOCAL machine, forward the port:

```shell
ssh -L 8888:localhost:8888 root@YOUR_POD_IP -p YOUR_PORT

# Open http://localhost:8888 in your browser and
# paste the token from the Jupyter terminal output
```

Important: Stop Your Pod When Done
GPU instances charge by the minute. Always stop your pod when you're not using it. In RunPod, click "Stop Pod" (not "Terminate" — that deletes everything). The volume disk keeps your files safe while the GPU is off. You'll only pay for storage (~$0.07/GB/month) while stopped.
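The stopped-pod economics are easy to check: you pay only the storage rate on the volume. A quick sketch with the numbers used in this guide:

```python
# Monthly cost of keeping a stopped pod's volume around.
STORAGE_RATE = 0.07  # $/GB/month, the RunPod volume rate quoted above

def monthly_storage_cost(volume_gb: float) -> float:
    return round(volume_gb * STORAGE_RATE, 2)

# The 50 GB volume from Step 2 costs about $3.50/month while stopped,
# versus $0.74 for every hour a 4090 sits idle but running.
print(monthly_storage_cost(50))
```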