
GPU Cloud Security: What Happens to Your Data on Shared GPUs?

Your model weights sit in GPU memory that was used by someone else minutes ago. Here's what you need to know about GPU cloud security.

January 30, 2025 · 9 min read

Here's something most ML engineers never think about: when you rent a GPU instance from a cloud provider, that GPU was being used by someone else five minutes ago. Their model weights, training data tensors, gradient values, and activations were all sitting in that GPU's VRAM. When their instance terminated, what happened to that data? On most providers, the answer is nothing. GPU VRAM is not automatically zeroed out between tenants on many platforms, especially marketplace providers. Your model is loading into memory that still contains fragments of the previous tenant's data, and the next tenant after you could theoretically read fragments of yours.

For most people doing research with public models and public datasets, this doesn't matter. But if you're fine-tuning on proprietary data, training on PII, or serving a commercial model whose weights are your competitive advantage, you should understand the security implications of shared GPU infrastructure — and make informed decisions about where to run your workloads.

The GPU Memory Problem, Explained

GPU VRAM behaves differently from CPU RAM in one important way: it's managed by the CUDA driver and the application, not the operating system. The OS zero-fills RAM pages before handing them to a new process; the CUDA driver makes no such guarantee. When a process allocates VRAM with cudaMalloc(), the driver returns a pointer to a block of VRAM without necessarily zeroing it first. The memory contains whatever was there before: stale data from previous allocations, which could belong to a different user's process on a shared machine or to the previous tenant on a re-provisioned instance.
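You can observe the same allocation semantics from PyTorch, whose `torch.empty()` skips the zero-fill that `torch.zeros()` performs. A minimal sketch, assuming PyTorch is installed (the CUDA path is skipped when no GPU is present, and within one process the caching allocator typically hands back your own stale blocks, not another tenant's):

```python
# Sketch: torch.empty() returns memory without initializing it, analogous
# to cudaMalloc() at the driver level. The returned tensor holds whatever
# bytes the allocator handed back, not zeros.
import torch

def show_uninitialized(n: int = 1024) -> torch.Tensor:
    """Allocate n float32 values without initialization and return them."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # empty() skips the zero-fill that zeros()/ones() perform.
    return torch.empty(n, device=device)

t = show_uninitialized()
print(t.shape, t.device)  # contents are arbitrary stale bytes
```

Compare with `torch.zeros(n)`, which pays the cost of an explicit memset precisely because uninitialized memory is unsafe to expose.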

In practice, exploiting this is not trivial. An attacker would need to allocate VRAM, read the uninitialized contents, and interpret the raw bytes as meaningful data. For model weights (which are random-looking floating point numbers), reconstructing a usable model from VRAM fragments is extremely difficult. For training data — especially if it includes text, images, or structured records — the fragments could be more interpretable. The risk is theoretical for most workloads, but it's real enough that enterprise providers take it seriously.
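The asymmetry between weights and text can be sketched with the standard library alone. Below, a simulated memory fragment mixes a text record with packed float32 weights (the buffer contents are invented for illustration): reinterpreting the weight bytes yields random-looking floats, while the text survives a simple strings-style scan intact.

```python
# Sketch: why VRAM fragments are more telling for text than for weights.
# The "dumped" buffer is simulated; real fragments would be messier.
import struct

dump = b"patient_name=Jane Doe;" + struct.pack("<4f", 0.0213, -0.987, 1.41, -0.003)

# Weight bytes decode to random-looking floats, hard to assemble into a model.
floats = struct.unpack_from("<4f", dump, len(dump) - 16)
print(floats)

# Embedded text is trivially recoverable by filtering for printable bytes.
printable = bytes(b for b in dump if 32 <= b < 127)
print(printable.decode("ascii", errors="ignore"))
```

This is why the risk assessment below weights training data (especially text, images, and structured records) more heavily than model weights.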

Provider Security Tiers

Not all GPU cloud providers offer the same level of security. They fall into roughly three tiers based on their isolation model, memory management, and compliance certifications.

Tier 1: Enterprise Hyperscalers (AWS, GCP, Azure)

The hyperscalers run dedicated GPU instances with full memory sanitization between tenants. When your EC2 p5 instance terminates, AWS scrubs the GPU VRAM before assigning that hardware to another customer. They also provide: hardware-level isolation (no GPU sharing between tenants on standard instances), encrypted data at rest and in transit by default, VPC networking with security groups and NACLs, IAM-based access control with audit logging, and a full suite of compliance certifications including SOC 2 Type II, HIPAA, FedRAMP, PCI DSS, ISO 27001, and more. AWS will sign a Business Associate Agreement (BAA) for HIPAA workloads, making them legally accountable for protecting health information processed on their infrastructure.

Tier 2: Managed GPU Providers (Lambda, CoreWeave, Latitude.sh)

These providers own and operate their own data centers and hardware. They offer dedicated instances (not shared GPUs), private networking, and typically have SOC 2 Type II certification. Memory sanitization practices vary — some guarantee VRAM clearing between tenants, others don't explicitly commit to it. They generally don't have the full compliance suite of hyperscalers: HIPAA support is limited or unavailable, FedRAMP is not offered, and BAAs may not be available. For most commercial workloads that don't involve regulated data, Tier 2 providers offer adequate security at 3-5x lower cost than hyperscalers. Your instances are dedicated hardware, not shared, and the provider controls the physical infrastructure.

Tier 3: Marketplace Providers (Vast.ai, RunPod)

Marketplace providers aggregate GPU supply from a diverse pool of hosts: data centers, mining farms, individual GPU owners, and small hosting companies. The key distinction is that you're running your code on hardware that belongs to someone else — not the provider, but the individual host. This means the host has physical access to the hardware your code runs on. On Vast.ai, the host could theoretically inspect network traffic, read data from disk after you terminate your instance, or modify the CUDA driver. On RunPod, the infrastructure is more controlled (RunPod operates many of its own data centers), but some supply comes from community hosts with less oversight. Memory sanitization guarantees are typically not provided. Compliance certifications are limited.

| Security Feature    | AWS/GCP/Azure      | Lambda/CoreWeave   | Vast.ai/RunPod          |
|---------------------|--------------------|--------------------|-------------------------|
| GPU Memory Clearing | Yes (guaranteed)   | Varies by provider | Not guaranteed          |
| Tenant Isolation    | Dedicated hardware | Dedicated hardware | Shared / varies         |
| SOC 2 Type II       | Yes                | Yes / In progress  | Limited / No            |
| HIPAA / BAA         | Yes                | No / Limited       | No                      |
| FedRAMP             | Yes                | No                 | No                      |
| Network Isolation   | VPC / Private      | Private networking | Public IP / SSH         |
| Physical Security   | Tier III/IV DCs    | Owned DCs          | Varies (host-dependent) |

Risk Levels: Matching Security to Your Workload

Not every workload needs enterprise-grade security. The right provider depends on what data touches the GPU. Here's how to assess your risk level and choose accordingly.

Low Risk: Use Any Provider

If your workload involves inference of publicly available models (Llama 3, Mistral, Stable Diffusion) on non-sensitive inputs, there is nothing secret in VRAM worth stealing. The model weights are already public. The inference inputs are ephemeral. Use whatever provider offers the best price — Vast.ai at $0.01/hr is perfectly fine for this. Academic research with public datasets, hackathon projects, learning and experimentation, and public model benchmarking all fall in this category.

Medium Risk: Use Tier 2+ Providers

If you're fine-tuning on proprietary data (your company's documents, customer interactions, internal knowledge bases), the fine-tuned model weights and training data exist in VRAM during training. A data leak here means someone could potentially reconstruct aspects of your proprietary training data or replicate your fine-tuned model's capabilities. Use providers with dedicated instances and known infrastructure: Lambda, CoreWeave, Latitude.sh, or hyperscalers. Avoid peer-to-peer marketplaces for this tier. Commercial model serving where the weights themselves are your IP also belongs here.

High Risk: Enterprise Providers Only

If your training data includes personally identifiable information (PII), protected health information (PHI), financial records, or data subject to GDPR/CCPA/HIPAA, you need a provider with explicit compliance certifications and the willingness to sign binding agreements (like a HIPAA BAA). A data breach here isn't just embarrassing — it's a regulatory violation with real legal consequences. This means AWS, GCP, or Azure. No exceptions. The cost premium is the cost of compliance. Some teams have explored "anonymize first, train anywhere" approaches, but de-identification is hard to get right, and regulators don't accept "we tried" as a defense.

Practical Security Measures You Should Implement

Regardless of your provider, these operational practices reduce your attack surface.

  • Clear GPU memory before deallocating: Call torch.cuda.empty_cache(), then allocate a tensor of zeros that fills available VRAM, then free it. This overwrites VRAM with zeros before your instance terminates. It's not bulletproof (the driver could re-expose freed memory), but it eliminates casual snooping.
  • Encrypt model weights at rest: If your fine-tuned model weights are stored on disk, encrypt them. Use GPG or age for file-level encryption, or enable volume encryption if your provider supports it. This prevents the host or the next tenant from reading your model weights from the disk image after you terminate.
  • Use SSH tunnels for all traffic: Marketplace instances are often on public IPs with minimal firewall rules. Never expose ports directly. Use SSH port forwarding or a VPN to access services running on your GPU instance. Set up SSH keys (not passwords) and disable password authentication in sshd_config.
  • Don't store credentials on GPU instances: Never hardcode API keys, database passwords, or cloud credentials in environment variables or config files on shared infrastructure. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault) and fetch credentials at runtime. If your instance is compromised, the attacker shouldn't get access to your entire infrastructure.
  • Audit network traffic: Run tcpdump or Wireshark for a few minutes when you first provision a marketplace instance. Check for unexpected outbound connections. Some host machines have been caught running cryptominers or phoning home. If you see traffic you didn't initiate, terminate the instance immediately.
  • Use checksums on model files: After downloading model weights to your GPU instance, verify the SHA-256 checksum against a known-good value. This protects against man-in-the-middle attacks during download and against malicious modification by the host.
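Two of the measures above, the VRAM scrub and the checksum verification, can be sketched in a few lines of Python. This assumes PyTorch for the scrub; the function names, the 0.9 fill fraction, and the chunk size are illustrative choices, not a hardened implementation:

```python
# Sketch: best-effort VRAM scrubbing before teardown, plus SHA-256
# verification of downloaded weight files.
import hashlib
import torch

def scrub_vram(fraction: float = 0.9) -> None:
    """Overwrite most free VRAM with zeros. Not bulletproof (the driver
    may still re-expose memory it manages), but defeats casual snooping."""
    if not torch.cuda.is_available():
        return
    torch.cuda.empty_cache()  # release cached blocks back to the driver
    free_bytes, _total = torch.cuda.mem_get_info()
    n = int(free_bytes * fraction) // 4  # number of float32 elements
    filler = torch.zeros(n, device="cuda")  # zero-fill claims and clears VRAM
    del filler
    torch.cuda.empty_cache()

def verify_checksum(path: str, expected_sha256: str) -> bool:
    """Stream the file in 1 MiB chunks and compare against a known digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```

Call `scrub_vram()` as the last step of your training script, and gate model loading on `verify_checksum()` returning True.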

The Marketplace Security Model: Understanding the Risk

The peer-to-peer marketplace model (Vast.ai, and parts of RunPod's community cloud) deserves special attention because the trust model is fundamentally different from traditional cloud. When you rent from AWS, you're trusting Amazon — a company with billions of dollars in reputation at stake and legally binding compliance certifications. When you rent from a Vast.ai host, you're trusting an anonymous individual or small business whose identity you may not even know. The host has physical access to the hardware. They could, in theory, install a modified CUDA driver that logs all GPU operations, inspect network traffic at the host level, read data from the disk after your instance terminates, or modify the system software to exfiltrate data.

In practice, Vast.ai and RunPod have reputation systems and monitoring to catch malicious hosts, and the vast majority of hosts are legitimate businesses or enthusiasts. But the attack surface is larger than on a hyperscaler, and the guarantees are weaker. For low-risk workloads, this tradeoff is absolutely worth it — you're saving 80%+ on GPU costs. For anything involving sensitive data, it's not.

When security genuinely doesn't matter: If you're running inference of Llama 3 on public inputs, doing a Kaggle competition, prototyping a side project, or learning ML — use the cheapest GPU you can find and don't worry about security. The model weights are already public, your training data is a public dataset, and there's nothing in VRAM worth stealing. Save your security budget for workloads that actually have secrets to protect.

The Bottom Line

GPU cloud security is a spectrum, not a binary. Match your provider's security level to your workload's actual sensitivity. Public models and public data can run anywhere — use the cheapest provider. Proprietary models and commercial data should run on dedicated instances with known infrastructure. Regulated data (PII, PHI, financial) must run on HIPAA/SOC2-compliant hyperscalers with signed BAAs. The enterprise tax on security is real, but for regulated workloads, it's the cost of staying legal. For everything else, understand the risks, implement basic security hygiene, and save your money.

Our comparison tool shows pricing across all security tiers — from marketplace providers starting at $0.01/hr to enterprise-grade hyperscalers. Choose the tier that matches your risk profile, not the tier that matches your paranoia.
