We want to use spot instances for batch work but they keep getting interrupted

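Spot capacity is typically 60-90 percent cheaper than on-demand. The price is that the provider can reclaim an instance at any time with only a short warning: 2 minutes on AWS, and roughly 30 seconds on Azure and GCP. That is fine for stateless batch work, but not for stateful systems.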

Steps: 5 | Last checked: 12 Jun 2026 | By Semih Arisoy

Try this first

Architect the workload so an instance can stop mid-job without data loss. Write progress to S3, not local disk.
Use AWS Batch, EC2 Auto Scaling with a mixed instances policy, or Spot Fleet rather than calling RunInstances for spot by hand. A mixed instances policy spreads capacity across several instance types and spot pools, so one pool being reclaimed doesn't take out the whole job.
On Azure: use Spot VMs in a Virtual Machine Scale Set with the eviction policy set to Deallocate. On GCP: use Spot VMs (or the older preemptible VMs) in a managed instance group (MIG).
For long-running batch (hours): run the bulk of the work on spot and keep a few on-demand instances as 'insurance'. An 80/20 spot-to-on-demand mix limits the impact of a mass eviction.
For latency-critical web services or databases: never use spot. The savings aren't worth the operational pain.
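
To make the mixed instances and 80/20 steps concrete on AWS, here is a minimal sketch that builds the policy structure accepted by boto3's create_auto_scaling_group. The helper name build_mixed_instances_policy, the launch template name, the instance types, the group sizes and the subnet IDs are all assumptions for illustration; only the dictionary keys come from the Auto Scaling API. Treat it as a starting point, not a tuned configuration.

```python
"""Sketch: an Auto Scaling mixed instances policy with an 80/20 spot/on-demand split."""


def build_mixed_instances_policy(launch_template_name, instance_types, on_demand_percent=20):
    """Return a MixedInstancesPolicy dict for boto3's create_auto_scaling_group."""
    if not 0 <= on_demand_percent <= 100:
        raise ValueError("on_demand_percent must be between 0 and 100")
    if len(instance_types) < 2:
        raise ValueError("list several instance types so spot has more than one pool")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Each override is one more instance type the group may launch.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            # 20 percent on-demand leaves 80 percent of capacity on spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_percent,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = build_mixed_instances_policy(
    "batch-worker",  # hypothetical launch template name
    ["c5.xlarge", "c5a.xlarge", "m5.xlarge", "m5a.xlarge"],  # example types
)

# Applying it needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(
#     AutoScalingGroupName="batch-workers",       # hypothetical name
#     MinSize=0, MaxSize=50, DesiredCapacity=10,  # hypothetical sizes
#     VPCZoneIdentifier="subnet-aaa,subnet-bbb",  # hypothetical subnets
#     MixedInstancesPolicy=policy,
# )
```

As a rough cost check: if spot runs at a 70 percent discount (an assumed figure inside the 60-90 percent range above), an 80/20 fleet costs about 0.2 + 0.8 x 0.3 = 0.44 of the all-on-demand price.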

When to bring us in

For ML training jobs of 12+ hours on spot, checkpointing isn't trivial. A short review of your SageMaker or Vertex AI configuration can save days of work.
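
For plain batch jobs that are simpler than distributed training, the checkpoint-and-resume idea from the first step is usually manageable in-house. The sketch below shows one possible shape, assuming the work is an ordered list of items and progress is a single small JSON object. The names InMemoryStore, S3Store, run_job and the key jobs/demo.json are hypothetical. S3Store uses boto3's get_object and put_object and needs credentials, so the demo runs against the in-memory stand-in instead.

```python
import json


class InMemoryStore:
    """Stand-in for durable storage so the sketch runs anywhere."""

    def __init__(self):
        self._objects = {}

    def load(self, key):
        return self._objects.get(key)

    def save(self, key, data):
        self._objects[key] = data


class S3Store:
    """Same interface backed by S3; requires boto3 and AWS credentials."""

    def __init__(self, bucket):
        import boto3  # imported lazily so the sketch runs without boto3

        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def load(self, key):
        try:
            return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()
        except self._s3.exceptions.NoSuchKey:
            return None  # no checkpoint yet: start from the beginning

    def save(self, key, data):
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)


def run_job(items, process, store, key, should_stop=lambda: False):
    """Process items in order and checkpoint after each one.

    Returns True when every item is done, False when stopped early.
    """
    raw = store.load(key)
    state = json.loads(raw) if raw else {"next_index": 0}
    for index in range(state["next_index"], len(items)):
        if should_stop():
            return False
        process(items[index])
        state["next_index"] = index + 1
        store.save(key, json.dumps(state).encode("utf-8"))
    return True


# Demo: the first "instance" is interrupted after two items,
# then a replacement picks up from the stored checkpoint.
store = InMemoryStore()
done = []
items = ["a", "b", "c", "d", "e"]

finished_first = run_job(items, done.append, store, "jobs/demo.json",
                         should_stop=lambda: len(done) >= 2)
finished_second = run_job(items, done.append, store, "jobs/demo.json")
```

Two caveats on this design. An instance can disappear between process and save, so the last item may be replayed on resume and process should be idempotent. And should_stop is only a hook: on AWS it could poll the instance metadata path spot/instance-action, which appears once an interruption is scheduled.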

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Or skip the DIY entirely

Our Managed IT clients do not look these things up. One point of contact, a fixed monthly price, resolved within working hours.

