Active-active across two regions, or active-passive with failover: which is sensible?

Question

Accepted Answer

1. Set your RTO (recovery time objective) and RPO (recovery point objective) before choosing an architecture. Can you tolerate 1 hour of downtime, or only 5 minutes? That gap determines roughly 80 percent of the cost.
2. Active-passive (warm standby) is usually enough: the second region runs at minimal capacity, data replicates continuously, and DNS failover switches traffic over on an outage. Expect an RTO of 5-30 minutes.
3. Go active-active only if your app is inherently stateless and latency requirements demand a global presence. You then have to handle dual-write coordination or conflict resolution.
4. Test failover every quarter. Assume a failover that has never been tested doesn't work. Better to find that out on a planned chaos day than under pressure during a real outage.
5. For data, use a managed service: Aurora Global Database, Cosmos DB multi-region, or Spanner. Don't roll your own MySQL replication across regions.
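
To make points 1 and 2 concrete, here is a minimal Python sketch of how a warm-standby setup might be checked against an RTO target. Everything in it is an assumption for illustration: the names `FailoverPolicy` and `FailoverMonitor`, the example numbers, and the simplification that recovery time is roughly detection time plus DNS TTL plus standby scale-up time. It does not call any real DNS or cloud API.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Hypothetical knobs for a DNS-based warm-standby failover."""
    check_interval_s: int     # seconds between health checks
    failure_threshold: int    # consecutive failures before failing over
    dns_ttl_s: int            # TTL of the DNS record that gets swapped
    standby_scale_up_s: int   # time for the standby region to take full load

    def estimated_rto_s(self) -> int:
        # Rough estimate only: time to detect the outage, plus time for
        # DNS caches to expire, plus time for the standby to scale up.
        detection = self.check_interval_s * self.failure_threshold
        return detection + self.dns_ttl_s + self.standby_scale_up_s

    def meets(self, rto_target_s: int) -> bool:
        return self.estimated_rto_s() <= rto_target_s


class FailoverMonitor:
    """Tracks consecutive health-check failures of the primary region."""

    def __init__(self, policy: FailoverPolicy) -> None:
        self.policy = policy
        self.consecutive_failures = 0
        self.active_region = "primary"

    def record(self, healthy: bool) -> str:
        """Feed one health-check result; return the region that should serve."""
        if self.active_region == "standby":
            # Failing back is left as a manual decision in this sketch.
            return self.active_region
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        if self.consecutive_failures >= self.policy.failure_threshold:
            self.active_region = "standby"
        return self.active_region


if __name__ == "__main__":
    policy = FailoverPolicy(
        check_interval_s=30,
        failure_threshold=3,
        dns_ttl_s=60,
        standby_scale_up_s=300,
    )
    print(policy.estimated_rto_s())  # 450 seconds, i.e. 7.5 minutes
    print(policy.meets(30 * 60))     # True: fits a 30-minute RTO
    print(policy.meets(5 * 60))      # False: too slow for a 5-minute RTO
```

With these example numbers the estimate lands inside the 5-30 minute range quoted for warm standby, but misses a 5-minute target. That is the kind of gap that tends to push a design toward a hotter standby or active-active.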

When to bring us in:
If your RTO/RPO is contractually agreed with a customer or regulator (DORA, NIS2), the design isn't a back-of-envelope project. Get a review before committing.
