Zero-downtime deploys with blue-green on Kubernetes
By Daniel Samson · 2026-03-24
Kubernetes RollingUpdate gets you near-zero downtime out of the box. But you can't validate the new version before traffic hits it, there's a brief window where both versions are live at once, and rolling back means another rollout, often pulling the old image again. I wanted true blue-green. Here's the shape it took.
Two permanent slots
Every environment has a blue slot and a green slot. One is always live, the other is standby. Each slot duplicates the whole stateless app — web, scheduler, the web and video queue workers, and the Reverb websocket server. The version under test runs fully, in the standby slot, before it sees a single real request.
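To make the duplication concrete, here is a minimal Python sketch that builds one Deployment per component for a given slot. It is an illustration, not the manifests from this setup: the component names, the `app` and `slot` label keys, the replica count, and the image reference are all assumptions.

```python
import json

# Hypothetical component names, loosely following the list above.
COMPONENTS = ["web", "scheduler", "queue-web", "queue-video", "reverb"]
SLOTS = ("blue", "green")


def deployment(component: str, slot: str, image: str) -> dict:
    """Build a minimal apps/v1 Deployment for one component in one slot."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot}")
    # The same labels are used for the selector and the pod template,
    # so a Service can later pick pods by app and slot.
    labels = {"app": component, "slot": slot}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{component}-{slot}", "labels": labels},
        "spec": {
            "replicas": 1,  # assumed; size per component as needed
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": component, "image": image}]},
            },
        },
    }


def slot_manifests(slot: str, image: str) -> list[dict]:
    """Every stateless component, duplicated for the given slot."""
    return [deployment(c, slot, image) for c in COMPONENTS]


if __name__ == "__main__":
    # Placeholder image reference, not a real registry.
    manifests = slot_manifests("green", "registry.example/app:candidate")
    print(json.dumps(manifests, indent=2))
```

Deploying a candidate then only touches the standby slot's manifests; the live slot's Deployments are left alone.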
Switching traffic is one label
Two router services select pods by a slot label. Switching traffic means changing the slot value in both services' selectors and updating an active-slot ConfigMap, all in a single git commit that Fleet applies atomically. Rollback is the same flip in reverse, and because the old slot is still running untouched, it's instant. No image pull, no pod restart, no waiting.
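The flip itself is small enough to sketch in Python. This assumes the router Services and the ConfigMap are plain manifests in the repo; the Service names, ports, the `slot` selector key, and the ConfigMap's `slot` data key are hypothetical stand-ins, not the real ones.

```python
import copy


def other_slot(slot: str) -> str:
    """Return the opposite slot."""
    if slot not in ("blue", "green"):
        raise ValueError(f"unknown slot: {slot}")
    return "green" if slot == "blue" else "blue"


def router_service(name: str, app: str, slot: str, port: int) -> dict:
    """A v1 Service that routes to one app's pods in one slot."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": app, "slot": slot},
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def active_slot_config(slot: str) -> dict:
    """A ConfigMap recording which slot is live (name and key assumed)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "active-slot"},
        "data": {"slot": slot},
    }


def flip(manifests: list[dict]) -> list[dict]:
    """Return copies of the manifests pointing at the other slot."""
    flipped = copy.deepcopy(manifests)
    for m in flipped:
        if m["kind"] == "Service":
            selector = m["spec"]["selector"]
            selector["slot"] = other_slot(selector["slot"])
        elif m["kind"] == "ConfigMap":
            m["data"]["slot"] = other_slot(m["data"]["slot"])
    return flipped


if __name__ == "__main__":
    live = [
        router_service("router-web", "web", "blue", 80),
        router_service("router-ws", "reverb", "blue", 8080),
        active_slot_config("blue"),
    ]
    switched = flip(live)
    print(switched[2]["data"]["slot"])  # the new active slot
    print(flip(switched) == live)       # rollback is the same flip again
```

Writing the flipped manifests back and committing them together is what keeps the cutover to one change. Because `flip` is its own inverse, a rollback commit is just the previous commit's diff reversed.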
Share state, duplicate compute
The trick is being deliberate about what's per-slot and what's shared. Duplicate the stateless app per slot. Share the stateful and the expensive: the database server (with a separate database per slot if you need schema isolation during a cutover), search, object storage, and the ingress, which just points at the router services and never moves.
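One way to picture the split is as the environment each slot's pods receive. The Python sketch below is illustrative only: the variable names, hostnames, and the `app_<slot>` database naming scheme are assumptions, not values from this setup.

```python
# Shared backends: identical for both slots (hypothetical names).
SHARED_ENV = {
    "DB_HOST": "database.internal",
    "SEARCH_HOST": "search.internal",
    "STORAGE_BUCKET": "app-media",
}


def slot_env(slot: str, isolate_schema: bool = False) -> dict:
    """Environment for one slot: shared backends plus per-slot identity."""
    if slot not in ("blue", "green"):
        raise ValueError(f"unknown slot: {slot}")
    env = dict(SHARED_ENV)
    env["APP_SLOT"] = slot
    # Same server either way; only the database name changes when the
    # slots need schema isolation during a cutover.
    env["DB_DATABASE"] = f"app_{slot}" if isolate_schema else "app"
    return env


if __name__ == "__main__":
    print(slot_env("blue"))
    print(slot_env("green", isolate_schema=True))
```

With `isolate_schema` off, both slots read and write the same database, so a migration has to be compatible with both versions for as long as both are running.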
The cost
You're running two copies of the application at all times. On a homelab that's real RAM you have to budget for. In exchange you get pre-switch health validation, an atomic cutover, and genuinely instant rollback. For anything user-facing, I think that's a trade worth making.