Kubernetes Deployments: Rolling Updates Done Right
Master Kubernetes rolling updates with proper readiness probes, deployment strategies, rollback procedures, and zero-downtime deployments.
Kubernetes rolling updates can be seamless or catastrophic. The difference lies in understanding readiness probes, update strategies, and rollback procedures. This guide covers production-ready deployment patterns.
Deployment Anatomy
A Kubernetes Deployment manages ReplicaSets, which manage Pods. During an update, a new ReplicaSet is created and scaled up while the old one scales down.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:v1.0.0
        ports:
        - containerPort: 8080
Rolling Update Strategy
The default strategy is RollingUpdate. Configure it to control how fast updates happen:
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # Max pods above desired count during update
      maxUnavailable: 25%  # Max pods unavailable during update
Understanding maxSurge and maxUnavailable
With replicas: 6:
| Setting | maxSurge | maxUnavailable | During Update |
|---|---|---|---|
| Careful | 1 | 0 | 6→7→6: Never below desired count |
| Balanced | 25% (2) | 25% (1) | 5-8 pods: Faster but some capacity loss |
| Fast | 100% (6) | 50% (3) | 3-12 pods: Very fast, needs capacity |
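Percentages are resolved against the desired replica count: maxSurge rounds up, maxUnavailable rounds down. A small sketch of that arithmetic (illustrative only, not the actual controller code):

```javascript
// Sketch of how a RollingUpdate resolves percentage values into absolute
// pod-count bounds: maxSurge rounds up, maxUnavailable rounds down.
function rolloutBounds(replicas, maxSurge, maxUnavailable) {
  const resolve = (value, round) =>
    typeof value === 'string'
      ? Math[round]((parseInt(value, 10) / 100) * replicas)
      : value;
  return {
    min: replicas - resolve(maxUnavailable, 'floor'),
    max: replicas + resolve(maxSurge, 'ceil'),
  };
}

console.log(rolloutBounds(6, '25%', '25%')); // { min: 5, max: 8 }
console.log(rolloutBounds(6, 1, 0));         // { min: 6, max: 7 }
```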
For production, use maxSurge: 1 with maxUnavailable: 0 for zero-downtime updates.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Never go below desired count
The Critical Role of Probes
Probes are the difference between a smooth rollout and an outage.
Readiness Probe
Tells Kubernetes when a pod is ready to receive traffic:
spec:
  containers:
  - name: myapp
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 5  # Wait before first check
      periodSeconds: 5        # Check every 5 seconds
      timeoutSeconds: 3       # Timeout for each check
      successThreshold: 1     # Successes needed to be ready
      failureThreshold: 3     # Failures before marking unready
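These two settings determine how quickly a failing pod is pulled out of rotation. A rough worst-case bound (illustrative arithmetic; it ignores probe timeouts and detection lag):

```javascript
// Rough upper bound on how long a newly failing pod keeps receiving
// traffic: the probe fires every periodSeconds and must fail
// failureThreshold consecutive times before the endpoint is removed.
const worstCaseUnreadySeconds = ({ periodSeconds, failureThreshold }) =>
  periodSeconds * failureThreshold;

console.log(worstCaseUnreadySeconds({ periodSeconds: 5, failureThreshold: 3 })); // 15
```

With the settings above, a broken pod can serve errors for about 15 seconds before it is marked unready, which is the trade-off for tolerating transient blips.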
Liveness Probe
Tells Kubernetes when to restart a pod:
spec:
  containers:
  - name: myapp
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 15  # Wait for startup
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3      # 3 failures = restart
Startup Probe
For slow-starting applications (prevents liveness from killing during startup):
spec:
  containers:
  - name: myapp
    startupProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 30  # 30 × 5s = 150s max startup time
Implementing Health Endpoints
// Node.js health endpoints
app.get('/health/live', (req, res) => {
  // Liveness: Is the process running correctly?
  // Return 200 unless the process is fundamentally broken
  res.status(200).json({ status: 'alive' });
});

app.get('/health/ready', async (req, res) => {
  // Readiness: Can we serve traffic?
  // Check dependencies (database, cache, etc.)
  try {
    await db.query('SELECT 1');
    await redis.ping();
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({ status: 'not ready', error: error.message });
  }
});
Graceful Shutdown
When a pod is terminated, Kubernetes sends SIGTERM. Your app must handle it:
spec:
  terminationGracePeriodSeconds: 30  # Max time to shut down
  containers:
  - name: myapp
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]  # Give time to drain
// Node.js graceful shutdown
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown');

  // Stop accepting new connections
  server.close(() => {
    console.log('HTTP server closed');
    // Close database connections
    db.end(() => {
      console.log('Database connections closed');
      process.exit(0);
    });
  });

  // Force exit after timeout
  setTimeout(() => {
    console.log('Forcing exit after timeout');
    process.exit(1);
  }, 25000); // Less than terminationGracePeriodSeconds
});
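One subtlety worth checking: the grace-period countdown starts when the preStop hook begins, so the hook's sleep eats into the app's drain window. A quick sanity check on the numbers used above:

```javascript
// The grace period covers the preStop hook plus the app's own shutdown,
// so the time left for draining after SIGTERM arrives is the difference.
const drainWindowSeconds = ({ gracePeriodSeconds, preStopSeconds }) =>
  gracePeriodSeconds - preStopSeconds;

console.log(drainWindowSeconds({ gracePeriodSeconds: 30, preStopSeconds: 10 })); // 20
```

With a 10-second preStop sleep and a 30-second grace period, the app has roughly 20 seconds before SIGKILL, so an internal force-exit timeout should stay under that, not just under the full grace period.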
Complete Production Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
    version: v1.2.0
spec:
  replicas: 3
  revisionHistoryLimit: 5  # Keep last 5 ReplicaSets for rollback
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
        version: v1.2.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      terminationGracePeriodSeconds: 30
      # Don't schedule multiple replicas on the same node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - myapp
              topologyKey: kubernetes.io/hostname
      containers:
      - name: myapp
        image: myorg/myapp:v1.2.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        env:
        - name: NODE_ENV
          value: "production"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        startupProbe:
          httpGet:
            path: /health/ready
            port: http
          periodSeconds: 5
          failureThreshold: 30
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 0
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 0
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 2  # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: myapp
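The budget's effect during voluntary disruptions (node drains, cluster upgrades) comes down to simple arithmetic. A sketch of the idea behind the status field disruptionsAllowed (illustrative, not the eviction controller itself):

```javascript
// A voluntary eviction is permitted only while the number of healthy pods
// stays above minAvailable; the difference is the disruption headroom.
const disruptionsAllowed = (healthyPods, minAvailable) =>
  Math.max(0, healthyPods - minAvailable);

console.log(disruptionsAllowed(3, 2)); // 1 — one pod may be evicted
console.log(disruptionsAllowed(2, 2)); // 0 — further evictions are blocked
```

With 3 replicas and minAvailable: 2, a node drain can evict at most one pod at a time and must wait for a replacement to become ready before evicting the next.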
Triggering and Monitoring Rollouts
Trigger a Rollout
# Update image
kubectl set image deployment/myapp myapp=myorg/myapp:v1.2.0
# Or edit directly
kubectl edit deployment/myapp
# Or apply updated manifest
kubectl apply -f deployment.yaml
Monitor Rollout Status
# Watch rollout status
kubectl rollout status deployment/myapp
# View rollout history
kubectl rollout history deployment/myapp
# View specific revision
kubectl rollout history deployment/myapp --revision=2
Rollback
# Rollback to previous revision
kubectl rollout undo deployment/myapp
# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=2
# Pause a problematic rollout
kubectl rollout pause deployment/myapp
# Resume rollout
kubectl rollout resume deployment/myapp
Blue-Green Deployments
For instant cutover with instant rollback:
# Blue deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:v1.0.0
---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:v1.1.0
---
# Service points to active version
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' to cut over
  ports:
  - port: 80
    targetPort: 8080
Switch traffic:
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
Canary Deployments
Gradually shift traffic to new version:
# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9  # 90% of traffic
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:v1.0.0
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1  # 10% of traffic
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:v1.1.0
---
# Service sends traffic to both
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches both stable and canary
  ports:
  - port: 80
    targetPort: 8080
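Because a plain Service load-balances roughly evenly across ready endpoints, the replica ratio is what sets the canary's traffic share:

```javascript
// With roughly even balancing across ready endpoints, the canary's
// expected traffic share is its fraction of the total pod count.
const canaryShare = (stableReplicas, canaryReplicas) =>
  canaryReplicas / (stableReplicas + canaryReplicas);

console.log(canaryShare(9, 1)); // 0.1
```

Note this only gives coarse steps (the smallest share is 1/total-replicas), and the split is approximate rather than guaranteed.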
For fine-grained traffic control, use a service mesh like Istio or Linkerd.
Debugging Failed Rollouts
# Check deployment status
kubectl describe deployment myapp
# Check pod status
kubectl get pods -l app=myapp
kubectl describe pod <pod-name>
# Check logs
kubectl logs -l app=myapp --tail=100
# Check events
kubectl get events --sort-by='.lastTimestamp' | grep myapp
Common issues:
- ImagePullBackOff: Image doesn’t exist or wrong credentials
- CrashLoopBackOff: Application crashes on startup
- Pending: Insufficient resources or scheduling constraints
- Readiness probe failing: Application not responding to health checks
Key Takeaways
- Always use readiness probes — without them, traffic goes to unready pods
- Set maxUnavailable: 0 for zero-downtime updates
- Implement graceful shutdown — handle SIGTERM properly
- Use PodDisruptionBudgets to protect against cluster maintenance
- Keep revision history for quick rollbacks
- Use pod anti-affinity to spread replicas across nodes
- Start with rolling updates — blue-green and canary add complexity
“A deployment without readiness probes is just a hope that everything works.”