Kubernetes rolling updates can be either seamless or catastrophic. The difference lies in understanding readiness probes, update strategies, and rollback procedures. This guide covers production-ready deployment patterns.

Deployment Anatomy

A Kubernetes Deployment manages ReplicaSets, which manage Pods. During an update, a new ReplicaSet is created and scaled up while the old one scales down.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.0.0
          ports:
            - containerPort: 8080
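Mechanically, each rollout creates a new ReplicaSet, and you can watch the handoff live. A couple of standard kubectl commands (assuming the Deployment above is applied to a running cluster):

```shell
# Watch the old and new ReplicaSets scale in opposite directions during a rollout
kubectl get replicasets -l app=myapp --watch

# The pod-template-hash label shows which ReplicaSet owns each pod
kubectl get pods -l app=myapp --show-labels
```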

Rolling Update Strategy

The default strategy is RollingUpdate. Configure it to control how fast updates happen:

spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # Max pods above desired count during update
      maxUnavailable: 25%  # Max pods unavailable during update

Understanding maxSurge and maxUnavailable

With replicas: 6:

Setting    maxSurge   maxUnavailable   During Update
Careful    1          0                6→7→6: never below desired count
Balanced   25% (2)    25% (1)          5-8 pods: faster, but some capacity loss
Fast       100% (6)   50% (3)          3-12 pods: very fast, needs spare capacity
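The bounds above follow fixed rounding rules: a percentage maxSurge rounds up, and a percentage maxUnavailable rounds down. A quick sketch of that arithmetic (rolloutBounds is a hypothetical helper for illustration, not a Kubernetes API):

```javascript
// Hypothetical helper mirroring how Kubernetes resolves percentage values:
// maxSurge rounds up, maxUnavailable rounds down.
function rolloutBounds(replicas, maxSurgePct, maxUnavailablePct) {
  const surge = Math.ceil((replicas * maxSurgePct) / 100);
  const unavailable = Math.floor((replicas * maxUnavailablePct) / 100);
  return { min: replicas - unavailable, max: replicas + surge };
}

console.log(rolloutBounds(6, 25, 25));   // { min: 5, max: 8 }
console.log(rolloutBounds(6, 100, 50));  // { min: 3, max: 12 }
```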

For production: use maxSurge: 1, maxUnavailable: 0 for zero-downtime updates.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Never go below desired count
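A related field worth pairing with this strategy is minReadySeconds: a new pod must stay ready for that many seconds before it counts as available, which catches applications that pass their first readiness check and then crash. A sketch (10 seconds is illustrative):

```yaml
spec:
  minReadySeconds: 10  # Pod must stay ready for 10s before counting as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```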

The Critical Role of Probes

Probes are the difference between a smooth rollout and an outage.

Readiness Probe

Tells Kubernetes when a pod is ready to receive traffic:

spec:
  containers:
    - name: myapp
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5   # Wait before first check
        periodSeconds: 5         # Check every 5 seconds
        timeoutSeconds: 3        # Timeout for each check
        successThreshold: 1      # Successes needed to be ready
        failureThreshold: 3      # Failures before marking unready

Liveness Probe

Tells Kubernetes when to restart a pod:

spec:
  containers:
    - name: myapp
      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 15  # Wait for startup
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3      # 3 failures = restart

Startup Probe

For slow-starting applications. While the startup probe has not yet succeeded, liveness and readiness checks are disabled, so a slow boot cannot trigger a restart loop:

spec:
  containers:
    - name: myapp
      startupProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 30     # 30 × 5s = 150s max startup time

Implementing Health Endpoints

// Node.js health endpoints
app.get('/health/live', (req, res) => {
  // Liveness: Is the process running correctly?
  // Return 200 unless the process is fundamentally broken
  res.status(200).json({ status: 'alive' });
});

app.get('/health/ready', async (req, res) => {
  // Readiness: Can we serve traffic?
  // Check dependencies (database, cache, etc.)
  try {
    await db.query('SELECT 1');
    await redis.ping();
    res.status(200).json({ status: 'ready' });
  } catch (error) {
    res.status(503).json({ status: 'not ready', error: error.message });
  }
});

Graceful Shutdown

When a pod is terminated, Kubernetes sends SIGTERM and, in parallel, removes the pod from Service endpoints. The preStop sleep bridges the gap while that removal propagates; your app must handle SIGTERM itself:

spec:
  containers:
    - name: myapp
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]  # Give time to drain
  terminationGracePeriodSeconds: 30  # Max time to shutdown

// Node.js graceful shutdown
const server = app.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown');
  
  // Stop accepting new connections
  server.close(() => {
    console.log('HTTP server closed');
    
    // Close database connections
    db.end(() => {
      console.log('Database connections closed');
      process.exit(0);
    });
  });

  // Force exit after timeout
  setTimeout(() => {
    console.log('Forcing exit after timeout');
    process.exit(1);
  }, 25000);  // Less than terminationGracePeriodSeconds
});

Complete Production Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
    version: v1.2.0
spec:
  replicas: 3
  revisionHistoryLimit: 5  # Keep last 5 ReplicaSets for rollback
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
        version: v1.2.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      terminationGracePeriodSeconds: 30
      
      # Don't schedule multiple replicas on the same node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - myapp
                topologyKey: kubernetes.io/hostname
      
      containers:
        - name: myapp
          image: myorg/myapp:v1.2.0
          imagePullPolicy: IfNotPresent
          
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          
          env:
            - name: NODE_ENV
              value: "production"
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          
          startupProbe:
            httpGet:
              path: /health/ready
              port: http
            periodSeconds: 5
            failureThreshold: 30
          
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 0
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
          
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: 2  # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: myapp
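Whether the budget would currently permit an eviction can be checked directly; the ALLOWED DISRUPTIONS column shows how many pods may be voluntarily evicted right now:

```shell
kubectl get pdb myapp
```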

Triggering and Monitoring Rollouts

Trigger a Rollout

# Update image
kubectl set image deployment/myapp myapp=myorg/myapp:v1.2.0

# Or edit directly
kubectl edit deployment/myapp

# Or apply updated manifest
kubectl apply -f deployment.yaml

Monitor Rollout Status

# Watch rollout status
kubectl rollout status deployment/myapp

# View rollout history
kubectl rollout history deployment/myapp

# View specific revision
kubectl rollout history deployment/myapp --revision=2
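The CHANGE-CAUSE column in the history output is empty unless you populate it; the documented way is the kubernetes.io/change-cause annotation, set on each rollout:

```shell
# Record the reason for this rollout; it appears in `kubectl rollout history`
kubectl annotate deployment/myapp kubernetes.io/change-cause="upgrade to v1.2.0"
```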

Rollback

# Rollback to previous revision
kubectl rollout undo deployment/myapp

# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=2

# Pause a problematic rollout
kubectl rollout pause deployment/myapp

# Resume rollout
kubectl rollout resume deployment/myapp

Blue-Green Deployments

For instant cutover with instant rollback:

# Blue deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.0.0
---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.1.0
---
# Service points to active version
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' to cut over
  ports:
    - port: 80
      targetPort: 8080

Switch traffic:

kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

Canary Deployments

Gradually shift traffic to the new version:

# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9  # 90% of traffic
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.0.0
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1  # 10% of traffic
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.1.0
---
# Service sends traffic to both
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches both stable and canary
  ports:
    - port: 80
      targetPort: 8080

For fine-grained traffic control, use a service mesh like Istio or Linkerd.
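As a sketch of what a mesh buys you, an Istio VirtualService can weight traffic independently of replica counts (assumes Istio is installed and that separate myapp-stable and myapp-canary Services exist; both names are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp-stable
          weight: 90
        - destination:
            host: myapp-canary
          weight: 10
```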

Debugging Failed Rollouts

# Check deployment status
kubectl describe deployment myapp

# Check pod status
kubectl get pods -l app=myapp
kubectl describe pod <pod-name>

# Check logs
kubectl logs -l app=myapp --tail=100

# Check events
kubectl get events --sort-by='.lastTimestamp' | grep myapp
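One more Deployment field helps here: progressDeadlineSeconds. If a rollout makes no progress within the deadline, the Deployment reports a Progressing=False condition and kubectl rollout status exits non-zero, which is useful for failing CI pipelines. A sketch (the default is 600 seconds; 300 is illustrative):

```yaml
spec:
  progressDeadlineSeconds: 300  # Declare the rollout failed after 5 min without progress
```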

Common issues:

  • ImagePullBackOff: Image doesn’t exist or wrong credentials
  • CrashLoopBackOff: Application crashes on startup
  • Pending: Insufficient resources or scheduling constraints
  • Readiness probe failing: Application not responding to health checks

Key Takeaways

  1. Always use readiness probes — without them, traffic goes to unready pods
  2. Set maxUnavailable: 0 for zero-downtime updates
  3. Implement graceful shutdown — handle SIGTERM properly
  4. Use PodDisruptionBudgets to protect against cluster maintenance
  5. Keep revision history for quick rollbacks
  6. Use pod anti-affinity to spread replicas across nodes
  7. Start with rolling updates — blue-green and canary add complexity

“A deployment without readiness probes is just a hope that everything works.”