Kubernetes Resource Management: Requests, Limits, and QoS
How Kubernetes allocates CPU and memory to pods, why resource requests and limits matter, and how to tune them for stable clusters.
Under-configured Kubernetes clusters are unstable clusters. Pods get evicted unexpectedly, nodes run out of memory, and the scheduler makes poor placement decisions. Most of these problems trace back to the same root cause: misconfigured resource requests and limits.
Here’s how it actually works.
Requests vs Limits
Every container in a pod can declare two resource settings per resource type (CPU and memory):
Request: The amount of resource the container is guaranteed to get. The scheduler uses this to decide which node to place the pod on.
Limit: The maximum the container can use. If it tries to exceed this, it gets throttled (CPU) or killed (memory).
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
250m CPU = 250 millicores = 0.25 of one CPU core.
How Scheduling Uses Requests
The scheduler looks at requests, not limits, when deciding where to place a pod. It finds a node where the sum of requests from all scheduled pods fits within the node’s allocatable capacity.
This means you can overcommit — schedule pods whose limits exceed what the node can actually provide, relying on the assumption that not all pods will hit their limits simultaneously. This is normal and expected.
What’s dangerous: setting requests too low. If your container actually needs 512Mi but requests 64Mi, the scheduler may pack 8 such pods onto a node, then all 8 spike simultaneously and the node runs OOM.
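To make the arithmetic concrete, here is a sketch assuming a hypothetical node with 4Gi of allocatable memory and pods using the request/limit values from the example above:

```yaml
# Hypothetical node: 4Gi allocatable memory.
# Each pod requests 256Mi and limits at 512Mi:
resources:
  requests:
    memory: "256Mi"   # the scheduler counts this: 4Gi / 256Mi = up to 16 pods fit
  limits:
    memory: "512Mi"   # worst case: 16 pods x 512Mi = 8Gi, 2x the node's capacity
```

The node survives only as long as pods stay near their requests. If every pod bursts toward its limit at once, the overcommit comes due.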
Quality of Service Classes
Kubernetes automatically assigns one of three QoS classes to each pod, based on its resource configuration:
Guaranteed
Condition: Every container has requests == limits for both CPU and memory.
These pods are the last to be evicted under memory pressure. Ideal for latency-sensitive or critical workloads.
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
Burstable
Condition: At least one container has a request or limit set, but the pod doesn’t meet the Guaranteed criteria (for example, requests lower than limits, or limits set for only one resource).
Most pods fall here. They get evicted before Guaranteed pods when the node is under pressure.
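The first example in this article (requests below limits) is Burstable; a minimal variant setting only requests also qualifies. A sketch:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  # no limits: still Burstable, since at least one request is set
```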
BestEffort
Condition: No requests or limits set at all.
These pods get evicted first. Only use for truly non-critical background jobs.
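A hypothetical pod that ends up BestEffort simply omits the resources block entirely (names and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-job        # hypothetical name
spec:
  containers:
  - name: worker
    image: busybox          # illustrative image
    command: ["sh", "-c", "echo batch work"]
    # no resources block at all -> QoS class BestEffort
```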
CPU vs Memory: Different Behaviors
CPU and memory are handled very differently when a container hits its limit:
CPU limiting (throttling): The container is rate-limited. It won’t get more CPU than its limit, but it keeps running. This causes latency spikes, not crashes.
Memory limiting (OOM kill): When a container exceeds its memory limit, the kernel’s OOM killer terminates it, and Kubernetes reports the termination reason as OOMKilled. The container is then restarted according to the pod’s restart policy.
This is why memory limits are more dangerous to get wrong than CPU limits. A too-low memory limit causes repeated crashes. A too-low CPU limit causes slowness.
LimitRange: Setting Defaults
Manually setting resources on every deployment is error-prone. Use a LimitRange to define namespace-level defaults:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      memory: "256Mi"
      cpu: "200m"
    defaultRequest:
      memory: "128Mi"
      cpu: "100m"
    max:
      memory: "2Gi"
      cpu: "2"
```
Now any container without explicit resource settings gets these defaults.
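Given the LimitRange above, a container deployed to the production namespace with no resources block would be admitted as if it had declared the following (a sketch derived from the defaults defined above):

```yaml
resources:
  requests:
    memory: "128Mi"   # from defaultRequest
    cpu: "100m"
  limits:
    memory: "256Mi"   # from default
    cpu: "200m"
```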
ResourceQuota: Limiting Namespaces
ResourceQuota caps total resource consumption in a namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
```
This prevents one team from consuming the entire cluster.
Vertical Pod Autoscaler (VPA)
Setting the right request values manually is guesswork. The Vertical Pod Autoscaler watches actual usage and recommends (or automatically sets) appropriate values.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
```
In Off mode, run VPA for a week to gather recommendations before applying them. Avoid Auto in production until you trust the recommendations.
Practical Guidelines
Always set requests. Without them, the scheduler is flying blind.
Set memory limits cautiously. If you’re not sure, set the limit to 2x the request. Watch for OOMKilled events and adjust.
For CPU, limits are optional. CPU throttling is less harmful than OOM kills. Some teams omit CPU limits intentionally (setting only requests) to allow bursting.
Use namespace-level LimitRange as a safety net. Even if individual teams forget to set resources, the defaults protect the cluster.
Monitor kube_pod_container_resource_requests and kube_pod_container_resource_limits in Prometheus. Alert when requests are unset or when actual usage consistently exceeds requests.
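Putting the guidelines together, a hedged sketch of a container spec that follows them: requests set for both resources, a memory limit at 2x the request, and no CPU limit to allow bursting. The values are illustrative, not a recommendation for any particular workload.

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # 2x the request; adjust if OOMKilled events appear
    # no cpu limit: the container may burst above 250m when spare CPU exists
```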
Getting resources right is one of those unsexy infrastructure tasks that prevents 3 AM pages. Worth the investment.