Kubernetes Resource Management: Requests, Limits, and QoS
How Kubernetes allocates CPU and memory to pods, why resource requests and limits matter, and how to tune them for stable clusters.
Under-configured Kubernetes clusters are unstable clusters. Pods get evicted unexpectedly, nodes run out of memory, and the scheduler makes poor placement decisions. Most of these problems trace back to the same root cause: misconfigured resource requests and limits.
Here’s how it actually works.
Requests vs Limits
Every container in a pod can declare two resource settings per resource type (CPU and memory):
Request: The amount of resource the container is guaranteed to get. The scheduler uses this to decide which node to place the pod on.
Limit: The maximum the container can use. If it tries to exceed this, it gets throttled (CPU) or killed (memory).
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
250m CPU = 250 millicores = 0.25 of one CPU core.
How Scheduling Uses Requests
The scheduler looks at requests, not limits, when deciding where to place a pod. It finds a node where the sum of requests from all scheduled pods fits within the node’s allocatable capacity.
This means you can overcommit — schedule pods whose limits exceed what the node can actually provide, relying on the assumption that not all pods will hit their limits simultaneously. This is normal and expected.
What’s dangerous: setting requests too low. If your container actually needs 512Mi but requests 64Mi, the scheduler may pack 8 such pods onto a node, then all 8 spike simultaneously and the node runs OOM.
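To make the arithmetic concrete, here is a sketch assuming a hypothetical node with 4Gi of allocatable memory and pods using the request/limit values from the example above:

```yaml
# Hypothetical node: 4Gi allocatable memory.
# Each pod requests 256Mi and limits at 512Mi:
resources:
  requests:
    memory: "256Mi"   # the scheduler counts this: 4Gi / 256Mi = up to 16 pods fit
  limits:
    memory: "512Mi"   # worst case: 16 pods x 512Mi = 8Gi, 2x the node's capacity
```

The node survives only as long as pods stay near their requests. If every pod bursts toward its limit at once, the overcommit comes due.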
Quality of Service Classes
Kubernetes automatically assigns one of three QoS classes to each pod, based on its resource configuration:
Guaranteed
Condition: Every container has requests == limits for both CPU and memory.
These pods are the last to be evicted under memory pressure. Ideal for latency-sensitive or critical workloads.
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
Burstable
Condition: At least one container has a request or limit set, but the pod doesn’t meet the Guaranteed criteria (for example, requests lower than limits, or limits set for only one resource).
Most pods fall here. They get evicted before Guaranteed pods when the node is under pressure.
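The first example in this article (requests below limits) is Burstable; a minimal variant setting only requests also qualifies. A sketch:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  # no limits: still Burstable, since at least one request is set
```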
BestEffort
Condition: No requests or limits set at all.
These pods get evicted first. Only use for truly non-critical background jobs.
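A hypothetical pod that ends up BestEffort simply omits the resources block entirely (names and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-job        # hypothetical name
spec:
  containers:
  - name: worker
    image: busybox          # illustrative image
    command: ["sh", "-c", "echo batch work"]
    # no resources block at all -> QoS class BestEffort
```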
CPU vs Memory: Different Behaviors
CPU and memory are handled very differently when a container hits its limit:
CPU limiting (throttling): The container is rate-limited. It won’t get more CPU than its limit, but it keeps running. This causes latency spikes, not crashes.
Memory limiting (OOM kill): When a container exceeds its memory limit, the kernel’s OOM killer terminates it, and Kubernetes reports the termination reason as OOMKilled. The container is then restarted according to the pod’s restart policy.
This is why memory limits are more dangerous to get wrong than CPU limits. A too-low memory limit causes repeated crashes. A too-low CPU limit causes slowness.
LimitRange: Setting Defaults
Manually setting resources on every deployment is error-prone. Use a LimitRange to define namespace-level defaults:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      memory: "256Mi"
      cpu: "200m"
    defaultRequest:
      memory: "128Mi"
      cpu: "100m"
    max:
      memory: "2Gi"
      cpu: "2"
```
Now any container without explicit resource settings gets these defaults.
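Given the LimitRange above, a container deployed to the production namespace with no resources block would be admitted as if it had declared the following (a sketch derived from the defaults defined above):

```yaml
resources:
  requests:
    memory: "128Mi"   # from defaultRequest
    cpu: "100m"
  limits:
    memory: "256Mi"   # from default
    cpu: "200m"
```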
ResourceQuota: Limiting Namespaces
ResourceQuota caps total resource consumption in a namespace:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
```
This prevents one team from consuming the entire cluster.
Vertical Pod Autoscaler (VPA)
Setting the right request values manually is guesswork. The Vertical Pod Autoscaler watches actual usage and recommends (or automatically sets) appropriate values.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # or "Off" for recommendations only
```
In Off mode, run VPA for a week to gather recommendations before applying them. Avoid Auto in production until you trust the recommendations.
Practical Guidelines
Always set requests. Without them, the scheduler is flying blind.
Set memory limits cautiously. If you’re not sure, set the limit to 2x the request. Watch for OOMKilled events and adjust.
For CPU, limits are optional. CPU throttling is less harmful than OOM kills. Some teams omit CPU limits intentionally (setting only requests) to allow bursting.
Use namespace-level LimitRange as a safety net. Even if individual teams forget to set resources, the defaults protect the cluster.
Monitor kube_pod_container_resource_requests and kube_pod_container_resource_limits in Prometheus. Alert when requests are unset or when actual usage consistently exceeds requests.
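Putting the guidelines together, a hedged sketch of a container spec that follows them: requests set for both resources, a memory limit at 2x the request, and no CPU limit to allow bursting. The values are illustrative, not a recommendation for any particular workload.

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # 2x the request; adjust if OOMKilled events appear
    # no cpu limit: the container may burst above 250m when spare CPU exists
```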
Getting resources right is one of those unsexy infrastructure tasks that prevents 3 AM pages. Worth the investment.