Kubernetes In-Place Pod Resizing: Complete Guide 2026

Kubernetes pod resizing solves one of the most frustrating operational problems in container orchestration: changing a pod’s resource allocation has traditionally required killing it and starting a new one. For stateful workloads, WebSocket servers, and long-running batch jobs, that forced restart means downtime, lost connections, and interrupted work. In-place pod resizing changes that entirely.

The Problem This Solves — Why It Matters

Consider a real scenario: your API server normally uses 500m CPU and 512Mi memory. During a marketing campaign, traffic spikes 5x and the pods need 2 CPU cores and 2Gi memory. With traditional Kubernetes, the only option is to update the Deployment spec and trigger a rolling restart — killing active connections, dropping in-flight requests, and causing a brief service disruption.

With in-place resizing, the kubelet adjusts the container’s cgroup limits while it continues running. The process inside the container never knows anything changed: active TCP connections, in-memory caches, and JVM warmup state are all preserved. Resource scaling becomes as seamless as adjusting a thermostat.

This is especially critical for:

  • Stateful databases — PostgreSQL, Redis, MongoDB pods that hold connection state
  • WebSocket servers — Chat applications, real-time dashboards with thousands of active connections
  • ML inference — GPU-accelerated pods that take minutes to load models into memory
  • Batch processing — Jobs that have been running for hours and can’t be restarted

How It Works Under the Hood

When you patch a pod’s resource requests/limits, the kubelet communicates with the container runtime (containerd or CRI-O) to update the cgroup v2 limits. For CPU, this is nearly instantaneous — the kernel simply adjusts the CPU bandwidth allocation. For memory, it’s more nuanced because the kernel can’t always reclaim memory that’s already been allocated to the process.
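To make the CPU path concrete, here is a rough sketch of the conversion involved: a CPU limit in millicores becomes a quota/period pair in the container's cgroup v2 cpu.max file (100000 µs is the default period; the numbers below are illustrative, not taken from a real cluster).

```shell
# Sketch: how a "2000m" CPU limit maps onto the cgroup v2 cpu.max file.
# The kubelet performs the equivalent conversion when it applies a resize.
millicores=2000                          # CPU limit from the pod spec ("2000m")
period=100000                            # default cgroup v2 period, in microseconds
quota=$(( millicores * period / 1000 ))  # microseconds of CPU time per period
echo "cpu.max would contain: $quota $period"
```

On a live node, the applied value can be read back from the container's cgroup directory under /sys/fs/cgroup.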

# Deployment with resize policies configured
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api
        image: api-server:v2.4.1
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # CPU changes apply without a restart
        - resourceName: memory
          restartPolicy: RestartContainer  # Memory changes restart this container
---
# To resize a running pod (resize subresource, kubectl v1.32+):
# kubectl patch pod api-server-7d9f8b6c4-x2k9p --subresource resize \
#   -p '{"spec":{"containers":[{"name":"api","resources":{"requests":{"cpu":"1000m","memory":"1Gi"},"limits":{"cpu":"4000m","memory":"4Gi"}}}]}}'

The resizePolicy field is critical. For CPU, NotRequired works on all major runtimes because CPU bandwidth is a kernel-level limit that applies immediately. Memory is more nuanced: increasing the limit is always safe (the process is simply allowed to use more), but decreasing it below current usage risks an OOM kill, and the container runtime may need to restart the container to apply certain memory configurations. RestartContainer is therefore the safe default for memory.

[Figure: In-place resizing adjusts cgroup limits without stopping the container process]

Vertical Pod Autoscaler (VPA) Integration

The real power of in-place resizing comes when combined with VPA. Previously, VPA had a frustrating chicken-and-egg problem: it could calculate the right resource values but could only apply them by evicting the pod and letting it restart with new values. This caused disruptions and was often disabled in production.

With the in-place update mode (InPlaceOrRecreate, available in VPA 1.4+ behind a feature gate), VPA applies recommendations without eviction where possible, falling back to eviction only when an in-place update fails. Additionally, you can configure bounds to prevent VPA from over-allocating:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "InPlaceOrRecreate"  # Prefer in-place updates; evict only as a fallback
    minReplicas: 2                   # Only act when at least 2 replicas are live
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: "250m"
        memory: "256Mi"
      maxAllowed:
        cpu: "4000m"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits

In practice, VPA + in-place resizing means your pods automatically right-size throughout the day. A service that needs 2 cores during peak hours and 500m during off-hours adjusts without any human intervention or pod disruption.

Monitoring and Troubleshooting Resizes

After submitting a resize, check the pod’s status to see if it was applied:

# Check resize status
kubectl get pod api-server-7d9f8b6c4-x2k9p -o jsonpath='{.status.resize}'
# Possible values: Proposed, InProgress, Deferred, Infeasible

# Check the resources the kubelet has actually allocated (may lag the spec mid-resize)
kubectl get pod api-server-7d9f8b6c4-x2k9p -o jsonpath='{.status.containerStatuses[0].allocatedResources}'

# Monitor resize events
kubectl get events --field-selector involvedObject.name=api-server-7d9f8b6c4-x2k9p

“Deferred” means the node doesn’t have enough free resources right now; the resize will be applied when capacity frees up. “Infeasible” means the request can never be satisfied on this node (for example, asking for more CPU than the node has). In most production clusters with sensible resource headroom, resizes complete within seconds.
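Those four states map naturally onto operator actions. A small sketch (the status value is hard-coded here for illustration; in practice it would come from the jsonpath query above):

```shell
# Sketch: translate a pod's .status.resize value into the next operator action.
status="Deferred"   # hard-coded for illustration; read it via kubectl jsonpath
case "$status" in
  Proposed|InProgress) action="wait: the kubelet is applying the resize" ;;
  Deferred)            action="retry: the node lacks free resources right now" ;;
  Infeasible)          action="reschedule: the request can never fit on this node" ;;
  *)                   action="done: no resize pending" ;;
esac
echo "$action"
```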

[Figure: Prometheus metrics expose resize latency, success rates, and failure reasons]

Production Best Practices

1. Always set resize policies explicitly. Don’t rely on defaults — be clear about which resources can resize without restart. This prevents surprises in production.

2. Set reasonable maxAllowed limits. A pod that auto-scales to 32 cores on a 64-core node can starve other workloads. Use VPA resource policies to cap growth.

3. Combine with HPA wisely. Use HPA for horizontal scaling (add more pods) and VPA for vertical scaling (make existing pods bigger). For most workloads, HPA handles traffic spikes while VPA optimizes baseline resource allocation. Configure HPA to scale on a custom metric such as requests per second rather than CPU, since VPA is also changing CPU and the two controllers would otherwise fight over the same signal.
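As a sketch of that split, an autoscaling/v2 HPA targeting a per-pod requests-per-second metric might look like the following (the metric name http_requests_per_second and the target value are assumptions, and serving it requires a custom-metrics adapter such as the Prometheus adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second  # assumed metric, served by a custom-metrics adapter
      target:
        type: AverageValue
        averageValue: "100"             # add pods when per-pod traffic exceeds ~100 req/s
```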

4. Test resize behavior in staging first. Some applications don’t handle cgroup limit changes gracefully. Container-aware JVMs pick up CPU count changes, but they won’t return already-committed heap memory when the limit decreases; Go and Node.js applications generally handle resizes well. Test your runtime’s behavior before enabling this in production.

5. Monitor for OOMKill after memory resizes. If you decrease memory limits and the process is already using more than the new limit, the kernel will OOMKill the container. Always check current usage before decreasing limits.
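A minimal pre-flight sketch of that check (the usage and limit values are hard-coded for illustration; in practice the usage figure would come from kubectl top pod, which requires metrics-server):

```shell
# Guard against OOMKill: refuse a memory-limit decrease when current usage
# plus a safety margin would exceed the proposed new limit.
current_mi=900      # current working set in Mi, e.g. from: kubectl top pod <pod>
new_limit_mi=512    # the limit you intend to patch in
headroom_mi=64      # safety margin for allocation spikes (an assumed value)
if [ $(( current_mi + headroom_mi )) -gt "$new_limit_mi" ]; then
  verdict="unsafe: would OOMKill at ${new_limit_mi}Mi"
else
  verdict="ok to decrease"
fi
echo "$verdict"
```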

[Figure: Combine VPA for right-sizing with HPA for traffic-based horizontal scaling]

Current Limitations (As of Kubernetes 1.32+)

In-place resizing has matured significantly but still has some limitations to be aware of:

  • Only works with cgroup v2 — if your nodes still use cgroup v1, upgrade them first
  • Cannot resize ephemeral storage — only CPU and memory
  • Init containers cannot be resized
  • A resize cannot change the pod’s QoS class (Guaranteed, Burstable, BestEffort)
  • Some managed Kubernetes services (EKS, GKE, AKS) may have different feature gate availability — check your provider’s documentation
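The first point is easy to verify per node. One way to check (run on the node itself, or from a kubectl debug session on the node):

```shell
# Detect the cgroup hierarchy: cgroup v2 mounts a cgroup2fs filesystem at
# /sys/fs/cgroup, while cgroup v1 typically mounts a tmpfs there.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
if [ "$fstype" = "cgroup2fs" ]; then
  echo "cgroup v2: in-place resize supported"
else
  echo "cgroup v1 or unknown ($fstype): upgrade this node first"
fi
```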

Kubernetes in-place pod resizing transforms resource management from a disruptive, restart-driven process into a seamless runtime operation. For any team running stateful or connection-sensitive workloads, this feature alone justifies upgrading to a recent Kubernetes version.
