Kubernetes Autoscaling with KEDA: Production Guide 2026

Kubernetes’ built-in HPA scales on CPU and memory, but most real workloads should scale on business metrics: queue depth, request latency, or event count. KEDA (Kubernetes Event-Driven Autoscaling) fills this gap by scaling on 60+ external event sources, including the ability to scale to zero. This guide covers architecture, trigger configuration, and production tuning.

Why HPA Alone Isn’t Enough

HPA works for CPU-bound workloads, but most microservices are I/O-bound. A Kafka consumer might use 10% CPU while its queue backs up to millions. HPA sees low CPU and does nothing. Moreover, HPA can’t scale to zero — you always pay for at least one replica. KEDA solves both problems.

Figure: KEDA extends Kubernetes autoscaling with 60+ event sources beyond CPU/memory

Scaling on Kafka Consumer Lag

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0        # Scale to zero when idle
  maxReplicaCount: 50
  cooldownPeriod: 300
  pollingInterval: 15
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
        scaleUp:
          policies:
            - type: Pods
              value: 10
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: order-processors
        topic: orders
        lagThreshold: "100"
        activationLagThreshold: "5"
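
To build intuition for these numbers: HPA's AverageValue formula drives the replica count from total consumer lag, while the activation threshold gates scaling from zero. A minimal Python sketch (a simplification; the function name and clamping are mine, and real KEDA additionally caps Kafka scaling at the topic's partition count unless allowIdleConsumers is set):

```python
import math

def desired_replicas(total_lag: int, lag_threshold: int = 100,
                     activation_threshold: int = 5,
                     min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Approximate replica count for the Kafka trigger above."""
    # The activation threshold gates scale-from-zero: at or below it, stay at min
    if total_lag <= activation_threshold:
        return min_replicas
    # HPA AverageValue math: ceil(metric / target), clamped to the bounds
    return max(1, min(max_replicas, math.ceil(total_lag / lag_threshold)))
```

With the defaults above, a lag of 250 messages yields 3 replicas, while 10,000 messages hits the 50-replica cap.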

Scaling on Prometheus Metrics

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_p99_latency
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{
              service="api-server"}[2m])) by (le))
        threshold: "0.5"  # Scale when p99 > 500ms
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_rps
        query: sum(rate(http_requests_total{service="api-server"}[1m]))
        threshold: "1000"
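
When a ScaledObject has multiple triggers, the HPA scales to the highest replica count any single trigger proposes. A rough sketch of the two triggers above (assuming AverageValue metric semantics; the function names are illustrative):

```python
import math

def proposed(metric: float, target: float) -> int:
    # Per-trigger proposal: ceil(metric / target), as with AverageValue metrics
    return math.ceil(metric / target)

def api_server_replicas(p99_seconds: float, rps: float,
                        min_r: int = 2, max_r: int = 20) -> int:
    # The HPA takes the MAX across triggers, then clamps to the replica bounds
    want = max(proposed(p99_seconds, 0.5), proposed(rps, 1000))
    return max(min_r, min(max_r, want))
```

Either trigger alone can drive scale-up: 5,000 req/s proposes 5 replicas even while latency is healthy.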

Cron-Based Pre-Scaling

triggers:
  # Business hours: minimum 10 replicas
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * 1-5
      end: 0 20 * * 1-5
      desiredReplicas: "10"
  # Lunch rush: minimum 20
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 11 * * 1-5
      end: 0 14 * * 1-5
      desiredReplicas: "20"
  # Also scale on CPU for unexpected spikes
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"
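
The cron triggers act as time-based replica floors, and the HPA still takes the maximum across all triggers, so the CPU trigger can push above the floor. A sketch of the windows above (hypothetical helper; assumes the clock is already in America/New_York local time):

```python
from datetime import datetime

def cron_floor(now: datetime) -> int:
    """Replica floor implied by the two cron triggers above."""
    if now.weekday() >= 5:               # Sat/Sun: no window active
        return 0
    if 11 <= now.hour < 14:              # lunch-rush window wins (higher floor)
        return 20
    if 8 <= now.hour < 20:               # regular business hours
        return 10
    return 0
```

Outside all windows the floor drops to zero and only the CPU trigger keeps the deployment sized.
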

Figure: KEDA scales on Prometheus queries, Kafka lag, cron schedules, or any custom metric

ScaledJobs for Batch Processing

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-transcoder
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: transcoder
            image: video-transcoder:latest
            resources:
              requests: { cpu: "2", memory: "4Gi" }
        restartPolicy: Never
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq:5672
        queueName: video-jobs
        queueLength: "1"
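
With queueLength: "1", KEDA aims for roughly one Job per queued message, up to maxReplicaCount. A simplified sketch of the default scaling strategy (the function name and exact accounting are illustrative):

```python
import math

def jobs_to_spawn(queue_length: int, running_jobs: int,
                  target_per_job: int = 1, max_jobs: int = 20) -> int:
    # Rough "default" ScaledJob strategy: spawn enough Jobs to cover the
    # backlog, minus Jobs already running, never exceeding maxReplicaCount
    wanted = math.ceil(queue_length / target_per_job)
    return max(0, min(max_jobs, wanted) - running_jobs)
```

Five queued videos with nothing running spawns five Jobs; a backlog of 100 spawns only up to the 20-Job cap.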

Production Tuning

Set cooldownPeriod to 300+ seconds to prevent thrashing. Use stabilizationWindowSeconds to smooth metrics spikes. Set activationThreshold to prevent unnecessary scale-from-zero events. Always set resource requests so the cluster autoscaler can provision nodes. Test scaling under load before production.

Figure: ScaledJobs create one Kubernetes Job per message, ideal for batch workloads

Key Takeaways

  • Scale on business metrics (queue depth, latency, request rate), not just CPU and memory
  • Use minReplicaCount: 0 with an activation threshold to scale idle workloads to zero
  • Combine cron triggers with metric triggers to pre-scale for predictable traffic
  • Set cooldownPeriod (300+ seconds) and a scale-down stabilization window to prevent thrashing
  • Always set resource requests so the cluster autoscaler can provision nodes

KEDA enables event-driven scaling based on real business metrics. With 60+ triggers and scale-to-zero capability, it handles virtually any scaling scenario. Start with one ScaledObject on your most variable workload, tune the parameters, and expand from there.

