Kubernetes Autoscaling with KEDA: Production Guide
Kubernetes’ built-in HorizontalPodAutoscaler (HPA) scales on CPU and memory, but most real workloads should scale on business metrics: queue depth, request latency, or event count. KEDA (Kubernetes Event-driven Autoscaling) fills this gap by scaling on 60+ external event sources, including the ability to scale to zero. This guide covers architecture, trigger configuration, and production tuning.
Why HPA Alone Isn’t Enough
HPA works for CPU-bound workloads, but most microservices are I/O-bound. A Kafka consumer might use 10% CPU while its queue backs up to millions. HPA sees low CPU and does nothing. Moreover, HPA can’t scale to zero — you always pay for at least one replica. KEDA solves both problems.
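For contrast, here is roughly what the built-in HPA equivalent looks like. It can only watch resource metrics, so a backed-up queue is invisible to it (the resource names and targets below are illustrative):

```yaml
# Built-in HPA: scales only on resource metrics such as CPU.
# A Kafka topic with millions of pending messages does not
# register here if the consumer's CPU stays low.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1        # cannot go below 1 -- no scale to zero
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```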
Scaling on Kafka Consumer Lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0        # Scale to zero when idle
  maxReplicaCount: 50
  cooldownPeriod: 300
  pollingInterval: 15
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
        scaleUp:
          policies:
            - type: Pods
              value: 10
              periodSeconds: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: order-processors
        topic: orders
        lagThreshold: "100"
        activationLagThreshold: "5"

Scaling on Prometheus Metrics
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_p99_latency
        query: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{
              service="api-server"}[2m])) by (le))
        threshold: "0.5"    # Scale when p99 > 500ms
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_rps
        query: sum(rate(http_requests_total{service="api-server"}[1m]))
        threshold: "1000"

Cron-Based Pre-Scaling
triggers:
  # Business hours: minimum 10 replicas
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * 1-5
      end: 0 20 * * 1-5
      desiredReplicas: "10"
  # Lunch rush: minimum 20
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 11 * * 1-5
      end: 0 14 * * 1-5
      desiredReplicas: "20"
  # Also scale on CPU for unexpected spikes
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"

ScaledJobs for Batch Processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-transcoder
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: transcoder
            image: video-transcoder:latest
            resources:
              requests: { cpu: "2", memory: "4Gi" }
        restartPolicy: Never
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        host: amqp://rabbitmq:5672
        queueName: video-jobs
        queueLength: "1"

Production Tuning
Set cooldownPeriod to 300 seconds or more to prevent thrashing after bursts. Use stabilizationWindowSeconds to smooth metric spikes before scaling down. Set an activation threshold (activationLagThreshold for Kafka, activationThreshold for most other scalers) so brief blips don't trigger unnecessary scale-from-zero events. Always set resource requests so the cluster autoscaler can provision nodes for new replicas. And load-test your scaling behavior before relying on it in production.
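KEDA also supports a fallback replica count for when a scaler can't reach its metric source, such as a Prometheus outage. A minimal sketch of the relevant ScaledObject fields (the replica count is illustrative; pick one that matches your baseline load):

```yaml
# spec.fallback: if a scaler fails this many consecutive polls,
# pin the workload at a known-safe replica count instead of
# acting on stale or missing metrics.
spec:
  fallback:
    failureThreshold: 3
    replicas: 10
```

Without a fallback, a metrics outage can leave the workload stuck at whatever replica count it last computed.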
Key Takeaways
- Scale on business metrics (queue depth, latency, event count), not just CPU and memory
- Use minReplicaCount: 0 with an activation threshold to pay nothing for idle workloads
- Tune cooldownPeriod and stabilizationWindowSeconds to avoid scaling thrash
- Combine cron triggers for predictable peaks with metric triggers for unexpected spikes
- Set resource requests so the cluster autoscaler can back KEDA's scaling decisions
KEDA enables event-driven scaling based on real business metrics. With 60+ triggers and scale-to-zero capability, it handles virtually any scaling scenario. Start with one ScaledObject on your most variable workload, tune the parameters, and expand from there.
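A minimal starting point might look like the following (the workload name, query, and trigger are placeholders; swap in whichever of the 60+ scalers matches your event source):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-worker             # placeholder name
spec:
  scaleTargetRef:
    name: my-worker           # Deployment to scale
  minReplicaCount: 1          # keep 1 replica until you trust scale-to-zero
  maxReplicaCount: 10
  triggers:
    - type: prometheus        # placeholder trigger; pick your event source
      metadata:
        serverAddress: http://prometheus:9090
        query: sum(rate(jobs_enqueued_total[1m]))   # hypothetical metric
        threshold: "100"
```

Watch how it behaves under real load for a few days before lowering minReplicaCount or tightening thresholds.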