FinOps Cost Optimization for Kubernetes
FinOps cost optimization brings financial accountability to cloud spending. For Kubernetes environments, where resource allocation is dynamic and multi-tenant, cost visibility and optimization are uniquely challenging. Teams often discover they are paying for 3-5x more compute than they actually use because of over-provisioned resource requests, idle namespaces, and unoptimized instance types.
This guide provides actionable strategies for reducing Kubernetes cloud costs by 40-60% without sacrificing reliability. We cover resource right-sizing, spot instance adoption, cost allocation, and automated optimization tools that deliver immediate savings.
Understanding Kubernetes Cost Drivers
Before optimizing, you need visibility into where money is actually spent. Kubernetes cost analysis requires mapping pod-level resource consumption to cloud infrastructure costs:
Typical Kubernetes Cost Breakdown
┌────────────────────────┬──────────┬───────────────┐
│ Cost Driver │ % Spend │ Savings Opp. │
├────────────────────────┼──────────┼───────────────┤
│ Compute (nodes) │ 55-70% │ 30-50% │
│ Storage (EBS/PV) │ 10-15% │ 20-30% │
│ Network (NAT, LB) │ 8-12% │ 15-25% │
│ Managed services │ 5-10% │ 10-20% │
│ Data transfer │ 5-8% │ 20-40% │
│ Observability │ 3-5% │ 30-50% │
└────────────────────────┴──────────┴───────────────┘
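As a rough sanity check, the table's ranges can be combined into a blended savings estimate. The figures below are the midpoints of each range in the table (illustrative estimates, not measurements):

```python
# Blended savings estimate from the cost-breakdown table above.
# Each entry: (midpoint share of spend, midpoint savings opportunity).
cost_drivers = {
    "compute":       (0.625, 0.40),
    "storage":       (0.125, 0.25),
    "network":       (0.10,  0.20),
    "managed":       (0.075, 0.15),
    "data_transfer": (0.065, 0.30),
    "observability": (0.04,  0.40),
}

# Weighted sum: each driver's share of spend times its savings opportunity.
blended = sum(share * opp for share, opp in cost_drivers.values())
print(f"Blended savings opportunity: {blended:.1%} of total spend")
# -> Blended savings opportunity: 34.8% of total spend
```

That baseline comes from infrastructure-level opportunities alone; the higher end of the 40-60% headline depends on combining it with the right-sizing and scheduling techniques covered next.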
Common waste patterns:
- 60% of pods request 3x+ more CPU than used
- 45% of pods request 2x+ more memory than used
- 25% of PVs are unattached or unused
- 15% of nodes run at <20% utilization

Resource Right-Sizing
The single biggest cost saving comes from right-sizing resource requests and limits. Deploy the Kubernetes Vertical Pod Autoscaler (VPA) in recommend-only mode to analyze actual usage:
# VPA in recommend-only mode (safe to deploy)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # recommend only, don't auto-apply
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

# Check VPA recommendations
kubectl get vpa api-server-vpa -o jsonpath='{.status.recommendation}'
# Typical finding:
# Current request: cpu=1000m, memory=2Gi
# Recommended: cpu=250m, memory=512Mi
# Savings: 75% CPU, 75% memory → right-size!

Use Goldilocks (by Fairwinds) for cluster-wide VPA recommendations with a web dashboard:
# Install Goldilocks
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace
# Enable for a namespace
kubectl label namespace production \
goldilocks.fairwinds.com/enabled=true
# Access dashboard
kubectl port-forward svc/goldilocks-dashboard 8080:80 -n goldilocks

Spot Instance Strategy
Spot instances provide 60-90% savings over on-demand pricing and should run the majority of your stateless workloads. Diversify across instance types and zones so that a single spot-capacity interruption cannot drain your cluster.
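To see what the spot share of capacity is worth, a quick model of the blended hourly rate helps. The rates below are illustrative assumptions, not quoted prices:

```python
# Blended compute cost as a function of the spot share of capacity.
ON_DEMAND_RATE = 0.096   # $/hr per node (assumed, for illustration)
SPOT_DISCOUNT = 0.70     # spot ~70% cheaper, within the 60-90% range above

def blended_rate(spot_fraction: float) -> float:
    """Average $/hr per node for a fleet that is spot_fraction spot."""
    spot_rate = ON_DEMAND_RATE * (1 - SPOT_DISCOUNT)
    return spot_fraction * spot_rate + (1 - spot_fraction) * ON_DEMAND_RATE

for frac in (0.0, 0.5, 0.8):
    print(f"{frac:.0%} spot -> ${blended_rate(frac):.4f}/hr per node")
```

At 80% spot the blended rate drops by more than half, which is why the Karpenter configuration below keeps the on-demand pool deliberately small.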
# Karpenter NodePool for spot instances
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - m6i.large
        - m6a.large
        - m5.large
        - m5a.large
        - c6i.large
        - c6a.large  # Diversify instance types!
      - key: topology.kubernetes.io/zone
        operator: In
        values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: default
  limits:
    cpu: 100
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
---
# On-demand pool for critical workloads only
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand-critical
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["on-demand"]
      taints:
      - key: workload-type
        value: critical
        effect: NoSchedule
  limits:
    cpu: 20  # Much smaller: only for databases, etc.

Cost Allocation and Showback
FinOps requires attributing costs to teams, services, and environments, and Kubernetes labels are the foundation of that allocation.
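The same label scheme the Gatekeeper constraint below enforces at admission time can also be checked in CI before manifests reach the cluster. A minimal sketch (the `validate_cost_labels` helper is hypothetical):

```python
import re

# Mirrors the rules in the Gatekeeper constraint below: cost-center and
# environment must match fixed vocabularies; team just has to be present.
RULES = {
    "cost-center": re.compile(r"^(engineering|platform|data|ml)$"),
    "team": None,  # any non-empty value is accepted
    "environment": re.compile(r"^(prod|staging|dev)$"),
}

def validate_cost_labels(labels: dict) -> list:
    """Return a list of violations for a manifest's metadata.labels."""
    violations = []
    for key, pattern in RULES.items():
        value = labels.get(key)
        if not value:
            violations.append(f"missing label: {key}")
        elif pattern and not pattern.match(value):
            violations.append(f"invalid value for {key}: {value!r}")
    return violations

print(validate_cost_labels({"cost-center": "ml", "team": "search"}))
# -> ['missing label: environment']
```

Catching violations in CI keeps admission-time rejections from surprising teams at deploy time.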
# Enforce cost labels with OPA/Gatekeeper
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
    - apiGroups: ["apps"]
      kinds: ["Deployment", "StatefulSet"]
  parameters:
    labels:
    - key: "cost-center"
      allowedRegex: "^(engineering|platform|data|ml)$"
    - key: "team"
    - key: "environment"
      allowedRegex: "^(prod|staging|dev)$"

# Cost allocation script using Kubecost API
import requests
import pandas as pd

def get_namespace_costs(kubecost_url, window="7d"):
    """Fetch per-namespace cost breakdown."""
    resp = requests.get(
        f"{kubecost_url}/model/allocation",
        params={
            "window": window,
            "aggregate": "namespace",
            "accumulate": "true",
        },
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()["data"][0]
    costs = []
    for ns, alloc in data.items():
        costs.append({
            "namespace": ns,
            "cpu_cost": alloc["cpuCost"],
            "memory_cost": alloc["ramCost"],
            "storage_cost": alloc["pvCost"],
            "network_cost": alloc["networkCost"],
            "total_cost": alloc["totalCost"],
            "efficiency": alloc["totalEfficiency"],
        })
    df = pd.DataFrame(costs)
    return df.sort_values("total_cost", ascending=False)

# Generate weekly showback report
costs = get_namespace_costs("http://kubecost:9090")
print(costs.to_markdown(index=False))

Automated Optimization
Automate recurring optimizations to prevent cost drift. A simple, high-impact example is scaling non-production environments to zero outside working hours.
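Under the CronJob schedules below (down at 8 PM, up at 8 AM, weekdays only), the fraction of the week that non-prod compute sits idle, and can therefore be reclaimed, works out to:

```python
# Weekly downtime under the schedules below: scale down 20:00 Mon-Fri,
# scale up 08:00 Mon-Fri, so weekends stay scaled down entirely.
HOURS_PER_WEEK = 7 * 24               # 168
weeknight_down = 4 * 12               # Mon-Thu 20:00 -> next day 08:00
weekend_down = 60                     # Fri 20:00 -> Mon 08:00
down = weeknight_down + weekend_down  # 108 hours

print(f"Non-prod idle {down}/{HOURS_PER_WEEK} hours "
      f"= {down / HOURS_PER_WEEK:.0%} of the week")
# -> Non-prod idle 108/168 hours = 64% of the week
```

If dev, staging, and qa run on their own autoscaled node pools, scaling deployments to zero lets the cluster autoscaler release those nodes, so roughly that fraction of their compute bill disappears.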
# CronJob: Scale down dev/staging at night
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-non-prod
  namespace: platform
spec:
  schedule: "0 20 * * 1-5"  # 8 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              for ns in dev staging qa; do
                for deploy in $(kubectl get deploy -n $ns -o name); do
                  kubectl scale $deploy --replicas=0 -n $ns
                done
              done
          restartPolicy: OnFailure
---
# Scale back up in the morning
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-non-prod
  namespace: platform
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              # Note: this restores every deployment to 1 replica, not its
              # original count; annotate desired replicas before scale-down
              # if you need exact restoration.
              for ns in dev staging qa; do
                kubectl scale deploy --all --replicas=1 -n $ns
              done
          restartPolicy: OnFailure

When NOT to Use Aggressive Cost Optimization
Cost optimization must not compromise reliability. Avoid spot instances for stateful workloads, databases, and services with strict latency SLAs. Do not right-size below the minimum a service needs for startup and burst traffic, and remember that over-aggressive autoscaling can cause instability during traffic spikes. Always maintain headroom: a 20% buffer above measured peak usage prevents outages during unexpected load. The cost of a production incident always exceeds the savings from aggressive optimization.
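The 20% headroom rule can be applied mechanically when turning usage data into new requests. A small sketch (the `right_size` helper and the peak/floor numbers are hypothetical):

```python
# Right-size with headroom: set the request to measured peak plus a buffer,
# but never below a floor needed for startup/burst (values illustrative).
HEADROOM = 0.20  # 20% buffer above measured peak

def right_size(peak_usage: float, floor: float,
               headroom: float = HEADROOM) -> float:
    """Recommended request = max(peak * (1 + headroom), floor)."""
    return max(peak_usage * (1 + headroom), floor)

# A pod that peaks at 200m CPU but needs 250m to start cleanly:
print(right_size(peak_usage=200, floor=250))  # -> 250 (floor wins)
# A pod that peaks at 400m CPU with a 100m floor:
print(right_size(peak_usage=400, floor=100))  # -> 480.0 (peak + 20%)
```

Feeding VPA recommendations through a rule like this keeps the savings while preserving the buffer that prevents load-spike outages.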
Key Takeaways
- FinOps cost optimization starts with visibility — deploy Kubecost or OpenCost for per-namespace cost tracking
- Resource right-sizing typically saves 40-60% by aligning requests with actual usage
- Spot instances save 60-90% on compute — diversify instance types and use graceful shutdown handlers
- Enforce cost allocation labels with policy engines to enable team-level showback reporting
- Automate non-production scaling schedules to eliminate overnight and weekend waste
Related Reading
- Karpenter Kubernetes Autoscaling Guide
- Kubernetes Network Policies Zero Trust
- OpenTelemetry Collector Observability Pipeline