FinOps Cost Optimization for Kubernetes: Cloud Spending Strategies That Work


FinOps cost optimization brings financial accountability to cloud spending. For Kubernetes environments, where resource allocation is dynamic and multi-tenant, cost visibility and optimization are uniquely challenging. Teams often discover they are paying for 3-5x more compute than they actually use because of over-provisioned resource requests, idle namespaces, and unoptimized instance types.

This guide provides actionable strategies for reducing Kubernetes cloud costs by 40-60% without sacrificing reliability. We cover resource right-sizing, spot instance adoption, cost allocation, and automated optimization tools that deliver immediate savings.

Understanding Kubernetes Cost Drivers

Before optimizing, you need visibility into where money is actually spent. Kubernetes cost analysis means mapping pod-level resource consumption to cloud infrastructure costs:

Typical Kubernetes Cost Breakdown

┌────────────────────────┬──────────┬───────────────┐
│ Cost Driver            │ % Spend  │ Savings Opp.  │
├────────────────────────┼──────────┼───────────────┤
│ Compute (nodes)        │ 55-70%   │ 30-50%        │
│ Storage (EBS/PV)       │ 10-15%   │ 20-30%        │
│ Network (NAT, LB)      │ 8-12%    │ 15-25%        │
│ Managed services       │ 5-10%    │ 10-20%        │
│ Data transfer          │ 5-8%     │ 20-40%        │
│ Observability          │ 3-5%     │ 30-50%        │
└────────────────────────┴──────────┴───────────────┘

Common waste patterns:
- 60% of pods request 3x+ more CPU than used
- 45% of pods request 2x+ more memory than used
- 25% of PVs are unattached or unused
- 15% of nodes run at <20% utilization

[Image: FinOps cost optimization Kubernetes dashboard. Cost visibility is the first step — you cannot optimize what you cannot measure.]
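These waste patterns can be detected by comparing requests against measured usage. A minimal sketch using hypothetical sample data (in practice, usage figures come from metrics-server or Prometheus, and the thresholds mirror the figures above):

```python
# Hypothetical pod data; pull real usage from metrics-server or Prometheus.
pods = [
    {"name": "api-server", "cpu_req_m": 1000, "cpu_used_m": 250,
     "mem_req_mi": 2048, "mem_used_mi": 512},
    {"name": "worker", "cpu_req_m": 500, "cpu_used_m": 400,
     "mem_req_mi": 1024, "mem_used_mi": 900},
]

def waste_report(pods, cpu_factor=3.0, mem_factor=2.0):
    """Flag pods whose requests exceed measured usage by the given factors."""
    flagged = []
    for p in pods:
        cpu_ratio = p["cpu_req_m"] / p["cpu_used_m"]
        mem_ratio = p["mem_req_mi"] / p["mem_used_mi"]
        if cpu_ratio >= cpu_factor or mem_ratio >= mem_factor:
            flagged.append((p["name"], round(cpu_ratio, 1), round(mem_ratio, 1)))
    return flagged

print(waste_report(pods))  # [('api-server', 4.0, 4.0)] — 4x over on both
```

The `worker` pod is within tolerance (1.3x CPU, 1.1x memory) and is left alone.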

Resource Right-Sizing

The single biggest cost saving comes from right-sizing resource requests and limits. Start by deploying the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation-only mode to analyze actual usage:

# VPA in recommend-only mode (safe to deploy)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # recommend only, don't auto-apply
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

# Check VPA recommendations
kubectl get vpa api-server-vpa -n production -o jsonpath='{.status.recommendation}'

# Typical finding:
# Current request: cpu=1000m, memory=2Gi
# Recommended:     cpu=250m,  memory=512Mi
# Savings: 75% CPU, 75% memory → right-size!
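To turn a finding like this into a savings figure, the quantity strings can be compared directly. A minimal sketch (the parsing helpers are illustrative and only handle the `m`, `Mi`, and `Gi` suffixes seen above, not every Kubernetes quantity form):

```python
def parse_cpu_millicores(q):
    """Parse a CPU quantity like '1000m' or '2' into millicores (illustrative)."""
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def parse_memory_mi(q):
    """Parse a memory quantity like '512Mi' or '2Gi' into MiB (illustrative)."""
    if q.endswith("Gi"):
        return int(float(q[:-2]) * 1024)
    if q.endswith("Mi"):
        return int(float(q[:-2]))
    raise ValueError(f"unsupported quantity: {q}")

def savings_pct(current, recommended, parse):
    """Percent saved by moving from the current request to the recommendation."""
    return round(100 * (1 - parse(recommended) / parse(current)))

print(savings_pct("1000m", "250m", parse_cpu_millicores))  # 75
print(savings_pct("2Gi", "512Mi", parse_memory_mi))        # 75
```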

Goldilocks (by Fairwinds) builds on VPA to provide cluster-wide recommendations with a web dashboard:

# Install Goldilocks (add the Fairwinds Helm repo first)
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
    --namespace goldilocks --create-namespace

# Enable for a namespace
kubectl label namespace production \
    goldilocks.fairwinds.com/enabled=true

# Access dashboard
kubectl port-forward svc/goldilocks-dashboard 8080:80 -n goldilocks

Spot Instance Strategy

Spot instances provide 60-90% savings over on-demand pricing and should run the majority of your stateless workloads:

# Karpenter NodePool for spot instances
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6a.large
            - m5.large
            - m5a.large
            - c6i.large
            - c6a.large  # Diversify instance types!
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: default
  limits:
    cpu: 100
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h

---
# On-demand pool for critical workloads only
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: on-demand-critical
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: workload-type
          value: critical
          effect: NoSchedule
  limits:
    cpu: 20  # Much smaller — only for databases, etc.
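The impact of shifting the fleet mix can be estimated before committing. A rough sketch, assuming an illustrative on-demand price and a flat 70% spot discount (real spot prices fluctuate by instance type, zone, and time):

```python
def blended_hourly_cost(on_demand_price, spot_discount, spot_fraction, node_count):
    """Blended fleet cost for a given fraction of nodes on spot capacity.
    Prices and discount are illustrative, not live AWS figures."""
    spot_price = on_demand_price * (1 - spot_discount)
    spot_nodes = node_count * spot_fraction
    od_nodes = node_count * (1 - spot_fraction)
    return spot_nodes * spot_price + od_nodes * on_demand_price

# 20 m6i.large-class nodes, 80% on spot at a 70% discount (illustrative):
all_od = blended_hourly_cost(0.096, 0.70, 0.0, 20)
mixed = blended_hourly_cost(0.096, 0.70, 0.8, 20)
print(f"savings: {100 * (1 - mixed / all_od):.0f}%")  # savings: 56%
```

Even with a fifth of the fleet kept on-demand for critical workloads, the blended bill drops by more than half.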

Cost Allocation and Showback

FinOps requires attributing costs to teams, services, and environments. Kubernetes labels are the foundation of cost allocation:

# Enforce cost labels with OPA/Gatekeeper
# (requires the K8sRequiredLabels ConstraintTemplate from the Gatekeeper policy library)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet"]
  parameters:
    labels:
      - key: "cost-center"
        allowedRegex: "^(engineering|platform|data|ml)$"
      - key: "team"
      - key: "environment"
        allowedRegex: "^(prod|staging|dev)$"

# Cost allocation script using the Kubecost API
import requests
import pandas as pd

def get_namespace_costs(kubecost_url, window="7d"):
    """Fetch per-namespace cost breakdown."""
    resp = requests.get(
        f"{kubecost_url}/model/allocation",
        params={
            "window": window,
            "aggregate": "namespace",
            "accumulate": "true",
        },
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()["data"][0]

    costs = []
    for ns, alloc in data.items():
        costs.append({
            "namespace": ns,
            "cpu_cost": alloc["cpuCost"],
            "memory_cost": alloc["ramCost"],
            "storage_cost": alloc["pvCost"],
            "network_cost": alloc["networkCost"],
            "total_cost": alloc["totalCost"],
            "efficiency": alloc["totalEfficiency"],
        })

    df = pd.DataFrame(costs)
    df = df.sort_values("total_cost", ascending=False)
    return df

# Generate weekly showback report
costs = get_namespace_costs("http://kubecost:9090")
print(costs.to_markdown(index=False))
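The same DataFrame can drive a follow-up action list. A small sketch that flags namespaces with high spend but low efficiency (the thresholds and sample data here are hypothetical; it expects the columns produced by the script above):

```python
import pandas as pd

def flag_inefficient(df, min_efficiency=0.4, min_cost=100.0):
    """Namespaces worth investigating: meaningful spend, low efficiency.
    Threshold values are hypothetical starting points."""
    mask = (df["efficiency"] < min_efficiency) & (df["total_cost"] >= min_cost)
    return df.loc[mask, ["namespace", "total_cost", "efficiency"]]

# Hypothetical showback data:
sample = pd.DataFrame([
    {"namespace": "ml-training", "total_cost": 4200.0, "efficiency": 0.22},
    {"namespace": "web",         "total_cost": 1800.0, "efficiency": 0.65},
    {"namespace": "sandbox",     "total_cost": 40.0,   "efficiency": 0.10},
])
print(flag_inefficient(sample))  # only ml-training: high spend AND low efficiency
```

Filtering on both axes avoids noise: a cheap sandbox namespace at 10% efficiency is not worth a ticket, but an expensive one is.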

[Image: FinOps cost allocation and showback reporting. Cost allocation reports drive accountability by attributing spending to teams.]

Automated Optimization

Automate recurring optimizations to prevent cost drift:

# CronJob: Scale down dev/staging at night
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-non-prod
  namespace: platform
spec:
  schedule: "0 20 * * 1-5"  # 8 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          # Note: the pod's ServiceAccount needs RBAC permission to scale Deployments
          containers:
            - name: scaler
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  for ns in dev staging qa; do
                    for deploy in $(kubectl get deploy -n $ns -o name); do
                      kubectl scale $deploy --replicas=0 -n $ns
                    done
                  done
          restartPolicy: OnFailure
---
# Scale back up in the morning
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-non-prod
  namespace: platform
spec:
  schedule: "0 8 * * 1-5"  # 8 AM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: scaler
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                - |
                  for ns in dev staging qa; do
                    # Restores one replica each; record original counts if they vary
                    kubectl scale deploy --all --replicas=1 -n $ns
                  done
          restartPolicy: OnFailure
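With these schedules (down at 8 PM weekdays, up at 8 AM weekdays), non-prod capacity is off for most of the week. A quick sanity check of the math:

```python
def weekly_downtime_hours(nightly_off=12, weekend_off=60):
    """Hours per week non-prod stays scaled to zero with the schedules above:
    Mon-Thu nights are 12h each (8 PM to 8 AM), and Friday 8 PM to Monday
    8 AM is 60h because the scale-up job only runs on weekdays."""
    return 4 * nightly_off + weekend_off

hours = weekly_downtime_hours()
print(f"{hours}h of 168h per week ({hours / 168:.0%})")  # 108h of 168h per week (64%)
```

Roughly 64% of non-prod compute hours are eliminated, so the savings scale directly with whatever those namespaces cost per hour.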

When NOT to Use Aggressive Cost Optimization

Cost optimization must not compromise reliability. Avoid spot instances for stateful workloads, databases, and services with strict latency SLAs. Do not right-size below the minimum required for startup and burst traffic, and remember that over-aggressive autoscaling can cause instability during traffic spikes. Always maintain headroom: a 20% buffer above measured peak usage prevents outages during unexpected load. The cost of a production incident almost always exceeds the savings from aggressive optimization.
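That headroom rule is easy to encode when setting requests from measured peaks. A minimal sketch (the 20% buffer mirrors the guidance above; the example numbers are hypothetical):

```python
def request_with_headroom(measured_peak, buffer=0.20):
    """Size the resource request at measured peak plus a safety buffer,
    rather than at the average, to survive unexpected load."""
    return measured_peak * (1 + buffer)

# A pod peaking at 400m CPU gets a ~480m request, not its 250m average:
print(request_with_headroom(400))
```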

[Image: Kubernetes cost optimization vs reliability balance. Balance cost savings with reliability — the cheapest infrastructure is worthless if it causes outages.]

Key Takeaways

  • FinOps cost optimization starts with visibility — deploy Kubecost or OpenCost for per-namespace cost tracking
  • Resource right-sizing typically saves 40-60% by aligning requests with actual usage
  • Spot instances save 60-90% on compute — diversify instance types and use graceful shutdown handlers
  • Enforce cost allocation labels with policy engines to enable team-level showback reporting
  • Automate non-production scaling schedules to eliminate overnight and weekend waste
