Kubernetes Multi-Cluster Management with Cluster API: Production Patterns

Cluster API (CAPI) provides a declarative, Kubernetes-native way to provision and manage clusters across any infrastructure. Instead of stitching together cloud-specific CLIs and Terraform scripts, you define clusters as Kubernetes resources and let CAPI handle provisioning, scaling, and lifecycle management. This brings the same infrastructure-as-code principles to cluster management that Kubernetes brings to application deployment.

This guide covers setting up a management cluster, provisioning workload clusters across AWS, Azure, and bare metal, implementing GitOps-driven cluster lifecycle management, and handling day-2 operations like upgrades and scaling. Whether you manage 5 clusters or 500, CAPI provides a consistent, automatable approach.

Cluster API Architecture

CAPI uses a management cluster to manage workload clusters. The management cluster runs CAPI controllers that watch for Cluster resources and reconcile them with the actual infrastructure. Each cloud provider has its own infrastructure provider that translates generic CAPI resources into provider-specific API calls.

[Figure: CAPI architecture, management cluster provisioning workload clusters across clouds]

Management Cluster (runs CAPI controllers):
├── CAPI Core Controller
├── Bootstrap Provider (kubeadm)
├── Control Plane Provider (kubeadm)
├── Infrastructure Provider (AWS/Azure/vSphere)
└── Watches: Cluster, Machine, MachineDeployment CRDs

Workload Cluster A (AWS):
├── 3 control plane nodes (m6i.xlarge)
├── MachineDeployment: 5 worker nodes
└── Auto-managed by CAPI controllers

Workload Cluster B (Azure):
├── 3 control plane nodes (Standard_D4s_v3)
├── MachineDeployment: 10 worker nodes
└── Auto-managed by CAPI controllers

Setting Up the Management Cluster

# Initialize a management cluster
# Start with a kind cluster or existing cluster
kind create cluster --name capi-management

# Install clusterctl CLI
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl && sudo mv clusterctl /usr/local/bin/

# Initialize with AWS infrastructure provider
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>

clusterctl init \
  --infrastructure aws \
  --bootstrap kubeadm \
  --control-plane kubeadm

# Verify controllers are running
kubectl get pods -n capi-system
kubectl get pods -n capa-system
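
The same management cluster can drive multiple clouds: each infrastructure provider is installed with its own clusterctl init call. As a sketch for Azure (credential setup varies by CAPZ version, so configure it per the provider's documentation first):

# Add the Azure provider to the same management cluster
# (configure Azure credentials first; see the CAPZ book for your version)
clusterctl init --infrastructure azure

# Azure provider controllers run in their own namespace
kubectl get pods -n capz-system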

Provisioning a Workload Cluster

# cluster-production.yaml — Declarative cluster definition
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-us-east
  namespace: clusters
  labels:
    environment: production
    region: us-east-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-us-east-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-us-east
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: production-us-east
  namespace: clusters
spec:
  region: us-east-1
  sshKeyName: capi-key
  network:
    vpc:
      cidrBlock: "10.0.0.0/16"
    subnets:
      - availabilityZone: us-east-1a
        cidrBlock: "10.0.1.0/24"
        isPublic: false
      - availabilityZone: us-east-1b
        cidrBlock: "10.0.2.0/24"
        isPublic: false
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-us-east-cp
  namespace: clusters
spec:
  replicas: 3
  version: v1.31.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: production-cp-template
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-us-east-workers
  namespace: clusters
spec:
  clusterName: production-us-east
  replicas: 5
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production-us-east
      version: v1.31.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-worker-config
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: production-worker-template
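
The MachineDeployment above references an AWSMachineTemplate and a KubeadmConfigTemplate that must exist in the same namespace. A minimal sketch of both (the instance type is an assumption to size for your workload; the IAM instance profile shown is the default created by clusterawsadm):

# worker-templates.yaml — templates referenced by the MachineDeployment
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: production-worker-template
  namespace: clusters
spec:
  template:
    spec:
      instanceType: m6i.large  # assumption: size per workload
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: capi-key
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: production-worker-config
  namespace: clusters
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external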

GitOps-Driven Cluster Lifecycle

Integrating CAPI with GitOps tools like Flux or ArgoCD enables fully automated cluster provisioning and upgrades. Store cluster definitions in Git and let the GitOps controller reconcile them.

# flux-kustomization.yaml — GitOps for cluster management
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: clusters
  namespace: flux-system
spec:
  interval: 5m
  path: ./clusters/production
  prune: false  # Never auto-delete clusters!
  sourceRef:
    kind: GitRepository
    name: infrastructure
  healthChecks:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      name: production-us-east
      namespace: clusters
  timeout: 30m  # Cluster provisioning takes time
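
The Kustomization above pulls from a GitRepository named infrastructure, which must be defined separately. A sketch (the repository URL and branch are placeholders for your own infrastructure repo):

# git-repository.yaml — source referenced by the Kustomization above
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/example/infrastructure  # placeholder
  ref:
    branch: main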

[Figure: Managing clusters across multiple clouds with GitOps and Cluster API]

Day-2 Operations: Upgrades and Scaling

# Rolling Kubernetes upgrade — just change the version
kubectl patch kubeadmcontrolplane production-us-east-cp \
  --namespace clusters \
  --type merge \
  --patch '{"spec":{"version":"v1.32.0"}}'

# CAPI performs rolling upgrade:
# 1. Creates new control plane node with v1.32
# 2. Waits for it to be healthy
# 3. Removes old control plane node
# 4. Repeats for all control plane nodes
# 5. Upgrades workers via MachineDeployment rollout

# Scale workers
kubectl scale machinedeployment production-us-east-workers \
  --namespace clusters --replicas=10

# Monitor cluster status
kubectl get clusters -n clusters
kubectl get machines -n clusters
clusterctl describe cluster production-us-east -n clusters
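
Remediation can be automated too: a MachineHealthCheck deletes and replaces worker machines that stay unhealthy, and the MachineDeployment reconciles the replacement. A sketch for the workers above (the timeout and maxUnhealthy thresholds are assumptions to tune for your environment):

# machine-health-check.yaml — auto-remediate unhealthy workers
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: production-us-east-workers-mhc
  namespace: clusters
spec:
  clusterName: production-us-east
  maxUnhealthy: 40%  # assumption: halt remediation beyond this
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: production-us-east-workers
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s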

When NOT to Use Cluster API

Cluster API adds significant operational complexity. If you manage fewer than 3 clusters, managed Kubernetes services (EKS, AKS, GKE) driven by Terraform are simpler and require less expertise. The CAPI management cluster is also a single point of failure for lifecycle operations: if it goes down, workload clusters keep running, but you cannot provision, scale, or upgrade them until it recovers.

CAPI is also a poor fit for edge or IoT scenarios where clusters sit behind NATs or unreliable networks, since the management cluster must reach each workload cluster's API server. Use CAPI when you manage many clusters, need consistent provisioning across clouds, and have a platform team capable of operating the management infrastructure. Small teams should start with managed Kubernetes and adopt CAPI only when multi-cluster sprawl becomes a management burden.

Key Takeaways

Cluster API brings declarative, Kubernetes-native management to multi-cluster infrastructure. Define clusters as YAML, version them in Git, and let CAPI controllers handle provisioning and lifecycle. Combined with GitOps, you get auditable, reproducible cluster management across any cloud. Start with a dedicated management cluster, begin with one workload cluster, and expand as your platform team matures.

For related DevOps topics, explore our guides on GitOps with Flux and ArgoCD and on Terraform infrastructure automation. The Cluster API book and the CAPI GitHub repository provide detailed documentation.
