Kubernetes Multi-Cluster Management with Cluster API
Cluster API (CAPI) provides a declarative, Kubernetes-native way to provision and manage clusters across any infrastructure. Instead of juggling cloud-specific CLIs and Terraform scripts, you define clusters as Kubernetes resources and let CAPI handle provisioning, scaling, and lifecycle management. This brings the same infrastructure-as-code principles to cluster management that Kubernetes brings to application deployment.
This guide covers setting up a management cluster, provisioning workload clusters across AWS, Azure, and bare metal, implementing GitOps-driven cluster lifecycle management, and handling day-2 operations like upgrades and scaling. Whether you manage 5 clusters or 500, CAPI provides a consistent, automatable approach.
Cluster API Architecture
CAPI uses a management cluster to manage workload clusters. The management cluster runs CAPI controllers that watch for Cluster resources and reconcile them with the actual infrastructure. Each cloud provider has its own infrastructure provider that translates generic CAPI resources into provider-specific API calls.
Management Cluster (runs CAPI controllers):
├── CAPI Core Controller
├── Bootstrap Provider (kubeadm)
├── Control Plane Provider (kubeadm)
├── Infrastructure Provider (AWS/Azure/vSphere)
└── Watches: Cluster, Machine, MachineDeployment CRDs
Workload Cluster A (AWS):
├── 3 control plane nodes (m6i.xlarge)
├── MachineDeployment: 5 worker nodes
└── Auto-managed by CAPI controllers
Workload Cluster B (Azure):
├── 3 control plane nodes (Standard_D4s_v3)
├── MachineDeployment: 10 worker nodes
└── Auto-managed by CAPI controllers

Setting Up the Management Cluster
# Initialize a management cluster
# Start with a kind cluster or existing cluster
kind create cluster --name capi-management
# Install clusterctl CLI
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/latest/download/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl && sudo mv clusterctl /usr/local/bin/
# Set AWS credentials for the AWS infrastructure provider (CAPA)
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>

# CAPA also needs its IAM prerequisites and a base64-encoded credentials
# profile, created with the clusterawsadm CLI (see the CAPA book)
clusterawsadm bootstrap iam create-cloudformation-stack
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Initialize with the AWS infrastructure provider
clusterctl init \
  --infrastructure aws \
  --bootstrap kubeadm \
  --control-plane kubeadm

# Verify controllers are running
kubectl get pods -n capi-system
kubectl get pods -n capa-system
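One management cluster can drive several infrastructure providers at once; running clusterctl init again installs additional providers alongside the existing ones. A minimal sketch for adding the Azure provider (CAPZ) follows, with placeholder service-principal values; the full credential setup, including an AzureClusterIdentity, is covered in the CAPZ book.

# Add the Azure provider to the same management cluster
export AZURE_SUBSCRIPTION_ID=<your-subscription-id>
export AZURE_TENANT_ID=<your-tenant-id>
export AZURE_CLIENT_ID=<service-principal-client-id>
export AZURE_CLIENT_SECRET=<service-principal-secret>
clusterctl init --infrastructure azure

# Each provider runs its controllers in its own namespace
kubectl get pods -n capz-system

Provisioning a Workload Cluster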
# cluster-production.yaml — Declarative cluster definition
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-us-east
  namespace: clusters
  labels:
    environment: production
    region: us-east-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    services:
      cidrBlocks: ["10.96.0.0/12"]
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: production-us-east-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-us-east
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: production-us-east
  namespace: clusters
spec:
  region: us-east-1
  sshKeyName: capi-key
  network:
    vpc:
      cidrBlock: "10.0.0.0/16"
    subnets:
      - availabilityZone: us-east-1a
        cidrBlock: "10.0.1.0/24"
        isPublic: false
      - availabilityZone: us-east-1b
        cidrBlock: "10.0.2.0/24"
        isPublic: false
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-us-east-cp
  namespace: clusters
spec:
  replicas: 3
  version: v1.31.0
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
      kind: AWSMachineTemplate
      name: production-cp-template
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: production-us-east-workers
  namespace: clusters
spec:
  clusterName: production-us-east
  replicas: 5
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: production-us-east
      version: v1.31.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: production-worker-config
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: production-worker-template
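The manifests above reference machine templates and a worker bootstrap config that must exist in the same namespace. A minimal sketch of all three follows; the instance types mirror the sizing mentioned earlier, and the iamInstanceProfile values are the defaults created by clusterawsadm, so adjust them to your environment.

# machine-templates.yaml: templates referenced by the cluster manifests
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: production-cp-template
  namespace: clusters
spec:
  template:
    spec:
      instanceType: m6i.xlarge
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: capi-key
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: production-worker-template
  namespace: clusters
spec:
  template:
    spec:
      instanceType: m6i.large
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      sshKeyName: capi-key
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: production-worker-config
  namespace: clusters
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external

Apply the manifests, then fetch the new cluster's kubeconfig once provisioning completes. Nodes report NotReady until a CNI is installed, and because the kubelets run with cloud-provider: external you also need to deploy the AWS cloud controller manager in the workload cluster. The Calico manifest below is one common CNI choice; pin it to a version you have verified.

kubectl create namespace clusters
kubectl apply -f cluster-production.yaml -f machine-templates.yaml

# Retrieve the workload cluster kubeconfig
clusterctl get kubeconfig production-us-east -n clusters > production-us-east.kubeconfig

# Install a CNI (example: Calico)
kubectl --kubeconfig production-us-east.kubeconfig apply \
  -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml

GitOps-Driven Cluster Lifecycle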
Integrating CAPI with GitOps tools like Flux or ArgoCD enables fully automated cluster provisioning and upgrades: store cluster definitions in Git and let the GitOps controller reconcile them.
# flux-kustomization.yaml — GitOps for cluster management
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: clusters
  namespace: flux-system
spec:
  interval: 5m
  path: ./clusters/production
  prune: false # Never auto-delete clusters!
  sourceRef:
    kind: GitRepository
    name: infrastructure
  healthChecks:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      name: production-us-east
      namespace: clusters
  timeout: 30m # Cluster provisioning takes time
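The equivalent with ArgoCD is an Application that tracks the same path; a sketch follows, with a placeholder repository URL. As with Flux, automated pruning stays off so a Git mishap can never delete a cluster.

# argocd-application.yaml: the same flow with ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: clusters
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/infrastructure  # placeholder
    targetRevision: main
    path: clusters/production
  destination:
    server: https://kubernetes.default.svc
    namespace: clusters
  syncPolicy:
    automated:
      prune: false    # never auto-delete Cluster resources
      selfHeal: true

Day-2 Operations: Upgrades and Scaling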
# Rolling Kubernetes upgrade — just change the version
kubectl patch kubeadmcontrolplane production-us-east-cp \
  --namespace clusters \
  --type merge \
  --patch '{"spec":{"version":"v1.32.0"}}'
# CAPI performs a rolling control plane upgrade:
# 1. Creates a new control plane node at v1.32
# 2. Waits for it to become healthy
# 3. Removes an old control plane node
# 4. Repeats until every control plane node is replaced
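# Workers are not touched by the control plane patch. A sketch of the
# separate MachineDeployment version patch (same merge-patch pattern):
kubectl patch machinedeployment production-us-east-workers \
  --namespace clusters \
  --type merge \
  --patch '{"spec":{"template":{"spec":{"version":"v1.32.0"}}}}'
# This triggers a rolling replacement of the worker Machines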
# Scale workers
kubectl scale machinedeployment production-us-east-workers \
  --namespace clusters --replicas=10
# Monitor cluster status
kubectl get clusters -n clusters
kubectl get machines -n clusters
clusterctl describe cluster production-us-east -n clusters

When NOT to Use Cluster API
Cluster API adds significant operational complexity. If you manage fewer than three clusters, managed Kubernetes services (EKS, AKS, GKE) with Terraform are simpler and require less expertise. The management cluster is also a single point of failure: if it goes down, you cannot provision, scale, or upgrade workload clusters until it recovers, although the workload clusters themselves keep running.
CAPI is also a poor fit for edge or IoT scenarios where clusters sit behind NAT or on unreliable networks, since its controllers need a stable connection to each workload cluster's API server. Use CAPI when you manage many clusters, need consistent provisioning across clouds, and have a platform team capable of operating the management infrastructure; smaller teams should start with managed Kubernetes and adopt CAPI only when multi-cluster sprawl becomes a genuine burden.
Key Takeaways
Cluster API brings declarative, Kubernetes-native management to cluster infrastructure. Define clusters as YAML, version them in Git, and let CAPI controllers handle provisioning and lifecycle. Combined with GitOps, you get auditable, reproducible cluster management across any cloud. Start with a dedicated management cluster, begin with one workload cluster, and expand as your platform team matures.
For related DevOps topics, explore our guide on GitOps with Flux and ArgoCD and Terraform infrastructure automation. The Cluster API book and CAPI GitHub repository provide detailed documentation.