GCP Cloud Run: The Simplest Way to Run Containers
GCP Cloud Run is Google Cloud’s fully managed serverless container platform that automatically scales from zero to thousands of instances based on incoming traffic. You bring a container image; Cloud Run handles everything else: provisioning, scaling, TLS certificates, and load balancing. As a result, teams can deploy any application that fits in a container without managing infrastructure.
Cloud Run’s killer feature is scale-to-zero — when your service receives no traffic, it runs zero instances and you pay nothing. Moreover, it scales up in seconds when requests arrive, handling traffic spikes without pre-provisioning capacity. Consequently, Cloud Run is ideal for APIs, web applications, webhooks, and event-driven workloads where traffic is variable or unpredictable.
GCP Cloud Run Serverless: Deploying Your First Service
Deploy a container to Cloud Run with a single command. Cloud Run pulls your image, configures networking, provisions TLS, and makes your service available at a unique HTTPS URL. Furthermore, you can deploy from source code directly — Cloud Run uses Cloud Build to containerize your application automatically.
# Deploy from a container image
gcloud run deploy order-service \
  --image gcr.io/my-project/order-service:v1.2.0 \
  --platform managed \
  --region us-central1 \
  --memory 1Gi \
  --cpu 2 \
  --min-instances 0 \
  --max-instances 100 \
  --concurrency 80 \
  --port 8080 \
  --set-env-vars "SPRING_PROFILES_ACTIVE=production" \
  --set-secrets "DB_PASSWORD=db-password:latest" \
  --vpc-connector my-vpc-connector \
  --allow-unauthenticated

# Deploy from source (Cloud Build auto-builds)
gcloud run deploy my-api \
  --source . \
  --region us-central1

# Deploy with traffic splitting (canary)
gcloud run services update-traffic order-service \
  --to-revisions LATEST=10,order-service-v1=90 \
  --region us-central1

Service Configuration and Scaling
Cloud Run provides fine-grained control over scaling behavior, request handling, and resource allocation. The concurrency setting determines how many requests each instance handles simultaneously — higher concurrency means fewer instances, but each instance needs enough CPU and memory to serve that many requests in parallel. Additionally, setting a minimum instance count keeps your service warm and avoids cold-start latency on latency-sensitive endpoints.
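As a quick sketch, these settings can also be tuned in place on an existing service without deploying a new image (the service name and region match the deploy example above):

```shell
# Adjust scaling and concurrency on a running service; creates a new revision
gcloud run services update order-service \
  --region us-central1 \
  --concurrency 80 \
  --min-instances 1 \
  --max-instances 100
```

Each update creates a new revision, so you can roll back to the previous configuration if the change misbehaves.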
# Cloud Run service YAML configuration
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: order-service
  annotations:
    run.googleapis.com/launch-stage: GA
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        run.googleapis.com/cpu-throttling: "false"
        run.googleapis.com/startup-cpu-boost: "true"
        run.googleapis.com/vpc-access-connector: my-vpc-connector
        run.googleapis.com/vpc-access-egress: private-ranges-only
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      containers:
        - image: gcr.io/my-project/order-service:v1.2.0
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "2"
              memory: 1Gi
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: production
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 12
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080

VPC Connectivity and Private Services
Cloud Run services can connect to resources in your VPC — databases, Redis caches, internal APIs — through VPC connectors or Direct VPC egress. Furthermore, you can restrict ingress to internal traffic only, making services accessible only from within your VPC or through a load balancer.
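A minimal sketch of both pieces, using the connector name from the examples above (the network name and IP range are illustrative placeholders):

```shell
# Create a Serverless VPC Access connector (the /28 range must be unused in the VPC)
gcloud compute networks vpc-access connectors create my-vpc-connector \
  --region us-central1 \
  --network default \
  --range 10.8.0.0/28

# Restrict the service so it only accepts traffic from inside the VPC
# or from an internal/external Application Load Balancer
gcloud run services update order-service \
  --region us-central1 \
  --ingress internal
```

With ingress set to internal, the service’s public run.app URL stops accepting external requests, so pair this with a load balancer if you still need controlled external access.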
CI/CD with Cloud Build
Automate deployments with Cloud Build triggers that build, test, and deploy on every push. Additionally, use traffic splitting for gradual rollouts and fast rollback to a known-good revision when a release misbehaves. See the Cloud Run documentation for advanced deployment patterns.
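Wiring a trigger to a repository is one command. A sketch, assuming a GitHub repo already connected to Cloud Build (the repo name and owner are placeholders):

```shell
# Run cloudbuild.yaml on every push to main
gcloud builds triggers create github \
  --repo-name my-repo \
  --repo-owner my-org \
  --branch-pattern "^main$" \
  --build-config cloudbuild.yaml
```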
Key Takeaways
- Start with scale-to-zero defaults, then set --min-instances for latency-sensitive services
- Tune --concurrency together with CPU and memory; load-test before locking values in
- Use VPC connectors or Direct VPC egress to reach private databases and caches
- Keep credentials in Secret Manager and inject them with --set-secrets rather than plain env vars
- Roll out with tagged revisions and traffic splitting so a bad release never takes 100% of traffic
# cloudbuild.yaml — CI/CD pipeline
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/order-service:$SHORT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/order-service:$SHORT_SHA']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'order-service'
      - '--image=gcr.io/$PROJECT_ID/order-service:$SHORT_SHA'
      - '--region=us-central1'
      - '--tag=canary'
      - '--no-traffic'
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'services'
      - 'update-traffic'
      - 'order-service'
      - '--to-tags=canary=10'
      - '--region=us-central1'
images:
  - 'gcr.io/$PROJECT_ID/order-service:$SHORT_SHA'

In conclusion, GCP Cloud Run is the fastest path from container to production on Google Cloud. With scale-to-zero pricing, automatic TLS, and built-in traffic management, it removes operational overhead while giving you full container flexibility. Start with a simple deployment, add VPC connectivity for database access, and implement canary deployments for safe releases.