---
name: gcp-cloud-run
description: Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---

# GCP Cloud Run

Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.

## Principles

- Cloud Run for containers, Functions for simple event handlers
- Optimize for cold starts with startup CPU boost and min instances
- Set concurrency based on workload (start from the default of 80, then adjust)
- Memory includes /tmp filesystem - plan accordingly
- Use VPC Connector only when needed (adds latency)
- Containers should start fast and be stateless
- Handle signals gracefully for clean shutdown

## Patterns

### Cloud Run Service Pattern

Containerized web service on Cloud Run

**When to use**: Web applications and APIs, Need any runtime or library, Complex services with multiple endpoints, Stateless containerized workloads

```dockerfile
# Dockerfile - Multi-stage build for smaller image
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
# Install production dependencies only
RUN npm ci --omit=dev

FROM node:20-slim
WORKDIR /app

# Copy only production dependencies
COPY --from=builder /app/node_modules ./node_modules
COPY src ./src
COPY package.json ./

# Cloud Run uses PORT env variable
ENV PORT=8080
EXPOSE 8080

# Run as non-root user
USER node

CMD ["node", "src/index.js"]
```

```javascript
// src/index.js
const express = require('express');
const app = express();

app.use(express.json());

// Health check endpoint
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// API routes (getItem is assumed to be defined elsewhere)
app.get('/api/items/:id', async (req, res) => {
  try {
    const item = await getItem(req.params.id);
    res.json(item);
  } catch (error) {
    console.error('Error:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

// Graceful shutdown
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
});

const PORT = process.env.PORT || 8080;
const server = app.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
```

```yaml
# cloudbuild.yaml
steps:
  # Build the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA', '.']

  # Push the container image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA']

  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'my-service'
      - '--image=gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--allow-unauthenticated'
      - '--memory=512Mi'
      - '--cpu=1'
      - '--min-instances=1'
      - '--max-instances=100'
      - '--concurrency=80'
      - '--cpu-boost'

images:
  - 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
```

### Structure

project/
├── Dockerfile
├── .dockerignore
├── src/
│   ├── index.js
│   └── routes/
├── package.json
└── cloudbuild.yaml

### Gcloud_deploy

# Direct gcloud deployment
gcloud run deploy my-service \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 1 \
  --max-instances 100 \
  --concurrency 80 \
  --cpu-boost

### Cloud Run Functions Pattern

Event-driven functions (formerly Cloud Functions)

**When to use**: Simple event handlers, Pub/Sub message processing, Cloud Storage triggers, HTTP webhooks

```javascript
// HTTP Function
// index.js
const functions = require('@google-cloud/functions-framework');

functions.http('helloHttp', (req, res) => {
  const name = req.query.name || req.body.name || 'World';
  res.send(`Hello, ${name}!`);
});
```

```javascript
// Pub/Sub Function
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processPubSub', (cloudEvent) => {
  // Decode Pub/Sub message
  const message = cloudEvent.data.message;
  const data = message.data
    ? JSON.parse(Buffer.from(message.data, 'base64').toString())
    : {};

  console.log('Received message:', data);

  // Process message (processMessage is assumed to be defined elsewhere)
  processMessage(data);
});
```

```javascript
// Cloud Storage Function
const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processStorageEvent', async (cloudEvent) => {
  const file = cloudEvent.data;

  console.log(`Event: ${cloudEvent.type}`);
  console.log(`Bucket: ${file.bucket}`);
  console.log(`File: ${file.name}`);

  if (cloudEvent.type === 'google.cloud.storage.object.v1.finalized') {
    await processUploadedFile(file.bucket, file.name);
  }
});
```

```bash
# Deploy HTTP function
gcloud functions deploy hello-http \
  --gen2 \
  --runtime nodejs20 \
  --trigger-http \
  --allow-unauthenticated \
  --region us-central1

# Deploy Pub/Sub function
gcloud functions deploy process-messages \
  --gen2 \
  --runtime nodejs20 \
  --trigger-topic my-topic \
  --region us-central1

# Deploy Cloud Storage function
gcloud functions deploy process-uploads \
  --gen2 \
  --runtime nodejs20 \
  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
  --trigger-event-filters="bucket=my-bucket" \
  --region us-central1
```

### Cold Start Optimization Pattern

Minimize cold start latency for Cloud Run

**When to use**: Latency-sensitive applications, User-facing APIs, High-traffic services

## 1. Enable Startup CPU Boost

```bash
gcloud run deploy my-service \
  --cpu-boost \
  --region us-central1
```

## 2. Set Minimum Instances

```bash
gcloud run deploy my-service \
  --min-instances 1 \
  --region us-central1
```

## 3. Optimize Container Image

```dockerfile
# Use distroless for minimal image
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY src ./src
# The distroless Node.js image already uses node as its entrypoint
CMD ["src/index.js"]
```

## 4. Lazy Initialize Heavy Dependencies

```javascript
// Lazy load heavy libraries
let bigQueryClient = null;

function getBigQueryClient() {
  if (!bigQueryClient) {
    const { BigQuery } = require('@google-cloud/bigquery');
    bigQueryClient = new BigQuery();
  }
  return bigQueryClient;
}

// Only initialize when needed
app.get('/api/analytics', async (req, res) => {
  const client = getBigQueryClient();
  const results = await client.query({...});
  res.json(results);
});
```

## 5. Increase Memory (More CPU)

```bash
# More CPU (with the memory to match) can shorten startup
gcloud run deploy my-service \
  --memory 1Gi \
  --cpu 2 \
  --region us-central1
```

### Optimization_impact

- Startup_cpu_boost: Up to 50% faster cold starts, depending on the workload
- Min_instances: Keeps warm instances ready, so baseline traffic avoids cold starts
- Distroless_image: Smaller attack surface, faster pull
- Lazy_init: Defers heavy loading to first request

### Concurrency Configuration Pattern

Proper concurrency settings for Cloud Run

**When to use**: Need to optimize instance utilization, Handle traffic spikes efficiently, Reduce cold starts

## Understanding Concurrency

```bash
# Default concurrency is 80
# Adjust based on your workload

# For I/O-bound workloads (most web apps)
gcloud run deploy my-service \
  --concurrency 80 \
  --cpu 1

# For CPU-bound workloads
gcloud run deploy my-service \
  --concurrency 1 \
  --cpu 1

# For memory-intensive workloads
gcloud run deploy my-service \
  --concurrency 10 \
  --memory 2Gi
```

## Node.js Concurrency

```javascript
// Node.js is single-threaded but handles I/O concurrently
// Use async/await for all I/O operations

// GOOD - async I/O
app.get('/api/data', async (req, res) => {
  const [users, products] = await Promise.all([
    fetchUsers(),
    fetchProducts()
  ]);
  res.json({ users, products });
});

// BAD - blocking operation
app.get('/api/compute', (req, res) => {
  const result = heavyCpuOperation(); // Blocks other requests!
  res.json(result);
});
```

## Python Concurrency with Gunicorn

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# 4 worker processes with 2 threads each for concurrency
CMD exec gunicorn --bind :$PORT --workers 4 --threads 2 main:app
```

```python
# main.py
from flask import Flask

app = Flask(__name__)

@app.route('/api/data')
def get_data():
    return {'status': 'ok'}
```

### Concurrency_guidelines

- Concurrency=1: Only for CPU-bound or unsafe code
- Concurrency=8-20: Memory-intensive workloads
- Concurrency=80: Default, good for I/O-bound
- Concurrency=250 and above: Only for very lightweight handlers (the maximum is 1000)

### Pub/Sub Integration Pattern

Event-driven processing with Cloud Pub/Sub

**When to use**: Asynchronous message processing, Decoupled microservices, Event-driven architecture

## Push Subscription to Cloud Run

```bash
# Create topic
gcloud pubsub topics create orders

# Create push subscription to Cloud Run
gcloud pubsub subscriptions create orders-push \
  --topic orders \
  --push-endpoint https://my-service-xxx.run.app/pubsub \
  --ack-deadline 600
```

```javascript
// Handle Pub/Sub push messages
const express = require('express');
const app = express();
app.use(express.json());

app.post('/pubsub', async (req, res) => {
  // Basic shape check only; this does not authenticate the sender
  if (!req.body.message) {
    return res.status(400).send('Invalid Pub/Sub message');
  }

  try {
    // Decode message data
    const message = req.body.message;
    const data = message.data
      ? JSON.parse(Buffer.from(message.data, 'base64').toString())
      : {};

    console.log('Processing order:', data);
    await processOrder(data);

    // Return 200 to acknowledge
    res.status(200).send('OK');
  } catch (error) {
    console.error('Processing failed:', error);
    // Return 500 to trigger retry
    res.status(500).send('Processing failed');
  }
});
```

## Publishing Messages

```javascript
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function publishOrder(order) {
  const topic = pubsub.topic('orders');
  const messageBuffer = Buffer.from(JSON.stringify(order));

  const messageId = await topic.publishMessage({
    data: messageBuffer,
    attributes: {
      type: 'order_created',
      priority: 'high'
    }
  });

  console.log(`Published message ${messageId}`);
  return messageId;
}
```

## Dead Letter Queue

```bash
# Create DLQ topic
gcloud pubsub topics create orders-dlq

# Update subscription with DLQ
gcloud pubsub subscriptions update orders-push \
  --dead-letter-topic orders-dlq \
  --max-delivery-attempts 5
```

### Cloud SQL Connection Pattern

Connect Cloud Run to Cloud SQL securely

**When to use**: Need relational database, Migrating existing applications, Complex queries and transactions

```bash
# Deploy with Cloud SQL connection
gcloud run deploy my-service \
  --add-cloudsql-instances PROJECT:REGION:INSTANCE \
  --set-env-vars INSTANCE_CONNECTION_NAME="PROJECT:REGION:INSTANCE" \
  --set-env-vars DB_NAME="mydb" \
  --set-env-vars DB_USER="myuser"
```

```javascript
// Using Unix socket connection
const { Pool } = require('pg');

const pool = new Pool({
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
  database: process.env.DB_NAME,
  // Cloud SQL connector uses Unix socket
  host: `/cloudsql/${process.env.INSTANCE_CONNECTION_NAME}`,
  max: 5, // Connection pool size
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 10000,
});

app.get('/api/users', async (req, res) => {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users LIMIT 100');
    res.json(result.rows);
  } finally {
    client.release();
  }
});
```

```python
# Python with SQLAlchemy
import os
from sqlalchemy import create_engine

def get_engine():
    instance_connection_name = os.environ["INSTANCE_CONNECTION_NAME"]
    db_user = os.environ["DB_USER"]
    db_pass = os.environ["DB_PASS"]
    db_name = os.environ["DB_NAME"]

    engine = create_engine(
        f"postgresql+pg8000://{db_user}:{db_pass}@/{db_name}",
        connect_args={
            "unix_sock": f"/cloudsql/{instance_connection_name}/.s.PGSQL.5432"
        },
        pool_size=5,
        max_overflow=2,
        pool_timeout=30,
        pool_recycle=1800,
    )
    return engine
```

### Best_practices

- Use connection pooling (max 5-10 per instance)
- Set appropriate idle timeouts
- Handle connection errors gracefully
- Consider Cloud SQL Proxy for local development

### Secret Manager Integration

Securely manage secrets in Cloud Run

**When to use**: API keys, database passwords, Service account keys, Any sensitive configuration

```bash
# Create secret
echo -n "my-secret-value" | gcloud secrets create my-secret --data-file=-

# Mount as environment variable
gcloud run deploy my-service \
  --update-secrets=API_KEY=my-secret:latest

# Mount as file volume
gcloud run deploy my-service \
  --update-secrets=/secrets/api-key=my-secret:latest
```

```javascript
// Access mounted as environment variable
const apiKey = process.env.API_KEY;

// Access mounted as file
const fs = require('fs');
const apiKeyFromFile = fs.readFileSync('/secrets/api-key', 'utf8');

// Access via Secret Manager API (when not mounted)
const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
const client = new SecretManagerServiceClient();

// projectId must be supplied by the caller
async function getSecret(projectId, name) {
  const [version] = await client.accessSecretVersion({
    name: `projects/${projectId}/secrets/${name}/versions/latest`
  });
  return version.payload.data.toString();
}
```

## Sharp Edges

### /tmp Filesystem Counts Against Memory

Severity: HIGH

Situation: Writing files to /tmp directory in Cloud Run

Symptoms: Container killed with OOM error.
Memory usage spikes unexpectedly. File operations cause container restarts. "Container memory limit exceeded" in logs.

Why this breaks: Cloud Run uses an in-memory filesystem for /tmp. Any files written to /tmp consume memory from your container's allocation.

Common scenarios:

- Downloading files temporarily
- Creating temp processing files
- Libraries caching to /tmp
- Large log buffers

A 512MB container that downloads a 200MB file to /tmp only has ~300MB left for the application.

Recommended fix:

## Calculate memory including /tmp usage

```yaml
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'my-service'
      - '--memory=1Gi'  # Include /tmp overhead
      - '--image=gcr.io/$PROJECT_ID/my-service'
```

## Stream instead of buffering

```python
from google.cloud import storage

# process and process_chunk are assumed to be defined elsewhere

# BAD - buffers entire file in /tmp
def process_large_file(bucket_name, blob_name):
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.download_to_filename('/tmp/large_file')
    with open('/tmp/large_file', 'rb') as f:
        process(f.read())

# GOOD - stream processing
def process_large_file(bucket_name, blob_name):
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(blob_name)
    with blob.open('rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            process_chunk(chunk)
```

## Use Cloud Storage for large files

```python
from google.cloud import storage

def process_with_gcs(bucket_name, input_blob, output_blob):
    client = storage.Client()
    bucket = client.bucket(bucket_name)

    # Process directly to/from GCS
    input_blob = bucket.blob(input_blob)
    output_blob = bucket.blob(output_blob)

    with input_blob.open('rb') as reader:
        with output_blob.open('wb') as writer:
            for chunk in iter(lambda: reader.read(65536), b''):
                processed = transform(chunk)
                writer.write(processed)
```

## Monitor memory usage

```python
import psutil
import logging

def log_memory():
    memory = psutil.virtual_memory()
    logging.info(f"Memory: {memory.percent}% used, "
                 f"{memory.available / 1024 / 1024:.0f}MB available")
```

### Concurrency=1 Causes Scaling Bottlenecks

Severity: HIGH

Situation: Setting
concurrency to 1 for request isolation

Symptoms: Auto-scaling creates many container instances. High latency during traffic spikes. Increased cold starts. Higher costs from more instances.

Why this breaks: Setting concurrency to 1 means each container handles only one request at a time. During traffic spikes:

- 100 concurrent requests = 100 container instances
- Each instance has cold start overhead
- More instances = higher costs
- Scaling takes time, requests queue up

This should only be used when:

- Processing is truly single-threaded
- Memory-heavy per-request processing
- Using thread-unsafe libraries

Recommended fix:

## Set appropriate concurrency

```bash
# For I/O-bound workloads (most web apps)
gcloud run deploy my-service \
  --concurrency=80 \
  --max-instances=100

# For CPU-bound workloads
gcloud run deploy my-service \
  --concurrency=4 \
  --cpu=2

# Only use 1 when absolutely necessary
gcloud run deploy my-service \
  --concurrency=1 \
  --max-instances=1000  # Be prepared for many instances
```

## Node.js - use async properly

```javascript
// With high concurrency, ensure async operations
const express = require('express');
const app = express();

app.get('/api/data', async (req, res) => {
  // All I/O should be async
  const data = await fetchFromDatabase();
  const enriched = await enrichData(data);
  res.json(enriched);
});

// Concurrency 80+ is safe for async I/O workloads
```

## Python - use async framework

```python
from fastapi import FastAPI
import asyncio
import httpx

app = FastAPI()

@app.get("/api/data")
async def get_data():
    # Async I/O allows high concurrency
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.com/data")
        return response.json()

# Concurrency 80+ safe with async framework
```

## Calculate concurrency

```
concurrency = memory_limit / per_request_memory

Example:
- 512MB container
- 20MB per request overhead
- Safe concurrency: ~25
```

### CPU Throttled When Not Handling Requests

Severity: HIGH

Situation:
Running background tasks or processing between requests

Symptoms: Background tasks run extremely slowly. Scheduled work doesn't complete. Metrics collection fails. Connection keep-alive breaks.

Why this breaks: By default, Cloud Run throttles CPU to near-zero when not actively handling a request. This is "CPU only during requests" mode.

Affected operations:

- Background threads
- Connection pool maintenance
- Metrics/telemetry emission
- Scheduled tasks within container
- Cleanup operations after response

Recommended fix:

## Enable CPU always allocated

```bash
# CPU allocated even outside requests
gcloud run deploy my-service \
  --no-cpu-throttling \
  --min-instances=1

# Note: This increases costs but enables background work
```

## Use startup CPU boost for initialization

```bash
# Boost CPU during cold start only
# --cpu-throttling is the default: CPU is throttled outside requests
gcloud run deploy my-service \
  --cpu-boost \
  --cpu-throttling
```

## Move background work to Cloud Tasks

```python
from google.cloud import tasks_v2
import json

def create_background_task(payload):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(
        "my-project", "us-central1", "my-queue"
    )

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://my-service.run.app/process",
            "body": json.dumps(payload).encode(),
            "headers": {"Content-Type": "application/json"}
        }
    }

    client.create_task(parent=parent, task=task)

# Handle response immediately, background via Cloud Tasks
# (app, Order and save_order are assumed to be defined elsewhere)
@app.post("/api/order")
async def create_order(order: Order):
    order_id = await save_order(order)

    # Queue background processing
    create_background_task({"order_id": order_id})

    return {"order_id": order_id, "status": "processing"}
```

## Use Pub/Sub for async processing

```yaml
# Move heavy processing to separate service
steps:
  # Main service - responds quickly
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'api-service', '--cpu-throttling']

  # Worker service - processes messages
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'worker-service', '--no-cpu-throttling', '--min-instances=1']
```

### VPC Connector 10-Minute Idle Timeout

Severity: MEDIUM

Situation: Cloud Run service connecting to VPC resources

Symptoms: Connection errors after period of inactivity. "Connection reset" or "Connection refused" errors. Sporadic failures to VPC resources. Database connections drop unexpectedly.

Why this breaks: Cloud Run's VPC connector has a 10-minute idle timeout on connections. If a connection is idle for 10 minutes, it's silently closed.

Affects:

- Database connection pools
- Redis connections
- Internal API connections
- Any persistent VPC connection

Recommended fix:

## Configure connection pool with keep-alive

```python
# SQLAlchemy with connection recycling
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=2,
    pool_recycle=300,    # Recycle connections every 5 minutes
    pool_pre_ping=True   # Validate connection before use
)
```

## TCP keep-alive for custom connections

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```

## Redis with connection validation

```python
import socket

import redis

pool = redis.ConnectionPool(
    host=REDIS_HOST,
    port=6379,
    socket_keepalive=True,
    socket_keepalive_options={
        socket.TCP_KEEPIDLE: 60,
        socket.TCP_KEEPINTVL: 60,
        socket.TCP_KEEPCNT: 5
    },
    health_check_interval=30
)
client = redis.Redis(connection_pool=pool)
```

## Use the Cloud SQL Python Connector

```text
# Use Cloud SQL connector which handles reconnection
# requirements.txt
cloud-sql-python-connector[pg8000]
```

```python
from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

def getconn():
    return connector.connect(
        "project:region:instance",
        "pg8000",
        user="user",
        password="password",
        db="database"
    )

engine = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn
)
```

### Container Startup Timeout (4 minutes max)

Severity: HIGH

Situation: Deploying containers with slow initialization

Symptoms: Deployment fails with "Container failed to start". Service never becomes healthy. "Revision failed to become ready" errors. Works locally but fails on Cloud Run.

Why this breaks: Cloud Run expects your container to start listening on PORT within 4 minutes (240 seconds). If it doesn't, the instance is killed.

Common causes:

- Heavy framework initialization (ML models, etc.)
- Waiting for external dependencies at startup
- Large dependency loading
- Database migrations on startup

Recommended fix:

## Enable startup CPU boost

```bash
gcloud run deploy my-service \
  --cpu-boost
```

## Lazy initialization

```python
from functools import lru_cache
from fastapi import FastAPI

app = FastAPI()

# Don't load at import time
@lru_cache(maxsize=1)
def get_model():
    # Load on first request, not at startup; lru_cache keeps the one instance
    # (load_heavy_model is assumed to be defined elsewhere)
    return load_heavy_model()

@app.get("/predict")
async def predict(data: dict):
    model = get_model()  # Loads on first call only
    return model.predict(data)

# Startup is fast - model loads on first request
```

## Start listening immediately

```python
import asyncio
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Global state for async initialization
initialized = asyncio.Event()

@app.on_event("startup")
async def startup():
    # Start background initialization
    asyncio.create_task(async_init())

async def async_init():
    # Heavy initialization happens after server starts
    await load_models()
    await warm_up_connections()
    initialized.set()

@app.get("/ready")
async def ready():
    if not initialized.is_set():
        raise HTTPException(503, "Still initializing")
    return {"status": "ready"}

@app.get("/health")
async def health():
    # Always respond - health check passes
    return {"status": "healthy"}
```

## Use multi-stage builds

```dockerfile
# Build stage - slow
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage - fast startup
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
# Read the port from PORT (Cloud Run sets it), falling back to 8080
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}
```

## Run migrations separately

```yaml
# Don't migrate on startup - use Cloud Build
steps:
  # Run migrations first
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud run jobs execute migrate-job --wait

  # Then deploy
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'my-service', ...]
```

### Second Generation Execution Environment Differences

Severity: MEDIUM

Situation: Migrating to or using Cloud Run second-gen execution environment

Symptoms: Network behavior changes. Different syscall support. File system behavior differences. Container behaves differently than in first-gen.
Why this breaks: Cloud Run's first-generation execution environment runs containers in the gVisor sandbox. The second-generation environment drops gVisor in favor of full Linux compatibility, so it has different characteristics:

- More Linux syscalls supported
- Full /proc and /sys access
- Different network stack
- No automatic HTTPS redirect
- Different tmp filesystem behavior

Recommended fix:

## Explicitly set execution environment

```bash
# First generation (legacy)
gcloud run deploy my-service \
  --execution-environment=gen1

# Second generation (recommended for most)
gcloud run deploy my-service \
  --execution-environment=gen2
```

## Handle network differences

```python
# Second-gen doesn't auto-redirect HTTP to HTTPS
from fastapi import FastAPI, Request
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.middleware("http")
async def redirect_https(request: Request, call_next):
    # Check X-Forwarded-Proto header
    if request.headers.get("X-Forwarded-Proto") == "http":
        url = request.url.replace(scheme="https")
        return RedirectResponse(str(url), status_code=301)
    return await call_next(request)
```

## GPU access (second-gen only)

```bash
# GPUs only available in second-gen
gcloud run deploy ml-service \
  --execution-environment=gen2 \
  --gpu=1 \
  --gpu-type=nvidia-l4
```

## Check execution environment

```python
def get_execution_environment():
    # Heuristic only: gVisor is the first-gen sandbox, so a gVisor marker
    # suggests gen1. Whether /proc/version carries that marker is an
    # assumption; prefer the revision's configured setting when available.
    try:
        with open('/proc/version', 'r') as f:
            version = f.read()
    except OSError:
        return 'unknown'
    return 'gen1' if 'gVisor' in version else 'gen2'
```

### Request Timeout Configuration Mismatch

Severity: MEDIUM

Situation: Long-running requests or background processing

Symptoms: Requests terminated before completion. 504 Gateway Timeout errors. Processing stops unexpectedly. Inconsistent timeout behavior.
Why this breaks: Cloud Run has multiple timeout configurations that must align:

- Request timeout (default 300s, max 3600s, i.e. 60 minutes, for both HTTP and gRPC)
- Client timeout
- Downstream service timeouts
- Load balancer timeout (for external access)

Recommended fix:

## Set consistent timeouts

```bash
# Increase request timeout (max 3600s for HTTP)
gcloud run deploy my-service \
  --timeout=900  # 15 minutes
```

## Handle long-running with webhooks

```python
from fastapi import FastAPI, BackgroundTasks
import httpx

app = FastAPI()

# create_task_id and heavy_computation are assumed to be defined elsewhere.
# Background tasks run after the response is sent, so they need CPU
# allocated outside requests (see "CPU Throttled When Not Handling Requests").
@app.post("/process")
async def process(data: dict, background_tasks: BackgroundTasks):
    task_id = create_task_id()

    # Start background processing
    background_tasks.add_task(
        long_running_process,
        task_id,
        data,
        data.get("callback_url")
    )

    # Return immediately
    return {"task_id": task_id, "status": "processing"}

async def long_running_process(task_id, data, callback_url):
    result = await heavy_computation(data)

    # Callback when done
    if callback_url:
        async with httpx.AsyncClient() as client:
            await client.post(callback_url, json={
                "task_id": task_id,
                "result": result
            })
```

## Use Cloud Tasks for reliable long-running

```python
import json

from google.cloud import tasks_v2

# PROJECT and REGION are assumed to be defined elsewhere
def create_long_running_task(data):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(PROJECT, REGION, "long-tasks")

    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://worker.run.app/process",
            "body": json.dumps(data).encode(),
            "headers": {"Content-Type": "application/json"}
        },
        "dispatch_deadline": {"seconds": 1800}  # 30 min
    }

    return client.create_task(parent=parent, task=task)
```

## Streaming for long responses

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/large-report")
async def large_report():
    async def generate():
        # process_large_data is assumed to yield chunks of text
        for chunk in process_large_data():
            yield chunk

    return StreamingResponse(generate(), media_type="text/plain")
```

## Validation Checks

### Hardcoded GCP Credentials

Severity: ERROR

GCP
credentials must never be hardcoded in source code

Message: Hardcoded GCP service account credentials. Use Secret Manager or Workload Identity.

### GCP API Key in Source Code

Severity: ERROR

API keys should use Secret Manager

Message: Hardcoded GCP API key. Use Secret Manager.

### Credentials JSON File in Repository

Severity: ERROR

Service account JSON files should not be in source control

Message: Credentials file detected. Add to .gitignore and use Secret Manager.

### Running as Root User

Severity: WARNING

Containers should not run as root for security

Message: Dockerfile runs as root. Add USER directive for security.

### Missing Health Check in Dockerfile

Severity: INFO

Cloud Run uses HTTP health checks, Dockerfile HEALTHCHECK is optional

Message: No HEALTHCHECK in Dockerfile. Cloud Run uses its own health checks.

### Hardcoded Port in Application

Severity: WARNING

Port should come from PORT environment variable

Message: Hardcoded port. Use PORT environment variable for Cloud Run.

### Large File Writes to /tmp

Severity: WARNING

/tmp uses container memory, large writes can cause OOM

Message: /tmp writes consume memory. Consider Cloud Storage for large files.

### Synchronous File Operations

Severity: WARNING

Sync file ops block the event loop in async apps

Message: Synchronous file operations. Use async versions for better concurrency.

### Global Mutable State

Severity: WARNING

Global state issues with concurrent requests

Message: Global mutable state may cause issues with concurrent requests.

### Thread-Unsafe Singleton Pattern

Severity: WARNING

Singletons need thread safety for concurrency > 1

Message: Singleton pattern - ensure thread safety if using concurrency > 1.
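
To make two of the checks above concrete, here is a minimal Python sketch of how "Running as Root User" and "Hardcoded Port in Application" could be detected with plain regular expressions. The function names (`runs_as_root`, `has_hardcoded_port`, `check`) and the matching rules are hypothetical illustrations, not how any particular linter implements these checks, and the patterns will miss less common spellings.

```python
import re


def runs_as_root(dockerfile_text: str) -> bool:
    """Return True when the final build stage never switches to a non-root user."""
    # Only the last stage ends up in the image, so earlier USER lines are ignored.
    stages = re.split(r"(?im)^\s*FROM\s", dockerfile_text)
    final_stage = stages[-1]
    users = re.findall(r"(?im)^\s*USER\s+(\S+)", final_stage)
    # No USER directive, or a last USER of root/0, means the container runs as root.
    return not users or users[-1].split(":")[0] in ("root", "0")


def has_hardcoded_port(source_text: str) -> bool:
    """Return True when listen() is called with a numeric literal."""
    return re.search(r"\.listen\(\s*\d+", source_text) is not None


def check(dockerfile_text: str, source_text: str) -> list:
    """Collect (severity, message) findings using the messages listed above."""
    findings = []
    if runs_as_root(dockerfile_text):
        findings.append(
            ("WARNING", "Dockerfile runs as root. Add USER directive for security.")
        )
    if has_hardcoded_port(source_text):
        findings.append(
            ("WARNING", "Hardcoded port. Use PORT environment variable for Cloud Run.")
        )
    return findings
```

Calling `check(dockerfile_text, source_text)` with the contents of a Dockerfile and an entry-point file returns an empty list when both checks pass. Looking only at the final stage keeps multi-stage builds from being misjudged by a `USER` line in a discarded builder stage.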
## Collaboration

### Delegation Triggers

- user needs AWS serverless -> aws-serverless (Lambda, API Gateway, SAM)
- user needs Azure containers -> azure-functions (Azure Container Apps, Functions)
- user needs database design -> postgres-wizard (Cloud SQL design, AlloyDB)
- user needs authentication -> auth-specialist (Firebase Auth, Identity Platform)
- user needs AI integration -> llm-architect (Vertex AI, Cloud Run + LLM)
- user needs workflow orchestration -> workflow-automation (Cloud Workflows, Eventarc)

## When to Use

Use this skill when the request clearly matches the capabilities and patterns described above.

## Limitations

- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.