--- name: palantir-prod-checklist description: 'Execute Palantir Foundry production deployment checklist and rollback procedures. Use when deploying Foundry integrations to production, preparing for launch, or implementing go-live procedures. Trigger with phrases like "palantir production", "deploy foundry", "palantir go-live", "foundry launch checklist". ' allowed-tools: Read, Bash(kubectl:*), Bash(curl:*), Grep version: 2.0.0 license: MIT author: Jeremy Longshore tags: - saas - palantir - foundry - production - deployment compatibility: Designed for Claude Code, also compatible with Codex and OpenClaw --- # Palantir Production Checklist ## Overview Complete go-live checklist for deploying Foundry-integrated applications to production. Covers credential management, health checks, monitoring, and rollback procedures. ## Prerequisites - Staging environment tested and verified - Production OAuth2 credentials from Developer Console - Deployment pipeline configured - Monitoring infrastructure ready ## Instructions ### Pre-Deployment: Credentials & Config - [ ] OAuth2 client credentials in secrets manager (not personal tokens) - [ ] Scopes are minimal: only what the app actually needs - [ ] `FOUNDRY_HOSTNAME` points to production enrollment - [ ] Separate credentials from staging (not shared) - [ ] Credential rotation schedule documented (90-day max) ### Code Quality - [ ] All tests passing including Foundry integration tests - [ ] No hardcoded hostnames, tokens, or RIDs - [ ] Error handling covers all Foundry `ApiError` status codes - [ ] Rate limiting with exponential backoff implemented - [ ] Logging uses structured format (JSON) with request IDs ### Infrastructure - [ ] Health check endpoint verifies Foundry connectivity ```python @app.get("/health") async def health(): try: client.ontologies.Ontology.list() return {"status": "healthy", "foundry": "connected"} except foundry.ApiError as e: return {"status": "degraded", "foundry": f"error_{e.status_code}"} ``` - [ ] Circuit breaker pattern for Foundry API calls - [ ] Graceful degradation when Foundry is unreachable - [ ] Timeout configuration: 30s for reads, 60s for writes - [ ] Connection pooling configured ### Monitoring & Alerting - [ ] Metrics: request count, latency p50/p99, error rate by status code - [ ] Alert: 5xx error rate > 5% for 5 minutes → P1 - [ ] Alert: p99 latency > 10s for 10 minutes → P2 - [ ] Alert: 429 rate > 10/min → P2 (tune rate limiter) - [ ] Alert: 401/403 errors → P1 (credential issue) - [ ] Dashboard with Foundry API health summary ### Documentation - [ ] Incident runbook: `palantir-incident-runbook` - [ ] Credential rotation procedure documented - [ ] Rollback procedure documented and tested - [ ] On-call escalation path defined - [ ] Foundry support contact info available ### Deploy ```bash set -euo pipefail # Pre-flight curl -sf "https://$FOUNDRY_HOSTNAME/api/v2/ontologies" \ -H "Authorization: Bearer $FOUNDRY_TOKEN" > /dev/null \ && echo "Foundry API reachable" || echo "BLOCKED: Foundry unreachable" # Deploy with canary kubectl set image deployment/my-app app=myimage:v2.0.0 --record kubectl rollout status deployment/my-app --timeout=300s ``` ### Rollback ```bash kubectl rollout undo deployment/my-app kubectl rollout status deployment/my-app ``` ## Output - Production deployment with verified Foundry connectivity - Health checks passing - Monitoring and alerting active - Rollback procedure tested ## Error Handling | Alert | Condition | Severity | |-------|-----------|----------| | Foundry Unreachable | Health check fails 3x | P1 | | Auth Failure | Any 401/403 | P1 | | Rate Limited | 429 > 10/min | P2 | | High Latency | p99 > 10s | P2 | ## Resources - [Foundry API Reference](https://www.palantir.com/docs/foundry/api/general/overview/introduction) - [Foundry Documentation](https://www.palantir.com/docs/foundry) ## Next Steps For version upgrades, see `palantir-upgrade-migration`.