---
name: deploy-to-production
description: Comprehensive guide for deploying Orient to production. Use this skill when deploying changes, updating production, fixing deployment failures, or rolling back. Covers pre-flight checks, environment variables, Docker compose configuration, CI/CD pipeline, smart change detection, and health verification.
---

# Deploy to Production

## Quick Reference

### Deploy via GitHub Actions (Recommended)

```bash
# Push to main triggers automatic deployment
git push origin main

# Watch deployment progress
gh run watch --exit-status

# Check deployment status
gh run list --limit 5
```

### Force Rebuild All Images

When you need to bypass change detection and rebuild everything:

```bash
# Via GitHub Actions UI: Run workflow with "Force rebuild all images" checked
# Or use workflow_dispatch:
gh workflow run deploy.yml -f force_build_all=true
```

### Manual Deployment (Emergency)

```bash
# SSH to server
ssh $OCI_USER@$OCI_HOST

# Navigate to docker directory
cd ~/orient/docker

# IMPORTANT: Always use --env-file to load environment variables
COMPOSE_CMD="sudo docker compose --env-file ../.env -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml"

# Pull latest images
$COMPOSE_CMD pull

# Start services (recreates containers with current .env values)
$COMPOSE_CMD up -d

# Or as single commands:
sudo docker compose --env-file ../.env -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml pull
sudo docker compose --env-file ../.env -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml up -d
```

**Note**: Always pass `--env-file ../.env` to ensure environment variables are loaded. Without it, Docker Compose reads only the shell environment and any `.env` file in the project directory (`~/orient/docker`), so values in `~/orient/.env` are ignored.

## Smart Change Detection

The CI/CD pipeline uses change detection to rebuild an image only when its source code changes.
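As a rough illustration of path-based change detection, the sketch below maps changed file paths to build flags. It is not the actual `detect-changes` job: the function name `detect_changes` is hypothetical, the path patterns are copied from the trigger table under "How It Works" and the website file list in this guide, and the output format mirrors the example output shown for website-only changes.

```bash
# Hypothetical sketch: read changed paths (one per line) on stdin, print build flags.
# Patterns are assumptions taken from this guide's tables, not from deploy.yml itself.
detect_changes() {
  opencode=false; whatsapp=false; dashboard=false; website=false
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      # Dependency changes and shared core code rebuild every image
      package.json|pnpm-lock.yaml|packages/core/*)
        opencode=true; whatsapp=true; dashboard=true ;;
      src/*|packages/mcp-tools/*|packages/mcp-servers/*|packages/agents/*|docker/Dockerfile.opencode*)
        opencode=true ;;
      packages/bot-whatsapp/*|packages/database/*)
        whatsapp=true ;;
      packages/dashboard/*|packages/dashboard-frontend/*)
        dashboard=true ;;
      website/docs/*|website/src/*|website/static/*|website/docusaurus.config.ts|website/sidebars.ts)
        website=true ;;
    esac
  done
  echo "changes_opencode: $opencode"
  echo "changes_whatsapp: $whatsapp"
  echo "changes_dashboard: $dashboard"
  echo "changes_website: $website"
}
```

For instance, `git diff --name-only HEAD~1 HEAD | detect_changes` would print the flags for the most recent commit; which base ref the real workflow compares against is not stated in this guide.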
### How It Works

The `detect-changes` job analyzes which files changed and sets build flags:

| Image          | Triggered By Changes In                                                                                                               |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **OpenCode**   | `src/**`, `packages/core/**`, `packages/mcp-tools/**`, `packages/mcp-servers/**`, `packages/agents/**`, `docker/Dockerfile.opencode*` |
| **WhatsApp**   | `packages/bot-whatsapp/**`, `packages/core/**`, `packages/database/**`                                                                |
| **Dashboard**  | `packages/dashboard/**`, `packages/dashboard-frontend/**`, `packages/core/**`                                                         |
| **All Images** | `package.json`, `pnpm-lock.yaml` (dependency changes)                                                                                 |

### Time Savings

| Scenario                            | Old Pipeline | New Pipeline |
| ----------------------------------- | ------------ | ------------ |
| Single package change               | ~20 min      | ~5-8 min     |
| Config-only change (nginx, compose) | ~20 min      | ~3 min       |
| Website-only change (Docusaurus)    | ~20 min      | ~2-3 min     |
| All packages change                 | ~20 min      | ~20 min      |

### Website-Only Deployments

When you modify only website files (Docusaurus documentation), the deployment is much faster because the build stage is skipped entirely.

**What Gets Skipped:**

- No Docker image builds (OpenCode, WhatsApp, Dashboard)
- No image pushes to GHCR
- No container recreation on the server

**What Still Runs:**

1. **Detect Changes** (~8s) - Identifies website-only changes
2. **Run Tests** (~40s) - Runs test suite
3. **Deploy to Oracle Cloud** (~2min) - Only syncs website files and restarts nginx

**Files That Trigger Website-Only Deployment:**

- `website/docs/**` - Documentation markdown files
- `website/src/**` - Custom React pages (e.g., privacy.tsx, terms.tsx)
- `website/docusaurus.config.ts` - Site configuration
- `website/static/**` - Static assets
- `website/sidebars.ts` - Navigation configuration

**Example Deployment:**

```bash
# Change detection output for website-only changes
changes_opencode: false
changes_whatsapp: false
changes_dashboard: false
changes_website: true

# Build jobs are skipped
Build OpenCode Image: skipped
Build WhatsApp Bot Image: skipped
Build Dashboard Image: skipped

# Deploy job syncs website files
Deploy to Oracle Cloud: success (2min)
```

**What Gets Deployed:**

The deploy job syncs the `website/` directory to the server and runs:

```bash
# Build static Docusaurus site
cd ~/orient/website && npm run build

# Nginx serves the built site from website/build/
# No container restarts needed (except nginx reload for config changes)
```

**Quick Website Updates:**

Documentation and content changes deploy in under 3 minutes total, which suits rapid iteration on privacy policies, terms, blog posts, or documentation updates.

### Workflow Jobs

```
        detect-changes (8s)
                |
            test (40s)
                |
           +----+----+----+
           |    |    |    |
           v    v    v    v
build-opencode build-whatsapp   build-dashboard (conditional)
       |              |                |
       +------+-------+----------------+
              |
              v
        deploy (2min)
```

## Pre-Deployment Checklist

### 1. Local Validation

Before pushing changes, always verify locally:

```bash
# Run tests (CI mode excludes e2e and eval tests)
pnpm run test:ci

# Run Docker validation tests
pnpm turbo test --filter @orientbot/core...

# Validate Docker compose syntax
cd docker
docker compose -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml config --services
```

### 2. Check Service Names Consistency

The v2 compose uses specific service names:

| Service   | V2 Service Name | Container Name     | Notes                         |
| --------- | --------------- | ------------------ | ----------------------------- |
| Dashboard | dashboard       | orienter-dashboard | Includes WhatsApp integration |
| OpenCode  | opencode        | orienter-opencode  |                               |
| Slack     | bot-slack       | orienter-bot-slack | Optional                      |
| Nginx     | nginx           | orienter-nginx     |                               |

**v0.2.0 Architecture Change**: WhatsApp is now integrated into the dashboard service. There is no separate `bot-whatsapp` service in v0.2.0.

### 3. Environment Variables & GitHub Secrets

**CRITICAL**: Environment variables must be properly configured in three places:

1. `.env.production` file (local reference)
2. GitHub Secrets (for CI/CD)
3. Server `.env` file at `/home/opc/orient/.env`

#### Managing GitHub Secrets

**Update all secrets from .env.production**:

```bash
grep -E '^[A-Z_][A-Z0-9_]*=' .env.production | while IFS='=' read -r key value; do
  # Strip one pair of surrounding double quotes, if present
  value=$(printf '%s' "$value" | sed 's/^"//; s/"$//')
  echo "Setting: $key"
  printf '%s' "$value" | gh secret set "$key" --repo orient-core/orient
done
```

#### Critical Environment Variables

Required for production:

```bash
# Database (SQLite)
SQLITE_DB_PATH=/app/data/orient.db

# Dashboard Security (REQUIRED - causes crash loop if missing)
DASHBOARD_JWT_SECRET="<32+ character secure string>"

# Storage (R2)
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_ACCOUNT_ID=

# OAuth Callbacks (must match registered URLs)
OAUTH_CALLBACK_URL=https://app.orient.bot/oauth/callback
GOOGLE_OAUTH_CALLBACK_URL=https://app.orient.bot/oauth/google/callback

# API Keys
ANTHROPIC_API_KEY=
OPENAI_API_KEY=

# Slack Configuration (optional)
SLACK_BOT_TOKEN=
SLACK_SIGNING_SECRET=
SLACK_APP_TOKEN=
```

#### Applying Environment Variable Changes

**IMPORTANT**: `docker restart` does NOT reload environment variables from `.env`. A container's environment is fixed when the container is created, so the container must be recreated:
```bash
# WRONG - Won't pick up new env vars
ssh $OCI_USER@$OCI_HOST "docker restart orienter-dashboard"

# CORRECT - Recreates container with new env vars
ssh $OCI_USER@$OCI_HOST "cd /home/opc/orient/docker && \
  docker compose --env-file ../.env \
  -f docker-compose.v2.yml \
  -f docker-compose.prod.yml \
  -f docker-compose.r2.yml \
  up -d dashboard"
```

## CI/CD Pipeline

### GitHub Actions Workflow (.github/workflows/deploy.yml)

The deployment pipeline:

1. **Detect Changes** - Determines which images need rebuilding (8s)
2. **Tests** - Runs `pnpm run test:ci` (excludes e2e/eval tests)
3. **Build Images** - Only builds changed packages (conditional)
4. **Deploy** - Syncs files and restarts services

### Common CI Failures

| Issue                                       | Cause                     | Fix                             |
| ------------------------------------------- | ------------------------- | ------------------------------- |
| `Cannot find package`                       | Missing devDependency     | Check pnpm-lock.yaml            |
| `No test found in suite`                    | Eval tests included       | Use `test:ci` instead of `test` |
| Dockerfile not found                        | Path changed              | Update workflow matrix          |
| Container name conflict                     | V1/V2 name mismatch       | Clean up both names             |
| `Missing parameter name at index 1: *`      | Express 5 breaking change | Use `/{*splat}` not `*`         |
| `SKILL.md file(s) with invalid frontmatter` | Missing YAML metadata     | Add `---` delimited frontmatter |

### Skill File Validation Failures

The CI pipeline validates all SKILL.md files have proper YAML frontmatter metadata.

**Error Example:**

```
Error: Found 2 SKILL.md file(s) with invalid frontmatter:
  - .claude/skills/personal-vite-jsx-caching-fix/SKILL.md: File does not start with frontmatter delimiter (---)
  - .claude/skills/personal-crypto-secrets-management/SKILL.md: File does not start with frontmatter delimiter (---)
```

**Required YAML Frontmatter Format:**

```yaml
---
name: my-skill-name
description: "Brief description of what this skill does"
---

# Skill Title

... rest of skill content ...
```

**Common Issues with Multi-Repo Setups:**

When using a personal fork (e.g., `orient-core/orient`) that has additional skills not in the OSS repo (`orient-bot/orient`):

1. **Personal skills are gitignored in OSS** - Files starting with `personal-` are in `.gitignore`
2. **Tests run on ALL skill files** - Including personal skills that may lack frontmatter
3. **OSS repo passes, personal repo fails** - Because personal skills weren't tested upstream

**Recovery Workflow:**

```bash
# 1. Checkout the failing repo's main branch
git fetch deploy main
git checkout -B fix-skill-frontmatter deploy/main

# 2. Find skills missing frontmatter
grep -L "^---" .claude/skills/*/SKILL.md

# 3. Add frontmatter to each file
# File must START with --- (no content before it)

# 4. Commit and push fix
git add .claude/skills/
git commit -m "fix(skills): add YAML frontmatter to skill files"
git push deploy fix-skill-frontmatter:main

# 5. Re-trigger deployment
gh workflow run deploy.yml -f force_build_all=true --repo YOUR_ORG/YOUR_REPO
```

**Validation Test Location:** `tests/config/skill-files.test.ts`

### Express 5 / path-to-regexp v8 Breaking Changes

Express 5 uses path-to-regexp v8, which has breaking changes:

**Problem**: Bare `*` wildcards no longer work

```typescript
// BROKEN in Express 5
app.get('*', (req, res) => { ... });

// FIXED - use named wildcard
app.get('/{*splat}', (req, res) => { ... });
```

**Error message**: `TypeError: Missing parameter name at index 1: *`

## Health Verification

### Production Health Checks

```bash
# Check all containers
ssh $OCI_USER@$OCI_HOST "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Check specific services
curl -sf https://app.orient.bot/health                 # Nginx
curl -sf https://code.orient.bot/global/health         # OpenCode
curl -sf https://app.orient.bot/dashboard/api/health   # Dashboard
```

### Expected Container Names

- `orienter-nginx`
- `orienter-opencode`
- `orienter-dashboard`

## Rollback Procedure

### Automatic Rollback

The CI pipeline automatically rolls back if health checks fail.

### Manual Rollback

```bash
ssh $OCI_USER@$OCI_HOST
cd ~/orient/docker

COMPOSE_FILES="-f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml"

# Find latest backup
ls -t ~/orient/backups | head -5

# Restore
LATEST=$(ls -t ~/orient/backups | head -1)
sudo docker compose ${COMPOSE_FILES} down
cp -f ~/orient/backups/${LATEST}/*.yml .
sudo docker compose ${COMPOSE_FILES} up -d
```

## Troubleshooting

### Multi-Repository Deployment (Self-Hosted Runners)

#### Jobs Stuck in "Queued" State

**Error**: Docker build jobs show "queued" status for 10+ minutes without starting.

**Cause**: The workflow requires self-hosted ARM64 runners (`runs-on: [self-hosted, linux, arm64]`), but you triggered the workflow on a repository that doesn't have the runner registered.
**Diagnosis**:

```bash
# Check your git remotes
git remote -v

# Example output showing two repos:
# deploy  https://github.com/orient-core/orient.git (fetch/push)  ← Has self-hosted runner
# origin  https://github.com/orient-bot/orient.git (fetch/push)   ← OSS repo, no runner

# Check which repo the runner is registered to
ssh opc@152.70.172.33 "systemctl status actions.runner.* 2>/dev/null | head -5"

# Look for: actions.runner.orient-core-orient.oracle-arm64.service
#                          ^^^^^^^^^^^^ This shows the org/repo
```

**Fix**: Trigger the workflow on the repository that has the self-hosted runner:

```bash
# Cancel the stuck workflow
gh run cancel --repo orient-bot/orient

# Trigger on the correct repo (orient-core/orient has the runner)
gh workflow run deploy.yml -f force_build_all=true --repo orient-core/orient

# Monitor the new workflow
gh run list --repo orient-core/orient --limit 3
gh run watch --repo orient-core/orient --exit-status
```

**Understanding the Two Repositories**:

| Repository           | Remote   | Purpose                 | Self-Hosted Runner |
| -------------------- | -------- | ----------------------- | ------------------ |
| `orient-bot/orient`  | `origin` | Open source repo        | ❌ No              |
| `orient-core/orient` | `deploy` | Private deployment repo | ✅ Yes (ARM64)     |

**Key Points**:

- The OSS repo (`orient-bot/orient`) doesn't have self-hosted runners - Docker builds will never start
- Production deployments must be triggered on `orient-core/orient` where the ARM64 runner is registered
- The runner is running on the Oracle Cloud server as a systemd service
- GitHub-hosted runners won't work because the workflow specifies `[self-hosted, linux, arm64]`

**Verifying Runner Status**:

```bash
# Check if the runner service is running on the server
ssh opc@152.70.172.33 "systemctl status actions.runner.orient-core-orient.oracle-arm64.service"

# Should show:
#   Active: active (running)
#   ... Listening for Jobs

# Check recent job history
ssh opc@152.70.172.33 "journalctl -u actions.runner.orient-core-orient.oracle-arm64.service --since '1 hour ago' | grep -E '(Running job|completed)'"
```

### GitHub Actions SSH Authentication

#### OCI_SSH_PRIVATE_KEY Secret

**Error**: `Load key "/home/runner/.ssh/id_rsa": error in libcrypto` or `Permission denied (publickey)`

**Cause**: The `OCI_SSH_PRIVATE_KEY` secret is missing or malformed.

**Fix**: Add your SSH private key to GitHub Secrets:

```bash
# Add via gh CLI (recommended)
gh secret set OCI_SSH_PRIVATE_KEY --repo orient-bot/orient < ~/.ssh/id_rsa

# Or copy the key content and add via GitHub UI
cat ~/.ssh/id_rsa | pbcopy
# Then: Settings → Secrets → Actions → New repository secret
```

**Required format**: The full private key including headers:

```
-----BEGIN OPENSSH PRIVATE KEY-----
... key content ...
-----END OPENSSH PRIVATE KEY-----
```

**Note**: The key must match what's authorized on the Oracle server (`~/.ssh/authorized_keys`).

### GHCR Package Access (403 Forbidden)

**Error**: `failed to resolve reference "ghcr.io/orient-bot/orient/dashboard:latest": 403 Forbidden`

**Cause**: GitHub Container Registry packages are private by default, even in public repos.

**Fixes**:

1. **Make packages public** (recommended for open source):
   - Go to https://github.com/orgs/orient-bot/packages
   - Click each package → Settings → Danger Zone → Change visibility → Public

2. **Or use a PAT with `read:packages` scope**:

   ```bash
   # Create a PAT at https://github.com/settings/tokens with read:packages scope
   # Add to GitHub Secrets as GHCR_PAT

   # In workflow, authenticate before pulling:
   echo "${{ secrets.GHCR_PAT }}" | docker login ghcr.io -u USERNAME --password-stdin
   ```

3. **Or authenticate the server permanently**:

   ```bash
   # On the Oracle server
   gh auth token | sudo docker login ghcr.io -u orient-bot --password-stdin
   ```

### SQLite Database (v0.2.0+)

Orient v0.2.0 uses SQLite instead of PostgreSQL, which simplifies deployment significantly.

#### How Migrations Work

1. **Automatic on startup**: The dashboard container automatically applies Drizzle migrations when it starts
2. **Migration files**: Located in `packages/database/drizzle/sqlite/`
3. **Migrations table**: Drizzle creates `__drizzle_migrations` to track applied migrations

#### Database Location

- **In container**: `/app/data/orient.db`
- **On host**: `~/orient/data/orient.db`
- **Volume mount**: `~/orient/data` → `/app/data`

#### Common SQLite Issues

**Read-only database error**:

```
SqliteError: attempt to write a readonly database
```

**Cause**: The database file or WAL files aren't writable by the nodejs user (UID 1001).

**CRITICAL**: SQLite uses Write-Ahead Logging (WAL) which creates `.db-shm` and `.db-wal` files. ALL THREE files must be owned by UID 1001:

```bash
# Check current ownership
ssh opc@152.70.172.33 "ls -la ~/orient/data/orient.db*"

# Fix ALL SQLite files (main db + WAL files)
ssh opc@152.70.172.33 "sudo chown 1001:1001 ~/orient/data/orient.db* ~/orient/data/"

# Restart dashboard
ssh opc@152.70.172.33 "cd ~/orient/docker && sudo docker compose --env-file ../.env -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml restart dashboard"
```

**Why this happens**: Manual database access (e.g., using `sqlite3` via alpine container) creates WAL files owned by root or opc, making the database read-only for the container.

**No such table error**:

```
SqliteError: no such table: scheduled_jobs
```

**Cause**: Drizzle auto-migration failed. This can happen if the migrations table syntax is incompatible.
**Manual Migration Recovery**:

```bash
ssh opc@152.70.172.33 '
# Get migration SQL from container
sudo docker run --rm ghcr.io/orient-core/orient/dashboard:latest \
  cat /app/packages/database/drizzle/sqlite/0000_many_william_stryker.sql > /tmp/migration.sql

# Apply using alpine container with sqlite
sudo docker run --rm -v ~/orient/data:/data -v /tmp/migration.sql:/tmp/migration.sql alpine sh -c "
  apk add --no-cache sqlite > /dev/null 2>&1
  cd /data

  # Create migrations tracking table
  sqlite3 orient.db \"CREATE TABLE IF NOT EXISTS __drizzle_migrations (id INTEGER PRIMARY KEY AUTOINCREMENT, hash TEXT NOT NULL, created_at INTEGER);\"

  # Apply the migration
  grep -v \"^-->\" /tmp/migration.sql | sqlite3 orient.db

  # Mark migration as applied
  sqlite3 orient.db \"INSERT OR IGNORE INTO __drizzle_migrations (hash, created_at) VALUES ('\''0000_many_william_stryker'\'', strftime('\''%s'\'', '\''now'\''));\"

  echo \"Tables created:\"
  sqlite3 orient.db \".tables\"
"

# Fix permissions on ALL SQLite files (the sqlite3 run above executes as root
# and may leave root-owned .db-shm/.db-wal files), then restart
sudo chown 1001:1001 ~/orient/data/orient.db*
sudo docker restart orienter-dashboard
'
```

#### Verifying Database Health

```bash
# Check database file exists and has correct permissions
ssh opc@152.70.172.33 "ls -la ~/orient/data/orient.db"

# List tables in database
ssh opc@152.70.172.33 "sudo docker run --rm -v ~/orient/data:/data alpine sh -c 'apk add sqlite > /dev/null; sqlite3 /data/orient.db .tables'"

# Check migration status
ssh opc@152.70.172.33 "sudo docker run --rm -v ~/orient/data:/data alpine sh -c 'apk add sqlite > /dev/null; sqlite3 /data/orient.db \"SELECT * FROM __drizzle_migrations\"'"
```

### Database Migration Failures (Legacy PostgreSQL)

**Note**: v0.2.0+ uses SQLite. This section is for legacy PostgreSQL deployments only.
**If migrations fail**, check the logs:

```bash
ssh opc@152.70.172.33 "docker logs orienter-dashboard --tail 100 | grep -i migration"
```

#### Production Down After Failed Deploy

If deployment fails partway through, production containers may be stopped:

```bash
# Check what's running
ssh opc@152.70.172.33 "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Restart production manually
ssh opc@152.70.172.33 "cd ~/orient/docker && \
  sudo docker compose --env-file ../.env \
  -f docker-compose.v2.yml \
  -f docker-compose.prod.yml \
  -f docker-compose.r2.yml \
  up -d"
```

### Docker Build Failures

#### Missing Directories in Dockerfile

**Error**: `failed to calculate checksum of ref: "/src": not found` or `"/credentials": not found`

**Cause**: Dockerfile tries to COPY directories that don't exist in the repo (e.g., `src/`, `credentials/`)

**Fix**: Remove or comment out COPY statements for non-existent directories in `docker/Dockerfile.opencode.legacy`:

```dockerfile
# Remove these lines if directories don't exist:
# COPY src ./src
# COPY credentials ./credentials
```

#### DEPLOY_ENV Build Argument

**Error**: `"/docker/opencode.local.json": not found`

**Cause**: OpenCode Dockerfile defaults to `DEPLOY_ENV=local`, which looks for `opencode.local.json`

**Fix**: Ensure workflow passes `DEPLOY_ENV=prod` build-arg:

```yaml
# In .github/workflows/deploy.yml
- name: Build and push Docker image
  uses: docker/build-push-action@v5
  with:
    build-args: |
      DEPLOY_ENV=prod
```

### Server .env Configuration

#### Complete .env File Requirements

The server `.env` file at `~/orient/.env` must contain:

```bash
# REQUIRED - Domain Configuration
ORIENT_APP_DOMAIN=app.orient.bot
ORIENT_CODE_DOMAIN=code.orient.bot
ORIENT_STAGING_DOMAIN=staging.orient.bot
ORIENT_CODE_STAGING_DOMAIN=code-staging.orient.bot

# REQUIRED - Database (SQLite)
SQLITE_DB_PATH=/app/data/orient.db

# REQUIRED - Dashboard Security (crash loop if missing)
DASHBOARD_JWT_SECRET=

# REQUIRED - Encryption
ORIENT_MASTER_KEY=

# REQUIRED - MinIO (for local S3-compatible storage)
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin123

# OPTIONAL - API Keys (can be empty initially)
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
```

#### Generate Secrets

```bash
# Generate secure values
openssl rand -hex 32  # For DASHBOARD_JWT_SECRET
openssl rand -hex 32  # For ORIENT_MASTER_KEY
```

### Nginx Upstream Errors

**Error**: `host not found in upstream "orienter-opencode-staging:5099"`

**Cause**: Nginx config references staging containers that don't exist in production-only deployment

**Fix**: Use a production-only nginx config that doesn't define staging upstreams, or make staging upstreams return 503:

```nginx
# Staging servers - return 503 when staging not running
server {
    listen 443 ssl;
    server_name staging.orient.bot code-staging.orient.bot;
    # ... ssl config ...
    location / {
        return 503 'Staging environment not deployed';
    }
}
```

### Container Won't Start

1. Check logs: `docker logs orienter-dashboard --tail 100`
2. Check compose config: `docker compose config`
3. Verify service names match between compose files

### Dashboard Crash Loop

Check for Express 5 errors:

```bash
ssh $OCI_USER@$OCI_HOST "docker logs orienter-dashboard --tail 50 2>&1 | grep -i 'parameter name\|path-to-regexp'"
```

If you see `Missing parameter name at index 1: *`, fix the SPA catch-all route.
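To locate the offending route in the source tree, a search along these lines may help. It is only a sketch: the function name `find_bare_wildcards` and the default search path `packages` are illustrative, and it assumes routes are registered with calls such as `app.get('*', ...)` or `router.use("*", ...)` using a quoted string literal on the same line as the method call.

```bash
# Hypothetical helper: list source lines that register a bare '*' route.
# Usage: find_bare_wildcards [DIR]   (DIR defaults to "packages", an assumption)
find_bare_wildcards() {
  # -r recursive, -n line numbers, -E extended regex; only scan .ts and .js files.
  # Matches .get( / .use( / .all( / .post( / .put( / .delete( followed by '*' or "*".
  grep -rnE --include='*.ts' --include='*.js' \
    "\.(get|use|all|post|put|delete)\([[:space:]]*['\"]\*['\"]" "${1:-packages}"
}
```

Routes already converted to the named form (`'/{*splat}'`) are not reported, so an empty result suggests the catch-all is defined some other way, for example through a variable or a multi-line call.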
### SSL Certificate Issues

```bash
# Check certificate paths
ls -la ~/orient/certbot/conf/live/

# Verify nginx can read certs
docker exec orienter-nginx ls -la /etc/nginx/ssl/
```

### Database Connection Failed

```bash
# Check SQLite database file exists
docker exec orienter-dashboard ls -la /app/data/orient.db

# Check SQLITE_DB_PATH in container
docker exec orienter-dashboard env | grep SQLITE_DB_PATH
```

### Health Check Race Conditions

**Error**: `dependency failed to start: container orienter-dashboard is unhealthy` during CI/CD deployment

**Cause**: Docker Compose health checks can fail transiently when containers are first created, especially if the database connection takes a moment to establish.

**Solution**: The CI/CD pipeline uses staged deployment:

1. **Stage 1**: Start backend services (dashboard, opencode) and wait for healthy
2. **Stage 2**: Start bot services and wait for healthy
3. **Stage 3**: Start nginx

This is implemented in `.github/workflows/deploy.yml`:

```bash
# Stage 1: Start backend services
docker compose up -d dashboard opencode
# Wait for healthy status before proceeding

# Stage 2: Start bot services
docker compose up -d bot-whatsapp
# Wait for healthy status

# Stage 3: Start nginx
docker compose up -d nginx
```

**Manual Recovery**: If deployment fails mid-way:

```bash
ssh opc@152.70.172.33
cd ~/orient/docker

# Start services in stages manually
COMPOSE="docker compose --env-file ../.env -f docker-compose.v2.yml -f docker-compose.prod.yml -f docker-compose.r2.yml"

# 1. Start dashboard and opencode
sudo $COMPOSE up -d dashboard opencode
sleep 15

# 2. Check health
docker ps --format 'table {{.Names}}\t{{.Status}}' | grep -E 'dashboard|opencode'

# 3. If healthy, start remaining services
sudo $COMPOSE up -d
```

### WhatsApp Pairing Issues After Deploy

**v0.2.0+**: WhatsApp is integrated into the dashboard service.
```bash
# Container restart usually fixes pairing issues
docker restart orienter-dashboard

# Full reset if needed (clears session)
rm -rf ~/orient/data/whatsapp-auth/*
docker restart orienter-dashboard

# Check WhatsApp logs within dashboard
docker logs orienter-dashboard --tail 100 | grep -i whatsapp
```

**Access QR code for pairing**:

- URL: https://app.orient.bot/qr/
- Or via API: https://app.orient.bot/api/whatsapp/qr

### Health Endpoint Testing

After deployment, verify all services are accessible:

```bash
# Test health endpoints
curl -sf https://app.orient.bot/health && echo " OK"
curl -sf https://code.orient.bot/health && echo " OK"

# Test dashboard is serving
curl -sf -o /dev/null -w "%{http_code}" https://app.orient.bot/

# Check container health status
ssh opc@152.70.172.33 "docker ps --format 'table {{.Names}}\t{{.Status}}'"
```

Expected output - all containers should show "(healthy)":

```
NAMES                STATUS
orienter-nginx       Up X minutes (healthy)
orienter-opencode    Up X minutes (healthy)
orienter-dashboard   Up X minutes (healthy)
```

### ESM Module Import Resolution Issues

When running tests in CI or locally, you may encounter module resolution errors for subpath exports.

#### Subpath Export Resolution Failures

**Error**: `Cannot find package '@orientbot/integrations/google' imported from '...'`

**Cause**: Vitest/tsx may not correctly resolve package subpath exports (e.g., `@orientbot/integrations/google`) even when the `exports` field in package.json is properly configured.

**Fix**: Use the main export instead of subpath exports:

```typescript
// BROKEN - subpath export may not resolve in vitest
import { getGoogleOAuthService } from '@orientbot/integrations/google';

// FIXED - use main export (re-exports all submodules)
import { getGoogleOAuthService } from '@orientbot/integrations';
```

**Why This Happens**: The package.json exports field specifies `./dist/...` paths for subpath exports. While Node.js resolves these correctly at runtime, vitest/tsx running TypeScript directly may fail to resolve them during tests.

#### Stray .js Files in Source Directories

**Error**: Tests fail with old code even after fixing TypeScript source files.

**Cause**: Compiled `.js` files accidentally exist in `src/` directories alongside `.ts` files. Vitest may load the stale compiled files instead of the updated TypeScript sources.

**Diagnosis**:

```bash
# Find stray .js files in source directories
find packages/*/src -name "*.js" -type f

# Check if the error points to a .js file in src/
# (--include searches nested directories; a packages/*/src/*.js glob would only
# match files at the top level of src/)
grep -r --include='*.js' "integrations/google" packages/*/src
```

**Fix**:

```bash
# Remove stray compiled files from source directories
rm packages/dashboard/src/server/routes/*.js
rm packages/dashboard/src/server/routes/*.js.map

# These should only exist in dist/, not src/
```

**Prevention**: Add to `.gitignore`:

```
# Ignore compiled JS in source directories
packages/*/src/**/*.js
packages/*/src/**/*.js.map
```

#### Test Failure Diagnosis Workflow

When CI tests fail with import errors across multiple service files:

1. **Check the actual error location**:

   ```bash
   gh run view RUN_ID --log-failed | grep -A5 "Cannot find"
   ```

2. **Verify the import in source files**:

   ```bash
   grep -r "integrations/google" packages/*/src/
   ```

3. **Look for stray compiled files**:

   ```bash
   find packages -path "*/src/*.js" -type f
   ```

4. **Rebuild after fixing**:

   ```bash
   pnpm turbo build --filter=@orientbot/dashboard
   pnpm test:ci
   ```

#### Docker Compose Test Updates for Multi-Instance Support

When container naming schemes change (e.g., adding instance IDs for multi-instance support), existing docker compose tests may fail.

**Error**: `expected 'orienter-bot-whatsapp-${AI_INSTANCE_ID:-0}' to be 'orienter-bot-whatsapp'`

**Cause**: Tests expect fixed container names but compose files now use instance-aware naming.
**Fix**: Update `tests/docker/compose.test.ts` to expect the new naming pattern:

```typescript
// OLD - fixed names
expect(compose.services['bot-whatsapp'].container_name).toBe('orienter-bot-whatsapp');

// NEW - instance-aware names
expect(compose.services['bot-whatsapp'].container_name).toBe(
  'orienter-bot-whatsapp-${AI_INSTANCE_ID:-0}'
);
```

### Worktree Checkout Conflicts

When working in a git worktree, you may encounter branch checkout conflicts when trying to merge PRs.

**Error:**

```
fatal: 'main' is already used by worktree at '/path/to/other-worktree'
```

**Cause:** Git prevents checking out a branch that's already checked out in another worktree. This happens when:

- You try to run `gh pr merge` which attempts to checkout the target branch (main)
- Another worktree already has the `main` branch checked out
- Git's safety mechanism prevents conflicts between worktrees

**Workaround Options:**

**Option 1: Use GitHub API (Recommended)**

Merge PRs without checking out the target branch:

```bash
# Merge PR with squash
gh api repos/orient-bot/orient/pulls/PR_NUMBER/merge -X PUT \
  -f merge_method=squash \
  -f commit_title="Your commit title (#PR_NUMBER)"

# Example
gh api repos/orient-bot/orient/pulls/50/merge -X PUT \
  -f merge_method=squash \
  -f commit_title="docs(website): add Privacy Policy and Terms pages (#50)"
```

**Option 2: Use Auto-Merge**

Enable auto-merge and let GitHub merge when checks pass:

```bash
gh pr merge PR_NUMBER --auto --squash
```

**Option 3: Switch to Main Worktree**

Navigate to the worktree that has `main` checked out:

```bash
# Find which worktree has main
git worktree list | grep main

# Navigate to that worktree
cd /path/to/main-worktree

# Merge from there
gh pr merge PR_NUMBER --squash --delete-branch
```

**Option 4: Merge from GitHub Web UI**

When automation fails, use the GitHub web interface:

1. Navigate to the PR on GitHub
2. Click "Squash and merge" button
3. Confirm the merge

**Prevention:**

When creating worktrees for feature branches, avoid checking out `main` in multiple worktrees:

```bash
# Good - each worktree has its own branch
git worktree add ~/worktrees/feature-a -b feature-a
git worktree add ~/worktrees/feature-b -b feature-b

# Avoid - multiple worktrees with main
git worktree add ~/worktrees/work-1 main  # First is OK
git worktree add ~/worktrees/work-2 main  # This will fail
```

**Why This Matters:**

This worktree limitation only affects the merge operation itself. You can:

- ✅ Create PRs from worktree branches
- ✅ Push changes from worktrees
- ✅ Review and approve PRs
- ✅ Run CI/CD workflows
- ❌ Merge PRs using `gh pr merge` when main is checked out elsewhere

The GitHub API workaround bypasses the local git checkout, making it the most reliable option when working with worktrees.

## Quick Commands

```bash
# Check production status
ssh opc@152.70.172.33 "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# View dashboard logs
ssh opc@152.70.172.33 "docker logs orienter-dashboard --tail 100"

# View nginx logs
ssh opc@152.70.172.33 "docker logs orienter-nginx --tail 50"

# Restart dashboard
ssh opc@152.70.172.33 "docker restart orienter-dashboard"

# Full redeploy
git push origin main && gh run watch --exit-status

# Force full rebuild
gh workflow run deploy.yml -f force_build_all=true
```

## Server Details

- **Host**: 152.70.172.33
- **User**: opc
- **Deploy Directory**: ~/orient
- **Docker Directory**: ~/orient/docker
- **Data Directory**: ~/orient/data
- **Domains**:
  - `app.orient.bot` - Dashboard
  - `code.orient.bot` - OpenCode
  - `staging.orient.bot` - Staging Dashboard
  - `code-staging.orient.bot` - Staging OpenCode
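Because health checks can fail transiently right after containers are created (see Health Check Race Conditions), it can help to poll the health endpoints with a small retry loop rather than checking once. The helper below is a minimal sketch, not part of the repository or the workflow: the name `retry` and its argument order are assumptions made for illustration.

```bash
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not executed here): poll the app health endpoint for up to ~1 minute
# retry 10 6 curl -sf https://app.orient.bot/health
```

The attempt count and delay in the commented example are arbitrary; suitable values depend on how long the dashboard typically takes to report healthy.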