# Burnout-as-a-Service: Agent Architecture

This document describes the multi-agent architecture powering the burnout prevention system. The system uses **LangChain4j's Supervisor Pattern** (`langchain4j-agentic`) with Azure OpenAI to orchestrate specialized AI agents that analyze, classify, and rebalance developer workloads.

---

## Setup commands

### Backend (Java 21 + Maven)

```bash
cd backend
mvn clean package -DskipTests
java -jar target/burnout-backend-0.0.1-SNAPSHOT.jar
```

### MCP App (Node.js 18+)

```bash
cd mcp-app
npm install
npm run build
```

### Environment variables

Create a `.env` file in the project root:

```env
# Set BACKEND_URL to exactly one of the two values below.

# For Azure deployment (after `azd up`), uncomment and remove the local value:
# BACKEND_URL=https://your-backend.wonderfulstone-xxxxx.swedencentral.azurecontainerapps.io

# For local development:
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT=gpt-4o-mini
BACKEND_URL=http://localhost:8080
```

### Running the backend locally

The backend requires Azure OpenAI credentials and has security enabled by default. For local development without an Azure OpenAI resource, use dummy values — the system will use deterministic fallback responses instead of LLM calls:

```bash
cd backend
mvn clean package -DskipTests
java -Dsecurity.enabled=false \
     -Dazure.openai.endpoint=https://dummy.openai.azure.com \
     -Dazure.openai.api-key=dummy-key \
     -jar target/burnout-backend-0.0.1-SNAPSHOT.jar
```

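The `security.enabled=false` flag skips GitHub token validation on `/api/**` endpoints (POST requests can still return 403; see "Known issues and gotchas"). If `AZURE_OPENAI_DEPLOYMENT` is not set, the deployment name defaults to `gpt-4o` (see `application.yml`).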

### Azure deployment (one command)

```bash
azd auth login
azd up
```

---

## Dev environment tips

- The backend is a Spring Boot app in `backend/`. Configuration is in `backend/src/main/resources/application.yml` and `application-demo.yml`.
- The MCP app is a TypeScript project in `mcp-app/`. Use `npm run watch` for incremental TypeScript compilation during development.
- The `.vscode/mcp.json` is pre-configured. Restart the MCP server in VS Code after building to pick up changes.
- GitHub CLI (`gh`) must be installed and authenticated (`gh auth login`) — the MCP app uses `gh auth token` for GitHub API access.
- Set up GitHub labels with `scripts/setup-labels.sh` and create seed issues with `scripts/seed-issues.sh`.
- Infrastructure is defined in `infra/` using Bicep templates. `azure.yaml` configures Azure Developer CLI.
- The `IssueCache` is an in-memory `ConcurrentHashMap`. Data is lost on restart — re-sync via MCP or re-seed via `/demo/api/seed`.
- The MCP app config is in `mcp-app/src/config.ts`. It reads `BACKEND_URL` from `.env` (defaults to `http://localhost:8080`).
- The MCP app should **not** be modified when adding demo/web features — it is the VS Code Copilot Chat integration layer only.
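
The note above about `IssueCache` losing its data on restart follows directly from keeping everything in a `ConcurrentHashMap`. The sketch below illustrates that behavior only; `InMemoryIssueCacheSketch` and `CachedIssue` are hypothetical names and do not reproduce the project's `IssueCache` or `Issue` record.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real IssueCache; names and fields are assumptions.
class InMemoryIssueCacheSketch {
    // Illustrative issue shape, not the project's Issue record.
    record CachedIssue(int number, String title) {}

    // Keyed by "owner/repo"; the data lives only in this JVM's heap.
    private final Map<String, List<CachedIssue>> byRepo = new ConcurrentHashMap<>();

    void put(String repo, List<CachedIssue> issues) {
        byRepo.put(repo, List.copyOf(issues)); // store an immutable copy
    }

    List<CachedIssue> get(String repo) {
        return byRepo.getOrDefault(repo, List.of());
    }

    public static void main(String[] args) {
        InMemoryIssueCacheSketch cache = new InMemoryIssueCacheSketch();
        cache.put("owner/repo", List.of(new CachedIssue(1, "Critical bug")));
        System.out.println("cached issues: " + cache.get("owner/repo").size());

        // A restart behaves like constructing a fresh instance: nothing survives.
        InMemoryIssueCacheSketch afterRestart = new InMemoryIssueCacheSketch();
        System.out.println("empty after restart: " + afterRestart.get("owner/repo").isEmpty());
    }
}
```

This is why a redeploy on Azure Container Apps needs a re-sync or re-seed before the demo pages show data.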

---

## Testing instructions

- Run backend tests: `cd backend && mvn test`
- Integration test: `backend/src/test/java/com/demo/burnout/IntegrationTest.java`
- MCP app has no test suite currently — validate by building (`npm run build`) and confirming no TypeScript errors.
- After making changes, always run `mvn clean package -DskipTests` to verify the backend compiles, then `mvn test` to run the test suite.
- To test the full flow locally: start the backend, build the MCP app, reload VS Code, then use the MCP tools in Copilot Chat.

### Post-deployment smoke test

After `azd up`, run the smoke test to verify all endpoints, metrics, and pages work:

```powershell
.\scripts\smoke-test.ps1 -BaseUrl https://your-app.azurecontainerapps.io
```

The script seeds fresh data, then runs 26 assertions covering:
- Health endpoint, issue seeding, and all 6 stress breakdown metrics being non-zero
- Breakdown hints (tooltip data) present for all categories
- Flamegraph API returns day plan with correct issue count
- Study snapshots are persisted and queryable
- All 3 static pages serve HTTP 200

### Quick demo setup (one command)

The `scripts/seed-demo.sh` (bash) and `scripts/seed-demo.ps1` (PowerShell) scripts seed everything needed for a live demo — issues, checkin snapshots, and 14 days of study history for 5 participants (alice, bob, carol, dave, roryp). All timestamps are generated relative to "now" so every metric lights up.

```bash
# Local
bash scripts/seed-demo.sh

# Azure (after azd up)
bash scripts/seed-demo.sh https://your-app.azurecontainerapps.io
```

```powershell
# Local
.\scripts\seed-demo.ps1

# Azure (after azd up)
.\scripts\seed-demo.ps1 -BaseUrl https://your-app.azurecontainerapps.io
```

After seeding, open:
- `/` → landing page with links to all pages
- `/checkin.html` → enter `roryp` + `roryp/burnout-app` to see the full stress breakdown with hover tooltips
- `/flamegraph.html?repo=roryp/burnout-app` → flamegraph visualization
- `/study.html` → researcher dashboard (click **Load Data**, then click any participant to drill into their stress details)

### Testing the demo flamegraph locally

1. Start the backend (see "Running the backend locally" above)
2. Seed test data — POST to `/demo/api/seed` (no auth needed).
   **CRITICAL: Use camelCase field names** (`createdAt`, `updatedAt`) — NOT snake_case (`created_at`, `updated_at`). The `Issue` Java record uses camelCase. Snake_case fields will deserialize as `null`, causing all time-based metrics (Context Switching, After Hours, Sustained Load) to show as 0.
   **CRITICAL: Use recent timestamps** — metrics like Context Switching and After Hours are calculated relative to the current time. Old/static dates will produce zero values. Always generate timestamps relative to "now".
   ```bash
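   # Generate timestamps relative to now.
   # Each line tries GNU date (-d) first and falls back to BSD/macOS date (-v).
   NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
   # 15 minutes ago: counts as recently touched
   RECENT=$(date -u -d '-15 minutes' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-15M +%Y-%m-%dT%H:%M:%SZ)
   # 03:00 UTC today: outside working hours in most timezones
   AFTER_HOURS=$(date -u -d 'today 03:00' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u +%Y-%m-%dT03:00:00Z)
   # 7 days ago: used as the creation time
   WEEK_AGO=$(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)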

   curl -X POST http://localhost:8080/demo/api/seed \
     -H 'Content-Type: application/json' \
     -d '{"repo":"owner/repo","issues":[
       {"number":1,"title":"Critical bug","body":"Security issue","labels":[{"name":"priority:critical"},{"name":"bug"}],"assignees":[{"login":"user"}],"createdAt":"'$WEEK_AGO'","updatedAt":"'$RECENT'","state":"open"},
       {"number":2,"title":"URGENT: memory leak","body":"","labels":[{"name":"urgent"}],"assignees":[],"createdAt":"'$WEEK_AGO'","updatedAt":"'$AFTER_HOURS'","state":"open"}
     ]}'
   ```
3. Open `http://localhost:8080/flamegraph.html?repo=owner/repo` in a browser
4. Or check APIs directly: `GET /demo/api/repos` and `GET /demo/api/flamegraph?repo=owner/repo`
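
The two rules in step 2 (camelCase field names, timestamps relative to "now") can also be applied when building the seed body from Java, for example in a test. This is a minimal sketch: `SeedPayloadSketch` and its methods are hypothetical helpers, and it assumes titles and labels contain no characters that need JSON escaping.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Sketch: build a /demo/api/seed body with camelCase fields and relative timestamps.
class SeedPayloadSketch {
    // Formats one issue object using the field names from the seed field reference.
    // Assumption: title and label need no JSON escaping.
    static String issueJson(int number, String title, String label,
                            Instant createdAt, Instant updatedAt) {
        return String.format(
            "{\"number\":%d,\"title\":\"%s\",\"body\":\"\",\"labels\":[{\"name\":\"%s\"}],"
                + "\"assignees\":[],\"createdAt\":\"%s\",\"updatedAt\":\"%s\",\"state\":\"open\"}",
            number, title, label, createdAt, updatedAt);
    }

    // Derives every timestamp from the supplied "now" so the metrics stay non-zero.
    static String seedBody(String repo, Instant now) {
        Instant weekAgo = now.minus(Duration.ofDays(7));
        Instant recent = now.minus(Duration.ofMinutes(15));
        return String.format("{\"repo\":\"%s\",\"issues\":[%s,%s]}", repo,
            issueJson(1, "Critical bug", "priority:critical", weekAgo, recent),
            issueJson(2, "URGENT: memory leak", "urgent", weekAgo, now));
    }

    public static void main(String[] args) {
        // Truncate to seconds so Instant.toString() prints without fractional seconds.
        Instant now = Instant.now().truncatedTo(ChronoUnit.SECONDS);
        System.out.println(seedBody("owner/repo", now));
    }
}
```

The printed string can be sent as the request body of `POST /demo/api/seed` with `Content-Type: application/json`.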

### Testing on Azure Container Apps

After `azd up`, the in-memory `IssueCache` is empty. To populate everything for a demo:
1. Run `bash scripts/seed-demo.sh https://your-app.azurecontainerapps.io` (or `.\scripts\seed-demo.ps1 -BaseUrl ...`)
2. Or seed manually via `POST /demo/api/seed` (see field reference below)
3. Or sync real issues via the MCP `sync_issues` tool in VS Code (authenticates with GitHub)
4. Verify: `GET /demo/api/repos` should list synced repos, `GET /actuator/health` should return UP

---

## Code style and conventions

- **Backend**: Java 21, Spring Boot 3, LangChain4j. Use `@Agent` annotations for sub-agent interfaces. Use `@Tool` annotations for mutation methods.
- **MCP App**: TypeScript strict mode, ES modules (`"type": "module"` in package.json). Dependencies: `@modelcontextprotocol/sdk`, `zod`, `dotenv`.
- **Configuration**: Use Spring `@Configuration` and `@Bean` annotations. Azure OpenAI config is in `AgentConfiguration.java`.
- **Key design principle**: Deterministic services calculate all metrics first. AI agents **only explain and support** — they never make decisions.
- **Graceful degradation**: Every agent must have a fallback path for when the LLM is unavailable. If the LLM call fails, return a deterministic response instead.
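
The last two conventions (deterministic metrics first, LLM output only as explanation, with a fallback path) can be sketched as follows. This is an illustration under assumptions, not the project's agent code: `FallbackSketch`, `explain`, and the `"Stress score: "` text are hypothetical, and the `Supplier` stands in for any LLM-backed call. Only the fallback marker string is taken from this document.

```java
import java.util.function.Supplier;

// Sketch of "deterministic first, LLM explains" with a fallback path.
class FallbackSketch {
    // Marker text quoted from the "Known issues and gotchas" section of this document.
    static final String FALLBACK_NOTE = "*LLM agents unavailable - using deterministic fallback*";

    // llmExplanation is a hypothetical hook standing in for an LLM-backed agent call.
    static String explain(int deterministicScore, Supplier<String> llmExplanation) {
        // The score is computed before, and independently of, any LLM call.
        String base = "Stress score: " + deterministicScore;
        try {
            return base + "\n" + llmExplanation.get();
        } catch (RuntimeException e) {
            // The LLM failed: keep the deterministic result and flag the fallback.
            return base + "\n" + FALLBACK_NOTE;
        }
    }

    public static void main(String[] args) {
        System.out.println(explain(72, () -> "Consider deferring two low-priority issues."));
        System.out.println(explain(72, () -> {
            throw new IllegalStateException("LLM unreachable");
        }));
    }
}
```

In both calls the score line is identical; only the explanatory text changes, which is the property the design principle asks for.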

---

## Architecture overview

The system has two main components:

1. **MCP App** (Node.js) — Exposes 4 tools to VS Code Copilot Chat via stdio transport. Calls the backend over HTTP with a GitHub Bearer token.
2. **Java Backend** (Spring Boot + LangChain4j) — Runs the AI agent orchestration and stress analysis.

### Agent hierarchy

- **AgentOrchestrator** — Central coordinator that dispatches to:
  - **BurnoutSupervisorService** — Supervisor pattern with 5 sub-agents (DeferAgent, DelegateAgent, ClassifyAgent, ScopeAgent, WellnessAgent)
  - **ExplainerAiService** — Explains action plans in human-friendly language
  - **ProtectiveAiService** — Detects emotional signals and provides protective interventions
  - **FridayDeployAiService** — Assesses Friday deploy readiness

### MCP tools

| Tool | Description |
|------|-------------|
| `sync_issues` | Fetch GitHub issues via `gh` CLI and sync to backend |
| `show_burnout_wheel` | Display interactive flamegraph with 3-3-3 plan (dry run) |
| `reshape_day` | AI analysis + automatically apply labels to GitHub issues |
| `get_stress_score` | Quick stress check (0–100, LOW/MODERATE/HIGH) |

### API endpoints

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| POST | `/api/issues/sync` | Yes | Sync issues from MCP app |
| GET | `/api/stress?repo=...&userId=...` | Yes | Get stress analysis |
| POST | `/api/reshape` | Yes | Run full reshape workflow |
| GET | `/demo/api/flamegraph?repo=...&userId=...` | No | Read-only flamegraph data for pre-synced repos |
| GET | `/demo/api/repos` | No | List repos currently synced in memory |
| POST | `/demo/api/sync?repo=owner/repo` | No | Sync issues from GitHub public API (rate-limited: 1 per repo per 5 min) |
| POST | `/demo/api/reshape` | No | Run reshape (supervisor agent) and apply mutations to IssueCache |
| POST | `/demo/api/checkin` | No | Stress check-in — accepts optional `tz` param (e.g. `America/New_York`) for timezone-aware after-hours detection. Returns `breakdown`, `breakdownHints`, `breakdownIssues`, and `timezone` |
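
For the authenticated endpoints in the table, a client sends the GitHub token as a Bearer token. The real client is the TypeScript code in `mcp-app/src/backend-client.ts`; the Java sketch below only illustrates the request shape, for example for a test client. `StressRequestSketch` is a hypothetical name, and the `Accept` header is an assumption.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: build (but do not send) a GET /api/stress request with a Bearer token.
class StressRequestSketch {
    static HttpRequest build(String backendUrl, String repo, String userId, String githubToken) {
        // Encode query values so "owner/repo" becomes "owner%2Frepo".
        String query = "repo=" + URLEncoder.encode(repo, StandardCharsets.UTF_8)
            + "&userId=" + URLEncoder.encode(userId, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(backendUrl + "/api/stress?" + query))
            .header("Authorization", "Bearer " + githubToken)
            .header("Accept", "application/json") // assumption: the backend returns JSON
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080", "owner/repo", "user", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would use `java.net.http.HttpClient#send`; it is left out here so the sketch runs without a backend.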

### Demo web app

A standalone flamegraph web page is served at `/flamegraph.html` for live demos outside VS Code. It has a **"Sync from GitHub"** button that fetches public repo issues directly — no MCP tool or GitHub token required.

**Live demo:** https://aka.ms/burnout-app

**Demo workflow:**
1. Share the URL with the audience: `https://aka.ms/burnout-app`
2. Enter a public repo (e.g. `roryp/burnout-app`) and click **Sync from GitHub**
3. The flamegraph renders automatically after sync

Alternatively, sync issues via MCP in VS Code first, then share the URL — the audience will see pre-synced repos as clickable buttons.

The demo endpoints never mutate GitHub issues or labels. Sync is rate-limited to 1 request per repo per 5 minutes to avoid exhausting GitHub's unauthenticated API limit (60 req/hour per IP).
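
The "1 request per repo per 5 minutes" rule can be pictured as a per-repo timestamp check. The sketch below is one possible shape, not the controller's actual code; `RepoSyncLimiterSketch` and `tryAcquire` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: allow at most one sync per repo within a 5-minute window.
class RepoSyncLimiterSketch {
    private static final Duration WINDOW = Duration.ofMinutes(5);
    private final Map<String, Instant> lastSync = new ConcurrentHashMap<>();

    // Returns true and records the sync time if the repo is outside its window.
    boolean tryAcquire(String repo, Instant now) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so concurrent callers cannot both pass.
        lastSync.compute(repo, (key, previous) -> {
            if (previous == null || !now.isBefore(previous.plus(WINDOW))) {
                allowed[0] = true;
                return now;
            }
            return previous;
        });
        return allowed[0];
    }

    public static void main(String[] args) {
        RepoSyncLimiterSketch limiter = new RepoSyncLimiterSketch();
        Instant start = Instant.now();
        System.out.println(limiter.tryAcquire("owner/repo", start));
        System.out.println(limiter.tryAcquire("owner/repo", start.plusSeconds(60)));
        System.out.println(limiter.tryAcquire("owner/repo", start.plusSeconds(300)));
    }
}
```

Passing `now` as a parameter keeps the check testable without waiting for real time to pass.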

The `POST /demo/api/seed` endpoint accepts `{"repo": "owner/repo", "issues": [...]}` and populates the `IssueCache` for testing without GitHub auth. Use this to test the flamegraph locally or on Azure without needing the full MCP sync flow.

**Seed data field reference** (Issue record — all fields use **camelCase**):

| Field | Type | Required | Notes |
|-------|------|----------|-------|
| `number` | int | Yes | Issue number |
| `title` | String | Yes | Issue title |
| `body` | String | Yes | Empty string = "mystery meat" (hurts Clarity score) |
| `labels` | `[{"name": "..."}]` | Yes | See label effects below |
| `assignees` | `[{"login": "..."}]` | Yes | Empty = unassigned (hurts Chaos if urgent) |
| `createdAt` | ISO 8601 | Yes | **camelCase! NOT `created_at`**. Must be recent for time-based metrics |
| `updatedAt` | ISO 8601 | Yes | **camelCase! NOT `updated_at`**. Must be recent for time-based metrics |
| `state` | String | Yes | `"open"` or `"closed"` |

**Labels that affect classification and stress:**
- Deep Work: `priority:critical`, `priority:high`, `architecture`, `security`, `deep-work`, `epic`, `feature`
- Quick Win: `good-first-issue`, `quick-win`, `low-hanging-fruit`, `trivial`
- Maintenance: `dependencies`, `documentation`, `triage`, `chore`, `refactor`, `tech-debt`, `ci`, `devops`, `maintenance`
- Chaos triggers: `urgent` (especially if unassigned or >24h old)
- After hours: set `updatedAt` to a time before 9 AM or after 6 PM in the user's timezone (the server timezone is used if no `tz` param is sent). Weekends also count as after-hours.
- Context switching: 6+ issues with `updatedAt` within the last 60 minutes
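
The label groups above can be read as a lookup from label name to category. The sketch below encodes the three lists exactly as written; everything else is an assumption: `LabelClassifierSketch` is a hypothetical name, the precedence (Deep Work, then Quick Win, then Maintenance) is a guess, and issues with no matching label return an empty result rather than guessing how `IssueClassifierService` treats them.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Sketch: map issue labels to a work category using the lists in this document.
class LabelClassifierSketch {
    enum Category { DEEP_WORK, QUICK_WIN, MAINTENANCE }

    static final Set<String> DEEP_WORK_LABELS = Set.of(
        "priority:critical", "priority:high", "architecture", "security",
        "deep-work", "epic", "feature");
    static final Set<String> QUICK_WIN_LABELS = Set.of(
        "good-first-issue", "quick-win", "low-hanging-fruit", "trivial");
    static final Set<String> MAINTENANCE_LABELS = Set.of(
        "dependencies", "documentation", "triage", "chore", "refactor",
        "tech-debt", "ci", "devops", "maintenance");

    // Assumed precedence: the first group with a matching label wins.
    static Optional<Category> classify(List<String> labels) {
        if (labels.stream().anyMatch(DEEP_WORK_LABELS::contains)) {
            return Optional.of(Category.DEEP_WORK);
        }
        if (labels.stream().anyMatch(QUICK_WIN_LABELS::contains)) {
            return Optional.of(Category.QUICK_WIN);
        }
        if (labels.stream().anyMatch(MAINTENANCE_LABELS::contains)) {
            return Optional.of(Category.MAINTENANCE);
        }
        return Optional.empty(); // no listed label matched
    }

    public static void main(String[] args) {
        System.out.println(classify(List.of("bug", "priority:critical")));
        System.out.println(classify(List.of("chore")));
        System.out.println(classify(List.of("bug")));
    }
}
```

When choosing labels for seed data, this reading suggests giving each issue at least one label from the group you want it to land in.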

---

## Known issues and gotchas

- **POST to `/api/**` returns 403 even with `security.enabled=false`**: Spring Security's CSRF protection and filter chain ordering can block POST requests even when the `securityEnabled` flag is false. Workaround: use the `/demo/api/seed` endpoint (on the `permitAll` path) for testing. The MCP app works because it sends a GitHub Bearer token.
- **`favicon.ico` returns 403**: The security config permits `/favicon.ico` but no favicon file exists in static resources. This causes a harmless console error in browsers. Fix: add a favicon file to `backend/src/main/resources/static/`.
- **In-memory cache lost on restart**: The `IssueCache` uses `ConcurrentHashMap` — all synced data is lost when the backend restarts or the container is redeployed. Re-sync via MCP or re-seed via `/demo/api/seed`.
- **Azure OpenAI fallback**: When the LLM is unavailable (dummy credentials, network issues), all agents return deterministic fallback responses. The agent explanation will include `*LLM agents unavailable - using deterministic fallback*`.
- **Spring Boot version**: 3.5.10 with Spring Security 6.x. The `SecurityConfig` uses a single `SecurityFilterChain` bean with `permitAll` for demo/health paths and `authenticated` for `/api/**`.
- **Seed data uses camelCase, NOT snake_case**: The `Issue` Java record uses `createdAt`/`updatedAt` (camelCase). The GitHub REST API returns `created_at`/`updated_at` (snake_case), but the `GitHubIssue` record in `DemoFlamegraphController` uses `@JsonProperty` annotations to map snake_case to camelCase during `/demo/api/sync`. When seeding directly via `/demo/api/seed`, you must use camelCase — snake_case fields silently deserialize as `null`, causing all time-based breakdown metrics (Context Switching, After Hours, Sustained Load) to show as 0.
- **Seed data needs current timestamps**: Metrics like Context Switching (`issuesTouchedToday`) and After Hours are calculated relative to the server's current time. Using old/static dates (e.g. `2026-01-01`) will produce zero values. Always generate timestamps relative to "now" when seeding.
- **After-hours is timezone-aware**: The checkin endpoint accepts an optional `tz` field (IANA timezone, e.g. `America/New_York`). The checkin.html page auto-detects the browser timezone and sends it. Working hours are 9 AM–6 PM in the user's timezone; weekends always count as after-hours. If no `tz` is provided, the server's configured `demo.clock.zone` is used (default: `Africa/Johannesburg`).
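
The time-based rules described in this document (working hours of 9 AM to 6 PM in the user's timezone, weekends always after-hours, and context switching at 6+ issues updated within 60 minutes) can be sketched with `java.time`. `TimeMetricsSketch` and its methods are hypothetical, and treating exactly 6:00 PM as after-hours is an assumption; the backend's boundary handling may differ.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.List;

// Sketch of the after-hours and context-switching rules, under stated assumptions.
class TimeMetricsSketch {
    // Working hours are 9 AM to 6 PM local time; weekends always count as after-hours.
    static boolean isAfterHours(Instant updatedAt, ZoneId zone) {
        ZonedDateTime local = updatedAt.atZone(zone);
        DayOfWeek day = local.getDayOfWeek();
        if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
            return true;
        }
        int hour = local.getHour();
        return hour < 9 || hour >= 18; // assumption: 18:00 itself is after-hours
    }

    // True when 6 or more issues were updated within the 60 minutes before "now".
    static boolean isContextSwitching(List<Instant> updatedAts, Instant now) {
        Instant cutoff = now.minus(Duration.ofMinutes(60));
        long recent = updatedAts.stream()
            .filter(t -> !t.isBefore(cutoff) && !t.isAfter(now))
            .count();
        return recent >= 6;
    }

    public static void main(String[] args) {
        Instant sample = Instant.parse("2025-06-10T07:30:00Z"); // a Tuesday
        System.out.println(isAfterHours(sample, ZoneId.of("Africa/Johannesburg")));
        System.out.println(isAfterHours(sample, ZoneId.of("America/New_York")));
    }
}
```

The same instant is 09:30 in Johannesburg and 03:30 in New York, so the result depends on the zone, which is why the check-in page sends `tz`.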

---

## Security model

- **`SecurityConfig.java`** controls all auth. GitHub tokens are validated against the GitHub API and cached for 5 minutes.
- Paths that require auth: `/api/**` (all API endpoints)
- Paths that are public: `/actuator/**`, `/demo/**`, `/`, `/index.html`, `/flamegraph.html`, `/checkin.html`, `/study.html`, `/favicon.ico`, `OPTIONS /**`
- CORS allows: `*.azurecontainerapps.io`, `*.vscode-cdn.net`, `vscode-webview://*`, `localhost:*`
- For local dev, set `security.enabled=false` via system property or env var `SECURITY_ENABLED=false`
- On Azure, the backend uses managed identity (`AZURE_IDENTITY_CLIENT_ID`) for Azure OpenAI — no API keys needed
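
The "validated against the GitHub API and cached for 5 minutes" behavior can be pictured as a small time-to-live cache in front of the remote check. This sketch is not `SecurityConfig.java`: `TokenValidationCacheSketch` is a hypothetical name, the `Predicate` stands in for the GitHub API call, and caching negative results and keying on the raw token are both assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch: remember token validation results for 5 minutes.
class TokenValidationCacheSketch {
    private static final Duration TTL = Duration.ofMinutes(5);

    private record Entry(boolean valid, Instant checkedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Predicate<String> remoteCheck; // hypothetical stand-in for the GitHub API call

    TokenValidationCacheSketch(Predicate<String> remoteCheck) {
        this.remoteCheck = remoteCheck;
    }

    boolean isValid(String token, Instant now) {
        // Reuse a cached entry while it is younger than the TTL; otherwise re-check.
        Entry entry = cache.compute(token, (key, old) ->
            (old != null && now.isBefore(old.checkedAt().plus(TTL)))
                ? old
                : new Entry(remoteCheck.test(key), now));
        return entry.valid();
    }

    public static void main(String[] args) {
        TokenValidationCacheSketch cache =
            new TokenValidationCacheSketch(token -> token.startsWith("valid-"));
        Instant now = Instant.now();
        System.out.println(cache.isValid("valid-example", now));
        System.out.println(cache.isValid("other", now));
    }
}
```

A production version would likely avoid holding the map lock during a network call and would evict stale entries; both are omitted to keep the sketch short.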

---

## PR instructions

- Always run `mvn test` (backend) and `npm run build` (mcp-app) before committing.
- If you change agent behavior, update the corresponding `@Agent` or `@SystemMessage` annotations.
- If you add a new `@Tool` method to `BurnoutMutationTool`, add the corresponding GitHub mutation (labels, comments) and update this file.
- If you add or modify MCP tools in `mcp-app/src/index.ts`, update the MCP tools table above.
- Keep graceful degradation in mind — every new AI feature must have a deterministic fallback.

---

## Key files

| File | Description |
|------|-------------|
| `backend/src/.../agent/AgentOrchestrator.java` | Central agent coordinator |
| `backend/src/.../agent/ExplainerAiService.java` | Plan explanation agent |
| `backend/src/.../agent/ProtectiveAiService.java` | Emotional support agent |
| `backend/src/.../agent/FridayDeployAiService.java` | Deploy readiness agent |
| `backend/src/.../agent/supervisor/BurnoutAgents.java` | 5 sub-agent interfaces with `@Agent` annotations |
| `backend/src/.../agent/supervisor/BurnoutSupervisorService.java` | Supervisor pattern orchestration |
| `backend/src/.../agent/supervisor/BurnoutMutationTool.java` | GitHub mutation tools (`@Tool` methods) |
| `backend/src/.../config/AgentConfiguration.java` | LangChain4j + Azure OpenAI wiring |
| `backend/src/.../config/SecurityConfig.java` | Spring Security: GitHub token validation, permitAll paths, CORS |
| `backend/src/.../service/IssueCache.java` | In-memory `ConcurrentHashMap` cache for synced issues |
| `backend/src/.../service/IssueClassifierService.java` | Classifies issues into DEEP_WORK, QUICK_WIN, MAINTENANCE, DEFERRED |
| `backend/src/.../service/ChaosMetricsService.java` | Calculates chaos score from issue patterns |
| `backend/src/.../service/ComplianceService.java` | Analyzes compliance (labels, assignees, SLA) |
| `backend/src/.../controller/DemoFlamegraphController.java` | Read-only demo endpoints + seed endpoint (no auth) |
| `backend/src/main/resources/static/index.html` | Landing page with links to all demo pages |
| `backend/src/main/resources/static/flamegraph.html` | Standalone flamegraph web app for live demos |
| `backend/src/main/resources/static/checkin.html` | Stress check-in page (supports URL params for deep-linking) |
| `backend/src/main/resources/static/study.html` | Researcher dashboard with clickthrough to checkin + tooltips |
| `backend/src/main/resources/application.yml` | Server, security, Azure OpenAI, and demo config |
| `scripts/seed-demo.sh` | Bash seed script: issues + checkins + study data in one command |
| `scripts/seed-demo.ps1` | PowerShell seed script (same as above, for Windows) |
| `scripts/smoke-test.ps1` | Post-deployment smoke test (26 assertions, seeds + verifies all endpoints) |
| `scripts/demo-screenshots.ps1` | Full before/after demo screenshot capture — 3 phases: seed BEFORE, sync real issues, capture 8 screenshots |
| `scripts/demo-screenshots.js` | Playwright screenshot logic (called by PS1 script, supports `before`/`after`/`study` modes) |
| `scripts/demo-screenshots.sh` | Bash version of demo screenshot script (same flow, for Linux/macOS/CI) |
| `scripts/record-demo.mjs` | Records ~30s demo video with scene title cards and issue drilldown |
| `scripts/seed-issues.sh` | Creates real GitHub issues via `gh` CLI (for live repos) |
| `mcp-app/src/index.ts` | MCP server with 4 tool definitions + 2 UI resources |
| `mcp-app/src/config.ts` | Backend URL config (reads from `.env`) |
| `mcp-app/src/backend-client.ts` | HTTP client for backend API calls |
| `mcp-app/src/demo-data.ts` | Fallback demo data when backend is unavailable |
| `mcp-app/src/ui/burnout-flamegraph.ts` | Flamegraph HTML/JS visualization for VS Code panel |
| `mcp-app/src/ui/burnout-wheel.ts` | Wheel visualization for VS Code panel |
| `infra/main.bicep` | Azure infrastructure: identity, OpenAI, ACR, Container Apps |
| `azure.yaml` | Azure Developer CLI config (backend service on containerapp host) |