# Worker Pool Activity Data Format Documentation

This document describes the JSON file formats created by `fetch-worker-data.js`.

## Overview

The script generates daily JSON files containing Taskcluster worker pool activity data, fetched from STMO (BigQuery). The format uses string tables, parallel arrays, and differential time compression to minimize file size.

Three file types are produced per day:

1. **Index**: `index.json` - List of available dates, used by the dashboard at https://tests.firefox.dev/workers.html to know which dates can be loaded.
2. **Summary**: `workers-YYYY-MM-DD.json` - Small file (~5 MB) with timestamps, resolution, task queue, and project. Loaded immediately by the dashboard for stats computation, time series rendering, and project filtering.
3. **Tasks**: `workers-YYYY-MM-DD-tasks.json` - Full task data (~20 MB) with all fields. Loaded on demand when the user clicks a Profiler button.

---

## Index File (`index.json`)

```json
{
  "dates": ["2026-03-04", "2026-03-03", "2026-03-02", ...]
}
```

Dates are sorted in descending order (newest first). Up to 21 days of history are maintained.

---

## Summary File (`workers-YYYY-MM-DD.json`)

Small file for fast initial dashboard load. Contains only the fields needed to compute per-pool stats, draw time-series activity tracks, and populate project filter buttons.

### Top-Level Structure

```json
{
  "metadata": { ... },
  "tables": { ... },
  "tasks": { ... }
}
```

### metadata

```json
{
  "date": "2026-03-04",
  "generatedAt": "2026-03-05T03:12:45.123Z",
  "taskCount": 243684
}
```

### tables

```json
{
  "taskQueueIds": ["gecko-t/t-linux-docker-noscratch-amd", ...],
  "resolutions": ["completed", "failed", "exception - canceled", ...],
  "projects": ["mozilla-central", "try", "autoland", ...]
}
```

### tasks

Parallel arrays of length `taskCount`:

```json
{
  "scheduled": [1709510400000, 150, 0, 23, ...],
  "started": [5432, null, 1234, ...],
  "resolved": [65432, 3600000, 12345, ...],
  "resolutionIds": [0, 0, 2, ...],
  "taskQueueIdIds": [0, 1, 0, ...],
  "projectIds": [0, null, 1, ...]
}
```

See [Time Compression](#time-compression) and [Resolutions](#resolutions) below for encoding details.

---

## Tasks File (`workers-YYYY-MM-DD-tasks.json`)

Full task data loaded on demand for Firefox Profiler integration.

### Top-Level Structure

```json
{
  "metadata": { ... },
  "tables": { ... },
  "tasks": { ... },
  "workerInfo": { ... },
  "taskGroupInfo": { ... }
}
```

### metadata

Same structure as the summary file.

### tables

All strings are deduplicated and stored once. Tables are sorted by frequency of use (most referenced entries first) to reduce JSON size (frequently used strings get smaller numeric indices).

```json
{
  "labels": ["test-linux2404-64/opt-mochitest-plain-5", ...],
  "projects": ["mozilla-central", "try", "autoland", ...],
  "taskQueueIds": ["releng-hardware/gecko-t-linux-2404-wayland", ...],
  "resolutions": ["completed", "failed", "exception - canceled", ...],
  "workerGroups": ["us-central1-b", ...],
  "workerIds": ["7732042218797547089", ...],
  "priorities": ["low", "lowest", "medium", ...],
  "users": ["cron@mozilla-central", "user@example.com", ...],
  "taskGroupIds": ["YJJe4a0CRIqbAmcCo8n63w", ...]
}
```

### tasks

Parallel arrays of length `taskCount`:

```json
{
  "scheduled": [1709510400000, 150, 0, 23, ...],
  "started": [5432, null, 1234, ...],
  "resolved": [65432, 3600000, 12345, ...],
  "resolutionIds": [0, 0, 2, ...],
  "taskIds": ["YJJe4a0CRIqbAmcCo8n63w", "XPPf5b1DRJrcBndDp9o74x.1", ...],
  "labelIds": [0, 1, 2, ...],
  "priorityIds": [0, 0, 1, ...],
  "taskGroupIdIds": [0, 0, 1, ...],
  "userIds": [0, 1, 0, ...],
  "taskQueueIdIds": [0, 1, 0, ...],
  "workerIdIds": [0, null, 1, ...],
  "runCosts": [0.012244, 0, 0.003456, ...]
}
```

#### Task IDs

Task IDs include a `.runId` suffix only when the run ID is greater than 0:

- Run 0: `"YJJe4a0CRIqbAmcCo8n63w"`
- Run 1: `"YJJe4a0CRIqbAmcCo8n63w.1"`

#### Index References

- `resolutionIds[i]`: Index into `tables.resolutions`
- `labelIds[i]`: Index into `tables.labels`
- `priorityIds[i]`: Index into `tables.priorities`
- `taskGroupIdIds[i]`: Index into `tables.taskGroupIds`
- `userIds[i]`: Index into `tables.users`
- `taskQueueIdIds[i]`: Index into `tables.taskQueueIds`
- `workerIdIds[i]`: Index into `tables.workerIds` (`null` if never assigned)

#### Run Costs

`runCosts[i]` is the cost in USD, rounded to 6 decimal places. `0` when not available.

### workerInfo

Per-worker arrays indexed by worker ID (index into `tables.workerIds`).

```json
{
  "workerGroupIds": [0, 1, 0, ...]
}
```

- `workerGroupIds[w]`: Index into `tables.workerGroups` for worker `w`

Each worker belongs to exactly one worker group.

### taskGroupInfo

Per-task-group arrays indexed by task group ID (index into `tables.taskGroupIds`).

```json
{
  "projectIds": [0, 1, null, ...]
}
```

- `projectIds[g]`: Index into `tables.projects` for task group `g` (`null` for task groups with no project, typically decision tasks)

Each task group belongs to at most one project.

---

## Shared Encoding Details

### Time Compression

Both files use the same time encoding. Tasks are sorted by scheduled time.

- **`scheduled`**: Differential compression. `scheduled[0]` is an absolute millisecond timestamp. `scheduled[i]` (for i > 0) is the delta from `scheduled[i-1]`. Since tasks are sorted, deltas are small non-negative integers.
- **`started`**: Offset from the task's own scheduled time (i.e., queue wait time in ms). `null` if the task never started.
- **`resolved`**: Offset from the task's own scheduled time (i.e., total task lifetime in ms).
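
The encoding direction can be sketched as follows. This is a minimal illustration of the rules above, not the code in `fetch-worker-data.js`: the `encodeTimes` helper and the `scheduledMs`, `startedMs`, and `resolvedMs` field names are hypothetical, and the input is assumed to be already sorted by scheduled time.

```javascript
// Hypothetical helper (not part of fetch-worker-data.js): turn absolute
// millisecond timestamps into the compressed arrays described above.
// Assumes `tasks` is already sorted by scheduledMs.
function encodeTimes(tasks) {
  const scheduled = [];
  const started = [];
  const resolved = [];
  let previousScheduledMs = 0;

  tasks.forEach((task, i) => {
    // First entry is absolute; every later entry is a delta from the previous task.
    scheduled.push(i === 0 ? task.scheduledMs : task.scheduledMs - previousScheduledMs);
    previousScheduledMs = task.scheduledMs;

    // started and resolved are offsets from the task's own scheduled time.
    started.push(task.startedMs === null ? null : task.startedMs - task.scheduledMs);
    resolved.push(task.resolvedMs - task.scheduledMs);
  });

  return { scheduled, started, resolved };
}

// Two made-up tasks; the second one never started.
const encoded = encodeTimes([
  { scheduledMs: 1709510400000, startedMs: 1709510405432, resolvedMs: 1709510465432 },
  { scheduledMs: 1709510400150, startedMs: null, resolvedMs: 1709514000150 },
]);
console.log(encoded);
```

For these two tasks the result is `scheduled: [1709510400000, 150]`, `started: [5432, null]`, and `resolved: [65432, 3600000]`, which lines up with the leading entries of the sample arrays shown earlier.
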

To reconstruct absolute timestamps:

```javascript
// Every parallel array has taskCount entries.
const n = data.tasks.scheduled.length;
const abs = new Array(n);
abs[0] = data.tasks.scheduled[0];
for (let i = 1; i < n; i++) {
  abs[i] = abs[i - 1] + data.tasks.scheduled[i];
}

// abs[i] is the absolute scheduled time for task i
const i = 42; // any task index in [0, n)
const startedMs = data.tasks.started[i] !== null
  ? abs[i] + data.tasks.started[i]
  : null;
const resolvedMs = abs[i] + data.tasks.resolved[i];
```

### Resolutions

The `resolutions` table merges the task `state` and `reason_resolved` fields:

- When both are the same: just the value (e.g., `"completed"`, `"failed"`)
- When different: `"state - reason"` (e.g., `"exception - canceled"`, `"exception - deadline-exceeded"`)

---

## Example: Reconstruct a Full Task Row

Using the tasks file:

```javascript
const d = data;
const i = 42;

// Reconstruct absolute scheduled time
let scheduledMs = d.tasks.scheduled[0];
for (let j = 1; j <= i; j++) {
  scheduledMs += d.tasks.scheduled[j];
}

const startedMs = d.tasks.started[i] !== null
  ? scheduledMs + d.tasks.started[i]
  : null;
const resolvedMs = scheduledMs + d.tasks.resolved[i];

const taskId = d.tasks.taskIds[i];
const label = d.tables.labels[d.tasks.labelIds[i]];
const resolution = d.tables.resolutions[d.tasks.resolutionIds[i]];
const priority = d.tables.priorities[d.tasks.priorityIds[i]];
const taskQueue = d.tables.taskQueueIds[d.tasks.taskQueueIdIds[i]];
const cost = d.tasks.runCosts[i];

// Task group and project
const tgIdx = d.tasks.taskGroupIdIds[i];
const taskGroupId = tgIdx !== null ? d.tables.taskGroupIds[tgIdx] : null;
const projectIdx = tgIdx !== null ? d.taskGroupInfo.projectIds[tgIdx] : null;
const project = projectIdx !== null ? d.tables.projects[projectIdx] : null;

// User
const userIdx = d.tasks.userIds[i];
const user = userIdx !== null ? d.tables.users[userIdx] : null;

// Worker (may be null if task never ran)
const wIdx = d.tasks.workerIdIds[i];
const workerId = wIdx !== null ? d.tables.workerIds[wIdx] : null;
const workerGroup = wIdx !== null ?
  d.tables.workerGroups[d.workerInfo.workerGroupIds[wIdx]] : null;
```

---

## Data Characteristics

- **Task count**: ~240K tasks per day
- **Summary file size**: ~5 MB uncompressed per day
- **Tasks file size**: ~20 MB uncompressed per day
- **String tables**: Sorted by frequency (most used entries have lowest indices)
- **History**: 21 days maintained via CI artifact re-upload
- **Source**: STMO query 112377 (BigQuery)
- **Schedule**: Generated daily at 03:00 UTC via CI cron
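
---

## Example: Per-Pool Stats from the Summary File

The summary file alone is enough for simple per-pool aggregates. The sketch below counts tasks and sums queue wait time per task queue. It is an illustration under stated assumptions rather than the dashboard's actual code: the `perPoolStats` helper, the `sample` object, and the `pool-a` / `pool-b` names are made up, and `taskQueueIdIds` is assumed to contain no `null` entries.

```javascript
// Hypothetical helper (not the dashboard's implementation): aggregate
// task counts and queue wait per task queue from a parsed summary file.
function perPoolStats(summary) {
  const { tables, tasks } = summary;
  const stats = new Map();

  for (let i = 0; i < tasks.taskQueueIdIds.length; i++) {
    // Assumption: every task has a task queue, so this index is never null.
    const pool = tables.taskQueueIds[tasks.taskQueueIdIds[i]];
    let entry = stats.get(pool);
    if (!entry) {
      entry = { taskCount: 0, startedCount: 0, totalWaitMs: 0 };
      stats.set(pool, entry);
    }
    entry.taskCount++;

    // started[i] is already the queue wait in ms; null means the task never started.
    const waitMs = tasks.started[i];
    if (waitMs !== null) {
      entry.startedCount++;
      entry.totalWaitMs += waitMs;
    }
  }
  return stats;
}

// Tiny made-up summary object with the same shape as workers-YYYY-MM-DD.json.
const sample = {
  metadata: { date: "2026-03-04", taskCount: 3 },
  tables: {
    taskQueueIds: ["pool-a", "pool-b"],
    resolutions: ["completed"],
    projects: ["try"],
  },
  tasks: {
    scheduled: [1709510400000, 150, 0],
    started: [5000, null, 1000],
    resolved: [60000, 3600000, 12000],
    resolutionIds: [0, 0, 0],
    taskQueueIdIds: [0, 1, 0],
    projectIds: [0, null, 0],
  },
};

const stats = perPoolStats(sample);
for (const [pool, s] of stats) {
  const meanWaitMs = s.startedCount > 0 ? s.totalWaitMs / s.startedCount : null;
  console.log(pool, s.taskCount, meanWaitMs);
}
```

Because `started` is stored as an offset from each task's own scheduled time, queue wait can be aggregated without first reconstructing absolute timestamps.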