# ATM CLI Requirements

## 1. Product Definition

Product requirement ID:
- `REQ-P-PRODUCT-001` The retained ATM product surface consists of `send`, `read`, `ack`, `clear`, `log`, `doctor`, `teams`, and `members`, backed by a singleton daemon runtime and SQLite source-of-truth for mail and roster state in the Phase Q architecture.

Satisfied by:
- intentionally undecomposed product requirement; this governs overall retained product scope rather than a single crate-local obligation

The product is a local command-line tool named `atm`.

The current target architecture no longer treats daemon removal as a product goal. Phase Q reintroduces a tightly-bounded singleton daemon runtime because mail routing, native agent notification, and cross-host transport need one coordinating process, while ATM command behavior remains the user-facing surface.

The retained product surface is:
- `atm send`
- `atm read`
- `atm ack`
- `atm clear`
- `atm log`
- `atm doctor`
- `atm teams`
- `atm members`

The system must preserve the retained command behavior unless these requirements explicitly retire or change it.

The system uses structured logging through `sc-observability`.

Schema ownership references:
- Claude Code-native message schema: [`claude-code-message-schema.md`](./claude-code-message-schema.md)
- ATM additive/interpreted message schema: [`atm-message-schema.md`](./atm-message-schema.md)
- legacy ATM read-compatibility schema: [`legacy-atm-message-schema.md`](./legacy-atm-message-schema.md)
- `sc-observability` schema ownership pointer: [`sc-observability-schema.md`](./sc-observability-schema.md)
- ATM-owned error-code registry: [`atm-error-codes.md`](./atm-error-codes.md)
- schema enforcement models: `tools/schema_models/claude_code_message_schema.py` and `tools/schema_models/atm_message_schema.py` and `tools/schema_models/legacy_atm_message_schema.py`

## 1.1 Documentation Structure

Documentation organization is defined in [`documentation-guidelines.md`](./documentation-guidelines.md).

Top-level product docs in `docs/` remain the product source of truth. Crate-local ownership docs live under:
- [`docs/atm/requirements.md`](./atm/requirements.md)
- [`docs/atm/architecture.md`](./atm/architecture.md)
- [`docs/atm-core/requirements.md`](./atm-core/requirements.md)
- [`docs/atm-core/architecture.md`](./atm-core/architecture.md)
- [`docs/atm-daemon/requirements.md`](./atm-daemon/requirements.md)
- [`docs/atm-daemon/architecture.md`](./atm-daemon/architecture.md)
- [`docs/atm-rusqlite/requirements.md`](./atm-rusqlite/requirements.md)
- [`docs/atm-rusqlite/architecture.md`](./atm-rusqlite/architecture.md)

During the cleanup/restructure phase, product requirements stay here while crate-local ownership is moved out of this file into the crate directories.

Phase-Q supersession note:
- earlier daemon-free assumptions in this file are historical requirements from the prior rewrite line
- for mail/runtime architecture, the current authoritative direction is Section 21

## 2. Scope

Product requirement ID:
- `REQ-P-SCOPE-001` The rewrite retains the documented command surface and migrates ATM mail/runtime ownership from filesystem JSON plus mailbox locks to SQLite plus a singleton daemon without intentionally removing retained functionality.

Satisfied by:
- intentionally undecomposed product requirement; this governs overall rewrite scope and is enforced across the workspace rather than by one crate-local ID

- `REQ-P-RUNTIME-001` Production ATM commands must connect to the daemon and auto-start it when absent.

Required behavior:
- the production CLI/runtime path first attempts to connect to an already-running daemon
- if the daemon is not running, the production CLI/runtime path auto-starts it and retries once
- if daemon auto-start still fails, ATM must fail clearly with recovery guidance
- no production path may silently bypass the daemon by talking directly to SQLite or inbox files

### 2.1 In Scope

- one binary: `atm`
- one primary library: `atm-core`
- SQLite-backed ATM mail source of truth
- SQLite-backed team roster source of truth
- singleton daemon runtime
- same-host daemon API over Unix domain socket
- cross-host daemon API over TCP/TLS
- Claude-compatible JSONL inbox ingress and export
- configuration resolution
- hook-based identity fallback
- file-reference policy handling for `send --file`
- origin-inbox merge / ingest compatibility for Claude-owned inbox files
- ATM-owned read/ack/clear/task state in SQLite
- structured logging through `sc-observability`
- log query and follow through `sc-observability`
- local diagnostics through `atm doctor`
- local team discovery and recovery through `atm teams`
- local roster verification through `atm members`
- native agent/plugin notification interface
- the retained local team recovery surface:
  - `atm teams`
  - `atm members`
  - `atm teams add-member`
  - `atm teams backup`
  - `atm teams restore`
- task metadata carried in the mail envelope
- JSON output mode
- human-readable output mode

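
The connect, auto-start, and retry-once behavior required by `REQ-P-RUNTIME-001` can be sketched as follows. This is a minimal illustration, not the ATM implementation: `connect_with_autostart`, `DaemonUnavailable`, and the two injected callables are hypothetical names, and the recovery message text is an assumption.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DaemonUnavailable(Exception):
    """Hypothetical error: the daemon is unreachable even after one auto-start."""


def connect_with_autostart(
    connect: Callable[[], T],
    start_daemon: Callable[[], None],
) -> T:
    """Connect to a running daemon, auto-starting it at most once.

    `connect` and `start_daemon` are injected stand-ins for the real
    transport and launcher; both are assumptions made for this sketch.
    """
    try:
        # Step 1: try an already-running daemon first.
        return connect()
    except OSError:
        # Covers ConnectionRefusedError, FileNotFoundError, and similar.
        pass

    try:
        # Step 2: auto-start, then retry exactly once.
        start_daemon()
        return connect()
    except OSError as err:
        # Step 3: fail clearly with recovery guidance. There is deliberately
        # no fallback that reads SQLite or inbox files directly.
        raise DaemonUnavailable(
            "ATM daemon is not running and auto-start failed; "
            "check the daemon logs and retry"
        ) from err
```

Injecting the two callables keeps the retry policy testable without spawning a real daemon, which fits the out-of-scope note on daemon spawning as a core test strategy.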
### 2.2 Out Of Scope

- long-lived durable remote-delivery queueing
- broad plugin host framework beyond the ATM agent notification/runtime interface
- CI monitoring
- TUI and MCP features
- daemon spawning as the core correctness test strategy
  - bounded daemon smoke tests for the auto-start path are permitted when isolated from default test runs per [Testing Constraints](docs/plan-phase-Q.md#testing-constraints)
- manual daemon-start discipline as a product requirement
  - production CLI auto-start when the daemon is absent is in scope under `REQ-P-RUNTIME-001`
- `atm status` in the initial rewrite
- separate `atm tail` command in the initial rewrite
- team lifecycle management outside the retained local recovery surface (`atm teams`, `atm members`, `atm teams add-member`, `atm teams backup`, `atm teams restore`)

### 2.3 Release Distribution Scope

Product requirement ID:
- `REQ-P-RELEASE-001` The `1.0` retained-surface release must replace the previously published `agent-team-mail` CLI/core distribution channels from this repo without requiring downstream users to adopt new crate identities.
- `REQ-P-RELEASE-002` Channel parity for the replacement release is limited to the historical release channels that actually existed for the old repo: crates.io, GitHub Releases, and Homebrew.
- `REQ-P-RELEASE-003` Crate/package identity continuity must be preserved by publishing the retained CLI/core replacement under the legacy package names `agent-team-mail` and `agent-team-mail-core` while keeping the installed CLI binary name `atm`.
- `REQ-P-RELEASE-004` This repo must own the release-process control surface needed to ship and verify the replacement release, including the release workflows, artifact manifest, supporting scripts, and `publisher` agent instructions.
- `REQ-P-RELEASE-005` Windows installation must be first-class for `1.0` without requiring Rust tooling or manual archive extraction; `winget` is therefore a required additional release channel even though it was not part of the historical `agent-team-mail` release system.
- `REQ-P-RELEASE-006` Release prerequisites that depend on account-level distribution infrastructure must be made explicit in the repo-owned release plan before `1.0` release automation is considered complete.

Required behavior:
- the `1.0` release must publish the retained CLI and core crates under the legacy crates.io package names:
  - `agent-team-mail`
  - `agent-team-mail-core`
- the `atm` binary name remains the installed CLI entrypoint
- the release channels that were already part of the historical `agent-team-mail` release system and must be replaced from this repo are:
  - crates.io
  - GitHub Releases
  - Homebrew
- `winget` is not a historical release channel for `agent-team-mail`, but it is a required new `1.0` release channel so normal Windows users can install ATM without Rust tooling or manual zip handling
- Homebrew release automation depends on the existing `randlee/homebrew-tap` tap and requires `HOMEBREW_TAP_TOKEN` to be configured in `atm-core` GitHub secrets before the release workflow can update formulas from this repo
- `winget` release automation uses the `randlee` namespace with package ID `randlee.agent-team-mail`
- the first `winget` release requires a one-time manual manifest submission to `microsoft/winget-pkgs`; after that initial submission, later releases may be automated from this repo
- `winget` release automation must not require a repo-specific secret beyond the default GitHub workflow token
- release readiness proof for `winget` must validate successful submission or manifest update dispatch; it cannot require same-day installability because Microsoft review introduces a normal 1-2 day publication lag

## 3. External Contracts

Product requirement ID:
- `REQ-P-CONTRACT-001` External path/config/store/observability contracts must match the documented retained ATM behavior for the active architecture line.

Satisfied by:
- `REQ-CORE-CONFIG-001` for home/path/config resolution aspects
- `REQ-CORE-RUNTIME-001` for durable mail/roster store ownership aspects
- `REQ-CORE-INGEST-001` for Claude inbox/config ingest compatibility aspects
- `REQ-CORE-MAILBOX-001` for persisted Claude inbox write/read compatibility aspects
- `REQ-ATM-OBS-001` for CLI observability bootstrap/integration aspects
- `REQ-CORE-OBS-001` for ATM observability boundary/query-model aspects

### 3.1 Home And Path Resolution

Path resolution order:
1. `ATM_HOME` when set and non-empty
2. OS home directory

Required canonical paths:
- `{ATM_HOME}/.claude`
- `{ATM_HOME}/.claude/teams`
- `{ATM_HOME}/.claude/teams/{team}`
- `{ATM_HOME}/.claude/teams/{team}/config.json`
- `{ATM_HOME}/.claude/teams/{team}/inboxes/{agent}.json`
- `{ATM_HOME}/.config/atm/config.toml`
- `{ATM_HOME}/.config/atm/state.json`
- `{ATM_HOME}/.config/atm/share/{team}/`

### 3.1.1 Security And Durability Boundaries

Product requirement IDs:
- `REQ-SEC-001` All user-supplied team and agent name segments must be validated before path construction.
- `REQ-SEC-002` JSON number normalization must not allocate unbounded memory.
- `REQ-DURABILITY-001` Atomic-write temp file names must be globally unique within a process.

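
As a minimal sketch of `REQ-SEC-001`, the segment rules listed under Required behavior in this section could be enforced before any path is built. The names `validate_segment` and `inbox_path` are hypothetical, and restricting "alphanumeric" to ASCII is an assumption of this sketch rather than something these requirements state.

```python
import re
from pathlib import Path

# Assumption: "alphanumeric" means ASCII letters and digits.
# The first character may not be a period (no leading periods).
_SEGMENT = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]*")


def validate_segment(segment: str, kind: str = "team") -> str:
    """Return the segment unchanged, or raise ValueError if it is unsafe."""
    if not segment:
        raise ValueError(f"{kind} name must not be empty")
    if ".." in segment:
        # Rejects both `..` traversal and any consecutive periods.
        raise ValueError(f"{kind} name must not contain consecutive periods")
    if not _SEGMENT.fullmatch(segment):
        # Rejects path separators, leading periods, and any other character
        # outside alphanumeric, hyphen, underscore, and period.
        raise ValueError(f"{kind} name contains unsupported characters")
    return segment


def inbox_path(atm_home: Path, team: str, agent: str) -> Path:
    """Build the canonical inbox path only after both segments validate."""
    team = validate_segment(team, "team")
    agent = validate_segment(agent, "agent")
    return atm_home / ".claude" / "teams" / team / "inboxes" / f"{agent}.json"
```

Validating inside the path helper itself means no caller can construct an inbox path from an unchecked segment.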
Required behavior:
- valid team/agent path-segment characters are limited to:
  - alphanumeric
  - hyphen
  - underscore
  - period
- team/agent segments must reject:
  - empty strings
  - path separators
  - `..` sequences
  - consecutive periods
  - leading periods
  - platform-specific path escapes that could break out of the intended ATM home subtree
- validation must happen before any path construction in address parsing or home/path helpers
- JSON number normalization must cap exponent-driven string expansion at 64 characters
- if exponent expansion would exceed 64 characters, ATM must:
  - return the original raw numeric string unchanged
  - emit a structured warning using `AtmErrorCode::WarningMalformedAtmFieldIgnored`
- atomic persistence helpers must use temp-file names that are unique for each write attempt targeting the same destination path from the same process
- timestamp-only temp-file suffixes are not sufficient for the durability contract because rapid same-process writes can collide

### 3.2 Team Mail Store

Per-team layout:
- `{ATM_HOME}/.claude/teams/{team}/config.json`
- `{ATM_HOME}/.claude/teams/{team}/inboxes/{agent}.json`
- optional origin inbox files:
  - `{ATM_HOME}/.claude/teams/{team}/inboxes/{agent}.{origin}.json`

The rewrite retains origin-file merge behavior for read and wait paths because it is part of the current file-based mail surface and does not require the daemon.

### 3.2.1 Message Schema Ownership And Compatibility

Product requirement ID:
- `REQ-P-SCHEMA-001` ATM must preserve explicit ownership boundaries between Claude Code-native message schema, legacy ATM compatibility schema, and forward ATM metadata schema.

Satisfied by:
- `REQ-CORE-MAILBOX-001` for persisted inbox read/write compatibility
- `REQ-CORE-WORKFLOW-001` for ATM workflow semantics layered onto compatible message representations

Required rules:
- Claude Code-native message schema is owned by Claude Code
- ATM must not redefine Claude-native fields as if ATM owned them
- ATM read must accept:
  - Claude Code-native messages
  - legacy ATM top-level additive messages
  - future ATM metadata-based messages
- new ATM-only machine-readable fields must not be added as new top-level inbox fields
- forward ATM machine-readable fields must live in `metadata.atm`
- forward ATM-authored alert and repair metadata, including legacy `atmAlertKind` and `missingConfigPath`, must migrate to `metadata.atm` fields such as `metadata.atm.alertKind` and `metadata.atm.missingConfigPath`
- ATM may enrich a Claude-native message in place by adding ATM-owned metadata without rewriting native Claude fields except for the explicitly documented cross-team alias projection carve-out on `from`, which also requires `metadata.atm.fromIdentity`
- locally owned schema enforcement must distinguish legacy top-level UUID-based ATM identifiers from forward metadata-based ULID identifiers
- write-path validation may reject wrong-format ATM-owned identifiers with descriptive errors
- read-path validation failure for ATM-owned fields must trigger warning + degradation logic rather than failing the overall message read
- a separate ATM-native inbox is explicitly deferred and must not be assumed by the current live design

Current compatibility rule:
- existing runtime write/read behavior for legacy top-level alert fields remains stable until a later compatibility-migration implementation changes that persisted shape

`REQ-P-SCHEMA-001` is owned by:
- [`claude-code-message-schema.md`](./claude-code-message-schema.md)
- [`atm-message-schema.md`](./atm-message-schema.md)
- [`legacy-atm-message-schema.md`](./legacy-atm-message-schema.md)
- [`atm-core/design/dedup-metadata-schema.md`](./atm-core/design/dedup-metadata-schema.md) §2.2 and §3.3 for forward ATM alert-field placement and sender-side dedup semantics

### 3.2.2 Shared File Ownership And Mutation Classes

Product requirement ID:
- `REQ-P-FILEIO-001` Every live file operation must declare file ownership, mutation class, and the single commit path used for persistence.

Required rules:
- every live file path must be classified as one of:
  - Claude-owned
  - ATM-owned
  - shared/de-facto interoperable
- ownership determines whether ATM is allowed to treat the file as writable source-of-truth state
- ATM-owned machine state must have one documented write path per file family
- ad hoc write logic at leaf call sites is prohibited for live shared state

Operation classes:
- `read_only`
  - no lock acquisition
  - no temp-file write
  - no persistence side effect
- `read_possible_write`
  - initial unlocked read is allowed
  - if the read determines no change is needed, return without locking
  - if the read determines a change is needed, the operation must enter the shared write-commit path before persisting anything
- `read_modify_write`
  - mutation is expected
  - persistence must still flow through the shared write-commit path

Shared write-commit path requirements:
- the mutation plan must be computed from a concrete input snapshot
- before replacing the live file, ATM must prove source freshness by either:
  - compare-and-swap against the exact snapshot identity/content that was read, or
  - lock, reread current state, recompute the mutation from the fresh state, then commit
- `read -> mutate -> lock -> blind rename` is not a valid write path
- every successful commit of shared mutable structured state must use the documented atomic replacement helper family

Source-of-truth guardrails:
- ATM must not rely on full-file rewrite of Claude-owned files as the long-term source of truth for ATM-local workflow state
- if ATM-local semantics need durability independent of Claude’s native writes, that state must move to ATM-owned sidecars or an equivalent ATM-owned store
- when a legacy compatibility path still rewrites a non-ATM-owned shared file, the requirements and architecture docs must call out the limitation

### 3.3 Configuration Resolution

Configuration resolution order:
1. CLI flags
2. environment variables
3. repo-local `.atm.toml`
4. global `{ATM_HOME}/.config/atm/config.toml`
5. defaults

Required config fields:
- default team

Supported optional config fields:
- `[atm].team_members`
- `[atm].aliases`
- `[[atm.post_send_hooks]]`

Runtime identity rules:
- repo-local `.atm.toml` `[atm].identity` is not a valid runtime identity fallback for the retained multi-agent ATM model
- runtime identity must come from:
  - explicit command override when supported
  - hook-file identity
  - `ATM_IDENTITY`
- an obsolete config `[atm].identity` field may remain temporarily for migration, but ATM must ignore it for runtime identity resolution and `atm doctor` must flag it for removal
- `.atm.toml` may define `[atm].team_members` as the baseline team roster that should always be present in `config.json`
- `.atm.toml` may define `[atm].aliases` for ATM-owned shorthand addressing of canonical member identities
- `.atm.toml` may define one or more `[[atm.post_send_hooks]]` rules for best-effort recipient-scoped post-send automation
- retired `[atm].post_send_hook`, `[atm].post_send_hook_senders`, `[atm].post_send_hook_recipients`, and `[atm].post_send_hook_members` keys must be rejected with migration guidance directing operators to `[[atm.post_send_hooks]]`
- config sections outside ATM-owned config, such as `[rmux]` or future `[scmux]`, are not ATM runtime config and must be ignored by `atm-core`

### 3.3.1 Config And Schema Recovery

Product requirement ID:
- `REQ-P-CONFIG-HEALTH-001` Persisted ATM config and team JSON loading must recover at the narrowest safe scope and report precise diagnostics when recovery is not safe.

Satisfied by:
- `REQ-CORE-CONFIG-003` for config/team schema recovery and diagnostic policy
- `REQ-CORE-SEND-001` for send-time missing-config fallback and repair notification policy
- `REQ-CORE-MAILBOX-001` for mailbox record skip behavior

Required persisted-data classes:
- `compatibility-recoverable`
- `record-invalid`
- `document-invalid`
- `missing-document`

Required handling policy:
- compatibility-only schema drift may be recovered with documented, deterministic defaults
- malformed records inside a larger persisted collection should be skipped or quarantined individually when the rest of the document remains trustworthy
- malformed root documents or invalid root structure must fail with structured errors rather than guessed repairs
- missing persisted team config is a distinct `missing-document` condition and must not be collapsed into generic parse corruption
- identity and routing semantics must never be fabricated to keep a command running

Required diagnostics:
- failure class when known
- file path
- entity scope when known, such as member name or collection entry
- field name when known
- parser detail, including line and column when available
- recovery guidance when operator action is required

Operator examples and safe repair guidance live in [`persisted-data-repair.md`](./persisted-data-repair.md).

### 3.4 Claude Settings Resolution

The system must resolve Claude settings for file-reference policy checks.

Resolution order:
1. explicit settings path override when provided internally
2. repo-local `.claude/settings.local.json`
3. repo-local `.claude/settings.json`
4. global `{ATM_HOME}/.claude/settings.json`

### 3.5 Observability Shared Integration Baseline

ATM depends on `sc-observability` as the shared logging/query/health substrate.

The shared surface ATM integrates against must support:
- structured log emission
- historical query of retained records
- follow/tail of new matching records
- filtering by severity
- filtering by structured key/value fields
- filtering by time window
- limit/order controls
- health reporting for the logging runtime

The current shared repo now exposes those generic capabilities. ATM must integrate with them directly rather than preserving a local tracing-only adapter.

Required integration rules:
- ATM must not implement a parallel ad hoc log-query engine when shared `sc-observability` APIs can own the behavior
- `atm-core` must keep the shared crates behind an ATM-owned injected boundary
- `atm` owns the concrete shared-crate bootstrap and dependency wiring
- the active release baseline uses the published `sc-observability = "1.0.0"` crates.io dependency
- the same pinned Rust toolchain must be used locally and in CI across ATM and `sc-*` repos
- the concrete integration work is planned in Phase K of [`project-plan.md`](./project-plan.md)

Historical note:
- `OBS-GAP-1` is complete as a historical planning artifact and does not remain the gating item for retained observability delivery

## 4. Identity Resolution

Product requirement ID:
- `REQ-P-IDENTITY-001` Identity resolution must follow the documented command precedence rules.

Satisfied by:
- `REQ-CORE-CONFIG-001` for identity resolution policy

### 4.1 Send Identity Resolution Order

1. `--from`
2. hook-file identity
3. `ATM_IDENTITY`

### 4.2 Read Identity Resolution Order

1. `--as`
2. hook-file identity
3. `ATM_IDENTITY`

### 4.3 Doctor Identity Resolution

`atm doctor` uses the same config and hook-resolution paths as the retained mail commands, but it must not fail immediately only because hook identity is absent. Missing hook identity is a diagnostic finding unless identity resolution is explicitly required for a requested check.

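
The precedence in Sections 4.1 and 4.2 reduces to "first usable source wins". The sketch below is illustrative only: `resolve_identity` and `IdentityError` are hypothetical names, and treating blank values as absent is an assumption. The obsolete config `identity` field is deliberately not a parameter, because it never counts as command identity.

```python
from typing import Mapping, Optional


class IdentityError(Exception):
    """Hypothetical structured error raised when no identity source is usable."""


def resolve_identity(
    explicit: Optional[str],       # `--from` for send, `--as` for read
    hook_identity: Optional[str],  # identity loaded from the hook file, if any
    env: Mapping[str, str],        # process environment, injected for testing
) -> str:
    """Return the first usable identity in documented precedence order."""
    candidates = (explicit, hook_identity, env.get("ATM_IDENTITY"))
    for candidate in candidates:
        # Assumption: blank or whitespace-only values are treated as absent.
        if candidate and candidate.strip():
            return candidate.strip()
    raise IdentityError(
        "no command identity: pass an explicit override, configure hook "
        "identity, or set ATM_IDENTITY"
    )
```

A doctor-style caller could catch `IdentityError` and record it as a diagnostic finding instead of aborting, while `send` and `read` would let it propagate.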
If command identity cannot be determined where required, the command must fail with a structured recovery-oriented error. An obsolete config `identity` field may be reported as a diagnostic, but it does not count as command identity.

## 5. Address Resolution

Product requirement ID:
- `REQ-P-ADDRESS-001` Address resolution must support the documented `agent`/`agent@team` forms and precedence rules.

Satisfied by:
- `REQ-CORE-CONFIG-002` for address parsing, alias rewrite, and team/member validation policy

Supported address forms:
- `agent`
- `agent@team`

Resolution order:
1. explicit `agent@team`
2. bare `agent` plus `--team`
3. bare `agent` plus configured default team

An explicit `@team` suffix takes precedence over `--team`.

Aliases are resolved after splitting `agent@team`, so only the agent token is rewritten.

Alias rules:
- aliases are accepted as ATM-owned input shorthand only
- recipient aliases must resolve to canonical member names before validation, self-send checks, and mailbox lookup
- sender aliases may be accepted on input, but canonical sender identity remains the routing and validation identity
- same-team messages keep current canonical sender projection behavior
- cross-team messages may project an alias-oriented sender in the persisted `from` field only when ATM also stores canonical sender identity in `metadata.atm.fromIdentity`

Post-send-hook rules:
- `[[atm.post_send_hooks]]` is the only supported post-send hook shape in this release line
- each rule binds exactly one `recipient` selector and one `command` argv
- `recipient` must be either one concrete team member name or `*`
- multiple matching rules may run for a single send, in config order
- retired `[atm].post_send_hook`, `[atm].post_send_hook_senders`, `[atm].post_send_hook_recipients`, and `[atm].post_send_hook_members` keys must fail with migration-oriented guidance to `[[atm.post_send_hooks]]`
- a relative hook path must resolve from the directory containing the discovered `.atm.toml`
- bare executable names such as `bash`, `python3`, or `tmux` must use normal `PATH` resolution
- the hook must execute with the config-root directory as its working directory
- recipient non-match is expected behavior and must be silent
- the hook inherits the process environment and also receives one ATM-owned JSON payload in `ATM_POST_SEND`
- the `ATM_POST_SEND` payload must contain:
  - `from`
  - `to`
  - `sender`
  - `recipient`
  - `team`
  - `message_id`
  - `requires_ack`
  - `is_ack`
  - optional `task_id` when present
  - optional `recipient_pane_id` when ATM has an authoritative pane mapping for the recipient
- Phase Q addition: `is_ack` is part of the retained hook payload contract for the daemon-owned send/ack runtime path so hook implementations can distinguish `atm send` from `atm ack` without inspecting message text
- the post-send hook must run after successful non-`dry-run` `atm send`
- the post-send hook must also run after successful `atm ack`, using the reply message as the hook subject
- `is_ack` must be `false` for `atm send` and `true` for `atm ack`
- example payload:

  ```json
  {
    "from": "arch-ctm@atm-dev",
    "to": "recipient@atm-dev",
    "sender": "arch-ctm",
    "recipient": "recipient",
    "team": "atm-dev",
    "message_id": "...",
    "requires_ack": false,
    "is_ack": false,
    "recipient_pane_id": "%1"
  }
  ```

- the hook may optionally emit one structured result object on stdout for ATM to parse as post-send diagnostics
- the structured hook-result object must support:
  - `level`
  - `message`
  - optional `fields`
- supported hook-result levels are:
  - `debug`
  - `info`
  - `warn`
  - `error`
- missing stdout, empty stdout, oversized stdout, or invalid hook-result schema must not fail the send or convert a successful hook execution into a command error
- when a valid hook-result object is returned, ATM must log it with the declared level and preserve any structured fields
- when a hook is configured, ATM must emit enough diagnostics to explain whether the hook ran or failed, including the sender, recipient, and matched hook recipient selector

## 6. `atm send`

Product requirement ID:
- `REQ-P-SEND-001` `atm send` must satisfy the documented send contract.

Satisfied by:
- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-CONFIG-002` for address resolution and target-validation aspects
- `REQ-CORE-SEND-001` for send-time missing-config fallback and repair notification behavior
- `REQ-CORE-MAILBOX-001` for message creation, duplicate suppression, and atomic mailbox mutation aspects

### 6.1 Purpose

Write one message into one target inbox.

### 6.2 Required Flags And Inputs

- positional target: `agent` or `agent@team`
- optional positional message text
- `--team `
- `--file `
- `--stdin`
- `--summary `
- `--json`
- `--dry-run`
- `--from `
- `--requires-ack`
- `--task-id `

Retired from the current implementation:
- `--offline-action`
  - this flag exists only to cooperate with daemon-backed liveness checks and is not retained

### 6.3 Required Behavior

- resolve sender identity using the defined precedence
- resolve recipient address using the defined precedence
- resolve aliases before mailbox lookup
- when a cross-team alias-oriented sender is projected into `from`, also persist canonical sender identity in `metadata.atm.fromIdentity` and use the canonical sender identity for validation, self-send checks, routing, and audit behavior
- verify target team existence and target agent membership as part of address resolution before mailbox path selection, except for the documented `missing-document` fallback in §6.3.1
- generate summary when not explicitly provided
- enter the atomic append boundary before final inbox mutation
- validate message text inside the atomic append boundary
- generate message id for ATM-authored messages inside the atomic append boundary
- create inbox file if absent inside the atomic append boundary
- preserve duplicate-suppression behavior for message ids inside the atomic append boundary
- append atomically to the inbox file
- support dry-run without mutation
- support sender-controlled ack-required messages
- support optional task metadata on sent messages
- reject retired `post_send_hook_members` config with actionable migration guidance before send execution proceeds
- reject retired flat post-send-hook config keys with actionable migration guidance before send execution proceeds
- run matching `[[atm.post_send_hooks]]` rules only after successful non-`dry-run` sends
- match rules only by resolved recipient identity
- support `recipient = "*"` wildcard matching for all recipients
- execute all matching post-send-hook rules in config order
- support an optional structured hook result on stdout so hook scripts can report post-send outcomes such as nudges, no-op conditions, and operator errors without relying on stderr scraping
- emit structured diagnostics for hook-rule evaluation and actionable warnings only when a configured hook execution fails
- treat `post_send_hook` failure or timeout as best-effort diagnostics only; it must not roll back or fail an already-successful send
- write a non-null `message_id` on every ATM-authored message
- current live write compatibility may generate top-level `message_id` values using UUID while the metadata-based schema is not yet implemented

Forward schema requirements:
- once ATM writes `messageId` under `metadata.atm`, it must use ULID rather than UUID for newly-authored values
- ATM must generate the ULID first and derive the persisted Claude-native `timestamp` from that ULID creation instant
- legacy UUID `message_id` remains read-compatible

`message_id` is required on every message written by `atm send`. `message_id` is optional in the persisted schema (§14.1) only to support legacy messages written by older clients, but `atm send` never omits it.

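
The ULID-first ordering in the forward schema requirements above can be sketched with the standard library alone. This is an illustration under stated assumptions: the helper names are hypothetical, and rendering the timestamp as ISO 8601 UTC with millisecond precision is an assumption about the persisted `timestamp` format. The ULID layout itself (26 Crockford base32 characters, with a 48-bit millisecond timestamp in the first 10) follows the public ULID spec.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(now_ms: Optional[int] = None, randomness: Optional[bytes] = None) -> str:
    """Encode a 48-bit ms timestamp plus 80 random bits as 26 base32 chars."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    rand = os.urandom(10) if randomness is None else randomness
    value = (ts << 80) | int.from_bytes(rand, "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the creation instant from the first 10 characters."""
    ts = 0
    for ch in ulid[:10]:
        ts = ts * 32 + CROCKFORD.index(ch)
    return ts


def timestamp_from_ulid(ulid: str) -> str:
    """Derive the message timestamp from the ULID, never from a second clock read."""
    ms = ulid_timestamp_ms(ulid)
    # Integer arithmetic avoids float rounding on the millisecond part.
    instant = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    instant = instant.replace(microsecond=(ms % 1000) * 1000)
    return instant.isoformat(timespec="milliseconds")
```

Deriving the timestamp from the identifier, rather than reading the clock twice, guarantees that the two values can never disagree.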
Recipients use `message_id` for:
- duplicate suppression
- read-time duplicate collapse
- acknowledgement targeting

### 6.3.1 Missing Team Config Fallback

When team `config.json` is missing, `atm send` may still proceed only when:
- the resolved team directory exists
- the target inbox path already exists
- no team, agent, or routing identity must be guessed

When `atm send` uses this fallback, it must:
- surface an actionable warning to the sender that delivery used inbox fallback because team config is missing
- keep the original delivery path best-effort and non-interactive
- send a best-effort repair notification to `team-lead` when that recipient can be resolved without guesswork
- deduplicate repeated repair notifications for the same unresolved missing-team config condition so inboxes do not accumulate hundreds of identical messages

When team `config.json` is malformed rather than missing:
- `atm send` must fail with a structured configuration error
- malformed config must not silently degrade into missing-config fallback

### 6.4 Message Source Semantics

Exactly one message source must be used:
- positional message text
- `--stdin`
- `--file`

`--file` behavior:
- verify the file exists
- apply the file-access policy
- if allowed, send a file-reference message body
- if not allowed, copy the file into the team share directory and rewrite the message body to reference the share copy

If positional message text is combined with `--file`, preserve the current two-part body shape:

```text
File reference:
```

### 6.5 Ack-Required And Task Metadata

`--requires-ack` means the message must enter the pending-ack queue at write time.

Required behavior:
- write the message with `read = false`
- set `pendingAckAt` to the send timestamp inside the atomic append boundary
- do not wait for a later read to create the ack obligation

`--task-id ` attaches task metadata to the message envelope.

Required behavior:
- persist `taskId`
- require acknowledgement for any task-linked message
- reject blank task ids

If `--task-id` is present:
- treat the message as task-linked mail
- imply `--requires-ack`

### 6.6 Output Contract

Human output must include:
- recipient
- sender
- delivery result

JSON output must include:
- `action = "send"`
- `team`
- `agent`
- `outcome`
- `message_id`
- `requires_ack`
- `task_id`

Dry-run JSON output must include:
- `action = "send"`
- `agent`
- `team`
- `message`
- `dry_run = true`
- `requires_ack`
- `task_id`

## 7. `atm read`

Product requirement ID:
- `REQ-P-READ-001` `atm read` must satisfy the documented read/selection/wait contract.

Satisfied by:
- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-CONFIG-002` for target-validation aspects
- `REQ-CORE-MAILBOX-001` for merged inbox load/persist aspects
- `REQ-CORE-WORKFLOW-001` for classification, queue selection, and legal transition aspects

### 7.1 Purpose

Read messages from one inbox.


### 7.2 Supported Flags

- optional target: `agent` or `agent@team`
- `--team `
- `--all`
- `--unread-only`
- `--pending-ack-only`
- `--history`
- `--since-last-seen`
- `--no-since-last-seen`
- `--no-mark`
- `--no-update-seen`
- `--limit `
- `--since `
- `--from `
- `--json`
- `--timeout `
- `--as `

### 7.3 Required Behavior

- default to the caller’s own inbox when no target agent is provided
- resolve identity and target address using the defined precedence
- verify target team exists
- verify explicit target agent exists in team config
- load messages from the merged inbox surface
- deduplicate entries by `message_id` before bucket selection and output rendering
- classify each message into the read axis, the ack axis, and a derived message class
- map the derived message class into display buckets
- support filtering by sender and timestamp
- support selection by queue mode
- preserve origin-inbox visibility when bridge remotes are configured
- sort newest-first before limiting
- write displayed messages back through the read-axis mutation rules
- persist read-triggered state changes back to the physical inbox file that owns each displayed message when origin inbox files are present in the merged surface
- support optional wait mode with timeout
- support optional seen-state filtering and updates

When multiple inbox entries share the same non-null `message_id`, `atm read` must display only the most recent entry. Earlier duplicates are silently suppressed.

Deduplication order:

- compare entries by `message_id`
- keep the newest entry by message timestamp
- when timestamps are equal, keep the later record encountered in inbox order
- do not emit suppressed duplicates in either human or JSON output

`--timeout` preserves the current queue-first behavior: if the requested read selection already contains unread or pending-ack messages at command start, the command returns immediately with those messages.
It blocks only when the requested selection is empty at command start.

### 7.4 Display Buckets

The CLI exposes three display buckets:

- `unread`
- `pending_ack`
- `history`

Bucket mapping from the derived message class:

- `Unread` -> `unread`
- `PendingAck` -> `pending_ack`
- `Read` -> `history`
- `Acknowledged` -> `history`

The display buckets are a presentation contract. They are not the canonical two-axis model.

### 7.5 Selection Modes

Default selection is the actionable queue:

- unread
- pending-ack

Explicit selection modes:

- default => actionable queue only
- `--unread-only` => unread bucket only
- `--pending-ack-only` => pending-ack bucket only
- `--history` => actionable queue plus history bucket
- `--all` => all buckets and bypass seen-state filtering

Mutual exclusion:

- `--all`
- `--unread-only`
- `--pending-ack-only`
- `--history`

### 7.6 Seen-State Rules

Seen-state is enabled by default unless `--no-since-last-seen` is set.

`--since-last-seen` explicitly enables the default watermark filter. When set explicitly, it behaves the same as the default.

If both `--since-last-seen` and `--no-since-last-seen` appear, `--no-since-last-seen` wins.

When seen-state is enabled and a watermark exists:

- unread messages remain eligible even when older than the watermark
- pending-ack messages remain eligible even when older than the watermark
- history messages are filtered by the watermark

On a true first run with no stored watermark:

- the default read view still shows only actionable messages
- historical messages remain hidden unless `--history` or `--all` is used

`--all` bypasses seen-state filtering entirely.

If seen-state updates are enabled:

- update the watermark using the latest displayed message timestamp
- do not use non-displayed messages when computing the watermark

`--no-update-seen`: when this flag is set, messages are read and displayed normally but the seen-state watermark is not updated after the operation.
The watermark is left unchanged regardless of which messages were displayed.

`--since `: filters to messages whose `timestamp` field is greater than or equal to the given ISO 8601 datetime. It filters by message timestamp, not by the seen-state watermark. It may be combined with seen-state filtering; both constraints apply independently.

`--from ` in read context is a sender filter: it restricts displayed messages to those sent by the named agent. It does not override the caller's identity.

### 7.7 Wait Mode Rules

When `--timeout ` is set:

- establish the read selection baseline after actor resolution, inbox loading, workflow classification, and filter application
- if the requested selection already contains eligible messages at wait start, return immediately without blocking
- otherwise block until a newly arrived message becomes eligible for the requested read selection, or until the timeout expires
- re-run the normal read selection over the updated merged inbox surface once a new eligible message arrives
- preserve the same sender, timestamp, seen-state, and selection filters during the wait

Timeout success condition:

- either the initial selection is already non-empty, or at least one message that was not eligible at wait start becomes eligible before the timeout expires

Timeout failure condition:

- the initial selection is empty and no newly eligible message arrives before the timeout expires

### 7.8 Mutation Rules

Base display mutation:

- any displayed message is written back with `read = true`

Ack-axis activation on display happens only when:

- the caller is reading their own inbox
- `--no-mark` is not set
- the message is displayed
- the message is currently `Unread`
- the message does not already require acknowledgement

Required transition on read of a normal unread message:

- `(Unread, NoAckRequired) -> (Read, PendingAck)`

Required transition on read of an ack-required unread message:

- `(Unread, PendingAck) -> (Read, PendingAck)`

Required transition on
read with `--no-mark` or when reading another inbox:

- `(Unread, NoAckRequired) -> (Read, NoAckRequired)`

No additional ack-axis mutation happens when:

- the message is already `PendingAck`
- the message is already `Acknowledged`
- the message is already `Read`

### 7.9 Processing Order

1. resolve actor and target inbox
2. build the hostname registry for configured origin inboxes
3. load messages from the merged inbox surface
4. classify canonical state
5. apply sender and timestamp filters (`--from`, `--since`)
6. apply seen-state filter when enabled and selection is not `--all`
7. map canonical state to display buckets and apply selection mode
8. if `--timeout` is set and the current selection is empty, block until a newly eligible message arrives or the timeout expires
9. sort newest-first and apply limit
10. apply read-axis and ack-axis transitions to displayed messages
11. persist read-triggered state changes atomically
12. update seen-state when enabled
13. render output

### 7.10 Output Contract

Human output must preserve the current queue-oriented shape:

- queue heading
- bucket counts line
- bucketed message output
- hidden-history summary when history is collapsed

JSON output must include:

- `action = "read"`
- `team`
- `agent`
- `messages`
- `count`
- `bucket_counts`
- `history_collapsed`

`bucket_counts` fields:

- `unread`
- `pending_ack`
- `history`

## 8. `atm ack`

Product requirement ID:

- `REQ-P-ACK-001`

`atm ack` must satisfy the documented acknowledgement contract.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-MAILBOX-001` for atomic ack persistence and reply append aspects
- `REQ-CORE-WORKFLOW-001` for pending-ack eligibility and acknowledgement transition aspects

### 8.1 Purpose

Acknowledge a pending-ack message in the caller's own inbox and send a visible reply to the original sender.

### 8.2 Supported Flags And Inputs

- positional `message-id`
- positional reply text
- `--team `
- `--as `
- `--json`

### 8.3 Required Behavior

- resolve the caller's own inbox using the retained identity rules
- locate the target message in the merged inbox surface
- require the target message to be in the pending-ack state on the ack axis
- persist the ack transition back to the physical inbox file that owns the source message when the merged inbox surface includes origin inbox files
- atomically:
  - set `read = true`
  - remove `pendingAckAt`
  - set `acknowledgedAt`
  - append a reply message to the original sender's inbox
- preserve `acknowledgesMessageId` on the emitted reply
- reject duplicate acknowledgement of an already acknowledged message
- run matching `[[atm.post_send_hooks]]` rules after a successful ack, using the reply message as the hook subject

### 8.4 Output Contract

JSON output must include:

- `action = "ack"`
- `team`
- `agent`
- `message_id`
- `reply_message_id` (Uuid of the reply message sent)
- `reply_text` (String body of the reply message sent)
- `task_id` (optional String, present when the source message has `taskId`)
- `reply_target`
- `warnings` (array of strings, omitted when empty)

## 9. `atm clear`

Product requirement ID:

- `REQ-P-CLEAR-001`

`atm clear` must satisfy the documented clear contract and preserve pending-ack protection.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-CONFIG-002` for target-validation aspects
- `REQ-CORE-MAILBOX-001` for clear-set persistence aspects
- `REQ-CORE-WORKFLOW-001` for clear-eligibility and pending-ack protection aspects

### 9.1 Purpose

Remove non-actionable messages from one inbox without touching actionable work.

### 9.2 Supported Flags

- optional target agent: `agent` or `agent@team`
- `--as ` override actor identity for this clear operation
- `--team `
- `--older-than `
- `--idle-only`
- `--dry-run`
- `--json`

### 9.3 Required Behavior

- default to the caller's own inbox when no target agent is provided
- resolve the target inbox using the retained address and identity rules
- compute clear eligibility from the merged inbox surface
- persist removals back to the physical inbox file that owns each removed message when origin inbox files are present in the merged surface

Default clear behavior removes only clearable messages:

- `(Read, NoAckRequired)`
- `(Read, Acknowledged)`

Clear must never remove:

- `(Unread, NoAckRequired)`
- `(Unread, PendingAck)`
- `(Read, PendingAck)`

Additional rules:

- `--idle-only` narrows removal to idle-notification messages only
- `--older-than` further filters the clearable set by message timestamp age
- dry-run returns the computed removal set without mutation
- clearing must preserve unknown fields on messages that remain

### 9.4 Output Contract

JSON output must include:

- `action = "clear"`
- `team`
- `agent`
- `removed_total`
- `remaining_total`
- removal counters by class

## 10. `atm log`

Product requirement ID:

- `REQ-P-LOG-001`

`atm log` must satisfy the documented shared-observability query/follow contract.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for record rendering/output aspects
- `REQ-ATM-OBS-001` for CLI observability bootstrap/injection aspects
- `REQ-CORE-LOG-001` for core query/follow/filter behavior aspects
- `REQ-CORE-OBS-001` for ATM event/query-model aspects

### 10.1 Purpose

Inspect ATM observability records through shared `sc-observability` query/follow APIs.

`atm log` replaces the old daemon-log viewing model. It must not depend on daemon-owned log files, daemon status, or tmux fallback behavior.

### 10.2 Supported Flags

- `--tail`
- `--level `
- `--match ` repeatable
- `--since `
- `--limit `
- `--json`

Deferred from the current source repo:

- direct `--file` selection of arbitrary ATM log files
- separate `atm tail` command

### 10.3 Required Behavior

- query existing ATM records through the injected observability port over `sc-observability`
- support follow mode through the same adapter
- support filtering by level
- support filtering by structured key/value fields
- support filtering by time window
- support limit/order controls for non-tail mode
- default to snapshot mode when `--tail` is not set
- return snapshot results newest-first before applying output limits
- return followed records in arrival order while `--tail` is active
- use the built-in shared file-backed retained log store as the authoritative query/follow source

### 10.4 ATM Log Fields

The retained ATM event vocabulary must include enough structure to filter on:

- command
- team
- actor
- target
- outcome
- error class

This ATM field set is ATM-owned even when the underlying query/follow/filter mechanics are shared in `sc-observability`.

### 10.5 Output Contract

Human output must show one record per line with enough information to understand:

- timestamp
- severity
- source/service
- event name or message
- important ATM fields when present

JSON output must emit structured records suitable for machine filtering and test assertions.

Each JSON record must expose at least:

- timestamp
- severity
- source or service
- event name
- ATM structured fields map

## 11. `atm doctor`

Product requirement ID:

- `REQ-P-DOCTOR-001`

`atm doctor` must satisfy the documented local diagnostics contract.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for report rendering/output aspects
- `REQ-ATM-OBS-001` for CLI observability bootstrap/injection aspects
- `REQ-CORE-CONFIG-001` for config and identity inspection aspects
- `REQ-CORE-DOCTOR-001` for diagnostic evaluation aspects

### 11.1 Purpose

Run local ATM diagnostics for the retained ATM runtime.

`atm doctor` remains a local diagnostics command, but in the Phase Q target architecture it must also report daemon/runtime availability because normal ATM mail behavior depends on the singleton daemon being present.

### 11.2 Supported Flags

- `--team `
- `--json`

### 11.3 Required Checks

The initial doctor implementation must cover:

- config file discovery and parse health
- effective team resolution
- identity resolution inputs and fallbacks
- obsolete `[atm].identity` configuration drift detection
- daemon control-socket existence and reachability
- singleton daemon ownership health
- SQLite mail-store path visibility and openability when Phase Q runtime is active
- baseline `[atm].team_members` coverage against `config.json.members`
- team directory existence
- team config existence and parse health
- inbox directory existence and writability
- stale mailbox lock detection across `~/.claude/teams/*/inboxes/*.lock` using start-of-run and end-of-run snapshots; a lock present in both snapshots is stale and must be reported with `ATM_WARNING_STALE_MAILBOX_LOCK` as a transitional compatibility finding rather than a normal mail-correctness dependency in the Phase Q target architecture
- hook identity availability
- `ATM_HOME`, `ATM_TEAM`, and `ATM_IDENTITY` override visibility
- `sc-observability` initialization health
- active shared log path visibility
- `sc-observability` query-health readiness for `atm log`

### 11.4 Output Contract

Human output must provide:

- overall status summary
- findings grouped by severity
- full current member roster from `config.json`, with
  baseline `[atm].team_members` shown first and `team-lead` first among that baseline
- concrete remediation guidance when the user can act

JSON output must provide:

- summary
- findings
- recommendations
- environment override visibility
- member roster
- observability health snapshot

Each doctor finding must expose at least:

- severity
- code
- message
- remediation when available

The obsolete config-identity finding must use:

- `ATM_WARNING_IDENTITY_DRIFT`

Critical findings must cause a non-zero exit status.

## 12. `atm teams`

Product requirement ID:

- `REQ-P-TEAMS-001`

`atm teams` must satisfy the documented retained local team recovery contract.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-TEAM-001` for local team discovery, roster mutation, and backup/restore behavior

### 12.1 Purpose

Provide the minimum retained local team-recovery surface required for initial release and the documented backup/restore workflow.

### 12.2 Retained Surface

The retained `teams` surface for initial release is:

- `atm teams`
- `atm teams add-member`
- `atm teams backup`
- `atm teams restore`

The retained surface explicitly does not include broader historical team orchestration commands such as:

- `spawn`
- `join`
- `resume`
- `update-member`
- `remove-member`
- `cleanup`

### 12.3 Required Behavior

Bare `atm teams` must:

- list discovered teams under ATM home deterministically
- expose at least team name plus enough summary information, such as member count, to pick a target team for restore or repair work

`atm teams add-member` must:

- validate that the target team exists
- reject duplicate member names
- persist the new member entry deterministically in team config
- create any required local inbox state atomically with the roster update

`atm teams backup` must:

- create a timestamped snapshot under the ATM team backup area
- capture the current `config.json`
- capture team inbox files, excluding transient `*.lock` sentinels, dotfiles, and restore markers
- capture the ATM team task bucket
- report the created backup path in human and JSON output
- not claim to back up the separate Claude Code project task list

`atm teams restore` must:

- restore from the newest snapshot by default or from an explicit backup path
- support a dry-run mode that reports members, inboxes, and tasks that would be restored
- preserve the current team-lead entry and current `leadSessionId`
- add only missing non-lead members from the snapshot
- clear runtime-only restored-member fields such as session, activity, and pane state before persisting them
- restore non-lead inbox files from the chosen snapshot deterministically
- sweep stale inbox `*.lock` sentinels before copying restored inbox files as a self-heal step
- restore the ATM team task bucket and recompute `.highwatermark` from the maximum restored task id
- fail with a structured error when backup material is missing or malformed
- avoid partial restore on
  validation or snapshot-load failure

### 12.4 Output Contract

Human output must make the performed action and target team clear.

JSON output must include:

- `action`
- `team`

`add-member` JSON output must additionally include:

- `member`

`backup` JSON output must additionally include:

- `backup_path`

`restore` JSON output must additionally include:

- `backup_path`
- `members_restored`
- `inboxes_restored`
- `tasks_restored`

Dry-run `restore` JSON output must additionally include:

- `dry_run = true`
- `would_restore_members`
- `would_restore_inboxes`
- `would_restore_tasks`

## 13. `atm members`

Product requirement ID:

- `REQ-P-MEMBERS-001`

`atm members` must satisfy the documented local roster inspection contract.

Satisfied by:

- `REQ-ATM-CMD-001` for CLI entry, parsing, and dispatch aspects
- `REQ-ATM-OUT-001` for human-readable and JSON output aspects
- `REQ-CORE-TEAM-001` for local roster loading and deterministic projection

### 13.1 Purpose

List the current local team roster for verification, recovery, and restore follow-up without depending on daemon-only or hook-only state.

### 13.2 Supported Flags

- `--team `
- `--json`

### 13.3 Required Behavior

`atm members` must:

- resolve the effective team using the retained team-resolution rules
- load the local team roster from `config.json`
- return a structured error when the team or team config is missing
- show all configured members deterministically, with `team-lead` first when present and remaining members in stable local order
- expose currently persisted member metadata that ATM already knows locally, such as type, model, cwd, or pane id when present in config
- remain useful without daemon or hook state

Richer runtime state, such as live session or activity data, may be layered on later, but it is not required for the retained local release surface.
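
The member ordering rule in 13.3 (`team-lead` first when present, remaining members in stable local order) can be illustrated with a minimal Rust sketch. The function name `order_members` is hypothetical, and "stable local order" is assumed here to mean the order in which members appear in `config.json`; a real roster projection would likely carry full member records rather than bare names.

```rust
/// Hypothetical projection of the roster ordering rule.
/// Assumption: `members` is already in `config.json` order.
fn order_members(members: &[String]) -> Vec<String> {
    // `partition` keeps the relative order inside each half, so the
    // non-lead members stay in their original config order.
    let (lead, rest): (Vec<&String>, Vec<&String>) = members
        .iter()
        .partition(|name| name.as_str() == "team-lead");

    // `team-lead` (if present) goes first, followed by everyone else.
    lead.into_iter().chain(rest).cloned().collect()
}
```

With an input roster of `arch`, `team-lead`, `qa`, this sketch yields `team-lead`, `arch`, `qa`. When `team-lead` is absent, the input order is returned unchanged.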

### 13.4 Output Contract

Human output must show:

- team name
- one row per member
- enough persisted member detail to verify roster repair or restore outcomes

JSON output must include:

- `team`
- `members`

Each member object must expose at least:

- `name`
- persisted local member metadata when present

## 14. Message And Workflow Model

Product requirement ID:

- `REQ-P-WORKFLOW-001`

The message/workflow model must satisfy the documented persisted-field, two-axis, and legal-transition rules.

Satisfied by:

- `REQ-CORE-WORKFLOW-001` for the canonical two-axis model and legal transitions

### 14.1 Persisted Message Fields

Required fields:

- `from`
- `text`
- `timestamp`
- `read`

Optional fields:

- `source_team`
- `summary`
- `message_id`
- `taskId`
- `pendingAckAt`
- `acknowledgedAt`
- `acknowledgesMessageId`
- `metadata`

Unknown fields must be preserved.

For ATM-authored messages:

- ATM machine-readable identity is mandatory
- current legacy top-level `message_id` values may be UUIDs
- forward metadata `messageId` values must be ULIDs
- ATM-authored machine identifiers must not be null or blank

Legacy or externally imported records may still omit `message_id`; the rewrite must preserve such records without inventing synthetic ids during read.

### 14.2 Two-Axis Canonical Model

The canonical model has two independent axes.

Read axis:

- `Unread`
- `Read`

Ack axis:

- `NoAckRequired`
- `PendingAck`
- `Acknowledged`

Persisted-field classification:

- read axis:
  - `read = false` => `Unread`
  - `read = true` => `Read`
- ack axis:
  - `acknowledgedAt` present => `Acknowledged`
  - else `pendingAckAt` present => `PendingAck`
  - else => `NoAckRequired`

Derived message class for queue logic:

1. ack axis `PendingAck` => `PendingAck`
2. else ack axis `Acknowledged` => `Acknowledged`
3. else read axis `Unread` => `Unread`
4. else => `Read`

The canonical two-axis model is distinct from the read command’s display buckets.
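
The classification rules in 14.2, together with the bucket mapping from 7.4, can be sketched in Rust as follows. This is an illustration only: the type names (`ReadState`, `AckState`, `MessageClass`, `PersistedFields`) and function names are hypothetical, and timestamps are modeled as plain strings because only their presence matters for classification.

```rust
/// Read axis of the canonical model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadState { Unread, Read }

/// Ack axis of the canonical model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AckState { NoAckRequired, PendingAck, Acknowledged }

/// Derived message class used for queue logic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageClass { Unread, PendingAck, Read, Acknowledged }

/// Hypothetical view of the persisted fields that drive classification.
struct PersistedFields {
    read: bool,
    pending_ack_at: Option<String>,
    acknowledged_at: Option<String>,
}

/// Persisted-field classification: each axis is decided independently.
fn classify_axes(m: &PersistedFields) -> (ReadState, AckState) {
    let read = if m.read { ReadState::Read } else { ReadState::Unread };
    // `acknowledgedAt` wins over `pendingAckAt` when both are present.
    let ack = if m.acknowledged_at.is_some() {
        AckState::Acknowledged
    } else if m.pending_ack_at.is_some() {
        AckState::PendingAck
    } else {
        AckState::NoAckRequired
    };
    (read, ack)
}

/// Derived class: the ack axis is consulted before the read axis.
fn derive_class(read: ReadState, ack: AckState) -> MessageClass {
    match (ack, read) {
        (AckState::PendingAck, _) => MessageClass::PendingAck,
        (AckState::Acknowledged, _) => MessageClass::Acknowledged,
        (AckState::NoAckRequired, ReadState::Unread) => MessageClass::Unread,
        (AckState::NoAckRequired, ReadState::Read) => MessageClass::Read,
    }
}

/// Display bucket mapping from section 7.4 (presentation only).
fn display_bucket(class: MessageClass) -> &'static str {
    match class {
        MessageClass::Unread => "unread",
        MessageClass::PendingAck => "pending_ack",
        MessageClass::Read | MessageClass::Acknowledged => "history",
    }
}
```

Keeping the two axes as separate enums and deriving the class in a single exhaustive `match` means the compiler flags any state pair that is left unhandled, which is one way to keep the canonical model and the display buckets from drifting apart.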

### 14.3 Required State Transitions

```text
Send normal message
  -> (Unread, NoAckRequired)

Send ack-required message
  -> (Unread, PendingAck)

Send task-linked message
  -> persist taskId
  -> (Unread, PendingAck)

Read own inbox with marking enabled, normal unread message
  (Unread, NoAckRequired) -> (Read, PendingAck)

Read own inbox with marking enabled, ack-required unread message
  (Unread, PendingAck) -> (Read, PendingAck)

Read own inbox with --no-mark
  (Unread, NoAckRequired) -> (Read, NoAckRequired)
  (Unread, PendingAck) -> (Read, PendingAck)

Read another inbox
  (Unread, NoAckRequired) -> (Read, NoAckRequired)
  (Unread, PendingAck) -> (Read, PendingAck)
  (Read, PendingAck) -> (Read, PendingAck)
  (Read, Acknowledged) -> (Read, Acknowledged)
  (Read, NoAckRequired) -> (Read, NoAckRequired)

Ack workflow
  (Read, PendingAck) -> (Read, Acknowledged)
  and emit a reply message that references the original message id

Clear workflow
  remove only (Read, NoAckRequired) and (Read, Acknowledged)
```

Disallowed transitions:

- any transition that makes the read axis move from `Read` back to `Unread`
- `Acknowledged -> PendingAck`
- `Acknowledged -> NoAckRequired`
- clearing a message in `PendingAck`
- clearing a message with read axis `Unread`

The implementation must encode legal transitions in code structure, not only in comments or tests.

### 14.4 Task Metadata Rule

Messages with `taskId` are task-linked messages.

Required rules:

- every task-linked message must require acknowledgement
- a task-linked message remains actionable until acknowledged
- a task-linked message must continue to appear in `atm read` until acknowledged
- a task-linked message must never be removed by `atm clear` before acknowledgement

## 15. Observability Requirements

Product requirement ID:

- `REQ-P-OBS-001`

ATM observability must satisfy the documented best-effort emit behavior and shared query/follow/health expectations.

Satisfied by:

- `REQ-ATM-OBS-001` for CLI bootstrap/injection aspects
- `REQ-CORE-LOG-001` for ATM log query/follow service aspects
- `REQ-CORE-DOCTOR-001` for observability health reporting aspects
- `REQ-CORE-OBS-001` for ATM event and query-model boundary aspects

ATM must emit structured records through `sc-observability`.

Initial shared integration scope:

- `sc-observability-types`
- `sc-observability`

Deferred from the initial retained observability integration:

- `sc-observe`
- `sc-observability-otlp`

Required ATM event classes:

- command started
- command succeeded
- command failed
- mailbox record skipped

Required ATM event fields:

- command name
- team when known
- actor identity when known
- target identity when known
- task id when known
- result
- error class on failure
- count when applicable
- transition count when applicable

Emission is best-effort:

- logging failures must never block retained command behavior
- command correctness takes priority over observability delivery

Sink policy:

- the shared file sink is required for retained ATM observability
- the shared console sink is optional and must remain off by default for normal ATM CLI command execution so command output stays stable
- console logging may be enabled later for explicit local debugging or integration testing

Diagnostic logging rules:

- command failures must emit structured failure diagnostics before the CLI exits, even when the command fails before reaching a core service
- degraded recovery paths that intentionally continue, such as malformed-record skips or missing-config fallback warnings, must also emit structured warning diagnostics
- every ATM warning/error diagnostic must carry a stable ATM-owned error code in addition to human-readable text
- command lifecycle failure events must include the stable error code when one is available

`atm log` and `atm doctor` are not best-effort features in the same sense:

- they are explicit observability consumers
- if shared query/health
  APIs are unavailable, they must fail with clear structured errors

## 16. Error Requirements

Product requirement ID:

- `REQ-P-ERROR-001`

Public command failures must satisfy the documented structured error requirements.

Satisfied by:

- intentionally undecomposed product requirement; crate-local error ownership remains derived from command and service requirements rather than a dedicated crate requirement ID in this pass

All user-visible failures must use structured errors with recovery guidance. Persisted-data failures must preserve parser and entity context when available.

Stable error-code rule:

- every public `AtmError` must map to a stable ATM-owned error code
- ATM warning and error logs must include that code
- CLI bootstrap and argument-validation failures must also be logged with a stable error code before process exit
- the single source of truth for ATM-owned error codes is [`atm-error-codes.md`](./atm-error-codes.md)

Minimum error categories:

- configuration
- missing document
- address
- identity resolution
- team not found
- agent not found
- mailbox read
- mailbox write
- message validation
- store
- ingest
- export
- transport
- daemon runtime
- daemon singleton
- daemon client

Phase Q required families:

- store:
  - SQLite bootstrap/open
  - schema/transaction
  - busy-timeout / saturation
- ingest:
  - replay/import failure
  - backpressure/degraded ingest
- export:
  - Claude compatibility export failure
  - re-export/replay failure
- transport:
  - local daemon request failure
  - remote connect/timeout/protocol failure
- daemon runtime:
  - shutdown timeout
  - signal/reload failure
  - runtime over-capacity
- daemon singleton:
  - already-running daemon
  - stale-artifact cleanup/release failure
- daemon client:
  - daemon unavailable
  - daemon health-query timeout
- serialization
- file policy
- wait timeout
- observability emit
- observability query
- observability health

Mutation failures must be fail-safe:

- no partial send writes
- no partial read-mark updates
- no
  illegal state transitions after failed persistence

## 17. Reliability Requirements

Product requirement ID:

- `REQ-P-RELIABILITY-001`

The retained command surface must satisfy the documented durability and consistency constraints.

Satisfied by:

- `REQ-CORE-MAILBOX-001` for atomicity, duplicate suppression, and mailbox consistency aspects

- mailbox writes must be atomic
- concurrent appends must not silently lose messages
- duplicate message ids must not be appended twice
- read-time duplicate message ids collapse to the newest visible entry
- corrupt records should be skipped individually when possible
- persisted config/team schema drift should recover with deterministic defaults when safe
- missing team config may use only the explicitly documented send fallback behavior
- persisted config/team records with missing identity or routing-critical fields must fail or be isolated rather than guessed
- missing inbox files are treated as empty inboxes
- seen-state races must not corrupt mailbox data
- observability emission failures must not corrupt command behavior

## 18. Testing Requirements

Product requirement ID:

- `REQ-P-TEST-001`

The rewrite must satisfy the documented testing obligations.

Satisfied by:

- intentionally undecomposed product requirement; this governs workspace-level test coverage expectations rather than a single crate-local requirement ID

Because `sc-observability` is newly introduced into ATM, the rewrite must add explicit test coverage for:

- ATM event emission through the observability port boundary
- best-effort emission failure behavior
- two-axis state classification
- two-axis state transition enforcement
- task-linked ack-required transition behavior
- log query by severity
- log query by structured field match
- log follow/tail behavior
- doctor observability-health reporting
- teams list behavior over the local ATM home
- members list behavior over local team config
- add-member duplicate validation and inbox creation
- backup snapshot completeness
- restore dry-run reporting
- restore preservation of team-lead / `leadSessionId`
- restore recomputation of `.highwatermark` to the maximum restored task id
- retained mail-command correctness when observability emission fails
- clear eligibility behavior

The implementation must include:

- `atm-core` tests for observability port behavior using test doubles
- CLI integration tests for `atm log`
- CLI integration tests for `atm doctor`
- CLI integration tests for `atm ack`
- CLI integration tests for `atm clear`
- CLI integration tests for `atm teams`
- CLI integration tests for `atm members`

## 19. Acceptance Criteria

Product requirement ID:

- `REQ-P-ACCEPTANCE-001`

The rewrite is complete only when the documented acceptance criteria are met.

Satisfied by:

- intentionally undecomposed product requirement; this defines overall product completion gates rather than a single crate-local obligation

The rewrite is ready when:

- `atm send` works through the documented production runtime path
- `atm read` works through the documented production runtime path
- `atm ack` works through the documented production runtime path
- `atm clear` works through the documented production runtime path
- `atm log` works through shared `sc-observability` APIs
- `atm doctor` works as a local diagnostics command with daemon/runtime visibility in the Phase Q target architecture
- `atm teams` provides the retained local team recovery surface
- `atm members` provides the retained local roster verification surface
- retained commands preserve documented behavior, and any Phase Q runtime-shape changes are explicit in the requirements and architecture
- workflow-axis classification is correct
- workflow-axis transitions are encoded in implementation structure
- display buckets are derived consistently from the two-axis model
- task-linked messages remain pending until the operator explicitly acknowledges them through `atm ack`
- observability integration is exercised by automated tests
- the file-by-file migration plan is complete enough to implement directly

Cross-document invariants that must remain true:

- `taskId` implies ack-required behavior at send time
- displayed messages always persist `read = true`
- pending-ack messages remain actionable until acknowledged
- `atm clear` never removes unread messages
- `atm clear` never removes pending-ack messages
- `atm read --timeout` returns immediately when the requested selection is already non-empty

## 20. Phase M: Mailbox Concurrency And Restore Atomicity

Phase M addresses blocking and important findings from the Phase L code review (ARCH-CR-001 through ARCH-CR-004 and associated QA findings) that must be closed before the 1.0 release.
### 20.1 Mailbox Concurrency Safety - `REQ-CORE-MAILBOX-LOCK-001` All mailbox read-modify-write operations must hold an exclusive advisory file lock for the duration of the operation. Rationale: `append_message` in `mailbox/mod.rs` currently reads the full inbox, appends one record in memory, then calls `atomic::write_messages` to replace the file. Two concurrent writers can both read the same snapshot and the later rename silently drops the earlier writer's append. This is ARCH-CR-001. Required behavior: - before entering any read-modify-write section on an inbox file, ATM must acquire an exclusive advisory lock on a well-known lock sentinel derived from the inbox path - the lock must be held for the full duration of read + modify + atomic replacement, including any durability sync that is part of the shared atomic-write helper boundary - lock release must happen automatically when the lock guard is dropped (RAII) - lock acquisition must use a bounded timeout (default 5 seconds) and fail with a structured `AtmError` carrying `AtmErrorCode::MailboxLockTimeout` when the timeout expires - the lock sentinel path is a transient runtime artifact: ATM writes the owner pid while the lock is held, unlinks the sentinel on guard drop, and must tolerate stale pid-bearing sentinels from crashed processes - advisory locking is cooperative: only concurrent ATM processes coordinate - the sentinel lock must not block Claude Code's native inbox appends because Claude does not participate in ATM's cooperative lock protocol - `REQ-CORE-MAILBOX-LOCK-002` Mailbox locking must work on macOS, Linux, and Windows without platform-specific feature flags in consuming code. 
Required behavior:
- on Unix: use `flock(2)` exclusive lock on the lock sentinel file descriptor
- on Windows: use `LockFileEx` exclusive lock on the lock sentinel file handle
- the public API must present a single `MailboxLockGuard` type that is platform-uniform; platform branching is internal to `lock.rs`
- the `fs2` crate is the preferred implementation

- `REQ-CORE-MAILBOX-LOCK-003` Locks must be per-inbox-file, not per-team or global.

Required behavior:
- locking is scoped to a single inbox file path
- two concurrent `atm send` commands to different recipients must not block each other
- the lock sentinel path is `{inbox_path}.lock`

- `REQ-CORE-MAILBOX-LOCK-004` Every mailbox mutation path must acquire the lock.

Required coverage:
- `append_message` for both normal send and the missing-config team-lead notice path
- workflow state writeback in read, ack, and clear paths
- any future mutation path added to the mailbox layer

Read-only `read_messages` calls with no following writeback do not require locking.

- `REQ-CORE-MAILBOX-LOCK-005` Multi-source mailbox commands must acquire their final required lock set before any mutating source reread, and must do so in deterministic path order.

Rationale: `read`, `ack`, and `clear` do not operate on a single inbox file. The executed Phase P design permits unlocked observational snapshots when no mutation is committed from that snapshot, but any state-changing path must reacquire the full deterministic lock set, reload fresh source files under that lock set, recompute the mutation, and then persist. Locking only during the final write step would still allow stale reads and lost updates.
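A minimal sketch of the lock planning behind `REQ-CORE-MAILBOX-LOCK-003` and `REQ-CORE-MAILBOX-LOCK-005`, assuming the discovered inbox paths are already canonical. The helper names `lock_sentinel_path` and `plan_lock_set` are hypothetical and are not taken from ATM's code; the lock acquisition itself is omitted.

```rust
use std::collections::BTreeSet;
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the `{inbox_path}.lock` sentinel path
/// by appending `.lock` to the full inbox path.
fn lock_sentinel_path(inbox_path: &Path) -> PathBuf {
    let mut name: OsString = inbox_path.as_os_str().to_owned();
    name.push(".lock");
    PathBuf::from(name)
}

/// Hypothetical helper: dedupe the discovered inbox paths and order them
/// by path string, so every command requests locks in the same order.
/// Assumption: callers pass paths that are already canonical.
fn plan_lock_set(discovered: &[PathBuf]) -> Vec<PathBuf> {
    let unique: BTreeSet<String> = discovered
        .iter()
        .map(|path| path.to_string_lossy().into_owned())
        .collect();
    // A BTreeSet iterates in ascending string order.
    unique.into_iter().map(PathBuf::from).collect()
}
```

Because both sides sort the same path strings, two commands that discover the same pair of inboxes in opposite order still request the sentinels in one order, which is the property that rules out lock-order deadlock.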
Required behavior: - `read` is a `read_possible_write` path: it may take an unlocked observational snapshot of the source inbox set, but if display-state mutation is needed it must re-discover the current source-file set, dedupe duplicate paths, sort the resulting paths deterministically by canonical path string, acquire the full lock set, then reload and recompute under that lock set before persisting - `ack` uses an unlocked preflight plus one final superset lock: it may resolve the reply target and candidate source message from an unlocked preflight, but it must acquire the final sorted superset lock plan before the mutating source reread, then re-read and re-validate the pending acknowledgement state under that final lock set before writing either the source or reply mailbox state - mutating `clear` is a full-lock-through-persist path: it must acquire the deterministic lock set before its mutating source reread and must hold that lock set through removal computation, mailbox replacement, and workflow-sidecar updates; `clear --dry-run` remains observational and lock-free - final source-file discovery for a mutating path must use the command's existing requested-inbox plus origin-inbox resolution logic - legitimately absent inbox paths at discovery time are excluded from the lock set rather than locked speculatively - source enumeration faults are not treated as absent paths; if origin inbox discovery cannot enumerate the candidate directory completely, the command must fail closed instead of continuing with a partial source set - for any mutating path, those locks must remain held through the fresh surface computation, state transition, and final writeback - deterministic ordering must prevent deadlock when two commands contend on the same pair of inbox files in opposite discovery order - lock acquisition uses one total timeout budget for the full lock set, not a fresh timeout per file - if any lock in the set cannot be acquired, every previously acquired lock 
in that attempt must be released immediately and the command must fail without mutating any source inbox from a partially locked snapshot - partial lock acquisition must never degrade into a best-effort state-changing command result for `read`, `ack`, or `clear` - the unlocked observational snapshot used by `read`, `ack`, or dry-run `clear` must never be the snapshot from which a later mutating commit is persisted - source discovery for mutating commands must fail closed: if directory enumeration itself fails or if any directory entry in the candidate inbox directory cannot be enumerated reliably, the command must abort before the mutating reread instead of warning and continuing with a partial source set - if a discovered file disappears or becomes unreadable after lock planning but before or during the under-lock source-file load, the command must fail as a normal operator-actionable file-read error and must not persist any partial state - `REQ-CORE-MAILBOX-LOCK-006` Single-process single-threaded usage must not regress measurably due to lock acquisition. Required behavior: - uncontended `flock` is a single syscall returning immediately; no background threads or polling loops - lock sentinel created lazily on first lock attempt - `REQ-CORE-MAILBOX-LOCK-007` Lock acquisition must distinguish true lock contention from other lock-path I/O failures. 
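A minimal sketch of that distinction, assuming contention surfaces as `io::ErrorKind::WouldBlock` (true for a non-blocking `flock(2)` attempt on Unix; Windows reports lock contention differently, so a real implementation needs a per-platform check). `SketchErrorCode` and `acquire_with_budget` are hypothetical stand-ins, not ATM's types.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for the two `AtmErrorCode` variants named in this section.
#[derive(Debug, PartialEq, Eq)]
enum SketchErrorCode {
    MailboxLockTimeout,
    MailboxLockFailed,
}

/// Hypothetical retry loop. `try_lock` performs one non-blocking lock attempt.
fn acquire_with_budget<F>(mut try_lock: F, budget: Duration) -> Result<(), SketchErrorCode>
where
    F: FnMut() -> io::Result<()>,
{
    let deadline = Instant::now() + budget;
    loop {
        match try_lock() {
            Ok(()) => return Ok(()),
            // Assumption: only WouldBlock means "held by another process".
            Err(err) if err.kind() == io::ErrorKind::WouldBlock => {
                if Instant::now() >= deadline {
                    return Err(SketchErrorCode::MailboxLockTimeout);
                }
                thread::sleep(Duration::from_millis(5));
            }
            // Any other I/O error fails immediately instead of waiting
            // for the timeout budget to expire.
            Err(_) => return Err(SketchErrorCode::MailboxLockFailed),
        }
    }
}
```

The point of the shape is that a permission or filesystem error never consumes the timeout budget, so the operator sees a repair-oriented failure rather than a misleading timeout.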
Required behavior: - only retry errors that actually mean "lock currently held by another process" for the current platform/API surface - if the sentinel file cannot be opened, locked, or queried because of a non-contention I/O or OS error, fail immediately with `MailboxLockFailed` rather than sleeping until the timeout budget expires - `MailboxLockTimeout` is reserved for genuine contention or equivalent lock-busy conditions - operator recovery guidance must distinguish "wait and retry" from "repair filesystem/permissions state" - `REQ-CORE-MAILBOX-LOCK-008` Stale-lock sweeping must identify rotated lock sentinels conservatively and must evict only verifiable orphaned candidates. Required behavior: - candidate matching is based on the basename, not the full path - the accepted sentinel predicate is: `file_name.ends_with(".lock") || file_name.contains(".lock.")` - the sweep must not use `path.extension() == "lock"` because that misses rotated sentinels such as `inbox.json.lock.old` - the sweep must not broaden to arbitrary substring matching such as `contains("lock")`; non-sentinel files like `locksmith.txt` must not be considered - a matched candidate is evictable only when its contents parse as the documented `pid[:token]` owner record format and `process_is_alive(pid)` returns false - malformed or unreadable candidate contents are treated as non-evictable and must be left in place for explicit operator cleanup instead of speculative deletion - the sweep is a best-effort stale-artifact cleanup path, not a second lock authority; it must not claim ownership without the existing advisory-lock acquisition succeeding afterward - Windows rename semantics must not be assumed to match Unix for a live held lock handle; rotated-name sweeping exists to clean up post-crash or externally renamed artifacts, not to coordinate live-lock handoff Acceptance Criteria: - positive predicate cases: `inbox.json.lock`, `inbox.json.lock.old`, and `inbox.json.lock.replaced` are all 
treated as stale-sentinel candidates - negative predicate cases: malformed or unrelated names such as `inbox.json.lockold`, `locksmith.txt`, and `inbox.locksmith.json` are not treated as stale-sentinel candidates - malformed rotated candidates that do match the filename predicate but do not contain a parseable `pid[:token]` owner record remain in place and are not deleted speculatively - `REQ-CORE-MAILBOX-LOCK-009` Read-only filesystem failures on the mailbox-lock path must surface as a dedicated non-contention diagnostic. Required behavior: - ATM must classify read-only filesystem errors by raw OS error code rather than treating them as generic permission failures - the required platform mappings are: - Linux: `EROFS` (`30`) - macOS: `EROFS` (`30`) - Windows: `ERROR_WRITE_PROTECT` (`19`) - the same classification helper must be used for lock-path open/create, owner-record truncate/write, and sentinel removal so retry behavior and operator guidance stay consistent - read-only filesystem errors must not participate in the lock-contention retry loop and must not be retried by sentinel-removal backoff logic - on every lock-acquisition retry iteration, read-only-filesystem classification must run before any timeout-budget decision; a classified `EROFS` / `ERROR_WRITE_PROTECT` failure must never fall through to `MailboxLockTimeout` - mutation-path failures caused by a read-only filesystem must return `MailboxLockReadOnlyFilesystem` / `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`, not `MailboxLockFailed` or `MailboxLockTimeout` - the structured error message and recovery guidance must include the lock path plus the specific attempted operation (`open`, `write owner record`, or `remove stale sentinel`) so operators can distinguish remount/media failures from ACL or contention issues - other non-contention lock-path filesystem failures, including `ENOSPC`, `EMFILE`, and `ESTALE`, remain `MailboxLockFailed` and are not retried - best-effort drop-time cleanup remains warn-only 
because the command has already completed, but public sweep or acquisition paths must surface the read-only diagnosis instead of silently suppressing it Acceptance Criteria: - `ATM_TEST_FORCE_LOCK_READONLY_FS=open` injects a synthetic platform-correct read-only-filesystem error into the lock open/create path only; owner-record write and sentinel-removal paths continue to run normally - `ATM_TEST_FORCE_LOCK_READONLY_FS=write_owner` injects a synthetic read-only-filesystem error into the owner-record truncate/write path only - `ATM_TEST_FORCE_LOCK_READONLY_FS=remove` injects a synthetic read-only-filesystem error into the stale-sentinel removal path only - when the seam is unset or set to any other value, no synthetic read-only filesystem failure is injected - read-only failures injected through any of the three seam values surface as `MailboxLockReadOnlyFilesystem` / `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`, never as `MailboxLockTimeout` ### 20.2 Shared Mutable File Atomicity - `REQ-CORE-PERSIST-ATOMIC-001` Every shared mutable ATM-owned structured state file must be persisted atomically. 
Scope: - live inbox files under `.claude/teams//inboxes/*.json` - team `config.json` - ATM-owned task-bucket JSON/state files written during backup/restore flows - `.highwatermark` and any equivalent ATM-owned monotonic task-state file - send-alert / restore-progress / similar ATM-owned persisted coordination state when that state is shared across processes or operators - any future ATM-owned JSON or JSONL file that can be rewritten by more than one ATM process, agent, or operator workflow Required behavior: - live-file replacement must use a temp-file + fsync + rename pattern or an equivalent same-filesystem atomic-replacement mechanism - for files replaced via rename, the helper must fsync the parent directory after the rename whenever the platform allows directory-sync semantics, so successful return means both file contents and name publication are durably committed as far as the host platform can provide - no live shared structured file may be truncated and rewritten in place - mailbox locking does not replace atomic persistence; both are required for mailbox files - temp-file + rename atomicity alone is not a source-unchanged compare-and-swap against non-cooperating writers, so ATM must not claim mailbox rewrite safety for concurrent Claude Code appends - `REQ-CORE-PERSIST-ATOMIC-001A` Shared mutable file commits must use one of the documented mutation classes and the shared commit protocol. 
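The replacement step required by `REQ-CORE-PERSIST-ATOMIC-001`, which every commit under this protocol ends with, can be sketched as follows. This is a minimal illustration rather than ATM's helper: `atomic_replace` and the pid-suffixed temp-file name are assumptions, the sketch does not handle two threads of one process writing the same target, and the directory sync is attempted only on Unix-like hosts.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `target` with `contents` via
/// temp file + fsync + rename + parent-directory fsync.
fn atomic_replace(target: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = match target.parent() {
        Some(parent) if !parent.as_os_str().is_empty() => parent,
        _ => Path::new("."),
    };
    let file_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?;
    // The temp file lives in the same directory so the rename stays
    // on one filesystem.
    let mut tmp_name = file_name.to_owned();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = dir.join(tmp_name);

    let written = (|| -> io::Result<()> {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
        tmp.sync_all()?; // flush file contents before publishing the name
        fs::rename(&tmp_path, target)
    })();
    if written.is_err() {
        // Best-effort cleanup; the live file was never truncated.
        let _ = fs::remove_file(&tmp_path);
        return written;
    }
    if cfg!(unix) {
        // Make the rename itself durable where directory sync is supported.
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

As the requirement notes, this gives atomic publication only. It is not a compare-and-swap, so it still needs a freshness proof (lock plus reread, or CAS) in front of it.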
Required behavior: - `read_only` paths must not acquire mailbox/file locks - `read_possible_write` paths may do an initial unlocked read, but any actual commit must prove source freshness before replacing the live file - `read_modify_write` paths must also prove source freshness before replacing the live file - acceptable freshness proofs are limited to: - compare-and-swap against the exact earlier snapshot, or - lock, reread current state, recompute, and then commit - a stale-snapshot rename after late lock acquisition is forbidden even if the rename itself is atomic Open hardening gap — `P.6` send-side workflow freshness: - mailbox read/ack/clear paths satisfy this through `mailbox::store::with_locked_source_files(...)` - workflow-sidecar writes performed during `send` and the missing-config team-lead notice path are already atomic and owner-routed, but they do not yet provide a dedicated freshness proof across concurrent same-recipient sends; P.6 is the tracked hardening item for that gap - `REQ-CORE-PERSIST-ATOMIC-001B` Every shared mutable file family must have one documented write path and one owning helper boundary. 
Required behavior: - mailbox file replacement must go through the mailbox atomic helper family - shared generic state replacement must go through the shared persistence helper family - new live structured files must not introduce bespoke `fs::write`, truncate-and-rewrite, or ad hoc temp-file logic at individual call sites - if a file family needs special preconditions such as lock ordering or freshness validation, those preconditions must be enforced at the shared helper boundary or a single owner-layer wrapper around it - the current owner-layer set is: - mailbox compatibility surface: `mailbox::store::observe_source_files(...)` for lock-free snapshots, `mailbox::store::with_locked_source_files(...)` for shared read/ack/clear lock+reload orchestration, and `mailbox::store::commit_mailbox_state(...)` / `mailbox::store::commit_source_files(...)` as the persistence leaf - workflow-state sidecar: `workflow::{load_workflow_state(...), save_workflow_state(...), project_envelope(...), remember_initial_state(...), apply_projected_state(...), remove_message_state(...)}` - seen-state watermark: `read::seen_state::save_seen_watermark(...)` - send-alert state: `send::alert_state::{register_missing_team_config_alert(...), clear_missing_team_config_alert(...), save(...), acquire_lock(...)}` - team config: `team_admin::write_team_config(...)` - task bucket and `.highwatermark`: `team_admin::restore::restore_task_state_from_backup(...)` - restore marker and restore staging: `team_admin::restore::write_restore_marker(...)`, `team_admin::restore::clear_restore_marker(...)`, `team_admin::restore::prepare_restore_workspace(...)`, and `team_admin::restore::cleanup_restore_workspace(...)` - send-side workflow seeding must not continue indefinitely as an open-coded `load -> mutate -> save` sequence in command-layer logic; P.6 exists to converge that path onto a dedicated owner-layer freshness boundary - `REQ-CORE-PERSIST-ATOMIC-001C` ATM must not claim rewrite safety for non-cooperating 
external writers.

Required behavior:
- if a live file can be concurrently changed by a writer outside ATM’s lock protocol, ATM must document whether that file is:
  - read-only from ATM’s perspective, or
  - a legacy compatibility surface with known overwrite risk, or
  - protected by real freshness validation/CAS
- for Claude-owned inbox files, advisory lock correctness applies only to concurrent ATM writers
- ATM-local workflow state that requires stronger guarantees must move to an ATM-owned source-of-truth path rather than relying on full-file rewrite of the Claude-owned inbox surface

- `REQ-CORE-PERSIST-ATOMIC-002` Phase M must treat atomic persistence as a cross-cutting invariant, not a mailbox-only or restore-only rule.

Required behavior:
- when Phase M touches a shared mutable structured file path, the implementation must either route that path through an existing atomic write helper or add one before modifying the file
- new shared mutable JSON/JSONL/state files introduced during Phase M must adopt the same atomic persistence contract immediately rather than deferring to a follow-on cleanup sprint

- `REQ-CORE-PERSIST-ATOMIC-003` Atomic persistence helpers must be centralized and reused instead of duplicated ad hoc at call sites.

Required behavior:
- `atm-core` must own the shared atomic persistence primitive used by mailbox, config, task-bucket, highwatermark, and shared coordination writers
- mailbox writes continue using the mailbox atomic helper
- team-config writes continue using `write_team_config(...)`
- task-bucket / highwatermark / shared state writes added or touched by Phase M must use a documented helper with the same temp-file + rename semantics
- the Phase M audit must grep for direct `fs::write`, `File::create`, or equivalent in-place rewrites of live shared mutable structured files and either remove them or document why the path is not in scope

### 20.2.1 Shared Commit And Freshness Validation

The required shared commit protocol is:

1. classify the operation as `read_only`, `read_possible_write`, or `read_modify_write`
2. perform any unlocked observational read allowed by that class
3. compute whether a write is necessary
4. if no write is needed, return without locking
5. if a write is needed, enter the owning write path for that file family
6. prove source freshness by CAS or by lock + reread + recompute
7. write the temp file, fsync, rename, and perform any required directory sync

The intentionally forbidden shape is:
- read old snapshot
- compute mutation from old snapshot
- acquire late lock
- rename blindly over a newer live file

### 20.2.2 Locking Failure-Path Test Contract

- `REQ-CORE-MAILBOX-TEST-001` Phase M follow-up coverage must include deterministic failure-path locking tests in addition to success-path no-deadlock tests.

Required behavior:
- add bounded tests for lock contention timeout on the mutation commands that use mailbox locking; for the follow-up sprint the explicit command coverage list is `send` for contention timeout, `clear` for fail-closed discovery, and `send` for non-contention lock-error classification
- add deterministic coverage for fail-closed source discovery when an origin inbox directory entry cannot be enumerated successfully
- add deterministic coverage for non-contention lock-path failures so they do not regress into `MailboxLockTimeout`

- `REQ-CORE-MAILBOX-TEST-002` Locking tests must use bounded, non-flaky construction that cannot hang indefinitely.
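One way to keep such tests bounded is to run the potentially blocking step on a worker thread and wait for its result with `recv_timeout`. The sketch below assumes a hypothetical helper named `run_bounded`; it illustrates the pattern and is not a helper from ATM's test suite.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical test helper: run `work` on a worker thread and wait at most
/// `limit` for its result. Returns `None` if the worker did not finish in time.
fn run_bounded<T, F>(work: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out.
        let _ = tx.send(work());
    });
    // Bounded wait: the test thread never blocks past `limit`.
    rx.recv_timeout(limit).ok()
}
```

A test built this way asserts on `Some(..)` or `None` and then moves on, so a worker that is stuck behind a held lock cannot hang the test run.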
Required behavior: - use explicit timeout-based synchronization (`recv_timeout`, `wait_timeout`, elapsed-time assertions with bounded slack) rather than open-ended thread joins or sleeps waiting for success - tests for directory-entry enumeration failure must use a deterministic seam or injected enumerator/fault source rather than permission tricks, racing deletes, or environment-sensitive filesystem behavior - tests for non-contention lock errors must use a deterministic seam or injectable failure source rather than depending on platform-specific errno behavior - tests that intentionally hold a lock must guarantee teardown via scoped guards/channels even when the assertion path fails - crash-durability helper tests should verify sequencing and error propagation through deterministic seams; they must not rely on real crash simulation - forbidden test patterns: - open-ended polling waiting for "eventual" success - indefinite `join()` or blocking wait with no timeout - sleeps used as the primary correctness mechanism - race-dependent stress loops expected to pass only "most of the time" ### 20.3 Restore Transaction Atomicity - `REQ-CORE-RESTORE-ATOMIC-001` `teams restore` must write `config.json` as the last mutation step, only after all other restore mutations succeed. Rationale: ARCH-CR-002 — `team_admin.rs:372-400` copies inboxes, restores tasks, recomputes highwatermark, then writes config. If the process dies between inbox copy and config write, the team has partially restored inbox files that do not match the config roster. 
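The ordering this requirement calls for can be sketched with a hypothetical `restore_config_last` helper. It assumes restore planning and backup validation have already happened, and it takes the inbox/task mutations and the config write as closures; neither the helper nor its signature comes from ATM's code.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Hypothetical helper: marker first, config last, marker removed only
/// after every step succeeds.
fn restore_config_last<M, C>(
    team_dir: &Path,
    apply_mutations: M,
    write_config: C,
) -> io::Result<()>
where
    M: FnOnce() -> io::Result<()>,
    C: FnOnce() -> io::Result<()>,
{
    let marker = team_dir.join(".restore-in-progress");
    // Written before any mutation begins.
    File::create(&marker)?.sync_all()?;
    // Inbox copies, task restore, highwatermark recompute.
    apply_mutations()?;
    // config.json is the last mutation step.
    write_config()?;
    // On any earlier failure the marker stays behind for later diagnosis.
    fs::remove_file(&marker)
}
```

If the process fails between the two closures, the config is untouched and the leftover marker records that a restore was interrupted.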
Required behavior: - restore planning and backup validation happen before the marker is written - config.json is written last, after all inbox copies and task restores succeed - a `.restore-in-progress` marker file is written to the team directory before mutation begins and removed after config is successfully fsynced - the config-last step must continue using the existing `write_team_config(...)` atomic temp-file + rename pattern instead of introducing a second config persistence path - on next `atm teams restore`, if a `.restore-in-progress` marker exists, warn the operator and recommend re-running the restore - `atm doctor` must check for stale `.restore-in-progress` markers and report them as findings with recovery guidance - `REQ-CORE-RESTORE-ATOMIC-002` Restored inbox files must be staged before being placed in the live inbox directory. Required behavior: - inbox files from the backup must first be copied to `.restore-staging/inboxes/` - after all staging copies succeed, move staged files to the live inboxes directory using `fs::rename` where possible - on staging or move failure, clean up the staging directory and fail without writing config - if stale staging already exists at restore start, the command must either clean it first or fail with a recovery message; it must never merge old and new staging contents implicitly - `REQ-CORE-RESTORE-ATOMIC-003` Stale restore-progress markers must have a fixed diagnostics contract. Required behavior: - `atm doctor` must report stale `.restore-in-progress` markers as warnings - the finding must not become a blocking error by default - the finding must include recovery guidance telling the operator to rerun `atm teams restore` or remove the marker after manual verification ### 20.4 Error Display And Diagnostics - `REQ-CORE-ERROR-DISPLAY-001` `AtmError::Display` must remain concise and must not emit multi-KB backtrace output. 
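A minimal sketch of that split between `Display` and `Debug`, using a stand-in `SketchError` type. The field layout, the `recovery:` line format, and the `backtrace()` accessor name are assumptions for illustration, not the actual `AtmError` definition.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Stand-in error type. Deriving `Debug` keeps the backtrace reachable
/// through `{:?}` output.
#[derive(Debug)]
struct SketchError {
    message: String,
    recovery: Option<String>,
    backtrace: Backtrace,
}

impl SketchError {
    fn new(message: &str) -> Self {
        Self {
            message: message.to_owned(),
            recovery: None,
            backtrace: Backtrace::capture(),
        }
    }

    fn with_recovery(mut self, recovery: &str) -> Self {
        self.recovery = Some(recovery.to_owned());
        self
    }

    /// Dedicated accessor for callers that want the captured backtrace.
    fn backtrace(&self) -> &Backtrace {
        &self.backtrace
    }
}

impl fmt::Display for SketchError {
    /// Renders the message and recovery text only, never the backtrace.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)?;
        if let Some(recovery) = &self.recovery {
            write!(f, "\nrecovery: {}", recovery)?;
        }
        Ok(())
    }
}
```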
Required behavior: - `Display` renders the human-readable message and recovery text only - captured backtraces remain available via Debug output and a dedicated accessor on `AtmError` - `REQ-CORE-ERROR-DOC-001` Every public function returning `AtmResult` or `Result<_, AtmError>` in the explicit Phase M audit inventory must have a `# Errors` documentation section. Required behavior: - the Phase M audit inventory must explicitly include: - `mailbox/mod.rs` - `mailbox/lock.rs` - `read/mod.rs` - `ack/mod.rs` - `clear/mod.rs` - `team_admin.rs` - `doctor/mod.rs` - `error.rs` - `config/mod.rs` - `home.rs` - `send/mod.rs` - `send/input.rs` - `send/file_policy.rs` - `identity/mod.rs` if the consolidation lands there - any new public atomic/state helper introduced by Phase M - each `# Errors` section must list the `AtmErrorCode` variants the function can return - the implementation must audit the current public API surface instead of relying on a stale hard-coded function count - `REQ-CORE-ERROR-RECOVERY-001` Every `AtmError` construction site in the explicit Phase M audit inventory that represents an operator-actionable failure must use `.with_recovery()`. 
Required behavior:
- Phase M must perform a grep-driven audit of remaining bare `AtmError::new(...)`, `AtmError::mailbox_*`, `AtmError::file_policy(...)`, and similar operator-actionable construction sites in the explicit Phase M audit inventory
- the audit must explicitly include bare operator-actionable sites in:
  - `mailbox/mod.rs`
  - `mailbox/lock.rs`
  - `read/mod.rs`
  - `ack/mod.rs`
  - `clear/mod.rs`
  - `team_admin.rs`
  - `doctor/mod.rs`
  - `config/mod.rs`
  - `home.rs`
  - `address.rs`
  - `send/mod.rs`
  - `send/input.rs`
  - `send/file_policy.rs`
  - `identity/mod.rs` if new operator-facing errors are introduced there
  - any new M.1/M.2 helper that constructs `AtmError`
- permission, timeout, missing-file, malformed-input, lock-contention, and operator-remediable configuration failures are always considered operator-actionable for this audit
- sites already covered by L.7/L.8 recovery work do not need duplicate edits
- internal invariant violations do not require recovery guidance

### 20.5 Code Consolidation And Documentation

- `REQ-CORE-IDENTITY-CONSOLIDATE-001` The duplicated `resolve_actor_identity` function must be consolidated into a single shared implementation.

Required behavior:
- the identical helper currently present in `ack/mod.rs`, `clear/mod.rs`, and `read/mod.rs` must be moved to `identity/mod.rs` as `pub(crate)`

- `REQ-CORE-CONFIG-DOC-001` The deprecated `[atm].identity` config key must be documented in a `# Deprecated` section in the config module documentation.

Required behavior:
- migration guidance: use `ATM_IDENTITY` environment variable instead
- reference `ATM_WARNING_IDENTITY_DRIFT` error code

- `REQ-CORE-PANIC-DOC-001` The panic path in `normalize_json_number` must be eliminated and documented.

Required behavior:
- `normalize_json_number(...)` must return the raw input string on exponent parse failure or unsupported exponent range instead of panicking
- a library function must not panic on potentially untrusted input

## 21. Phase Q: SQLite Mail SSOT, Runtime Boundaries, And Lock Elimination

Phase Q supersedes the mailbox-lock line as the target architecture for ATM mail correctness. The `REQ-CORE-MAILBOX-LOCK-*` requirements remain transitional compatibility constraints only for the interim file-based line. The release-complete target is elimination of mailbox-lock dependence from ATM mail correctness.

### 21.1 SQLite Mail And Roster Ownership

- `REQ-CORE-RUNTIME-001` ATM mail and team roster state must move to SQLite as the authoritative source of truth.

Required behavior:
- SQLite is the durable source of truth for:
  - message records
  - read/unread state
  - ack-required / acknowledged state
  - clear/visibility state
  - task linkage and task metadata
  - team roster
- Claude-owned inbox JSONL files are compatibility ingress/export surfaces, not ATM's authoritative durable mail store
- `config.json` becomes a roster-ingress source, not the durable roster truth

- `REQ-CORE-STORE-001` The Phase Q SQLite store must use one documented schema contract with stable keys, constraints, and indexes.

Required behavior:
- the authoritative schema must define at least:
  - `messages`
  - `ack_state`
  - `message_visibility`
  - `tasks`
  - `team_roster`
  - `inbox_ingest`
- `message_key` is the canonical ATM durable message identity
- `message_key` format must be deterministic and typed by source family:
  - `atm:` for ATM-authored durable rows
  - `ext:` for imported external rows without ATM ids
- schema constraints must forbid duplicate authoritative identities
- the schema must document the required lookup indexes for message lookup, task lookup, visibility projection, and ingest dedupe

- `REQ-CORE-STORE-002` The Phase Q SQLite store must enforce WAL and explicit transaction policy.
Required behavior: - `journal_mode = WAL` - `foreign_keys = ON` - mutating ATM command flows must use explicit transactions - no production mutating path may rely on implicit per-statement autocommit as the normal correctness model - `REQ-CORE-INGEST-001` Inbox/config ingest must use one owned contract for replay, backpressure, and degradation. Required behavior: - ingest must be idempotent - parseable external rows must not be silently dropped - malformed external rows must emit structured diagnostics rather than panic - backlog/slow-ingest conditions must surface through structured diagnostics or health findings rather than dropping records silently - roster/config ingest must apply one deterministic last-write-wins policy for replacing roster truth in SQLite - `REQ-CORE-RUNTIME-003` Crash recovery and replay must preserve the durable ordering rule for daemon-managed export work. Required behavior: - the ordering rule is `SQLite commit -> export / remote handoff` - re-export/replay must be keyed by durable `message_key` - if daemon-managed retry/re-export state survives crash, it must be stored durably with a bounded expiry/deadline - persisted retry state must not become a long-lived remote outbox - `REQ-CORE-RUNTIME-002` Live agent status must not use SQLite as its authoritative live truth. 
Required behavior: - live status is runtime-owned daemon state - SQLite stores the current durable `pid` for each member as roster truth, and daemon memory caches it as the primary liveness field - daemon runtime state must include `last_active_at` for each known active agent/member entry - SQLite must not own live `last_active_at`; it remains daemon-memory-only runtime state - roster truth and live-status truth must remain distinct - `pid` is not a diagnostic snapshot or advisory hint; it is the durable roster-owned process identity until replaced by the documented heartbeat or admin-takeover path ### 21.2 Singleton Daemon Runtime - `REQ-CORE-DAEMON-001` ATM must run exactly one daemon per host in the Phase Q runtime. Required behavior: - it must be impossible for two active ATM daemons to run on one host at the same time - daemon startup must fail deterministically when a live daemon already owns the host runtime - stale daemon ownership artifacts may be cleaned up only when they are proven stale - stale cleanup must never allow two live daemons - `REQ-CORE-DAEMON-002` The daemon must be a thin runtime wrapper rather than a unique business-logic layer. Required behavior: - daemon responsibility is limited to runtime orchestration such as: - transport listeners - route selection - live-status cache - watch/reconcile loop if enabled - the daemon must not become the only place where ATM mail semantics are implemented - `REQ-CORE-DAEMON-003` Production ATM commands must connect to the daemon and auto-start it when absent. 
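The connect, auto-start, retry-once flow of `REQ-CORE-DAEMON-003` can be sketched as below. `connect_or_start`, `ConnectError`, and the recovery text are hypothetical placeholders; the connection and start steps are passed in as closures so the sketch stays independent of any real transport.

```rust
/// Stand-in error carrying placeholder recovery guidance.
#[derive(Debug, PartialEq, Eq)]
enum ConnectError {
    DaemonUnavailable { recovery: &'static str },
}

/// Hypothetical helper: try the running daemon first, auto-start it when
/// absent, retry the connection exactly once, and never fall back to
/// direct store or inbox-file access.
fn connect_or_start<T, C, S>(mut connect: C, start: S) -> Result<T, ConnectError>
where
    C: FnMut() -> Option<T>,
    S: FnOnce() -> bool,
{
    if let Some(conn) = connect() {
        return Ok(conn);
    }
    // Daemon absent: attempt one auto-start, then one retry.
    if start() {
        if let Some(conn) = connect() {
            return Ok(conn);
        }
    }
    Err(ConnectError::DaemonUnavailable {
        // Placeholder text, not ATM's actual recovery message.
        recovery: "daemon did not become available after auto-start",
    })
}
```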
Required behavior: - production CLI/runtime calls first attempt to connect to an already-running daemon - if the daemon is absent, the production CLI/runtime path auto-starts it and retries once - if the daemon remains unavailable after auto-start, ATM must fail with a clear recovery message rather than silently falling back to direct SQLite or inbox-file access - in-process test harnesses may bypass the daemon only inside explicit test wiring, not in the production path Satisfies: - `REQ-P-RUNTIME-001` - `REQ-CORE-DAEMON-004` The daemon must implement one documented graceful shutdown and signal-handling contract. Required behavior: - `SIGINT` and `SIGTERM` begin graceful shutdown - `SIGHUP` triggers bounded runtime rescan/reload without releasing singleton ownership - signal handlers install before listeners begin accepting - graceful shutdown must stop accepts, drain inflight work, checkpoint WAL, and release singleton ownership in order ### 21.3 Strict I/O Ownership Boundaries - `REQ-CORE-BOUNDARY-001` Every subsystem must be behind a strict trait boundary for all external I/O. 
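One common Rust way to build such a boundary is the sealed-trait pattern: a public trait whose supertrait lives in a private module, with the concrete adapter kept private. The sketch below is illustrative only; `MailStore`, `InMemoryStore`, and `open_store` are hypothetical names, and an in-memory vector stands in for the real SQLite adapter.

```rust
mod store {
    // Private module: code outside `store` cannot name `Sealed`,
    // so it cannot implement `MailStore` either.
    mod sealed {
        pub trait Sealed {}
    }

    /// The boundary trait that callers depend on.
    pub trait MailStore: sealed::Sealed {
        fn message_count(&self) -> usize;
    }

    // Concrete adapter type stays private to the owning subsystem.
    struct InMemoryStore {
        messages: Vec<String>,
    }

    impl sealed::Sealed for InMemoryStore {}

    impl MailStore for InMemoryStore {
        fn message_count(&self) -> usize {
            self.messages.len()
        }
    }

    /// Callers receive the boundary, never the concrete type.
    pub fn open_store() -> impl MailStore {
        InMemoryStore {
            messages: vec!["hello".to_owned()],
        }
    }
}
```

Callers outside the module can use `store::open_store()` and the trait's methods, but they can neither construct the adapter directly nor supply their own implementation.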

Required behavior:
- only the owning store subsystem may touch SQLite
- only the owning inbox ingress/export subsystem may parse or write inbox JSONL
- only the owning config-ingress subsystem may parse team `config.json`
- only the owning watcher/reconcile subsystem may consume filesystem watch events or drive watch-triggered rescan/reconcile logic
- only the owning transport subsystem may touch sockets
- only the owning notifier/plugin subsystem may talk to agent processes
- no business logic may live in I/O adapter code
- no "just this one call site" bypasses are allowed
- I/O-owning boundary traits are sealed by default; opening a boundary for external implementation requires explicit architectural approval
- concrete I/O adapter types and constructors remain private unless a documented boundary contract requires wider visibility
- violation of any ownership rule above is a direct QA failure for the Phase Q implementation line

### 21.3.1 Structured Error Boundaries

- `REQ-CORE-BOUNDARY-002` Production Phase Q code must model fallible runtime behavior with discriminated error unions and explicit `Result` propagation.

Required behavior:
- fallible production paths must prefer typed error enums/unions over panic, `unwrap`, or `expect`
- compile-time-visible error types must remain the primary enforcement mechanism for runtime failure handling
- panic is reserved for invariant corruption or explicitly unreachable code paths, not routine I/O, parse, transport, or store failures
- CLI, daemon, and core service layers must preserve structured error identity when translating between boundaries
- the `AtmErrorCode` registry must not use wildcard or catch-all variants in place of specific codes
- every public `AtmErrorCode` must document one recoverability class
- the `AtmErrorCode` registry is centralized and read-only from the perspective of feature/service code; subsystems consume codes from the registry and do not mint local alternatives
- violation of these structured-error rules is a direct QA failure for the Phase Q implementation line

### 21.4 Transport And Routing Model

- `REQ-CORE-TRANSPORT-001` ATM must use one logical daemon API with two production transport implementations and one test transport.

Required behavior:
- same-host transport: Unix domain socket
- cross-host transport: TCP/TLS
- test transport: in-process `test-socket` implementation of the same protocol/interface for subsystem and daemon-boundary tests
- these are implementations of one protocol/interface, not separate systems
- socket receive logic must remain a small framed-message loop that:
  - reads one request frame
  - parses it into a qualified request enum/value
  - dispatches immediately to the owning handler boundary
  - returns a typed response
- request-kind routing must live behind one dispatcher boundary with injectable typed handlers for request families
- adding a new request family must not require embedding business logic into Unix-domain or TCP/TLS transport adapters
- socket receive logic must not perform SQL, watcher, or notification business logic inline
- any violation of this transport isolation rule is a direct QA failure for the Phase Q implementation line
- subsystem and runtime tests must be able to replace Unix/TCP transport adapters with the `test-socket` transport without changing business logic

- `REQ-CORE-TRANSPORT-001B` Request routing must live behind one explicit dispatcher boundary with injectable typed handlers.

Required behavior:
- transport adapters hand parsed qualified requests to the dispatcher
- the dispatcher owns request-kind routing only
- concrete request-family behavior lives in injectable handlers behind the dispatcher
- adding a new request family must not require transport-adapter logic growth beyond decode + dispatch
- any violation of this dispatcher/handler rule is a direct QA failure for the Phase Q implementation line

- `REQ-CORE-TRANSPORT-001A` Filesystem watch/reconcile logic must remain a separate owned subsystem from transport, store, and notifier logic.

Required behavior:
- watch event ingestion, debounce, and reconcile triggering must stay behind one owned watcher/reconcile boundary
- the watcher boundary may request work from ingress/store/notifier handlers, but it must not inline SQL, socket, or notification delivery logic
- the transport boundary must not absorb watcher responsibilities
- any violation of this watcher isolation rule is a direct QA failure for the Phase Q implementation line

- `REQ-CORE-TRANSPORT-002` Cross-host traffic must be daemon-to-daemon only.

Required behavior:
- native agent/plugin code talks only to the local daemon
- cross-host delivery happens only between daemons
- remote routing uses an address form equivalent to `agent@team.host`
- sender-side daemons must not write remote host inbox JSONL directly

- `REQ-CORE-TRANSPORT-003` Remote delivery must not leave durable long-lived pending messages behind when a host is unreachable.

Required behavior:
- bounded transient retry is allowed for short intermittent failures
- after the bounded retry window expires, the send fails
- ATM must not keep a durable remote outbox that can leave stale messages queued for days

- `REQ-CORE-TRANSPORT-004` Remote send success must require remote daemon acceptance within the bounded retry window.

Required behavior:
- sender-side daemons may record observability/audit information locally while attempting remote delivery
- a remote send must not be reported as successfully delivered until the remote daemon accepts it
- if the bounded retry window expires without remote acceptance, the send fails and must not leave durable delivered-message state behind

- `REQ-CORE-TRANSPORT-005` The daemon runtime must use concrete timeout and capacity limits for transport/store/health operations.

Required behavior:
- same-host daemon request deadline: `3s`
- per-leg TCP/TLS connect deadline: `5s`
- per-leg TCP/TLS read/write deadline: `5s`
- total remote retry budget: `30s`
- SQLite `busy_timeout`: `1500ms`
- ingest batch processing slice: `2s`
- doctor health query deadline: `3s`
- max concurrent accepts: `64`
- max per-connection inflight requests: `32`
- ingest queue depth: `1024`
- retry queue depth: `256`
- SQLite handle budget: `1..=4`
- live status-cache cap: `4096`
- saturation behavior must fail with typed errors or structured degradation, never silent drop

### 21.5 Claude Compatibility And Native Agent Path

- `REQ-CORE-COMPAT-001` Claude inbox JSONL remains the required compatibility path for Claude context injection.

Required behavior:
- ATM-authored Claude inbox exports must remain Claude-native at the top level with ATM machine fields under `metadata.atm`
- Claude-native external writes must be importable into SQLite through one owned ingress boundary
- once team roster and pane mapping truth move to SQLite, ATM-owned post-send-hook payloads must carry the authoritative `recipient_pane_id` from roster truth when known
- post-send hooks must be able to rely on that payload field instead of rediscovering pane mappings from local files once the Phase Q migration is complete

- `REQ-CORE-COMPAT-002` Native agent/plugin traffic must not use JSONL.

Required behavior:
- native agent/plugin delivery and notification uses the daemon API instead of JSONL
- the later agent plugin crate must align to this daemon API rather than introducing a parallel message transport

### 21.6 Lock Elimination Target

- `REQ-CORE-LOCK-RETIRE-001` ATM mail correctness must stop depending on mailbox lock artifacts.

Required behavior:
- mailbox locks may remain only as transitional compatibility machinery for the interim file-based line
- the Phase Q target architecture must eliminate mailbox-lock dependence from normal ATM mail correctness
- Phase Q completion requires that stale lock artifacts can no longer wedge normal ATM mail flows

### 21.7 Test Strategy Constraints

- `REQ-CORE-TEST-RUNTIME-001` Core Phase Q behavior must be testable without daemon process spawning.

Required behavior:
- daemon spawning is not part of the core test strategy
- core service behavior must be testable in-process
- transport/watch/runtime logic must be testable with fakes or in-process harnesses
- no default test path may depend on daemon process lifecycle to validate ATM mail correctness

### 21.8 Observability Requirements

- `REQ-CORE-OBS-002` Phase Q must keep structured observability first-class at both CLI and daemon boundaries.

Required behavior:
- CLI entry, daemon runtime, transport, ingest/export, and service orchestration must emit structured events through the shared `sc-observability` boundary
- observability wiring must remain layered:
  - `atm` owns CLI bootstrap and presentation concerns
  - `atm-daemon` owns daemon/runtime event emission
  - `atm-core` owns ATM event and error models above the shared boundary
- native plugins may emit plugin-local diagnostics, but daemon-owned runtime/transport/store/ingest events must be emitted by the daemon and not delegated to plugin code
- observability must not be implemented as ad hoc println/debug output in production paths

### 21.8.1 Doctor Health Interface

- `REQ-CORE-DOCTOR-002` The Phase Q runtime must expose a daemon health query interface consumable by `atm doctor`.

Required behavior:
- `atm doctor` remains a CLI command
- daemon/runtime health information must be obtained through an explicit daemon-facing interface rather than direct CLI inspection of private daemon state
- the health interface must be able to report at least:
  - daemon reachability
  - singleton ownership status
  - live status-cache summary
  - ingest backlog / degraded-ingest state when present
  - SQLite open/readiness state

### 21.9 QA Invariants

- `REQ-CORE-QA-RUNTIME-001` Every QA pass for Phase Q must verify the daemon and boundary invariants.

Required behavior:
- impossible to run two active ATM daemons on one host
- daemon unavailability after one auto-start attempt fails clearly with no hidden direct I/O fallback
- every subsystem performs external I/O only through its owning trait boundary
- production error handling uses typed `Result`/error-enum boundaries instead of panic/unwrap for fallible runtime paths
- daemon/runtime code remains thin and does not accumulate business logic
- daemon spawning is not the test strategy
- SQLite remains the source of truth for mail and roster
- live agent status remains runtime-owned state
- structured `sc-observability` coverage remains present at both CLI and daemon layers
- Claude compatibility export remains Claude-native top-level plus `metadata.atm`
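
Several of the invariants above (connect, auto-start once, fail clearly with no hidden fallback, typed errors, in-process testability) can be illustrated together. The following is a minimal sketch under stated assumptions, not the ATM implementation: `ConnectError`, `DaemonLauncher`, `connect_or_autostart`, and `FakeLauncher` are hypothetical names invented for illustration, the error variants are not taken from the `AtmErrorCode` registry, and the recovery message wording is only a placeholder.

```rust
use std::fmt;

/// Hypothetical typed errors for the connect path (illustrative only,
/// not drawn from the real `AtmErrorCode` registry).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConnectError {
    /// No daemon is currently listening.
    Absent,
    /// The auto-start attempt itself failed.
    SpawnFailed(String),
    /// The daemon stayed unreachable after exactly one auto-start attempt.
    UnavailableAfterAutoStart,
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectError::Absent => write!(f, "daemon is not running"),
            ConnectError::SpawnFailed(why) => write!(f, "daemon auto-start failed: {why}"),
            // Placeholder wording for the "clear recovery message".
            ConnectError::UnavailableAfterAutoStart => {
                write!(f, "daemon unavailable after auto-start; try `atm doctor`")
            }
        }
    }
}

impl std::error::Error for ConnectError {}

/// Hypothetical boundary trait: the only route from the CLI path to the daemon.
pub trait DaemonLauncher {
    type Conn;
    fn connect(&mut self) -> Result<Self::Conn, ConnectError>;
    fn auto_start(&mut self) -> Result<(), ConnectError>;
}

/// Connect; if the daemon is absent, auto-start it and retry exactly once.
/// There is deliberately no fallback to direct SQLite or inbox-file access.
pub fn connect_or_autostart<L: DaemonLauncher>(launcher: &mut L) -> Result<L::Conn, ConnectError> {
    match launcher.connect() {
        Ok(conn) => Ok(conn),
        Err(ConnectError::Absent) => {
            launcher.auto_start()?;
            launcher
                .connect()
                .map_err(|_| ConnectError::UnavailableAfterAutoStart)
        }
        Err(other) => Err(other),
    }
}

/// In-process fake standing in for a real daemon process in tests.
pub struct FakeLauncher {
    pub running: bool,
    pub spawn_fails: bool,
    pub comes_up: bool,
    pub connects: u32,
    pub starts: u32,
}

impl DaemonLauncher for FakeLauncher {
    type Conn = &'static str;

    fn connect(&mut self) -> Result<Self::Conn, ConnectError> {
        self.connects += 1;
        if self.running { Ok("connected") } else { Err(ConnectError::Absent) }
    }

    fn auto_start(&mut self) -> Result<(), ConnectError> {
        self.starts += 1;
        if self.spawn_fails {
            return Err(ConnectError::SpawnFailed("spawn refused".to_string()));
        }
        self.running = self.comes_up;
        Ok(())
    }
}
```

Because the launcher is a trait, a test can drive every branch with `FakeLauncher` and never spawn a daemon process, which is the shape `REQ-CORE-TEST-RUNTIME-001` asks for. The counters on the fake make the "retry once" rule directly assertable.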