# ATM CLI Project Plan

## 1. Goal

Implement the retained ATM CLI surface while migrating mail/runtime ownership
from filesystem JSON plus mailbox locks to SQLite plus a singleton daemon,
preserving `send`, `read`, `ack`, `clear`, `log`, `doctor`, `teams`, and
`members`.

The authoritative migration document, which owns all file-level migration
decisions, is:
- [`docs/archive/file-migration-plan.md`](./archive/file-migration-plan.md)

This plan sequences the work.

Documentation organization and cleanup are governed by
[`documentation-guidelines.md`](./documentation-guidelines.md). As the docs are
restructured, product docs remain in `docs/` and crate-local detail moves into
`docs/atm/`, `docs/atm-core/`, `docs/atm-daemon/`, and
`docs/atm-rusqlite/`.

Phase-Q supersession note:
- earlier daemon-free phases in this plan remain historical execution records
- the current target line is Section 21 and the detailed design in
  `docs/plan-phase-Q.md`

Status:
- Phases 0 through P have executed on the retained rewrite line.
- Phases G and H are complete retained-command phases, closed through the
  shared observability and release-alignment work delivered in later phases.
- Phase K completed the shared `sc-observability` integration boundary.
- Phase L completed the retained release-surface and team-recovery closeout.
- Phase M completed mailbox locking and review-finding fixes.
- Phase N completed publish-replacement and distribution-parity planning and
  implementation merge work.
- Phase O completed the security and hardening follow-up line.
- Phase P implementation is merged; follow-up hardening remains open for
  `P.6` and later cleanup/fix branches, while `P.8` documentation
  reconciliation and the `P.9`/`P.10` lock-sentinel design and implementation
  work are complete on the merged Phase P line.
- Message schema ownership and metadata normalization are implemented and
  passed J.5 live validation for shared-inbox adoption; a separate ATM-native
  inbox remains deferred to a later version.
- Phase Q planning is active on the SQLite source-of-truth and daemon-boundary
  line; this phase supersedes the mailbox-lock architecture as the target design.
- The current workspace still contains `crates/atm-core` and `crates/atm`
  only; `crates/atm-daemon` and `crates/atm-rusqlite` are introduced by the
  Phase Q implementation line.

## 2. Deliverables

- Rust workspace expanded from `crates/atm-core` + `crates/atm` to include
  `crates/atm-daemon` and `crates/atm-rusqlite`
- retained implementation of `send`, `read`, `ack`, `clear`, `log`,
  `doctor`, `teams`, and `members`
- SQLite-backed mail and roster source of truth
- singleton daemon runtime with one protocol, two production transport
  adapters, and one in-process `test-socket`
- elimination of mailbox-lock dependence from ATM mail correctness
- explicit two-axis workflow model with three display buckets
- task-linked message metadata with mandatory ack behavior
- structured errors with recovery guidance
- structured logs through `sc-observability`
- retained and new integration tests for the retained command surface
- explicit schema ownership docs for Claude Code, legacy ATM compatibility, and
  forward ATM metadata

## 3. Crates

The Phase Q target implementation is split across:

- `crates/atm-core`
- `crates/atm`
- `crates/atm-daemon`
- `crates/atm-rusqlite`

Crate-local scope detail is owned by:

- [`docs/atm-core/requirements.md`](./atm-core/requirements.md)
- [`docs/atm-core/architecture.md`](./atm-core/architecture.md)
- [`docs/atm/requirements.md`](./atm/requirements.md)
- [`docs/atm/architecture.md`](./atm/architecture.md)
- [`docs/atm-daemon/requirements.md`](./atm-daemon/requirements.md)
- [`docs/atm-daemon/architecture.md`](./atm-daemon/architecture.md)
- [`docs/atm-rusqlite/requirements.md`](./atm-rusqlite/requirements.md)
- [`docs/atm-rusqlite/architecture.md`](./atm-rusqlite/architecture.md)

## 4. Work Sequence

### Phase 0: Document Lock [COMPLETE]

Status summary:
- Requirements, architecture, and read-behavior documentation are locked, and
  the migration plan now lives in `docs/archive/`.
- This phase completed without a dedicated PR because it was finished before the
  current atm-core PR sequence began.

Finish and freeze:
- `requirements.md`
- `architecture.md`
- `read-behavior.md`
- `docs/archive/file-migration-plan.md`

Acceptance:
- workflow axes, display buckets, retained command surface, and observability boundary are consistent across all docs
- every retained or excluded source file needed for the retained commands is explicitly listed in `docs/archive/file-migration-plan.md`

### Phase A: `OBS-GAP-1` [COMPLETE]

Status summary:
- The `sc-observability` API gap was catalogued and closed before the ATM `log`
  and `doctor` work depended on it.
- This phase is historical context only; it is no longer the gating item for
  retained observability delivery.
- Delivered in PR #1.

Goal:
- verify and close the shared `sc-observability` API gap before ATM depends on it for `atm log` and `atm doctor`

Deliverables:
- ATM-side required capability list
- gap list against current `sc-observability`
- concrete API requests for `arch-obs`
- decision on ATM-owned port-boundary responsibilities versus shared observability responsibilities

Acceptance:
- shared plan exists for emit/query/follow/filter/health support
- no ATM-local ad hoc log query engine is needed

### Phase B: Core Skeleton [COMPLETE]

Status summary:
- The workspace, crate scaffolding, CLI command surface, and documentation gap
  closure were completed and merged.
- Delivered in PRs #2 and #3.

| Sprint | Scope | Required outcome |
| --- | --- | --- |
| B.1 | CLI skeleton | `atm` exposes the initial core messaging surface: `send`, `read`, `ack`, `clear`, `log`, `doctor` |
| B.2 | Documentation gap closure | lock the remaining send/read/clear requirements and architecture details before Phase C begins |

Create:
- workspace manifests
- `atm-core`
- `atm`
- placeholder module tree matching the architecture

Acceptance:
- workspace builds
- CLI help shows the initial core messaging surface: `send`, `read`, `ack`,
  `clear`, `log`, and `doctor`
- B.1 and B.2 are both complete before Phase C starts
- requirements and architecture lock the message id, read dedupe, and clear
  eligibility semantics needed for implementation

### Phase C: Low-Level Reuse [COMPLETE]

Status summary:
- Foundational reuse landed for mailbox schema alignment, config/path helpers,
  and the shared `AtmError` / `AtmErrorKind` model.
- Delivered in PRs #4 and #5.

Port retained foundational files first:
- home/path helpers
- config and bridge resolution
- address parsing
- text utilities
- schema types
- mailbox primitives
- hook identity

Acceptance:
- foundational unit tests pass
- no daemon references remain in foundational modules

### Phase D: Send Path [COMPLETE]

Status summary:
- The send service, CLI wiring, observability port adapter, and team-config
  validation are all implemented and merged.
- Delivered in PR #6.

Port send command and support files:
- identity resolution
- file policy
- summary generation
- mailbox append
- ack-required and task-linked message creation
- command output
- observability emission

Acceptance:
- historical pre-Phase-Q acceptance: `atm send` feature set worked without
  daemon support
- send JSON and human output match the documented contract
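
Task-linked messages carry mandatory ack behavior (see Deliverables). A minimal sketch of that creation rule, assuming a hypothetical `OutgoingMessage` type and `build_message` helper rather than the actual `atm-core` send API: attaching a task id forces `requires_ack`, so no caller can create a task-linked message that skips acknowledgement.

```rust
/// Hypothetical outgoing-message shape; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct OutgoingMessage {
    pub to: String,
    pub text: String,
    pub task_id: Option<String>,
    pub requires_ack: bool,
}

/// Builds a message, forcing `requires_ack` whenever a task id is attached.
/// The caller's `requires_ack` choice only matters for non-task messages.
pub fn build_message(
    to: &str,
    text: &str,
    task_id: Option<&str>,
    requires_ack: bool,
) -> OutgoingMessage {
    let task_id = task_id.map(str::to_owned);
    OutgoingMessage {
        to: to.to_owned(),
        text: text.to_owned(),
        // Struct fields evaluate in source order, so this reads `task_id`
        // before it is moved into the field below.
        requires_ack: requires_ack || task_id.is_some(),
        task_id,
    }
}
```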

### Phase E: Read Path [COMPLETE]

Status summary:
- The read service now includes `IsoTimestamp`, seen-state handling, queue
  bucket filtering, and the required read-path transitions.
- Delivered in PR #7.

Port read command and support files:
- workflow axis classification
- display bucket mapping
- selection modes
- seen-state behavior
- timeout waiting
- legal state transitions
- command output

Acceptance:
- historical pre-Phase-Q acceptance: `atm read` feature set worked without
  daemon support
- workflow axes and display buckets match the requirements
- seen-state semantics match the documented contract

### Phase F: Ack And Clear Path [COMPLETE]

Status summary:
- Ack and clear flows are implemented, the remaining 30 RBP findings were
  closed, and CI isolation hardening was completed for the phase.
- Delivered in PRs #8, #9, and #10.

Port ack and clear command support files:
- acknowledgement transition handling
- reply emission
- clear eligibility computation
- clear dry-run reporting
- command output

Acceptance:
- historical pre-Phase-Q acceptance: `atm ack` feature set worked without
  daemon support
- `atm clear` removes only clearable messages
- pending-ack messages remain visible until acknowledgement
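
The clear rule can be sketched as one eligibility predicate shared by real and dry-run clears. This sketch assumes, hypothetically, that an entry is clearable once it is read and has no outstanding acknowledgement; the authoritative eligibility semantics live in `requirements.md`, and `MailboxEntry` and `clear` are illustrative names only.

```rust
/// Hypothetical mailbox entry with just the fields the clear rule needs.
#[derive(Debug, Clone, PartialEq)]
pub struct MailboxEntry {
    pub id: String,
    pub read: bool,
    pub pending_ack: bool,
}

/// Assumed rule for this sketch: clearable only when read and not awaiting ack.
pub fn is_clearable(entry: &MailboxEntry) -> bool {
    entry.read && !entry.pending_ack
}

/// Returns the ids of clearable entries. With `dry_run` set the mailbox is left
/// untouched; otherwise those entries are removed.
pub fn clear(mailbox: &mut Vec<MailboxEntry>, dry_run: bool) -> Vec<String> {
    let removed: Vec<String> = mailbox
        .iter()
        .filter(|e| is_clearable(e))
        .map(|e| e.id.clone())
        .collect();
    if !dry_run {
        mailbox.retain(|e| !is_clearable(e));
    }
    removed
}
```

Sharing one predicate between the dry-run report and the real removal keeps the two from drifting apart.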

### Phase G: Log Path [UNBLOCKED - Phase K COMPLETE]

Status summary:
- The retained `log` command remains a command-phase deliverable, but concrete
  implementation is blocked until Phase K lands the real
  `sc-observability` adapter and shared query/follow integration.

Port and redesign the log command:
- injected observability port usage
- log query/filter/tail behavior
- command output
- integration tests

Acceptance:
- `atm log` works through shared `sc-observability` APIs
- level and field filtering work
- tail mode works
- emit failures remain best-effort for mail commands
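
The best-effort requirement means a logging failure must never fail a mail command. A minimal sketch under assumed names (`EmitPort` and `send_then_log` are hypothetical, and `String` errors stand in for ATM's error type): the mail operation's own errors still propagate, while an emit error is downgraded to a warning.

```rust
/// Hypothetical emit-only slice of an observability port.
pub trait EmitPort {
    fn emit(&self, event: &str) -> Result<(), String>;
}

/// Runs the mail operation first, then logs it best-effort. A mail failure is
/// returned as an error; an emit failure only produces a warning string.
pub fn send_then_log(
    port: &dyn EmitPort,
    deliver: impl FnOnce() -> Result<String, String>,
) -> Result<(String, Option<String>), String> {
    let message_id = deliver()?; // mail failures still propagate
    let warning = port
        .emit(&format!("send ok id={message_id}"))
        .err()
        .map(|e| format!("observability emit failed: {e}"));
    Ok((message_id, warning))
}
```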

### Phase H: Doctor Path [UNBLOCKED - Phase K COMPLETE]

Status summary:
- The retained `doctor` command was blocked until Phase K landed the real
  `sc-observability` health/query integration; with Phase K complete,
  `atm doctor` was delivered on the shared health surface in `K.5`.

Port and redesign the doctor command:
- local config/path checks
- hook identity checks
- mailbox readiness checks
- observability health and query-readiness checks
- command output
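
Doctor output can be modeled as per-check results folded into one overall status. A sketch assuming a hypothetical `CheckResult` type that reuses the three observability health states (`healthy`, `degraded`, `unavailable`) named later in this plan; the real `DoctorReport` shape is not defined here.

```rust
/// The three health states; `derive(Ord)` orders variants by declaration,
/// so Healthy < Degraded < Unavailable.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Health {
    Healthy,
    Degraded,
    Unavailable,
}

/// Hypothetical result of one doctor check (config/path, hook identity,
/// mailbox readiness, or observability health).
#[derive(Debug, Clone)]
pub struct CheckResult {
    pub name: &'static str,
    pub health: Health,
    pub detail: Option<String>,
}

/// Overall status is the worst individual check; no checks means healthy.
pub fn overall(checks: &[CheckResult]) -> Health {
    checks.iter().map(|c| c.health).max().unwrap_or(Health::Healthy)
}

/// Checks that need operator attention, in their original order.
pub fn findings(checks: &[CheckResult]) -> Vec<&CheckResult> {
    checks.iter().filter(|c| c.health != Health::Healthy).collect()
}
```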

Acceptance:
- historical pre-Phase-Q acceptance: `atm doctor` worked without daemon
  support

### Phase I: Cleanup And Hardening

Delete:
- daemon-dependent crates and helpers not retained
- leftover imports from daemon-era surfaces

Add:
- integration tests
- snapshot tests
- config/schema hardening for legacy team records with deterministic recovery
  and precise diagnostics
- documentation polish

Acceptance:
- implementation matches `requirements.md`, `architecture.md`,
  `read-behavior.md`, and `docs/archive/file-migration-plan.md`

### Phase J: Message Schema Normalization [COMPLETE]

Status summary:
- schema ownership, compatibility, and forward metadata rules are now
  documented
- the current live design continues to use the shared Claude inbox surface and
  passed J.5 live validation
- a separate ATM-native inbox is explicitly deferred until after the current
  design is live and proven
- no J.5 runtime blocker was found that forces an immediate inbox split

Goal:
- make the shared inbox design safe to run live by clarifying schema ownership,
  deprecating new ATM-only top-level fields, and defining the forward
  metadata-based ATM schema

Execution model:
- this phase is implemented as a coordinated multi-sprint stream owned by
  `team-lead`
- `team-lead` should orchestrate the sprint sequence, worktree assignments, and
  review hand-offs using the `/codex-orchestration` skill
- sprint execution should not assume a separate ATM-native inbox; all work in
  this phase targets the current shared inbox design

Deliverables:
- explicit schema ownership docs:
  - Claude Code-native schema
  - legacy ATM read-compatibility schema
  - forward ATM metadata schema
- enforcement models for locally owned schema docs
- requirements and architecture rules for:
  - legacy read compatibility
  - metadata-only ATM machine fields going forward
  - ULID-based ATM message identifiers
  - timestamp derivation from ULID creation time
  - additive enrichment of Claude-native messages with ATM metadata
- implementation plan for the initial dedup work:
  - PR #18 idle-notification receiver-side dedup using the Claude-native idle
    payload in `text`
  - consolidation of ATM `message_id` surface canonicalization rules across
    read, ack, and clear
  - migration plan for ATM-authored repair/alert dedup toward `metadata.atm`
- next-version deferral note for a separate ATM-native inbox
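
Deriving the timestamp from the ULID relies on the ULID layout: 26 Crockford base32 characters, where the first 10 encode a 48-bit millisecond Unix timestamp and the remaining 16 encode randomness. A dependency-free sketch; `ulid_timestamp_ms` and `ulid_created_at` are hypothetical names, and a production implementation would more likely use an existing ULID crate.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Crockford base32 alphabet used by ULIDs (excludes I, L, O, and U).
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Decodes the 48-bit millisecond Unix timestamp from the first 10 characters
/// of a 26-character ULID. Returns `None` for malformed ids.
pub fn ulid_timestamp_ms(ulid: &str) -> Option<u64> {
    let bytes = ulid.as_bytes();
    if bytes.len() != 26 {
        return None;
    }
    // Every character must be valid Crockford base32 (case-insensitive).
    if !bytes.iter().all(|b| CROCKFORD.contains(&b.to_ascii_uppercase())) {
        return None;
    }
    let mut ms: u64 = 0;
    for b in &bytes[..10] {
        let digit = CROCKFORD.iter().position(|&c| c == b.to_ascii_uppercase())?;
        ms = ms * 32 + digit as u64;
    }
    // 10 base32 characters hold 50 bits; a valid ULID timestamp fits in 48.
    if ms >= 1u64 << 48 {
        return None;
    }
    Some(ms)
}

/// Converts the embedded timestamp into a `SystemTime`.
pub fn ulid_created_at(ulid: &str) -> Option<SystemTime> {
    ulid_timestamp_ms(ulid).map(|ms| UNIX_EPOCH + Duration::from_millis(ms))
}
```

For example, `01ARYZ6S41TSV4RRFFQ69G5FAV` decodes to 1469918176385 ms since the Unix epoch.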

Completed sprints:

- `J.1` Schema Ownership Lock
  - land the production schema docs and local enforcement models
  - add source-code and unit-test references back to the owning schema docs
  - acceptance: no ambiguity remains about Claude-native vs ATM-owned vs
    legacy ATM read-compat fields

- `J.2` Native Idle Dedup Implementation
  - implement PR #18 receiver-side idle-notification dedup against the
    Claude-native JSON payload stored in `text`
  - remove or reject any implementation that tries to redefine idle notices as
    an ATM-owned native top-level schema
  - acceptance: at most one unread idle notification per sender remains visible
    in an inbox, with fixtures and tests aligned to the Claude-native schema

- `J.3` Surface Canonicalization Consolidation
  - centralize `message_id` dedup logic used by read, ack, and clear
  - keep current legacy top-level `message_id` behavior read-compatible while
    documenting the later move to `metadata.atm.messageId`
  - acceptance: one shared dedup contract is used across operator-facing
    mailbox surfaces

- `J.4` ATM Alert Metadata Migration Plan
  - migrate the design for ATM-authored repair notices from ad hoc top-level
    fields toward `metadata.atm`
  - explicitly preserve legacy top-level `atmAlertKind` and
    `missingConfigPath` as read-compatible until the runtime migration sprint
    lands
  - keep current alert writes/read-compat behavior stable until the migration
    sprint lands
  - acceptance: requirements and architecture specify the forward metadata
    placement for ATM alert/dedup fields without breaking legacy reads

- `J.5` Live Shared-Inbox Validation
  - exercise the documented shared-inbox design in live/manual flows before any
    ATM-native inbox redesign is considered
  - confirm Claude-context projection limitations, enrichment expectations, and
    ack/dedup operator workflows against real inbox files
  - acceptance: the current shared-inbox design is proven usable enough to
    defer ATM-native inbox work to a later version
  - delivered in:
    [`docs/atm-core/design/live-shared-inbox-validation.md`](./atm-core/design/live-shared-inbox-validation.md)
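
The J.2 acceptance rule (at most one unread idle notification per sender stays visible) reduces to a newest-wins scan. A sketch with hypothetical names: `InboxEntry.is_idle` stands in for classifying the Claude-native idle payload stored in `text`, which this sketch does not parse, and entries are assumed to be in arrival order.

```rust
use std::collections::HashSet;

/// Hypothetical inbox entry; `is_idle` is assumed to be precomputed from the
/// Claude-native idle payload in `text`.
#[derive(Debug, Clone)]
pub struct InboxEntry {
    pub from: String,
    pub is_idle: bool,
    pub unread: bool,
}

/// Returns indices of entries that stay visible, in original order. Scanning
/// from the newest entry keeps only the latest unread idle notification per
/// sender; all other entries are always kept.
pub fn visible_indices(entries: &[InboxEntry]) -> Vec<usize> {
    let mut seen_idle_senders = HashSet::new();
    let mut keep: Vec<usize> = entries
        .iter()
        .enumerate()
        .rev()
        .filter(|(_, e)| !(e.is_idle && e.unread) || seen_idle_senders.insert(e.from.clone()))
        .map(|(i, _)| i)
        .collect();
    keep.reverse();
    keep
}
```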

Acceptance:
- schema ownership is explicit in requirements and architecture
- legacy ATM top-level fields are documented as read-compatible but deprecated
  for new writes
- forward ATM metadata schema requires ULID-based ATM message identifiers
- PR #18 idle-notification dedup is explicitly represented in the implementation
  plan as a Claude-native schema-following sprint
- the phase is organized into explicit sprints orchestrated by `team-lead`
  using `/codex-orchestration`
- the current architecture explicitly defers a separate ATM-native inbox until
  a later version
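
One shared dedup contract across read, ack, and clear needs a single answer to which id identifies a message while legacy top-level ids stay read-compatible. A sketch, assuming hypothetical `StoredMessage` fields for the two id locations and assuming forward `metadata.atm.messageId` takes precedence when both are present (the plan does not state the precedence).

```rust
use std::collections::HashSet;

/// Hypothetical, already-parsed view of the two places an ATM message id can
/// live: forward `metadata.atm.messageId` and the deprecated top-level
/// `message_id` that stays read-compatible.
#[derive(Debug, Clone, Default)]
pub struct StoredMessage {
    pub metadata_atm_message_id: Option<String>,
    pub legacy_message_id: Option<String>,
}

/// Resolves one canonical id, preferring forward metadata over the legacy field.
pub fn canonical_message_id(msg: &StoredMessage) -> Option<&str> {
    msg.metadata_atm_message_id
        .as_deref()
        .or(msg.legacy_message_id.as_deref())
}

/// First message for each canonical id wins; messages without an id are kept.
pub fn dedup_by_message_id(messages: &[StoredMessage]) -> Vec<&StoredMessage> {
    let mut seen = HashSet::new();
    let unique: Vec<&StoredMessage> = messages
        .iter()
        .filter(|m| match canonical_message_id(*m) {
            Some(id) => seen.insert(id),
            None => true,
        })
        .collect();
    unique
}
```

Messages with no id are never collapsed, so legacy records without identifiers stay visible.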

### Phase K: `sc-observability` Integration [COMPLETE]

Status summary:
- ATM now uses the shared `sc-observability` stack for retained emit, query,
  follow, and health behavior
- `atm log` and `atm doctor` are delivered on the shared stack with ATM-owned
  boundary types and error-code mapping
- the remaining follow-on work is release-alignment and post-1.0 feature
  adoption, tracked in Phase L

Goal:
- integrate ATM with the current shared `sc-observability` logging/query/health
  surface in a production-ready way before resuming retained `log` and
  `doctor` delivery

Execution model:
- this phase is implemented as a coordinated multi-sprint stream owned by
  `team-lead`
- `team-lead` should orchestrate the sprint sequence, worktree assignments, and
  review hand-offs using the `/codex-orchestration` skill
- the phase uses the ATM-owned adapter/boundary documented in:
  [`docs/atm-core/design/sc-observability-integration.md`](./atm-core/design/sc-observability-integration.md)
- until `sc-observability` is published, local and CI builds may consume the
  shared crates from a sibling checkout using a repo-local Cargo patch/path
  strategy; committed ATM docs and scripts must not require user-specific
  absolute paths

Planned sprints:

- `K.1` Toolchain And Dependency Alignment
  - align ATM to the shared Rust toolchain floor and current stable pin
  - define the pre-publish local dependency strategy used in developer builds
    and CI
  - land `rust-toolchain.toml`, repo/CI toolchain pinning, and
    `docs/atm-core/dev/pre-publish-deps.md`
  - acceptance: ATM toolchain/docs/CI strategy is explicit and matches the
    shared repo dependency floor

- `K.2` Observability Port Expansion
  - expand the `atm-core` boundary from emit-only to emit/query/follow/health
  - keep `sc-observability` types out of `atm-core` public APIs
  - introduce the single ATM-owned error-code registry in `atm-core` and wire
    it into `AtmError`
  - acceptance: `atm-core` owns the projected ATM request/result types and a
    synchronous tail session boundary, and the error-code registry is centrally
    defined

- `K.3` Concrete Adapter Bootstrap
  - replace the local tracing-only `atm` implementation with a real
    `sc-observability` adapter
  - initialize the shared logger once per CLI process and inject it into
    `atm-core`
  - add terminal failure logging for bootstrap, parse, and core-service error
    paths
  - acceptance: retained mail commands emit through the shared logger and
    preserve best-effort behavior, and failure diagnostics carry stable ATM
    error codes

- `K.4` `atm log` Delivery On Shared Query/Follow
  - implement the retained `log` command over `Logger::query(...)` and
    `Logger::follow(...)`
  - acceptance: snapshot/tail/filtering behavior works through the shared log
    store with integration coverage

- `K.5` `atm doctor` Delivery On Shared Health
  - implement the retained `doctor` command over shared logging/query health
  - acceptance: doctor integration tests cover healthy, unavailable, and
    degraded adapter states; each state produces a structured `DoctorReport`
    with a stable ATM error code from `docs/atm-error-codes.md` when
    applicable

- `K.6` Integration And Live Validation
  - close the command-test gap for observability consumer paths and run one
    live/manual validation pass against a real ATM home
  - close the error-logging gap by verifying CLI/bootstrap/service failures and
    degraded recovery warnings all emit stable ATM-owned error codes
  - acceptance: `atm log` (snapshot, tail, filter) and `atm doctor` are tested
    against the real `sc-observability` adapter in at least one live
    validation pass, and the results are documented in
    `docs/atm-core/design/live-observability-validation.md`
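
K.2 expands the `atm-core` boundary to emit/query/follow/health while keeping `sc-observability` types out of its public API. A compact sketch of such a boundary, under stated assumptions: `ObservabilityPort` is the trait name this plan uses in `L.5`, but its methods here are hypothetical; `LogRecord`, `LogQuery`, `TailSession`, and `MemoryPort` are illustrative names; and `String` errors stand in for `AtmError`. The in-memory double shows how deterministic tests can exercise the port without the shared crates.

```rust
use std::sync::{Arc, Mutex};

/// Closed health contract kept for the initial release.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObservabilityHealth {
    Healthy,
    Degraded,
    Unavailable,
}

/// Hypothetical ATM-owned record; no shared-crate types cross this boundary.
#[derive(Debug, Clone, PartialEq)]
pub struct LogRecord {
    pub level: String,
    pub message: String,
}

/// Hypothetical ATM-owned query request.
#[derive(Debug, Clone, Default)]
pub struct LogQuery {
    pub level: Option<String>,
    pub limit: Option<usize>,
}

/// Synchronous tail session: each `poll` returns records added since the last call.
pub trait TailSession {
    fn poll(&mut self) -> Result<Vec<LogRecord>, String>;
}

/// Emit/query/follow/health boundary; `String` errors stand in for `AtmError`.
pub trait ObservabilityPort {
    fn emit(&self, record: LogRecord) -> Result<(), String>;
    fn query(&self, query: &LogQuery) -> Result<Vec<LogRecord>, String>;
    fn follow(&self, query: &LogQuery) -> Result<Box<dyn TailSession>, String>;
    fn health(&self) -> ObservabilityHealth;
}

fn level_matches(level: Option<&str>, record: &LogRecord) -> bool {
    level.map_or(true, |l| record.level == l)
}

/// In-memory double for deterministic tests; a real adapter would wrap the shared crates.
#[derive(Default)]
pub struct MemoryPort {
    records: Arc<Mutex<Vec<LogRecord>>>,
}

impl ObservabilityPort for MemoryPort {
    fn emit(&self, record: LogRecord) -> Result<(), String> {
        self.records.lock().map_err(|e| e.to_string())?.push(record);
        Ok(())
    }
    fn query(&self, query: &LogQuery) -> Result<Vec<LogRecord>, String> {
        let records = self.records.lock().map_err(|e| e.to_string())?;
        let out: Vec<LogRecord> = records
            .iter()
            .filter(|r| level_matches(query.level.as_deref(), r))
            .take(query.limit.unwrap_or(usize::MAX))
            .cloned()
            .collect();
        Ok(out)
    }
    fn follow(&self, query: &LogQuery) -> Result<Box<dyn TailSession>, String> {
        // Start at the current end so the session only sees new records.
        let cursor = self.records.lock().map_err(|e| e.to_string())?.len();
        Ok(Box::new(MemoryTail { records: Arc::clone(&self.records), cursor, level: query.level.clone() }))
    }
    fn health(&self) -> ObservabilityHealth {
        ObservabilityHealth::Healthy
    }
}

struct MemoryTail {
    records: Arc<Mutex<Vec<LogRecord>>>,
    cursor: usize,
    level: Option<String>,
}

impl TailSession for MemoryTail {
    fn poll(&mut self) -> Result<Vec<LogRecord>, String> {
        let records = self.records.lock().map_err(|e| e.to_string())?;
        let fresh: Vec<LogRecord> = records[self.cursor..]
            .iter()
            .filter(|r| level_matches(self.level.as_deref(), r))
            .cloned()
            .collect();
        self.cursor = records.len();
        Ok(fresh)
    }
}
```

Because the trait is object-safe, the double can be injected as `Box<dyn ObservabilityPort + Send + Sync>`, matching the dynamic-dispatch choice recorded in `L.5`.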

Acceptance:
- ATM no longer depends on a local tracing-only observability adapter
- `atm-core` owns an explicit emit/query/follow/health boundary over shared
  observability crates
- local and CI builds use the same documented pre-publish shared-crate
  dependency strategy
- `atm log` and `atm doctor` are implemented on the shared logging/query/health
  stack
- observability command integration coverage exists for snapshot, tail, filter,
  and doctor readiness flows
- any generic shared-crate usability gaps discovered during implementation are
  filed upstream in `sc-observability`

### Phase L: 1.0 Alignment And Release Surface Cleanup [COMPLETE]

Status summary:
- Phase K delivered the full sc-observability integration against a pre-publish
  local `[patch.crates-io]` override
- Sprint `K-CRATES-IO-1` (2026-04-06) removed the override and switched ATM to
  the published `sc-observability = "1.0.0"` on crates.io, with CI passing on
  all platforms; that crates.io cutover is tracked historically under
  `K-CRATES-IO-1` rather than as an open Phase L sprint
- `sc-observability` 1.0.0 ships the changes for upstream issues #55
  (`ConsoleSink::stderr`), #57 (fault injection), and #21 (file sink path
  migration); all three are confirmed shipped in `sc-observability` PR #58
- `L.1` through `L.8` therefore proceed directly against the published
  crates.io release with no local override required
- completed sprint record:
  - `L.1` complete on `feature/pL-s1-stderr-routing` at
    `a84ef5767813a9f604f84d697874cee74e5689e4`
  - `L.2` complete on `feature/pL-s2-fault-injection` / PR #51 at
    `b051c07269a2290315ff3295d728a5ee5c23f153`
  - `L.3` complete on `feature/pL-s3-file-sink-migration` / PR #52 with the
    current branch tip carrying the final fix-r1 closure for the live
    validation and status-summary findings
  - `L.4` complete on `feature/pL-s4-public-api-cleanup` at
    `4304d825ff6dddc52ddc21e08f5d2bb3ead795dc`
  - `L.5` complete on `feature/pL-s5-construction-ergonomics` at
    `512dfa4d89ac71307ef7324f64dffb67d5189cc3`
  - `L.6` complete on `feature/pL-s6-release-closeout` / PR #56 at
    `341e28c1f7175f9890a5a1d5606b64e0ce816d52`
  - `L.7` complete on `feature/pL-s-atm-toml-config` / PR #58, merged to
    `integrate/phase-L` at `5cd266d`, with final branch tip
    `fe467af27f3f7e0ac5280fb80e72201af99f9d75` carrying the pre-merge
    completion record fix after QA-2 PASS
  - `L.8` complete on `feature/pL-s8-team-recovery` / PR #53, merged to
    `integrate/phase-L` at `18aaa9a`

Goal:
- finish the published `sc-observability` 1.0 follow-on work and close the
  remaining retained release-surface gaps required for initial ATM release

Execution model:
- this phase is implemented as a coordinated multi-sprint stream owned by
  `team-lead`
- `team-lead` should orchestrate the sprint sequence, worktree assignments, and
  review hand-offs using the `/codex-orchestration` skill
- the Phase K adapter boundary remains the governing implementation boundary;
  Phase L refines the ATM-side integration against the final 1.0 shared crate
  behavior and closes retained release-surface gaps rather than redefining
  crate ownership
- the detailed ATM-side 1.0 follow-on decisions are documented in:
  [`docs/atm-core/design/sc-obs-1.0-integration.md`](./atm-core/design/sc-obs-1.0-integration.md)
- all sprints use `sc-observability = "1.0.0"` from crates.io directly; no
  local `[patch.crates-io]` override is required or permitted

Planned sprints:

- `L.1` `ConsoleSink::stderr()` Integration
  - goal: adopt upstream issue `#55` so CLI-facing retained logs can target
    stderr when appropriate without polluting normal stdout command output
  - key tasks:
    - wire `ConsoleSink::stderr()` into `CliObservability`
    - add an explicit CLI routing switch such as `--stderr`, or a clearly
      documented TTY-aware auto-routing rule, while preserving the current
      stdout path as the default compatibility behavior unless the chosen
      routing rule says otherwise
    - keep the ATM-owned adapter boundary intact; no `sc-observability` types
      leak into `atm-core`
  - tests:
    - verify stderr mode writes retained console output to stderr
    - verify the normal stdout path remains unchanged when stderr routing is
      not selected
    - keep existing retained-log query/follow tests green
  - dependency note:
    - uses `sc-observability = "1.0.0"` from crates.io directly

- `L.2` Fault Injection For Live Health Validation
  - goal: adopt upstream issue `#57` and close the real-adapter validation gap
    identified in `docs/atm-core/design/live-observability-validation.md`
  - key tasks:
    - use the new shared public fault-injection surface to induce degraded and
      unavailable retained-sink states through the real adapter
    - extend the live validation report so healthy, degraded, and unavailable
      paths are all exercised against the shared crate rather than only through
      ATM-local deterministic doubles
    - keep deterministic ATM integration tests as the fast/stable regression
      layer; the new fault-injected live path supplements them
  - tests:
    - end-to-end `atm doctor` coverage verifies degraded and unavailable states
      through the real shared adapter path
    - live/manual validation is updated to record the induced degraded and
      unavailable runs explicitly
  - dependency note:
    - uses `sc-observability = "1.0.0"` from crates.io directly

- `L.3` File Sink Path Migration
  - goal: align ATM with upstream issue `#21` so ATM stops assuming the older
    retained-log file layout
  - key tasks:
    - update any ATM-side path assumptions to the new
      `<log_root>/logs/<service_name>.log.jsonl` layout
    - verify retained query/follow and doctor health behavior against the
      updated shared file-sink location
    - document any operator-facing path changes where they affect diagnostics
      or manual validation
    - replace the unbounded tail-reader helper in `crates/atm/tests/log.rs`
      with a wall-clock timeout so retained follow coverage cannot hang on
      Windows or other slow CI environments
    - close `PRR-002` by explicitly keeping the ATM observability health
      contract closed at `healthy`, `degraded`, and `unavailable` for the
      initial release
    - close the L.1 traceability gap `ATM-QA-002` by making the final
      `--stderr-logs` contract a canonical Phase L reference
  - tests:
    - retained-log integration tests pass against the new path layout
    - live validation confirms the active log path and query behavior against
      the migrated sink location
  - dependency note:
    - uses `sc-observability = "1.0.0"` from crates.io directly

- `L.4` Public API Cleanup
  - goal: remove raw serialization-format leakage from the `atm-core` public
    observability boundary while preserving centralized JSON handling inside
    `atm-core`
  - key tasks:
    - replace public `serde_json::Value` / `Map<String, Value>` usage in
      observability-facing `atm-core` types with the ATM-owned field model:
      - `LogFieldKey`
      - `AtmJsonNumber`
      - `LogFieldValue`
      - `LogFieldMap`
    - update `LogFieldMatch` to use `LogFieldKey` + `LogFieldValue`
    - update `AtmLogRecord.fields` to use `LogFieldMap`
    - keep JSON/JSONL parsing, validation, degradation, and repair centralized
      in `atm-core` rather than pushing that logic into CLI or sibling crates
    - keep all raw `serde_json` translation at the `atm-core` boundary edge;
      CLI and sibling crates must not need to manipulate raw retained-log JSON
      values directly
    - preserve the published CLI JSON output behavior after the public type
      cleanup
  - closes:
    - `INTEROP-001`
    - `BP-003`
  - tests:
    - unit coverage for `LogFieldKey`, `AtmJsonNumber`, `LogFieldValue`, and
      `LogFieldMap` serde/validation behavior
    - unit coverage for adapter mapping between ATM-owned field types and the
      shared query/result values
    - integration coverage proving CLI JSON output remains stable for
      `atm log snapshot --json`, `atm log filter --json`, and
      `atm log tail --json`
  - dependency note:
    - can proceed in parallel with `L.5` once the Phase K crates.io baseline
      from `K-CRATES-IO-1` is present

- `L.5` Construction And Boundary Ergonomics
  - goal: clean up the remaining release-surface ergonomics without forcing
    speculative refactors that are not yet justified
  - key tasks:
    - add a structured construction API:
      - `CliObservability::new(home_dir, CliObservabilityOptions)`
    - keep `init(...)` only as a delegating CLI bootstrap helper
    - define `CliObservabilityOptions` as the single supported construction
      contract for production bootstrap and tests
    - keep dynamic dispatch (`Box<dyn ObservabilityPort + Send + Sync>`) unless
      implementation proves a concrete release defect
    - keep the current sealed-trait pattern unless implementation proves a
      concrete encapsulation defect
    - record the explicit disposition for `DoctorCommand` injectability:
      - deferred for initial release unless a concrete testing or feature need
        appears during implementation
  - closes:
    - `UX-001`
    - `BP-004`
    - disposition of `UX-002`
    - disposition of `BP-001`
    - disposition of `UNI-003`
  - tests:
    - constructor coverage for default bootstrap and stderr-routing bootstrap
    - no-regression coverage for existing `atm doctor` / `atm log` bootstrap
      behavior after the construction refactor
  - dependency note:
    - may run in parallel with `L.4`, or immediately after it if the public
      API cleanup changes the preferred construction boundary

- `L.6` Release Closeout
  - goal: finish the remaining operator-facing and release-readiness validation
    against the published shared crate behavior
  - key tasks:
    - close the two remaining release-critical identity carry-forward findings:
      - `ATM-QA-001`
        - remove obsolete config identity fallback from runtime identity
          resolution
      - `ATM-QA-002`
        - add `atm doctor` drift reporting for obsolete `[atm].identity`
          configuration
    - verify file sink path alignment against upstream issue `#21`
    - rerun full ATM observability validation on the published
      `sc-observability = "1.0.0"` release
    - close any remaining documentation traceability gaps uncovered during the
      Phase L consistency review
  - result:
    - release-ready ATM observability signoff for initial release
  - dependency note:
    - depends on `L.1` through `L.5` being complete so release validation runs
      against the final observability surface
    - the two release-critical identity items above were pulled forward from
      earlier `L.7` planning because they block release signoff; the remaining
      broader `.atm.toml` semantics work stays in `L.7`

- `L.7` Team Baseline And Identity Source Cleanup
  - goal: align ATM config semantics with multi-agent team launches by moving
    shared team expectations into `.atm.toml` while removing repo-local
    identity fallback behavior and defining cross-team alias handling
  - key tasks:
    - add ATM-owned `team_members` support under the `[atm]` config section as
      the baseline roster that should always be present in `config.json`
    - retain ATM-owned `aliases` support under the `[atm]` config section for
      shorthand addressing of canonical members, especially cross-team
      communication with roles such as `team-lead`
    - add ATM-owned post-send-hook automation support under the `[atm]` config
      section
    - historical note:
      - the release-critical `[atm].identity` fallback removal and doctor drift
        warning were pulled forward and closed in `L.6`
      - the remaining `L.7` scope covers broader baseline-roster, alias, and
        post-send-hook semantics
    - keep `[atm].default_team` as the shared team default and continue to
      ignore `[rmux]` and future `[scmux]` sections from `atm-core`
    - update `atm doctor` to compare `[atm].team_members` against
      `config.json.members`
      - missing baseline members are findings
      - extra runtime members in `config.json` are allowed
    - update `atm doctor` roster output to show all `config.json` members with
      baseline members first, `team-lead` first among the baseline set, and
      extra runtime members afterward
    - define alias resolution and projection rules:
      - aliases are accepted as input shorthand only
      - recipient aliases resolve immediately to canonical member names before
        validation, self-send checks, and mailbox lookup
      - same-team messages keep current canonical `from` behavior
      - cross-team messages may project the sender alias in `from` for
        Claude-facing ergonomics
      - whenever alias-oriented `from` projection is used, canonical sender
        identity must also be persisted in `metadata.atm.fromIdentity` and
        must drive validation, self-send checks, routing, and audit behavior
    - define post-send-hook rules:
      - the hook runs only after a successful non-`dry-run` send
      - `[[atm.post_send_hooks]]` is the supported hook shape
      - each rule binds one recipient selector and one command argv
      - `recipient = "*"` matches all recipients
      - matching rules execute in config order
      - legacy flat hook keys and `post_send_hook_members` are rejected with
        migration guidance to the new rule shape
      - path-like `command[0]` values resolve from the directory that owns the
        discovered `.atm.toml`
      - bare executable names use normal `PATH` lookup
      - the hook must execute with that same config-root directory as its
        working directory
      - the hook inherits the process environment and also receives one
        ATM-owned JSON payload in `ATM_POST_SEND`
      - the `ATM_POST_SEND` payload must contain:
        - `from`
        - `to`
        - `sender`
        - `recipient`
        - `team`
        - `message_id`
        - `requires_ack`
        - optional `task_id`
        - optional `recipient_pane_id` when authoritative roster truth knows it
      - the hook may optionally return one structured stdout object with
        `level`, `message`, and optional `fields`; ATM logs it on a best-effort
        basis and ignores absent/invalid output
      - hook decision logging must make recipient-rule evaluation easy to
        troubleshoot
      - expected recipient non-match is silent
      - hook failure or timeout must never roll back the send; ATM reports the
        failure as post-send-hook diagnostics only
  - `FIX-82` post-send hook redesign
    - scope: replace the old multi-axis hook filter design with
      recipient-scoped `[[atm.post_send_hooks]]` rules, hard reject retired
      hook keys, simplify hook diagnostics, and keep only execution-failure
      warnings
    - acceptance:
      - retired flat hook keys produce hard config errors with migration
        guidance
      - matching recipient rules execute in config order
      - expected recipient non-match is silent
      - `ATM_POST_SEND` includes sender/recipient/team context without
        `hook_match` booleans
      - once roster truth migrates to SQLite, `ATM_POST_SEND` also carries the
        authoritative `recipient_pane_id` when known so hooks can consume it
        directly
      - actionable warnings exist for configured-but-skipped hooks
      - docs, help text, and tests cover the migration and new semantics
    - reserve `atm-identity-missing@<team>` for ATM-generated
      repair/diagnostic notices only; it must not become a normal sender
      identity fallback
  - closes:
    - config identity/source ambiguity for multi-agent shared repos
    - baseline-roster visibility gap in `atm doctor`
    - cross-team alias ambiguity for baseline roles such as `team-lead`
    - missing sender-scoped post-send automation contract for repo-root helper
      scripts
    - duplicate permanent-member spawn planning gap for future team-lead /
      hook-driven orchestration
  - dependency note:
    - independent of `L.1` through `L.3`; it may proceed in parallel once the
      Phase L config and identity rulings are locked

- `L.8` Retained Team Recovery Surface
  - goal: restore the minimum `teams` and `members` command surface required
    for initial release, backup/restore operations, and team-repair workflows
  - key tasks:
    - implement bare `atm teams` to list locally discovered teams under
      `ATM_HOME`
    - implement `atm members` as a local team-roster view suitable for restore
      verification and operator checks without requiring daemon or hook state
    - implement `atm teams add-member` as the retained local roster repair path
      for missing members after restore or config drift
    - implement `atm teams backup` as a timestamped local snapshot of
      `config.json`, team inboxes, and the ATM team task bucket
    - implement `atm teams restore` with a dry-run path and explicit restore
      safety rules:
      - preserve the current team-lead entry and `leadSessionId`
      - restore only missing non-lead members
      - clear runtime-only fields such as session/activity/pane state on
        restored members
      - restore non-lead inbox files from the chosen snapshot
      - recompute `.highwatermark` from the maximum restored task id
      - fail cleanly on missing or malformed backup material without partial
        restore
    - keep broader historical team lifecycle/orchestration commands out of
      scope:
      - `spawn`
      - `join`
      - `resume`
      - `update-member`
      - `remove-member`
      - `cleanup`
  - tests:
    - `teams` lists discovered teams deterministically
    - `members` lists the current local roster deterministically
    - `add-member` rejects duplicates and creates any required local inbox
      state atomically
    - `backup` produces a complete snapshot of team config, inboxes, and ATM
      task files
    - `restore --dry-run` reports members/inboxes/tasks that would be restored
    - `restore` preserves team-lead / `leadSessionId`, clears runtime-only
      restored-member state, and recomputes `.highwatermark` to the maximum
      restored task id
  - dependency note:
    - depends on the Phase L config semantics from `L.7`, but does not depend
      on the observability-specific `L.1` through `L.6` work
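To illustrate the `L.8` restore safety rules, the Rust sketch below merges a backup roster into the live roster and derives the restored `.highwatermark`. It is a minimal sketch under stated assumptions, not ATM's implementation. The `Member` struct, its runtime-only fields, and the helper names `merge_roster` and `max_restored_task_id` are hypothetical. Treating an empty task bucket as watermark `0` is also an assumption, not a documented rule.

```rust
use std::collections::BTreeSet;

/// Hypothetical roster entry; field names are illustrative, not ATM's schema.
#[derive(Clone, Debug, PartialEq)]
pub struct Member {
    pub name: String,
    pub session_id: Option<String>,
    pub pane_id: Option<String>,
    pub last_active: Option<u64>,
}

impl Member {
    /// Drop runtime-only state so a restored member starts inactive.
    fn without_runtime_state(mut self) -> Self {
        self.session_id = None;
        self.pane_id = None;
        self.last_active = None;
        self
    }
}

/// Merge a backup roster into the live roster:
/// - the live team-lead entry (and everything else already live) is untouched,
/// - only missing non-lead members are added, each at most once,
/// - restored members have runtime-only fields cleared.
pub fn merge_roster(live: &[Member], backup: &[Member], lead: &str) -> Vec<Member> {
    let mut present: BTreeSet<String> = live.iter().map(|m| m.name.clone()).collect();
    let mut merged = live.to_vec();
    for member in backup {
        // `insert` returns false for names already live or already restored.
        if member.name != lead && present.insert(member.name.clone()) {
            merged.push(member.clone().without_runtime_state());
        }
    }
    merged
}

/// `.highwatermark` is the maximum restored task id.
/// Assumption for this sketch: no restored tasks yields 0.
pub fn max_restored_task_id(task_ids: &[u64]) -> u64 {
    task_ids.iter().copied().max().unwrap_or(0)
}
```

The sketch performs no I/O. Inbox restore and atomic persistence of `.highwatermark` are separate steps.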

Recovered Phase K carry-in mapping and later planning carry-ins:

- `ATM-QA-K-001` and `ATM-QA-K-002` are canonical Phase L.2 work items
- `RUST-QA-001`, `PRR-002`, and the L.1 QA traceability gap `ATM-QA-002` are
  canonical Phase L.3 work items
- `INTEROP-001` and duplicate `BP-003` are canonical Phase L.4 work items
- `UX-001` and duplicate `BP-004` are canonical Phase L.5 work items
- `UX-002`, `BP-001`, and `UNI-003` are Phase L.5 decision/disposition items;
  each must either land as implementation work or be explicitly deferred by a
  documented Phase L architectural ruling
- config identity/source cleanup and baseline team roster enforcement are
  canonical Phase L.7 work items identified by the phase-close planning review
  on 2026-04-07 rather than by numbered Phase K implementation findings
- the retained `teams` / `members` release-gap closure is canonical Phase L.8
  work identified during the same release-planning review and backup/restore
  procedure audit
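For the `L.7` baseline-roster work mapped above, the `atm doctor` comparison and roster ordering can be sketched in Rust as two pure functions. The function names are hypothetical. The sketch also assumes that baseline members keep their `[atm].team_members` order and extra runtime members keep their `config.json` order. The plan itself only fixes `team-lead` first, then the rest of the baseline set, then extras.

```rust
/// Baseline members (`[atm].team_members`) missing from `config.json.members`.
/// Each returned name is a doctor finding; extra runtime members are allowed.
pub fn missing_baseline(baseline: &[&str], members: &[&str]) -> Vec<String> {
    baseline
        .iter()
        .filter(|name| !members.contains(*name))
        .map(|name| name.to_string())
        .collect()
}

/// Display order for all `config.json` members: `team-lead` first (when it
/// is a present baseline member), then the other present baseline members,
/// then extra runtime members.
pub fn roster_display_order(baseline: &[&str], members: &[&str]) -> Vec<String> {
    let mut ordered = Vec::new();
    if baseline.contains(&"team-lead") && members.contains(&"team-lead") {
        ordered.push("team-lead".to_string());
    }
    for name in baseline {
        if *name != "team-lead" && members.contains(name) {
            ordered.push(name.to_string());
        }
    }
    for name in members {
        if !baseline.contains(name) {
            ordered.push(name.to_string());
        }
    }
    ordered
}
```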

Acceptance:
- Phase L cannot close until:
  - `L.2` through `L.8` are complete
  - every mapped carry-in item above is either implemented or explicitly
    deferred by a documented Phase L architectural decision
  - retained observability behavior is validated against the published
    crates.io dependency `sc-observability = "1.0.0"`
  - the retained release-critical team recovery surface (`teams`, `members`,
    `teams add-member`, `teams backup`, `teams restore`) is implemented and
    validated
- the phase must preserve ATM’s initial-release focus on agent messaging and
  must not absorb future hook/`schooks` orchestration concerns prematurely
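The `L.7` post-send-hook rules cover the recipient selector, the `"*"` wildcard, config-order execution, the config-root working directory, and the `ATM_POST_SEND` payload. These rules can be sketched with `std::process::Command`. This is a hypothetical outline, not ATM's implementation. The `HookRule` fields and the helper names are assumptions. The sketch also assumes that a `command[0]` with more than one path component counts as "path-like". The JSON payload is passed in as a prebuilt string rather than serialized here.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// One `[[atm.post_send_hooks]]` rule (field names assumed for this sketch).
#[derive(Debug, Clone)]
pub struct HookRule {
    pub recipient: String,
    pub command: Vec<String>,
}

/// Rules whose selector matches `recipient`, in config order.
/// `"*"` matches every recipient; a non-match is skipped silently.
pub fn matching_hooks<'a>(rules: &'a [HookRule], recipient: &str) -> Vec<&'a HookRule> {
    rules
        .iter()
        .filter(|r| r.recipient == "*" || r.recipient == recipient)
        .collect()
}

/// Path-like `command[0]` values (assumed: absolute, or more than one path
/// component) resolve against the directory that owns `.atm.toml`; bare
/// executable names are left as-is for normal `PATH` lookup.
pub fn resolve_program(config_root: &Path, program: &str) -> PathBuf {
    let p = Path::new(program);
    if p.is_absolute() || p.components().count() > 1 {
        config_root.join(p)
    } else {
        PathBuf::from(program)
    }
}

/// Build (but do not spawn) the hook process. `Command` inherits the parent
/// environment by default; the working directory is the config root and the
/// JSON payload travels in `ATM_POST_SEND`. Returns `None` for an empty argv.
pub fn build_hook_command(rule: &HookRule, config_root: &Path, payload_json: &str) -> Option<Command> {
    let (program, args) = rule.command.split_first()?;
    let mut cmd = Command::new(resolve_program(config_root, program));
    cmd.args(args)
        .current_dir(config_root)
        .env("ATM_POST_SEND", payload_json);
    Some(cmd)
}
```

Spawning, timeouts, and failure reporting are left out. Per the rules above, a hook failure only produces post-send-hook diagnostics and never rolls back the send.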

## 5. Hard Rules

- Removing the daemon does not authorize removing retained mail functionality.
- File-level migration decisions must be explicit.
- Every retained useful source file must appear in
  `docs/archive/file-migration-plan.md`.
- Every reviewed non-retained file must also appear there with a `do not copy` decision.
- Workflow-axis transitions must be enforced by code structure, not only by tests.
- Display bucket behavior must remain separate from the canonical two-axis workflow model.
- Task-linked mail must be ack-required from creation time.
- Generic logging query/follow/filter behavior should live in `sc-observability` where possible, not in ATM-specific code.
- Persisted config/schema compatibility issues must recover at the narrowest
  safe scope, and identity/routing fields must never be guessed.
- Missing team config remains distinct from malformed team config; only the
  documented send fallback may bypass it, and repeated repair notifications
  must be deduplicated by unresolved condition.

Cross-document invariants that must stay locked during implementation:
- `taskId` implies ack-required send behavior
- displayed messages always persist `read = true`
- pending-ack messages remain actionable until acknowledged
- `atm clear` never removes unread messages
- `atm clear` never removes pending-ack messages
- `atm read --timeout` returns immediately when the requested selection is already non-empty
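The ack and clear invariants above reduce to a few predicates. In this Rust sketch, the `Message` fields and helper names are hypothetical stand-ins for ATM's real message schema. The sketch only illustrates two invariants: a `taskId` forces ack-required at creation, and `atm clear` skips both unread and pending-ack messages.

```rust
/// Hypothetical message flags; ATM's persisted schema is not shown in this plan.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub task_id: Option<String>,
    pub requires_ack: bool,
    pub read: bool,
    pub acknowledged: bool,
}

/// A `taskId` forces ack-required from creation time.
pub fn new_message(task_id: Option<String>, requires_ack: bool) -> Message {
    Message {
        requires_ack: requires_ack || task_id.is_some(),
        task_id,
        read: false,
        acknowledged: false,
    }
}

/// Displaying a message always persists `read = true`.
pub fn mark_displayed(msg: &mut Message) {
    msg.read = true;
}

/// Pending-ack messages stay actionable until acknowledged.
pub fn is_pending_ack(msg: &Message) -> bool {
    msg.requires_ack && !msg.acknowledged
}

/// `atm clear` may remove a message only if it is read and not pending ack.
pub fn clear_eligible(msg: &Message) -> bool {
    msg.read && !is_pending_ack(msg)
}
```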

## 6. Done Definition

The rewrite is ready when:
- `atm send` works through the documented production runtime path
- `atm read` works through the documented production runtime path
- `atm ack` works through the documented production runtime path
- `atm clear` works through the documented production runtime path
- `atm log` works through shared observability APIs
- `atm doctor` works as a local diagnostics command with daemon/runtime
  visibility in the Phase Q target architecture
- `atm teams` provides the retained local team recovery surface
- `atm members` provides retained local roster verification
- the daemon auto-start path (starting the daemon when none is running) is
  exercised in bounded integration testing
- `ATM_POST_SEND.recipient_pane_id` is sourced from SQLite roster truth when
  known
- retained command behavior is preserved, and any Phase Q runtime-shape changes
  are intentionally documented
- task-linked mail remains pending until acknowledged
- the file-by-file migration plan is complete enough to implement directly
- the retained command tests pass against the new crate layout

## 7. Documentation Review Checks

Before implementation starts, the docs should be reviewed with these checks:
- every retained or rejected source file referenced by the retained command
  surface appears in `docs/archive/file-migration-plan.md`
- `requirements.md`, `architecture.md`, and `read-behavior.md` agree on the two-axis model, three display buckets, and legal transitions
- `requirements.md`, `architecture.md`, and `read-behavior.md` agree on `--since`, `--since-last-seen`, `--no-since-last-seen`, `--no-update-seen`, and `--timeout`
- `requirements.md`, `architecture.md`, `docs/atm/requirements.md`, and
  `docs/atm/architecture.md` agree on the retained release surface:
  `send`, `read`, `ack`, `clear`, `log`, `doctor`, `teams`, `members`
- `docs/archive/file-migration-plan.md` remains the source of truth for the
  initial core migration set (`send`, `read`, `ack`, `clear`, `log`,
  `doctor`), and the release-only `teams` / `members` expansion is explicitly
  tracked in Phase `L.8`


### Phase M: Mailbox Locking And Code Review Fixes

Status: COMPLETE

Sprint completion records:
- `M.1` complete on `feature/pM-s1-mailbox-locking` / PR #60, merged to
  `integrate/phase-M` at `760e904`
- `M.2` complete on `feature/pM-s2-review-fixes` / PR #61, merged to
  `integrate/phase-M` at `c9fb9fa`

Goal: close all blocking and important code-review findings from the Phase L review before
declaring the codebase 1.0-ready. ARCH-CR-003 and ARCH-CR-004 are closed in L.7 (not Phase M scope).

Phase M finding registry:
- `BP-ECR-001` Public error-surface documentation gap
  - finding: public `AtmResult` / `Result<_, AtmError>` functions in the
    affected modules do not consistently declare `# Errors` sections with
    concrete `AtmErrorCode` coverage
  - resolution criteria:
    - the explicit M.2 audit inventory is reviewed
    - every public `Result`-returning function in that inventory has a `# Errors`
      section
    - each section lists the applicable `AtmErrorCode` variants
- `BP-ECR-002` Operator recovery guidance gap
  - finding: operator-actionable failures still exist without
    `.with_recovery()` guidance
  - resolution criteria:
    - the explicit M.2 recovery audit inventory is grep-reviewed
    - bare operator-actionable construction sites are updated or explicitly
      excluded as non-operator-facing invariant failures
- `BP-ECR-003` Error-display causal-context gap
  - finding: `AtmError::Display` risks flooding normal CLI/log output with
    multi-kilobyte backtraces when full diagnostic detail is only needed on
    demand
  - resolution criteria:
    - `Display` remains concise and does not append the captured backtrace
    - full backtrace access remains available via Debug output and a dedicated
      accessor
    - tests cover both backtrace-present and backtrace-absent branches
- `BP-ECR-004` Deprecated identity migration-doc gap
  - finding: obsolete `[atm].identity` behavior and migration guidance are not
    documented consistently enough for operator repair
  - resolution criteria:
    - config docs contain a `# Deprecated` section for `[atm].identity`
    - docs state it is ignored for runtime identity resolution
    - docs reference `ATM_WARNING_IDENTITY_DRIFT` and the `ATM_IDENTITY`
      migration path
- `BP-ECR-005` Panic-on-untrusted-input gap
  - finding: `normalize_json_number(...)` still panics on malformed exponent
    input instead of degrading safely
  - resolution criteria:
    - the `.expect(...)` is replaced with graceful fallback returning the raw
      string
    - warning-level logging documents the degradation path
    - malformed-input regression tests pass without panic
- `BP-ECR-006` Shared identity-error contract gap
  - finding: `resolve_actor_identity` remains triplicated, which risks drift in
    identity-resolution errors and recovery guidance
  - resolution criteria:
    - `resolve_actor_identity` exists in one shared `identity/mod.rs` location
    - `ack`, `clear`, and `read` call the shared helper
    - behavior remains unchanged except for the shared implementation boundary
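One way to satisfy `BP-ECR-003` is to keep `Display` to a single line and expose the captured `std::backtrace::Backtrace` only through `Debug` and an accessor. The struct layout, the two-variant `AtmErrorCode` subset, and the `from_parts` and `with_recovery` signatures below are assumptions for illustration. Only the type and method names echo this plan.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;

/// Illustrative subset of error codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AtmErrorCode {
    MailboxLockTimeout,
    MailboxLockFailed,
}

pub struct AtmError {
    code: AtmErrorCode,
    message: String,
    recovery: Option<String>,
    backtrace: Backtrace,
}

impl AtmError {
    /// Normal constructor: capture honours `RUST_BACKTRACE` / `RUST_LIB_BACKTRACE`.
    pub fn new(code: AtmErrorCode, message: impl Into<String>) -> Self {
        Self::from_parts(code, message, Backtrace::capture())
    }

    /// Explicit-backtrace constructor, handy for deterministic tests.
    pub fn from_parts(code: AtmErrorCode, message: impl Into<String>, backtrace: Backtrace) -> Self {
        Self { code, message: message.into(), recovery: None, backtrace }
    }

    pub fn with_recovery(mut self, hint: impl Into<String>) -> Self {
        self.recovery = Some(hint.into());
        self
    }

    pub fn code(&self) -> AtmErrorCode {
        self.code
    }

    /// Dedicated accessor: `Some` only when a backtrace was actually captured.
    pub fn backtrace(&self) -> Option<&Backtrace> {
        match self.backtrace.status() {
            BacktraceStatus::Captured => Some(&self.backtrace),
            _ => None,
        }
    }
}

impl fmt::Display for AtmError {
    // Concise by design: code, message, optional recovery hint, never the backtrace.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.code, self.message)?;
        if let Some(hint) = &self.recovery {
            write!(f, " (recovery: {hint})")?;
        }
        Ok(())
    }
}

impl fmt::Debug for AtmError {
    // Full diagnostic detail, including the backtrace when one was captured.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("AtmError")
            .field("code", &self.code)
            .field("message", &self.message)
            .field("recovery", &self.recovery)
            .field("backtrace", &self.backtrace)
            .finish()
    }
}

impl std::error::Error for AtmError {}
```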

Integration branch: `integrate/phase-M` (branched from `integrate/phase-L`)

Execution model: codex-orchestration. `arch-ctm` is the sole developer and runs the sprints
sequentially, while `quality-mgr` runs QA in parallel. See the `/codex-orchestration` skill.

---

#### M.1 — Mailbox Locking

Branch: `feature/pM-s1-mailbox-locking` (from `integrate/phase-M`)

Deliverables:
- Add `fs2` dependency to `crates/atm-core/Cargo.toml`
- Implement `lock.rs` with `MailboxLockGuard` and `acquire()` using `fs2::FileExt::try_lock_exclusive()`
  with a bounded retry loop (50ms intervals, 5s default timeout)
- Add `MailboxLockTimeout` error code to `error_codes.rs`
- Add `MailboxLock` error kind to `error.rs` with recovery guidance
- Implement `locked_read_modify_write()` in `mailbox/mod.rs` for single-file append paths
- Refactor `append_message` to use `locked_read_modify_write`
- Add deterministic multi-lock acquisition for `read`, `ack`, and `clear` so those commands
  lock every discovered source inbox before their first `read_messages(...)` call and hold the
  locks through final writeback
- Make the multi-lock contract explicit in code:
  - finish source-file discovery before the first inbox read
  - exclude files missing at discovery time from the lock set
  - dedupe duplicate paths before acquisition
  - sort the set by canonical path string before acquisition
  - apply one total timeout budget to the full set
  - if any acquisition fails, release all earlier locks and abort before any
    source-file read or mutation
  - if a discovered file disappears before `load_source_files(...)` completes,
    abort the command with an operator-actionable file-read error and persist
    no partial state
- Ensure the missing-config team-lead notice path benefits from the same `append_message` lock
- Audit the shared mutable JSON/JSONL/state files touched by M.1 and route each through an
  atomic temp-file + fsync + rename style helper rather than an in-place rewrite path
- Centralize any new atomic-replacement logic behind one `atm-core` helper boundary rather than
  duplicating temp-file + rename code at individual call sites
- Lock sentinel: `{inbox_path}.lock` (zero-byte, created lazily)
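The multi-lock contract above can be sketched in Rust independently of the locking primitive. The contract covers four rules: dedupe, sort by path string, one shared timeout budget, and release of everything on the first failure. The sketch injects the lock attempt as a closure so it stays std-only. In ATM, that closure would presumably wrap `fs2::FileExt::try_lock_exclusive` on the `{inbox_path}.lock` sentinel. `TryLock`, `PlanError`, `lock_plan`, and `acquire_all` are hypothetical names, and the sketch assumes its inputs are already canonical paths of files that existed at discovery.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of one non-blocking lock attempt.
#[derive(Debug)]
pub enum TryLock {
    Busy,
    Failed(String),
}

#[derive(Debug)]
pub enum PlanError {
    Timeout(PathBuf),
    Failed(PathBuf, String),
}

/// Dedupe and sort discovered source inboxes by path string.
pub fn lock_plan(discovered: &[PathBuf]) -> Vec<PathBuf> {
    let mut keyed = BTreeMap::new();
    for p in discovered {
        keyed.entry(p.to_string_lossy().into_owned()).or_insert_with(|| p.clone());
    }
    keyed.into_values().collect()
}

/// Acquire every lock in plan order under ONE shared deadline. Any early
/// return drops `held`, releasing all earlier guards before any inbox read.
pub fn acquire_all<G, F>(plan: &[PathBuf], budget: Duration, interval: Duration, mut try_lock: F) -> Result<Vec<G>, PlanError>
where
    F: FnMut(&Path) -> Result<G, TryLock>,
{
    let deadline = Instant::now() + budget;
    let mut held = Vec::with_capacity(plan.len());
    for path in plan {
        loop {
            match try_lock(path.as_path()) {
                Ok(guard) => {
                    held.push(guard);
                    break;
                }
                Err(TryLock::Busy) => {
                    let remaining = deadline.saturating_duration_since(Instant::now());
                    if remaining.is_zero() {
                        return Err(PlanError::Timeout(path.clone()));
                    }
                    thread::sleep(interval.min(remaining));
                }
                Err(TryLock::Failed(why)) => return Err(PlanError::Failed(path.clone(), why)),
            }
        }
    }
    Ok(held)
}
```

Because guards release on `Drop`, an early return from `acquire_all` releases every earlier lock. Acquiring in one sorted order across all commands is what prevents overlapping `read`/`ack`/`clear` runs from deadlocking.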

Files to modify:
- `crates/atm-core/Cargo.toml` (add fs2)
- `crates/atm-core/src/mailbox/lock.rs` (implement from placeholder stub)
- `crates/atm-core/src/mailbox/mod.rs` (add `locked_read_modify_write`, refactor `append_message`)
- `crates/atm-core/src/error.rs` (add `MailboxLock` kind)
- `crates/atm-core/src/error_codes.rs` (add `MailboxLockTimeout`)
- `crates/atm-core/src/read/mod.rs` (acquire sorted source-file locks before `load_source_files`, hold through writeback)
- `crates/atm-core/src/ack/mod.rs` (acquire sorted source-file locks before `load_source_files`, hold through transition + reply persist)
- `crates/atm-core/src/clear/mod.rs` (acquire sorted source-file locks before `load_source_files`, hold through set replacement)

Tests required:
- Unit: `lock.rs` acquire/release, timeout, stale sentinel tolerance
- Unit: `locked_read_modify_write` basic operation
- Integration: concurrent append from two threads does not lose messages
- Integration: concurrent `send` and `ack`/`clear` against the same inbox or
  overlapping origin set preserve correctness and do not silently lose updates
- Integration: multi-source `read`/`ack`/`clear` acquire locks in deterministic path order
- Integration: lock timeout produces `MailboxLockTimeout` error code
- Integration: if lock N of M fails, every earlier lock is released and the
  command aborts before the first source inbox read
- Integration: one total timeout budget applies across the full multi-lock set
  instead of resetting per file
- Integration: duplicate discovered paths collapse to one lock acquisition
- Integration: a discovered source inbox disappearing before load causes a
  normal actionable failure and no persisted partial state
- Integration: concurrent `read`/`ack`/`clear` against overlapping origin
  inbox sets do not deadlock because both commands acquire in the same sorted order
- All existing tests must pass (single-process path unaffected)

Acceptance criteria:
- `lock.rs` is no longer a placeholder stub
- all mailbox read-modify-write paths hold an exclusive lock
- `read`, `ack`, and `clear` use one deterministic full-source lock plan for
  every mutating reread and writeback
- no shared mutable structured file touched by M.1 is rewritten in place
- concurrent `atm send` to the same inbox from two processes does not lose messages
- CI passes on macOS, Linux, Windows

---

#### M.2 — Code Review Fixes

Branch: `feature/pM-s2-review-fixes` (from `integrate/phase-M` after M.1 merges)

Dependency: M.1 must be merged to `integrate/phase-M` first.

Deliverables (itemized by finding):

1. **Restore atomicity** (ARCH-CR-002):
   - Reorder `restore_team` in `team_admin.rs` so config is written last, after staging
   - Write a `.restore-in-progress` marker before any mutation and remove it after the config write
   - Stage inboxes in `.restore-staging/inboxes/` before moving them into the live location
   - Apply the same atomic-persistence rule to restored task-bucket files,
     `.highwatermark`, and shared restore coordination state touched by this flow
   - `recompute_highwatermark` must either be converted to an atomic helper-backed
     write path or be covered by an explicit crash-safety test proving the
     remaining implementation is safe enough for 1.0
   - Add an `atm doctor` check for stale `.restore-in-progress` markers
   - Files: `team_admin.rs`, `doctor/mod.rs`

2. **AtmError backtrace access**:
   - Keep `Display` concise and omit multi-KB backtrace rendering
   - Expose captured backtraces through Debug output and a dedicated accessor
   - File: `error.rs`

3. **`# Errors` doc audit**:
   - audit the public `Result<_, AtmError>` API surface in this explicit inventory:
     `mailbox/mod.rs`, `mailbox/lock.rs`, `read/mod.rs`, `ack/mod.rs`,
     `clear/mod.rs`, `team_admin.rs`, `doctor/mod.rs`, `error.rs`,
     `config/mod.rs`, `home.rs`, `send/mod.rs`, `send/input.rs`,
     `send/file_policy.rs`, `identity/mod.rs` if consolidation lands there,
     and any new public helper introduced by M.1/M.2
   - add `# Errors` sections where missing and list the applicable `AtmErrorCode` variants
   - avoid relying on stale hard-coded function counts; use the current public API surface

4. **`.with_recovery()` audit**:
   - perform a grep-driven audit of remaining operator-actionable bare error construction sites
     in this explicit inventory: `mailbox/mod.rs`, `mailbox/lock.rs`, `read/mod.rs`,
     `ack/mod.rs`, `clear/mod.rs`, `team_admin.rs`, `doctor/mod.rs`, `config/mod.rs`,
     `home.rs`, `address.rs`, `send/mod.rs`, `send/input.rs`, `send/file_policy.rs`,
     `identity/mod.rs` if it gains operator-facing errors, and any new M.1/M.2 code
   - do not re-edit sites that already received recovery guidance in L.7/L.8 unless the new
     Phase M design changes their operator action

5. **Shared mutable file persistence audit**:
   - grep this explicit inventory for direct writes to live shared mutable
     JSON/JSONL/state files (`fs::write`, `File::create`, equivalent):
     `mailbox/mod.rs`, `mailbox/lock.rs`, `read/mod.rs`, `ack/mod.rs`,
     `clear/mod.rs`, `team_admin.rs`, `doctor/mod.rs`, `config/mod.rs`,
     `home.rs`, `send/mod.rs`, `send/input.rs`, `send/file_policy.rs`,
     `identity/mod.rs` if it gains persistence responsibilities, and any new
     helper introduced by M.1/M.2
   - route each in-scope path through an atomic helper or document why the path
     is scratch/staging-only and therefore exempt
   - files in scope include inboxes, team config, restored task-bucket state,
     `.highwatermark`, and shared coordination files such as restore-progress
     or send-alert state

6. **Legacy config key docs**:
   - Add `# Deprecated` section to `config/mod.rs` or `config/types.rs` for `[atm].identity`
   - Reference `ATM_WARNING_IDENTITY_DRIFT`; document migration: use `ATM_IDENTITY` env var

7. **`normalize_json_number` panic removal**:
   - Replace the current exponent-parse `.expect()` in `observability.rs` with a graceful fallback that logs via `tracing::warn!`
   - Add a `# Panics` doc note recording that the former panic precondition has been removed

8. **`resolve_actor_identity` consolidation**:
   - Move to `identity/mod.rs` as a `pub(crate)` function
   - Update call sites in `ack/mod.rs`, `clear/mod.rs`, `read/mod.rs`
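For item 7, the pattern is to replace the `.expect()` with a fallback that returns the raw token. The function below is a hypothetical stand-in, not ATM's `normalize_json_number`. It assumes that normalization means rewriting the exponent into a canonical lowercase form (`1.5E+03` to `1.5e3`). It uses `eprintln!` where ATM would use `tracing::warn!`.

```rust
/// Hypothetical sketch: canonicalize the exponent of a JSON number token.
/// Malformed exponents fall back to the raw token instead of panicking.
pub fn normalize_exponent(raw: &str) -> String {
    let idx = match raw.find(|c: char| c == 'e' || c == 'E') {
        Some(idx) => idx,
        None => return raw.to_string(), // no exponent: nothing to normalize
    };
    // 'e'/'E' is ASCII, so `idx + 1` is always a valid char boundary.
    let (mantissa, exponent) = (&raw[..idx], &raw[idx + 1..]);
    match exponent.parse::<i32>() {
        Ok(exp) => format!("{mantissa}e{exp}"),
        Err(err) => {
            // Stand-in for `tracing::warn!`: record the degradation, keep the raw value.
            eprintln!("warning: malformed exponent in JSON number {raw:?}: {err}; keeping raw value");
            raw.to_string()
        }
    }
}
```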

Tests required:
- Restore atomicity: interrupted restore leaves `.restore-in-progress` marker; re-run completes;
  doctor detects stale marker
- Restore atomicity: pre-existing `.restore-staging/` is either cleaned first or
  rejected with actionable recovery text; stale and fresh staging contents are never merged
- Restore atomicity: config-last ordering means config is unchanged when inbox/task/highwatermark
  staging fails before the final config write
- Restore atomicity: failure to remove the marker after a successful config
  write leaves a warning-only stale-marker finding rather than corrupting team state
- Restore atomicity: `recompute_highwatermark` is either converted to atomic
  replacement or covered by an explicit crash-safety regression test
- Backtrace: captured and absent backtrace branches are both tested; `Display`
  remains concise and the dedicated backtrace accessor remains available
- `normalize_json_number`: malformed exponent returns raw string (no panic)
- `resolve_actor_identity`: existing tests pass after consolidation (no behavior change)
- Documentation review pass confirms new `# Errors`, `# Deprecated`, and `# Panics` sections exist
  on the explicit M.2 audit inventory

Acceptance criteria:
- `restore_team` writes config.json last with staging and progress marker
- all shared mutable structured files touched by M.2 use atomic replacement helpers
- `recompute_highwatermark` no longer relies on an undocumented in-place write
  path without either conversion or explicit crash-safety coverage
- `AtmError::Display` stays concise and never renders the backtrace; the backtrace remains
  available through Debug output and a dedicated accessor
- all public `Result`-returning functions in the explicit M.2 audit inventory have `# Errors` doc sections
- `.with_recovery()` present at all operator-actionable sites in the explicit M.2 audit inventory
- `[atm].identity` documented as deprecated
- `normalize_json_number` does not panic on malformed input
- `resolve_actor_identity` exists in exactly one location
- no stale M.2 line-number references remain in the sprint spec
- CI passes on all platforms
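The config-last ordering for `restore_team` can be sketched with `std::fs` alone. For a pre-existing `.restore-staging/`, this sketch chooses the "clean first" option, so an interrupted restore can simply be re-run without ever merging stale and fresh staging. It also treats cleanup after the config write as warning-only. The function names, the `inboxes/` layout, and the single-file config write are simplifying assumptions, and task-bucket and `.highwatermark` staging are omitted.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

const MARKER: &str = ".restore-in-progress";
const STAGING: &str = ".restore-staging";

/// Temp file + fsync + rename (no parent-directory sync in this sketch).
fn replace_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = dest.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);
    fs::rename(&tmp, dest)
}

pub fn restore_config_last(team_dir: &Path, inboxes: &[(&str, &[u8])], config: &[u8]) -> io::Result<()> {
    let staging_root = team_dir.join(STAGING);
    // 0. Never merge stale and fresh staging: discard leftovers first.
    if staging_root.exists() {
        fs::remove_dir_all(&staging_root)?;
    }
    // 1. Progress marker before any mutation.
    fs::write(team_dir.join(MARKER), b"")?;
    // 2. Stage every inbox; a failure here leaves live inboxes and config untouched.
    let staged = staging_root.join("inboxes");
    fs::create_dir_all(&staged)?;
    for (name, bytes) in inboxes {
        fs::write(staged.join(name), bytes)?;
    }
    // 3. Move staged inboxes into the live location.
    let live = team_dir.join("inboxes");
    fs::create_dir_all(&live)?;
    for (name, _) in inboxes {
        fs::rename(staged.join(name), live.join(name))?;
    }
    // 4. Config last: this is the commit point.
    replace_atomically(&team_dir.join("config.json"), config)?;
    // 5. Post-commit cleanup is warning-only; a leftover marker becomes a
    //    stale-marker doctor finding rather than corrupted team state.
    if let Err(err) = fs::remove_dir_all(&staging_root).and_then(|()| fs::remove_file(team_dir.join(MARKER))) {
        eprintln!("warning: restore committed but cleanup failed: {err}");
    }
    Ok(())
}
```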

---

Phase M dependency graph:

```
  integrate/phase-M (from integrate/phase-L)
    |
    +-- M.1: mailbox locking
    |     |
    |     v (merge to integrate/phase-M)
    |
    +-- M.2: review fixes (branch from integrate/phase-M after M.1 merge)
          |
          v (merge to integrate/phase-M)

  integrate/phase-M --> develop (final phase integration PR)
```

Phase M closeout gate (satisfied on `integrate/phase-M`; final merge to
`develop` remains the release-integration step):
- M.1 and M.2 are both merged to `integrate/phase-M`
- ARCH-CR-001 and ARCH-CR-002 blocking findings are resolved
- all BP-ECR-001 through BP-ECR-006 findings are resolved
- CI passes on all platforms
- `integrate/phase-M` merges to `develop`

Post-close review note:
- a later critical review on `develop @ 1e6515a` identified additional locking
  hardening issues that were not fully constrained by the original M.1/M.2
  deliverables. Those are tracked below as a narrowly scoped follow-up sprint.

---

#### M.F1 — Locking Hardening Follow-up

Branch: `feature/pM-locking-followup` (from `develop`, base commit `1e6515a`)

Goal: close the post-merge production-readiness findings from
`ATM-CORE-M-CODE-REVIEW` without reopening unrelated Phase M refactors.

Finding registry:
- `M-LF-001` Source discovery fail-open gap
  - finding: `discover_origin_inboxes(...)` can skip unreadable inbox-directory
    entries and continue, allowing mutation commands to operate on an
    incomplete locked source set
  - resolution criteria:
    - mutation-path source discovery fails closed on entry-enumeration errors
    - commands abort before lock acquisition or mailbox read when the source
      set cannot be enumerated completely
    - no partial-source mutation path remains
- `M-LF-002` Lock-error classification gap
  - finding: `lock.rs::acquire()` can collapse permanent I/O/OS failures into
    `MailboxLockTimeout`
  - resolution criteria:
    - only true lock-busy conditions retry until timeout
    - non-contention failures return `MailboxLockFailed` immediately with
      operator recovery guidance
- `M-LF-003` Atomic durability gap
  - finding: rename-based mailbox replacement does not fsync the parent
    directory after rename
  - resolution criteria:
    - the shared atomic replacement helper durably publishes rename results to
      the parent directory wherever the platform supports directory sync
    - the helper-boundary doc comment names Linux/macOS as parent-directory-sync
      platforms and Windows as the platform that currently returns `Ok(())`
      without a parent-directory sync
    - the helper-boundary doc comment explicitly states the `Ok(())` behavior on
      platforms where ATM cannot issue a parent-directory sync
    - the platform caveat appears as a public doc comment at the shared helper
      boundary, not only in the sprint notes
    - the platform-conditional test strategy is explicit: `#[cfg(unix)]` covers
      the parent-directory fsync path, while `#[cfg(not(unix))]` confirms the
      helper returns `Ok(())` on the no-op parent-sync branch
- `M-LF-004` Failure-path test coverage gap
  - finding: mailbox-locking tests prove several success/no-deadlock paths, but
    they do not cover timeout/error/fail-closed paths strongly enough
  - resolution criteria:
    - bounded contention-timeout coverage exists for `send`
    - deterministic fail-closed source-discovery coverage exists for `clear`
    - deterministic non-contention lock-error coverage exists for `send`
- `M-LF-005` Locked-mutation duplication follow-up
  - finding: read/ack/clear still duplicate the lock -> rediscover -> load ->
    persist pattern
  - disposition:
    - advisory only for this sprint
    - refactor to a shared helper is allowed only if it directly simplifies
      `M-LF-001` through `M-LF-004`
    - a standalone cleanup refactor is out of scope for this follow-up
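For `M-LF-002`, the key step is to classify each failed non-blocking attempt as either contention or a permanent error before deciding whether to retry. The sketch below assumes that contention surfaces as `io::ErrorKind::WouldBlock` on Unix and as raw OS error 33 (`ERROR_LOCK_VIOLATION`) on Windows. Check that assumption against the `fs2` version in use. The `Attempt` and `LockError` types and the function names are hypothetical.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attempt {
    Busy,
    Failed,
}

#[derive(Debug)]
pub enum LockError {
    /// Would map to `MailboxLockTimeout`: true contention until the deadline.
    Timeout,
    /// Would map to `MailboxLockFailed`: permanent I/O/OS failure, no retry.
    Failed(io::Error),
}

/// Classify an error from a non-blocking exclusive-lock attempt.
pub fn classify(err: &io::Error) -> Attempt {
    let windows_contention = cfg!(windows) && err.raw_os_error() == Some(33);
    if err.kind() == io::ErrorKind::WouldBlock || windows_contention {
        Attempt::Busy
    } else {
        Attempt::Failed
    }
}

/// Retry only while the lock is busy; fail fast on anything else.
pub fn acquire_with_timeout<G, F>(mut try_once: F, timeout: Duration, interval: Duration) -> Result<G, LockError>
where
    F: FnMut() -> io::Result<G>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match try_once() {
            Ok(guard) => return Ok(guard),
            Err(err) => match classify(&err) {
                Attempt::Failed => return Err(LockError::Failed(err)),
                Attempt::Busy => {
                    let remaining = deadline.saturating_duration_since(Instant::now());
                    if remaining.is_zero() {
                        return Err(LockError::Timeout);
                    }
                    thread::sleep(interval.min(remaining));
                }
            },
        }
    }
}
```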

Deliverables:
- make mutation-path source discovery fail closed on directory-entry
  enumeration faults
- update lock acquisition so retry/timeout behavior is reserved for true
  contention and non-contention lock errors fail fast
- extend the shared atomic write path to fsync the parent directory after
  rename where supported
- add deterministic failure-path tests for:
  - contention timeout
  - fail-closed source discovery
  - non-contention lock-path failure classification
- document any platform caveat for parent-directory fsync directly at the
  helper boundary
- do not broaden the scope into unrelated API cleanup or large helper
  extraction unless needed to land the fixes above safely
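The following is a minimal std-only sketch of the `M-LF-003` helper. It writes a sibling temp file, calls `sync_all` on it, and renames it over the destination. It then syncs the parent directory on Unix, and elsewhere returns `Ok(())` without a parent sync. The name `atomic_replace` and the `.tmp` suffix are assumptions. The sketch does not handle concurrent writers, since the mailbox lock is expected to serialize those.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Atomically replace `dest` with `bytes`.
///
/// Platform note: on Unix (Linux, macOS) the parent directory is opened and
/// synced after the rename. On other platforms, including Windows, this
/// sketch skips the parent-directory sync and returns `Ok(())`.
pub fn atomic_replace(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let parent = match dest.parent() {
        Some(dir) if !dir.as_os_str().is_empty() => dir,
        _ => Path::new("."),
    };
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = parent.join(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // contents are durable before the rename publishes them
    drop(file); // close before renaming (required on Windows)

    fs::rename(&tmp, dest)?;
    sync_parent_dir(parent)
}

#[cfg(unix)]
fn sync_parent_dir(dir: &Path) -> io::Result<()> {
    // Persist the directory entry created by the rename.
    File::open(dir)?.sync_all()
}

#[cfg(not(unix))]
fn sync_parent_dir(_dir: &Path) -> io::Result<()> {
    Ok(()) // documented no-op: no parent-directory sync on this platform
}
```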

Files expected to change:
- `crates/atm-core/src/mailbox/source.rs`
- `crates/atm-core/src/mailbox/lock.rs`
- `crates/atm-core/src/mailbox/atomic.rs`
- `crates/atm-core/src/persistence.rs` if mailbox durability is unified there
- `crates/atm-core/src/read/mod.rs`, `ack/mod.rs`, `clear/mod.rs` only as
  needed to accommodate strict source-discovery behavior
- `crates/atm-core/tests/mailbox_locking.rs`
- docs: `requirements.md`, `architecture.md`, `project-plan.md`

Tests required:
- Integration: a synthetic directory-entry enumeration fault causes mutation
  commands to fail closed before mailbox mutation
- Integration: a held mailbox lock produces a bounded `MailboxLockTimeout`
  result without deadlock or indefinite hang
- Unit or focused integration: a deterministic non-contention lock failure path
  returns `MailboxLockFailed`, not `MailboxLockTimeout`
- Unit: atomic replacement helper verifies parent-directory fsync sequencing via
  a deterministic seam or focused helper test
- Unit: `#[cfg(not(unix))]` coverage confirms the shared helper returns `Ok(())`
  on platforms where parent-directory sync is unavailable
- All locking tests must use bounded coordination primitives (`recv_timeout`,
  `wait_timeout`, elapsed ceilings) and guaranteed teardown; no open-ended joins
  or sleep-based race assumptions

Acceptance criteria:
- no mailbox mutation command can proceed from a partially enumerated source set
- `MailboxLockTimeout` is emitted only for true contention paths
- rename-based mailbox persistence includes parent-directory durability handling
  at the shared helper boundary
- failure-path locking coverage is deterministic and CI-safe:
  - `send` returns a bounded `MailboxLockTimeout` under held-lock contention
  - `clear` fails closed on synthetic source-discovery fault without mailbox mutation
  - `send` reports synthetic non-contention lock-path failure as `MailboxLockFailed`
- `M-LF-005` remains explicitly advisory unless a helper extraction is needed to
  land the blocking/important fixes

---

### Phase N: Publish Replacement And Distribution Parity [COMPLETE]

Status: COMPLETE

Goal:
- ship the retained `1.0` release from this repo as the direct replacement for
  the historical `agent-team-mail` CLI/core release line
- preserve the historical release channels that actually existed for the old
  repo:
  - crates.io
  - GitHub Releases
  - Homebrew
- add `winget` as a required new `1.0` channel so Windows users can install
  without Rust tooling or manual archive extraction

Status summary:
- the old repo already contains the release source of truth for crates.io,
  GitHub Releases, and Homebrew automation
- the old repo does not contain `winget` release automation, so this repo must
  add it as new release infrastructure rather than porting it directly
- `team-lead` has confirmed the shared account-level publish infrastructure:
  - Homebrew tap remains `randlee/homebrew-tap`
  - `HOMEBREW_TAP_TOKEN` exists in account secrets but is not yet configured on
    `atm-core`
  - `winget` has a proven reference implementation in `randlee/claude-history`
    using `vedantmgoyal2009/winget-releaser@v2`
  - `winget` uses the default GitHub workflow token and does not require an
    additional repo secret
- this repo currently has only CI and no equivalent release-manifest,
  preflight, release, or publisher-agent infrastructure
- the source paths remain `crates/atm` and `crates/atm-core`, but the
  publishable package identities for this release line must be
  `agent-team-mail` and `agent-team-mail-core`

#### N.1 — Package Identity And Manifest Replacement

Goal:
- convert the retained publishable crates in this repo to the legacy package
  identities expected by downstream users

Deliverables:
- rename `crates/atm/Cargo.toml` package name from `atm` to
  `agent-team-mail`
- rename `crates/atm-core/Cargo.toml` package name from `atm-core` to
  `agent-team-mail-core`
- keep the CLI binary name `atm`
- set both publishable crates to the intended `1.0.0` release version
- replace the CLI path-only core dependency with an explicit versioned
  dependency on `agent-team-mail-core`
- add release-grade package metadata to both publishable manifests:
  - description
  - repository
  - homepage
  - readme
  - keywords
  - categories
- ensure the publishable crate surface excludes test-only fixture binaries and
  other non-release executables
- audit release dependency features so production releases do not ship
  test-oriented features unless explicitly intended

Acceptance criteria:
- `cargo package -p agent-team-mail-core --locked` succeeds
- `cargo package -p agent-team-mail --locked` succeeds
- `cargo publish --dry-run -p agent-team-mail-core --locked --no-verify`
  succeeds
- `cargo publish --dry-run -p agent-team-mail --locked --no-verify` succeeds
- only the retained release binary `atm` is part of the publishable CLI
  install surface

#### N.2 — Release Automation Port

Goal:
- port the old repo’s release automation into this repo, narrowed to the
  retained CLI/core release surface plus continued shared-family dependency
  verification and the new required `winget` channel

Deliverables:
- add `release/publish-artifacts.toml` as the new release artifact manifest
- add the missing `HOMEBREW_TAP_TOKEN` GitHub secret to `atm-core` as a
  one-time prerequisite before the Homebrew update job is expected to pass
- port and adapt:
  - `.github/workflows/release-preflight.yml`
  - `.github/workflows/release.yml`
  - `scripts/release_gate.sh`
  - `scripts/release_artifacts.py`
  - release inventory schema/supporting release docs needed by the workflows
- define the retained publishable artifact set in
  `release/publish-artifacts.toml`:
  - `agent-team-mail-core`
  - `agent-team-mail`
- define the retained binary artifact set for GitHub Releases:
  - `atm`
- keep crates.io publish ordering explicit:
  - `agent-team-mail-core` before `agent-team-mail`
- keep GitHub Release asset packaging for the supported platform targets:
  - `x86_64-unknown-linux-gnu`
  - `x86_64-apple-darwin`
  - `aarch64-apple-darwin`
  - `x86_64-pc-windows-msvc`
- port Homebrew update automation for the formulas already managed by the old
  release workflow:
  - `Formula/agent-team-mail.rb`
  - `Formula/atm.rb`
- add `winget` release automation for the retained CLI package:
  - manifest generation or update path
  - release-version and asset-URL wiring
  - SHA256 update from the released Windows archive
  - `vedantmgoyal2009/winget-releaser@v2` workflow step targeting package ID
    `randlee.agent-team-mail`
  - use the Windows ZIP asset from the GitHub Release as the installer source
  - one-time initial manifest submission procedure for the first release
  - recurring submission flow for later releases after the package exists in
    `microsoft/winget-pkgs`
  - no additional `winget` secret beyond the default workflow `GITHUB_TOKEN`
  - verification of submission success rather than same-day installability
- port/reference the proven `claude-history` winget materials:
  - `.winget/randlee.claude-history.yaml`
  - `docs/WINGET_SETUP.md`
  - the `winget` step in `.github/workflows/release.yml`

Acceptance criteria:
- this repo has release-preflight and release workflows with no missing helper
  files or schema dependencies
- the release artifact manifest is the single source of truth for publishable
  crates and release binaries in this repo
- preflight validates version alignment, artifact inventory, and dependency
  ordering from this repo layout
- release workflow produces retained `atm` archives, crates publish order, and
  Homebrew update steps without references to removed daemon/TUI/MCP artifacts
- release automation includes a concrete `winget` update/publish path for the
  retained Windows CLI install surface
- `N.2` explicitly records the Homebrew secret prerequisite and the one-time
  `winget` bootstrap requirement so the workflow design does not assume either
  is already in place
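The preflight rule for version alignment and dependency ordering can be sketched as a pure check over the manifest contents. This is a minimal sketch assuming the manifest has already been parsed; `CrateEntry` and `validate_publish_plan` are hypothetical, and the real `release/publish-artifacts.toml` schema may carry different fields.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of one publishable crate from the manifest.
struct CrateEntry {
    name: &'static str,
    version: &'static str,
    /// In-repo crates this crate depends on, by package name.
    internal_deps: Vec<&'static str>,
}

/// Check that every crate carries the release version and that each crate
/// is published after all of its in-repo dependencies.
fn validate_publish_plan(expected_version: &str, order: &[CrateEntry]) -> Result<(), String> {
    let mut position = HashMap::new();
    for (index, entry) in order.iter().enumerate() {
        if entry.version != expected_version {
            return Err(format!(
                "{} is at {}, expected {}",
                entry.name, entry.version, expected_version
            ));
        }
        position.insert(entry.name, index);
    }
    for (index, entry) in order.iter().enumerate() {
        for dep in &entry.internal_deps {
            match position.get(dep) {
                Some(&dep_index) if dep_index < index => {}
                Some(_) => return Err(format!("{} must publish before {}", dep, entry.name)),
                None => return Err(format!("{} depends on unlisted crate {}", entry.name, dep)),
            }
        }
    }
    Ok(())
}
```

With the retained set, `agent-team-mail-core` followed by `agent-team-mail` passes, while the reverse order fails before anything reaches crates.io.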

#### N.3 — Publisher Agent Port

Goal:
- port the release-orchestration agent instructions into this repo so release
  execution remains controlled by the same hardened operating procedure

Deliverables:
- create `.claude/agents/publisher.md` in this repo
- port the old `publisher` agent instructions and update all source-of-truth
  references to this repo’s files and workflows
- keep the hard rules around:
  - tag creation only by workflow
  - no manual `v*` tag pushes
  - develop -> main release gate ordering
  - required preflight and release workflow dispatch steps
- narrow the retained artifact/channel assumptions to this repo’s actual
  publish surface:
  - crates.io
  - GitHub Releases
  - Homebrew
  - `winget`
- update the inventory and verification expectations so the publisher does not
  expect daemon, MCP, TUI, or CI monitor outputs from this repo
- document in the publisher agent:
  - that `HOMEBREW_TAP_TOKEN` must exist on `atm-core` before Homebrew release
    automation can run
  - that the first `winget` release requires a one-time manual manifest
    submission
  - that later `winget` releases are workflow-driven
  - that Microsoft review introduces a normal 1-2 day delay before `winget`
    installability is observable

Acceptance criteria:
- `.claude/agents/publisher.md` exists in this repo
- publisher source-of-truth paths resolve to files that exist in this repo
- publisher instructions enumerate the retained artifact set and release
  channels accurately
- publisher instructions distinguish historical parity channels from the new
  required `winget` channel for Windows installation

#### N.4 — Customer-Facing Release Surface Documentation

Goal:
- make the replacement release understandable to downstream users and package
  consumers before `1.0` ships

Deliverables:
- rewrite `README.md` from reset-workspace language into release-facing product
  documentation
- document installation from:
  - GitHub Releases
  - Homebrew
  - crates.io
  - `winget`
- state that `agent-team-mail` and `agent-team-mail-core` are now published
  from this repo
- explain that the retained `1.0` replacement scope historically covered the
  pre-Phase-Q CLI/core pair and continues to consume the published
  `sc-observability` family
- explain that `winget` is a new required `1.0` Windows channel rather than a
  historical parity channel

Acceptance criteria:
- `README.md` matches the retained release surface and actual distribution
  channels
- customer-facing install instructions no longer describe this repo as a reset
  workspace
- release docs promise only retained legacy crates and the actual supported
  install channels, including `winget`

#### N.5 — Final Release Readiness Proof

Goal:
- prove that the retained replacement release can be published and installed
  from this repo before the real `1.0` publish run starts

Deliverables:
- run and record:
  - `cargo fmt --all --check`
  - `cargo clippy --workspace --all-targets -- -D warnings`
  - `cargo test --workspace`
  - `cargo package -p agent-team-mail-core --locked`
  - `cargo package -p agent-team-mail --locked`
  - `cargo publish --dry-run -p agent-team-mail-core --locked --no-verify`
  - `cargo publish --dry-run -p agent-team-mail --locked --no-verify`
- perform one install smoke test against the packaged/publishable CLI artifact
  surface to confirm `atm` is the installed entrypoint
- verify that the release inventory and post-publish verification expectations
  cover the retained release channels:
  - crates.io
  - GitHub Releases
  - Homebrew
  - `winget`
- verify that `winget` readiness proof checks successful submission/manifests,
  not immediate public installability

Acceptance criteria:
- all dry-run packaging and publishability checks succeed from this repo
- the release inventory matches the retained release scope exactly
- no retained release doc or workflow step depends on removed legacy crates
- `N.5` explicitly acknowledges the 1-2 day Microsoft review lag so release
  operators do not treat normal `winget` review delay as a failed publish

Phase N completion gate:
- package identities are switched to the legacy crates.io names
- release automation is present in this repo and references the retained
  artifact set correctly
- `.claude/agents/publisher.md` is ported and accurate for this repo
- customer-facing docs reflect the retained replacement release
- preflight and release dry-runs are clean for the retained publishable crates
- retained release channels confirmed:
  - crates.io
  - GitHub Releases
  - Homebrew
  - `winget`
- `winget` is explicitly documented as a new required Windows install channel,
  not as historical parity

### Phase O: Security And Hardening [COMPLETE]

Status: COMPLETE

Goal:
- close the confirmed CR001 findings that affect path safety, allocation
  bounds, temp-file collision resistance, and send-alert lock behavior before
  the next post-`1.0.1` hardening cycle begins

Status summary:
- CR001 confirmed four follow-up fixes that were intentionally left out of the
  immediate release gate because they need small but focused design + test work
- Phase O groups those fixes into one input-safety sprint and one
  filesystem/lock-hardening sprint
- accepted limitations from CR001 remain documented separately and are not
  re-opened by this phase

Integration branch: `integrate/phase-O`

#### O.1 — Input Validation And Allocation Safety

Goal:
- close the two highest-risk confirmed CR001 findings by tightening the trust
  boundary around path construction and bounding JSON-number normalization

Deliverables:
- `H-1`: add validated newtypes or one shared validator for team/agent path
  segments
  - reject path separators, `..`, empty segments, and platform-specific escapes
    before any path construction in `address.rs` and `home.rs`
- `M-2`: cap `normalize_json_number(...)` expansion length
  - if the normalized form would exceed 64 characters, return the raw string
    unchanged and emit `warn!` with
    `code = %AtmErrorCode::WarningMalformedAtmFieldIgnored`
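The `H-1` validator can be sketched as one function that every team/agent path segment passes through before any `join`. This is a minimal sketch: `validate_path_segment` and `SegmentError` are hypothetical names, and treating `:` as a "platform-specific escape" (Windows drive prefixes, NTFS alternate data streams) is an assumed interpretation.

```rust
/// Hypothetical rejection reasons; the real crate reports through `AtmError`.
#[derive(Debug, PartialEq, Eq)]
enum SegmentError {
    Empty,
    DotSegment,
    Separator,
    Control,
    DrivePrefix,
}

/// Validate one team or agent name before it is joined onto any filesystem
/// path. Returns the input unchanged so callers can wrap it in a newtype.
fn validate_path_segment(value: &str) -> Result<&str, SegmentError> {
    if value.is_empty() {
        return Err(SegmentError::Empty);
    }
    if value == "." || value == ".." {
        return Err(SegmentError::DotSegment);
    }
    // Reject both separators on every platform so a name that is safe on
    // Linux is also safe on Windows.
    if value.contains('/') || value.contains('\\') {
        return Err(SegmentError::Separator);
    }
    if value.chars().any(char::is_control) {
        return Err(SegmentError::Control); // includes NUL
    }
    // Assumption: ':' could form a drive prefix such as `C:` or an NTFS stream.
    if value.contains(':') {
        return Err(SegmentError::DrivePrefix);
    }
    Ok(value)
}
```

Because separators are rejected outright, a multi-segment escape such as `../../passwd` never reaches the `..` check; it fails as a separator violation first.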

Acceptance criteria:
- `AgentAddress::from_str(...)` rejects `../evil`, `../../passwd`, and names
  containing path separators
- `normalize_json_number(...)` with exponent expansion over 64 characters
  returns the raw string unchanged and emits the documented warning
- `cargo test --workspace` passes
- `cargo clippy --workspace --all-targets -- -D warnings` is clean
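The point of the `M-2` cap is to compute the expanded length before allocating it. The following is a minimal sketch under stated assumptions: the input is already a valid JSON number lexeme, and the expansion rules (strip leading zeros, keep trailing fractional zeros) are illustrative rather than the exact atm-core normalization. The function body is hypothetical; only the name and the 64-character cap come from the plan.

```rust
/// Longest normalized form accepted before the raw lexeme is kept instead.
const MAX_NORMALIZED_LEN: i128 = 64;

/// Hypothetical sketch: expand `1.5e3` into `1500`, but size the output first
/// and return `raw` unchanged if the expansion would exceed the cap.
fn normalize_json_number(raw: &str) -> String {
    let Some(e_index) = raw.find(|c: char| c == 'e' || c == 'E') else {
        return raw.to_string(); // no exponent: nothing to expand
    };
    let Ok(exponent) = raw[e_index + 1..].parse::<i64>() else {
        return raw.to_string(); // exponent does not fit in i64: keep raw
    };
    let (sign, mantissa) = match raw[..e_index].strip_prefix('-') {
        Some(rest) => ("-", rest),
        None => ("", &raw[..e_index]),
    };
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    let mut digits = format!("{int_part}{frac_part}");
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return raw.to_string();
    }
    if digits.bytes().all(|b| b == b'0') {
        return "0".to_string();
    }
    // Decimal point position inside `digits`; i128 cannot overflow here.
    let mut point = int_part.len() as i128 + exponent as i128;
    while digits.starts_with('0') {
        digits.remove(0); // each leading zero shifts the point left
        point -= 1;
    }
    let n = digits.len() as i128;
    let body_len = if point <= 0 {
        2 - point + n // "0." + zeros + digits
    } else if point >= n {
        point // digits + trailing zeros
    } else {
        n + 1 // digits with an embedded "."
    };
    if sign.len() as i128 + body_len > MAX_NORMALIZED_LEN {
        // Production emits `warn!` with
        // `code = %AtmErrorCode::WarningMalformedAtmFieldIgnored` here.
        eprintln!("normalized form of {raw:?} would exceed {MAX_NORMALIZED_LEN} chars; keeping raw");
        return raw.to_string();
    }
    if point <= 0 {
        format!("{sign}0.{}{digits}", "0".repeat((-point) as usize))
    } else if point >= n {
        format!("{sign}{digits}{}", "0".repeat((point - n) as usize))
    } else {
        let (whole, frac) = digits.split_at(point as usize);
        format!("{sign}{whole}.{frac}")
    }
}
```

A hostile lexeme such as `1e1000000` is rejected after a few integer operations, with no intermediate million-character string.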

#### O.2 — Filesystem Durability And Lock Hardening

Goal:
- close the remaining confirmed CR001 findings in persistence temp naming and
  send-alert stale-lock handling

Deliverables:
- `C-1`: replace the timestamp-nanos temp-file suffix in `persistence.rs` with
  `Uuid::new_v4()` while keeping the target basename for debuggability
- `H-2`: add sleep/backoff after successful stale-lock eviction in
  `acquire_send_alert_lock(...)` so every `AlreadyExists` iteration yields at
  least once, including iterations where eviction succeeded

Acceptance criteria:
- `atomic_write_bytes(...)` uses UUID-based temp names
- `acquire_send_alert_lock(...)` sleeps on every `AlreadyExists` iteration,
  regardless of whether stale-lock eviction succeeded
- `cargo test --workspace` passes
- `cargo clippy --workspace --all-targets -- -D warnings` is clean
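The `H-2` rule (back off on every `AlreadyExists` turn, whether or not eviction succeeded) can be sketched with `create_new` as the exclusive-create primitive. This is a minimal sketch: `acquire_lock_sketch` is hypothetical, and the `is_stale` closure stands in for the real owner-record staleness check.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical sketch of the send-alert lock loop. Sleeping after *every*
/// `AlreadyExists`, including right after evicting a stale sentinel, keeps two
/// waiters from spinning in a tight evict/create loop.
fn acquire_lock_sketch(
    lock_path: &Path,
    timeout: Duration,
    backoff: Duration,
    mut is_stale: impl FnMut(&Path) -> bool,
) -> io::Result<()> {
    let deadline = Instant::now() + timeout;
    loop {
        match OpenOptions::new().write(true).create_new(true).open(lock_path) {
            Ok(_file) => return Ok(()),
            Err(err) if err.kind() == ErrorKind::AlreadyExists => {
                if is_stale(lock_path) {
                    // Another waiter may have evicted it first; NotFound is fine.
                    match fs::remove_file(lock_path) {
                        Ok(()) => {}
                        Err(e) if e.kind() == ErrorKind::NotFound => {}
                        Err(e) => return Err(e),
                    }
                }
                if Instant::now() >= deadline {
                    return Err(io::Error::new(ErrorKind::TimedOut, "send-alert lock timed out"));
                }
                thread::sleep(backoff); // yield on every AlreadyExists turn
            }
            Err(err) => return Err(err),
        }
    }
}
```

The backoff here is production pacing, not test synchronization; tests drive the loop through the staleness closure rather than through timing.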

Phase O completion gate:
- O.1 and O.2 are both merged to `integrate/phase-O`
- all four confirmed CR001 findings are resolved:
  - `H-1`
  - `M-2`
  - `C-1`
  - `H-2`
- CI passes on all platforms
- `integrate/phase-O` merges to `develop`

### Phase P: File-I/O Ownership And Single-Write-Path Hardening [MERGED / HARDENING FOLLOW-UP]

Status note:
- P.1 completed on `feature/pP-s1-ownership-classification` via PR `#111`
  at `git#2e90a97`
- P.2 completed on `feature/pP-s2-mailbox-read-path` via PR `#112`
  at `git#f230ef4`
- P.3 completed on `feature/pP-s3-atm-owned-state` via PR `#115`
  at `git#ecb774a`
- P.4 completed on `feature/pP-s4-claude-inbox-compat` via PR `#113`
  at `git#9d5729b`
- P.5 closure gate completed on `feature/pP-s5-closure-gate` via PR `#114`
- final Phase P integration merged to `develop` via PR `#120` at `git#ad49336`
- this phase section now records both the executed P.1-P.5 history and the
  remaining hardening continuation work needed before a publish-quality close
- `integrate/phase-P` is currently rebased onto `develop@628e176`; refresh it
  again if `develop` advances before the next hardening slice begins

Goal:
- make the retained ATM implementation production-ready by applying one
  explicit file-I/O model across every live file family:
  - `read_only`
  - `read_possible_write`
  - `read_modify_write`
- eliminate ad hoc write paths
- minimize lock hold time without permitting stale-snapshot overwrites
- stop treating Claude-owned inbox files as the long-term source of truth for
  ATM-local workflow durability

Non-negotiable constraints:
- no production code path may introduce a second write path for a live file
  family when an owner-layer helper already exists
- no stale-snapshot `read -> mutate -> lock -> blind rename` flow is allowed
- no new ATM-local durable state may be placed in Claude-owned inbox files
- no tolerance for flaky tests

Integration branch: `integrate/phase-P`

#### P.6-P.8 — Post-Merge Hardening Continuation

Goal:
- close the remaining Phase P publish-risk gaps after the merge to `develop`
- keep follow-up work split into deterministic, reviewable sprint slices:
  - P.6: workflow-sidecar concurrency and typed boundary cleanup
  - P.7: test hygiene and observability cleanup
  - P.8: requirements/architecture/project-plan reconciliation

Planning rule:
- each hardening continuation sprint must branch from `integrate/phase-P`
  after it has been refreshed from the latest `develop`
- if `develop` advances between sprints, refresh `integrate/phase-P` from it
  before opening the next sprint branch
- no sprint may rely on timing-based tests, unlocked ATM-owned state rewrites,
  or new raw `String`/parse-later request surfaces for agent/team/address
  identifiers

##### P.6 — Workflow-Sidecar Concurrency And Typed Boundary Cleanup [PLANNED]

Goals:
- make workflow-sidecar seeding in `send` safe for concurrent same-recipient
  sends
- remove the remaining raw-string request/target boundaries called out by the
  Phase P rust-best-practices review

Files expected in scope:
- `crates/atm-core/src/send/mod.rs`
- `crates/atm-core/src/workflow.rs`
- `crates/atm-core/src/mailbox/source.rs`
- `crates/atm-core/src/read/mod.rs`
- `crates/atm-core/src/team_admin.rs`
- `crates/atm-core/tests/mailbox_locking.rs`
- command-layer constructors/parsers that build these request types

Design details:
- introduce one owner-layer workflow commit path for `.atm-state/workflow` that
  proves freshness before replacing the live file
- `send_mail(...)` and the missing-config team-lead notice path must stop using
  unlocked `load -> mutate -> save` on workflow state
- same-recipient send/send concurrency must either:
  - lock the workflow sidecar and reload under that lock before persisting, or
  - use a compare-and-swap equivalent at the workflow owner boundary
- mailbox append plus workflow seed must be treated as one coordinated
  persistence plan for send-owned writes; no parallel ad hoc sidecar mutation
  path is allowed
- request/target parsing must move to construction time for the remaining
  boundary types:
  - `AddMemberRequest.team` and `.member` use `TeamName` / `AgentName`
    instead of raw `String` (`RBP-F001`)
  - `ReadQuery.target_address` uses `Option<AgentAddress>` instead of
    `Option<String>` (`RBP-F002`)
  - `ResolvedTarget.agent` / `.team` carry `AgentName` / `TeamName`
    rather than raw `String` (`RBP-F003`)
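Moving parsing to construction time (`RBP-F001`) can be sketched with `FromStr` newtypes, so invalid input fails before the request object exists. This is a minimal sketch: the `TeamName` / `AgentName` shapes, `check_segment`, and `AddMemberRequest::parse` are hypothetical stand-ins, and `check_segment` is a reduced version of the `H-1` rule.

```rust
use std::str::FromStr;

/// Reduced stand-in for the shared H-1 path-segment validator.
fn check_segment(value: &str) -> Result<(), String> {
    if value.is_empty() || value == "." || value == ".." || value.contains('/') || value.contains('\\') {
        return Err(format!("invalid path segment: {value:?}"));
    }
    Ok(())
}

/// Hypothetical validated team name; parsing is the only way to build one.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TeamName(String);

/// Hypothetical validated agent name.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AgentName(String);

impl FromStr for TeamName {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        check_segment(s).map(|()| TeamName(s.to_owned()))
    }
}

impl FromStr for AgentName {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        check_segment(s).map(|()| AgentName(s.to_owned()))
    }
}

/// Request shape after RBP-F001: core code only ever sees validated names.
#[derive(Debug)]
struct AddMemberRequest {
    team: TeamName,
    member: AgentName,
}

impl AddMemberRequest {
    /// Parse at the boundary so invalid input never reaches the core.
    fn parse(team: &str, member: &str) -> Result<Self, String> {
        Ok(Self { team: team.parse()?, member: member.parse()? })
    }
}
```

The same pattern extends to `ReadQuery.target_address: Option<AgentAddress>` and `ResolvedTarget`: the type system carries the proof of validation instead of a raw `String`.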

Implementation patterns:
- prefer a typed workflow helper such as
  `workflow::with_locked_state(...)` or `workflow::commit_state(...)`
  over reimplementing lock/CAS logic inside `send/mod.rs`
- keep validation at the API boundary: constructors, CLI parsing, and resolver
  outputs should carry validated newtypes rather than validating deep in the
  implementation
- keep concurrent coverage deterministic by using channels/barriers with bounded
  waits; do not use sleeps to try to overlap sends

Reference shape:

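```rust
// Pseudocode shape, not a final required signature.
let envelope = mailbox::append_message(...)?;
workflow::commit_state(home, team, recipient, |state| {
    // `state` was reloaded under the workflow sidecar lock by the helper.
    state.remember_initial_state(&envelope);
    Ok(())
})?;
```

```rust
// Pseudocode shape for the freshness boundary inside the owner helper;
// the helper names below are illustrative only.
fn commit_state<F>(home: &Path, team: &TeamName, recipient: &AgentName, mutate: F) -> Result<(), AtmError>
where
    F: FnOnce(&mut WorkflowState) -> Result<(), AtmError>,
{
    let _lock = lock_workflow_sidecar(home, team, recipient)?; // held through commit
    let mut fresh = load_workflow_state(home, team, recipient)?; // reload under the lock
    mutate(&mut fresh)?;
    save_workflow_state(home, team, recipient, &fresh) // atomic replace
}
```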

The hardening requirement is not the helper name; it is that `send` and the
missing-config team-lead notice path must stop open-coding `load -> mutate ->
save` against the workflow sidecar and must commit from a freshness-proven
owner-layer boundary.
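The lock-and-reload variant of that boundary can be exercised in memory. In this minimal sketch a `Mutex` stands in for the workflow sidecar lock and the map inside it stands in for the live file; `WorkflowStore`, `commit_state`, and the `"pending_ack"` value are hypothetical. The two senders start together through a `Barrier`, following the plan's "barriers, not sleeps" rule.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Stand-in for the sidecar contents: message id -> initial workflow state.
type WorkflowState = BTreeMap<String, String>;

/// Stand-in for the sidecar file plus its lock.
#[derive(Default)]
struct WorkflowStore {
    live: Mutex<WorkflowState>,
}

impl WorkflowStore {
    /// Hypothetical owner helper: lock, reload under the lock, mutate, persist.
    fn commit_state<F>(&self, mutate: F) -> Result<(), String>
    where
        F: FnOnce(&mut WorkflowState) -> Result<(), String>,
    {
        let mut guard = self.live.lock().map_err(|_| "workflow lock poisoned".to_string())?;
        let mut fresh = guard.clone(); // reload under the lock
        mutate(&mut fresh)?; // a failed mutation leaves the live state untouched
        *guard = fresh; // persist (an atomic replace in the real helper)
        Ok(())
    }
}

/// Two same-recipient sends released together; each seeds its own message id.
fn concurrent_same_recipient_sends() -> WorkflowState {
    let store = Arc::new(WorkflowStore::default());
    let start = Arc::new(Barrier::new(2));
    let handles: Vec<_> = vec!["msg-a", "msg-b"]
        .into_iter()
        .map(|id| {
            let (store, start) = (Arc::clone(&store), Arc::clone(&start));
            thread::spawn(move || {
                start.wait();
                store.commit_state(|state| {
                    state.insert(id.to_string(), "pending_ack".to_string());
                    Ok(())
                })
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("sender thread panicked").expect("commit failed");
    }
    let final_state = store.live.lock().unwrap().clone();
    final_state
}
```

Because each sender reloads after acquiring the lock, neither can overwrite the other's entry with a pre-lock snapshot.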

Required coverage:
- concurrent same-recipient send/send test proving two ATM-authored messages
  both seed workflow state without lost updates
- coverage for the missing-config team-lead notice path using the shared
  workflow owner helper
- request-construction tests showing invalid team/agent/address input is
  rejected before command execution enters the core implementation
- concurrent send where one path is a normal recipient send and the other is the
  missing-config team-lead notice path for the same workflow file family
- contention case where one sender observes an older workflow snapshot, loses
  the race, reloads, and recomputes without dropping the winning sender's entry
- mixed payload case where concurrent sends differ on `requires_ack`, `task_id`,
  and summary generation, proving the seeded sidecar state tracks the correct
  message identity rather than whichever writer saves last
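The "loses the race, reloads, and recomputes" case maps naturally onto the compare-and-swap alternative, and it can be replayed deterministically without threads. This is a minimal sketch: `VersionedSidecar`, `CasError`, and the version counter are hypothetical illustrations of a CAS equivalent, not the chosen design.

```rust
use std::collections::BTreeMap;

/// Hypothetical versioned sidecar: every successful commit bumps `version`.
#[derive(Default)]
struct VersionedSidecar {
    version: u64,
    entries: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum CasError {
    Stale { expected: u64, found: u64 },
}

impl VersionedSidecar {
    /// Unlocked observation: the version plus a copy of the entries.
    fn snapshot(&self) -> (u64, BTreeMap<String, String>) {
        (self.version, self.entries.clone())
    }

    /// Replace the entries only if nobody committed since `expected_version`.
    fn compare_and_swap(&mut self, expected_version: u64, next: BTreeMap<String, String>) -> Result<(), CasError> {
        if self.version != expected_version {
            return Err(CasError::Stale { expected: expected_version, found: self.version });
        }
        self.entries = next;
        self.version += 1;
        Ok(())
    }
}

/// Replay: A and B both snapshot v0, B commits first, A's swap is rejected as
/// stale, and A reloads from the winner's state and recomputes its change.
fn lost_race_then_recompute() -> (Result<(), CasError>, VersionedSidecar) {
    let mut sidecar = VersionedSidecar::default();
    let (a_version, mut a_next) = sidecar.snapshot();
    let (b_version, mut b_next) = sidecar.snapshot();

    b_next.insert("msg-b".to_string(), "pending_ack".to_string());
    sidecar.compare_and_swap(b_version, b_next).expect("B wins the race");

    a_next.insert("msg-a".to_string(), "pending_ack".to_string());
    let first_attempt = sidecar.compare_and_swap(a_version, a_next);

    let (a_version, mut a_next) = sidecar.snapshot(); // reload
    a_next.insert("msg-a".to_string(), "pending_ack".to_string()); // recompute
    sidecar.compare_and_swap(a_version, a_next).expect("A succeeds on fresh state");
    (first_attempt, sidecar)
}
```

The stale rejection is what prevents "whichever writer saves last" from winning: A's first snapshot never reaches the file.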

##### P.7 — Test Hygiene And Observability Cleanup [PLANNED]

Goals:
- remove the remaining timing-dependent and process-environment test seams
- stop silently discarding malformed idle-notification JSON during read-path
  classification

Files expected in scope:
- `crates/atm/tests/log.rs`
- `crates/atm-core/src/config/discovery.rs`
- `crates/atm-core/src/clear/mod.rs`
- `crates/atm-core/src/read/mod.rs`
- any shared test fixture/helper file needed to make readiness deterministic

Design details:
- replace the fixed `thread::sleep(Duration::from_millis(250))` in log-tail
  coverage with an explicit readiness handshake or bounded polling barrier
- replace all hardcoded `/tmp/atm-config-root` test roots in
  `config/discovery.rs` with `tempdir()`-backed paths
- scope `ATM_TEST_REMOVE_LOCKED_INBOX_BEFORE_LOAD` with an RAII env guard under
  `#[serial]` in `clear/mod.rs`
- replace `idle_notification_sender(...).ok().and_then(...)` with explicit
  malformed-JSON handling that preserves non-fatal behavior but emits traceable
  diagnostics and recovery context (`RBP-F004`)

Implementation patterns:
- any new helper introduced for log-tail readiness must expose a positive-ready
  signal; it must not sleep "long enough"
- process-environment mutation in tests must use one repo-standard pattern:
  shared env lock plus scoped guard plus `#[serial]`
- malformed JSON handling should remain fail-soft for Claude-owned inbox data,
  but it must not silently disappear; trace/debug logging is required
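A positive-ready signal with a bounded wait can be built from a standard channel. This is a minimal sketch: `ReadyCursor`, `wait_until_ready`, and `spawn_tailer` are hypothetical names, and the spawned thread stands in for the real log tailer, which would send only after it has opened and positioned the log file.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical readiness payload: the cursor the tailer will read from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ReadyCursor(u64);

/// Wait for an explicit ready signal; a timeout is an error, not a silent pass.
fn wait_until_ready(rx: &mpsc::Receiver<ReadyCursor>, timeout: Duration) -> Result<ReadyCursor, String> {
    rx.recv_timeout(timeout)
        .map_err(|err| format!("log tail never reported ready: {err}"))
}

/// Stand-in tailer: attaches, then reports readiness at `cursor`.
fn spawn_tailer(cursor: u64) -> (mpsc::Receiver<ReadyCursor>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // ... open the log file and seek to `cursor` here ...
        tx.send(ReadyCursor(cursor)).expect("test side dropped the receiver");
    });
    (rx, handle)
}
```

The timeout bounds a hung tailer, but a healthy run never waits for it: the test proceeds as soon as the signal arrives.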

Reference shape:

```rust
// Pseudocode shape, not a required concrete helper name.
// `wait_until_ready` returns an error on timeout instead of sleeping.
let ready = readiness.wait_until_ready(timeout)?;
let records = log_tail.read_after(ready.cursor())?;
```

```rust
let _guard = test_env::scoped_var(
    "ATM_TEST_REMOVE_LOCKED_INBOX_BEFORE_LOAD",
    "1",
);
```
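The shared env lock plus scoped guard can be sketched with the standard library alone. This is a minimal sketch: `ENV_LOCK`, `ScopedVar`, `scoped_var`, and `write_env` are hypothetical, and `#[serial]` (from the `serial_test` crate) is omitted because it applies to the test functions, not the guard.

```rust
use std::env;
use std::ffi::{OsStr, OsString};
use std::sync::{Mutex, MutexGuard};

/// One process-wide lock shared by every test that mutates the environment.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical guard: holds `ENV_LOCK`, sets `key`, and restores the previous
/// value (or removes the key) on drop, including early return and unwinding.
struct ScopedVar {
    key: &'static str,
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

fn scoped_var(key: &'static str, value: &str) -> ScopedVar {
    // A poisoned lock only means another test panicked while holding it.
    let lock = ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let previous = env::var_os(key);
    write_env(key, Some(OsStr::new(value)));
    ScopedVar { key, previous, _lock: lock }
}

impl Drop for ScopedVar {
    fn drop(&mut self) {
        // Runs before `_lock` is released, so no guarded test sees the override.
        write_env(self.key, self.previous.as_deref());
    }
}

// `set_var`/`remove_var` are `unsafe` in Rust 2024 and safe in earlier
// editions; holding `ENV_LOCK` is what serializes the calls here.
#[allow(unused_unsafe)]
fn write_env(key: &str, value: Option<&OsStr>) {
    unsafe {
        match value {
            Some(v) => env::set_var(key, v),
            None => env::remove_var(key),
        }
    }
}
```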

Required coverage:
- deterministic log-tail readiness test with no fixed-duration sleep
- `config/discovery.rs` tests rewritten to tempdir fixtures with no `/tmp`
  assumptions
- `clear/mod.rs` regression showing the injected disappearing-inbox path resets
  state through scoped guard teardown
- read-path coverage proving malformed idle-notification JSON is observable in
  logs/diagnostics and does not panic or change mailbox state
- malformed idle-notification JSON adjacent to valid mailbox records still
  leaves the valid records readable and classifiable
- tempdir-backed discovery tests cover paths with spaces and nested directories
  so platform path handling is exercised instead of assuming `/tmp` semantics
- env-guard teardown is verified on early-return/failure paths so one test's
  injected state cannot leak into the next test process

##### P.8 — Requirements, Architecture, And Plan Reconciliation [COMPLETE]

Goals:
- bring the written Phase P requirements, architecture, and plan text into
  alignment with the landed implementation and the follow-up hardening results
- close the remaining documentation ambiguity before a final publish decision

Files expected in scope:
- `docs/requirements.md`
- `docs/project-plan.md`
- `docs/architecture.md`
- `docs/atm-core/modules/mailbox.md`
- `docs/atm-core/modules/workflow.md`

Design details:
- rewrite `REQ-CORE-MAILBOX-LOCK-005` so it matches the executed mutation
  taxonomy:
  - `read_only`: no locks
  - `read_possible_write`: unlocked observation is allowed, but any commit must
    reload/prove freshness under the final lock set
  - `read_modify_write`: acquire the final lock plan before the mutating
    snapshot and hold it through commit
- explicitly document that `read`, `ack`, and `clear` do not all share the same
  pre-lock read behavior anymore; the requirement should describe the
  command-specific executed pattern instead of the pre-Phase-P rule
- keep `docs/requirements.md` and `docs/architecture.md` as the always-valid
  enforced source of truth for the post-P.5 system state rather than a phase
  narrative
- record the P.6 and P.7 fixes in the Phase P closure history so the plan
  reads as executed release evidence rather than mixed proposal/history
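The mutation taxonomy can be written down as a review rule. This is a minimal sketch: `MutationClass` and `check_commit` are hypothetical names used only to make the lock rule for each class explicit, including the rejection of the `read -> mutate -> lock -> blind rename` shape.

```rust
/// Hypothetical encoding of the Phase P mutation taxonomy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MutationClass {
    ReadOnly,
    ReadPossibleWrite,
    ReadModifyWrite,
}

/// Decide whether a commit is allowed, given how its snapshot was taken.
fn check_commit(
    class: MutationClass,
    snapshot_taken_under_final_locks: bool,
    freshness_proven_under_final_locks: bool,
) -> Result<(), &'static str> {
    match class {
        MutationClass::ReadOnly => Err("read_only paths never commit"),
        MutationClass::ReadModifyWrite if !snapshot_taken_under_final_locks => {
            Err("read_modify_write must take its snapshot after acquiring the final lock plan")
        }
        MutationClass::ReadPossibleWrite
            if !(snapshot_taken_under_final_locks || freshness_proven_under_final_locks) =>
        {
            Err("read_possible_write must reload or prove freshness under the final locks")
        }
        _ => Ok(()),
    }
}
```

An unlocked observation followed by a blind commit fails for both writing classes; only the freshness proof distinguishes `read_possible_write` from `read_modify_write`.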

Acceptance criteria:
- requirements, architecture, and project-plan text all describe the same
  mailbox-read taxonomy and lock acquisition model
- the docs explicitly name the remaining P.6 send-side workflow freshness gap
  rather than implying it is already solved
- the Phase P heading and status note clearly show merged/executed state
- no Phase P document still implies that the merged implementation is
  proposal-only

##### P.9 — Lock Sentinel Gap: Detailed Design And Doc Updates [COMPLETE]

Goals:
- produce a complete, implementation-ready design for the P.10 coding sprint
- update `docs/requirements.md`, `docs/architecture.md`, and
  `docs/atm-error-codes.md` so the production contract is settled before code
  changes begin
- make the design output itself the authoritative specification for P.10

Files in scope:
- `docs/requirements.md`
- `docs/architecture.md`
- `docs/atm-error-codes.md`
- `docs/project-plan.md`
- `crates/atm-core/src/mailbox/lock.rs` (read-only analysis only; no code
  changes in P.9)

Design outcomes:
- GAP-1 uses a conservative basename predicate instead of
  `path.extension() == "lock"`
- GAP-2 uses a dedicated
  `AtmErrorCode::MailboxLockReadOnlyFilesystem`
  / `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`
  instead of overloading `MailboxLockFailed`
- read-only filesystem classification is based on raw OS error codes, not
  generic `PermissionDenied` handling
- read-only failures surface directly from public sweep/acquire paths and do
  not enter retry loops
- drop-time cleanup remains warn-only because the mailbox mutation has already
  completed successfully

P.9 deliverables:
1. Updated `docs/requirements.md` with explicit GAP-1 and GAP-2 lock
   requirements
2. Updated `docs/architecture.md` with the final sweep predicate, call-graph
   decisions, and error-code rationale
3. Updated `docs/atm-error-codes.md` with
   `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`
4. Populated `docs/project-plan.md` §P.10 with the exact implementation
   contract
5. ATM status report to `team-lead` with the five design-question answers and
   a summary of the doc changes

##### P.10 — Lock Sentinel Residual Gap Closure [PLANNED]

Goals:
- implement the hardened fixes for GAP-1 and GAP-2 exactly as specified by the
  completed P.9 design output
- close the remaining mailbox-lock residual risks without broadening scope into
  unrelated refactors

Files expected in scope:
- `crates/atm-core/src/mailbox/lock.rs`
- `crates/atm-core/src/error.rs`
- `crates/atm-core/src/error_codes.rs`
- `crates/atm-core/src/ack/mod.rs`
- `crates/atm-core/src/send/mod.rs`
- `crates/atm-core/src/schema/inbox_message.rs`
- docs already updated in P.9

Exact GAP-1 predicate:

```rust
let is_lock_sentinel_candidate = path
    .file_name()
    .and_then(|name| name.to_str())
    .is_some_and(|name| name.ends_with(".lock") || name.contains(".lock."));
```

Why this expression:
- `ends_with(".lock")` keeps the ordinary live sentinel path
- `contains(".lock.")` catches rotated artifacts such as `.lock.old` and
  `.lock.replaced`
- basename-only matching avoids matching parent directories
- generic `contains("lock")` is forbidden because it creates unrelated-file
  false positives
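The exact predicate above can be wrapped in a function so the required sweep-candidate unit tests have something to call. The function name `is_lock_sentinel_candidate` is a hypothetical wrapper; the body is the P.10 expression unchanged.

```rust
use std::path::Path;

/// The GAP-1 predicate as a testable function. Only the basename is
/// inspected, so a parent directory named `*.lock*` never matches.
fn is_lock_sentinel_candidate(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .is_some_and(|name| name.ends_with(".lock") || name.contains(".lock."))
}
```

A name such as `notes.lock.json` would also be a candidate; that is tolerable because malformed sentinel contents are skipped rather than deleted.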

Exact GAP-2 platform classification:
- Linux: `libc::EROFS` (`30`)
- macOS: `libc::EROFS` (`30`)
- Windows: `windows_sys::Win32::Foundation::ERROR_WRITE_PROTECT as i32` (`19`)

Recommended helper shape:

```rust
fn is_readonly_filesystem_error(error: &io::Error) -> bool
```

```rust
fn mailbox_lock_path_error(
    operation: &'static str,
    lock_path: &Path,
    error: io::Error,
) -> AtmError
```
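A body for the classification helper can be sketched with the platform codes from the GAP-2 table. The codes are hardcoded here only to keep the sketch free of `libc` and `windows-sys`; production should use `libc::EROFS` and `ERROR_WRITE_PROTECT` as specified. `LockErrorCode` and `classify_lock_path_error` are hypothetical stand-ins for the `AtmErrorCode` variants and the `mailbox_lock_path_error(...)` mapping.

```rust
use std::io;

#[cfg(any(target_os = "linux", target_os = "macos"))]
const READ_ONLY_FS_CODE: Option<i32> = Some(30); // libc::EROFS
#[cfg(windows)]
const READ_ONLY_FS_CODE: Option<i32> = Some(19); // ERROR_WRITE_PROTECT
#[cfg(not(any(target_os = "linux", target_os = "macos", windows)))]
const READ_ONLY_FS_CODE: Option<i32> = None;

/// Classify by raw OS error code, never by `ErrorKind::PermissionDenied`.
fn is_readonly_filesystem_error(error: &io::Error) -> bool {
    match (error.raw_os_error(), READ_ONLY_FS_CODE) {
        (Some(code), Some(expected)) => code == expected,
        _ => false,
    }
}

/// Hypothetical stand-in for the two `AtmErrorCode` variants involved.
#[derive(Debug, PartialEq, Eq)]
enum LockErrorCode {
    MailboxLockReadOnlyFilesystem,
    MailboxLockFailed,
}

/// One mapping for every non-contention lock-path failure, so open,
/// owner-record write, and removal classify errors identically.
fn classify_lock_path_error(error: &io::Error) -> LockErrorCode {
    if is_readonly_filesystem_error(error) {
        LockErrorCode::MailboxLockReadOnlyFilesystem
    } else {
        LockErrorCode::MailboxLockFailed
    }
}
```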

Required call-graph decisions:
- `open_lock_file(...)` maps read-only failures to
  `MailboxLockReadOnlyFilesystem`
- `write_lock_owner_record(...)` maps both truncate and write failures through
  the same helper
- `remove_lock_sentinel_with_retry(...)` checks read-only first and returns the
  error immediately instead of entering the permission-denied retry loop
- `evict_stale_lock_sentinel(...)` must return a result rich enough for public
  sweep/acquire call sites to surface read-only failures instead of only
  logging a warning
- `MailboxLockGuard::drop` keeps warning-only cleanup on read-only failure

Recommended result-shape change:
- change stale-sentinel eviction from a bare `bool` outcome to a richer result
  that distinguishes:
  - removed
  - skipped (live owner / malformed / not found)
  - failed (`io::Error`)
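The three-way outcome and the two call-site policies can be sketched together. This is a minimal sketch: `EvictionOutcome`, `SkipReason`, `apply_in_sweep`, and `cleanup_on_drop` are hypothetical names illustrating the split between public sweep/acquire paths (surface the error) and drop-time cleanup (warn only).

```rust
use std::io;

/// Hypothetical richer eviction result replacing a bare `bool`.
#[derive(Debug)]
enum EvictionOutcome {
    Removed,
    Skipped(SkipReason),
    Failed(io::Error),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipReason {
    LiveOwner,
    Malformed,
    NotFound,
}

/// Public sweep/acquire call sites: report whether a sentinel was removed and
/// surface failures, such as a read-only filesystem, instead of warning.
fn apply_in_sweep(outcome: EvictionOutcome) -> io::Result<bool> {
    match outcome {
        EvictionOutcome::Removed => Ok(true),
        EvictionOutcome::Skipped(_) => Ok(false),
        EvictionOutcome::Failed(err) => Err(err),
    }
}

/// `MailboxLockGuard::drop`-style cleanup of the guard's own sentinel: the
/// mailbox mutation already succeeded, so a failure is logged, not returned.
fn cleanup_on_drop(result: io::Result<()>) {
    if let Err(err) = result {
        eprintln!("warning: lock sentinel cleanup failed after a successful command: {err}");
    }
}
```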

Test simulation pattern:
- do not require a real read-only mount
- add a deterministic synthetic seam patterned after
  `ATM_TEST_FORCE_LOCK_NON_CONTENTION_ERROR`
- recommended env var contract:
  - `ATM_TEST_FORCE_LOCK_READONLY_FS=open`
  - `ATM_TEST_FORCE_LOCK_READONLY_FS=write_owner`
  - `ATM_TEST_FORCE_LOCK_READONLY_FS=remove`
- operation scoping is strict:
  - `open` affects only the lock open/create path; owner-record write and
    sentinel removal continue to execute normally
  - `write_owner` affects only owner-record truncate/write
  - `remove` affects only stale-sentinel removal / cleanup
- the seam must synthesize the platform-correct raw OS error so the production
  classification logic is what the tests exercise
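The strict operation scoping can be sketched as a pure function of the seam value, which keeps it unit-testable without touching the environment. This is a minimal sketch: `LockOp`, `forced_readonly_error`, and `check_seam` are hypothetical, and the synthetic code is hardcoded (30 for Linux/macOS `EROFS`, 19 for Windows `ERROR_WRITE_PROTECT`) so the sketch has no `libc` or `windows-sys` dependency.

```rust
use std::io;

/// Lock operations the synthetic seam can target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LockOp {
    Open,
    WriteOwner,
    Remove,
}

impl LockOp {
    /// Value of `ATM_TEST_FORCE_LOCK_READONLY_FS` that targets this operation.
    fn seam_value(self) -> &'static str {
        match self {
            LockOp::Open => "open",
            LockOp::WriteOwner => "write_owner",
            LockOp::Remove => "remove",
        }
    }
}

#[cfg(windows)]
const SYNTHETIC_READ_ONLY_CODE: i32 = 19; // ERROR_WRITE_PROTECT
#[cfg(not(windows))]
const SYNTHETIC_READ_ONLY_CODE: i32 = 30; // EROFS on Linux and macOS

/// Return a raw-OS read-only error only when the seam names exactly `op`, so
/// production classification (not a test shortcut) handles it downstream.
fn forced_readonly_error(seam: Option<&str>, op: LockOp) -> Option<io::Error> {
    (seam == Some(op.seam_value())).then(|| io::Error::from_raw_os_error(SYNTHETIC_READ_ONLY_CODE))
}

/// Call-site shape: consult the seam immediately before the real operation.
fn check_seam(op: LockOp) -> io::Result<()> {
    let seam = std::env::var("ATM_TEST_FORCE_LOCK_READONLY_FS").ok();
    match forced_readonly_error(seam.as_deref(), op) {
        Some(err) => Err(err),
        None => Ok(()),
    }
}
```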

Required coverage:
- unit: rotated sentinel names such as `inbox.json.lock.old` are considered
  sweep candidates, while unrelated filenames are ignored
- unit: malformed rotated sentinel contents are skipped, not deleted
- unit: read-only `open` returns
  `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`
- unit: read-only owner-record write returns
  `ATM_MAILBOX_LOCK_READ_ONLY_FILESYSTEM`
- unit: read-only sentinel removal is not retried as permission-denied
- unit or focused integration: public stale-sentinel sweep surfaces the
  read-only error instead of logging and continuing
- regression: drop-time cleanup remains non-fatal even when read-only cleanup
  fails after a successful command

Acceptance criteria:
- no rotated stale sentinel escapes the sweep solely because it no longer ends
  with the literal `.lock` extension
- read-only filesystem failures are machine-readable and operator-distinct from
  contention and generic path I/O
- retry loops are reserved for genuine contention or documented transient
  sharing failures, not persistent read-only mounts
- tests remain deterministic and mount-free

#### P.0 — Audited Production File-I/O Inventory

The implementation phase must treat this as the starting inventory of affected
production-path files and modules.

Read-path inventory:
- `crates/atm-core/src/config/mod.rs`
- `crates/atm-core/src/mailbox/mod.rs`
- `crates/atm-core/src/mailbox/source.rs`
- `crates/atm-core/src/mailbox/lock.rs`
- `crates/atm-core/src/read/mod.rs`
- `crates/atm-core/src/read/seen_state.rs`
- `crates/atm-core/src/ack/mod.rs`
- `crates/atm-core/src/clear/mod.rs`
- `crates/atm-core/src/send/mod.rs`
- `crates/atm-core/src/team_admin.rs`
- `crates/atm-core/src/doctor/mod.rs`
- `crates/atm-core/src/identity/hook.rs`
- `crates/atm-core/src/send/input.rs`

Live write-path inventory:
- mailbox replacement through:
  - `crates/atm-core/src/mailbox/atomic.rs`
  - callers in `mailbox/mod.rs`, `read/mod.rs`, `ack/mod.rs`, `clear/mod.rs`
- mailbox lock sentinel lifecycle in:
  - `crates/atm-core/src/mailbox/lock.rs`
- seen-state watermark in:
  - `crates/atm-core/src/read/seen_state.rs`
- send-alert state and lock in:
  - current implementation location: `crates/atm-core/src/send/mod.rs`
  - owning helper boundary remains a Phase P.3 design decision until that
    sprint resolves it
- team config writes in:
  - `crates/atm-core/src/team_admin.rs`
- restore marker / restore staging / task bucket / highwatermark in:
  - `crates/atm-core/src/team_admin.rs`
- shared atomic commit boundary in:
  - `crates/atm-core/src/persistence.rs`

Supporting path/ownership surfaces that must stay aligned:
- `docs/requirements.md`
- `docs/architecture.md`
- `docs/atm-core/modules/mailbox.md`
- `docs/atm-message-schema.md`
- `docs/claude-code-message-schema.md`
- `docs/legacy-atm-message-schema.md`
- `docs/atm-error-codes.md`

Current concrete boundaries to review and either retain as the one owning
helper or retire during the phase:
- mailbox rewrite helpers:
  - `crates/atm-core/src/mailbox/mod.rs::locked_read_modify_write(...)`
  - `crates/atm-core/src/read/mod.rs::persist_source_files(...)`
  - `crates/atm-core/src/ack/mod.rs::persist_source_files(...)`
  - `crates/atm-core/src/clear/mod.rs::persist_source_files(...)`
- ATM-owned state helpers:
  - `crates/atm-core/src/read/seen_state.rs::save_seen_watermark(...)`
  - `crates/atm-core/src/send/alert_state.rs::register_missing_team_config_alert(...)`
  - `crates/atm-core/src/send/alert_state.rs::clear_missing_team_config_alert(...)`
  - `crates/atm-core/src/send/alert_state.rs::save(...)`
  - `crates/atm-core/src/send/alert_state.rs::acquire_lock(...)`
  - `crates/atm-core/src/team_admin.rs::write_team_config(...)`
  - `crates/atm-core/src/team_admin.rs::atomic_write(...)`
  - `crates/atm-core/src/team_admin/restore.rs::write_restore_marker(...)`
  - `crates/atm-core/src/team_admin/restore.rs::clear_restore_marker(...)`
  - `crates/atm-core/src/team_admin/restore.rs::recompute_highwatermark(...)`
- shared low-level atomic commit primitive:
  - `crates/atm-core/src/persistence.rs::atomic_write_bytes(...)`
  - `crates/atm-core/src/persistence.rs::atomic_write_string(...)`

Inventory rule:
- every sprint must update this inventory if new production-path live I/O
  surfaces are introduced
- code review for the phase is not complete until each inventory entry has an
  identified mutation class and one owner-layer commit boundary

#### P.1 — File Ownership Classification And Helper Boundaries

Goal:
- classify every live file family by ownership and mutation class
- extract or confirm one owner-layer write boundary per file family before
  deeper behavior changes land

Design details:
- introduce an internal file-I/O taxonomy used in docs and code review:
  - Claude-owned compatibility surface
  - ATM-owned source of truth
  - staging/scratch artifact
- define one owner-layer boundary for each live file family:
  - mailbox owner helper
  - seen-state owner helper
  - send-alert state owner helper
  - team-config owner helper
  - task/highwatermark owner helper
  - restore-marker owner helper
- low-level atomic replacement stays in `persistence.rs`; file-family semantics
  stay out of command handlers
- no public API ceremony is required if a small internal helper layer keeps the
  ownership boundary explicit
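To make the taxonomy reviewable, the file families and their single owners can be kept in one small internal table. The following is a minimal Rust sketch. `Ownership`, `FileFamily`, `registry`, and `check_single_owner` are hypothetical names, and the classifications shown are illustrative rather than the `P.0` inventory result. Team-config and task families are omitted because their class is assigned by that inventory.

```rust
use std::collections::BTreeMap;

/// Hypothetical ownership classes mirroring the three-way taxonomy above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ownership {
    ClaudeCompat,
    AtmSourceOfTruth,
    Staging,
}

/// Hypothetical descriptor for one live file family.
#[derive(Debug, Clone)]
pub struct FileFamily {
    pub name: &'static str,
    pub ownership: Ownership,
    /// The single owner-layer helper allowed to commit this family.
    pub owner_helper: &'static str,
}

/// Illustrative registry; helper paths are the owner boundaries named in this plan.
pub fn registry() -> Vec<FileFamily> {
    vec![
        FileFamily { name: "mailbox", ownership: Ownership::ClaudeCompat, owner_helper: "mailbox::store::commit_source_files" },
        FileFamily { name: "seen-state", ownership: Ownership::AtmSourceOfTruth, owner_helper: "read::seen_state::save_seen_watermark" },
        FileFamily { name: "send-alert", ownership: Ownership::AtmSourceOfTruth, owner_helper: "send::alert_state::save" },
        FileFamily { name: "restore-staging", ownership: Ownership::Staging, owner_helper: "team_admin::restore::prepare_restore_workspace" },
    ]
}

/// Fails if any file family is registered with more than one owner helper.
pub fn check_single_owner(families: &[FileFamily]) -> Result<(), String> {
    let mut seen: BTreeMap<&str, &str> = BTreeMap::new();
    for f in families {
        if let Some(prev) = seen.insert(f.name, f.owner_helper) {
            return Err(format!(
                "family `{}` has two owners: `{}` and `{}`",
                f.name, prev, f.owner_helper
            ));
        }
    }
    Ok(())
}
```

A table like this lets a review or unit test fail fast when a second write path for the same family appears.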

Implementation patterns:
- prefer typed owner helpers over generic closure-heavy abstractions when the
  file family has domain-specific preconditions
- call sites should say "save seen watermark", "write team config", or
  "commit mailbox state" instead of assembling temp-file mechanics locally
- staging directories are not live source-of-truth files, but their creation
  and cleanup must still be deterministic and owned by one restore helper path
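A sketch of the layering described above, assuming a temp-file-plus-rename strategy: the low-level primitive knows only about bytes and paths, while the typed owner helper knows the file family's name and format. The signatures, the `<agent>.seen` file name, and the plain-decimal format are hypothetical; the real `persistence.rs` and `seen_state.rs` functions may differ.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Low-level primitive (hypothetical signature): write `bytes` to a sibling
/// temp file, flush it to disk, then rename it over `path` so readers never
/// observe a partially written file.
pub fn atomic_write_bytes(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let result = (|| -> io::Result<()> {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        tmp.sync_all()?;
        fs::rename(&tmp_path, path)
    })();
    if result.is_err() {
        // Best-effort cleanup so a failed commit leaves no scratch file behind.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}

/// Hypothetical owner helper for the seen-state family. Call sites say
/// "save seen watermark"; only this function knows the file name and format.
pub fn save_seen_watermark(state_dir: &Path, agent: &str, watermark: u64) -> io::Result<PathBuf> {
    let path = state_dir.join(format!("{agent}.seen"));
    atomic_write_bytes(&path, watermark.to_string().as_bytes())?;
    Ok(path)
}
```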

Files expected in scope:
- `crates/atm-core/src/persistence.rs`
- `crates/atm-core/src/read/seen_state.rs`
- `crates/atm-core/src/send/mod.rs`
- `crates/atm-core/src/team_admin.rs`
- `crates/atm-core/src/mailbox/atomic.rs`
- `crates/atm-core/src/mailbox/mod.rs`
- `crates/atm-core/src/lib.rs` if new owner modules/helpers are factored out

Concrete implementation targets:
- keep `persistence.rs` as the low-level atomic primitive layer
- use `mailbox::store::commit_mailbox_state(...)` and
  `mailbox::store::commit_source_files(...)` as the mailbox owner boundary
- keep `read::seen_state::save_seen_watermark(...)` as the seen-state owner
  boundary
- use `send::alert_state::{load, save, acquire_lock}` as the send-alert state
  owner boundary, with
  `send::alert_state::{register_missing_team_config_alert,
  clear_missing_team_config_alert}` as the command-facing mutation helpers
- keep `team_admin::write_team_config(...)` as the team-config owner boundary
- use `team_admin::restore::restore_task_state_from_backup(...)` as the task bucket /
  `.highwatermark` owner boundary
- use `team_admin::restore::{prepare_restore_workspace, cleanup_restore_workspace,
  write_restore_marker, clear_restore_marker}` as the restore workspace and
  restore-marker owner boundaries

Tests required:
- unit coverage for each owner helper boundary
- deterministic tests proving owner helpers preserve existing on-disk shape
- deterministic tests proving owner helpers are the only commit path used by
  production call sites added in the sprint

Acceptance criteria:
- every live file family in `P.0` names one owner-layer write boundary
- no new direct `fs::write`, truncate-and-rewrite, or ad hoc temp-file logic is
  introduced in production code
- the docs and module ownership notes match the helper boundaries

#### P.2 — Mailbox Read Path Classification And Shared Commit Flow

Goal:
- make mailbox command behavior conform exactly to the three operation classes
  (`read_only`, `read_possible_write`, and `read_modify_write`)
- share the observational read path and the commit path across read/ack/clear
  wherever the behavior is identical

Design details:
- `read_only` mailbox work:
  - `mailbox::store::observe_source_files(...)`
  - merge/classify/filter/select
  - no mailbox lock
- `read_possible_write` mailbox work:
  - unlocked observational snapshot first via
    `mailbox::store::observe_source_files(...)`
  - only if mutation is needed, enter
    `mailbox::store::with_locked_source_files(...)`
- mailbox commit path:
  - acquire the deterministic lock set
  - re-discover source paths under lock
  - reload current source files
  - recompute the selected mutation from fresh data
  - persist through `mailbox::store::commit_source_files(...)` while locks are
    still held
- `ack` resolves the reply inbox from an unlocked preflight, then uses one
  final sorted superset lock for reload/recompute/persist through the shared
  commit pattern
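The `read_possible_write` shape can be sketched against an in-memory fake. This is an illustration under stated assumptions: `FakeMailbox`, `mark_all_read`, and `CommitError` are hypothetical, the lock is simulated by a counter, and the `race` closure is a deterministic fault-injection seam standing in for a concurrent writer between the unlocked snapshot and lock acquisition.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub id: u32,
    pub read: bool,
}

#[derive(Debug, Default)]
pub struct FakeMailbox {
    /// Source file path -> messages stored in that file.
    pub files: BTreeMap<String, Vec<Message>>,
    /// Number of times the (simulated) lock set was acquired.
    pub lock_acquisitions: usize,
}

#[derive(Debug, PartialEq)]
pub enum CommitError {
    SourceSetDrift,
}

impl FakeMailbox {
    fn discover(&self) -> BTreeSet<String> {
        self.files.keys().cloned().collect()
    }
    fn unread_count(&self) -> usize {
        self.files.values().flatten().filter(|m| !m.read).count()
    }
}

/// Observe without locking, lock only if a mutation is needed, then
/// re-discover, recompute, and commit while the lock is held.
pub fn mark_all_read(
    mb: &mut FakeMailbox,
    race: impl FnOnce(&mut FakeMailbox),
) -> Result<usize, CommitError> {
    // 1. Unlocked observational snapshot.
    let snapshot_paths = mb.discover();
    if mb.unread_count() == 0 {
        return Ok(0); // read-only outcome: no lock taken
    }
    race(&mut *mb); // deterministic stand-in for a concurrent writer
    // 2. Mutation is needed: acquire the lock set.
    mb.lock_acquisitions += 1;
    // 3. Re-discover under lock; abort before writing if the source set drifted.
    if mb.discover() != snapshot_paths {
        return Err(CommitError::SourceSetDrift);
    }
    // 4. Recompute from fresh data and persist while still locked.
    let mut changed = 0;
    for msg in mb.files.values_mut().flatten() {
        if !msg.read {
            msg.read = true;
            changed += 1;
        }
    }
    Ok(changed)
}
```

Because the drift check runs before any mutation, an aborted commit leaves every source file untouched, which is the clean-abort behavior the tests in this sprint must prove.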

Implementation patterns:
- share the unlocked snapshot loader between `read` initial selection and wait
  polling
- use `mailbox::store::with_locked_source_files(...)` as the shared
  read/ack/clear lock+reload entry point and
  `mailbox::store::commit_source_files(...)` as the shared mailbox persistence
  leaf
- share sort/limit/selection recomputation utilities where behavior matches
- keep lock acquisition out of read-only paths entirely
- use deterministic path ordering and one total timeout budget for every
  multi-file commit path
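Deterministic ordering and a single timeout budget can be combined in one acquisition helper. A minimal sketch, assuming a caller-supplied `try_acquire` closure stands in for the real per-file lock call; `acquire_lock_set` and `LockError` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum LockError {
    Timeout { acquired: usize },
}

/// Acquire every lock in sorted, de-duplicated order while sharing ONE total
/// timeout budget. `try_acquire` receives the time still remaining.
pub fn acquire_lock_set<F>(
    paths: &[PathBuf],
    budget: Duration,
    mut try_acquire: F,
) -> Result<Vec<PathBuf>, LockError>
where
    F: FnMut(&PathBuf, Duration) -> bool,
{
    let deadline = Instant::now() + budget;
    let ordered: BTreeSet<&PathBuf> = paths.iter().collect();
    let mut held = Vec::with_capacity(ordered.len());
    for path in ordered {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() || !try_acquire(path, remaining) {
            // Real code would release every lock in `held` before returning.
            return Err(LockError::Timeout { acquired: held.len() });
        }
        held.push(path.clone());
    }
    Ok(held)
}
```

Sorting and de-duplicating the path set means two commands with overlapping origins always request shared locks in the same order, which rules out lock-order deadlock; passing the shrinking remaining budget to each acquisition keeps the whole multi-file commit bounded.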

Files expected in scope:
- `crates/atm-core/src/read/mod.rs`
- `crates/atm-core/src/ack/mod.rs`
- `crates/atm-core/src/clear/mod.rs`
- `crates/atm-core/src/mailbox/mod.rs`
- `crates/atm-core/src/mailbox/source.rs`
- `crates/atm-core/src/mailbox/lock.rs`

Concrete implementation targets:
- consolidate the current per-command mailbox rewrite helpers so `read`,
  `ack`, and `clear` do not each own a separate final persist step
- keep `ack` on the documented unlocked-preflight plus final-superset-lock
  behavior, and move any duplicated reload/recompute/persist shape behind
  shared mailbox owner helpers
- preserve the Windows-safe sentinel cleanup behavior already delivered by
  `fix/issue-104-inbox-locks`; no sprint in Phase P may regress it

Tests required:
- deterministic success-path and failure-path tests for:
  - read timeout polling without mailbox lock acquisition
  - writeback path lock acquisition only when mutation is actually needed
  - overlapping-origin `read`/`ack`/`clear` no-deadlock behavior
  - source-set drift under lock causing clean abort instead of partial write
- no test may rely on open-ended sleep loops, background polling races, or
  indefinite `join()` behavior

Acceptance criteria:
- observational mailbox reads are lock-free
- every mailbox writeback path uses the shared commit pattern
- `ack`/`clear`/`read` do not each carry bespoke stale-snapshot writeback logic

#### P.3 — ATM-Owned State Families: Seen-State, Send-Alert, Restore, Tasks

Goal:
- harden every ATM-owned state family outside the mailbox surface so each one
  follows the same mutation taxonomy and one-owner write rule

Design details:
- seen-state:
  - `read_only` load
  - `read_modify_write` save through one seen-state helper
- send-alert state:
  - state JSON and lock file treated as one owner-managed state family
  - lock semantics and stale-lock handling documented and isolated from command
    logic
- restore/task/highwatermark state:
  - restore marker, task bucket install, and `.highwatermark` recompute all go
    through typed owner helpers
  - staging directories remain staging-only and are never treated as live
    source-of-truth files
- team config:
  - continue config-last semantics
  - write path remains singular and helper-owned

Implementation patterns:
- prefer one helper per file family over one mega-helper that obscures domain
  rules
- keep PID/stale-lock handling deterministic and isolated
- avoid mixed responsibility functions that both plan restore state and commit
  several file families ad hoc
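One way to keep PID/stale-lock handling deterministic is to make the eviction decision a pure function of injected inputs. A sketch under assumptions: the `LockInfo` fields, the eviction policy (dead holder, or age above a limit), and the future-timestamp rule are hypothetical, not the current `send::alert_state` behavior.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical contents of a send-alert lock file.
#[derive(Debug, Clone)]
pub struct LockInfo {
    pub pid: u32,
    pub created_at: SystemTime,
}

/// Decide whether a lock may be evicted. The current time and process
/// liveness are injected, so tests never depend on real PIDs or wall-clock timing.
pub fn is_stale(
    lock: &LockInfo,
    now: SystemTime,
    max_age: Duration,
    pid_alive: impl Fn(u32) -> bool,
) -> bool {
    if !pid_alive(lock.pid) {
        return true; // holder process is gone
    }
    match now.duration_since(lock.created_at) {
        Ok(age) => age > max_age,
        Err(_) => false, // timestamp is in the future: do not evict
    }
}
```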

Files expected in scope:
- `crates/atm-core/src/read/seen_state.rs`
- `crates/atm-core/src/send/mod.rs`
- `crates/atm-core/src/send/alert_state.rs`
- `crates/atm-core/src/team_admin.rs`
- `crates/atm-core/src/team_admin/restore.rs`
- `crates/atm-core/src/doctor/mod.rs`
- `crates/atm-core/src/persistence.rs`

Concrete implementation targets:
- remove any remaining direct knowledge of send-alert JSON file shape from
  non-owner call sites
- keep restore staging helpers separate from live-file commit helpers so review
  can distinguish staging-only writes from source-of-truth writes
- if team-admin remains the owner for multiple file families, factor internal
  helper sections or submodules so each file family still has one obvious write
  boundary in code review

Tests required:
- deterministic round-trip tests for each ATM-owned state family
- deterministic stale-lock eviction tests for send-alert state
- deterministic restore-marker/task/highwatermark tests with no timing races
- explicit regression tests that staging cleanup and restore-marker cleanup do
  not hang or depend on filesystem timing quirks

Acceptance criteria:
- every ATM-owned non-mailbox state family in `P.0` uses one owner-layer write
  boundary
- no command handler directly manipulates the underlying JSON/state file shape
  when a helper exists

#### P.4 — Claude-Owned Inbox Compatibility Retirement

Goal:
- remove ATM-local workflow durability from the Claude-owned inbox rewrite path
- move toward an ATM-owned source-of-truth for ATM workflow state

Design details:
- introduce an ATM-owned workflow sidecar family under:
  - `.claude/teams/<team>/.atm-state/workflow/<agent>.json`
- the sidecar is the ATM-owned source of truth for mailbox-local ATM workflow
  state rather than storing new durable ATM semantics by rewriting
  Claude-owned inbox records
- sidecar records are keyed by stable ATM message identity string:
  - ULID for forward ATM-authored messages
  - legacy UUID `message_id` for compatibility records that already have one
- Phase P.4 does not invent durable ATM-local workflow state for Claude-native
  records that lack ATM message identity; those remain compatibility-only until
  a later explicit enrichment/migration decision lands
- initial target data to move behind the sidecar boundary:
  - read state
  - ack-required / acknowledged state
  - ATM message identity metadata when ATM authors the message
- `send` remains allowed to append a new Claude-native inbox record, but ATM
  workflow durability should be committed to ATM-owned state
- `read`, `ack`, and `clear` should project mailbox display state by joining:
  - Claude-owned inbox records
  - ATM-owned workflow sidecar state
- legacy top-level ATM compatibility fields remain readable during migration,
  but new source-of-truth behavior must not depend on rewriting them in place

Implementation patterns:
- design the sidecar keying and join semantics before migration code lands
- use one owner-layer helper for sidecar load/project/save rather than letting
  `read`, `ack`, and `clear` each shape sidecar JSON independently
- prefer lazy backfill or compatibility projection over one risky bulk rewrite
- keep the join deterministic and bounded; no background repair daemon

Files expected in scope:
- `crates/atm-core/src/schema/inbox_message.rs`
- `crates/atm-core/src/mailbox/*`
- `crates/atm-core/src/read/mod.rs`
- `crates/atm-core/src/ack/mod.rs`
- `crates/atm-core/src/clear/mod.rs`
- `crates/atm-core/src/send/mod.rs`
- `docs/atm-message-schema.md`
- `docs/claude-code-message-schema.md`

Concrete implementation targets:
- define one owner helper for sidecar load/project/save before modifying `read`,
  `ack`, or `clear` to consume it
- make sidecar projection deterministic over mixed legacy and forward ATM
  records before any inbox rewrite retirement code lands
- keep message-schema docs and code keyed to the exact same identity contract;
  no sprint may silently broaden the key space without doc updates

Tests required:
- deterministic projection tests over mixed legacy and sidecar-backed data
- deterministic compatibility tests proving Claude-only appends are preserved
- deterministic tests proving sidecar updates are keyed by ATM message identity
  and do not require rewriting the source inbox record
- no test may rely on concurrent unbounded polling or external background
  repair to reach a passing state

Acceptance criteria:
- ATM no longer depends on full-file rewrite of Claude-owned inbox files for
  new durable ATM workflow state
- architecture docs can truthfully say ATM-local workflow durability lives in
  an ATM-owned source-of-truth path
- the sidecar file path and key format are documented and covered by tests

#### P.5 — Cleanup, Audit Closure, And Production Gate

Goal:
- close the phase only when the documented model is actually enforceable in
  code review and CI

Deliverables:
- rerun the explicit `P.0` inventory audit against the current source tree
- remove any leftover parallel write paths
- update module ownership docs if helpers moved
- add code comments at the owner-layer boundaries where future contributors are
  most likely to slip back into stale-snapshot or bespoke-write habits

Deterministic test policy:
- every new failure-path test must have a bounded completion path
- allowed patterns:
  - barrier/channel readiness handshakes
  - short bounded lock timeout values
  - deterministic fault-injection seams
  - elapsed-time upper bounds with generous CI margins
- forbidden patterns:
  - open-ended polling loops waiting for "eventual" state
  - indefinite `join()` waits
  - sleeps used as the primary synchronization mechanism
  - tests that only fail under scheduler luck or filesystem timing luck
  - stress tests that are expected to pass "most of the time"
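The allowed patterns can be combined in a few lines of std-only Rust. This sketch uses hypothetical helper names (`run_bounded`, `contended_try_lock_fails`); it replaces sleeps with a channel readiness handshake and bounds every wait, including the final `join()`, by an explicit timeout.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Run `work` on a helper thread and wait for its result with a hard upper
/// bound instead of an unbounded `join()`. On timeout the worker is detached.
pub fn run_bounded<T: Send + 'static>(
    timeout: Duration,
    work: impl FnOnce() -> T + Send + 'static,
) -> Result<T, mpsc::RecvTimeoutError> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver may already have timed out.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout)
}

/// Readiness handshake: the holder signals once it owns the lock, so the
/// contention assertion never depends on scheduler timing.
pub fn contended_try_lock_fails(timeout: Duration) -> Result<bool, mpsc::RecvTimeoutError> {
    let lock = Arc::new(Mutex::new(()));
    let (ready_tx, ready_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();
    let holder_lock = Arc::clone(&lock);
    let holder = thread::spawn(move || {
        let _guard = holder_lock.lock().unwrap();
        let _ = ready_tx.send(());
        // Hold the lock until released; this wait is itself bounded.
        let _ = release_rx.recv_timeout(timeout);
    });
    ready_rx.recv_timeout(timeout)?;
    let contended = lock.try_lock().is_err();
    let _ = release_tx.send(());
    // Bounded join: the holder exits once released or after `timeout`.
    holder.join().expect("holder thread panicked");
    Ok(contended)
}
```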

Verification required:
- `cargo fmt --check`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace`

Phase P completion gate:
- every live file family in `P.0` is classified and owned
- every live write path goes through one owner-layer write boundary
- requirements, architecture, project plan, and module ownership docs agree
- the test suite for the phase is explicitly deterministic and CI-safe
- the remaining external-writer limitations, if any, are documented as accepted
  compatibility boundaries rather than hidden assumptions

## 21. Phase Q — SQLite Mail SSOT And Runtime Boundary [PLANNED]

Detailed design source:
- [`docs/plan-phase-Q.md`](./plan-phase-Q.md)

Goal:
- replace filesystem JSON as ATM's mail source of truth with SQLite
- reintroduce one tightly bounded singleton daemon runtime
- eliminate mailbox-lock dependence from ATM mail correctness

Hard architectural constraints:
- exactly one daemon per host
- it must be impossible for two daemons to be active on the same host at the
  same time
- every subsystem is behind a strict trait boundary for all external I/O
- daemon/runtime code stays thin and does not absorb business logic
- daemon spawning is not the core test strategy
- structured `sc-observability` remains first-class at both CLI and daemon
  layers
- production fallible paths use typed error unions / `Result` propagation
  rather than panic/unwrap as the normal error strategy
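A minimal sketch of deterministic singleton startup, assuming an atomic create-new claim file; `claim_singleton`, `SingletonGuard`, and `StartupError` are hypothetical. The second claim fails with a typed error rather than racing.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Guard for the host-wide daemon claim; removes the claim file on drop.
#[derive(Debug)]
pub struct SingletonGuard {
    path: PathBuf,
}

impl Drop for SingletonGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

#[derive(Debug)]
pub enum StartupError {
    AlreadyRunning,
    Io(io::Error),
}

/// Claim the singleton slot with an atomic create-new open.
pub fn claim_singleton(path: &Path) -> Result<SingletonGuard, StartupError> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            // Create the guard first so a failed write still removes the claim.
            let guard = SingletonGuard { path: path.to_path_buf() };
            writeln!(file, "{}", std::process::id()).map_err(StartupError::Io)?;
            Ok(guard)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Err(StartupError::AlreadyRunning),
        Err(e) => Err(StartupError::Io(e)),
    }
}
```

A plain claim file survives a crash, so a production design would more likely rely on an OS-level advisory lock that the kernel releases when the process exits; the sketch only illustrates the typed, race-free failure of a second startup.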

Core design decisions:
- SQLite is the source of truth for:
  - messages
  - ack/task state
  - read/clear visibility state
  - team roster
- daemon memory is the live truth for agent status
- `atm doctor` remains a CLI command but must query daemon/runtime state in the
  Phase Q target architecture
- Claude inbox JSONL remains compatibility ingress/egress only
- native agent/plugin traffic does not use JSONL
- one daemon API, two production transport implementations:
  - Unix domain socket for same-host
  - TCP/TLS for cross-host daemon-to-daemon traffic
- one in-process `test-socket` transport for transport-boundary tests
- remote address model expands to `agent@team.host`
- bounded transient retry is allowed for remote delivery, but there is no
  durable long-lived remote outbox
- successful remote delivery requires remote daemon acceptance inside the
  bounded retry window
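The bounded-retry rule can be sketched as a loop capped by both an attempt count and a time window, with success reported only on remote acceptance. Hypothetical throughout: `Attempt`, `DeliveryError`, `deliver_remote`, and the attempt/window parameters are illustrative, and backoff between attempts is omitted.

```rust
use std::time::{Duration, Instant};

/// Outcome of one delivery attempt to a remote daemon (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Attempt {
    Accepted,
    TransientFailure,
    Rejected,
}

#[derive(Debug, PartialEq)]
pub enum DeliveryError {
    Rejected,
    RetryWindowExhausted { attempts: u32 },
}

/// Bounded transient retry with no durable outbox: success means the remote
/// daemon accepted inside the window; otherwise nothing is queued for later.
pub fn deliver_remote(
    max_attempts: u32,
    window: Duration,
    mut attempt: impl FnMut(u32) -> Attempt,
) -> Result<u32, DeliveryError> {
    let deadline = Instant::now() + window;
    let mut tries = 0;
    while tries < max_attempts && Instant::now() < deadline {
        tries += 1;
        match attempt(tries) {
            Attempt::Accepted => return Ok(tries),
            Attempt::Rejected => return Err(DeliveryError::Rejected),
            Attempt::TransientFailure => continue,
        }
    }
    Err(DeliveryError::RetryWindowExhausted { attempts: tries })
}
```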

Integration branch:
- `integrate/phase-Q`

Planned sprint sequence:

### Q.0 — Boundary Cleanup And Debt Retirement

Scope:
- align the existing codebase with the Phase Q target shape before store and
  daemon work begin
- remove technical debt and duplicated compatibility helpers that would
  otherwise slow or distort Q.1+
- keep this sprint strictly about current retained-path cleanup, not
  speculative pre-implementation of later architecture

Implementation focus:
- one shared inbox write boundary
- one shared inbox hydration boundary
- one owned message-id compatibility bridge
- explicit roster/member construction instead of hidden defaults
- centralized hook payload and hook trigger seams
- config-ingest validation parity between shorthand and object forms
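A sketch of a single owned message-id bridge, assuming the two identity forms named in Phase P.4 (ULID for forward ATM-authored messages, legacy UUID `message_id` for compatibility records). `MessageId` and `parse_message_id` are hypothetical, and normalizing ULIDs to uppercase and UUIDs to lowercase is an assumption about canonical form.

```rust
/// Hypothetical compatibility bridge: the one place that decides whether an
/// identity string is a forward ULID or a legacy UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MessageId {
    Ulid(String),
    LegacyUuid(String),
}

/// Crockford base32 alphabet used by ULIDs (no I, L, O, U).
const CROCKFORD: &str = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

pub fn parse_message_id(raw: &str) -> Option<MessageId> {
    let is_ulid = raw.len() == 26
        && raw.chars().all(|c| CROCKFORD.contains(c.to_ascii_uppercase()))
        // 26 base32 chars carry 130 bits; a 128-bit ULID starts with 0-7.
        && raw.as_bytes()[0] <= b'7';
    if is_ulid {
        return Some(MessageId::Ulid(raw.to_ascii_uppercase()));
    }
    // Legacy UUID: five hex groups of lengths 8-4-4-4-12.
    let groups: Vec<&str> = raw.split('-').collect();
    let shape = [8, 4, 4, 4, 12];
    let is_uuid = groups.len() == 5
        && groups
            .iter()
            .zip(shape)
            .all(|(g, n)| g.len() == n && g.chars().all(|c| c.is_ascii_hexdigit()));
    if is_uuid {
        Some(MessageId::LegacyUuid(raw.to_ascii_lowercase()))
    } else {
        None
    }
}
```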

Code-review evidence:
- `mailbox::atomic::write_messages()` already serves as the real inbox write
  seam and should remain the only ATM-owned writer
- `schema::to_shared_inbox_value()` and
  `hydrate_legacy_fields_from_metadata()` already serve as the real schema
  compatibility seams and should absorb duplicated helper logic
- recent context-injection fixes showed that drift around these boundaries
  creates immediate product failures
- recent `AgentName` / `AgentMember` cleanup showed that hidden defaults and
  duplicated hydrators create migration friction rather than helping

Acceptance:
- one ATM-owned inbox write boundary and one ATM-owned metadata hydration
  boundary are explicit and covered by tests
- duplicated compatibility helpers are removed from retained command/test code
- hidden default construction for externally meaningful identity fields is
  removed from retained runtime/test paths
- hook/event shaping is auditable from one service boundary
- shorthand and object-form `config.json` member parsing follow the same
  validation expectations where both remain supported
- the codebase is simpler to migrate after Q.0 than before Q.0

### Q.1 — Store And Boundary Foundation

Scope:
- add the SQLite store boundary family:
  - `MailStore`
  - `TaskStore`
  - `RosterStore`
- add the first concrete SQLite implementation crate: `atm-rusqlite`
- add the strict I/O trait boundaries for store, inbox ingress/export, config
  ingress, watcher/reconcile, transport, dispatcher, and notification
- keep service logic fully testable in-process
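The trait-boundary rule can be illustrated with a cut-down `MailStore`. The method set, `StoreError`, `read_and_mark`, and the in-memory `MemStore` are hypothetical; a real `atm-rusqlite` implementation would sit behind the same trait and own all SQLite calls.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound(u64),
}

/// Hypothetical store boundary; services see only this trait.
pub trait MailStore {
    fn insert(&mut self, recipient: &str, body: &str) -> Result<u64, StoreError>;
    fn unread(&self, recipient: &str) -> Result<Vec<(u64, String)>, StoreError>;
    fn mark_read(&mut self, id: u64) -> Result<(), StoreError>;
}

/// Service logic depends only on the trait, so it runs in-process in tests.
pub fn read_and_mark<S: MailStore>(store: &mut S, recipient: &str) -> Result<Vec<String>, StoreError> {
    let msgs = store.unread(recipient)?;
    for (id, _) in &msgs {
        store.mark_read(*id)?;
    }
    Ok(msgs.into_iter().map(|(_, body)| body).collect())
}

/// In-memory fake: id -> (recipient, body, read).
#[derive(Default)]
pub struct MemStore {
    next_id: u64,
    rows: BTreeMap<u64, (String, String, bool)>,
}

impl MailStore for MemStore {
    fn insert(&mut self, recipient: &str, body: &str) -> Result<u64, StoreError> {
        self.next_id += 1;
        self.rows.insert(self.next_id, (recipient.to_string(), body.to_string(), false));
        Ok(self.next_id)
    }
    fn unread(&self, recipient: &str) -> Result<Vec<(u64, String)>, StoreError> {
        Ok(self
            .rows
            .iter()
            .filter(|(_, (r, _, read))| r == recipient && !read)
            .map(|(id, (_, body, _))| (*id, body.clone()))
            .collect())
    }
    fn mark_read(&mut self, id: u64) -> Result<(), StoreError> {
        let row = self.rows.get_mut(&id).ok_or(StoreError::NotFound(id))?;
        row.2 = true;
        Ok(())
    }
}
```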

Parallelization rule:
- Q.0 must be complete before Q.1 is treated as the contract lock-in point
- Q.1 is the convergence point
- Unix/TCP/test-socket transport work, watcher/reconcile work, and
  command-handler migration must not branch into parallel implementation until
  the core boundary traits, dispatcher/handler seams, and request/result
  contracts are defined and reviewed
- once those contracts are stable, follow-on sprints may execute in parallel
  against the shared boundary set

Acceptance:
- SQLite opens under `.claude/teams/<team>/.atm-state/mail.db`
- `atm-rusqlite` is the only crate that owns direct SQLite calls in the first
  implementation line
- core logic is reachable without daemon process spawning
- no direct SQLite or filesystem bypasses outside the owning boundaries
- watcher/reconcile logic exists behind its own boundary and does not bypass
  ingress/store/notifier ownership rules
- transport-boundary tests can replace Unix/TCP with the in-process
  `test-socket` transport
- the core boundary traits and request/result contracts are explicit enough to
  allow parallel follow-on implementation without transport/business-logic
  drift
- the dispatcher/handler contract is explicit enough that Unix, TCP/TLS, and
  `test-socket` implementations can proceed without owning request-family
  behavior

### Q.2 — Compatibility Ingress And Export

Scope:
- import Claude/legacy inbox JSONL into SQLite
- import roster updates from `config.json` into SQLite
- keep ATM export Claude-native at the top level with `metadata.atm`

Parallel execution after Q.1:
- once the Q.1 boundary contracts are locked, inbox ingress/export can proceed
  in parallel with:
  - transport adapter work
  - watcher/reconcile implementation
  - handler/service migration

Acceptance:
- external Claude writes become durable in SQLite through one owned ingress path
- export compatibility remains intact
- roster truth no longer depends on `config.json` as the durable source

### Q.3 — Ack/Task Migration

Scope:
- move ack-required state and task state to SQLite-owned semantics
- keep reply export behind SQLite commit success

Parallel execution after Q.1:
- ack/task migration can proceed in parallel with Q.2 ingress/export work and
  Q.4 transport and watcher work so long as it stays within the locked
  service/store/export contracts

Acceptance:
- ack/task state is authoritative in SQLite
- reply export remains compatible for Claude recipients

### Q.4 — Read/Clear Cutover + Thin Daemon Runtime

Scope:
- move `read` and `clear` to SQLite-owned mail semantics
- add the singleton daemon runtime
- implement one protocol with Unix socket and TCP/TLS adapters
- add the in-process `test-socket` transport for transport-boundary tests
- keep live agent status in daemon memory
- add daemon-query support needed by `atm doctor`
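One way to keep transports thin is to split a handler seam (all request-family behavior) from a transport seam (moving requests and responses). A hypothetical sketch: `Request`, `Response`, `Handler`, `Transport`, `TestSocket`, and `StatusHandler` are illustrative names, and byte framing is omitted.

```rust
/// Hypothetical request/response families shared by every transport.
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Ping,
    Status { agent: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    Pong,
    Status { agent: String, online: bool },
}

/// Handler seam: all request-family behavior lives behind this trait.
pub trait Handler {
    fn handle(&self, req: Request) -> Response;
}

/// Transport seam: Unix socket, TCP/TLS, and `test-socket` would each only
/// move a request in and a response out.
pub trait Transport {
    fn round_trip(&mut self, req: Request) -> Response;
}

/// In-process transport: no sockets and no daemon spawn, just dispatch.
pub struct TestSocket<H: Handler> {
    pub handler: H,
}

impl<H: Handler> Transport for TestSocket<H> {
    fn round_trip(&mut self, req: Request) -> Response {
        // A real socket loop would decode bytes here and stay this small.
        self.handler.handle(req)
    }
}

/// Example handler backed by in-memory agent status (daemon-memory truth).
pub struct StatusHandler {
    pub online: Vec<String>,
}

impl Handler for StatusHandler {
    fn handle(&self, req: Request) -> Response {
        match req {
            Request::Ping => Response::Pong,
            Request::Status { agent } => {
                let online = self.online.contains(&agent);
                Response::Status { agent, online }
            }
        }
    }
}
```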

Parallel execution after Q.1:
- Unix transport, TCP/TLS transport, `test-socket`, watcher/reconcile, and
  daemon-query plumbing may proceed in parallel as separate slices once the
  shared dispatcher/handler and boundary contracts are stable

Acceptance:
- `read` and `clear` no longer require mailbox JSON rewrite correctness
- second daemon startup fails deterministically
- remote traffic is daemon-to-daemon only
- remote send success depends on remote daemon acceptance
- when the daemon is unavailable, CLI/runtime calls attempt exactly one
  documented auto-start and then fail with a clear typed error, with no hidden
  fallback, if the daemon still cannot run
- daemon code remains a thin runtime wrapper over the service boundaries
- handler behavior is testable through the in-process `test-socket` transport
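The one-attempt auto-start rule can be sketched against a small control trait. Hypothetical names throughout (`DaemonControl`, `connect_with_one_auto_start`, `DaemonError`, and the `Fake` test double); the sketch shows only the control flow: connect, at most one start, reconnect, then a typed error with no fallback to direct store or inbox access.

```rust
/// Typed outcome when the daemon still cannot be reached.
#[derive(Debug, PartialEq)]
pub enum DaemonError {
    Unavailable { start_succeeded: bool },
}

/// Hypothetical client-side seam for reaching and starting the daemon.
pub trait DaemonControl {
    fn connect(&mut self) -> bool;
    fn start(&mut self) -> bool;
}

/// Connect; on failure try exactly ONE auto-start, reconnect once, then fail.
pub fn connect_with_one_auto_start(ctl: &mut impl DaemonControl) -> Result<(), DaemonError> {
    if ctl.connect() {
        return Ok(());
    }
    let started = ctl.start();
    if started && ctl.connect() {
        return Ok(());
    }
    Err(DaemonError::Unavailable { start_succeeded: started })
}

/// Test double that counts start attempts.
pub struct Fake {
    pub up: bool,
    pub can_start: bool,
    pub starts: u32,
}

impl DaemonControl for Fake {
    fn connect(&mut self) -> bool {
        self.up
    }
    fn start(&mut self) -> bool {
        self.starts += 1;
        if self.can_start {
            self.up = true;
        }
        self.can_start
    }
}
```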

### Q.5 — Lock Retirement And Ops Cleanup

Scope:
- retire mailbox-lock dependence from ATM mail correctness
- remove reliance on the 5-minute stale-lock sweep for normal mail flows
- align doctor/restore/ops docs to SQLite ownership

Acceptance:
- mailbox locks are no longer part of the normal mail correctness contract
- stale lock artifacts can no longer wedge ATM mail flows
- requirements, architecture, and project plan all match the final design

### Q.6 — Production-Readiness Gate And Release

Scope:
- prove the Phase Q implementation is production ready rather than merely
  architecturally aligned
- run the release gate, packaging gate, and final QA/documentation alignment
- publish only after all prior sprint gates are green

Acceptance:
- version bump planning is complete
- `cargo publish --dry-run` succeeds for the intended publish set
- crates.io publish succeeds for the intended release line
- GitHub release/tag/binary artifact steps are complete
- `CHANGELOG.md` is updated
- all Phase Q release-gate conditions pass on the release candidate

Phase Q completion gate:
- Q.0 through Q.6 are complete on `integrate/phase-Q`
- SQLite is authoritative for messages, ack/task state, visibility state, and
  roster truth
- `send` and `ack` operate through the daemon production path
- production CLI/runtime paths obey one-attempt daemon auto-start semantics
  with typed failure on final daemon unavailability
- `recipient_pane_id` is sourced from SQLite roster truth when known
- mailbox-lock correctness dependence is retired from normal mail flows
- release-gate and QA invariants for Phase Q are satisfied

QA invariants for every Phase Q pass:
- impossible to run two active daemons on one host
- daemon unavailability fails clearly without hidden fallback to direct store or
  inbox access
- every subsystem performs I/O only through its owning trait boundary
- any observed SQL, watcher, notifier, or socket-boundary bypass is an
  immediate QA failure
- daemon/runtime code remains thin
- socket receive loops remain tiny dispatcher loops only
- any socket loop that performs SQL, watcher, notifier, or workflow logic is
  an immediate QA failure
- any watcher/reconcile implementation that performs SQL, socket, or notifier
  logic inline is an immediate QA failure
- daemon spawning is not the core test strategy
- typed errors are preserved across CLI, daemon, and core boundaries for
  fallible runtime paths
- `AtmErrorCode` remains a centralized read-only registry with no
  subsystem-local alternatives
- structured `sc-observability` remains present at both CLI and daemon layers
- SQLite remains the source of truth for mail and roster
- live status remains daemon-memory truth
- Claude compatibility remains Claude-native top-level plus `metadata.atm`