# Managed Agents Deep Audit (phase-4.11.1)

## URL coverage

All 9 canonical Managed Agents pages fetched in full via WebFetch. No
linked `/v1/agents`, `/v1/sessions`, `/v1/environments` reference pages
were available as standalone targets beyond what is inlined in the
guides; every endpoint surface used in the guides is documented below.

- overview, quickstart, agent-setup, sessions, skills, tools, memory,
  files, vaults — all retrieved 2026-04-18.
- Features beta-gated (`managed-agents-2026-04-01` header on every
  request). Memory, outcomes, multi-agent are Research Preview and
  require a separate access form.

## Per-page digests

### overview
Managed Agents is explicitly framed as the **opposite product surface**
to the Messages API. Messages = "direct model prompting, custom agent
loops"; Managed Agents = "pre-built, configurable agent harness that
runs in managed infrastructure". Core objects: `agent` (persona +
tools + skills, versioned), `environment` (cloud container template),
`session` (running instance), `events` (SSE). Runs Claude 4.5+ only.
Rate limits: 60/min create, 600/min read per org, plus tier spend
limits.

### quickstart
Installs `ant` CLI + SDKs (Python/TS/Go/Java/C#/Ruby/PHP). Flow:
`POST /v1/agents` → `POST /v1/environments` → `POST /v1/sessions` →
`POST /v1/sessions/{id}/events` + SSE stream at
`/v1/sessions/{id}/stream`. `agent_toolset_20260401` enables the
full built-in toolset. Session stays `idle` until a user event; agent
autonomously tool-calls until it emits `session.status_idle`.

### agent-setup
Agents are **versioned resources**. Fields: `name`, `model`, `system`,
`tools`, `mcp_servers`, `skills`, `callable_agents` (multi-agent, RP),
`description`, `metadata`. Updates generate new versions with
optimistic concurrency (`version` argument). Lifecycle: update → new
version; list versions; archive (read-only; existing sessions keep
running). Agents can be pinned per-session by passing
`{type:"agent", id, version}`.

### sessions
A session requires `agent` + `environment_id`. Statuses:
`idle | running | rescheduling | terminated`. Sessions are
**stateful** — history persisted server-side, container mounted,
retrievable and listable. Event delivery: POST user events, open
SSE stream. Archive preserves history and blocks new events; delete
tears down container + events. Files, memory stores, environments,
and agents are independent and survive session deletion. Supports
`vault_ids[]` and `resources[]` (files, memory_stores, GitHub repos).

### skills
**Same SKILL.md model as Claude Code**: filesystem-based, progressive
disclosure, attached to the agent. Two flavors: `anthropic` pre-built
(e.g., `xlsx`, `pptx`, `docx`, `pdf`) and `custom` org-authored with
versioning (`latest` or pinned). Cap: 20 skills per session. Skills
are invoked automatically when relevant; they do not consume context
until needed.

### tools
Built-in toolset (`agent_toolset_20260401`): `bash`, `read`, `write`,
`edit`, `glob`, `grep`, `web_fetch`, `web_search` — a 1:1 subset of
Claude Code's harness. Per-tool enable/disable via `configs[]`;
`default_config.enabled:false` for whitelist mode. Custom tools are
client-executed (Messages-API-equivalent tool-use contract) and MCP
servers attach at agent level.

### memory
**Research Preview.** Memory stores (`memstore_...`) are workspace-
scoped collections of ≤100KB text "memories" mounted per session via
`resources[].memory_store`. Up to 8 stores/session, `read_only` or
`read_write`. Agent gets `memory_{list,search,read,write,edit,delete}`
tools automatically. Every mutation creates an immutable `memver_...`
with full audit trail, optimistic concurrency via
`content_sha256`/`not_exists` preconditions, and a `redact` endpoint
for PII/secret scrubbing that keeps the audit record but nukes the
content. This is the first-class replacement for our BM25-over-BQ
long-term memory.

### files
Upload via Files API → mount at `resources[].file` with arbitrary
`mount_path` (read-only inside container, absolute paths). Up to 100
files/session. Files are resources independent of session lifecycle.
Session-scoped listing via `files.list(scope_id=sesn_...)` lets you
retrieve artifacts the agent produced. Copies into session don't
count against storage limits.

### vaults
Per-end-user credential primitive. Workspace-scoped. Holds up to 20
`credential` objects, each bound immutably to a single
`mcp_server_url`. Two auth types: `mcp_oauth` (Anthropic handles
refresh when you register `refresh.token_endpoint` + client auth
style) and `static_bearer`. Secret fields write-only, never returned.
`vault_ids[]` passed at session creation; mid-session rotation
propagates without restart. Only useful for **MCP-server auth** — not
a general secret manager (cannot inject arbitrary env vars into the
container, cannot hold non-MCP keys).

## pyfinAgent fit analysis

**1. Is it a different product surface?**
Yes. Managed Agents is a fully **server-hosted, stateful container
harness** — Anthropic runs the agent loop, the sandbox, the tool
execution, and persists event history. The Messages API we rely on
(`llm_client.py`, all 28 Gemini agents via Vertex, our MAS
orchestrator) is not replaced — Managed Agents only hosts Claude
models (4.5+) and does not support Gemini, so Layer 1 stays on
Messages/Vertex regardless.

**2. Would Layer-2 MAS or the harness cycle benefit from migration?**
*Layer-2 MAS* (`multi_agent_orchestrator.py`): mixed. Managed Agents
would give us free sandboxed bash/file tools, SSE streaming, and
server-side conversation state — but we already run these agents in
our own FastAPI process and need tight integration with BQ, paper
trader, ticket queue. Migration cost is high for modest gain.
*Harness cycle* (`scripts/harness/run_harness.py`,
`autonomous_harness.py`): **potentially high-value**. The harness is
long-running, tool-heavy, already follows Plan→Generate→Evaluate with
Claude Opus. Managed Agents natively supports: durable sessions,
resume semantics, event log = `handoff/`-equivalent, SSE streaming to
the frontend Harness tab, vault for MCP auth, memory stores for cross-
cycle learnings (replacing our `pyfinagent_data.harness_learning_log`
BQ table). The dual-evaluator pattern maps cleanly onto
`callable_agents` (multi-agent RP).

**3. Cost / retention / residency.**
No public pricing table on these pages. The container compute is
billed in addition to model inference. Rate-limited 60 create / 600
read per min per org. Data residency not discussed — assume US-only
until Anthropic documents otherwise; a blocker for any
EU-residency-sensitive data (our GCP billing export is EU; none of
our prod data is). Session archive preserves history indefinitely;
delete is hard. Memory versions accumulate forever until explicitly
deleted or redacted.

**4. Vaults vs. GCP Secret Manager / env vars.**
**Vaults are narrowly scoped to MCP-server auth.** They do not
replace Secret Manager for GCP service accounts, Slack signing
secrets, NextAuth keys, Anthropic/Gemini API keys, etc. If we ever
add user-authorized MCP servers (e.g., per-user Slack, Linear,
GitHub OAuth for the Slack bot), vaults would be the right tool and
would eliminate us writing a per-user OAuth token store. For our
current single-tenant admin-only app, no immediate relevance.

**5. Does it solve our file-based handoff problem?**
Partially, and worth serious thought for phase-4.11+. Our
`handoff/current/{contract,experiment_results,evaluator_critique}.md`
+ `harness_log.md` is essentially a hand-rolled implementation of
what Managed Agents gives natively as: session event history +
memory store + session `resources[]`. The five-file protocol is load-
bearing precisely because the Messages API has no server-side
session. If we move the harness loop onto Managed Agents, three of
the five files become server-side primitives; we'd keep
`contract.md` (human-readable plan) and `harness_log.md` (cross-cycle
summary, which maps to a memory store). **But** the protocol's
real value is Anthropic's "harness design" discipline (immutable
success criteria, dual evaluator, research gate) — which is
orthogonal to where state lives. Moving to Managed Agents would not
remove the discipline, only the file plumbing.

## MUST FIX
None. This is greenfield. Our current harness is conformant with the
Anthropic harness-design doctrine; it just uses a different storage
substrate.

## NICE TO HAVE / adoption evaluation

Ranked by ROI:

1. **Pilot the harness cycle on Managed Agents (phase-4.12 candidate).**
   Single-agent first: port `run_harness.py` GENERATE phase to a
   Managed Agent session with `agent_toolset_20260401`, attach a
   memory store in place of `harness_learning_log`, keep
   qa-evaluator and harness-verifier as local subagents until
   `callable_agents` leaves Research Preview. Expected win: kill
   zombie-worker problems, free SSE stream for the Harness tab, and
   get audited memory versioning for free. Request memory + multi-
   agent RP access via the form linked in the overview page.

2. **Adopt Anthropic pre-built skills** (xlsx/pptx/docx/pdf) for the
   Slack bot and investor-report flow. Replaces any hand-rolled
   openpyxl/python-pptx paths. Zero migration cost — just attach
   them to the agent config.

3. **Defer vaults** until we add user-facing MCP integrations.
   Current single-tenant model doesn't need them; GCP Secret Manager
   continues to cover service-account secrets.

4. **Do NOT migrate Layer-1 or Layer-2 yet.** Layer 1 is Gemini-bound;
   Managed Agents is Claude-only. Layer 2 has too much local
   orchestration (paper trader, ticket queue) to justify the
   container round-trip cost per turn.

5. **Stress-test doctrine check.** Per CLAUDE.md, "every harness
   component encodes an assumption about what the model can't do" —
   Managed Agents is Anthropic's own answer to the same question,
   so it is worth re-running a representative harness step via a
   Managed Agent session (no local five-file plumbing) and comparing
   the output quality/cost to our current run. That experiment is a
   direct test of whether our scaffolding is still load-bearing.

## References

- https://platform.claude.com/docs/en/managed-agents/overview
- https://platform.claude.com/docs/en/managed-agents/quickstart
- https://platform.claude.com/docs/en/managed-agents/agent-setup
- https://platform.claude.com/docs/en/managed-agents/sessions
- https://platform.claude.com/docs/en/managed-agents/skills
- https://platform.claude.com/docs/en/managed-agents/tools
- https://platform.claude.com/docs/en/managed-agents/memory
- https://platform.claude.com/docs/en/managed-agents/files
- https://platform.claude.com/docs/en/managed-agents/vaults
- https://www.anthropic.com/engineering/harness-design-long-running-apps
  (project canonical harness doctrine)
- `/Users/ford/.openclaw/workspace/pyfinagent/CLAUDE.md` — harness
  protocol, research-gate, stress-test doctrine
- `/Users/ford/.openclaw/workspace/pyfinagent/scripts/harness/run_harness.py`
- `/Users/ford/.openclaw/workspace/pyfinagent/backend/autonomous_harness.py`
- `/Users/ford/.openclaw/workspace/pyfinagent/backend/agents/multi_agent_orchestrator.py`