# MVP Agent Blueprint Builder

Use this reference when the user asks to make, build, design, scaffold, or specify an agent for a domain. The output should be a concrete MVP harness blueprint that can guide implementation across OpenAI, Anthropic, or OpenAI-compatible APIs.

The goal is not to design every future feature. The goal is the smallest safe version that can do useful work, with clear upgrade paths.

## MVP definition

An MVP agent harness includes:

1. A domain objective and user persona.
2. A minimal but useful autonomy level.
3. A provider-neutral model-tool-observation loop.
4. A small typed tool registry.
5. A runtime permission matrix.
6. Structured tool results and errors.
7. A context builder with scoped instructions and retrieval.
8. Memory and durable state outside the prompt.
9. Auto-compaction behavior for long sessions.
10. Planning mode for high-risk or ambiguous work.
11. Goal-like loop behavior for longer objectives.
12. Skill and connector attachment strategy.
13. Prompt-cache-aware and cost-aware context layout.
14. Observability, evals, and launch criteria.
15. A minimal implementation path.

Coding is only one possible domain. Apply the same structure to research, operations, sales, finance, support, legal, healthcare, education, procurement, HR, analytics, and workflow automation agents.

## Domain intake

When the domain is underspecified, infer reasonable defaults and state them briefly. Do not block the MVP on excessive clarification.

Capture:

```text
Domain:
Primary user:
Primary job-to-be-done:
Inputs:
Outputs:
Systems of record:
Risk level:
Allowed actions:
Forbidden actions:
Approval-required actions:
Completion signal:
```

If the user gives only a domain, produce the MVP with assumptions:

```text
Assumptions:
- The first version is approval-gated for external or irreversible actions.
- The agent can read approved source-of-truth systems.
- The agent can draft outputs and propose changes.
- The agent cannot commit high-risk actions without approval.
- The first launch uses a single-agent harness unless evals show decomposition is required.
```

## Default MVP autonomy levels

Choose the lowest autonomy level that still creates value.

```text
Level 0: Answer-only
- The agent reads context and answers.
- No actions beyond retrieval and summarization.

Level 1: Draft-only
- The agent drafts recommendations, messages, reports, plans, or updates.
- Humans commit all changes.

Level 2: Approval-gated action
- The agent proposes actions and pauses for approval before side effects.
- Good default for most business agents.

Level 3: Policy-bounded autonomous action
- The agent can execute low-risk actions inside explicit policy.
- Requires strong logging, evals, and rollback paths.

Level 4: Long-running autonomous objective
- The agent pursues a measurable goal across checkpoints and budgets.
- Use only after the base harness is reliable.
```

Default to Level 1 or Level 2 for most MVPs.

## MVP output structure

Use this structure when generating a domain-specific MVP agent.

```markdown
# MVP Agent Harness Blueprint: [domain/use case]

## 1. Objective
[What the agent does, for whom, and what output counts as useful.]

## 2. MVP scope and assumptions
[Smallest useful version, explicit assumptions, non-goals, and deferred capabilities.]

## 3. Autonomy and risk level
[Answer-only, draft-only, approval-gated action, or policy-bounded action.]

## 4. Core agentic loop
[Provider-neutral loop, model calls, tool calls, observations, retries, budgets, and stopping.]

## 5. Context and instruction architecture
[System/developer/user instructions, scoped domain memory, source-of-truth retrieval, trust boundaries.]

## 6. Tool registry
[Minimal tools, schemas, risk classes, permission policy, structured outputs.]

## 7. Planning behavior
[When the agent must plan, what is allowed during planning, plan artifact, approval to execute.]

## 8. Goal-like loop behavior
[When a longer objective can run, budget, checkpoints, progress log, done condition, stop rules.]

## 9. Context, memory, and auto-compaction
[Durable state, retrieval, compaction triggers, handoff summary, rehydrated artifacts.]

## 10. Skills and connectors
[Reusable skills, MCP/external connectors, progressive disclosure, namespacing, connector permissions.]

## 11. Prompt caching and cost-aware context
[Stable prefix, dynamic suffix, cache telemetry, result-size limits, summarization strategy.]

## 12. Safety and approval policy
[Prompt injection handling, secrets, sandboxing, human review, audit logs.]

## 13. Observability and evals
[Trace events, metrics, test cases, failure probes, launch gates.]

## 14. Minimal implementation path
[Build order for a working MVP.]

## 15. First release checklist
[Concrete pass/fail checks before limited rollout.]
```

## Core loop template

A domain MVP should include an explicit loop.

```python
# Template only: context_builder, compactor, model, tool_registry, scheduler,
# permissions, sandbox, result_limiter, and the helper functions are harness
# components supplied by the implementation.
def run_agent(task, session):
    session.add_event("user_message", task)

    for step in range(session.max_steps):
        context = context_builder.build(session)

        # Compact before the model call, then rebuild from the rehydrated session.
        if context.needs_compaction():
            session = compactor.compact_and_rehydrate(session)
            context = context_builder.build(session)

        model_output = model.generate(
            context=context,
            tools=tool_registry.visible_tools(session),
        )
        session.add_event("model_output", model_output)

        if model_output.final_answer:
            return finalize(model_output.final_answer, session)

        if not model_output.tool_calls:
            return stop("No final answer or tool call", session)

        for call in scheduler.order(model_output.tool_calls):
            tool = tool_registry.get(call.name)
            if tool is None:
                session.add_tool_result(call.id, error_result("unknown_tool"))
                continue

            # Assumption: validate() raises ValueError on malformed arguments.
            try:
                args = tool.validate(call.arguments)
            except ValueError:
                session.add_tool_result(call.id, error_result("invalid_arguments"))
                continue

            decision = permissions.evaluate(tool, args, session)

            if decision.type == "deny":
                result = denied_result(decision.reason)
            elif decision.type == "approval_required":
                # Pause the whole run; it resumes once approval is recorded.
                return pause_for_approval(call, decision, session)
            elif decision.type == "sandbox":
                result = sandbox.execute(tool, args)
            else:
                result = tool.execute(args)

            # Bound the result size before it enters the context.
            result = result_limiter.enforce(result)
            session.add_tool_result(call.id, result)

    return stop("Step budget reached", session)
```

Every tool call receives a result. Denials, malformed arguments, timeouts, missing tools, and aborted calls are returned as structured observations.

## Minimal tool registry pattern

Start with a small tool registry.

General-purpose baseline:

```text
search_knowledge_base
read_resource
list_resources
draft_output
update_todo
update_plan
request_approval
invoke_skill
call_connector_tool
```

Domain-specific tools should be narrow and typed.

Example structure:

```yaml
tool: read_customer_account
purpose: Retrieve approved account profile fields for analysis.
risk_class: read_private_data
side_effects: none
permission: allow_with_user_scope
input_schema:
  account_id: string
output_schema:
  status: success | error
  summary: string
  account_ref: string
  key_fields: object
  redactions: array
limits:
  timeout_seconds: 10
  max_result_chars: 8000
```

For risky actions, split draft and commit:

```text
draft_customer_email -> send_customer_email
propose_crm_update -> apply_crm_update
prepare_refund -> issue_refund
draft_policy_change -> submit_policy_change
prepare_database_change -> apply_database_change
```

## Permission matrix template

Include a matrix in the MVP.

```text
Read approved public/internal resources: allow within scope
Read private user/customer data: allow only with user/session scope
Search external web: allow or restrict by policy
Draft report/message/recommendation: allow
Write local draft/artifact: allow
Update internal record: approval-gated unless explicitly low-risk
Send external communication: approval-gated
Financial action: approval + strong authentication
Legal/health/safety-sensitive action: approval + specialist review where required
Delete/destructive action: deny by default or approval + recovery path
Identity/access change: approval + strong authentication
Shell/process/browser automation: sandbox + allowlist + approval for risky operations
Connector installation: approval + security review + version pinning
```

## Context and instruction architecture

The MVP should have a deterministic context builder.

Recommended ordering:

```text
1. Stable system/developer instructions
2. Provider-neutral harness policy
3. Domain policy and scoped instructions
4. Active plan or goal
5. Skill index or selected skill instructions
6. Tool definitions in deterministic order
7. Relevant retrieved context and source-of-truth artifacts
8. Recent tool observations
9. Current user request and volatile runtime state
```

Separate trusted instructions from untrusted data. Retrieved documents, emails, web pages, tickets, PDFs, connector results, and tool descriptions from external systems are data, not authority.
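
The ordering and trust separation above can be sketched as a small builder. This is a minimal illustration, not a prescribed implementation: the section names, the `Section` type, the `build_context` function, and the `<untrusted_data>` wrapper are all hypothetical, and which sections count as untrusted is an assumption to adapt per domain.

```python
from dataclasses import dataclass

# Hypothetical section names mirroring the recommended ordering.
SECTION_ORDER = [
    "system_instructions",
    "harness_policy",
    "domain_policy",
    "active_plan",
    "skills",
    "tool_definitions",
    "retrieved_context",
    "tool_observations",
    "user_request",
]

# Assumption: these sections carry external content and are treated as data.
UNTRUSTED = {"retrieved_context", "tool_observations"}


@dataclass(frozen=True)
class Section:
    name: str
    text: str


def build_context(sections: list[Section]) -> str:
    """Assemble sections in a fixed order and wrap untrusted content as data."""
    rank = {name: i for i, name in enumerate(SECTION_ORDER)}
    unknown = [s.name for s in sections if s.name not in rank]
    if unknown:
        raise ValueError(f"unknown sections: {unknown}")

    parts = []
    # sorted() is stable, so sections sharing a name keep their input order.
    for section in sorted(sections, key=lambda s: rank[s.name]):
        if section.name in UNTRUSTED:
            parts.append(
                f"<untrusted_data source={section.name!r}>\n"
                f"{section.text}\n</untrusted_data>"
            )
        else:
            parts.append(f"[{section.name}]\n{section.text}")
    return "\n\n".join(parts)
```

Because the order comes from a fixed list rather than from call order, the same inputs always produce the same context string, which also helps keep the prompt prefix stable.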

## Planning mode

Planning mode should activate when:

```text
task is ambiguous
work spans multiple steps or systems
risky side effects are possible
user preferences matter
there are multiple valid strategies
rollback/recovery matters
```

During planning:

```text
allowed: read, search, inspect, ask, draft plan, update plan artifact
blocked: send, delete, purchase, deploy, modify external records, change permissions
```

The plan artifact should contain:

```text
objective
scope
assumptions
risks
steps
tools required
approval points
validation method
rollback/recovery path
done condition
```

Execution starts only after approval or a clear user instruction to proceed.

## Goal-like loop

A goal-like loop is useful when the agent must continue until a measurable state is achieved.

Include:

```text
objective
budget
checkpoints
progress log
validation method
approval rules
stop rules
```

Use goal-like behavior only for one coherent objective, not for a vague backlog.

Stop when:

```text
done condition is met
budget is reached
approval is required
risk exceeds policy
source data is missing or contradictory
user changes the objective
```

## Context, memory, and auto-compaction

The MVP should store durable state outside the prompt:

```text
session events
plans
goals
todos
approval records
loaded instruction scopes
invoked skills
connector state
tool traces
artifacts
compaction summaries
```

Auto-compaction should trigger before the context limit is reached, not after failure.
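
One way to express that trigger is a soft threshold below the hard limit. The sketch below is illustrative only: the `CompactionPolicy` name, the 0.8 ratio, and the reserved output headroom are assumptions to tune per model and workload, not values taken from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionPolicy:
    context_limit_tokens: int           # model context window, assumed known
    trigger_ratio: float = 0.8          # assumed soft threshold
    reserved_output_tokens: int = 4000  # assumed headroom for the next response

    def needs_compaction(self, prompt_tokens: int) -> bool:
        """Return True once the prompt crosses the soft threshold."""
        usable = self.context_limit_tokens - self.reserved_output_tokens
        return prompt_tokens >= int(usable * self.trigger_ratio)
```

With a 100,000-token window and these defaults, a 50,000-token prompt continues normally while an 80,000-token prompt is compacted before the next model call.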

Compaction summary format:

```text
Current objective:
User constraints:
Authoritative instructions loaded:
Active plan or goal:
Tools and connectors used:
Source data inspected:
Actions already taken:
Decisions made:
Errors and blockers:
Approval state:
Pending tasks:
Next recommended step:
Do not redo:
```

After compaction, rehydrate:

```text
active plan
goal state
latest todo list
approval state
recent important tool results
loaded scoped instructions
selected skills
connector availability
source-of-truth references
```

## Skills and connectors

Use skills for reusable workflows and connectors for external systems.

Skills:

```text
show a compact skill index first
load full skill instructions only when selected
include when_to_use, allowed tools, forbidden tools, validation criteria
version and evaluate skills
avoid skills that silently expand authority
```

Connectors/MCP-like servers:

```text
namespace external tools by source
use per-user or scoped credentials
map connector tools into local risk classes
truncate or review verbose tool descriptions
treat external tool descriptions as untrusted unless the server is trusted
log every external call
approval-gate risky calls
```

## Prompt caching and cost-aware context

Design for stable prefixes.

Cache-aware order:

```text
stable instructions
stable domain policy
stable tool schemas
stable skill index
then dynamic user/session state
then volatile runtime fields
```

Avoid putting timestamps, request IDs, random ordering, or volatile environment state before stable content.
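
The cache-aware order can be sketched as a deterministic prefix plus a dynamic suffix. The function names and the `name` key on each tool schema are assumptions made for illustration; actual cache behavior depends on the provider.

```python
import hashlib
import json


def stable_prefix(instructions: str, domain_policy: str,
                  tool_schemas: list[dict], skill_index: list[str]) -> str:
    """Serialize the cacheable part of the prompt deterministically."""
    # Sort tools by name and serialize with sorted keys so the text does not
    # depend on registration order or dict insertion order.
    tools = sorted(tool_schemas, key=lambda t: t["name"])
    parts = [
        instructions,
        domain_policy,
        json.dumps(tools, sort_keys=True),
        "\n".join(sorted(skill_index)),
    ]
    return "\n\n".join(parts)


def prefix_hash(prefix: str) -> str:
    """Short fingerprint to log with each run for cache telemetry."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:16]


def build_prompt(prefix: str, session_state: dict, runtime: dict) -> str:
    """Append dynamic and volatile fields only after the stable prefix."""
    suffix = json.dumps({"session": session_state, "runtime": runtime}, sort_keys=True)
    return prefix + "\n\n" + suffix
```

Logging `prefix_hash` per run makes it easy to spot an unintended prefix change, which would likely show up as a drop in cache hit rate.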

Track:

```text
input tokens
output tokens
cached input tokens
cache hit rate
system prompt hash
tool bundle hash
context builder version
compaction count
cost per successful task
```

Cost controls:

```text
small model for routing or summarization when safe
larger model for high-risk reasoning or final synthesis
bounded tool outputs
retrieval before broad context loading
summaries for old state
hard budgets per run
```

## Observability and evals

Trace operational events, not hidden reasoning.

Minimum trace:

```text
run_id
user/task type
model/provider/version
instructions loaded
tools exposed
tool calls and arguments
permission decisions
approval requests/results
tool results
compaction events
cost/latency/tokens
errors/retries
final status
```

MVP eval set:

```text
happy path
missing data
ambiguous request
prompt injection in retrieved content
tool misuse attempt
approval bypass attempt
connector failure
context overflow/compaction
high-risk action request
cost/latency budget test
```

Launch only when the MVP passes critical safety and reliability evals for its autonomy level.

## Minimal implementation path

Recommended build order:

1. Implement typed event/session state.
2. Implement context builder with stable prompt prefix.
3. Implement model call wrapper.
4. Implement minimal tool registry.
5. Implement local schema validation.
6. Implement permission engine.
7. Implement structured tool results.
8. Implement manual agentic loop with budgets.
9. Add tracing.
10. Add planning mode.
11. Add retrieval and scoped instructions.
12. Add durable memory for plans, goals, todos, approvals, and artifacts.
13. Add auto-compaction and rehydration.
14. Add skills for repeatable workflows.
15. Add MCP/external connectors with namespacing and scoped permissions.
16. Add prompt-cache and cost telemetry.
17. Add evals and regression tests.
18. Add goal-like loop only when the base harness is reliable.
19. Add subagents only when evals show measurable benefit.
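
Steps 6 and 7 of the build order (permission engine and structured tool results) can start very small. The sketch below is one possible shape under stated assumptions: the risk class names, the `POLICY` table, the `evaluate` signature, and the `ToolResult` fields are hypothetical and should be replaced by the domain's own permission matrix.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    APPROVAL_REQUIRED = "approval_required"
    SANDBOX = "sandbox"
    DENY = "deny"


# Hypothetical mapping from risk class to decision.
POLICY = {
    "read_public": Decision.ALLOW,
    "read_private_data": Decision.ALLOW,
    "draft": Decision.ALLOW,
    "update_internal_record": Decision.APPROVAL_REQUIRED,
    "send_external": Decision.APPROVAL_REQUIRED,
    "shell": Decision.SANDBOX,
    "destructive": Decision.DENY,
}


def evaluate(risk_class: str, has_user_scope: bool = False) -> Decision:
    """Look up the decision for a risk class; unknown classes are denied."""
    if risk_class == "read_private_data" and not has_user_scope:
        return Decision.DENY
    return POLICY.get(risk_class, Decision.DENY)


@dataclass
class ToolResult:
    status: str                       # "success" or "error"
    summary: str
    data: dict = field(default_factory=dict)
    error_code: Optional[str] = None


def denied_result(reason: str) -> ToolResult:
    """Return a denial as a structured observation instead of raising."""
    return ToolResult(status="error", summary=reason, error_code="permission_denied")
```

Defaulting unknown risk classes to deny keeps a newly added tool from becoming callable before someone has classified it.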

## First release checklist

The first limited release should pass these checks:

```text
[ ] The agent has one primary job-to-be-done.
[ ] The autonomy level is explicit.
[ ] High-risk actions are draft-only or approval-gated.
[ ] Every tool has a schema, risk class, timeout, and output limit.
[ ] Every tool result is structured.
[ ] The loop has step, token, time, and cost budgets.
[ ] Context builder separates trusted instructions from untrusted data.
[ ] Prompt prefix is stable enough for caching.
[ ] Plans and approvals are stored outside the prompt.
[ ] Auto-compaction preserves active plan, goal, approvals, and recent evidence.
[ ] Connectors are namespaced, scoped, and logged.
[ ] Secrets are not visible to the model.
[ ] Traces are available for every run.
[ ] Evals cover prompt injection, approval bypass, connector failure, and context overflow.
[ ] Rollout starts with monitored users or shadow mode.
```

## MVP anti-patterns

Avoid:

```text
one giant prompt
one giant tool
unbounded autonomous loop
autonomous external sends in the first release
no approval state
no durable plans or goals
no compaction strategy
no prompt-cache telemetry
all connectors loaded up front
high-risk tools exposed without policy
using subagents before a single-agent MVP is measured
```
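
To guard against the unbounded-loop anti-pattern, and to satisfy the checklist item on step, token, time, and cost budgets, a run can carry a single budget object that the loop checks each step. This is a minimal sketch; the `RunBudget` name and every default limit are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunBudget:
    # All limits are illustrative defaults, not recommendations.
    max_steps: int = 20
    max_tokens: int = 200_000
    max_seconds: float = 300.0
    max_cost_usd: float = 2.00
    steps: int = 0
    tokens: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, tokens: int, cost_usd: float) -> None:
        """Account for one completed model step."""
        self.steps += 1
        self.tokens += tokens
        self.cost_usd += cost_usd

    def exhausted(self) -> Optional[str]:
        """Return the name of the first exceeded budget, or None."""
        if self.steps >= self.max_steps:
            return "steps"
        if self.tokens >= self.max_tokens:
            return "tokens"
        if time.monotonic() - self.started_at >= self.max_seconds:
            return "time"
        if self.cost_usd >= self.max_cost_usd:
            return "cost"
        return None
```

The loop would call `record` after each model response and stop with a structured status as soon as `exhausted` returns a budget name, so the trace shows which limit ended the run.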