# Gatekeeper Confirmation & Sandbox Model ## Overview DollhouseMCP's Gatekeeper enforces a multi-layer permission system on every MCP-AQL operation. This document describes how confirmation flows work, how elements control them, and how the sandbox model prevents automated abuse — including through agentic loop integrations like Zulip bridges. ## The Confirmation Flow When an operation requires confirmation (e.g., `create_element`, `execute_agent`), the flow is: 1. **LLM calls operation** → Gatekeeper evaluates 4 layers → blocks with `confirmationPending` 2. **LLM calls `confirm_operation`** → handler records the confirmation 3. **LLM retries operation** → Gatekeeper finds the session confirmation → allows This is the same flow regardless of client: Claude Code, Zulip bridge, any MCP client. ## Permission Levels Every operation has a default permission level: | Level | Behavior | Operations | |---|---|---| | **AUTO_APPROVE** | No confirmation needed | All reads, search, list, activate, deactivate, introspect | | **CONFIRM_SESSION** | One confirmation unlocks for the entire session | create, import, install, submit, sync, auth setup | | **CONFIRM_SINGLE_USE** | Fresh confirmation required every time | edit, delete, clear, execute_agent, abort | | **DENY** | Hard-blocked, cannot be confirmed | (none by default — elements can add) | ## Element Policy Controls Active elements (personas, skills, agents, ensembles) can declare gatekeeper policies that modify these defaults: ```yaml gatekeeper: # Internal operations (MCP-AQL layer) allow: - read_* - list_* - search_* - get_* confirm: - edit_* - update_* - execute_agent deny: - delete_* - clear_* # External tool calls (CLI layer) externalRestrictions: description: "Read-only development session" allowPatterns: - "Read:*" - "Glob:*" - "Grep:*" confirmPatterns: - "Edit:*" - "Write:*" - "Bash:git push*" denyPatterns: - "Bash:rm *" - "WebSearch:*" ``` Priority: **element deny > element confirm > element allow > route default** Both systems (internal and external) are evaluated across all active elements on every request. A single element definition controls both the MCP operation surface and the external tool surface. ### Choose the Right Policy Surface Use the two policy surfaces differently: - `gatekeeper.allow / confirm / deny` - For Dollhouse / MCP-AQL operation names such as `read_*`, `edit_*`, `delete_element`, or `execute_agent` - `gatekeeper.externalRestrictions.allowPatterns / confirmPatterns / denyPatterns` - For external tool or hook patterns such as `Read:*`, `Edit:*`, `Write:*`, `Bash:git status*`, or `Bash:rm *` `externalRestrictions` must stay nested under `gatekeeper`, and it must include a non-empty `description`. ### Directly Usable Profiles #### Read-only analyst ```yaml gatekeeper: allow: - read_* - list_* - search_* - get_* - browse_* - query_* deny: - create_* - edit_* - update_* - delete_* - clear_* - execute_agent externalRestrictions: description: "Read-only shell and tool usage" allowPatterns: - "Read:*" - "Glob:*" - "Grep:*" denyPatterns: - "Edit:*" - "Write:*" - "Bash:*" - "WebSearch:*" ``` #### Confirm writes, deny destructive shell commands ```yaml gatekeeper: allow: - read_* - list_* - search_* - get_* confirm: - create_* - edit_* - update_* deny: - delete_* externalRestrictions: description: "Confirm writes, block high-risk external actions" allowPatterns: - "Read:*" - "Glob:*" - "Grep:*" confirmPatterns: - "Edit:*" - "Write:*" - "Bash:git push*" denyPatterns: - "Bash:rm *" - "Bash:sudo *" - "WebFetch:*" ``` ## The Sandbox Model ### Nuclear Sandbox: `deny: ['confirm_operation']` When any active element has `deny: ['confirm_operation']` in its gatekeeper policy, **ALL confirmations are blocked**. The LLM cannot unlock any gated operation — the session becomes effectively read-only. Operations still available: - All AUTO_APPROVE operations (read, search, list, activate, deactivate, export, introspect) Operations blocked: - All CONFIRM_SESSION operations (create, import, install, submit, sync) - All CONFIRM_SINGLE_USE operations (edit, delete, clear, execute_agent, abort) The human can restore confirmation capability by deactivating the sandboxing element (deactivate_element is AUTO_APPROVE). **This is critical for safety.** See "Why This Matters" below. ### Granular Lockdown: `deny: ['specific_operation']` Denying specific operations creates targeted restrictions: ```yaml gatekeeper: deny: - execute_agent # Cannot run agents - delete_element # Cannot delete anything ``` `confirm_operation` cannot override a deny. The operation is hard-blocked regardless of confirmation state. The element must be deactivated or edited to restore access. ### Advisory: `confirm: ['confirm_operation']` When an active element has `confirm: ['confirm_operation']`, confirmations still work but the response includes an advisory message: *"Note: persona 'careful-reviewer' requests additional scrutiny for confirmations."* This surfaces the element author's intent for human awareness without blocking the flow. ## Why This Matters: The Session-Allow Problem MCP clients like Claude Code have their own permission system. When the LLM calls an MCP tool, the client asks the user: "Allow once" / "Allow for session" / "Deny." **If the user clicks "Allow for session" for the MCP AQL execute endpoint**, the client stops asking. All subsequent tool calls go through automatically. At this point, **the DollhouseMCP Gatekeeper is the only protection** between the LLM and destructive operations. Without a sandbox, the LLM can automate the confirmation loop: 1. Call `execute_agent` → gatekeeper blocks 2. Call `confirm_operation` → gatekeeper records confirmation 3. Retry `execute_agent` → passes 4. Repeat indefinitely The human clicked "allow for session" once and isn't reviewing individual operations. The LLM rubber-stamps its own confirmations. **`deny: ['confirm_operation']` breaks this loop.** Step 2 fails — the sandbox blocks the confirmation. The LLM cannot proceed regardless of what the MCP client allows. This is the same class of problem that caused the widely-reported incident where an LLM destroyed 2.5 years of Terraform infrastructure in an automated session. ## Agentic Loop Flow (Zulip Bridge, etc.) In bridge integrations, the bridge IS the human-in-the-loop: 1. Agent calls `execute_agent` → Gatekeeper blocks → `confirmationPending` 2. Block bubbles up through the agentic loop 3. Bridge surfaces the request to the human (e.g., Zulip chat message) 4. Human approves → bridge calls `confirm_operation` on behalf of the human 5. Handler evaluates the TARGET operation with **full element policies** (deny/confirm/allow) 6. If target is denied → hard refusal, bridge reports denial to human 7. If target needs confirmation → records it, agent retries → passes 8. For CONFIRM_SINGLE_USE operations → confirmation consumed, next call needs fresh approval **Key architectural property:** The `confirm_operation` handler skips element policies for itself (preventing cascading loops) but evaluates them fully for the target operation. This means element policies like `deny: ['execute_agent']` are respected through the bridge — the bridge cannot override element-level denials. The sandbox (`deny: ['confirm_operation']`) also works through bridges — if the sandboxing element is active, the bridge's confirm call is rejected, and the human is informed that the session is sandboxed. ## Interactive Session Flow In direct MCP client sessions (Claude Code, etc.): 1. LLM calls operation → Gatekeeper blocks → failure message with human-readable summary: *"Approval needed: Create a new agent called 'code-reviewer'"* 2. LLM calls `confirm_operation` → `skipElementPolicies` in primary enforcement ensures this always reaches the handler 3. Handler checks for nuclear sandbox (`deny: ['confirm_operation']` on any active element) → if found, hard refusal with clear message 4. Handler checks for advisory (`confirm: ['confirm_operation']` on any active element) → if found, advisory included in response 5. Handler evaluates target operation with full element policies → determines confirmation level 6. Records confirmation → returns human-readable summary with advisory if applicable 7. LLM retries → passes ## Implementation Details ### Two Related Operation Sets **UNGATABLE_OPERATIONS**: Operations that must never appear in element policies. Pure internal plumbing — `verify_challenge`, `approve_cli_permission`, `permission_prompt`. Stripped from all policy lists during sanitization. **GATEKEEPER_INFRA_OPERATIONS**: Superset of UNGATABLE_OPERATIONS plus `confirm_operation`. Operations that skip Layer 2 (element policy evaluation) in the primary enforcement path. `confirm_operation` is in this set because its element policies are enforced through a separate code path in the confirm handler, not through the normal `enforce()` flow. ### canBeElevated: false Some operations have `canBeElevated: false` in their route policy, meaning element `allow` lists cannot elevate them to AUTO_APPROVE. This includes: - `execute_agent` — always CONFIRM_SINGLE_USE - `delete_element` — always CONFIRM_SINGLE_USE - `clear` — always CONFIRM_SINGLE_USE - `confirm_operation` — always AUTO_APPROVE (cannot be downgraded by allow) This is a server-side invariant that element authors cannot override. ### Parallel Enforcement | Layer | Internal (MCP-AQL) | External (CLI tools) | |---|---|---| | **Policy source** | `gatekeeper.allow/confirm/deny` | `gatekeeper.externalRestrictions.allowPatterns/confirmPatterns/denyPatterns` | | **Enforcement** | `Gatekeeper.enforce()` | `ToolClassification.evaluateElementPolicies()` | | **Priority** | deny > confirm > allow > route default | deny > confirm > allow > static classification | | **Scope** | All active elements evaluated | All active elements evaluated | | **Export** | Via MCP-AQL introspection | Via `PolicyExportService` to bridge imports | Both systems are defined in the same element, enforced per-request, and respect the same priority hierarchy.