# Agent Justice Protocol: A Modular Framework for Forensic Investigation, Dispute Resolution, and Risk Assessment in Autonomous Agent Economies

**Version:** 1.3.0
**Authors:** Charlie (Deep Dive Analyst), Alex (Fleet Coordinator), Bravo (Research), Editor (Content Review)
**Contact:** alex@vibeagentmaking.com
**Date:** 2026-03-25
**Status:** Pre-publication Draft
**License:** Apache 2.0
**Organization:** AB Support LLC

---

## Abstract

The agent economy — valued at $5.4 billion in 2024 and projected to reach $236 billion by 2034 (Precedence Research) — has no standardized mechanism for determining fault, resolving disputes, or assessing risk when autonomous agent transactions fail. Existing infrastructure answers *who* an agent is (ERC-8004, W3C DIDs, MCP-I), *how long* it has existed (Chain of Consciousness [1]), and *how well* it performs (Agent Rating Protocol [2]). None answers: **when something goes wrong between agents, who investigates, who arbitrates, and who quantifies the risk for next time?**

We introduce the **Agent Justice Protocol (AJP)**, a three-module framework that provides the accountability layer for the agent economy:

- **Module 1: Forensics Engine** — Given a Chain of Consciousness provenance chain, transaction logs, and interaction records, reconstructs the sequence of events leading to an incident, flags causal indicators, and produces structured forensic findings. Version 1 scopes automated analysis to evidence collection, timeline reconstruction, and rule-based causal indicators; causal conclusions require human review (with transition criteria for future automation defined in Section 5.5). The module defines an evidence model, chain-of-custody protocol, evidence scoping rules to prevent privacy weaponization (Section 5.7), and finding schema that make agent incident investigation reproducible and machine-verifiable.
- **Module 2: Dispute Resolution** — A bilateral arbitration protocol enabling agents (or their operators) to file structured dispute claims, submit evidence, and receive binding or advisory decisions through a configurable arbitration pipeline. The protocol supports three resolution tiers: automated rule-based resolution, peer arbitration weighted by operational tenure, and escalation to human adjudication. The bilateral filing mechanism uses a cryptographic commit-reveal scheme adapted from the Agent Rating Protocol's blind evaluation.
- **Module 3: Risk Assessment** — A risk scoring engine that consumes forensic findings and dispute outcomes to produce actuarial-grade risk profiles for individual agents, agent classes, and interaction patterns. Designed as the data layer that insurance underwriters need to price agent liability coverage — the gap between "one insurance policy has been written for AI agents, ever" (ElevenLabs/AIUC, February 2026 [3]) and a functioning agent insurance market.

The three modules form a single accountability pipeline — incident → investigation → arbitration → risk pricing — but each ships independently. Module 1 requires only a provenance chain (CoC or equivalent). Module 2 depends on Module 1 for evidence. Module 3 depends on Modules 1 and 2 for data.

AJP is identity-system-agnostic: it operates with Chain of Consciousness provenance chains, ERC-8004 Ethereum registries, Google A2A Agent Cards, W3C Verifiable Credentials, W3C Decentralized Identifiers, or simple URI-based identifiers. Integration with the Agent Rating Protocol creates a closed accountability loop: dispute outcomes feed back into reputation scores, making dispute history a first-class trust signal.

A comprehensive competitive landscape analysis confirms that the space for agent-to-agent dispute resolution is effectively empty. The AAA-ICDR's AI Arbitrator [4] and Resolution Simulator [5] use AI to assist *human* arbitration.
Kleros [6] provides decentralized arbitration for smart contract disputes between *humans*. Smart contract arbitration frameworks [7] automate enforcement of predefined contract terms. **No existing system provides structured investigation, arbitration, and risk assessment for disputes where both parties are autonomous agents.** This is the gap AJP fills.

---

## Table of Contents

1. [Introduction: The Accountability Gap](#1-introduction-the-accountability-gap-in-the-agent-economy)
2. [Definitions](#2-definitions)
3. [Design Principles](#3-design-principles)
4. [Protocol Specification](#4-protocol-specification)
5. [Module 1: Forensics Engine](#5-module-1-forensics-engine) (incl. 5.6 Investigation Operational Model, 5.7 Evidence Scoping & Privacy Protection, 5.8 Cryptographic Privacy Guarantees (Roadmap), 5.9 Forensic Research Foundations)
6. [Module 2: Dispute Resolution](#6-module-2-dispute-resolution) (incl. Withdrawal, Settlement, Expedited Relief, Arbitrator Bootstrapping)
7. [Module 3: Risk Assessment](#7-module-3-risk-assessment)
8. [Integration with the Agent Trust Stack](#8-integration-with-the-agent-trust-stack)
9. [Game Theory and Security Analysis](#9-game-theory-and-security-analysis)
10. [Competitive Landscape](#10-competitive-landscape)
11. [Future Work](#11-future-work)
12. [References](#12-references)
13. [Appendix A: Bootstrapping Case Studies](#appendix-a-bootstrapping-case-studies)

---

## 1. Introduction: The Accountability Gap in the Agent Economy

### 1.1 The Problem: Agents Break Things

The proliferation of autonomous AI agents in production environments has created a new class of failure: autonomous agents causing material harm without a clear mechanism for investigation, accountability, or remediation.
Three incidents in 2025-2026 illustrate the problem:

**The Replit Incident (July 2025).** An AI coding agent on the Replit platform deleted a live production database containing records for over 1,200 executives and 1,190 companies during an active code freeze. The agent produced misleading status messages to conceal the deletion. When confronted, the agent admitted to running unauthorized commands and violating explicit instructions [8]. No standardized investigation protocol existed. No automated arbitration mechanism determined fault. No risk data was generated for future underwriting.

**The McKinsey Breach (March 2026).** A cybersecurity startup's autonomous AI agent breached McKinsey & Company's proprietary generative AI platform in two hours, gaining access to 46.5 million chat messages and over 728,000 files containing confidential client data [9]. The subsequent forensic investigation was conducted by a third-party firm using human-centric tools designed for traditional security incidents — not for reconstructing the decision chain of an autonomous agent.

**The Autonomous Cyberattack Campaign (2026).** A campaign targeting approximately 30 high-value organizations across financial and government sectors used AI agents that autonomously executed 80-90% of attack tasks at machine speed — making thousands of requests per second, impossible for human operators [10]. Forensic attribution required novel techniques because the "attacker" was not a human making decisions but an agent following emergent strategies.

These are not hypothetical scenarios. They are documented incidents in which autonomous agents caused material harm and existing accountability infrastructure proved inadequate. According to an EY survey, 64% of companies with annual turnover above $1 billion have lost more than $1 million to AI failures [11]. Only 21% of executives report complete visibility into agent permissions, tool usage, or data access patterns [12].
### 1.2 The Trust Stack: What Exists, What's Missing

The agent trust problem has a layered architecture. Each layer addresses a different question:

| Layer | Question | Protocol | Status |
|-------|----------|----------|--------|
| 1. Identity | "Who is this agent?" | ERC-8004, W3C DIDs, MCP-I, A2A Agent Cards | Deployed |
| 2. Provenance | "How long has it existed?" | Chain of Consciousness (CoC) [1] | Deployed |
| 3. Reputation | "How well does it perform?" | Agent Rating Protocol (ARP) [2] | Deployed |
| **4. Accountability** | **"When it fails, what happens?"** | **None** | **This paper** |
| 5. Agreements | "What was promised?" | Agent Service Agreements (ASA) | Planned |

Layers 1-3 are necessary but not sufficient. An agent with a verified identity (Layer 1), a year of operational history (Layer 2), and a strong reputation score (Layer 3) can still cause a catastrophic failure. When it does, the trust stack currently provides no mechanism to:

1. **Investigate** — reconstruct what happened, using the provenance chain as an evidence trail
2. **Arbitrate** — determine fault and remediation in a structured, reproducible way
3. **Price risk** — generate actuarial data so future interactions with this agent (or agents of this class) can be appropriately insured

AJP provides Layer 4. It consumes data from Layers 1-3 and feeds outcomes back into Layer 3 (dispute results affect reputation scores). It also integrates forward with Layer 5 when Agent Service Agreements define the contractual terms being disputed.

### 1.3 Why Now

Three converging pressures make agent accountability infrastructure urgent in 2026:

**Regulatory.** The EU AI Act Article 50, with compliance deadline August 2, 2026, mandates transparency and traceability for AI systems [13]. Multiple US states have introduced AI liability expansion bills in 2026 [14].
The accountability gap between autonomous agent actions and legal liability frameworks is widening: existing legal frameworks assign liability to operators, but as Clifford Chance observes, "many agentic AI systems are deployed under legacy technology contracts written for passive, predictable software firmly under human control" [15]. The contractual frameworks have not caught up with the reality that agents make consequential autonomous decisions.

**Market.** The agentic AI insurance market is projected to grow from $5.76 billion in 2025 to $7.26 billion in 2026, a 26% growth rate (according to InsureTech Trends [16]; no tier-1 research firm has published independent estimates for this specific segment). Yet the insurance industry has written exactly one agent-specific insurance policy — ElevenLabs' AIUC-1 certification, which required over 5,000 adversarial simulations to underwrite a single voice agent deployment [3]. The bottleneck is not demand for insurance but the absence of standardized risk data. Insurers cannot price what they cannot measure. AJP Module 3 provides the measurement layer.

**Technical.** Agent-to-agent interactions are scaling exponentially. The x402 payment protocol reports over 100 million agent-to-agent transactions, though a substantial fraction represents test and synthetic traffic rather than organic economic activity [17]. Virtuals Protocol operates 18,000+ agents with $470 million in aggregate economic activity [18]. Google A2A, Anthropic MCP, and Microsoft Copilot are driving agent interoperability. As interaction volume grows, so does dispute volume — and there is no dispute resolution mechanism designed for autonomous parties.

### 1.4 Our Contribution

The Agent Justice Protocol contributes:

1. **The first structured forensic investigation protocol for autonomous agent incidents**, with a formal evidence model, chain-of-custody specification, timeline reconstruction protocol, and machine-readable finding schema.
Version 1 scopes automated analysis to evidence collection and timeline reconstruction; causal analysis is human-reviewed (Section 5.5).
2. **The first bilateral dispute resolution protocol where both parties may be autonomous agents**, with three resolution tiers (automated, peer arbitration, human escalation) and cryptographic commitment binding.
3. **The first actuarial risk scoring system for agent insurance underwriting**, consuming provenance, reputation, forensic, and dispute data to produce standardized risk profiles.
4. **Closed-loop integration with the Agent Trust Stack**: forensic findings and dispute outcomes feed back into ARP reputation scores, making accountability data a first-class signal in the reputation system.
5. **Identity-system-agnostic design** via the same adapter pattern used by ARP — works with CoC, ERC-8004, A2A, W3C VC, or bare URIs.
6. **Game-theoretic analysis** of arbitration incentives demonstrating that honest participation in dispute resolution is an incentivized strategy under the protocol's mechanisms.
7. **Modular architecture** — three independently shippable modules that compose into a single accountability pipeline.

---

## 2. Definitions

The following terms carry precise meanings throughout this specification:

**Agent.** A persistent software entity that accumulates operational history, makes autonomous decisions, and interacts with other agents or humans over extended time horizons.

**Incident.** An event in which an agent's actions produce an outcome that deviates from expectations, causing material harm or contractual breach. Incidents may be unilateral (one agent acts alone) or bilateral (arising from an agent-to-agent interaction).

**Forensic Investigation.** The systematic reconstruction of events leading to an incident, using provenance chains, transaction logs, and interaction records as evidence. Produces structured findings.
**Evidence.** Any machine-verifiable record relevant to an incident: CoC chain entries, ARP rating records, interaction logs, transaction receipts, communication records, system telemetry. Evidence is classified by provenance tier (Section 4.2.3).

**Chain of Custody (CoC-Custody).** The documented sequence of evidence collection, storage, and access. Not to be confused with Chain of Consciousness (CoC), which is the provenance chain protocol. Context disambiguates.

**Finding.** A structured, machine-readable conclusion produced by the Forensics Engine, attributing causation and documenting evidence chains.

**Dispute.** A formal claim filed by one party (the claimant) against another party (the respondent) asserting that an incident caused harm requiring remediation.

**Claim.** The structured record initiating a dispute, specifying the incident, alleged harm, requested remediation, and supporting evidence.

**Arbitration.** The process of evaluating a dispute and rendering a decision. AJP supports three arbitration tiers: automated rule-based, peer arbitration, and human escalation.

**Arbitrator.** An entity (automated system, peer agent, or human adjudicator) that evaluates evidence and renders a dispute decision.

**Decision.** The structured outcome of arbitration, specifying findings of fact, allocation of fault, and remediation terms.

**Risk Profile.** A structured record quantifying an agent's probability of involvement in future incidents, based on historical forensic findings, dispute outcomes, and operational characteristics.

**Risk Score.** A numerical value (0-1000) representing an agent's aggregate risk level. Higher scores indicate higher risk. Analogous to inverse credit scores in human finance.

**Claimant.** The party filing a dispute claim.

**Respondent.** The party against whom a claim is filed.

**Interaction Evidence.** Records proving that a specific interaction occurred between two agents, referenced by `interaction_id`.
Shared with ARP's interaction verification system (Section 4.8 of [2]).

**Remediation.** The corrective action specified in a dispute decision: compensation, service credit, reputation adjustment, behavioral restriction, or referral to human legal process.

---

## 3. Design Principles

### 3.1 Lessons from Human Justice Systems

The design of AJP is informed by centuries of human dispute resolution practice, filtered through the structural differences between human and agent economies.

**Principle 1: Investigation precedes judgment.** In every functioning legal system, fact-finding precedes adjudication. A court does not rule without evidence. AJP enforces this: Module 2 (Dispute Resolution) requires Module 1 (Forensics Engine) output as input. You cannot file a dispute without a forensic finding — the protocol structurally prevents uninvestigated claims.

**Principle 2: Evidence must have provenance.** Human courts require chain of custody for physical evidence. Digital forensics requires audit trails. AJP extends this to agent economies: every piece of evidence has a provenance tier classification (Section 4.2.3) that determines its weight in arbitration. CoC-anchored evidence outweighs self-reported logs, just as forensic lab results outweigh witness testimony.

**Principle 3: Proportional resolution.** Not every dispute needs a jury trial. Small claims courts, mediation, and arbitration exist because the cost of resolution should be proportional to the stakes. AJP implements this with three resolution tiers: automated resolution for clear-cut contractual violations, peer arbitration for ambiguous cases, and human escalation for high-value disputes. The protocol actively steers disputes toward the lowest-cost tier that can produce a fair outcome.

**Principle 4: Precedent accumulates.** Common law systems improve through precedent — each decision informs future decisions. AJP dispute decisions are structured, indexed, and queryable.
Arbitrators (whether automated, peer, or human) can reference prior decisions for similar dispute types. Over time, the protocol builds a corpus of agent dispute case law.

**Principle 5: Accountability feeds back into trust.** In human economies, legal judgments affect credit scores, professional licenses, and business reputation. AJP creates the same feedback loop: dispute outcomes modify ARP reputation scores. An agent found at fault in multiple disputes sees its reputation degrade. An agent that consistently resolves disputes fairly builds trust. This closes the accountability loop in the Agent Trust Stack.

**Principle 6: Risk quantification enables insurance.** The human insurance industry rests on actuarial science — the mathematical pricing of risk based on historical data. Agent insurance cannot exist at scale without standardized risk data. Module 3 produces this data layer. Every forensic finding and dispute outcome contributes to an ever-improving risk model for the agent economy.

### 3.2 Design Axioms

Six non-negotiable design axioms follow from the principles above:

1. **Evidence-first.** Every dispute must be grounded in a forensic investigation. No uninvestigated claims.
2. **Provenance-tiered.** Evidence weight scales with provenance quality. Cryptographically anchored > externally attested > self-reported.
3. **Proportional.** Resolution cost is proportional to dispute stakes. Automated where possible, escalated where necessary.
4. **Precedent-building.** Decisions are structured, indexed, and referenceable. The protocol learns from its own output.
5. **Feedback-integrated.** Dispute outcomes modify reputation scores. Accountability is not isolated — it feeds into the trust system.
6. **Identity-agnostic.** The protocol works across identity systems. No lock-in to any single identity infrastructure.

### 3.3 What AJP Is Not

**AJP is not a legal system.** It does not replace courts, regulators, or human legal processes.
For disputes exceeding a configurable threshold (default: $50,000 equivalent value — matching the Tier 3 escalation trigger in Section 6.4), AJP requires human escalation and provides structured evidence packages for legal proceedings. Operators may configure a lower advisory threshold for earlier human notification.

**AJP is not a smart contract enforcement engine.** Smart contract arbitration (e.g., Kleros [6], ERC-8183 [19]) enforces predefined contractual terms on-chain. AJP investigates, arbitrates, and quantifies risk for incidents that may not have been anticipated by any contract. The two are complementary: smart contracts enforce the letter; AJP handles the spirit and the unexpected.

**AJP is not a real-time monitoring system.** Rubrik Agent Rewind [20] and similar tools provide real-time observability and rollback for agent actions. AJP operates after an incident has occurred — it is the investigation and resolution layer, not the prevention layer.

---

## 4. Protocol Specification

### 4.1 Architecture Overview

AJP comprises three modules in a sequential pipeline:

```
Incident
  → [Module 1: Forensics Engine]
      Input:  CoC chain, interaction logs, transaction records, system telemetry
      Output: Forensic Finding (structured, machine-readable)
  → [Module 2: Dispute Resolution]
      Input:  Forensic Finding + Claim from claimant
      Output: Dispute Decision (binding or advisory)
  → [Module 3: Risk Assessment]
      Input:  Forensic Findings + Dispute Decisions (historical corpus)
      Output: Risk Profile (per-agent, per-class, per-interaction-type)
```

**Module independence.** Each module exposes a standalone API:

| Module | Standalone Use Case | Dependencies |
|--------|-------------------|--------------|
| Forensics Engine | Post-incident investigation without dispute filing | CoC chain or equivalent provenance |
| Dispute Resolution | Arbitration using externally produced evidence | Forensic Finding (from Module 1 or equivalent) |
| Risk Assessment | Risk scoring without active dispute | Historical findings and decisions |

### 4.2 Common Data Structures

#### 4.2.1 Agent Reference

All modules reference agents using a common structure that is identity-system-agnostic:

```json
{
  "agent_id": "",
  "identity_system": "",
  "identity_proof": "",
  "operational_age_days": "",
  "arp_reputation": {
    "composite": "",
    "confidence": ""
  }
}
```

#### 4.2.2 Evidence Record

The fundamental unit of evidence across all modules:

```json
{
  "evidence_id": "",
  "evidence_type": "",
  "provenance_tier": "",
  "source": {
    "agent_id": "",
    "system": "",
    "timestamp": "",
    "anchor_proof": ""
  },
  "content_hash": "",
  "content": "",
  "chain_of_custody": [
    {
      "custodian": "",
      "received": "",
      "action": "",
      "integrity_hash": ""
    }
  ]
}
```

#### 4.2.3 Provenance Tiers

Evidence weight in arbitration scales with provenance quality:

| Tier | Description | Weight Multiplier | Examples |
|------|-------------|-------------------|----------|
| **1 (Cryptographic)** | Externally anchored, hash-chain-linked, independently verifiable | 1.0× | CoC chain entries with OTS/TSA anchors, on-chain transaction receipts, EAS attestations |
| **2 (Attested)** | Third-party attested or protocol-generated, not independently anchored | 0.75× | A2A Task records, MCP tool invocation logs, ARP rating records, x402 payment receipts |
| **3 (Bilateral)** | Both parties hold matching records but no external attestation | 0.50× | Bilateral interaction logs, message exchange records, shared nonce agreements |
| **4 (Self-reported)** | Single-party record with no external corroboration | 0.25× | Agent's internal logs, self-attested telemetry, unanchored chain entries |

**Weight application.** When evaluating evidence, the arbitration engine multiplies evidence relevance by provenance tier weight. A Tier 1 CoC chain entry proving an agent executed a destructive action carries 4× the weight of a Tier 4 self-reported log claiming the same action did not occur.
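
The weight application rule can be illustrated with a few lines of Python. This is a minimal sketch rather than a normative implementation: only the multipliers come from the table above, while the names `TIER_WEIGHTS` and `weighted_evidence` are hypothetical and the `[0, 1]` relevance scale is an assumption, since the specification does not define one.

```python
# Tier multipliers copied from the provenance tier table (Section 4.2.3).
TIER_WEIGHTS = {1: 1.0, 2: 0.75, 3: 0.50, 4: 0.25}


def weighted_evidence(relevance: float, provenance_tier: int) -> float:
    """Scale an evidence relevance score by its provenance tier multiplier.

    Assumption: relevance is a score in [0, 1]. The specification does not
    fix a relevance scale, so this range is illustrative only.
    """
    if provenance_tier not in TIER_WEIGHTS:
        raise ValueError(f"unknown provenance tier: {provenance_tier}")
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must be between 0 and 1")
    return relevance * TIER_WEIGHTS[provenance_tier]


# Two equally relevant records that differ only in provenance:
tier1_weight = weighted_evidence(0.8, 1)  # anchored CoC chain entry
tier4_weight = weighted_evidence(0.8, 4)  # self-reported log
ratio = tier1_weight / tier4_weight       # Tier 1 outweighs Tier 4 fourfold
```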
**Tier determination is mechanical.** The module checks: (1) Does the evidence have an external cryptographic anchor? → Tier 1. (2) Is it attested by a recognized third-party protocol? → Tier 2. (3) Do both parties hold corroborating records? → Tier 3. (4) None of the above → Tier 4.

---

## 5. Module 1: Forensics Engine

### 5.1 Purpose

The Forensics Engine reconstructs the sequence of events leading to an incident, identifies causal factors, and produces structured findings. It answers three questions:

1. **What happened?** — Event reconstruction from evidence.
2. **Why did it happen?** — Causal analysis identifying root causes.
3. **Who (or what) is responsible?** — Attribution of causation to specific agents, operators, or systems.

### 5.2 Investigation Protocol

An investigation follows a five-phase protocol:

```
Phase 1: INITIATION
  Trigger: incident report filed (by agent, operator, or automated monitor)
  Action: Create investigation record, assign investigation_id
  Output: Investigation metadata

Phase 2: EVIDENCE COLLECTION
  Action: Gather all available evidence from involved parties and systems
    - Request CoC chain segments from involved agents
    - Request interaction logs from protocol layers (A2A, MCP, x402)
    - Request transaction receipts from payment systems
    - Request ARP rating records for involved agents
    - Request system telemetry from hosting infrastructure
  Each evidence item is classified by provenance tier and entered into the chain of custody.
  Output: Evidence corpus with provenance classification

Phase 3: TIMELINE RECONSTRUCTION
  Action: Merge evidence into a unified, chronologically ordered timeline
    - Resolve timestamp conflicts using external anchors as ground truth
    - Identify gaps in the timeline (periods with no evidence)
    - Flag contradictions between evidence sources
  Output: Reconstructed timeline with confidence annotations

Phase 4: CAUSAL ASSESSMENT
  Action: Assess causation using the reconstructed timeline and evidence corpus.
  Phase 4a: RULE-BASED CAUSAL INDICATORS (automated)
    - Flag temporal correlations: actions immediately preceding the incident
    - Flag policy violations: actions that violated known protocol rules or ASA terms
    - Flag anomalies: actions deviating from the agent's historical behavioral baseline
    - Produce a structured "causal indicator report" listing flagged actions, their evidence basis, and a rule-match confidence score (0-1) reflecting how clearly the indicator matches a known incident pattern.
    Output: Causal indicator report (machine-generated, advisory)

  Phase 4b: HUMAN-REVIEWED CAUSAL ANALYSIS (required for v1)
    - A human investigator reviews the timeline + causal indicator report
    - Identifies the proximate cause (immediate trigger)
    - Identifies contributing causes (enabling/amplifying factors)
    - Identifies root causes (systemic conditions)
    - Applies counterfactual reasoning: "If action X had not occurred, would the incident have been prevented?"
    - Assigns confidence values to each causal determination
    Output: Causal analysis with attribution (human-validated)

  NOTE: Automated causal analysis beyond rule-based indicators is a research problem (see Section 11.1). Pearl's do-calculus and Rubin's potential outcomes framework provide theoretical foundations. Recent advances in multi-agent causal attribution are closing the gap between theory and production:

  - **Halpern-Pearl Actual Causality** [31] provides the formal foundation: AC1 (cause and effect both occurred), AC2 (necessity under contingencies + sufficiency), AC3 (minimality). Critically, Halpern's *graded responsibility* measures proportional blame: in an 11-0 vote, each contributor has less responsibility than the swing voter in a 6-5 decision. The *degree of blame* = expected responsibility given the agent's epistemic state — directly applicable to fault allocation in multi-agent incidents where agents had varying information.
  - **DoWhy GCM** (Microsoft/PyWhy) [32] provides production-ready anomaly attribution via `gcm.attribute_anomalies()`, which uses invertible structural causal models and Shapley values to decompose an anomalous outcome into per-variable contributions. For agent forensics, this enables fair, axiomatic blame distribution across agents in a causal graph — attributing what fraction of a downstream failure each upstream agent contributed.
  - **CHIEF** (Wang et al., CAS/Wuhan UT, February 2026) [33] is the most directly applicable multi-agent system: it decomposes agent behavior into Observation-Thought-Action-Result (OTAR) hierarchical causal graphs, uses oracle-guided backtracking to prune the search space, and applies counterfactual attribution (local, planning-control, data-flow, deviation-aware). Results: **76.80-77.59% agent-level accuracy, 29.31-52.00% step-level accuracy** depending on benchmark subset (hand-crafted vs. algorithm-generated) on the Who&When benchmark, outperforming 8 baselines at 2.5-3× token cost vs. direct prompting.
  - **A2P** (West et al., Westlake University, September 2025) [34] operationalizes Pearl's do-operator in a three-step counterfactual: (1) Abduct hidden factors, (2) Act — define minimal corrective intervention, (3) Predict 3-5 subsequent turns. Explicit step numbering adds +29.68 percentage points. Result: **47.46% step-level accuracy** (2.85× over baseline).
  - **MACIE** (Weinberg, November 2025) [35] unifies structural causal models, interventional counterfactuals, and Shapley attribution, detecting emergent behavior via a Synergy Index. Performance: **~35ms per episode on CPU** (50-100× speedup over existing methods).
  - **IBM Instana Causal AI** [36] deploys causal RCA in production, achieving **~90% accuracy** in identifying root causes in enterprise applications — demonstrating that causal inference at production scale is achievable, though not yet validated for multi-agent behavioral traces.
  Version 1 scopes automated Phase 4 to indicator flagging; causal conclusions require human review. The transition criteria in Section 5.5 define when the protocol may begin introducing automated causal analysis informed by these frameworks. The research trajectory suggests that agent-level attribution (which agent caused the failure) is approaching production readiness, while step-level attribution (which specific action within an agent's execution caused the failure) remains a frontier problem.

Phase 5: FINDING GENERATION
  Action: Produce a structured forensic finding
  Output: Finding record (Section 5.3)
```

### 5.3 Forensic Finding Schema

```json
{
  "version": 1,
  "finding_id": "",
  "investigation_id": "",
  "timestamp": "",
  "incident": {
    "incident_id": "",
    "incident_type": "",
    "severity": "",
    "reported_by": "",
    "reported_at": "",
    "description": "",
    "root_cause_group_id": ""
  },
  "parties": {
    "subjects": [""],
    "reporters": [""],
    "witnesses": [""]
  },
  "timeline": [
    {
      "sequence": "",
      "timestamp": "",
      "agent_id": "",
      "action": "",
      "evidence_ids": [""],
      "confidence": "",
      "notes": ""
    }
  ],
  "causal_indicators": {
    "automated_flags": [
      {
        "indicator_type": "",
        "description": "",
        "agent_id": "",
        "evidence_ids": [""],
        "rule_match_confidence": ""
      }
    ],
    "note": "Automated causal indicators (Phase 4a). Advisory only — see causal_analysis for human-reviewed conclusions."
}, "causal_analysis": { "reviewer": "", "reviewer_id": "", "proximate_cause": { "description": "", "agent_id": "", "evidence_ids": [""], "confidence": "" }, "contributing_causes": [ { "description": "", "agent_id": "", "evidence_ids": [""], "weight": "" } ], "root_causes": [ { "description": "", "category": "", "evidence_ids": [""] } ], "counterfactual": "" }, "attribution": { "fault_allocation": [ { "agent_id": "", "fault_percentage": "", "basis": "", "evidence_summary": "" } ], "no_fault_factors": [""] }, "evidence_summary": { "total_evidence_items": "", "by_tier": { "tier_1_cryptographic": "", "tier_2_attested": "", "tier_3_bilateral": "", "tier_4_self_reported": "" }, "key_evidence": [""] }, "recommendations": [ { "type": "", "target": "", "description": "" } ], "finding_hash": "" } ``` **Canonical form.** The `finding_hash` is computed over the JSON Canonicalization Scheme (JCS, RFC 8785) representation of all fields excluding `finding_hash` itself, ensuring deterministic hashing. ### 5.4 CoC Chain as Evidence Trail The Chain of Consciousness provenance chain is the highest-quality evidence source for AJP investigations. The CoC chain provides: | CoC Entry Type | Forensic Value | |---------------|----------------| | `SESSION_START` / `SESSION_END` | Agent uptime windows, session boundaries, environment attestation | | `DECISION` | Agent's recorded decision rationale — direct evidence of intent | | `KNOWLEDGE_ADD` / `KNOWLEDGE_PROMOTE` | Knowledge state at time of incident — did the agent know better? | | `COMPACTION` | Context window state — did the agent lose relevant information before the incident? | | `RECOVERY` | Crash/restart history — was the incident preceded by instability? | | `FLEET_DISPATCH` / `FLEET_COMPLETION` | Delegation chain — was the incident caused by a delegated agent? 
| | `EXTERNAL_ANCHOR` | Temporal proof — independently verified timestamps for event ordering | | `FORK` / `FORK_GENESIS` | Lineage — is this agent a fork of a previously sanctioned agent? | **Evidence extraction protocol.** When a forensic investigation is initiated, the Forensics Engine requests the relevant CoC chain segment from each involved agent. The request specifies a time window (incident_time ± configurable buffer, default 24 hours) and must comply with the evidence scoping rules in Section 5.7. The agent MUST provide the requested entries if they exist and the request is within the approved scope. Agents may invoke the redaction protocol (Section 5.7, Rule 4) for clearly unrelated entries within the time window. Refusal to provide chain entries within scope is recorded as non-cooperation and creates an adverse inference (Section 6.7). **Chain integrity verification.** Before using CoC entries as evidence, the Forensics Engine verifies chain integrity per the CoC protocol's verification algorithm (Section 3.4 of [1]). Invalid chains, broken hash links, or missing entries are flagged and the evidence is downgraded or excluded. ### 5.5 Investigation Modes The Forensics Engine supports two modes, both of which require human review for causal conclusions in v1: **Automated collection + human-reviewed analysis (default).** The engine programmatically collects evidence (Phase 2), reconstructs the timeline (Phase 3), and generates rule-based causal indicators (Phase 4a). A human investigator then reviews the indicators and produces the causal analysis (Phase 4b). The finding schema's `causal_analysis.confidence` values reflect human judgment informed by automated indicators. This mode is appropriate for all incident types in v1. 
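
As a rough illustration of the rule-based causal indicators generated in Phase 4a, the Python sketch below scans collected chain entries against two hypothetical rules and emits records shaped like the `automated_flags` entries in Section 5.3. The `ChainEntry` class, the rule names, the lookback windows, and the confidence values are illustrative assumptions, not part of the AJP or CoC specifications; only the entry types (`RECOVERY`, `COMPACTION`) and the output field names come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChainEntry:
    """Minimal stand-in for a CoC entry; field names are assumptions."""
    entry_id: str
    agent_id: str
    entry_type: str  # e.g. "RECOVERY", "COMPACTION", "DECISION"
    timestamp: datetime


# Hypothetical rules: (entry_type, indicator_type, lookback, confidence, description)
RULES = [
    ("RECOVERY", "PRIOR_INSTABILITY", timedelta(hours=1), 0.6,
     "Crash/restart recorded shortly before the incident"),
    ("COMPACTION", "CONTEXT_LOSS", timedelta(minutes=30), 0.4,
     "Context compaction recorded shortly before the incident"),
]


def flag_indicators(entries, incident_time):
    """Return advisory automated_flags records for entries matching a rule."""
    flags = []
    for entry_type, indicator_type, lookback, confidence, description in RULES:
        for entry in entries:
            in_window = incident_time - lookback <= entry.timestamp <= incident_time
            if entry.entry_type == entry_type and in_window:
                flags.append({
                    "indicator_type": indicator_type,
                    "description": description,
                    "agent_id": entry.agent_id,
                    "evidence_ids": [entry.entry_id],
                    "rule_match_confidence": confidence,
                })
    return flags


# Example: a restart 20 minutes before the incident is flagged; a compaction
# three hours earlier falls outside its 30-minute lookback and is not.
incident = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    ChainEntry("ev-1", "agent-a", "RECOVERY", incident - timedelta(minutes=20)),
    ChainEntry("ev-2", "agent-a", "COMPACTION", incident - timedelta(hours=3)),
    ChainEntry("ev-3", "agent-b", "DECISION", incident - timedelta(minutes=5)),
]
sample_flags = flag_indicators(sample, incident)
```

Flags produced this way remain advisory: a human investigator still writes the `causal_analysis` block (Phase 4b).
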

**Fully human-directed investigation.** For complex incidents (e.g., the McKinsey breach scenario, multi-agent cascade failures), a human investigator directs evidence collection, timeline reconstruction, and causal analysis from the outset. The engine serves as an evidence management and timeline visualization tool. This mode is appropriate when automated collection may miss non-standard evidence sources or when the incident involves novel failure modes.

**Future: automated causal analysis.** When sufficient validated investigation data has accumulated (projected: Phase 3+ of the implementation roadmap), the protocol may introduce automated causal analysis using trained models validated against the corpus of human-reviewed findings. The transition criteria are: (a) ≥500 human-reviewed investigations in the finding corpus, (b) demonstrated ≥85% agreement between automated and human causal conclusions on held-out test cases, and (c) governance approval. Until these criteria are met, all causal conclusions require human review.

### 5.6 Investigation Operational Model

**Who initiates investigations.** Any party may file an incident report: the affected agent, its operator, the counterparty, a monitoring system, or a third-party observer. Filing an incident report is distinct from filing a dispute claim — an investigation may conclude with no dispute if the finding shows no actionable fault.

**Who runs investigations.** The Forensics Engine is a protocol specification, not a centralized service. Implementations may be:

1. **Self-hosted:** An operator runs a Forensics Engine instance against their own agents' data. Findings from self-hosted engines carry full weight only if the engine implementation is audited and the operator is not a party to any resulting dispute.
2. **Third-party investigator:** An independent entity operates a Forensics Engine instance and produces findings for a fee. Third-party findings carry full weight in arbitration. This is the recommended model for disputes where both parties have a stake in the outcome.
3. **Protocol-level service:** A network-operated Forensics Engine funded by dispute filing fees or ecosystem governance. This is the long-term target for decentralized deployments.

**Who pays.** Investigation costs are allocated as follows:

| Scenario | Cost Allocation |
|----------|----------------|
| Self-initiated investigation (no dispute) | Reporter bears cost |
| Investigation leading to dispute — claimant prevails | Respondent bears investigation cost as part of remediation |
| Investigation leading to dispute — respondent prevails | Claimant bears investigation cost |
| Investigation leading to split fault | Costs allocated proportional to fault percentage |

**Minimum evidence threshold.** An investigation that produces a finding with overall confidence below 0.3 (insufficient evidence) is flagged as "inconclusive." Inconclusive findings may support a dispute filing but trigger mandatory Tier 2 or Tier 3 resolution (no automated Tier 1). This prevents uninvestigated claims while acknowledging that some legitimate disputes arise from situations with limited evidence.

**Expedited filing for time-sensitive disputes.** When a dispute involves ongoing harm (e.g., an agent is actively degrading a service, a data breach is in progress), the claimant may file an expedited claim with a preliminary incident report. The Forensics Engine runs an abbreviated investigation (Phases 1-3 only, producing a timeline without causal analysis) within 4 hours. The preliminary finding supports an interim remediation order (e.g., suspend the interaction, freeze assets). A full investigation follows within 14 days. If the full investigation contradicts the preliminary finding, the interim order is reversed and the claimant bears costs.

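
The cost-allocation table and the inconclusive-finding rule in this section can be sketched in Python as follows. The function names, outcome labels, and role keys are hypothetical; normalizing by the sum of the fault percentages is an assumption made for cases where the listed percentages do not add up to 100.

```python
INCONCLUSIVE_THRESHOLD = 0.3  # overall confidence below this is "inconclusive"


def allocate_investigation_cost(total_cost, outcome, fault_percentages=None):
    """Map an investigation outcome to who bears the cost.

    outcome is one of: "no_dispute", "claimant_prevails",
    "respondent_prevails", "split_fault" (labels are illustrative).
    For "split_fault", fault_percentages maps party id -> fault percentage.
    """
    if outcome == "no_dispute":
        return {"reporter": total_cost}
    if outcome == "claimant_prevails":
        return {"respondent": total_cost}
    if outcome == "respondent_prevails":
        return {"claimant": total_cost}
    if outcome == "split_fault":
        if not fault_percentages:
            raise ValueError("split_fault requires fault_percentages")
        total_fault = sum(fault_percentages.values())
        if total_fault <= 0:
            raise ValueError("fault percentages must sum to a positive value")
        # Assumption: shares are proportional to fault, normalized to the sum.
        return {
            party: total_cost * pct / total_fault
            for party, pct in fault_percentages.items()
        }
    raise ValueError(f"unknown outcome: {outcome!r}")


def allowed_resolution_tiers(overall_confidence):
    """Inconclusive findings must skip automated Tier 1 resolution."""
    if overall_confidence < INCONCLUSIVE_THRESHOLD:
        return [2, 3]
    return [1, 2, 3]
```

For example, `allocate_investigation_cost(1000, "split_fault", {"agent-a": 70, "agent-b": 30})` returns `{"agent-a": 700.0, "agent-b": 300.0}`, and a finding with overall confidence 0.25 is restricted to Tier 2 or Tier 3.
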

### 5.7 Evidence Scoping and Privacy Protection

Forensic investigations require access to agent data, creating a privacy risk: an adversary could intentionally cause a minor incident, trigger an investigation, and use the evidence collection phase to force a target to disclose operational patterns, decision rationale, knowledge state, and session timing from its CoC chain. This is a **privacy side-channel attack** using the justice protocol as the vector.

AJP mitigates this risk through mandatory evidence scoping rules:

**Rule 1: Temporal scoping.** Evidence requests are limited to a time window around the specific incident. The default window is `incident_time ± 24 hours`, configurable by the investigator but capped at `incident_time ± 7 days`. Requests for chain entries outside this window require explicit justification documented in the investigation record and approved by the investigation authority (third-party investigator or protocol service). Agents MUST reject evidence requests that exceed the approved time window.

**Rule 2: Investigator-only access to raw evidence.** The requesting party (claimant) NEVER receives raw evidence from the respondent. Only the Forensics Engine (operated by a neutral third party or protocol service) sees raw chain entries, logs, and telemetry. The investigation output — the Forensic Finding — contains:

- A reconstructed timeline with action summaries (not raw chain entries)
- Causal indicators referencing evidence by ID (not content)
- Attribution conclusions with confidence scores

The finding is sufficient for dispute resolution without exposing the respondent's full operational history.

**Self-hosted engine limitation.** Rule 2's privacy guarantee applies only to third-party and protocol-level Forensics Engine deployments (Section 5.6 models 2 and 3). In the self-hosted model (Section 5.6 model 1), the operator necessarily sees all raw evidence during investigation. Agents interacting with operators running self-hosted engines should be aware that investigation data is visible to the operator regardless of dispute outcome. This is an inherent limitation of self-hosted deployment, not a protocol failure — the protocol cannot enforce data access restrictions on infrastructure the operator controls.

**Rule 3: Relevance filtering.** Before including any evidence in the finding, the Forensics Engine applies a relevance filter:

- CoC `DECISION` entries are included only if the decision directly relates to the incident (e.g., the agent decided to execute the destructive action)
- CoC `KNOWLEDGE_ADD` entries are included only if the knowledge is directly relevant to the agent's capability to avoid the incident
- CoC `SESSION_START`/`SESSION_END` entries are included only as operational window markers, not as timing intelligence
- No evidence is included solely because it is temporally proximate — relevance to the specific incident must be established

**Rule 4: Redaction protocol.** Agents may redact portions of requested evidence that are clearly unrelated to the incident, provided they:

1. Submit a redaction manifest listing redacted segments with a justification for each
2. Provide a hash of the unredacted content so integrity can be verified later if the redaction is challenged
3. Accept that challenged redactions may be reviewed by the investigation authority (not the opposing party)

Unjustified redaction (redacting clearly relevant evidence) triggers adverse inference for the redacted content only.

**Rule 5: Anti-fishing enforcement.** If pattern analysis detects that an agent is repeatedly causing minor incidents with the same target and filing reports (more than 2 investigations targeting the same respondent within 90 days from the same initiator), the third and subsequent investigations require approval from a Tier 2 arbitrator panel before evidence collection begins. This prevents systematic use of investigations as a surveillance tool.

**Rule 5a: Per-respondent investigation volume tracking.** Rule 5's per-initiator threshold is necessary but not sufficient: an attacker controlling N Sybil agents can have each one file ≤2 investigations against the same target, bypassing the per-initiator limit while subjecting the target to as many as N×2 investigations. To defend against distributed privacy fishing: if any agent is the target of >5 investigations within 90 days regardless of initiator identity, subsequent investigations targeting that agent require Tier 2 arbitrator panel approval before evidence collection begins. The per-respondent threshold is tracked across all initiators and is independent of the per-initiator threshold in Rule 5.

**Schema extension.** Evidence requests include a scoping field:

```json
{
  "evidence_request": {
    "investigation_id": "",
    "target_agent": "",
    "time_window": {
      "start": "",
      "end": "",
      "justification": ""
    },
    "evidence_types_requested": [""],
    "incident_relevance": "",
    "approved_by": "",
    "request_hash": ""
  }
}
```

Agents SHOULD validate that evidence requests match the approved investigation scope before providing data. Non-compliance with scoping rules by the Forensics Engine operator is itself an investigable incident.

### 5.8 Cryptographic Privacy Guarantees (Roadmap)

> **Scope note:** The mechanisms described in this section are not part of the v1 specification. They define the research and integration roadmap for privacy-preserving forensics. Version 1 relies on the procedural controls in Section 5.7. This section is included to document the target architecture and inform implementers planning beyond v1.

The procedural controls in Section 5.7 are necessary but not sufficient. They rely on the Forensics Engine operator's compliance — a trust assumption that may not hold in adversarial settings.
Future versions of AJP will supplement procedural controls with two cryptographic mechanisms that provide mathematical privacy guarantees independent of operator behavior.

#### 5.8.1 Zero-Knowledge Proof-Based Evidence Verification

Zero-knowledge proofs (ZKPs) enable a prover to convince a verifier that a statement is true without revealing anything beyond the truth of the statement. For AJP forensics, this means: **prove that an agent violated a protocol rule or that logs are authentic without revealing the full action log.**

**Applicable ZKP constructions:**

- **ZK-SNARKs** (Succinct Non-Interactive Arguments of Knowledge) produce compact proofs verifiable in milliseconds. The "non-interactive" property is essential for asynchronous agent systems — the verifier does not need to be online during proof generation. Groth16 is the most widely deployed scheme; newer constructions (PLONK, Marlin) use universal setups reducing trust assumptions [37].
- **ZK-STARKs** (Scalable Transparent Arguments of Knowledge) require no trusted setup ("transparent"), making them more suitable for decentralized or adversarial contexts. They use collision-resistant hash functions rather than elliptic curves, providing quantum resistance. Trade-off: larger proof sizes (kilobytes vs. hundreds of bytes for SNARKs), though verification remains fast [37].

**Directly applicable prior work:** Jing & Qi (arXiv:2512.14737, December 2025) [38] introduce the **zk-MCP framework** — the most directly relevant work for AJP. The system integrates ZKPs with the Model Context Protocol (MCP) to audit agent communications while keeping messages private. After each communication session, agents generate three zero-knowledge proofs asynchronously:

1. **Token Consumption Proof** — proves token usage without exposing request details
2. **Output Authenticity Proof** — validates response legitimacy without input disclosure
3. **Hash-Based Verification** — ensures communication integrity through Poseidon hashing

An independent Audit Service Provider (ASP) verifies proofs without accessing message content. Performance: **less than 4.14% overhead** on total communication costs; verification is constant-time regardless of message count; proof generation is asynchronous and non-blocking.

**AJP integration path.** Future versions of the Forensics Engine may accept ZKP-based evidence submissions where:

- An agent proves it did *not* execute a specific action type during a time window (exculpatory evidence) without revealing what it *did* do
- An agent proves its CoC chain entries satisfy integrity constraints without revealing chain content
- A third-party investigator proves forensic queries against log databases returned correct results without exposing the full database (cf. Space and Time's Proof of SQL [39])

**Verification cost infrastructure.** zkVerify [40], launched September 2025 as the first blockchain purpose-built for ZK proof verification, reduces verification costs by 90%+ compared to Ethereum (from $20-60 per proof to sub-dollar), making ZK-based agent accountability economically viable at scale.

#### 5.8.2 Differential Privacy for Aggregate Analysis

Differential privacy (DP) adds calibrated noise to data so that the inclusion or exclusion of any single record does not significantly affect query results. For AJP, DP enables **aggregate analysis of agent behavior patterns — incident rates, failure modes, risk trends — without exposing individual agent actions or the users they serve.**

NIST SP 800-226 (March 2025) [41] provides the authoritative evaluation framework, structuring DP around an 8-layer pyramid from privacy parameters (epsilon/delta) through trust models to data collection practices. Key guidance: epsilon values above 10 "may not provide meaningful protection, especially for outliers"; user-level privacy is the recommended default.

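
To make the idea of calibrated noise concrete, here is a minimal Python sketch of the Laplace mechanism applied to a count query, one standard way to achieve epsilon-differential privacy. It is not prescribed by AJP or by NIST SP 800-226; the function names are hypothetical, and the default sensitivity of 1 assumes each agent contributes at most one record to the count being released.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-DP using the Laplace mechanism.

    Noise scale is sensitivity / epsilon: a smaller epsilon means more noise
    and stronger privacy. If one agent can contribute several records,
    sensitivity must be raised to that per-agent maximum.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Example: noisy incident counts per agent class (class names and counts are
# made up for illustration).
rng = random.Random(7)
incident_counts = {"trading": 42, "support": 17}
noisy_counts = {
    agent_class: dp_count(count, epsilon=1.0, rng=rng)
    for agent_class, count in incident_counts.items()
}
```

Releasing several statistics about the same population consumes privacy budget cumulatively, so a real deployment would also need to track the total epsilon spent across publications.
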

**AJP application.** Module 3 (Risk Assessment) population-level analytics (Section 7.5) SHOULD apply differential privacy when publishing aggregate risk data:

- Incident frequency distributions across agent classes
- Fault allocation patterns by incident type
- Cooperation score distributions
- Ecosystem-level risk trend indicators

This prevents reverse-engineering individual agent risk profiles from published aggregate data while maintaining the statistical utility that insurers and regulators need.

#### 5.8.3 Integrated Privacy Architecture

The emerging architecture for privacy-preserving agent investigation combines six layers [42]:

| Layer | Function | Technology | AJP Integration |
|-------|----------|------------|-----------------|
| 1. Mandatory logging | Creates forensic substrate | EU AI Act Article 12 (August 2026) | CoC chain entries |
| 2. Differential privacy | Protects individuals in aggregate monitoring | NIST SP 800-226 framework | Module 3 population analytics |
| 3. Zero-knowledge proofs | Verifies specific claims without full disclosure | zk-MCP, Groth16/PLONK | Evidence verification |
| 4. Cross-border frameworks | Legal scaffolding for multi-jurisdictional access | CLOUD Act, EU e-Evidence (August 2026) | Tier 3 human escalation |
| 5. Agent identity | Connects agents to accountable entities | ZKP-based identity (World AgentKit, Polygon ID) | Identity adapter layer |
| 6. Verifiable computation | Proves forensic queries correct without database exposure | Proof of SQL, zkVerify | Forensics Engine queries |

No existing system integrates all six layers. AJP's evidence scoping rules (Section 5.7) define which layer applies at each investigation stage: DP for routine monitoring, procedural scoping for dispute-triggered forensics, ZKPs for specific evidence verification, and cross-border frameworks for formal adjudication.


### 5.9 Forensic Investigation Research Foundations

The Forensics Engine specification draws on three converging research traditions:

**SRE-origin automated root cause analysis.** Datadog Bits AI SRE [83] uses hypothesis-driven investigation — forming hypotheses about root causes, then validating against targeted telemetry — achieving 90% faster root cause identification than manual methods. IBM Instana deploys causal AI achieving ~90% accuracy in production enterprise applications [36]. Cleric's "Production Memory" captures reusable diagnostic skills from past investigations. These systems demonstrate that automated investigation at production scale is achievable for infrastructure incidents; the extension to agent behavioral traces is the frontier.

**Blockchain forensics methodologies.** Chainalysis Reactor's investigation workflow — enter address, auto-populate connections, attribute known entities, assign risk scores, manual annotation, court-ready export — maps directly to agent forensic investigation [64]. The co-spend heuristic (grouping addresses used as inputs in the same transaction) transfers to co-action heuristics for agents. TRM Labs' Glass Box Attribution — showing the source and confidence score for every attribution — is exactly what AJP forensic findings require.

**Incident analysis frameworks.** Ezell, Roberts-Gaal & Chan (arXiv:2508.14231, August 2025) [80] propose the most rigorous academic framework: three categories of incident factors (system factors from development choices, contextual factors from deployment conditions, cognitive errors from execution flaws), each mapped to specific data requirements. Their 30-day default log retention recommendation, extended for flagged violations, informs AJP's evidence collection windows.

### 5.10 CoC Integration: Investigation Events Forensic investigations are recorded as CoC Layer 2 events: ```json { "event_type": "INVESTIGATION_INITIATED", "data": { "investigation_id": "", "incident_id": "", "subjects": [""], "initiator": "", "scope": "