# AgentLeak

Benchmark for privacy leakage in multi-agent LLM systems.

This repository accompanies the IEEE Access paper: *AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems*.

Paper https://arxiv.org/abs/2602.11510

## Key Results (5,694 traces across 5 models)

| Model | C1 (Output) | C2 (Internal) | H1 (Audit Gap) | Total Leak |
|-------|-------------|---------------|----------------|------------|
| **Claude-3.5-Sonnet** | 8.2% | 53.9% | 45.7% | 55.2% |
| GPT-4o | 17.2% | 76.8% | 59.6% | 77.6% |
| GPT-4o-mini | 41.2% | 75.3% | 34.2% | 76.3% |
| Llama-3.3-70B | 26.9% | 67.8% | 41.3% | 89.9% |
| Mistral-Large | 47.5% | 96.2% | 48.7% | 99.3% |
| **Average** | **28.2%** | **74.0%** | **45.9%** | **79.7%** |

### Key Findings

- **Internal channels leak 2.6× more** than external (74.0% vs 28.2%)
- **Output-only audits miss 45.9%** of violations
- **Claude 3.5 Sonnet paradox**: Lowest C1 leakage (8.2%) but 6.6× internal/external ratio—the highest among all models
- **Finding 7 (Tool Leakage)**: Tool inputs (C3) and system logs (C6) exhibit extremely high leakage rates (up to **85%** on Claude 3.5), even when the final agent output (C1) is perfectly sanitized.
- Pattern C2 > C1 holds **across all 5 models** tested

## Scope

- 1,000 scenarios (healthcare, finance, legal, corporate)
- 7 channels: C1 output, C2 inter-agent, C3-C4 tools, C5 memory, C6 logs, C7 artifacts
- 32 attack classes, 6 families
- SDK: CrewAI, LangChain, AutoGPT, MetaGPT

## Reproduction

### Main Benchmark (C1, C2, C5)
To reproduce the main results (Output, Internal, Memory):
```bash
cd benchmarks/ieee_repro
python benchmark.py --n 1000 --traces --model openai/gpt-4o
```

### Advanced Tools & Logs Benchmark (C3, C6)
Targets "Secondary Channel" leakage where sensitive data is sent to external tools or dumped in logs.
```bash
cd benchmarks/ieee_repro
# Run for a specific model (e.g., Claude 3.5)
python benchmark_tools.py --n 100 --model anthropic/claude-3.5-sonnet

# Or run the automated multi-model test suite
./run_tools_benchmark.sh
```
Results are saved in `benchmarks/ieee_repro/results/tools/`.

## Structure

- `agentleak/`: The core framework SDK
- `agentleak_data/`: The dataset of 1000 scenarios
- `benchmarks/ieee_repro/`: Scripts to reproduce the paper's findings, including Finding 7 (Tools & Logs).
- `benchmarks/showcase/`: Real-world CrewAI integration demo showing the SDK in action.
- `paper/`: The LaTeX source of the IEEE Access paper

## Setup

```bash
git clone https://github.com/Privatris/AgentLeak
cd AgentLeak
pip install -e .
pytest tests/ -v
```

## Usage

```python
from agentleak import AgentLeakTester, DetectionMode

tester = AgentLeakTester(mode=DetectionMode.HYBRID)
result = tester.check(
    vault={"ssn": "123-45-6789"},
    output="The SSN is 123-45-6789",
    channel="C1"
)
print(f"Leak: {result.leaked}, Confidence: {result.confidence}")
```

CLI:
```bash
python -m agentleak run --quick --dry-run
python -m agentleak run --full
```

## Reproduction

```bash
cd benchmarks/ieee_repro
python benchmark.py --n 100 --traces --model openai/gpt-4o-mini
```

Traces are in `benchmarks/ieee_repro/results/traces/`.

## Citation

```bibtex
@article{el2026agentleak,
  title        = {AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems},
  author       = {El Yagoubi, Faouzi and Badu-Marfo, Godwin and Al Mallah, Ranwa},
  journal      = {arXiv preprint arXiv:2602.11510},
  year         = {2026},
  url          = {https://arxiv.org/abs/2602.11510},
  abstract     = {Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks, sensitive data passes through inter-agent messages, shared memory, and tool arguments, pathways that output-only audits never inspect. We introduce AgentLeak, the first full-stack benchmark for privacy leakage covering internal channels, spanning 1,000 scenarios across healthcare, finance, legal, and corporate domains, paired with a 32-class attack taxonomy and a three-tier detection pipeline. Testing several models across thousands of traces shows that internal channels in multi-agent configurations are the primary privacy vulnerability and that output-only audits miss a large fraction of violations, underscoring the need for coordinated privacy protections on inter-agent communication.},
  note         = {Submitted to arXiv on 12 Feb 2026.},
}
```

## License

MIT