---
name: "deep-research"
description: "Integrated deep research skill that collects documents from multiple
  sources, stores them in organized folders, and synthesizes findings into a comprehensive
  research report with citations."
metadata:
  stage: "alpha"
  source: "MIGRATED"
  requires:
  - "knowledge-research-contract"
---

# Deep Research - Integrated Research Skill

A hybrid research skill combining parallel document collection, structured storage, and evidence-based synthesis.

## When to Use This Skill

Use this skill when:
- You need comprehensive research with traceable sources
- You want to organize research findings systematically
- You need a final report with proper citations
- You're conducting multi-source analysis
- You need reproducible research workflows

## Features

### Three-Phase Workflow

1. **Collection Phase** - Parallel document gathering
   - Multi-source web search
   - Query decomposition
   - Rate-limited execution
   - Deduplication

2. **Storage Phase** - Organized persistence
   - Structured folder hierarchy
   - JSON metadata tracking
   - Raw source preservation
   - Clustering data

3. **Synthesis Phase** - Evidence-based reporting
   - Semantic clustering
   - Citation tracking
   - Structured markdown output
   - Quality scoring

## Requirements

- Python 3.8+
- Dependencies: `pip install -r requirements.txt`

## Installation

```bash
cd deep-research
pip install -r requirements.txt
```

## Usage

> [!IMPORTANT]
> **Run from Workspace Root**: Always replicate the command structure below from your project's root directory. Do not `cd` into the skill folder. This ensures artifacts are saved in your workspace, not hidden in the skill directory.

### Basic Research

```bash
# Run from your project root
python .agent/skills/deep-research/scripts/research.py --query "Research the history of Kubernetes"
```

### With Custom Output Directory

```bash
python .agent/skills/deep-research/scripts/research.py \
  --query "Compare Python web frameworks" \
  --output ./my-research
```

### Specify Number of Sources

```bash
python .agent/skills/deep-research/scripts/research.py \
  --query "Machine learning trends 2024" \
  --max-sources 20
```

### Skip Synthesis (Collection Only)

```bash
python .agent/skills/deep-research/scripts/research.py \
  --query "Docker vs Podman" \
  --no-synthesis
```

## Output Structure

```
research-output/
└── kubernetes-history/
    └── 20260217-180000/
        ├── metadata.json          # Research metadata
        ├── sources/               # Collected documents
        │   ├── source-001.json
        │   ├── source-002.json
        │   └── ...
        ├── clusters.json          # Semantic clustering
        └── final-report.md        # Synthesized report
```

## Metadata Format

```json
{
  "query": "Research the history of Kubernetes",
  "timestamp": "2026-02-17T18:00:00Z",
  "sources_collected": 15,
  "sources_used": 12,
  "clusters_identified": 4,
  "synthesis_completed": true,
  "duration_seconds": 45.3
}
```

## Final Report Structure

The synthesized `final-report.md` includes:

1. **Executive Summary** - Key findings overview
2. **Research Overview** - Query, methodology, sources
3. **Findings by Topic** - Clustered insights with citations
4. **Source Analysis** - Credibility assessment
5. **Conclusions** - Synthesized recommendations
6. **References** - Complete source list

## Advanced Usage

### Custom Source Types

```bash
python .agent/skills/deep-research/scripts/research.py \
  --query "GraphQL best practices" \
  --source-types academic,documentation,blog
```

### Parallel Collection Control

```bash
python .agent/skills/deep-research/scripts/research.py \
  --query "Rust ownership model" \
  --max-parallel 5
```

### Re-synthesize Existing Research

```bash
python .agent/skills/deep-research/scripts/synthesizer.py \
  --input ./research-output/kubernetes-history/20260217-180000
```

## Integration with Other Tools

### Use as a Library

```python
from scripts.research import DeepResearcher

researcher = DeepResearcher(
    query="WebAssembly use cases",
    output_dir="./research-output",
    max_sources=15
)

# Run full workflow
result = researcher.execute()

# Or run phases separately
researcher.collect()
researcher.synthesize()
```

## Best Practices

- **Specific Queries**: More specific queries yield better results
- **Source Diversity**: Use multiple source types for balanced research
- **Iterative Refinement**: Review initial results and re-run with refined queries
- **Citation Verification**: Always verify citations in final report
- **Storage Management**: Archive or clean old research outputs regularly

## Workflow Example

```bash
# 1. Initial research
python .agent/skills/deep-research/scripts/research.py --query "Container orchestration comparison"

# 2. Review sources in research-output/container-orchestration-comparison/*/sources/

# 3. Re-synthesize with different clustering
python .agent/skills/deep-research/scripts/synthesizer.py \
  --input ./research-output/container-orchestration-comparison/20260217-180000 \
  --min-cluster-size 3

# 4. Generate final report
# Output: research-output/container-orchestration-comparison/20260217-180000/final-report.md
```

## Configuration

Create `.research-config.json` in your project:

```json
{
  "default_output": "./research-output",
  "max_sources": 15,
  "max_parallel": 3,
  "source_types": ["academic", "documentation", "blog"],
  "min_credibility": 0.6,
  "cache_enabled": true
}
```

## Troubleshooting

### No sources collected
- Check internet connection
- Verify query is not too specific
- Increase `--max-sources`

### Synthesis fails
- Ensure sources were collected successfully
- Check `metadata.json` for errors
- Try re-running synthesis only

### Low quality results
- Refine query to be more specific
- Increase `--min-credibility` threshold
- Filter by `--source-types`

## Exit Codes

- **0**: Success
- **1**: Collection error
- **2**: Synthesis error  
- **3**: Configuration error
- **130**: Cancelled by user (Ctrl+C)