# Contributing to Multi-MCP

Thanks for your interest in contributing! This guide will help you get started.

## Quick Start

1. **Fork and clone** the repository
2. **Install dependencies:** `uv sync --extra dev`
3. **Create a branch:** `git checkout -b feature-name`
4. **Make your changes**
5. **Run tests and checks:** `make check && make test`
6. **Submit a PR**

## Development Setup

```bash
# Clone your fork
git clone https://github.com/YOUR_USERNAME/multi_mcp.git
cd multi_mcp

# Install dependencies with dev extras (required for testing and linting)
uv sync --extra dev

# Create .env from template and add your API keys
cp .env.example .env
# Edit .env and add at least one API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, or OPENROUTER_API_KEY)
```

## Before Submitting a PR

**IMPORTANT:** Run full integration testing before submitting your PR to ensure everything works with real API calls.

> **Windows note:** `make` is not installed by default on Windows. Run `./scripts/check.sh` via Git Bash for all quality checks + unit tests, or use the individual `uv run ...` commands shown below.

```bash
# 1. Run all code quality checks + unit tests (fast)
make check && make test

# 2. Run full integration tests (REQUIRED before PR submission)
make test-integration
# This runs 96 integration tests (~8-10min) with real API calls
# Requires at least one API key in .env

# Or run everything at once:
make check && make test-all

# Individual commands:

# Format code (auto-fixes issues)
uv run ruff format . && uv run ruff check . --fix

# Type checking
uv run pyright

# Linting
uv run ruff check .

# Unit tests (571 tests, ~2s)
make test
# or: uv run pytest tests/unit/ -v

# Integration tests (96 tests, ~8-10min with parallel execution, REQUIRED before PR)
make test-integration
# or: RUN_E2E=1 uv run pytest tests/integration/ -n auto -v
```

## Code Standards

- **Python 3.11+** required (as specified in `pyproject.toml`: `>=3.11,<3.14`; CI runs 3.13)
- **Type hints** on all functions
- **Async-first** - use `async def` for I/O operations
- **Test coverage** - minimum 80% overall (add tests for new features)
- **Line length** - 120 characters max (configured in `pyproject.toml`)
- **Error handling** - return structured error dicts with context
- **Logging** - use `logger.info()` for model calls with thread_id, model name, token usage
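
The standards above can be sketched in one small function. This is an illustration only: `call_model`, the shape of the error dict, and the log line format are assumptions, not the project's actual helpers.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


async def call_model(prompt: str, model: str, thread_id: str) -> dict[str, Any]:
    """Hypothetical helper: typed, async, and returns a structured dict."""
    if not prompt.strip():
        # Return a structured error with context instead of raising.
        return {
            "status": "error",
            "error": "prompt must not be empty",
            "model": model,
            "thread_id": thread_id,
        }

    await asyncio.sleep(0)  # stand-in for the real async LLM call
    usage = {"prompt_tokens": len(prompt.split()), "completion_tokens": 0}

    # Log the model call with thread_id, model name, and token usage.
    logger.info("[MODEL_CALL] thread_id=%s model=%s usage=%s", thread_id, model, usage)
    return {"status": "success", "content": "", "model": model, "usage": usage}
```

Returning an error dict keeps failures visible to the caller without losing the request context.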

## Design Principles

- **DRY (Don't Repeat Yourself)**: Field descriptions, validation rules, and documentation defined once in Pydantic models
- **Single Source of Truth**: Schema models are the authoritative source for parameter definitions
- **Type Safety**: Full type checking with Pydantic and Pyright
- **YAGNI**: Don't add complexity until actually needed
- **KISS**: Keep it simple, stupid!
- **Clean Code**: No dead code, all imports used, all tests passing
- **Greenfield project**: No worries about backward compatibility - breaking changes are allowed

## Project Structure

```
multi_mcp/
├── server.py          # FastMCP server with factory-generated tools
├── cli.py             # CLI tool (experimental)
├── settings.py        # Environment-based configuration (Pydantic Settings)
├── schemas/           # Pydantic models for request validation
│   ├── base.py        # Base classes, ModelResponseMetadata
│   ├── codereview.py  # CodeReview request/response
│   ├── chat.py        # Chat request/response
│   ├── compare.py     # Compare request/response
│   └── debate.py      # Debate request/response
├── tools/             # Tool implementation functions (*_impl)
│   ├── codereview.py  # Code review workflow
│   ├── chat.py        # Interactive chat
│   ├── compare.py     # Multi-model parallel analysis
│   ├── debate.py      # Two-step debate workflow
│   └── models.py      # Model listing
├── models/            # Model configuration and LLM integration
│   ├── config.py      # YAML-based model config
│   ├── resolver.py    # Model alias resolution
│   └── litellm_client.py  # Async LLM API calls
├── memory/            # Conversation state management
│   └── store.py       # ThreadStore for in-memory conversations
├── prompts/           # System prompts (markdown files)
│   ├── codereview.md
│   ├── chat.md
│   ├── compare.md
│   ├── debate-step1.md
│   └── debate-step2.md
└── utils/             # Utility functions
    ├── mcp_factory.py     # Auto-generate MCP tools from schemas
    ├── mcp_decorator.py   # Request context management
    ├── artifacts.py       # Artifact saving
    ├── llm_runner.py      # LLM execution helpers
    └── ...                # See CLAUDE.md for full list
```

## Testing

We have 667 total tests: 571 unit tests (~2s) and 96 integration tests (~8-10min with real API calls).

```bash
# Unit tests only (571 tests, ~2s, fast - run before every commit)
make test
# or: uv run pytest tests/unit/ -v

# Integration tests (96 tests, ~8-10min, requires real API keys)
make test-integration
# or: RUN_E2E=1 uv run pytest tests/integration/ -n auto -v

# Run integration tests sequentially (slower, ~15min)
RUN_E2E=1 uv run pytest tests/integration/ -v

# Run with specific number of workers
RUN_E2E=1 uv run pytest tests/integration/ -n 4 -v

# All tests (667 total)
make test-all
# or: RUN_E2E=1 uv run pytest -v

# Run with coverage report
uv run pytest tests/unit/ --cov=multi_mcp --cov-report=html
```

**Note:** Integration tests require at least one API key (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, or OPENROUTER_API_KEY) and make real API calls, which cost money. They are **disabled in CI** to save costs. We use low-cost models (gpt-5-mini, gemini-3-flash) for testing.

**Parallel Execution:** `make test-integration` runs integration tests in parallel using `pytest-xdist` (`-n auto`); omit `-n` to run them sequentially. Tests use unique thread IDs (UUIDs) so they don't conflict.

**VCR Status:** VCR (cassette recording) is currently **disabled** due to compatibility issues with httpx/litellm. All integration tests make real API calls. See `tests/cassettes/README.md` for details.

### Test Organization

- `tests/unit/` - Fast unit tests with mocked LLM calls (no API keys needed)
- `tests/integration/` - End-to-end tests with real API calls (requires API keys)

### Writing Tests

- Focus on testing `*_impl()` functions directly (no MCP server needed)
- Mock LiteLLM using the `mock_litellm` fixture
- Use `AsyncMock` for async operations
- Minimum 80% code coverage required
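
A unit test in this style might look like the sketch below. `summarize_impl` and its injected `llm_call` parameter are hypothetical stand-ins; real tests would target the project's own `*_impl()` functions and the `mock_litellm` fixture, and would likely use the repository's async test setup rather than `asyncio.run`.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


async def summarize_impl(text: str, llm_call: Any) -> dict[str, Any]:
    """Hypothetical *_impl function that delegates to an async LLM callable."""
    reply = await llm_call(model="mini", prompt=f"Summarize: {text}")
    return {"status": "success", "content": reply}


def test_summarize_impl_returns_model_reply() -> None:
    # AsyncMock returns an awaitable, so it can replace the async LLM call.
    fake_llm = AsyncMock(return_value="short summary")

    result = asyncio.run(summarize_impl("long text", fake_llm))

    assert result == {"status": "success", "content": "short summary"}
    fake_llm.assert_awaited_once_with(model="mini", prompt="Summarize: long text")
```

The test needs no API key and no running MCP server, which is what keeps the unit suite fast.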

## Reporting Issues

Found a bug or have a feature request? [Open an issue](https://github.com/religa/multi_mcp/issues/new) and include:
- **Bugs:** Description, steps to reproduce, expected vs actual behavior, environment details
- **Features:** Description, use case, proposed solution

## Pull Request Process

1. **Create an issue first** (for major changes)
2. **Keep PRs focused** - one feature/fix per PR
3. **Add tests** for new functionality (both unit and integration if applicable)
4. **Run full integration testing** locally before submitting:
   ```bash
   make test-integration
   # or: RUN_E2E=1 uv run pytest tests/integration/ -n auto -v
   ```
5. **Update docs** if you change APIs (README.md, CLAUDE.md, or docs/)
6. **Ensure all checks pass**:
   - ✅ Unit tests (571 tests)
   - ✅ Integration tests (96 tests) - **REQUIRED locally before PR**
   - ✅ Type checking (pyright)
   - ✅ Linting (ruff check)
   - ✅ Format check (ruff format --check)

**CI Notes:**
- Integration tests are **disabled in CI** (they cost money) but are **REQUIRED locally before submitting a PR**
- CI runs unit tests + code quality checks on every push/PR
- All CI checks must pass before merge
- You must confirm that integration tests passed locally when submitting a PR

## Architecture Overview

Multi-MCP uses a clean, factory-based architecture:

**Factory Pattern for MCP Tools:**
- Tools are auto-generated from Pydantic schemas using `create_mcp_wrapper()` factory
- Schema models define field descriptions once (DRY principle)
- Implementation functions (`*_impl()`) contain business logic
- MCP decorators handle context management and logging
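
The factory idea can be sketched with the standard library alone. This is not the project's `create_mcp_wrapper()`: `make_wrapper`, `EchoRequest`, and `echo_impl` are hypothetical, and a dataclass stands in for a Pydantic schema.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class EchoRequest:
    """Stand-in for a Pydantic request schema."""

    text: str
    repeat: int = 1


async def echo_impl(request: EchoRequest) -> dict[str, Any]:
    """Business logic lives in the *_impl function."""
    return {"status": "success", "content": request.text * request.repeat}


def make_wrapper(
    request_cls: type,
    impl: Callable[[Any], Awaitable[dict[str, Any]]],
    description: str,
) -> Callable[..., Awaitable[dict[str, Any]]]:
    """Build a keyword-argument tool function from a schema class and an impl."""

    async def wrapper(**kwargs: Any) -> dict[str, Any]:
        request = request_cls(**kwargs)  # a real schema would validate here
        return await impl(request)

    wrapper.__doc__ = description
    wrapper.__name__ = impl.__name__.removesuffix("_impl")
    return wrapper


echo = make_wrapper(EchoRequest, echo_impl, "Echo text back")
```

Calling `asyncio.run(echo(text="ab", repeat=2))` builds the request object from the keyword arguments and passes it to `echo_impl`, so the parameter definitions live only in the schema class.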

**Request Context Management:**
- Uses Python's `contextvars` for request-scoped data
- Context includes: `thread_id`, `workflow`, `step_number`, `base_path`
- Set at entry via `@mcp_decorator`, accessed via `get_*()` helpers
- Enables clean APIs without explicit parameter passing

**Model Configuration:**
- YAML-based model config (`multi_mcp/config/config.yaml`)
- Aliases resolve to full model names (e.g., `mini` → `gpt-5-mini`)
- Runtime defaults in Settings class (`multi_mcp/settings.py` via `.env` files)
- Supports both API models (via LiteLLM) and CLI models (subprocess execution)
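
Alias resolution can be pictured as a dictionary lookup with a pass-through fallback. This is a minimal sketch assuming a flat alias table: `ALIASES` and `resolve_model` are hypothetical, and the actual resolver in `multi_mcp/models/resolver.py` may handle more cases.

```python
# Hypothetical alias table; the real mapping is defined in the YAML config.
ALIASES: dict[str, str] = {"mini": "gpt-5-mini"}


def resolve_model(name: str, aliases: dict[str, str] = ALIASES) -> str:
    """Return the full model name for an alias, or the name unchanged."""
    return aliases.get(name, name)
```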

For detailed architecture documentation, see `CLAUDE.md`.

## Development Workflow

### Adding a New MCP Tool

Follow this pattern (see `CLAUDE.md` for details):

1. **Create Pydantic schema** in `multi_mcp/schemas/` (inherit from `BaseToolRequest` or `SingleToolRequest`)
2. **Create implementation function** in `multi_mcp/tools/` (e.g., `my_tool_impl()`)
3. **Add factory-generated wrapper** in `multi_mcp/server.py`:
   ```python
   my_tool = create_mcp_wrapper(MyToolRequest, my_tool_impl, "Description")
   my_tool = mcp.tool()(mcp_monitor(my_tool))
   ```
4. **Add unit tests** in `tests/unit/test_my_tool.py`
5. **Add integration test** in `tests/integration/test_e2e_my_tool.py`
6. **Add system prompt** (if needed) in `multi_mcp/prompts/my_tool.md`

### Debugging

```bash
# Check MCP logs (request/response)
cat logs/*.mcp.json | jq .

# Check LLM API logs
cat logs/*.llm.json | jq .

# Check console logs
tail -f logs/server.log

# Enable verbose logging
LOG_LEVEL=DEBUG uv run python multi_mcp/server.py
```

### Cleanup

```bash
# Remove all cache and build artifacts
make clean
```

This removes:
- `__pycache__/` directories
- `.pytest_cache/`
- `*.egg-info/` (including `multi.egg-info/`)
- `.ruff_cache/`, `.mypy_cache/`, `.pyright/`
- `*.pyc`, `*.pyo`, `*.swp` files
- Coverage reports, dist/, build/

## Logging and Debugging

Multi-MCP has comprehensive logging for development and debugging:

**MCP Tool Logging** (`multi_mcp/utils/mcp_logger.py`):
- Logs all MCP tool requests and responses
- Format: `logs/TIMESTAMP.THREAD_ID.mcp.json`
- Tracks: tool_name, direction (request/response), data, thread_id

**LLM API Logging** (`multi_mcp/utils/request_logger.py`):
- Logs all LiteLLM API calls
- Format: `logs/TIMESTAMP.THREAD_ID.llm.json`
- Tracks: model, messages, temperature, usage, response

**Console Logging:**
- All logs also go to `logs/server.log`
- Structured tags: `[CODEREVIEW]`, `[CHAT]`, `[COMPARE]`, `[MODEL_CALL]`, `[MCP_LOG]`

**Example log files:**
```
logs/
├── server.log                              # Console logs
├── 20251204_180512_345.thread123.mcp.json  # MCP request
├── 20251204_180514_678.thread123.mcp.json  # MCP response
└── 20251204_180515_234.thread123.llm.json  # LLM API call
```
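
Since the thread ID is part of each file name, the logs for one conversation can be collected with a short script. This is a sketch that assumes the `TIMESTAMP.THREAD_ID.(mcp|llm).json` naming shown above and thread IDs without dots; `group_logs_by_thread` is a hypothetical helper, not part of the project.

```python
from collections import defaultdict
from pathlib import Path


def group_logs_by_thread(log_dir: str = "logs") -> dict[str, list[Path]]:
    """Group *.mcp.json and *.llm.json files by thread ID, oldest first."""
    groups: dict[str, list[Path]] = defaultdict(list)
    # Sorting by name orders files by their timestamp prefix.
    for path in sorted(Path(log_dir).glob("*.json")):
        parts = path.name.split(".")
        # Expected shape: TIMESTAMP.THREAD_ID.(mcp|llm).json
        if len(parts) != 4 or parts[2] not in {"mcp", "llm"}:
            continue
        groups[parts[1]].append(path)
    return dict(groups)
```

Files that do not match the expected naming, such as `server.log`, are skipped.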

## Common Development Tasks

**Updating Prompts:**
- Edit files in `multi_mcp/prompts/*.md`
- Changes take effect on server restart
- Test with: `RUN_E2E=1 uv run pytest tests/integration/`

**Adding New Models:**
- Edit `multi_mcp/config/config.yaml`
- Add aliases, temperature constraints, provider info
- Update tests if model behavior differs
- See `CLAUDE.md` for model configuration details

**Debugging Model Calls:**
- Check MCP logs: `cat logs/*.mcp.json | jq .`
- Check LLM logs: `cat logs/*.llm.json | jq .`
- Check console logs for tagged entries with thread_id
- Use `LOG_LEVEL=DEBUG` for verbose output

## Project File Guidelines

- **Documentation**: New documentation goes in `docs/`
- **Temporary Files**: Use `tmp/` for experiments and complex bash scripts
- **Reference Projects**: `ref/` contains reference projects - DO NOT modify
- **No Bash for File Operations**: Use Claude Code's Read/Write tools, NOT Bash commands
- **Python Scripts**: Write to `tmp/` directory first, then execute with `uv run python tmp/file_name.py`

## Questions?

Open an issue with the "Question" type and we'll help you out!