---
name: observability-instrument-with-otel
description: |
  Adds OpenTelemetry instrumentation to service methods by applying @traced decorators, structured logging,
  and trace/span ID propagation for distributed tracing. Use when adding new service methods, debugging
  complex flows, investigating performance issues, or need distributed tracing across application layers.
  Triggers on "add tracing", "instrument this method", "add observability", "debug with traces", or
  "track execution flow".
allowed-tools:
  - Read
  - Grep
  - Edit
  - Bash
---

# Instrument with OpenTelemetry

## Purpose
Add OpenTelemetry (OTEL) instrumentation to service methods for distributed tracing, structured logging, and observability across Clean Architecture layers.

## When to Use

Use this skill when:
- Adding new service methods that need observability
- Debugging complex execution flows across layers
- Investigating performance issues or bottlenecks
- Need distributed tracing for multi-service operations
- Adding structured logging with trace correlation
- Instrumenting Application, Infrastructure, or Interface layers
- Replacing print() statements with proper logging
- Tracking operations across method boundaries

## Table of Contents

### Core Sections

- [Quick Start](#quick-start)
  - Simple example to get started immediately
- [Instructions](#instructions)
  - [Step 1: Import OTEL Components](#step-1-import-otel-components)
  - [Step 2: Get Logger Instance](#step-2-get-logger-instance)
  - [Step 3: Apply @traced Decorator](#step-3-apply-traced-decorator)
  - [Step 4: Add Structured Logging](#step-4-add-structured-logging)
  - [Step 5: Add Manual Spans for Complex Operations (Optional)](#step-5-add-manual-spans-for-complex-operations-optional)
  - [Step 6: Add Correlation Context for Request Tracking (Optional)](#step-6-add-correlation-context-for-request-tracking-optional)
  - [Step 7: Verify Instrumentation](#step-7-verify-instrumentation)
- [Examples](#examples)
  - [Example 1: Add Tracing to Command Handler](#example-1-add-tracing-to-command-handler)
  - [Example 2: Add Tracing to Infrastructure Service](#example-2-add-tracing-to-infrastructure-service)
  - [Example 3: Add Manual Spans for Performance Tracking](#example-3-add-manual-spans-for-performance-tracking)
  - [Example 4: Add Correlation Context for Multi-File Operations](#example-4-add-correlation-context-for-multi-file-operations)

### Advanced Topics

- [Common Patterns](#common-patterns)
  - [Pattern 1: Service Method Instrumentation](#pattern-1-service-method-instrumentation)
  - [Pattern 2: Error Logging with Trace Context](#pattern-2-error-logging-with-trace-context)
  - [Pattern 3: Multi-Step Operation Tracking](#pattern-3-multi-step-operation-tracking)
- [Observability Best Practices](#observability-best-practices)
  - Guidelines for effective instrumentation
- [Troubleshooting](#troubleshooting)
  - [Logs Missing Trace IDs](#logs-missing-trace-ids)
  - [@traced Not Working on Async Methods](#traced-not-working-on-async-methods)
  - [Span Attributes Not Appearing](#span-attributes-not-appearing)
- [Requirements](#requirements)
  - Dependencies and prerequisites

### Supporting Resources

- [references/reference.md](./references/reference.md) - Advanced OTEL configuration
- [ARCHITECTURE.md](/Users/dawiddutoit/projects/play/project-watch-mcp/ARCHITECTURE.md) - Clean Architecture layers
- Core monitoring module: `src/project_watch_mcp/core/monitoring/`

### Utility Scripts

- [scripts/add_tracing.py](./scripts/add_tracing.py) - Auto-add OpenTelemetry instrumentation to service files
- [scripts/analyze_traces.py](./scripts/analyze_traces.py) - Analyze trace coverage in Python service files
- [scripts/validate_instrumentation.py](./scripts/validate_instrumentation.py) - Validate OTEL instrumentation patterns

## Quick Start
Add OTEL tracing to a service method:
```python
from project_watch_mcp.core.monitoring import get_logger, traced

logger = get_logger(__name__)

class MyService:
    @traced
    async def my_method(self, param: str) -> ServiceResult[Data]:
        logger.info(f"Processing {param}")
        # Method implementation
        return ServiceResult.ok(result)
```

## Instructions

### Step 1: Import OTEL Components
Add imports at the top of your file (fail-fast principle):
```python
from project_watch_mcp.core.monitoring import get_logger, traced
```

For advanced instrumentation (context management, correlation IDs):
```python
from project_watch_mcp.core.monitoring import (
    get_logger,
    traced,
    trace_span,
    correlation_context,
    add_context,
)
```

### Step 2: Get Logger Instance
Replace any existing logger initialization with OTEL logger:
```python
logger = get_logger(__name__)
```

**Why:** OTEL logger automatically includes trace_id, span_id, and correlation IDs in all log messages.

### Step 3: Apply @traced Decorator
Add `@traced` decorator to methods you want to trace:

**For Application Layer (Commands/Queries):**
```python
class IndexFileHandler(CommandHandler[IndexFileCommand]):
    @traced
    async def handle(self, command: IndexFileCommand) -> ServiceResult[None]:
        logger.info(f"Indexing file: {command.file_path}")
        # Implementation
```

**For Infrastructure Services:**
```python
class EmbeddingService:
    @traced
    async def generate_embeddings(self, texts: list[str]) -> ServiceResult[list[float]]:
        logger.info(f"Generating embeddings for {len(texts)} texts")
        # Implementation
```

**For MCP Tools (Interface Layer):**
```python
@traced
async def search_code(query: str, search_type: str = "semantic") -> dict:
    logger.info(f"Search request: query='{query}', type={search_type}")
    # Implementation
```

### Step 4: Add Structured Logging
Use logger with contextual information (avoid magic numbers, use clear descriptions):

**Good Examples:**
```python
logger.info(f"Processing file: {file_path}")
logger.debug(f"Query execution time: {elapsed_ms}ms")
logger.warning(f"Retry attempt {attempt}/{max_retries} failed")
logger.error(f"Failed to connect to Neo4j: {str(e)}")
```

**What @traced Automatically Adds:**
- `trace_id`: Distributed tracing ID (propagated across services)
- `span_id`: Current operation ID (unique per method call)
- Function name as span name (e.g., `IndexFileHandler.handle`)
- Function arguments as span attributes (primitives only)

### Step 5: Add Manual Spans for Complex Operations (Optional)
For fine-grained tracing within a method:
```python
@traced
async def complex_operation(self, data: Data) -> ServiceResult[Result]:
    logger.info("Starting complex operation")

    # Manual span for specific sub-operation
    with trace_span("validate_data", data_size=len(data)) as span:
        logger.info("Validating data")
        validation_result = await self._validate(data)
        span.set_attribute("validation_passed", validation_result.success)

    # Another manual span
    with trace_span("process_chunks", chunk_count=10) as span:
        logger.info("Processing chunks")
        results = await self._process_chunks(data)
        span.set_attribute("chunks_processed", len(results))

    return ServiceResult.ok(results)
```

### Step 6: Add Correlation Context for Request Tracking (Optional)
For tracking operations across multiple method calls:
```python
async def index_repository(self, repo_path: str) -> ServiceResult[None]:
    with correlation_context() as cid:
        logger.info(f"Starting repository indexing: {repo_path}")

        # All subsequent logs will include this correlation_id
        for file_path in files:
            await self.index_file(file_path)  # Logs include same correlation_id

        logger.info("Repository indexing complete")
```

### Step 7: Verify Instrumentation
Run tests to ensure tracing works:
```bash
uv run pytest tests/path/to/test.py -v
```

Check logs for trace/span IDs:
```bash
tail -f logs/project-watch-mcp.log | grep "trace:"
```

Expected log format:
```
2025-10-18 14:30:45 - [trace:a1b2c3d4 | span:e5f6g7h8] - module.name - INFO - Processing file
```

## Examples

### Example 1: Add Tracing to Command Handler
**Before:**
```python
class IndexFileHandler(CommandHandler[IndexFileCommand]):
    async def handle(self, command: IndexFileCommand) -> ServiceResult[None]:
        print(f"Indexing {command.file_path}")
        return await self._index(command)
```

**After:**
```python
from project_watch_mcp.core.monitoring import get_logger, traced

logger = get_logger(__name__)

class IndexFileHandler(CommandHandler[IndexFileCommand]):
    @traced
    async def handle(self, command: IndexFileCommand) -> ServiceResult[None]:
        logger.info(f"Indexing file: {command.file_path}")
        return await self._index(command)
```

### Example 2: Add Tracing to Infrastructure Service
**Before:**
```python
class Neo4jCodeRepository:
    async def save_file(self, file: File) -> bool:
        logging.info("Saving file")
        # Implementation
```

**After:**
```python
from project_watch_mcp.core.monitoring import get_logger, traced

logger = get_logger(__name__)

class Neo4jCodeRepository:
    @traced
    async def save_file(self, file: File) -> ServiceResult[None]:
        logger.info(f"Saving file to Neo4j: {file.path}")
        # Implementation
        return ServiceResult.ok(None)
```

### Example 3: Add Manual Spans for Performance Tracking
```python
from project_watch_mcp.core.monitoring import get_logger, traced, trace_span

logger = get_logger(__name__)

class EmbeddingService:
    @traced
    async def batch_generate(self, texts: list[str]) -> ServiceResult[list[Embedding]]:
        logger.info(f"Batch generating embeddings for {len(texts)} texts")

        # Track API call performance
        with trace_span("voyage_api_call", text_count=len(texts)) as span:
            logger.debug("Calling Voyage AI API")
            embeddings = await self.client.embed(texts)
            span.set_attribute("embeddings_generated", len(embeddings))

        # Track storage performance
        with trace_span("store_embeddings", embedding_count=len(embeddings)) as span:
            logger.debug("Storing embeddings in Neo4j")
            await self.repository.save_embeddings(embeddings)
            span.set_attribute("storage_success", True)

        return ServiceResult.ok(embeddings)
```

### Example 4: Add Correlation Context for Multi-File Operations
```python
from project_watch_mcp.core.monitoring import get_logger, traced, correlation_context

logger = get_logger(__name__)

class RepositoryIndexer:
    @traced
    async def index_repository(self, repo_path: str) -> ServiceResult[None]:
        # Generate correlation ID for this indexing operation
        with correlation_context() as cid:
            logger.info(f"Starting repository indexing: {repo_path} (correlation_id: {cid})")

            files = await self.discover_files(repo_path)
            logger.info(f"Discovered {len(files)} files to index")

            for file_path in files:
                # All logs from index_file will include same correlation_id
                await self.index_file_handler.handle(IndexFileCommand(file_path))

            logger.info(f"Repository indexing complete: {len(files)} files processed")

        return ServiceResult.ok(None)
```

## Requirements
- OpenTelemetry initialized (handled by `initialize_otel_logger()` in core/monitoring)
- Python 3.11+ (for modern type hints)
- Async/await support for async methods
- ServiceResult pattern used for return types
- Access to `project_watch_mcp.core.monitoring` module

## Common Patterns

### Pattern 1: Service Method Instrumentation
```python
from project_watch_mcp.core.monitoring import get_logger, traced

logger = get_logger(__name__)

class MyService:
    @traced
    async def operation(self, param: str) -> ServiceResult[Data]:
        logger.info(f"Starting operation with param: {param}")
        result = await self._execute(param)
        logger.info(f"Operation completed successfully")
        return ServiceResult.ok(result)
```

### Pattern 2: Error Logging with Trace Context
```python
@traced
async def risky_operation(self) -> ServiceResult[Data]:
    try:
        logger.info("Attempting risky operation")
        result = await self._risky_call()
        return ServiceResult.ok(result)
    except Exception as e:
        logger.error(f"Operation failed: {str(e)}")
        return ServiceResult.fail(f"Operation failed: {str(e)}")
```

### Pattern 3: Multi-Step Operation Tracking
```python
@traced
async def multi_step_process(self, data: Data) -> ServiceResult[Result]:
    with trace_span("step_1_validation") as span:
        logger.info("Step 1: Validating input")
        validation = await self._validate(data)
        span.set_attribute("valid", validation.success)

    with trace_span("step_2_processing") as span:
        logger.info("Step 2: Processing data")
        processed = await self._process(data)
        span.set_attribute("items_processed", len(processed))

    with trace_span("step_3_storage") as span:
        logger.info("Step 3: Storing results")
        await self._store(processed)
        span.set_attribute("items_stored", len(processed))

    return ServiceResult.ok(processed)
```

## Observability Best Practices
1. **Always use @traced on public methods** - Commands, Queries, Services
2. **Never use print() statements** - Use logger instead (fail-fast principle)
3. **Add context to logs** - Include relevant parameters, counts, durations
4. **Use trace_span for expensive operations** - API calls, database queries, file I/O
5. **Use correlation_context for request tracking** - Multi-file operations, batch processing
6. **Log at appropriate levels** - DEBUG (detailed), INFO (milestones), WARNING (degraded), ERROR (failures)
7. **Avoid logging sensitive data** - API keys, passwords, PII
8. **Set span attributes for metrics** - Counts, durations, success/failure indicators

## Troubleshooting

### Logs Missing Trace IDs
**Issue:** Logs don't show `[trace:... | span:...]`

**Solution:** Ensure OTEL is initialized before any logging:
```python
from project_watch_mcp.core.monitoring import initialize_otel_logger

initialize_otel_logger()  # Call once at application startup
```

### @traced Not Working on Async Methods
**Issue:** Decorator doesn't capture spans for async methods

**Solution:** The `@traced` decorator handles both sync and async automatically. Ensure you're using `async def`:
```python
@traced
async def my_method(self):  # ✅ Works
    pass

@traced
def my_method(self):  # ✅ Also works for sync
    pass
```

### Span Attributes Not Appearing
**Issue:** Custom attributes not visible in traces

**Solution:** Use `span.set_attribute()` within the span context:
```python
with trace_span("operation") as span:
    span.set_attribute("key", "value")  # ✅ Correct
```

## See Also
- [references/reference.md](./references/reference.md) - Advanced OTEL configuration
- [ARCHITECTURE.md](/Users/dawiddutoit/projects/play/project-watch-mcp/ARCHITECTURE.md) - Clean Architecture layers
- Core monitoring module: `src/project_watch_mcp/core/monitoring/`