---
name: otel
description: OpenTelemetry instrumentation for the Copilot Chat extension — covers the four agent execution paths, the IOTelService abstraction, span/metric/event conventions, and the relationship between code and the user/developer monitoring docs. Use when adding/changing OTel spans, metrics, or events; instrumenting a new agent surface; touching the Copilot CLI bridge or Claude span emission; or updating `extensions/copilot/docs/monitoring/agent_monitoring*.md`.
---

# OpenTelemetry Instrumentation Skill

When adding, changing, or reviewing OTel telemetry in the Copilot Chat extension, **always read the two source-of-truth docs first** and **always keep them in sync with the code you change**.

## 1. Authoritative Documents

The `extensions/copilot/docs/monitoring/` directory contains the two specs that define the OTel contract for the extension, plus a visual data-flow diagram. Treat the two specs like the layout / layer specs in `vs/sessions`.

| Document | Path | Audience | Covers |
|---|---|---|---|
| User-facing | `extensions/copilot/docs/monitoring/agent_monitoring.md` | Extension users | Quick start, settings, env vars, exported spans/metrics/events, backend setup guides |
| Architecture | `extensions/copilot/docs/monitoring/agent_monitoring_arch.md` | Developers | Multi-agent strategies, span hierarchies, file structure, instrumentation points, `IOTelService`, configuration channels |
| Visual flow | `extensions/copilot/docs/monitoring/otel-data-flow.html` | Developers | Renders the bridge data flow for the in-process Copilot CLI agent |

If the implementation changes, **you must update the relevant doc in the same PR**. The arch doc is the most likely to drift; treat divergence as a bug.

## 2. Architecture at a Glance

The extension has four agent execution paths, each with a different OTel strategy:

| Agent | Process Model | Strategy | Debug Panel Source |
|---|---|---|---|
| **Foreground** (`toolCallingLoop`) | Extension host | Direct `IOTelService` spans | Extension spans |
| **Copilot CLI in-process** | Extension host (same process) | **Bridge SpanProcessor** — SDK creates spans natively; bridge forwards to debug panel | SDK native spans via bridge |
| **Copilot CLI terminal** | Separate terminal process | Forward OTel env vars | N/A (separate process) |
| **Claude Code** | Child process (Node fork) | **Synthesized from SDK messages** — extension intercepts the Claude SDK message stream in `claudeMessageDispatch.ts` and emits GenAI spans; LLM calls are proxied through `claudeLanguageModelServer.ts` (which calls `chatMLFetcher`, producing standard `chat` spans). | Extension spans |

> **Why asymmetric?** The CLI SDK runs in-process with full trace hierarchy (subagents, permissions, hooks). A bridge captures this directly. Claude runs as a separate process — internal spans are inaccessible, so the extension synthesizes spans by translating SDK messages and proxying the model API.

## 3. Where Things Live (canonical map)

```
extensions/copilot/src/platform/otel/
├── common/
│   ├── otelService.ts          # IOTelService interface + ISpanHandle + injectCompletedSpan
│   ├── otelConfig.ts           # Config resolution (env → settings → defaults), enabledVia, dbSpanExporter
│   ├── noopOtelService.ts      # Zero-cost no-op (used by chatLib / tests)
│   ├── agentOTelEnv.ts         # deriveCopilotCliOTelEnv / deriveClaudeOTelEnv
│   ├── genAiAttributes.ts      # ⚠ Single source of truth for attribute keys & enums
│   ├── genAiEvents.ts          # Event emitter helpers (emit*Event)
│   ├── genAiMetrics.ts         # GenAiMetrics class
│   ├── messageFormatters.ts    # truncateForOTel, normalizeProviderMessages, toSystemInstructions, …
│   ├── workspaceOTelMetadata.ts
│   ├── sessionUtils.ts
│   └── index.ts                # ⚠ Public barrel — re-export new helpers/constants here
└── node/
    ├── otelServiceImpl.ts      # NodeOTelService + DiagnosticSpanExporter + FilteredSpanExporter + EXPORTABLE_OPERATION_NAMES
    ├── inMemoryOTelService.ts  # InMemoryOTelService (used when OTel is disabled — feeds debug panel only)
    ├── fileExporters.ts        # File-based span/log/metric exporters
    └── sqlite/                 # OTelSqliteStore + SqliteSpanExporter (dbSpanExporter pipeline)

extensions/copilot/src/extension/
├── chatSessions/
│   ├── copilotcli/node/
│   │   ├── copilotCliBridgeSpanProcessor.ts  # Bridge: SDK spans → IOTelService (+ hook span enrichment)
│   │   ├── copilotcliSession.ts              # Root invoke_agent copilotcli span + traceparent + hook stash
│   │   └── copilotcliSessionService.ts       # Bridge installation + env var setup
│   └── claude/
│       ├── common/claudeMessageDispatch.ts   # execute_tool / execute_hook spans + subagent context wiring
│       └── node/
│           ├── claudeOTelTracker.ts          # invoke_agent claude span + per-session token/cost rollup
│           └── claudeLanguageModelServer.ts  # Local HTTP proxy → chatMLFetcher (chat spans)
├── chat/vscode-node/
│   └── chatHookService.ts                    # execute_hook spans for foreground agent hooks
├── intents/node/toolCallingLoop.ts           # invoke_agent spans for foreground agent
├── tools/vscode-node/toolsService.ts         # execute_tool spans for foreground tools
├── prompt/node/chatMLFetcher.ts              # chat spans for all LLM calls
├── byok/vscode-node/                         # BYOK provider chat spans (anthropicProvider, geminiNativeProvider, …)
└── trajectory/vscode-node/
    ├── otelChatDebugLogProvider.ts           # Debug panel data provider
    ├── otelSpanToChatDebugEvent.ts           # Span → ChatDebugEvent conversion
    └── otlpFormatConversion.ts               # OTLP ↔ in-memory span format
```

## 3a. Attribute namespaces & dual-emit policy

Three namespaces coexist on extension-emitted spans:

| Namespace | Purpose | Status |
|---|---|---|
| `gen_ai.*` | OTel GenAI Semantic Conventions. Use whenever a standard key exists. | Canonical |
| `github.copilot.*` | Copilot-specific vendor namespace. | **Preferred — new attributes go here.** |
| `copilot_chat.*` | Original VS Code-only namespace. Several keys remain for backwards compatibility. | **Legacy — keep emitting; do not add new keys here.** |

### Dual-emit rules

- When adding a new attribute that belongs to Copilot's vendor namespace, emit it under `github.copilot.*` only — do **not** introduce a `copilot_chat.*` twin.
- When **renaming** an existing attribute (e.g., `copilot_chat.repo.*` → `github.copilot.git.*`, or `gen_ai.usage.reasoning_tokens` → `gen_ai.usage.reasoning.output_tokens`), **dual-emit both the old and the new key indefinitely**. Downstream readers (Agent Debug Log, Chronicle, SQLite span store, OTLP collectors) may depend on the legacy key.
- Mark the legacy row in [agent_monitoring.md](../../../extensions/copilot/docs/monitoring/agent_monitoring.md) with **Legacy** in the "Requirement" column and a pointer to the preferred key. No sunset date — legacy keys live on indefinitely.
- Hash sensitive identifiers (e.g., MCP server names) with `hashTelemetryValue` from [`util/node/crypto.ts`](../../../extensions/copilot/src/util/node/crypto.ts). Emit hashes unconditionally; raw values only when `captureContent` is enabled.
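
The rules above can be sketched as two small helpers. This is a minimal illustration under stated assumptions, not the extension's code: `SpanLike`, `dualEmit`, `emitServerName`, and the `github.copilot.example.*` / `copilot_chat.repo.example` keys are hypothetical, and the injected `hash` function stands in for `hashTelemetryValue`, whose real signature is not shown in this document.

```ts
// Hypothetical minimal span shape; the real ISpanHandle lives in otelService.ts.
interface SpanLike {
    setAttribute(key: string, value: string | number | boolean): void;
}

// Renamed attribute: write the preferred key and keep the legacy key alive.
function dualEmit(span: SpanLike, preferredKey: string, legacyKey: string, value: string | number | boolean): void {
    span.setAttribute(preferredKey, value);
    span.setAttribute(legacyKey, value);
}

// Sensitive identifier: the hash is always emitted, the raw value only with captureContent.
function emitServerName(span: SpanLike, serverName: string, captureContent: boolean, hash: (value: string) => string): void {
    span.setAttribute('github.copilot.example.server_name_hash', hash(serverName));
    if (captureContent) {
        span.setAttribute('github.copilot.example.server_name', serverName);
    }
}

// Usage with an in-memory recorder and a toy hash (assumption: any string -> string function).
const recordedAttrs = new Map<string, string | number | boolean>();
const recorderSpan: SpanLike = { setAttribute: (key, value) => { recordedAttrs.set(key, value); } };
dualEmit(recorderSpan, 'github.copilot.git.example', 'copilot_chat.repo.example', 'main');
emitServerName(recorderSpan, 'my-mcp-server', false, value => `h(${value.length})`);
```

With `captureContent` off, the recorder holds both dual-emitted keys and the hash, but no raw server name.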

## 4. Service Layer & Selection

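`IOTelService` ([otelService.ts](../../../extensions/copilot/src/platform/otel/common/otelService.ts)) is the only abstraction consumers should depend on. Do not import the OTel SDK directly; the few permitted exceptions (`node/otelServiceImpl.ts`, `fileExporters.ts`, and type imports in the CLI bridge processor) are listed in section 10. There are three implementations: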

| Class | When Used |
|---|---|
| `NoopOTelService` | `chatLib` and tests where no telemetry pipeline is needed — zero cost |
| `NodeOTelService` | OTel enabled — full SDK, OTLP/file/console export, optional SQLite span exporter |
| `InMemoryOTelService` | Registered when OTel is **disabled** — no SDK is loaded, but spans/metrics/logs are still captured in-memory so the Agent Debug Log panel keeps working |

Selection happens in [`src/extension/extension/vscode-node/services.ts`](../../../extensions/copilot/src/extension/extension/vscode-node/services.ts): exactly one of `NodeOTelService` or `InMemoryOTelService` is bound to `IOTelService` per extension host based on `resolveOTelConfig().enabled`.
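
As a rough sketch of that selection rule (the interfaces and `selectOTelServiceSketch` below are hypothetical stand-ins, not the code in `services.ts`):

```ts
// Hypothetical stand-ins for the real classes; only the selection logic is the point.
interface OTelServiceSketch {
    readonly kind: 'node' | 'in-memory';
}

interface ResolvedConfigSketch {
    readonly enabled: boolean;
}

function selectOTelServiceSketch(config: ResolvedConfigSketch): OTelServiceSketch {
    // Enabled: full SDK pipeline. Disabled: in-memory capture so the debug panel keeps working.
    return config.enabled ? { kind: 'node' } : { kind: 'in-memory' };
}

const serviceWhenEnabled = selectOTelServiceSketch({ enabled: true });
const serviceWhenDisabled = selectOTelServiceSketch({ enabled: false });
```

The no-op implementation is deliberately absent from this branch: per the table above, it is used by `chatLib` and tests rather than chosen at extension-host startup.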

## 5. Span / Metric / Event Conventions

Follow the [OTel GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/). **Always use the constants from [`genAiAttributes.ts`](../../../extensions/copilot/src/platform/otel/common/genAiAttributes.ts) — never raw string literals.**

| Operation | Span Name | Kind | Constant |
|---|---|---|---|
| Agent orchestration | `invoke_agent {agent_name}` | `INTERNAL` | `GenAiOperationName.INVOKE_AGENT` |
| LLM API call | `chat {model}` | `CLIENT` | `GenAiOperationName.CHAT` |
| Tool execution | `execute_tool {tool_name}` | `INTERNAL` | `GenAiOperationName.EXECUTE_TOOL` |
| Hook execution | `execute_hook {hook_type}` | `INTERNAL` | `GenAiOperationName.EXECUTE_HOOK` |
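
The naming and kind rules in the table can be sketched as follows. `OperationNameSketch`, `spanKindFor`, and `spanNameFor` are illustrative names only; in real code, import `GenAiOperationName` from `genAiAttributes.ts` rather than redefining the strings.

```ts
// Illustrative copies of the operation names in the table above.
const OperationNameSketch = {
    INVOKE_AGENT: 'invoke_agent',
    CHAT: 'chat',
    EXECUTE_TOOL: 'execute_tool',
    EXECUTE_HOOK: 'execute_hook',
} as const;

type OperationNameValue = (typeof OperationNameSketch)[keyof typeof OperationNameSketch];

// Per the table, only LLM API calls are CLIENT spans; everything else is INTERNAL.
function spanKindFor(op: OperationNameValue): 'CLIENT' | 'INTERNAL' {
    return op === OperationNameSketch.CHAT ? 'CLIENT' : 'INTERNAL';
}

// Span names follow "<operation> <subject>", e.g. "chat <model>".
function spanNameFor(op: OperationNameValue, subject: string): string {
    return `${op} ${subject}`;
}

const toolSpanName = spanNameFor(OperationNameSketch.EXECUTE_TOOL, 'read_file');
const chatSpanKind = spanKindFor(OperationNameSketch.CHAT);
```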

Attribute namespaces:

| Namespace | Constant module | Examples |
|---|---|---|
| `gen_ai.*` | `GenAiAttr` | `gen_ai.operation.name`, `gen_ai.usage.input_tokens` |
| `copilot_chat.*` | `CopilotChatAttr` | `copilot_chat.session_id`, `copilot_chat.chat_session_id`, `copilot_chat.hook_*` |
| `github.copilot.*` | `CopilotCliSdkAttr` | SDK-emitted hook attributes (read-only — bridge & debug panel) |
| `claude_code.*` | (raw) | Claude subprocess SDK attributes — only ever observed in OTLP, not produced by the extension |

### Standard span pattern

```ts
return this._otelService.startActiveSpan(
    `execute_tool ${name}`,
    {
        kind: SpanKind.INTERNAL,
        attributes: {
            [GenAiAttr.OPERATION_NAME]: GenAiOperationName.EXECUTE_TOOL,
            [GenAiAttr.TOOL_NAME]: name,
            // …
        },
    },
    async (span) => {
        try {
            const result = await this._actualWork();
            span.setStatus(SpanStatusCode.OK);
            return result;
        } catch (err) {
            span.setStatus(SpanStatusCode.ERROR, err instanceof Error ? err.message : String(err));
            span.setAttribute(StdAttr.ERROR_TYPE, err instanceof Error ? err.constructor.name : 'Error');
            throw err;
        }
    },
);
```

### Cross-boundary trace propagation

```ts
// Parent: store context keyed by something the child knows
const ctx = this._otelService.getActiveTraceContext();
if (ctx) { this._otelService.storeTraceContext(`subagent:invocation:${id}`, ctx); }

// Child: retrieve and use as parent
const parentCtx = this._otelService.getStoredTraceContext(`subagent:invocation:${id}`);
return this._otelService.startActiveSpan('invoke_agent child', { parentTraceContext: parentCtx, … }, fn);
```
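
To make the store-and-retrieve idea concrete, here is a self-contained sketch of a keyed context store plus a W3C `traceparent` formatter. `TraceContextSketch` and `TraceContextStoreSketch` are hypothetical; the real `IOTelService` methods may behave differently (for example, around expiry or one-shot reads), and the sampled flag `01` is an assumption.

```ts
// Hypothetical context shape: 32-hex-char trace id and 16-hex-char span id (W3C Trace Context).
interface TraceContextSketch {
    traceId: string;
    spanId: string;
}

class TraceContextStoreSketch {
    private readonly contexts = new Map<string, TraceContextSketch>();

    store(key: string, ctx: TraceContextSketch): void {
        this.contexts.set(key, ctx);
    }

    get(key: string): TraceContextSketch | undefined {
        return this.contexts.get(key);
    }
}

// W3C traceparent header: version "00", trace id, parent span id, flags ("01" = sampled).
function toTraceparent(ctx: TraceContextSketch): string {
    return `00-${ctx.traceId}-${ctx.spanId}-01`;
}

const contextStore = new TraceContextStoreSketch();
contextStore.store('subagent:invocation:42', { traceId: '0af7651916cd43dd8448eb211c80319c', spanId: 'b7ad6b7169203331' });
const storedParentCtx = contextStore.get('subagent:invocation:42');
const traceparentHeader = storedParentCtx ? toTraceparent(storedParentCtx) : undefined;
```

Keying by something both sides already know (here an invocation id) is what lets the child find its parent without relying on the ambient active context.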

### Content capture

The extension uses two conventions side-by-side; pick the right one for the attribute you're adding.

1. **Always emit (truncated)** — used for inputs/outputs that the Agent Debug Log panel needs to be useful even when OTel export is off (e.g. `gen_ai.tool.call.arguments` in [`toolsService.ts`](../../../extensions/copilot/src/extension/tools/vscode-node/toolsService.ts), and `copilot_chat.hook_input` / `hook_output` in [`chatHookService.ts`](../../../extensions/copilot/src/extension/chat/vscode-node/chatHookService.ts)). The attribute is captured unconditionally but always passed through `truncateForOTel`. Use this for moderate-sized, generally-non-secret arguments / results.
2. **Gate on `config.captureContent`** — used for full prompt / response / system-instruction bodies (e.g. `gen_ai.input.messages`, `gen_ai.output.messages`, `gen_ai.system_instructions`, `gen_ai.tool.definitions` in [`chatMLFetcher.ts`](../../../extensions/copilot/src/extension/prompt/node/chatMLFetcher.ts) and the BYOK providers). These are larger and more likely to contain user secrets.

```ts
// Pattern 1 — always emit, always truncate
span.setAttribute(GenAiAttr.TOOL_CALL_ARGUMENTS, truncateForOTel(JSON.stringify(args)));

// Pattern 2 — gated on captureContent
if (this._otelService.config.captureContent) {
    span.setAttribute(GenAiAttr.INPUT_MESSAGES, truncateForOTel(JSON.stringify(messages)));
}
```
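
The two patterns can be folded into one decision function, sketched below. `truncateSketch` is a hypothetical stand-in for `truncateForOTel` (the real limit and truncation marker are not stated here), and `contentAttributeValue`, `CapturePolicy`, and the 1024-character default are assumptions for illustration.

```ts
const TRUNCATION_MARKER = '...[truncated]';

// Hypothetical stand-in for truncateForOTel; the real limit and marker may differ.
function truncateSketch(value: string, maxChars: number): string {
    if (value.length <= maxChars) {
        return value;
    }
    return value.slice(0, Math.max(0, maxChars - TRUNCATION_MARKER.length)) + TRUNCATION_MARKER;
}

type CapturePolicy = 'always' | 'gated';

// Returns the attribute value to set, or undefined when a gated attribute must be skipped.
function contentAttributeValue(value: unknown, policy: CapturePolicy, captureContent: boolean, maxChars = 1024): string | undefined {
    if (policy === 'gated' && !captureContent) {
        return undefined;
    }
    return truncateSketch(JSON.stringify(value), maxChars);
}

// Pattern 1: tool arguments are emitted even with captureContent off.
const toolArgsValue = contentAttributeValue({ path: 'README.md' }, 'always', false);
// Pattern 2: prompt bodies are skipped unless captureContent is on.
const promptBodyValue = contentAttributeValue([{ role: 'user', content: 'hi' }], 'gated', false);
```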

### Debug panel vs OTLP isolation

Spans whose `gen_ai.operation.name` is **not** in `EXPORTABLE_OPERATION_NAMES` (defined in [`otelServiceImpl.ts`](../../../extensions/copilot/src/platform/otel/node/otelServiceImpl.ts)) are visible to the debug panel via `onDidCompleteSpan` but excluded from OTLP and SQLite exporters by `DiagnosticSpanExporter` and `FilteredSpanExporter`. Currently exportable: `chat`, `invoke_agent`, `execute_tool`, `embeddings`, `execute_hook`. **If you add a new operation name that should reach the user's collector, update `EXPORTABLE_OPERATION_NAMES` and document it in `agent_monitoring.md`.**
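
A minimal sketch of that filtering rule, assuming a simplified span shape: `CompletedSpanSketch`, `isExportableSketch`, and the `internal_debug` operation name are hypothetical, and treating a span with no operation name as non-exportable is an assumption. The set mirrors the operation names listed above; the authoritative list is `EXPORTABLE_OPERATION_NAMES` in `otelServiceImpl.ts`.

```ts
// Mirrors the currently exportable operation names listed above.
const EXPORTABLE_SKETCH: ReadonlySet<string> = new Set([
    'chat',
    'invoke_agent',
    'execute_tool',
    'embeddings',
    'execute_hook',
]);

// Hypothetical simplified span shape.
interface CompletedSpanSketch {
    name: string;
    attributes: Record<string, unknown>;
}

function isExportableSketch(span: CompletedSpanSketch): boolean {
    const op = span.attributes['gen_ai.operation.name'];
    return typeof op === 'string' && EXPORTABLE_SKETCH.has(op);
}

const completedSpans: CompletedSpanSketch[] = [
    { name: 'chat example-model', attributes: { 'gen_ai.operation.name': 'chat' } },
    { name: 'internal step', attributes: { 'gen_ai.operation.name': 'internal_debug' } },
    { name: 'untagged', attributes: {} },
];

// The debug panel would see all three spans; only the first passes the export filter.
const exportedSpans = completedSpans.filter(isExportableSketch);
```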

## 6. Configuration Surface (must stay in sync)

When you add or change a setting/env var/command, update **all three** of:

1. The setting/command registration in [`extensions/copilot/package.json`](../../../extensions/copilot/package.json) (search for `github.copilot.chat.otel`).
2. `resolveOTelConfig` in [`otelConfig.ts`](../../../extensions/copilot/src/platform/otel/common/otelConfig.ts) if the setting affects runtime config, plus the `enabledVia` channel if the setting can implicitly enable OTel.
3. `agent_monitoring.md` ("VS Code Settings", "Environment Variables", "Activation", "Commands" tables) **and** `agent_monitoring_arch.md` ("Activation Channels", "Agent-Specific Env Var Translation" tables).
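
For orientation, a layered resolution of the kind `resolveOTelConfig` performs might look like the sketch below. Everything here is an assumption for illustration: the config shape, the `EXAMPLE_OTEL_*` variable names, the default endpoint, and the reading that an env var takes precedence over a setting, which in turn takes precedence over the default. Check `otelConfig.ts` for the real names and order.

```ts
// Hypothetical config shape and env var names; the real ones live in otelConfig.ts and package.json.
interface OTelConfigSketch {
    enabled: boolean;
    endpoint: string;
}

type EnvSketch = Record<string, string | undefined>;

function resolveConfigSketch(env: EnvSketch, settings: Partial<OTelConfigSketch>, defaults: OTelConfigSketch): OTelConfigSketch {
    const envEnabled = env['EXAMPLE_OTEL_ENABLED'];
    return {
        // Env var wins if set, then the setting, then the default.
        enabled: envEnabled !== undefined ? envEnabled === 'true' : (settings.enabled ?? defaults.enabled),
        endpoint: env['EXAMPLE_OTEL_ENDPOINT'] ?? settings.endpoint ?? defaults.endpoint,
    };
}

const configDefaults: OTelConfigSketch = { enabled: false, endpoint: 'http://localhost:4318' };
const resolvedConfig = resolveConfigSketch(
    { EXAMPLE_OTEL_ENABLED: 'true' },
    { endpoint: 'http://collector:4318' },
    configDefaults,
);
```

Here `enabled` comes from the environment and `endpoint` from the settings layer, which is why a new setting usually needs a matching change in both the registration and the resolver.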

For sub-process env vars, also update:

- `deriveCopilotCliOTelEnv` / `deriveClaudeOTelEnv` in [`agentOTelEnv.ts`](../../../extensions/copilot/src/platform/otel/common/agentOTelEnv.ts).
- The corresponding tests in `src/platform/otel/common/test/agentOTelEnv.spec.ts`.
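
A derive-style helper for a child process could be sketched as below. `deriveChildOTelEnvSketch` and `buildChildEnvSketch` are hypothetical and do not reflect what `deriveCopilotCliOTelEnv` or `deriveClaudeOTelEnv` actually emit; the two variable names are the standard OpenTelemetry exporter variables, chosen here only as an example.

```ts
interface ParentOTelConfigSketch {
    enabled: boolean;
    otlpEndpoint?: string;
}

// Hypothetical helper in the spirit of derive*OTelEnv: returns only OTel-specific variables.
function deriveChildOTelEnvSketch(config: ParentOTelConfigSketch): Record<string, string> {
    if (!config.enabled || !config.otlpEndpoint) {
        return {};
    }
    return {
        OTEL_EXPORTER_OTLP_ENDPOINT: config.otlpEndpoint,
        OTEL_EXPORTER_OTLP_PROTOCOL: 'http/protobuf',
    };
}

// Merge into a copy of the base environment so the caller's object is not mutated.
function buildChildEnvSketch(baseEnv: Record<string, string | undefined>, config: ParentOTelConfigSketch): Record<string, string | undefined> {
    return { ...baseEnv, ...deriveChildOTelEnvSketch(config) };
}

const baseEnvSketch = { PATH: '/usr/bin' };
const childEnvSketch = buildChildEnvSketch(baseEnvSketch, { enabled: true, otlpEndpoint: 'http://localhost:4318' });
```

Keeping the derivation pure (config in, variables out) is what makes it straightforward to cover in `agentOTelEnv.spec.ts`.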

## 7. Procedure Checklists

### When adding a new span / attribute

1. Add the attribute key as a constant to `genAiAttributes.ts` (under `GenAiAttr`, `CopilotChatAttr`, or a new domain group). Never inline a raw `'copilot_chat.foo'` literal.
2. Add it to the public barrel in [`index.ts`](../../../extensions/copilot/src/platform/otel/common/index.ts) if it lives in a new group.
3. Use `IOTelService.startActiveSpan` (preferred) or `startSpan` — never `BasicTracerProvider` / `getTracer` directly.
4. Pass the value through `truncateForOTel`. This is mandatory for any free-form content attribute because it prevents OTLP batch failures. Then decide whether the attribute should be **always-emitted** (debug-panel-essential, e.g. tool args, hook input/output) or **gated on `config.captureContent`** (large prompt/response bodies, system instructions); follow the existing convention for similar data.
5. If the new operation should reach OTLP, add its op-name to `EXPORTABLE_OPERATION_NAMES` in `otelServiceImpl.ts`.
6. Document the new attribute in `agent_monitoring.md` (under the relevant span table) **and** add a test in `src/platform/otel/common/test/`.

### When adding a new metric / event

1. Add the helper to `genAiMetrics.ts` or `genAiEvents.ts` (mirror existing static / functional patterns).
2. Re-export it from `index.ts`.
3. Add the metric/event row to `agent_monitoring.md` ("Metrics" / "Events" sections) with all attributes documented.
4. Add a unit test in `src/platform/otel/common/test/genAiMetrics.spec.ts` or `genAiEvents.spec.ts` (assert the exact name + attribute keys).

### When instrumenting a new agent surface

1. Pick a strategy: direct spans (foreground-style), bridge processor (CLI-style), or message-stream synthesis (Claude-style).
2. Add the new emit site to the **Instrumentation Points** table in `agent_monitoring_arch.md` and the **Span Hierarchies** diagrams.
3. If you forward OTel env vars to a child process, do it via a new `derive*OTelEnv` helper in `agentOTelEnv.ts` and add a row to the **Agent-Specific Env Var Translation** table.
4. Wire trace propagation explicitly with `storeTraceContext` / `parentTraceContext` for any subagent or async boundary; do not rely on global active context across processes.

### When changing the Copilot CLI bridge

The bridge (`copilotCliBridgeSpanProcessor.ts`) reaches into `_delegate._activeSpanProcessor._spanProcessors` — internal OTel SDK v2 state. This is documented as a known risk. If you touch it:

- Keep the runtime guard that degrades gracefully if the internal shape changes.
- Update the **⚠ SDK Internal Access Warning** block in `agent_monitoring_arch.md` if the access pattern changes.
- Add a unit test in `copilotCliBridgeSpanProcessor.spec.ts`.
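
A runtime guard of the kind described above might look like this sketch. The property path matches the one named in this section, but `getInternalSpanProcessorsSketch`, `installBridgeSketch`, and the idea of pushing the bridge onto the internal array are assumptions; the real bridge may attach differently.

```ts
function asRecord(value: unknown): Record<string, unknown> | undefined {
    return typeof value === 'object' && value !== null ? (value as Record<string, unknown>) : undefined;
}

// Walks _delegate._activeSpanProcessor._spanProcessors without assuming any level exists.
function getInternalSpanProcessorsSketch(provider: unknown): unknown[] | undefined {
    const delegate = asRecord(asRecord(provider)?.['_delegate']);
    const active = asRecord(delegate?.['_activeSpanProcessor']);
    const processors = active?.['_spanProcessors'];
    return Array.isArray(processors) ? processors : undefined;
}

// Returns false instead of throwing when the internal shape has changed.
function installBridgeSketch(provider: unknown, bridge: unknown): boolean {
    const processors = getInternalSpanProcessorsSketch(provider);
    if (!processors) {
        return false;
    }
    processors.push(bridge);
    return true;
}

const fakeProvider = { _delegate: { _activeSpanProcessor: { _spanProcessors: [] as unknown[] } } };
const installedOnKnownShape = installBridgeSketch(fakeProvider, { id: 'bridge' });
const installedOnUnknownShape = installBridgeSketch({ somethingElse: true }, { id: 'bridge' });
```

A unit test for the real bridge can follow the same two cases: the expected internal shape, and a changed shape that must degrade without throwing.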

## 8. Validation

Before sending a PR that touches OTel code:

```bash
# From extensions/copilot/
npx tsc --noEmit --project tsconfig.json

# OTel + Bridge unit tests
npm test -- --grep "OTel|Bridge"
```

Manual sanity checks:

- The Aspire Dashboard quick-start in `agent_monitoring.md` still works end-to-end (one agent message → `invoke_agent` + `chat` + `execute_tool` spans visible at <http://localhost:18888>).
- The Agent Debug Log panel in VS Code still shows the full span tree for foreground, Copilot CLI, and Claude sessions.

## 9. Known Risks & Limitations

These are documented in `agent_monitoring_arch.md` — preserve them:

- SDK `_spanProcessors` internal access (graceful runtime guard).
- Two TracerProviders in the same process when CLI SDK is active.
- `process.env` mutation for the CLI SDK (only OTel-specific vars, set before `LocalSessionManager` ctor).
- Single `captureContent` flag for the CLI SDK applies to both debug panel and OTLP — document any user-visible change clearly.
- Claude SDK has no file exporter, and the CLI runtime only supports `otlp-http`.

## 10. Anti-Patterns to Reject

- ❌ Importing `@opentelemetry/api` (or any `@opentelemetry/*` package) from anywhere other than `node/otelServiceImpl.ts`, `fileExporters.ts`, or the CLI bridge processor type imports.
- ❌ Hard-coded attribute keys: `'copilot_chat.hook_type'` instead of `CopilotChatAttr.HOOK_TYPE`.
- ❌ Hard-coded provider strings: `'github'` / `'anthropic'` / `'gemini'` instead of `GenAiProviderName.*`.
- ❌ Magic `SpanStatusCode` numbers (`code: 1`, `code: 2`) — use the enum.
- ❌ Emitting any free-form content attribute without passing it through `truncateForOTel`: oversized values can cause OTLP batches to fail or be silently dropped.
- ❌ Logging full prompt / response / system-instruction bodies without gating on `config.captureContent` (see pattern 2 under "Content capture" in section 5).
- ❌ Adding a span operation name without deciding whether it's exportable (`EXPORTABLE_OPERATION_NAMES`).
- ❌ Updating instrumentation without updating `agent_monitoring.md` / `agent_monitoring_arch.md` in the same change.