---
name: opentelemetry
description: Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
---

# OpenTelemetry Implementation Guide

## Overview

OpenTelemetry (OTEL) is a vendor-neutral observability framework for instrumenting applications and for generating, collecting, and exporting telemetry data (traces, metrics, and logs). This skill provides guidance for implementing OTEL in Kubernetes environments: Collector configuration, deployment, signal pipelines, instrumentation, and troubleshooting.

## Quick Start

### Deploy OTEL Collector on Kubernetes

```bash
# Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

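# Install with a basic config.
# Recent chart versions require image.repository to be set explicitly.
# The contrib image is assumed here because later examples use contrib exporters.
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset \
  --set image.repository=otel/opentelemetry-collector-contrib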
```

### Send Test Data via OTLP

```bash
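# OTLP/gRPC listens on 4317, OTLP/HTTP on 4318.
# "otel-collector" is a placeholder host: substitute your collector Service name,
# or localhost when using kubectl port-forward.
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
# An empty resourceSpans list is a valid request; a 200 response should confirm the endpoint is reachable.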
```

## Core Concepts

**Signals**: Three types of telemetry data:

- **Traces**: Distributed request flows across services
- **Metrics**: Numerical measurements (counters, gauges, histograms)
- **Logs**: Event records with structured/unstructured data

**Collector Components**:

- **Receivers**: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
- **Processors**: Transform data (batch, memory_limiter, k8sattributes)
- **Exporters**: Send data (prometheusremotewrite, loki, otlp)
- **Extensions**: Add capabilities (health_check, pprof, zpages)
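
Receivers, processors, and exporters only run when a pipeline references them, and extensions follow the same rule: they are declared at the top level and then enabled under `service.extensions`. A minimal sketch of the three extensions listed above, assuming their commonly used ports (the `0.0.0.0` bind address for `health_check` is an assumption so that kubelet probes can reach it):

```yaml
# Sketch: declare extensions, then enable them in the service section.
extensions:
  health_check:
    endpoint: 0.0.0.0:13133    # liveness/readiness probe target (assumed bind address)
  pprof:
    endpoint: localhost:1777   # Go profiling data
  zpages:
    endpoint: localhost:55679  # live debug pages, e.g. /debug/tracez

service:
  # An extension that is declared but not listed here does not start.
  extensions: [health_check, pprof, zpages]
```

When deploying through the Helm chart, this fragment would sit under the `config:` key of the values file.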

## Collector Configuration

### Basic Pipeline Structure

```yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25

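  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:  # deprecated in recent contrib releases; Loki 3.x also accepts OTLP via the otlphttp exporter
      endpoint: "http://loki:3100/loki/api/v1/push"
    # Referenced by the traces pipeline below; the Tempo address is an assumption, adjust to your backend
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true  # plaintext in-cluster traffic; enable TLS in production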

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]
```

### Kubernetes Attributes Enrichment

```yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # Takes the NAME of the environment variable holding the node name, not its value
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
```
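
The `k8sattributes` processor looks up pod metadata through the Kubernetes API, so the collector's service account needs read access to the relevant resources. The Helm chart's `kubernetesAttributes` preset creates this RBAC for you; for a manually managed deployment, a minimal sketch might look like the following. The role name and the `otel-collector` service account name are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]          # used to resolve deployment metadata
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-k8sattributes
subjects:
  - kind: ServiceAccount
    name: otel-collector               # assumed service account name
    namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-k8sattributes
```

The processor must also be listed in each pipeline's `processors`, typically after `memory_limiter` and before `batch`, and the pod needs `K8S_NODE_NAME` populated from `spec.nodeName` through the downward API.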

## Deployment Modes

| Mode | Use Case | Pros | Cons |
|------|----------|------|------|
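| DaemonSet | Node-level collection | Covers every node; can read host metrics and node log files | One collector per node, so total resource usage grows with cluster size |
| Deployment | Centralized gateway | Scales horizontally, easier to manage | Extra network hop; a single point of failure unless run with multiple replicas |
| Sidecar | Per-pod collection | Isolated, fine-grained per-workload config | Resource overhead in every pod |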
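
These modes are often combined: DaemonSet agents collect on each node and forward to a gateway Deployment that handles sampling and export. A sketch of the agent-side configuration under that assumption; the gateway Service name `otel-gateway` is hypothetical.

```yaml
# Agent (DaemonSet) side: receive locally, forward everything to a gateway.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317

processors:
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
  batch: {}

exporters:
  otlp/gateway:
    # Hypothetical Service name for the gateway Deployment.
    endpoint: otel-gateway.monitoring.svc.cluster.local:4317
    tls:
      insecure: true   # assumes plaintext in-cluster traffic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/gateway]
```

The gateway then runs its own OTLP receiver and owns the backend exporters, which keeps backend credentials out of the per-node agents.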

## Common Patterns

### Development Environment

- Enable the `debug` exporter to print received telemetry to the collector log
- Lower resource limits (250m CPU, 512Mi memory)
- Add tolerations so the collector can schedule onto spot instances, if the cluster uses them for cost savings
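
As a rough illustration, the points above could translate into Helm values like the following. This is a sketch, not a recommended baseline: the taint key and value are hypothetical, and the `otlp` receiver and the `memory_limiter` and `batch` processors are assumed to be defined as in Basic Pipeline Structure.

```yaml
# Helm values sketch for a development cluster.
resources:
  limits:
    cpu: 250m
    memory: 512Mi

tolerations:
  - key: "node-lifecycle"      # hypothetical taint key; use your cluster's spot taint
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

config:
  exporters:
    debug:
      verbosity: detailed      # prints full telemetry records to the collector log
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [debug]
```

`verbosity: detailed` is noisy, so it is usually dropped to `basic` or removed once the pipeline is confirmed to work.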

### Production Environment

- Sample traces (for example, keep 10-50%) to control volume and cost
- Raise batch sizes (`send_batch_size` of 2048-4096)
- Enable autoscaling (Deployment mode) and a PodDisruptionBudget
- Use TLS for all endpoints
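
One way these settings might fit together in Helm values for a gateway Deployment is sketched below. The Secret name `otel-collector-tls`, the mount path, and the Tempo address are all assumptions, and the 25% sampling rate is just one value from the range above.

```yaml
# Helm values sketch for a production gateway (mode: deployment).
mode: deployment

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

podDisruptionBudget:
  enabled: true
  minAvailable: 1

# Mount a TLS key pair from a Secret (hypothetical name).
extraVolumes:
  - name: otel-tls
    secret:
      secretName: otel-collector-tls
extraVolumeMounts:
  - name: otel-tls
    mountPath: /etc/otel/tls
    readOnly: true

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
          tls:
            cert_file: /etc/otel/tls/tls.crt
            key_file: /etc/otel/tls/tls.key
  processors:
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
    probabilistic_sampler:
      sampling_percentage: 25    # keep roughly a quarter of traces
    batch:
      timeout: 10s
      send_batch_size: 2048
      send_batch_max_size: 4096
  exporters:
    otlp/tempo:
      endpoint: tempo.monitoring.svc.cluster.local:4317   # assumed backend address
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, probabilistic_sampler, batch]
        exporters: [otlp/tempo]
```

The sampler sits after `memory_limiter` and before `batch` so that dropped spans never take up batch space. Without a `tls` block the `otlp/tempo` exporter attempts a TLS connection by default, so the backend is assumed to serve TLS here.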

## Detailed References

For in-depth guidance, see:

- **Collector Configuration**: [COLLECTOR.md](references/COLLECTOR.md)
- **Kubernetes Deployment**: [KUBERNETES.md](references/KUBERNETES.md)
- **Troubleshooting**: [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)
- **Instrumentation**: [INSTRUMENTATION.md](references/INSTRUMENTATION.md)

## Validation Commands

```bash
# Check collector pods (the instance label matches the Helm release name)
kubectl get pods -n monitoring -l app.kubernetes.io/instance=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/instance=otel-collector --tail=100

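# Test the OTLP/HTTP endpoint from inside the cluster (POST; a plain GET is rejected).
# The Service name depends on your release name; check with: kubectl get svc -n monitoring
kubectl run test-otlp --image=curlimages/curl:latest --rm -it --restart=Never -- \
  curl -v -X POST http://otel-collector.monitoring:4318/v1/traces \
  -H "Content-Type: application/json" -d '{"resourceSpans":[]}'

# Validate config syntax (the contrib distribution ships the binary as otelcol-contrib)
otelcol validate --config=config.yaml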
```

## Key Helm Chart Values

```yaml
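mode: "daemonset"  # or "deployment" / "statefulset"
image:
  repository: otel/opentelemetry-collector-contrib  # required by recent chart versions; image choice is an assumption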
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
```