--- name: enable-ssi description: Enable Single Step Instrumentation (SSI) on Kubernetes — automatically instruments applications for APM without code changes. Only use if the Datadog Agent is already running on the cluster — if not, use agent-install first. metadata: version: "1.0.0" author: datadog-labs repository: https://github.com/datadog-labs/agent-skills tags: datadog,apm,kubernetes,ssi,instrumentation,single-step alwaysApply: "false" --- # Enable APM on Kubernetes via Single Step Instrumentation > **Before doing anything else:** Fully resolve all variables in `## Context to resolve before acting`. Do not begin Step 0 until every variable has a concrete value. --- > **Silent failure — check this before any other step:** > > If the application has `ddtrace`, `dd-trace`, or any OpenTelemetry SDK in its **dependency manifest** (`requirements.txt`, `package.json`, `Gemfile`, `go.mod`, `pom.xml`) — even with no import statements in code — SSI will silently disable itself at runtime. > > The failure is invisible: init containers run and complete, the pod starts healthy, no errors appear in `kubectl` or `pup`, but no traces arrive. The injector detects the user-installed tracer and exits cleanly without logging anything. > > ### Claude runs > > ```bash > grep -rE "ddtrace|dd-trace|opentelemetry" \ > requirements.txt package.json Gemfile go.mod pom.xml 2>/dev/null \ > || echo "No tracer dependency found" > ``` > > If any match — **stop**. Remove the package entirely (not just the import), rebuild the image, reload it into the cluster, and restart the pod before continuing. A package present in the manifest is enough to trigger this even if it is never imported. --- ## Triggers Invoke this skill when the user expresses intent to: - Enable APM on a Kubernetes cluster - Instrument Kubernetes applications with Datadog tracing - Set up Single Step Instrumentation (SSI) Do NOT invoke this skill if: - The Datadog Agent is not yet installed — run `agent-install` first - The user wants to verify SSI after setup — use `verify-ssi` - The user wants to enable Profiler, AppSec, or Data Streams — use `dd-apm-k8s-sdk-features` --- ## Prerequisites > **These are not a reading exercise — actively verify each one before proceeding.** **Environment** - [ ] Datadog Agent is installed and healthy — `agent-install` complete - [ ] Kubernetes v1.20+ - [ ] Linux node pools only — Windows pods require explicit namespace exclusion - [ ] Cluster is not ECS Fargate — unsupported - [ ] Not a hardened SELinux environment — unsupported - [ ] Not a very small VM instance (e.g. t2.micro) — SSI can hit init timeouts - [ ] No PodSecurity baseline or restricted policy enforced **Language and runtime** - [ ] Application language is one of: Java, Python, Ruby, Node.js, .NET, PHP - [ ] Runtime version is within SSI's supported range — verify against the [SSI compatibility matrix](https://docs.datadoghq.com/tracing/trace_collection/automatic_instrumentation/single-step-apm/compatibility/) - [ ] Node.js app is not using ESM — SSI does not support ESM - [ ] Java app is not already using a `-javaagent` JVM flag **Existing instrumentation** — confirmed clean by the check at the top of this skill. If you skipped that check, go back and run it now. --- ## Context to resolve before acting > **Discover from the cluster — do not ask the user for information you can find yourself.** | Variable | How to resolve | |---|---| | `AGENT_NAMESPACE` | Same namespace used in `agent-install` (e.g. `datadog`) | | `APP_NAMESPACE` | Run `kubectl get namespaces --no-headers \| awk '{print $1}' \| grep -vE '^(kube-system\|kube-public\|kube-node-lease\|datadog\|local-path-storage)$'` — instrument all non-system namespaces, or use the namespace(s) the user mentioned | | `TARGET_LANGUAGES` | Run `kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}'` and infer language from image names, or check Dockerfiles/manifests in the workspace. If uncertain, enable all languages. | | `DEPLOYMENT_NAME` | Run `kubectl get deployments -A --no-headers` — identify application deployments (exclude system components) | | `APP_LABEL` | Check `spec.selector.matchLabels` in the Deployment manifest via `kubectl get deployment -n -o yaml` | | `CLUSTER_NAME` | Check `spec.global.clusterName` in `datadog-agent.yaml`, or `kubectl config current-context` — needed for kind clusters in Step 0 | | `ENV` | Use `apm-evals` if running in an eval cluster (kind cluster names contain "evalya"). Otherwise use `production` unless the user specifies otherwise. | | `SERVICE_NAME` | Use the deployment name (e.g. `python-app` → service `python-app`). Do not ask the user. | | `VERSION` | Use `1.0.0` as the default. Do not ask the user. | --- ## Step 0 (Only if existing instrumentation detected): Remove Manual Instrumentation Scan all source files for: `import ddtrace`, `from ddtrace`, `require 'ddtrace'`, `require("dd-trace")`, `opentelemetry`, `tracer.trace(` Also check dependency manifests for `ddtrace` / `dd-trace` / OTel SDK packages. If found — remove the import/package, then rebuild and reload: ### Claude runs ```bash docker build -f -t ``` [DECISION: how does this cluster get local images?] Check the repo's setup script (e.g. `create.sh`, `Makefile`, `justfile`) for how images are loaded — do not guess from the cluster name or context. Common patterns: | What you find in the setup script | Load command | |---|---| | `minikube image load` or `minikube cache add` | `minikube -p image load ` — profile is the `-p` flag value in the script, NOT necessarily the kubectl context name | | `kind load docker-image` | `kind load docker-image --name ` | | `docker push` to a registry | Push the new image; the cluster will pull on restart — skip local load | | `k3d image import` | `k3d image import -c ` | | No image load step (cloud cluster, always pulls from registry) | Skip — image will be pulled on next deployment | If the setup script is ambiguous, run the load command it uses exactly as written. - Registry-based: skip — image will be pulled on next deployment > **Confirm with the user before restarting.** Tell the user: "I need to restart `` in `` to pick up the rebuilt image. Ready to proceed?" Wait for confirmation. ### Claude runs ```bash kubectl rollout restart deployment/ -n kubectl wait --for=condition=Ready pod \ -l app= \ -n \ --timeout=120s ``` --- ## Step 1: Extend the DatadogAgent Manifest with APM SSI is configured on the existing `DatadogAgent` resource — do not create a separate manifest. **Choose targeting scope based on what the user asked for:** - User asked to instrument **all applications** or didn't specify scope → **use Option A (cluster-wide)** - User asked for specific namespaces only → use Option B - User asked to exclude namespaces from cluster-wide → use Option C - User asked for specific pods/workloads → use Option D > **Default is cluster-wide (Option A).** If the user said "all my applications", "my whole cluster", or didn't restrict scope, use Option A with no `enabledNamespaces` or `targets`. Recommended `ddTraceVersions`: `java: "1"`, `python: "2"`, `js: "5"`, `dotnet: "3"`, `ruby: "2"`, `php: "1"` **Option A — Cluster-wide (default):** ```yaml features: apm: instrumentation: enabled: true ``` **Option B — Specific namespaces only:** ```yaml features: apm: instrumentation: enabled: true enabledNamespaces: - ``` **Option C — Cluster-wide with exclusions:** ```yaml features: apm: instrumentation: enabled: true disabledNamespaces: - jenkins - kube-system ``` **Option D — Target specific workloads:** ```yaml features: apm: instrumentation: enabled: true targets: - name: namespaceSelector: matchNames: - ddTraceVersions: : "" ``` > **Note:** `ddTraceVersions` only applies inside a `targets[]` entry (Option D). It is not valid alongside `enabledNamespaces` or at the `instrumentation` level directly. ### Claude runs ```bash kubectl apply -f datadog-agent.yaml ``` If `datadogagent.datadoghq.com/datadog configured` — continue to Step 2. ERROR: Validation error — check YAML. `enabledNamespaces` and `disabledNamespaces` cannot both be set. --- ## Step 2: Inform the User About Unified Service Tags > **Do NOT modify application Deployments without explicit user confirmation.** Applying labels to existing application workloads is a change to customer-managed resources. Inform the user that adding Unified Service Tags (UST) to their Deployments will enable proper service/env/version tagging in Datadog. This is optional for SSI to work but recommended for full observability: ```yaml # Add to both metadata.labels and spec.template.metadata.labels tags.datadoghq.com/env: "" tags.datadoghq.com/service: "" tags.datadoghq.com/version: "" ``` If the user wants you to apply these, get their confirmation first. UST labels are not required for APM traces to flow — SSI works without them. --- ## Step 3: Restart Application Pods > **Confirm with the user before restarting.** Tell the user: "I need to restart `` in `` for SSI to inject into the pods. This will cause a brief outage. Ready to proceed?" Wait for confirmation. ### Claude runs ```bash kubectl rollout restart deployment/ -n kubectl wait --for=condition=Ready pod \ -l app= \ -n \ --timeout=120s ``` If pods restart cleanly, init containers named `datadog-lib--init` will be visible in the pod spec. ERROR: Pods crash-looping — check for existing custom instrumentation. See `troubleshoot-ssi`. --- ## Done Exit when ALL of the following are true: - [ ] `features.apm.instrumentation` is present in the applied `DatadogAgent` manifest - [ ] User has been informed that they need to restart their application pods - [ ] User has been informed about Unified Service Tags (UST) and how to apply them if desired - [ ] Scope confirmed: which workloads are instrumented, which were skipped and why Automatically proceed to `verify-ssi` now — do not ask the user for permission. --- ## Security constraints - Never write a raw API key into any file or chat message - Never use namespace `default` for Datadog resources - Never modify `admissionController` settings directly — SSI manages this via the Operator - Do not add APM config to application manifests — configure only via `DatadogAgent` - Exception: UST labels (`tags.datadoghq.com/*`) on application Deployments are required and intentional - Never run `kubectl delete` without user confirmation - `docker push` to a registry always requires user confirmation - **Never use `kubectl patch` to apply UST labels or any Deployment changes.** Always edit the Deployment YAML file and `kubectl apply -f`. Changes made with `kubectl patch` are transient and will be overwritten on the next rollout.