--- name: adk-observability-guide description: > MUST READ before setting up observability for ADK agents or when analyzing production traffic, debugging agent behavior, or improving agent performance. ADK observability guide — Cloud Trace, prompt-response logging, BigQuery Agent Analytics, third-party integrations, and troubleshooting. Use when configuring monitoring, tracing, or logging for agents, or when understanding how a deployed agent handles real traffic. metadata: license: Apache-2.0 author: Google --- # ADK Observability Guide > **Scaffolded project?** Cloud Trace and prompt-response logging are pre-configured by Terraform. See `references/cloud-trace-and-logging.md` for infrastructure details, env vars, and verification commands. > > **No scaffold?** Follow the ADK docs links below for manual setup. For production infrastructure, scaffold with `/adk-scaffold`. ### Reference Files | File | Contents | |------|----------| | `references/cloud-trace-and-logging.md` | Scaffolded project details — Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally | | `references/bigquery-agent-analytics.md` | BQ Agent Analytics plugin — enabling, key features, GCS offloading, tool provenance | --- ## Observability Tiers Choose the right level of observability based on your needs: | Tier | What It Does | Scope | Default State | Best For | |------|-------------|-------|---------------|----------| | **Cloud Trace** | Distributed tracing — execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow | | **Prompt-Response Logging** | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance | | **BigQuery Agent Analytics** | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (`--bq-analytics` at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals | | **Third-Party Integrations** | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management | **Ask the user** which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive. --- ## Cloud Trace ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow. ### Span Hierarchy ``` invocation └── agent_run (one per agent in the chain) ├── call_llm (model request/response) └── execute_tool (tool execution) ``` ### Setup by Deployment Type | Deployment | Setup | |-----------|-------| | **Agent Engine** | Automatic — traces are exported to Cloud Trace by default | | **Cloud Run (scaffolded)** | Automatic — `otel_to_cloud=True` in the FastAPI app | | **Cloud Run (manual)** | Configure OpenTelemetry exporter in your app | | **Local dev** | Works with `make playground`; traces visible in Cloud Console | View traces: **Cloud Console → Trace → Trace explorer** For detailed setup instructions (Agent Engine CLI/SDK, Cloud Run, custom deployments), fetch `https://google.github.io/adk-docs/integrations/cloud-trace/index.md`. --- ## Prompt-Response Logging Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL), BigQuery (external tables), and Cloud Logging (dedicated bucket). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise. Key env var: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` — set to `NO_CONTENT` (metadata only, default in deployed envs), `true` (full content), or `false` (disabled). Logging is disabled locally unless `LOGS_BUCKET_NAME` is set. For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see `references/cloud-trace-and-logging.md`. For ADK logging docs (log levels, configuration, debugging), fetch `https://google.github.io/adk-docs/observability/logging/index.md`. --- ## BigQuery Agent Analytics Plugin Optional plugin that logs structured agent events to BigQuery. Enable with `--bq-analytics` at scaffold time. See `references/bigquery-agent-analytics.md` for details. --- ## Third-Party Integrations ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior. | Platform | Key Differentiator | Setup Complexity | Self-Hosted Option | |----------|-------------------|-----------------|-------------------| | **AgentOps** | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) | | **Arize AX** | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) | | **Phoenix** | Open-source, custom evaluators, experiment testing | Low | Yes | | **MLflow** | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes | | **Monocle** | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) | | **Weave** | W&B platform, team collaboration, timeline views | Low | No (SaaS) | | **Freeplay** | Prompt management + evals + observability in one platform | Low | No (SaaS) | **Ask the user** which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below. --- ## Troubleshooting | Issue | Solution | |-------|----------| | No traces in Cloud Trace | Verify `otel_to_cloud=True` in FastAPI app; check service account has `cloudtrace.agent` role | | Prompt-response data not appearing | Check `LOGS_BUCKET_NAME` is set; verify SA has `storage.objectCreator` on the bucket; check app logs for telemetry setup warnings | | Privacy mode misconfigured | Check `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` value — use `NO_CONTENT` for metadata-only, `false` to disable | | BigQuery Analytics not logging | Verify plugin is configured in `app/agent.py`; check `BQ_ANALYTICS_DATASET_ID` env var is set | | Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry | | Traces missing tool spans | Tool execution spans appear under `execute_tool` — check trace explorer filters | | High telemetry costs | Switch to `NO_CONTENT` mode; reduce BigQuery retention; disable unused tiers | --- ## Deep Dive: ADK Docs (WebFetch URLs) For detailed documentation beyond what this skill covers, fetch these pages: | Topic | URL | |-------|-----| | Observability overview | `https://google.github.io/adk-docs/observability/index.md` | | Agent activity logging | `https://google.github.io/adk-docs/observability/logging/index.md` | | Cloud Trace integration | `https://google.github.io/adk-docs/integrations/cloud-trace/index.md` | | BigQuery Agent Analytics | `https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md` | | AgentOps | `https://google.github.io/adk-docs/integrations/agentops/index.md` | | Arize AX | `https://google.github.io/adk-docs/integrations/arize-ax/index.md` | | Phoenix (Arize) | `https://google.github.io/adk-docs/integrations/phoenix/index.md` | | MLflow tracing | `https://google.github.io/adk-docs/integrations/mlflow/index.md` | | Monocle | `https://google.github.io/adk-docs/integrations/monocle/index.md` | | W&B Weave | `https://google.github.io/adk-docs/integrations/weave/index.md` | | Freeplay | `https://google.github.io/adk-docs/integrations/freeplay/index.md` |