---
name: adk-observability-guide
description: >
  MUST READ before setting up observability for ADK agents or when
  analyzing production traffic, debugging agent behavior, or improving
  agent performance.
  ADK observability guide — Cloud Trace, prompt-response logging,
  BigQuery Agent Analytics, third-party integrations, and troubleshooting.
  Use when configuring monitoring, tracing, or logging for agents,
  or when understanding how a deployed agent handles real traffic.
metadata:
  license: Apache-2.0
  author: Google
---

# ADK Observability Guide

> **Scaffolded project?** Cloud Trace and prompt-response logging are pre-configured by Terraform. See `references/cloud-trace-and-logging.md` for infrastructure details, env vars, and verification commands.
>
> **No scaffold?** Follow the ADK docs links below for manual setup. For production infrastructure, scaffold with `/adk-scaffold`.

### Reference Files

| File | Contents |
|------|----------|
| `references/cloud-trace-and-logging.md` | Scaffolded project details — Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| `references/third-party.md` | Third-party integration setup patterns, trade-offs, and ADK docs links for each provider |

---

## Observability Tiers

Choose the right level of observability based on your needs:

| Tier | What It Does | Scope | Default State | Best For |
|------|-------------|-------|---------------|----------|
| **Cloud Trace** | Distributed tracing — execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| **Prompt-Response Logging** | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| **BigQuery Agent Analytics** | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (`--bq-analytics` at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| **Third-Party Integrations** | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |

**Ask the user** which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.

---

## Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

### Span Hierarchy

```
invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)
```

### Setup by Deployment Type

| Deployment | Setup |
|-----------|-------|
| **Agent Engine** | Automatic — traces are exported to Cloud Trace by default |
| **Cloud Run (scaffolded)** | Automatic — `otel_to_cloud=True` in the FastAPI app |
| **Cloud Run (manual)** | Configure OpenTelemetry exporter in your app |
| **Local dev** | Works with `make playground`; traces visible in Cloud Console |

View traces: **Cloud Console → Trace → Trace explorer**

For detailed setup instructions (Agent Engine CLI/SDK, Cloud Run, custom deployments), fetch the ADK docs:
- `WebFetch: https://google.github.io/adk-docs/integrations/cloud-trace/index.md`

---

## Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL), BigQuery (external tables), and Cloud Logging (dedicated bucket).

### Privacy Modes

Prompt-response logging is **privacy-preserving by default** — only metadata is logged. Controlled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`:

| Value | Behavior |
|-------|----------|
| `false` | Logging disabled |
| `NO_CONTENT` | Enabled, metadata only — tokens, model name, timing (default in deployed environments) |
| `true` | Enabled with full prompt/response content (not recommended for production) |

For Agent Engine: the platform requires `true` during deployment, but the app overrides to `NO_CONTENT` at runtime.

### Behavior by Environment

| Environment | Prompt-Response Logging | Why |
|-------------|------------------------|-----|
| Local dev (`make playground`) | Disabled | No `LOGS_BUCKET_NAME` set |
| Dev (Terraform deployed) | Enabled (`NO_CONTENT`) | Terraform sets env vars |
| Staging / Production | Enabled (`NO_CONTENT`) | Terraform sets env vars |

To enable locally, set `LOGS_BUCKET_NAME` and `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=NO_CONTENT` before running `make playground`.

To disable in a deployed environment, set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false` in `deployment/terraform/service.tf` and re-apply.

For scaffolded project infrastructure details (Terraform resources, env vars, verification), see `references/cloud-trace-and-logging.md`.

For ADK logging docs (log levels, configuration, debugging):
- `WebFetch: https://google.github.io/adk-docs/observability/logging/index.md`

---

## BigQuery Agent Analytics Plugin

An optional plugin that logs structured agent events directly to BigQuery via the Storage Write API. Enables:

- **Conversational analytics** — session flows, user interaction patterns
- **LLM-as-judge evals** — structured data for evaluation pipelines
- **Custom dashboards** — Looker Studio integration
- **Tool provenance tracking** — LOCAL, MCP, SUB_AGENT, A2A, TRANSFER_AGENT

### Enabling

| Method | How |
|--------|-----|
| **At scaffold time** | `uvx agent-starter-pack create . --bq-analytics` |
| **Post-scaffold** | Add the plugin manually to `app/agent.py` (see ADK docs) |

Infrastructure (BigQuery dataset, GCS offloading) is provisioned automatically by Terraform when enabled at scaffold time.

### Key Features

- Auto-schema upgrade (new fields added without migration)
- GCS offloading for multimodal content (images, audio)
- Distributed tracing via OpenTelemetry span context
- SQL-queryable event log for all agent interactions

For full schema, SQL query examples, and Looker Studio setup:
- `WebFetch: https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md`

---

## Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
|----------|-------------------|-----------------|-------------------|
| **AgentOps** | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| **Arize AX** | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) |
| **Phoenix** | Open-source, custom evaluators, experiment testing | Low | Yes |
| **MLflow** | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| **Monocle** | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| **Weave** | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| **Freeplay** | Prompt management + evals + observability in one platform | Low | No (SaaS) |

**Ask the user** which platform they prefer — present the trade-offs and let them choose. For setup details on each, see `references/third-party.md`.

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| No traces in Cloud Trace | Verify `otel_to_cloud=True` in FastAPI app; check service account has `cloudtrace.agent` role |
| Prompt-response data not appearing | Check `LOGS_BUCKET_NAME` is set; verify SA has `storage.objectCreator` on the bucket; check app logs for telemetry setup warnings |
| Privacy mode misconfigured | Check `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` value — use `NO_CONTENT` for metadata-only, `false` to disable |
| BigQuery Analytics not logging | Verify plugin is configured in `app/agent.py`; check `BQ_ANALYTICS_DATASET_ID` env var is set |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under `execute_tool` — check trace explorer filters |
| High telemetry costs | Switch to `NO_CONTENT` mode; reduce BigQuery retention; disable unused tiers |

---

## Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

| Topic | URL |
|-------|-----|
| Observability overview | `https://google.github.io/adk-docs/observability/index.md` |
| Agent activity logging | `https://google.github.io/adk-docs/observability/logging/index.md` |
| Cloud Trace integration | `https://google.github.io/adk-docs/integrations/cloud-trace/index.md` |
| BigQuery Agent Analytics | `https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md` |
| AgentOps | `https://google.github.io/adk-docs/integrations/agentops/index.md` |
| Arize AX | `https://google.github.io/adk-docs/integrations/arize-ax/index.md` |
| Phoenix (Arize) | `https://google.github.io/adk-docs/integrations/phoenix/index.md` |
| MLflow tracing | `https://google.github.io/adk-docs/integrations/mlflow/index.md` |
| Monocle | `https://google.github.io/adk-docs/integrations/monocle/index.md` |
| W&B Weave | `https://google.github.io/adk-docs/integrations/weave/index.md` |
| Freeplay | `https://google.github.io/adk-docs/integrations/freeplay/index.md` |