---
description: ADK agent design — callbacks, state, composition, Gemini config, graceful degradation
paths: ["src/supply_chain_triage/modules/*/agents/**"]
---

# Agent rules

Every agent is a subpackage: `agent.py`, `prompts/*.md`, `schemas.py`, `tools.py`. Co-located, not central. `prompts/` is a folder of markdown files (not a single `prompt.py`) — chosen for long multi-section prompts and diff readability.

## 1. Callback placement

Five ADK hooks are covered here, each with distinct return semantics (ADK also defines `after_agent_callback`):

| Hook | `None` returned | Object returned |
|---|---|---|
| `before_agent_callback` | proceed | `types.Content` — skip agent, use as final output |
| `before_model_callback` | proceed | `LlmResponse` — skip LLM, use as response |
| `after_model_callback` | use LLM output | `LlmResponse` — replace LLM output |
| `before_tool_callback` | proceed | `dict` — skip tool, use as tool result |
| `after_tool_callback` | use tool output | `dict` — replace tool output |

**Canonical uses:**
- `before_agent` — load exception context from Firestore into `state`, request-scoped setup, audit log entry.
- `before_model` — input guardrails, PII redaction, prompt validation, cached-response short-circuit.
- `after_model` — format enforcement (`output_schema` validation recovery), strip hallucinated fields, add disclaimers.
- `before_tool` — argument validation, authz, cost caps, mocked responses in tests.
- `after_tool` — normalize results, mask secrets, translate error schemas.

**Never** put business logic in callbacks (belongs in tools). **Never** put retry loops in `after_model` (use `LoopAgent`). **Never** Firestore-write in `before_model` (latency on critical path).
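
A minimal sketch of the `before_tool` return semantics from the table above: `None` lets the tool run, a `dict` replaces its result. It is duck-typed so it runs without ADK installed; real code would annotate the parameters with ADK's tool and tool-context types. The `exception_id` argument name, the `temp:lookup_calls` key, and the cap value are assumptions for illustration.

```python
from typing import Any, Optional

MAX_LOOKUPS_PER_INVOCATION = 5  # hypothetical cost cap


def guard_lookup(tool: Any, args: dict[str, Any], tool_context: Any) -> Optional[dict]:
    """before_tool_callback sketch: validate arguments and enforce a call cap."""
    if tool.name != "lookup_exception":
        return None  # not the guarded tool: proceed

    exception_id = args.get("exception_id")  # hypothetical argument name
    if not isinstance(exception_id, str) or not exception_id.strip():
        # Returning a dict skips the tool; the dict is used as the tool result.
        return {"status": "error", "error": "exception_id must be a non-empty string"}

    calls = tool_context.state.get("temp:lookup_calls", 0)
    if calls >= MAX_LOOKUPS_PER_INVOCATION:
        return {"status": "error", "error": "lookup cap reached for this invocation"}

    # temp: keys live for this invocation only, so the counter resets each request.
    tool_context.state["temp:lookup_calls"] = calls + 1
    return None  # proceed to the real tool
```

The guard only validates and counts; the lookup itself stays in the tool, in line with the "no business logic in callbacks" rule.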

## 2. State namespacing

`session.state` is a string-keyed, JSON-serializable dict. Prefixes scope lifetime:

| Prefix | Scope |
|---|---|
| none | session-scoped |
| `user:` | per user across sessions (same `app_name`) |
| `app:` | global to the app |
| `temp:` | this invocation only, never persisted |

**Module-scoped keys for this project:** `triage:exception_id`, `triage:classification`, `triage:impact`, `triage:resolution`. These carry none of the ADK scope prefixes above, so they are session-scoped. When `port_intel/` lands under Meta-Coordinator, use `port_intel:*`; the module prefix prevents key collisions.
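
One way to keep module prefixes consistent is to build keys through a small helper rather than typing string literals. The sketch below is hypothetical (neither function exists in the project or in ADK) and only shows the naming convention.

```python
ADK_SCOPE_PREFIXES = ("user:", "app:", "temp:")  # scope prefixes from the table above


def module_key(module: str, name: str) -> str:
    """Build a module-scoped state key such as 'triage:impact'."""
    if f"{module}:" in ADK_SCOPE_PREFIXES:
        raise ValueError(f"{module!r} collides with an ADK scope prefix")
    return f"{module}:{name}"


def keys_for(state: dict, module: str) -> dict:
    """Return only the state entries that belong to one module."""
    prefix = f"{module}:"
    return {key: value for key, value in state.items() if key.startswith(prefix)}
```

For example, `module_key("port_intel", "eta")` gives `"port_intel:eta"`, which cannot clash with any `triage:*` key.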

## 3. Never mutate `session.state` directly

Inside tools and callbacks, mutate via the context:
```python
tool_context.state["triage:severity"] = "HIGH"
```
Not:
```python
session.state["triage:severity"] = "HIGH"  # WRONG — bypasses event tracking and persistence
```
ADK captures context mutations as `EventActions.state_delta` and writes atomically through `SessionService`.

## 4. Cross-agent data passing

Standard pattern: `output_key=` on the upstream agent auto-writes the final output to `state[output_key]`. Downstream agents reference `{output_key}` in their instruction template.

```python
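from google.adk.agents import LlmAgent, SequentialAgent

classifier = LlmAgent(
    name="classifier",
    # model and instruction elided
    output_key="triage:classification",  # final output lands in state under this key
)
impact = LlmAgent(
    name="impact",
    instruction="Given {triage:classification}, assess business impact...",
    output_key="triage:impact",
)
# SequentialAgent requires a name; "triage_pipeline" is a placeholder.
pipeline = SequentialAgent(name="triage_pipeline", sub_agents=[classifier, impact])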
```

Keep large domain objects in Firestore keyed by ID; store only the ID + small derived fields in state.
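
A sketch of the "ID plus small derived fields" rule, assuming a lookup tool shaped roughly like this. The in-memory `EXCEPTION_STORE` stands in for Firestore so the example runs on its own; the `carrier` and `delay_hours` fields and the `triage:delay_hours` key are hypothetical.

```python
from typing import Any

# Hypothetical in-memory stand-in for the Firestore collection.
EXCEPTION_STORE: dict[str, dict[str, Any]] = {}


def lookup_exception(exception_id: str, tool_context: Any) -> dict[str, Any]:
    """Fetch a record by ID; keep only the ID and small fields in state."""
    record = EXCEPTION_STORE.get(exception_id)
    if record is None:
        return {"status": "not_found", "exception_id": exception_id}

    # State holds the pointer plus one small derived value, never the full document.
    tool_context.state["triage:exception_id"] = exception_id
    tool_context.state["triage:delay_hours"] = record.get("delay_hours")

    # The tool result is also a trimmed summary rather than the whole record.
    return {
        "status": "ok",
        "exception_id": exception_id,
        "carrier": record.get("carrier"),
        "delay_hours": record.get("delay_hours"),
    }
```

Any later agent that needs the full record can fetch it again by `triage:exception_id` through a tool.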

## 5. Structured output + tools mutual exclusion on Gemini 2.5 Flash

Setting `output_schema` on an `LlmAgent` rules out tools and sub-agent transfer on that agent. Gemini 3.0 lifts this restriction; Gemini 2.5 Flash does not. **Canonical workaround (two-agent pattern):**

```python
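from google.adk.agents import LlmAgent, SequentialAgent

# lookup_exception and ClassificationOutput come from this agent's tools.py / schemas.py.
# Step 1: tools allowed, no output_schema.
fetcher = LlmAgent(
    name="fetcher",
    tools=[lookup_exception],
    output_key="raw_exception",
)
# Step 2: output_schema set, so no tools.
formatter = LlmAgent(
    name="formatter",
    instruction="Format {raw_exception} as JSON matching the schema.",
    output_schema=ClassificationOutput,
    output_key="triage:classification",
)
# SequentialAgent requires a name; reusing "classifier" here is an assumption.
classifier = SequentialAgent(name="classifier", sub_agents=[fetcher, formatter])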
```

**Recovery when structured output breaks** (long prompts, deeply nested schemas, union types): in `after_model_callback`, wrap `Model.model_validate_json(...)` in `try`/`except ValidationError`; on failure, return a corrective `LlmResponse` or escalate via `LoopAgent`.

Keep Tier 1 schemas **flat** — primitives, short enums, no deep nesting, no untagged unions.
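
The validation step of the recovery path above can be sketched as a pure function that returns either `None` (valid) or a corrective message. To stay runnable without dependencies it uses the standard library as a stand-in for Pydantic's `model_validate_json`; the three-field flat schema is hypothetical. An `after_model_callback` would wrap the returned message in an `LlmResponse`.

```python
import json
from typing import Optional

# Hypothetical flat Tier 1 schema: primitives only, no nesting.
REQUIRED_FIELDS = {"category": str, "severity": str, "confidence": (int, float)}


def validation_error(raw: str) -> Optional[str]:
    """Return None if raw matches the schema, else a corrective message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"Output was not valid JSON ({exc.msg}). Return only a JSON object."
    if not isinstance(data, dict):
        return "Output must be a JSON object."

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field '{field}'")
        elif not isinstance(data[field], expected):
            problems.append(f"field '{field}' has the wrong type")
    # Hallucinated fields are reported so the model can drop them.
    extra = sorted(set(data) - set(REQUIRED_FIELDS))
    if extra:
        problems.append(f"unexpected fields: {', '.join(extra)}")

    if problems:
        return "Fix the following and resend: " + "; ".join(problems) + "."
    return None
```

The function does not retry; it only reports, which keeps retry loops out of `after_model`.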

## 6. Agent composition

| Type | When to use |
|---|---|
| `SequentialAgent` | Deterministic pipeline (Classifier → Impact) |
| `ParallelAgent` | Fan-out independent work (Impact + Route-Optimization on same exception in Tier 3) |
| `LoopAgent` | Until convergence (Tier 2 Generator-Judge; bounded by `max_iterations`, and the judge exits early by setting `actions.escalate = True`) |
| `LlmAgent` with `sub_agents=[...]` | LLM-decided routing (Coordinator) — picks child based on each child's `description=` |
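
A sketch of how a Tier 2 judge might end the loop, assuming it does so from a tool by setting `escalate` on the tool context's actions (the mechanism ADK's built-in `exit_loop` tool uses). The tool name, the score threshold, and the `triage:resolution_approved` key are hypothetical; the code is duck-typed so it runs without ADK.

```python
from typing import Any

PASS_THRESHOLD = 0.8  # hypothetical pass mark


def submit_verdict(score: float, tool_context: Any) -> dict:
    """Judge tool sketch: record the verdict and stop the loop on approval."""
    approved = score >= PASS_THRESHOLD
    tool_context.state["triage:resolution_approved"] = approved
    if approved:
        # Signals the enclosing LoopAgent to stop iterating.
        tool_context.actions.escalate = True
    return {"approved": approved}
```

When the judge never approves, `max_iterations` is what ends the loop.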

## 7. Terse-coordinator rule

Coordinator instruction stays under ~20 lines. Delegation logic goes in each **child's** `description=` field, which is what the Coordinator LLM actually sees when routing.

Bad: 200-line coordinator prompt that restates every child's behavior.
Good: 10-line router + rich per-child `description=` strings.
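
The ~20-line limit could be enforced in a unit test. The helper below is a hypothetical sketch, not existing project code; it counts non-blank lines so spacing in the prompt file does not count against the budget.

```python
def count_instruction_lines(instruction: str) -> int:
    """Count non-blank lines in a prompt."""
    return sum(1 for line in instruction.splitlines() if line.strip())


def check_terse(instruction: str, max_lines: int = 20) -> None:
    """Raise if a coordinator instruction exceeds the line budget."""
    count = count_instruction_lines(instruction)
    if count > max_lines:
        raise ValueError(
            f"coordinator instruction has {count} lines; limit is {max_lines}"
        )
```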

## 8. Thinking-budget defaults per role (Gemini 2.5 Flash)

```python
from google.genai.types import GenerateContentConfig, ThinkingConfig

# Classifier / Impact — structured, fast
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=1024))

# Resolution Generator (Tier 2) — creative, longer reasoning
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=4096))

# Judge (Tier 2) — fast pass/fail
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=0))

# Comms drafter (Tier 3)
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=1024))
```
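
The same defaults can be kept in one lookup table so each agent asks for its budget by role. This is a sketch; the role keys are hypothetical names for the roles listed above.

```python
# Budgets mirror the per-role defaults above; role keys are hypothetical.
THINKING_BUDGETS = {
    "classifier": 1024,
    "impact": 1024,
    "resolution_generator": 4096,
    "judge": 0,
    "comms_drafter": 1024,
}


def thinking_budget(role: str) -> int:
    """Return the default thinking budget for a role, failing loudly on typos."""
    try:
        return THINKING_BUDGETS[role]
    except KeyError:
        raise ValueError(f"no thinking budget defined for role {role!r}") from None
```

The value would then feed `ThinkingConfig(thinking_budget=thinking_budget("judge"))`, with the resulting `GenerateContentConfig` presumably passed to the agent as `generate_content_config=`.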

## 9. Safety settings

Default Gemini thresholds can block content containing logistics terms such as "strike" or "hazard cargo". For internal supply-chain content, loosen to `BLOCK_ONLY_HIGH`:

```python
from google.genai.types import SafetySetting, HarmCategory, HarmBlockThreshold

safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    # ... same pattern for other categories
]
```

## 10. Streaming

Stream tokens only from the **final** agent. Never stream intermediate `SequentialAgent` steps; doing so leaks raw JSON fragments to users.

FastAPI pattern: wrap `Runner.run_async` as an async generator, emit SSE (`text/event-stream`), filter on `event.is_final_response()` + `event.partial`.
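
A sketch of the filtering step, assuming events expose `author`, `partial`, `content.parts[].text`, and `is_final_response()` as ADK events do. It forwards partial chunks from the final agent only and treats that agent's final response as the end-of-stream marker, on the assumption that the final event repeats the already-streamed text. The function name and the JSON payload shape are hypothetical; FastAPI wiring is omitted.

```python
import json
from typing import Any, AsyncIterator


async def sse_from_events(
    events: AsyncIterator[Any], final_agent: str
) -> AsyncIterator[str]:
    """Turn an ADK event stream into SSE frames for the final agent only."""
    async for event in events:
        if event.author != final_agent:
            continue  # never stream intermediate pipeline steps
        if event.partial and event.content and event.content.parts:
            text = "".join(part.text or "" for part in event.content.parts)
            if text:
                # JSON-encode so newlines in the text cannot break SSE framing.
                yield f"data: {json.dumps({'text': text})}\n\n"
        elif event.is_final_response():
            yield "event: done\ndata: {}\n\n"
            return
```

In a FastAPI route, the generator would be handed to a streaming response with media type `text/event-stream`.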

## 11. Never hand-write A2A

When an A2A surface is needed (Tier 3): `uvx agent-starter-pack create ... --agent adk_a2a`. Lift scaffolded files into `runners/`. Artifacts that are **never** hand-written:

- `A2aAgentExecutor`
- `AgentCardBuilder`
- `agent.json`
- `A2AFastAPIApplication` mount
- Agent Engine CI/CD glue

## 12. Graceful degradation

If a sub-agent fails, the Coordinator must still return whatever it has. Concrete rule for triage:

> If `{triage:impact}` is missing in state, the Coordinator returns classification only with `impact_available=false`. Never 500.

Model this via Coordinator instruction:
> "If `{triage:impact}` is not present, report classification only and note the impact assessment was unavailable."
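
The same rule can also be expressed deterministically, for example in the API layer or in a test that pins the contract. The function below is a hypothetical sketch; only the state keys and the `impact_available` flag come from the rule above.

```python
from typing import Any, Mapping


def build_triage_response(state: Mapping[str, Any]) -> dict[str, Any]:
    """Assemble the response from state, degrading when impact is missing."""
    response: dict[str, Any] = {
        "exception_id": state.get("triage:exception_id"),
        "classification": state.get("triage:classification"),
    }
    impact = state.get("triage:impact")
    # A missing or empty impact is reported, never raised.
    response["impact_available"] = impact is not None
    if impact is not None:
        response["impact"] = impact
    return response
```

A caller can branch on `impact_available` instead of catching an error.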

## 13. No direct Firestore/Firebase imports

`agent.py` imports from `google.adk.*`. It does **not** import `firebase_admin` or `google.cloud.firestore`. All data access goes through tools. Enforced by ruff `TID251`; see `.claude/rules/imports.md`.
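
For illustration only, the check that ruff performs can be approximated with the standard `ast` module. This is a hypothetical sketch of the idea, not a replacement for the `TID251` configuration.

```python
import ast

BANNED_ROOTS = ("firebase_admin", "google.cloud.firestore")


def banned_imports(source: str) -> list[str]:
    """Return the banned module paths imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Cover both "from google.cloud.firestore import X"
            # and "from google.cloud import firestore".
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if any(name == root or name.startswith(root + ".") for root in BANNED_ROOTS):
                found.add(name)
    return sorted(found)
```

Running it over the text of an `agent.py` should return an empty list when the rule holds.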