---
name: invoking-cli-agents
description: Use when needing to programmatically invoke a CLI coding agent (Claude Code, OpenCode) from Python, delegate work to a sub-agent, or orchestrate multiple agents. Covers AgentShell instantiation, prompt execution, response streaming, session resumption, tool scoping, and cost tracking.
---

# Invoking CLI Agents with AgentShell

AgentShell is a Python library that runs CLI coding agents headlessly and returns structured output. It hides agent-specific CLI differences behind a unified interface so your code works regardless of which agent runs underneath.

## When to Use

- You need to invoke Claude Code, OpenCode, or another CLI agent from Python
- You want to delegate a coding task to a sub-agent and collect the result
- You need to orchestrate multi-step workflows across agents
- You want to stream agent output in real-time

## When NOT to Use

- You want to call the Anthropic API directly (use the SDK instead)
- The CLI agent is already running interactively and you just need its output

## Installation

```bash
uv add agent-shell-py
```

## Core Concepts

AgentShell has two methods: `execute()` for collecting a complete response, and `stream()` for real-time event processing. Both are async.

### Execute: Run and Collect

Use when you want the final answer and don't need intermediate output.

```python
from agent_shell.shell import AgentShell
from agent_shell.models.agent import AgentType

shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)

response = await shell.execute(
    cwd="/path/to/project",
    prompt="Analyse the authentication module and list all public functions",
    allowed_tools=["Read", "Glob", "Grep"],
    model="sonnet",
)

print(response.response)      # Full text output
print(f"Cost: ${response.cost:.4f}")
print(f"Session: {response.session_id}")
```

### Stream: Real-Time Events

Use when you need progress feedback, want to display output incrementally, or need to react to specific event types (tool use, thinking, errors).

```python
async for event in shell.stream(
    cwd="/path/to/project",
    prompt="Refactor the auth module to use dependency injection",
    allowed_tools=["Read", "Edit", "Bash"],
    model="sonnet",
    effort="high",
    include_thinking=True,
):
    if event.type == "system":
        print(f"Session: {event.session_id}")
    elif event.type == "thinking":
        print(f"[thinking] {event.content}")
    elif event.type == "tool_use":
        print(f"[tool] {event.content}")
    elif event.type == "text":
        print(event.content)
    elif event.type == "result":
        print(f"Done. Cost: ${event.cost:.4f}, Duration: {event.duration:.1f}s")
```

### Session Resumption

Pass `session_id` from a previous response to continue the conversation. This enables multi-turn workflows where each step builds on the last.

```python
# Step 1: Analyse
analysis = await shell.execute(
    cwd="/path/to/project",
    prompt="Analyse this codebase and identify areas that need refactoring",
    allowed_tools=["Read", "Glob", "Grep"],
    model="sonnet",
)

# Step 2: Act on the analysis (same session)
refactor = await shell.execute(
    cwd="/path/to/project",
    prompt="Now refactor the top priority item you identified",
    allowed_tools=["Read", "Edit", "Bash"],
    model="sonnet",
    session_id=analysis.session_id,
)
```

## Parameters

| Parameter | Type | Default | Purpose |
|-----------|------|---------|---------|
| `cwd` | `str` | required | Working directory (must exist) |
| `prompt` | `str` | required | Task or question for the agent |
| `allowed_tools` | `list[str] \| None` | `None` | Restrict which tools the agent can use. `None` = all tools. **Claude Code only.** |
| `model` | `str \| None` | `None` | Model alias or full name (e.g. `"sonnet"`, `"claude-sonnet-4-6"`) |
| `effort` | `str \| None` | `None` | Reasoning effort: `"low"`, `"medium"`, `"high"`, `"max"`. **Claude Code only.** |
| `include_thinking` | `bool` | `False` | Filter only: yields thinking events in `stream()` if already present in CLI output. Does not add a CLI flag. **Claude Code `stream()` only. Dropped by `execute()`.** |
| `auto_approve` | `bool` | `True` | Skip tool permission prompts. **Claude Code only.** |
| `session_id` | `str \| None` | `None` | Resume a previous session |

> **Agent parity warning:** OpenCode currently only maps `model` and `session_id` to CLI flags. Parameters like `allowed_tools`, `effort`, `include_thinking`, and `auto_approve` are accepted but silently ignored. Do not rely on `allowed_tools` for safety when using OpenCode.

## Supported Agents

```python
from agent_shell.models.agent import AgentType

AgentType.CLAUDE_CODE   # Claude Code CLI
AgentType.OPENCODE      # OpenCode CLI
```

Gemini CLI, Copilot CLI, and Codex have enum values but no adapter yet.

## Tool Scoping (Claude Code Only)

Restrict what the agent can do by passing `allowed_tools`. This is critical for safety when delegating work.

> **Important:** Tool scoping only works with Claude Code. OpenCode ignores `allowed_tools` — the agent will have access to all tools regardless of what you pass. Do not use OpenCode for safety-sensitive delegation where tool restriction is required.

> **Gotcha:** `allowed_tools=[]` (empty list) is falsy in Python, so no `--allowed-tools` flag is sent — the agent gets **full tool access**. To restrict tools, always pass a non-empty list. There is no way to disable all tools via this parameter.

```python
# Read-only analysis (Claude Code)
shell = AgentShell(agent_type=AgentType.CLAUDE_CODE)

response = await shell.execute(
    cwd=project_path,
    prompt="Review this code for security issues",
    allowed_tools=["Read", "Glob", "Grep"],  # No write access
)

# Full write access for implementation
response = await shell.execute(
    cwd=project_path,
    prompt="Implement the fix you recommended",
    allowed_tools=["Read", "Edit", "Write", "Bash", "Glob", "Grep"],
)
```

## Error Handling

AgentShell raises exceptions for some errors, but CLI agent failures are reported as events, not exceptions.

```python
from pathlib import Path

# cwd validation - raises ValueError if directory doesn't exist
try:
    response = await shell.execute(cwd="/nonexistent", prompt="hello")
except ValueError as e:
    print(f"Bad directory: {e}")

# Unsupported agent type - raises ValueError at construction
try:
    shell = AgentShell(agent_type=AgentType.GEMINI_CLI)
except ValueError as e:
    print(f"No adapter: {e}")

# KeyboardInterrupt - AgentShell cancels the subprocess cleanly
try:
    response = await shell.execute(cwd=project_path, prompt="long task...")
except KeyboardInterrupt:
    print("Agent cancelled")
```

**CLI agent failures do not raise exceptions.** `execute()` returns an `AgentResponse` with whatever text was accumulated (which may be empty) and gives no indication of failure. When streaming, failures surface in two ways:

- `StreamEvent(type="error")` — CLI process errors or OpenCode agent errors
- `StreamEvent(type="result", content="error")` — Claude Code agent-level failures

Check for both when streaming:

```python
async for event in shell.stream(cwd=project_path, prompt="do something"):
    if event.type == "error":
        print(f"Agent error: {event.content}")
        break
    elif event.type == "result" and event.content == "error":
        print("Agent reported failure")
        break
    elif event.type == "text":
        print(event.content)
```

## Logging

```python
import logging

logging.getLogger("agent_shell").setLevel(logging.DEBUG)
logging.getLogger("agent_shell").addHandler(logging.StreamHandler())
```

`INFO` captures tool calls, session IDs, costs, errors. `DEBUG` adds raw JSON events.

## Quick Reference

| Want to... | Do this |
|------------|---------|
| Get a complete answer | `await shell.execute(cwd, prompt)` |
| Stream events live | `async for event in shell.stream(cwd, prompt)` |
| Continue a conversation | Pass `session_id=response.session_id` |
| Limit agent capabilities | Pass `allowed_tools=["Read", "Glob"]` |
| Track costs | Read `response.cost` or `event.cost` |
| Use a specific model | Pass `model="sonnet"` |
| Increase reasoning depth | Pass `effort="high"` |
| See agent thinking | Pass `include_thinking=True` |
| Cancel a running agent | `KeyboardInterrupt` (handled automatically) |

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Passing a non-existent `cwd` | Validate the path exists before calling |
| Forgetting `await` on `execute()` | Both `execute()` and the async iterator from `stream()` require async context |
| Not scoping `allowed_tools` (Claude Code) | Always restrict tools to the minimum needed for the task |
| Assuming `allowed_tools` works with OpenCode | OpenCode ignores this parameter — use Claude Code for tool-restricted tasks |
| Expecting `execute()` to include thinking | `include_thinking` only affects `stream()` events; `execute()` drops thinking |
| Not checking for error events in `stream()` | Agent failures yield `StreamEvent(type="error")`, they don't raise exceptions |
| Ignoring `session_id` for multi-step work | Without it, each call starts a fresh conversation with no prior context |
| Using an unsupported `AgentType` | Only `CLAUDE_CODE` and `OPENCODE` have adapters currently |

## API Reference

For detailed model definitions, event types, and adapter protocol: see [api-reference.md](api-reference.md).