# Architecture

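FerrisGrid is a local, single-step visual control primitive: each CLI invocation performs one observation or one action on the local machine, and an external agent runtime decides what to do next.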

```mermaid
flowchart TD
  human[Human task] --> agent[External agent runtime]
  agent --> ferris[FerrisGrid CLI]
  ferris --> observe[observe: capture screens]
  ferris --> act[act: validate and execute one action]
  ferris --> recap[recap: review existing traces]
  observe --> session[(.ferrisgrid session files)]
  act --> session
  session --> recap
```
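
The flow in the diagram can be sketched as a small dispatch type. This is an illustrative sketch only: it assumes the three diagram nodes correspond to CLI subcommands with those names, and `Command`, `parse`, `writes_session`, and `reads_session` are hypothetical names, not FerrisGrid's actual API.

```rust
/// Hypothetical model of the three operations shown in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Observe,
    Act,
    Recap,
}

impl Command {
    /// Map a subcommand name to a command; unknown names are rejected.
    pub fn parse(name: &str) -> Option<Command> {
        match name {
            "observe" => Some(Command::Observe),
            "act" => Some(Command::Act),
            "recap" => Some(Command::Recap),
            _ => None,
        }
    }

    /// In the diagram, `observe` and `act` have arrows into the session files.
    pub fn writes_session(self) -> bool {
        matches!(self, Command::Observe | Command::Act)
    }

    /// In the diagram, `recap` is fed by the session files.
    pub fn reads_session(self) -> bool {
        matches!(self, Command::Recap)
    }
}
```

Modelling the operations as a closed enum keeps the "one observation or one action per invocation" rule explicit: an invocation resolves to exactly one variant or fails to parse.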

## Principles

- **Single-step by default:** one observation or one action per invocation.
- **Agent owns reasoning:** FerrisGrid does not choose the next action.
- **Multi-screen first:** screen IDs disambiguate observation and action targets.
- **Local traceability:** every meaningful step writes local artifacts.
- **Compact Markdown interface:** tool output is designed for agents to read directly.
- **Coordinate correctness before speed:** coordinates must map deterministically.
- **Policy-gated execution:** actions are validated before OS input is emitted.
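
As a minimal sketch of how the multi-screen, coordinate-correctness, and policy-gating principles might combine, the example below validates a click against a known screen before producing a global coordinate. All types and names here (`Screen`, `Click`, `ValidationError`, `validate_click`) are assumptions for illustration; the real validation rules and data model may differ.

```rust
/// Hypothetical description of one screen in global desktop coordinates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Screen {
    pub id: u32,
    pub origin_x: i32,
    pub origin_y: i32,
    pub width: u32,
    pub height: u32,
}

/// Hypothetical click request in screen-local coordinates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Click {
    pub screen_id: u32,
    pub x: u32,
    pub y: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    UnknownScreen(u32),
    OutOfBounds { x: u32, y: u32 },
}

/// Validate a click and map it to a global coordinate.
/// No OS input is emitted here; a caller would act only on `Ok`.
pub fn validate_click(screens: &[Screen], click: Click) -> Result<(i32, i32), ValidationError> {
    // The screen ID disambiguates which display the coordinates refer to.
    let screen = screens
        .iter()
        .find(|s| s.id == click.screen_id)
        .ok_or(ValidationError::UnknownScreen(click.screen_id))?;

    // Reject coordinates that fall outside the target screen.
    if click.x >= screen.width || click.y >= screen.height {
        return Err(ValidationError::OutOfBounds { x: click.x, y: click.y });
    }

    // Deterministic mapping: global = screen origin + local offset.
    Ok((
        screen.origin_x + click.x as i32,
        screen.origin_y + click.y as i32,
    ))
}
```

With a secondary screen placed to the left of the primary one (origin `(-1280, 0)`), a local click at `(10, 20)` on that screen maps to the global point `(-1270, 20)`, while a click naming an unknown screen ID is rejected before any input is sent.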

## Workspace layout

```text
crates/
  ferrisgrid-cli/
  ferrisgrid-core/
  ferrisgrid-capture/
  ferrisgrid-input/
  ferrisgrid-export/
```

The `ferrisgrid-cli` crate owns argument parsing and Markdown output. The `ferrisgrid-core` crate owns sessions, orchestration, action parsing, validation, and result types. The `ferrisgrid-capture` and `ferrisgrid-input` crates hide platform-specific backends.
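
One way such a split could look is for core code to depend only on traits, with each platform backend implementing them. The sketch below is hypothetical: `CaptureBackend`, `InputBackend`, `RecordingInput`, and `act_once` are invented names for illustration and are not taken from the FerrisGrid crates.

```rust
/// Hypothetical trait a capture backend might implement.
pub trait CaptureBackend {
    /// Return encoded image bytes for one screen, or an error message.
    fn capture(&mut self, screen_id: u32) -> Result<Vec<u8>, String>;
}

/// Hypothetical trait an input backend might implement.
pub trait InputBackend {
    /// Emit a single click at a global coordinate.
    fn click(&mut self, x: i32, y: i32) -> Result<(), String>;
}

/// Test double that records clicks instead of touching the OS.
#[derive(Default)]
pub struct RecordingInput {
    pub clicks: Vec<(i32, i32)>,
}

impl InputBackend for RecordingInput {
    fn click(&mut self, x: i32, y: i32) -> Result<(), String> {
        self.clicks.push((x, y));
        Ok(())
    }
}

/// Core-style orchestration: execute exactly one action through the trait
/// and return a short text result for the CLI layer to print.
pub fn act_once(input: &mut dyn InputBackend, x: i32, y: i32) -> Result<String, String> {
    input.click(x, y)?;
    Ok(format!("clicked at ({}, {})", x, y))
}
```

Because `act_once` sees only `dyn InputBackend`, the same orchestration code can run against a real platform backend or against `RecordingInput` in tests.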