◯ ─────────── ◯

Ouroboros

O U R O B O R O S

◯ ─────────── ◯

Stop prompting. Start specifying.
_{Agent OS for replayable, specification-first AI coding workflows}

Quick Start · Why · Results · How It Works · Commands · Philosophy

**Turn a vague idea into a verified, working codebase -- across Claude Code, Codex CLI, OpenCode, and Hermes.** Ouroboros is an Agent OS for AI coding: a local-first runtime layer that turns non-deterministic agent work into a replayable, observable, policy-bound execution contract. It replaces ad-hoc prompting with a structured specification-first workflow: interview, crystallize, execute, evaluate, evolve. --- ## Why Ouroboros? Most AI coding fails at the **input**, not the output. The bottleneck is not AI capability -- it is human clarity. | Problem | What Happens | Ouroboros Fix | | :------------ | :------------------------------- | :-------------------------------------------- | | Vague prompts | AI guesses, you rework | Socratic interview exposes hidden assumptions | | No spec | Architecture drifts mid-build | Immutable seed spec locks intent before code | | Manual QA | "Looks good" is not verification | 3-stage automated evaluation gate | --- ## Quick Start **Install** — one command, everything auto-detected: ```bash curl -fsSL https://raw.githubusercontent.com/Q00/ouroboros/main/scripts/install.sh | bash ``` **Build** — open your AI coding agent and go: ``` > ooo interview "I want to build a task management CLI" ``` > Works with Claude Code, Codex CLI, OpenCode, Hermes, and Kiro CLI. The installer detects Claude Code, Codex CLI, and Hermes CLI automatically and registers the MCP server. For OpenCode, run `ouroboros setup --runtime opencode` after installation.

Kiro CLI quick start

```bash pip install 'ouroboros-ai[claude]' ouroboros setup # detects Kiro CLI and registers MCP server ``` Set runtime in `.env`: ``` OUROBOROS_RUNTIME=kiro ``` Then use `ooo` commands inside a Kiro CLI session.

Other install methods

**Claude Code plugin only** (no system package): ```bash claude plugin marketplace add Q00/ouroboros && claude plugin install ouroboros@ouroboros ``` Then run `ooo setup` inside a Claude Code session. **pip / uv / pipx**: ```bash pip install ouroboros-ai # base pip install ouroboros-ai[claude] # + Claude Code deps pip install ouroboros-ai[litellm] # + LiteLLM multi-provider pip install ouroboros-ai[mcp] # + MCP server/client support pip install ouroboros-ai[tui] # + Textual terminal UI pip install ouroboros-ai[all] # everything (claude + litellm + mcp + tui + dashboard) ouroboros setup # configure runtime ``` Legacy compatibility: `ouroboros-ai[dashboard]` is still accepted as a compatibility alias while extras migrate. See runtime guides: [Claude Code](./docs/runtime-guides/claude-code.md) · [Codex CLI](./docs/runtime-guides/codex.md) · [Hermes](./docs/runtime-guides/hermes.md) · [OpenCode](./docs/runtime-guides/opencode.md)

Uninstall

```bash ouroboros uninstall ``` Removes all configuration, MCP registration, and data. See [UNINSTALL.md](./UNINSTALL.md) for details.

What just happened?

``` interview -> Socratic questioning exposed 12 hidden assumptions seed -> Crystallized answers into an immutable spec (Ambiguity: 0.15) run -> Executed via Double Diamond decomposition evaluate -> 3-stage verification: Mechanical -> Semantic -> Consensus ``` > Use `ooo ` inside your AI coding agent session, or `ouroboros init start`, `ouroboros run seed.yaml`, etc. from the terminal. The serpent completed one loop. Each loop, it knows more than the last.

--- ## How It Compares AI coding tools are powerful -- but they solve the **wrong problem** when the input is unclear. | | Vanilla AI Coding | Ouroboros | | :------------------ | :--------------------------------------- | :------------------------------------------------------------------------------ | | **Vague prompt** | AI guesses intent, builds on assumptions | Socratic interview forces clarity *before* code | | **Spec validation** | No spec -- architecture drifts mid-build | Immutable seed spec locks intent; Ambiguity gate (<= 0.2) blocks premature code | | **Evaluation** | "Looks good" / manual QA | 3-stage automated gate: Mechanical -> Semantic -> Multi-Model Consensus | | **Rework rate** | High -- wrong assumptions surface late | Low -- assumptions surface in the interview, not in the PR review | --- ## The Loop The ouroboros -- a serpent devouring its own tail -- is not decoration. It IS the architecture: ``` Interview -> Seed -> Execute -> Evaluate ^ | +---- Evolutionary Loop ----+ ``` Each cycle does not repeat -- it **evolves**. The output of evaluation feeds back as input for the next generation, until the system truly knows what it is building. | Phase | What Happens | | :------------ | :-------------------------------------------------------------------- | | **Interview** | Socratic questioning exposes hidden assumptions | | **Seed** | Answers crystallize into an immutable specification | | **Execute** | Double Diamond: Discover -> Define -> Design -> Deliver | | **Evaluate** | 3-stage gate: Mechanical ($0) -> Semantic -> Multi-Model Consensus | | **Evolve** | Wonder *("What do we still not know?")* -> Reflect -> next generation | > *"This is where the Ouroboros eats its tail: the output of evaluation* > *becomes the input for the next generation's seed specification."* > -- `reflect.py` Convergence is reached when ontology similarity >= 0.95 -- when the system has questioned itself into clarity. ### Ralph: The Loop That Never Stops `ooo ralph` runs the evolutionary loop persistently -- across session boundaries -- until convergence is reached. Each step is **stateless**: the EventStore reconstructs the full lineage, so even if your machine restarts, the serpent picks up where it left off. ``` Ralph Cycle 1: evolve_step(lineage, seed) -> Gen 1 -> action=CONTINUE Ralph Cycle 2: evolve_step(lineage) -> Gen 2 -> action=CONTINUE Ralph Cycle 3: evolve_step(lineage) -> Gen 3 -> action=CONVERGED +-- Ralph stops. The ontology has stabilized. ``` --- ## Commands Inside AI coding agent sessions, use `ooo ` skills. From the terminal, use the `ouroboros` CLI. | Skill (`ooo`) | CLI equivalent | What It Does | | :--------------- | :---------------------------------------------------------------- | :----------------------------------------------------------- | | `ooo setup` | `ouroboros setup` | Register runtime and configure project (one-time) | | `ooo interview` | `ouroboros init start` | Socratic questioning -- expose hidden assumptions | | `ooo auto` | `ouroboros auto` | Goal → A-grade Seed → execution handoff with bounded loops | | `ooo seed` | *(generated by interview)* | Crystallize into immutable spec | | `ooo run` | `ouroboros run seed.yaml` | Execute via Double Diamond decomposition | | `ooo evaluate` | *(via MCP)* | 3-stage verification gate | | `ooo evolve` | *(via MCP)* | Evolutionary loop until ontology converges | | `ooo unstuck` | *(via MCP)* | 5 lateral thinking personas when you are stuck | | `ooo status` | `ouroboros status executions` / `ouroboros status execution ` | Session tracking + (MCP-only) drift detection | | `ooo resume-session` | `ouroboros resume` | List in-flight sessions and re-attach commands | | `ooo cancel` | `ouroboros cancel execution [\|--all]` | Cancel stuck or orphaned executions | | `ooo ralph` | *(via MCP)* | Persistent loop until verified | | `ooo tutorial` | *(interactive)* | Interactive hands-on learning | | `ooo help` | `ouroboros --help` | Full reference | | `ooo pm` | *(via MCP)* | PM-focused interview + PRD generation | | `ooo qa` | *(via skill)* | General-purpose QA verdict for any artifact | | `ooo update` | `ouroboros update` | Check for updates + upgrade to latest | | `ooo brownfield` | *(via skill)* | Scan and manage brownfield repo/worktree defaults | | `ooo publish` | *(skill/runtime surface; uses `gh` CLI)* | Publish a Seed as GitHub Epic/Task issues for team workflows | > Not all skills have direct CLI equivalents. Some (`evaluate`, `evolve`, `unstuck`, `ralph`, `publish`) are available through agent skills, runtime rules, or MCP tools rather than a direct `ouroboros ` shell command. > `/resume` is reserved for Claude Code's built-in session picker; use `ooo resume-session` for Ouroboros in-flight sessions. See the [CLI reference](./docs/cli-reference.md) for full details. --- ## The Nine Minds Nine agents, each a different mode of thinking. Loaded on-demand, never preloaded: | Agent | Role | Core Question | | :----------------------- | :--------------------------------- | :-------------------------------------------------- | | **Socratic Interviewer** | Questions-only. Never builds. | *"What are you assuming?"* | | **Ontologist** | Finds essence, not symptoms | *"What IS this, really?"* | | **Seed Architect** | Crystallizes specs from dialogue | *"Is this complete and unambiguous?"* | | **Evaluator** | 3-stage verification | *"Did we build the right thing?"* | | **Contrarian** | Challenges every assumption | *"What if the opposite were true?"* | | **Hacker** | Finds unconventional paths | *"What constraints are actually real?"* | | **Simplifier** | Removes complexity | *"What's the simplest thing that could work?"* | | **Researcher** | Stops coding, starts investigating | *"What evidence do we actually have?"* | | **Architect** | Identifies structural causes | *"If we started over, would we build it this way?"* | --- ## Under the Hood

Architecture overview -- Python >= 3.12

``` src/ouroboros/ +-- bigbang/ Interview, ambiguity scoring, brownfield explorer +-- routing/ PAL Router -- 3-tier cost optimization (1x / 10x / 30x) +-- execution/ Double Diamond, hierarchical AC decomposition +-- evaluation/ Mechanical -> Semantic -> Multi-Model Consensus +-- evolution/ Wonder / Reflect cycle, convergence detection +-- resilience/ 4-pattern stagnation detection, 5 lateral personas +-- observability/ 3-component drift measurement, auto-retrospective +-- persistence/ Event sourcing (SQLAlchemy + aiosqlite), checkpoints +-- orchestrator/ Runtime abstraction layer (Claude Code, Codex CLI, OpenCode, Hermes) +-- core/ Types, errors, seed, ontology, security +-- providers/ LiteLLM adapter (100+ models) +-- mcp/ MCP client/server integration +-- plugin/ Plugin system (skill/agent auto-discovery) +-- tui/ Terminal UI dashboard +-- cli/ Typer-based CLI ``` **Key internals:** - **PAL Router** -- Frugal (1x) -> Standard (10x) -> Frontier (30x) with auto-escalation on failure, auto-downgrade on success - **Drift** -- Goal (50%) + Constraint (30%) + Ontology (20%) weighted measurement, threshold <= 0.3 - **Brownfield** -- Auto-detects config files across multiple language ecosystems - **Evolution** -- Up to 30 generations, convergence at ontology similarity >= 0.95 - **Stagnation** -- Detects spinning, oscillation, no-drift, and diminishing returns patterns - **Agent OS runtime** -- Replayable execution contract across capability discovery, policy, directives, event journal, and agent processes - **Runtime backends** -- Pluggable abstraction layer (`orchestrator.runtime_backend` config) with first-class support for Claude Code, Codex CLI, OpenCode, and Hermes; same workflow spec, different execution engines See [Architecture](./docs/architecture.md) for the full design document.

--- ## From Wonder to Ontology

The philosophical engine behind Ouroboros

> *Wonder -> "How should I live?" -> "What IS 'live'?" -> Ontology* > -- Socrates Every great question leads to a deeper question -- and that deeper question is always **ontological**: not *"how do I do this?"* but *"what IS this, really?"* ``` Wonder Ontology "What do I want?" -> "What IS the thing I want?" "Build a task CLI" -> "What IS a task? What IS priority?" "Fix the auth bug" -> "Is this the root cause, or a symptom?" ``` This is not abstraction for its own sake. When you answer *"What IS a task?"* -- deletable or archivable? solo or team? -- you eliminate an entire class of rework. **The ontological question is the most practical question.** Ouroboros embeds this into its architecture through the **Double Diamond**: ``` * Wonder * Design / (diverge) / (diverge) / explore / create / / * ------------ * ------------ * \ \ \ define \ deliver \ (converge) \ (converge) * Ontology * Evaluation ``` The first diamond is **Socratic**: diverge into questions, converge into ontological clarity. The second diamond is **pragmatic**: diverge into design options, converge into verified delivery. Each diamond requires the one before it -- you cannot design what you have not understood.

Ambiguity Score: The Gate Between Wonder and Code

The Interview does not end when you feel ready -- it ends when the **math** says you are ready. Ouroboros quantifies ambiguity as the inverse of weighted clarity: ``` Ambiguity = 1 - Sum(clarity_i * weight_i) ``` Each dimension is scored 0.0-1.0 by the LLM (temperature 0.1 for reproducibility), then weighted: | Dimension | Greenfield | Brownfield | | :------------------------------------------------------------ | :--------: | :--------: | | **Goal Clarity** -- *Is the goal specific?* | 40% | 35% | | **Constraint Clarity** -- *Are limitations defined?* | 30% | 25% | | **Success Criteria** -- *Are outcomes measurable?* | 30% | 25% | | **Context Clarity** -- *Is the existing codebase understood?* | -- | 15% | **Threshold: Ambiguity <= 0.2** -- only then can a Seed be generated. ``` Example (Greenfield): Goal: 0.9 * 0.4 = 0.36 Constraint: 0.8 * 0.3 = 0.24 Success: 0.7 * 0.3 = 0.21 ------ Clarity = 0.81 Ambiguity = 1 - 0.81 = 0.19 <= 0.2 -> Ready for Seed ``` Why 0.2? Because at 80% weighted clarity, the remaining unknowns are small enough that code-level decisions can resolve them. Above that threshold, you are still guessing at architecture.

Ontology Convergence: When the Serpent Stops

The evolutionary loop does not run forever. It stops when consecutive generations produce ontologically identical schemas. Similarity is measured as a weighted comparison of schema fields: ``` Similarity = 0.5 * name_overlap + 0.3 * type_match + 0.2 * exact_match ``` | Component | Weight | What It Measures | | :--------------- | :----: | :------------------------------------------------- | | **Name overlap** | 50% | Do the same field names exist in both generations? | | **Type match** | 30% | Do shared fields have the same types? | | **Exact match** | 20% | Are name, type, AND description all identical? | **Threshold: Similarity >= 0.95** -- the loop converges and stops evolving. But raw similarity is not the only signal. The system also detects pathological patterns: | Signal | Condition | What It Means | | :---------------------- | :----------------------------------------------- | :--------------------------------- | | **Stagnation** | Similarity >= 0.95 for 3 consecutive generations | Ontology has stabilized | | **Oscillation** | Gen N ~ Gen N-2 (period-2 cycle) | Stuck bouncing between two designs | | **Repetitive feedback** | >= 70% question overlap across 3 generations | Wonder is asking the same things | | **Hard cap** | 30 generations reached | Safety valve | ``` Gen 1: {Task, Priority, Status} Gen 2: {Task, Priority, Status, DueDate} -> similarity 0.78 -> CONTINUE Gen 3: {Task, Priority, Status, DueDate} -> similarity 1.00 -> CONVERGED ``` Two mathematical gates, one philosophy: **do not build until you are clear (Ambiguity <= 0.2), do not stop evolving until you are stable (Similarity >= 0.95).**

--- ## Contributing ```bash git clone https://github.com/Q00/ouroboros cd ouroboros uv sync --all-groups && uv run pytest ``` [Issues](https://github.com/Q00/ouroboros/issues) · [Discussions](https://github.com/Q00/ouroboros/discussions) · [Contributing Guide](./CONTRIBUTING.md) --- ## Star History

---

"The beginning is the end, and the end is the beginning."

The serpent does not repeat -- it evolves.

MIT License