# Beyond Dynamic Workflows: A Surpass Strategy for pi-taskflow

> Internal strategy doc. Synthesized from a multi-phase research + brainstorm run
> (4 parallel web-research agents → strategist → skeptic gate → synthesis),
> verified against live sources June 2026. Companion docs:
> [`COMPETITORS.md`](./COMPETITORS.md) (cross-ecosystem), [`PI-ECOSYSTEM.md`](./PI-ECOSYSTEM.md) (Pi-internal).

## 1. The Thesis

Claude Code's dynamic workflows are **JavaScript closures** — you execute them, you
observe results, but the workflow definition itself is opaque to deterministic
tooling. pi-taskflow's **declarative JSON DSL** is *structured data*. A DAG expressed
as structured data can be **statically analyzed** before any token is spent,
**deterministically replayed** without re-execution, **memoized across runs** by hash
lookup rather than LLM reasoning, and **compiled to multiple artifacts** (Mermaid,
OTel span templates, CI YAML) from a single source of truth. Claude bans `Date.now()`
to control non-determinism; pi-taskflow can instead **allow non-determinism and record
it** in the run trace, so a run can be replayed and inspected after the fact. This
wedge, *declarative structure over imperative script*, is the foundation for every
move below.
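
To make "structured data" concrete, here is a minimal TypeScript sketch of one compile target, a Mermaid flowchart. The `FlowPhase` and `FlowSpec` shapes and the `dependsOn` field are assumptions made for illustration; this doc does not show the real pi-taskflow schema.

```typescript
// Hypothetical phase shape; the real pi-taskflow schema may differ.
interface FlowPhase {
  id: string;
  dependsOn?: string[];
}

interface FlowSpec {
  name: string;
  phases: FlowPhase[];
}

// Emit a Mermaid flowchart: one node per phase, one edge per dependency.
function toMermaid(flow: FlowSpec): string {
  const lines: string[] = ["flowchart TD"];
  for (const phase of flow.phases) {
    lines.push(`  ${phase.id}`);
    for (const dep of phase.dependsOn ?? []) {
      lines.push(`  ${dep} --> ${phase.id}`);
    }
  }
  return lines.join("\n");
}

const demoFlow: FlowSpec = {
  name: "review",
  phases: [
    { id: "research" },
    { id: "draft", dependsOn: ["research"] },
    { id: "gate", dependsOn: ["draft"] },
  ],
};

console.log(toMermaid(demoFlow));
```

The same traversal could feed other emitters (a verification report, span templates) without executing any phase, which is the property a closure-based workflow cannot offer.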

## 2. The Category-Defining Bet: the **Structurally Verifiable** Workflow

> **pi-taskflow is the first agent orchestrator whose workflow structure is verifiable
> by deterministic algorithms — not by running the workflow and hoping.**

| Stage | What happens | Tokens |
|-------|--------------|:------:|
| **Compile-time** | Dead-end phases, gate exhaustion, flow-ref integrity, concurrency topology warnings, trivial guard contradictions — caught by graph algorithms on the DAG | **0** |
| **Pre-execution** | Graph-position cache key per phase; cross-run memoization index consulted; matched phases reused instantly | **0 (cache hit)** |
| **Execution** | Declarative criteria (schema conformance, path containment, structural invariants) evaluated **before** the LLM gate agent runs; the LLM handles only the qualitative residue | **gate only** |
| **Post-execution** | Event-sourced trace replays the run deterministically; change a gate threshold / budget and replay against cached data | **0** |

No framework we surveyed does all four. LangGraph has checkpointing but no static verification and
no cross-run memoization. Temporal has event-sourced replay but workflows are imperative
code you can't statically analyze. Claude's JS scripts are structurally opaque.
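
As an illustration of the compile-time row, the sketch below flags two of the listed defects, unresolved references and dead-end phases, using plain set operations and no model call. The phase shape, the `outputs` field, and `findStructuralIssues` are hypothetical names; the checks that `validateTaskflow()` actually performs may be defined differently.

```typescript
// Hypothetical shapes for illustration only.
interface CheckPhase {
  id: string;
  dependsOn?: string[];
}

interface CheckFlow {
  phases: CheckPhase[];
  outputs: string[]; // assumed: phases whose results the workflow returns
}

function findStructuralIssues(flow: CheckFlow): string[] {
  const issues: string[] = [];
  const ids = new Set(flow.phases.map((p) => p.id));
  const consumed = new Set<string>();

  // Reference integrity: every dependency must name a declared phase.
  for (const phase of flow.phases) {
    for (const dep of phase.dependsOn ?? []) {
      if (!ids.has(dep)) issues.push(`unresolved-ref: ${phase.id} -> ${dep}`);
      consumed.add(dep);
    }
  }
  for (const out of flow.outputs) {
    if (!ids.has(out)) issues.push(`unresolved-output: ${out}`);
  }

  // Dead ends: a phase nobody consumes and that is not a declared output.
  const terminal = new Set(flow.outputs);
  for (const phase of flow.phases) {
    if (!consumed.has(phase.id) && !terminal.has(phase.id)) {
      issues.push(`dead-end: ${phase.id}`);
    }
  }
  return issues;
}

const checked = findStructuralIssues({
  phases: [
    { id: "a" },
    { id: "b", dependsOn: ["a"] },
    { id: "c", dependsOn: ["a"] },
    { id: "d", dependsOn: ["missing"] },
  ],
  outputs: ["b"],
});
console.log(checked);
```

Here `c` and `d` are reported as dead ends and `d` also carries an unresolved reference, all before any phase runs.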

**The qualifier matters.** "Structurally verifiable" = we can prove DAG integrity,
reference soundness, and gate completeness — *not* that the LLM won't hallucinate. The
tagline is **Structurally Verifiable**, never unqualified "provable".
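
The split between deterministic checks and the LLM gate could look roughly like the sketch below: machine-checkable criteria run first, and the gate agent is consulted only when they all pass. The criterion kinds, `preGate`, and the verdict shape are hypothetical. The path check assumes absolute POSIX-style paths and ignores symlinks.

```typescript
// Hypothetical declarative criteria; not the actual pi-taskflow gate schema.
type Criterion =
  | { kind: "requiredKeys"; keys: string[] }
  | { kind: "pathWithin"; field: string; root: string };

type GateVerdict =
  | { status: "block"; failures: string[] } // decided without an LLM call
  | { status: "needs-llm" }; // structural checks passed; qualitative review remains

// Resolve "." and ".." segments of an absolute POSIX-style path.
function normalizePath(path: string): string[] {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return out;
}

function isWithin(candidate: string, root: string): boolean {
  const c = normalizePath(candidate);
  const r = normalizePath(root);
  return r.every((seg, i) => c[i] === seg);
}

function preGate(output: Record<string, unknown>, criteria: Criterion[]): GateVerdict {
  const failures: string[] = [];
  for (const c of criteria) {
    if (c.kind === "requiredKeys") {
      for (const key of c.keys) {
        if (!(key in output)) failures.push(`missing key: ${key}`);
      }
    } else {
      const value = output[c.field];
      if (typeof value !== "string" || !isWithin(value, c.root)) {
        failures.push(`path escapes ${c.root}: ${c.field}`);
      }
    }
  }
  return failures.length > 0 ? { status: "block", failures } : { status: "needs-llm" };
}

const gateCriteria: Criterion[] = [
  { kind: "requiredKeys", keys: ["summary"] },
  { kind: "pathWithin", field: "file", root: "/work" },
];

console.log(preGate({ summary: "ok", file: "/work/out/a.md" }, gateCriteria));
console.log(preGate({ file: "/work/../etc/passwd" }, gateCriteria));
```

A `block` verdict here is a structural fact about the output; a `needs-llm` verdict says nothing about whether the content is correct, which is exactly the boundary the qualifier draws.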

## 3. Strategic Moves — Ranked

| # | Idea | Attacks | Why pi-taskflow wins | Effort | Surpasses Claude? |
|---|------|---------|----------------------|:------:|:-----------------:|
| 1 | **Graph-position caching** — key = `phaseId(upstreamKeys):inputHash` | map fan-out cache collisions, best-of-N cache pollution | DAG position is explicit & computable at runtime; lives inside existing `hashInput`/`cachedPhase` | **S** | **Y** |
| 2 | **Static structural verification** (dead-ends, gate exhaustion, flow refs, concurrency warnings, trivial contradictions) | 41.8% of multi-agent failures are specification and system-design errors (MAST); Claude has zero static checks | `validateTaskflow()` already does cycle detection + ref checks; the rest is graph-algorithmic on existing output | **S** | **Y** |
| 3 | **Cross-run memoization** (global cache index keyed on phase input hash) | Claude/LangGraph don't share state across sessions | file-based store is inherently shareable & inspectable; needs #1 | **S** | **Y** |
| 4 | **Declarative eval gates** + `onBlock: "retry"` (retry upstream on fail, not halt) | 21.3% of failures are in verification/termination (MAST) | machine-checkable criteria run *before* the LLM gate; `onBlock:retry` is genuinely new control flow | **M** | **Y** |
| 5 | **Deterministic replay** from append-only event trace | Agent Reproducibility Paradox; Claude resume is session-scoped only | `PhaseState` already captures inputHash/output/usage/model; upgrade to JSONL event trace, replay against recorded responses | **L** | **Partial** (Temporal replays workflow code; we replay agent decisions) |
| 6 | **OpenTelemetry GenAI export** (optional peerDependency; no-op when absent) | observability gaps; Claude has no external tracing | already collect timing/tokens/status/agent/model per phase; custom `taskflow.*` span attributes | **S** | **Y** |
| 7 | **Multi-target DSL compilation** (Mermaid + verification report + OTel template now; CNCF/GH-Actions later) | workflows trapped in framework-specific code | JSON DSL compiles to many artifacts from one source; source hash enables drift detection | **M** | **Partial** |
| 8 | **Best-of-N with late binding** (spawn N, take best K) — rescoped from speculative pruning | brute-force parallel blows up cost | runtime owns scheduling; graph-position keys keep pruned branches out of cache | **XL→M** | **Partial** |
| 9 | **Model routing / cost optimization** (cheap phases → cheap models) | per-phase cost is known; nobody auto-routes | runtime already tracks `usage` + enforces caps; add a `route` hint | **S** | n/a |
| 10 | **Workflow template library** (4–6 battle-tested `.tf.json`) | patterns re-implemented per project | dogfoods the `flow` sub-workflow type; reduces adoption friction | **S** | n/a |
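
Move #1 can be sketched as follows, taking the key format `phaseId(upstreamKeys):inputHash` literally. The `KeyedPhase` shape and `graphPositionKeys` are hypothetical, and the FNV-1a function is only a dependency-free stand-in for the existing `hashInput`, whose algorithm is not shown in this doc. The sketch assumes the DAG is acyclic, which cycle detection in `validateTaskflow()` would already guarantee.

```typescript
// Hypothetical phase shape for illustration.
interface KeyedPhase {
  id: string;
  dependsOn: string[];
  input: string; // serialized phase input
}

// 32-bit FNV-1a: a stand-in for the project's `hashInput`.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Key = phaseId(upstreamKeys):inputHash, computed recursively over the DAG.
function graphPositionKeys(phases: KeyedPhase[]): Map<string, string> {
  const byId = new Map<string, KeyedPhase>(
    phases.map((p): [string, KeyedPhase] => [p.id, p]),
  );
  const keys = new Map<string, string>();

  const keyOf = (id: string): string => {
    const cached = keys.get(id);
    if (cached !== undefined) return cached;
    const phase = byId.get(id);
    if (!phase) throw new Error(`unknown phase: ${id}`);
    // Sort upstream ids so declaration order does not change the key.
    const upstream = [...phase.dependsOn].sort().map(keyOf).join(",");
    const key = `${phase.id}(${upstream}):${fnv1a(phase.input)}`;
    keys.set(id, key);
    return key;
  };

  for (const p of phases) keyOf(p.id);
  return keys;
}

const runA = graphPositionKeys([
  { id: "research", dependsOn: [], input: "topic A" },
  { id: "draft", dependsOn: ["research"], input: "write a summary" },
]);
console.log(runA.get("draft"));
```

Two `draft` phases with identical input but different upstream results get different keys, which is what keeps fan-out items and pruned best-of-N branches from colliding. Because nested keys grow with graph depth, a real implementation would probably hash the upstream list rather than embed it verbatim.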

## 4. Capability Gaps to Close First (all naturally declarative)

| Gap | pi-taskflow approach | Effort |
|-----|----------------------|:------:|
| **Loop-until-done** | new `loop` phase: `"until": "{steps.X.output.done}==true"` + `maxIterations` + convergence detection — **✅ shipped** | M |
| **Tournament** | new `tournament` phase: N variants compete, a judge sub-phase picks `best`/`aggregate` — **✅ shipped** | M |
| **Worktree isolation** | `"cwd": "temp"`/`"dedicated"`/`"worktree"` per phase; runtime creates & destroys an isolated dir (or a git worktree on a throwaway branch) — **✅ shipped** | M |
| **Security quarantine** | per-phase `"tools": {"allow":[...], "deny":[...]}` (depends on pi core tool-restriction API) | S (if pi supports it) |
| **Saga/compensation** | `compensate` phase triggered on upstream failure, reverse order | L (defer) |
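
The `loop` row combines three exit conditions. A minimal sketch of how they might interact, assuming a synchronous `step` callback for brevity (real phases would be asynchronous) and a predicate function standing in for the `until` expression string. `LoopSpec`, `runLoop`, and the exit labels are hypothetical, and comparing serialized outputs is only one possible definition of convergence.

```typescript
interface LoopSpec<T> {
  maxIterations: number;
  until: (output: T) => boolean; // stands in for the `until` expression
}

type LoopExit = "done" | "converged" | "max-iterations";

interface LoopResult<T> {
  exit: LoopExit;
  iterations: number;
  output: T | undefined;
}

function runLoop<T>(
  spec: LoopSpec<T>,
  step: (prev: T | undefined, iteration: number) => T,
): LoopResult<T> {
  let prev: T | undefined;
  for (let i = 1; i <= spec.maxIterations; i++) {
    const output = step(prev, i);
    // 1. The declared exit condition holds.
    if (spec.until(output)) return { exit: "done", iterations: i, output };
    // 2. Convergence: the output stopped changing, so further passes are wasted.
    if (prev !== undefined && JSON.stringify(prev) === JSON.stringify(output)) {
      return { exit: "converged", iterations: i, output };
    }
    prev = output;
  }
  // 3. The hard cap bounds cost regardless of what the phase reports.
  return { exit: "max-iterations", iterations: spec.maxIterations, output: prev };
}

const loopResult = runLoop(
  { maxIterations: 5, until: (o: { done: boolean; attempt: number }) => o.done },
  (_prev, i) => ({ done: i >= 3, attempt: i }),
);
console.log(loopResult.exit, loopResult.iterations);
```

Keeping the cap and the convergence check in the runtime, rather than in the phase prompt, means a loop terminates even when the agent never reports `done`.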

## 5. Three-Horizon Roadmap

- **H1: Verifiable Foundation (~4 wks).** Graph-position caching → static verification → loop-until-done (✅ shipped) → cross-run memoization → OTel export → model routing. *Outcome: the only orchestrator with static DAG verification + cross-run memoization + OTel.*
- **H2: Quality & Portability (~4 wks).** Declarative eval gates (`onBlock:retry`) → tournament (✅ shipped) → worktree isolation (✅ shipped) → Mermaid + verification-report compilation → template library.
- **H3: Research Frontier (~6 wks).** Deterministic replay → best-of-N late binding → security quarantine → saga (deferred).

## 6. Honest Risks & Where Others Still Win

- **Zero-dep vs OTel/JSON-Schema tension** → resolve via **optional peerDependencies** (zero-deps at rest, opt-in at runtime). Don't hand-roll OTLP.
- **Claude still wins:** IDE integration, serverless execution, single-`.js` simplicity, Opus model quality.
- **LangGraph still wins:** node-level checkpoint + time-travel (we're phase-level only).
- **Temporal still wins:** event-sourced durability + exactly-once at scale (we're a local orchestrator).
- **Biggest threat:** if Claude ships loops + tournaments + static analysis first, the "structured DAG" narrative erodes. The wedge is defensible only if we ship H1 fast: their imperative model makes static analysis *harder* to add, and that difficulty is our time window.

---
*Every capability claim is grounded in existing code; nothing is invented. Update as the landscape moves.*