# Design: chain, an iteration-loop tool for multi-node prompt chains

Generated by /office-hours on 2026-06-02
Branch: unknown
Repo: chain-cli (not a git repo yet)
Status: APPROVED
Mode: Startup (intrapreneurship variant)

> **Historical design snapshot (2026-06-02).** This records the approved plan, not
> current behavior. A few things shipped differently: the command is **`chainq`**
> (not `chain`); there is **no `fake`/offline profile** (every `ai` step calls the
> real model); and **`chainq run` re-runs everything by default** — pass `--cache`
> to reuse unchanged steps. For current truth see the [README](../README.md) and
> [docs/cli/](cli/).

## Problem Statement

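Several people on the team currently chain multiple prompts by hand with **local CLI LLMs** (`claude -p`, `codex -m`). They copy the previous step's output, paste it into the next prompt, and run again. A workflow often has 3 nodes, sometimes 10.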

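The real pain is **not "being able to run multiple steps"** but **optimization/iteration**:

> "A workflow may contain 3 prompts (nodes), sometimes even 10. When something needs changing, or after some time has passed, it's basically impossible to optimize without a UI. And there's no way to see the result of an optimized prompt right away; the cost is too high."

No UI → after some time away, optimizing is basically impossible. No fast feedback → changing one prompt means rerunning the expensive upstream steps, which is slow and burns CLI quota. The iteration loop is too expensive, so people simply stop optimizing.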

## Demand Evidence

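- **Behavioral evidence (not opinion)**: several people on the team are chaining prompts by hand **right now**. This is behavior, not a waitlist and not "watched the demo and said it looks nice." Office hours judged it to be real demand.
- **Structure**: each person may have several workflows, and each workflow has one or more YAML files; one YAML = one prompt chain.
- **Gap**: the demand is real, but "everyone's workflow is different." There is no single golden workflow to package in advance; adoption depends on the tool fitting any workflow of the user's own.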

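## Status Quo (your real competitor)

**The competitor is not n8n and not LangChain; it is "the terminal window they already have open," with zero setup cost.**

The cost of manual copy-paste (in the users' own words): change one prompt → without a UI the whole chain is hard to follow → the result of the change isn't visible → rerunning the expensive upstream is slow and burns quota. But copy-paste costs nothing to set up. chain demands a YAML file up front, which is a setup tax. **chain wins only when the iteration loop is pleasant enough to outweigh the setup tax.**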

## Target User & Narrowest Wedge

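- **First real user**: one teammate who chains prompts by hand every week and is willing to switch immediately (using their real YAML, not a hardcoded flow).
- **Narrowest wedge**: `chainq ui <any-flow.yaml>` → three-column editor (input / prompt / output) → **pin an upstream sample → edit the prompt → run only this node (local CLI) → see the new output within seconds**, without rerunning the expensive upstream. Corresponds to F1 + F2 + B1 in the original doc.
- **Deliberately deferred**: loops, schema, write nodes, F4 structural editing, the full CLI run surface.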

## Constraints

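- AI steps run as **local CLI subprocesses** (not an HTTP API, no keys). They are slow and burn real quota → "skip whatever can be skipped, rerun only what is affected" is the core value, not a nice-to-have.
- TypeScript + tsx (no build step in dev).
- YAML is the single source of truth; visual data (coordinates/layout) stays out of the main file.
- CLI and UI **share the same engine and validator** (iron rule: never write two copies).
- Subprocess iron rules: feed the prompt via stdin, spawn with an argv array (no injection), a timeout per step, raw stderr included in error reports, a `which` check before running.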
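A minimal sketch of those subprocess rules, assuming Node's built-in `child_process` module. `runStep` and `StepResult` are hypothetical names, and the pre-run `which` check, streaming to the UI, and auth-failure classification are left out. It only shows argv spawning without a shell, feeding the prompt on stdin, the per-step timeout with SIGTERM→SIGKILL escalation, and raw stderr capture.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result type for one step; field names are illustrative.
interface StepResult {
  code: number | null;   // exit code, or null if the process was killed by a signal
  signal: string | null; // e.g. "SIGTERM" or "SIGKILL" when killed
  stdout: string;
  stderr: string;        // kept verbatim so error reports can show it raw
  timedOut: boolean;
}

// argv array and no shell: the prompt is never parsed by a shell.
// The prompt is written to stdin, which is then closed.
// On timeout: SIGTERM first, then SIGKILL after a grace period.
function runStep(argv: string[], prompt: string, timeoutMs: number, graceMs = 2000): Promise<StepResult> {
  const [cmd, ...args] = argv;
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args); // no `shell: true`, so arguments pass through untouched
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    }, timeoutMs);

    const cleanup = () => { clearTimeout(timer); clearTimeout(killTimer); };
    child.on("error", (err) => { cleanup(); reject(err); }); // e.g. ENOENT: binary not installed
    child.on("close", (code, signal) => {
      cleanup();
      resolve({ code, signal, stdout, stderr, timedOut });
    });

    child.stdin.on("error", () => {}); // ignore EPIPE if the child exits without reading stdin
    child.stdin.end(prompt);
  });
}
```

A caller would treat a non-zero `code`, a non-null `signal`, or `timedOut` as a failed, uncached result and surface `stderr` verbatim.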

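## Premises (confirmed with the user)

1. **The product is the iteration loop, not the engine.** The engine is the thinnest plumbing that makes the loop possible; F1+F2+B1 are what teammates actually buy. Relative to the original doc, the build order must pull the UI forward; it cannot be "finish the whole engine → UI last."
2. **The real competitor is zero-setup terminal copy-paste.** v1 must push onboarding friction as low as possible; the YAML setup tax is the biggest barrier to adoption.
3. (revised) **"Everyone's workflow is different" means the engine/UI must be generic (able to point at any YAML), but the feature scope must be narrow.** What is generic is the mechanism "point at any YAML → visual iteration loop"; what is narrow is the set of node types (only `ai`+`cmd` at first, no loop/schema/write). First get **one real user iterating on their real YAML**. Generic mechanism, narrow first adopter.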

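## Landscape (in-distribution, no external search)

- **n8n / Flowise / LangFlow**: already offer "node graph + edit prompt + run." A visual flow is not your differentiator.
- **PromptLayer / Humanloop / Langfuse**: prompt iteration + observability, but all of them are API-key-based cloud SaaS.
- **Your real differentiator** (the point almost buried in §5 of the original doc): chain runs on **the local CLI models you are already logged into** (`claude -p`, `codex`), with no keys and no HTTP, drawing on your existing CLI quota. Nobody in the visual-flow camp does this (they are all built for APIs). This is also exactly why "skipping the expensive upstream" matters: you are spending real CLI quota, not cheap API calls. **Make this the number-one selling point in marketing and design, not a footnote.**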

## Approaches Considered

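### Approach A: Thinnest iteration window (Minimal Viable)
Build only `chainq ui flow.yaml` + the three-column editor + pin a sample and run just this step + a scratch cache. Minimal engine (parse YAML, wire nodes by `from:`, single-node subprocess, cache); no full DAG, loops, schema, write, or standalone CLI run.
- Effort: S/M (human ~1-2 weeks / CC a few hours) · Risk: Low-Med
- Pros: fastest fix for the "can't optimize" pain, demoable this week, sidesteps the hardest parts (hand-built canvas, F4)
- Cons: no full-chain run, no loops/schema; the engine may need strengthening later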

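### Approach B: Engine right first, UI and CLI grow on it together (Ideal Architecture) · **chosen**
Build a thin but correct engine first (internal topological sort, `state.json` hash invalidation, scratch/outputs separation), with the CLI (run/validate) and the iteration UI growing on the same engine in parallel. `ai`+`cmd`+pin. Loops/schema/write/F4 deferred.
- Effort: M/L (human ~3-4 weeks / CC ~1-2 days) · Risk: Med
- Pros: clean foundation, upholds the "one engine" iron rule, hash-based skipping genuinely works (real savings, not illusory ones), best long-term trajectory
- Cons: the first demo arrives later; risk of overinvesting in plumbing before adoption is validated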

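### Approach C: "Paste to generate a chain" (Creative / Lateral)
v1 does not require writing YAML first. Paste an existing prompt sequence → auto-generate a YAML skeleton → drop it straight into the iteration window. The hook: "paste 5 prompts → get an editable, rerunnable chain in 30 seconds." This removes premise 2's setup tax outright.
- Effort: M (human ~2-3 weeks / CC ~half a day to 1 day) · Risk: Med-High (auto-generation is fuzzy)
- Pros: turns the competitor (the zero-setup terminal) into an on-ramp, 30 seconds to first value, strongest differentiation
- Cons: auto-generation is fuzzy work; it still needs A's iteration window underneath (= A + extra)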

## Recommended Approach

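**Approach B chosen (the user's decision, overriding my original recommendation of A).** The user's rationale is a clean foundation and avoiding an engine rewrite in v1.1; since the demand is already real, this is a reasonable trade-off.

**Mandatory mitigation (written into B, to keep B from turning into "engine for 4 weeks, UI last"):**

> B's internal milestones must be ordered so that **the iteration window is demoable the moment the thin engine can run a single pinned node**, not after the full CLI/validate surface is finished. The first teammate demo goes at the earliest point B physically allows.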

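Suggested internal order for B (**the demo gate is at step 2; `run` and its flags are in B's scope but come after the demo**):
1. **Thin engine core**: YAML parsing → dependencies built from `from:` → single-node subprocess (stdin feed, argv spawn, timeout) → scratch cache + `state.json` Merkle key (see "Cache & Invalidation Contract"). Only `ai`+`cmd`. **At runtime v1 supports linear chains only**; build the topological-sort data structure now as groundwork for future branching, but branches only need to be "parseable," not runnable or drawable. Whether branching is really needed waits until The Assignment has seen that teammate's real workflow (see Open Question #3). The G2 fake-model profile and the invalidation-correctness tests must be built in this step.
2. **Iteration window (← earliest demo / adoption gate, before `run`)**: `chainq ui <yaml>`, three-column F1/F2, pin an upstream sample → edit the prompt → run only this node → see the output (written to scratch). **Grab a teammate to try it at this point. It can be demoed before `run` exists.**
3. **(after the demo; added only as observed needs require)** CLI surface: `run` (reuses cache, plus `--fresh`), `validate`, `ls`, and `--from/--to/--steps/--profile`. **Don't build all these flags up front**; wait until the step-2 teammate's real workflow shows which ones are needed (YAGNI).
4. Orchestration completion: loop containers, C4 schema + retry with the error fed back, write nodes, E4 full pre-run validator (shared by CLI/UI).
5. F3 step-by-step view → F4 structural editing.
- Throughout: G2 fake-model profile (echo/cat fixture) for offline self-testing, G3 honest log prefixes.
- **Scope clarification**: the original doc §34/36 lists "the full CLI run surface" as deliberately deferred, and this plan agrees: `run` is not inside the demo gate (step 2); it is in B's scope but sequenced after the demo. Open Question #2 (whether to pull C's minimal `chainq init` into steps 2-3 to reduce the setup tax) is still open, pending observations from the first teammate.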

## Open Questions

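1. **Who is the first teammate?** So far we only have the group level ("several people on the team"), not a name. The Assignment needs to fill this in.
2. **How do we lower the setup tax?** B was chosen without C's "paste to generate a chain." How does the first teammate get from manual copy-paste to their first YAML? Should a minimal version of C (an interactive `chainq init`) be scheduled into steps 2-3 so the setup tax doesn't block the demo?
3. **Linear vs branching v1 view**: is a linear node list enough for the v1 iteration window, or does the first real YAML already have branches that need drawing? That depends on what the teammate's real workflow looks like.
4. **Testing strategy for the shared "one engine"**: the G2 fake-model profile has to exist in step 1 so that both CLI and UI can run the same flow offline for regression testing.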

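## Cache & Invalidation Contract (core; must be nailed down in step 1)

The entire value proposition rests on this piece. Adversarial review pointed out that the original doc treated it as "solved plumbing" when it was in fact undefined. The contract below is non-negotiable: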

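- **Node cache key (Merkle-style, including transitive upstream)**:
  `key(node) = hash( node.prompt (raw text) + node.type + node.profile/model + resolved upstream inputs + key(upstream) for each direct upstream )`.
  Key point: the key must **recursively fold in each upstream's key**. Edit A → key(A) changes → key(B) and key(C) change in cascade. Otherwise editing A would not invalidate C, and the tool would serve **a stale but "valid" cache entry, i.e. a silent error**, which is worse than being slow.
- **Definition of a "valid cache"**: the node has a record in `state.json`, its recorded key equals the freshly recomputed key, and the output file exists in `.chain/outputs/` (or in the scratch file for a pinned sample). Only when all three hold does it count as `⊘ cached`.
- **Hash algorithm**: full SHA-256, untruncated (collisions become a non-issue rather than a silent possibility).
- **Pin lifecycle**: pins never go into the main YAML (visual data stays out of the main file). They live in `state.json` / scratch and are **keyed by node identity (node id)**, so unrelated YAML edits don't drop a pin while related edits do. Pins **participate in the downstream key** (the pinned value is the de facto input the downstream sees). Changing a pin → the downstream cache is invalidated in cascade.
- **Atomic writes**: node outputs and `state.json` are always written via temp file + rename. When a subprocess times out or is killed, a half-written "valid" cache entry is never left behind.
- **Engine version field**: `state.json` carries `engineVersion`/`schemaVersion`; when the hashing scheme or serialization changes, the whole cache is invalidated automatically (no silent mismatch).
- **Local CLI login expiring (mid-run)**: the pre-run `which` check only catches "not installed," not "token expired." Define an auth-failure detection contract (exit code / stderr pattern → a separate "login expired" error state, distinct from a cache miss), and **a failed run is never written as a valid cache entry**.
- **Concurrency**: the MVP assumes a single writer, enforced by a flow-level lockfile (or per-run scratch subdirectories), so two `chainq ui`/`run` processes cannot touch the same flow at once. Document this assumption first; if it is not implemented, mark it explicitly out of scope.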
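The atomic-write and "valid cache" rules above could be sketched like this, assuming Node's `fs` module. The `state.json` shape, `ENGINE_VERSION`, and all function names are hypothetical. The point is the write ordering (output first, then the state record, each via temp file + rename) and the three-part validity check.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

const ENGINE_VERSION = 1; // hypothetical: bump whenever hashing or serialization changes

// Hypothetical state.json shape.
interface NodeRecord { key: string; outputFile: string }
interface StateFile { engineVersion: number; nodes: Record<string, NodeRecord> }

// Write to a temp file in the same directory, then rename over the target.
// On POSIX, rename within one filesystem replaces the target atomically, so a
// reader sees the old file or the new one, never a half-written one.
// (Surviving power loss would additionally need fsync.)
function atomicWrite(target: string, data: string): void {
  const dir = dirname(target);
  mkdirSync(dir, { recursive: true });
  const tmp = join(dir, `.${basename(target)}.${process.pid}.${Date.now()}.tmp`);
  try {
    writeFileSync(tmp, data);
    renameSync(tmp, target);
  } catch (err) {
    rmSync(tmp, { force: true }); // never leave temp debris that could be mistaken for output
    throw err;
  }
}

function readState(path: string): StateFile {
  const empty: StateFile = { engineVersion: ENGINE_VERSION, nodes: {} };
  if (!existsSync(path)) return empty;
  const state = JSON.parse(readFileSync(path, "utf8")) as StateFile;
  // Version mismatch: drop the whole cache instead of trusting old keys.
  return state.engineVersion === ENGINE_VERSION ? state : empty;
}

// Output first, then the state record that vouches for it.
function recordNode(statePath: string, state: StateFile, nodeId: string, key: string, outputFile: string, output: string): void {
  atomicWrite(outputFile, output);
  state.nodes[nodeId] = { key, outputFile };
  atomicWrite(statePath, JSON.stringify(state, null, 2));
}

// Valid = record exists AND recorded key === recomputed key AND the output file exists.
function isValid(state: StateFile, nodeId: string, currentKey: string): boolean {
  const rec = state.nodes[nodeId];
  return rec !== undefined && rec.key === currentKey && existsSync(rec.outputFile);
}
```

With this ordering, a crash between the two writes leaves at most an orphaned output file, never a state record that points at missing or partial data.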

## Success Criteria

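- **The core payoff is measurable**: edit one prompt node in the iteration window, and **only the edited node spawns a CLI subprocess; zero upstream subprocesses are triggered**. This is the part we can guarantee. The node's own wall-clock time depends on the model's latency, which we don't control, so we don't promise a fixed number of seconds; "3 seconds" is only a regression benchmark on the G2 fake-model path.
- **Real quota savings**: when editing a downstream node, the expensive upstream `ai` steps show `⊘ cached` and really do not spawn CLI subprocesses.
- **Invalidation correctness (easiest to get wrong, most important to test)**: after editing upstream A's prompt, all of A's transitive downstream nodes (B, C, …) are marked stale and rerun; unrelated branches stay `⊘ cached`. This must be guarded by an automated test (using the G2 fake model).
- **Adoption bar**: one real teammate, using their own real YAML, completes one "edit prompt → see result → satisfied → save back" cycle without you holding their hand.
- **Honest reporting (G3)**: skips, failures, and per-item loop success/failure are shown truthfully in both the log and the UI.
- **Offline self-test (G2)**: the whole chain runs offline using the fake-model profile.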

## Distribution Plan

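Internal tool, so distribution = install/onboarding friction (which is itself part of the competitor, the "zero-setup terminal").

- **Dev/internal**: run directly with tsx; a single binary `chain` + subcommands (`run`/`validate`/`ui`/`ls`).
- **How teammates get the tool**: internal git + a one-line `npx`/`bun` install, or a prebuilt single-file binary in an internal release. **The closer installation gets to zero steps, the better**: any install friction directly feeds "might as well keep copy-pasting."
- **CI/CD**: manual releases are fine at the MVP stage; if the team adopts it, add GitHub Actions to build the single-file binary automatically.
- **Login-state dependency**: because the tool relies on locally logged-in CLIs (`claude`/`codex`), document the prerequisite that "teammates must log in to the CLI on their own machines first." The pre-run `which` check + raw stderr error reports (login expired / command not found) must exist from step 1.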

## Dependencies

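- `claude` / `codex` CLI installed and logged in locally (prerequisite for `ai` steps).
- A TypeScript + tsx environment.
- No external services or keys (this is a selling point, and also a constraint).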

## The Assignment

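**Not "go build B."** Before writing any engine code, do this one real-world thing this week:

> **Find the teammate on the team who chains prompts by hand most often. Ask them to paste you the raw text of every prompt in the workflow they currently run by hand, and sit next to them while they run it once (don't help, don't explain; bite your tongue).** Write down two things: (1) when they change a prompt and want to see the effect, what do they actually do, and how long does it take; (2) did any step surprise you, i.e. did they do something you didn't anticipate when designing?

This person is your first demo target and the answer to Open Question #1. Until you know what their workflow looks like, don't decide whether the v1 iteration window needs to draw branches.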

## What I noticed about how you think

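- You didn't pitch "running multiple steps" as the selling point. In your own doc you set the north star on the B series (iteration) and the E series (resuming). Most people would headline "multi-step orchestration"; you knew from the start that it's table stakes.
- When I pushed hard on demand evidence, you didn't retreat to "people think it's nice." You gave behavior: "people are already chaining these prompts by hand right now." Then you added the most honest yellow flag yourself: "everyone's workflow is different." You didn't hide the risk to make the answer look good.
- I recommended A and you chose B, and not arbitrarily: in the previous question you had already laid out the whole file model with "one YAML file represents one prompt chain, and the tool can visually edit each YAML." You have a consistent system picture in your head, so you'd rather get the engine right first. That is conviction, not compliance.
- You have taste for invisible qualities like "config files readable first" and "honest reporting." Keeping coordinates out of the main file and carrying honesty in prominent log prefixes are details most people skip, yet they decide whether a tool is good to use.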

## Engineering Review — Locked Decisions (2026-06-03, /plan-eng-review)

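Engine- and architecture-level decisions, layered on top of the design doc. Apart from D1/D3, these are mostly implementation-level calls (the user authorized me to settle plumbing myself and escalate only genuine risk/product forks).

**Engine core (D1, user's decision)**
- Primitive layering: `runNode(N, resolvedInputs)` is the lowest layer; `materializeUpstream(N)` walks N's transitive `from:` dependencies (reusing valid cache and calling runNode only for stale nodes, recursively); `runToNode/rerunNode/runChain` are all its callers.
- Mapping to n8n: **publish mode = `chainq run` (CLI, runs the whole chain; write/artifact nodes fire only here)**; **runtime mode = the iteration window (run-to-here, re-run-node, per-node I/O visible, write nodes don't fire)**.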
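A synchronous, in-memory sketch of that layering. Every name here is hypothetical: `exec` stands in for the CLI subprocess, and `isValid` is a plain predicate standing in for the Merkle-key comparison. The memo and the force set live in a per-operation context rather than on a long-lived runner object, which is the same lesson as the `RunCtx` fix recorded in the code-review notes of this doc.

```typescript
// Hypothetical shapes for illustration; the real engine's types will differ.
interface StepDef {
  from: string[];
  exec: (inputs: string[]) => string; // stands in for the CLI subprocess
}
type Flow = Record<string, StepDef>;
interface Cache {
  outputs: Map<string, string>;
  isValid: (id: string) => boolean; // stands in for the Merkle-key comparison
}
// Per-operation state, created fresh for every call.
interface RunCtx {
  memo: Map<string, string>;
  force: Set<string>;
  spawned: string[]; // nodes that actually ran, in order
}

// Lowest layer: run exactly one node with already-resolved inputs.
function runNode(flow: Flow, id: string, inputs: string[], ctx: RunCtx): string {
  ctx.spawned.push(id);
  return flow[id].exec(inputs);
}

// Reuse a valid cached output; otherwise materialize the upstream first, then run.
function materializeUpstream(flow: Flow, cache: Cache, id: string, ctx: RunCtx): string {
  const memo = ctx.memo.get(id);
  if (memo !== undefined) return memo;
  const cached = cache.outputs.get(id);
  let out: string;
  if (!ctx.force.has(id) && cached !== undefined && cache.isValid(id)) {
    out = cached; // no subprocess, and this node's upstream is never visited
  } else {
    const inputs = flow[id].from.map((up) => materializeUpstream(flow, cache, up, ctx));
    out = runNode(flow, id, inputs, ctx);
    cache.outputs.set(id, out); // the real engine would also record the new key atomically
  }
  ctx.memo.set(id, out);
  return out;
}

const newCtx = (force: Iterable<string> = []): RunCtx => ({ memo: new Map(), force: new Set(force), spawned: [] });

// The three entry points differ only in what they force.
function runToNode(flow: Flow, cache: Cache, id: string): string[] {
  const ctx = newCtx();
  materializeUpstream(flow, cache, id, ctx);
  return ctx.spawned;
}
function rerunNode(flow: Flow, cache: Cache, id: string): string[] {
  const ctx = newCtx([id]);
  materializeUpstream(flow, cache, id, ctx);
  return ctx.spawned;
}
function runChain(flow: Flow, cache: Cache, fresh = false): string[] {
  const ctx = newCtx(fresh ? Object.keys(flow) : []);
  for (const id of Object.keys(flow)) materializeUpstream(flow, cache, id, ctx);
  return ctx.spawned;
}
```

Because each call builds a fresh context, an earlier call's memo cannot shadow the force set that `rerunNode` passes in.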

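**UI ↔ engine (D2, my call)**
- In-process: the engine is a TS library imported by both the UI's Node http server and the CLI. Running a single node = a function call.
- Add a subprocess registry: kill orphaned model subprocesses when the server restarts or hot-reloads (Codex's reminder).

**Cache key (extended, fixing the correctness hole Codex caught)**
- `key(N) = SHA-256( raw prompt + type + profile command string + model flags + CLI version marker + canonical(resolvedInputs) + key of each direct upstream )`. Merkle/transitive.
- **`cmd` nodes are uncacheable by default** (e.g. the output of `cat ./input.txt` changes when the file changes, but the key doesn't reflect that), unless the user declares the node's input file list.
- Local CLI models are **not pure functions** (they read files and have side effects) → caching is **best-effort**; `--fresh` and per-node force-run are always available.
- Canonical (sorted-key) serialization, to avoid hash drift across platforms or object key orders.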
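The extended key could be computed roughly as follows, assuming Node's `crypto` module. `NodeSpec`, `canonical`, `nodeKey`, and `computeKeys` are hypothetical names, and how the CLI version marker and resolved inputs are obtained is left to the caller. Two details are deliberate: the fields are hashed as one canonical JSON object rather than as concatenated strings, and upstream keys stay in `from` order instead of being sorted, because the first upstream is the one bound to `$json`.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Canonical JSON: object keys sorted (by UTF-16 code unit) at every level,
// so equal values always serialize to identical bytes.
function canonical(v: Json): string {
  if (Array.isArray(v)) return `[${v.map(canonical).join(",")}]`;
  if (v !== null && typeof v === "object") {
    const obj: { [k: string]: Json } = v;
    const fields = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(v);
}

// Hypothetical node fields mirroring the key formula above.
interface NodeSpec {
  prompt: string;
  type: "ai" | "cmd";
  profileCmd: string;   // e.g. "claude -p"
  modelFlags: string[];
  cliVersion: string;   // however the CLI version marker is obtained
  from: string[];       // direct upstreams, in YAML order
}

function nodeKey(n: NodeSpec, resolvedInputs: Json, upstreamKeys: string[]): string {
  // One JSON object makes field boundaries unambiguous ("ab"+"c" vs "a"+"bc").
  const material = canonical({
    prompt: n.prompt, type: n.type, profile: n.profileCmd, flags: n.modelFlags,
    cli: n.cliVersion, inputs: resolvedInputs,
    upstream: upstreamKeys, // NOT sorted: order matters because from[0] binds to $json
  });
  return createHash("sha256").update(material).digest("hex"); // full 64 hex chars
}

// Merkle/transitive: each key folds in its direct upstreams' keys,
// so editing any ancestor changes the key of every descendant.
function computeKeys(flow: Record<string, NodeSpec>, inputs: Record<string, Json> = {}): Map<string, string> {
  const keys = new Map<string, string>();
  const visiting = new Set<string>();
  const visit = (id: string): string => {
    const done = keys.get(id);
    if (done !== undefined) return done;
    if (visiting.has(id)) throw new Error(`cycle through "${id}"`);
    const n = flow[id];
    if (!n) throw new Error(`unknown node "${id}"`);
    visiting.add(id);
    const upstreamKeys = n.from.map((up) => visit(up));
    visiting.delete(id);
    const key = nodeKey(n, inputs[id] ?? null, upstreamKeys);
    keys.set(id, key);
    return key;
  };
  for (const id of Object.keys(flow)) visit(id);
  return keys;
}
```

Comparing these keys against the ones recorded in `state.json` is exactly the "edit A → A and its transitive downstream go stale, siblings stay cached" check that the CRITICAL invalidation test asserts.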

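**D3 (defended and kept by the user)**
- Expensive upstream is stale → **silently auto-rerun** (the n8n-style convenience). Per-node ran/cached status is visible live, plus a "called N ai nodes this run" summary.
- **Framing correction**: AI steps run on the user's **local `claude` CLI subscription**, not a metered API. The value of caching and skipping the upstream is **lower latency + staying under subscription rate limits**, not saving metered money. Codex's "accidental spend" objection rests on a wrong premise and was rejected.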

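**Other plumbing (my call)**
- Node identity = YAML key. When the UI renames a node it proactively migrates the node's cache/pin; hand-editing a YAML key = treated as a new node and rerun.
- Concurrency: `.chain/lock` **blocks** a second writer (not just a warning) + stale-lock cleanup via pid+mtime + canonical flow identity.
- Subprocess lifecycle: argv spawn, prompt via stdin and then closed, dual streams stdout (SSE) + stderr, timeout SIGTERM→SIGKILL, atomic temp+rename, never cache partial/failed output. Unified cancellation semantics (stop / browser disconnect / timeout / server exit) → all converge to failed-and-uncached. Auth-expiry detection is a heuristic (not a contract).
- Pin = a named, inspectable, **editable** fixture stored in `.chain/`, keyed by node id, participating in the downstream key.
- The iteration window offers both "run this node" (fast) and "run on to the end" (to see whether the final result actually changed; Codex's reminder: single-node feedback ≠ whole-chain quality).
- Expressions v1 = first tier only, path selectors; the second tier, a JS sandbox, is deferred (security surface; later use a vetted isolate, never raw eval).
- v1 execution = linear chains; branches are parseable and displayed read-only; cross-branch execution = step 4. ("Any YAML" → more precisely "any linear flow".)
- YAML save-back preserves comments/formatting (the `yaml` package's Document/CST API, not parse→stringify).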
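A minimal sketch of the blocking lock with pid+mtime stale cleanup, assuming Node's `fs` API. The `wx` open flag gives an exclusive create that fails with `EEXIST`, and `process.kill(pid, 0)` only probes whether the pid exists. The function names, the 12-hour default age, and the 5-second grace for a lock file whose owner is still writing its pid are all assumptions; reclaiming a stale lock remains racy if two processes reclaim it at the same moment.

```typescript
import { closeSync, mkdirSync, openSync, readFileSync, rmSync, statSync, writeSync } from "node:fs";
import { dirname } from "node:path";

// Signal 0 checks that the pid exists without delivering a signal.
// EPERM means the process exists but belongs to another user.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    return (e as { code?: string }).code === "EPERM";
  }
}

function isStale(lockPath: string, maxAgeMs: number): boolean {
  try {
    const pid = Number(readFileSync(lockPath, "utf8").trim());
    const ageMs = Date.now() - statSync(lockPath).mtimeMs;
    // Empty or garbled: the owner may still be writing its pid, so allow a short grace.
    if (!Number.isInteger(pid) || pid <= 0) return ageMs > 5_000;
    // Dead owner, or too old (the age check also covers pid reuse after a reboot).
    return !pidAlive(pid) || ageMs > maxAgeMs;
  } catch {
    return true; // vanished or unreadable: treat as free
  }
}

// true = we now hold the lock; false = another live writer holds it (blocked).
function tryAcquireLock(lockPath: string, maxAgeMs = 12 * 60 * 60 * 1000): boolean {
  mkdirSync(dirname(lockPath), { recursive: true });
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = openSync(lockPath, "wx"); // exclusive create
      writeSync(fd, String(process.pid));
      closeSync(fd);
      return true;
    } catch (e) {
      if ((e as { code?: string }).code !== "EEXIST") throw e;
      if (!isStale(lockPath, maxAgeMs)) return false;
      rmSync(lockPath, { force: true }); // reclaim the stale lock, then retry once
    }
  }
  return false;
}

function releaseLock(lockPath: string): void {
  rmSync(lockPath, { force: true });
}
```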

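**Tests**
- Framework = vitest (TS, run under tsx).
- Most critical test (regression-grade, CRITICAL): Merkle invalidation correctness. Edit upstream A → A and all transitive downstream nodes go stale; unrelated sibling branches stay cached.
- Other CRITICAL tests: atomic writes, pin isolation, wedge E2E (edit prompt → only this node spawns → output streams).
- The G2 fake-model harness is **built first** (step 1), so the whole engine can test itself offline and deterministically.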

## NOT in scope (v1)

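| Deferred item | Reason |
|---|---|
| Loop containers | Step 4; a linear v1 is enough to validate the wedge |
| C4 schema + retry with error feedback | Step 4. **Risk recorded**: structured-output mismatch is a common failure source in prompt chains (Codex); deferring it is a conscious bet, not an oversight |
| write/artifact nodes | Step 4; runtime iteration doesn't need them |
| ~~F4 structural canvas editing~~ | **Moved into v1** (design review D4: the user wants a full canvas including drag-to-connect). Use React Flow, don't hand-build. |
| Second-tier JS expression sandbox | Security surface; v1 path selectors already cover most cases |
| Branch execution | v1 is linear; branches are displayed read-only for now |
| cmd auto-caching | Uncacheable by default unless input files are declared |
| Parallel execution (--concurrency) | The engine's topological sort leaves room for it, but it is not in v1 |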

## What already exists
Greenfield; not even a git repo yet. There is no existing engine or parallel workflow to reuse. A clean slate. **Step one is `git init`.**

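## Failure modes (for each new path: one realistic failure, whether it is tested, whether it has error handling, whether the user can see it)

| Path | Realistic failure | Tested? | Error handling? | Visible to user? |
|---|---|---|---|---|
| Merkle invalidation | Upstream edited but stale cache served (silent error) | CRITICAL test | Key includes transitive upstream + cmd uncacheable | ran/cached prefix visible |
| Subprocess | CLI hangs / times out | Yes | Timeout SIGTERM→SIGKILL | ⟳ timeout prefix |
| Login expiry | Token expires mid-run | Yes | Separate error state, not cached | E1 prominent error |
| Atomic writes | Half-written file when killed mid-write | CRITICAL test | temp+rename | — |
| Orphan subprocesses | Server restart leaves model processes running | To do | Subprocess registry kills them | — |
| cmd input drift | File changed but key didn't | To do | cmd uncacheable by default | Force-run available |

No critical gap that is "untested + unhandled + silent."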

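## Worktree parallelization

```
Lane A (first, blocking)          engine/ core: dag + cache + run primitive + validate + G2 harness
Lane B (once A's API is stable)   ui/ iteration window (depends on the engine library API)
Lane C (once A's API is stable)   cli/ run/validate/ls/flags (depends on the engine library)
Execution: A first, until the engine library API is stable → then B + C in parallel worktrees.
Conflict flag: B and C both depend on engine/'s type interfaces → freeze A's public API before opening B/C.
```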

## Implementation Tasks
Synthesized from this review. P1 blocks the demo gate; P2 same-branch; P3 follow-up.

- [ ] **T1 (P1, human: ~1d / CC: ~30min)** — engine — `runNode` primitive + `materializeUpstream` + `runToNode`/`rerunNode`/`runChain`
  - Surfaced by: Architecture D1 — layered primitive, both UI and CLI are callers
  - Files: `engine/run.ts`, `engine/index.ts`
  - Verify: vitest — runToNode reuses cached upstream, reruns only target
- [ ] **T2 (P1, human: ~1d / CC: ~30min)** — engine — Merkle cache key (prompt+type+profile+flags+CLI-ver+canonical inputs+upstream keys) + `isValid` + cmd-uncacheable-default
  - Surfaced by: Architecture cache contract + Codex (env/cmd gaps)
  - Files: `engine/cache.ts`
  - Verify: CRITICAL invalidation test (edit A → B,C stale, sibling cached)
- [ ] **T3 (P1, human: ~1d / CC: ~30min)** — engine — subprocess lifecycle (argv/stdin/dual-stream/timeout/SIGKILL/atomic/cancellation/registry)
  - Surfaced by: Architecture subprocess + Codex (orphans, cancellation, stderr)
  - Files: `engine/run.ts`, `engine/proc.ts`
  - Verify: timeout kills; partial never cached; cancel → non-cache
- [ ] **T4 (P1, human: ~half-day / CC: ~20min)** — infra — G2 fake-model profile harness (echo/cat fixture)
  - Surfaced by: Test review — enables offline deterministic engine tests
  - Files: `engine/profiles.ts`, `test/fixtures/`
  - Verify: whole chain runs offline, deterministic
- [ ] **T5 (P1, human: ~half-day / CC: ~20min)** — engine — DAG parse + cycle detect + `validate()` (E4 shared by CLI & UI)
  - Surfaced by: Architecture + Code Quality (one validator, two callers)
  - Files: `engine/dag.ts`, `engine/validate.ts`
  - Verify: collects all errors; "did you mean X" hint
- [ ] **T6 (P1, human: ~2d / CC: ~1h)** — ui — node panel (opens on canvas node click): 3-col INPUT(Schema/Table/JSON) | PROMPT + live rendered preview (after substitution) | OUTPUT(6 states) + run-node + run-to-end + SSE stream + save-back(validate, YAML-preserving)
  - Surfaced by: Architecture D2 + Design review (canvas-first IA, render preview, Schema/Table/JSON, 6 OUTPUT states)
  - Files: `ui/server.ts`, `ui/panel/*`
  - Verify: E2E — click node → panel opens → edit prompt → rendered preview updates → only that node spawns → output streams; save preserves comments
- [ ] **T7 (P2, human: ~half-day / CC: ~20min)** — engine — `.chain/lock` single-writer (blocks) + stale cleanup (pid+mtime)
  - Surfaced by: Architecture concurrency + Codex
  - Files: `engine/lock.ts`
  - Verify: second writer blocked; stale lock reclaimed
- [ ] **T8 (P2, human: ~half-day / CC: ~20min)** — ui — per-node ran/cached status + "called N ai nodes" summary + Stop button
  - Surfaced by: D3 mitigation (G3 honest reporting)
  - Files: `ui/pane/status.ts`
  - Verify: status reflects ran vs cached live
- [ ] **T9 (P2, human: ~1d / CC: ~30min)** — cli — `chainq run/validate/ls` + `--fresh/--from/--to/--profile` (post-demo)
  - Surfaced by: Build order step 3 — gated by observed need
  - Files: `cli/*`
  - Verify: run reuses cache; --fresh ignores
- [ ] **T10 (P1, human: ~4~6d / CC: ~2~4h)** — ui — **canvas (home, v1)** via React Flow: render graph from YAML + layout sidecar, click node → T6 panel, drag-to-connect → write `from:`, add/delete node → edit `steps`, loop container view, all structural edits round-trip to comment-preserving YAML + `validate()` before save
  - Surfaced by: Design review D4 (user wants full editable canvas in v1) — biggest scope item, hardest piece
  - Files: `ui/canvas/*`, `engine/yaml-edit.ts`, `.chain/layout.json`
  - Verify: CRITICAL — drag-connect two nodes → YAML `from:` written correctly + comments preserved; delete node → dangling `from:` caught by validate; layout never pollutes flow YAML
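T5's validator could start from something like this sketch (all names hypothetical). It collects every error instead of stopping at the first one, suggests a "did you mean" name by Levenshtein distance (the threshold of 2 is an arbitrary choice), and reports cycles via Kahn's topological sort. Real validation would also check node types, profiles, and expressions.

```typescript
type Flow = Record<string, { from: string[] }>;

// Classic Levenshtein distance (insert / delete / substitute, cost 1 each).
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));
    }
    prev = cur;
  }
  return prev[b.length];
}

function validate(flow: Flow): { errors: string[]; order: string[] } {
  const ids = Object.keys(flow);
  const has = (id: string) => Object.prototype.hasOwnProperty.call(flow, id);
  const errors: string[] = [];

  // 1. Unknown upstream names, with the closest existing name as a hint.
  for (const id of ids) {
    for (const up of flow[id].from) {
      if (has(up)) continue;
      let best = "";
      let bestDist = Infinity;
      for (const c of ids) {
        if (c === id) continue;
        const d = editDistance(up, c);
        if (d < bestDist) { best = c; bestDist = d; }
      }
      const hint = bestDist <= 2 ? ` (did you mean "${best}"?)` : "";
      errors.push(`${id}: unknown upstream "${up}"${hint}`);
    }
  }

  // 2. Kahn's algorithm over the known edges; leftover nodes sit on a cycle.
  const indegree = new Map(ids.map((id): [string, number] => [id, 0]));
  const downstream = new Map(ids.map((id): [string, string[]] => [id, []]));
  for (const id of ids) {
    for (const up of flow[id].from) {
      if (!has(up)) continue; // already reported above
      indegree.set(id, indegree.get(id)! + 1);
      downstream.get(up)!.push(id);
    }
  }
  const queue = ids.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const d of downstream.get(id)!) {
      indegree.set(d, indegree.get(d)! - 1);
      if (indegree.get(d) === 0) queue.push(d);
    }
  }
  if (order.length < ids.length) {
    errors.push(`cycle involving: ${ids.filter((id) => !order.includes(id)).join(", ")}`);
  }
  return { errors, order };
}
```

Because the same function returns both the error list and the execution order, the CLI and the UI can share one call and still report every problem at once.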

## Design Review — Locked Decisions (2026-06-03, /plan-design-review)

Initial design completeness 5/10 → 8/10. Classification: APP UI (developer workspace, n8n-style). No DESIGN.md (seeding minimal tokens below; `/design-consultation` later for full system).

**Information architecture (D4 — the big correction)**
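- **Canvas is home.** Opening the tool lands you on a canvas (node graph + edges), not a permanent three-column layout. Clicking a node → pops up that node's edit window (dimmed backdrop, so you always know which node you are editing). Maps to idea.md §10 F1 "graph on the left, details on the right."
- **v1 = a fully editable canvas (including drag-to-connect and adding/deleting nodes)**, per the user's D4. Risk flagged & accepted: this is the hardest and most expensive piece of the whole project, and it is not the wedge itself; the user accepts this trade-off for the n8n feel.
- De-risking (plumbing, my call): (1) use **React Flow / Svelte Flow, not a hand-built canvas**; (2) **store canvas coordinates in a `.chain/layout.json` sidecar**, so the main YAML holds only logic and stays readable; (3) **canvas→YAML serialization is the new critical correctness surface** (connecting = writing the downstream `from:`; add/delete = editing `steps`; every structural edit round-trips to comment-preserving YAML + `validate()` before saving).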

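**Node edit window (opened by clicking a canvas node)**
- Three columns: **INPUT** (`Schema`/`Table`/`JSON` tabs; Schema = a clickable field tree that inserts `{{$json.x}}`) | **PROMPT** (the raw template with `{{}}` highlighted) + **`fx` substituted live preview** (green: the prompt actually sent to the model after upstream values are substituted; updates as the template is edited, without spawning a subprocess) | **OUTPUT**.
- **OUTPUT's six states** (the biggest gap, now filled): never-run (the empty state offers a primary action, "Run node / set mock data," instead of a blank) · streaming (token by token + cursor + Stop) · fresh (green ● + duration) · cached (grey ⊘: no subprocess spawned = no rate limit used) · failed (whole cell red + raw stderr + "what already completed" context + Retry) · auth-expired (a dedicated login-expired state, distinct from ordinary failure).
- **Loop nodes**: the window shows `over {{...}} ×N` / `as item`, an item selector (0..N, per-item status colors, a failed ✗ item in red), sub-chain chips (title→shorten), and a rendered preview bound to the selected item; output is `{{loop}}` / pluck `{{loop[*].title}}`.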
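A sketch of the first-tier rendered preview, assuming the template syntax that appears in this doc (`{{$json.x}}`, plus `$node["A"]` and the `$('A')` alias from the canvas plumbing notes). It only selects paths and never evaluates code. The regexes and function names are hypothetical, and the loop forms `{{loop}}` / `{{loop[*].title}}` are not handled.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

const TOKEN = /\{\{\s*(.+?)\s*\}\}/g;
// $json, $node["A"], or $('A'), followed by zero or more .field segments.
const EXPR = /^\$(?:json|node\["([^"]+)"\]|\('([^']+)'\))((?:\.[A-Za-z_$][\w$]*)*)$/;

function lookup(root: Json | undefined, path: string[]): Json | undefined {
  let cur = root;
  for (const seg of path) {
    if (cur === null || typeof cur !== "object" || Array.isArray(cur)) return undefined;
    if (!Object.prototype.hasOwnProperty.call(cur, seg)) return undefined; // ignore inherited keys
    cur = cur[seg];
  }
  return cur;
}

// Path selection only: anything that isn't a recognized path is rejected, never eval'd.
function render(template: string, json: Json, nodes: Record<string, Json>): string {
  return template.replace(TOKEN, (_whole: string, expr: string) => {
    const m = EXPR.exec(expr);
    if (!m) throw new Error(`unsupported expression {{ ${expr} }}`);
    const nodeName = m[1] ?? m[2];
    const root = nodeName === undefined
      ? json
      : Object.prototype.hasOwnProperty.call(nodes, nodeName) ? nodes[nodeName] : undefined;
    const value = lookup(root, m[3] ? m[3].slice(1).split(".") : []);
    if (value === undefined) throw new Error(`unresolved {{ ${expr} }}`);
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Since rendering is a pure string transform, the preview can update on every keystroke without spawning a subprocess.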

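**Keyboard-first (the a11y priority for a power tool)**
- `⌘↵` run node · `⇧⌘↵` run to here · `↑↓` select node · `⌘S` save back · `Esc` close window / Stop · touch targets ≥44px.

**Visual language (seed tokens, aesthetic A: dark terminal + status colors)**
- Mono font (Berkeley Mono / JetBrains Mono) · one amber accent · status colors ok=green / cache=grey / bad=red / pin=blue · 1px hairlines, no rounded cards, no gradients. The status colors carry G3 honest reporting and must not be dropped.

**AI slop risk**: low (a developer tool that avoids all 11 slop patterns). No issues.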

## Approved Mockups

| Screen | Mockup Path | Direction | Notes |
|--------|-------------|-----------|-------|
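| Iteration window (canvas + node panel) | ~/.gstack/projects/chain-cli/designs/iteration-pane-20260603/wireframe.html | Dark terminal + status colors; canvas as home, clicking a node opens the three-column window | HTML wireframe (the designer had no OpenAI key, so it was drawn by hand instead); includes the loop sub-chain + 6 OUTPUT states |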

## Engineering Review — Canvas→YAML (2026-06-03, re-review of T10)

Focused re-review after design D4 pulled the full editable canvas into v1. Engine/cache/subprocess already cleared (first eng review) — not re-touched.

**State model (decided)**
```
React Flow in-memory graph = live edit BUFFER
  → on ⌘S / debounced autosave:
    serialize → SURGICAL CST edits on existing YAML (yaml Document API; preserve comments + key order; NEVER regenerate)
    → validate(flow) → pass: atomic write YAML + .chain/layout.json (both temp+rename) · fail: reject, keep buffer, inline errors (a bad state never reaches disk)
positions → .chain/layout.json keyed by node id; never in main YAML
```

**Forks (user-decided)**
- **D1 — UI stack: React + React Flow (@xyflow/react).** idea.md's hand-coded vanilla canvas is dead; Node http server now serves a React app, tsx for dev. Most mature lib for n8n-style editors.
- **D2 — delete a middle node = n8n-style auto-bridge.** Deleting B rewires its downstream's `from:` to B's upstream (gap closes, chain stays connected). Ambiguous multi-in/multi-out → fall back to a confirm dialog.
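The D2 auto-bridge rule could look like this on a simplified flow model (`from` as a list of names; the type and function names are hypothetical). Every reference to the deleted node is replaced by the deleted node's own upstream at the same position. Treating "more than one upstream or more than one downstream" as the ambiguous case that falls back to a confirm dialog is one possible reading of "multi-in/multi-out."

```typescript
type Flow = Record<string, { from: string[] }>;
type DeleteResult = { ok: true; flow: Flow } | { ok: false; reason: "ambiguous" };

function deleteWithBridge(flow: Flow, id: string): DeleteResult {
  const target = flow[id];
  if (!target) throw new Error(`unknown node "${id}"`);
  const downstream = Object.keys(flow).filter((k) => flow[k].from.includes(id));
  // Unclear rewiring: let the UI ask instead of guessing.
  if (target.from.length > 1 || downstream.length > 1) return { ok: false, reason: "ambiguous" };

  const next: Flow = {};
  for (const [k, node] of Object.entries(flow)) {
    if (k === id) continue;
    // Splice the deleted node's upstream in where the deleted node used to be.
    const from = node.from.flatMap((up) => (up === id ? target.from : [up]));
    next[k] = { ...node, from: [...new Set(from)] }; // drop duplicates, keep first position
  }
  return { ok: true, flow: next };
}
```

The result would still go through `validate()` and the comment-preserving YAML writer before anything is saved.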

**Plumbing (decided)**
- Edge A→B sets B's `from`; `from` = string (single) or list (multi-input), first = `$json`, all addressable via `$node["A"]` (or the n8n-style alias `$('A')`); edge order = list order.
- Cycle check incremental on connect → reject edge live; `validate()` backstop.
- Layout sidecar keyed by node id; missing → auto-layout (dagre/elk); orphan → GC on save; positions never affect execution.
- New node: inline-editable name = YAML key. Rename migrates cache/pin/layout (rename-migration from first review).
- Atomic save: validate first; write YAML + layout both-or-neither; YAML authoritative, layout best-effort (stale → re-auto-layout).
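A sketch of the `from` normalization and the incremental cycle check, on the same simplified flow model (hypothetical names). Adding the edge src→dst (src becomes an upstream of dst) closes a cycle exactly when dst is already an ancestor of src, so only src's upstream needs to be walked. New edges are appended, so edge order stays list order.

```typescript
type Flow = Record<string, { from: string[] }>;

// YAML `from` may be a single name or a list; normalize to a list.
// from[0] is the upstream that $json binds to.
function normalizeFrom(from: string | string[] | undefined): string[] {
  if (from === undefined) return [];
  return typeof from === "string" ? [from] : [...from];
}

// True if adding src→dst would create a cycle (including the self-loop src === dst).
function wouldCreateCycle(flow: Flow, src: string, dst: string): boolean {
  const seen = new Set<string>();
  const stack = [src];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === dst) return true;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(flow[id]?.from ?? []));
  }
  return false;
}

// Reject a cycle-closing edge live; otherwise append it to dst's `from`.
function connect(flow: Flow, src: string, dst: string): Flow {
  if (!flow[src] || !flow[dst]) throw new Error(`unknown node in ${src}→${dst}`);
  if (wouldCreateCycle(flow, src, dst)) throw new Error(`edge ${src}→${dst} would create a cycle`);
  const from = flow[dst].from.includes(src) ? flow[dst].from : [...flow[dst].from, src];
  return { ...flow, [dst]: { ...flow[dst], from } };
}
```

The live check only rejects the edge being drawn; `validate()` remains the backstop for YAML edited by hand.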

**Critical tests (6)** — comment+order round-trip (make-or-break), delete-bridge (linear + ambiguous), connect→`from:`, cycle-reject, validate-fail-writes-neither, move-node→flow-YAML-byte-identical. Plus E2E: drag-connect → `from:` written + comments survive.

**Build sequencing note:** demo gate (wedge) is still T6 node-panel; the canvas (T10) is the home but the wedge can be E2E-tested through the panel before the canvas is fully editable. Recommend: render-only canvas first (click→panel), then layer drag-connect/add/delete + serialization. This keeps the make-or-break serialization work isolated and testable.

- [ ] **T10 (P1, human: ~4~6d / CC: ~2~4h)** — ui — canvas (React Flow): render from YAML+layout, click→panel, drag-connect→`from:`, add/delete→`steps` (n8n bridge on delete), inline rename, loop container view
  - Files: `ui/canvas/*` (React)
  - Verify: E2E drag-connect writes `from:` + comments survive; delete bridges gap
- [ ] **T11 (P1, human: ~2~3d / CC: ~1~2h)** — engine — `yaml-edit.ts` CST surgical editor (add/delete/connect/rename → comment+order-preserving YAML) + incremental cycle check
  - Files: `engine/yaml-edit.ts`
  - Verify: CRITICAL comment/key-order round-trip; cycle attempt rejected
- [ ] **T12 (P1, human: ~1d / CC: ~30min)** — engine — `.chain/layout.json` sidecar: auto-layout missing, GC orphans, positions never touch flow YAML; atomic both-or-neither save
  - Files: `engine/layout.ts`
  - Verify: move node → flow YAML byte-identical; validate-fail → neither written

## Partial Execution Contract (n8n analysis, 2026-06-03)

Evaluated n8n's `partial-execution-utils` (DirectedGraph, findStartNodes, findSubgraph, cleanRunData, handleCycles, recreateNodeExecutionStack…). Verdict: our engine already implements ~80% of it; we ADOPT one structural idea and KEEP our invalidation core.

**UI action → system promise → engine entry (the contract to honor):**

| UI action | promise | chain entry |
|---|---|---|
| Execute workflow (full) | rerun from start, ignore cache | `runChain()` (+ `--fresh` to ignore cache) |
| Execute step (partial) | run node + minimal upstream, reuse rest | `runToNode(node)` |
| Pin | freeze output, never call the model | `Runner({pins})` — short-circuits before run |
| Edit param / connection | node + downstream recompute next run | Merkle key changes → auto-dirty (no manual cleanup) |

**Adopted — plan/execute separation** (`engine/plan.ts`): `planRun(flow, destination, deps)` predicts `{toRun, toReuse, toSkip, aiCallCount}` WITHOUT executing. Powers the UI/CLI **preflight** ("will call N ai nodes" — quota awareness up front). One shared `nodeDisposition()` is used by both the planner and the executor so a plan can never drift from what runs. Pin > cache > run priority.
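The plan/execute split could be sketched as below. `PlanDeps` is a hypothetical interface over the flow, the pins, and Merkle validity, and counting pinned nodes in `toReuse` is an assumption about how the real `planRun` reports them. The point is that the planner and the executor call the same `nodeDisposition()`, and a reused or pinned node stops the walk, so its upstream is neither run nor counted.

```typescript
type Disposition = "pin" | "reuse" | "run";

// Hypothetical dependency interface.
interface PlanDeps {
  from(id: string): string[];
  isPinned(id: string): boolean;
  isValid(id: string): boolean; // Merkle key matches the recorded one
  isAi(id: string): boolean;
}

// Single decision function shared by planner and executor. Priority: pin > cache > run.
function nodeDisposition(id: string, deps: PlanDeps, force: ReadonlySet<string>): Disposition {
  if (deps.isPinned(id)) return "pin";
  if (!force.has(id) && deps.isValid(id)) return "reuse";
  return "run";
}

interface Plan { toRun: string[]; toReuse: string[]; toSkip: string[]; aiCallCount: number }

// Predict what running up to `destination` would do, without executing anything.
function planRun(allIds: string[], destination: string, deps: PlanDeps, force: ReadonlySet<string> = new Set()): Plan {
  const toRun: string[] = [];
  const toReuse: string[] = []; // pinned or validly cached: no model call
  const seen = new Set<string>();
  const visit = (id: string): void => {
    if (seen.has(id)) return;
    seen.add(id);
    if (nodeDisposition(id, deps, force) !== "run") {
      toReuse.push(id); // its upstream is not needed
      return;
    }
    deps.from(id).forEach((up) => visit(up));
    toRun.push(id); // post-order: upstream nodes come first
  };
  visit(destination);
  return {
    toRun,
    toReuse,
    toSkip: allIds.filter((id) => !seen.has(id)),
    aiCallCount: toRun.filter((id) => deps.isAi(id)).length,
  };
}
```

The `aiCallCount` field is what a preflight line such as "will call N ai nodes" would display before anything is spawned.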

**NOT adopted — n8n's mutation-based `cleanRunData`.** We keep **Merkle keys**: editing a node changes its key → `isValid` false → it + transitive downstream re-run, automatically. Content-addressed, deterministic, no "remember to clear the right set" (n8n shipped bugs there). Post-edit stale display is handled by the UI computing `isValid` and marking a node **stale** (better than n8n blanking it).

**Banked for the loop container (do NOT forget when building loops):**
- A loop is an **atomic re-run unit** — anything dirty inside → whole loop reruns from entry. (Per-item *trial* runs still allowed via scratch/pin.)
- An **unclosed loop must not be a start node** (n8n #22555).
- Loop `done` output empty → loop becomes a start node; **`null` AND `[]` both count as "no data."**
- **Composite edge keys with port index** — adopt only when multi-output/branch nodes (IF) appear; today `from` is a single name / list.
- n8n's **cycle/Tarjan-SCC** model is only needed for feedback/while loops — out of scope; chain's container loops (`over: <fixed list>`) match the design.

## chainq init + E2E Verification (plan, 2026-06-03, eng review pass 4)

Goal: a `chainq init` command to create a new project, plus a fixture-driven E2E harness that runs a few real workflows offline to verify the whole journey.

**`chainq init [dir] [--force]` (D1: dual profile)**
- Creates `<dir>` (default cwd) and writes:
  - `flow.yaml` — `profiles{ default:'claude -p', fake:'cat' }` + a 2-node example (`load` cmd reading `input.txt` + `summarize` ai), with comments on offline vs real runs.
  - `.gitignore` — `.chain/`
  - `input.txt` — sample text
- **Refuses** if `flow.yaml` exists (exit 1) unless `--force` — never clobber user work.
- Prints next steps: `chainq run flow.yaml --profile fake` (offline) · `chainq run flow.yaml` (real, needs `claude login`).
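A sketch of the refuse-unless-`--force` logic, assuming Node's `fs`. The flow body is a placeholder: only `profiles`, `steps`, and `from` appear in this doc, while the per-step fields (`type`, `run`, `prompt`) are guesses, not the real schema.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join, resolve } from "node:path";

// Placeholder flow with a hypothetical step schema.
const FLOW_YAML = `# Example flow (illustrative schema).
profiles:
  default: claude -p   # real model; requires a logged-in claude CLI
  fake: cat            # offline: echoes the prompt back
steps:
  load:
    type: cmd
    run: cat input.txt
  summarize:
    type: ai
    from: load
    prompt: "Summarize this: {{ $json }}"
`;

const SCAFFOLD: Record<string, string> = {
  "flow.yaml": FLOW_YAML,
  ".gitignore": ".chain/\n",
  "input.txt": "Replace this with some sample text.\n",
};

// Returns a process exit code: 1 if flow.yaml already exists and force is not set.
function init(dir = ".", force = false): number {
  const root = resolve(dir);
  const flowPath = join(root, "flow.yaml");
  if (existsSync(flowPath) && !force) {
    console.error(`${flowPath} already exists; refusing to overwrite (use --force)`);
    return 1;
  }
  mkdirSync(root, { recursive: true });
  for (const [name, body] of Object.entries(SCAFFOLD)) writeFileSync(join(root, name), body);
  console.log("Next steps:");
  console.log("  chainq run flow.yaml --profile fake   # offline");
  console.log("  chainq run flow.yaml                  # real model, needs `claude login`");
  return 0;
}
```

Checking only `flow.yaml` matches the plan above; a stricter version could also refuse when `input.txt` exists.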

**E2E harness (fixture-driven, DRY)**
- Refactor the single `src/cli/e2e.test.ts` into a fixtures array: `{ name, setup(dir), args, expect: {nodeId:status} | {exitCode,outMatch} }`. One test loops, spawns the repo's **absolute** tsx CLI binary with `cwd=tempdir`, strips ANSI, asserts. (Reuse the existing spawn helper — the `npx tsx` / file:// gotchas are already solved.)
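A sketch of that fixture runner, assuming Node's `spawnSync`. The `Fixture` type is simplified to the `{exitCode, outMatch}` variant (the per-node status variant is omitted). `cli` is the absolute command prefix, for example the repo's tsx binary plus the CLI entry file, and the ANSI regex only strips SGR color codes.

```typescript
import { spawnSync } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Fixture {
  name: string;
  setup(dir: string): void;  // write flow.yaml, input.txt, ... into the temp dir
  args: string[];            // CLI arguments, e.g. ["run", "flow.yaml"]
  expect: { exitCode: number; outMatch?: RegExp };
}

// SGR color/style codes only (ESC [ ... m), which is what colored log prefixes use.
const stripAnsi = (s: string): string => s.replace(/\u001b\[[0-9;]*m/g, "");

// Run one fixture in a fresh temp cwd; returns a list of problems (empty = pass).
function runFixture(fx: Fixture, cli: string[]): string[] {
  const dir = mkdtempSync(join(tmpdir(), `chainq-${fx.name}-`));
  try {
    fx.setup(dir);
    const [bin, ...prefix] = cli;
    const r = spawnSync(bin, [...prefix, ...fx.args], { cwd: dir, encoding: "utf8" });
    if (r.error) return [`spawn failed: ${r.error.message}`];
    const out = stripAnsi(`${r.stdout}${r.stderr}`);
    const problems: string[] = [];
    if (r.status !== fx.expect.exitCode) problems.push(`exit ${r.status}, expected ${fx.expect.exitCode}`);
    if (fx.expect.outMatch && !fx.expect.outMatch.test(out)) problems.push(`output did not match ${fx.expect.outMatch}`);
    return problems;
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

A single test can then loop over the fixtures array and fail with `runFixture(fx, cli).join("; ")` whenever the returned list is non-empty.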

**Workflow matrix (the "few workflows"):**
| Fixture | Verifies |
|---|---|
| init-scaffold | `chainq init` → `run --profile fake` → all nodes ran; files created (CRITICAL) |
| init-refuse | init over an existing `flow.yaml` → exit 1, file untouched (`--force` overrides) |
| linear-cache | cold → cached → edit-downstream → edit-upstream (cache correctness) |
| cmd-inputs | cmd reads `input.txt` (cwd + declared-input cacheable) → cached on run 2 |
| pin-scratch | `--pin` → scratch written, real outputs untouched |
| multi-input | `from:[A,B]` → `$json`=A; reorder → different output (from-order regression) |
| validate-fail | broken flow → exit 1 + error message |

**Failure modes**
- `chainq init` clobbering an existing flow → refuse + `--force` (tested: init-refuse).
- E2E spawn from temp cwd → absolute tsx binary (known gotcha, already solved in `e2e.test.ts`).

**NOT in scope:** interactive init / "paste to generate a chain" (the office-hours idea, deferred as scope creep); CI workflow (the E2E runs under `npm test`, no new pipeline); loop/schema/write-node workflows (those features aren't built yet).

**Tasks:** T-init-1 `chainq init` command (+refuse/--force) · T-init-2 fixture-driven E2E harness (refactor existing) · T-init-3 the 7-workflow matrix · T-init-4 README quickstart.

## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 0 | — | — |
| Codex Review | `/codex review` | Independent 2nd opinion | 1 | issues_found | 30+ challenges, key ones absorbed (cache env/cmd, YAML-preserve, cancellation, lock-blocks); D3 tension → user defended |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 4 | clean | P1 engine plan · P2 canvas→YAML plan · P3 code review (4 fixes) · P4 chainq init + E2E verification plan: dual-profile scaffold, fixture-driven harness, 7-workflow matrix |
| Design Review | `/plan-design-review` | UI/UX gaps | 1 | clean | 5/10 → 8/10, 4 decisions; canvas-first IA, 6 OUTPUT states, render preview, loop view; D4 pulled full canvas into v1 |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | — |

- **CODEX:** First pass caught the cache-key environment gap (cmd uncacheable, profile/CLI-version in key) and YAML-preservation — both absorbed. D3 objection rejected on wrong-premise (local CLI subscription, not metered API).
- **CROSS-MODEL:** Codex challenged D3 (silent re-run), YAML-key identity, in-process engine. User defended D3 with reasoning; identity + in-process kept with mitigations.
- **CANVAS RE-REVIEW (D4 follow-up):** the canvas→YAML scope is now reviewed. Stack = React + React Flow; delete = n8n auto-bridge; serialization = CST surgical edits (comment-preserving), layout in sidecar, validate-before-write. 6 critical tests, comment/order round-trip is make-or-break. Scope-change flag from the design review is now resolved.
- **CODE REVIEW (eng pass 3, as-built engine):** 4 findings, all FIXED with regression tests. The two load-bearing ones were silent-wrong-output bugs invisible at plan stage — (A1) `computeKeys` sorted upstream keys but `$json` binds to the first upstream → reordering `from` served stale cache; (A2) `Runner` memo/blocked were instance state → a reused Runner (the UI pattern) had memo shadow `force`, so `rerunNode` returned stale output. Both fixed; memo/blocked now per-operation (`RunCtx`). PR #4.
- **UNRESOLVED:** 0.
- **CHAIN INIT + E2E (eng pass 4):** planned `chainq init [dir] --force` (dual-profile scaffold: `claude -p` default + `cat` fake) and a fixture-driven E2E harness running a 7-workflow matrix offline (init-scaffold, init-refuse, linear-cache, cmd-inputs, pin-scratch, multi-input, validate-fail). No code yet — plan cleared to implement.
- **VERDICT:** ENG (×4) + DESIGN CLEARED — engine plan, canvas→YAML plan, as-built engine code, AND the chain-init/E2E plan all reviewed; 48/48 tests green. Next: implement `chainq init` + the E2E matrix, then the UI surface.