# Changelog All notable changes to `huu` are documented here. The format follows [Keep a Changelog 1.1.0](https://keepachangelog.com/en/1.1.0/). This project adheres to [Semantic Versioning](https://semver.org/): breaking changes bump the MAJOR version (in the pre-1.0 phase they rode MINOR bumps). ## [Unreleased] ## [5.1.0] - 2026-07-02 ### Added - **Folder picker reaches the whole workspace.** Now that huu is docker-only, the web folder picker could only browse what was mounted into the container (`~/.huu`, `~/Downloads`). The wrapper now bind-mounts a **workspace root** (RW, at the same absolute path) — `HUU_WORKSPACE`, default `$HOME` — so the picker opens there and can mark/run any project under it. A **⌂ Home** button jumps to the root; `/api/bootstrap` exposes it and a bare `/api/folders` opens there. Tighten the blast radius with `HUU_WORKSPACE=~/Projects`, or widen it to `/` for the entire filesystem. Security note: the workspace is mounted read-write, so an agent's shell can reach anything under it (including `~/.ssh` when it's `$HOME`) — keep it as small as your projects allow. ## [5.0.0] - 2026-07-02 ### Added - **The RAM dial is now a contract, enforced at every layer** (born from the 33-concurrent-runs incident that froze a 32 GiB host at both 85% and 50%): - **Pressure ladder** (`src/orchestrator/pressure-ladder.ts`): the memory guard's single RAM/CPU ≥ 95% trigger — which a swapping host never crosses (it thrash-freezes first) — is replaced by graded levels: **L1** usage sustained over the budget dial → spawn freeze + shed newest agents until back under (never below one live agent: degrade to sequential, not zero); **L2** host pressure (earlyoom-style available-RAM + free-swap floors, PSI `full avg10` ≥ 5%, sustained swap-in, or the legacy 95% line) → shed every tick with the guard tick accelerated 500 ms → 150 ms, queued-run admission frozen; **L3** emergency floors / PSI full ≥ 20%. All thresholds have `HUU_GUARD_*` env knobs. - **Reservation accounting**: admission charges in-flight spawns (full footprint) and young agents (half) against the budget, closing the 1–2 s stale-metrics window in which N runs could over-admit in a burst. The footprint EMA samples only MATURE cohorts (young agents looked cheap and dragged the estimate down — the over-admission spiral) with asymmetric smoothing (tracks up fast, down slowly); its clamp rose 2048 → 4096 MiB. - **Swap/PSI-full awareness**: the sampler now reads SwapTotal/SwapFree, PSI `full avg10` (the canonical thrash signal) and the `/proc/vmstat` swap-in rate, and polls at 250 ms (instead of 1 s) near the budget edge. - **Kernel memory ceilings** (ROADMAP Fase 2.1 delivered): on native Linux huu re-execs itself into a transient **systemd user scope** with `MemoryHigh` = total − OS reserve (the kernel throttles huu's whole tree before the host thrashes; the desktop stays alive), `MemoryMax` = total − reserve/2 (worst case: huu dies inside its scope, never the host), `MemorySwapMax` (`HUU_SWAP_MAX_MB`, default 4 GiB) and `TasksMax`. Degrades to unwrapped with a one-line note when systemd isn't usable; `HUU_NO_CGROUP=1` opts out. The Docker wrapper applies the same ceiling to the container (`--memory`, `--memory-swap`, `--pids-limit`; `HUU_DOCKER_MEMORY_MB` / `HUU_NO_MEM_LIMIT` overrides) — an unlimited container could consume 100% of host RAM. - **OOM victim shaping for tool subprocesses**: a 2 s `/proc` sweep raises agent tool children (vitest workers, npm installs, builds) to `oom_score_adj +500` (`HUU_CHILD_OOM_SCORE_ADJ`), so a kernel OOM kills a test runner — surfacing as a task retry — instead of the orchestrator or the user's session. They previously inherited huu's protective −100. - **Web settings are now server-owned and live**: the ⚙ RAM-budget dial POSTs to the new `POST /api/settings`, which applies it to the shared scheduler IMMEDIATELY (current + queued runs), persists it in `~/.config/huu/web-settings.json` and echoes the effective value; `/api/bootstrap` reads it back. Previously the dial only traveled piggybacked on each run POST — changing it mid-run silently did nothing and a blank field silently meant 85%. - **Budget telemetry**: a 1 Hz `{type:'budget'}` SSE frame + topbar chip show the dial in force, used/total RAM, PSI and the guard's pressure level; a running run whose agents were all withheld by the guard shows an amber pulsing **paused (RAM)** pill and resumes in place when RAM frees up. `huu status` gains a **ram containment** doctor section: dial + source, budget bytes, detected kernel ceiling (or "NONE — software guard only"), live PSI/swap and every active `HUU_*` safety knob. - **Queue as many projects as you want**: the web accept cap is now `HUU_MAX_QUEUED_RUNS` (default 256, was a hardcoded 64); queued runs cost no budget. Admission charges each run's fixed baseline (`HUU_RUN_BASELINE_MB`, 384 MiB) against byte headroom, adapts the live-run cap to the machine (budget ÷ (baseline + agent footprint), ceiling `HUU_MAX_LIVE_RUNS` = 8), and a merging run no longer bypasses a zero-capacity signal (the hole that pulled runs in while the machine was shedding). - **Hermetic pi runtime (default ON).** Every pi session huu composes — openrouter AND azure backends, task agents and conflict resolvers alike — is now hermetic: in-memory auth/settings/model-registry fed by the run's key, ZERO reads of the host's `~/.pi`, ZERO global npm `pi-*` extension discovery (`npm root -g` is never consulted), no skill/prompt/theme auto-discovery, and a huu-owned agent dir (`~/.huu/pi-agent`, with `PI_CODING_AGENT_DIR` exported only-when-unset as defense in depth). This closes the door a host-global `pi-animations` extension once walked through to crash an entire multi-run fleet from a detached timer. Debug escape hatch: `HUU_PI_HERMETIC=0` restores the legacy host-global behavior byte-for-byte. - **`huu status` pi-runtime doctor.** The status report (text and `--json`) now shows the installed pi version, whether hermetic mode is on, the effective agent dir (and where it came from: env override / huu-owned / host), and the host-global `pi-*` packages found-and-ignored. Never runs on `--liveness` (the Docker HEALTHCHECK path stays cheap). - **RAM-tuning env knobs + footprint observability.** New `HUU_AGENT_MEM_SEED_MB` (per-agent memory seed, clamped 128–2048 MiB) and `HUU_AGENT_MEM_EMA_ALPHA` (EMA factor, 0.01–1) let a user who has MEASURED their real per-agent footprint tune the AutoScaler's admission model — defaults are unchanged (the pessimistic 1536 MiB seed remains the OOM guard). The scaler now logs its effective memory model on start (`scaler`/`config`) and significant observed-footprint moves (`scaler`/`ema_move`) to the debug NDJSON for evidence-based calibration. ### Changed - **BREAKING — huu is now DOCKER-ONLY.** The native pipeline-execution mode was removed: `--yolo`, `--no-docker` and `HUU_NO_DOCKER` no longer bypass the container (the CLI prints a notice and re-execs anyway; the flags are stripped before reaching the in-container CLI). Both machine freezes happened on native runs, and the container's kernel memory ceiling (`--memory`) is the one guarantee software can't undermine. What still runs on the host is not pipeline execution: `--help` and the host utilities (`init-docker`, `status`, `prune`). `scripts/huu-try` now always builds + runs `huu:local` via Docker. The native systemd-scope wrap remains in the tree as dormant defense-in-depth. - **BREAKING — default RAM dial lowered 85% → 70%** (`DEFAULT_RAM_PERCENT`). On a desktop the OS + browser + IDE routinely hold 20–30% of RAM, so 85% started every run already at the edge. Explicit `HUU_RAM_PERCENT`, the `--ram-percent` flag and the web Setting override as before. - **MAX (greedy) removed from the web UI**: every web run is subordinate to the shared scheduler, where the per-run greedy flag never drove anything — the button was lying (the 33-run incident was launched with it "on"). Legacy `greedy` POSTs coerce to `auto`; the topbar toggle cycles Auto ⇄ Manual. The single-run TUI MAX became **budget-greedy**: it floods one agent per queued task only while the RAM dial has headroom, instead of up to the 95% destroy line — the dial holds in every mode. - **OS reserve**: the budget's untouchable floor grew from a flat 512 MiB to an adaptive `max(min(2 GiB, 25%), 8%, 512 MiB)` of total RAM (`HUU_OS_RESERVE_MB` overrides) — 512 MiB was far too thin for a desktop running a browser next to 33 runs. - The RAM-safety env knobs (`HUU_RAM_PERCENT`, `HUU_GUARD_*`, `HUU_OS_RESERVE_MB`, run caps) now pass through the Docker wrapper into the container; a host `HUU_RAM_PERCENT` was previously ignored inside. - `HUU_AGENT_MEM_SEED_MB` clamp widened to 128–4096 MiB (was 2048), matching the raised footprint-estimate ceiling. - Terminal web runs no longer retain their full untrimmed state for the server's lifetime (slim snapshot + retention cap of 100 terminal runs). - **pi dependency pinned exactly** (`@mariozechner/pi-coding-agent` and `@mariozechner/pi-ai` at `0.73.1`, no caret): the hermetic composition relies on SDK option names, so version drift is now an explicit, reviewed choice (the hermetic canary tests fail loudly on a regressing bump). - **Agent context files are now SCOPED.** pi sessions no longer auto-inject AGENTS.md/CLAUDE.md from every ancestor directory (which reached `$HOME` and `~/.pi/agent`); huu injects only the target repo root's AGENTS.md/CLAUDE.md (deduped by realpath). Pipelines that relied on `$HOME`-level context files must move that guidance into the repo or the pipeline prompt. - **`huu Test Suite` fan-out widened (12 → 24 files).** The per-file test-writing step is this pipeline's only parallel stage, so its width is what actually exercises the machine-wide RAM budget; at 12 the aggregate demand rarely reached the admission ceiling (the RAM dial went unused). Existing users keep their materialized `pipelines/huu-test-suite.pipeline.json` (bootstrap never overwrites) — delete it to pick up the new width. - **Guided web launch — pipeline → projects → queue (a "cart" flow).** The web launch view is now a 4-step wizard: **pick a pipeline**, **mark one or more project folders** (the folder picker gained a **checkbox** — navigate the filesystem and tick every target; marks persist across navigation), **configure that pipeline once** (provider/model/concurrency/time, shared by all its marked projects), then **add another pipeline** or **run the queue**. Each pipeline **fans out into one run per marked project**, and the queue renders **grouped by pipeline**. Replaces the old one-folder-per-item form where targeting N folders meant re-filling the whole config N times. The scheduler/admission path is unchanged — running the same pipeline over many projects, or many projects on one repo, is safe (runs isolate worktrees/branches by `runId`). Web UI only; the Ink TUI launch flow is unchanged. ### Fixed - **The memory guard now fires on a swapping host.** It previously keyed only on RAM/CPU ≥ 95%, which swap keeps below the line while the machine livelocks — the "guard never fired" signature of both the 9-run and the 33-run freezes. The pressure ladder's earlyoom-style (avail + free-swap) and PSI-full triggers catch exactly that state. - **Web folder picker no longer lists symlinked files as folders.** `listDirs` now follows directory symlinks but excludes symlinks that resolve to a file (e.g. `CLAUDE.md -> AGENTS.md`), so only real navigable/markable directories appear in the project picker. ## [4.0.0] - 2026-06-30 ### Added - **RAM budget dial — "fill the machine, never crash" (resource-control Fase 1).** Concurrency is now governed by a configurable share of TOTAL RAM instead of an opaque safety margin: set it with `HUU_RAM_PERCENT`, the `--ram-percent=` flag, or the new **RAM budget %** field in the web Settings panel (machine-global — one machine, one RAM, no per-project override). Default 85%, clamped 10–95. - **Pressure-aware admission (Linux PSI).** The scaler now reads memory Pressure Stall Information (`/proc/pressure/memory`, or the per-cgroup `memory.pressure` in a container) and freezes new agents the moment real pressure appears — *before* RAM saturates. Where PSI is unavailable (macOS, kernels without it) it falls back to the previous RAM-percent gate. - **Lazy admission in the web UI.** Launching a queue of projects no longer starts them all at once: the server admits the first immediately and holds the rest in a new **queued** state, pulling each in as the shared budget frees up (the headless `run-many` already did this; the browser now does too). This is the direct fix for the multi-project out-of-memory crash. - **Configurable OOM protection.** `huu` nudges its own `oom_score_adj` so the kernel prefers other processes under pressure; tune via `HUU_OOM_SCORE_ADJ` (conservative default, best-effort — takes effect where the process is privileged, e.g. the container). - **Pause instead of kill under memory pressure (resource-control Fase 2.3).** When the machine runs low on RAM, `huu` now **pauses** an agent — freezing its work in place (its git worktree *and* the agent's conversation transcript are preserved) and freeing the memory — instead of killing it and restarting from scratch. The agent **resumes exactly where it left off** as soon as headroom returns, so a pressure spike costs at most the current step, not the whole task. Paused cards show a distinct **PAUSED** state with a `⏸` counter (vs the `↻` requeue counter of a kill). On by default, single- and multi-run; set `HUU_NO_PAUSE=1` to keep the previous kill-and-requeue behaviour. If a checkpoint can't be taken for any reason it falls back to that behaviour automatically, so it is never worse than before. ### Changed - **Adaptive concurrency control (resource-control Fase 2.2).** The pressure brake is now a **closed-loop controller** instead of a single freeze threshold: it continuously ramps concurrency up while the machine is comfortable and eases it down as memory pressure rises, settling at the highest level the machine sustains without thrashing. `huu` therefore uses more of the machine than the earlier conservative freeze, while still backing off *before* RAM saturates. - The per-agent memory estimate now starts **pessimistic** and corrects downward from real measurements, so a cold start admits cautiously and opens up once it confirms agents fit — the inverse of the old optimistic seed that over-admitted. - New agents **ramp up** over a few cycles (geometric, ~+50% per tick) instead of the whole pool spawning in one burst. Manual (`--concurrency`) still fills immediately. ## [3.1.0] - 2026-06-29 ### Added - **Set a per-agent time limit from the web — globally and per project.** A new **Settings** panel (⚙ in the topbar) holds a global **Max time per agent** (minutes) that caps every agent's run time across the **whole pipeline** for **every run started from this browser**; each project's launch field can **override** it (blank inherits the global). It's sent as `timeoutMinutes`, which the server applies to both the multi-file and single-file card timeouts; blank everywhere keeps the pipeline's built-in default (10 min · 5 min for single-file tasks). The global setting and the per-project value persist in the browser and are recorded in History. **Web UI only — the CLI keeps its own rules.** Previously the web could only raise the limit when *retrying* an already-timed-out card; setting it up front was TUI-only. - **Add projects to a live queue from the home view.** While a queue is running you can return to the launch view (**← Home**), add more projects, and they dispatch **immediately** under the shared scheduler — no restart, no prompt. A *running* banner on home (`N running · X/Y done`) stays visible while you pick, with a **“View board →”** jump back to the kanban. Pure client-side: the multi-run server already admitted concurrent runs, so the change is a sticky `homePinned` flag that opts the home view out of the per-frame board auto-switch, plus immediate dispatch on add and the home banner. - **Failed task cards can now be retried interactively — and timeouts are signalled distinctly.** A timed-out card is shown in **amber** (`TIMEOUT`), separate from the **red** of any other failure (`FAILED`), in both the Ink TUI and the web kanban. When a single run (TUI) or any web run ends with failed cards, it no longer jumps straight to the summary: it pauses in a new **`awaiting_retry`** state (integration worktree kept alive) so you can recover individual failures. Retrying a card re-runs that one task against the current integration HEAD and, on success, merges its branch in — no need to re-run the whole pipeline. A **timed-out** card can be retried with a **new, longer time limit**; any other error just re-runs. User retries show a `⟳N` badge. - **TUI**: on the run dashboard, `R` retries the focused error card (timeouts prompt for a new limit first), `D` finishes the run, `Q` aborts. - **Web**: red/amber cards open a drawer with a **Retry** button (timeouts also get a minutes field); a **Finish** button leaves the review hold. New endpoints `POST /api/run/retry` and `POST /api/run/finish`. - Headless drivers (`run-many`, smoke, `/simulation`) are unaffected — `Orchestrator` only holds open when the new `interactiveRetry` option is set, so `start()` resolves immediately on every non-interactive path. - **`huu-tests-findings.md`** — a new Test Suite deliverable: the finalize step rolls every `suspected-bug` FAQ finding into a human-readable table of bugs the run surfaced but (by the freeze) did not fix, deduped by a stable `sb-` join key and cross-checked against the tests that pin them. - **`$baseCommit` work-step prompt token** and a **base-commit Git Context line for judges** (`src/orchestrator/index.ts`, `src/orchestrator/check-evaluator.ts`). Since stage merges are already committed by the time a step or judge runs, a bare `git status` is clean; exposing the run's base commit lets a step diff what the run actually changed (`git diff --name-only $baseCommit..HEAD`) or restore a frozen file (`git checkout $baseCommit -- `). The Test Suite cleanup step uses it to actively restore any source an agent drifted. ### Changed - **Web project selector is now a custom, animated dropdown — no native ``, whose look couldn't be themed and — being rebuilt on every snapshot — closed the instant it opened during a run (see *Fixed*). The run snapshot carries its `runDirectory` so each run is labelled by the project it operates on, not just the pipeline name. Keyboard-navigable (arrows/Enter/Esc), dismiss-on-outside-click, and degrades gracefully (no animation) under `prefers-reduced-motion`. Motion is **vendored** under `src/web/client/vendor/` so the no-build, offline browser client keeps working with no CDN. - **Run log redesigned into a live, cross-project activity console.** The log drawer's header is now a **live activity bar** that sums the agents running **right now across every concurrent run** (`⚡ N running · M projects · Q queued`), refreshed on every frame — the count reflects all projects in real time, not just the viewed one. Each agent gets a stable hue chip so parallel work is visually separable, level glyphs + a colored rail flag warnings/errors, and when more than one run is live the body becomes a single timestamp-ordered stream merging every run's lines (each tagged with its project). A level filter (All · ⚠ · ✕), a "↓ Latest" jump pill and auto-expand-on-first-run round it out. Entirely client-side: the cross-run count is derived from the run snapshots already on the wire — no orchestrator/server/SimulationEngine change. - **`huu Test Suite` is now code-frozen — it writes tests and NEVER edits your source.** The flagship pipeline's step prompts and judge were rewritten so the production tree is read-only. The old escape hatch ("if a real bug is exposed, fix `$file`") is gone: when a generated test reveals apparently-buggy behavior the agent now **characterizes** it (pins the *actual* current behavior so the suite stays green and truthful — Feathers, *Working Effectively with Legacy Code*), records a fixed-shape `suspected-bug` finding, and — on runners with a real expected-failure idiom (vitest `test.fails`, pytest strict `xfail`, RSpec `pending`) — leaves a strict marker that flips red the day the bug is fixed. Stacks without a native xfail (Go, Rust, JUnit 5) use characterization only — `t.Skip`/`#[ignore]`/`@Disabled` are explicitly banned as bug trackers because they assert nothing. Cleanup now prefers converting a bug-catching test to a marker over deleting it (deletion is the last resort for structurally-broken tests). Prompts also gained banned-token determinism rules and a per-test mutation-strength self-check. Grounded in a fresh research pass over characterization/golden-master testing, the documented LLM test-gen "cheat-to-green" failure mode, mutation testing, and the cross-language expected-failure mechanics. - The Test Suite judge (`5. Suite green and code untouched?`) now enforces the freeze mechanically: it diffs the whole run against its base commit (`git diff --name-only $baseCommit..HEAD`) and reworks if any non-test, non-artifact source path changed, plus anti-cheat clauses that reject assertion-free / weak-only / self-mocked "green by emptiness" tests and orphan suspected-bug findings. These clauses are hard — never waved by the `$runs >= 2` lean-approve shortcut. ### Fixed - **The web project selector no longer "opens and immediately closes" mid-run.** The header run-switcher was a native `` that the combobox migration had removed, throwing on `null.value`; it now restores the saved model id (catalog **or** custom) through the combobox. - **The run log now advances in real time instead of only when the run stops.** The pi event translator was dropping every `message_update` streaming event, so between tool calls — i.e. for most of a generation — the orchestrator emitted no state and the log appeared frozen until the run settled. huu now maps pi's streamed `text_delta`/`thinking_delta` events into a new `stream` agent event, coalesces them into whole lines, and surfaces the assistant's text live in the run log (web log panel, per-agent drawer, TUI and headless logs all benefit, since they read the same buffer). - **"Valid API key still returns 401" — fixed at the root: a saved key now takes precedence over the `OPENROUTER_API_KEY` environment variable.** Previously a stale `OPENROUTER_API_KEY` exported from a shell profile (resolver step 3) silently shadowed the key you saved in the Options screen (step 4), so the pre-run probe sent the wrong key and aborted. The resolver now ranks the explicitly saved store ABOVE the env var (`secret-mount → stored → env-file → env`), so the key you saved is the key huu uses; the env var is only the fallback when nothing is saved (CI/headless behavior is unchanged — no saved key means the env var still wins). The Options screen flags when an env var is set but ignored, the 401 message names the winning source (`resolveApiKeyWithSource` + `keyRemedyHint` in `lib/api-key.ts`, with `shadowsStored` reworked to `storedOverridesEnv`), and the Docker wrapper notes on the host when it forwards the saved key over a present env var. - **Run dashboard header no longer breaks on narrow terminals.** The status row now wraps (`flexWrap`) instead of overflowing, every value carries an explicit space after its label (so "stage" and "5/5" never read stuck together), and the active provider + model are shown up front. - **Web UI background, header and mobile layout.** The ambient aurora is a richer multi-blob drift; the topbar wraps and spaces its metrics so they no longer jam; new phone/tablet breakpoints stack the layout, switch the card drawer + folder modal to full-height sheets, and enforce 44px touch targets on coarse pointers. - **Run cost now sums live in the header — it was stuck at `$0.00`.** `Orchestrator.getState()` (and the headless `auto` result) hardcoded `totalCost: 0`, so the dashboard header never moved even though every agent card metered its own spend. `totalCost` is now the live sum of each per-agent `cost` (the authoritative `usage.cost` OpenRouter returns per turn); the header reads it in real time and renders 4 decimals for sub-cent totals so a running meter is visible. (Merge/judge LLM cost is still not metered — only worker agents count toward the total.) - **The per-agent log drawer no longer jumps to the bottom on every update.** It rewrote its text and force-scrolled to the end on each ~120 ms snapshot, yanking you back the moment you scrolled up to read. It now follows the tail only when you're already pinned to the bottom (or on first open) and preserves your scroll position otherwise. ### Removed - **GitHub Copilot backend.** huu is pi-only now (OpenRouter or Azure AI Foundry via the provider toggle). Removed `src/orchestrator/backends/copilot/`, the `@github/copilot-sdk` optional dependency, the `copilot` backend kind / api-key spec / model catalog, and the `--copilot` flag. Existing `huu auto` configs that set `"backend": "copilot"` no longer validate — switch to `"provider": "azure"` or the default OpenRouter. > Note: the bundled `pipelines/*.pipeline.json` were re-rendered to include > the new `description` field. Existing users keep their materialized copies > (bootstrap never overwrites) — delete a file to pick up the new version, or > run `npx tsx scripts/regen-default-pipelines.ts`. ## [2.1.0] - 2026-06-25 ### Changed - **Bundled pipelines are now fully autonomous — manual file-picking removed.** The six defaults that used to require `scope: "per-file"` (huu Test Suite + the five report-only audits) now discover their own targets: a recon step `produces` a `huu-memory-v1` list (with per-file `hint`s) and the work step fans out via `scope: "memory"` + `filesFrom` — the pattern huu Knowledge System already used. A shared `targetsRecon()` helper in `knowledge-protocol.ts` keeps the selection prompt identical across them, and `registry.test.ts` now enforces that NO default ever reintroduces `scope: "per-file"`. huu Test Suite additionally ends with a CheckStep cleanup loop (reworks until the suite is green); huu Security Audit fans its four independent scan dimensions (recon · secrets · CVE · supply-chain) into parallel `dependsOn` waves joined by consolidation. Step prompts were sharpened against the new prompting playbook. NOTE: `pipeline-bootstrap` never overwrites — existing users keep their materialized JSONs; delete `pipelines/.pipeline.json` to re-materialize the autonomous version. ### Added - **`docs/prompting-playbook.md` (+ pt-BR twin) and the `authoring-agent-prompts` knowledge skill.** A research-grounded, cross-LLM playbook of 12 prompt techniques (atomic decomposition, explicit output contracts, `$file`/`$hint` injection, mechanical forward-default judges, lean pi-backend prompts, …) that the bundled pipelines now follow; indexed from `docs/README.md` and the skill catalog. ### Removed - **Dropped the four loose sample pipelines.** `example.pipeline.json` and `example.conditional.pipeline.json` (repo root) plus `demo-quick` / `security-tests` (`pipelines/`) are gone; the README / onboarding / operations quickstart and the pipeline-JSON guide now point at the bundled defaults (`pipelines/huu-test-suite.pipeline.json`, materialized on first launch) and use huu Security Audit / Knowledge System as the live check-step + `dependsOn` wave examples. ## [2.0.0] - 2026-06-25 ### Added - **MAX (greedy) auto-scaling mode.** Press `M` on the run dashboard to flood the worker pool with one agent per queued task (up to the hard ceiling) and let the always-on memory guard be the sole backstop; concurrency settles at the memory limit, cooldown-damped to avoid thrashing. Surfaced as a blue `MAX` status chip. `+`/`-` still pin manual from any mode, `A` returns to auto. ### Removed - **Web UI mode (`huu --web`) removed entirely.** huu is now terminal-only (Ink TUI + headless `huu auto`). Deleted the `src/web/` HTTP+WebSocket back-end, the `webui/` front-end workspace, the `cli-web.ts` entry point, the `--web` / `--web-port` / `--no-open` flags (and the `HUU_WEB_NO_OPEN` env var), the `smoke-web.sh` smoke test, the `extending-web-mode` skill, and the `docs/WEB-UI.md` guide. The `ws` dependency and the `webui` npm workspace were dropped; the Docker build no longer pre-builds the front-end. The orchestrator, FSM (`src/lib/screen-fsm.ts`), and all back-end layers are unchanged. ### Fixed - Source-run scripts (`npm run dev` / `npm start`) now bypass the Docker re-exec gate, so contributors run the TUI natively from source instead of the wrapper trying to containerize the dev process. ## [1.4.0] - 2026-06-12 ### Added - **Pipeline Assistant v2 — the Architect flow.** Creation is no longer one giant turn: after the interview, THREE structural blueprints are sketched **in parallel** under deliberately different lenses (maximize-parallelism · minimize-cost · maximize-verifiability), a **generative selector** compares them — plus the interviewer's draft as candidate 0 — against a mechanical rubric (no collapsed fan-out, memory pairs for discovery, dependsOn diamonds for independence, safe forward check defaults, sane step counts) and fuses the winner with at most small grafts; final prompts are then written **one step per call, in parallel**; and the result runs through the REAL zod + topology validation with at most one error-guided fix. Research-grounded: best-of-N with generative selection beats single-shot (GenSelect, arXiv:2507.17797; multi-agent verification, arXiv:2502.20379), while self-critique loops degrade good baselines (Self-Refine analyses) — so there is deliberately NO critique loop, only external mechanical feedback. Latency ≈ 3 sequential calls (two phases are parallel). Phases stream live in the TUI; the web one-shot path uses the same flow. New `planning` tier in `recommended-models.json` (deepseek-v4-pro, gpt-5.4, claude-opus-4.6) surfaced in the assistant's model picker — planning is maximum leverage. - **Actionable run errors end to end.** Every fatal failure now records `RunManifest.errorReason` — what broke AND what to do next (first root cause wins; cascades don't overwrite it) — and the messages point at the new `docs/troubleshooting.md` anchors (runaway loops, merge conflicts, git state, memory files, keys, ports, Docker, the fixed macOS spawn bug). The summary screen finally tells the truth: red "Run failed" with the ⚠ reason, yellow "finished — N agents failed" with the first failure + where the logs live, green only when clean. Headless final JSON carries `errorReason`. Full symptom→cause→fix guide: `docs/troubleshooting.md` (+ pt-BR), indexed from `docs/README.md`. - **`dependsOn` — parallel/switch/join via deterministic waves.** Steps declare a dependency graph (GitHub-Actions `needs` style; only earlier steps, so cycles are impossible). Any `dependsOn` switches the run to BSP supersteps: each wave runs every ready step with all their tasks in ONE shared pool (real cross-branch parallelism), then merges sequentially in array order — wave composition and merge order derive from the graph + array, never timing (same pipeline ⇒ same commit sequence). Ready checks run as singleton waves (judges never overlap); check outcomes and `next` become activation edges that re-pend the target plus its downstream cone ("rework redoes whatever depended on it"); a memory step implicitly depends on the step that `produces` its `filesFrom`. Pipelines without `dependsOn` keep the legacy linear cursor byte-for-byte (incl. `next`-as-skip). Surfaces: TUI Deps field (SPACE multi-pick of earlier steps), `⇠ needs` badges, `◇ wave N` chip on the run header, "Fan-out → Join (diamond)" scaffold in the pattern picker, Pipeline Assistant emits diamonds for independent-analyses intents, full guide section in `docs/pipeline-json-guide.md`. - **`produces` + auto-appended MEMORY CONTRACT — memory links without boilerplate.** The producer side of a memory pair is now declarative: set `"produces": ""` on the earlier step and huu appends the exact contract (path, huu-memory-v1 format, the consumer's real `maxFiles` cap, the hint rule) to that step's prompt at run time — pipeline authors never paste format boilerplate and saved JSONs stay clean. Topology rejects two steps producing the same path. TUI: the memory step's Files field becomes a **link picker** (choose a declared producer, pick an earlier step to produce it — huu wires both sides with an auto-named path — or type a custom path); producer steps show a `→ produces` badge. Web editor gains a Produces input + a filesFrom select of declared producers. Pipeline Assistant emits producer/consumer pairs and is forbidden from writing format boilerplate. `huu Knowledge System` dogfoods it (steps 1 and 4 declare produces; their prompts shrank). - **Pipeline-builder UX redesign (research-grounded).** `N` (new pipeline) now opens a **pattern picker** — Discover → Act (memory pair pre-wired), Per-file transform, Audit with judge, Blank — scaffolding linked steps with placeholder prompts instead of an empty screen; the Scope field opens a visible list with a one-line consequence per option (no more blind cycling); `E` on the Prompt field opens `$EDITOR` for multiline editing (git-commit pattern, `VISUAL`/`EDITOR`); the step editor gains a single lazygit-style footer that always shows the focused field's keys; the pipeline list shows an actionable problem line for the selected step ("memory not linked — ENTER, then the Files field links a producer") instead of just a red badge. - **`scope: "memory"` — file fan-out decided by an EARLIER step.** Third file-selection mode beyond whole-project and user-picked: a producer step writes a `huu-memory-v1` JSON (`{"_format":"huu-memory-v1","files":[{"path","hint?","priority?"}]}`) and the consuming step declares `filesFrom`; at stage start huu reads it from the integration worktree (check-loop rewrites are picked up) and spawns one agent per listed path, injecting each entry's `hint` via the new `$hint` prompt token. Deterministic failure split: missing memory file → zero tasks, stage completes empty with a loud warning (stub-safe); corrupt file → run fails. `maxFiles` (default 40) caps width; entries run priority-desc then list order; `config.files` overrides win in headless runs. Surfaces: schema + topology validation (filesFrom required, never the first step), TUI StepEditor (`M` shortcut, filesFrom input replaces the picker), Web UI step editor (new Scope select + filesFrom input — scope wasn't editable on the web at all before), Pipeline Assistant can emit the mode, dedicated guide `docs/memory-scope.md` (+ pt-BR) and a `docs/pipeline-json-guide.md` section. Note: `maxNodeExecutions` counts cursor VISITS — a fan-out of N files is ONE visit (docs corrected). - **Memory-aware dynamic concurrency is now the DEFAULT.** The orchestrator always runs the `AutoScaler`: in `auto` mode (default) the worker-pool target is computed from real memory headroom — `ramAvailableBytes` (new in `SystemMetrics`: cgroup `limit − current`, `/proc/meminfo MemAvailable` on Linux hosts, `os.freemem()` fallback) minus a 10%/512 MiB safety margin, divided by an EMA-observed per-agent footprint (seeded 250 MiB, clamped 128 MiB–2 GiB) — replacing the old fixed total-RAM/250 MB estimate. New flags: `--concurrency=N` (pins manual mode at N) and `--no-auto-scale`; headless configs gain `autoScale?: boolean` with back-compat derivation (a config that sets `concurrency` keeps exact manual behavior). New web message `run.setAutoScale`; `run.setConcurrency` now pins manual (parity with the TUI `+`/`-` keys). Headless NDJSON state events now include `concurrency` and `autoScale` mode. - **Always-on memory guard with kill→TODO requeue.** In BOTH modes, at ≥95% RAM/CPU the orchestrator kills the NEWEST agent (least work done — picked by `startedAt`), resets its kanban card to `pending` with a `requeues` counter (TODO column, `↻N` badge in the TUI, `requeued ×N` badge in the web UI) and requeues the task at the front of the queue — older agents' finished work is never lost. TUI header shows `~MB/agent` + free MB in auto mode and a `GUARD` chip (+ kill count) in manual mode. - **CheckStep judges are now kanban cards** (TUI + web). New `CheckRun` slice (`OrchestratorState.checkRuns`, persisted to `manifest.checkRuns`): one entry per check visit with phase `judging → done/error`, the chosen outcome label, `fromJudge` flag (yellow `DEFAULT:` badge when the fallback fired), judge model, resolved condition, reason and live last-log. The maxRuns-exceeded forced default is its own DONE card instead of an invisible skip. TUI: `judge:` cards in `RunKanban` (`theme.ai` while judging); web: new `CheckRunPill` molecule. - **`--no-docker`** — neutral CI spelling of `--yolo` (accepted everywhere `--yolo` is: re-exec gate, `--web` gate, credentials warning). - **`docs/ci.md` + `docs/ci.pt-BR.md`** — running huu in CI without Docker: GitHub Actions and GitLab CI recipes for `huu auto` (npm install, secrets, `fetch-depth: 0` for history-scanning audits, `.huu/audits/**` artifacts, exit-code gating, dynamic per-file config via `git ls-files`, concurrency guidance for small runners). Linked from the READMEs and `docs/README.md`. - **Judge gate on the five report-only audits.** Docs/Quality/Performance/Refactor/Security now END with a `N. Validate report` CheckStep (shared `reportJudgeCondition()` helper in `knowledge-protocol.ts`: sections complete, summary counts match the FAQ, ordering correct, report-only contract held) looping `rework` back to consolidation (maxRuns 2, `approved` is the default outcome) plus a terminal `Finalize report` stamp step. New `registry.test.ts` guards the contract (schema round-trip incl. topology, judge shape, REPORT-ONLY marker, caps). ### Changed - **`huu Agent Knowledge` replaced by `huu Knowledge System`** — the knowledge pipeline became the most ambitious bundled default: it builds the FULL knowledge-skills system on a shared `.huu/knowledge/` blackboard (atlas → parallel per-file findings → ONE synthesis step that also writes the routing ground truth BEFORE any skill exists → per-topic dossiers → skills materialized by template transformation in judge-bounded batches of ≤3 → meta-skills + LEARNINGS + router-aware routing surface → blind routing eval with a description-sharpening rework loop). Engineered for quality on SMALL models: one cognitive operation per step, mechanical judge conditions, stub-safe forward defaults, verbatim-copy scaffolding. Bootstrap never overwrites: an existing `pipelines/huu-agent-knowledge.pipeline.json` keeps working on the old design — delete it if unwanted; the new `huu-knowledge-system.pipeline.json` materializes on next start. - **All 7 default pipelines redesigned on cited, current methodology** (delete `pipelines/.pipeline.json` to re-materialize — bootstrap never overwrites): - *Security Audit*: OWASP Top 10 **2021 → 2025** (new A03 Software Supply Chain Failures and A10 Mishandling of Exceptional Conditions; SSRF folded into A01), CWE Top 25 **2025**, NEW step "Supply chain & CI posture" (SLSA v1.2 / OpenSSF Scorecard informed: lockfile + SHA pinning, `pull_request_target` pwn-request detection, workflow `permissions:`, `curl | bash`, binary artifacts), gitleaks v8.19+ `git`/`dir` subcommands, `semgrep scan`, osv-scanner v2 `scan source`, path-traversal patterns. - *Quality Audit*: NEW step "Hotspot analysis (churn × complexity)" (Tornhill/CodeScene git-log mining), cognitive-complexity scoring rules + thresholds (Sonar S3776 = 15) alongside cyclomatic, hotspot-weighted refactor-first ranking. - *Performance Audit*: explicit INP lab caveat (Lighthouse can't measure INP — TBT is the lab proxy; field/CrUX required), p75 framing, unbounded-concurrency and missing-caching/extraneous-fetching patterns. - *Docs Audit*: classification via the Diátaxis compass; README rubric grounded in standard-readme required sections + assessment-badge evidence (ICSE 2018). - *Refactor Plan*: top-5 target ranking is now smell-weight × churn; characterization-test rationale (Feathers) spelled out. - *Test Suite*: assertion-quality rules that survive mutation testing (behavior not implementation, no change-detector/snapshot-only tests), determinism ruleset (no sleeps/network, frozen clocks, fixed seeds, hermetic tests), optional Stryker/mutmut/PIT follow-up documented in `huu-tests.md`. - *Agent Knowledge*: agentskills.io spec details (optional frontmatter fields, 3-level progressive disclosure, scripts-over-prose guidance). - TUI `+`/`-` keys and web `run.setConcurrency` now PIN manual mode (auto-scale re-enabled with `A` / the web toggle). `--auto-scale` is deprecated (now the default; still accepted — with `--concurrency` it forces auto mode with that seed). - `AutoScaleStatus` gains `mode`, `observedAgentMemoryMb`, `ramAvailableMb`, `guardKillCount`; `OrchestratorState.autoScale` is now always present. - Positioning across READMEs, MANIFESTO, docs and package description: huu designs pipelines that make thinking agents follow a deterministic process — audits, test generation, knowledge extraction and predictable-value processes; NOT a tool for building new features. ### Deprecated - `--auto-scale` flag (auto-scale is now the default), `AgentStatus.killedByAutoScaler`, lifecycle phase `killed_by_autoscaler` (kept so old manifests still parse; no longer produced). ### Fixed - **macOS: the orchestrator could never spawn agents on a warmed-up host.** The resource monitor's host fallback used `os.freemem()`, which on macOS counts only truly-free pages (file cache excluded) — on any Mac in normal use that saturates `ramPercent` ≥95%, permanently gating `AutoScaler.shouldSpawn()` (runs sat forever with every card pending and $0 spent) and spuriously arming the memory guard. The darwin path now derives available memory from `vm_stat` reclaimable pages (free + inactive + purgeable + speculative — the macOS analogue of Linux's `MemAvailable`), cached 500ms, with `os.freemem()` as last resort. Also un-hung the orchestrator integration suite on macOS (16 chronic test timeouts → green, 42s → 6s). - Guard-killed agents no longer park as `killed_by_autoscaler` errors in the DONE column — the card visibly returns to TODO with its requeue counter. - Stale `killedByAutoScaler` status flag could swallow a requeued task's later genuine failure (silent drop — never retried, never counted, never marked error). Replaced by a consumable kill-marker set; regression-tested in `requeue.test.ts`. - Auto-scaler active-agent accounting no longer inflates on retry/final-fail (it skewed the observed per-agent memory estimate). - `npm run build && npm test` no longer runs the compiled `dist/**/*.test.js` twins in parallel with `src/` (vitest 4 dropped the dist exclude; the duplicated native-shim tests raced each other on port 3000). New `vitest.config.ts`. ## [1.3.0] - 2026-06-10 ### Added - **`Pipeline.integrationModelId`** — pipeline-level model override for the merge/integration agent (the conflict resolver that runs between stages). Falls back to the run's global model. Editable in the TUI pipeline editor (`T` → "Integration agent model", backed by the model selector) and in the web pipeline editor; documented in `docs/pipeline-json-guide.md`. - **Merge cards on the kanban** — every stage visit now creates a `StageIntegration` entry (`OrchestratorState.stageIntegrations`, persisted to `manifest.stageIntegrations`) that both dashboards render as a display-only card flowing TODO → DOING → DONE (`pending → merging → conflict_resolving → done/error/skipped`), with live last-log, branches/conflicts counts, elapsed time and the effective integration model. The UI no longer looks frozen during `status: integrating`. TUI: `RunKanban`; web: new `IntegrationPill` molecule in `KanbanBoard`. The `conflict_resolving` state uses the AI color token (`theme.ai`); the deterministic merge stays cyan. - **`huu Agent Knowledge` default pipeline** — studies the project progressively (recon → per-file deep study → topic synthesis, accumulating into `.huu/knowledge/atlas.md` + `findings.json`) and compiles the knowledge into Agent Skills under `.agents/skills/` following the [agentskills.io](https://agentskills.io/specification) spec: one skill per topic plus a `project-knowledge` router skill that any future agent loads first. A check step validates frontmatter/naming/router coverage and loops back to the materialize step on `rework` (max 3 runs; `approved` is the default outcome). Setup pipeline — mutates the repo by design. - **`MANIFESTO.md` + `MANIFESTO.en.md`** — the project thesis ("deterministic in method, not in result", BSP over git), including an honest prior-art counterpoint section. Linked from both READMEs. ### Changed - **Progressive knowledge protocol across the six bundled defaults** (new shared `src/lib/default-pipelines/knowledge-protocol.ts`): project-scope steps that previously acted blind now read the run's findings JSON before acting and append after (re-read + dedupe, append-only); findings gain optional `priority`/`fixability` fields that the final consolidation steps use to order recommendations; audit bootstraps gain a `.gitignore` persistence check (a committed `.huu/` line is rewritten to `.huu/*` + `!.huu/audits/` so reports survive the stage merge — previously they were silently dropped). Test Suite: step 2 records its 3 picks as a `category:"selection"` entry, step 3 builds on it instead of re-testing, step 4 closes the loop with a `category:"run-summary"` entry. Note: pipeline bootstrap is skip-if-exists, so already-materialized `pipelines/*.json` keep their old prompts — delete the file to re-materialize. - `ensureGitignored()` now treats an existing `dir/*` line as satisfying `dir/`, so it no longer re-appends `.huu/` after a user adopts the negation pattern. ### Fixed - `scripts/smoke-image.sh` / `scripts/smoke-pipeline.sh` now work on macOS with Docker Desktop/colima: the scratch repo moves from `mktemp -d` (which lands in `/var/folders/…`, outside the Docker VM's file sharing, so the bind mount arrived empty) to a `.smoke-tmp/` dir inside the huu repo itself (override with `HUU_SMOKE_TMPDIR`), and the path is normalized before being used as a `-v`/`-w` target (an unresolved `..` made git try to create the leading directories inside the container and fail with `Permission denied`). - `scripts/smoke-pipeline.sh` drove `huu --stub run pipeline.json` expecting it to finish without keyboard input — but `huu run` is interactive by contract (it opens the pipeline editor and waits for `G`), so the smoke hung forever at the editor screen. It now drives the headless `huu auto pipeline.json --config config.json` path (stub backend) and asserts the final stdout JSON has `"ok": true` instead of the TUI-only `wait_until_exit_resolved` log event. - **Guaranteed add/add merge conflict on `.env.huu` in fresh repos** — agent worktrees check out the *committed* `.gitignore`, so in any repo that hadn't committed the huu entries, every parallel agent committed its own `.env.huu`/`.huu-bin` (different ports → different content) and every stage merge conflicted (failing the run without a resolver, or burning an LLM call on a junk file with one). The orchestrator now writes these runtime-only paths to `.git/info/exclude` (shared by all worktrees, never touches tracked files). Found by exercising the new merge cards in a scratch repo. - `scripts/smoke-dashboard.tsx` was broken since the backend registry refactor (imported the long-gone `orchestrator/stub-agent.js` and didn't pass `backend: 'stub'`, so the OpenRouter key probe 401-ed the run). - `portAllocation` was silently stripped from pipelines on import/export round-trips (missing from the Zod schema). - `huu Security Audit` step 5 told the agent to add a README badge while its own HARD RULES forbid it; `huu Quality Audit` step 1 allowed `package.json` devDeps additions against its own report-only rule; quality/performance step-5 names still said "+ badge". All report-only contracts are now consistent. - Stray `console.*` output and Node deprecation warnings raised mid-run are now routed into the `LogArea` instead of printing above the kanban and corrupting the Ink layout. ## [1.2.0] - 2026-05-21 ### Added - **`huu auto --config `** — ONE-COMMAND headless pipeline run. Drives the same `Orchestrator` the TUI uses but without Ink: parses the pipeline + a small config JSON (`modelId`, `backend`, per-step `files` override, optional timeouts/retries/concurrency), resolves the API key via the existing `resolveApiKey` chain, runs `await orch.start()` and exits 0 / 1 based on `manifest.status`. NDJSON progress events stream to stderr (throttled ~250 ms); ONE final JSON object lands on stdout (`{ ok, runId, integrationBranch, status, totalCost, durationMs, filesModified, agents[] }`) so `huu auto … | jq .runId` works. Inherits the auto-MTU docker network from 1.1.0 — works in VPN out of the box. Unblocks CI/cron use cases, demos, and unattended overnight runs. - `src/lib/run-config.ts` — zod-validated `RunConfig` schema + `loadRunConfig(path)` + `applyRunConfig(pipeline, config) → { pipeline, warnings }`. The `files` map matches step names; mismatched keys emit warnings instead of failing so typos are surfaced without blocking. - `src/lib/headless-run.ts` — `runHeadless({ pipeline, config, cwd, agentFactory, conflictResolverFactory, concurrency, emitIntervalMs })`. Reusable from scripts and the new CLI subcommand. ### Verification End-to-end smoke against `/home/ondokai/Projects/integracao-vael` with `huu Test Suite` pipeline + config injecting one file into step 3: real `minimax/minimax-m2.7` agent ran inside the auto-MTU docker network, committed `huu-tests.md` to the integration branch — deterministic success marker (per step 1's prompt: "always writes huu-tests.md at repo root"). ## [1.1.0] - 2026-05-21 ### Added - **Auto MTU-aware docker network** — the headline fix of this release. At wrapper start, `detectDefaultRouteMtu()` reads the MTU of the host's default-route interface (Linux only, parsing `ip route get 1.1.1.1` + `/sys/class/net//mtu`); when it's below 1500 — typical of VPN tunnels (WireGuard 1420, Tailscale 1280, OpenVPN ~1500-overhead) — `ensureHuuDockerNetwork(mtu)` idempotently creates a docker bridge `huu-net-mtu` with the matching MTU and pins the container to it via `--network=`. No env var, no `/etc/docker/daemon.json` edit, no `--network=host` (which would break the port-isolation netns). Works whether the user is on a VPN or not — falls back cleanly to the default bridge when the route MTU is 1500 or detection isn't possible (macOS / Windows / Docker-Desktop VMs). The pre-1.1.0 behavior reproduced exactly the symptom in run `dtv2feyz`: 43 agents, all `tokens +0in +0out`, "Request timed out" on every retry — because the docker0 bridge (1500) was larger than a 1280-MTU VPN tunnel and silently dropped every TLS ClientHello. Post-fix on the same machine: pi agent against `minimax/minimax-m2.7` returns `+1786in +17out tokens` in 2.6 s. - `HUU_DOCKER_NETWORK` env var, forwarded verbatim as `docker run --network=`. Use case: explicit override of the auto-detection — force `host`, use a pre-existing user-managed network, or pin a name during testing. Auto-detection still runs when this is unset. - Run-start network reachability probe (`checkOpenRouterReachable` in `src/lib/openrouter.ts`) that hits `/auth/key` with an 8 s `AbortController` before any agent spawns. Defense-in-depth backstop for exotic setups (outbound firewall, proxy misconfig) where MTU detection can't help. On `unreachable` inside a container, the error message includes copy-paste-friendly hints mentioning `HUU_DOCKER_NETWORK=host` and the `/etc/docker/daemon.json` MTU edit. Wired into `Orchestrator.start()`, gated on `config.backend === 'pi'` so stub / copilot runs aren't affected. ### Fixed - Last pocket of Portuguese strings in error paths — `OpenRouter API key ausente. Defina OPENROUTER_API_KEY.` → `… missing. Set …`, `Model ID ausente.` → `Model ID missing.` (in both `pi/factory.ts` and `copilot/factory.ts`), and the project-recon "API key missing" log. The v1.1.0 release is now fully English in user-visible surfaces. - Earlier in this cycle (folded in here): the v1.0.1 "English everywhere" pass had missed the welcome menu entries (`Assistente de pipeline`, `FAQ — perguntas frequentes`), every Pipeline Assistant stage (`pensando…`, `cancelar`, `enviar`, status line, free-text prompt, error screen), Project Recon (`Análise do projeto`, `Selecionando o que investigar`, `Falha no seletor`, `processos em paralelo`, `concluídos`), the Pipeline Editor per-card-timeout copy, Model Selector subtitle and legend, model catalog descriptions, and the `agent-env.ts` port-allocation prompt fed into agents. - `example.pipeline.json` and `example.conditional.pipeline.json` translated to English so the on-disk samples match the README quick-start. - README: Node badge bumped 18 → 20 (matches `engines.node`); `HUU_IMAGE` pin example refreshed; embedded `example.pipeline.json` snippet retranslated to English; new "On a VPN?" section in **Run with Docker** documenting the auto MTU network and override env var. - `Dockerfile` builder stage copies the `webui/` workspace (`webui/package.json` before `npm ci`, full `webui/` tree before `npm run build`). Without this, `docker build` errored with `npm error No workspaces found: --workspace=webui` after the workspace was added at repo root in v1.0.1. - webui workspace's strict-mode TypeScript build now passes: `domain-types.PromptStep` narrowed via `Extract<…, { prompt: string }>` so editor components keep accessing `prompt` / `files` type-safely; `PipelineCard` narrows `'files' in step`; `PipelineEditorPage` splits / recomposes work-vs-check steps at the boundary; new `webui/src/pages/FaqPage.tsx` + Router case satisfy the screen-FSM exhaustiveness check; unused `CheckOutcome` import removed from `pipeline-io.ts`. ## [1.0.2] - 2026-05-21 ### Fixed - Unblock `npm run build` by fixing the webui workspace's strict TypeScript pass: `domain-types.PromptStep` is now narrowed to the work-step shape (via `Extract<…, { prompt: string }>`) so `StepRow` / `StepEditor` keep accessing `prompt` and `files` type-safely. `PipelineCard` narrows `'files' in step` before summing, and `PipelineEditorPage` splits/recomposes work-vs-check steps at the boundary (check steps round-trip untouched on save). Drops the unused `CheckOutcome` import in `src/lib/pipeline-io.ts`. - `Dockerfile` builder stage now copies the `webui/` workspace (`webui/package.json` before `npm ci`, full `webui/` tree before `npm run build`). Previously the workspace was added at the repo root but the Dockerfile only copied `src/`, so `docker build` errored with `npm error No workspaces found: --workspace=webui` once the build script started invoking `npm run build -w webui`. ### Added - `webui/src/pages/FaqPage.tsx` + Router case for `screen.kind === 'faq'` — the FSM exposed an FAQ kind with no web-side counterpart, breaking the exhaustiveness check. The new page mirrors the TUI FAQ content and dispatches `faq.back` to return to the welcome screen. ## [1.0.1] - 2026-05-21 ### Added - **Five new bundled default pipelines** materialized by `pipeline-bootstrap` on first run, all framework-agnostic and report-only: - `huu Docs Audit` — Diátaxis classification + README quality scorecard + staleness scan + API-doc coverage. - `huu Quality Audit` — SonarSource-style cyclomatic / cognitive complexity, function/file size, parameter count, nesting depth, duplication (jscpd) and dead-code (depcheck / vulture / staticcheck). - `huu Performance Audit` — static N+1 / big-O / sync-I/O / memory hotspot scan plus Core Web Vitals (Lighthouse-CI) for frontends and Brendan Gregg's USE checklist for backends/CLIs. - `huu Refactor Plan` — Fowler smell catalog + Mikado-graph plan per target + Strangler-Fig hint, no code changes. - `huu Security Audit` — OWASP Top 10:2021 per-file scan (semgrep when available), gitleaks secret sweep, dependency CVE scan (npm audit / pip-audit / cargo audit / govulncheck / osv-scanner), CWE Top 25:2024-aligned remediation roadmap. - `src/lib/default-pipelines/registry.ts` — single source of truth for the bundled catalog, consumed by `ensureAllDefaultPipelines()` in `pipeline-bootstrap`. - `src/app.tsx` mount-time `useEffect` calls `ensureAllDefaultPipelines(repoRoot)` so the catalog actually materializes for end users (the bootstrap was previously dead code: callable from tests only). - `scripts/smoke-defaults.sh` — verifies all 6 bundled defaults materialize, parse cleanly, and are idempotent on re-run. Run after `npx tsc` (or full `npm run build`). - Registry-iterating test guards in `src/lib/pipeline-bootstrap.test.ts`: JSON-vs-TS drift (modulo `exportedAt`), exactly-one-default per CheckStep, no `$file` token in project-scope step prompts. - Bundled-pipelines section in `README.md` and `docs/pipeline-json-guide.md` describing the strict report-only contract and the fan-out cap. - Per-file step prompts in all 5 audit pipelines now carry an explicit SCOPE NOTE + SKIP RULE (skip `node_modules/`, `dist/`, `build/`, `vendor/`, `*.generated.*`, `*.d.ts`, lock/snapshot files, etc.) so users don't blow through `Pipeline.maxNodeExecutions` on generated trees. - `PARALLEL_RULE_SHORT` exported from `src/lib/assistant-prompts.ts` and reused by the test, replacing a brittle inline regex. - **Welcome-screen UI** — the `huu` wordmark now renders as 3D ASCII art, and the FAQ screen is reachable from the welcome menu via `?`. ### Changed - **English everywhere.** The entire app now communicates in English: bundled pipeline prompts, the Pipeline Assistant interview prompt, the project-recon catalog and selector, file-suggestion prompt, FAQ screen, error messages, project-digest banners, assistant stubs (`assistant-client.ts`, `assistant-check-feasibility.ts`, `assistant-schema.ts`), and the root `AGENTS.md` (a.k.a. `CLAUDE.md` symlink). Old example pipelines `demo-rapida.pipeline.json` and `testes-seguranca.pipeline.json` renamed to `demo-quick.pipeline.json` and `security-tests.pipeline.json` with translated bodies. - **Audit pipelines are now strict report-only.** They write exclusively to `.huu/audits/.md` and `.huu/audits/-faq.json` (working files under `.huu/audits/.tmp/`). They no longer mutate `README.md` (badges removed), `package.json`, lockfiles, or any production source. Tool installs are ephemeral only — `npx --yes`, `pipx run`, or vendored binaries under `$HOME/.huu/bin/`. Only `huu Test Suite` still touches production state (writes `huu-tests.md` + a tests badge — by design, it is a setup pipeline). - `huu Refactor Plan` step 4 renamed and re-framed: it produces a STATIC Mikado-style dependency graph (we can't try-and-revert in report-only mode), and the report now states this honestly instead of implying empirical Mikado. - `ensureDefaultPipeline` retained as a back-compat thin wrapper; new entry point is `ensureAllDefaultPipelines` which iterates the catalog and is idempotent per default. ### Removed - `huu Test Suite` CheckStep "3.5 All tests green?" — the gate looped `failing → step 2` ("Test 3 representative files"), but failures actually surface from step 3 (per-file). The gate added LLM-judge cost without addressing the root cause; step 4's existing 3-iteration delete-failing-blocks logic is the correct circuit-breaker. - `huu Security Audit` CheckStep "4.5 Critical findings present?" — both outcomes pointed to step 5, making it a no-op LLM call. ## [1.0.0] - 2026-05-20 ### Added - `huu --web` — alternate browser UI that mirrors the TUI 1:1 with a click-driven layout (Atomic Design + Tailwind). Real-time updates via WebSocket. Bind: 127.0.0.1 + UUID token. Companion flags: `--web-port=`, `--no-open`, env `HUU_WEB_NO_OPEN=1`. Phase 1 requires `--yolo` (Docker port-publishing pending). - `src/lib/screen-fsm.ts` — pure FSM extracted from `src/app.tsx`; consumed by both the Ink TUI and the new web session. - `webui/` workspace — Vite + React + TypeScript + Tailwind front-end (Atomic Design: 11 atoms, 10 molecules, 9 organisms, 3 templates, 15 pages). - `scripts/smoke-web.sh` — fast-port smoke test for the web mode. - **Conditional pipeline steps with LLM-judged routing.** Pipelines can now include `CheckStep` nodes — decision points whose verdict is produced by an LLM judge agent with full shell access running in the integration worktree. Checks evaluate a natural-language `condition` (with `$runs` token substitution for iteration counting) and route to one of their declared `outcomes`, enabling forward jumps, loops back to earlier steps, and branching. **The integration worktree is never rewound** — loops re-execute the target step on top of the current HEAD, accumulating commits. Schema bumped to `huu-pipeline-v2` (v1 still accepted; the `type` field is optional on work steps for back-compat). New safety nets: `Pipeline.maxNodeExecutions` (default 50) caps total node visits per run; `CheckStep.maxRuns` (default 5) caps per-check revisits; the `default: true` outcome (exactly one required) fires on judge failure / unknown label / cap overflow. New files: `src/orchestrator/check-evaluator.ts` (judge spawner, reserved agentId 9998), `src/lib/assistant-check-feasibility.ts` (setup-time LangChain feasibility analysis producing an `instructionDraft`), `src/ui/components/CheckStepEditor.tsx` (TUI subform with embedded `OutcomesEditor`, `theme.ai` magenta). PipelineEditor gains the `C` shortcut for new check steps. Schema (`pipeline-io.ts`) gains topology validation: unique names, reference validity, default-outcome presence. Orchestrator's linear stage loop replaced with a graph cursor state machine; `RunManifest` gains `executionTrace`. Full reference in [`docs/pipeline-json-guide.md`](docs/pipeline-json-guide.md#conditional-steps-check-nodes); example: [`example.conditional.pipeline.json`](example.conditional.pipeline.json). ### Internal - Extracted `StateCoalescer` (`src/web/orchestrator-bridge.ts`) — reusable 8 Hz state coalescer for the WS broadcast path. - `Screen['model-selector']` now carries `backendKind`, replacing the front-end's hardcoded `'pi'` assumption in `ModelSelectorPage`. ### Added (Docker / persistence) - **Saved pipelines now survive `docker run --rm`.** The wrapper (`src/lib/docker-reexec.ts`) bind-mounts the host's `~/.huu` (and `~/Downloads` when present) into the container at the same absolute path, and forwards `HUU_HOST_HOME` so the in-container code resolves pipeline memory, global pipelines, model recents, and the default export Downloads target to the host filesystem. Before this, every "Save pipeline" inside Docker wrote to the container's ephemeral `$HOME` and was lost on exit; the saved-pipelines list reopened empty. A new helper `src/lib/huu-home.ts::getHuuHome()` reads `HUU_HOST_HOME` and falls back to `homedir()` for native runs (`--yolo`, `HUU_NO_DOCKER`, native-only subcommands) — no behavior change there. `compose.yaml` and the `huu init-docker` scaffold templates (`compose.huu.yaml`, `scripts/huu-docker`) mirror the new mount + env so all entry points are consistent. ### Fixed - **Agent execution and observability hardening.** Surgical fixes across the orchestrator, backends, git layer, and logging stack to close gaps that silently corrupted state or hid failures from operators: - `Orchestrator.abort()` now tracks each `agent.dispose()` promise and `start()`'s `finally` block awaits all in-flight `dispose`/`finalize` work with a 5s grace period before resolving — previously the `void agent.dispose()` fire-and-forget let the run "complete" while subprocess teardown raced with the next run's worktree creation. - `finalizeAgent` is now wrapped in a tracked promise with a `.catch` that surfaces unhandled errors instead of yielding a silent process-1 exit. `dlog` calls now bracket `git.hasChanges`, `commit`, and worktree removal so post-mortem can tell where finalize broke. - Integration worktree teardown moved into `start()`'s `finally` block, eliminating the orphan worktree+branch that previously leaked when a stage threw mid-run. - `WorktreeManager.createIntegrationWorktree` / `createAgentWorktree` now roll back the orphan branch when `git worktree add` fails. Without this, a retried run hit "branch already exists" forever. - `GitClient.merge()` returns an explicit `error` field carrying the underlying git stderr — previously a failed conflict-probe was smuggled into `conflicts[]`, where callers treated the error string as a file path and tried to spawn an LLM resolver against it. - `mergeAgentBranches` emits per-branch `dlog` entries (`merge.attempt`, `merge.ok`, `merge.conflict`, `merge.stage_end`) so "which branch introduced the conflict?" is answerable post-run. - `lib/openrouter.ts` capability cache is now keyed by API key. The previous global cache silently served keyA's view of the model catalog to keyB on multi-tenant / BYOK swaps. - `lib/active-run-sentinel.ts` records the writer PID alongside the cwd; new `probeActiveRunLiveness()` answers "is that PID still alive?" so stale sentinels (process killed before exit handler ran) can be detected. Format is forward-compatible with the legacy single-line cwd reader the HEALTHCHECK shell uses. - `lib/debug-logger.ts` now redacts API-key-shaped substrings (`sk-or-`, `sk-ant-`, `ghp_`, Bearer headers) before any structured field reaches disk. `lib/run-logger.ts` applies the same redaction when rendering the chronological + per-agent log files. - `LogEntry` gained `runId`, `stageIndex`, `stageName`, `kind` (`'orchestrator' | 'integrator' | 'worker' | 'system'`). Every orchestrator-emitted log line is enriched with current run/stage context — log aggregation can finally pivot across runs. - New `usage` AgentEvent variant carries structured token / cost telemetry. Both pi and copilot mappers emit it alongside the human-readable log line; the orchestrator accumulates it into `AgentStatus.tokensIn/Out/cacheRead/cacheWrite/cost`. Per-agent token reporting in the run log is no longer always zero. - `safe-terminal.ts` now emits a structured `signal.safe_exit` / `error.safe_uncaught` debug event with a process snapshot (active handles, RSS, uptime) before exit. Diagnosing "what was huu doing when SIGINT hit?" no longer requires guessing. - **Running `huu` from inside a git worktree (or a subdirectory of a repo) no longer fails with "not a git repository" after the Docker pull.** The wrapper now runs a host-side git preflight (`lib/git-preflight.ts`) BEFORE re-execing into the container, so a missing repo fails fast without a wasted image pull. When the preflight detects that `--git-common-dir` (worktree case) or `--show-toplevel` (subdirectory case) live outside cwd, those paths are added as additional same-path bind mounts so the worktree's `.git` file resolves identically inside the container. The in-container `ensureGitRepoOrExit` remains as a defensive backup for `--yolo`/native runs. ### Added - **Recommended model catalog refreshed + Artificial Analysis enrichment in the picker.** - Removed `deepseek/deepseek-v3.2`. Added `deepseek/deepseek-v4-pro`, `deepseek/deepseek-v4-flash`, and `xiaomi/mimo-v2.5-pro`. Existing entries (`minimax/minimax-m2.7`, `moonshotai/kimi-k2.6`, `z-ai/glm-5.1`, `google/gemini-3.1-pro-preview`, `openai/gpt-5.4-mini`, `openai/gpt-5.4`) were retained. - Each catalog entry now carries `description`, `bestFor` (use-case tags: `coding` / `reasoning` / `agentic` / `fast` / `cheap` / `general`), and `tier` (`flagship` / `workhorse` / `fast`). The fields are optional on the schema for retrocompatibility but populated for every recommended entry. - The quick model picker renders a fixed-width table — `Model · tok/s · Agnt · Code · Razn · $in/$out · BestFor` — with metrics pulled from Artificial Analysis when `ARTIFICIAL_ANALYSIS_API_KEY` is set. Without a key the columns degrade to `—` placeholders without blocking selection. "More models..." also forwards the AA key so both views share the data source. - The pipeline-assistant prompt now lists each model's description and bestFor tags inline, plus a "modelo recomendado por cenário" matrix so the assistant picks `modelId` per step against the scenario rather than a flat preference list. - **Project recon stage in the pipeline assistant.** Before the assistant starts asking questions, it now fires four MiniMax M2.7 agents in parallel — `stack`, `structure`, `libraries`, and `conventions` — each with its own loader and its own focused mission. Their findings are aggregated into a "Contexto do projeto" section injected into the assistant's system prompt, so the interview can ask project-specific questions instead of generic ones. - Errors are isolated per agent: if one times out or fails to parse, the other three still complete and the assistant proceeds with whatever context survived. - Stub mode (`apiKey === "stub"` / `HUU_LANGCHAIN_STUB=1`) returns canned bullets so smoke tests never touch the network. - `ESC` during recon goes back to the intent prompt (no confirm dialog — there's no user work to lose). ## [0.3.1] - 2026-04-29 ### Added - **Step `scope` field.** Each pipeline step now declares whether it runs on the **whole project** (`scope: "project"`), **once per file** (`scope: "per-file"`), or is left **flexible** (`scope: "flexible"`, also the default when `scope` is omitted). The Step Editor shows a Scope row above Files; cycle with `ENTER` or jump with `P`/`F`/`X`. - `project` locks the Files row to "whole project" — `F`/`W` are disabled and pressing `ENTER` is a no-op. - `per-file` makes file selection mandatory, and pressing `ENTER` on the Files row opens the picker (in addition to `F`). - `flexible` keeps the previous behavior (`F` to pick, `W` for whole project). - Loading older `huu-pipeline-v1` JSON without the `scope` field is fully back-compatible — those steps behave as `flexible`. ## [0.3.0] - 2026-04-29 Initial public release. Available on npm as `huu-pipe` (`npm install -g huu-pipe`) and as a container image at `ghcr.io/frederico-kluser/huu:latest`. ### Features - **Auto-Docker re-exec.** Typing `huu` in any folder transparently mounts that folder into the official container and runs there — the LLM agent never sees host-side `~/.ssh`, `~/.aws`, or `~/.npmrc` tokens. Set `HUU_NO_DOCKER=1` for native execution (development). - **Subcommands:** `huu run`, `huu init-docker`, `huu status`, `huu prune`. - **Bundled reference pipelines** at `$HUU_COOKBOOK_DIR` (`/opt/huu/cookbook/`) — usable without cloning the repo. - **Configurable via** `HUU_IMAGE`, `HUU_NO_DOCKER`, `HUU_DOCKER_PASS_ENV`, `HUU_WORKTREE_BASE`, `OPENROUTER_API_KEY_FILE`. ### Security - `OPENROUTER_API_KEY` delivered via bind-mounted file at `/run/secrets/openrouter_api_key` (mode `0600`); never appears in `docker inspect` or `ps auxf`. - Container UID/GID matched to host user via `--user "$(id -u):$(id -g)"`. - `safe.directory '*'` set system-wide in the image. [Unreleased]: https://github.com/frederico-kluser/huu/compare/v5.1.0...HEAD [5.1.0]: https://github.com/frederico-kluser/huu/compare/v5.0.0...v5.1.0 [5.0.0]: https://github.com/frederico-kluser/huu/compare/v4.0.0...v5.0.0 [4.0.0]: https://github.com/frederico-kluser/huu/compare/v3.1.0...v4.0.0 [3.1.0]: https://github.com/frederico-kluser/huu/compare/v3.0.0...v3.1.0 [3.0.0]: https://github.com/frederico-kluser/huu/compare/v2.1.0...v3.0.0 [2.1.0]: https://github.com/frederico-kluser/huu/compare/v2.0.0...v2.1.0 [2.0.0]: https://github.com/frederico-kluser/huu/compare/v1.4.0...v2.0.0 [1.4.0]: https://github.com/frederico-kluser/huu/compare/v1.3.0...v1.4.0 [1.3.0]: https://github.com/frederico-kluser/huu/compare/v1.2.0...v1.3.0 [1.2.0]: https://github.com/frederico-kluser/huu/compare/v1.1.0...v1.2.0 [1.1.0]: https://github.com/frederico-kluser/huu/compare/v1.0.2...v1.1.0 [1.0.2]: https://github.com/frederico-kluser/huu/compare/v1.0.1...v1.0.2 [1.0.1]: https://github.com/frederico-kluser/huu/releases/tag/v1.0.1 [0.3.1]: https://github.com/frederico-kluser/huu/releases/tag/v0.3.1 [0.3.0]: https://github.com/frederico-kluser/huu/releases/tag/v0.3.0