# Tuffgal product requirements (v1)

> Status: **Approved · P1–P6 complete · v1.0.0 stabilization in progress** · Last updated 2026-06-05

This document captures the design intent and scope decisions behind Tuffgal v1. It is the source-of-record for what the product _is_ and what it deliberately _isn't_. The source code is the source-of-truth for what currently works. This document explains why it works the way it does.

## TL;DR

Tuffgal is a JSON-driven visual-regression harness for any web app a Playwright browser can render. **Declarative actions + stories** authored in JSON run against the real running app. Every action ends with a screenshot that becomes part of a **committable visual library**, so the regression net is the artifact, not a coverage percentage. When a screenshot changes, a human looks at the diff and decides.

Same Playwright substrate as `@playwright/test`. Same stability primitives, such as locator-first resolution, masking, and route intercepts. What's different is that **there is no test code per scenario**. A team adds Tuffgal to a project, writes 5–10 reusable actions, chains them into stories, and gets a regression net the same day.

**Spoilers:** AI fuzzy matching + self-healing will be the eventual differentiator. For now, v1 is a rock-solid, declarative visual regression tool that doesn't make you write test code.

## The problem

Today's testing landscape forces a binary choice:

**Write component tests using Vitest, RTL, or Jest.** They're fast and isolated, but mock reality. They provide false confidence because they can pass silently when the rendered product breaks.

**Write e2e tests using Playwright or Cypress.** It's a real browser + DOM, but every scenario is bespoke TypeScript. Authoring cost is high. The resulting test suites are write-only and flake under animation, hydration, and async data races.

Tuffgal sits between them. Declarative JSON actions + stories are authored once and parameterized at use sites. They run against the real app.
Visual diffs catch what assertions can't enumerate. Authoring is fast enough that a new flow can be added in a few minutes.

The pilot consumer ([Linklater](https://github.com/nschneble/linklater)) proves this works in production: 24 stories, 21 user-journey flows, and zero flakes across multiple consecutive runs while replacing 462 component tests.

## Goals

**Framework agnostic.** Support any browser-renderable web app, including React + Vite, Next.js, Vue, Svelte, and SolidJS, plus server-rendered stacks such as Ruby on Rails, Vapor/Leaf, Django, and Express/EJS.

**Lean harness, rich actions/stories.** The core contains the schema, scheduler, and runner. App-specific logic (e.g. fixtures, dev servers, and the test-mode contract) lives in the consumer project in a config file.

**Zero new step primitives at extraction time.** The existing primitives (`click`, `input`, `intercept`, `navigate`, `read`, `scroll`, `type`, `wait`, `waitFor`) are sufficient. New primitives must clear a high bar: a real user-facing scenario the existing set cannot express.

**First-class CI story.** A GitHub Action consumable via `uses:`, with conditional artifact upload for the report and updated baselines.

**Example directory.** Should contain at least one runnable recipe per supported tech stack.

## Non-goals (v1)

**AI fuzzy matching.** The schema reserves the `position` field and an `AI=1` env hook for LLM fallback, but there's no provider integration yet. This will be the main feature and driving force in a future release.

**Hosted SaaS.** OSS only. Cloud runs, dashboards, and team accounts are deferred to a future release.

**Native mobile.** React Native, iOS, and Android are out of scope. The Playwright substrate cannot drive them. It's a separate product.

**Multi-format authoring.** JSON only. No YAML, no TypeScript DSLs.

**Sub-Playwright substrate swap.** Tuffgal is built on Playwright library mode. Consider WebDriver or Puppeteer adapters in a future release.

**Browser breadth.** Chromium only for now. Firefox and WebKit are deferred to a future release, or until a consumer requires them.

## Users + scenarios

### Primary persona: "Pragmatic full-stack engineer"

Owns or maintains a small-to-medium web app. 10 screens, 20 flows. Wants confidence that UI changes don't regress without writing 1,000 lines of test code. Already uses Playwright or Cypress and finds them costly to maintain.

### Adoption scenarios

| Scenario                              | Tuffgal value                                                                            |
| ------------------------------------- | ---------------------------------------------------------------------------------------- |
| New project, no tests yet             | Author 5–10 actions, chain into stories. Get a regression net the same day               |
| Existing React app w/ component tests | Delete component tests covered by Tuffgal stories. Keep utility/hook unit tests          |
| Server-rendered Rails app             | Same JSON actions. Replace the "no component test" gap with cross-flow visual regression |
| CI adoption                           | Drop in the GitHub Action. Commit baselines. PR comments on visual changes               |

## Architecture overview

Tuffgal core is framework-agnostic because it operates at the HTTP + DOM layer via Playwright. Everything app-specific lives behind four pluggable bridges declared in `tuffgal.config.ts`.
```
┌─────────────────────────────────────────────────────┐
│                   Tuffgal harness                   │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Schema   │ │ Runner   │ │ Reporter │ │ CLI      │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Steps    │ │ Locator  │ │ Diff     │ │ Trace    │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        ▼                  ▼                  ▼
┌──────────────┐ ┌───────────────────┐ ┌──────────────┐
│ DB bridge    │ │ Dev-server bridge │ │ App contract │
└──────────────┘ └───────────────────┘ └──────────────┘
        │                  │                  │
        └──────────────────┴──────────────────┘
                           │
                   ┌───────────────┐
                   │ Consumer app  │
                   └───────────────┘
```

### What stays in core (framework-agnostic)

- Schema: `action.ts`, `story.ts`, `result.ts`; zod-validated JSON
- Scheduler: DAG topo-sort + cycle detection + parallel workers
- Runner: action dispatch, step retry, `expect.anyOf`, masking
- Step primitives: `click`, `input`, `intercept`, `navigate`, `read`, `scroll`, `type`, `wait`, `waitFor`
- Locator resolver: role+text → role → selector → text precedence (`position` reserved for AI)
- Screenshot capture + SSIM + pixelmatch + a11y tree snapshot
- Baseline store + approve flow
- Reporter: HTML + traces
- Coverage collector: V8 (monocart) wrapped (optional)
- Clock freeze + storage state persistence + DAG-based label sharing

### What lives in the consumer project

- DB reset + fixture callbacks → declared on `config.database`
- Dev-server command + health check → declared on `config.devServers`
- `actions/`, `stories/`, `baselines/` content → consumer-owned directories
- Test-mode contract (env var-driven behavior changes in the app) → consumer documents and implements; Tuffgal supplies a recommended recipe in [`app-contract.md`](app-contract.md)
- Flow inventory path → `config.flowInventory`

## Public API

### Config file: `tuffgal.config.ts`

The single source
of truth at consumer-project root. Full reference in [`config.md`](config.md).

```ts
import { defineConfig } from 'tuffgal';

export default defineConfig({
  apiHost: 'http://localhost:3000',
  baseUrl: process.env.APP_BASE_URL ?? 'http://localhost:5173',
  database: {
    reset: async () => {
      /* TRUNCATE, reseed test user */
    },
    fixtures: {
      name: async () => {
        /* idempotent inserts */
      },
    },
  },
  defaultTimeoutMs: 10_000,
  devServers: {
    command: 'npm run dev:test',
    healthCheck: [
      { url: 'http://localhost:3000', timeoutMs: 60_000 },
      { url: 'http://localhost:5173', timeoutMs: 60_000 },
    ],
  },
  flowInventory: 'docs/user-journeys.md',
  paths: {
    actions: 'tuffgal/actions',
    baselines: 'tuffgal/baselines',
    report: 'tuffgal/report',
    stories: 'tuffgal/stories',
  },
  storageStatePins: ['session_token'],
  viewport: { width: 1280, height: 800 },
  workers: undefined,
});
```

### CLI

> Historical scope sketch. The canonical, complete reference (every flag, exit
> codes) lives in [cli.md](cli.md).

```bash
npx tuffgal approve [--story ]   # accept changed baselines
npx tuffgal approve --new-only   # accept only new baselines, skip changed
npx tuffgal init                 # scaffold tuffgal.config.ts
npx tuffgal run                  # run all stories
npx tuffgal run --coverage       # V8 coverage
npx tuffgal run --headed         # show browser
npx tuffgal run --manage-servers # spawn devServers per config
npx tuffgal run --story          # one story
npx tuffgal run --workers        # parallelism
npx tuffgal supervise            # long-running devServers wrapper
```

### GitHub Action

```yaml
- uses: nschneble/tuffgal-action@v0
  with:
    setup-script: test:ui:setup # optional, for DB bootstrap
```

Sibling repo [`nschneble/tuffgal-action`](https://github.com/nschneble/tuffgal-action). Composite action wrapping `tuffgal run --manage-servers` with `results.json` parsing and conditional artifact uploads for the report and updated baselines. See its README for the full inputs and outputs.
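
For a sense of what `--manage-servers` has to do before any story can run, the sketch below polls `healthCheck` entries shaped like the ones in the config example. It is a minimal illustration under stated assumptions, not Tuffgal's implementation: `waitForHealthy`, `httpProbe`, the polling interval, and the rule that any HTTP status below 500 counts as "up" are all hypothetical.

```ts
// Hypothetical sketch: the entry shape (url + timeoutMs) follows this document's
// config example; every function name here is illustrative, not Tuffgal's API.
interface HealthCheck {
  url: string;
  timeoutMs: number;
}

// A probe answers "is this URL up right now?". It is injectable so the polling
// logic can be exercised without a network.
type Probe = (url: string) => Promise<boolean>;

// Assumed default: any HTTP response below 500 means the server is listening.
const httpProbe: Probe = async (url) => {
  try {
    const res = await fetch(url);
    return res.status < 500;
  } catch {
    return false; // connection refused or similar: server not ready yet
  }
};

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls every check concurrently. Resolves once all URLs answer; rejects as
// soon as any single check passes its own timeout.
async function waitForHealthy(
  checks: HealthCheck[],
  probe: Probe = httpProbe,
  intervalMs = 250,
): Promise<void> {
  await Promise.all(
    checks.map(async ({ url, timeoutMs }) => {
      const deadline = Date.now() + timeoutMs;
      while (!(await probe(url))) {
        if (Date.now() >= deadline) {
          throw new Error(`Health check timed out after ${timeoutMs}ms: ${url}`);
        }
        await sleep(intervalMs);
      }
    }),
  );
}
```

A caller would spawn the `devServers.command` process, `await waitForHealthy(config.devServers.healthCheck)`, and only then start the scheduler. Injecting the probe keeps the timeout logic testable without real servers.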
### Homebrew (planned, post-v1)

```bash
brew install tuffgal/tap/tuffgal
tuffgal run
```

Same CLI under a different distribution path. Useful for non-Node projects (e.g. Ruby on Rails) where the team would otherwise install Node just to run tests. Tuffgal would bundle the Node runtime in the formula. Deferred until post-v1 stabilization.

### Programmatic API

```ts
import { loadConfig, runAll, type RunResult } from 'tuffgal';

const config = await loadConfig(process.cwd());
const result: RunResult = await runAll(config, { headed: false });
```

See [`src/index.ts`](../src/index.ts) for more.

## Pluggable bridge design

### DB bridge, callback-based

Consumer supplies functions on `config.database`. No driver opinion. No imports of pg/mysql/sqlite/mongo in Tuffgal core. See [`examples/postgres-prisma/`](../examples/postgres-prisma/) for a working Postgres + Prisma recipe.

```ts
database: {
  reset: async () => { /* truncate + reseed */ },
  fixtures: {
    'user-with-records': async () => { /* insert */ },
  },
}
```

Tuffgal calls `reset()` once per run (before the scheduler starts) and `fixtures[name]()` per story declaration. The story DAG handles ordering.

### Dev-server bridge

Consumer declares the shell command + health check URLs.

```ts
devServers: {
  command: 'npm run dev:test',
  cwd: '..', // optional
  healthCheck: [
    { url: 'http://localhost:3000', timeoutMs: 60_000 },
    { url: 'http://localhost:5173', timeoutMs: 60_000 },
  ],
  shutdownGraceMs: 5_000, // SIGKILL after this grace period
  shutdownSignal: 'SIGTERM', // default
}
```

Used by:

- `--manage-servers` (one-shot, CI-style)
- `tuffgal supervise` (long-running, local iteration-style)

See [`supervisor.md`](supervisor.md).

### App contract, test-mode env

Tuffgal documents a _recommended_ contract for the consumer app to implement. See [`app-contract.md`](app-contract.md).
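
To make the idea of a test-mode contract concrete, here is a minimal consumer-side sketch. It assumes the app keys off a `TUFFGAL=1` env var, which is only one possible choice; `isTestMode` and `pickRecommendations` are hypothetical helper names that live in the consumer app, not in Tuffgal.

```ts
// Hypothetical consumer-side helpers; none of these names come from Tuffgal.
type Env = Record<string, string | undefined>;

// The app decides for itself which env var signals "running under the harness".
function isTestMode(env: Env, flag = 'TUFFGAL'): boolean {
  return env[flag] === '1';
}

// A non-deterministic production path made deterministic in test mode:
// production shuffles the items, test mode returns a stable sorted slice.
function pickRecommendations(items: string[], count: number, env: Env): string[] {
  if (isTestMode(env)) {
    return [...items].sort().slice(0, count);
  }
  // Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
  const shuffled = [...items];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, count);
}

console.log(pickRecommendations(['c', 'a', 'b'], 2, { TUFFGAL: '1' })); // [ 'a', 'b' ]
```

In a real app the `env` argument would be `process.env`, and the same switch would guard rate limiters, background jobs, and any server-side clock. Passing the environment in explicitly keeps the helper easy to unit test.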
The important bits:

- Set `TUFFGAL=1` (or any chosen env var) when the app runs under the harness
- Bypass rate limiters
- Return deterministic responses where the production path is non-deterministic (random recommendations, third-party reads)
- Skip background jobs (RSS polls, email sends)
- Pin clock if the app does any time-driven UI server-side

The contract is **not enforced** by Tuffgal. The consumer's app is wholly responsible. Tuffgal supplies route-intercept primitives so consumers can short-circuit non-deterministic endpoints at the browser layer when the server-side contract isn't feasible (e.g. third-party APIs).

### Storage state pins

Configurable list of `localStorage` keys Tuffgal persists across stories.

```ts
storageStatePins: ['session_token', 'refresh_token'],
```

Set `storageStatePins: []` for session cookie-based apps (e.g. Rails) because cookies auto-persist via Playwright's storage state. The field becomes a no-op for cookie apps.

## Release plan

| Phase                        | Output                                                                  | Status      |
| ---------------------------- | ----------------------------------------------------------------------- | ----------- |
| **P1: Repo bootstrap**       | New repo, license, README, package.json, CI skeleton                    | ✅ Complete |
| **P2: Core extraction**      | Move framework-agnostic code, wire up config interface                  | ✅ Complete |
| **P3: Docs + scaffolder**    | Docs, `init`, `supervise`, `examples/postgres-prisma`                   | ✅ Complete |
| **P4: Pilot migration**      | First consumer fully on Tuffgal, verify parity                          | ✅ Complete |
| **P5: GitHub Action**        | Sibling repo + composite action wrapping `tuffgal run --manage-servers` | ✅ Complete |
| **P6: v0.1.0-alpha release** | npm publish with provenance, smoke from public install                  | ✅ Complete |
| **P7: v1.0.0**               | README polish, additional examples directory, public announce           | Planned     |
| **P8: v1.1.0 (AI)**          | LLM fallback in resolver, BYOLLM via `AI_PROVIDER` env                  | Deferred    |

## Open questions / risks

**Storage state for cookie-based apps.** Playwright
auto-persists cookies via `context.storageState()`. Tuffgal's abstraction layers over this. Needs confirmation from a server-rendered consumer that cookie flows survive label-based storage-state inheritance.

**Homebrew formula bundling.** Distributing Node + Chromium via brew is non-trivial. May need `pkg` or a similar tool to produce a standalone binary. Deferred to post-v1 if it blocks launch.

**Documentation site.** README + `docs/` are enough for v1, but a docs site such as Astro Starlight or Mintlify raises the bar for adoption. Deferred unless adoption traction justifies the maintenance cost.

**Trademark search.** "Tuffgal" needs a USPTO + EUIPO check before commercial use. v1 is OSS-only so the urgency is lower, but it is worth running before any logo work.

**AI fallback shape for v1.1.0.** BYOLLM is the working assumption: the consumer brings their own provider key and Tuffgal calls out for hint disambiguation. Provider abstraction is TBD. Candidates are an OpenAI-compatible API, Anthropic, and local models via Ollama. Out of scope for v1.

## Appendix, final decisions baked in

- **AI:** v1.1 (BYOLLM)
- **Browser:** Chromium only (v1)
- **DB integration:** Callback-based (no driver opinion)
- **Format:** JSON only (zod-validated)
- **Framework scope:** React + Vite, Next, Vue/Svelte/Solid, server-rendered (Rails/Vapor/Django/Express)
- **Hosted:** OSS only at launch
- **License:** MIT
- **Name:** Tuffgal
- **Node:** 22+; compiled `dist/` shipped to npm with provenance
- **Packaging:** npm + CLI + GitHub Action; Homebrew formula (post-v1)
- **Repo strategy:** Standalone open-source product, MIT licensed
- **Substrate:** Playwright library mode