--- name: ln-635-test-isolation-auditor description: "Audits whether test results can be trusted: flakiness, isolation, real external dependencies, time/random/order dependency, and shared state. Use when auditing test trustworthiness." allowed-tools: Read, Grep, Glob, Bash license: MIT model: claude-haiku-4-5 --- > **Paths:** File paths (`references/`, `../ln-*`) are relative to this skill directory. # Trustworthiness Auditor (L3 Worker) **Type:** L3 Worker Specialized worker auditing whether automated test results are deterministic, isolated, and trustworthy. ## Purpose & Scope - Audit **Test Trustworthiness** (Category 5: Medium Priority) - Check determinism, isolation, and dependency control - Detect flaky tests, time/random/order dependency, shared state, and real external dependencies - Emit `REWRITE_FOR_DETERMINISM` or `DELETE_IF_LOW_VALUE` - Calculate compliance score (X/10) ## Inputs **MANDATORY READ:** Load `references/audit_worker_core_contract.md`. Receives `contextStore` with: `tech_stack`, `testFilesMetadata`, `codebase_root`, `output_dir`. ## Workflow Detection policy: use two-layer detection (candidate scan, then context verification); load `references/two_layer_detection.md` only when the verification method is ambiguous. 1) **Parse Context:** Extract tech stack, trustworthiness checklist, test file list, output_dir from contextStore 2) **Check Isolation (Layer 1):** Check isolation for 6 categories (APIs, DB, FS, Time, Random, Network) 2b) **Context Analysis (Layer 2 -- MANDATORY):** For each isolation violation, ask: - Is this an **integration test**? (real dependencies are intentional) -> **do NOT flag**. Only flag isolation issues in **unit tests** - Is in-memory DB configured via test config (not visible in grep)? -> **skip** - Is this a test helper that sets up mocks for other tests? -> **skip** 3) **Check Determinism:** Check for flaky tests, time-dependent assertions, order-dependent tests, shared mutable state 4) **Evaluate trust action:** Use `REWRITE_FOR_DETERMINISM` by default; use `DELETE_IF_LOW_VALUE` only when the test is both untrustworthy and low-value according to obvious local evidence 5) **Collect Findings:** Record each violation with severity, location (file:line), effort estimate (S/M/L), action, recommendation 6) **Calculate Score:** Count violations by severity, calculate compliance score (X/10) 7) **Write Report:** Build full markdown report in memory per `references/templates/audit_worker_report_template.md`, write to `{output_dir}/ln-635--global.md` in single Write call 8) **Return Summary:** Return minimal summary to coordinator (see Output Format) ## Audit Rules: Test Isolation ### 1. External APIs **Good:** Mocked (jest.mock, sinon, nock) **Bad:** Real HTTP calls to external APIs **Detection:** - Grep for `axios.get`, `fetch(`, `http.request` without mocks - Check if test makes actual network calls **Severity:** **HIGH** **Recommendation:** Ensure external API calls are controlled (mock, stub, or test server). Tool choice depends on project stack. **Exception:** Integration tests are EXPECTED to use real dependencies -- do NOT flag **Effort:** M ### 2. Database **Good:** In-memory DB (sqlite :memory:) or mocked **Bad:** Real database (PostgreSQL, MySQL) **Detection:** - Check DB connection strings (localhost:5432, real DB URL) - Grep for `beforeAll(async () => { await db.connect() })` without `:memory:` **Severity:** **MEDIUM** **Recommendation:** Ensure DB state is controlled and isolated between test runs. **Exception:** Integration tests with in-memory DB via config -> skip **Effort:** M-L ### 3. File System **Good:** Mocked (mock-fs, vol) **Bad:** Real file reads/writes **Detection:** - Grep for `fs.readFile`, `fs.writeFile` without mocks - Check if test creates/deletes real files **Severity:** **MEDIUM** **Recommendation:** Ensure file system operations are isolated (mock, temp directory, or cleanup). Tool choice depends on project stack **Effort:** S-M ### 4. Time/Date **Good:** Mocked (jest.useFakeTimers, sinon.useFakeTimers) **Bad:** `new Date()`, `Date.now()` without mocks **Detection:** - Grep for `new Date()` in test files without `useFakeTimers` **Severity:** **MEDIUM** **Recommendation:** Ensure time-dependent logic uses controlled clock (fake timers, injected clock, or time provider). Tool choice depends on project stack **Effort:** S ### 5. Random **Good:** Seeded random (Math.seedrandom, fixed seed) **Bad:** `Math.random()` without seed **Detection:** - Grep for `Math.random()` without seed setup **Severity:** **LOW** **Recommendation:** Use seeded random for deterministic tests **Effort:** S ### 6. Network **Good:** Mocked (supertest for Express, no real ports) **Bad:** Real network requests (`localhost:3000`, binding to port) **Detection:** - Grep for `app.listen(3000)` in tests - Check for real HTTP requests **Severity:** **MEDIUM** **Recommendation:** Use `supertest` (no real port) **Effort:** M ## Audit Rules: Determinism ### 1. Flaky Tests **What:** Tests that pass/fail randomly **Detection:** - Run tests multiple times, check for inconsistent results - Grep for `setTimeout`, `setInterval` without proper awaits - Check for race conditions (async operations not awaited) **Severity:** **HIGH** **Recommendation:** Fix race conditions, use proper async/await **Effort:** M-L ### 2. Time-Dependent Assertions **What:** Assertions on current time (`expect(timestamp).toBeCloseTo(Date.now())`) **Detection:** - Grep for `Date.now()`, `new Date()` in assertions **Severity:** **MEDIUM** **Recommendation:** Mock time **Effort:** S ### 3. Order-Dependent Tests **What:** Tests that fail when run in different order **Detection:** - Run tests in random order, check for failures - Grep for shared mutable state between tests **Severity:** **MEDIUM** **Recommendation:** Isolate tests, reset state in beforeEach **Effort:** M ### 4. Shared Mutable State **What:** Global variables modified across tests **Detection:** - Grep for `let globalVar` at module level - Check for state shared between tests **Severity:** **MEDIUM** **Recommendation:** Use `beforeEach` to reset state **Effort:** S-M ## Audit Rules: Trustworthiness Drag ### 1. Overlarge Test With Shared Setup (>100 lines) **What:** Test with >100 lines, testing too many scenarios **Detection:** - Count lines per test - If >100 lines -> Giant **Severity:** **MEDIUM** **Recommendation:** Split into focused tests (one scenario per test) **Effort:** S-M ### 2. Slow Poke (>5 seconds) **What:** Test taking >5 seconds to run **Detection:** - Measure test duration - If >5s -> Slow Poke **Severity:** **MEDIUM** **Recommendation:** Control external deps with test doubles or in-memory services selected from the project stack; parallelize only after isolation is verified **Effort:** M ### 3. Conjoined Twins (Unit test without controlled dependencies) **What:** Test labeled "Unit" but not mocking dependencies **Detection:** - Check if test name includes "Unit" - Verify all dependencies are mocked - If no mocks -> actually Integration test **Severity:** **LOW** **Recommendation:** Either mock dependencies OR rename to Integration test **Effort:** S ### 4. Default Value Blindness (Tests with default config) **What:** Tests with default config values only. Use the non-default config rule from `references/risk_based_testing_guide.md`; load `references/risk_based_testing_methodology.md` only when examples are needed. **Detection:** - Grep for common defaults in test setup: `:8080`, `:3000`, `30000`, `limit: 20`, `offset: 0` - Check if test config values match framework/library defaults - Look for `|| DEFAULT` patterns in source code with matching test values **Severity:** **HIGH** **Effort:** S ## Scoring Algorithm **MANDATORY READ:** Load `references/audit_scoring.md`. **Severity mapping:** - Flaky tests, External API not controlled, Default Value Blindness -> HIGH - Real database, File system, Time/Date, Network, Overlarge shared setup, Slow Poke -> MEDIUM - Random without seed, Order-dependent, Conjoined Twins -> LOW ## Output Format **MANDATORY READ:** Load `references/templates/audit_worker_report_template.md`. Write JSON summary per `references/audit_summary_contract.md`. In managed mode the caller passes both `runId` and `summaryArtifactPath`; in standalone mode the worker generates its own run-scoped artifact path per shared contract. Write report to `{output_dir}/ln-635--global.md` with `category: "Test Trustworthiness"` and checks: api_isolation, db_isolation, fs_isolation, time_isolation, random_isolation, network_isolation, flaky_tests, order_dependency, shared_state, default_value_blindness. Return summary per `references/audit_summary_contract.md`. When `summaryArtifactPath` is absent, write the standalone runtime summary under `.hex-skills/runtime-artifacts/runs/{run_id}/evaluation-worker/{worker}--{identifier}.json` and optionally echo the same summary in structured output. ``` Report written: .hex-skills/runtime-artifacts/runs/{run_id}/audit-report/ln-635--global.md Score: X.X/10 | Issues: N (C:N H:N M:N L:N) ``` **Note:** Findings are flattened into single array. Use `principle` field prefix (Isolation / Determinism / Dependency Control) to identify issue category. Each finding includes `action: "REWRITE_FOR_DETERMINISM"` or `action: "DELETE_IF_LOW_VALUE"`. ## Critical Rules Apply the already-loaded `references/audit_worker_core_contract.md`. - **Do not auto-fix:** Report only - **Effort realism:** S = <1h, M = 1-4h, L = >4h - **Flat findings:** Merge isolation + determinism + dependency-control findings into single findings array, use `principle` prefix to distinguish - **Context-aware:** Supertest with real Express app is acceptable for integration tests - **Unique angle:** Only audit whether test results can be trusted. Do not evaluate product behavior, E2E journey value, portfolio value, missing coverage, oracle strength, manual evidence, or structure. - **Action required:** Every finding uses `REWRITE_FOR_DETERMINISM` unless evidence shows the test is also low-value enough to use `DELETE_IF_LOW_VALUE`. **Monitor (2.1.98+):** For repeated test runs expected >30s each, use `Monitor`. Fallback: `Bash(run_in_background=true)`. ## Definition of Done Apply the already-loaded `references/audit_worker_core_contract.md`. - [ ] contextStore parsed successfully (including output_dir) - [ ] All 3 audit groups completed: - Isolation (6 categories: APIs, DB, FS, Time, Random, Network) - Determinism (4 checks: flaky, time-dependent, order-dependent, shared state) - Dependency control (overlarge shared setup, slow tests, conjoined dependencies, default-value blindness) - [ ] Findings collected with severity, location, effort, action, recommendation - [ ] Score calculated using penalty algorithm - [ ] Report written to `{output_dir}/ln-635--global.md` (atomic single Write call) - [ ] Summary written per contract --- **Version:** 3.0.0 **Last Updated:** 2025-12-23