---
name: rust-write-tests
description: Expert-level Rust testing -- the "What Could Break?" framework, five transformations from superficial to expert tests, flake hunting protocol, intent-based assertions, naming conventions, and a mandatory self-review checklist. Triggers on writing Rust tests, designing test cases, improving test quality, or reviewing test coverage.
---

# Rust Test Writing Skill

Write tests that catch real bugs. Every test must guard a specific invariant -- not just prove the code "works."

## The "What Could Break?" Framework

Before writing any test, answer these four questions:

1. **What invariant does this code maintain?** (e.g., "deserialized config always has a default profile")
2. **What edge case would violate it?** (e.g., "empty TOML table, missing key, extra unknown key")
3. **What platform difference could surface?** (e.g., "path separators, case sensitivity, symlink behavior")
4. **What would a future refactor accidentally break?** (e.g., "field added to struct but not to Display impl")

If you can only answer #1, your test is a happy-path test. Answer all four and you have a regression suite.

## The 5 Test Transformations

Each transformation shows a superficial test pattern and its expert replacement.
### Transformation 1: Weak assertions -> whole-object comparison

```rust
// BEFORE: proves nothing about the rest of the struct
let result = parse_config(input);
assert!(result.is_ok());

// AFTER: catches any unexpected field change
use pretty_assertions::assert_eq;

let result = parse_config(input)?;
assert_eq!(result, Config {
    name: "default".into(),
    timeout: Duration::from_secs(30),
    retries: 3,
    verbose: false,
});
```

### Transformation 2: Single happy-path -> targeted test suite

```rust
// BEFORE: one test, one path
#[test]
fn test_parse_config() {
    let cfg = parse("valid input").unwrap();
    assert!(cfg.is_valid());
}

// AFTER: 3-6 tests covering happy, error, edge, platform
// (bodies elided; each test guards the invariant its name states)
#[test]
fn parse_config_returns_defaults_for_minimal_input() { /* ... */ }

#[test]
fn parse_config_rejects_negative_timeout() { /* ... */ }

#[test]
fn parse_config_preserves_unknown_fields_as_extensions() { /* ... */ }

#[test]
fn parse_config_handles_empty_string_gracefully() { /* ... */ }

#[cfg(windows)]
#[test]
fn parse_config_normalizes_backslash_paths() { /* ... */ }
```

### Transformation 3: Inline test module -> sibling `_tests.rs` file

```rust
// BEFORE: tests pollute the production file diff
// foo.rs
pub fn compute() -> u32 { 42 }

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        assert_eq!(compute(), 42);
    }
}

// AFTER: production code and test code in sibling files
// foo.rs
pub fn compute() -> u32 { 42 }

#[cfg(test)]
#[path = "foo_tests.rs"]
mod tests;

// foo_tests.rs
use super::*;

#[test]
fn compute_returns_expected_value() {
    assert_eq!(compute(), 42);
}
```

For `mod.rs` modules, use `mod_tests.rs`.
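As a concrete instance of Transformation 2, here is a minimal sketch of a targeted suite with the test bodies filled in. `parse_timeout` and `ParseTimeoutError` are hypothetical names invented for this illustration, not part of any real crate. To keep the block self-contained, the tests sit next to the function and use the standard `assert_eq!`; in a real crate they would live in the sibling `_tests.rs` file and import `pretty_assertions::assert_eq`.

```rust
use std::time::Duration;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseTimeoutError {
    Empty,
    NotANumber(String),
    Zero,
}

// Hypothetical function under test: parses a timeout given in whole seconds.
pub fn parse_timeout(input: &str) -> Result<Duration, ParseTimeoutError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseTimeoutError::Empty);
    }
    // Parsing into u64 rejects negative numbers and non-numeric text alike.
    let secs: u64 = trimmed
        .parse()
        .map_err(|_| ParseTimeoutError::NotANumber(trimmed.to_string()))?;
    if secs == 0 {
        return Err(ParseTimeoutError::Zero);
    }
    Ok(Duration::from_secs(secs))
}

// Happy path.
#[test]
fn parse_timeout_accepts_whole_seconds() {
    assert_eq!(parse_timeout("30"), Ok(Duration::from_secs(30)));
}

// Edge case: surrounding whitespace must not change the result.
#[test]
fn parse_timeout_ignores_surrounding_whitespace() {
    assert_eq!(parse_timeout(" 5\n"), Ok(Duration::from_secs(5)));
}

// Error paths: each one asserts the whole error value, not just `is_err()`.
#[test]
fn parse_timeout_rejects_empty_input() {
    assert_eq!(parse_timeout(""), Err(ParseTimeoutError::Empty));
}

#[test]
fn parse_timeout_rejects_negative_value() {
    assert_eq!(
        parse_timeout("-1"),
        Err(ParseTimeoutError::NotANumber("-1".to_string()))
    );
}

#[test]
fn parse_timeout_rejects_zero() {
    assert_eq!(parse_timeout("0"), Err(ParseTimeoutError::Zero));
}
```

Each test name states the invariant it guards, and each assertion compares the full `Result`, so a change to either the success value or the error variant fails a specific, named test.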
### Transformation 4: String-based test data -> typed struct construction

```rust
// BEFORE: silent breakage when fields change
let input: Config = serde_json::from_str(r#"{"name":"test","timeout":30}"#)?;

// AFTER: compile-time safety for field additions/renames
fn make_config(name: &str, timeout_secs: u64) -> Config {
    Config {
        name: name.to_string(),
        timeout: Duration::from_secs(timeout_secs),
        retries: 0,
        verbose: false,
    }
}

let input = make_config("test", 30);
```

Factory functions for domain objects let each test construct exactly the fixture it needs. No shared mutable state. No JSON parsing at test time.

### Transformation 5: `HashMap` in fixtures -> `BTreeMap` for determinism

```rust
use std::collections::{BTreeMap, HashMap};

// BEFORE: test passes 99% of the time, flakes in CI
let mut map = HashMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // order not guaranteed

// AFTER: deterministic iteration order
let mut map = BTreeMap::new();
map.insert("b", 2);
map.insert("a", 1);
assert_eq!(format!("{map:?}"), r#"{"a": 1, "b": 2}"#); // always this order
```

Use `BTreeMap` whenever output order affects assertions or snapshots.

## Test Flake Hunting Protocol

Bolin's single most frequent pattern (97+ references across 30+ commits). When a test is flaky, follow this exact protocol:

1. **Identify the race window** -- read the event-emission code, find where timing assumptions break. Locate the exact line where the test assumes an event has arrived or a state has changed without proof.
2. **Replace timing with event-driven sync** -- wait for a specific event instead of sleeping or assuming order. Never use `sleep` as a synchronization primitive.
3. **Make assertions order-independent** -- sort collected values, use sets, or match by content not position. Non-deterministic event ordering is not a bug; asserting on it is.
4. **Stress-test the fix** -- run with the exact command, substituting the package under test: `cargo nextest run -p <package> -j 2 --no-fail-fast --stress-count 50 --status-level leak`
5. **Document the non-determinism** in the commit message -- explain why the timing assumption was wrong and what synchronization replaced it.

```rust
// BEFORE (timing-dependent):
tokio::time::sleep(Duration::from_millis(100)).await;
assert_eq!(events.len(), 2);
assert_eq!(events[0].type_name, "item.create");
assert_eq!(events[1].type_name, "audio.delta");

// AFTER (event-driven, order-independent):
wait_for_event(&rx, |e| e.type_name == "item.create").await;
wait_for_event(&rx, |e| e.type_name == "audio.delta").await;

// OR: collect, sort, compare
let mut types: Vec<&str> = events.iter().map(|e| e.type_name.as_str()).collect();
types.sort();
assert_eq!(types, vec!["audio.delta", "item.create"]);
```

Common flake sources to watch for:

- `turn/started` emitted optimistically before state is actually ready
- Event ordering across async channels (mpsc, broadcast)
- HashMap iteration order in serialized output
- Test harness Drop racing with child process shutdown (close stdin first, then wait, then kill)

## Intent-Based Assertions

Replace exact command-string matching with intent-based semantic matching. Check that the test observes the right INTENT (operation + target) rather than a specific command format that varies across platforms or refactors.

```rust
// BEFORE: brittle -- breaks if command formatting changes
assert_eq!(cmd.to_string(), "rm -rf /tmp/workspace/build");

// AFTER: intent-based -- asserts the operation and target
assert_eq!(cmd.operation(), Operation::Remove);
assert!(cmd.target().ends_with("workspace/build"));
assert!(cmd.is_recursive());
```

When exact strings are unavoidable, assert on the semantically meaningful parts (path suffix, flag presence) rather than the full formatted string.

## Test Naming Convention

Pattern: `{subject}_{scenario}_{expected_outcome}`

A failed test name must be an actionable bug description.
When it fails in CI, the name alone tells you what broke. The name should read as a specification: if it fails, you know exactly what invariant was violated.

Exemplary names:

```
sandbox_detection_requires_keywords
sandbox_detection_ignores_non_sandbox_mode
aggregate_output_rebalances_when_stderr_is_small
parse_config_rejects_negative_timeout
permissions_profiles_reject_writes_outside_workspace_root
permissions_profiles_allow_network_enablement
legacy_sandbox_mode_config_builds_split_policies_without_drift
under_development_features_are_disabled_by_default
usage_limit_reached_error_formats_free_plan
unexpected_status_cloudflare_html_is_simplified
root_write_plus_carveouts_still_requires_platform_sandbox
explicit_unreadable_paths_prevent_auto_approval_for_external_sandbox
denied_hosts_take_priority_over_allowed_hosts_glob
```

Anti-pattern names: `test_parse`, `it_works`, `test_config_1`, `happy_path`.

## Testing Stack Quick Reference

| Scenario | Tool |
|----------|------|
| HTTP mocking | `wiremock::MockServer` |
| Filesystem isolation | `TempDir` (`tempfile` crate) |
| Async tests | `#[tokio::test]` |
| UI / output snapshots | `insta::assert_snapshot!` |
| Struct comparison | `pretty_assertions::assert_eq` |
| Enum variant checks | `assert_matches!` |
| Deterministic collections | `BTreeMap` over `HashMap` |
| Flake stress-testing | `cargo nextest run --stress-count 50` |

### When to use each

- **wiremock**: Any test that would hit a real HTTP endpoint. Mount responses with `Mock::given().respond_with()`. Assert request bodies after the test.
- **TempDir**: Every test that touches disk. Never mutate the process environment. Never hardcode `/tmp` or `C:\`.
- **insta**: TUI widgets, CLI output, error messages -- anything where the exact text matters. Render to a buffer, snapshot with `assert_snapshot!`.
- **pretty_assertions**: Default for all `assert_eq!` calls. Gives colored diffs on failure. Import at the top of every test file.
- **nextest stress**: After fixing any flaky test. Always run with `-j 2 --no-fail-fast --stress-count 50` to confirm the fix holds under concurrency.

## Self-Review Checklist

After writing tests, verify every item. Fix every violation before presenting the tests.

```
[ ] Uses pretty_assertions::assert_eq (not std assert_eq)
[ ] Compares entire objects, not individual fields
[ ] Each test guards a specific invariant (not just "it works")
[ ] Test names follow {subject}_{scenario}_{expected_outcome}
[ ] Test names encode the invariant being guarded
[ ] TempDir for any filesystem tests (no hardcoded paths)
[ ] No process environment mutation (no std::env::set_var)
[ ] Error paths tested (not just happy path)
[ ] At least 3 tests for any non-trivial function
[ ] BTreeMap used where iteration order affects assertions
[ ] Test file is a sibling _tests.rs, not inline mod tests {}
[ ] No timing-dependent assertions (no sleep -> assert)
[ ] Order-independent where event order is non-deterministic
[ ] String assertions use intent matching, not exact format
```
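The flake-hunting protocol relies on a `wait_for_event` helper that this document never defines. The sketch below shows one possible shape for it, written synchronously on `std::sync::mpsc` so it runs without extra crates. `Event`, `EventWaiter`, and `wait_for` are hypothetical names chosen for this illustration; an async version would follow the same structure using the runtime's own channel and timeout facilities. The key design point is that non-matching events are buffered rather than dropped, so waiting for two events in sequence stays independent of the order in which they arrive.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

// Hypothetical event type; the field name mirrors the examples in this document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub type_name: String,
}

// Hypothetical helper: waits for specific events without sleeping for a fixed time.
pub struct EventWaiter {
    rx: Receiver<Event>,
    // Events received while waiting for something else; checked first on later waits.
    unmatched: Vec<Event>,
}

impl EventWaiter {
    pub fn new(rx: Receiver<Event>) -> Self {
        Self { rx, unmatched: Vec::new() }
    }

    // Returns the first event satisfying `pred`, or None if the deadline passes
    // or every sender is dropped before a match arrives.
    pub fn wait_for(
        &mut self,
        timeout: Duration,
        pred: impl Fn(&Event) -> bool,
    ) -> Option<Event> {
        // A matching event may already have arrived during an earlier wait.
        if let Some(pos) = self.unmatched.iter().position(|e| pred(e)) {
            return Some(self.unmatched.remove(pos));
        }
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match self.rx.recv_timeout(remaining) {
                Ok(event) if pred(&event) => return Some(event),
                // Keep non-matching events so a later wait can still find them.
                Ok(event) => self.unmatched.push(event),
                // Timed out or channel disconnected.
                Err(_) => return None,
            }
        }
    }
}

#[test]
fn event_waiter_finds_events_regardless_of_arrival_order() {
    let (tx, rx) = std::sync::mpsc::channel();
    // Events arrive in the opposite order from the one the test waits in.
    tx.send(Event { type_name: "audio.delta".to_string() }).unwrap();
    tx.send(Event { type_name: "item.create".to_string() }).unwrap();

    let mut waiter = EventWaiter::new(rx);
    let timeout = Duration::from_secs(1);
    assert!(waiter.wait_for(timeout, |e| e.type_name == "item.create").is_some());
    assert!(waiter.wait_for(timeout, |e| e.type_name == "audio.delta").is_some());
}
```

The timeout is an upper bound for failure reporting, not a synchronization mechanism: a passing test returns as soon as the event arrives, and only a genuinely missing event pays the full wait.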