# Integration Test Reporting Skill

Run the Level 2 dummy agent integration test suite and produce a detailed HTML report with per-test input → outcome analysis.

## Trigger

User wants to run integration tests and see results:

- `/test-reporting`
- `/test-reporting test_component_queen_live.py`
- `/test-reporting --all`

## SOP: Running Tests

### Step 1: Select Scope

If the user provides a specific test file or pattern, use it. Otherwise run the full suite.

```bash
# Full suite
cd core && echo "1" | uv run python tests/dummy_agents/run_all.py --interactive 2>&1

# Specific file (requires manual provider setup)
cd core && uv run python -c "
import sys
sys.path.insert(0, '.')
from tests.dummy_agents.run_all import detect_available
from tests.dummy_agents.conftest import set_llm_selection

avail = detect_available()
claude = [p for p in avail if 'Claude Code' in p['name']]
if not claude:
    avail_names = [p['name'] for p in avail]
    raise RuntimeError(f'No Claude Code subscription. Available: {avail_names}')

provider = claude[0]
set_llm_selection(
    model=provider['model'],
    api_key=provider['api_key'],
    extra_headers=provider.get('extra_headers'),
    api_base=provider.get('api_base'),
)

import pytest
sys.exit(pytest.main([
    'tests/dummy_agents/TEST_FILE_HERE',
    '-v',
    '--override-ini=asyncio_mode=auto',
    '--no-header',
    '--tb=long',
    '--log-cli-level=WARNING',
    '--junitxml=/tmp/hive_test_results.xml',
]))
"
```

### Step 2: Collect Results

After the test run completes, collect:

1. **JUnit XML** from the `--junitxml` output (if available)
2. **stdout/stderr** from the run
3. **Summary table** from the `run_all.py` output (the Unicode table)

### Step 3: Generate HTML Report

Write the report to `/tmp/hive_integration_test_report.html`. The report MUST include these sections:

#### Header

- Run timestamp (ISO 8601)
- Provider used (model name, source)
- Total tests / passed / failed / skipped
- Total wall-clock time
- Overall verdict: PASS (all green) or FAIL (with count)

#### Per-Test Table

For EVERY test (not just failures), include a row with:

| Column | Description |
|--------|-------------|
| Component | Test file grouping (e.g., `component_queen_live`) |
| Test Name | Function name (e.g., `test_queen_starts_in_planning_without_worker`) |
| Status | PASS / FAIL / SKIP / ERROR with color badge |
| Duration | Wall-clock seconds |
| What | One-line description of what the test verifies |
| How | How it works (setup → action → assertion) |
| Why | Why this test matters (what bug/behavior it catches) |
| Input | The input data or configuration (graph spec, initial prompt, phase, etc.) |
| Expected Outcome | What the test asserts |
| Actual Outcome | What actually happened (PASS: matches expected / FAIL: actual vs expected) |
| Failure Detail | For failures only: full traceback + diagnosis |
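The Status, Duration, Actual Outcome, and Failure Detail columns can be filled directly from the JUnit XML collected in Step 2. A minimal sketch of loading that XML into per-test records, assuming the `--junitxml` path used in Step 1; the helper name and record keys are illustrative, not part of the suite:

```python
# Sketch: load the JUnit XML from Step 2 into per-test row records.
# Assumes the --junitxml path used in Step 1; the record keys are illustrative.
import xml.etree.ElementTree as ET

def load_junit_rows(path: str = "/tmp/hive_test_results.xml") -> list[dict]:
    rows = []
    for case in ET.parse(path).getroot().iter("testcase"):
        failure = case.find("failure")
        error = case.find("error")
        skipped = case.find("skipped")
        if failure is not None:
            status, detail = "FAIL", failure
        elif error is not None:
            status, detail = "ERROR", error
        elif skipped is not None:
            status, detail = "SKIP", skipped
        else:
            status, detail = "PASS", None
        rows.append({
            # classname looks like "tests.dummy_agents.test_component_queen_live"
            "component": case.get("classname", "").split(".")[-1].removeprefix("test_"),
            "test_name": case.get("name", ""),
            "status": status,
            "duration": float(case.get("time", "0")),
            "failure_detail": (detail.text or "") if detail is not None else "",
        })
    return rows
```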
#### What / How / Why Descriptions

These MUST be derived from the test function's docstring and code. Read each test file to extract:

- **What**: from the docstring's first line
- **How**: from the test body (what fixtures, what graph, what assertions)
- **Why**: from the docstring body or the "Why this matters" section in the test module

A docstring-extraction sketch follows the HTML template below.

Use these mappings for the component test files:

```
test_component_llm.py                 → "LLM Provider" — streaming, tool calling, tokens
test_component_tools.py               → "Tool Registry + MCP" — connection, execution
test_component_event_loop.py          → "EventLoopNode" — iteration, output, stall
test_component_edges.py               → "Edge Evaluation" — conditional, priority
test_component_conversation.py        → "Conversation Persistence" — storage, cursor
test_component_escalation.py          → "Escalation Flow" — worker→queen signaling
test_component_continuous.py          → "Continuous Mode" — conversation threading
test_component_queen.py               → "Queen Phase (Unit)" — phase state, tools, events
test_component_queen_live.py          → "Queen Phase (Live)" — real queen, real LLM
test_component_queen_state_machine.py → "Queen State Machine" — edge cases, races
test_component_worker_comms.py        → "Worker Communication" — events, data flow
test_component_strict_outcomes.py     → "Strict Outcomes" — exact path, output, quality
```

#### HTML Template

Use this structure:

```html

<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Hive Integration Test Report — {timestamp}</title>
<!-- Inline <style> only: the report must be self-contained (Key Rule 7).
     Color badges: green = pass, red = fail, amber = skip (Key Rule 4). -->
</head>
<body>

<h1>Hive Integration Test Report</h1>
<p>Generated: {timestamp} | Provider: {provider} | Duration: {duration}s</p>

<!-- Summary cards -->
<div class="summary">
  <div class="card"><span>Total</span><strong>{total}</strong></div>
  <div class="card"><span>Passed</span><strong>{passed}</strong></div>
  <div class="card"><span>Failed</span><strong>{failed}</strong></div>
  <div class="card"><span>Verdict</span><strong>{verdict}</strong></div>
</div>

<h2>Test Results</h2>
<table>
  <thead>
    <tr>
      <th>Component</th><th>Test</th><th>Status</th><th>Time</th>
      <th>What</th><th>Input → Expected → Actual</th>
    </tr>
  </thead>
  <tbody>
    <!-- One row per test, grouped by component -->
    <tr>
      <td>{component}</td>
      <td>{test_name}</td>
      <td>{status}</td>
      <td>{duration}s</td>
      <td>{what_description}</td>
      <td>Input: {input_description}<br>
          Expected: {expected_outcome}<br>
          Actual: {actual_outcome}</td>
    </tr>
    <!-- Failures only: add a full-width row with the scrollable traceback -->
    <tr><td colspan="6"><pre>{failure_traceback}</pre></td></tr>
  </tbody>
</table>

<h2>Failure Analysis</h2>
<p>For each failure, provide:</p>
<!-- one block per failure: expected vs actual, traceback, diagnosis -->

</body>
</html>
```
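As noted in the What / How / Why section above, the table text must come from the test source, not guesswork. A minimal sketch of pulling each test function's docstring with the standard `ast` module; treating the first line as the What and the remainder as Why material is a simplifying assumption about the suite's docstring conventions:

```python
# Sketch: collect docstring material for the What/How/Why columns.
# Assumes tests are (async) functions named test_*; splitting the docstring into
# a first line (What) and body (Why) is an assumption about docstring conventions.
import ast
from pathlib import Path

def extract_docstrings(test_file: str) -> dict[str, dict[str, str]]:
    tree = ast.parse(Path(test_file).read_text())
    out = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_"):
            doc = ast.get_docstring(node) or ""
            first, _, rest = doc.partition("\n")
            out[node.name] = {"what": first.strip(), "why": rest.strip()}
    return out

# Example: extract_docstrings("tests/dummy_agents/test_component_queen_live.py")
```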

### Step 4: Output

1. Write the HTML file to `/tmp/hive_integration_test_report.html`
2. Print the file path so the user can open it
3. Print a concise summary to the terminal:

```
Test Report: /tmp/hive_integration_test_report.html
Result: 74/76 PASSED (2 failures)
Failures:
  - parallel_merge::test_parallel_disjoint_output_keys
  - worker::test_worker_timestamped_note_artifact
```
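A minimal sketch of this output step, reusing the row records from the Step 2 sketch; the HTML string is assumed to have been rendered already:

```python
# Sketch: write the report file and print the terminal summary (Step 4).
# `rows` follows the record shape from the JUnit-parsing sketch above;
# `html` is the already-rendered report string.
from pathlib import Path

def write_and_summarize(rows: list[dict], html: str,
                        path: str = "/tmp/hive_integration_test_report.html") -> None:
    Path(path).write_text(html)
    failures = [r for r in rows if r["status"] in ("FAIL", "ERROR")]
    passed = sum(1 for r in rows if r["status"] == "PASS")
    print(f"Test Report: {path}")
    print(f"Result: {passed}/{len(rows)} PASSED ({len(failures)} failures)")
    if failures:
        print("Failures:")
        for r in failures:
            print(f"  - {r['component']}::{r['test_name']}")
```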
## Key Rules

1. ALWAYS use `--junitxml` when running pytest to get structured results
2. ALWAYS read the test source files to populate the What/How/Why columns — do not guess
3. For Input/Expected/Actual, extract from the test's graph spec, assertions, and result
4. Color-code everything: green for pass, red for fail, amber for skip
5. Include the full traceback for failures in a scrollable `<pre>` block
6. Group tests by component (file name) with a visual separator
7. The report must be self-contained HTML (no external CSS/JS dependencies)
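A small illustrative helper for rules 4 and 6; the names and row shape are assumptions, and a hex value stands in for "amber" since that is not a CSS named color:

```python
# Sketch: status badge colors (Rule 4) and grouping rows by component (Rule 6).
# Helper names and the row record shape are illustrative assumptions.
from itertools import groupby

BADGE_COLORS = {"PASS": "#0a0", "FAIL": "#c00", "ERROR": "#c00", "SKIP": "#d97706"}  # green / red / amber

def badge(status: str) -> str:
    color = BADGE_COLORS.get(status, "#666")
    return f'<span style="color:{color};font-weight:bold">{status}</span>'

def by_component(rows: list[dict]):
    # groupby needs its input sorted by the grouping key
    rows = sorted(rows, key=lambda r: r["component"])
    for component, group in groupby(rows, key=lambda r: r["component"]):
        yield component, list(group)
```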