--- name: when-validating-code-works-use-functionality-audit type: testing-quality description: | Validates that code actually works through sandbox testing, execution verification, and systematic debugging. Use this skill after code generation or modification to ensure functionality is genuine rather than assumed. The skill creates isolated test environments, executes code with realistic inputs, identifies bugs through systematic analysis, and applies best practices to fix issues without breaking existing functionality. agents: - tester - coder - reviewer - production-validator phases: 5 memory_pattern: "testing/functionality-audit/phase-{N}/{agent}/{deliverable}" scripts: - pre-task - session-restore - phase-coordination - memory-sync - post-task hooks: pre: npx claude-flow hooks pre-task --description "Functionality audit validation" post: npx claude-flow hooks post-task --task-id "functionality-audit" --- # Functionality Audit - Code Execution Validation ## When to Use This Skill **Trigger Conditions:** - After generating new code or modifying existing code - When code appears complete but actual functionality is uncertain - Before merging PRs or deploying to production - When debugging reported issues or unexpected behavior - As part of quality assurance workflows - When validating third-party code integrations **Situations Requiring Functionality Audit:** - Code generated by AI that needs execution validation - Complex logic changes requiring runtime verification - Integration of new libraries or dependencies - Refactoring that may have introduced regressions - Migration to new frameworks or language versions ## Overview This skill systematically validates that code delivers its intended behavior through actual execution rather than static analysis alone. It creates isolated testing environments (sandboxes), executes code with realistic inputs, captures outputs and errors, identifies root causes of failures through systematic debugging, and applies fixes using best practices that preserve existing functionality. The skill emphasizes **genuine functionality over appearance**, detecting "theater code" that looks correct but fails during execution. It combines automated testing, manual validation, and debugging expertise to ensure code reliability. ## Phase 1: Setup Testing Environment (Sequential) **Agents**: tester (lead), coder (support) **Duration**: 10-15 minutes **Scripts**: ```bash # Initialize phase npx claude-flow hooks pre-task --description "Phase 1: Setup Testing Environment" npx claude-flow swarm init --topology hierarchical --max-agents 2 # Spawn agents npx claude-flow agent spawn --type tester --capabilities "sandbox-setup,environment-config,dependency-management" npx claude-flow agent spawn --type coder --capabilities "tooling-setup,script-generation" # Memory coordination - store environment config npx claude-flow memory store --key "testing/functionality-audit/phase-1/tester/sandbox-config" --value '{"isolated":true,"snapshot_enabled":true}' npx claude-flow memory store --key "testing/functionality-audit/phase-1/coder/dependencies" --value '{"package_manager":"npm","install_command":"npm install"}' # Execute phase work # 1. Create isolated sandbox environment echo "Creating sandbox for code execution..." mkdir -p /tmp/functionality-audit-sandbox cd /tmp/functionality-audit-sandbox # 2. Install dependencies and setup tools npm init -y 2>/dev/null || true npm install --save-dev jest @types/jest ts-node typescript 2>/dev/null || true # 3. Configure testing framework cat > jest.config.js << 'EOF' module.exports = { preset: 'ts-jest', testEnvironment: 'node', collectCoverage: true, coverageDirectory: 'coverage', testMatch: ['**/*.test.ts', '**/*.test.js'], verbose: true }; EOF # Complete phase npx claude-flow hooks post-task --task-id "phase-1-setup" npx claude-flow memory store --key "testing/functionality-audit/phase-1/output" --value '{"status":"complete","sandbox_ready":true}' ``` **Memory Pattern**: - Input: `testing/functionality-audit/phase-0/user/code-to-validate` - Output: `testing/functionality-audit/phase-1/tester/sandbox-ready` - Shared: `testing/functionality-audit/shared/environment-config` **Success Criteria**: - [ ] Isolated sandbox environment created successfully - [ ] All required dependencies installed without errors - [ ] Testing framework configured and operational - [ ] Environment snapshot created for rollback capability - [ ] Sandbox validation completed (basic sanity checks pass) **Deliverables**: - Configured sandbox environment - Dependency manifest (package.json or requirements.txt) - Testing framework configuration files - Environment setup documentation ## Phase 2: Execute Code with Realistic Inputs (Parallel) **Agents**: tester (lead), production-validator (support) **Duration**: 15-20 minutes **Scripts**: ```bash # Initialize phase npx claude-flow hooks pre-task --description "Phase 2: Execute Code with Realistic Inputs" npx claude-flow swarm scale --target-agents 2 # Retrieve sandbox config from Phase 1 npx claude-flow memory retrieve --key "testing/functionality-audit/phase-1/output" # Memory coordination - define test scenarios npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/test-scenarios" --value '{"unit_tests":true,"integration_tests":true,"edge_cases":true}' # Execute phase work # 1. Copy code to sandbox echo "Copying code to sandbox environment..." # (Code paths retrieved from memory) # 2. Generate test cases with realistic inputs cat > sandbox-tests.test.js << 'EOF' // Auto-generated test cases for functionality validation describe('Functionality Audit Tests', () => { test('Happy path with valid inputs', async () => { // Test implementation with realistic data }); test('Edge cases and boundary conditions', async () => { // Test edge cases }); test('Error handling with invalid inputs', async () => { // Test error scenarios }); }); EOF # 3. Execute tests and capture output npm test -- --coverage --verbose > test-output.log 2>&1 TEST_EXIT_CODE=$? # 4. Store results in memory npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/execution-results" --value "{\"exit_code\":$TEST_EXIT_CODE,\"timestamp\":\"$(date -Iseconds)\"}" # Complete phase npx claude-flow hooks post-task --task-id "phase-2-execute" ``` **Memory Pattern**: - Input: `testing/functionality-audit/phase-1/tester/sandbox-ready` - Output: `testing/functionality-audit/phase-2/tester/execution-results` - Shared: `testing/functionality-audit/shared/test-logs` **Success Criteria**: - [ ] All test cases executed without environment errors - [ ] Output captured completely (stdout, stderr, exit codes) - [ ] Code coverage metrics collected successfully - [ ] Performance metrics recorded (execution time, memory usage) - [ ] Results stored in structured format for analysis **Deliverables**: - Test execution logs with full output - Code coverage reports - Performance metrics - Captured errors and stack traces - Test results summary ## Phase 3: Debug Issues (Sequential) **Agents**: coder (lead), tester (support), reviewer (validation) **Duration**: 20-30 minutes **Scripts**: ```bash # Initialize phase npx claude-flow hooks pre-task --description "Phase 3: Debug Issues" npx claude-flow swarm scale --target-agents 3 # Retrieve execution results from Phase 2 npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results" # Memory coordination - analyze failures npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/debugging-strategy" --value '{"method":"systematic-root-cause","tools":["debugger","logs","profiler"]}' # Execute phase work # 1. Analyze test failures systematically echo "Analyzing failures and errors..." grep -E "(FAIL|ERROR|Exception)" test-output.log > failures.txt || true # 2. Identify root causes using debugging techniques # - Stack trace analysis # - Variable inspection # - Logic flow validation # - Dependency conflict detection # 3. Categorize issues by type cat > issue-analysis.json << 'EOF' { "syntax_errors": [], "runtime_errors": [], "logic_errors": [], "integration_failures": [], "dependency_issues": [] } EOF # 4. Prioritize fixes by impact # Critical: Code doesn't run at all # High: Core functionality broken # Medium: Edge cases failing # Low: Minor issues, optimizations # Store debugging results npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/root-causes" --value "$(cat issue-analysis.json)" # Complete phase npx claude-flow hooks post-task --task-id "phase-3-debug" ``` **Memory Pattern**: - Input: `testing/functionality-audit/phase-2/tester/execution-results` - Output: `testing/functionality-audit/phase-3/coder/root-causes` - Shared: `testing/functionality-audit/shared/issue-tracker` **Success Criteria**: - [ ] All failures categorized by root cause type - [ ] Stack traces analyzed and understood completely - [ ] Dependencies validated and conflicts identified - [ ] Logic errors traced to specific code locations - [ ] Fix strategy documented with priorities **Deliverables**: - Root cause analysis report - Issue categorization matrix - Priority-ranked fix list - Debugging session logs - Dependency conflict resolution plan ## Phase 4: Validate Functionality (Parallel) **Agents**: production-validator (lead), reviewer (quality gate), tester (regression) **Duration**: 15-25 minutes **Scripts**: ```bash # Initialize phase npx claude-flow hooks pre-task --description "Phase 4: Validate Functionality" npx claude-flow swarm scale --target-agents 3 # Retrieve fixes from Phase 3 npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes" # Memory coordination - define validation criteria npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/criteria" --value '{"functional_correctness":true,"performance_acceptable":true,"no_regressions":true}' # Execute phase work # 1. Apply fixes and rerun tests echo "Applying fixes and validating..." npm test -- --coverage --verbose > retest-output.log 2>&1 RETEST_EXIT_CODE=$? # 2. Compare results with baseline diff test-output.log retest-output.log > validation-diff.txt || true # 3. Validate no regressions introduced # - Check previously passing tests still pass # - Verify no new failures introduced # - Confirm fixes resolved original issues # 4. Production readiness assessment cat > production-readiness.json << 'EOF' { "all_tests_passing": false, "coverage_threshold_met": false, "performance_acceptable": false, "security_validated": false, "ready_for_deployment": false } EOF # Update readiness based on results if [ $RETEST_EXIT_CODE -eq 0 ]; then echo "Tests passing, updating readiness..." fi # Store validation results npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/readiness" --value "$(cat production-readiness.json)" # Complete phase npx claude-flow hooks post-task --task-id "phase-4-validate" ``` **Memory Pattern**: - Input: `testing/functionality-audit/phase-3/coder/root-causes` - Output: `testing/functionality-audit/phase-4/validator/readiness` - Shared: `testing/functionality-audit/shared/validation-results` **Success Criteria**: - [ ] All critical tests passing without failures - [ ] Code coverage meets or exceeds threshold (80%+) - [ ] No regressions detected in previously working code - [ ] Performance metrics within acceptable ranges - [ ] Production readiness criteria satisfied **Deliverables**: - Validation test results - Regression analysis report - Production readiness assessment - Performance comparison metrics - Quality gate approval status ## Phase 5: Report Results (Sequential) **Agents**: reviewer (lead), tester (metrics) **Duration**: 10-15 minutes **Scripts**: ```bash # Initialize phase npx claude-flow hooks pre-task --description "Phase 5: Report Results" npx claude-flow swarm scale --target-agents 2 # Retrieve all phase outputs npx claude-flow memory retrieve --key "testing/functionality-audit/phase-4/validator/readiness" npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes" npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results" # Execute phase work # 1. Compile comprehensive audit report cat > functionality-audit-report.md << 'EOF' # Functionality Audit Report ## Executive Summary - **Status**: [PASS/FAIL/PARTIAL] - **Tests Executed**: X - **Tests Passed**: Y - **Code Coverage**: Z% - **Critical Issues**: N ## Phase-by-Phase Results ### Phase 1: Environment Setup - Status: Complete - Issues: None ### Phase 2: Execution Testing - Total Tests: X - Passed: Y - Failed: Z - Coverage: N% ### Phase 3: Debugging - Issues Identified: X - Root Causes: Y categories - Fixes Applied: Z ### Phase 4: Validation - Regression Status: No regressions - Production Ready: Yes/No - Performance: Acceptable ## Recommendations 1. [Action items for addressing remaining issues] 2. [Performance optimization suggestions] 3. [Code quality improvements] ## Artifacts - Test logs: test-output.log - Coverage report: coverage/ - Issue analysis: issue-analysis.json - Validation results: production-readiness.json EOF # 2. Generate metrics dashboard cat > audit-metrics.json << 'EOF' { "timestamp": "", "duration_minutes": 0, "tests_total": 0, "tests_passed": 0, "coverage_percentage": 0, "issues_found": 0, "issues_fixed": 0, "production_ready": false } EOF # 3. Store final results npx claude-flow memory store --key "testing/functionality-audit/final-report" --value "$(cat functionality-audit-report.md)" # Complete phase and export metrics npx claude-flow hooks post-task --task-id "phase-5-report" --export-metrics true npx claude-flow hooks session-end --export-metrics true ``` **Memory Pattern**: - Input: `testing/functionality-audit/phase-{1-4}/*/output` - Output: `testing/functionality-audit/final-report` - Shared: `testing/functionality-audit/shared/complete-audit` **Success Criteria**: - [ ] Comprehensive audit report generated with all findings - [ ] Metrics compiled and formatted for stakeholder consumption - [ ] Actionable recommendations documented clearly - [ ] All artifacts linked and accessible - [ ] Results stored in memory for future reference **Deliverables**: - Comprehensive functionality audit report (Markdown) - Metrics dashboard (JSON) - Recommendations document - Complete test artifacts package - Memory-persisted results for trend analysis ## Memory Coordination ### Namespace Convention `testing/functionality-audit/phase-{N}/{agent-type}/{data-type}` **Examples:** - `testing/functionality-audit/phase-1/tester/sandbox-config` - `testing/functionality-audit/phase-2/tester/execution-results` - `testing/functionality-audit/phase-3/coder/root-causes` - `testing/functionality-audit/phase-4/validator/readiness` - `testing/functionality-audit/shared/environment-config` ### Cross-Agent Sharing ```javascript // Store outputs for downstream agents mcp__claude_flow__memory_store({ key: "testing/functionality-audit/phase-2/tester/output", value: JSON.stringify({ status: "complete", results: { tests_run: 45, tests_passed: 38, tests_failed: 7, coverage: 82.5, execution_time_ms: 3420 }, timestamp: Date.now(), artifacts: { logs: "/tmp/functionality-audit-sandbox/test-output.log", coverage: "/tmp/functionality-audit-sandbox/coverage/" } }) }) // Retrieve inputs from upstream agents mcp__claude_flow__memory_retrieve({ key: "testing/functionality-audit/phase-1/tester/sandbox-ready" }).then(data => { const sandboxConfig = JSON.parse(data); console.log(`Using sandbox: ${sandboxConfig.path}`); }) ``` ## Automation Scripts ### Hook Integration ```bash # Pre-task hook - Initialize audit session npx claude-flow hooks pre-task \ --description "Functionality audit for module: auth-service" \ --session-id "functionality-audit-$(date +%s)" # Post-edit hook - Track code changes during debugging npx claude-flow hooks post-edit \ --file "src/auth/login.ts" \ --memory-key "testing/functionality-audit/coder/edits" # Post-task hook - Finalize and export metrics npx claude-flow hooks post-task \ --task-id "functionality-audit" \ --export-metrics true # Session end - Generate summary and persist state npx claude-flow hooks session-end \ --export-metrics true \ --summary "Functionality audit completed: 38/45 tests passing, 7 issues debugged and fixed" ``` ## Evidence-Based Validation ### Self-Consistency Checking Before finalizing each phase, validate: 1. **Does this approach align with successful past work?** - Compare testing strategy with historical successful audits - Validate debugging methods match proven techniques - Confirm fix patterns follow established best practices 2. **Do the outputs support the stated objectives?** - Verify test results directly address functionality concerns - Ensure debugging identified actual root causes, not symptoms - Confirm fixes resolve issues without introducing regressions 3. **Is the chosen method appropriate for the context?** - Match testing depth to code criticality (production vs. prototype) - Scale debugging effort based on issue severity - Apply fix complexity proportional to problem scope 4. **Are there any internal contradictions?** - Check that passing tests align with manual validation - Verify coverage metrics match actual test execution - Confirm reported fixes actually appear in code changes ### Program-of-Thought Decomposition 1. **Define objective precisely** - Objective: Validate that code executes correctly with realistic inputs - Success metric: 100% of critical functionality tests pass - Constraint: No regressions introduced during debugging 2. **Decompose into sub-goals** - Sub-goal 1: Create isolated, reproducible test environment - Sub-goal 2: Execute code with comprehensive test coverage - Sub-goal 3: Identify and categorize all failures systematically - Sub-goal 4: Apply fixes using best practices - Sub-goal 5: Validate fixes and assess production readiness 3. **Identify dependencies** - Environment setup must complete before execution - Execution must complete before debugging - Debugging must complete before fix validation - Validation must complete before reporting 4. **Evaluate options** - Testing frameworks: Jest vs. Mocha vs. Vitest (choose based on project stack) - Debugging approaches: Interactive debugger vs. log analysis vs. profiling - Fix strategies: Minimal changes vs. comprehensive refactoring 5. **Synthesize solution** - Integrate testing framework with existing CI/CD - Combine automated testing with manual validation - Apply fixes incrementally with continuous validation ### Plan-and-Solve Framework **Planning Phase:** - Analyze code scope and criticality level - Define test coverage requirements (unit, integration, e2e) - Identify realistic input scenarios and edge cases - Establish success criteria and quality gates - Allocate time and resources per phase **Validation Gate 1:** Review strategy against objectives - Does test plan cover all critical functionality? - Are edge cases and error scenarios included? - Is debugging approach systematic and thorough? - Are fix criteria clear and measurable? **Implementation Phase:** - Execute Phases 1-4 with continuous monitoring - Track progress against success criteria - Adapt approach based on findings - Document issues and resolutions in real-time **Validation Gate 2:** Verify outputs and performance - All critical tests passing (100% of P0, 90%+ of P1) - Code coverage exceeds threshold (80%+ line, 70%+ branch) - No regressions introduced (100% of previously passing tests still pass) - Performance within acceptable ranges (no degradation >10%) **Optimization Phase:** - Refine tests for better coverage - Optimize debugging workflow for efficiency - Improve fix quality and code maintainability - Enhance reporting clarity and actionability **Validation Gate 3:** Confirm targets met before concluding - Production readiness assessment complete - All stakeholders informed of results - Artifacts stored and accessible - Lessons learned documented for future audits ## Integration with Other Skills **Related Skills:** - `when-detecting-fake-code-use-theater-detection` - Pre-audit detection of non-functional code - `when-reviewing-code-comprehensively-use-code-review-assistant` - Post-audit quality review - `when-verifying-quality-use-verification-quality` - Comprehensive quality validation - `when-auditing-code-style-use-style-audit` - Code style and conventions validation **Coordination Points:** - Theater detection can run before functionality audit to identify obviously broken code - Code review can follow functionality audit to assess code quality beyond functionality - Quality verification provides broader validation context for audit results - Style audit ensures code meets conventions after functional fixes **Memory Sharing:** ```bash # Share audit results with code review skill npx claude-flow memory store \ --key "code-review/input/functionality-audit-results" \ --value "$(cat functionality-audit-report.md)" # Retrieve theater detection findings npx claude-flow memory retrieve \ --key "theater-detection/output/suspicious-code" ``` ## Common Issues and Solutions ### Issue 1: Sandbox Environment Fails to Initialize **Symptoms:** Dependency installation errors, configuration failures **Root Cause:** Missing system dependencies, incompatible versions **Solution:** ```bash # Validate system dependencies node --version npm --version # Clean install rm -rf node_modules package-lock.json npm install --force # Use Docker for isolation if local setup problematic docker run -it --rm -v $(pwd):/workspace node:18 bash ``` ### Issue 2: Tests Fail Due to Missing Environment Variables **Symptoms:** Runtime errors referencing undefined config **Root Cause:** Environment variables not set in sandbox **Solution:** ```bash # Create .env file with required variables cat > .env << 'EOF' NODE_ENV=test API_KEY=test-key-12345 DATABASE_URL=sqlite::memory: EOF # Load environment in test setup require('dotenv').config(); ``` ### Issue 3: Code Coverage Lower Than Expected **Symptoms:** Coverage report shows gaps in tested code **Root Cause:** Missing test cases for edge cases, error paths **Solution:** - Review coverage report to identify untested lines - Add test cases for error handling paths - Test edge cases and boundary conditions - Mock external dependencies to isolate code under test ### Issue 4: Debugging Reveals Complex Interdependencies **Symptoms:** Fixing one issue breaks other functionality **Root Cause:** Tight coupling, shared state, hidden dependencies **Solution:** - Use dependency injection to decouple components - Refactor shared state into explicit parameters - Add integration tests to catch cascading failures - Document dependencies clearly in code comments ### Issue 5: Performance Degradation After Fixes **Symptoms:** Tests pass but execution time significantly increased **Root Cause:** Inefficient fix implementation, excessive validation **Solution:** ```bash # Profile execution to identify bottlenecks node --prof app.js node --prof-process isolate-*.log > profile.txt # Optimize hot paths # - Cache repeated computations # - Reduce unnecessary iterations # - Use appropriate data structures ``` ## Examples ### Example 1: Validating API Endpoint Functionality ```bash # Initialize audit for API endpoint npx claude-flow hooks pre-task --description "Audit /api/users endpoint" # Phase 1: Setup mkdir -p /tmp/api-audit cd /tmp/api-audit npm init -y npm install --save-dev jest supertest # Phase 2: Execute tests cat > users.test.js << 'EOF' const request = require('supertest'); const app = require('../src/app'); describe('GET /api/users', () => { test('returns 200 with user list', async () => { const response = await request(app).get('/api/users'); expect(response.statusCode).toBe(200); expect(Array.isArray(response.body)).toBe(true); }); test('handles authentication', async () => { const response = await request(app) .get('/api/users') .set('Authorization', 'Bearer invalid-token'); expect(response.statusCode).toBe(401); }); }); EOF npm test # Phase 3: Debug failures (if any) # Analyze error messages and stack traces # Fix authentication logic or data access issues # Phase 4: Validate fixes npm test -- --coverage # Phase 5: Report echo "API audit complete: All endpoints functional" ``` ### Example 2: Validating Data Processing Function ```javascript // Phase 2: Create test with realistic data const { processUserData } = require('./data-processor'); describe('processUserData', () => { test('processes valid user data correctly', () => { const input = { name: 'John Doe', email: 'john@example.com', age: 30, roles: ['user', 'admin'] }; const result = processUserData(input); expect(result).toHaveProperty('id'); expect(result.email).toBe('john@example.com'); expect(result.roles).toContain('admin'); }); test('handles missing fields gracefully', () => { const input = { name: 'Jane' }; const result = processUserData(input); expect(result).toHaveProperty('email'); expect(result.email).toBe(''); // Default value }); test('validates email format', () => { const input = { email: 'invalid-email' }; expect(() => processUserData(input)).toThrow('Invalid email format'); }); }); // Phase 3: Debug - Found that email validation regex was incorrect // Fix: Update regex to properly match email patterns // Phase 4: Validate - All tests pass, coverage 95% ``` ### Example 3: Validating Frontend Component ```javascript // Phase 2: Create component test with user interactions import { render, screen, fireEvent } from '@testing-library/react'; import LoginForm from './LoginForm'; describe('LoginForm', () => { test('submits form with valid credentials', async () => { const handleSubmit = jest.fn(); render(); fireEvent.change(screen.getByLabelText(/email/i), { target: { value: 'test@example.com' } }); fireEvent.change(screen.getByLabelText(/password/i), { target: { value: 'password123' } }); fireEvent.click(screen.getByRole('button', { name: /login/i })); await screen.findByText(/logging in/i); expect(handleSubmit).toHaveBeenCalledWith({ email: 'test@example.com', password: 'password123' }); }); test('displays validation errors', () => { render(); fireEvent.click(screen.getByRole('button', { name: /login/i })); expect(screen.getByText(/email is required/i)).toBeInTheDocument(); expect(screen.getByText(/password is required/i)).toBeInTheDocument(); }); }); // Phase 3: Debug - Found that form validation wasn't triggering // Fix: Add form validation logic with proper error state management // Phase 4: Validate - Component renders and validates correctly ```