---
name: evals
description: "Evaluate job-search pipeline code and outputs for correctness. Use when running code reviews, verifying pipeline outputs after a run, or checking ongoing pipeline health. Handles static analysis, runtime verification, and health monitoring."
---

# Evals

Independent evaluation suite for the job-search pipeline. Lives outside the pipeline it evaluates.

## Evaluation Levels

| Level | Script | When to Run |
|-------|--------|-------------|
| 1. Code Review | `scripts/code_review.py` | After building or changing pipeline code |
| 2. Runtime Verify | `scripts/runtime_verify.py` | After a pipeline run completes |
| 3. Health Monitor | `scripts/health_monitor.py` | Recurring: every pipeline run or 2-3x/week |

## Process

1. Run the relevant eval script for your situation
2. Review the report output
3. Fix issues (or flag for user approval if touching shared data)
4. Re-run the eval to verify fixes

## Usage

```bash
# Level 1: Static code review of job-search pipeline
python3 scripts/code_review.py

# Level 2: Validate outputs after a pipeline run
python3 scripts/runtime_verify.py

# Level 3: Ongoing health monitoring
python3 scripts/health_monitor.py
python3 scripts/health_monitor.py --json
```

## Scope

- Evaluates: `career-manager/job-search/` (scripts, data, caches)
- Does NOT evaluate: itself, other career-manager skills, or non-pipeline code
- References: `workflow-standards/references/evaluation.md` for methodology

## Output Format

Each script prints a check-by-check report with pass/warn/fail status and returns a non-zero exit code if critical checks fail.
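
The report-and-exit-code contract described here can be illustrated with a minimal sketch. This is not the code in the eval scripts: `Check`, `render_report`, and `exit_code` are hypothetical names, and the `critical` flag is an assumed way of marking which failures should produce a non-zero exit code.

```python
from dataclasses import dataclass

# Hypothetical sketch: none of these names come from the eval scripts.


@dataclass
class Check:
    name: str
    status: str             # one of "pass", "warn", "fail"
    critical: bool = False  # assumption: only critical failures affect the exit code
    detail: str = ""


def render_report(checks):
    """Return one line per check, followed by a summary line."""
    lines = []
    for c in checks:
        suffix = f" ({c.detail})" if c.detail else ""
        lines.append(f"[{c.status.upper()}] {c.name}{suffix}")
    counts = {
        status: sum(1 for c in checks if c.status == status)
        for status in ("pass", "warn", "fail")
    }
    lines.append(
        f"{counts['pass']} passed, {counts['warn']} warnings, {counts['fail']} failed"
    )
    return "\n".join(lines)


def exit_code(checks):
    """Return 1 if any critical check failed, otherwise 0."""
    return 1 if any(c.critical and c.status == "fail" for c in checks) else 0
```

Under these assumptions, a script would finish by printing `render_report(checks)` and calling `sys.exit(exit_code(checks))`, so a caller (or step 4 of the process) can rely on the exit status alone to tell whether a critical check is still failing.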