--- namespace: aiwg name: regression-auto-baseline platforms: [all] description: Automatically manage regression test baseline lifecycle triggered by releases, deployments, and quality gates --- # regression-auto-baseline Automatically manage regression test baseline lifecycle with triggers for releases, deployments, and quality gates. ## Triggers Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description): - "auto-update baselines" → automatic baseline management - "baseline on release" → release-triggered baseline update ## Purpose This skill automates regression baseline management by: - Triggering baseline updates on semantic version tags - Updating baselines after successful merges to main/master - Creating baselines on deployment success - Scheduling periodic baseline refreshes - Managing baseline retention and archival - Validating baseline quality before activation - Integrating with CI/CD for seamless automation ## Behavior When triggered or activated automatically, this skill: 1. **Detects trigger conditions**: - Semantic version tag created (v*.*.*) - Merge to protected branch completed - Deployment succeeded to environment - Quality gate passed - Manual approval received - Scheduled time reached 2. **Validates pre-conditions**: - Tests are passing (minimum threshold met) - No critical issues open - Quality gates passed - Human approval obtained (if required) - No concurrent baseline updates in progress 3. **Captures new baseline**: - Run full regression test suite - Capture all artifact types (functional, visual, performance, API) - Verify baseline stability (run multiple times if configured) - Generate checksums for integrity - Tag with git commit, release version, timestamp 4. **Validates baseline quality**: - Compare to previous baseline - Detect unexpected changes - Verify all expected outputs present - Check for baseline drift - Flag suspicious differences for review 5. **Activates baseline**: - Archive previous baseline version - Set new baseline as active - Update baseline manifest - Notify relevant teams - Create baseline comparison report 6. **Manages lifecycle**: - Archive old baselines per retention policy - Compress large baselines - Upload to cloud storage if configured - Prune outdated baselines - Maintain audit trail ## Trigger Types ### Semantic Version Tag ```yaml trigger: type: git_tag pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$" description: "Baseline on semantic version tags" behavior: on_tag_create: - validate_tag_format - check_if_tests_passing - create_baseline: name: "release-{tag}" auto_activate: true - tag_baseline_with_git_tag example: tag: v2.1.0 baseline_id: release-v2.1.0 status: auto-created ``` ### Merge to Main ```yaml trigger: type: git_merge branches: - main - master description: "Baseline after merge to protected branch" behavior: on_merge_complete: - wait_for_ci_success - check_test_pass_rate: minimum: 95% - create_baseline: name: "main-{commit_short}" auto_activate: false # Require approval - request_approval: approvers: [tech-lead, test-lead] example: merge_commit: abc123def baseline_id: main-abc123d approval_required: true ``` ### Deployment Success ```yaml trigger: type: deployment environments: - production - staging description: "Baseline after successful deployment" behavior: on_deployment_success: - wait_for_smoke_tests - verify_health_checks - create_baseline: name: "{environment}-{deploy_id}" auto_activate: true - link_to_deployment: id: deploy_id timestamp: deployment_timestamp example: environment: production deploy_id: prod-2026-01-28-01 baseline_id: production-prod-2026-01-28-01 ``` ### Manual Approval ```yaml trigger: type: manual description: "Baseline after explicit human approval" behavior: on_approval_received: - validate_approver_permissions - capture_approval_metadata - create_baseline: name: "approved-{timestamp}" auto_activate: true - record_approver: name: approver_name reason: approval_reason example: approver: tech-lead reason: "Post-release validation complete" baseline_id: approved-2026-01-28T15-30 ``` ### Scheduled Updates ```yaml trigger: type: schedule description: "Baseline on fixed schedule" schedules: nightly: cron: "0 2 * * *" description: "Nightly baseline capture" auto_activate: false weekly: cron: "0 3 * * 0" description: "Weekly baseline for long-term tracking" auto_activate: false monthly: cron: "0 4 1 * *" description: "Monthly baseline archive" auto_activate: false behavior: on_schedule_trigger: - check_if_tests_healthy - create_baseline: name: "{schedule}-{date}" auto_activate: false - generate_drift_report ``` ## Baseline Storage Strategies ### Local File Storage ```yaml storage: type: local location: .aiwg/testing/baselines/ structure: - functional/ - visual/ - performance/ - api/ retention: active: 1 # Keep 1 active baseline per type archive: 10 # Keep 10 archived versions compress: true # Compress archives with gzip cleanup: prune_after_days: 90 keep_tagged: true # Never prune tagged baselines ``` ### Git LFS ```yaml storage: type: git_lfs description: "Use Git LFS for large baselines (screenshots, large JSON)" configuration: track_patterns: - "*.png" - "*.jpg" - "baselines/**/*.json" max_file_size: 10MB behavior: on_baseline_create: - compress_if_needed - add_to_git_lfs - commit_with_metadata ``` ### Cloud Storage (S3/GCS) ```yaml storage: type: cloud provider: s3 # or gcs, azure_blob configuration: bucket: my-project-baselines path_prefix: baselines/{environment}/ encryption: AES256 behavior: on_baseline_create: - upload_to_cloud - generate_signed_url: expiry: 7 days - store_url_in_manifest - keep_local_copy: false # Optional: delete after upload retention: lifecycle_policy: - after_30_days: transition_to_infrequent_access - after_90_days: transition_to_glacier - after_365_days: delete ``` ### Artifact Registry ```yaml storage: type: artifact_registry description: "Integrate with artifact registries (npm, Maven, Docker)" configuration: registry_url: https://registry.example.com repository: baseline-artifacts versioning: semver # Follow package versioning behavior: on_baseline_create: - package_baseline: format: tarball name: "{project}-baseline" version: "{tag_version}" - publish_to_registry - tag_as_latest: true ``` ## Baseline Validation ### Pre-Activation Checks ```yaml validation: before_activation: - test_pass_rate: minimum: 95% description: "At least 95% of tests must pass" - quality_gates: all_must_pass: true description: "All quality gates must be green" - stability_check: runs: 3 consistency: 100% description: "Baseline must be 100% consistent across 3 runs" - critical_issues: max_open: 0 description: "No critical issues can be open" - human_approval: required_for: - production_baselines - breaking_changes approvers: [tech-lead, qa-lead] ``` ### Baseline Diff Detection ```yaml validation: diff_detection: compare_to_previous: true flag_changes: - new_failures - performance_degradation - unexpected_schema_changes - visual_differences severity_classification: critical: - new_test_failures - security_regressions high: - performance_degradation_gt_20_percent - breaking_api_changes medium: - new_warnings - minor_visual_differences low: - timestamp_changes - non-functional_metadata action_on_critical: - block_activation - notify_stakeholders - require_manual_review ``` ## Environment-Specific Baselines ```yaml environments: development: auto_baseline: true trigger: merge_to_develop retention: 5 approval_required: false staging: auto_baseline: true trigger: deployment_success retention: 10 approval_required: true approvers: [qa-lead] production: auto_baseline: true trigger: semantic_version_tag retention: 20 approval_required: true approvers: [tech-lead, release-manager] validation_level: strict ``` ## Selective Baseline Updates ```yaml selective_updates: description: "Update only specific baseline types" update_policies: functional: trigger: every_merge auto_activate: true visual: trigger: design_system_update auto_activate: false require_approval: true performance: trigger: weekly auto_activate: false threshold_change: 10% # Only update if >10% improvement api: trigger: version_tag auto_activate: true breaking_changes_require_approval: true ``` ## Integration with CI/CD ### GitHub Actions Example ```yaml # .github/workflows/auto-baseline.yml name: Auto-Baseline Management on: push: tags: - 'v*.*.*' workflow_dispatch: inputs: baseline_type: description: 'Baseline type to update' required: true type: choice options: - all - functional - visual - performance - api jobs: auto_baseline: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Run tests and capture baseline run: | npm test -- --coverage npm run test:integration npm run test:e2e - name: Create baseline run: | npx aiwg baseline create auto-${GITHUB_REF_NAME} \ --type all \ --tag ${GITHUB_REF_NAME} \ --auto-activate - name: Upload to S3 run: | aws s3 sync .aiwg/testing/baselines/ \ s3://my-baselines/baselines/${GITHUB_REF_NAME}/ - name: Update manifest run: | git add .aiwg/testing/baselines/manifest.yaml git commit -m "chore: update baseline for ${GITHUB_REF_NAME}" git push ``` ### GitLab CI Example ```yaml # .gitlab-ci.yml baseline_on_release: stage: deploy only: - tags script: - npm test - npx aiwg baseline create release-${CI_COMMIT_TAG} - npx aiwg baseline upload --storage s3 artifacts: paths: - .aiwg/testing/baselines/ expire_in: 30 days ``` ## Conflict Resolution ### Concurrent Baseline Updates ```yaml concurrency_control: mode: lock_file lock_file: .aiwg/testing/baselines/.lock on_conflict: - check_lock_file - if_locked: wait_timeout: 300 # 5 minutes action_on_timeout: fail_with_message distributed_lock: enabled: true backend: redis key: "baseline_update_lock" ttl: 600 # 10 minutes ``` ### Merge Conflicts ```yaml merge_strategy: on_baseline_conflict: - prefer: newer_timestamp - fallback: manual_resolution resolution_workflow: - detect_conflict - preserve_both_baselines - notify_team_for_manual_merge - provide_diff_comparison ``` ## Baseline Rollback ```yaml rollback: enabled: true preserve_rollback_history: true rollback_triggers: - manual_command - quality_regression_detected - deployment_failure rollback_process: - identify_previous_baseline - validate_previous_baseline_integrity - deactivate_current_baseline - activate_previous_baseline - update_manifest - notify_stakeholders - record_rollback_reason example: current: baseline-v2.1.0 rollback_to: baseline-v2.0.5 reason: "Critical regression in v2.1.0" triggered_by: tech-lead ``` ## Baseline Diff Reports ```markdown # Baseline Diff Report **Date**: 2026-01-28 **Previous Baseline**: functional-auth-v2 **New Baseline**: functional-auth-v3 **Trigger**: semantic_version_tag v2.1.0 ## Executive Summary **Changes Detected**: 12 differences **Severity**: Medium **Recommendation**: Approve with review of API changes | Category | Previous | New | Change | |----------|----------|-----|--------| | Tests Passing | 45/45 | 47/47 | +2 new tests | | Functional Outputs | 100% match | 98% match | 2 API changes | | Performance | Baseline | +5% faster | Improvement | | Visual | Baseline | 100% match | No change | ## Detailed Changes ### Functional Changes #### Change 1: New field in login response **Type**: API Schema Change **Severity**: Medium - Breaking for strict clients **Previous** (baseline-v2): ```json { "token": "jwt.header.payload", "user": { "id": "uuid", "email": "user@example.com" } } ``` **New** (baseline-v3): ```json { "token": "jwt.header.payload", "user": { "id": "uuid", "email": "user@example.com", "display_name": "John Doe" } } ``` **Impact**: Clients using strict schema validation may fail **Recommendation**: Document as breaking change in CHANGELOG #### Change 2: Error message improved **Type**: Error Message Update **Severity**: Low - Informational **Previous**: ``` Error: Invalid input ``` **New**: ``` ValidationError: Email format is invalid. Expected format: user@example.com ``` **Impact**: Better user experience, non-breaking **Recommendation**: Approve ## Performance Changes | Metric | Previous | New | Change | Status | |--------|----------|-----|--------|--------| | Login p50 | 45ms | 42ms | -7% | ✅ Improved | | Login p95 | 120ms | 115ms | -4% | ✅ Improved | | Registration p50 | 85ms | 80ms | -6% | ✅ Improved | ## New Tests Added 1. `test/auth/display-name-validation.test.ts` - Tests display name field validation - Covers new feature 2. `test/auth/error-message-quality.test.ts` - Tests improved error messaging - Regression test for clarity ## Recommendations ### Immediate Actions - [ ] Review API schema change for breaking impact - [ ] Update API documentation with new field - [ ] Notify API consumers via changelog - [ ] Approve baseline if change is intentional ### Follow-up - [ ] Add versioning to API if not present - [ ] Consider deprecation period for schema changes - [ ] Update client SDKs with new field ## Approval To approve this baseline: ```bash aiwg baseline approve functional-auth-v3 \ --reviewed-by "tech-lead" \ --reason "API schema change documented and approved" ``` To reject and rollback: ```bash aiwg baseline rollback functional-auth-v3 \ --to functional-auth-v2 \ --reason "Breaking change not acceptable" ``` ``` ## Usage Examples ### Auto-Baseline on Release Tag ``` Event: Git tag v2.1.0 created Skill activates automatically: 1. Detect tag creation event 2. Wait for CI to complete 3. Check test pass rate: 100% (45/45) 4. Run baseline capture 5. Validate baseline stability (3 runs) 6. Compare to previous baseline 7. Generate diff report Output: "Auto-Baseline Created: release-v2.1.0 Trigger: Git tag v2.1.0 Previous: release-v2.0.5 Changes: 3 API fields added, 2 tests added Quality: 100% tests passing Stability: 100% consistent (3/3 runs) Status: Pending approval Approvers: tech-lead, release-manager Diff report: .aiwg/testing/baseline-diff-v2.1.0.md Approve with: aiwg baseline approve release-v2.1.0" ``` ### Auto-Baseline on Merge to Main ``` Event: PR #456 merged to main Skill activates: 1. Detect merge event 2. Wait for CI success 3. Check test pass rate: 98% (44/45) - 1 flaky test 4. Decision: Pass rate below 100%, require approval Output: "Auto-Baseline Blocked: main-abc123d Trigger: Merge to main (commit abc123d) Test pass rate: 98% (44/45) Reason: 1 test flaky, below 100% threshold Action required: - Stabilize flaky test: test/api/timeout.test.ts - Re-run baseline after fix Or override with: aiwg baseline create main-abc123d \ --force \ --approved-by tech-lead \ --reason 'Known flaky test acceptable'" ``` ### Scheduled Weekly Baseline ``` Event: Weekly schedule (Sunday 3 AM) Skill activates: 1. Detect schedule trigger 2. Run full test suite 3. Capture baseline: weekly-2026-01-28 4. Compare to last week 5. Generate drift report Output: "Weekly Baseline Created: weekly-2026-01-28 Previous: weekly-2026-01-21 Drift: Minimal (0.2% performance improvement) New tests: 3 added Test health: 100% passing Auto-activated: No (scheduled baselines require approval) Drift report: .aiwg/testing/weekly-drift-2026-01-28.md Review and approve: aiwg baseline review weekly-2026-01-28" ``` ## Integration This skill uses: - `regression-baseline`: Core baseline creation and comparison - `regression-bisect`: Find commits that changed baseline - `regression-metrics`: Track baseline drift over time - `test-coverage`: Ensure baseline captures critical paths - `issue-auto-sync`: Create issues for baseline approval - `project-awareness`: Detect CI/CD configuration ## Agent Orchestration ```yaml agents: baseline_creation: agent: test-engineer focus: Capture and validate baseline diff_analysis: agent: test-architect focus: Analyze baseline changes approval: agent: tech-lead focus: Review and approve baselines automation: agent: devops-engineer focus: CI/CD integration ``` ## Configuration ### Auto-Baseline Rules ```yaml # aiwg.yml regression: auto_baseline: enabled: true triggers: semantic_version_tag: enabled: true pattern: "^v[0-9]+\\.[0-9]+\\.[0-9]+$" auto_activate: false approval_required: true merge_to_main: enabled: true branches: [main, master] auto_activate: false test_pass_threshold: 100% deployment_success: enabled: true environments: [production, staging] auto_activate: true scheduled: enabled: true nightly: cron: "0 2 * * *" auto_activate: false weekly: cron: "0 3 * * 0" auto_activate: false validation: minimum_test_pass_rate: 95% stability_runs: 3 require_human_approval_for: - production_baselines - breaking_changes - critical_severity_changes storage: primary: local backup: s3 retention: active: 1 archive: 10 compress: true notifications: on_baseline_created: [slack, email] on_approval_needed: [slack, issue_comment] on_baseline_activated: [slack] ``` ### Retention Policy ```yaml retention: by_tag: release_tags: keep_forever development_tags: 30_days untagged: 7_days by_environment: production: 365_days staging: 90_days development: 30_days by_type: functional: 90_days visual: 60_days performance: 120_days api: 90_days compression: after_days: 7 algorithm: gzip keep_uncompressed: false ``` ## Notification Templates ### Baseline Created ``` 📸 Auto-Baseline Created Baseline: {baseline_id} Trigger: {trigger_type} Environment: {environment} Status: {status} Changes from previous: • {change_count} differences detected • Severity: {max_severity} Diff report: {diff_report_url} {approval_required ? "Approval required from: {approvers}" : "Auto-activated"} ``` ### Approval Needed ``` ⚠️ Baseline Approval Required Baseline: {baseline_id} Reason: {approval_reason} Changes: {change_summary} Review: {diff_report_url} Approve: aiwg baseline approve {baseline_id} --reason "..." Reject: aiwg baseline reject {baseline_id} --reason "..." Expires: {approval_timeout} ``` ### Baseline Activated ``` ✅ Baseline Activated Baseline: {baseline_id} Activated at: {activation_timestamp} Replaces: {previous_baseline_id} This baseline is now active for regression testing. All future test runs will be compared against this baseline. ``` ## Output Locations - Baselines: `.aiwg/testing/baselines/{type}/` - Manifest: `.aiwg/testing/baselines/manifest.yaml` - Diff reports: `.aiwg/testing/baseline-diffs/` - Approval requests: `.aiwg/testing/baseline-approvals/` - Archive: `.aiwg/testing/baselines/archive/` - Locks: `.aiwg/testing/baselines/.lock` ## References - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-baseline/SKILL.md - Manual baseline creation - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-bisect/SKILL.md - Find breaking commits - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-metrics/SKILL.md - Baseline drift tracking - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/regression-check.md - Regression testing command - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md - Test Engineer agent - @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Executable feedback rules