# Gate your release on qulib confidence in CI

Stop shipping on vibes. Qulib's `confidence` command fuses live-app quality, test-automation maturity, and API coverage into a single scored verdict — `ship` / `caution` / `hold` / `block` — backed by a 0–100 score and an explicit risk list.

This recipe shows how to drop that verdict into a GitHub Actions release gate so your pipeline fails when the evidence says to wait.

---

## Why this is different from the `qulib-analyze` action

Qulib has two CI surfaces that answer different questions:

| Surface | Command | Verdict vocabulary | Best for |
|---------|---------|-------------------|----------|
| [`qulib-analyze` action](../../.github/actions/qulib-analyze/) | `qulib analyze --agent-summary` | `pass` / `warn` / `fail` | Coarse deploy gate: did the crawl find critical gaps? |
| **This recipe** | `qulib confidence --json` | `ship` / `caution` / `hold` / `block` (0–100 score) | Scored release decision: fuse all signals, apply a numeric threshold |

The analyze action is a boolean gate. The confidence command is a scored verdict — it combines multiple evidence dimensions, weights them, and produces a numeric confidence score you can threshold however your team needs.

Use the analyze action to block obviously broken deploys. Use the confidence recipe for a scored release decision (staging→production promotion, release candidate sign-off, etc.).

---

## Prerequisites

- Node.js 20+ in CI
- A deployed URL reachable from the GitHub Actions runner
- `@qulib/core` (installed via `npx` — no pre-install step needed)
- Playwright Chromium — qulib crawls with Playwright; the workflow below installs it automatically (~2–3 minutes added to your job)

---

## The complete workflow

Copy this into `.github/workflows/release-gate.yml` and set the `APP_URL` variable at the top:

```yaml
name: Release confidence gate

on:
  workflow_dispatch:
  push:
    branches: [main]

env:
  APP_URL: https://your-app.example.com   # <- change this
  QULIB_VERSION: 0.10.0                   # pin for reproducible CI
  SCORE_THRESHOLD: 70                     # fail if score < this (0–100)
  FAIL_ON_CAUTION: false                  # set to true for a stricter gate

jobs:
  release-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Node.js 20
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install Playwright Chromium
        run: npx --yes playwright@latest install --with-deps chromium

      - name: Run qulib confidence
        run: |
          npx --yes @qulib/core@${{ env.QULIB_VERSION }} confidence \
            --url "${{ env.APP_URL }}" \
            --repo . \
            --json > qulib-confidence.json
        continue-on-error: true  # CLI always exits 0; gate reads the JSON below

      - name: Evaluate verdict
        id: gate
        run: |
          node - <<'EOF'
          const fs = require('fs');
          const result = JSON.parse(fs.readFileSync('qulib-confidence.json', 'utf8'));

          const verdict = result.verdict;
          const score   = result.confidenceScore;
          const risks   = (result.topRisks || []).slice(0, 5).join('\n  ');
          const blockers = (result.blockers || []).join('\n  ');

          const THRESHOLD      = parseInt(process.env.SCORE_THRESHOLD || '70', 10);
          const FAIL_ON_CAUTION = process.env.FAIL_ON_CAUTION === 'true';

          console.log(`verdict:          ${verdict}`);
          console.log(`confidence score: ${score ?? 'null (nothing evaluable)'}`);
          if (risks)    console.log(`top risks:\n  ${risks}`);
          if (blockers) console.log(`blockers:\n  ${blockers}`);

          // Write outputs for downstream steps.
          const out = process.env.GITHUB_OUTPUT || '/dev/null';
          fs.appendFileSync(out, `verdict=${verdict}\n`);
          fs.appendFileSync(out, `score=${score ?? 'null'}\n`);

          // Gate logic — in order of severity:
          if (verdict === 'block') {
            console.error(`\nGATE FAILED: verdict=block — a hard-blocking signal was detected.`);
            if (blockers) console.error(`Blockers:\n  ${blockers}`);
            process.exit(1);
          }
          if (verdict === 'hold') {
            console.error(`\nGATE FAILED: verdict=hold — confidence ${score}/100 is below the hold floor (30).`);
            process.exit(1);
          }
          if (score !== null && score < THRESHOLD) {
            console.error(`\nGATE FAILED: confidence score ${score}/100 is below your threshold (${THRESHOLD}).`);
            process.exit(1);
          }
          if (FAIL_ON_CAUTION && verdict === 'caution') {
            console.error(`\nGATE FAILED: verdict=caution and FAIL_ON_CAUTION=true.`);
            process.exit(1);
          }

          console.log(`\nGATE PASSED: verdict=${verdict}, score=${score}/100`);
          EOF
        env:
          SCORE_THRESHOLD: ${{ env.SCORE_THRESHOLD }}
          FAIL_ON_CAUTION: ${{ env.FAIL_ON_CAUTION }}

      - name: Write job summary
        if: always()
        run: |
          node - <<'EOF'
          const fs = require('fs');
          const r = JSON.parse(fs.readFileSync('qulib-confidence.json', 'utf8'));
          const verdict = r.verdict;
          const score   = r.confidenceScore ?? 'n/a';
          const icon    = { ship: '✅', caution: '⚠️', hold: '🔶', block: '❌' }[verdict] || '❓';
          const risks   = (r.topRisks || []).map(x => `- ${x}`).join('\n') || '_none_';
          const notes   = (r.honestyNotes || []).map(x => `- ${x}`).join('\n') || '_none_';

          const md = [
            '## qulib release confidence',
            `| | |`,
            `|---|---|`,
            `| Verdict | ${icon} **${verdict}** |`,
            `| Confidence score | ${score}/100 |`,
            `| Threshold | ${process.env.SCORE_THRESHOLD}/100 |`,
            '',
            '### Top risks',
            risks,
            '',
            '### Honesty notes',
            notes,
          ].join('\n');

          fs.appendFileSync(process.env.GITHUB_STEP_SUMMARY, md + '\n');
          EOF
        env:
          SCORE_THRESHOLD: ${{ env.SCORE_THRESHOLD }}

      - name: Upload confidence report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: qulib-confidence
          path: qulib-confidence.json
          retention-days: 30
```

---

## Why the gate script is required

The `qulib confidence --json` command always exits 0, regardless of verdict. The verdict and score live in the JSON output — the CLI never fails the process. This is intentional: the CLI's job is to report honestly; the job of failing CI belongs to your gate policy.

The "Evaluate verdict" step above reads the JSON and applies your policy:

1. `block` → always fails (a hard-blocking signal was detected — a failed crawl, auth wall, or explicit blocker item).
2. `hold` → always fails (confidence is below the absolute floor of 30).
3. Score below your `SCORE_THRESHOLD` → fails (configurable; 70 is a sensible default).
4. `caution` with `FAIL_ON_CAUTION=true` → fails (opt-in stricter gate).
5. Everything else → passes.

---

## The verdict ladder

| Verdict | Score range | What it means | Default CI outcome |
|---------|------------|---------------|-------------------|
| `ship` | ≥ 80 | Strong confidence, no blockers, all required sources evaluated | PASS |
| `caution` | 30–79 | Known risks or an unknown signal on a required source | PASS (default) |
| `hold` | < 30 | Confidence is too low to make a release decision | FAIL |
| `block` | any | A hard-blocking item was detected | FAIL |

`block` is not about the score — it fires when qulib finds something that makes the number irrelevant: a crawl blocked by auth (`auth-required`), an explicit blocker evidence item, or no evaluable evidence at all. When the crawl never ran, a high score would be a lie; qulib refuses to produce one.

---

## Tuning the gate

### Change the threshold

Set `SCORE_THRESHOLD` in the `env` block. 70 is a reasonable starting point; teams with thin automation coverage often start at 50 and raise it as their test suite matures.

### Handle `caution` differently

`caution` (score 30–79) means "we have concerns but not a hard block." The default policy passes it so you can ship with a documented risk. Set `FAIL_ON_CAUTION: true` to treat any `caution` verdict as a blocker — useful for production promotion gates where you require high confidence.

### Authenticated scans

If your staging environment requires authentication, qulib can use a Playwright storage-state file. Write the secret to a file (never inline it) and pass `--storage-state`:

```yaml
- name: Write auth storage state
  run: echo '${{ secrets.QULIB_STORAGE_STATE }}' > /tmp/qulib-auth.json

- name: Run qulib confidence (authenticated)
  run: |
    npx --yes @qulib/core@${{ env.QULIB_VERSION }} confidence \
      --url "${{ env.APP_URL }}" \
      --repo . \
      --storage-state /tmp/qulib-auth.json \
      --json > qulib-confidence.json
  continue-on-error: true
```

Never write secrets directly into the workflow YAML.

### Repo-only gate (no live URL)

If you want to gate on automation maturity and API coverage without a live crawl, omit `--url`:

```yaml
run: |
  npx --yes @qulib/core@${{ env.QULIB_VERSION }} confidence \
    --repo . \
    --json > qulib-confidence.json
```

The verdict will reflect test-automation maturity and API surface coverage only. The Playwright install step is still harmless but can be skipped.

---

## Comparison: this recipe vs `qulib-analyze` action

| | `qulib-analyze` action | This recipe (`qulib confidence`) |
|---|---|---|
| Command | `qulib analyze --agent-summary` | `qulib confidence --json` |
| Verdict vocabulary | `pass` / `warn` / `fail` | `ship` / `caution` / `hold` / `block` |
| Score | release confidence 0–100 (in output) | 0–100 (explicit, gates on it) |
| Gate mechanism | `gate` field in JSON → action exit code | explicit Node.js script you own |
| Evidence sources | live-app crawl only | crawl + automation maturity + API coverage |
| Best for | coarse gate: "is this obviously broken?" | scored release decision: "are we confident enough to ship?" |
| Setup | drop-in composite action | copy-paste workflow |

Both surfaces are complementary. Many teams run both: the analyze action on every PR (fast coarse gate) and the confidence recipe on release branches (deeper scored verdict before staging→production promotion).

---

## Further reading

- [CI integration — analyze action](../../README.md#ci-integration-github-actions) — the existing coarse gate
- [Orchestrator integration](../orchestrator-integration.md) — feeding qulib verdicts into an AI agent loop
- [Release confidence — scoring details](../../README.md#confidence-layer) — how the score is computed and what each verdict means