# Testing and Release Regression

This project has two different testing surfaces:

- Offline tests that run on normal GitHub-hosted runners and do not need a live
  Ghidra project.
- Live Ghidra regression tests that deploy the extension, start Ghidra, and
  exercise MCP endpoints against a benchmark binary inside an active project.

The live tests are intentionally opt-in because they can add or reset
`Benchmark.dll` and `BenchmarkDebug.exe` in the current Ghidra project.

## Quick Commands

Run the normal local checks:

```text
python -m tools.setup build
pytest tests/unit/ -v --no-cov
```

Deploy without importing the benchmark binary:

```text
python -m tools.setup deploy --ghidra-path "F:\ghidra_12.0.4_PUBLIC"
```

Deploy and run the release-grade live regression:

```text
python -m tools.setup deploy --ghidra-path "F:\ghidra_12.0.4_PUBLIC" --test release
```

Opt in locally so every deploy runs the release regression:

```text
GHIDRA_MCP_DEPLOY_TESTS=release
```

`GHIDRA_MCP_DEPLOY_TESTS` belongs in a local `.env`; it is not intended as the
repository default.
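
The opt-in can be pictured with a minimal sketch. It assumes that explicit `--test` values take precedence over the environment variable and that the variable may hold a comma-separated list. Neither behavior is confirmed here, and `resolve_tiers` is a hypothetical helper, not a function in `tools.setup`.

```python
import os


def resolve_tiers(cli_tiers, environ=None):
    """Return the live test tiers a deploy should run (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: explicit --test values win over the environment.
    if cli_tiers:
        return list(cli_tiers)
    # Assumption: the variable may hold a comma-separated list of tiers.
    raw = environ.get("GHIDRA_MCP_DEPLOY_TESTS", "")
    return [tier.strip() for tier in raw.split(",") if tier.strip()]


if __name__ == "__main__":
    print(resolve_tiers([], {"GHIDRA_MCP_DEPLOY_TESTS": "release"}))
```

With no `--test` values and no variable set, the sketch returns an empty list, which corresponds to a plain deploy.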

## What Runs by Default

A plain deploy:

1. Detects matching running Ghidra processes for the target install.
2. Requests `save_all_programs`.
3. Requests graceful `exit_ghidra`, which saves open programs and debugger
   traces before closing.
4. Force-kills remaining matching Ghidra processes if graceful exit did not
   finish.
5. Installs the extension and bridge files.
6. Starts Ghidra.
7. Waits for MCP health.
8. Waits for a project-backed endpoint to confirm the active project is ready.
9. Runs MCP schema smoke checks.

A plain deploy does **not** import `Benchmark.dll` or `BenchmarkDebug.exe`
unless the user opts in with `--test ...` or `GHIDRA_MCP_DEPLOY_TESTS`.
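
The graceful-then-forced shutdown in steps 2 to 4, and the waits in steps 7 and 8, follow a common poll-with-deadline pattern. The sketch below illustrates that pattern only. `request`, `is_running`, and `force_kill` are hypothetical callables standing in for the real harness, and the 60-second timeout is an assumed value.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float,
               interval: float = 1.0,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def stop_ghidra(request, is_running, force_kill,
                timeout: float = 60.0, **wait_kwargs) -> str:
    """Ask Ghidra to save and exit; force-kill only if it does not finish."""
    request("save_all_programs")
    request("exit_ghidra")
    if wait_until(lambda: not is_running(), timeout, **wait_kwargs):
        return "graceful"
    # Graceful exit did not finish in time (assumed timeout), so fall back.
    force_kill()
    return "forced"
```

The same `wait_until` helper could back the MCP health wait and the project-ready wait by passing a predicate that probes the relevant endpoint.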

## Live Test Tiers

Pass one or more `--test` values to `python -m tools.setup deploy`.

| Tier | Project Mutation | Purpose |
|------|------------------|---------|
| `selected-contract` | No benchmark import | Checks selected release-critical tools against live schema and `tests/endpoints.json`. |
| `endpoint-catalog` | No benchmark import | Confirms all catalog endpoints are present in the live schema. |
| `benchmark-read` | Imports/resets benchmark | Runs broader read-only endpoint checks against `/testing/benchmark/Benchmark.dll`. |
| `benchmark-write` | Imports/resets benchmark, writes test metadata | Runs reversible write smoke checks against the benchmark. |
| `multi-program` | Imports/resets benchmark | Confirms project-path targeting works when multiple programs are open. |
| `negative-contract` | Imports/resets benchmark | Asserts important error cases return actionable messages. |
| `debugger-live` | Imports/resets benchmark, launches test process | Launches `BenchmarkDebug.exe` through MCP debugger endpoints and reads live trace state. |
| `release` | Imports/resets benchmark, writes test metadata, launches test process | Runs the release-grade suite. |

The `release` tier currently runs:

1. Benchmark reset/import.
2. Selected endpoint contract checks.
3. Extended benchmark read checks.
4. Multi-program targeting checks.
5. Negative/error-shape checks.
6. Debugger live launch/status/module/register/stack checks.

Every deploy, including a plain one, runs the schema smoke check before any
selected tier.
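
One way to read the table and the `release` list together is as a plan expansion. The sketch below is an interpretation, not the harness's actual code: the mapping from release steps to tier names is inferred from the list above, and the step names `schema-smoke` and `benchmark-reset` are made up for illustration.

```python
# Tiers that the table marks as importing/resetting the benchmark.
IMPORTS_BENCHMARK = {
    "benchmark-read", "benchmark-write", "multi-program",
    "negative-contract", "debugger-live", "release",
}

# Assumption: each release step corresponds to the tier of the same purpose.
RELEASE_STEPS = [
    "selected-contract", "benchmark-read", "multi-program",
    "negative-contract", "debugger-live",
]


def plan(tiers):
    """Expand requested tiers into an ordered, de-duplicated step list."""
    steps = ["schema-smoke"]  # runs on every deploy
    if any(tier in IMPORTS_BENCHMARK for tier in tiers):
        steps.append("benchmark-reset")
    for tier in tiers:
        for step in (RELEASE_STEPS if tier == "release" else [tier]):
            if step not in steps:
                steps.append(step)
    return steps


if __name__ == "__main__":
    print(plan(["release"]))
```

Under these assumptions, `plan([])` yields only the schema smoke check, and a tier such as `endpoint-catalog` never triggers a benchmark reset.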

When a benchmark tier runs, deploy temporarily enables the `/prompt_policy`
endpoint. This narrowly scoped policy responds only to known automation dialogs
in the benchmark flow, such as benchmark analysis prompts, modified-file save
prompts, and tool-layout save prompts. Unknown dialogs are left alone.

## Benchmark Fixture

The benchmark binary is built from `fun-doc/benchmark` and imported into the
active Ghidra project at:

```text
/testing/benchmark/Benchmark.dll
/testing/benchmark/BenchmarkDebug.exe
```

The filesystem build artifacts stay at:

```text
fun-doc/benchmark/build/Benchmark.dll
fun-doc/benchmark/build/BenchmarkDebug.exe
```

Before benchmark tiers run, the deploy harness deletes any existing benchmark
project files at both the legacy and current benchmark paths, recreates the
`/testing/benchmark` folder, imports the current binaries, and waits for
analysis to become idle. Before Ghidra starts, it also removes any restored
benchmark CodeBrowser or Debugger tool state from the active project, so old
benchmark windows do not trigger first-open dialogs before MCP is ready.

This reset is why benchmark tiers are opt-in: users should not get a test binary
added to their project merely because they deployed the extension.
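
The relationship between build artifacts and project paths can be sketched as a small import plan. `import_plan` and `missing_artifacts` are hypothetical helpers for illustration; only the paths themselves come from this document.

```python
from pathlib import Path, PurePosixPath

BUILD_DIR = Path("fun-doc/benchmark/build")
PROJECT_FOLDER = PurePosixPath("/testing/benchmark")
BINARIES = ("Benchmark.dll", "BenchmarkDebug.exe")


def import_plan(repo_root):
    """Pair each build artifact with its destination path in the project."""
    root = Path(repo_root)
    return [
        (root / BUILD_DIR / name, str(PROJECT_FOLDER / name))
        for name in BINARIES
    ]


def missing_artifacts(plan):
    """Return the source files that have not been built yet."""
    return [src for src, _ in plan if not src.exists()]


if __name__ == "__main__":
    for src, dst in import_plan("."):
        print(f"{src} -> {dst}")
```

Checking `missing_artifacts` before a reset would let a harness fail early instead of deleting the project copies and then finding nothing to import.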

The debugger live tier is Windows-only today and uses Ghidra's Trace RMI
debugger launcher. If the default Python on `PATH` is not compatible with the
Ghidra debugger wheels, set `GHIDRA_DEBUGGER_PYTHON` in local `.env` to the
Python executable Ghidra should use for debugger launches.

## GitHub Actions

### Pull Requests and Merges

`.github/workflows/tests.yml` runs on pull requests and pushes to `main` and
`develop`. It runs the merge-gating checks that work on GitHub-hosted runners:

- Maven build and offline Java tests.
- Python unit tests across supported Python versions.
- Pester setup tests on Windows.
- Documentation linting.

These checks should be configured as required status checks in branch protection.

The live Ghidra release regression can also run on pull requests, but it is
opt-in. Add the PR label:

```text
live-ghidra-regression
```

When that label is present, `.github/workflows/release-regression.yml` runs on a
self-hosted Windows runner and executes the live deploy regression.

This keeps external PRs from waiting indefinitely on a private self-hosted
runner while still giving maintainers a real gate for risky MCP/Ghidra changes.

### Releases

`.github/workflows/release-regression.yml` can also be run manually from the
Actions tab. In addition, the release and pre-release workflows call it when
`run_live_regression` is enabled.

For manual release workflows:

1. Open **Create Release** or **Create Pre-Release** in GitHub Actions.
2. Enable `run_live_regression`.
3. Set `ghidra_path` for the self-hosted runner.
4. Start the workflow.

The release job waits for the live regression job and only publishes if it
passes or if the live regression option was not selected.
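
The publish rule reduces to a small predicate, sketched here in Python for clarity. In the workflows this is presumably expressed as a job condition; `should_publish` is a hypothetical name, and the result strings assume GitHub Actions job results (`success`, `failure`, `cancelled`, `skipped`).

```python
from typing import Optional


def should_publish(run_live_regression: bool,
                   regression_result: Optional[str]) -> bool:
    """Publish if the regression was not requested or if it succeeded."""
    if not run_live_regression:
        return True
    # Any non-success result for a requested regression blocks publishing.
    return regression_result == "success"
```

Treating `cancelled` and `skipped` as blocking when the option was selected keeps a requested gate from being bypassed by accident.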

## Runner Requirements

The normal CI suite runs on GitHub-hosted runners.

The live release regression requires a self-hosted Windows runner with:

- Ghidra 12.0.4 installed.
- Java 21.
- Python 3.13.
- Maven.
- Access to the Ghidra project that should receive `/testing/benchmark`.
- Any `.env` credentials needed to open the project or authenticate to Ghidra
  Server.

The workflow uses:

```yaml
runs-on: [self-hosted, Windows]
```

Set the runner or repository variable `GHIDRA_PATH` if you do not want to pass
`ghidra_path` manually.

## Can the Full Live Suite Run in a GitHub Container?

Not as currently implemented.

The release regression is a GUI/project lifecycle test. It starts Ghidra, waits
for the active project, imports a binary into that project, and exercises GUI MCP
plugin endpoints. A stock GitHub-hosted container has no existing Ghidra project,
no desktop session, and no private Ghidra Server/project credentials.

There are two practical options:

- Use a self-hosted Windows runner for the current full live suite.
- Add a separate headless disposable-project suite later. That suite could run
  on hosted Linux or in a container if it creates a temporary project, starts the
  headless server, imports `Benchmark.dll`, and runs the same read/write
  endpoint contracts against headless mode.

The second option would be valuable, but it should be treated as a different
test target: headless parity and disposable-project coverage, not the same thing
as proving the installed GUI plugin works in a real user project.

## What To Run Before a Release

For the full release runbook, including versioning, documentation, PR, tagging,
and post-release steps, see
[`docs/releases/RELEASE_CHECKLIST.md`](releases/RELEASE_CHECKLIST.md).

Minimum local verification:

```text
python -m tools.setup preflight --ghidra-path "F:\ghidra_12.0.4_PUBLIC"
python -m tools.setup build
pytest tests/unit/ -v --no-cov
python -m tools.setup deploy --ghidra-path "F:\ghidra_12.0.4_PUBLIC" --test release
```

For GitHub releases, enable `run_live_regression` in the release workflow when a
self-hosted runner is available.