# Eval and Reports

Orange Hyper v0.8.0 stabilizes the local-only Eval and Reports surface validated through v0.8.0-alpha.0 and v0.8.0-alpha.1.

This is not telemetry. It does not upload data, call external APIs, call an LLM judge, run MCP servers, run hooks automatically, start subagents, or mutate project memory/config. It does not estimate token savings or claim success-rate improvement.

It reads the current `.orange-hyper` project state and shows conservative signal summaries.

## Commands

```bash
orange eval snapshot
orange eval report
orange eval explain
```

All commands support `--json` and keep Adapter JSON `contract_version: "0.1"`.

Command ids:

- `eval.snapshot`
- `eval.report`
- `eval.explain`

## `orange eval snapshot`

`eval snapshot` summarizes current local project state.

Included signals:

- `project_id` / `project_name` from `.orange-hyper/config.json`
- Quest count from `.orange-hyper/quests/`
- completed Quest count from `.orange-hyper/quests/completed/`
- verified / unverified completed Quest count
- Memory Delta Proposal count from `.orange-hyper/proposals/memory-delta/`
- accepted / rejected / pending proposal count
- accepted graph node count from `.orange-hyper/graph/nodes/`
- doctor errors/warnings count from `runDoctor(cwd)` without repair
- hook warning summary from the latest local hook report when present
- MCP Advisor catalog availability and local MCP-shaped signal count
- growth candidate count from deterministic Growth Signal Preview
- adapter recipe count from built-in adapter recipes
- identity report existence under `.orange-hyper/identity/`

Snapshot does not run `doctor --repair-project-id`, `graph rebuild-index`, `identity build`, hook events, MCP tools, or adapter recipes.

## `orange eval report`

`eval report` creates a Markdown report from the same local-only snapshot.
By default it writes only to stdout:

```bash
orange eval report
orange eval report --json
```

It writes a file only when explicitly requested:

```bash
orange eval report --write-report
orange eval report --write-report --json
```

Report files are written only under:

```text
.orange-hyper/evals/reports/
```

`--write-report` does not accept a path or value. This keeps report path selection inside the kernel and prevents path traversal.

The report starts as Markdown. v0.8 does not add an HTML dashboard.

## Report Sections

The Markdown report includes:

- Project Summary
- Quest Completion
- Verification Honesty
- Memory Proposal Flow
- Graph Memory Health
- Doctor Diagnostics
- Hook Warning Usefulness
- MCP Advisor Signals
- Growth Signal Preview
- Adapter Invocation Readiness
- Known Gaps

Sections use only these status values:

- `good`: local evidence exists and the section has no current warning, unverified item, pending review item, or diagnostic error.
- `needs-attention`: local evidence exists and shows a warning, error, pending review item, unverified completed Quest, or other explicit follow-up.
- `insufficient-data`: the required local source is missing, no relevant local evidence exists yet, or the metric is intentionally unavailable.

Every section includes:

- `status`
- `reason`
- `evidence_count`

`evidence_count` counts referenced metrics with available local evidence and a status other than `insufficient-data`. Unavailable metrics do not increase the count. v0.8 does not produce a score, rank, or grade.

The top of the Markdown report includes a short summary:

- project name / project id
- `generated_at`
- report mode: `local-only`
- total section count
- `needs-attention` section count
- `insufficient-data` section count
- no telemetry / no network / no LLM judge

## JSON Report Schema

`orange eval report --json` returns an adapter-friendly JSON payload.
Existing camelCase boundary fields remain available, and v0.8.0 keeps the fixed snake_case fields shown below.

```json
{
  "report_id": "eval-report-20260618T010203000Z",
  "schema_version": 2,
  "generated_at": "2026-06-18T01:02:03.000Z",
  "project_id": "project_550e8400-e29b-41d4-a716-446655440000",
  "project_name": "orange-hyper",
  "local_only": true,
  "telemetry": false,
  "network_upload": false,
  "llm_judge": false,
  "summary": {
    "project_id": "project_550e8400-e29b-41d4-a716-446655440000",
    "project_name": "orange-hyper",
    "generated_at": "2026-06-18T01:02:03.000Z",
    "report_mode": "local-only",
    "total_sections": 11,
    "needs_attention_count": 1,
    "insufficient_data_count": 2,
    "no_telemetry": true,
    "no_network": true,
    "no_llm_judge": true
  },
  "sections": [
    {
      "title": "Project Summary",
      "status": "good",
      "reason": "Project identity exists and local project signals can be summarized.",
      "evidence_count": 4,
      "metrics": ["project.identity", "quest.count"]
    }
  ],
  "known_gaps": [
    {
      "id": "token.savings",
      "status": "insufficient-data",
      "reason": "Token counts are not collected by the local-only Eval and Reports stable surface.",
      "source": "unavailable",
      "limitation": "Do not estimate token savings without explicit token usage collection.",
      "future_target": "An opt-in usage dataset would be required before reporting token savings."
    }
  ],
  "unavailable_metrics": [
    {
      "id": "token.savings",
      "label": "Token savings",
      "status": "insufficient-data",
      "source": "unavailable",
      "value": null,
      "unavailable": true,
      "unavailable_reason": "token counts are not collected",
      "limitation": "No token usage collection exists in this local-only eval surface, so savings must remain unavailable."
    }
  ]
}
```

## `orange eval explain`

`eval explain` describes where each metric came from.

Examples:

- `quest.count` comes from `.orange-hyper/quests/`.
- `quest.completed`, `quest.verified`, and `quest.unverified` come from `.orange-hyper/quests/completed/`.
- `memory.proposals` comes from `.orange-hyper/proposals/memory-delta/{pending,accepted,rejected}/`.
- `graph.accepted_nodes` comes from the current-project graph reader over `.orange-hyper/graph/nodes/`.
- `doctor.errors` and `doctor.warnings` come from local doctor diagnostics without repair.
- `hook.warnings` comes from existing local hook report files when present. Eval does not run hook events automatically.
- `hook.warning.usefulness` comes from the same existing local hook report. If there is no hook report, the status is `insufficient-data`.
- `memory.acceptance_rate` is calculated from proposal state: accepted proposals divided by total proposals. It is not a success-rate improvement claim.
- `mcp.advisor.availability` comes from the built-in read-only MCP Advisor catalog and local MCP-shaped growth signals. It does not call MCP servers.
- `growth.candidates` comes from deterministic local Growth Signal Preview.
- `adapter.recipes` comes from built-in adapter invocation recipes.
- `identity.report.exists` checks `.orange-hyper/identity/summary.json` and `.orange-hyper/identity/orange-hyper.html`.

## Conservative Metrics

Eval metrics are count-based or warning-based.

Allowed status values:

- `good`
- `needs-attention`
- `insufficient-data`

Unavailable metrics stay unavailable. v0.8 does not estimate:

- token savings
- success-rate improvement
- model capability improvement
- raw-agent versus Orange-assisted outcome deltas

The stable eval surface must not claim improvements such as "90% success-rate increase" or "tokens saved" without collected evidence.

Current unavailable metric policy:

- `token.savings`: unavailable because token counts are not collected.
- `success_rate.improvement`: unavailable because there is no comparison group or comparative task-pack outcome dataset.
- unavailable metrics use `value: null`, `status: "insufficient-data"`, and an explicit `limitation`.

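
The acceptance-rate calculation and the unavailable-metric policy above can be illustrated with a minimal Python sketch. This is not the Orange Hyper implementation: the function names `acceptance_rate` and `unavailable_metric` are hypothetical, and two details are assumptions rather than documented behavior, namely that "total proposals" means accepted + rejected + pending, and that an empty proposal set yields no value instead of `0.0`.

```python
from typing import Optional

# The three status values allowed by the eval surface.
ALLOWED_STATUSES = ("good", "needs-attention", "insufficient-data")


def acceptance_rate(accepted: int, rejected: int, pending: int) -> Optional[float]:
    """Accepted proposals divided by total proposals.

    Assumption: total = accepted + rejected + pending, and an empty
    proposal set returns None (no evidence) instead of 0.0.
    """
    total = accepted + rejected + pending
    if total == 0:
        return None
    return accepted / total


def unavailable_metric(metric_id: str, label: str, reason: str, limitation: str) -> dict:
    """Build an unavailable metric entry that never carries an estimate."""
    return {
        "id": metric_id,
        "label": label,
        "status": "insufficient-data",
        "source": "unavailable",
        "value": None,  # serialized as JSON null
        "unavailable": True,
        "unavailable_reason": reason,
        "limitation": limitation,
    }


print(acceptance_rate(3, 1, 0))  # 0.75
print(acceptance_rate(0, 0, 0))  # None
```

Returning `None` for an empty proposal set keeps the "no evidence" case distinct from a genuine rate of zero, which is consistent with the rule that unavailable metrics stay unavailable.
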
## Local-Only Boundary

Eval commands must not:

- upload telemetry
- call network APIs
- call an LLM judge
- estimate token savings
- auto-create Quest, Proposal, Graph, or Identity artifacts
- run MCP tools or install MCP servers
- run hook events automatically
- run subagents
- start an auto planner or auto execution loop
- repair doctor findings
- mutate project memory or config

The only eval write path is `orange eval report --write-report`, and that path is limited to `.orange-hyper/evals/reports/`.

`--write-report` does not accept a path, filename, or value. Report filenames use a deterministic `eval-report-` prefix plus the report timestamp. The report is a local generated artifact, not project memory.

## Identity Integration

`identity build` does not automatically include eval summaries in v0.8.0. Eval reports remain available only through explicit `orange eval report` commands.

A user-approved identity summary integration may be considered as a future target, but it is not part of this stable release.
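
As a closing illustration of the single write path described under Local-Only Boundary, the sketch below shows one way a report path could be derived entirely from the report timestamp, with no caller-supplied path component. It is a hedged example in Python, not the kernel's code: `report_id` and `report_path` are hypothetical names, the `.md` extension is an assumption based on the report being Markdown, and the timestamp layout is inferred from the `report_id` value in the JSON example.

```python
from datetime import datetime, timezone
from pathlib import Path

# The only directory eval is allowed to write to, relative to the project root.
REPORTS_DIR = Path(".orange-hyper") / "evals" / "reports"


def report_id(generated_at: datetime) -> str:
    """Return 'eval-report-' plus a compact UTC timestamp with milliseconds.

    Pass a timezone-aware datetime; a naive one is treated as local time
    by astimezone().
    """
    utc = generated_at.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"eval-report-{utc:%Y%m%dT%H%M%S}{millis:03d}Z"


def report_path(project_root: Path, generated_at: datetime) -> Path:
    """Build the report path without accepting any user-supplied path."""
    reports_dir = (project_root / REPORTS_DIR).resolve()
    # Assumption: '.md' extension, since the report is Markdown.
    candidate = (reports_dir / f"{report_id(generated_at)}.md").resolve()
    # Defensive check: the file must sit directly inside the reports directory.
    if candidate.parent != reports_dir:
        raise ValueError("report path escapes the reports directory")
    return candidate


ts = datetime(2026, 6, 18, 1, 2, 3, tzinfo=timezone.utc)
print(report_id(ts))  # eval-report-20260618T010203000Z
```

Because the filename is computed only from the timestamp, there is no input through which a `../` segment could be introduced; the parent-directory check is an extra safeguard rather than the primary defense.
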