--- name: view-results description: View and analyze Hawk evaluation results. Use when the user wants to see eval-set results, check evaluation status, list samples, view transcripts, or analyze agent behavior from a completed evaluation run. --- # View Hawk Eval Results When the user wants to analyze evaluation results, use these hawk CLI commands: ## 1. List Eval Sets You can list all eval sets if the user do not know the eval set ID: ```bash hawk list eval-sets ``` Shows: eval set ID, creation date, creator. You can increase the limit of results returned by `--limit N`. ```bash hawk list eval-sets --limit 50 ``` Or you can search for a specific eval set by using `--search QUERY`. ```bash hawk list eval-sets --search pico ``` ## 2. List Evaluations With an eval set ID, you can list all evaluations in the eval-set: ```bash hawk list evals [EVAL_SET_ID] ``` Shows: task name, model, status (success/error/cancelled), and sample counts. ## 3. List Samples Or you can list individual samples and their scores: ```bash hawk list samples [EVAL_SET_ID] [--eval FILE] [--limit N] ``` ## 4. Download Transcript To get the full conversation for a specific sample: ```bash hawk transcript ``` The transcript includes full conversation with tool calls, scores, and metadata. To get even more details, you can get the raw data by using `--raw`: ```bash hawk transcript --raw ``` ### Batch Transcript Download You can also download all transcripts for an entire eval set: ```bash # Fetch all samples in an eval set hawk transcripts # Write to individual files in a directory hawk transcripts --output-dir ./transcripts # Limit number of samples hawk transcripts --limit 10 # Raw JSON output (one JSON per line to stdout, or .json files with --output-dir) hawk transcripts --raw ``` ## Workflow 1. Run `hawk list eval-sets` to see available eval sets 2a. Run `hawk list evals ` to see available evaluations 2b. or run `hawk list samples ` to find samples of interest 3a. Run `hawk transcript ` to get full details on a single sample 3b. or run `hawk transcripts --output-dir ./transcripts` to download all 4. Read and analyze the transcript(s) to understand the agent's behavior ## API Environments Production (`https://api.inspect-ai.internal.metr.org`) is used by default. Set `HAWK_API_URL` only when targeting non-production environments: | Environment | URL | |-------------|-----| | Staging | `https://api.inspect-ai.staging.metr-dev.org` | | Dev1 | `https://api.inspect-ai.dev1.staging.metr-dev.org` | | Dev2 | `https://api.inspect-ai.dev2.staging.metr-dev.org` | | Dev3 | `https://api.inspect-ai.dev3.staging.metr-dev.org` | | Dev4 | `https://api.inspect-ai.dev4.staging.metr-dev.org` | Example: ```bash HAWK_API_URL=https://api.inspect-ai.staging.metr-dev.org hawk list eval_sets ```