---
name: explore-dnn-model
description: Manual invocation only; use only when the user explicitly requests `explore-dnn-model` by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under `tmp/`, and producing reports about I/O contracts, timing, and profiling.
---

# Explore DNN Model

## Minimum Required Inputs (Hard Requirement)

To use this skill, the user must provide:

- A model checkpoint / model file(s) as a **local** file or directory path (it may be outside the workspace).

If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:

1) Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.).
2) Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
3) If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp//refs/` for investigation (prefer shallow clone; record URL + commit/tag used).

## Goals

This skill has three goals:

1) Verify that the given DNN model can work (inference or training; default focus is **inference**) in the *current* Python environment of the workspace.
2) Determine how to use it (inference or training; default is **inference**) by reading the upstream source code and producing minimal, reproducible runs.
3) Produce two reports:
   - **Experiment report** (programmatic): generated from `tmp//outputs/` with minimal/no reasoning.
   - **Stakeholder report** (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.

The reports cover:

- Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
- Benchmarks and performance profiling (latency/throughput/memory, device details)
- User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met

Before changing anything, detect how the environment is managed by checking for:

- `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
- `.venv/` (venv-managed project)
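A minimal detection sketch, assuming the check runs from the workspace root (`detect_env_manager` is an illustrative helper name, not part of the skill):

```python
from pathlib import Path

def detect_env_manager(workspace: Path = Path(".")) -> list[str]:
    """Illustrative helper: report which environment managers appear to be in use."""
    managers = []
    # Pixi projects carry a pixi.toml and/or a pyproject.toml manifest; a bare
    # pyproject.toml may still warrant a look inside for a [tool.pixi] table.
    if (workspace / "pixi.toml").is_file() or (workspace / "pyproject.toml").is_file():
        managers.append("pixi")
    # A .venv/ directory at the workspace root indicates a venv-managed project.
    if (workspace / ".venv").is_dir():
        managers.append("venv")
    return managers

if __name__ == "__main__":
    found = detect_env_manager()
    # If both show up, step 0 of the workflow asks the user which one is "current".
    print(f"Detected environment manager(s): {found or ['none']}")
```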
## Dependency Policy (Ask Once, Then Apply)

If any dependency is missing:

- Do **not** install it automatically *without user confirmation*.
- List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
- Provide clear options, let the developer choose, then proceed with the chosen approach.
- Once the developer confirms an approach, apply it for **all** newly required packages (no need to ask approval per package).

### Version Strategy

- First attempt: use the **latest versions** resolved by the selected package manager (`pixi`, `pip`, `uv`).
- If that fails (import/runtime errors, incompatibilities): fall back to the **specific versions/constraints** documented by the model’s upstream source code or docs.

### Preferred Options (in order)

**Pixi-managed env**

- Ask the user to choose one:
  - Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml` / `pyproject.toml`).
  - Create a new Pixi environment specifically to test this model.
- Then use `pixi install`/`pixi run ...` to execute.
- Prefer **PyPI** packages over **conda-forge** when both are available.
- Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.

**`.venv`-managed env**

Ask the user to choose one:

- Install deps via `pip` (or `uv pip`) into the current `.venv`.
- Create a new venv specifically for this model (keeps the repo venv clean).

## Inputs to Collect (ask if missing)

- Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
- Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
- Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
- Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
- CPU-only requirement (only if the user explicitly requests CPU-only)
- Optional: user-provided metrics/targets to evaluate (quality and/or performance)

Notes:

- Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
- If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
- If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.

## Core Workflow

### 0) Confirm artifacts and pick the target environment

- Confirm the minimum required inputs are present:
  - Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
  - If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp//refs/`.
- Detect environment type:
  - If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
- Device default:
  - If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).

### 1) Locate and read the upstream source code/docs

- First try to find the implementation locally:
  - Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
  - Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
- If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
  - Official GitHub repo, paper, model card, or vendor docs.
  - Check out the upstream repo under `tmp//refs/` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
- Download/check out the relevant source code (pin a tag/commit when possible) and identify:
  - The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
  - Any config files required to construct the model (YAML/JSON/TOML).
- Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.

### 2) Derive required dependencies

Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):

- Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
- Upstream docs/model card (pinned versions, known-good combos).
- Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT); a minimal sketch of this heuristic follows this list.
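A sketch of the checkpoint-type heuristic, paired with an availability check against the current environment (`guess_runtime` and `is_importable` are hypothetical helper names; upstream code and docs remain authoritative):

```python
import importlib.util
from pathlib import Path

# Heuristic only: .pt/.pth files can also hold TorchScript or arbitrary pickles,
# so the guess should be confirmed against the upstream implementation.
SUFFIX_TO_RUNTIME = {
    ".onnx": "onnxruntime",
    ".pt": "torch",
    ".pth": "torch",
    ".engine": "tensorrt",
}

def guess_runtime(checkpoint_path: str) -> str:
    """Hypothetical helper: map a checkpoint suffix to a candidate runtime package."""
    return SUFFIX_TO_RUNTIME.get(Path(checkpoint_path).suffix.lower(), "unknown")

def is_importable(package: str) -> bool:
    """Check whether the package resolves in the current Python environment."""
    return importlib.util.find_spec(package) is not None

if __name__ == "__main__":
    runtime = guess_runtime("weights/best.pt")  # illustrative path, not a real artifact
    print(f"Candidate runtime: {runtime}; available here: {is_importable(runtime)}")
```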
Make a concise dependency list covering:

- Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
- Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
- Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
- Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline

### 3) Resolve missing dependencies (with user choice)

- Check whether each required dependency is available in the current environment.
- If anything is missing, ask the user which path to take:
  - **Pixi:** modify current manifest to add deps, or create a new Pixi env for this model.
  - **Venv:** install into current `.venv`, or create a new venv for this model.
- After the user confirms, apply the decision for all required packages (no per-package prompts).
- Use the **Version Strategy** above (latest first; fall back to pinned versions if needed).
- After dependency changes, run a quick smoke test:
  - Imports for the core runtime stack
  - Minimal “load model” path (without a full benchmark yet)

### 4) Ensure the checkpoint exists locally

- Do **not** download checkpoints automatically.
- Developers must provide checkpoints/model files (local file/dir paths).
- If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
- If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
- Record provenance in a short note (based on what the developer provides):
  - Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.

### 5) Create an experiment workspace under `tmp/`

Default experiment directory: `/tmp/-