--- version: "1.0.0" evaluation: programmatic agent: claude-code model: claude-sonnet-4-6 model_provider: anthropic snapshot: python312-uv-office # The headline deliverable — the filled workbook. primary_outputs: - filled_workbook.xlsx secrets: {} --- # JSON → Excel — Template Filler Runbook > **EXECUTE THIS RUNBOOK NOW.** Work through the steps in order with tools, writing > every deliverable to `{{results_dir}}`. This is a task to perform, not a document to > summarize. Your first action is a tool call (Step 1). The inputs are already provided > below — never pause to ask the user for them. ## Inputs (already provided) - **Template workbook:** the first `*.xlsx` uploaded into `/app/assets/` (an EMPTY template with labels, headers, and placeholder cells). - **Fill data:** the first `*.json` uploaded into `/app/assets/` (the values to insert). - **Notes (optional):** the first `*.md` uploaded into `/app/assets/`, if any. - **Output basename:** {{output_basename}} · **Overflow strategy:** {{overflow_strategy}} ## Objective Given (a) an **empty** Excel template (`.xlsx`) whose cells already carry the document's structure — section labels, column headers, row labels — and placeholder cells for the data, and (b) a **JSON** file describing the values to insert, produce a **filled** copy of the template that writes every value into its correct cell **without ever overwriting a structural label or a formula**. The filler must: 1. **Preserve every structural label and formula** already in the template. Only *placeholder* cells (empty, `TBD`, `XX%`, `[Leave blank]`, dropdown prompts, checkbox lists) are writable. Labels are sacred. 2. **Match each fill to its destination** by one of three targeting modes (see Input Contract): explicit `cell`, a `match_label` (find the row/label and write to the adjacent placeholder, or to the column whose header matches each value key), or a `table_range` + `rows` (write a table's data rows under the existing header). 3. **Coerce values by the destination cell's `number_format`**, not by their surface form (a percent cell gets a float; a text cell keeps the string). 4. Preserve structure, styling, formulas, and merged cells. 5. **Detect and handle overflow** — more data rows than placeholder rows — via the configured strategy (`alert` / `insert` / `both`). 6. Tolerate missing/blank values; record everything dropped or skipped for auditability. This is the domain-agnostic engine behind any "fill an Excel template from structured data" task — budgets, price lists, status reports, scorecards, intake forms, etc. --- ## Parameter precedence (read first) **Explicit run parameters win over hints in the notes file.** If the user passes `overflow_strategy=alert`, use alert mode even if the notes say "insert new rows". The notes file is consulted only for: which fills are intentionally blank, dropdown/validation caveats, sheets to delete, and anything that does NOT correspond to a parameter already set. Surface any tension in `summary.md` under "Recommendations" — never silently override a parameter. --- ## REQUIRED OUTPUT FILES (MANDATORY) **You MUST write all of the following to `{{results_dir}}`. The task is NOT complete until every file exists and is non-empty. No exceptions.** | File | Description | |------|-------------| | `{{results_dir}}/filled_workbook.xlsx` | The filled workbook — same sheets and dimensions as the template (plus inserted rows when `overflow_strategy=insert`), original styling/formulas preserved, values populated. | | `{{results_dir}}/overflow_report.json` | Per-fill record of overflow events (answer larger than its range), the strategy used, and where alerts/inserted rows live. May be `[]`. | | `{{results_dir}}/skipped_fills.json` | Fills with empty/missing values or whose destination was a label (intentionally not written), keyed for traceability. May be `[]`. | | `{{results_dir}}/placement_report.json` | One record per fill: target, status, cells written, labels preserved, match score. | | `{{results_dir}}/summary.md` | Executive summary: counts placed/skipped/overflowed, sheets touched, parameter values, issues. | | `{{results_dir}}/validation_report.json` | Stage-by-stage validation with `overall_passed`. See Step 8. | If you finish your analysis but have not written all files, go back and write them first. --- ## Parameters | Parameter | Template Variable | Default | Description | |-----------|------------------|---------|-------------| | Results directory | `{{results_dir}}` | `/app/results` (Jetty) / `./results` (local) | Output directory for all results | | Template workbook | *(uploaded)* | first `*.xlsx` in `/app/assets/` | The empty `.xlsx` template | | Fill data | *(uploaded)* | first `*.json` in `/app/assets/` | The JSON of values to insert | | Notes | *(uploaded, optional)* | first `*.md` in `/app/assets/` | User notes — intentional blanks, dropdown caveats, sheets to drop | | Output basename | `{{output_basename}}` | `workbook` | Names the filled file (`FILLED_.xlsx`) | | Overflow strategy | `{{overflow_strategy}}` | `alert` | `alert` (truncate + flag), `insert` (add rows/cols), or `both` | | Max overflow inserts | `{{max_overflow_inserts}}` | `50` | Cap on inserted rows/cols per table before falling back to `alert` | --- ## Dependencies | Dependency | Type | Required | Description | |------------|------|----------|-------------| | `openpyxl` | Python package | Yes | Read/write `.xlsx` preserving styles, merged cells, comments, formulas | | `beautifulsoup4` | Python package | Optional | Parse `value_html` when a fill carries rich text/HTML | | `lxml` | Python package | Optional | Faster HTML parser backend for BeautifulSoup | --- ## Input Data Contract The fill data JSON is an object with a `fills` array. Each fill targets a destination in one of three ways and carries a value: ```json { "fills": [ { "sheet": "Budget", "match_label": "Salaries", "values": { "Q1": 120000, "Q2": 120000, "Q3": 126000, "Q4": 126000 } }, { "sheet": "Budget", "cell": "E4", "value": "Acme Corp" }, { "sheet": "Budget", "match_label": "Executive Summary", "value": "Long free-text answer…" }, { "sheet": "Inventory", "table_range": ["B10", "E14"], "header": ["Item","Qty","Unit Price","Total"], "rows": [["Widget", 3, "$10.00", "$30.00"], ["Gadget", 5, "$8.50", "$42.50"]] } ] } ``` Targeting modes (a fill uses exactly one): - **`cell`** — write `value` directly into that single cell (after placeholder/label check). - **`match_label`** + **`values`** (object keyed by column header) — find the row whose column-0 (or any) label best matches `match_label`; for each `values` key, find the column whose header matches the key, and write into the placeholder cell at (matched-row, matched-column). - **`match_label`** + **`value`** (single string) — find the label cell, then write `value` into the first placeholder cell to its right (extend-right, up to +8 columns). - **`table_range`** + **`rows`** (+ optional **`header`**) — write the table's data rows into the range, preserving any existing header row and aligning columns by header match. Optional per-fill keys: `value_html` (rich text — strip wrapper tags, keep inner text), `number_format` (force a format), `note` (provenance, never written into a cell). A fill with empty/`null`/whitespace value is intentionally blank → record in `skipped_fills.json`, write nothing. --- ## Step 1: Environment Setup ```bash python -m pip install --quiet openpyxl beautifulsoup4 lxml mkdir -p "{{results_dir}}" # Resolve inputs from /app/assets (uploaded files land there) TEMPLATE="$(ls /app/assets/*.xlsx 2>/dev/null | head -1)" DATA="$(ls /app/assets/*.json 2>/dev/null | head -1)" NOTES="$(ls /app/assets/*.md 2>/dev/null | head -1)" echo "template=$TEMPLATE"; echo "data=$DATA"; echo "notes=${NOTES:-}" [ -s "$TEMPLATE" ] || { echo "ERROR: no template .xlsx in /app/assets"; exit 1; } [ -s "$DATA" ] || { echo "ERROR: no data .json in /app/assets"; exit 1; } ``` --- ## Step 2: Read & Apply Notes (if present) If a notes `.md` exists, read it **before** touching the workbook and extract structured rules: intentional blanks, dropdown/validation caveats (write only an accepted option, drop free text), sheets to delete from the final export, and any per-cell quirks. Persist the extracted rules to `{{results_dir}}/notes_applied.json` so the trajectory shows what you actually applied. Remember: notes never override an explicit run parameter. --- ## Step 3: Load & Classify — build the placement plan 1. `wb = openpyxl.load_workbook(TEMPLATE)` — keep formulas (`data_only=False`). 2. Load the fill data JSON. 3. For every cell in each touched sheet, classify it (store a `cell_class` map): - **`formula`** — `value` is a string starting with `=`. **Never overwrite.** - **`empty`** — `None` or whitespace-only. - **`placeholder`** — writable. A cell is a placeholder if it matches ANY of: - Regex `^[\s\[\]xX%$,.\-]*$` (e.g. `XX%`, `X%`, `[XX]`, `--`) - Case-insensitive equals one of: `tbd`, `n/a`, `please select`, `please choose`, `select an option`, `select...`, `choose one`, `please specify`, `[leave blank]` - Has an attached `DataValidation` (dropdown) — a constrained input slot - A multi-line cell whose non-empty lines all start with a checkbox glyph (`☐ ☑ ☒` or `[ ]` / `[x]`) - **`label`** — any other non-empty cell. **Never overwrite.** 4. For each fill, resolve its destination per the Input Contract targeting mode and record a plan entry (target cells, matched header columns, match score). Validate the `sheet` exists (case-insensitive fallback); if missing, skip that fill (don't crash). 5. Write the plan to `{{results_dir}}/placement_plan.json` (including the `cell_class` of each target) for debugging. **Label/header matching** uses Jaccard similarity: tokenize on whitespace+punctuation, lowercase, drop `%`/`[]` and tokens of length ≤ 1, then `len(a&b)/len(a|b)`. Accept a match at ≥ 0.5 for labels, ≥ 0.3 for table-header alignment. --- ## Step 4: Fill the Workbook Process fills sorted by `(sheet, top_row, top_col)`. For each: ### 4a. Skip path If the fill's value is empty/`null`/whitespace, or its resolved destination is classified `label`/`formula`, append to `skipped_fills.json` (with the reason and the existing label text) and continue. **Never overwrite a `label` or `formula` cell** — that is a hard FAIL. ### 4b. `cell` target Re-check the cell's class. If `placeholder`/`empty`, normalize (Step 4d) and write (`cell.value = ...`; assign `.value` only, never the whole Cell, to preserve styling). ### 4c. `match_label` + `values` (row-label + header-keyed) 1. Find the **header row**: the first row containing any of the `values` keys as a label; record `{key: column}` for each matched key. 2. Find the **destination row**: the row whose column-0 label has the highest Jaccard similarity to `match_label` (≥ 0.5). 3. For each `(key, value)`: the destination cell is `(dest_row, header_col[key])`. If it's `placeholder`/`empty`, normalize and write; if `label`/`formula`, skip + record. ### 4c.1 `match_label` + `value` (label → adjacent placeholder) Find the label cell, then scan rightward from the next column (up to +8 columns) for the first `placeholder`/`empty` cell and write there. Record `wrote_at`. If none is writable, append to `skipped_fills.json` (`reason: "no_writable_cell_adjacent_to_label"`). ### 4c.2 `table_range` + `rows` Parse the range. If its first row contains labels, treat them as the header and align the incoming `header`/columns to the existing labels by Jaccard (≥ 0.3); write only data rows under the header. If column 0 of the range carries row labels, anchor each incoming row to the destination row whose col-0 label matches best (don't write sequentially). Per cell: re-check class; `label`/`formula` → skip, else normalize + write. If the table has more data rows than placeholder rows → overflow (Step 5). ### 4d. Value normalization (FORMAT-AWARE — apply to every write) Read the destination cell's `number_format` and coerce only when it calls for it: - **Percent** (`number_format` contains `%`): `"34%"` → float `0.34`. If the format is `General`/`@` (text), keep the string `"34%"`. - **Number** (`number_format` matches `0`/`#`, not `@`): strip `$ € ,` and write `float`/`int`. If `@` (text), keep the string. - **Date** (`y`/`m`/`d` mask): parse `YYYY-MM-DD` or `M/D/YYYY` to `datetime`; else keep. - **Empty/`None`**: if the rendered value is empty/`"None"`/whitespace → SKIP the write. (`str(None)` → `"None"` must never land in a cell.) - **Checkbox single-select**: if the destination is a multi-line checkbox placeholder and the value names one checked option, write just that option's text. - Otherwise write the rendered string, and set `wrap_text=True` for multi-line content. The template's `number_format` is the source of truth for type coercion, not the value's surface form. ### 4e. Always Record per fill in `placement_report.json`: `{target, status, cells_written, labels_preserved, match_score, overflowed, strategy_applied}`. --- ## Step 5: Overflow Handling ### 5a. `alert` (default) — truncate + flag, NEVER insert Under `alert`, **never** call `insert_rows`/`insert_cols`; dimensions must equal the template's. Truncate the value to fit, attach an `openpyxl.comments.Comment` to the first destination cell (`OVERFLOW: 12 rows / 5 available, 7 truncated. See overflow_report.json`), set a loud fill (`PatternFill(start_color="FFEB9C", end_color="FFEB9C", fill_type="solid")`, preserving any existing non-default fill), and append the full untruncated content to `overflow_report.json`. ### 5b. `insert` If `data_rows > dest_rows` and `delta ≤ {{max_overflow_inserts}}`: `ws.insert_rows(bottom+1, amount=delta)`, copy styling from the last original row onto inserted rows, then write. Maintain a `row_shift_map: {sheet: {orig_row: shift}}` and apply it to every subsequent fill's coords. Above the cap, fall back to `alert`. Column inserts are riskier (they break absolute-column formulas) — default to `alert` for column overflow unless notes request it. ### 5c. `both` Do `insert`, plus a brief comment on the first cell noting "N rows inserted to fit". Every overflow event is appended to `overflow_report.json` regardless of strategy. --- ## Step 6: Save & Self-Verify 1. `wb.save("{{results_dir}}/filled_workbook.xlsx")` 2. Reopen the saved file — `openpyxl.load_workbook(...)` must not raise. 3. Spot-check 5 placed fills: read back the destination and confirm a non-empty value. 4. Spot-check overflow events: confirm the comment/fill/inserted rows are present. --- ## Step 7: Evaluate Outputs Assign each fill a status: | Status | Criteria | |--------|----------| | `PASS` | Non-empty value, destination resolved, all `label`/`formula` cells in/around the destination preserved, value written, and (if it overflowed) the configured strategy applied + logged. | | `PARTIAL` | Placed with caveats: label/header match `< 0.5`/`0.3` (fell back), overflow truncated under `alert`, `insert` hit the cap, or dropdown coercion dropped explanatory text. | | `FAIL` | Non-empty value but could not place — sheet missing, range parse error, openpyxl write exception, **or a `label`/`formula` cell was overwritten** (hard FAIL — never silently roll forward). | | *(not counted)* | `skipped_fills.json` entries — not failures. | Aggregate into `validation_report.json` under `results`, including `labels_preserved_total`. --- ## Step 8: Iterate on Errors (max 3 rounds) For any `FAIL`: read the captured error, apply the matching Common Fix, retry just that fill, re-evaluate. Repeat at most 3 times; after that keep `FAIL` and surface it in `summary.md`. ### Common Fixes | Issue | Fix | |-------|-----| | Existing label/header overwritten | The destination's `cell_class` had `label`/`formula` cells that were ignored. Restore the template's value for that cell and re-resolve the target. **Hard FAIL until fixed.** | | Formula clobbered (e.g. wrote a number over `=SUM(...)`) | Never write into a cell whose value starts with `=`. Classify it `formula` and skip. | | Workbook grew under `alert` | You called `insert_rows` while `overflow_strategy=alert`. Re-read Step 5a — truncate-and-flag only. | | Overrode a parameter because the notes mentioned it | Re-read "Parameter precedence". The run parameter is non-negotiable. | | `"20%"` became `0.2` but the template keeps `"20%"` | The cell's `number_format` is `General`/`@` (text) — write the string as-is (Step 4d). | | Literal `"None"` written into a cell | The rendered value was `str(None)`; skip the write when empty/`None`/whitespace. | | `KeyError` on sheet name | Case-insensitive then `strip().lower()` then fuzzy-substring lookup; log the resolved name. | | `IllegalCharacterError` on write | Strip control/zero-width chars, replace tabs with spaces, trim to 32,767 chars (Excel's per-cell limit). | | `MergedCell` write rejection | Write to the merged range's top-left anchor; re-resolve anchors after any `insert_rows`. | | Insert broke a downstream fill's coords | Apply the `row_shift_map` to that fill's coords before retrying; else fall back to `alert`. | | Value JSON has no `fills` array | Accept a bare array too, or `{data:[...]}`; coerce to a `fills` list. Log the coercion. | --- ## Step 9: Write Executive Summary Write `{{results_dir}}/summary.md`: ```markdown # JSON → Excel — {{output_basename}} — Results ## Overview - **Date**: - **Template**: · **Data**: · **Overflow strategy**: {{overflow_strategy}} - **Sheets touched**: - **Fills**: | **Placed**: | **Skipped**: | **Failed**: ## Results Summary | Status | Count | % | |--------|-------|---| | PASS | … | … | | PARTIAL | … | … | | FAIL | … | … | | SKIPPED | … | … | ## Overflow Events - **Total**: · resolved by insert / alert: / · top offenders: ## Notes Applied ## Recommendations / Limitations ``` --- ## Step 10: Write Validation Report Write `{{results_dir}}/validation_report.json`: ```json { "version": "1.0.0", "run_date": "", "parameters": { "output_basename": "{{output_basename}}", "overflow_strategy": "{{overflow_strategy}}", "max_overflow_inserts": {{max_overflow_inserts}} }, "stages": [ { "name": "setup", "passed": true, "message": "Inputs resolved, deps installed" }, { "name": "plan", "passed": true, "message": "Built N placement plans" }, { "name": "placement", "passed": true, "message": "Placed P, skipped S, failed F" }, { "name": "overflow_handling", "passed": true, "message": "O overflow events handled" }, { "name": "label_preservation", "passed": true, "message": "All label/formula cells preserved" }, { "name": "save", "passed": true, "message": "Workbook saved and reopened cleanly" }, { "name": "report_generation", "passed": true, "message": "All output files written" } ], "results": { "pass": 0, "partial": 0, "fail": 0, "skipped": 0, "overflow_events": 0, "labels_preserved_total": 0 }, "overall_passed": true, "output_files": [ "{{results_dir}}/filled_workbook.xlsx", "{{results_dir}}/overflow_report.json", "{{results_dir}}/skipped_fills.json", "{{results_dir}}/placement_report.json", "{{results_dir}}/summary.md", "{{results_dir}}/validation_report.json" ] } ``` `overall_passed` is `true` iff every stage passed **and** `results.fail == 0`. --- ## Final Checklist (MANDATORY — do not skip) ### Verification Script ```bash echo "=== FINAL OUTPUT VERIFICATION ===" RESULTS_DIR="{{results_dir}}" FILLED="$RESULTS_DIR/filled_workbook.xlsx" for f in "$FILLED" \ "$RESULTS_DIR/overflow_report.json" \ "$RESULTS_DIR/skipped_fills.json" \ "$RESULTS_DIR/placement_report.json" \ "$RESULTS_DIR/summary.md" \ "$RESULTS_DIR/validation_report.json"; do [ -s "$f" ] && echo "PASS: $f ($(wc -c < "$f") bytes)" || echo "FAIL: $f is missing or empty" done python - < e[s].max_row or f[s].max_column > e[s].max_column): print(f"FAIL: sheet {s} grew under alert strategy"); sys.exit(1) print("PASS: dimension audit (no growth under alert)") PY echo "=== VERIFICATION COMPLETE ===" ``` ### Checklist - [ ] `filled_workbook.xlsx` exists and reopens cleanly with `openpyxl` - [ ] `overflow_report.json`, `skipped_fills.json`, `placement_report.json` exist (may be `[]`) - [ ] `summary.md` follows the Step 9 template; `validation_report.json` has `stages`/`results`/`overall_passed` - [ ] **Label-preservation audit**: for ≥ 5 sampled fills, the destination's surrounding label/header cells in the FILLED workbook still equal the template's values exactly. Any divergence is a FAIL. - [ ] **Formula audit**: every cell that was a formula in the template is still a formula in the output (none clobbered with a literal). - [ ] **Dimension audit**: under `overflow_strategy=alert`, every sheet's `max_row`/`max_column` equals the template's. Growth = a stray insert = FAIL. - [ ] Sheet names unchanged (none renamed/dropped/added unintentionally) **If ANY item fails, go back and fix it. Do NOT finish until all items pass.** --- ## Tips - **The template's existing cells are the source of truth for layout.** Snapshot and classify every destination cell first; labels and formulas are sacred, placeholders and empties are the only writable surfaces. - **Coerce by `number_format`, not by surface form.** `'0.0%'` → float; `'@'` text → keep the string. Blanket-normalizing destroys values the template intentionally keeps verbatim. - **Row-label anchoring beats sequential writes.** When a range's rows have labels in column 0, match each incoming row to the destination row whose label is most similar — sequential writing misaligns the moment the template has spacer or out-of-order rows. - **Never write over a formula.** A cell whose value starts with `=` is computed structure; writing a literal silently breaks the workbook's math. - **`alert` means zero inserts.** If `max_row`/`max_column` grew, you violated the strategy. - **`openpyxl.insert_rows` shifts everything below.** Keep a `row_shift_map`, process fills top-down, and re-resolve merged-cell anchors after any insert (Excel drops comments on non-anchor merged cells). - **Merged cells reject non-anchor writes** — always write to the top-left anchor. - **Jaccard similarity is enough** — tokenize, lowercase, drop `%`/`[]`, `len(a&b)/len(a|b)`. Thresholds 0.5 (labels) / 0.3 (table headers) are good defaults. - **Excel's per-cell limit is 32,767 chars** — truncate with a trailing marker, don't crash.