--- name: office-xlsx display_name: Office XLSX version: "1.0.1" description: >- Create, edit, and analyze Excel spreadsheets (.xlsx, .csv) with formula integrity and format preservation. Uses XML-level editing for existing files to prevent corruption of VBA, pivot tables, and sparklines. Covers financial formatting conventions, formula hygiene, and validation workflows. Keywords: Excel, xlsx, csv, spreadsheet, formula, pivot table, financial model. tags: - document - office - excel - spreadsheet - financial allowed-tools: >- Read Write Edit Bash --- # Office XLSX — Spreadsheet Processing with Formula Integrity Create and edit spreadsheets that preserve formulas, formatting, VBA macros, pivot tables, and sparklines. The key principle: **never silently corrupt data during read-write cycles.** --- ## Task Routing | Task | Route | Method | |------|-------|--------| | Read data, analyze, aggregate | **READ** | openpyxl (read-only) or pandas | | Create new spreadsheet from scratch | **CREATE** | XML template + manual XML editing | | Edit existing spreadsheet | **EDIT** | XML unpack → edit → repack | | Repair broken formulas | **FIX** | Treat as EDIT | | Verify formulas and structure | **VALIDATE** | Static check + LibreOffice recalculation | --- ## Why XML-Level Editing? **openpyxl round-trip silently destroys data.** When you open a .xlsx with openpyxl and save it, the following are quietly discarded: - Pivot tables - Sparklines - VBA macros - Conditional formatting (partially) - Data validation (partially) - Comments in some formats This is unacceptable for editing real-world files. The solution: **edit at the XML level**. A .xlsx file is a ZIP archive containing XML files: ``` [Content_Types].xml _rels/.rels xl/ workbook.xml ← sheet names, relationships sharedStrings.xml ← text cell values styles.xml ← number formats, fonts, fills worksheets/ sheet1.xml ← cell data, formulas, formatting refs _rels/workbook.xml.rels ``` ### XML Edit Pattern ```bash # 1. Unpack mkdir /tmp/xlsx_work && cd /tmp/xlsx_work unzip -o input.xlsx -d unpacked/ # 2. Edit target XML files (e.g., xl/worksheets/sheet1.xml) # ... modify only the specific cells/rows needed ... # 3. Repack cd unpacked && zip -r ../output.xlsx . -x ".*" ``` **Only modify what you need.** Everything else stays untouched — styles, charts, macros, pivot tables all survive. --- ## Route: READ Use openpyxl in **read-only mode** or pandas: ```python # openpyxl (preserves formatting info) from openpyxl import load_workbook wb = load_workbook('file.xlsx', read_only=True, data_only=True) ws = wb.active for row in ws.iter_rows(values_only=True): print(row) # pandas (best for analysis) import pandas as pd df = pd.read_excel('file.xlsx', sheet_name='Sheet1') ``` **Formatting rules for READ output:** - Preserve decimal places as shown in the spreadsheet - Currency: include symbol and 2 decimal places - Percentages: display as `XX.X%`, not `0.XXX` - Aggregations (sum, average, count): compute from DataFrame, don't invent numbers --- ## Route: CREATE **Always start from an XML template**, not from `openpyxl.Workbook()`: 1. Copy a minimal template directory (containing valid XML structure) 2. Edit `xl/worksheets/sheet1.xml` to add your data 3. Update `xl/sharedStrings.xml` if adding text values 4. Pack into .xlsx ### Cell XML Structure ```xml 42 0 SUM(A1:A2) 45678 ``` ### Formula-First Rule **Every derived value must be a real Excel formula**, not a hardcoded number. Bad: `1250` (dead number — can't update when inputs change) Good: `SUM(C2:C9)` (live formula — updates automatically) --- ## Route: EDIT ### Critical Integrity Rules 1. **Never create a new Workbook for edits.** Open → modify → save. Creating new loses everything. 2. **Same sheets rule.** Output must contain exactly the same sheets as input (same names, same order). 3. **Verify after save.** Open the output, check that unmodified cells are unchanged. ### Common Edit Operations **Fill cells with values:** ```xml NEW_VALUE ``` **Add a column:** Insert new `` elements in each ``, shift existing column references (A→B, B→C...). Update formulas that reference shifted columns. **Insert a row:** Add new `` element, increment `r` attributes of subsequent rows. Update formulas that reference shifted rows (e.g., `SUM(B2:B9)` becomes `SUM(B2:B10)` if row inserted within range). **Row-wide borders:** Set style index on all cells in the row. Define border style in `xl/styles.xml`. --- ## Route: VALIDATE ### Static Validation Check formulas without executing them: - All `SUM`, `AVERAGE`, `COUNT` ranges reference existing cells - No `#REF!`, `#NAME?`, `#VALUE!` in formula text - Circular references: cell references itself directly or indirectly - Off-by-one: `SUM(B2:B9)` when data extends to B10 ### Dynamic Validation Use LibreOffice to recalculate and compare: ```bash # Recalculate formulas soffice --headless --calc --convert-to xlsx --outdir /tmp input.xlsx # Compare original cached values with recalculated values # Discrepancy = formula or range error ``` --- ## Financial Formatting Standards When creating financial models or IB-style spreadsheets: ### Cell Color Conventions | Color | Meaning | Hex | |-------|---------|-----| | Blue | User input / hardcoded values | `#0000FF` | | Black | Formulas / derived values | `#000000` | | Green | Cross-sheet references / linked values | `#00B050` | | Gray | Static constants | `#808080` | | Orange | Review / caution | `#FF8C00` | | Red | Error / flag | `#FF0000` | ### Number Formatting | Type | Format | Example | |------|--------|---------| | Currency | `$#,##0.00` | `$1,250.00` | | Percentages | `0.0%` | `12.5%` | | Multiples | `0.0"x"` | `5.2x` | | Negative numbers | `#,##0.00;[Red](#,##0.00)` | Red in parentheses | | Zero values | `#,##0.00;[Red](#,##0.00);"-"` | Dash for zero | | Units in headers | — | `Revenue ($mm)` | ### IB Layout Standards - Totals should SUM the range directly above (not cherry-pick cells) - Hide gridlines; use horizontal borders above totals - Section headers: merged cells with dark fill, white text - Column labels: right-aligned for numbers, left-aligned for text - Indent sub-metrics under parent line items --- ## Formula Hygiene - **Use formulas for derived values**, never hardcode results - **Avoid dynamic array functions**: `FILTER`, `XLOOKUP`, `SORT`, `SEQUENCE` (compatibility issues) - **Keep formulas simple**: use helper cells for complex logic - **Avoid volatile functions**: `INDIRECT`, `OFFSET` (unless required) - **Use absolute references**: `$B$4` for constants, relative for copied formulas - **Guard against errors**: wrap with `IFERROR` when appropriate - **Never use `=TABLE`**: unsupported in most non-Excel environments --- ## Validation Checklist Before delivering any .xlsx: - [ ] File opens without errors in Excel/LibreOffice - [ ] All formulas produce correct results (spot-check 5+ cells) - [ ] No `#REF!`, `#DIV/0!`, `#VALUE!`, `#N/A`, `#NAME?` errors - [ ] Formatting matches requirements (currency, percentages, dates) - [ ] Unmodified content is unchanged (for EDIT tasks) - [ ] Pivot tables / charts / macros still work (for EDIT tasks) - [ ] Print layout is reasonable (page breaks, margins) --- ## Common Pitfalls | Pitfall | Solution | |---------|----------| | Pivot tables disappear after save | Used openpyxl round-trip — use XML-level editing instead | | Formulas show cached values, not recalculating | Open in Excel/LibreOffice to trigger recalculation | | Shared strings corrupted | When adding text cells, update `xl/sharedStrings.xml` count attribute | | Style applied to wrong cells | Style index in `` must match the correct index in `xl/styles.xml` | | Formula range off-by-one after row insert | Must shift all formula references in affected rows | | Date displays as number | Apply date number format via style index | --- ## Bundled Resources ### Scripts | Script | Purpose | Usage | |--------|---------|-------| | `scripts/xlsx_unpack.py` | Unpack .xlsx into XML directory | `python3 xlsx_unpack.py input.xlsx /tmp/work/` | | `scripts/xlsx_pack.py` | Repack XML directory into .xlsx | `python3 xlsx_pack.py /tmp/work/ output.xlsx` | | `scripts/xlsx_reader.py` | Read cells, ranges, formatting info | `python3 xlsx_reader.py input.xlsx --range A1:D10` | | `scripts/xlsx_add_column.py` | Insert column with data, shift existing | `python3 xlsx_add_column.py /tmp/work/ --col C --header "Total"` | | `scripts/xlsx_insert_row.py` | Insert rows, shift existing | `python3 xlsx_insert_row.py /tmp/work/ --row 5 --count 3` | | `scripts/xlsx_shift_rows.py` | Shift row references in formulas | `python3 xlsx_shift_rows.py /tmp/work/ --from 5 --offset 3` | | `scripts/formula_check.py` | Static formula validation (ranges, errors) | `python3 formula_check.py /tmp/work/` | | `scripts/libreoffice_recalc.py` | Dynamic validation via LibreOffice | `python3 libreoffice_recalc.py input.xlsx` | | `scripts/shared_strings_builder.py` | Rebuild shared strings XML | `python3 shared_strings_builder.py /tmp/work/` | | `scripts/style_audit.py` | Audit and report styles used | `python3 style_audit.py /tmp/work/` | ### References | Reference | Content | |-----------|---------| | `references/create.md` | Detailed guide for creating spreadsheets from XML template | | `references/edit.md` | Detailed guide for XML-level editing operations | | `references/fix.md` | Guide for repairing broken formulas and structure | | `references/format.md` | Financial formatting standards (34,000 words, IB-grade) | | `references/validate.md` | Complete validation workflow and criteria | | `references/read-analyze.md` | Guide for reading and analyzing existing spreadsheets | | `references/ooxml-cheatsheet.md` | Quick reference for OOXML spreadsheet XML structure | ### Templates `templates/minimal_xlsx/` — A valid minimal .xlsx structure (unpacked) for use as a CREATE starting point. Contains: `[Content_Types].xml`, `xl/workbook.xml`, `xl/worksheets/sheet1.xml`, `xl/styles.xml`, `xl/sharedStrings.xml`, and relationship files. ### How to Use 1. Copy scripts to workspace: `cp scripts/xlsx_unpack.py .` (repeat for each needed script) 2. Copy template to workspace: `cp -r templates/minimal_xlsx/ .` 3. Load references as needed: `cat references/edit.md` --- ## Dependencies - **Python**: `openpyxl` (read-only mode), `pandas` (analysis) - **System**: `unzip`, `zip` (for XML-level editing) - **Validation**: LibreOffice (`soffice`) for formula recalculation - **No external libraries needed for XML editing** — standard ZIP + text manipulation - **Bundled scripts**: Pure Python, no additional pip dependencies --- ## Attribution This skill incorporates knowledge from: - [MiniMax Office Skills](https://github.com/MiniMax-AI/skills) (MIT) — XML unpack/edit/repack pattern, formula integrity rules, helper scripts architecture - [OpenAI Skills](https://github.com/openai/skills) (Apache-2.0) — Financial formatting conventions, IB layout standards, formula hygiene rules, cell color conventions