--- name: Xlsx description: Excel file processing. USE WHEN xlsx, Excel, spreadsheet. SkillSearch('xlsx') for docs. --- # Requirements for Outputs ## ๐ŸŽฏ Load Full PAI Context **Before starting any task with this skill, load complete PAI context:** `read ~/.claude/skills/CORE/SKILL.md` This provides access to: - Complete contact list (Angela, Bunny, Saลกa, Greg, team members) - Stack preferences (TypeScript>Python, bun>npm, uv>pip) - Security rules and repository safety protocols - Response format requirements (structured emoji format) - Voice IDs for agent routing (ElevenLabs) - Personal preferences and operating instructions ## ๐Ÿ”€ When to Use This Sub-Skill This sub-skill activates when the user's request involves Excel spreadsheets (.xlsx, .xlsm, .csv, .tsv). ### Explicit Triggers - User mentions "create spreadsheet", "new Excel file", "Excel workbook" - User requests "formulas", "financial model", "financial modeling" - User wants to "recalculate" or "recalculate formulas" - User says "analyze data in Excel", "read Excel", "Excel data analysis" - User mentions .xlsx, .xlsm, .csv, or .tsv files ### Contextual Triggers - User provides path to .xlsx/.xlsm file - User discusses calculations, projections, or financial data - User mentions financial projections, revenue models, or valuations - User wants to work with spreadsheet formulas or data ### Workflow Routing **Creation Workflow (openpyxl):** - "Create spreadsheet", "new Excel file", "build financial model" - User wants to create new .xlsx files with formulas and formatting - Use openpyxl for formula support and Excel-specific features **Editing Workflow (openpyxl):** - "Edit spreadsheet", "modify Excel", "update cells" - User wants to modify existing .xlsx files while preserving formulas - Use `load_workbook()` to preserve existing formatting and formulas **Data Analysis Workflow (pandas):** - "Analyze data", "read Excel", "data visualization" - User wants to analyze or visualize data from Excel files - Use pandas for powerful data manipulation and analysis **Financial Modeling Workflow:** - "Financial model", "revenue projections", "valuation model" - User wants professional financial models with color coding - Follow financial model standards (blue inputs, black formulas, green links) **Recalculation Workflow:** - "Recalculate", "update formula values", "calculate formulas" - After creating/editing files with formulas - MANDATORY step after using formulas - run `recalc.py` script ## All Excel files ### Zero Formula Errors - Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?) ### Preserve Existing Templates (when updating templates) - Study and EXACTLY match existing format, style, and conventions when modifying files - Never impose standardized formatting on files with established patterns - Existing template conventions ALWAYS override these guidelines ## Financial models ### Color Coding Standards Unless otherwise stated by the user or existing template #### Industry-Standard Color Conventions - **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios - **Black text (RGB: 0,0,0)**: ALL formulas and calculations - **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook - **Red text (RGB: 255,0,0)**: External links to other files - **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated ### Number Formatting Standards #### Required Format Rules - **Years**: Format as text strings (e.g., "2024" not "2,024") - **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)") - **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-") - **Percentages**: Default to 0.0% format (one decimal) - **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E) - **Negative numbers**: Use parentheses (123) not minus -123 ### Formula Construction Rules #### Assumptions Placement - Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells - Use cell references instead of hardcoded values in formulas - Example: Use =B5*(1+$B$6) instead of =B5*1.05 #### Formula Error Prevention - Verify all cell references are correct - Check for off-by-one errors in ranges - Ensure consistent formulas across all projection periods - Test with edge cases (zero values, negative numbers) - Verify no unintended circular references #### Documentation Requirements for Hardcodes - Comment or in cells beside (if end of table). Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]" - Examples: - "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]" - "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]" - "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity" - "Source: FactSet, 8/20/2025, Consensus Estimates Screen" # XLSX creation, editing, and analysis ## Overview A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks. ## Important Requirements **LibreOffice Required for Formula Recalculation**: You can assume LibreOffice is installed for recalculating formula values using the `recalc.py` script. The script automatically configures LibreOffice on first run ## Reading and analyzing data ### Data analysis with pandas For data analysis, visualization, and basic operations, use **pandas** which provides powerful data manipulation capabilities: ```python import pandas as pd # Read Excel df = pd.read_excel('file.xlsx') # Default: first sheet all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict # Analyze df.head() # Preview data df.info() # Column info df.describe() # Statistics # Write Excel df.to_excel('output.xlsx', index=False) ``` ## Excel File Workflows ## CRITICAL: Use Formulas, Not Hardcoded Values **Always use Excel formulas instead of calculating values in Python and hardcoding them.** This ensures the spreadsheet remains dynamic and updateable. ### โŒ WRONG - Hardcoding Calculated Values ```python # Bad: Calculating in Python and hardcoding result total = df['Sales'].sum() sheet['B10'] = total # Hardcodes 5000 # Bad: Computing growth rate in Python growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue'] sheet['C5'] = growth # Hardcodes 0.15 # Bad: Python calculation for average avg = sum(values) / len(values) sheet['D20'] = avg # Hardcodes 42.5 ``` ### โœ… CORRECT - Using Excel Formulas ```python # Good: Let Excel calculate the sum sheet['B10'] = '=SUM(B2:B9)' # Good: Growth rate as Excel formula sheet['C5'] = '=(C4-C2)/C2' # Good: Average using Excel function sheet['D20'] = '=AVERAGE(D2:D19)' ``` This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes. ## Common Workflow 1. **Choose tool**: pandas for data, openpyxl for formulas/formatting 2. **Create/Load**: Create new workbook or load existing file 3. **Modify**: Add/edit data, formulas, and formatting 4. **Save**: Write to file 5. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the recalc.py script ```bash python recalc.py output.xlsx ``` 6. **Verify and fix any errors**: - The script returns JSON with error details - If `status` is `errors_found`, check `error_summary` for specific error types and locations - Fix the identified errors and recalculate again - Common errors to fix: - `#REF!`: Invalid cell references - `#DIV/0!`: Division by zero - `#VALUE!`: Wrong data type in formula - `#NAME?`: Unrecognized formula name ### Creating new Excel files ```python # Using openpyxl for formulas and formatting from openpyxl import Workbook from openpyxl.styles import Font, PatternFill, Alignment wb = Workbook() sheet = wb.active # Add data sheet['A1'] = 'Hello' sheet['B1'] = 'World' sheet.append(['Row', 'of', 'data']) # Add formula sheet['B2'] = '=SUM(A1:A10)' # Formatting sheet['A1'].font = Font(bold=True, color='FF0000') sheet['A1'].fill = PatternFill('solid', start_color='FFFF00') sheet['A1'].alignment = Alignment(horizontal='center') # Column width sheet.column_dimensions['A'].width = 20 wb.save('output.xlsx') ``` ### Editing existing Excel files ```python # Using openpyxl to preserve formulas and formatting from openpyxl import load_workbook # Load existing file wb = load_workbook('existing.xlsx') sheet = wb.active # or wb['SheetName'] for specific sheet # Working with multiple sheets for sheet_name in wb.sheetnames: sheet = wb[sheet_name] print(f"Sheet: {sheet_name}") # Modify cells sheet['A1'] = 'New Value' sheet.insert_rows(2) # Insert row at position 2 sheet.delete_cols(3) # Delete column 3 # Add new sheet new_sheet = wb.create_sheet('NewSheet') new_sheet['A1'] = 'Data' wb.save('modified.xlsx') ``` ## Recalculating formulas Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided `recalc.py` script to recalculate formulas: ```bash python recalc.py [timeout_seconds] ``` Example: ```bash python recalc.py output.xlsx 30 ``` The script: - Automatically sets up LibreOffice macro on first run - Recalculates all formulas in all sheets - Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.) - Returns JSON with detailed error locations and counts - Works on both Linux and macOS ## Formula Verification Checklist Quick checks to ensure formulas work correctly: ### Essential Verification - [ ] **Test 2-3 sample references**: Verify they pull correct values before building full model - [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK) - [ ] **Row offset**: Remember Excel rows are 1-indexed (DataFrame row 5 = Excel row 6) ### Common Pitfalls - [ ] **NaN handling**: Check for null values with `pd.notna()` - [ ] **Far-right columns**: FY data often in columns 50+ - [ ] **Multiple matches**: Search all occurrences, not just first - [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!) - [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!) - [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets ### Formula Testing Strategy - [ ] **Start small**: Test formulas on 2-3 cells before applying broadly - [ ] **Verify dependencies**: Check all cells referenced in formulas exist - [ ] **Test edge cases**: Include zero, negative, and very large values ### Interpreting recalc.py Output The script returns JSON with error details: ```json { "status": "success", // or "errors_found" "total_errors": 0, // Total error count "total_formulas": 42, // Number of formulas in file "error_summary": { // Only present if errors found "#REF!": { "count": 2, "locations": ["Sheet1!B5", "Sheet1!C10"] } } } ``` ## Best Practices ### Library Selection - **pandas**: Best for data analysis, bulk operations, and simple data export - **openpyxl**: Best for complex formatting, formulas, and Excel-specific features ### Working with openpyxl - Cell indices are 1-based (row=1, column=1 refers to cell A1) - Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)` - **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost - For large files: Use `read_only=True` for reading or `write_only=True` for writing - Formulas are preserved but not evaluated - use recalc.py to update values ### Working with pandas - Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})` - For large files, read specific columns: `pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])` - Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])` ## Code Style Guidelines **IMPORTANT**: When generating Python code for Excel operations: - Write minimal, concise Python code without unnecessary comments - Avoid verbose variable names and redundant operations - Avoid unnecessary print statements **For Excel files themselves**: - Add comments to cells with complex formulas or important assumptions - Document data sources for hardcoded values - Include notes for key calculations and model sections ## Examples **Example 1: Build a financial model** ``` User: "Create a revenue projection model for the next 5 years" โ†’ Creates workbook with assumptions sheet + projections โ†’ Uses Excel formulas (=SUM, growth rates) not hardcoded values โ†’ Applies color coding (blue inputs, black formulas), runs recalc.py ``` **Example 2: Analyze data from Excel file** ``` User: "What are the top 10 customers by revenue in this spreadsheet?" โ†’ Reads Excel with pandas โ†’ Groups, sorts, and filters data โ†’ Returns summary with statistics ``` **Example 3: Update existing spreadsheet** ``` User: "Add a new column with profit margin calculations" โ†’ Loads workbook preserving existing formulas โ†’ Adds new column with margin formula referencing existing cells โ†’ Saves and recalculates to verify no errors ```