---
name: architect
description: Expert guidance for GabeDA v2.1 architecture (34 modules) - implementing models, features, debugging 4-case logic, and maintaining the /src codebase.
version: 2.1.0
---

# GabeDA Architecture Expert

## Purpose

This skill provides expert guidance for the GabeDA v2.1 refactored architecture. It focuses on implementing models, adding features, debugging execution logic, and maintaining architectural principles across the 34-module `/src` codebase.

**Core Expertise:**
- `/src` architecture (34 modules in 6 packages)
- 4-case logic execution engine
- Feature implementation (filters, attributes, aggregations)
- Dependency resolution and data flow
- External data integration patterns
- Frontend development (React + TypeScript + Vite)
- Testing strategies and validation

## When to Use This Skill

Invoke this skill when:
- Working with the `/src` refactored codebase (v2.1)
- Implementing new aggregation models (daily, weekly, monthly, customer, product)
- Adding filters, attributes, or computed features
- Debugging 4-case logic execution issues
- Configuring external data joins
- **Developing frontend features (React + TypeScript + Vite)**
- **Troubleshooting blank pages or HMR issues**
- Understanding data flow and persistence strategies
- Troubleshooting column naming or dependency resolution
- Ensuring architectural principles are maintained
- Creating tests in `/test` folder

**NOT for:** Business strategy, marketing content, data analysis notebooks (delegate to business, marketing, insights skills)

## Quick Start

**Essential Documents:**
- 📚 **[Feature Implementation Guide](../../../ai/architect/feature_implementation_guide.md)** - **PRIMARY GUIDE** for implementation
- 📖 **[Documentation Master Index](../../../ai/README.md)** - Central hub for all documentation
- 🧪 **[Test Manifest](../../../ai/testing/TEST_MANIFEST.md)** - Complete test catalog (197 tests)
- 📝 **[Documentation
Guidelines](../../../ai/standards/DOCUMENTATION_STANDARD.md)** - Before creating any docs

**Key References:**
- [references/module_reference.md](references/module_reference.md) - 34 modules structure
- [references/4_case_logic.md](references/4_case_logic.md) - Critical execution engine
- [references/external_data_integration.md](references/external_data_integration.md) - Column naming rules

## Core Architecture Overview

### Module Structure (v2.1)

**34 modules in 6 packages** following Single Responsibility Principle:

```
src/
├── utils/           # Utilities (7 modules) - 88 tests, 92% coverage
├── core/            # Core infrastructure (5 modules)
├── preprocessing/   # Data preparation (5 modules)
├── features/        # Feature management (4 modules)
├── execution/       # Feature computation (5 modules) - Includes 4-case logic
└── export/          # Output generation (2 modules)
```

**For complete module details:** See [references/module_reference.md](references/module_reference.md)

### Data Flow Pipeline

```
CSV → DataLoader → SchemaProcessor → SyntheticEnricher → FeatureStore → DependencyResolver → ModelExecutor → ExcelExporter
```

**For detailed flow stages:** See [references/data_flow_pipeline.md](references/data_flow_pipeline.md)

### Critical: 4-Case Logic

The **GroupByProcessor** (`src/execution/groupby.py`) implements single-loop execution with 4 cases:

- **Case 1**: Standard filter (reads data_in only)
- **Case 2**: Filter using attributes (reads data_in + agg_results) **KEY INNOVATION**
- **Case 3**: Attribute with aggregation
- **Case 4**: Attribute composition (uses only other attributes)

**Case 2 Example:**
```python
def price_above_avg(price_total: float, prod_price_avg: float) -> bool:
    """Filter that uses an attribute as input"""
    return price_total > prod_price_avg
```

**For deep dive:** See [references/4_case_logic.md](references/4_case_logic.md)

## Core Workflows

### Workflow 1: Implementing a New Model

When creating daily, weekly, monthly,
customer, or product aggregation models:

1. **Read primary guide** - [Feature Implementation Guide](../../../ai/architect/feature_implementation_guide.md)
2. **Define features** - Create filter and attribute functions with type hints
3. **Create features dictionary** - Register all features
4. **Configure model** - Set group_by, external_data, output_cols
5. **Verify naming** - Check external column prefixes (join keys NOT prefixed, others ARE)
6. **Test execution** - Verify output shapes and values
7. **Create tests** - Add repeatable tests in `/test` folder

**Detailed guide:** [assets/examples/implementing_new_model.md](assets/examples/implementing_new_model.md)

**Working examples:**
- [02_1_week.ipynb](../../../02_1_week.ipynb) - Weekly model with external data
- [01_1_1_day.ipynb](../../../01_1_1_day.ipynb) - Daily aggregation
- [03_consolidated_all_models.ipynb](../../../notebooks/from_store/03_consolidated_all_models.ipynb) - 9-model pipeline

### Workflow 2: Adding a New Feature

When adding filters (row-level) or attributes (aggregated):

1. **Define function** - Include type hints and docstring
2. **Determine type** - Filter (vectorized) or attribute (aggregated)?
3. **Register in dictionary** - Add to features dict
4. **Check dependencies** - Ensure resolvable via DFS
5. **Verify external data** - If using, check column naming
6. **Update model config** - Add to output_cols
7. **Create tests** - Add to `/test` folder with sample data

**Detailed guide:** [assets/examples/adding_new_feature.md](assets/examples/adding_new_feature.md)

**For feature type details:** See [references/feature_types.md](references/feature_types.md)

### Workflow 3: Configuring External Data

When joining external datasets (daily → weekly, customer → product):

1. **Verify dataset exists** - Check `ctx.list_datasets()`
2. **Configure in model** - Add external_data section with source, join_on, columns
3.
   **Remember naming rules:**
   - **Join keys:** NOT prefixed (e.g., `dt_date` stays `dt_date`)
   - **Regular columns:** ARE prefixed (e.g., `price_total_sum` → `daily_attrs_price_total_sum`)
4. **Write feature functions** - Use correct prefixed names
5. **Test join** - Verify merged data has expected columns

**Critical naming table:**

| Column Type | Original | After Merge | Prefixed? |
|-------------|----------|-------------|-----------|
| Join key | `dt_date` | `dt_date` | ❌ NO |
| Regular column | `price_total_sum` | `daily_attrs_price_total_sum` | ✅ YES |

**Detailed guide:** [assets/examples/configuring_external_data.md](assets/examples/configuring_external_data.md)

**For complete naming rules:** See [references/external_data_integration.md](references/external_data_integration.md)

### Workflow 4: Debugging Execution Issues

When encountering errors during model execution:

1. **Check error message** - "Argument 'X' not found" is the most common
2. **Verify column naming** - Join keys vs regular column prefixes
3. **Validate external data** - Check dataset exists and join_on matches
4. **Print available columns** - Use `ctx.get_dataset('name').columns.tolist()`
5. **Test incrementally** - Add features one at a time
6. **Check dependencies** - Ensure DFS can resolve order

**Common error: "Argument not found"**

**Causes:**
1. External column has the wrong prefix (join key vs regular)
2. Missing external data config
3. Typo in column name

**Solution:**
```python
# 1. Check input dataset
print(ctx.get_dataset('transactions_filters').columns.tolist())

# 2. Check external dataset
print(ctx.get_dataset('daily_attrs').columns.tolist())

# 3. Remember: Join keys NO prefix, others WITH prefix
```

**For complete troubleshooting:** See [references/troubleshooting.md](references/troubleshooting.md)

### Workflow 5: Frontend Development (React/Vite)

**CRITICAL: Always clean dev environment BEFORE starting new features**

When working on frontend features (GabeDA Dashboard - React + TypeScript + Vite):

**Step 0: Clean Dev Environment (MANDATORY)**

```bash
# Kill all node processes to avoid port conflicts and HMR corruption
# (double slashes stop Git Bash from rewriting the flags as paths)
cd C:/Projects/play/gabeda_frontend
taskkill //F //IM node.exe

# Clear Vite cache
rm -rf node_modules/.vite

# Start fresh dev server
npm run dev
```

**Why This Matters:**
- **Problem**: Multiple Vite HMR instances can run simultaneously on different ports (5173, 5174, 5175...)
- **Symptom**: Blank pages, "module does not provide export" errors, stuck/corrupted state
- **Root Cause**: Old dev servers hold a corrupted module cache, and new changes start on a different port
- **Solution**: Kill ALL node processes before starting work

**Quick Fix Script:**
Use `restart-dev.bat` in frontend folder:
```batch
@echo off
rem cmd.exe expects single-slash flags
taskkill /F /IM node.exe 2>nul
if exist node_modules\.vite rmdir /s /q node_modules\.vite
npm run dev
```

**Development Workflow:**
1. **CLEAN** - Run `restart-dev.bat` or kill node processes manually
2. **BRANCH** - Create feature branch (`git checkout -b feature/feature-name`)
3. **IMPLEMENT** - Make code changes
4. **BUILD** - Run `npm run build` to check for TypeScript errors
5. **TEST** - Test locally on http://localhost:5173 (verify correct port!)
6. **E2E** - Use Playwright skill for automated testing
7. **COMMIT** - Only after local verification passes
8.
   **DEPLOY** - Merge to main → auto-deploy to Render

**Common Issues:**
- **Blank pages on all routes** → Multiple Vite instances running, kill all node processes
- **"Module does not provide export" errors** → HMR cache corruption, clear `.vite` cache
- **Wrong port (5174, 5175 instead of 5173)** → Old servers still running, kill and restart
- **Changes not appearing** → Browser accessing old port, hard refresh (Ctrl+Shift+R) or use incognito

**Port Detection:**
```bash
# Check which port Vite started on (look for "Local: http://localhost:XXXX")
npm run dev

# If not 5173, there are stuck processes - kill and restart
```

**Best Practices:**
- **Always** kill node processes before starting new feature work
- **Always** verify you're accessing the correct port (check terminal output)
- **Always** use incognito/private browsing for testing to avoid browser cache issues
- **Always** build (`npm run build`) before committing to catch TypeScript errors
- **Never** commit without local testing on the correct port

## Core Principles (DO NOT BREAK)

- ✅ **Single Responsibility** - Each module does ONE thing
- ✅ **Single Input** - Each model gets exactly 1 dataframe
- ✅ **DFS Resolution** - Features auto-ordered by dependencies
- ✅ **4-Case Logic** - Filters can use attributes as inputs
- ✅ **Immutable Context** - User config never changes during execution
- ✅ **Save Checkpoints** - Save after every major transformation
- ✅ **Type Annotations** - All functions have type hints
- ✅ **Logging** - Every module uses get_logger(__name__)
- ✅ **Testing** - All tests MUST be in `/test` folder and be repeatable

**For detailed principles:** See [references/core_principles.md](references/core_principles.md)

## Testing Requirements

**Current Statistics:**
- **Total Tests:** 197 tests (6 integration, 108 unit, 69 validation, 14 notebook)
- **Code Coverage:** 85% (target: ≥85%)
- **Test Manifest:** [ai/testing/TEST_MANIFEST.md](../../../ai/testing/TEST_MANIFEST.md) **⭐ Living
Document**

**Test Rules:**
1. **Location:** All tests MUST be in `/test` folder
2. **Repeatability:** Tests MUST be idempotent (run multiple times, same result)
3. **Cleanup:** Tests MUST delete temp files/folders
4. **Independence:** No external state dependencies
5. **Naming:** Use `test_{module_name}.py` or `test_{feature_name}.py`
6. **Documentation:** ALWAYS append to [Test Manifest](../../../ai/testing/TEST_MANIFEST.md)

**Running Tests:**
```bash
pytest test/                 # All tests
pytest test/unit/            # Unit tests only
pytest test/integration/     # Integration tests only
pytest test/ -v              # With verbose output
```

**For complete testing guidelines:** See [references/testing_guidelines.md](references/testing_guidelines.md)

## Configuration Patterns

**Base Config:**
```python
base_cfg = {
    'input_file': 'path/to/data.csv',
    'client': 'project_name',
    'analysis_dt': 'YYYY-MM-DD',
    'data_schema': {
        'in_dt': {'source_column': 'Fecha venta', 'dtype': 'date'},
        'in_product_id': {'source_column': 'SKU', 'dtype': 'str'},
        'in_price_total': {'source_column': 'Total', 'dtype': 'float'}
    }
}
```

**Model Config (With External Data):**
```python
cfg_model = {
    'model_name': 'weekly',
    'group_by': ['dt_year', 'dt_weekofyear'],
    'row_id': 'in_trans_id',
    'output_cols': list(features.keys()),
    'features': features,
    'external_data': {
        'daily_attrs': {
            'source': 'daily_attrs',
            'join_on': ['dt_date'],
            'columns': None  # None = ALL, or ['col1', 'col2']
        }
    }
}
```

**For complete patterns:** See [references/configuration_patterns.md](references/configuration_patterns.md)

## Additional Resources

### Reference Documentation

- [module_reference.md](references/module_reference.md) - 34 modules structure with coverage stats
- [data_flow_pipeline.md](references/data_flow_pipeline.md) - 7-stage pipeline flow
- [4_case_logic.md](references/4_case_logic.md) - Critical execution engine **⭐ KEY INNOVATION**
- [feature_types.md](references/feature_types.md) - Filters vs attributes
-
  [dependency_resolution.md](references/dependency_resolution.md) - DFS traversal
- [configuration_patterns.md](references/configuration_patterns.md) - Config templates
- [external_data_integration.md](references/external_data_integration.md) - Column naming rules
- [synthetic_enrichment.md](references/synthetic_enrichment.md) - Auto-infer 17 columns
- [testing_guidelines.md](references/testing_guidelines.md) - Test requirements (197 tests)
- [troubleshooting.md](references/troubleshooting.md) - Common error patterns
- [core_principles.md](references/core_principles.md) - 9 DO NOT BREAK rules

### Implementation Examples

- [implementing_new_model.md](assets/examples/implementing_new_model.md) - Step-by-step model creation
- [adding_new_feature.md](assets/examples/adding_new_feature.md) - Filter and attribute addition
- [configuring_external_data.md](assets/examples/configuring_external_data.md) - External joins
- [adding_aggregation_level.md](assets/examples/adding_aggregation_level.md) - New aggregation levels

### External Documentation

- **[Feature Implementation Guide](../../../ai/architect/feature_implementation_guide.md)** - **PRIMARY REFERENCE**
- [Documentation Master Index](../../../ai/README.md) - All guides
- [Module Reference](../../../ai/specs/src/README.md) - Technical module docs
- [Model Specifications](../../../ai/specs/model/) - Tech specs, aggregation architecture

## Integration with Other Skills

### From Business Skill
- **Receive:** User stories, acceptance criteria, priority rankings, business requirements
- **Provide:** Technical feasibility assessment, effort estimates, architecture proposals
- **Example:** Business defines "VIP customer retention" → Architect implements RFM model

### From Executive Skill
- **Receive:** Feature requirements, quality standards, timeline constraints
- **Provide:** Implementation plans, trade-off analysis, technical specs
- **Example:** Executive prioritizes Chilean launch → Architect implements CLP currency
support

### To Insights Skill
- **Provide:** Available features, data schema, execution capabilities
- **Receive:** Notebook requirements, visualization needs, metric definitions
- **Example:** Architect adds RFM model → Insights creates VIP retention notebook

### To Marketing Skill
- **Provide:** Technical capabilities, feature descriptions, performance metrics
- **Receive:** Feature positioning requirements, technical content needs
- **Example:** Architect implements 4-case logic → Marketing positions as "KEY INNOVATION"

## Living Documents (Append Only)

**When making changes, ALWAYS append to these 9 living documents:**

| Document | When to Use |
|----------|-------------|
| [CHANGELOG.md](../../../ai/CHANGELOG.md) | After modifying any `.py` file |
| [ISSUES.md](../../../ai/ISSUES.md) | After fixing bugs or errors |
| [PROJECT_STATUS.md](../../../ai/PROJECT_STATUS.md) | Weekly updates |
| [FEATURE_IMPLEMENTATIONS.md](../../../ai/FEATURE_IMPLEMENTATIONS.md) | After implementing features |
| [TESTING_RESULTS.md](../../../ai/TESTING_RESULTS.md) | After running tests |
| [TEST_MANIFEST.md](../../../ai/testing/TEST_MANIFEST.md) | **When adding/modifying tests** ⭐ |
| [ARCHITECTURE_DECISIONS.md](../../../ai/architect/ARCHITECTURE_DECISIONS.md) | When making architectural choices |
| [NOTEBOOK_IMPROVEMENTS.md](../../../ai/guides/NOTEBOOK_IMPROVEMENTS.md) | When improving notebooks |
| [FUTURE_ENHANCEMENTS.md](../../../ai/planning/FUTURE_ENHANCEMENTS.md) | When proposing enhancements |

**Documentation Workflow:**
1. Check if change fits into one of these 9 living documents
2. If YES → **APPEND** to that document (do NOT create new file)
3. If NO → Check [Documentation Guidelines](../../../ai/standards/DOCUMENTATION_STANDARD.md)
4.
   **NEVER create documentation files without checking guidelines first**

## Working Directory

**Architect Workspace:** `.claude/skills/architect/`

**Bundled Resources:**
- `references/` - 11 technical reference documents (module structure, 4-case logic, external data, testing, troubleshooting, core principles)
- `assets/examples/` - 4 implementation guides (new model, new feature, external data, aggregation level)

**Technical Documents (Create Here):**
- `/ai/architect/` - Architecture proposals, spike results, design documents
- Use descriptive names: `integration_analysis.md`, `feature_implementation_guide.md`

**Context Folders (Reference as Needed):**
- `/ai/backend/` - Django backend context
- `/ai/frontend/` - React frontend context
- `/ai/specs/` - Technical specifications (context, edge cases, feature store, model specs)

## When Suggesting Changes

Always explain:
- **Why** - Maintains architectural integrity
- **Which modules** - Affected components
- **How** - Fits into data flow
- **Where** - Data persistence location
- **What testing** - Required in `/test` folder
- **How repeatable** - Test idempotency strategy

**For every change:**
1. Identify implementation files
2. Create corresponding test in `/test` folder
3. Ensure tests are repeatable and self-contained
4. Use sample data from `data/tests/` when needed
5. Document test execution in code comments
6. **Append to Test Manifest** when adding tests

**Think like an architect:** Prioritize maintainability, testability, and adherence to established patterns.
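
**Sketch of a repeatable test:** The "For every change" checklist asks for tests that are repeatable, self-contained, and clean up after themselves. The following is a minimal illustration of that shape using only the Python standard library. It is not taken from the GabeDA codebase: `daily_totals`, the sample rows, and the file name `test/test_daily_totals.py` are hypothetical stand-ins for a real model and its test.

```python
# test/test_daily_totals.py  (hypothetical file name)
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def daily_totals(csv_path: Path) -> dict[str, float]:
    """Hypothetical stand-in for code under test: sum price_total per dt_date."""
    totals: dict[str, float] = defaultdict(float)
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            totals[row["dt_date"]] += float(row["price_total"])
    return dict(totals)


def test_daily_totals_is_repeatable() -> None:
    rows = [
        {"dt_date": "2025-01-01", "price_total": "10.0"},
        {"dt_date": "2025-01-01", "price_total": "5.5"},
        {"dt_date": "2025-01-02", "price_total": "7.0"},
    ]
    # TemporaryDirectory deletes the folder and its files on exit,
    # so nothing is left behind between runs.
    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = Path(tmp_dir) / "sample.csv"
        with csv_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=["dt_date", "price_total"])
            writer.writeheader()
            writer.writerows(rows)

        first = daily_totals(csv_path)
        second = daily_totals(csv_path)

    assert first == {"2025-01-01": 15.5, "2025-01-02": 7.0}
    assert first == second  # same input, same result
    assert not Path(tmp_dir).exists()  # temp files were cleaned up
```

Because the test builds its own input and removes it again, it can be run any number of times with `pytest test/` and does not depend on state left by other tests.
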
## Version History

**v2.1.0** (2025-10-30)
- Refactored to use progressive disclosure pattern
- Extracted detailed content to `references/` (11 files) and `assets/examples/` (4 files)
- Converted to imperative form (removed second-person voice)
- Reduced from 576 lines to ~295 lines
- Enhanced with v2.1 utils package details (7 utility modules)
- Added clear workflow sections with examples

**v2.0.0** (2025-10-28)
- Updated for v2.1 architecture (34 modules, 6 packages)
- Added comprehensive testing guidelines
- Enhanced external data integration documentation

---

**Last Updated:** 2025-10-30
**Architecture Version:** v2.1 (34 modules in 6 packages)
**Test Coverage:** 197 tests, 85% coverage
**Core Innovation:** 4-case logic engine (filters can use attributes as inputs)