---
name: profile-engine
description: Profile the Python engine to identify performance bottlenecks in NumPy/Numba code
disable-model-invocation: true
---

Profile the OSMOSE Python engine to identify performance bottlenecks and Numba JIT compilation issues.

## Arguments

- `config` (optional): "bob" (Bay of Biscay) or "eec" (EEC) (default: "bob")
- `years` (optional): simulation years (default: 5)
- `focus` (optional): specific module to deep-profile (e.g., "predation", "movement")

## Steps

1. **Run cProfile** to get a high-level function-time breakdown:
   ```
   .venv/bin/python -m cProfile -s cumulative -m osmose.engine.cli --config {config} --years {years} 2>&1 | head -40
   ```
   If there is no CLI entry point, use this inline approach instead:
   ```
   .venv/bin/python -c "
   import cProfile
   from scripts.benchmark_engine import run_benchmark
   cProfile.run('run_benchmark()', sort='cumulative')
   " 2>&1 | head -40
   ```

2. **Identify top 5 hotspots**: From the cProfile output, list the five functions with the highest cumulative time. Focus on `osmose/engine/` functions, not NumPy/Numba internals.

3. **Check Numba JIT status** for hotspot functions:
   ```
   .venv/bin/python -c "
   import numba
   numba.config.DEVELOPER_MODE = True
   # Importing compiles only functions declared with explicit signatures;
   # other @njit functions compile lazily on their first call.
   from osmose.engine.processes import predation
   print('predation module imported without Numba errors')
   "
   ```
   A clean import alone does not prove that lazily compiled functions compile; the profiling run in step 1 exercises them. Look for:
   - Functions falling back to object mode (this removes most of the speedup)
   - Type inference failures
   - Unsupported Python features inside `@njit`

4. **Check for non-vectorized loops**: Search hotspot files for Python loops over arrays that should be vectorized:
   ```
   grep -n "for.*in range" osmose/engine/processes/{focus}.py
   ```
   Flag any loop in plain Python that iterates over array elements and could use NumPy broadcasting. Loops inside `@njit` functions are compiled and do not need to be flagged.

5. **Check memory allocation patterns**: Look for array allocations inside hot loops:
   ```
   grep -n "np.zeros\|np.empty\|np.array" osmose/engine/processes/{focus}.py
   ```
   Arrays should be pre-allocated outside loops where possible.

6. **Compare against baseline timing**:
   - Bay of Biscay 5yr baseline: ~2.0s
   - EEC 5yr baseline: ~5.2s

   If the current warm timing exceeds the baseline by more than 10%, flag it as a regression.

7. **Report findings** as a table:

   | Rank | Function | Time (s) | % Total | Issue |
   |------|----------|----------|---------|-------|
   | 1 | predation._compute_kernel | X.XX | XX% | None |
   | 2 | movement._distribute | X.XX | XX% | Non-vectorized loop |

8. **Suggest optimizations** for any identified bottlenecks:
   - Python loop → NumPy vectorization
   - NumPy operation → Numba `@njit`
   - Repeated allocation → pre-allocated buffer
   - Object-mode Numba → fix type annotations

## Rules

- Always use `.venv/bin/python`, never the system Python
- Run from `/home/razinka/osmose/osmose-python/`
- The first JIT compilation is slow, so run twice and report the second (warm) timing
- Do NOT modify engine code during profiling; this is a read-only analysis
- Always verify parity after any suggested optimization is applied
- Performance baselines: Python BoB 5yr ~2.0s, EEC 5yr ~5.2s; Java BoB ~2.3s, EEC ~7.2s