---
name: performance-profiling
description: >
  Identify computational bottlenecks, analyze parallel scaling, estimate memory
  requirements, and generate optimization recommendations for materials
  simulations — parse timing logs to find dominant phases (solver, assembly,
  I/O), evaluate strong and weak scaling efficiency, profile memory from mesh
  and field parameters, and detect bottlenecks with actionable fix suggestions.
  Use when a simulation is running slower than expected, investigating MPI
  scaling efficiency, planning HPC resource allocation, deciding whether to
  tune the preconditioner or reduce I/O frequency, or estimating if a problem
  fits in available RAM, even if the user only says "my simulation is too
  slow" or "how many nodes do I need."
allowed-tools: Read, Write, Grep, Glob
metadata:
  author: HeshamFS
  version: "1.1.0"
  security_tier: medium
  security_reviewed: true
  tested_with:
    - claude-code
    - gemini-cli
    - vs-code-copilot
  eval_cases: 2
  last_reviewed: "2026-03-26"
---

# Performance Profiling

## Goal

Provide tools to analyze simulation performance, identify bottlenecks, and recommend optimization strategies for computational materials science simulations.

## Requirements

- Python 3.8+
- No external dependencies (uses Python standard library only)
- Works on Linux, macOS, and Windows

## Inputs to Gather

Before running profiling scripts, collect from the user:

| Input | Description | Example |
|-------|-------------|---------|
| Simulation log | Log file with timing information | `simulation.log` |
| Scaling data | JSON with multi-run performance data | `scaling_data.json` |
| Simulation parameters | JSON with mesh, fields, solver config | `params.json` |
| Available memory | System memory in GB (optional) | `16.0` |

## Decision Guidance

### When to Use Each Script

```
Need to identify slow phases?
├── YES → Use timing_analyzer.py
│         └── Parse simulation logs for timing data
│
Need to understand parallel performance?
├── YES → Use scaling_analyzer.py
│         └── Analyze strong or weak scaling efficiency
│
Need to estimate memory requirements?
├── YES → Use memory_profiler.py
│         └── Estimate memory from problem parameters
│
Need optimization recommendations?
└── YES → Use bottleneck_detector.py
          └── Combine analyses and get actionable advice
```

### Choosing Analysis Thresholds

| Metric | Good | Acceptable | Poor |
|--------|------|------------|------|
| Phase dominance | <30% | 30-50% | >50% |
| Parallel efficiency | >0.80 | 0.70-0.80 | <0.70 |
| Memory usage | <60% | 60-80% | >80% |

## Script Outputs (JSON Fields)

| Script | Key Outputs |
|--------|-------------|
| `timing_analyzer.py` | `timing_data.phases`, `timing_data.slowest_phase`, `timing_data.total_time` |
| `scaling_analyzer.py` | `scaling_analysis.results`, `scaling_analysis.efficiency_threshold_processors` |
| `memory_profiler.py` | `memory_profile.total_memory_gb`, `memory_profile.per_process_gb`, `memory_profile.warnings` |
| `bottleneck_detector.py` | `bottlenecks`, `recommendations` |

## Workflow

### Complete Profiling Workflow

1. **Analyze timing** from simulation logs
2. **Analyze scaling** from multi-run data (if available)
3. **Profile memory** from simulation parameters
4. **Detect bottlenecks** and get recommendations
5. **Implement optimizations** based on recommendations
6. **Re-profile** to verify improvements

### Quick Profiling (Timing Only)

1. **Run timing analyzer** on simulation log
2. **Identify dominant phases** (>50% of runtime)
3. **Apply targeted optimizations** to dominant phases

## CLI Examples

### Timing Analysis

```bash
# Basic timing analysis
python3 scripts/timing_analyzer.py \
    --log simulation.log \
    --json

# Custom timing pattern
python3 scripts/timing_analyzer.py \
    --log simulation.log \
    --pattern 'Step\s+(\w+)\s+took\s+([\d.]+)s' \
    --json
```

### Scaling Analysis

```bash
# Strong scaling (fixed problem size)
python3 scripts/scaling_analyzer.py \
    --data scaling_data.json \
    --type strong \
    --json

# Weak scaling (constant work per processor)
python3 scripts/scaling_analyzer.py \
    --data scaling_data.json \
    --type weak \
    --json
```

### Memory Profiling

```bash
# Estimate memory requirements
python3 scripts/memory_profiler.py \
    --params simulation_params.json \
    --available-gb 16.0 \
    --json
```

### Bottleneck Detection

```bash
# Detect bottlenecks from timing only
python3 scripts/bottleneck_detector.py \
    --timing timing_results.json \
    --json

# Comprehensive analysis with all inputs
python3 scripts/bottleneck_detector.py \
    --timing timing_results.json \
    --scaling scaling_results.json \
    --memory memory_results.json \
    --json
```

## Conversational Workflow Example

**User**: My simulation is taking too long. Can you help me identify what's slow?

**Agent workflow**:
1. Ask for simulation log file
2. Run timing analyzer:
   ```bash
   python3 scripts/timing_analyzer.py --log simulation.log --json
   ```
3. Interpret results:
   - If solver dominates (>50%): Recommend preconditioner tuning
   - If assembly dominates: Recommend caching or vectorization
   - If I/O dominates: Recommend reducing output frequency
4. If user has multi-run data, analyze scaling:
   ```bash
   python3 scripts/scaling_analyzer.py --data scaling.json --type strong --json
   ```
5. Generate comprehensive recommendations:
   ```bash
   python3 scripts/bottleneck_detector.py --timing timing.json --scaling scaling.json --json
   ```

## Interpretation Guidance

### Timing Analysis

| Scenario | Meaning | Action |
|----------|---------|--------|
| Solver >70% | Solver-dominated | Tune preconditioner, check tolerance |
| Assembly >50% | Assembly-dominated | Cache matrices, vectorize, parallelize |
| I/O >30% | I/O-dominated | Reduce frequency, use parallel I/O |
| Balanced (<30% each) | Well-balanced | Look for algorithmic improvements |

### Scaling Analysis

| Efficiency | Meaning | Action |
|------------|---------|--------|
| >0.80 | Excellent scaling | Continue scaling up |
| 0.70-0.80 | Good scaling | Monitor at larger scales |
| 0.50-0.70 | Poor scaling | Investigate communication/load balance |
| <0.50 | Very poor scaling | Reduce processor count or redesign |

### Memory Profile

| Usage | Meaning | Action |
|-------|---------|--------|
| <60% available | Safe | No action needed |
| 60-80% available | Moderate | Monitor, consider optimization |
| >80% available | High | Reduce resolution or increase processors |
| >100% available | Exceeds capacity | Must reduce problem size |

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| `Log file not found` | Invalid path | Verify log file path |
| `No timing data found` | Pattern mismatch | Provide custom pattern with --pattern |
| `At least 2 runs required` | Insufficient data | Provide more scaling runs |
| `Missing required parameters` | Incomplete params | Add mesh and fields to params file |

## Optimization Strategies by Bottleneck Type

### Solver Bottlenecks
- Use algebraic multigrid (AMG) preconditioner
- Tighten solver tolerance if over-solving
- Consider direct solver for small problems
- Profile matrix assembly vs solve time

### Assembly Bottlenecks
- Cache element matrices if geometry is static
- Use vectorized assembly routines
- Consider matrix-free methods
- Parallelize assembly with coloring

### I/O Bottlenecks
- Reduce output frequency
- Use parallel I/O (HDF5, MPI-IO)
- Write to fast scratch storage
- Compress output data

### Scaling Bottlenecks
- Investigate communication overhead
- Check for load imbalance
- Reduce synchronization points
- Use asynchronous communication
- Consider hybrid MPI+OpenMP

### Memory Bottlenecks
- Reduce mesh resolution
- Use iterative solver (lower memory than direct)
- Enable out-of-core computation
- Increase number of processors
- Use single precision where appropriate

## Security

### Input Validation
- User-supplied `--pattern` regex values are validated for length (500 chars max) and rejected if they contain constructs prone to catastrophic backtracking (ReDoS)
- Scaling data entries are validated for finite time values, integer processor counts, and bounded run count (10,000 max)
- `available_gb` is validated as a positive finite number; mesh dimensions and field parameters are validated as positive integers
- `--type` (scaling type) is validated against a fixed allowlist (`strong`, `weak`)
- All loaded JSON files must have an object (dict) as root element

### File Access
- `timing_analyzer.py` reads a single log file specified by `--log`; log files are capped at 500 MB and rejected before parsing
- `scaling_analyzer.py`, `memory_profiler.py`, and `bottleneck_detector.py` read JSON files capped at 100 MB
- Phase names extracted from log files are truncated to 200 characters and stripped of control characters to prevent prompt-injection payloads from propagating into agent context
- No scripts write to the filesystem; all output goes to stdout

### Tool Restrictions
- **Read**: Used to inspect script source, references, simulation logs, and result files
- **Write**: Used to save profiling reports or optimization recommendations; writes are scoped to the user's working directory
- **Grep/Glob**: Used to locate log files, result files, and search references
- The skill's `allowed-tools` excludes `Bash` to prevent the agent from executing arbitrary commands when processing untrusted simulation logs or result files

### Safety Measures
- No `eval()`, `exec()`, or dynamic code generation
- All subprocess calls use explicit argument lists (no `shell=True`)
- Reduced tool surface (no Bash) limits the agent to read/write operations only
- Phase names and diagnostic strings are sanitized before inclusion in output to prevent injection

## Limitations

- **Log parsing**: Depends on pattern matching; may miss unusual formats
- **Scaling analysis**: Requires at least 2 runs for meaningful results
- **Memory estimation**: Approximate; actual usage may vary
- **Recommendations**: General guidance; may need domain-specific tuning

## References

- `references/profiling_guide.md` - Profiling concepts and interpretation
- `references/optimization_strategies.md` - Detailed optimization approaches

## Version History

- **v1.0.0** (2025-01-22): Initial release with 4 profiling scripts