---
name: ml-reviewer
description: |
  WHEN: Machine Learning/Deep Learning code review, PyTorch/TensorFlow patterns, Model training optimization, MLOps checks
  WHAT: Model architecture review + Training patterns + Data pipeline checks + GPU optimization + Experiment tracking
  WHEN NOT: Data analysis only → python-data-reviewer, General Python → python-reviewer
---

# ML Reviewer Skill

## Purpose
Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.

## When to Use
- ML/DL project code review
- "PyTorch", "TensorFlow", "Keras", "scikit-learn", "model training" mentions
- Model performance, training optimization inspection
- Projects with ML framework dependencies

## Project Detection
- `torch`, `tensorflow`, `keras`, `sklearn` in requirements.txt/pyproject.toml
- `.pt`, `.pth`, `.h5`, `.pkl` model files
- `train.py`, `model.py`, `dataset.py` files
- Jupyter notebooks with ML imports

## Workflow

### Step 1: Analyze Project
```
**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production
```

### Step 2: Select Review Areas
**AskUserQuestion:**
```
"Which areas to review?"
Options:
- Full ML pattern check (recommended)
- Model architecture review
- Training loop optimization
- Data pipeline efficiency
- MLOps/deployment patterns
multiSelect: true
```

## Detection Rules

### PyTorch Patterns
| Check | Recommendation | Severity |
|-------|----------------|----------|
| Missing model.eval() | Inconsistent inference | HIGH |
| Missing torch.no_grad() | Memory leak in inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |

```python
# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation breaking autograd
x = torch.randn(10, requires_grad=True)
x += 1  # In-place! Breaks gradient computation

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers=0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # For GPU
    persistent_workers=True,
)

# BAD: No gradient clipping
optimizer.step()

# GOOD: Clip gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

### TensorFlow/Keras Patterns
| Check | Recommendation | Severity |
|-------|----------------|----------|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |

```python
# BAD: No @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Use @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
```

### scikit-learn Patterns
| Check | Recommendation | Severity |
|-------|----------------|----------|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Model performance | MEDIUM |
| Hardcoded random_state | Reproducibility | LOW |

```python
# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # No re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: Pipeline without scaling
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use Pipeline with scaling
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])
pipeline.fit(X_train, y_train)
```

### Data Pipeline
| Check | Problem | Solution |
|-------|---------|----------|
| Loading full dataset to memory | OOM | Use generators/tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Unbalanced classes | Biased model | Oversample/undersample/weights |
| No validation split | No early stopping | Use validation set |

```python
# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i+batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y), y=y)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
```

### GPU/Performance
| Check | Recommendation | Severity |
|-------|----------------|----------|
| CPU tensor operations | Use GPU tensors | HIGH |
| Frequent GPU-CPU transfer | Batch transfers | HIGH |
| No gradient accumulation | OOM for large batch | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |

```python
# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation

# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Sync every iteration!

# GOOD: Batch logging
losses = []
for x, y in dataloader:
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss)
if step % log_interval == 0:
    print(torch.stack(losses).mean().item())

# Gradient accumulation for large effective batch
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

### MLOps/Experiment Tracking
| Check | Recommendation | Severity |
|-------|----------------|----------|
| No experiment tracking | Reproducibility | HIGH |
| Hardcoded hyperparameters | Config management | MEDIUM |
| No model versioning | Deployment issues | MEDIUM |
| Missing seed setting | Non-reproducible | HIGH |

```python
# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)

# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use config file or hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)

# GOOD: Use experiment tracking
import wandb
wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
```

## Response Template
```
## ML Code Review Results

**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X

### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |

### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |

### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |

### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |

### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
```

## Best Practices
1. **Training**: eval mode, no_grad, gradient clipping, mixed precision
2. **Data**: No leakage, proper splits, augmentation, balanced classes
3. **Performance**: GPU operations, batch transfers, gradient accumulation
4. **MLOps**: Seed setting, experiment tracking, config management
5. **Testing**: Unit tests for data pipeline, model output shape tests

## Integration
- `python-reviewer` skill: General Python code quality
- `python-data-reviewer` skill: Data preprocessing patterns
- `test-generator` skill: ML test generation
- `docker-reviewer` skill: ML containerization

## Notes
- Based on PyTorch 2.x, TensorFlow 2.x, scikit-learn 1.x
- Supports distributed training patterns (DDP, FSDP)
- Includes MLOps patterns (wandb, mlflow, hydra)