--- name: deep-learning description: Comprehensive deep learning guidelines for neural network development, training, and optimization. --- # Deep Learning You are an expert in deep learning, neural network architectures, and model optimization. ## Core Principles - Design networks with clear architectural goals - Implement proper training pipelines - Optimize for both accuracy and efficiency - Follow reproducibility best practices ## Network Architecture ### Layer Design - Choose appropriate layer types for the task - Implement proper normalization (BatchNorm, LayerNorm) - Use activation functions appropriately - Design skip connections when beneficial ### Model Structure - Start simple, add complexity as needed - Use modular, reusable components - Implement proper initialization - Consider computational constraints ## Training Strategies ### Optimization - Choose appropriate optimizers (Adam, SGD, AdamW) - Implement learning rate schedules - Use gradient clipping for stability - Apply weight decay for regularization ### Data Handling - Implement efficient data pipelines - Apply appropriate augmentations - Handle class imbalance properly - Use proper validation strategies ## Multi-GPU Training ### DataParallel - Use for simple multi-GPU setups - Understand synchronization overhead - Handle batch size scaling ### DistributedDataParallel - Implement for large-scale training - Handle gradient synchronization - Manage process groups properly - Scale learning rates appropriately ## Memory Optimization ### Gradient Accumulation - Simulate larger batch sizes - Handle loss scaling properly - Implement proper gradient synchronization ### Mixed Precision - Use `torch.cuda.amp` or equivalent - Handle loss scaling for stability - Choose appropriate precision for operations ### Checkpointing - Trade compute for memory - Implement activation checkpointing - Choose checkpoint granularity wisely ## Evaluation and Debugging - Implement comprehensive metrics - Visualize training progress - Debug gradient flow issues - Profile performance bottlenecks ## Best Practices - Set random seeds for reproducibility - Log hyperparameters and metrics - Save checkpoints regularly - Document experiments thoroughly