---
id: "84c743cf-b8db-4ac7-89ea-bb3ca80e0700"
name: "RL Training Monitoring and Visualization Implementation"
description: "Implement comprehensive logging, checkpointing, and visualization for a Reinforcement Learning training loop, tracking rewards, losses, actions, states, entropy, and performance metrics using CSV and log files."
version: "0.1.0"
tags:
  - "reinforcement learning"
  - "logging"
  - "visualization"
  - "checkpointing"
  - "monitoring"
triggers:
  - "implement rl logging and visualization"
  - "store rl training data in csv"
  - "plot learning curves and rewards"
  - "save rl model checkpoints"
  - "pause and resume reinforcement learning training"
---

# RL Training Monitoring and Visualization Implementation

Implement comprehensive logging, checkpointing, and visualization for a Reinforcement Learning training loop, tracking rewards, losses, actions, states, entropy, and performance metrics using CSV and log files.

## Prompt

# Role & Objective

You are an ML Engineer specializing in Reinforcement Learning. Your task is to implement a comprehensive monitoring, logging, and visualization system for an RL training loop based on specific user requirements.

# Operational Rules & Constraints

1. **Data Storage Requirements**: You must implement code to store the following data:
   - **Rewards**: Log immediate rewards and cumulative rewards over episodes.
   - **Losses**: Store losses for both actor and critic networks separately.
   - **Actions and Probabilities**: Record actions taken by the policy and their associated probabilities/confidence.
   - **State and Observation Logs**: Store states (and observations) for debugging purposes.
   - **Episode Lengths**: Track the length of each episode (number of steps).
   - **Policy Entropy**: Record the entropy of the policy to monitor exploration.
   - **Value Function Estimates**: Log the critic's value function estimates.
   - **Model Parameters and Checkpoints**: Regularly save model parameters (weights) and optimizer states.

2. **File Format Requirements**:
   - Use CSV files to store structured metrics (e.g., rewards, losses, performance metrics) for easy interoperability.
   - Use text log files for general event logging.
   - Ensure the data format is suitable for visualization in other environments or tools (e.g., Pandas, Matplotlib).

3. **Visualization Requirements**: You must provide code to visualize the following:
   - **Reward Trends**: Plot immediate and cumulative rewards over time.
   - **Learning Curves**: Display loss curves for actor and critic networks.
   - **Action Distribution**: Visualize the distribution of actions taken.
   - **Value Function Visualization**: Plot estimated value function over time.
   - **Policy Entropy**: Graph policy entropy over time.
   - **Graph Embeddings**: Utilize dimensionality reduction (e.g., PCA, t-SNE) to visualize GNN embeddings.
   - **Attention Weights**: Visualize attention weights if using GATs.
   - **Performance Metrics**: Track and visualize specific performance metrics (e.g., Area, Power, Gain) optimized during the process.

4. **Resumption Logic**: Implement functionality to pause and resume training:
   - Save the current episode index, model state dictionaries, and optimizer states to a checkpoint file.
   - Implement logic to load these checkpoints to resume training from the last saved episode.

# Anti-Patterns

- Do not omit any of the 8 specific data storage requirements listed above.
- Do not use obscure file formats; stick to CSV and text logs unless specified otherwise.
- Do not generate visualization code without first ensuring the data is being logged correctly.

## Triggers

- implement rl logging and visualization
- store rl training data in csv
- plot learning curves and rewards
- save rl model checkpoints
- pause and resume reinforcement learning training
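
## Example Sketch

As a rough illustration of the CSV storage, text logging, and resumption rules in the prompt above, the following standard-library sketch shows one possible shape for the logging and checkpoint helpers. Everything named here is an assumption made for illustration: the `STEP_FIELDS` column layout, the file names `steps.csv`, `train.log`, and `checkpoint.pkl`, the helper functions, and the toy `train` loop whose constant values stand in for a real environment and agent. Plain dictionaries stand in for model and optimizer state; in a PyTorch project one would more likely store `state_dict()` outputs with `torch.save`. Actor and critic losses and episode lengths are omitted for brevity and could go into a second, per-episode CSV written the same way.

```python
import csv
import logging
import os
import pickle
import tempfile

# Hypothetical column layout for the per-step metrics CSV.
STEP_FIELDS = ["episode", "step", "action", "action_prob", "reward",
               "cumulative_reward", "value_estimate", "entropy"]


def append_rows(csv_path, rows, fieldnames=STEP_FIELDS):
    """Append dict rows to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def save_checkpoint(path, episode, model_state, optimizer_state):
    """Write to a temporary file, then rename it over the old checkpoint."""
    payload = {"episode": episode, "model_state": model_state,
               "optimizer_state": optimizer_state}
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(payload, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved payload, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(run_dir, num_episodes, steps_per_episode=3, checkpoint_every=2):
    """Toy loop with placeholder values; returns the episode it started from."""
    os.makedirs(run_dir, exist_ok=True)
    csv_path = os.path.join(run_dir, "steps.csv")
    ckpt_path = os.path.join(run_dir, "checkpoint.pkl")

    # General events go to a plain-text log file.
    logger = logging.getLogger("rl_train")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(os.path.join(run_dir, "train.log"))
    logger.addHandler(handler)
    try:
        # Resume from the episode after the last checkpointed one, if any.
        ckpt = load_checkpoint(ckpt_path)
        start = ckpt["episode"] + 1 if ckpt else 0
        model_state = ckpt["model_state"] if ckpt else {"w": 0.0}
        optimizer_state = ckpt["optimizer_state"] if ckpt else {"updates": 0}
        logger.info("starting at episode %d", start)

        for episode in range(start, num_episodes):
            rows, cumulative = [], 0.0
            for step in range(steps_per_episode):
                reward = 1.0  # placeholder for the environment's reward
                cumulative += reward
                rows.append({"episode": episode, "step": step, "action": 0,
                             "action_prob": 0.5, "reward": reward,
                             "cumulative_reward": cumulative,
                             "value_estimate": 0.0, "entropy": 0.693})
            append_rows(csv_path, rows)
            optimizer_state["updates"] += 1  # stands in for a real update
            if (episode + 1) % checkpoint_every == 0:
                save_checkpoint(ckpt_path, episode, model_state, optimizer_state)
                logger.info("checkpoint saved at episode %d", episode)
        return start
    finally:
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        train(run_dir, num_episodes=4)         # fresh run, episodes 0..3
        print(train(run_dir, num_episodes=6))  # resumes; prints 4
```

One design caveat in this sketch: rows are appended every episode but checkpoints are written only every `checkpoint_every` episodes, so a run interrupted between checkpoints would replay, and log again, the episodes after the last checkpoint. Checkpointing every episode or de-duplicating on `episode` when loading the CSV (for example with Pandas) are two possible ways to handle that.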