--- name: skill-rl-recursive-distillation title: "SkillRL: Evolving Agents via Recursive Skill-Augmented RL" version: 0.0.2 engine: skillxiv-v0.0.2-claude-opus-4.6 license: MIT url: "https://arxiv.org/abs/2602.08234" keywords: [Skill Distillation, Agent Learning, Experience Reuse, Hierarchical Skills, Policy Improvement] description: "Improve agent performance by autonomously distilling behavioral patterns from trajectories into reusable skills, then using these skills to guide future decisions. Achieves 89.9% success on ALFWorld through differential processing of success vs failure episodes and dynamic skill library evolution." --- # SkillRL: Recursive Skill-Augmented Reinforcement Learning Agent performance plateaus when policies must rediscover past insights repeatedly. SkillRL addresses this by automatically extracting behavioral patterns from interaction history into compact, reusable skills that guide future decision-making. Rather than storing raw trajectories, the system distills strategic patterns and failure lessons, creating a skill library that grows with the agent. ## Core Concept SkillRL processes trajectories differentially: - **Success episodes** → extract strategic patterns (10-20× compression vs raw trajectory) - **Failure episodes** → extract failure lessons capturing what went wrong Skills organize hierarchically: general skills (exploration, state management) and task-specific skills. During decision-making, the agent retrieves relevant skills via semantic similarity, reducing context overhead while maintaining reasoning quality. The skill library evolves recursively: after validation epochs, failure modes generate new skills or refine existing ones, creating a virtuous cycle where improved policies encounter new challenges. ## Architecture Overview - **Experience Processing**: Separate success trajectories (extract patterns) and failure trajectories (extract lessons) - **Skill Library (SkillBank)**: Two-tier organization—general skills (universal) and task-specific skills - **Semantic Retrieval**: Use embedding similarity to retrieve relevant skills for current state - **Dynamic Evolution**: Analyze failure modes to generate new skills or update existing ones - **Integration**: Skills prepend to context during policy rollouts ## Implementation Process trajectories into skills by extracting high-level patterns: ```python def extract_skills_from_trajectory(trajectory, success=True): """Extract skills from a trajectory (success or failure).""" if success: # For successful trajectories: extract strategic patterns # E.g., "When encountering X, use strategy Y" skill = { 'type': 'strategic', 'condition': identify_key_decision_points(trajectory), 'action_pattern': abstract_action_sequence(trajectory), 'outcome': 'success' } else: # For failures: extract lessons # E.g., "Avoid X because it leads to Y failure" failure_point = identify_failure_point(trajectory) skill = { 'type': 'lesson', 'condition': failure_point['state'], 'anti_action': failure_point['action_taken'], 'failure_mode': failure_point['reason'], 'outcome': 'failure' } return skill def compress_skill_text(skill, max_tokens=100): """Compress skill into concise text representation.""" if skill['type'] == 'strategic': return f"When {skill['condition']}, {skill['action_pattern']} → success" else: return f"Avoid {skill['anti_action']} in {skill['condition']} (→ {skill['failure_mode']})" # Build skill library from trajectory batch skill_library = {'general': [], 'task_specific': {}} for traj in trajectories: if traj['success']: skill = extract_skills_from_trajectory(traj, success=True) skill_library['general'].append(compress_skill_text(skill)) else: skill = extract_skills_from_trajectory(traj, success=False) task = traj['task_id'] if task not in skill_library['task_specific']: skill_library['task_specific'][task] = [] skill_library['task_specific'][task].append(compress_skill_text(skill)) ``` Retrieve relevant skills via embedding similarity: ```python import torch import torch.nn.functional as F def retrieve_relevant_skills(current_state, skill_library, embedding_model, top_k=5): """Retrieve top-k skills relevant to current state.""" # Embed current state state_embedding = embedding_model.encode(current_state) # Embed all skills all_skills = skill_library['general'] + sum( skill_library['task_specific'].values(), [] ) skill_embeddings = embedding_model.encode(all_skills) # Compute similarities similarities = F.cosine_similarity( state_embedding.unsqueeze(0), skill_embeddings, dim=-1 ) # Get top-k top_indices = torch.topk(similarities, min(top_k, len(all_skills)))[1] relevant_skills = [all_skills[i] for i in top_indices] return relevant_skills # During rollout def policy_forward_with_skills(state, policy, skill_library, embedding_model): """Generate action using policy augmented with relevant skills.""" # Retrieve skills skills = retrieve_relevant_skills(state, skill_library, embedding_model, top_k=5) skill_context = "\n".join([f"- {s}" for s in skills]) # Augment prompt with skills augmented_prompt = f"Skills:\n{skill_context}\n\nState: {state}\nAction:" # Generate action action = policy.generate(augmented_prompt, max_tokens=50) return action ``` Evolve skill library by analyzing failures: ```python def analyze_failures_and_evolve_skills(trajectories, skill_library, embedding_model): """Analyze failures to generate new skills.""" failed_trajectories = [t for t in trajectories if not t['success']] for failed_traj in failed_trajectories: # Extract failure lesson failure_skill = extract_skills_from_trajectory(failed_traj, success=False) skill_text = compress_skill_text(failure_skill) # Check if similar skill already exists existing_skills = skill_library['general'] + \ sum(skill_library['task_specific'].values(), []) existing_embeddings = embedding_model.encode(existing_skills) new_embedding = embedding_model.encode(skill_text) similarities = F.cosine_similarity( new_embedding.unsqueeze(0), existing_embeddings, dim=-1 ) # If no similar skill exists, add it if similarities.max() < 0.8: task = failed_traj['task_id'] if task not in skill_library['task_specific']: skill_library['task_specific'][task] = [] skill_library['task_specific'][task].append(skill_text) return skill_library ``` ## Practical Guidance | Component | Recommendation | Notes | |-----------|-----------------|-------| | Skill extraction | Success + failure | Both provide value; failures prevent anti-patterns. | | Skill retrieval top-k | 3-7 | Balance context length with coverage. | | Similarity threshold | 0.75-0.85 | Avoid storing near-duplicate skills. | | Evolution frequency | After each validation epoch | Timely skill updates enable policy adaptation. | **When to Use** - Long-horizon tasks where agents need to remember patterns (web navigation, dialogue) - Multi-task learning where general skills apply across domains - Domains with clear success/failure distinction **When NOT to Use** - Single-episode tasks with no trajectory reuse - Domains where all trajectories are unique (low pattern reusability) ## Reference See https://arxiv.org/abs/2602.08234 for full implementation, including skill library management, embedding models, and validation on ALFWorld household tasks.