---
name: ksim-rl
description: RL training library for humanoid locomotion and manipulation built on MuJoCo and JAX. Provides PPO, AMP, and custom task abstractions for sim-to-real robotics policy training.
version: 1.0.0
category: robotics-rl
author: K-Scale Labs
source: kscalelabs/ksim
license: MIT
trit: -1
trit_label: MINUS
color: "#3A2F9E"
verified: false
featured: true
---

# KSIM-RL Skill

**Trit**: -1 (MINUS - analysis/verification)

**Color**: #3A2F9E (Deep Purple)

**URI**: skill://ksim-rl#3A2F9E

## Overview

KSIM is K-Scale Labs' reinforcement learning library for humanoid robot locomotion and manipulation. It is built on MuJoCo for physics simulation and JAX for hardware-accelerated training.

## Core Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        KSIM ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   RLTask    │  │   PPOTask   │  │        AMPTask          │  │
│  │ (abstract)  │──│ (PPO impl)  │──│  (Adversarial Motion)   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                │                                │
│                                ▼                                │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                        PhysicsEngine                        │ │
│ │   ┌───────────────┐    ┌───────────────────────────────┐    │ │
│ │   │ MujocoEngine  │    │  MjxEngine (JAX-accelerated)  │    │ │
│ │   └───────────────┘    └───────────────────────────────┘    │ │
│ └─────────────────────────────────────────────────────────────┘ │
│                                │                                │
│                                ▼                                │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │                   Environment Components                    │ │
│ │  • Actuators: Position, Velocity, Torque control            │ │
│ │  • Observations: Joint states, IMU, local view              │ │
│ │  • Rewards: Velocity tracking, gait, energy, stability      │ │
│ │  • Terminations: Fall detection, boundary violations        │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

## Key Features

- **JAX-Accelerated**: Uses MJX for
parallel environment simulation on GPU/TPU
- **PPO Training**: Proximal Policy Optimization with configurable hyperparameters
- **AMP Support**: Adversarial Motion Priors for realistic humanoid locomotion
- **Modular Rewards**: Composable reward functions for gait, velocity, energy
- **Domain Randomization**: Built-in randomizers for sim-to-real transfer

## API Usage

```python
import ksim
from ksim import PPOTask, MjxEngine
from ksim.tasks.humanoid import HumanoidWalkingTask

# Define custom task
class KBotWalkingTask(PPOTask):
    model_path = "kbot.mjcf"

    # Observations
    observations = [
        ksim.JointPosition(),
        ksim.JointVelocity(),
        ksim.IMUAngularVelocity(),
        ksim.BaseOrientation(),
    ]

    # Rewards
    rewards = [
        ksim.LinearVelocityReward(scale=1.0),
        ksim.GaitPhaseReward(scale=0.5),
        ksim.EnergyPenalty(scale=-0.01),
    ]

    # Actuators
    actuators = [
        ksim.PositionActuator(
            joint_name=".*",
            kp=100.0,
            kd=10.0,
            action_scale=0.5,
        )
    ]

# Train
task = KBotWalkingTask()
task.run_training(
    num_envs=4096,
    num_steps=1000000,
    learning_rate=3e-4,
)
```

## GF(3) Triads

This skill participates in balanced triads:

```
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ mujoco-scenes (0) = 0 ✓
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ urdf2mjcf (-1) = needs balancing
```

## Key Contributors

- **codekansas** (Ben Bolte): Core architecture, PPO, rewards
- **b-vm**: Randomizers, disturbances, policy training
- **carlosdp**: Adaptive KL, action scaling
- **WT-MM**: Visualization, markers

## Related Skills

- `kos-firmware` (+1): Robot firmware and gRPC services
- `mujoco-scenes` (0): Scene composition for MuJoCo
- `evla-vla` (-1): Vision-language-action models
- `urdf2mjcf` (-1): URDF to MJCF conversion
- `ktune-sim2real` (-1): Servo tuning for sim2real

## References

```bibtex
@misc{ksim2024,
  title={K-Sim: RL Training for Humanoid Locomotion},
  author={K-Scale Labs},
  year={2024},
  url={https://github.com/kscalelabs/ksim}
}
```

## SDF Interleaving

This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):

### Primary Chapter: 5. Evaluation

**Concepts**: eval, apply, interpreter, environment

### GF(3) Balanced Triad

```
ksim-rl (○) + SDF.Ch5 (−) + [balancer] (+) = 0
```

**Skill Trit**: 0 (ERGODIC - coordination)

### Secondary Chapters

- Ch2: Domain-Specific Languages

### Connection Pattern

Evaluation interprets expressions. This skill processes or generates evaluable forms.
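
The balance condition behind the triads in this document can be checked with simple arithmetic. The sketch below assumes that a triad is "balanced" when its three trits sum to 0 modulo 3; the document does not define the ⊗ operator, so that reading is an assumption. The trit values are taken from the front matter and the Related Skills list, and the helper names `triad_sum` and `is_balanced` are hypothetical and not part of ksim.

```python
# Trit values copied from this document: ksim-rl from the front matter,
# the others from the "Related Skills" list.
TRITS = {
    "ksim-rl": -1,
    "kos-firmware": 1,
    "mujoco-scenes": 0,
    "urdf2mjcf": -1,
}


def triad_sum(skills):
    """Return the sum of the skills' trits reduced modulo 3 (assumed reading of the triad rule)."""
    return sum(TRITS[name] for name in skills) % 3


def is_balanced(skills):
    """A triad is treated as balanced when its trits sum to 0 in GF(3)."""
    return triad_sum(skills) == 0


print(is_balanced(["ksim-rl", "kos-firmware", "mujoco-scenes"]))  # True
print(is_balanced(["ksim-rl", "kos-firmware", "urdf2mjcf"]))      # False
```

Under this reading, the second triad sums to -1 (that is, 2 modulo 3), so it would need a skill with trit +1 in place of one of the -1 members to balance.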