---
name: active-inference-robotics
description: "Second-order skill synthesizing Patrick Kenny's discrete active inference framework with K-Scale's JAX/MuJoCo robotics stack for predictive coding in robot locomotion"
---

# Active Inference Robotics Skill (Second-Order)

> *"The agent's job is to predict its actions by predicting its sensations."* — Patrick Kenny

## Trigger Conditions

- User asks about bridging active inference with robot control
- Questions about predictive coding in locomotion policies
- Connecting KL divergence minimization to RL training
- Mean field approximation in robotics state estimation
- Sim2Real as inference about future observations

## Overview

**Second-order skill** synthesizing Patrick Kenny's discrete active inference framework with K-Scale's JAX/MuJoCo robotics stack. This skill emerges from the **constructive collision** between:

1. **Active Inference Institute** (ActInf ModelStream 019.1, Jan 2025)
2. **K-Scale Labs** (ksim, kos, kinfer ecosystem)
3. **MuJoCo Playground** (DeepMind's sim2real framework)

## The Constructive Collision

```
┌─────────────────────────────────────────────────────────────────────────────┐
│  CONSTRUCTIVE COLLISION: Two Threads Converging                              │
│                                                                              │
│  Thread A: Patrick Kenny (Nov 2025)                                          │
│  ════════════════════════════════════                                        │
│  "Active inference can be formulated as constrained KL divergence           │
│   minimization solved by standard mean field methods"                        │
│                                                                              │
│  Key insight: Expected Free Energy ≈ KL Divergence + Entropy Regularizer    │
│                                                                              │
│  Thread B: K-Scale Labs (2024-2025)                                          │
│  ═══════════════════════════════════                                         │
│  "RL-based closed-loop control using policies trained in simulation         │
│   has firmly won as the best way of achieving real-time control"            │
│                                                                              │
│  Key insight: Stateless vs Stateful behaviors as pure/coalgebraic semantics │
│                                                                              │
│  COLLISION POINT: Both minimize surprise about future observations          │
│  ══════════════════════════════════════════════════════════════════         │
│                                                                              │
│       Active Inference              Robotics RL                              │
│       ────────────────              ──────────                               │
│       Predictive Distribution  ←→   Policy π(a|s)                           │
│       Hidden Markov Model      ←→   MDP/POMDP                                │
│       Mean Field Updates       ←→   PPO Gradient Steps                       │
│       Variational Free Energy  ←→   Policy Loss                              │
│       Expected Free Energy     ←→   Value Function + Entropy                 │
│       Perception/Action Loop   ←→   Observation/Action Loop                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Kenny's Key Contribution

From [arXiv:2511.20321](https://arxiv.org/abs/2511.20321):

```
Perception/Action Divergence = VFE(past) + KL(future states)

Where:
- VFE(past) = Standard variational free energy on observed history
- KL(future) = Divergence of predictive distribution from HMM

This differs from Expected Free Energy by an ENTROPY REGULARIZER:
  EFE ≈ Pragmatic Value + Mutual Information
  PAD ≈ Pragmatic Value + Entropy(Q)
```

### Why Entropy Regularization Matters for Robotics

```python
# In ksim PPO training, entropy bonus prevents policy collapse:
loss = policy_loss + value_loss - entropy_coef * entropy

# Kenny's formulation shows this is NOT ad-hoc but principled:
# Entropy regularizer = not being overconfident about predictions
# Biological rationale: know limitations of future predictions
```

## Mapping to ksim Architecture

| Active Inference Concept | ksim Implementation |
|--------------------------|---------------------|
| Hidden Markov Model | `PhysicsEngine` (MJX/MuJoCo) |
| Observation distribution | `Observation.observe(state)` |
| State inference Q(s) | `Critic.forward(obs, carry)` |
| Action inference Q(a) | `Actor.forward(obs, carry)` |
| Mean field factorization | Independent Q(s_t) per timestep |
| Predictive distribution | Policy rollout trajectory |
| VFE minimization | PPO policy gradient |
| EFE/PAD minimization | Value function + entropy bonus |

## Second-Order Behavior Types

### 1. Reflexive Control (Kenny's "Sufficient" Model)

```python
# Agent predicts proprioceptive sensations → fulfills reflexively
class ReflexiveController:
    """
    Kenny: "If the agent can successfully predict its future sensations,
    it can fulfill them unconsciously via motor reflexes."
    """
    def step(self, predicted_proprio: Array) -> Action:
        # Low-level PD control fulfills proprioceptive predictions
        return self.pd_controller(predicted_proprio, self.current_state)
```

### 2. Deliberative Planning (EFE Extension)

```python
# When reflexive prediction fails, engage deliberative inference
class DeliberativeController:
    """
    Extends reflexive control with policy search over trajectories.
    This is where EFE differs from Kenny's PAD formulation.
    """
    def plan(self, beliefs: Distribution, horizon: int) -> Policy:
        # Tree search over policies weighted by expected free energy
        for policy in self.policy_space:
            efe = self.expected_free_energy(beliefs, policy, horizon)
            # EFE includes mutual information (curiosity/exploration)
            # PAD would use entropy instead (uncertainty awareness)
```

### 3. Hierarchical Composition

```
Level 3: Goal Selection (minimize long-horizon EFE)
    ↓ sets reference for
Level 2: Trajectory Planning (predictive distribution)
    ↓ sets reference for  
Level 1: Reflexive Execution (fulfill proprio predictions)
    ↓ actuates
Level 0: Motor Primitives (PD control, actuator dynamics)
```

## GF(3) Balanced Quad

```
active-inference (0) ⊗ kscale-ksim (0) ⊗ mujoco-playground (0) = 0 ✓

All three are ERGODIC — coordination/infrastructure skills.
This is a "resonant triad" where all components coordinate.

For generation (+1), add: skill-creator, algorithmic-art
For verification (-1), add: sheaf-cohomology, code-review
```

### Skill Colors (drand seed 12005093902789493003)

| Skill | Trit | Color | Role |
|-------|------|-------|------|
| `active-inference` | 0 | `#DF8D0F` | Coordination (theory) |
| `kscale-ksim` | 0 | `#25BC3D` | Coordination (simulation) |
| `mujoco-playground` | 0 | `#93DBDA` | Coordination (framework) |

## 2-3-5-7 Prime Sieve Experts

Applying prime-indexed refinement to identify domain experts:

| Prime | Expert | Domain | Key Contribution |
|-------|--------|--------|------------------|
| 2 | Patrick Kenny | Active Inference | Mean field formulation, PAD criterion |
| 3 | Thomas Parr | Active Inference | 2022 textbook, EFE derivation |
| 5 | Ben Bolte | K-Scale | ksim architecture, open-source humanoids |
| 7 | Karl Friston | Free Energy Principle | FEP foundations, continuous formulation |
| 11 | (DeepMind team) | MuJoCo Playground | MJX, sim2real zero-shot |
| 13 | Wesley Maa | K-Scale | Tooling, visualization |

## Mutual Awareness

This skill references and is referenced by:

```yaml
depends_on:
  - kscale-ksim        # Simulation implementation
  - kscale-ecosystem   # Hardware context
  - mujoco-playground  # Framework foundation
  
referenced_by:
  - cognitive-superposition  # Team mental models
  - parametrised-optics-cybernetics  # Category theory bridge
  - reafference-corollary-discharge  # Sensorimotor prediction
```

## Implementation Pattern

```python
# Unified Active Inference + RL Training Loop
class ActiveInferenceTrainer:
    """
    Combines Kenny's PAD criterion with ksim's PPO.
    """
    def __init__(self, hmm: PhysicsEngine, config: Config):
        self.hmm = hmm
        self.actor = Actor(config)
        self.critic = Critic(config)
        
    def perception_action_divergence(
        self, 
        observations: Array,  # O_{1:t} (past)
        q_future: Distribution  # Q(S_{t+1:T}, O_{t+1:T})
    ) -> Scalar:
        """
        Kenny's PAD = VFE(past) + KL(future states from HMM)
        """
        # Past: standard VFE on observation history
        vfe_past = self.variational_free_energy(observations)
        
        # Future: KL divergence of predicted states from HMM
        # Note: Observable emissions cancel out in future KL
        kl_future = self.kl_future_states(q_future, self.hmm)
        
        return vfe_past + kl_future
    
    def train_step(self, trajectory: Trajectory) -> Metrics:
        # PPO updates approximate mean field coordinate ascent
        # Entropy bonus provides Kenny's regularization
        return ppo_update(
            self.actor, 
            self.critic, 
            trajectory,
            entropy_coef=0.01  # ← The regularizer!
        )
```

## References

- [Kenny (2025) Active Inference from First Principles](https://arxiv.org/abs/2511.20321)
- [Parr, Pezzulo, Friston (2022) Active Inference Textbook](https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind)
- [ActInf ModelStream 019.1](https://www.youtube.com/watch?v=...) - Jan 15, 2026
- [K-Scale Labs GitHub](https://github.com/kscalelabs)
- [MuJoCo Playground](https://playground.mujoco.org/)
- [Ben Bolte's Blog](https://ben.bolte.cc/)

## ACSet Schema

```julia
@present SchActiveInferenceRobotics(FreeSchema) begin
    # Objects
    HMM::Ob           # Hidden Markov Model (generative model)
    State::Ob         # Latent state
    Observation::Ob   # Sensory observation
    Action::Ob        # Motor command
    Policy::Ob        # Action sequence
    
    # Morphisms (inference)
    perceive::Hom(Observation, State)    # Perception: O → S
    predict::Hom(State, Observation)     # Prediction: S → O
    act::Hom(State, Action)              # Action selection: S → A
    transition::Hom(State × Action, State)  # Dynamics: S × A → S'
    
    # Attributes
    FreeEnergy::AttrType
    vfe::Attr(State, FreeEnergy)         # Variational free energy
    efe::Attr(Policy, FreeEnergy)        # Expected free energy
    pad::Attr(Policy, FreeEnergy)        # Perception/action divergence
    
    # The key relationship (Kenny's contribution):
    # pad ≈ efe + entropy_regularizer
end
```


## SDF Interleaving

This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):

### Primary Chapter: 10. Adventure Game Example

**Concepts**: autonomous agent, game, synthesis

### GF(3) Balanced Triad

```
active-inference-robotics (+) + SDF.Ch10 (+) + [balancer] (+) = 0
```

**Skill Trit**: 1 (PLUS - generation)

### Secondary Chapters

- Ch3: Variations on an Arithmetic Theme
- Ch4: Pattern Matching

### Connection Pattern

Adventure games synthesize techniques. This skill integrates multiple patterns.