{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Intelligent Agents and Active Inference" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Preliminaries\n", "\n", "- Goal \n", " - Introduction to Active Inference and application to the design of synthetic intelligent agents \n", "- Materials \n", " - Mandatory\n", " - These lecture notes\n", " - Bert de Vries - 2021 - Presentation on [Beyond deep learning: natural AI systems](https://youtu.be/QYbcm6G_wsk?si=G9mkjmnDrQH9qk5k) (video)\n", " - Optional\n", " - Bert de Vries, Tim Scarfe and Keith Duggar - 2023 - Podcast on [Active Inference](https://youtu.be/2wnJ6E6rQsU?si=I4_k40j42_8E4igP). Machine Learning Street Talk podcast\n", " - An extensive discussion of many aspects of the Free Energy Principle and Active Inference, in particular their implementation.\n", " - Raviv (2018), [The Genius Neuroscientist Who Might Hold the Key to True AI](./files/WIRED-Friston.pdf).\n", " - Interesting article on Karl Friston, who is a leading theoretical neuroscientist working on a theory that relates life and intelligent behavior to physics (and Free Energy minimization). (**highly recommended**) \n", " - Friston et al. (2022), [Designing Ecosystems of Intelligence from First Principles](https://arxiv.org/abs/2212.01354)\n", " - Friston's vision on the future of AI. \n", " - Van de Laar and De Vries (2019), [Simulating Active Inference Processes by Message Passing](https://www.frontiersin.org/articles/10.3389/frobt.2019.00020/full)\n", " - How to implement active inference by message passing in a Forney-style factor graph.\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Agents\n", "\n", "- In the previous lessons we assumed that a data set was given. \n", "- In this lesson we consider _agents_. 
An agent is a system that _interacts_ with its environment through both sensors and actuators.\n", "- Crucially, by acting on the environment, the agent is able to affect the data that it will sense in the future.\n", " - As an example, by changing the direction in which I look, I can affect the (visual) data that will be sensed by my retina.\n", "- With this definition of an agent, (biological) organisms are agents, and so are robots, self-driving cars, etc.\n", "- In an engineering context, we are particularly interested in agents that behave with a *purpose* (with a goal in mind), e.g., to drive a car or to design a speech recognition algorithm.\n", "- In this lesson, we will describe how __goal-directed behavior__ by biological (and synthetic) agents can also be interpreted as minimization of a free energy functional. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Illustrative Example: the Mountain Car Problem\n", "\n", "- In this example, we consider [the mountain car problem](https://en.wikipedia.org/wiki/Mountain_car_problem), which is a classical benchmark problem in the reinforcement learning literature.\n", "- The car aims to drive up a steep hill and park at a designated location. However, its engine is too weak to climb the hill directly. Therefore, a successful agent should first climb a neighboring hill, and subsequently use its momentum to overcome the steep incline towards the goal position. \n", "- We will assume that the car's process dynamics (i.e., its equations of motion) are known to the agent up to some additive Gaussian noise.\n", "- Your challenge is to design an agent that guides the car to the goal position. (The agent should be specified as a probabilistic model and the control signal should be formulated as a Bayesian inference task). \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
\n", "\n", "- Solution at the end of this lesson." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Karl Friston and the Free Energy Principle\n", "\n", "- We begin with a motivating example that requires \"intelligent\" goal-directed decision making: assume that you are an owl and that you're hungry. What are you going to do?\n", "\n", "- Have a look at [Prof. Karl Friston](https://www.wired.com/story/karl-friston-free-energy-principle-artificial-intelligence/)'s answer in this [video segment on the cost function for intelligent behavior](https://youtu.be/L0pVHbEg4Yw). (**Do watch the video!**)\n", "\n", "- Friston argues that intelligent decision making (behavior, action making) by an agent requires *minimization of a functional of beliefs*. \n", "\n", "- Friston further argues (later in the lecture and his papers) that this functional is a (variational) free energy (to be defined below), thus linking decision-making and acting to Bayesian inference. \n", "\n", "- In fact, Friston's **Free Energy Principle** (FEP) claims that all [biological self-organizing processes (including brain processes) can be described as Free Energy minimization in a probabilistic model](https://arxiv.org/abs/2201.06387).\n", " - This includes perception, learning, attention mechanisms, recall, acting and decision making, etc.\n", " \n", "- Taking inspiration from FEP, if we want to develop synthetic \"intelligent\" agents, we have (only) two issues to consider:\n", " 1. Specification of the FE functional.\n", " 2. *How* to minimize the FE functional (often in real-time under situated conditions). \n", "\n", "- Agents that follow the FEP are said to be involved in **Active Inference** (AIF). An AIF agent updates its states and parameters (and ultimately its model structure) solely by FE minimization, and selects its actions through (expected) FE minimization (to be explained below). 
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Execution of an AIF Agent\n", "\n", "- Consider an AIF agent with observations (sensory states) $x_t$, latent internal states $s_t$ and latent control states $u_t$ for $t=1,2,\\ldots$. \n", "\n", "\n", "\n", "- The agent is embedded in an environment with \"external states\" $\\tilde{s}_t$. The dynamics of the environment are driven by actions. \n", "\n", "- Actions $a_t$ are selected by the agent. Actions affect the environment and consequently affect future observations. \n", "\n", "- In pseudo-code, an AIF agent executes the following algorithm:\n", "\n", "> **ACTIVE INFERENCE (AIF) AGENT ALGORITHM** \n", ">\n", "> SPECIFY generative model $p(x,s,u)$ \n", "> ASSUME/SPECIFY environmental process $R$\n", ">\n", "> FORALL t DO \n", "> \n", "> 1. $(x_t, \\tilde{s}_t) = R(a_t, \\tilde{s}_{t-1})$ % environment generates new observation \n", "> 2. $q(s_t) = \\arg\\min_q F[q]$ % update agent's internal states (\"perception\") \n", "> 3. $q(u_{t+1}) = \\arg\\min_q F_>[q]$ % update agent's control states (\"actions\") \n", "> 4. $a_{t+1} \\sim q(u_{t+1})$ % sample next action and push to environment\n", "> \n", "> END \n", "\n", "- In the above algorithm, $F[q]$ and $F_>[q]$ are appropriately defined Free Energy functionals, to be discussed below. Next, we discuss these steps in more detail." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Generative Model in an AIF agent\n", "\n", "- What should the agent's model $p(x,s,u)$ be modeling? This question was (already) answered by [Conant and Ashby (1970)](https://www.tandfonline.com/doi/abs/10.1080/00207727008920220) as the [*good regulator theorem*](https://en.wikipedia.org/wiki/Good_regulator): **every good regulator of a system must be a model of that system**. See the [OPTIONAL SLIDE for more information](#good-regulator-theorem). 
\n", "\n", "- Conant and Ashby state: \"The theorem has the interesting corollary that the living brain, so far as it is to be successful and efficient as a regulator for survival, __must__ proceed, in learning, by the formation of a model (or models) of its environment.\"\n", "\n", "- Indeed, perception in brains is clearly affected by predictions about sensory inputs by the brain's own generative model.\n", "\n", "\n", "\n", "- In the above picture (The Gardener, by Giuseppe Arcimboldo, ca. 1590), on the left you will likely see a bowl of vegetables, while the same picture upside down elicits in most people the perception of a gardener's face rather than an upside-down vegetable bowl. \n", "\n", "- The reason is that the brain's model assigns a much higher probability to seeing upright faces than to seeing upside-down vegetable bowls. \n", "\n", "- So the agent's model $p$ will be a model that aims to explain how environmental causes (latent states) lead to sensory observations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specification of AIF Agent's model and Environmental Dynamics\n", "\n", "- In this notebook, for illustrative purposes, we specify the **generative model** at time step $t$ of an AIF agent as \n", "$$\n", "p(x_t,s_t,u_t|s_{t-1}) = \\underbrace{p(x_t|s_t)}_{\\text{observations}} \\cdot \\underbrace{p(s_t|s_{t-1},u_t)}_{\\substack{\\text{state} \\\\ \\text{transition}}} \\cdot \\underbrace{p(u_t)}_{\\substack{\\text{action} \\\\ \\text{prior}}}\n", "$$\n", "\n", "- We will assume that the agent interacts with an environment, which we represent by a dynamic model $R$ as\n", "$$\n", "(x_t,\\tilde{s}_t) = R\\left( a_t,\\tilde{s}_{t-1}\\right)$$\n", "where $a_t$ are _actions_ (by the agent), $x_t$ are _outcomes_ (the agent's observations) and $\\tilde{s}_t$ holds the environmental latent _states_. \n", "\n", "- Note that $R$ only needs to be specified for simulated environments. 
If we were to deploy the agent in a real-world environment, we would not need to specify $R$. \n", "\n", "- The agent's knowledge about environmental process $R$ is expressed by its generative model $p(x_t,s_t,u_t|s_{t-1})$. \n", "\n", "- Note that we distinguish between _control states_ and _actions_. Control states $u_t$ are latent variables in the agent's generative model. An action $a_t$ is a realization of a control state as observed by the environment. \n", "\n", "- Observations $x_t$ are generated by the environment and observed by the agent. Vice versa, actions $a_t$ are generated by the agent and observed by the environment. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### State Updating in the AIF Agent\n", "\n", "- After the agent makes a new observation $x_t$, it will update beliefs over its latent variables. First the internal state variables $s$. \n", "\n", "- Assume the following at time step $t$:\n", " - the state of the agent's model has already been updated to $q(s_{t-1}| x_{1:t-1})$. \n", " - the agent has selected a new action $a_t$.\n", " - the agent has recorded a new observation $x_t$. \n", "\n", "- The **state updating** task is to infer $q(s_{t}|x_{1:t})$, based on the previous estimate $q(s_{t-1}| x_{1:t-1})$, the new data $\\{a_t,x_t\\}$, and the agent's generative model. \n", "\n", "- Technically, this is a Bayesian filtering task. In a real brain, this process is called **perception**. 
\n", "\n", "- We specify the following FE functional\n", "$$\n", "F[q] = \\sum_{s_t} q(s_t| x_{1:t}) \\log \\frac{\\overbrace{q(s_t| x_{1:t})}^{\\text{state posterior}}}{\\underbrace{p( x_t|s_t) p(s_t|s_{t-1},a_t)}_{\\text{generative model with new data}} \\underbrace{q(s_{t-1}|x_{1:t-1})}_{\\text{state prior}}}\n", "$$\n", "\n", "- The state updating task can be formulated as minimization of the above FE (see also [AIF Algorithm](#AIF-algorithm), step 2):\n", "$$\n", "q(s_t|x_{1:t}) = \\arg\\min_q F[q]\n", "$$\n", "\n", "- If the generative model is a _Linear Gaussian Dynamical System_, minimization of the FE can be solved analytically in closed form and [leads to the standard Kalman filter](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Dynamic-Models.ipynb#kalman-filter). \n", "\n", "- If these (linear Gaussian) conditions are not met, we can still minimize the FE by other means and arrive at some approximation of the Kalman filter, see for example [Baltieri and Isomura (2021)](https://arxiv.org/abs/2111.10530) for a Laplace approximation to variational Kalman filtering. \n", "\n", "- Our toolbox [RxInfer](http://rxinfer.ml) specializes in automated execution of this minimization task. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Policy Updating in an AIF Agent\n", "\n", "- Once the agent has updated its internal states, it will turn to inferring the next action. \n", "\n", "- In order to select a __good__ next action, we need to investigate and compare consequences of a _sequence_ of future actions. \n", "\n", "- A sequence of future actions $a= (a_{t+1}, a_{t+2}, \\ldots, a_{t+T})$ is called a **policy**. Since relevant consequences are usually the result of a future action sequence rather than a single action, we will be interested in updating beliefs over policies. 
\n", "\n", "- In order to assess the consequences of a selected policy, we will, as a function of that policy, run the generative model forward-in-time to make predictions about future observations $x_{t+1:t+T}$. \n", "\n", "- Note that perception (state updating) precedes policy updating. In order to accurately predict the future, the agent first needs to understand the current state of the world. \n", "\n", "- Consider an AIF agent at time step $t$ with (future) observations $x = (x_{t+1}, x_{t+2}, \\ldots, x_{t+T})$, latent future internal states $s= (s_t, s_{t+1}, \\ldots, s_{t+T})$, and latent future control variables $u= (u_{t+1}, u_{t+2}, \\ldots, u_{t+T})$. \n", "\n", "- From the agent's viewpoint, the evolution of these future variables is constrained by its generative model, rolled out into the future:\n", "$$\\begin{align*}\n", "p(x,s,u) &= \\underbrace{q(s_{t})}_{\\substack{\\text{current}\\\\ \\text{state}}} \\cdot \\underbrace{\\prod_{k=t+1}^{t+T} p(x_k|s_k) \\cdot p(s_k | s_{k-1}, u_k) p(u_k)}_{\\text{GM roll-out to future}}\n", "\\end{align*}$$\n", "\n", "- Consider the Free Energy functional for estimating posterior beliefs $q(s,u)$ over latent _future_ states and latent _future_ control signals: \n", "$$\\begin{align*}\n", "F_>[q] &= \\overbrace{\\sum_{x,s} q(x|s)}^{\\text{marginalize }x} \\bigg( \\overbrace{\\sum_u q(s,u) \\log \\frac{q(s,u)}{p(x,s,u)} }^{\\text{\"regular\" variational Free Energy}}\\bigg) \\\\\n", "&= \\sum_{x,s,u} q(x,s,u) \\log \\frac{q(s,u)}{p(x,s,u)}\n", "\\end{align*}$$\n", "\n", "- In principle, this is a regular FE functional, with one difference from previous versions: since future observations $x$ have not yet occurred, $F_>[q]$ marginalizes not only over latent states $s$ and policies $u$, but also over future observations $x$.\n", "\n", "- We will update the beliefs over policies by minimization of Free Energy functional $F_>[q]$. 
In the [optional slides below, we prove that the solution to this optimization task](#q-star) is given by (see [AIF Algorithm](#AIF-algorithm), step 3, above)\n", "$$\\begin{aligned}\n", "q^*(u) &= \\arg\\min_q F_>[q] \\\\\n", "&\\propto p(u)\\exp(-G(u))\\,,\n", "\\end{aligned}$$\n", " where the factor $p(u)$ is a prior over admissible policies, and the factor $\\exp(-G(u))$ updates the prior with information about future consequences of a selected policy $u$. \n", "\n", "- The function \n", "$$G(u) = \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|u)}{p(x,s|u)}$$ \n", "is called the **Expected Free Energy** (EFE) for policy $u$. \n", "\n", "- The FEP takes the following stance: if FE minimization is all that an agent does, then the only consistent and appropriate behavior for an agent is to select actions that minimize the **expected** Free Energy in the future (where expectation is taken over current beliefs about future observations). \n", "\n", "- Note that, since $q^*(u) \\propto p(u)\\exp(-G(u))$, the probability $q^*(u)$ for selecting a policy $u$ increases when EFE $G(u)$ gets smaller. \n", "\n", "- Once the policy (control) variables have been updated, in simulated environments, it is common to assume that the next action $a_{t+1}$ (an action is the _observed_ control variable by the environment) gets selected in proportion to the probability of the related control variable (see [AIF Agent Algorithm](#AIF-algorithm), step 4, above), i.e., the environment samples the action from the control posterior:\n", "$$\n", "a_{t+1} \\sim q(u_{t+1}) \n", "$$\n", "\n", "- Next, we analyze some properties of the EFE." 
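, "\n", "\n", "- Before moving on, a small numerical illustration of $q^*(u) \\propto p(u)\\exp(-G(u))$ may help. The numbers are assumptions chosen purely for illustration and do not come from the mountain car example: two candidate policies $u_1$ and $u_2$, a uniform prior $p(u_1)=p(u_2)=0.5$, and EFE values $G(u_1)=1$ and $G(u_2)=2$ (in nats). Then\n", "$$\\begin{aligned}\n", "q^*(u_1) &= \\frac{0.5\\, e^{-1}}{0.5\\, e^{-1} + 0.5\\, e^{-2}} = \\frac{1}{1+e^{-1}} \\approx 0.73 \\\\\n", "q^*(u_2) &= 1 - q^*(u_1) \\approx 0.27\\,,\n", "\\end{aligned}$$\n", " so the policy with the lower EFE is selected more often, while the other policy keeps a non-zero selection probability."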
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Active Inference Analysis: exploitation-exploration dilemma \n", "\n", "- Consider the following decomposition of EFE:\n", "$$\\begin{aligned}\n", "G(u) &= \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|u)}{p(x,s|u)} \\\\\n", "&= \\sum_{x,s} q(x,s|u) \\log \\frac{1}{p(x)} + \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|u)}{p(s|x,u)}\\frac{q(s|x)}{q(s|x)} \\\\\n", "&= \\sum_x q(x|u) \\log \\frac{1}{p(x)} + \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|u)}{q(s|x)} + \\underbrace{\\sum_{x,s} q(x,s|u) \\log \\frac{q(s|x)}{p(s|x,u)}}_{E\\left[ D_{\\text{KL}}[q(s|x),p(s|x,u)] \\right]\\geq 0} \\\\\n", "&\\geq \\underbrace{\\sum_x q(x|u) \\log \\frac{1}{p(x)}}_{\\substack{\\text{goal-seeking behavior} \\\\ \\text{(exploitation)}}} - \\underbrace{\\sum_{x,s} q(x,s|u) \\log \\frac{q(s|x)}{q(s|u)}}_{\\substack{\\text{information-seeking behavior}\\\\ \\text{(exploration)}}} \n", "\\end{aligned}$$ \n", "\n", "- Evidently, minimization of EFE leads to selection of policies that balance the following two imperatives: \n", "\n", " 1. minimization of the first term of $G(u)$, i.e. minimizing $\\sum_x q(x|u) \\log \\frac{1}{p(x)}$, leads to policies ($u$) that align the inferred observations $q(x|u)$ under policy $u$ (i.e., predicted future observations under policy $u$) with a prior $p(x)$ on future observations. We are free to choose any prior $p(x)$ and usually we choose a prior that aligns with desired (goal) observations. Hence, policies with low EFE lead to **goal-seeking behavior** (a.k.a. pragmatic behavior or exploitation). [In the OPTIONAL SLIDES](#ambiguity-plus-risk), we derive an alternative (perhaps clearer) expression to support this interpretation. \n", " \n", " 1. 
minimization of $G(u)$ maximizes the second term \n", " $$\\begin{aligned}\n", " \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|x)}{q(s|u)} &= \\sum_{x,s} q(x,s|u) \\log \\frac{q(s|x)}{q(s|u)}\\frac{q(x|u)}{q(x|u)} \\\\\n", " &= \\underbrace{\\sum_{x,s} q(x,s|u) \\log \\frac{q(x,s|u)}{q(x|u)q(s|u)}}_{\\text{(conditional) mutual information }I[x,s|u]}\n", " \\end{aligned}$$ \n", " which is the (conditional) [__mutual information__](https://en.wikipedia.org/wiki/Mutual_information) between (posteriors on) future observations and states, for a given policy $u$. Thus, maximizing this term leads to actions that maximize statistical dependency between future observations and states. In other words, a policy with low EFE also leads to **information-seeking behavior** (a.k.a. epistemic behavior or exploration). \n", "\n", "- (The third term $\\sum_{x,s} q(x,s|u) \\log \\frac{q(s|x)}{p(s|x,u)}$ is an (expected) KL divergence between the variational posterior $q(s|x)$ and the model posterior $p(s|x,u)$ on the states. This can be interpreted as a complexity/regularization term and $G(u)$ minimization will drive this term to zero.) \n", "\n", "- Seeking actions that balance goal-seeking behavior (exploitation) and information-seeking behavior (exploration) is a [fundamental problem in the Reinforcement Learning literature](https://en.wikipedia.org/wiki/Exploration-exploitation_dilemma). \n", "\n", "- **Active Inference solves the exploration-exploitation dilemma**. Both objectives are served by EFE minimization without any need for tuning parameters. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### AIF Agents learn both the Problem and Solution\n", "\n", "- We highlight another great feature of FE minimizing agents. 
Consider an AIF agent ($m$) with generative model $p(x,s,u|m)$.\n", "\n", "- Consider the Divergence-Evidence decomposition of the FE again:\n", "\n", "$$\\begin{aligned}\n", "F[q] &= \\sum_{s,u} q(s,u) \\log \\frac{q(s,u)}{p(x,s,u|m)} \\\\\n", "&= \\underbrace{-\\log p(x|m)}_{\\substack{\\text{problem} \\\\ \\text{representation costs}}} + \\underbrace{\\sum_{s,u} q(s,u) \\log \\frac{q(s,u)}{p(s,u|x,m)}}_{\\text{solution costs}}\n", "\\end{aligned}$$\n", "\n", "- The first term, $-\\log p(x|m)$, is the (negative log-) evidence for model $m$, given recorded data $x$. \n", "\n", "- Minimization of FE maximizes the evidence for the given model. The model captures the **problem representation**. A model with high evidence predicts the data well and therefore \"understands the world\". \n", "\n", "- The second term scores the cost of inference. In almost all cases, the solution to a problem can be phrased as an inference task on the generative model. Hence, the second term **scores the accuracy of the inferred solution**, for the given model. \n", "\n", "- FE minimization optimizes a balanced trade-off between a good-enough problem representation and a good-enough solution proposal for that model. Since FE comprises both a cost for solution _and_ problem representation, it is a neutral criterion that applies across a very wide set of problems. \n", "\n", "- A good solution to the wrong problem is not good enough. A poor solution to a great problem statement is not sufficient either. In order to solve a problem well, we need both to represent the problem correctly (high model evidence) and we need to solve it well (low inference costs). 
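\n", "\n", "- As a hedged numerical check of the Divergence-Evidence decomposition above (all numbers are assumptions chosen for illustration): take a single binary latent state $s \\in \\{1,2\\}$, no control variables, and joint values $p(x,s{=}1|m)=0.3$ and $p(x,s{=}2|m)=0.1$ for the recorded $x$. Then $p(x|m)=0.4$ and $p(s|x,m)=(0.75, 0.25)$. For the suboptimal choice $q(s)=(0.5,0.5)$, using natural logarithms,\n", "$$\\begin{aligned}\n", "F[q] &= 0.5\\log\\frac{0.5}{0.3} + 0.5\\log\\frac{0.5}{0.1} \\approx 1.060 \\\\\n", "-\\log p(x|m) &= -\\log 0.4 \\approx 0.916 \\\\\n", "\\sum_s q(s)\\log\\frac{q(s)}{p(s|x,m)} &= 0.5\\log\\frac{0.5}{0.75} + 0.5\\log\\frac{0.5}{0.25} \\approx 0.144\\,,\n", "\\end{aligned}$$\n", " and indeed $0.916 + 0.144 \\approx 1.060$. Setting $q(s)=p(s|x,m)$ drives the second term to zero, leaving $F[q]=-\\log p(x|m)$.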
\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Brain's Action-Perception Loop by FE Minimization\n", "\n", "- The above derivations are not trivial, but we have just shown that FE-minimizing agents accomplish variational Bayesian perception (a la Kalman filtering), and a balanced exploration-exploitation trade-off for policy selection. \n", "\n", "- Moreover, the FE by itself serves as a proper objective across a very wide range of problems, since it scores both the cost of the problem statement and the cost of inferring the solution. \n", "\n", "- The current FEP theory claims that minimization of FE (and EFE) is all that brains do, i.e., FE minimization leads to perception, policy selection, learning, structure adaptation, attention, learning of problems and solutions, etc.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Engineering Challenge: Synthetic AIF Agents \n", "\n", "- We have here a framework (the FEP) for emergent intelligent behavior in self-organizing biological systems that\n", " - leads to optimal (Bayesian) information processing, including balancing accuracy vs. complexity.\n", " - leads to balanced and continual learning of both problem representation and solution proposal\n", " - actively selects data in-the-field under situated conditions (no dependency on a large database)\n", " - pursues an optimal trade-off between exploration (information-seeking) and exploitation (goal-seeking) behavior\n", " - needs no external tuning parameters (such as step sizes, thresholds, etc.)\n", "\n", "- Clearly, the FEP, and synthetic AIF agents as a realization of FEP, comprise a very attractive framework for all things relating to AI and AI agents. 
\n", "\n", "- A current big AI challenge is to design synthetic AIF agents based solely on FE/EFE minimization.\n", "\n", " \n", "\n", "- Executing a synthetic AIF agent often poses a large computational problem for the following reasons: \n", " 1. For interesting problems (e.g. speech recognition, scene analysis), generative models may contain thousands of latent variables. \n", " 2. The FE function is a time-varying function, since it is also a function of observable variables. \n", " 3. An AIF agent must execute inference in real time if it is engaged and embedded in a real-world environment.\n", " \n", "- So, in practice, executing a synthetic AIF agent may lead to a **task of minimizing a time-varying FE function of thousands of variables in real time**!\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Factor Graph Approach to Modeling of an Active Inference Agent\n", "\n", "- How to specify and execute a synthetic AIF agent is an active area of research. \n", "\n", "- There is no definitive solution approach to AIF agent modeling yet; we ([BIASlab](http://biaslab.org)) think that (reactive) message passing in a factor graph representation provides a promising path. \n", "\n", "- After selecting an action $a_t$ and making an observation $x_t$, the rolled-out generative model is represented by the following Forney-style factor graph (FFG):\n", "\n", "\n", "\n", "- The open red nodes for $p(x_{t+k})$ specify __desired future observations__, whereas the open black boxes for $p(s_k|s_{k-1},u_k)$ and $p(x_k|s_k)$ reflect the agent's beliefs about how the world actually evolves (i.e., the __veridical model__). \n", "\n", "- The (brown) dashed box is the agent's Markov blanket. Given the states on the Markov blanket, the internal states of the agent are independent of the state of the world. 
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### How to minimize FE: Online Active Inference\n", "\n", "- [Online active inference proceeds by iteratively executing three stages](https://www.frontiersin.org/articles/10.3389/frobt.2019.00020/full): \n", " 1. act-execute-observe \n", " 2. infer: update the latent variables and select an action \n", " 3. slide forward\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The Mountain car Problem Revisited\n", "\n", "Here we solve the mountain car problem as stated at the beginning of this lesson. Before implementing the active inference agent, let's first try a naive approach that applies the engine's maximum power at every time step to reach the goal. As can be seen in the results, this approach fails since the car's engine is not strong enough to reach the goal directly. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using Pkg; Pkg.activate(\"../.\"); Pkg.instantiate();\n", "using IJulia; try IJulia.clear_output(); catch _ end" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra, Plots, RxInfer\n", "import .ReactiveMP: getrecent, messageout\n", "include(\"./scripts/mountaincar_helper.jl\")\n", "\n", "# Environment variables\n", "initial_position = -0.5\n", "initial_velocity = 0.0\n", "engine_force_limit = 0.04\n", "friction_coefficient = 0.1\n", "\n", "Fa, Ff, Fg, height = create_physics(\n", " engine_force_limit = engine_force_limit,\n", " friction_coefficient = friction_coefficient\n", ")\n", "\n", "# Target position and velocity\n", "target = [0.5, 0.0];" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/wmkouw/syndr/Wouter/Onderwijs/Vakken/tueindhoven/5SSD0 - Bayesian Machine Learning & 
Information Processing/2024-2025 Q2/BMLIP/lessons/notebooks/ai_agent/ Mountain Car Problem: naive policy.gif\")" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Simulation of a naive policy, going full power toward the parking place \n", "\n", "# Let there be a world\n", "(execute, observe) = create_world(\n", " Fg = Fg, Ff = Ff, Fa = Fa, \n", " initial_position = initial_position, \n", " initial_velocity = initial_velocity\n", ")\n", "\n", "# Total simulation time\n", "N = 40 \n", "\n", "y = Vector{Vector{Float64}}(undef, N)\n", "for n in 1:N\n", " execute(100.0) # Act with the maximum power \n", " y[n] = observe() # Observe the current environmental outcome\n", "end\n", "\n", "plot_car(y, target, title_plot=\"Mountain Car Problem: naive policy\", fps=5)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "", "image/svg+xml": [ "\n", "\n" ], "text/html": [ "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Let's also plot the goal and car positions over time \n", "trajectories = reduce(hcat,y)'\n", "plot(trajectories[:,1], label=\"car: naive policy\", title = \"Car and Goal Positions\", color = \"orange\")\n", "plot!(0.5 * ones(N), color = \"black\", linestyle=:dash, label = \"goal\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we try a more sophisticated active inference agent. Above, we specified a probabilistic generative model for the agent's environment and then constrained future observations by a prior distribution that is located on the goal position. We then execute the (1) Act-execute-observe --> (2) infer --> (3) slide procedures as discussed above to infer future actions. 
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "create_agent (generic function with 1 method)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@model function mountain_car(m_u, V_u, m_x, V_x, m_s_t_min, V_s_t_min, T, Fg, Fa, Ff, engine_force_limit)\n", " \n", " # Transition function modeling transition due to gravity and friction\n", " g = (s_t_min::AbstractVector) -> begin \n", " \n", " s_t = similar(s_t_min) # Next state\n", " s_t[2] = s_t_min[2] + Fg(s_t_min[1]) + Ff(s_t_min[2]) # Update velocity\n", " s_t[1] = s_t_min[1] + s_t[2] # Update position\n", " \n", " return s_t\n", " end\n", " \n", " # Function for modeling engine control\n", " h = (u::AbstractVector) -> [0.0, Fa(u[1])] \n", " \n", " # Inverse engine force, from change in state to corresponding engine force\n", " h_inv = (delta_s_dot::AbstractVector) -> [atanh(clamp(delta_s_dot[2], -engine_force_limit+1e-3, engine_force_limit-1e-3)/engine_force_limit)] \n", " \n", " # Internal model perameters\n", " Gamma = 1e4*diageye(2) # Transition precision\n", " Theta = 1e-4*diageye(2) # Observation variance\n", "\n", " s_t_min ~ MvNormal(mean = m_s_t_min, cov = V_s_t_min)\n", " s_k_min = s_t_min\n", "\n", " local s\n", " \n", " for k in 1:T\n", " u[k] ~ MvNormal(mean = m_u[k], cov = V_u[k])\n", " u_h_k[k] ~ h(u[k]) where { meta = DeltaMeta(method = Linearization(), inverse = h_inv) }\n", " s_g_k[k] ~ g(s_k_min) where { meta = DeltaMeta(method = Linearization()) }\n", " u_s_sum[k] ~ s_g_k[k] + u_h_k[k]\n", " s[k] ~ MvNormal(mean = u_s_sum[k], precision = Gamma)\n", " x[k] ~ MvNormal(mean = s[k], cov = Theta)\n", " x[k] ~ MvNormal(mean = m_x[k], cov = V_x[k]) # goal\n", "\n", " s_k_min = s[k]\n", " end\n", " \n", " return (s, )\n", "end\n", "\n", "@meta function car_meta()\n", " dzdt() -> DeltaMeta(method = Linearization())\n", "end\n", "\n", "function create_agent(;T = 20, Fg, Fa, Ff, engine_force_limit, target, initial_position, 
initial_velocity)\n", " \n", " Epsilon = fill(huge, 1, 1) # Control prior variance\n", " m_u = Vector{Float64}[ [ 0.0] for k=1:T ] # Set control priors\n", " V_u = Matrix{Float64}[ Epsilon for k=1:T ]\n", "\n", " Sigma = 1e-4*diageye(2) # Goal prior variance\n", " m_x = [zeros(2) for k=1:T]\n", " V_x = [huge*diageye(2) for k=1:T]\n", " V_x[end] = Sigma # Set prior to reach goal at t=T\n", "\n", " # Set initial brain state prior\n", " m_s_t_min = [initial_position, initial_velocity] \n", " V_s_t_min = tiny * diageye(2)\n", " \n", " # Set current inference results\n", " result = nothing\n", "\n", " # The `compute` function performs Bayesian inference by message passing\n", " compute = (upsilon_t::Float64, y_hat_t::Vector{Float64}) -> begin\n", " \n", " m_u[1] = [ upsilon_t ] # Register action with the generative model\n", " V_u[1] = fill(tiny, 1, 1) # Clamp control prior to performed action\n", "\n", " m_x[1] = y_hat_t # Register observation with the generative model\n", " V_x[1] = tiny*diageye(2) # Clamp goal prior to observation\n", "\n", " data = Dict(:m_u => m_u, \n", " :V_u => V_u, \n", " :m_x => m_x, \n", " :V_x => V_x,\n", " :m_s_t_min => m_s_t_min,\n", " :V_s_t_min => V_s_t_min)\n", " \n", " model = mountain_car(T=T, Fg=Fg, Fa=Fa, Ff=Ff, engine_force_limit=engine_force_limit) \n", " result = infer(model = model, data = data)\n", " end\n", " \n", " # The `act` function returns the inferred best possible action\n", " act = () -> begin\n", " if result !== nothing\n", " return mode(result.posteriors[:u][2])[1]\n", " else\n", " # Without inference result we return some 'random' action\n", " return 0.0 \n", " end\n", " end\n", " \n", " # The `future` function returns the inferred future states\n", " future = () -> begin \n", " if result !== nothing \n", " return getindex.(mode.(result.posteriors[:s]), 1)\n", " else\n", " return zeros(T)\n", " end\n", " end\n", "\n", " # The `slide` function modifies the `(m_s_t_min, V_s_t_min)` for the next step\n", " # and shifts 
(or slides) the array of future goals `(m_x, V_x)` and inferred actions `(m_u, V_u)`\n", " slide = () -> begin\n", "\n", " model = RxInfer.getmodel(result.model)\n", " (s, ) = RxInfer.getreturnval(model)\n", " varref = RxInfer.getvarref(model, s) \n", " var = RxInfer.getvariable(varref)\n", " \n", " slide_msg_idx = 3 # This index is model dependent\n", " (m_s_t_min, V_s_t_min) = mean_cov(getrecent(messageout(var[2], slide_msg_idx)))\n", "\n", " m_u = circshift(m_u, -1)\n", " m_u[end] = [0.0]\n", " V_u = circshift(V_u, -1)\n", " V_u[end] = Epsilon\n", "\n", " m_x = circshift(m_x, -1)\n", " m_x[end] = target\n", " V_x = circshift(V_x, -1)\n", " V_x[end] = Sigma\n", " end\n", "\n", " return (compute, act, slide, future) \n", "end" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "Plots.AnimatedGif(\"/home/wmkouw/syndr/Wouter/Onderwijs/Vakken/tueindhoven/5SSD0 - Bayesian Machine Learning & Information Processing/2024-2025 Q2/BMLIP/lessons/notebooks/ai_agent/ Mountain Car Problem: active inference agent.gif\")" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create another world\n", "(execute_ai, observe_ai) = create_world(\n", " Fg = Fg, \n", " Ff = Ff, \n", " Fa = Fa, \n", " initial_position = initial_position, \n", " initial_velocity = initial_velocity\n", ")\n", "\n", "# Planning horizon\n", "T_ai = 50\n", "\n", "# Let there be an agent\n", "(compute_ai, act_ai, slide_ai, future_ai) = create_agent(; \n", " \n", " T = T_ai, \n", " Fa = Fa,\n", " Fg = Fg, \n", " Ff = Ff, \n", " engine_force_limit = engine_force_limit,\n", " target = target,\n", " initial_position = initial_position,\n", " initial_velocity = initial_velocity\n", ") \n", "\n", "# Length of trial\n", "N_ai = 100\n", "\n", "# Step through experimental protocol\n", "agent_a = Vector{Float64}(undef, N_ai) # Actions\n", "agent_f = Vector{Vector{Float64}}(undef, N_ai) # Predicted future\n", "agent_x = 
Vector{Vector{Float64}}(undef, N_ai) # Observations\n", "\n", "for t=1:N_ai\n", " agent_a[t] = act_ai() # Invoke an action from the agent\n", " agent_f[t] = future_ai() # Fetch the predicted future states\n", " execute_ai(agent_a[t]) # The action influences hidden external states\n", " agent_x[t] = observe_ai() # Observe the current environmental outcome (update p)\n", " compute_ai(agent_a[t], agent_x[t]) # Infer beliefs from current model state (update q)\n", " slide_ai() # Prepare for next iteration\n", "end\n", "\n", "plot_car(agent_x, target, title_plot=\"Mountain Car Problem: active inference agent\", fps=5)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "", "image/svg+xml": [ "\n", "\n" ], "text/html": [ "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Again, let's plot the goal and car positions over time\n", "trajectories = reduce(hcat, agent_x)'\n", "p1 = plot(trajectories[:,1], label=\"car: AIF agent\", title = \"Car and Goal Positions\", color = \"orange\")\n", "plot!(0.5 * ones(N_ai), color = \"black\", linestyle=:dash, label = \"goal\")\n", "p2 = plot(agent_a, title = \"Actions\", color = \"orange\")\n", "plot(p1,p2, layout = @layout [a ; b])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the AIF agent __explores__ other options, like going first in the opposite direction of the goal prior, to reach its goals. This agent is able to mix exploration (information-seeking behavior) with exploitation (goal-seeking behavior)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Extensions and Comments\n", "\n", "\n", "- Just to be sure, you don't need to memorize all FE/EFE decompositions nor are you expected to derive them on-the-spot. 
We present these decompositions only to provide insight into the multitude of forces that underlie FEM-based action selection.\n", "\n", "- In a sense, the FEP is an umbrella for describing the mechanics and self-organization of intelligent behavior, in humans and machines. Many subfields of AI, such as reinforcement learning, can be interpreted as special cases of active inference under the FEP, see e.g., [Friston et al., 2009](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006421). \n", "\n", "- Is EFE minimization really different from \"regular\" FE minimization? Not really: it appears that [EFE minimization can be reformulated as a special case of FE minimization](https://link.springer.com/article/10.1007/s00422-019-00805-w). In other words, FE minimization is still the only game in town.\n", "\n", "- Active inference also completes the \"scientific loop\" picture. Under the FEP, experimental/trial design is driven by EFE minimization. Bayesian probability theory (and FEP) contains all the equations for running scientific inquiry.\n", "\n", "\n", "- Essentially, AIF is an automated Scientific Inquiry Loop with an engineering twist. If there were no goal prior, AIF would just lead to learning of a veridical (\"true\") generative model of the environment. This is what science is about. However, since we have goal prior constraints in the generative model, AIF leads to generating behavior (actions) with a purpose! For instance, when you want to cross a road, the goal prior \"I am not going to get hit by a car\" leads to inference of behavior that fulfills that prior. Similarly, through appropriate goal priors, the brain is able to design algorithms for object recognition, locomotion, speech generation, etc. In short, **AIF is an automated Bayes-optimal engineering design loop**!\n", "\n", "- The big engineering challenge remains the computational load of AIF. 
The human brain consumes about 20 watts and the neocortex only about 4 watts (which is about the power consumption of a bicycle light). This is multiple orders of magnitude (at least 1 million times) cheaper than what we can engineer on silicon for similar tasks. \n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Final Thoughts\n", "\n", "- In the end, all the state inference, parameter estimation, etc., in this lecture series could have been implemented by FE minimization in an appropriately specified generative probabilistic model. However, the Free Energy Principle extends beyond state and parameter estimation. Driven by FE minimization, brains also change their structure over time. In fact, the FEP extends beyond brains to a general theory for biological self-organization, e.g., [Darwin's natural selection process](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857288/) may be interpreted as an FE-minimization-driven model optimization process, and here's an article on [FEP for predictive processing in plants](https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0096). Moreover, Constrained-FE minimization (rephrased as the Principle of Maximum Relative Entropy) provides an elegant framework to derive most (if not all) physical laws, as Caticha explains in his [brilliant monograph](./files/Caticha-2012-Entropic-Inference-and-the-Foundations-of-Physics.pdf) on Entropic Physics. Indeed, the framework of FE minimization is known in the physics community as the very fundamental [Principle of Least Action](https://en.wikipedia.org/wiki/Stationary-action_principle) that governs the equations-of-motion in nature. \n", "\n", "- So, the FEP is very fundamental and extends way beyond applications to machine learning. 
At [our research lab](http://biaslab.org) at TU/e, we work on developing FEP-based intelligent agents that go out into the world and autonomously learn to accomplish a pre-determined task, such as learning-to-walk or learning-to-process-noisy-speech-signals. Feel free to approach us if you want to know more about that effort. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "##