{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Information Engines\n",
    "\n",
    "### Neil D. Lawrence\n",
    "\n",
    "### 2025-03-26"
   ],
   "id": "1478337e-4bd8-41f1-8f0f-bedd6cc41bb7"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Abstract**: The relationship between physical systems and intelligence\n",
    "has long fascinated researchers in computer science and physics. This\n",
    "talk explores fundamental connections between thermodynamic systems and\n",
    "intelligent decision-making through the lens of free energy principles.\n",
    "\n",
    "We examine how concepts from statistical mechanics - particularly the\n",
    "relationship between total energy, free energy, and entropy - might\n",
    "provide novel insights into the nature of intelligence and learning. By\n",
    "drawing parallels between physical systems and information processing,\n",
    "we consider how measurement and observation can be viewed as processes\n",
    "that modify available energy. The discussion encompasses how model\n",
    "approximations and uncertainties might be understood through\n",
    "thermodynamic analogies, and explores the implications of treating\n",
    "intelligence as an energy-efficient state-change process.\n",
    "\n",
    "While these connections remain speculative, they offer intriguing\n",
    "perspectives for discussing the fundamental nature of intelligence and\n",
    "learning systems. The talk aims to stimulate discussion about these\n",
    "potential relationships rather than present definitive conclusions."
   ],
   "id": "685694f0-5938-4394-813a-0d63b25fdb26"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hydrodynamica\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_physics/includes/daniel-bernoulli-hydrodynamica.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_physics/includes/daniel-bernoulli-hydrodynamica.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "When Laplace spoke of the curve of a simple molecule of air, he may well\n",
    "have been thinking of Daniel Bernoulli (1700-1782). Daniel Bernoulli was\n",
    "one name in a prodigious family. His father and brother were both\n",
    "mathematicians. Daniel’s main work was known as *Hydrodynamica*."
   ],
   "id": "fd86c747-c016-408d-b81f-a5965bfbde72"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PP7')"
   ],
   "id": "022da6c3-71ca-48e3-858f-b447fb25d00f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Figure: <i>Daniel Bernoulli’s *Hydrodynamica* published in 1738. It was\n",
    "one of the first works to use the idea of conservation of energy. It\n",
    "used Newton’s laws to predict the behaviour of gases.</i>\n",
    "\n",
    "Daniel Bernoulli described a kinetic theory of gases, but it wasn’t\n",
    "until 170 years later when these ideas were verified after Einstein had\n",
    "proposed a model of Brownian motion which was experimentally verified by\n",
    "Jean Baptiste Perrin."
   ],
   "id": "d7125727-f388-4871-bc8e-6ae91f673198"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PA200')"
   ],
   "id": "475fb7b8-a8a8-41a2-bc6d-7dbb8ef47474"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Figure: <i>Daniel Bernoulli’s chapter on the kinetic theory of gases,\n",
    "for a review on the context of this chapter see Mikhailov (n.d.). For\n",
    "1738 this is extraordinary thinking. The notion of kinetic theory of\n",
    "gases wouldn’t become fully accepted in Physics until 1908 when a model\n",
    "of Einstein’s was verified by Jean Baptiste Perrin.</i>"
   ],
   "id": "7c9c3d76-2059-4d49-8abe-4e56e8704da0"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Entropy Billiards\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_physics/includes/entropy-billiards.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_physics/includes/entropy-billiards.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "<canvas id=\"multiball-canvas\" width=\"700\" height=\"500\" style=\"border:1px solid black;display:inline;text-align:left \">\n",
    "</canvas>\n",
    "\n",
    "Entropy:\n",
    "\n",
    "<output id=\"multiball-entropy\">\n",
    "</output>\n",
    "\n",
    "<button id=\"multiball-newball\" style=\"text-align:right\">\n",
    "\n",
    "New Ball\n",
    "\n",
    "</button>\n",
    "<button id=\"multiball-pause\" style=\"text-align:right\">\n",
    "\n",
    "Pause\n",
    "\n",
    "</button>\n",
    "<button id=\"multiball-skip\" style=\"text-align:right\">\n",
    "\n",
    "Skip 1000s\n",
    "\n",
    "</button>\n",
    "<button id=\"multiball-histogram\" style=\"text-align:right\">\n",
    "\n",
    "Histogram\n",
    "\n",
    "</button>\n",
    "\n",
    "<script src=\"https://cdn.plot.ly/plotly-latest.min.js\"></script>\n",
    "<script src=\"https://inverseprobability.com/talks/scripts//ballworld/ballworld.js\"></script>\n",
    "<script src=\"https://inverseprobability.com/talks/scripts//ballworld/multiball.js\"></script>\n",
    "\n",
    "Figure: <i>Bernoulli’s simple kinetic models of gases assume that the\n",
    "molecules of air operate like billiard balls.</i>"
   ],
   "id": "305e1c52-b38d-47b0-8097-5a87c6c2838a"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "892a2905-df10-4f27-a0bc-a2777233d97d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = np.random.randn(10000, 1)\n",
    "xlim = [-4, 4]\n",
    "x = np.linspace(xlim[0], xlim[1], 200)\n",
    "y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x*x)"
   ],
   "id": "0c030100-f74e-4f13-a145-10dfd7156570"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "c3b5a956-efca-4b03-a645-abbb4bed70db"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "ax.plot(x, y, 'r', linewidth=3)\n",
    "ax.hist(p, 100, density=True)\n",
    "ax.set_xlim(xlim)\n",
    "\n",
    "mlai.write_figure('gaussian-histogram.svg', directory='./ml')"
   ],
   "id": "d213d1b2-2764-4c3b-b8ba-043be0aaa8c1"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another important figure for Cambridge was the first to derive the\n",
    "probability distribution that results from small balls banging together\n",
    "in this manner. In doing so, James Clerk Maxwell founded the field of\n",
    "statistical physics.\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//ml/gaussian-histogram.svg\" class=\"\" width=\"80%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>James Clerk Maxwell 1831-1879 Derived distribution of\n",
    "velocities of particles in an ideal gas (elastic fluid).</i>\n",
    "\n",
    "<table>\n",
    "<tr>\n",
    "<td width=\"30%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/james-clerk-maxwell.png\" style=\"width:100%\">\n",
    "\n",
    "</td>\n",
    "<td width=\"30%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/boltzmann2.jpg\" style=\"width:100%\">\n",
    "\n",
    "</td>\n",
    "<td width=\"30%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/j-w-gibbs.jpg\" style=\"width:100%\">\n",
    "\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n",
    "\n",
    "Figure: <i>James Clerk Maxwell (1831-1879), Ludwig Boltzmann (1844-1906)\n",
    "Josiah Willard Gibbs (1839-1903)</i>\n",
    "\n",
    "Many of the ideas of early statistical physicists were rejected by a\n",
    "cadre of physicists who didn’t believe in the notion of a molecule. The\n",
    "stress of trying to have his ideas established caused Boltzmann to\n",
    "commit suicide in 1906, only two years before the same ideas became\n",
    "widely accepted."
   ],
   "id": "87291356-08c8-4a9c-bdac-ba6072a69481"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "nu.display_google_book(id='Vuk5AQAAMAAJ', page='PA373')"
   ],
   "id": "69c2892b-2f7b-42e1-9636-1a577f23f85f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Figure: <i>Boltzmann’s paper Boltzmann (n.d.) which introduced the\n",
    "relationship between entropy and probability. A translation with notes\n",
    "is available in Sharp and Matschinsky (2015).</i>\n",
    "\n",
    "The important point about the uncertainty being represented here is that\n",
    "it is not genuine stochasticity, it is a lack of knowledge about the\n",
    "system. The techniques proposed by Maxwell, Boltzmann and Gibbs allow us\n",
    "to exactly represent the state of the system through a set of parameters\n",
    "that represent the sufficient statistics of the physical system. We know\n",
    "these values as the volume, temperature, and pressure. The challenge for\n",
    "us, when approximating the physical world with the techniques we will\n",
    "use is that we will have to sit somewhere between the deterministic and\n",
    "purely stochastic worlds that these different scientists described.\n",
    "\n",
    "One ongoing characteristic of people who study probability and\n",
    "uncertainty is the confidence with which they hold opinions about it.\n",
    "Another leader of the Cavendish laboratory expressed his support of the\n",
    "second law of thermodynamics (which can be proven through the work of\n",
    "Gibbs/Boltzmann) with an emphatic statement at the beginning of his\n",
    "book.\n",
    "\n",
    "<table>\n",
    "<tr>\n",
    "<td width=\"49%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/arthur-stanley-eddington.jpg\" style=\"width:100%\">\n",
    "\n",
    "</td>\n",
    "<td width=\"49%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/natureofphysical00eddi_7.png\" style=\"width:80%\">\n",
    "\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n",
    "\n",
    "Figure: <i>Eddington’s book on the Nature of the Physical World\n",
    "(Eddington, 1929)</i>\n",
    "\n",
    "The same Eddington is also famous for dismissing the ideas of a young\n",
    "Chandrasekhar who had come to Cambridge to study in the Cavendish lab.\n",
    "Chandrasekhar demonstrated the limit at which a star would collapse\n",
    "under its own weight to a singularity, but when he presented the work to\n",
    "Eddington, he was dismissive suggesting that there “must be some natural\n",
    "law that prevents this abomination from happening”.\n",
    "\n",
    "<table>\n",
    "<tr>\n",
    "<td width=\"49%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/natureofphysical00eddi_100.png\" style=\"width:80%\">\n",
    "\n",
    "</td>\n",
    "<td width=\"49%\">\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/ChandraNobel.png\" style=\"width:100%\">\n",
    "\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n",
    "\n",
    "Figure: <i>Chandrasekhar (1910-1995) derived the limit at which a star\n",
    "collapses in on itself. Eddington’s confidence in the 2nd law may have\n",
    "been what drove him to dismiss Chandrasekhar’s ideas, humiliating a\n",
    "young scientist who would later receive a Nobel prize for the work.</i>\n",
    "\n",
    "<img class=\"\" src=\"https://inverseprobability.com/talks/../slides/diagrams//physics/natureofphysical00eddi_100_cropped.png\" style=\"width:60%\">\n",
    "\n",
    "Figure: <i>Eddington makes his feelings about the primacy of the second\n",
    "law clear. This primacy is perhaps because the second law can be\n",
    "demonstrated mathematically, building on the work of Maxwell, Gibbs and\n",
    "Boltzmann. Eddington (1929)</i>\n",
    "\n",
    "Presumably he meant that the creation of a black hole seemed to\n",
    "transgress the second law of thermodynamics, although later Hawking was\n",
    "able to show that blackholes do evaporate, but the time scales at which\n",
    "this evaporation occurs is many orders of magnitude slower than other\n",
    "processes in the universe."
   ],
   "id": "af77ad5f-49eb-487b-8f78-52b59a8cc623"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maxwell’s Demon\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_physics/includes/maxwells-demon.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_physics/includes/maxwells-demon.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "Maxwell’s demon is a thought experiment described by James Clerk Maxwell\n",
    "in his book, *Theory of Heat* (Maxwell, 1871) on page 308.\n",
    "\n",
    "> But if we conceive a being whose faculties are so sharpened that he\n",
    "> can follow every molecule in its course, such a being, whose\n",
    "> attributes are still as essentially finite as our own, would be able\n",
    "> to do what is at present impossible to us. For we have seen that the\n",
    "> molecules in a vessel full of air at uniform temperature are moving\n",
    "> with velocities by no means uniform, though the mean velocity of any\n",
    "> great number of them, arbitrarily selected, is almost exactly uniform.\n",
    "> Now let us suppose that such a vessel is divided into two portions, A\n",
    "> and B, by a division in which there is a small hole, and that a being,\n",
    "> who can see the individual molecules, opens and closes this hole, so\n",
    "> as to allow only the swifter molecules to pass from A to B, and the\n",
    "> only the slower ones to pass from B to A. He will thus, without\n",
    "> expenditure of work, raise the temperature of B and lower that of A,\n",
    "> in contradiction to the second law of thermodynamics.\n",
    ">\n",
    "> James Clerk Maxwell in *Theory of Heat* (Maxwell, 1871) page 308\n",
    "\n",
    "He goes onto say:\n",
    "\n",
    "> This is only one of the instances in which conclusions which we have\n",
    "> draw from our experience of bodies consisting of an immense number of\n",
    "> molecules may be found not to be applicable to the more delicate\n",
    "> observations and experiments which we may suppose made by one who can\n",
    "> perceive and handle the individual molecules which we deal with only\n",
    "> in large masses"
   ],
   "id": "beb8e6f2-2b95-4041-a496-b5022bb520d9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "nu.display_google_book(id='0p8AAAAAMAAJ', page='PA308')"
   ],
   "id": "89a32e29-f450-44a6-8016-1ee16d47dacc"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Figure: <i>Maxwell’s demon was designed to highlight the statistical\n",
    "nature of the second law of thermodynamics.</i>\n",
    "\n",
    "<canvas id=\"maxwell-canvas\" width=\"700\" height=\"500\" style=\"border:1px solid black;display:inline;text-align:left\">\n",
    "</canvas>\n",
    "\n",
    "Entropy:\n",
    "\n",
    "<output id=\"maxwell-entropy\">\n",
    "</output>\n",
    "\n",
    "<button id=\"maxwell-newball\" style=\"text-align:right\">\n",
    "\n",
    "New Ball\n",
    "\n",
    "</button>\n",
    "<button id=\"maxwell-pause\" style=\"text-align:right\">\n",
    "\n",
    "Pause\n",
    "\n",
    "</button>\n",
    "<button id=\"maxwell-skip\" style=\"text-align:right\">\n",
    "\n",
    "Skip 1000s\n",
    "\n",
    "</button>\n",
    "<button id=\"maxwell-histogram\" style=\"text-align:right\">\n",
    "\n",
    "Histogram\n",
    "\n",
    "</button>\n",
    "\n",
    "<script src=\"https://inverseprobability.com/talks/scripts//ballworld/maxwell.js\"></script>\n",
    "\n",
    "Figure: <i>Maxwell’s Demon. The demon decides balls are either cold\n",
    "(blue) or hot (red) according to their velocity. Balls are allowed to\n",
    "pass the green membrane from right to left only if they are cold, and\n",
    "from left to right, only if they are hot.</i>\n",
    "\n",
    "Maxwell’s demon allows us to connect thermodynamics with information\n",
    "theory (see e.g. Hosoya et al. (2015);Hosoya et al. (2011);Bub\n",
    "(2001);Brillouin (1951);Szilard (1929)). The connection arises due to a\n",
    "fundamental connection between information erasure and energy\n",
    "consumption Landauer (1961).\n",
    "\n",
    "Alemi and Fischer (2019)"
   ],
   "id": "c2af0dc0-3bb3-4dce-8f7f-55e48517516d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Information Theory and Thermodynamics\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/information-theory-overview.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/information-theory-overview.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "Information theory provides a mathematical framework for quantifying\n",
    "information. Many of information theory’s core concepts parallel those\n",
    "found in thermodynamics. The theory was developed by Claude Shannon who\n",
    "spoke extensively to MIT’s Norbert Wiener at while it was in development\n",
    "(Conway and Siegelman, 2005). Wiener’s own ideas about information were\n",
    "inspired by Willard Gibbs, one of the pioneers of the mathematical\n",
    "understanding of free energy and entropy. Deep connections between\n",
    "physical systems and information processing have connected information\n",
    "and energy from the start."
   ],
   "id": "02cdb4b8-54ba-4f77-a378-232ef0c82ef0"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Entropy\n",
    "\n",
    "Shannon’s entropy measures the uncertainty or unpredictability of\n",
    "information content. This mathematical formulation is inspired by\n",
    "thermodynamic entropy, which describes the dispersal of energy in\n",
    "physical systems. Both concepts quantify the number of possible states\n",
    "and their probabilities.\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information/maxwell-demon.svg\" class=\"\" width=\"60%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Maxwell’s demon thought experiment illustrates the\n",
    "relationship between information and thermodynamics.</i>\n",
    "\n",
    "In thermodynamics, free energy represents the energy available to do\n",
    "work. A system naturally evolves to minimize its free energy, finding\n",
    "equilibrium between total energy and entropy. Free energy principles are\n",
    "also pervasive in variational methods in machine learning. They emerge\n",
    "from Bayesian approaches to learning and have been heavily promoted by\n",
    "e.g. Karl Friston as a model for the brain.\n",
    "\n",
    "The relationship between entropy and Free Energy can be explored through\n",
    "the Legendre transform. This is most easily reviewed if we restrict\n",
    "ourselves to distributions in the exponential family."
   ],
   "id": "2ddba819-6b7c-436d-8936-c7ad582b9b79"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exponential Family\n",
    "\n",
    "The exponential family has the form $$\n",
    "  \\rho(Z) = h(Z) \\exp\\left(\\boldsymbol{\\theta}^\\top T(Z) + A(\\boldsymbol{\\theta})\\right)\n",
    "$$ where $h(Z)$ is the base measure, $\\boldsymbol{\\theta}$ is the\n",
    "natural parameters, $T(Z)$ is the sufficient statistics and\n",
    "$A(\\boldsymbol{\\theta})$ is the log partition function. Its entropy can\n",
    "be computed as $$\n",
    "  S(Z) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta}A(\\boldsymbol{\\theta}) - E_{\\rho(Z)}\\left[\\log h(Z)\\right],\n",
    "$$ where $E_{\\rho(Z)}[\\cdot]$ is the expectation under the distribution\n",
    "$\\rho(Z)$."
   ],
   "id": "a38580dc-1a75-4274-b247-9893179e5498"
  },
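  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check on this formula, the short sketch below (an\n",
    "assumed example: a Bernoulli distribution with natural parameter\n",
    "$\\theta$, sufficient statistic $T(Z)=Z$ and base measure $h(Z)=1$, so\n",
    "that $A(\\theta) = \\log(1+e^{\\theta})$) compares the entropy computed\n",
    "from the log partition function with the entropy computed directly from\n",
    "the probabilities."
   ],
   "id": "f3a1c2d4-1e2b-4c3d-8a9b-0c1d2e3f4a5b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Sketch: entropy of a Bernoulli via its exponential-family form.\n",
    "# Assumed example: theta is the natural parameter, T(Z) = Z, h(Z) = 1,\n",
    "# so A(theta) = log(1 + exp(theta)) and S = A(theta) - theta * A'(theta).\n",
    "theta = 1.5\n",
    "A = np.log(1 + np.exp(theta))             # log partition function\n",
    "dA = np.exp(theta) / (1 + np.exp(theta))  # A'(theta) = E[T(Z)] = p\n",
    "S_exponential_family = A - theta * dA\n",
    "\n",
    "# Direct computation from the probabilities for comparison.\n",
    "p = dA\n",
    "S_direct = -(p * np.log(p) + (1 - p) * np.log(1 - p))\n",
    "\n",
    "print(f\"Entropy via A(theta): {S_exponential_family:.6f}\")\n",
    "print(f\"Entropy directly:     {S_direct:.6f}\")"
   ],
   "id": "b7c8d9e0-2f3a-4b5c-9d0e-1f2a3b4c5d6e"
  },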
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Available Energy"
   ],
   "id": "291ab979-cd0f-4b45-9c47-cfcaad3a04a3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Work through Measurement\n",
    "\n",
    "In machine learning and Bayesian inference, the Markov blanket is the\n",
    "set of variables that are conditionally independent of the variable of\n",
    "interest given the other variables. To introduce this idea into our\n",
    "information system, we first split the system into two parts, the\n",
    "variables, $X$, and the memory $M$.\n",
    "\n",
    "The variables are the portion of the system that is stochastically\n",
    "evolving over time. The memory is a low entropy partition of the system\n",
    "that will give us knowledge about this evolution.\n",
    "\n",
    "We can now write the joint entropy of the system in terms of the mutual\n",
    "information between the variables and the memory. $$\n",
    "S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M).\n",
    "$$ This gives us the first hint at the connection between information\n",
    "and energy.\n",
    "\n",
    "If $M$ is viewed as a measurement then the change in entropy of the\n",
    "system before and after measurement is given by $S(X|M) - S(X)$ wehich\n",
    "is given by $-I(X;M)$. This is implies that measurement increases the\n",
    "amount of available energy we obtain from the system (Parrondo et al.,\n",
    "2015).\n",
    "\n",
    "The difference in available energy is given by $$\n",
    "\\Delta A = A(X) - A(Z|M) = I(X;M),\n",
    "$$ where we note that the resulting system is no longer in thermodynamic\n",
    "equilibrium due to the low entropy of the memory."
   ],
   "id": "03c4b1df-28b8-4c86-bc9d-ea51e3452f2c"
  },
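  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The decomposition above can be checked numerically. The sketch below\n",
    "(using an arbitrary illustrative joint distribution over a four-state\n",
    "$X$ and a two-state $M$) computes $S(X)$, $S(M)$, $S(X|M)$ and $I(X;M)$\n",
    "and confirms that $S(X,M) = S(X) - I(X;M) + S(M)$."
   ],
   "id": "c1d2e3f4-3a4b-4c5d-8e9f-2a3b4c5d6e7f"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Sketch: verify S(X,M) = S(X) - I(X;M) + S(M) for a small discrete joint.\n",
    "# The joint p(x, m) below is an arbitrary illustrative choice.\n",
    "p_xm = np.array([[0.20, 0.05],\n",
    "                 [0.10, 0.15],\n",
    "                 [0.05, 0.25],\n",
    "                 [0.10, 0.10]])\n",
    "\n",
    "def entropy_bits(p):\n",
    "    p = p[p > 0]\n",
    "    return -(p * np.log2(p)).sum()\n",
    "\n",
    "p_x = p_xm.sum(axis=1)               # marginal over M\n",
    "p_m = p_xm.sum(axis=0)               # marginal over X\n",
    "S_xm = entropy_bits(p_xm.flatten())  # joint entropy S(X, M)\n",
    "S_x = entropy_bits(p_x)\n",
    "S_m = entropy_bits(p_m)\n",
    "S_x_given_m = S_xm - S_m             # chain rule\n",
    "I_xm = S_x - S_x_given_m             # mutual information\n",
    "\n",
    "print(f\"S(X,M) = {S_xm:.4f} bits\")\n",
    "print(f\"S(X) - I(X;M) + S(M) = {S_x - I_xm + S_m:.4f} bits\")\n",
    "print(f\"I(X;M) = {I_xm:.4f} bits\")"
   ],
   "id": "d4e5f6a7-4b5c-4d6e-9f0a-3b4c5d6e7f8a"
  },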
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Animal Game\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/the-animal-game.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/the-animal-game.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The Entropy Game is a framework for understanding efficient uncertainty\n",
    "reduction. To start think of finding the optimal strategy for\n",
    "identifying an unknown entity by asking the minimum number of yes/no\n",
    "questions."
   ],
   "id": "547902f2-3c81-4e1f-b237-cdb8fe1d5e77"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The 20 Questions Paradigm\n",
    "\n",
    "In the game of 20 Questions player one (Alice) thinks of an object,\n",
    "player two (Bob) must identify it by asking at most 20 yes/no questions.\n",
    "The optimal strategy is to divide the possibility space in half with\n",
    "each question. The binary search approach ensures maximum information\n",
    "gain with each inquiry and can access $2^20$ or about a million\n",
    "different objects.\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information/binary-search-tree.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>The optimal strategy in the Entropy Game resembles a binary\n",
    "search, dividing the search space in half with each question.</i>"
   ],
   "id": "36961487-0d4b-4e86-a234-d0d851043134"
  },
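  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the halving idea: with 20 yes/no questions we can\n",
    "distinguish $2^{20}$ objects, and the number of halving questions needed\n",
    "to isolate one of $N$ items is $\\lceil \\log_2 N \\rceil$."
   ],
   "id": "e5f6a7b8-5c6d-4e7f-8a9b-4c5d6e7f8a9b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Sketch: halving the possibility space with each yes/no question.\n",
    "print(f\"Objects distinguishable with 20 questions: 2^20 = {2**20}\")\n",
    "\n",
    "def questions_needed(n_objects):\n",
    "    # Count the halving questions needed to isolate one object.\n",
    "    questions = 0\n",
    "    remaining = n_objects\n",
    "    while remaining > 1:\n",
    "        remaining = int(np.ceil(remaining / 2))  # an answer keeps at most half\n",
    "        questions += 1\n",
    "    return questions\n",
    "\n",
    "for n in [2, 100, 2**20]:\n",
    "    print(f\"{n} objects -> {questions_needed(n)} questions \"\n",
    "          f\"(log2(n) = {np.log2(n):.1f})\")"
   ],
   "id": "f6a7b8c9-6d7e-4f8a-9b0c-5d6e7f8a9b0c"
  },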
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Entropy Reduction and Decisions\n",
    "\n",
    "From an information-theoretic perspective, decisions can be taken in a\n",
    "way that efficiently reduces entropy - our the uncertainty about the\n",
    "state of the world. Each observation or action an intelligent agent\n",
    "takes should maximize expected information gain, optimally reducing\n",
    "uncertainty given available resources.\n",
    "\n",
    "The entropy before the question is $S(X)$. The entropy after the\n",
    "question is $S(X|M)$. The information gain is the difference between the\n",
    "two, $I(X;M) = S(X) - S(X|M)$. Optimal decision making systems maximize\n",
    "this information gain per unit cost."
   ],
   "id": "daa831f7-c62a-4fab-9b97-a9a2483990d3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Thermodynamic Parallels\n",
    "\n",
    "The entropy game connects decision-making to thermodynamics.\n",
    "\n",
    "This perspective suggests a profound connection: intelligence might be\n",
    "understood as a special case of systems that efficiently extract,\n",
    "process, and utilize free energy from their environments, with\n",
    "thermodynamic principles setting fundamental constraints on what’s\n",
    "possible."
   ],
   "id": "ebc9a854-de93-4f6a-a143-ff89942db566"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Information Engines: Intelligence as an Energy-Efficiency\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/intelligence-thermodynamics-connection.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/intelligence-thermodynamics-connection.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The entropy game shows some parallels between thermodynamics and\n",
    "measurement. This allows us to imagine *information engines*, simple\n",
    "systems that convert information to energy. This is our first simple\n",
    "model of intelligence."
   ],
   "id": "ac8e8678-ae03-490d-bb3b-69ee30fffac7"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Measurement as a Thermodynamic Process: Information-Modified Second Law\n",
    "\n",
    "The second law of thermodynamics was generalised to include the effect\n",
    "of measurement by Sagawa and Ueda (Sagawa and Ueda, 2008). They showed\n",
    "that the maximum extractable work from a system can be increased by\n",
    "$k_BTI(X;M)$ where $k_B$ is Boltzmann’s constant, $T$ is temperature and\n",
    "$I(X;M)$ is the information gained by making a measurement, $M$, $$\n",
    "I(X;M) = \\sum_{x,m} \\rho(x,m) \\log \\frac{\\rho(x,m)}{\\rho(x)\\rho(m)},\n",
    "$$ where $\\rho(x,m)$ is the joint probability of the system and\n",
    "measurement (see e.g. eq 14 in Sagawa and Ueda (2008)). This can be\n",
    "written as $$\n",
    "W_\\text{ext} \\leq  - \\Delta\\mathcal{F} + k_BTI(X;M),\n",
    "$$ where $W_\\text{ext}$ is the extractable work and it is upper bounded\n",
    "by the negative change in free energy, $\\Delta \\mathcal{F}$, plus the\n",
    "energy gained from measurement, $k_BTI(X;M)$. This is the\n",
    "information-modified second law.\n",
    "\n",
    "The measurements can be seen as a thermodynamic process. In theory\n",
    "measurement, like computation is reversible. But in practice the process\n",
    "of measurement is likely to erode the free energy somewhat, but as long\n",
    "as the energy gained from information, $kTI(X;M)$ is greater than that\n",
    "spent in measurement the pricess can be thermodynamically efficient.\n",
    "\n",
    "The modified second law shows that the maximum additional extractable\n",
    "work is proportional to the information gained. So information\n",
    "acquisition creates extractable work potential. Thermodynamic\n",
    "consistency is maintained by properly accounting for information-entropy\n",
    "relationships."
   ],
   "id": "d4197405-b45e-4ad0-92f4-cd7c58206e74"
  },
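  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the magnitudes involved, the sketch below (an assumed\n",
    "Szilard-engine-style example: a single fair binary degree of freedom\n",
    "measured at room temperature, with the measurement corrupted by a given\n",
    "flip probability) evaluates the extra extractable work bound\n",
    "$k_BTI(X;M)$."
   ],
   "id": "a7b8c9d0-7e8f-4a9b-8c0d-6e7f8a9b0c1d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "k_B = 1.380649e-23  # Boltzmann's constant, J/K\n",
    "T = 300.0           # temperature, K\n",
    "\n",
    "def binary_entropy_bits(p):\n",
    "    if p in (0.0, 1.0):\n",
    "        return 0.0\n",
    "    return -p*np.log2(p) - (1-p)*np.log2(1-p)\n",
    "\n",
    "# Sketch: M is a copy of a fair binary X, flipped with probability p_flip,\n",
    "# so I(X;M) = 1 - H2(p_flip) bits. Convert to nats for k_B T I(X;M).\n",
    "for p_flip in [0.0, 0.1, 0.25]:\n",
    "    I_bits = 1.0 - binary_entropy_bits(p_flip)\n",
    "    work_bound = k_B * T * I_bits * np.log(2)\n",
    "    print(f\"flip prob {p_flip:4.2f}: I(X;M) = {I_bits:.3f} bits, \"\n",
    "          f\"extra extractable work <= {work_bound:.2e} J\")"
   ],
   "id": "b8c9d0e1-8f9a-4b0c-9d1e-7f8a9b0c1d2e"
  },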
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Efficacy of Feedback Control\n",
    "\n",
    "Sagawa and Ueda extended this relationship to provide a *generalised\n",
    "Jarzynski equality* for feedback processes (Sagawa and Ueda, 2010). The\n",
    "Jarzynski equality is an imporant result from nonequilibrium\n",
    "thermodynamics that relates the average work done across an ensemble to\n",
    "the free energy difference between initial and final states (Jarzynski,\n",
    "1997), $$\n",
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\right\\rangle = \\exp\\left(-\\frac{\\Delta\\mathcal{F}}{k_BT}\\right),\n",
    "$$ where $\\langle W \\rangle$ is the average work done across an ensemble\n",
    "of trajectories, $\\Delta\\mathcal{F}$ is the change in free energy, $k_B$\n",
    "is Boltzmann’s constant, and $\\Delta S$ is the change in entropy. Sagawa\n",
    "and Ueda extended this equality to to include information gain from\n",
    "measurement (Sagawa and Ueda, 2010), $$\n",
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right) \\exp\\left(-\\mathcal{I}(X;M)\\right)\\right\\rangle = 1,\n",
    "$$ where $\\mathcal{I}(X;M) = \\log \\frac{\\rho(X|M)}{\\rho(X)}$ is the\n",
    "information gain from measurement, and the mutual information is\n",
    "recovered $I(X;M) = \\left\\langle \\mathcal{I}(X;M) \\right\\rangle$ as the\n",
    "average information gain.\n",
    "\n",
    "Sagawa and Ueda introduce an *efficacy* term that captures the effect of\n",
    "feedback on the system they note in the presence of feedback, $$\n",
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right)\\right\\rangle = \\gamma,\n",
    "$$ where $\\gamma$ is the efficacy."
   ],
   "id": "fbfa8af9-7276-46eb-bfd8-bc1e1250513a"
  },
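  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original Jarzynski equality can be illustrated numerically. For a\n",
    "Gaussian work distribution with mean $\\mu$ and variance $\\sigma^2$ the\n",
    "equality implies $\\Delta\\mathcal{F} = \\mu - \\sigma^2/(2k_BT)$, so the\n",
    "sketch below (with arbitrary illustrative values, in units where\n",
    "$k_BT=1$) samples work values and checks that\n",
    "$\\langle \\exp(-W/k_BT)\\rangle \\approx \\exp(-\\Delta\\mathcal{F}/k_BT)$."
   ],
   "id": "c9d0e1f2-9a0b-4c1d-8e2f-8a9b0c1d2e3f"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Sketch: numerical check of the Jarzynski equality for Gaussian work.\n",
    "# Units chosen so that k_B T = 1; mu and sigma are illustrative values.\n",
    "kT = 1.0\n",
    "mu, sigma = 2.0, 1.0\n",
    "W = rng.normal(mu, sigma, size=1000000)   # ensemble of work values\n",
    "\n",
    "# For Gaussian work the equality implies Delta F = mu - sigma^2 / (2 kT).\n",
    "delta_F = mu - sigma**2 / (2 * kT)\n",
    "\n",
    "lhs = np.mean(np.exp(-W / kT))\n",
    "rhs = np.exp(-delta_F / kT)\n",
    "print(f\"<exp(-W/kT)>       = {lhs:.4f}\")\n",
    "print(f\"exp(-Delta F / kT) = {rhs:.4f}\")"
   ],
   "id": "d0e1f2a3-0b1c-4d2e-9f3a-9b0c1d2e3f4a"
  },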
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Channel Coding Perspective on Memory\n",
    "\n",
    "When viewing $M$ as an information channel between past and future\n",
    "states, Shannon’s channel coding theorems apply (Shannon, 1948). The\n",
    "channel capacity $C$ represents the maximum rate of reliable information\n",
    "transmission \\[ C = \\_{(M)} I(X_1;M) \\] and for a memory of $n$ bits we\n",
    "have \\[ C n, \\] as the mutual information is upper bounded by the\n",
    "entropy of $\\rho(M)$ which is at most $n$ bits.\n",
    "\n",
    "This relationship seems to align with Ashby’s Law of Requisite Variety\n",
    "(pg 229 Ashby (1952)), which states that a control system must have at\n",
    "least as much ‘variety’ as the system it aims to control. In the context\n",
    "of memory systems, this means that to maintain temporal correlations\n",
    "effectively, the memory’s state space must be at least as large as the\n",
    "information content it needs to preserve. This provides a lower bound on\n",
    "the necessary memory capacity that complements the bound we get from\n",
    "Shannon for channel capacity.\n",
    "\n",
    "This helps determine the required memory size for maintaining temporal\n",
    "correlations, optimal coding strategies, and fundamental limits on\n",
    "temporal correlation preservation."
   ],
   "id": "3f6128ab-38c6-4b88-aefe-71ca51562297"
  },
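  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the bound: treating a single memory bit as a binary\n",
    "symmetric channel between $X_1$ and $M$ (an assumed illustrative model),\n",
    "the mutual information never exceeds one bit and only reaches that bound\n",
    "when the bit is stored noiselessly."
   ],
   "id": "e1f2a3b4-1c2d-4e3f-8a4b-0c1d2e3f4a5b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def binary_entropy_bits(p):\n",
    "    p = np.clip(p, 1e-12, 1 - 1e-12)\n",
    "    return -p*np.log2(p) - (1-p)*np.log2(1-p)\n",
    "\n",
    "# Sketch: a one-bit memory that flips the stored bit with probability f.\n",
    "# With a uniform input the mutual information is 1 - H2(f) <= 1 bit.\n",
    "for f in [0.0, 0.05, 0.2, 0.5]:\n",
    "    I = 1.0 - binary_entropy_bits(f)\n",
    "    print(f\"flip probability {f:4.2f}: I(X_1; M) = {I:.3f} bits (bound: 1 bit)\")"
   ],
   "id": "f2a3b4c5-2d3e-4f4a-9b5c-1d2e3f4a5b6c"
  },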
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Decomposition into Past and Future"
   ],
   "id": "c24b11aa-70a8-40ed-b32a-cde344d02f46"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Approximations and Thermodynamic Efficiency\n",
    "\n",
    "Intelligent systems must balance measurement against energy efficiency\n",
    "and time requirements. A perfect model of the world would require\n",
    "infinite computational resources and speed, so approximations are\n",
    "necessary. This leads to uncertainties. Thermodynamics might be thought\n",
    "of as the physics of uncertainty: at equilibrium thermodynamic systems\n",
    "find thermodynamic states that minimize free energy, equivalent to\n",
    "maximising entropy."
   ],
   "id": "2ab9f10e-e4f1-4d32-b67a-ed5f51344b1f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Markov Blanket\n",
    "\n",
    "To introduce some structure to the model assumption. We split $X$ into\n",
    "$X_0$ and $X_1$. $X_0$ is past and present of the system, $X_1$ is\n",
    "future The conditional mutual information $I(X_0;X_1|M)$ which is zero\n",
    "if $X_1$ and $X_0$ are independent conditioned on $M$."
   ],
   "id": "269c8da3-3740-479e-ad32-cd39f530a139"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## At What Scales Does this Apply?\n",
    "\n",
    "The equipartition theorem tells us that at equilibrium the average\n",
    "energy is $kT/2$ per degree of freedom. This means that for systems that\n",
    "operate at “human scale” the energy involved is many orders of magnitude\n",
    "larger than the amount of information we can store in memory. For a car\n",
    "engine producing 70 kW of power at 370 Kelvin, this implies $$\n",
    "\\frac{2 \\times 70,000}{370 \\times k_B} = \\frac{2 \\times 70,000}{370\\times 1.380649×10^{−23}} = 2.74 × 10^{25} \n",
    "$$ degrees of freedom per second. If we make a conservative assumption\n",
    "of one bit per degree of freedom, then the mutual information we would\n",
    "require in one second for comparative energy production would be around\n",
    "3400 zettabytes, implying a memory bandwidth of around 3,400 zettabytes\n",
    "per second. In 2025 the estimate of all the data in the world stands at\n",
    "149 zettabytes."
   ],
   "id": "002492bd-d325-4267-b4c1-1c9202e89509"
  },
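  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic behind these numbers is collected in the short sketch\n",
    "below (keeping the assumption above of one bit per degree of freedom)."
   ],
   "id": "a3b4c5d6-3e4f-4a5b-8c6d-2e3f4a5b6c7d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "k_B = 1.380649e-23  # Boltzmann's constant, J/K\n",
    "power = 70e3        # engine power, W\n",
    "T = 370.0           # engine temperature, K\n",
    "\n",
    "# Degrees of freedom per second carried at kT/2 of energy each.\n",
    "dof_per_second = 2 * power / (T * k_B)\n",
    "print(f\"Degrees of freedom per second: {dof_per_second:.2e}\")\n",
    "\n",
    "# Assuming one bit per degree of freedom, convert to zettabytes per second.\n",
    "zettabytes_per_second = dof_per_second / 8 / 1e21\n",
    "print(f\"Implied memory bandwidth: {zettabytes_per_second:.0f} zettabytes per second\")"
   ],
   "id": "b4c5d6e7-4f5a-4b6c-9d7e-3f4a5b6c7d8e"
  },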
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Small-Scale Biochemical Systems and Information Processing\n",
    "\n",
    "While macroscopic systems operate in regimes where traditional\n",
    "thermodynamics dominates, microscopic biological systems operate at\n",
    "scales where information and thermal fluctuations become critically\n",
    "important. Here we examine how the framework applies to molecular\n",
    "machines and processes that have evolved to operate efficiently at these\n",
    "scales.\n",
    "\n",
    "Molecular machines like ATP synthase, kinesin motors, and the\n",
    "photosynthetic apparatus can be viewed as sophisticated information\n",
    "engines that convert energy while processing information about their\n",
    "environment. These systems have evolved to exploit thermal fluctuations\n",
    "rather than fight against them, using information processing to extract\n",
    "useful work."
   ],
   "id": "3dee57d9-d91f-41ed-a900-243a3389acdb"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ATP Synthase: Nature’s Rotary Engine\n",
    "\n",
    "ATP synthase functions as a rotary molecular motor that synthesizes ATP\n",
    "from ADP and inorganic phosphate using a proton gradient. The system\n",
    "uses the proton gradient as both an energy source and an information\n",
    "source about the cell’s energetic state and exploits Brownian motion\n",
    "through a ratchet mechanism. It converts information about proton\n",
    "locations into mechanical rotation and ultimately chemical energy with\n",
    "approximately 3-4 protons required per ATP."
   ],
   "id": "4c5a594e-ac50-42cf-aa3a-43f975581110"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.lib.display import YouTubeVideo\n",
    "YouTubeVideo('kXpzp4RDGJI')"
   ],
   "id": "1fb15eea-de43-4e1a-80cf-f65e8f365b0f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Estimates suggest that one synapse firing may require $10^4$ ATP\n",
    "molecules, so around $4 \\times 10^4$ protons. If we take the human brain\n",
    "as containing around $10^{14}$ synapses, and if we suggest each synapse\n",
    "only fires about once every five seconds, we would require approximately\n",
    "$10^{18}$ protons per second to power the synapses in our brain. With\n",
    "each proton having six degrees of freedom. Under these rough\n",
    "calculations the memory capacity distributed across the ATP Synthase in\n",
    "our brain must be of order $6 \\times 10^{18}$ bits per second or 750\n",
    "petabytes of information per second. Of course this memory capacity\n",
    "would be devolved across the billions of neurons within hundreds or\n",
    "thousands of mitochondria that each can contain thousands of ATP\n",
    "synthase molecules. By composition of extremely small systems we can see\n",
    "it’s possible to improve efficiencies in ways that seem very impractical\n",
    "for a car engine.\n",
    "\n",
    "Quick note to clarify, here we’re referring to the information\n",
    "requirements to make our brain more energy efficient in its information\n",
    "processing rather than the information processing capabilities of the\n",
    "neurons themselves!"
   ],
   "id": "6ada6b42-ab7f-45b6-80a5-9cc8288102fe"
  },
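  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rough numbers above can be reproduced with the short sketch below\n",
    "(the per-synapse, per-ATP and firing-rate figures are the\n",
    "order-of-magnitude assumptions stated in the text, which rounds the\n",
    "proton count up to $10^{18}$ per second)."
   ],
   "id": "c5d6e7f8-5a6b-4c7d-8e8f-4a5b6c7d8e9f"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: order-of-magnitude estimate of the implied memory bandwidth of\n",
    "# ATP synthase in the brain, using the assumptions stated in the text.\n",
    "atp_per_firing = 1e4          # ATP molecules per synapse firing\n",
    "protons_per_atp = 4           # protons per ATP (approximately 3-4)\n",
    "synapses = 1e14               # synapses in the human brain\n",
    "firings_per_second = 1 / 5    # each synapse fires about once every five seconds\n",
    "\n",
    "protons_per_second = atp_per_firing * protons_per_atp * synapses * firings_per_second\n",
    "bits_per_second = protons_per_second * 6   # six degrees of freedom per proton\n",
    "petabytes_per_second = bits_per_second / 8 / 1e15\n",
    "\n",
    "print(f\"Protons per second: {protons_per_second:.1e}\")\n",
    "print(f\"Implied memory capacity: {bits_per_second:.1e} bits/s \"\n",
    "      f\"(~{petabytes_per_second:.0f} petabytes per second)\")"
   ],
   "id": "d6e7f8a9-6b7c-4d8e-9f9a-5b6c7d8e9f0a"
  },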
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Jaynes’ World\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "Jaynes’ World is a zero-player game that implements a version of the\n",
    "entropy game. The dynamical system is defined by a distribution,\n",
    "$\\rho(Z)$, over a state space $Z$. The state space is partitioned into\n",
    "observable variables $X$ and memory variables $M$. The memory variables\n",
    "are considered to be in an *information resevoir*, a thermodynamic\n",
    "system that maintains information in an ordered state (see e.g. Barato\n",
    "and Seifert (2014)). The entropy of the whole system is bounded below by\n",
    "0 and above by $N$. So the entropy forms a *compact manifold* with\n",
    "respect to its parameters.\n",
    "\n",
    "Unlike the animal game, where decisions are made by reducing entropy at\n",
    "each step, our system evovles mathematically by maximising the\n",
    "instantaneous entropy production. Conceptually we can think of this as\n",
    "*ascending* the gradient of the entropy, $S(Z)$.\n",
    "\n",
    "In the animal game the questioner starts with maximum uncertainty and\n",
    "targets minimal uncertainty. Jaynes’ world starts with minimal\n",
    "uncertainty and aims for maximum uncertainty.\n",
    "\n",
    "We can phrase this as a thought experiment. Imagine you are in the game,\n",
    "at a given turn. You want to see where the game came from, so you look\n",
    "back across turns. The direction the game came from is now the direction\n",
    "of steepest descent. Regardless of where the game actually started it\n",
    "looks like it started at a minimal entropy configuration that we call\n",
    "the *origin*. Similarly, wherever the game is actually stopped there\n",
    "will nevertheless appear to be an end point we call *end* that will be a\n",
    "configuration of maximal entropy, $N$.\n",
    "\n",
    "This speculation allows us to impose the functional form of our\n",
    "proability distribution. As Jaynes has shown (Jaynes, 1957), the\n",
    "stationary points of a free-form optimisation (minimum or maximum) will\n",
    "place the distribution in the, $\\rho(Z)$ in the *exponential family*, $$\n",
    "\\rho(Z) = h(Z) \\exp(\\boldsymbol{\\theta}^\\top T(Z) - A(\\boldsymbol{\\theta})),\n",
    "$$ where $h(Z)$ is the base measure, $T(Z)$ are sufficient statistics,\n",
    "$A(\\boldsymbol{\\theta})$ is the log-partition function,\n",
    "$\\boldsymbol{\\theta}$ are the *natural parameters* of the distribution.}\n",
    "\n",
    "This constraint to the exponential family is highly convenient as we\n",
    "will rely on it heavily for the dynamics of the game. In particular, by\n",
    "focussing on the *natural parameters* we find that we are optimising\n",
    "within an *information geometry* (Amari, 2016). In exponential family\n",
    "distributions, the entropy gradient is given by, $$\n",
    "\\nabla_{\\boldsymbol{\\theta}}S(Z) = \\mathbf{g} = \\nabla^2_\\boldsymbol{\\theta} A(\\boldsymbol{\\theta}(M))\n",
    "$$ And the Fisher information matrix, $G(\\boldsymbol{\\theta})$, is also\n",
    "the *Hessian* of the manifold, $$\n",
    "G(\\boldsymbol{\\theta}) = \\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}) = \\text{Cov}[T(Z)].\n",
    "$$ Traditionally, when optimising on an information geometry we take\n",
    "*natural gradient* steps, equivalen to a Newton minimisation step, $$\n",
    "\\Delta \\boldsymbol{\\theta} = - G(\\boldsymbol{\\theta})^{-1} \\mathbf{g},\n",
    "$$ but this is not the direction that gives the instantaneious\n",
    "maximisation of the entropy production, instead our gradient step is\n",
    "given by $$\n",
    "\\Delta \\boldsymbol{\\theta} = \\eta \\mathbf{g},\n",
    "$$ where $\\eta$ is a ‘learning rate’."
   ],
   "id": "141ec6f6-f29b-4781-b628-d6f42d1405b4"
  },
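  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The identity\n",
    "$G(\\boldsymbol{\\theta}) = \\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}) = \\text{Cov}[T(Z)]$\n",
    "can be checked numerically. The sketch below (an assumed example: a\n",
    "Bernoulli distribution with natural parameter $\\theta$ and $T(Z)=Z$)\n",
    "compares a finite-difference second derivative of $A$ with the variance\n",
    "of the sufficient statistic."
   ],
   "id": "e7f8a9b0-7c8d-4e9f-8a0b-6c7d8e9f0a1b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Sketch: for a Bernoulli with natural parameter theta and T(Z) = Z,\n",
    "# A(theta) = log(1 + exp(theta)); check A''(theta) = Var[T(Z)] = p(1 - p).\n",
    "def A(theta):\n",
    "    return np.log(1 + np.exp(theta))\n",
    "\n",
    "theta = 0.7\n",
    "h = 1e-4\n",
    "hessian_fd = (A(theta + h) - 2*A(theta) + A(theta - h)) / h**2  # finite-difference A''\n",
    "\n",
    "p = np.exp(theta) / (1 + np.exp(theta))\n",
    "var_T = p * (1 - p)  # Cov[T(Z)] for a Bernoulli\n",
    "\n",
    "print(f\"Finite-difference Hessian of A: {hessian_fd:.6f}\")\n",
    "print(f\"Var[T(Z)] = p(1-p):             {var_T:.6f}\")"
   ],
   "id": "f8a9b0c1-8d9e-4f0a-9b1c-7d8e9f0a1b2c"
  },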
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Markovian Decomposition\n",
    "\n",
    "Now $X$ is further divided into past/present $X_0$ and future $X_1$. The\n",
    "entropy can be decomposed into a Markovian component, where $X_0$ and\n",
    "$X_1$ are conditionally independent given $M$ and a non-Markovian\n",
    "component. The conditional mutual information is $$\n",
    "I(X_0; X_1 | M) = \\sum_{x_0,x_1,m} p(x_0,x_1,m) \\log \\frac{p(x_0,x_1|m)}{p(x_0|m)p(x_1|m)},\n",
    "$$ which measures the remaining dependency between past and future given\n",
    "the memory state. This leads to a key insight about memory capacity:\n",
    "effective information reservoirs must minimize this conditional mutual\n",
    "information while maintaining minimal entropy.\n",
    "\n",
    "When $I(X_0; X_1 | M) = 0$, the system becomes perfectly Markovian - the\n",
    "memory variables capture all dependencies between past and future.\n",
    "However, achieving this perfect Markovianity while maintaining minimal\n",
    "entropy in $M$ will create a fundamental tension that drives an\n",
    "*uncertainty principle*."
   ],
   "id": "2fbf4149-f0d6-4270-b2f3-71e64be8c8d2"
  },
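  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conditional mutual information can be computed directly for a small\n",
    "example. The sketch below (an arbitrary three-variable joint in which\n",
    "$X_1$ depends on $X_0$ and $M$ is a noisy record of $X_0$) shows\n",
    "$I(X_0;X_1|M)$ dropping to zero when the memory captures everything the\n",
    "future depends on."
   ],
   "id": "a9b0c1d2-9e0f-4a1b-8c2d-8e9f0a1b2c3d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def conditional_mutual_information(p_joint):\n",
    "    # I(X0; X1 | M) in bits for a joint array indexed as p[x0, x1, m].\n",
    "    cmi = 0.0\n",
    "    p_m = p_joint.sum(axis=(0, 1))\n",
    "    for m in range(p_joint.shape[2]):\n",
    "        if p_m[m] == 0:\n",
    "            continue\n",
    "        p_given_m = p_joint[:, :, m] / p_m[m]\n",
    "        p_x0 = p_given_m.sum(axis=1)\n",
    "        p_x1 = p_given_m.sum(axis=0)\n",
    "        for i in range(p_given_m.shape[0]):\n",
    "            for j in range(p_given_m.shape[1]):\n",
    "                if p_given_m[i, j] > 0:\n",
    "                    cmi += p_m[m] * p_given_m[i, j] * np.log2(\n",
    "                        p_given_m[i, j] / (p_x0[i] * p_x1[j]))\n",
    "    return cmi\n",
    "\n",
    "def build_joint(flip_prob):\n",
    "    # X0 uniform binary, X1 = X0 with prob 0.9, M = X0 flipped with flip_prob.\n",
    "    p = np.zeros((2, 2, 2))\n",
    "    for x0 in range(2):\n",
    "        for x1 in range(2):\n",
    "            for m in range(2):\n",
    "                p_x1 = 0.9 if x1 == x0 else 0.1\n",
    "                p_m = 1 - flip_prob if m == x0 else flip_prob\n",
    "                p[x0, x1, m] = 0.5 * p_x1 * p_m\n",
    "    return p\n",
    "\n",
    "for flip_prob in [0.0, 0.3]:\n",
    "    cmi = conditional_mutual_information(build_joint(flip_prob))\n",
    "    print(f\"memory flip prob {flip_prob}: I(X0;X1|M) = {cmi:.4f} bits\")"
   ],
   "id": "b0c1d2e3-0f1a-4b2c-9d3e-9f0a1b2c3d4e"
  },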
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## System Evolution\n",
    "\n",
    "We are now in a position to summarise the start state and the end state\n",
    "of our system, as well as to speculate on the nature of the transition\n",
    "between the two states."
   ],
   "id": "a268e609-ec36-4ecb-ba9c-302ebc15d309"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Start State\n",
    "\n",
    "The *origin configuration* is a low entropy state, with value near the\n",
    "lower bound of 0. The information is highly structured, by definition we\n",
    "place all variables in $M$, the information resevoir at this time. The\n",
    "uncertainty principle is present to handle the competeing needs of\n",
    "precision in parameters (giving us the near-singular form for\n",
    "$\\boldsymbol{\\theta}(M)$, and capacity in the information channel that\n",
    "$M$ provides (the capacity $c(\\boldsymbol{\\theta})$ is upper bounded by\n",
    "$S(M)$."
   ],
   "id": "ae8d42e6-cb42-4240-88fc-d1647ed31395"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## End State\n",
    "\n",
    "The *end configuration* is a high entropy state, near the upper bound.\n",
    "Both the minimal entropy and maximal entropy states are revealed by Ed\n",
    "Jaynes’ variational minimisation approach and are in the exponential\n",
    "family. In many cases a version of Zeno’s paradox will arise where the\n",
    "system asymtotes to the final state, taking smaller steps at each time.\n",
    "At this point the system is at equilibrium."
   ],
   "id": "251ce744-ea14-4dc2-9c7a-eb00cde17f45"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histogram Game\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-histogram.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-histogram.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "To illustrate the concept of the Jaynes’ world entropy game we’ll run a\n",
    "simple example using a four bin histogram. The entropy of a four bin\n",
    "histogram can be computed as, $$\n",
    "S(p) = - \\sum_{i=1}^4 p_i \\log_2 p_i.\n",
    "$$"
   ],
   "id": "b4c2c3e1-2f37-4484-a34c-ffa8b1a07ae4"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "c1071dd9-3e54-4e1b-aaf4-048875f1757c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we write some helper code to plot the histogram and compute its\n",
    "entropy."
   ],
   "id": "295e23ef-e7d8-4864-be4a-62a29d1c8b93"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot"
   ],
   "id": "46c2f40b-1267-4953-b2d5-bbe012c267a9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_histogram(ax, p, max_height=None):\n",
    "    heights = p\n",
    "    if max_height is None:\n",
    "        max_height = 1.25*heights.max()\n",
    "    \n",
    "    # Safe entropy calculation that handles zeros\n",
    "    nonzero_p = p[p > 0]  # Filter out zeros\n",
    "    S = - (nonzero_p*np.log2(nonzero_p)).sum()\n",
    "\n",
    "    # Define bin edges\n",
    "    bins = [1, 2, 3, 4, 5]  # Bin edges\n",
    "\n",
    "    # Create the histogram\n",
    "    if ax is None:\n",
    "        fig, ax = plt.subplots(figsize=(6, 4))  # Adjust figure size \n",
    "    ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black') # Use weights for probabilities\n",
    "\n",
    "\n",
    "    # Customize the plot for better slide presentation\n",
    "    ax.set_xlabel(\"Bin\")\n",
    "    ax.set_ylabel(\"Probability\")\n",
    "    ax.set_title(f\"Four Bin Histogram (Entropy {S:.3f})\")\n",
    "    ax.set_xticks(bins[:-1]) # Show correct x ticks\n",
    "    ax.set_ylim(0,max_height) # Set y limit for visual appeal"
   ],
   "id": "62f8ed20-299c-46d3-8ba5-01b1a34fd571"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can compute the entropy of any given histogram."
   ],
   "id": "c2c8e985-423f-47ed-a923-cfdebde0b716"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Define probabilities\n",
    "p = np.zeros(4)\n",
    "p[0] = 4/13\n",
    "p[1] = 3/13\n",
    "p[2] = 3.7/13\n",
    "p[3] = 1 - p.sum()\n",
    "\n",
    "# Safe entropy calculation\n",
    "nonzero_p = p[p > 0]  # Filter out zeros\n",
    "entropy = - (nonzero_p*np.log2(nonzero_p)).sum()\n",
    "print(f\"The entropy of the histogram is {entropy:.3f}.\")"
   ],
   "id": "2eb320c2-f310-4f28-8b6e-9c135abb7239"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "deede616-05d4-4820-92bb-a11cf3269c13"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "fig.tight_layout()\n",
    "plot_histogram(ax, p)\n",
    "ax.set_title(f\"Four Bin Histogram (Entropy {entropy:.3f})\")\n",
    "mlai.write_figure(filename='four-bin-histogram.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "5f7ee398-49cd-4f37-b219-3371d4c8f462"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>The entropy of a four bin histogram.</i>\n",
    "\n",
    "We can play the entropy game by starting with a histogram with all the\n",
    "probability mass in the first bin and then ascending the gradient of the\n",
    "entropy function. To do this we represent the histogram parameters as a\n",
    "vector of length 4,\n",
    "$\\mathbf{ w}{\\lambda} = [\\lambda_1, \\lambda_2, \\lambda_3, \\lambda_4]$\n",
    "and define the histogram probabilities to be\n",
    "$p_i = \\lambda_i^2 / \\sum_{j=1}^4 \\lambda_j^2$."
   ],
   "id": "0b08d3a0-0bdb-49e2-9016-8ac38b88c3cf"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "88401b78-da08-4946-848f-0af1d51b26c5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the entropy function \n",
    "def entropy(lambdas):\n",
    "    p = lambdas**2/(lambdas**2).sum()\n",
    "    \n",
    "    # Safe entropy calculation\n",
    "    nonzero_p = p[p > 0]\n",
    "    nonzero_lambdas = lambdas[p > 0]\n",
    "    return np.log2(np.sum(lambdas**2))-np.sum(nonzero_p * np.log2(nonzero_lambdas**2))\n",
    "\n",
    "# Define the gradient of the entropy function\n",
    "def entropy_gradient(lambdas):\n",
    "    denominator = np.sum(lambdas**2)\n",
    "    p = lambdas**2/denominator\n",
    "    \n",
    "    # Safe log calculation\n",
    "    log_terms = np.zeros_like(lambdas)\n",
    "    nonzero_idx = lambdas != 0\n",
    "    log_terms[nonzero_idx] = np.log2(np.abs(lambdas[nonzero_idx]))\n",
    "    \n",
    "    p_times_lambda_entropy = -2*log_terms/denominator\n",
    "    const = (p*p_times_lambda_entropy).sum()\n",
    "    gradient = 2*lambdas*(p_times_lambda_entropy - const)\n",
    "    return gradient\n",
    "\n",
    "# Numerical gradient check\n",
    "def numerical_gradient(func, lambdas, h=1e-5):\n",
    "    numerical_grad = np.zeros_like(lambdas)\n",
    "    for i in range(len(lambdas)):\n",
    "        temp_lambda_plus = lambdas.copy()\n",
    "        temp_lambda_plus[i] += h\n",
    "        temp_lambda_minus = lambdas.copy()\n",
    "        temp_lambda_minus[i] -= h\n",
    "        numerical_grad[i] = (func(temp_lambda_plus) - func(temp_lambda_minus)) / (2 * h)\n",
    "    return numerical_grad"
   ],
   "id": "6f677fd7-6793-44fd-b009-7e1fab7bafaf"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then ascend the gradeint of the entropy function, starting at a\n",
    "parameter setting where the mass is placed in the first bin, we take\n",
    "$\\lambda_2 = \\lambda_3 = \\lambda_4 = 0.01$ and $\\lambda_1 = 100$.\n",
    "\n",
    "First to check our code we compare our numerical and analytic gradients."
   ],
   "id": "05ccf59f-2b02-490a-b1ad-2f543df05ed4"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "608926af-59b9-49da-9be5-b380f4cbeece"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial parameters (lambda)\n",
    "initial_lambdas = np.array([100, 0.01, 0.01, 0.01])\n",
    "\n",
    "# Gradient check\n",
    "numerical_grad = numerical_gradient(entropy, initial_lambdas)\n",
    "analytical_grad = entropy_gradient(initial_lambdas)\n",
    "print(\"Numerical Gradient:\", numerical_grad)\n",
    "print(\"Analytical Gradient:\", analytical_grad)\n",
    "print(\"Gradient Difference:\", np.linalg.norm(numerical_grad - analytical_grad))  # Check if close to zero"
   ],
   "id": "0f21e560-dbc1-40f1-9d16-9a3ecc9ce38d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can run the steepest ascent algorithm."
   ],
   "id": "4a2e462e-891c-41f8-89d8-0d8ea1aa378c"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "a2deb8f0-da1a-46a9-8422-91c938394836"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Steepest ascent algorithm\n",
    "lambdas = initial_lambdas.copy()\n",
    "\n",
    "learning_rate = 1\n",
    "turns = 15000\n",
    "entropy_values = []\n",
    "lambdas_history = []\n",
    "\n",
    "for _ in range(turns):\n",
    "    grad = entropy_gradient(lambdas)\n",
    "    lambdas += learning_rate * grad # update lambda for steepest ascent\n",
    "    entropy_values.append(entropy(lambdas))\n",
    "    lambdas_history.append(lambdas.copy())"
   ],
   "id": "cc4b3afd-9106-4e9e-90bb-36f3adecacb9"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can plot the histogram at a set of chosen turn numbers to see the\n",
    "progress of the algorithm."
   ],
   "id": "b53cb74b-739a-4860-b42b-937b9b4fb1a1"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "83fa0b3c-f51a-434c-abde-63f72b8577d5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "plot_at = [0, 100, 1000, 2500, 5000, 7500, 10000, 12500, turns-1]\n",
    "for i, iter in enumerate(plot_at):\n",
    "    plot_histogram(ax, lambdas_history[i]**2/(lambdas_history[i]**2).sum(), 1)\n",
    "    # write the figure,\n",
    "    mlai.write_figure(filename=f'four-bin-histogram-turn-{i:02d}.svg', \n",
    "                      directory = './information-game')"
   ],
   "id": "46fc0b27-6044-45d2-a993-5ce288fb73a9"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "from ipywidgets import IntSlider"
   ],
   "id": "567fa2ad-e677-4c34-9ab0-bc2e1ed643d5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nu.display_plots('two_point_sample{sample:0>3}.svg', \n",
    "                            './information-game', \n",
    "                            sample=IntSlider(5, 5, 5, 1))"
   ],
   "id": "62d0b3b6-33d0-45ac-a4b7-bee9500236c8"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### \n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-00.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-02.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-04.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-06.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-08.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Intermediate stages of the histogram entropy game. After 0,\n",
    "1000, 5000, 10000 and 15000 iterations.</i>\n",
    "\n",
    "And we can also plot the changing entropy as a function of the number of\n",
    "game turns."
   ],
   "id": "d5fcdf06-d390-4cf5-b40a-18618886fc26"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "ax.plot(range(turns), entropy_values)\n",
    "ax.set_xlabel(\"turns\")\n",
    "ax.set_ylabel(\"entropy\")\n",
    "ax.set_title(\"Entropy vs. turns (Steepest Ascent)\")\n",
    "mlai.write_figure(filename='four-bin-histogram-entropy-vs-turns.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "7eadd63b-aee6-46bb-b4ce-0f289925a5a2"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-entropy-vs-turns.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Four bin histogram entropy game. The plot shows the\n",
    "increasing entropy against the number of turns across 15000 iterations\n",
    "of gradient ascent.</i>\n",
    "\n",
    "Note that the entropy starts at a saddle point, increaseases rapidly,\n",
    "and the levels off towards the maximum entropy, with the gradient\n",
    "decreasing slowly in the manner of Zeno’s paradox."
   ],
   "id": "548c4bc2-2fed-48a1-b356-3cc158b6805a"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Two-Bin Histogram Example\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/two-bin-example.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/two-bin-example.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The simplest possible example of Jaynes’ World is a two-bin histogram\n",
    "with probabilities $p$ and $1-p$. This minimal system allows us to\n",
    "visualize the entire entropy landscape.\n",
    "\n",
    "The natural parameter is the log odds, $\\theta = \\log\\frac{p}{1-p}$, and\n",
    "the update given by the entropy gradient is $$\n",
    "\\Delta \\theta_{\\text{steepest}} = \\eta \\frac{\\text{d}S}{\\text{d}\\theta} = \\eta p(1-p)(\\log(1-p) - \\log p).\n",
    "$$ The Fisher information is $$\n",
    "G(\\theta) = p(1-p)\n",
    "$$ This creates a dynamic where as $p$ approaches either 0 or 1 (minimal\n",
    "entropy states), the Fisher information approaches zero, creating a\n",
    "critical slowing” effect. This critical slowing is what leads to the\n",
    "formation of *information resevoirs*. Note also that in the *natural\n",
    "gradient* the updated is given by multiplying the gradient by the\n",
    "inverse Fisher information, which would lead to a more efficient update\n",
    "of the form, $$\n",
    "\\Delta \\theta_{\\text{natural}} =  \\eta(\\log(1-p) - \\log p),\n",
    "$$ however, it is precisely this efficiency that we want our game to\n",
    "avoid, because it is the inefficient behaviour in the reagion of saddle\n",
    "points that leads to critical slowing and the emergence of information\n",
    "resevoirs."
   ],
   "id": "7127738a-e6b9-463c-a848-b8597aef316d"
  },
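  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the contrast concrete, the short sketch below (an illustrative\n",
    "addition, not part of the original snippet) iterates both update rules\n",
    "for the two-bin system. The learning rate `eta`, the starting log odds\n",
    "and the number of steps are arbitrary choices for illustration."
   ],
   "id": "3a7c1f62-5b2e-4d2a-9f1e-8c4b6d0a2e11"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sigmoid(theta):\n",
    "    # p as a function of the natural parameter theta (the log odds)\n",
    "    return 1.0 / (1.0 + np.exp(-theta))\n",
    "\n",
    "def two_bin_updates(theta0=-5.0, eta=0.1, steps=200):\n",
    "    \"\"\"Compare steepest ascent and natural gradient for the two-bin entropy.\"\"\"\n",
    "    theta_s, theta_n = theta0, theta0\n",
    "    p_steepest, p_natural = [], []\n",
    "    for _ in range(steps):\n",
    "        p_s, p_n = sigmoid(theta_s), sigmoid(theta_n)\n",
    "        # steepest ascent: gradient includes the Fisher information factor p(1-p)\n",
    "        theta_s += eta * p_s * (1 - p_s) * (np.log(1 - p_s) - np.log(p_s))\n",
    "        # natural gradient: the Fisher information cancels, leaving log(1-p) - log p\n",
    "        theta_n += eta * (np.log(1 - p_n) - np.log(p_n))\n",
    "        p_steepest.append(sigmoid(theta_s))\n",
    "        p_natural.append(sigmoid(theta_n))\n",
    "    return np.array(p_steepest), np.array(p_natural)\n",
    "\n",
    "p_steepest, p_natural = two_bin_updates()\n",
    "print('p after steepest ascent:', p_steepest[-1])\n",
    "print('p after natural gradient:', p_natural[-1])"
   ],
   "id": "9d4f2c31-0a6e-4b7d-8c52-1e3f5a7b9c02"
  },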
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "468edc43-eafe-46e8-9c59-6cc17e59f3fa"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Python code for gradients\n",
    "p_values = np.linspace(0.000001, 0.999999, 10000)\n",
    "theta_values = np.log(p_values/(1-p_values))\n",
    "entropy = -p_values * np.log(p_values) - (1-p_values) * np.log(1-p_values)\n",
    "fisher_info = p_values * (1-p_values)\n",
    "gradient = fisher_info * (np.log(1-p_values) - np.log(p_values))"
   ],
   "id": "53759037-f134-445d-887d-29eeaefeccfb"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "020bc653-7ccd-4965-b498-15c30e2f0d72"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=plot.big_wide_figsize)\n",
    "\n",
    "ax1.plot(theta_values, entropy)\n",
    "ax1.set_xlabel('$\\\\theta$')\n",
    "ax1.set_ylabel('Entropy $S(p)$')\n",
    "ax1.set_title('Entropy Landscape')\n",
    "\n",
    "ax2.plot(theta_values, gradient)\n",
    "ax2.set_xlabel('$\\\\theta$')\n",
    "ax2.set_ylabel('$\\\\nabla_\\\\theta S(p)$')\n",
    "ax2.set_title('Entropy Gradient vs. Position')\n",
    "\n",
    "mlai.write_figure(filename='two-bin-histogram-entropy-gradients.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "fa29dd9c-4360-4b8a-9642-52902698b273"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/two-bin-histogram-entropy-gradients.svg\" class=\"\" width=\"95%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Entropy gradients of the two bin histogram agains\n",
    "position.</i>\n",
    "\n",
    "This simple example reveals the entropy extrema at $p = 0$, $p = 0.5$,\n",
    "and $p = 1$. At minimal entropy ($p \\approx 0$ or $p \\approx 1$), the\n",
    "gradient approaches zero, creating natural information reservoirs. The\n",
    "dynamics slow dramatically near these points - these are the areas of\n",
    "critical slowing that create information reservoirs."
   ],
   "id": "2cfb3e9e-f5ce-4de9-801a-4e5ad3c1990c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Four-Bin Saddle Point Example\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-saddle-example.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-saddle-example.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "To illustrate saddle points and information reservoirs, we need at least\n",
    "a 4-bin system. This creates a 3-dimensional parameter space where we\n",
    "can observe genuine saddle points.\n",
    "\n",
    "Consider a 4-bin system parameterized by natural parameters $\\theta_1$,\n",
    "$\\theta_2$, and $\\theta_3$ (with one constraint). A saddle point occurs\n",
    "where the gradient $\\nabla_\\theta S = 0$, but the Hessian has mixed\n",
    "eigenvalues - some positive, some negative.\n",
    "\n",
    "At these points, the Fisher information matrix $G(\\theta)$\n",
    "eigendecomposition reveals.\n",
    "\n",
    "-   Fast modes: large positive eigenvalues → rapid evolution\n",
    "-   Slow modes: small positive eigenvalues → gradual evolution\n",
    "-   Critical modes: near-zero eigenvalues → information reservoirs\n",
    "\n",
    "The eigenvectors of $G(\\theta)$ at the saddle point determine which\n",
    "parameter combinations form information reservoirs."
   ],
   "id": "bf2ad10c-ecbb-459d-826c-2f060dba9e92"
  },
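  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before visualising the entropy surface, a short sketch (an illustrative\n",
    "addition, using a simpler parameterisation with one natural parameter\n",
    "per bin rather than the constrained form below) computes the Fisher\n",
    "information of the 4-bin distribution as the covariance of the indicator\n",
    "sufficient statistics, $G = \\mathrm{diag}(p) - pp^\\top$, and inspects\n",
    "its eigenvalues at an almost deterministic setting where critically\n",
    "slowed modes appear."
   ],
   "id": "5c8e9a14-7d3b-42f6-9b21-0a6e4d8c3f55"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def four_bin_probabilities(theta):\n",
    "    \"\"\"Probabilities of a 4-bin exponential family from natural parameters.\"\"\"\n",
    "    log_p = theta - np.max(theta)  # subtract max for numerical stability\n",
    "    p = np.exp(log_p)\n",
    "    return p / p.sum()\n",
    "\n",
    "def fisher_information(theta):\n",
    "    \"\"\"Fisher information as the covariance of the indicator statistics.\"\"\"\n",
    "    p = four_bin_probabilities(theta)\n",
    "    return np.diag(p) - np.outer(p, p)\n",
    "\n",
    "# near-deterministic setting: almost all mass in the first bin\n",
    "theta = np.array([5.0, 0.0, 0.0, 0.0])\n",
    "G = fisher_information(theta)\n",
    "eigenvalues, eigenvectors = np.linalg.eigh(G)\n",
    "\n",
    "print('Probabilities:', four_bin_probabilities(theta))\n",
    "print('Eigenvalues of G:', eigenvalues)"
   ],
   "id": "b07d4e29-6a1c-48f3-8e57-2c9f1d5a7b84"
  },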
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "74edf7a8-44be-402d-80b7-32bf0b913a4d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exponential family entropy with saddle point\n",
    "def exponential_family_entropy(theta1, theta2, theta3=None):\n",
    "    \"\"\"\n",
    "    Compute entropy of a 4-bin exponential family distribution\n",
    "    parameterized by natural parameters theta1, theta2, theta3\n",
    "    (with the constraint that probabilities sum to 1)\n",
    "    \"\"\"\n",
    "    # If theta3 is not provided, we'll use a function of theta1 and theta2\n",
    "    if theta3 is None:\n",
    "        theta3 = -0.5 * (theta1 + theta2)\n",
    "    \n",
    "    # Compute the log-partition function (normalization constant)\n",
    "    theta4 = -(theta1 + theta2 + theta3)  # Constraint\n",
    "    log_Z = np.log(np.exp(theta1) + np.exp(theta2) + np.exp(theta3) + np.exp(theta4))\n",
    "    \n",
    "    # Compute probabilities\n",
    "    p1 = np.exp(theta1 - log_Z)\n",
    "    p2 = np.exp(theta2 - log_Z)\n",
    "    p3 = np.exp(theta3 - log_Z)\n",
    "    p4 = np.exp(theta4 - log_Z)\n",
    "    \n",
    "    # Compute entropy: -sum(p_i * log(p_i))\n",
    "    entropy = -np.sum(\n",
    "        np.array([p1, p2, p3, p4]) * \n",
    "        np.log(np.array([p1, p2, p3, p4])), \n",
    "        axis=0, where=np.array([p1, p2, p3, p4])>0\n",
    "    )\n",
    "    \n",
    "    return entropy\n",
    "\n",
    "def entropy_gradient(theta1, theta2, theta3=None):\n",
    "    \"\"\"\n",
    "    Compute the gradient of the entropy with respect to theta1 and theta2\n",
    "    \"\"\"\n",
    "    # If theta3 is not provided, we'll use a function of theta1 and theta2\n",
    "    if theta3 is None:\n",
    "        theta3 = -0.5 * (theta1 + theta2)\n",
    "    \n",
    "    # Compute the log-partition function\n",
    "    theta4 = -(theta1 + theta2 + theta3)  # Constraint\n",
    "    log_Z = np.log(np.exp(theta1) + np.exp(theta2) + np.exp(theta3) + np.exp(theta4))\n",
    "    \n",
    "    # Compute probabilities\n",
    "    p1 = np.exp(theta1 - log_Z)\n",
    "    p2 = np.exp(theta2 - log_Z)\n",
    "    p3 = np.exp(theta3 - log_Z)\n",
    "    p4 = np.exp(theta4 - log_Z)\n",
    "    \n",
    "    # For the gradient, we need to account for the constraint on theta3\n",
    "    # When theta3 = -0.5(theta1 + theta2), we have:\n",
    "    # theta4 = -(theta1 + theta2 + theta3) = -(theta1 + theta2 - 0.5(theta1 + theta2)) = -0.5(theta1 + theta2)\n",
    "    \n",
    "    # Gradient components with chain rule applied\n",
    "    # For theta1: ∂S/∂theta1 + ∂S/∂theta3 * ∂theta3/∂theta1 + ∂S/∂theta4 * ∂theta4/∂theta1\n",
    "    grad_theta1 = (p1 * (np.log(p1) + 1)) - 0.5 * (p3 * (np.log(p3) + 1)) - 0.5 * (p4 * (np.log(p4) + 1))\n",
    "    \n",
    "    # For theta2: ∂S/∂theta2 + ∂S/∂theta3 * ∂theta3/∂theta2 + ∂S/∂theta4 * ∂theta4/∂theta2\n",
    "    grad_theta2 = (p2 * (np.log(p2) + 1)) - 0.5 * (p3 * (np.log(p3) + 1)) - 0.5 * (p4 * (np.log(p4) + 1))\n",
    "    \n",
    "    return grad_theta1, grad_theta2\n",
    "\n",
    "# Create a grid of points\n",
    "x = np.linspace(-2, 2, 100)\n",
    "y = np.linspace(-2, 2, 100)\n",
    "X, Y = np.meshgrid(x, y)\n",
    "\n",
    "# Compute entropy and its gradient at each point\n",
    "Z = exponential_family_entropy(X, Y)\n",
    "dX, dY = entropy_gradient(X, Y)\n",
    "\n",
    "# Normalize gradient vectors for better visualization\n",
    "norm = np.sqrt(dX**2 + dY**2)\n",
    "# Avoid division by zero\n",
    "norm = np.where(norm < 1e-10, 1e-10, norm)\n",
    "dX_norm = dX / norm\n",
    "dY_norm = dY / norm\n",
    "\n",
    "# A few gradient vectors for visualization\n",
    "stride = 10"
   ],
   "id": "6e8e0ccc-699e-49e8-af85-86858d9a283b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "50852d79-d9a6-433b-bc43-dd766a8d6c88"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=plot.big_wide_figsize)\n",
    "\n",
    "# Create contour lines only (no filled contours)\n",
    "contours = plt.contour(X, Y, Z, levels=15, colors='black', linewidths=0.8)\n",
    "plt.clabel(contours, inline=True, fontsize=8, fmt='%.2f')\n",
    "\n",
    "# Add gradient vectors (normalized for direction, but scaled by magnitude for visibility)\n",
    "# Note: We're using the negative of the gradient to point in direction of increasing entropy\n",
    "plt.quiver(X[::stride, ::stride], Y[::stride, ::stride], \n",
    "           -dX_norm[::stride, ::stride], -dY_norm[::stride, ::stride], \n",
    "           color='r', scale=30, width=0.003, scale_units='width')\n",
    "\n",
    "# Add labels and title\n",
    "plt.xlabel('$\\\\theta_1$')\n",
    "plt.ylabel('$\\\\theta_2$')\n",
    "plt.title('Entropy Contours with Gradient Field')\n",
    "\n",
    "# Mark the saddle point (approximately at origin for this system)\n",
    "plt.scatter([0], [0], color='yellow', s=100, marker='*', \n",
    "            edgecolor='black', zorder=10, label='Saddle Point')\n",
    "plt.legend()\n",
    "\n",
    "mlai.write_figure(filename='simplified-saddle-point-example.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "3f3bf73a-7543-451e-88e3-2dd9cbf743ca"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/simplified-saddle-point-example.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Visualisation of a saddle point projected down to two\n",
    "dimensions.</i>\n",
    "\n",
    "The animation of system evolution would show initial rapid movement\n",
    "along high-eigenvalue directions, progressive slowing in directions with\n",
    "low eigenvalues and formation of information reservoirs in the\n",
    "critically slowed directions. Parameter-capacity uncertainty emerges\n",
    "naturally at the saddle point."
   ],
   "id": "38b5d1dd-bffa-4040-bcff-e2e8d67f93e9"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saddle Points\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-saddle-points.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-saddle-points.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "Saddle points represent critical transitions in the game’s evolution\n",
    "where the gradient $\\nabla_{\\boldsymbol{\\theta}}S \\approx 0$ but the\n",
    "game is not at a maximum or minimum. At these points.\n",
    "\n",
    "1.  The Fisher information matrix $G(\\boldsymbol{\\theta})$ has\n",
    "    eigenvalues with significantly different magnitudes\n",
    "2.  Some eigenvalues approach zero, creating “critically slowed”\n",
    "    directions in parameter space\n",
    "3.  Other eigenvalues remain large, allowing rapid evolution in certain\n",
    "    directions\n",
    "\n",
    "This creates a natural separation between “memory” variables (associated\n",
    "with near-zero eigenvalues) and “processing” variables (associated with\n",
    "large eigenvalues). The game’s behavior becomes highly non-isotropic in\n",
    "parameter space.\n",
    "\n",
    "At saddle points, direct gradient ascent stalls, and the game must\n",
    "leverage the Fourier duality between parameters and capacity variables\n",
    "to continue entropy production. The duality relationship $$\n",
    "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)]\n",
    "$$ allows the game to progress by temporarily increasing uncertainty in\n",
    "capacity space, which creates gradients in previously flat directions of\n",
    "parameter space.\n",
    "\n",
    "These saddle points often coincide with phase transitions between\n",
    "parameter-dominated and capacity-dominated regimes, where the game’s\n",
    "fundamental character changes in terms of information processing\n",
    "capabilities."
   ],
   "id": "928a3cd2-c6be-45d0-b761-e0407e3fb952"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saddle Point Seeking Behaviour\n",
    "\n",
    "In the game’s evolution, we follow steepest ascent in parameter space to\n",
    "maximize entropy. Let’s contrast with the *natural gradient* approach\n",
    "that is often used in information geometry.\n",
    "\n",
    "The steepest ascent direction in Euclidean space is given by, $$\n",
    "\\Delta \\boldsymbol{\\theta}_{\\text{steepest}} = \\eta \\nabla_{\\boldsymbol{\\theta}} S = \\eta \\mathbf{g}\n",
    "$$ where $\\eta$ is a learning rate and $\\mathbf{g}$ is the entropy\n",
    "gradient.\n",
    "\n",
    "In contrast, the natural gradient adjusts the update direction according\n",
    "to the Fisher information geometry, $$\n",
    "\\Delta \\boldsymbol{\\theta}_{\\text{natural}} = \\eta G(\\boldsymbol{\\theta})^{-1} \\nabla_{\\boldsymbol{\\theta}} S = \\eta G(\\boldsymbol{\\theta})^{-1} \\mathbf{g}\n",
    "$$ where $G(\\boldsymbol{\\theta})$ is the Fisher information matrix. This\n",
    "represents a Newton step in the natural parameter space. Often the\n",
    "Newton step is difficult to compute, but for exponential families and\n",
    "their entropies the Fisher information has a form closely related to the\n",
    "gradients and would be easy to leverage. The game *explicitly* uses\n",
    "steepest ascent and this leads to very different behaviour, in\n",
    "particular near saddle points. In this regime\n",
    "\n",
    "1.  *Steepest ascent* slows dramatically in directions where the\n",
    "    gradient is small, leading to extremely slow progress along the\n",
    "    critically slowed modes. This actually helps the game by preserving\n",
    "    information in these modes while allowing continued evolution in\n",
    "    other directions.\n",
    "\n",
    "2.  *Natural gradient* would normalize the updates by the Fisher\n",
    "    information, potentially accelerating progress in critically slowed\n",
    "    directions. This would destroy the natural emergence of information\n",
    "    reservoirs that we desire.\n",
    "\n",
    "The use of steepest ascent rather than natural gradient is deliberate in\n",
    "our game. It allows the Fisher information matrix’s eigenvalue structure\n",
    "to directly influence the temporal dynamics, creating a natural\n",
    "separation of timescales that preserves information in critically slowed\n",
    "modes while allowing rapid evolution in others.\n",
    "\n",
    "As the game approaches a saddle point\n",
    "\n",
    "1.  The gradient $\\nabla_{\\boldsymbol{\\theta}} S$ approaches zero in\n",
    "    some directions but remains non-zero in others\n",
    "\n",
    "2.  The eigendecomposition of the Fisher information matrix\n",
    "    $G(\\boldsymbol{\\theta}) = V \\Lambda V^T$ reveals which directions\n",
    "    are critically slowed\n",
    "\n",
    "3.  Update magnitudes in different directions become proportional to\n",
    "    their corresponding eigenvalues\n",
    "\n",
    "4.  This creates the hierarchical timescale separation that forms the\n",
    "    basis of our memory structure\n",
    "\n",
    "This behavior creates a computational architecture where different\n",
    "variables naturally assume different functional roles based on their\n",
    "update dynamics, without requiring explicit design. The information\n",
    "geometry of the parameter space, combined with steepest ascent dynamics,\n",
    "self-organizes the game into memory and processing components."
   ],
   "id": "053d5227-6b95-4fd2-b6ac-d30fb5e302fd"
  },
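  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of this timescale separation, the sketch below (an\n",
    "addition with made-up numbers, not derived from the text) eigendecomposes\n",
    "a Fisher information matrix with widely separated eigenvalues and\n",
    "compares the per-direction step sizes implied by steepest ascent and by\n",
    "the natural gradient."
   ],
   "id": "e4f7a0d3-91c2-4b5e-8f06-3d2a6c8b4e17"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# illustrative Fisher information matrix with widely separated eigenvalues\n",
    "G = np.diag([1.0, 1e-2, 1e-6])\n",
    "# illustrative entropy gradient, expressed in the eigenbasis of G\n",
    "g = np.array([0.5, 0.5, 0.5])\n",
    "\n",
    "eigenvalues, eigenvectors = np.linalg.eigh(G)\n",
    "\n",
    "eta = 0.1\n",
    "steepest_step = eta * g                     # proportional to the raw gradient\n",
    "natural_step = eta * np.linalg.solve(G, g)  # rescaled by the inverse Fisher information\n",
    "\n",
    "print('Eigenvalues of G:', eigenvalues)\n",
    "print('Steepest ascent step sizes:', steepest_step)\n",
    "print('Natural gradient step sizes:', natural_step)"
   ],
   "id": "1f6b8c52-3e9d-4a70-b4c8-7a5d2e0f9c36"
  },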
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient Flow and Least Action Principles\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/gradient-flow-least-action.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/gradient-flow-least-action.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The steepest ascent dynamics in our system naturally connect to least\n",
    "action principles in physics. We can demonstrate this connection through\n",
    "a simple visualisation of gradient flows.\n",
    "\n",
    "For our entropy game, we can define an information-theoretic action, $$\n",
    "\\mathcal{A}[\\gamma] = \\int_0^T \\left(\\frac{1}{2}\\dot{\\boldsymbol{\\theta}}^\\top G(\\boldsymbol{\\theta}) \\dot{\\boldsymbol{\\theta}} - S(\\boldsymbol{\\theta})\\right) \\text{d}t\n",
    "$$ where $\\gamma$ is a path through parameter space,\n",
    "$G(\\boldsymbol{\\theta})$ is the Fisher information matrix, and\n",
    "$S(\\boldsymbol{\\theta})$ is the entropy. The least action principle\n",
    "states that the system will follow paths that extremise this action.\n",
    "\n",
    "This is also what our steepest ascent dynamics produce: the system\n",
    "follows geodesics in the information geometry while maximizing entropy\n",
    "production. As the system evolves, it naturally creates information\n",
    "reservoirs in directions where the gradient is small but non-zero."
   ],
   "id": "6362b24f-993d-4e06-8b24-3b2d86042b07"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "62976203-8fce-4c23-95ef-efcc2325d015"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a potential energy landscape based on bivariate Gaussian entropy\n",
    "def potential(x, y):\n",
    "    # Interpret x as sqrt-precision parameter and y as correlation parameter\n",
    "    # Constrain y to be between -1 and 1 (valid correlation)\n",
    "    y_corr = np.tanh(y)\n",
    "    \n",
    "    # Construct precision and covariance matrices\n",
    "    precision = x**2  # x is sqrt-precision\n",
    "    variance = 1/precision\n",
    "    covariance = y_corr * variance\n",
    "    \n",
    "    # Covariance matrix\n",
    "    Sigma = np.array([[variance, covariance], \n",
    "                      [covariance, variance]])\n",
    "    \n",
    "    # Entropy of bivariate Gaussian\n",
    "    det_Sigma = variance**2 * (1 - y_corr**2)\n",
    "    entropy = 0.5 * np.log((2 * np.pi * np.e)**2 * det_Sigma)\n",
    "    \n",
    "    return entropy\n",
    "\n",
    "# Create gradient vector field for the Gaussian entropy\n",
    "def gradient(x, y):\n",
    "    # Small delta for numerical gradient\n",
    "    delta = 1e-6\n",
    "    \n",
    "    # Compute numerical gradient\n",
    "    dx = (potential(x + delta, y) - potential(x - delta, y)) / (2 * delta)\n",
    "    dy = (potential(x, y + delta) - potential(x, y - delta)) / (2 * delta)\n",
    "    \n",
    "    return dx, dy\n",
    "\n",
    "# Simulate and plot a particle path following gradient\n",
    "def simulate_path(start_x, start_y, steps=100000, dt=0.00001):\n",
    "    path_x, path_y = [start_x], [start_y]\n",
    "    x, y = start_x, start_y\n",
    "    for _ in range(steps):\n",
    "        dx_val, dy_val = gradient(x, y)\n",
    "        x += dx_val * dt\n",
    "        y += dy_val * dt\n",
    "        path_x.append(x)\n",
    "        path_y.append(y)\n",
    "    return path_x, path_y"
   ],
   "id": "8e26abe2-1f95-450e-be91-ebd84f1bcdcf"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualizing gradient flow and least action path\n",
    "\n",
    "# Create grid\n",
    "x = np.linspace(-3, 4, 100)\n",
    "y = np.linspace(-3, 4, 100)\n",
    "X, Y = np.meshgrid(x, y)\n",
    "Z = potential(X, Y)\n",
    "\n",
    "# Calculate gradient field\n",
    "dx, dy = gradient(X, Y)\n",
    "magnitude = np.sqrt(dx**2 + dy**2)\n",
    "path_x, path_y = simulate_path(2, 3)"
   ],
   "id": "2de1b144-3d2a-4614-9263-3604fddb7bd2"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai\n",
    "from matplotlib.colors import LogNorm"
   ],
   "id": "0122ad3c-3c91-47a4-af38-c3f7b25a7796"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Create the figure\n",
    "fig, ax = plt.subplots(figsize=(10, 8))\n",
    "\n",
    "# Plot potential as contour lines only (not filled)\n",
    "contour = ax.contour(X, Y, Z, levels=15, colors='black', alpha=0.7, linewidths=0.8)\n",
    "ax.clabel(contour, inline=True, fontsize=8)  # Add labels to contour lines\n",
    "\n",
    "# Plot gradient field\n",
    "stride = 5\n",
    "ax.quiver(X[::stride, ::stride], Y[::stride, ::stride], \n",
    "          dx[::stride, ::stride]/magnitude[::stride, ::stride], \n",
    "          dy[::stride, ::stride]/magnitude[::stride, ::stride],\n",
    "          magnitude[::stride, ::stride],\n",
    "          cmap='autumn', scale=25, width=0.002)\n",
    "\n",
    "# Plot path\n",
    "ax.plot(path_x, path_y, 'r-', linewidth=2, label='Least action path')\n",
    "\n",
    "ax.set_xlabel('$\\\\theta_1$')\n",
    "ax.set_ylabel('$\\\\theta_2$')\n",
    "ax.set_title('Gradient Flow and Least Action Path')\n",
    "ax.legend()\n",
    "ax.set_aspect('equal')\n",
    "\n",
    "mlai.write_figure(filename='gradient-flow-least-action.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "9e2bec61-1f87-488b-bdbc-c4424948f41b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/gradient-flow-least-action.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Visualisation of the gradient flow and least action path.</i>\n",
    "\n",
    "The visualization shows how a system following the entropy gradient\n",
    "traces a path of least action through the parameter space. This\n",
    "connection between steepest ascent and least action comes because\n",
    "entropy maximization and free energy minimization are dual views of the\n",
    "same underlying principle.\n",
    "\n",
    "At points where the gradient becomes small (near critical points), the\n",
    "system exhibits critical slowing, and information reservoirs naturally\n",
    "form. These are the regions where variables, $X$, become information\n",
    "reservoirs and effective parameters, $M$, that control system behaviour."
   ],
   "id": "21af327f-d3eb-459a-8178-3d8b0e442928"
  },
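  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check of this connection, the sketch below (an illustrative\n",
    "addition) evaluates a discretised version of the action along the\n",
    "simulated gradient-flow path. For simplicity it replaces the Fisher\n",
    "information matrix by the Euclidean metric, so it is only qualitative;\n",
    "it reuses `potential`, `path_x` and `path_y` from the cells above."
   ],
   "id": "7a92d3c8-0b4f-4e61-9d27-5c8e1f0a6b43"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def discretised_action(path_x, path_y, dt=0.00001):\n",
    "    \"\"\"Approximate the information action along a discretised path.\n",
    "\n",
    "    The Fisher information matrix is replaced by the Euclidean metric,\n",
    "    so this is a qualitative illustration of the least action connection.\n",
    "    \"\"\"\n",
    "    x = np.asarray(path_x)\n",
    "    y = np.asarray(path_y)\n",
    "    # finite-difference velocities along the path\n",
    "    dx = np.diff(x) / dt\n",
    "    dy = np.diff(y) / dt\n",
    "    kinetic = 0.5 * (dx**2 + dy**2)\n",
    "    # entropy evaluated at the start of each segment\n",
    "    entropy_term = potential(x[:-1], y[:-1])\n",
    "    return np.sum(kinetic - entropy_term) * dt\n",
    "\n",
    "print('Approximate action along the gradient-flow path:',\n",
    "      discretised_action(path_x, path_y))"
   ],
   "id": "c1d5e7f9-2a84-4b63-90e5-8f7a3b6d2c19"
  },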
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uncertainty Principle\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-uncertainty-principle.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-uncertainty-principle.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "One challenge is how to parameterise our exponential family. We’ve\n",
    "mentioned that the variables $Z$ are partitioned into observable\n",
    "variables $X$ and memory variables $M$. Given the minimal entropy\n",
    "initial state, the obvious initial choice is that at the origin all\n",
    "variables, $Z$, should be in the information reservoir, $M$. This\n",
    "implies that they are well determined and present a sensible choice for\n",
    "the source of our parameters.\n",
    "\n",
    "We define a mapping, $\\boldsymbol{\\theta}(M)$, that maps the information\n",
    "resevoir to a set of values that are equivalent to the *natural\n",
    "parameters*. If the entropy of these parameters is low, and the\n",
    "distribution $\\rho(\\boldsymbol{\\theta})$ is sharply peaked then we can\n",
    "move from treating the memory mapping, $\\boldsymbol{\\theta}(\\cdot)$, as\n",
    "a random processe to an assumption that it is a deterministic function.\n",
    "We can then follow gradients with respect to these $\\boldsymbol{\\theta}$\n",
    "values.\n",
    "\n",
    "This allows us to rewrite the distribution over $Z$ in a conditional\n",
    "form, $$\n",
    "\\rho(X|M) = h(X) \\exp(\\boldsymbol{\\theta}(M)^\\top T(X) - A(\\boldsymbol{\\theta}(M))).\n",
    "$$\n",
    "\n",
    "Unfortunately this assumption implies that $\\boldsymbol{\\theta}(\\cdot)$\n",
    "is a delta function, and since our representation as a compact manifold\n",
    "(bounded below by $0$ and above by $N$) it does not admit any such\n",
    "singularities."
   ],
   "id": "b91603c7-9a74-4cfb-ab7c-f87e79a13962"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Capacity $\\leftrightarrow$ Precision Paradox\n",
    "\n",
    "This creates an apparent paradox, at minimal entropy states, the\n",
    "information reservoir must simultaneously maintain precision in the\n",
    "parameters $\\boldsymbol{\\theta}(M)$ (for accurate system representation)\n",
    "but it must also provide sufficient capacity $c(M)$ (for information\n",
    "storage).\n",
    "\n",
    "The trade-off can be expressed as, $$\n",
    "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k,\n",
    "$$ where $k$ is a constant. This relationship can be recognised as a\n",
    "natural *uncertainty principle* that underpins the behaviour of the\n",
    "game. This principle is a necessary consequence of information theory.\n",
    "It follows from the requirement for the parameter-like states, $M$ to\n",
    "have both precision and high capacity (in the Shannon sense). The\n",
    "uncertainty principle ensures that when parameters are sharply defined\n",
    "(low $\\Delta\\boldsymbol{\\theta}$), the capacity variables have high\n",
    "uncertainty (high $\\Delta c$), allowing information to be encoded in\n",
    "their relationships rather than absolute values.\n",
    "\n",
    "In practice this means that the parameters $\\boldsymbol{\\theta}(M)$ and\n",
    "capacity variables $c(M)$ must form a Fourier-dual pair, $$\n",
    "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)],\n",
    "$$ This duality becomes important at saddle points when direct gradient\n",
    "ascent stalls."
   ],
   "id": "c8a1efa0-88ee-4a9d-ac5a-c9e35969ba50"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quantum vs Classical Information Reservoirs\n",
    "\n",
    "The uncertainty principle means that the game can exhibit quantum-like\n",
    "information processing regimes during evolution. This inspires an\n",
    "information-theoretic perspective on the quantum-classical transition.\n",
    "\n",
    "At minimal entropy states near the origin, the information reservoir has\n",
    "characteristics reminiscent of quantum systems.\n",
    "\n",
    "1.  *Wave-like information encoding*: The information reservoir near the\n",
    "    origin necessarily encodes information in distributed,\n",
    "    interference-capable patterns due to the uncertainty principle\n",
    "    between parameters $\\boldsymbol{\\theta}(M)$ and capacity variables\n",
    "    $c(M)$.\n",
    "\n",
    "2.  *Non-local correlations*: Parameters are highly correlated through\n",
    "    the Fisher information matrix, creating structures where information\n",
    "    is stored in relationships rather than individual variables.\n",
    "\n",
    "3.  *Uncertainty-saturated regime*: The uncertainty relationship\n",
    "    $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k$ is nearly\n",
    "    saturated (approaches equality), similar to Heisenberg’s uncertainty\n",
    "    principle in quantum systems.\n",
    "\n",
    "As the system evolves towards higher entropy states, a transition occurs\n",
    "where some variables exhibit classical behavior.\n",
    "\n",
    "1.  *From wave-like to particle-like*: Variables transitioning from $M$\n",
    "    to $X$ shift from storing information in interference patterns to\n",
    "    storing it in definite values with statistical uncertainty.\n",
    "\n",
    "2.  *Decoherence-like process*: The uncertainty product\n",
    "    $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M)$ for these variables\n",
    "    grows significantly larger than the minimum value $k$, indicating a\n",
    "    departure from quantum-like behavior.\n",
    "\n",
    "3.  *Local information encoding*: Information becomes increasingly\n",
    "    encoded in local variables rather than distributed correlations.\n",
    "\n",
    "The saddle points in our entropy landscape mark critical transitions\n",
    "between quantum-like and classical information processing regimes. Near\n",
    "these points\n",
    "\n",
    "1.  The critically slowed modes maintain quantum-like characteristics,\n",
    "    functioning as coherent memory that preserves information through\n",
    "    interference patterns.\n",
    "\n",
    "2.  The rapidly evolving modes exhibit classical characteristics,\n",
    "    functioning as incoherent processors that manipulate information\n",
    "    through statistical operations.\n",
    "\n",
    "3.  This natural separation creates a hybrid computational architecture\n",
    "    where quantum-like memory interfaces with classical-like processing.\n",
    "\n",
    "The quantum-classical transition can be quantified using the moment\n",
    "generating function $M_Z(t)$. In quantum-like regimes, the MGF exhibits\n",
    "oscillatory behavior with complex analytic structure, whereas in\n",
    "classical regimes, it grows monotonically with simple analytic\n",
    "structure. The transition between these behaviors identifies variables\n",
    "moving between quantum-like and classical information processing modes.\n",
    "\n",
    "This perspective suggests that what we recognize as “quantum” versus\n",
    "“classical” behavior may fundamentally reflect different regimes of\n",
    "information processing - one optimized for coherent information storage\n",
    "(quantum-like) and the other for flexible information manipulation\n",
    "(classical-like). The emergence of both regimes from our\n",
    "entropy-maximizing model indicates that nature may exploit this\n",
    "computational architecture to optimize information processing across\n",
    "multiple scales."
   ],
   "id": "b539ede0-2804-4ec9-8518-01506545c2b1"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualising the Parameter-Capacity Uncertainty Principle\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/uncertainty-visualisation.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/uncertainty-visualisation.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The uncertainty principle between parameters $\\theta$ and capacity\n",
    "variables $c$ is a fundamental feature of information reservoirs. We can\n",
    "visualize this uncertainty relation using phase space plots.\n",
    "\n",
    "We can demonstrate how the uncertainty principle manifests in different\n",
    "regimes:\n",
    "\n",
    "1.  **Quantum-like regime**: Near minimal entropy, the uncertainty\n",
    "    product $\\Delta\\theta \\cdot \\Delta c$ approaches the lower bound\n",
    "    $k$, creating wave-like interference patterns in probability space.\n",
    "\n",
    "2.  **Transitional regime**: As entropy increases, uncertainty relations\n",
    "    begin to decouple, with $\\Delta\\theta \\cdot \\Delta c > k$.\n",
    "\n",
    "3.  **Classical regime**: At high entropy, parameter uncertainty\n",
    "    dominates, creating diffusion-like dynamics with minimal influence\n",
    "    from uncertainty relations.\n",
    "\n",
    "The visualization shows probability distributions for these three\n",
    "regimes in both parameter space and capacity space."
   ],
   "id": "34615f02-e42e-42b3-9582-c2e31cfd2707"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "88c5e458-bdc0-450b-8bb2-1fb530bbd230"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai\n",
    "from matplotlib.patches import Ellipse"
   ],
   "id": "0c739595-051f-4ee3-997f-922d2e613914"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualization of uncertainty ellipses\n",
    "fig, ax = plt.subplots(figsize=plot.big_figsize)\n",
    "\n",
    "# Parameters for uncertainty ellipses\n",
    "k = 1  # Uncertainty constant\n",
    "centers = [(0, 0), (2, 2), (4, 4)]\n",
    "widths = [0.25, 0.5, 2]\n",
    "heights = [4, 2.5, 2]\n",
    "#heights = [k/w for w in widths]\n",
    "colors = ['blue', 'green', 'red']\n",
    "labels = ['Quantum-like', 'Transitional', 'Classical']\n",
    "\n",
    "# Plot uncertainty ellipses\n",
    "for center, width, height, color, label in zip(centers, widths, heights, colors, labels):\n",
    "    ellipse = Ellipse(center, width, height, \n",
    "                     edgecolor=color, facecolor='none', \n",
    "                     linewidth=2, label=label)\n",
    "    ax.add_patch(ellipse)\n",
    "    \n",
    "    # Add text label\n",
    "    ax.text(center[0], center[1] + height/2 + 0.2, \n",
    "            label, ha='center', color=color)\n",
    "    \n",
    "    # Add area label (uncertainty product)\n",
    "    area =  width * height\n",
    "    ax.text(center[0], center[1] - height/2 - 0.3, \n",
    "            f'Area = {width:.2f} $\\\\times$ {height: .2f} $\\\\pi$', ha='center')\n",
    "\n",
    "# Set axis labels and limits\n",
    "ax.set_xlabel('Parameter $\\\\theta$')\n",
    "ax.set_ylabel('Capacity $C$')\n",
    "ax.set_xlim(-3, 7)\n",
    "ax.set_ylim(-3, 7)\n",
    "ax.set_aspect('equal')\n",
    "ax.grid(True, linestyle='--', alpha=0.7)\n",
    "ax.set_title('Parameter-Capacity Uncertainty Relation')\n",
    "\n",
    "# Add diagonal line representing constant uncertainty product\n",
    "x = np.linspace(0.25, 6, 100)\n",
    "y = k/x\n",
    "ax.plot(x, y, 'k--', alpha=0.5, label='Minimum uncertainty: $\\\\Delta \\\\theta \\\\Delta C = k$')\n",
    "\n",
    "ax.legend(loc='upper right')\n",
    "mlai.write_figure(filename='uncertainty-ellipses.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "2286276e-3ad1-45e5-90c0-40b5978ed8d5"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/uncertainty-ellipses.svg\" class=\"\" width=\"50%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Visualisaiton of the uncertainty trade-off between parameter\n",
    "precision and capacity.</i>\n",
    "\n",
    "This visualization helps explain why information reservoirs with\n",
    "quantum-like properties naturally emerge at minimal entropy. The\n",
    "uncertainty principle is not imposed but arises naturally from the\n",
    "constraints of Shannon information theory applied to physical systems\n",
    "operating at minimal entropy.\n",
    "\n",
    "<!--include{_information-game/includes/mgf-analysis-example.md}-->\n",
    "<!--include{_information-game/includes/jaynes-world-information-reservoirs.md}-->\n",
    "<!--include{_information-game/includes/hierarchical-memory-example.md}-->"
   ],
   "id": "9d66fcf0-c0f7-4195-886e-eae4b1ecdce5"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conceptual Framework\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-conceptual-framework.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-conceptual-framework.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The Jaynes’ world game illustrates fundamental principles of information\n",
    "dynamics.\n",
    "\n",
    "1.  *Information Conservation*: Total information remains constant but\n",
    "    redistributes between structure and randomness. This follows from\n",
    "    the fundamental uncertainty principle between parameters and\n",
    "    capacity. As parameters become less precisely specified, capacity\n",
    "    increases.\n",
    "\n",
    "2.  *Uncertainty Principle*: Precision in parameters trades off with\n",
    "    entropy capacity. This is not merely a mathematical constraint but a\n",
    "    necessary feature of any physical information reservoir that must\n",
    "    maintain both stability and sufficient capacity.\n",
    "\n",
    "3.  *Self-Organization*: The system autonomously navigates toward\n",
    "    maximum entropy while maintaining necessary structure through\n",
    "    critically slowed modes. These modes function as information\n",
    "    reservoirs that preserve essential constraints while allowing\n",
    "    maximum entropy production elsewhere.\n",
    "\n",
    "4.  *Information-Energy Duality*: The framework connects to\n",
    "    thermodynamic concepts through the relationship between entropy\n",
    "    production and available work. As shown by Sagawa and Ueda,\n",
    "    information gain can be translated into extractable work, suggesting\n",
    "    that our entropy game has a direct thermodynamic interpretation.\n",
    "\n",
    "The information-modified second law indicates that the maximum\n",
    "extractable work is increased by $k_BT\\cdot I(X;M)$, where $I(X;M)$ is\n",
    "the mutual information between observable variables and memory. This\n",
    "creates a direct connection between our information reservoir model and\n",
    "physical thermodynamic systems.\n",
    "\n",
    "The zero-player game provides a mathematical model for studying how\n",
    "complex systems evolve when they instantaneously maximize entropy\n",
    "production."
   ],
   "id": "d2636d2a-1d49-4ac0-8ac4-51a2dad9aa3b"
  },
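  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put a number on this, the sketch below (an illustrative addition with\n",
    "a made-up joint distribution over a binary observable $X$ and binary\n",
    "memory $M$) computes the mutual information $I(X;M)$ in nats and the\n",
    "corresponding bound $k_BT \\, I(X;M)$ on the extra extractable work at\n",
    "room temperature."
   ],
   "id": "4d8b2f67-9c35-4a0e-b1d4-7e6a5c3f8d92"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# made-up joint distribution over a binary observable X and binary memory M\n",
    "p_xm = np.array([[0.4, 0.1],\n",
    "                 [0.1, 0.4]])\n",
    "\n",
    "p_x = p_xm.sum(axis=1)\n",
    "p_m = p_xm.sum(axis=0)\n",
    "\n",
    "# mutual information I(X; M) in nats\n",
    "mutual_information = np.sum(p_xm * np.log(p_xm / np.outer(p_x, p_m)))\n",
    "\n",
    "k_B = 1.380649e-23  # Boltzmann constant in joules per kelvin\n",
    "T = 300.0           # room temperature in kelvin\n",
    "\n",
    "print('I(X; M) =', mutual_information, 'nats')\n",
    "print('Extra extractable work bound:', k_B * T * mutual_information, 'joules')"
   ],
   "id": "6f1c9e84-2b57-4d30-a8c6-9d4e7b2a5f13"
  },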
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-conclusion.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-conclusion.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The zero-player game Jaynes’ world provides a mathematical model for\n",
    "studying how complex systems evolve when they instantaneously maximize\n",
    "entropy production.\n",
    "\n",
    "Our analysis suggests the game could illustrate the fundamental\n",
    "principles of information dynamics, including information conservation,\n",
    "an uncertainty principle, self-organization, and information-energy\n",
    "duality.\n",
    "\n",
    "The game’s architecture should naturally organize into memory and\n",
    "processing components, without requiring explicit design.\n",
    "\n",
    "The game’s temporal dynamics are based on steepest ascent in parameter\n",
    "space, this allows for analysis through the Fisher information matrix’s\n",
    "eigenvalue structure to create a natural separation of timescales and\n",
    "the natural emergence of information reservoirs."
   ],
   "id": "e49f2b0a-e7bd-4e44-bea0-a2a99bad42b9"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unifying Perspectives on Intelligence\n",
    "\n",
    "There are multiple perspectives we can take to understanding optimal\n",
    "decision making: entropy games, thermodynamic information engines, least\n",
    "action principles (and optimal control), and Schrödinger’s bridge -\n",
    "provide different views. Through introducing Jaynes’ world we look to\n",
    "explore the relationship between these different views of decision\n",
    "making to provide a more complete perspective of the limitations and\n",
    "possibilities for making optimal decisions."
   ],
   "id": "6e837622-21d6-4a43-886b-066e173e0672"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A Unified View of Intelligence Through Information\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/unified-intelligence-perspective.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/unified-intelligence-perspective.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "The multiple perspectives we’ve explored - entropy games, information\n",
    "engines, least action principles, and Schrödinger’s bridge - provide\n",
    "complementary views of intelligence as optimal information processing.\n",
    "Each framework highlights different aspects of this fundamental process:\n",
    "\n",
    "1.  **The Entropy Game** shows us that intelligence can be measured by\n",
    "    how efficiently a system reduces uncertainty through strategic\n",
    "    questioning or observation.\n",
    "\n",
    "2.  **Information Engines** reveal how intelligence converts information\n",
    "    into useful work, subject to thermodynamic constraints.\n",
    "\n",
    "3.  **Least Action Principles** demonstrate that intelligence follows\n",
    "    optimal paths through information space, minimizing cumulative\n",
    "    uncertainty.\n",
    "\n",
    "4.  **Schrödinger’s Bridge** illuminates how intelligence can be viewed\n",
    "    as optimal transport of probability distributions, finding the most\n",
    "    likely paths between states of knowledge.\n",
    "\n",
    "These perspectives converge on a unified view: intelligence is\n",
    "fundamentally about optimal information processing. Whether we’re\n",
    "discussing human cognition, artificial intelligence, or biological\n",
    "systems, the capacity to efficiently acquire, process, and utilize\n",
    "information lies at the core of intelligent behavior.\n",
    "\n",
    "This unified perspective offers promising directions for both\n",
    "theoretical research and practical applications. By understanding\n",
    "intelligence through the lens of information theory and thermodynamics,\n",
    "we may develop more principled approaches to artificial intelligence,\n",
    "gain deeper insights into cognitive processes, and discover fundamental\n",
    "limits on what intelligence can achieve."
   ],
   "id": "6eee976b-c3b3-4ffe-b470-7fe4d42e6fbc"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Thanks!\n",
    "\n",
    "For more information on these subjects and more you might want to check\n",
    "the following resources.\n",
    "\n",
    "-   book: [The Atomic\n",
    "    Human](https://www.penguin.co.uk/books/455130/the-atomic-human-by-lawrence-neil-d/9780241625248)\n",
    "-   twitter: [@lawrennd](https://twitter.com/lawrennd)\n",
    "-   podcast: [The Talking Machines](http://thetalkingmachines.com)\n",
    "-   newspaper: [Guardian Profile\n",
    "    Page](http://www.theguardian.com/profile/neil-lawrence)\n",
    "-   blog:\n",
    "    [http://inverseprobability.com](http://inverseprobability.com/blog.html)"
   ],
   "id": "69611788-5f20-4433-a5cd-5343c8e9d21e"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ],
   "id": "dbd09467-e803-439d-a016-f052bd518f80"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alemi, A.A., Fischer, I., 2019. [TherML: The thermodynamics of machine\n",
    "learning](https://openreview.net/forum?id=HJeQToAqKQ). arXiv Preprint\n",
    "arXiv:1807.04162.\n",
    "\n",
    "Amari, S., 2016. Information geometry and its applications, Applied\n",
    "mathematical sciences. Springer, Tokyo.\n",
    "<https://doi.org/10.1007/978-4-431-55978-8>\n",
    "\n",
    "Ashby, W.R., 1952. Design for a brain: The origin of adaptive behaviour.\n",
    "Chapman & Hall, London.\n",
    "\n",
    "Barato, A.C., Seifert, U., 2014. Stochastic thermodynamics with\n",
    "information reservoirs. Physical Review E 90, 042150.\n",
    "<https://doi.org/10.1103/PhysRevE.90.042150>\n",
    "\n",
    "Boltzmann, L., n.d. Über die Beziehung zwischen dem zweiten Hauptsatze\n",
    "der mechanischen Warmetheorie und der Wahrscheinlichkeitsrechnung,\n",
    "respective den Sätzen über das wärmegleichgewicht. Sitzungberichte der\n",
    "Kaiserlichen Akademie der Wissenschaften. Mathematisch-Naturwissen\n",
    "Classe. Abt. II LXXVI, 373–435.\n",
    "\n",
    "Brillouin, L., 1951. Maxwell’s demon cannot operate: Information and\n",
    "entropy. i. Journal of Applied Physics 22, 334–337.\n",
    "<https://doi.org/10.1063/1.1699951>\n",
    "\n",
    "Bub, J., 2001. Maxwell’s demon and the thermodynamics of computation.\n",
    "Studies in History and Philosophy of Science Part B: Modern Physics 32,\n",
    "569–579. <https://doi.org/10.1016/S1355-2198(01)00023-5>\n",
    "\n",
    "Conway, F., Siegelman, J., 2005. Dark hero of the information age: In\n",
    "search of norbert wiener the father of cybernetics. Basic Books, New\n",
    "York.\n",
    "\n",
    "Eddington, A.S., 1929. The nature of the physical world. Dent (London).\n",
    "<https://doi.org/10.2307/2180099>\n",
    "\n",
    "Hosoya, A., Maruyama, K., Shikano, Y., 2015. Operational derivation of\n",
    "Boltzmann distribution with Maxwell’s demon model. Scientific Reports 5,\n",
    "17011. <https://doi.org/10.1038/srep17011>\n",
    "\n",
    "Hosoya, A., Maruyama, K., Shikano, Y., 2011. Maxwell’s demon and data\n",
    "compression. Phys. Rev. E 84, 061117.\n",
    "<https://doi.org/10.1103/PhysRevE.84.061117>\n",
    "\n",
    "Jarzynski, C., 1997. Nonequilibrium equality for free energy\n",
    "differences. Physical Review Letters 78, 2690–2693.\n",
    "<https://doi.org/10.1103/PhysRevLett.78.2690>\n",
    "\n",
    "Jaynes, E.T., 1957. Information theory and statistical mechanics.\n",
    "Physical Review 106, 620–630. <https://doi.org/10.1103/PhysRev.106.620>\n",
    "\n",
    "Landauer, R., 1961. Irreversibility and heat generation in the computing\n",
    "process. IBM Journal of Research and Development 5, 183–191.\n",
    "<https://doi.org/10.1147/rd.53.0183>\n",
    "\n",
    "Maxwell, J.C., 1871. Theory of heat. Longmans, Green; Co, London.\n",
    "\n",
    "Mikhailov, G.K., n.d. Daniel bernoulli, hydrodynamica (1738).\n",
    "\n",
    "Parrondo, J.M.R., Horowitz, J.M., Sagawa, T., 2015. Thermodynamics of\n",
    "information. Nature Physics 11, 131–139.\n",
    "<https://doi.org/10.1038/nphys3230>\n",
    "\n",
    "Sagawa, T., Ueda, M., 2010. Generalized Jarzynski equality under\n",
    "nonequilibrium feedback control. Physical Review Letters 104, 090602.\n",
    "<https://doi.org/10.1103/PhysRevLett.104.090602>\n",
    "\n",
    "Sagawa, T., Ueda, M., 2008. Second law of thermodynamics with discrete\n",
    "quantum feedback control. Physical Review Letters 100, 080403.\n",
    "<https://doi.org/10.1103/PhysRevLett.100.080403>\n",
    "\n",
    "Shannon, C.E., 1948. A mathematical theory of communication. The Bell\n",
    "System Technical Journal 27, 379-423 and 623-656.\n",
    "\n",
    "Sharp, K., Matschinsky, F., 2015. Translation of Ludwig Boltzmann’s\n",
    "paper “on the relationship between the second fundamental theorem of the\n",
    "mechanical theory of heat and probability calculations regarding the\n",
    "conditions for thermal equilibrium.” Entropy 17, 1971–2009.\n",
    "<https://doi.org/10.3390/e17041971>\n",
    "\n",
    "Szilard, L., 1929. Über die Entropieverminderung in einem\n",
    "thermodynamischen System bei Eingriffen intelligenter Wesen. Zeitschrift\n",
    "für Physik 53, 840–856. <https://doi.org/10.1007/BF01341281>"
   ],
   "id": "ce794600-bde9-446a-9715-a6520a60e726"
  }
 ],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {}
}