{ "cells": [ { "cell_type": "markdown", "id": "7227bb64-33ba-4e47-bc63-82b4c460712c", "metadata": {}, "source": [ "# Jaynes’ World\n", "\n", "### Neil D. Lawrence\n", "\n", "### 2025-04-15" ] }, { "cell_type": "markdown", "id": "dcc04151-27e5-4b45-8a52-7561a7bed051", "metadata": {}, "source": [ "**Abstract**: The relationship between physical systems and intelligence\n", "has long fascinated researchers in computer science and physics. This\n", "talk explores fundamental connections between thermodynamic systems and\n", "intelligent decision-making through the lens of free energy principles.\n", "\n", "We examine how concepts from statistical mechanics - particularly the\n", "relationship between total energy, free energy, and entropy - might\n", "provide novel insights into the nature of intelligence and learning. By\n", "drawing parallels between physical systems and information processing,\n", "we consider how measurement and observation can be viewed as processes\n", "that modify available energy. The discussion encompasses how model\n", "approximations and uncertainties might be understood through\n", "thermodynamic analogies, and explores the implications of treating\n", "intelligence as an energy-efficient state-change process.\n", "\n", "While these connections remain speculative, they offer a potential\n", "shared language for discussing the emergence of natural laws and\n", "societal systems through the lens of information." ] }, { "cell_type": "markdown", "id": "4b773e1c-3155-4361-885b-543764d06ddd", "metadata": {}, "source": [ "$$\n", "$$" ] }, { "cell_type": "markdown", "id": "72d6f976-9730-44ce-ab1d-5105276add1c", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "## Hydrodynamica\n", "\n", "\\[edit\\]\n", "\n", "When Laplace spoke of the curve of a simple molecule of air, he may well\n", "have been thinking of Daniel Bernoulli (1700-1782). Daniel Bernoulli was\n", "one name in a prodigious family. 
His father and brother were both\n", 
    "mathematicians. Daniel’s main work was known as *Hydrodynamica*.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PP7')\n", 
    "```\n", 
    "\n", 
    "Figure: Daniel Bernoulli’s *Hydrodynamica* published in 1738. It was\n", 
    "one of the first works to use the idea of conservation of energy. It\n", 
    "used Newton’s laws to predict the behaviour of gases.\n", 
    "\n", 
    "Daniel Bernoulli described a kinetic theory of gases, but it wasn’t\n", 
    "until 170 years later that these ideas were verified, when Einstein\n", 
    "proposed a model of Brownian motion that was experimentally confirmed\n", 
    "by Jean Baptiste Perrin.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PA200')\n", 
    "```\n", 
    "\n", 
    "Figure: Daniel Bernoulli’s chapter on the kinetic theory of gases;\n", 
    "for a review of the context of this chapter see Mikhailov (n.d.). For\n", 
    "1738 this is extraordinary thinking. 
The notion of the kinetic theory of\n", 
    "gases wouldn’t become fully accepted in physics until 1908, when a\n", 
    "model of Einstein’s was verified by Jean Baptiste Perrin.\n", 
    "\n", 
    "## Entropy Billiards\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Figure: Bernoulli’s simple kinetic models of gases assume that the\n", 
    "molecules of air operate like billiard balls.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "# Sample 10,000 points from a standard Gaussian and evaluate its density.\n", 
    "p = np.random.randn(10000, 1)\n", 
    "xlim = [-4, 4]\n", 
    "x = np.linspace(xlim[0], xlim[1], 200)\n", 
    "y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x*x)\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "import matplotlib.pyplot as plt\n", 
    "import mlai.plot as plot\n", 
    "import mlai\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "# Overlay the Gaussian density on a normalised histogram of the samples.\n", 
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", 
    "ax.plot(x, y, 'r', linewidth=3)\n", 
    "ax.hist(p, 100, density=True)\n", 
    "ax.set_xlim(xlim)\n", 
    "\n", 
    "mlai.write_figure('gaussian-histogram.svg', directory='./ml')\n", 
    "```\n", 
    "\n", 
    "Another important figure with Cambridge connections, James Clerk\n", 
    "Maxwell, was the first to derive the probability distribution that\n", 
    "results from small balls banging together in this manner. In doing so,\n", 
    "he founded the field of statistical physics.\n", 
    "\n", 
    "Figure: James Clerk Maxwell (1831-1879), who derived the distribution\n", 
    "of velocities of particles in an ideal gas (elastic fluid).\n", 
    "\n", 
    "
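\n", 
    "The speed distribution Maxwell derived can also be sketched\n", 
    "numerically. The following check is an illustrative addition, not part\n", 
    "of the original notebook: the molecular mass and temperature are\n", 
    "assumed values, roughly nitrogen at room temperature.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "# Maxwell-Boltzmann speed distribution,\n", 
    "# f(v) = 4*pi*(m/(2*pi*k*T))**1.5 * v**2 * exp(-m*v**2/(2*k*T)).\n", 
    "k_B = 1.380649e-23  # Boltzmann's constant (J/K)\n", 
    "m = 4.65e-26        # assumed molecular mass (kg), roughly N2\n", 
    "T = 300.0           # assumed temperature (K)\n", 
    "\n", 
    "v = np.linspace(0, 2000, 20001)  # speeds in m/s\n", 
    "f = 4*np.pi*(m/(2*np.pi*k_B*T))**1.5 * v**2 * np.exp(-m*v**2/(2*k_B*T))\n", 
    "\n", 
    "# The density integrates to (almost) one over this range, and it peaks\n", 
    "# at the most probable speed, sqrt(2*k_B*T/m).\n", 
    "total = np.sum(f) * (v[1] - v[0])\n", 
    "v_mode = v[np.argmax(f)]\n", 
    "print(total, v_mode)\n", 
    "```\n", 
    "\n", 
    "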
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: James Clerk Maxwell (1831-1879), Ludwig Boltzmann (1844-1906)\n", 
    "and Josiah Willard Gibbs (1839-1903)\n", 
    "\n", 
    "Many of the ideas of early statistical physicists were rejected by a\n", 
    "cadre of physicists who didn’t believe in the notion of a molecule. The\n", 
    "stress of trying to have his ideas established caused Boltzmann to\n", 
    "commit suicide in 1906, only two years before the same ideas became\n", 
    "widely accepted.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='Vuk5AQAAMAAJ', page='PA373')\n", 
    "```\n", 
    "\n", 
    "Figure: Boltzmann’s paper (Boltzmann, n.d.), which introduced the\n", 
    "relationship between entropy and probability. A translation with notes\n", 
    "is available in Sharp and Matschinsky (2015).\n", 
    "\n", 
    "The important point about the uncertainty being represented here is that\n", 
    "it is not genuine stochasticity; it is a lack of knowledge about the\n", 
    "system. The techniques proposed by Maxwell, Boltzmann and Gibbs allow us\n", 
    "to represent the state of the system exactly through a set of parameters\n", 
    "that are the sufficient statistics of the physical system. We know\n", 
    "these values as the volume, temperature, and pressure. The challenge for\n", 
    "us, when approximating the physical world with the techniques we will\n", 
    "use, is that we will have to sit somewhere between the deterministic and\n", 
    "purely stochastic worlds that these different scientists described.\n", 
    "\n", 
    "One ongoing characteristic of people who study probability and\n", 
    "uncertainty is the confidence with which they hold opinions about it.\n", 
    "Another leader of the Cavendish laboratory expressed his support of the\n", 
    "second law of thermodynamics (which can be proven through the work of\n", 
    "Gibbs/Boltzmann) with an emphatic statement at the beginning of his\n", 
    "book.\n", 
    "\n", 
    "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: Eddington’s book on the Nature of the Physical World\n", 
    "(Eddington, 1929)\n", 
    "\n", 
    "The same Eddington is also famous for dismissing the ideas of a young\n", 
    "Chandrasekhar who had come to Cambridge to study in the Cavendish lab.\n", 
    "Chandrasekhar demonstrated the limit at which a star would collapse\n", 
    "under its own weight to a singularity, but when he presented the work,\n", 
    "Eddington was dismissive, suggesting that there “must be some natural\n", 
    "law that prevents this abomination from happening”.\n", 
    "\n", 
    "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: Chandrasekhar (1910-1995) derived the limit at which a star\n", 
    "collapses in on itself. Eddington’s confidence in the second law may\n", 
    "have been what drove him to dismiss Chandrasekhar’s ideas, humiliating\n", 
    "a young scientist who would later receive a Nobel prize for the work.\n", 
    "\n", 
    "Figure: Eddington makes his feelings about the primacy of the second\n", 
    "law clear. This primacy is perhaps because the second law can be\n", 
    "demonstrated mathematically, building on the work of Maxwell, Gibbs and\n", 
    "Boltzmann. Eddington (1929)\n", 
    "\n", 
    "Presumably he meant that the creation of a black hole seemed to\n", 
    "transgress the second law of thermodynamics. Later Hawking was able to\n", 
    "show that black holes do evaporate, but the timescale on which this\n", 
    "evaporation occurs is many orders of magnitude longer than that of\n", 
    "other processes in the universe.\n", 
    "\n", 
    "## Maxwell’s Demon\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Maxwell’s demon is a thought experiment described by James Clerk Maxwell\n", 
    "in his book, *Theory of Heat* (Maxwell, 1871) on page 308.\n", 
    "\n", 
    "> But if we conceive a being whose faculties are so sharpened that he\n", 
    "> can follow every molecule in its course, such a being, whose\n", 
    "> attributes are still as essentially finite as our own, would be able\n", 
    "> to do what is at present impossible to us. For we have seen that the\n", 
    "> molecules in a vessel full of air at uniform temperature are moving\n", 
    "> with velocities by no means uniform, though the mean velocity of any\n", 
    "> great number of them, arbitrarily selected, is almost exactly uniform.\n", 
    "> Now let us suppose that such a vessel is divided into two portions, A\n", 
    "> and B, by a division in which there is a small hole, and that a being,\n", 
    "> who can see the individual molecules, opens and closes this hole, so\n", 
    "> as to allow only the swifter molecules to pass from A to B, and\n", 
    "> only the slower ones to pass from B to A. 
He will thus, without\n", 
    "> expenditure of work, raise the temperature of B and lower that of A,\n", 
    "> in contradiction to the second law of thermodynamics.\n", 
    ">\n", 
    "> James Clerk Maxwell in *Theory of Heat* (Maxwell, 1871) page 308\n", 
    "\n", 
    "He goes on to say:\n", 
    "\n", 
    "> This is only one of the instances in which conclusions which we have\n", 
    "> drawn from our experience of bodies consisting of an immense number of\n", 
    "> molecules may be found not to be applicable to the more delicate\n", 
    "> observations and experiments which we may suppose made by one who can\n", 
    "> perceive and handle the individual molecules which we deal with only\n", 
    "> in large masses.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='0p8AAAAAMAAJ', page='PA308')\n", 
    "```\n", 
    "\n", 
    "Figure: Maxwell’s demon was designed to highlight the statistical\n", 
    "nature of the second law of thermodynamics.\n", 
    "\n", 
    "Figure: Maxwell’s Demon. The demon decides balls are either cold\n", 
    "(blue) or hot (red) according to their velocity. Balls are allowed to\n", 
    "pass the green membrane from right to left only if they are cold, and\n", 
    "from left to right, only if they are hot.\n", 
    "\n", 
    "Maxwell’s demon allows us to connect thermodynamics with information\n", 
    "theory (see e.g. Hosoya et al. (2015); Hosoya et al. (2011); Bub\n", 
    "(2001); Brillouin (1951); Szilard (1929)). The connection arises due to\n", 
    "a fundamental connection between information erasure and energy\n", 
    "consumption (Landauer, 1961).\n", 
    "\n", 
    "Alemi and Fischer (2019)\n", 
    "\n", 
    "# Information Theory and Thermodynamics\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Information theory provides a mathematical framework for quantifying\n", 
    "information. Many of information theory’s core concepts parallel those\n", 
    "found in thermodynamics. 
The theory was developed by Claude Shannon, who\n", 
    "spoke extensively with MIT’s Norbert Wiener while it was in development\n", 
    "(Conway and Siegelman, 2005). Wiener’s own ideas about information were\n", 
    "inspired by Willard Gibbs, one of the pioneers of the mathematical\n", 
    "understanding of free energy and entropy. Deep connections between\n", 
    "physical systems and information processing have thus linked\n", 
    "information and energy from the start.\n", 
    "\n", 
    "## Entropy\n", 
    "\n", 
    "Shannon’s entropy measures the uncertainty or unpredictability of\n", 
    "information content. This mathematical formulation is inspired by\n", 
    "thermodynamic entropy, which describes the dispersal of energy in\n", 
    "physical systems. Both concepts quantify the number of possible states\n", 
    "and their probabilities.\n", 
    "\n", 
    "Figure: Maxwell’s demon thought experiment illustrates the\n", 
    "relationship between information and thermodynamics.\n", 
    "\n", 
    "In thermodynamics, free energy represents the energy available to do\n", 
    "work. A system naturally evolves to minimize its free energy, finding\n", 
    "equilibrium between total energy and entropy. Free energy principles are\n", 
    "also pervasive in variational methods in machine learning. They emerge\n", 
    "from Bayesian approaches to learning and have been heavily promoted by\n", 
    "e.g. Karl Friston as a model for the brain.\n", 
    "\n", 
    "The relationship between entropy and free energy can be explored through\n", 
    "the Legendre transform. This is most easily reviewed if we restrict\n", 
    "ourselves to distributions in the exponential family.\n", 
    "\n", 
    "## Exponential Family\n", 
    "\n", 
    "The exponential family has the form $$\n", 
    " \\rho(Z) = h(Z) \\exp\\left(\\boldsymbol{\\theta}^\\top T(Z) - A(\\boldsymbol{\\theta})\\right)\n", 
    "$$ where $h(Z)$ is the base measure, $\\boldsymbol{\\theta}$ are the\n", 
    "natural parameters, $T(Z)$ are the sufficient statistics and\n", 
    "$A(\\boldsymbol{\\theta})$ is the log partition function. 
Its entropy can\n", 
    "be computed as $$\n", 
    " S(Z) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta}A(\\boldsymbol{\\theta}) - E_{\\rho(Z)}\\left[\\log h(Z)\\right],\n", 
    "$$ where $E_{\\rho(Z)}[\\cdot]$ is the expectation under the distribution\n", 
    "$\\rho(Z)$.\n", 
    "\n", 
    "## Available Energy\n", 
    "\n", 
    "## Work through Measurement\n", 
    "\n", 
    "In machine learning and Bayesian inference, the Markov blanket is the\n", 
    "set of variables that are conditionally independent of the variable of\n", 
    "interest given the other variables. To introduce this idea into our\n", 
    "information system, we first split the system into two parts, the\n", 
    "variables, $X$, and the memory $M$.\n", 
    "\n", 
    "The variables are the portion of the system that is stochastically\n", 
    "evolving over time. The memory is a low entropy partition of the system\n", 
    "that will give us knowledge about this evolution.\n", 
    "\n", 
    "We can now write the joint entropy of the system in terms of the mutual\n", 
    "information between the variables and the memory. $$\n", 
    "S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M).\n", 
    "$$ This gives us the first hint at the connection between information\n", 
    "and energy.\n", 
    "\n", 
    "If $M$ is viewed as a measurement then the change in entropy of the\n", 
    "system before and after measurement is\n", 
    "$S(X|M) - S(X) = -I(X;M)$. This implies that measurement increases the\n", 
    "amount of available energy we can obtain from the system (Parrondo et\n", 
    "al., 2015).\n", 
    "\n", 
    "The difference in available energy is given by $$\n", 
    "\\Delta A = A(X|M) - A(X) = I(X;M),\n", 
    "$$ where we note that the resulting system is no longer in thermodynamic\n", 
    "equilibrium due to the low entropy of the memory.\n", 
    "\n", 
    "## The Animal Game\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "The Entropy Game is a framework for understanding efficient uncertainty\n", 
    "reduction. 
To start, think of finding the optimal strategy for\n", 
    "identifying an unknown entity by asking the minimum number of yes/no\n", 
    "questions.\n", 
    "\n", 
    "## The 20 Questions Paradigm\n", 
    "\n", 
    "In the game of 20 Questions player one (Alice) thinks of an object and\n", 
    "player two (Bob) must identify it by asking at most 20 yes/no questions.\n", 
    "The optimal strategy is to divide the possibility space in half with\n", 
    "each question. The binary search approach ensures maximum information\n", 
    "gain with each inquiry and can access $2^{20}$, or about a million,\n", 
    "different objects.\n", 
    "\n", 
    "Figure: The optimal strategy in the Entropy Game resembles a binary\n", 
    "search, dividing the search space in half with each question.\n", 
    "\n", 
    "## Entropy Reduction and Decisions\n", 
    "\n", 
    "From an information-theoretic perspective, decisions can be taken in a\n", 
    "way that efficiently reduces entropy - our uncertainty about the\n", 
    "state of the world. Each observation or action an intelligent agent\n", 
    "takes should maximize expected information gain, optimally reducing\n", 
    "uncertainty given available resources.\n", 
    "\n", 
    "The entropy before the question is $S(X)$. The entropy after the\n", 
    "question is $S(X|M)$. The information gain is the difference between the\n", 
    "two, $I(X;M) = S(X) - S(X|M)$. 
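\n", 
    "\n", 
    "For equally likely objects this arithmetic is easy to check directly. A\n", 
    "minimal sketch, an illustrative addition with helper names of our own\n", 
    "choosing:\n", 
    "\n", 
    "``` python\n", 
    "import math\n", 
    "\n", 
    "# With K equally likely objects the entropy is log2(K) bits, and an\n", 
    "# ideal yes/no question halves the candidate set, gaining one bit.\n", 
    "def questions_needed(K):\n", 
    "    # number of ideal halving questions needed to isolate one object\n", 
    "    return math.ceil(math.log2(K))\n", 
    "\n", 
    "def information_gain(K):\n", 
    "    # S(X) - S(X|M) for one halving question on a uniform space\n", 
    "    return math.log2(K) - math.log2(K / 2)\n", 
    "\n", 
    "print(questions_needed(2**20))  # 20 questions cover about a million objects\n", 
    "print(information_gain(1024))   # 1.0 bit per ideal question\n", 
    "```\n", 
    "\n", 
    "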
Optimal decision-making systems maximize\n", 
    "this information gain per unit cost.\n", 
    "\n", 
    "## Thermodynamic Parallels\n", 
    "\n", 
    "The entropy game connects decision-making to thermodynamics.\n", 
    "\n", 
    "This perspective suggests a profound connection: intelligence might be\n", 
    "understood as a special case of systems that efficiently extract,\n", 
    "process, and utilize free energy from their environments, with\n", 
    "thermodynamic principles setting fundamental constraints on what’s\n", 
    "possible.\n", 
    "\n", 
    "# Information Engines: Intelligence as Energy Efficiency\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "The entropy game shows some parallels between thermodynamics and\n", 
    "measurement. This allows us to imagine *information engines*, simple\n", 
    "systems that convert information to energy. This is our first simple\n", 
    "model of intelligence.\n", 
    "\n", 
    "## Measurement as a Thermodynamic Process: Information-Modified Second Law\n", 
    "\n", 
    "The second law of thermodynamics was generalised to include the effect\n", 
    "of measurement by Sagawa and Ueda (Sagawa and Ueda, 2008). They showed\n", 
    "that the maximum extractable work from a system can be increased by\n", 
    "$k_BTI(X;M)$ where $k_B$ is Boltzmann’s constant, $T$ is temperature and\n", 
    "$I(X;M)$ is the information gained by making a measurement, $M$, $$\n", 
    "I(X;M) = \\sum_{x,m} \\rho(x,m) \\log \\frac{\\rho(x,m)}{\\rho(x)\\rho(m)},\n", 
    "$$ where $\\rho(x,m)$ is the joint probability of the system and\n", 
    "measurement (see e.g. eq 14 in Sagawa and Ueda (2008)). This can be\n", 
    "written as $$\n", 
    "W_\\text{ext} \\leq - \\Delta\\mathcal{F} + k_BTI(X;M),\n", 
    "$$ where $W_\\text{ext}$ is the extractable work and it is upper bounded\n", 
    "by the negative change in free energy, $\\Delta \\mathcal{F}$, plus the\n", 
    "energy gained from measurement, $k_BTI(X;M)$. This is the\n", 
    "information-modified second law.\n", 
    "\n", 
    "Measurement can be seen as a thermodynamic process. 
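\n", 
    "\n", 
    "The size of the information term can be evaluated for a toy joint\n", 
    "distribution. The sketch below is an illustrative addition: the\n", 
    "90%-reliable one-bit measurement is an assumed example, not from the\n", 
    "original notebook.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "k_B = 1.380649e-23  # Boltzmann's constant (J/K)\n", 
    "T = 300.0           # temperature (K)\n", 
    "\n", 
    "# Toy joint distribution rho(x, m): a binary system and a noisy one-bit\n", 
    "# measurement that reports the true state 90% of the time.\n", 
    "rho = np.array([[0.45, 0.05],\n", 
    "                [0.05, 0.45]])  # rows: x, columns: m\n", 
    "rho_x = rho.sum(axis=1)  # marginal over x\n", 
    "rho_m = rho.sum(axis=0)  # marginal over m\n", 
    "\n", 
    "# Mutual information I(X;M) in nats.\n", 
    "I = sum(rho[i, j] * np.log(rho[i, j] / (rho_x[i] * rho_m[j]))\n", 
    "        for i in range(2) for j in range(2))\n", 
    "\n", 
    "# Sagawa-Ueda: the extractable-work bound is raised by k_B * T * I.\n", 
    "print(I, k_B * T * I)\n", 
    "```\n", 
    "\n", 
    "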
In theory\n", 
    "measurement, like computation, is reversible. In practice the process\n", 
    "of measurement is likely to erode the free energy somewhat, but as long\n", 
    "as the energy gained from information, $k_BTI(X;M)$, is greater than\n", 
    "that spent in measurement the process can be thermodynamically\n", 
    "efficient.\n", 
    "\n", 
    "The modified second law shows that the maximum additional extractable\n", 
    "work is proportional to the information gained. So information\n", 
    "acquisition creates extractable work potential. Thermodynamic\n", 
    "consistency is maintained by properly accounting for information-entropy\n", 
    "relationships.\n", 
    "\n", 
    "## Efficacy of Feedback Control\n", 
    "\n", 
    "Sagawa and Ueda extended this relationship to provide a *generalised\n", 
    "Jarzynski equality* for feedback processes (Sagawa and Ueda, 2010). The\n", 
    "Jarzynski equality is an important result from nonequilibrium\n", 
    "thermodynamics that relates the average work done across an ensemble to\n", 
    "the free energy difference between initial and final states (Jarzynski,\n", 
    "1997), $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\right\\rangle = \\exp\\left(-\\frac{\\Delta\\mathcal{F}}{k_BT}\\right),\n", 
    "$$ where $W$ is the work done along a trajectory,\n", 
    "$\\langle \\cdot \\rangle$ denotes an average across an ensemble of\n", 
    "trajectories, $\\Delta\\mathcal{F}$ is the change in free energy and\n", 
    "$k_B$ is Boltzmann’s constant. 
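\n", 
    "\n", 
    "The equality can be checked numerically. For Gaussian work fluctuations,\n", 
    "$W \\sim \\mathcal{N}(\\mu, \\sigma^2)$, it holds with\n", 
    "$\\Delta\\mathcal{F} = \\mu - \\sigma^2/(2k_BT)$; the sketch below, an\n", 
    "illustrative addition with assumed parameter values, recovers this from\n", 
    "samples.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "rng = np.random.default_rng(0)\n", 
    "kT = 1.0  # work measured in units of k_B * T\n", 
    "\n", 
    "# Gaussian work distribution W ~ N(mu, sigma^2), assumed parameters.\n", 
    "mu, sigma = 1.0, 0.5\n", 
    "W = rng.normal(mu, sigma, size=1_000_000)\n", 
    "\n", 
    "# Jarzynski: <exp(-W/kT)> = exp(-dF/kT), so dF = -kT log <exp(-W/kT)>.\n", 
    "dF_estimate = -kT * np.log(np.mean(np.exp(-W / kT)))\n", 
    "dF_exact = mu - sigma**2 / (2 * kT)\n", 
    "print(dF_estimate, dF_exact)  # both close to 0.875\n", 
    "```\n", 
    "\n", 
    "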
Sagawa\n", 
    "and Ueda extended this equality to include information gain from\n", 
    "measurement (Sagawa and Ueda, 2010), $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right) \\exp\\left(-\\mathcal{I}(X;M)\\right)\\right\\rangle = 1,\n", 
    "$$ where $\\mathcal{I}(X;M) = \\log \\frac{\\rho(X|M)}{\\rho(X)}$ is the\n", 
    "information gain from measurement, and the mutual information is\n", 
    "recovered, $I(X;M) = \\left\\langle \\mathcal{I}(X;M) \\right\\rangle$, as\n", 
    "the average information gain.\n", 
    "\n", 
    "Sagawa and Ueda introduce an *efficacy* term that captures the effect of\n", 
    "feedback on the system. They note that in the presence of feedback, $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right)\\right\\rangle = \\gamma,\n", 
    "$$ where $\\gamma$ is the efficacy.\n", 
    "\n", 
    "## Channel Coding Perspective on Memory\n", 
    "\n", 
    "When viewing $M$ as an information channel between past and future\n", 
    "states, Shannon’s channel coding theorems apply (Shannon, 1948). The\n", 
    "channel capacity $C$ represents the maximum rate of reliable information\n", 
    "transmission, $$\n", 
    "C = \\max_{\\rho(M)} I(X_1;M),\n", 
    "$$ and for a memory of $n$ bits we have $$\n", 
    "C \\leq n,\n", 
    "$$ as the mutual information is upper bounded by the\n", 
    "entropy of $\\rho(M)$, which is at most $n$ bits.\n", 
    "\n", 
    "This relationship seems to align with Ashby’s Law of Requisite Variety\n", 
    "(pg 229 Ashby (1952)), which states that a control system must have at\n", 
    "least as much ‘variety’ as the system it aims to control. In the context\n", 
    "of memory systems, this means that to maintain temporal correlations\n", 
    "effectively, the memory’s state space must be at least as large as the\n", 
    "information content it needs to preserve. 
This provides a lower bound on\n", 
    "the necessary memory capacity that complements the bound we get from\n", 
    "Shannon for channel capacity.\n", 
    "\n", 
    "This helps determine the required memory size for maintaining temporal\n", 
    "correlations, optimal coding strategies, and fundamental limits on\n", 
    "temporal correlation preservation.\n", 
    "\n", 
    "# Decomposition into Past and Future\n", 
    "\n", 
    "## Model Approximations and Thermodynamic Efficiency\n", 
    "\n", 
    "Intelligent systems must balance measurement against energy efficiency\n", 
    "and time requirements. A perfect model of the world would require\n", 
    "infinite computational resources and speed, so approximations are\n", 
    "necessary. This leads to uncertainties. Thermodynamics might be thought\n", 
    "of as the physics of uncertainty: at equilibrium, systems find states\n", 
    "that minimize free energy, equivalent to maximising entropy.\n", 
    "\n", 
    "## Markov Blanket\n", 
    "\n", 
    "To introduce some structure to the model assumption, we split $X$ into\n", 
    "$X_0$ and $X_1$: $X_0$ is the past and present of the system and $X_1$\n", 
    "is the future. We can then consider the conditional mutual information\n", 
    "$I(X_0;X_1|M)$, which is zero if $X_1$ and $X_0$ are independent\n", 
    "conditioned on $M$.\n", 
    "\n", 
    "## At What Scales Does this Apply?\n", 
    "\n", 
    "The equipartition theorem tells us that at equilibrium the average\n", 
    "energy is $k_BT/2$ per degree of freedom. This means that for systems\n", 
    "that operate at “human scale” the energy involved is many orders of\n", 
    "magnitude larger than the amount of information we can store in memory.\n", 
    "For a car engine producing 70 kW of power at 370 Kelvin, this implies $$\n", 
    "\\frac{2 \\times 70,000}{370 \\times k_B} = \\frac{2 \\times 70,000}{370 \\times 1.380649 \\times 10^{-23}} = 2.74 \\times 10^{25}\n", 
    "$$ degrees of freedom per second. 
If we make a conservative assumption\n", 
    "of one bit per degree of freedom, then the mutual information we would\n", 
    "require in one second for comparable energy production would be around\n", 
    "3,400 zettabytes, implying a memory bandwidth of 3,400 zettabytes per\n", 
    "second. For comparison, in 2025 the estimate of all the data in the\n", 
    "world stands at 149 zettabytes.\n", 
    "\n", 
    "## Small-Scale Biochemical Systems and Information Processing\n", 
    "\n", 
    "While macroscopic systems operate in regimes where traditional\n", 
    "thermodynamics dominates, microscopic biological systems operate at\n", 
    "scales where information and thermal fluctuations become critically\n", 
    "important. Here we examine how the framework applies to molecular\n", 
    "machines and processes that have evolved to operate efficiently at these\n", 
    "scales.\n", 
    "\n", 
    "Molecular machines like ATP synthase, kinesin motors, and the\n", 
    "photosynthetic apparatus can be viewed as sophisticated information\n", 
    "engines that convert energy while processing information about their\n", 
    "environment. These systems have evolved to exploit thermal fluctuations\n", 
    "rather than fight against them, using information processing to extract\n", 
    "useful work.\n", 
    "\n", 
    "## ATP Synthase: Nature’s Rotary Engine\n", 
    "\n", 
    "ATP synthase functions as a rotary molecular motor that synthesizes ATP\n", 
    "from ADP and inorganic phosphate using a proton gradient. The system\n", 
    "uses the proton gradient as both an energy source and an information\n", 
    "source about the cell’s energetic state, and exploits Brownian motion\n", 
    "through a ratchet mechanism. 
It converts information about proton\n", 
    "locations into mechanical rotation and ultimately chemical energy, with\n", 
    "approximately 3-4 protons required per ATP.\n", 
    "\n", 
    "``` python\n", 
    "from IPython.lib.display import YouTubeVideo\n", 
    "YouTubeVideo('kXpzp4RDGJI')\n", 
    "```\n", 
    "\n", 
    "Estimates suggest that one synapse firing may require $10^4$ ATP\n", 
    "molecules, so around $4 \\times 10^4$ protons. If we take the human brain\n", 
    "as containing around $10^{14}$ synapses, and if we suggest each synapse\n", 
    "only fires about once every five seconds, we would require approximately\n", 
    "$10^{18}$ protons per second to power the synapses in our brain, with\n", 
    "each proton having six degrees of freedom. Under these rough\n", 
    "calculations the memory capacity distributed across the ATP synthase in\n", 
    "our brain must be of order $6 \\times 10^{18}$ bits per second or 750\n", 
    "petabytes of information per second. Of course this memory capacity\n", 
    "would be distributed across billions of neurons, each containing\n", 
    "hundreds or thousands of mitochondria, each of which can contain\n", 
    "thousands of ATP synthase molecules. By composing extremely small\n", 
    "systems we can see it’s possible to improve efficiencies in ways that\n", 
    "seem very impractical for a car engine.\n", 
    "\n", 
    "A quick note to clarify: here we’re referring to the information\n", 
    "requirements to make our brain more energy efficient in its information\n", 
    "processing, rather than the information processing capabilities of the\n", 
    "neurons themselves!\n", 
    "\n", 
    "## Jaynes’s Maximum Entropy Principle\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "In his seminal 1957 paper (Jaynes, 1957), Ed Jaynes proposed a\n", 
    "foundation for statistical mechanics based on information theory. 
Rather\n", 
    "than relying on ergodic hypotheses or ensemble interpretations, Jaynes\n", 
    "recast the problem of assigning probabilities in statistical mechanics\n", 
    "as a problem of inference with incomplete information.\n", 
    "\n", 
    "A central problem in statistical mechanics is assigning initial\n", 
    "probabilities when our knowledge is incomplete. For example, if we know\n", 
    "only the average energy of a system, what probability distribution\n", 
    "should we use? Jaynes argued that we should use the distribution that\n", 
    "maximizes entropy subject to the constraints of our knowledge.\n", 
    "\n", 
    "Jaynes illustrated the approach with a simple example: Suppose a die has\n", 
    "been tossed many times, with an average result of 4.5 rather than the\n", 
    "expected 3.5 for a fair die. What probability assignment $P_n$\n", 
    "($n=1,2,...,6$) should we make for the next toss?\n", 
    "\n", 
    "We need to satisfy two constraints: normalisation,\n", 
    "$\\sum_{n=1}^6 P_n = 1$, and the observed average,\n", 
    "$\\sum_{n=1}^6 n P_n = 4.5$.\n", 
    "\n", 
    "Many distributions could satisfy these constraints, but which one makes\n", 
    "the fewest unwarranted assumptions? Jaynes argued that we should choose\n", 
    "the distribution that is maximally noncommittal with respect to missing\n", 
    "information - the one that maximizes the entropy. This principle leads\n", 
    "to the exponential family of distributions, which in statistical\n", 
    "mechanics gives us the canonical ensemble and other familiar\n", 
    "distributions.\n", 
    "\n", 
    "## The General Maximum-Entropy Formalism\n", 
    "\n", 
    "For a more general case, suppose a quantity $x$ can take values\n", 
    "$(x_1, x_2, \\ldots, x_n)$ and we know the average values of several\n", 
    "functions $f_k(x)$. 
The problem is to find the probability assignment\n", 
    "$p_i = p(x_i)$ that satisfies the constraints,\n", 
    "$\\sum_{i=1}^n p_i f_k(x_i) = \\langle f_k \\rangle$, and maximizes the\n", 
    "entropy $S_I = -\\sum_{i=1}^n p_i \\log p_i$.\n", 
    "\n", 
    "Using Lagrange multipliers, the solution is the generalized canonical\n", 
    "distribution, $$\n", 
    "p_i = \\frac{1}{Z(\\lambda_1,\\ldots,\\lambda_m)} \\exp\\left(-\\sum_{k=1}^m \\lambda_k f_k(x_i)\\right),\n", 
    "$$ where\n", 
    "$Z(\\lambda_1,\\ldots,\\lambda_m) = \\sum_{i=1}^n \\exp\\left(-\\sum_{k=1}^m \\lambda_k f_k(x_i)\\right)$\n", 
    "is the partition function. The Lagrange multipliers $\\lambda_k$ are\n", 
    "determined by the constraints, $$\n", 
    "\\langle f_k \\rangle = -\\frac{\\partial}{\\partial \\lambda_k} \\log Z.\n", 
    "$$ The maximum attainable entropy is $$\n", 
    "S_I = \\log Z + \\sum_{k=1}^m \\lambda_k \\langle f_k \\rangle.\n", 
    "$$\n", 
    "\n", 
    "## Jaynes’ World\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Jaynes’ World is a zero-player game that implements a version of the\n", 
    "entropy game. The dynamical system is defined by a distribution,\n", 
    "$\\rho(Z)$, over a state space $Z$. The state space is partitioned into\n", 
    "observable variables $X$ and memory variables $M$. The memory variables\n", 
    "are considered to be in an *information reservoir*, a thermodynamic\n", 
    "system that maintains information in an ordered state (see e.g. Barato\n", 
    "and Seifert (2014)). The entropy of the whole system is bounded below by\n", 
    "0 and above by $N$. So the entropy forms a *compact manifold* with\n", 
    "respect to its parameters.\n", 
    "\n", 
    "Unlike the animal game, where decisions are made by reducing entropy at\n", 
    "each step, our system evolves mathematically by maximising the\n", 
    "instantaneous entropy production. Conceptually we can think of this as\n", 
    "*ascending* the gradient of the entropy, $S(Z)$.\n", 
    "\n", 
    "In the animal game the questioner starts with maximum uncertainty and\n", 
    "targets minimal uncertainty. Jaynes’ world starts with minimal\n", 
    "uncertainty and aims for maximum uncertainty.\n", 
    "\n", 
    "We can phrase this as a thought experiment. Imagine you are in the game,\n", 
    "at a given turn. You want to see where the game came from, so you look\n", 
    "back across turns. The direction the game came from is now the direction\n", 
    "of steepest descent. 
Regardless of where the game actually started, it\n", 
    "looks like it started at a minimal entropy configuration that we call\n", 
    "the *origin*. Similarly, wherever the game is actually stopped, there\n", 
    "will nevertheless appear to be an end point we call the *end* that will\n", 
    "be a configuration of maximal entropy, $N$.\n", 
    "\n", 
    "This speculation allows us to impose the functional form of our\n", 
    "probability distribution. As Jaynes has shown (Jaynes, 1957), the\n", 
    "stationary points of a free-form optimisation (minimum or maximum) will\n", 
    "place the distribution, $\\rho(Z)$, in the *exponential family*, $$\n", 
    "\\rho(Z) = h(Z) \\exp(\\boldsymbol{\\theta}^\\top T(Z) - A(\\boldsymbol{\\theta})),\n", 
    "$$ where $h(Z)$ is the base measure, $T(Z)$ are sufficient statistics,\n", 
    "$A(\\boldsymbol{\\theta})$ is the log-partition function and\n", 
    "$\\boldsymbol{\\theta}$ are the *natural parameters* of the distribution.\n", 
    "\n", 
    "This constraint to the exponential family is highly convenient as we\n", 
    "will rely on it heavily for the dynamics of the game. In particular, by\n", 
    "focussing on the *natural parameters* we find that we are optimising\n", 
    "within an *information geometry* (Amari, 2016). 
In exponential family\n", 
    "distributions, the entropy gradient is given by $$\n", 
    "\\mathbf{g} = \\nabla_{\\boldsymbol{\\theta}}S(Z) = -\\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta})\\,\\boldsymbol{\\theta},\n", 
    "$$ which follows from differentiating\n", 
    "$S = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta})$\n", 
    "(for constant base measure $h(Z)$). The Fisher information matrix,\n", 
    "$G(\\boldsymbol{\\theta})$, is also the *Hessian* of the manifold, $$\n", 
    "G(\\boldsymbol{\\theta}) = \\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}) = \\text{Cov}[T(Z)],\n", 
    "$$ so the gradient can be written\n", 
    "$\\mathbf{g} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$.\n", 
    "Traditionally, when optimising on an information geometry we take\n", 
    "*natural gradient* steps, equivalent to a Newton minimisation step, $$\n", 
    "\\Delta \\boldsymbol{\\theta} = - G(\\boldsymbol{\\theta})^{-1} \\mathbf{g},\n", 
    "$$ but this is not the direction that gives the instantaneous\n", 
    "maximisation of the entropy production; instead our gradient step is\n", 
    "given by $$\n", 
    "\\Delta \\boldsymbol{\\theta} = \\eta \\mathbf{g},\n", 
    "$$ where $\\eta$ is a ‘learning rate’.\n", 
    "\n", 
    "## System Evolution\n", 
    "\n", 
    "We are now in a position to summarise the start state and the end state\n", 
    "of our system, as well as to speculate on the nature of the transition\n", 
    "between the two states.\n", 
    "\n", 
    "## Start State\n", 
    "\n", 
    "The *origin configuration* is a low entropy state, with value near the\n", 
    "lower bound of 0. The information is highly structured; by definition we\n", 
    "place all variables in $M$, the information reservoir, at this time. The\n", 
    "uncertainty principle is present to handle the competing needs of\n", 
    "precision in parameters (giving us the near-singular form for\n", 
    "$\\boldsymbol{\\theta}(M)$) and capacity in the information channel that\n", 
    "$M$ provides (the capacity $c(\\boldsymbol{\\theta})$ is upper bounded by\n", 
    "$S(M)$).\n", 
    "\n", 
    "## End State\n", 
    "\n", 
    "The *end configuration* is a high entropy state, near the upper bound.\n", 
    "Both the minimal entropy and maximal entropy states are revealed by Ed\n", 
    "Jaynes’ variational minimisation approach and are in the exponential\n", 
    "family. 
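\n", 
    "\n", 
    "The ascent dynamics can be sketched for a single Bernoulli variable.\n", 
    "This is an illustrative addition assuming a uniform base measure, for\n", 
    "which differentiating $S(\\theta) = A(\\theta) - \\theta A'(\\theta)$\n", 
    "gives $\\text{d}S/\\text{d}\\theta = -A''(\\theta)\\theta$.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "# Bernoulli in natural coordinates: A(theta) = log(1 + e^theta).\n", 
    "def mean(theta):\n", 
    "    # A'(theta), the sigmoid / mean parameter\n", 
    "    return 1.0 / (1.0 + np.exp(-theta))\n", 
    "\n", 
    "def fisher(theta):\n", 
    "    # A''(theta) = G(theta), the variance of T(Z)\n", 
    "    p = mean(theta)\n", 
    "    return p * (1.0 - p)\n", 
    "\n", 
    "theta = 3.0  # low-entropy start (p close to 1)\n", 
    "eta = 0.5    # learning rate\n", 
    "for _ in range(2000):\n", 
    "    theta += eta * (-fisher(theta) * theta)  # entropy ascent step\n", 
    "\n", 
    "p = mean(theta)\n", 
    "S = -(p * np.log(p) + (1 - p) * np.log(1 - p))\n", 
    "print(theta, S)  # theta approaches 0, entropy approaches log 2\n", 
    "```\n", 
    "\n", 
    "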
In many cases a version of Zeno’s paradox will arise where the\n", "system asymptotes to the final state, taking smaller steps at each time.\n", "At this point the system is at equilibrium.\n", "\n", "## Jaynes’ World\n", "\n", "\\[edit\\]\n", "\n", "This game explores how structure, time, causality, and locality might\n", "emerge within a system governed solely by internal information-theoretic\n", "constraints. The hope is that it can serve as\n", "\n", "- A *research framework* for observer-free dynamics and entropy-based\n", "  emergence,\n", "- A *conceptual tool* for exploring the notion of an information\n", "  topography: a landscape in which information flows under\n", "  constraints.\n", "\n", "## Definitions and Global Constraints\n", "\n", "### System Structure\n", "\n", "Let $Z = \\{Z_1, Z_2, \\dots, Z_n\\}$ be the full set of system variables.\n", "At game turn $t$, define a partition in which $X(t) \\subseteq Z$ are the\n", "active variables (currently contributing to entropy) and\n", "$M(t) = Z \\setminus X(t)$ are the latent or frozen variables, stored in\n", "the form of an *information reservoir* (Barato and Seifert\n", "(2014), Parrondo et al. 
(2015)).\n", "\n", "### Representation via Density Matrix\n", "\n", "We’ll argue that the configuration space must be represented by a\n", "density matrix, $$\n", "\\rho(\\boldsymbol{\\theta}) = \\frac{1}{Z(\\boldsymbol{\\theta})} \\exp\\left( \\sum_i \\theta_i H_i \\right),\n", "$$ where $\\boldsymbol{\\theta} \\in \\mathbb{R}^d$ are the natural\n", "parameters, each $H_i$ is a Hermitian operator associated with the\n", "observables and the partition function is given by\n", "$Z(\\boldsymbol{\\theta}) = \\mathrm{Tr}[\\exp(\\sum_i \\theta_i H_i)]$.\n", "\n", "From this we can see that the *log-partition function*, which has an\n", "interpretation as the cumulant generating function, is $$\n", "A(\\boldsymbol{\\theta}) = \\log Z(\\boldsymbol{\\theta})\n", "$$ and the von Neumann *entropy* is $$\n", "S(\\boldsymbol{\\theta}) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla A(\\boldsymbol{\\theta}).\n", "$$ We can show that the *Fisher Information Matrix* is $$\n", "G_{ij}(\\boldsymbol{\\theta}) = \\frac{\\partial^2 A}{\\partial \\theta_i \\partial \\theta_j}.\n", "$$\n", "\n", "### Entropy Capacity and Resolution\n", "\n", "We define our system to have a *maximum entropy* of $N$ bits. If the\n", "dimension $d$ of the parameter space is fixed, this implies a *minimum\n", "detectable resolution* in natural parameter space, $$\n", "\\varepsilon \\sim \\frac{1}{2^N},\n", "$$ where changes in natural parameters smaller than $\\varepsilon$ are\n", "treated as *invisible* by the system. As a result, system dynamics\n", "exhibit *discrete, detectable transitions* between distinguishable\n", "states.\n", "\n", "Note that if the dimension $d$ scales with $N$ (e.g., $d = \\alpha N$ for some\n", "constant $\\alpha$), then the resolution constraint becomes more complex.\n", "In this case, the number of distinguishable states $(1/\\varepsilon)^d$\n", "must equal $2^N$, which leads to $\\varepsilon = 2^{-1/\\alpha}$, a\n", "constant independent of $N$. 
This suggests that as the system’s entropy\n", "capacity grows, it maintains a constant resolution while exponentially\n", "increasing the number of distinguishable states.\n", "\n", "## Dual Role of Parameters and Variables\n", "\n", "Each variable $Z_i$ is associated with a generator $H_i$, and a natural\n", "parameter $\\theta_i$. When we say a parameter $\\theta_i \\in X(t)$, we\n", "mean that the component of the system associated with $H_i$ is active at\n", "time $t$ and its parameter is evolving with\n", "$|\\dot{\\theta}_i| \\geq \\varepsilon$. This comes from the duality between\n", "*variables*, *observables*, and *natural parameters* that we find in\n", "exponential family representations and also see in a density matrix\n", "representation.\n", "\n", "## Core Axiom: Entropic Dynamics\n", "\n", "Our core axiom is that the system evolves by steepest ascent in entropy.\n", "The gradient of the entropy with respect to the natural\n", "parameters is given by $$\n", "\\nabla S[\\rho] = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}\n", "$$ and so we set $$\n", "\\frac{d\\boldsymbol{\\theta}}{dt} = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}.\n", "$$\n", "\n", "## Histogram Game\n", "\n", "\\[edit\\]\n", "\n", "To illustrate the concept of the Jaynes’ world entropy game we’ll run a\n", "simple example using a four bin histogram. 
The entropy of a four bin\n", "histogram can be computed as, $$\n", "S(p) = - \\sum_{i=1}^4 p_i \\log_2 p_i.\n", "$$\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "First we write some helper code to plot the histogram and compute its\n", "entropy.\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "```\n", "\n", "``` python\n", "def plot_histogram(ax, p, max_height=None):\n", " heights = p\n", " if max_height is None:\n", " max_height = 1.25*heights.max()\n", " \n", " # Safe entropy calculation that handles zeros\n", " nonzero_p = p[p > 0] # Filter out zeros\n", " S = - (nonzero_p*np.log2(nonzero_p)).sum()\n", "\n", " # Define bin edges\n", " bins = [1, 2, 3, 4, 5] # Bin edges\n", "\n", " # Create the histogram\n", " if ax is None:\n", " fig, ax = plt.subplots(figsize=(6, 4)) # Adjust figure size \n", " ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black') # Use weights for probabilities\n", "\n", "\n", " # Customize the plot for better slide presentation\n", " ax.set_xlabel(\"Bin\")\n", " ax.set_ylabel(\"Probability\")\n", " ax.set_title(f\"Four Bin Histogram (Entropy {S:.3f})\")\n", " ax.set_xticks(bins[:-1]) # Show correct x ticks\n", " ax.set_ylim(0,max_height) # Set y limit for visual appeal\n", "```\n", "\n", "We can compute the entropy of any given histogram.\n", "\n", "``` python\n", "\n", "# Define probabilities\n", "p = np.zeros(4)\n", "p[0] = 4/13\n", "p[1] = 3/13\n", "p[2] = 3.7/13\n", "p[3] = 1 - p.sum()\n", "\n", "# Safe entropy calculation\n", "nonzero_p = p[p > 0] # Filter out zeros\n", "entropy = - (nonzero_p*np.log2(nonzero_p)).sum()\n", "print(f\"The entropy of the histogram is {entropy:.3f}.\")\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "fig.tight_layout()\n", "plot_histogram(ax, 
p)\n", "ax.set_title(f\"Four Bin Histogram (Entropy {entropy:.3f})\")\n", "mlai.write_figure(filename='four-bin-histogram.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: The entropy of a four bin histogram.\n", "\n", "We can play the entropy game by starting with a histogram with all the\n", "probability mass in the first bin and then ascending the gradient of the\n", "entropy function.\n", "\n", "## Two-Bin Histogram Example\n", "\n", "The simplest possible example of Jaynes’ World is a two-bin histogram\n", "with probabilities $p$ and $1-p$. This minimal system allows us to\n", "visualize the entire entropy landscape.\n", "\n", "The natural parameter is the log odds, $\\theta = \\log\\frac{p}{1-p}$, and\n", "the update given by the entropy gradient is $$\n", "\\Delta \\theta_{\\text{steepest}} = \\eta \\frac{\\text{d}S}{\\text{d}\\theta} = \\eta p(1-p)(\\log(1-p) - \\log p).\n", "$$ The Fisher information is $$\n", "G(\\theta) = p(1-p).\n", "$$ This creates a dynamic where, as $p$ approaches either 0 or 1 (minimal\n", "entropy states), the Fisher information approaches zero, creating a\n", "“critical slowing” effect. This critical slowing is what leads to the\n", "formation of *information reservoirs*. 
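A quick numerical check of this gradient expression (a sketch added for illustration; the helper names and the finite-difference step size are not from the original code):

``` python
import numpy as np

def two_bin_entropy(theta):
    # Entropy (in nats) of the two-bin histogram as a function of
    # the natural parameter theta = log(p/(1-p))
    p = 1.0 / (1.0 + np.exp(-theta))
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def two_bin_entropy_gradient(theta):
    # Analytic form dS/dtheta = p(1-p)(log(1-p) - log p)
    p = 1.0 / (1.0 + np.exp(-theta))
    return p * (1 - p) * (np.log(1 - p) - np.log(p))

# Central-difference check at a few points on the landscape
h = 1e-6
for theta in [-3.0, -0.5, 0.0, 1.5]:
    numeric = (two_bin_entropy(theta + h) - two_bin_entropy(theta - h)) / (2 * h)
    assert abs(numeric - two_bin_entropy_gradient(theta)) < 1e-6
```

The gradient vanishes both at $\theta = 0$ (the maximum entropy state) and as $\theta \to \pm\infty$, where the Fisher information factor $p(1-p)$ drives the critical slowing described above.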
Note also that in the *natural\n", "gradient* the update is given by multiplying the gradient by the\n", "inverse Fisher information, which would lead to a more efficient update\n", "of the form, $$\n", "\\Delta \\theta_{\\text{natural}} = \\eta(\\log(1-p) - \\log p),\n", "$$ however, it is this efficiency that we want our game to avoid,\n", "because it is the inefficient behaviour in the region of saddle points\n", "that leads to critical slowing and the emergence of information\n", "reservoirs.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Python code for gradients\n", "p_values = np.linspace(0.000001, 0.999999, 10000)\n", "theta_values = np.log(p_values/(1-p_values))\n", "entropy = -p_values * np.log(p_values) - (1-p_values) * np.log(1-p_values)\n", "fisher_info = p_values * (1-p_values)\n", "gradient = fisher_info * (np.log(1-p_values) - np.log(p_values))\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=plot.big_wide_figsize)\n", "\n", "ax1.plot(theta_values, entropy)\n", "ax1.set_xlabel('$\\\\theta$')\n", "ax1.set_ylabel('Entropy $S(p)$')\n", "ax1.set_title('Entropy Landscape')\n", "\n", "ax2.plot(theta_values, gradient)\n", "ax2.set_xlabel('$\\\\theta$')\n", "ax2.set_ylabel('$\\\\nabla_\\\\theta S(p)$')\n", "ax2.set_title('Entropy Gradient vs. Position')\n", "\n", "mlai.write_figure(filename='two-bin-histogram-entropy-gradients.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Entropy gradients of the two bin histogram against\n", "position.\n", "\n", "This example reveals the entropy extrema at $p = 0$, $p = 0.5$, and\n", "$p = 1$. At minimal entropy ($p \\approx 0$ or $p \\approx 1$), the\n", "gradient approaches zero, creating natural information reservoirs. 
The\n", "dynamics slow dramatically near these points - these are the areas of\n", "critical slowing that create information reservoirs.\n", "\n", "## Gradient Ascent in Natural Parameter Space\n", "\n", "We can visualize the entropy maximization process by performing gradient\n", "ascent in the natural parameter space $\\theta$. Starting from a\n", "low-entropy state, we follow the gradient of entropy with respect to\n", "$\\theta$ to reach the maximum entropy state.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Helper functions for two-bin histogram\n", "def theta_to_p(theta):\n", " \"\"\"Convert natural parameter theta to probability p\"\"\"\n", " return 1.0 / (1.0 + np.exp(-theta))\n", "\n", "def p_to_theta(p):\n", " \"\"\"Convert probability p to natural parameter theta\"\"\"\n", " # Add small epsilon to avoid numerical issues\n", " p = np.clip(p, 1e-10, 1-1e-10)\n", " return np.log(p/(1-p))\n", "\n", "def entropy(theta):\n", " \"\"\"Compute entropy for given theta\"\"\"\n", " p = theta_to_p(theta)\n", " # Safe entropy calculation\n", " return -p * np.log2(p) - (1-p) * np.log2(1-p)\n", "\n", "def entropy_gradient(theta):\n", " \"\"\"Compute gradient of entropy with respect to theta\"\"\"\n", " p = theta_to_p(theta)\n", " return p * (1-p) * (np.log2(1-p) - np.log2(p))\n", "\n", "def plot_histogram(ax, theta, max_height=None):\n", " \"\"\"Plot two-bin histogram for given theta\"\"\"\n", " p = theta_to_p(theta)\n", " heights = np.array([p, 1-p])\n", " \n", " if max_height is None:\n", " max_height = 1.25\n", " \n", " # Compute entropy\n", " S = entropy(theta)\n", " \n", " # Create the histogram\n", " bins = [1, 2, 3] # Bin edges\n", " if ax is None:\n", " fig, ax = plt.subplots(figsize=(6, 4))\n", " ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black')\n", " \n", " # Customize the plot\n", " ax.set_xlabel(\"Bin\")\n", " ax.set_ylabel(\"Probability\")\n", " ax.set_title(f\"Two-Bin 
Histogram (Entropy {S:.3f})\")\n", " ax.set_xticks(bins[:-1])\n", " ax.set_ylim(0, max_height)\n", "```\n", "\n", "``` python\n", "# Parameters for gradient ascent\n", "theta_initial = -9.0 # Start with low entropy \n", "learning_rate = 1\n", "num_steps = 1500\n", "\n", "# Initialize\n", "theta_current = theta_initial\n", "theta_history = [theta_current]\n", "p_history = [theta_to_p(theta_current)]\n", "entropy_history = [entropy(theta_current)]\n", "\n", "# Perform gradient ascent in theta space\n", "for step in range(num_steps):\n", " # Compute gradient\n", " grad = entropy_gradient(theta_current)\n", " \n", " # Update theta\n", " theta_current = theta_current + learning_rate * grad\n", " \n", " # Store history\n", " theta_history.append(theta_current)\n", " p_history.append(theta_to_p(theta_current))\n", " entropy_history.append(entropy(theta_current))\n", " if step % 100 == 0:\n", " print(f\"Step {step+1}: θ = {theta_current:.4f}, p = {p_history[-1]:.4f}, Entropy = {entropy_history[-1]:.4f}\")\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "# Create a figure showing the evolution\n", "fig, axes = plt.subplots(2, 3, figsize=(15, 8))\n", "fig.tight_layout(pad=3.0)\n", "\n", "# Select steps to display\n", "steps_to_show = [0, 300, 600, 900, 1200, 1500]\n", "\n", "# Plot histograms for selected steps\n", "for i, step in enumerate(steps_to_show):\n", " row, col = i // 3, i % 3\n", " plot_histogram(axes[row, col], theta_history[step])\n", " axes[row, col].set_title(f\"Step {step}: θ = {theta_history[step]:.2f}, p = {p_history[step]:.3f}\")\n", "\n", "mlai.write_figure(filename='two-bin-histogram-evolution.svg', \n", " directory = './information-game')\n", "\n", "# Plot entropy evolution\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(range(num_steps+1), entropy_history, 'o-')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy 
Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='two-bin-entropy-evolution.svg', \n", " directory = './information-game')\n", "\n", "# Plot trajectory in theta space\n", "plt.figure(figsize=(10, 6))\n", "theta_range = np.linspace(-5, 5, 1000)\n", "entropy_curve = [entropy(t) for t in theta_range]\n", "plt.plot(theta_range, entropy_curve, 'b-', label='Entropy Landscape')\n", "plt.plot(theta_history, entropy_history, 'ro-', label='Gradient Ascent Path')\n", "plt.xlabel('Natural Parameter θ')\n", "plt.ylabel('Entropy')\n", "plt.title('Gradient Ascent Trajectory in Natural Parameter Space')\n", "plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)\n", "plt.legend()\n", "plt.grid(True)\n", "mlai.write_figure(filename='two-bin-trajectory.svg', \n", " directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Evolution of the two-bin histogram during gradient ascent in\n", "natural parameter space.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent for the two-bin\n", "histogram.\n", "\n", "\n", "\n", "Figure: Gradient ascent trajectory in the natural parameter space for\n", "the two-bin histogram.\n", "\n", "The gradient ascent visualization shows how the system evolves in the\n", "natural parameter space $\\theta$. Starting from a negative $\\theta$\n", "(corresponding to a low-entropy state with $p << 0.5$), the system\n", "follows the gradient of entropy with respect to $\\theta$ until it\n", "reaches $\\theta = 0$ (corresponding to $p = 0.5$), which is the maximum\n", "entropy state.\n", "\n", "Note that the maximum entropy occurs at $\\theta = 0$, which corresponds\n", "to $p = 0.5$. 
The gradient of entropy with respect to $\\theta$ is zero\n", "at this point, making it a stable equilibrium for the gradient ascent\n", "process.\n", "\n", "## Four Bin Histogram Entropy Game\n", "\n", "\\[edit\\]\n", "\n", "To play the game with the four bin histogram we represent the histogram\n", "parameters as a vector of length 4,\n", "$\\boldsymbol{\\lambda} = [\\lambda_1, \\lambda_2, \\lambda_3, \\lambda_4]$,\n", "and define the histogram probabilities to be\n", "$p_i = \\lambda_i^2 / \\sum_{j=1}^4 \\lambda_j^2$.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Define the entropy function\n", "def entropy(lambdas):\n", "    p = lambdas**2/(lambdas**2).sum()\n", "\n", "    # Safe entropy calculation\n", "    nonzero_p = p[p > 0]\n", "    nonzero_lambdas = lambdas[p > 0]\n", "    return np.log2(np.sum(lambdas**2))-np.sum(nonzero_p * np.log2(nonzero_lambdas**2))\n", "\n", "# Define the gradient of the entropy function\n", "def entropy_gradient(lambdas):\n", "    denominator = np.sum(lambdas**2)\n", "    p = lambdas**2/denominator\n", "\n", "    # Safe log calculation\n", "    log_terms = np.zeros_like(lambdas)\n", "    nonzero_idx = lambdas != 0\n", "    log_terms[nonzero_idx] = np.log2(np.abs(lambdas[nonzero_idx]))\n", "\n", "    p_times_lambda_entropy = -2*log_terms/denominator\n", "    const = (p*p_times_lambda_entropy).sum()\n", "    gradient = 2*lambdas*(p_times_lambda_entropy - const)\n", "    return gradient\n", "\n", "# Numerical gradient check\n", "def numerical_gradient(func, lambdas, h=1e-5):\n", "    numerical_grad = np.zeros_like(lambdas)\n", "    for i in range(len(lambdas)):\n", "        temp_lambda_plus = lambdas.copy()\n", "        temp_lambda_plus[i] += h\n", "        temp_lambda_minus = lambdas.copy()\n", "        temp_lambda_minus[i] -= h\n", "        numerical_grad[i] = (func(temp_lambda_plus) - func(temp_lambda_minus)) / (2 * h)\n", "    return numerical_grad\n", "```\n", "\n", "We can then ascend the gradient of the entropy function, starting at a\n", "parameter setting where the mass is placed in the first bin; we take\n", 
"$\\lambda_2 = \\lambda_3 = \\lambda_4 = 0.01$ and $\\lambda_1 = 100$.\n", "\n", "First, to check our code, we compare our numerical and analytic gradients.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Initial parameters (lambda)\n", "initial_lambdas = np.array([100, 0.01, 0.01, 0.01])\n", "\n", "# Gradient check\n", "numerical_grad = numerical_gradient(entropy, initial_lambdas)\n", "analytical_grad = entropy_gradient(initial_lambdas)\n", "print(\"Numerical Gradient:\", numerical_grad)\n", "print(\"Analytical Gradient:\", analytical_grad)\n", "print(\"Gradient Difference:\", np.linalg.norm(numerical_grad - analytical_grad)) # Check if close to zero\n", "```\n", "\n", "Now we can run the steepest ascent algorithm.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Steepest ascent algorithm\n", "lambdas = initial_lambdas.copy()\n", "\n", "learning_rate = 1\n", "turns = 15000\n", "entropy_values = []\n", "lambdas_history = []\n", "\n", "for _ in range(turns):\n", "    grad = entropy_gradient(lambdas)\n", "    lambdas += learning_rate * grad # update lambda for steepest ascent\n", "    entropy_values.append(entropy(lambdas))\n", "    lambdas_history.append(lambdas.copy())\n", "```\n", "\n", "We can plot the histogram at a set of chosen turn numbers to see the\n", "progress of the algorithm.\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot_at = [0, 100, 1000, 2500, 5000, 7500, 10000, 12500, turns-1]\n", "for i, turn in enumerate(plot_at):\n", "    # index the history by the turn number, not the loop counter\n", "    plot_histogram(ax, lambdas_history[turn]**2/(lambdas_history[turn]**2).sum(), 1)\n", "    # write the figure,\n", "    mlai.write_figure(filename=f'four-bin-histogram-turn-{i:02d}.svg', \n", "                      directory = './information-game')\n", "```\n", "\n", "``` python\n", "import notutils as nu\n", "from ipywidgets import IntSlider\n", 
"```\n", "\n", "``` python\n", "nu.display_plots('four-bin-histogram-turn-{sample:0>2}.svg', \n", "                 './information-game', \n", "                 sample=IntSlider(0, 0, 8, 1))\n", "```\n", "\n", "\n", "\n", "Figure: Intermediate stages of the histogram entropy game. After 0,\n", "1000, 5000, 10000 and 15000 iterations.\n", "\n", "And we can also plot the changing entropy as a function of the number of\n", "game turns.\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "ax.plot(range(turns), entropy_values)\n", "ax.set_xlabel(\"turns\")\n", "ax.set_ylabel(\"entropy\")\n", "ax.set_title(\"Entropy vs. turns (Steepest Ascent)\")\n", "mlai.write_figure(filename='four-bin-histogram-entropy-vs-turns.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Four bin histogram entropy game. The plot shows the\n", "increasing entropy against the number of turns across 15000 iterations\n", "of gradient ascent.\n", "\n", "Note that the entropy starts at a saddle point, increases rapidly,\n", "and then levels off towards the maximum entropy, with the gradient\n", "decreasing slowly in the manner of Zeno’s paradox.\n", "\n", "## Constructed Quantities and Lemmas\n", "\n", "### Variable Partition\n", "\n", "$$\n", "X(t) = \\left\\{ i \\mid \\left| \\frac{\\text{d}\\theta_i}{\\text{d}t} \\right| \\geq \\varepsilon \\right\\}, \\quad M(t) = Z \\setminus X(t)\n", "$$\n", "\n", "### Fisher Information Matrix Partitioning\n", "\n", "We partition the Fisher Information Matrix $G(\\boldsymbol{\\theta})$\n", "according to the active variables $X(t)$ and latent information\n", "reservoir $M(t)$: $$\n", "G(\\boldsymbol{\\theta}) = \n", "\\begin{bmatrix}\n", "G_{XX} & G_{XM} \\\\\n", "G_{MX} & G_{MM}\n", "\\end{bmatrix}\n", "$$ where $G_{XX}$ represents the information geometry within active\n", "variables, $G_{MM}$ within the latent reservoir, and\n", "$G_{XM} = G_{MX}^\\top$ captures the cross-coupling between active and\n", "latent 
components. This partitioning reveals how information flows\n", "between observable dynamics and the latent structure.\n", "\n", "### Lemma 1: Form of the Minimal Entropy Configuration\n", "\n", "The minimal-entropy state compatible with the system’s resolution\n", "constraint and regularity condition is represented by a density matrix\n", "of the exponential form, $$\n", "\\rho(\\boldsymbol{\\theta}_o) = \\frac{1}{Z(\\boldsymbol{\\theta}_o)} \\exp\\left( \\sum_i \\theta_{oi} H_i \\right),\n", "$$ where all components $\\theta_{oi}$ are sub-threshold, $$\n", "|\\dot{\\theta}_{oi}| < \\varepsilon.\n", "$$ This state minimizes entropy under the constraint that it remains\n", "regular, continuous, and detectable only above a resolution scale $\\varepsilon$.\n", "Its structure can be derived via a *minimum-entropy* analogue of Jaynes’\n", "formalism, using the same density matrix geometry but inverted\n", "optimization.\n", "\n", "### Lemma 2: Symmetry Breaking\n", "\n", "If $\\theta_k \\in M(t)$ and $|\\dot{\\theta}_k| \\geq \\varepsilon$, then $$\n", "\\theta_k \\in X(t + \\delta t).\n", "$$\n", "\n", "## Four-Bin Saddle Point Example\n", "\n", "\\[edit\\]\n", "\n", "To illustrate saddle points and information reservoirs, we need at least\n", "a 4-bin system. This creates a 3-dimensional parameter space where we\n", "can observe genuine saddle points.\n", "\n", "Consider a 4-bin system parameterized by natural parameters $\\theta_1$,\n", "$\\theta_2$, and $\\theta_3$ (with one constraint). 
A saddle point occurs\n", "where the gradient $\\nabla_\\theta S = 0$, but the Hessian has mixed\n", "eigenvalues - some positive, some negative.\n", "\n", "At these points, the eigendecomposition of the Fisher information\n", "matrix $G(\\theta)$ reveals:\n", "\n", "- Fast modes: large positive eigenvalues → rapid evolution\n", "- Slow modes: small positive eigenvalues → gradual evolution\n", "- Critical modes: near-zero eigenvalues → information reservoirs\n", "\n", "The eigenvectors of $G(\\theta)$ at the saddle point determine which\n", "parameter combinations form information reservoirs.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Exponential family entropy functions for 4-bin system\n", "def exponential_family_entropy(theta):\n", "    \"\"\"\n", "    Compute entropy of a 4-bin exponential family distribution\n", "    parameterized by natural parameters theta\n", "    \"\"\"\n", "    # Compute the log-partition function (normalization constant)\n", "    log_Z = np.log(np.sum(np.exp(theta)))\n", "\n", "    # Compute probabilities\n", "    p = np.exp(theta - log_Z)\n", "\n", "    # Compute entropy: -sum(p_i * log(p_i))\n", "    entropy = -np.sum(p * np.log(p), where=p>0)\n", "\n", "    return entropy\n", "\n", "def entropy_gradient(theta):\n", "    \"\"\"\n", "    Compute the gradient of the entropy with respect to theta\n", "    \"\"\"\n", "    # Compute the log-partition function (normalization constant)\n", "    log_Z = np.log(np.sum(np.exp(theta)))\n", "\n", "    # Compute probabilities\n", "    p = np.exp(theta - log_Z)\n", "\n", "    # Entropy gradient: -p_i theta_i + p_i (p . theta),\n", "    # i.e. -G(theta) theta with G(theta) = diag(p) - p p^T\n", "    return -p*theta + p*(np.dot(p, theta))\n", "\n", "# Add a gradient check function\n", "def check_gradient(theta, epsilon=1e-6):\n", "    \"\"\"\n", "    Check the analytical gradient against numerical gradient\n", "    \"\"\"\n", "    # Compute analytical gradient\n", "    analytical_grad = entropy_gradient(theta)\n", "\n", "    # Compute numerical gradient\n", "    numerical_grad = 
np.zeros_like(theta)\n", " for i in range(len(theta)):\n", " theta_plus = theta.copy()\n", " theta_plus[i] += epsilon\n", " entropy_plus = exponential_family_entropy(theta_plus)\n", " \n", " theta_minus = theta.copy()\n", " theta_minus[i] -= epsilon\n", " entropy_minus = exponential_family_entropy(theta_minus)\n", " \n", " numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " print(\"Analytical gradient:\", analytical_grad)\n", " print(\"Numerical gradient:\", numerical_grad)\n", " print(\"Difference:\", np.abs(analytical_grad - numerical_grad))\n", " \n", " return analytical_grad, numerical_grad\n", "\n", "# Project gradient to respect constraints (sum of theta is constant)\n", "def project_gradient(theta, grad):\n", " \"\"\"\n", " Project gradient to ensure sum constraint is respected\n", " \"\"\"\n", " # Project to space where sum of components is zero\n", " return grad - np.mean(grad)\n", "\n", "# Perform gradient ascent on entropy\n", "def gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1):\n", " \"\"\"\n", " Perform gradient ascent on entropy for 4-bin system\n", " \"\"\"\n", " theta = theta_init.copy()\n", " theta_history = [theta.copy()]\n", " entropy_history = [exponential_family_entropy(theta)]\n", " \n", " for _ in range(steps):\n", " # Compute gradient\n", " grad = entropy_gradient(theta)\n", " proj_grad = project_gradient(theta, grad)\n", " \n", " # Update parameters\n", " theta += learning_rate * proj_grad\n", " \n", " # Store history\n", " theta_history.append(theta.copy())\n", " entropy_history.append(exponential_family_entropy(theta))\n", " \n", " return np.array(theta_history), np.array(entropy_history)\n", "```\n", "\n", "``` python\n", "# Test the gradient calculation\n", "test_theta = np.array([0.5, -0.3, 0.1, -0.3])\n", "test_theta = test_theta - np.mean(test_theta) # Ensure constraint is satisfied\n", "print(\"Testing gradient calculation:\")\n", "analytical_grad, numerical_grad = 
check_gradient(test_theta)\n", "\n", "# Verify if we're ascending or descending\n", "entropy_before = exponential_family_entropy(test_theta)\n", "step_size = 0.01\n", "test_theta_after = test_theta + step_size * analytical_grad\n", "entropy_after = exponential_family_entropy(test_theta_after)\n", "print(f\"Entropy before step: {entropy_before}\")\n", "print(f\"Entropy after step: {entropy_after}\")\n", "print(f\"Change in entropy: {entropy_after - entropy_before}\")\n", "if entropy_after > entropy_before:\n", " print(\"We are ascending the entropy gradient\")\n", "else:\n", " print(\"We are descending the entropy gradient\")\n", "```\n", "\n", "``` python\n", "# Initialize with asymmetric distribution (away from saddle point)\n", "theta_init = np.array([1.0, -0.5, -0.2, -0.3])\n", "theta_init = theta_init - np.mean(theta_init) # Ensure constraint is satisfied\n", "\n", "# Run gradient ascent\n", "theta_history, entropy_history = gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1.0)\n", "\n", "# Create a grid for visualization\n", "x = np.linspace(-2, 2, 100)\n", "y = np.linspace(-2, 2, 100)\n", "X, Y = np.meshgrid(x, y)\n", "\n", "# Compute entropy at each grid point (with constraint on theta3 and theta4)\n", "Z = np.zeros_like(X)\n", "for i in range(X.shape[0]):\n", " for j in range(X.shape[1]):\n", " # Create full theta vector with constraint that sum is zero\n", " theta1, theta2 = X[i,j], Y[i,j]\n", " theta3 = -0.5 * (theta1 + theta2)\n", " theta4 = -0.5 * (theta1 + theta2)\n", " theta = np.array([theta1, theta2, theta3, theta4])\n", " Z[i,j] = exponential_family_entropy(theta)\n", "\n", "# Compute gradient field\n", "dX = np.zeros_like(X)\n", "dY = np.zeros_like(Y)\n", "for i in range(X.shape[0]):\n", " for j in range(X.shape[1]):\n", " # Create full theta vector with constraint\n", " theta1, theta2 = X[i,j], Y[i,j]\n", " theta3 = -0.5 * (theta1 + theta2)\n", " theta4 = -0.5 * (theta1 + theta2)\n", " theta = np.array([theta1, theta2, theta3, 
theta4])\n", " \n", " # Get full gradient and project\n", " grad = entropy_gradient(theta)\n", " proj_grad = project_gradient(theta, grad)\n", " \n", " # Store first two components\n", " dX[i,j] = proj_grad[0]\n", " dY[i,j] = proj_grad[1]\n", "\n", "# Normalize gradient vectors for better visualization\n", "norm = np.sqrt(dX**2 + dY**2)\n", "# Avoid division by zero\n", "norm = np.where(norm < 1e-10, 1e-10, norm)\n", "dX_norm = dX / norm\n", "dY_norm = dY / norm\n", "\n", "# A few gradient vectors for visualization\n", "stride = 10\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig = plt.figure(figsize=plot.big_wide_figsize)\n", "\n", "# Create contour lines only (no filled contours)\n", "contours = plt.contour(X, Y, Z, levels=15, colors='black', linewidths=0.8)\n", "plt.clabel(contours, inline=True, fontsize=8, fmt='%.2f')\n", "\n", "# Add gradient vectors (normalized for direction, but scaled by magnitude for visibility)\n", "plt.quiver(X[::stride, ::stride], Y[::stride, ::stride], \n", " dX_norm[::stride, ::stride], dY_norm[::stride, ::stride], \n", " color='r', scale=30, width=0.003, scale_units='width')\n", "\n", "# Plot the gradient ascent trajectory\n", "plt.plot(theta_history[:, 0], theta_history[:, 1], 'b-', linewidth=2, \n", " label='Gradient Ascent Path')\n", "plt.scatter(theta_history[0, 0], theta_history[0, 1], color='green', s=100, \n", " marker='o', label='Start')\n", "plt.scatter(theta_history[-1, 0], theta_history[-1, 1], color='purple', s=100, \n", " marker='*', label='End')\n", "\n", "# Add labels and title\n", "plt.xlabel('$\\\\theta_1$')\n", "plt.ylabel('$\\\\theta_2$')\n", "plt.title('Entropy Contours with Gradient Field')\n", "\n", "# Mark the saddle point (approximately at origin for this system)\n", "plt.scatter([0], [0], color='yellow', s=100, marker='*', \n", " edgecolor='black', zorder=10, label='Saddle Point')\n", "plt.legend()\n", 
"\n", "mlai.write_figure(filename='simplified-saddle-point-example.svg', \n", "                  directory = './information-game')\n", "\n", "# Plot entropy evolution during gradient ascent\n", "plt.figure(figsize=plot.big_figsize)\n", "plt.plot(entropy_history)\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='four-bin-entropy-evolution.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Visualisation of a saddle point projected down to two\n", "dimensions.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent on the four-bin\n", "system.\n", "\n", "The animation of system evolution would show initial rapid movement\n", "along high-eigenvalue directions, progressive slowing in directions with\n", "low eigenvalues and formation of information reservoirs in the\n", "critically slowed directions. Parameter-capacity uncertainty emerges\n", "naturally at the saddle point.\n", "\n", "### Entropy-Time\n", "\n", "$$\n", "\\tau(t) := S_{X(t)}(t)\n", "$$\n", "\n", "### Lemma 3: Monotonicity of Entropy-Time\n", "\n", "$$\n", "\\tau(t_2) \\geq \\tau(t_1) \\quad \\text{for all } t_2 > t_1\n", "$$\n", "\n", "### Corollary: Irreversibility\n", "\n", "$\\tau(t)$ increases monotonically, preventing time-reversal globally.\n", "\n", "## Conjecture: Frieden-Analogous Extremal Flow\n", "\n", "At points where the latent-to-active flow functional is locally\n", "extremal, the system may exhibit critical slowing where information\n", "reservoir variables are slow relative to active variables. It may be\n", "possible to separate the system entropy into an active-variable\n", "component, $I = S[\\rho_X]$, and an “intrinsic information”,\n", "$J = S[\\rho_{X|M}]$, allowing\n", "us to create an information measure analogous to B. 
Roy Frieden’s extreme\n", "physical information (Frieden (1998)) which allows derivation of locally\n", "valid differential equations that depend on the *information\n", "topography*.\n", "\n", "## From Maximum to Minimal Entropy\n", "\n", "\\[edit\\]\n", "\n", "Jaynes formulated his principle in terms of maximizing entropy, but we\n", "can also view certain problems as minimizing entropy under appropriate\n", "constraints. The duality becomes apparent when we consider the\n", "relationship between entropy and information.\n", "\n", "The maximum entropy principle finds the distribution that is maximally\n", "noncommittal given certain constraints. Conversely, we can seek the\n", "distribution that minimizes entropy subject to different constraints -\n", "this represents the distribution with maximum structure or information.\n", "\n", "Consider the uncertainty principle. When we seek states that minimize\n", "the product of position and momentum uncertainties, we are seeking\n", "minimal entropy states subject to the constraint of the uncertainty\n", "principle.\n", "\n", "The mathematical formalism remains the same, but with the optimization\n", "direction reversed and constraints placed on the expectations\n", "$\\mathbb{E}[g_k(X)]$, where $g_k$ are functions\n", "representing constraints different from simple averages.\n", "\n", "The solution still takes the form of an exponential family, $$\n", "\\rho(x) \\propto \\exp\\left(\\sum_k \\mu_k g_k(x)\\right),\n", "$$ where $\\mu_k$ are Lagrange multipliers for the constraints.\n", "\n", "## Minimal Entropy States in Quantum Systems\n", "\n", "The pure states of quantum mechanics are those that minimize von Neumann\n", "entropy $S = -\\text{Tr}(\\rho \\log \\rho)$ subject to the constraints of\n", "quantum mechanics.\n", "\n", "For example, coherent states minimize the entropy subject to constraints\n", "on the expectation values of position and momentum operators. 
These\n", "states achieve the minimum uncertainty allowed by quantum mechanics.\n", "\n", "## Uncertainty Principle\n", "\n", "\\[edit\\]\n", "\n", "One challenge is how to parameterise our exponential family. We’ve\n", "mentioned that the variables $Z$ are partitioned into observable\n", "variables $X$ and memory variables $M$. Given the minimal entropy\n", "initial state, the obvious initial choice is that at the origin all\n", "variables, $Z$, should be in the information reservoir, $M$. This\n", "implies that they are well determined and present a sensible choice for\n", "the source of our parameters.\n", "\n", "We define a mapping, $\\boldsymbol{\\theta}(M)$, that maps the information\n", "reservoir to a set of values that are equivalent to the *natural\n", "parameters*. If the entropy of these parameters is low, and the\n", "distribution $\\rho(\\boldsymbol{\\theta})$ is sharply peaked, then we can\n", "move from treating the memory mapping, $\\boldsymbol{\\theta}(\\cdot)$, as\n", "a random process to an assumption that it is a deterministic function.\n", "We can then follow gradients with respect to these $\\boldsymbol{\\theta}$\n", "values.\n", "\n", "This allows us to rewrite the distribution over $Z$ in a conditional\n", "form, $$\n", "\\rho(X|M) = h(X) \\exp(\\boldsymbol{\\theta}(M)^\\top T(X) - A(\\boldsymbol{\\theta}(M))).\n", "$$\n", "\n", "Unfortunately this assumption implies that $\\boldsymbol{\\theta}(\\cdot)$\n", "is a delta function, and since our representation is a compact manifold\n", "(bounded below by $0$ and above by $N$), it does not admit any such\n", "singularities.\n", "\n", "## Formal Derivation of the Uncertainty Principle\n", "\n", "We can derive the uncertainty principle formally from the\n", "information-theoretic properties of the system. 
Consider the mutual\n", "information between parameters $\\boldsymbol{\\theta}(M)$ and capacity\n", "variables $c(M)$: $$\n", "I(\\boldsymbol{\\theta}(M); c(M)) = h(\\boldsymbol{\\theta}(M)) + h(c(M)) - h(\\boldsymbol{\\theta}(M), c(M))\n", "$$ where $h(\\cdot)$ represents differential entropy.\n", "\n", "Since the total entropy of the system is bounded by $N$, we know that\n", "$h(\\boldsymbol{\\theta}(M), c(M)) \\leq N$. Additionally, for any two\n", "random variables, the mutual information satisfies\n", "$I(\\boldsymbol{\\theta}(M); c(M)) \\geq 0$, with equality if and only if\n", "they are independent.\n", "\n", "For our system to function as an effective information reservoir,\n", "$\\boldsymbol{\\theta}(M)$ and $c(M)$ cannot be independent - they must\n", "share information. This gives us, $$\n", "h(\\boldsymbol{\\theta}(M)) + h(c(M)) \\geq h(\\boldsymbol{\\theta}(M), c(M)) + I_{\\min}\n", "$$ where $I_{\\min} > 0$ is the minimum mutual information required for\n", "the system to function.\n", "\n", "For variables with fixed variance, differential entropy is maximized by\n", "Gaussian distributions. For a multivariate Gaussian with covariance\n", "matrix $\\Sigma$, the differential entropy is: $$\n", "h(\\mathcal{N}(0, \\Sigma)) = \\frac{1}{2}\\ln\\left((2\\pi e)^d|\\Sigma|\\right)\n", "$$ where $d$ is the dimensionality and $|\\Sigma|$ is the determinant of\n", "the covariance matrix.\n", "\n", "The Cramér-Rao inequality provides a lower bound on the variance of any\n", "unbiased estimator. If $\\boldsymbol{\\theta}$ is a parameter vector and\n", "$\\hat{\\boldsymbol{\\theta}}$ is an unbiased estimator, then: $$\n", "\\text{Cov}(\\hat{\\boldsymbol{\\theta}}) \\geq G^{-1}(\\boldsymbol{\\theta})\n", "$$ where $G(\\boldsymbol{\\theta})$ is the Fisher information matrix.\n", "\n", "In our context, the relationship between parameters\n", "$\\boldsymbol{\\theta}(M)$ and capacity variables $c(M)$ follows a similar\n", "bound. 
The Fisher information matrix for exponential family\n", "distributions has a special property: it equals the covariance of the\n", "sufficient statistics, which in our case are represented by the capacity\n", "variables $c(M)$. This gives us $$\n", "G(\\boldsymbol{\\theta}(M)) = \\text{Cov}(c(M))\n", "$$\n", "\n", "Applying the Cramér-Rao inequality we have $$\n", "\\text{Cov}(\\boldsymbol{\\theta}(M)) \\cdot \\text{Cov}(c(M)) \\geq G^{-1}(\\boldsymbol{\\theta}(M)) \\cdot G(\\boldsymbol{\\theta}(M)) = \\mathbf{I}\n", "$$ where $\\mathbf{I}$ is the identity matrix.\n", "\n", "For one-dimensional projections, this matrix inequality implies, $$\n", "\\text{Var}(\\boldsymbol{\\theta}(M)) \\cdot \\text{Var}(c(M)) \\geq 1\n", "$$ and converting to standard deviations we have $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq 1.\n", "$$\n", "\n", "When we incorporate the minimum mutual information constraint\n", "$I_{\\min}$, the bound tightens. Using the relationship between\n", "differential entropy and mutual information, we can derive $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k,\n", "$$ where $k = \\frac{1}{2\\pi e}e^{2I_{\\min}}$.\n", "\n", "This is our uncertainty principle, directly derived from\n", "information-theoretic constraints and the Cramér-Rao bound. It\n", "represents the fundamental trade-off between precision in parameter\n", "specification and capacity for information storage.\n", "\n", "## Definition of Capacity Variables\n", "\n", "We now provide a precise definition of the capacity variables $c(M)$.\n", "The capacity variables quantify the potential of memory variables to\n", "store information about observable variables. Mathematically, we define\n", "$c(M)$ as, $$\n", "c(M) = \\nabla_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}(M))\n", "$$ where $A(\\boldsymbol{\\theta})$ is the log-partition function from our\n", "exponential family distribution. 
This definition has a clear\n", "interpretation: $c(M)$ represents the expected values of the sufficient\n", "statistics under the current parameter values.\n", "\n", "This definition also naturally yields the Fourier relationship between\n", "parameters and capacity. In exponential families, the log-partition\n", "function and its derivatives form a Legendre transform pair, which is\n", "the mathematical basis for the Fourier duality we claim. Specifically,\n", "if we define the Fourier transform operator $\\mathcal{F}$ as the mapping\n", "that takes parameters to expected sufficient statistics, then: $$\n", "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)]\n", "$$\n", "\n", "## Capacity $\\leftrightarrow$ Precision Paradox\n", "\n", "This creates an apparent paradox: at minimal entropy states, the\n", "information reservoir must simultaneously maintain precision in the\n", "parameters $\\boldsymbol{\\theta}(M)$ (for accurate system representation)\n", "but it must also provide sufficient capacity $c(M)$ (for information\n", "storage).\n", "\n", "The trade-off can be expressed as, $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k,\n", "$$ where $k$ is a constant. This relationship can be recognised as a\n", "natural *uncertainty principle* that underpins the behaviour of the\n", "game. This principle is a necessary consequence of information theory.\n", "It follows from the requirement for the parameter-like states, $M$, to\n", "have both precision and high capacity (in the Shannon sense). 
The\n", "uncertainty principle ensures that when parameters are sharply defined\n", "(low $\\Delta\\boldsymbol{\\theta}$), the capacity variables have high\n", "uncertainty (high $\\Delta c$), allowing information to be encoded in\n", "their relationships rather than absolute values.\n", "\n", "This trade-off between precision and capacity directly parallels\n", "Shannon’s insights about information transmission (Shannon, 1948), where\n", "he demonstrated that increasing the precision of a signal requires\n", "increasing bandwidth or reducing noise immunity—creating an inherent\n", "trade-off in any communication system. Our formulation extends this\n", "principle to the information reservoir’s parameter space.\n", "\n", "In practice this means that the parameters $\\boldsymbol{\\theta}(M)$ and\n", "capacity variables $c(M)$ must form a Fourier-dual pair, $$\n", "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)].\n", "$$ This duality becomes important at saddle points when direct gradient\n", "ascent stalls.\n", "\n", "The mathematical formulation of the uncertainty principle in\n", "information-theoretic terms comes from Hirschman Jr (1957) and was later\n", "refined by Beckner (1975) and\n", "Białynicki-Birula and Mycielski (1975). These works demonstrated that\n", "Shannon’s information-theoretic entropy provides a natural framework for\n", "expressing the uncertainty principle, establishing a direct bridge\n", "between the mathematical formalism of quantum mechanics and information\n", "theory. Our capacity-precision trade-off follows this tradition,\n", "expressing the fundamental limits of information processing in our\n", "system.\n", "\n", "## Quantum vs Classical Information Reservoirs\n", "\n", "The uncertainty principle means that the game can exhibit quantum-like\n", "information processing regimes during evolution. 
This inspires an\n", "information-theoretic perspective on the quantum-classical transition.\n", "\n", "At minimal entropy states near the origin, the information reservoir has\n", "characteristics reminiscent of quantum systems.\n", "\n", "1. *Wave-like information encoding*: The information reservoir near the\n", " origin necessarily encodes information in distributed,\n", " interference-capable patterns due to the uncertainty principle\n", " between parameters $\\boldsymbol{\\theta}(M)$ and capacity variables\n", " $c(M)$.\n", "\n", "2. *Non-local correlations*: Parameters are highly correlated through\n", " the Fisher information matrix, creating structures where information\n", " is stored in relationships rather than individual variables.\n", "\n", "3. *Uncertainty-saturated regime*: The uncertainty relationship\n", " $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k$ is nearly\n", " saturated (approaches equality), similar to Heisenberg’s uncertainty\n", " principle in quantum systems and the entropic uncertainty relations\n", " established by Białynicki-Birula and Mycielski (1975).\n", "\n", "As the system evolves towards higher entropy states, a transition occurs\n", "where some variables exhibit classical behavior.\n", "\n", "1. *From wave-like to particle-like*: Variables transitioning from $M$\n", " to $X$ shift from storing information in interference patterns to\n", " storing it in definite values with statistical uncertainty.\n", "\n", "2. *Decoherence-like process*: The uncertainty product\n", " $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M)$ for these variables\n", " grows significantly larger than the minimum value $k$, indicating a\n", " departure from quantum-like behavior.\n", "\n", "3. 
*Local information encoding*: Information becomes increasingly\n", " encoded in local variables rather than distributed correlations.\n", "\n", "The saddle points in our entropy landscape mark critical transitions\n", "between quantum-like and classical information processing regimes. Near\n", "these points\n", "\n", "1. The critically slowed modes maintain quantum-like characteristics,\n", " functioning as coherent memory that preserves information through\n", " interference patterns.\n", "\n", "2. The rapidly evolving modes exhibit classical characteristics,\n", " functioning as incoherent processors that manipulate information\n", " through statistical operations.\n", "\n", "3. This natural separation creates a hybrid computational architecture\n", " where quantum-like memory interfaces with classical-like processing.\n", "\n", "The quantum-classical transition can be quantified using the moment\n", "generating function $M_Z(t)$. In quantum-like regimes, the MGF exhibits\n", "oscillatory behavior with complex analytic structure, whereas in\n", "classical regimes, it grows monotonically with simple analytic\n", "structure. The transition between these behaviors identifies variables\n", "moving between quantum-like and classical information processing modes.\n", "\n", "This perspective suggests that what we recognize as “quantum” versus\n", "“classical” behavior may fundamentally reflect different regimes of\n", "information processing - one optimized for coherent information storage\n", "(quantum-like) and the other for flexible information manipulation\n", "(classical-like). 
The emergence of both regimes from our\n", "entropy-maximizing model indicates that nature may exploit this\n", "computational architecture to optimize information processing across\n", "multiple scales.\n", "\n", "This formulation of the uncertainty principle in terms of information\n", "capacity and parameter precision follows the tradition established by\n", "Shannon (1948) and expanded upon by Hirschman Jr (1957) and others who\n", "connected information entropy uncertainty to Heisenberg’s uncertainty.\n", "\n", "## Quantitative Demonstration\n", "\n", "We can demonstrate this principle quantitatively through a simple model.\n", "Consider a two-dimensional system with memory variables $M = (m_1, m_2)$\n", "that map to parameters\n", "$\\boldsymbol{\\theta}(M) = (\\theta_1(m_1), \\theta_2(m_2))$. The capacity\n", "variables are $c(M) = (c_1(m_1), c_2(m_2))$.\n", "\n", "At minimal entropy, when the system is near the origin, the uncertainty\n", "product is exactly: $$\n", "\\Delta\\theta_i(m_i) \\cdot \\Delta c_i(m_i) = k\n", "$$ for each dimension $i$.\n", "\n", "As the system evolves and entropy increases, some variables transition\n", "to classical behavior with: $$\n", "\\Delta\\theta_i(m_i) \\cdot \\Delta c_i(m_i) \\gg k\n", "$$\n", "\n", "This increased product reflects the transition from quantum-like to\n", "classical information processing. The variables that maintain the\n", "minimal uncertainty product $k$ continue to function as coherent\n", "information reservoirs, while those with larger uncertainty products\n", "function as classical processors.\n", "\n", "This principle provides testable predictions for any system modeled as\n", "an information reservoir. 
Specifically, we predict that variables\n", "functioning as effective memory must demonstrate precision-capacity\n", "trade-offs near the theoretical minimum $k$, while processing variables\n", "will show excess uncertainty above this minimum.\n", "\n", "## Maximum Entropy and Density Matrices\n", "\n", "\\[edit\\]\n", "\n", "Jaynes (1957) showed how the maximum entropy formalism is\n", "applied; in later papers, such as Jaynes (1963), he showed how his maximum\n", "entropy formalism could be applied to the von Neumann entropy of a density\n", "matrix.\n", "\n", "As Jaynes noted in his 1962 Brandeis lectures: “Assignment of initial\n", "probabilities must, in order to be useful, agree with the initial\n", "information we have (i.e., the results of measurements of certain\n", "parameters). For example, we might know that at time $t = 0$, a nuclear\n", "spin system having total (measured) magnetic moment $M(0)$, is placed in\n", "a magnetic field $H$, and the problem is to predict the subsequent\n", "variation $M(t)$… What initial density matrix for the spin system\n", "$\\rho(0)$, should we use?”\n", "\n", "Jaynes recognized that we should choose the density matrix that\n", "maximizes the von Neumann entropy, subject to the constraint from our\n", "measurements, $$\n", "\\text{Tr}[\\rho(0) M_{op}] = M(0),\n", "$$ where $M_{op}$ is the operator corresponding to total\n", "magnetic moment.\n", "\n", "The solution is the quantum version of the maximum entropy distribution, $$\n", "\\rho = \\frac{1}{Z}\\exp(-\\lambda_1 A_1 - \\cdots - \\lambda_m A_m),\n", "$$ where $A_i$ are the operators corresponding to measured observables,\n", "$\\lambda_i$ are Lagrange multipliers, and\n", "$Z = \\text{Tr}[\\exp(-\\lambda_1 A_1 - \\cdots - \\lambda_m A_m)]$ is the\n", "partition function.\n", "\n", "This unifies classical entropies and density matrix entropies under the\n", "same information-theoretic principle. 
It clarifies that quantum states\n", "with minimum entropy (pure states) represent maximum information, while\n", "mixed states represent incomplete information.\n", "\n", "Jaynes further noted that “strictly speaking, all this should be\n", "restated in terms of quantum theory using the density matrix formalism.\n", "This will introduce the $N!$ permutation factor, a natural zero for\n", "entropy, alteration of numerical values if discreteness of energy levels\n", "becomes comparable to $k_BT$, etc.”\n", "\n", "## Quantum States and Exponential Families\n", "\n", "\\[edit\\]\n", "\n", "The minimal entropy quantum states provide a connection between density\n", "matrices and exponential family distributions. This connection enables\n", "us to use many of the classical techniques from information geometry and\n", "apply them to the game in the case where the uncertainty principle is\n", "present.\n", "\n", "The minimal entropy density matrix belongs to an exponential family,\n", "just like many classical distributions.\n", "\n", "### Classical Exponential Family\n", "\n", "$$\n", "f(x; \\theta) = h(x) \\cdot \\exp[\\eta(\\theta)^\\top \\cdot T(x) - A(\\theta)]\n", "$$\n", "\n", "### Quantum Minimal Entropy State\n", "\n", "$$\n", "\\rho = \\exp(-\\mathbf{R}^\\top \\cdot \\mathbf{G} \\cdot \\mathbf{R} - Z)\n", "$$\n", "\n", "- Both have an exponential form\n", "- Both involve sufficient statistics (in the quantum case, these are\n", " quadratic forms of operators)\n", "- Both have natural parameters ($\\mathbf{G}$ in the quantum case)\n", "- Both include a normalization term\n", "\n", "The matrix $\\mathbf{G}$ in the minimal entropy state is directly related to the\n", "‘quantum Fisher information matrix’, $$\n", "\\mathbf{G} = \\text{QFIM}/4\n", "$$ where QFIM is the quantum Fisher information matrix, which quantifies\n", "how sensitively the state responds to parameter changes.\n", "\n", "This creates a link between\n", "\n", "1. Minimal entropy (maximum order)\n", "2. 
Uncertainty (fundamental quantum limitations)\n", "3. Information (ability to estimate parameters precisely)\n", "\n", "The relationship implies, $$\n", "V \\cdot \\text{QFIM} \\geq \\frac{\\hbar^2}{4}\n", "$$ which connects the covariance matrix (uncertainties) to the Fisher\n", "information (precision in parameter estimation).\n", "\n", "These minimal entropy states may be physically related to squeezed\n", "states in quantum optics. They are the states\n", "that achieve the ultimate precision allowed by quantum mechanics.\n", "\n", "## Minimal Entropy States\n", "\n", "\\[edit\\]\n", "\n", "In Jaynes’ World, we begin at a minimal entropy configuration - the\n", "“origin” state. Understanding the properties of these minimal entropy\n", "states is crucial for characterizing how the system evolves. These\n", "states are constrained by the uncertainty principle we previously\n", "identified: $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k$.\n", "\n", "This constraint is reminiscent of the Heisenberg uncertainty principle\n", "in quantum mechanics, where $\\Delta x \\cdot \\Delta p \\geq \\hbar/2$. This\n", "isn’t a coincidence - both represent limitations on precision arising\n", "from the mathematical structure of information. The total entropy of the\n", "system is constrained to be between 0 and $N$, forming a compact\n", "manifold with respect to its parameters. This upper bound $N$ ensures\n", "that as the system evolves from minimal to maximal entropy, it remains\n", "within a well-defined entropy space.\n", "\n", "## Structure of Minimal Entropy States\n", "\n", "The minimal entropy configuration under the uncertainty constraint takes\n", "a specific mathematical form. 
It is a pure state (in the sense of having\n", "minimal possible entropy) that exactly saturates the uncertainty bound.\n", "For a system with multiple degrees of freedom, the distribution takes a\n", "Gaussian form, $$\n", "\\rho(Z) = \\frac{1}{Z}\\exp(-\\mathbf{R}^T \\cdot \\mathbf{G} \\cdot \\mathbf{R}),\n", "$$ where $\\mathbf{R}$ represents the vector of all variables,\n", "$\\mathbf{G}$ is a positive definite matrix constrained by the\n", "uncertainty principle, and $Z$ is the normalization constant (partition\n", "function).\n", "\n", "This form is an exponential family distribution, in line with Jaynes’\n", "principle that entropy-optimized distributions belong to the exponential\n", "family. The matrix $\\mathbf{G}$ determines how uncertainty is\n", "distributed among different variables and their correlations.\n", "\n", "## Fisher Information and Minimal Uncertainty\n", "\n", "## Gradient Ascent and Uncertainty Principles\n", "\n", "\\[edit\\]\n", "\n", "In our exploration of information dynamics, we now turn to the\n", "relationship between gradient ascent on entropy and uncertainty\n", "principles. This section demonstrates how systems naturally evolve from\n", "quantum-like states (with minimal uncertainty) toward classical-like\n", "states (with excess uncertainty) through entropy maximization.\n", "\n", "For simplicity, we’ll focus on multivariate Gaussian distributions,\n", "where the uncertainty relationships are particularly elegant. In this\n", "setting, the precision matrix $\\Lambda$ (inverse of the covariance\n", "matrix) fully characterizes the distribution. 
The entropy of a\n", "multivariate Gaussian is directly related to the determinant of the\n", "covariance matrix, $$\n", "S = \\frac{1}{2}\\log\\det(V) + \\text{constant},\n", "$$ where $V = \\Lambda^{-1}$ is the covariance matrix.\n", "\n", "For conjugate variables like position and momentum, the Heisenberg\n", "uncertainty principle imposes constraints on the minimum product of\n", "their uncertainties. In our information-theoretic framework, this\n", "appears as a constraint on the determinant of certain submatrices of the\n", "covariance matrix.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Ellipse\n", "```\n", "\n", "The code below implements gradient ascent on the entropy of a\n", "multivariate Gaussian system while respecting uncertainty constraints.\n", "We’ll track how the system evolves from minimal uncertainty states\n", "(quantum-like) to states with excess uncertainty (classical-like).\n", "\n", "First, we define key functions for computing entropy and its gradient.\n", "\n", "``` python\n", "\n", "# Constants\n", "hbar = 1.0 # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "```\n", "\n", "``` python\n", "# Compute entropy of a multivariate Gaussian with precision matrix Lambda\n", "def compute_entropy(Lambda):\n", " \"\"\"\n", " Compute entropy of multivariate Gaussian with precision matrix Lambda.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " \n", " Returns:\n", " --------\n", " entropy: float\n", " Entropy value\n", " \"\"\"\n", " # Covariance matrix is inverse of precision matrix\n", " V = np.linalg.inv(Lambda)\n", " \n", " # Entropy formula for multivariate Gaussian\n", " n = Lambda.shape[0]\n", " entropy = 0.5 * np.log(np.linalg.det(V)) + 0.5 * n * (1 + np.log(2*np.pi))\n", " \n", " return entropy\n", "\n", "# Compute gradient of entropy with respect to precision 
matrix\n", "def compute_entropy_gradient(Lambda):\n", " \"\"\"\n", " Compute gradient of entropy with respect to precision matrix.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " \n", " Returns:\n", " --------\n", " gradient: array\n", " Gradient of entropy\n", " \"\"\"\n", " # Gradient is -0.5 * inverse of Lambda\n", " V = np.linalg.inv(Lambda)\n", " gradient = -0.5 * V\n", " \n", " return gradient\n", "```\n", "\n", "The `compute_entropy` function calculates the entropy of a multivariate\n", "Gaussian distribution from its precision matrix. The\n", "`compute_entropy_gradient` function computes the gradient of entropy\n", "with respect to the precision matrix, which is essential for our\n", "gradient ascent procedure.\n", "\n", "Next, we implement functions to handle the constraints imposed by the\n", "uncertainty principle:\n", "\n", "``` python\n", "\n", "# Project gradient to respect uncertainty constraints\n", "def project_gradient(eigenvalues, gradient):\n", " \"\"\"\n", " Project gradient to respect minimum uncertainty constraints.\n", " \n", " Parameters:\n", " -----------\n", " eigenvalues: array\n", " Eigenvalues of precision matrix\n", " gradient: array\n", " Gradient vector\n", " \n", " Returns:\n", " --------\n", " projected_gradient: array\n", " Gradient projected to respect constraints\n", " \"\"\"\n", " n_pairs = len(eigenvalues) // 2\n", " projected_gradient = gradient.copy()\n", " \n", " # For each position-momentum pair\n", " for i in range(n_pairs):\n", " idx1, idx2 = 2*i, 2*i+1\n", " \n", " # Check if we're at the uncertainty boundary\n", " product = 1.0 / (eigenvalues[idx1] * eigenvalues[idx2])\n", " \n", " if product <= min_uncertainty_product * 1.01:\n", " # We're at or near the boundary\n", " # Project gradient to maintain the product\n", " avg_grad = 0.5 * (gradient[idx1]/eigenvalues[idx1] + gradient[idx2]/eigenvalues[idx2])\n", " projected_gradient[idx1] = avg_grad * eigenvalues[idx1]\n", " 
projected_gradient[idx2] = avg_grad * eigenvalues[idx2]\n", " \n", " return projected_gradient\n", "\n", "# Initialize a multidimensional state with position-momentum pairs\n", "def initialize_multidimensional_state(n_pairs, squeeze_factors=None, with_cross_connections=False):\n", " \"\"\"\n", " Initialize a precision matrix for multiple position-momentum pairs.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " squeeze_factors: list or None\n", " Factors determining the position-momentum squeezing\n", " with_cross_connections: bool\n", " Whether to initialize with cross-connections between pairs\n", " \n", " Returns:\n", " --------\n", " Lambda: array\n", " Precision matrix\n", " \"\"\"\n", " if squeeze_factors is None:\n", " squeeze_factors = [0.1 + 0.05*i for i in range(n_pairs)]\n", " \n", " # Total dimension (position + momentum)\n", " dim = 2 * n_pairs\n", " \n", " # Initialize with diagonal precision matrix\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Set eigenvalues based on squeeze factors\n", " for i in range(n_pairs):\n", " squeeze = squeeze_factors[i]\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Initialize with identity eigenvectors\n", " eigenvectors = np.eye(dim)\n", " \n", " # If requested, add cross-connections by mixing eigenvectors\n", " if with_cross_connections:\n", " # Create a random orthogonal matrix for mixing\n", " Q, _ = np.linalg.qr(np.random.randn(dim, dim))\n", " \n", " # Apply moderate mixing - not fully random to preserve some structure\n", " mixing_strength = 0.3\n", " eigenvectors = (1 - mixing_strength) * eigenvectors + mixing_strength * Q\n", " \n", " # Re-orthogonalize\n", " eigenvectors, _ = np.linalg.qr(eigenvectors)\n", " \n", " # Construct precision matrix from eigendecomposition\n", " Lambda = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T\n", " \n", " 
return Lambda\n", "```\n", "\n", "The `project_gradient` function ensures that our gradient ascent\n", "respects the uncertainty principle by projecting the gradient to\n", "maintain minimum uncertainty products when necessary. The\n", "`initialize_multidimensional_state` function creates a starting state\n", "with multiple position-momentum pairs, each initialized to the minimum\n", "uncertainty allowed by the uncertainty principle, but with different\n", "“squeeze factors” that determine the shape of the uncertainty ellipse.\n", "\n", "``` python\n", "\n", "# Add gradient check function\n", "def check_entropy_gradient(Lambda, epsilon=1e-6):\n", " \"\"\"\n", " Check the analytical gradient of entropy against numerical gradient.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " epsilon: float\n", " Small perturbation for numerical gradient\n", " \n", " Returns:\n", " --------\n", " analytical_grad: array\n", " Analytical gradient with respect to eigenvalues\n", " numerical_grad: array\n", " Numerical gradient with respect to eigenvalues\n", " \"\"\"\n", " # Get eigendecomposition\n", " eigenvalues, eigenvectors = eigh(Lambda)\n", " \n", " # Compute analytical gradient: dS/d(lambda_i) = -1/(2 lambda_i) for a Gaussian\n", " analytical_grad = -0.5 / eigenvalues\n", " \n", " # Compute numerical gradient\n", " numerical_grad = np.zeros_like(eigenvalues)\n", " for i in range(len(eigenvalues)):\n", " # Perturb eigenvalue up\n", " eigenvalues_plus = eigenvalues.copy()\n", " eigenvalues_plus[i] += epsilon\n", " Lambda_plus = eigenvectors @ np.diag(eigenvalues_plus) @ eigenvectors.T\n", " entropy_plus = compute_entropy(Lambda_plus)\n", " \n", " # Perturb eigenvalue down\n", " eigenvalues_minus = eigenvalues.copy()\n", " eigenvalues_minus[i] -= epsilon\n", " Lambda_minus = eigenvectors @ np.diag(eigenvalues_minus) @ eigenvectors.T\n", " entropy_minus = compute_entropy(Lambda_minus)\n", " \n", " # Compute numerical gradient\n", " numerical_grad[i] = (entropy_plus - 
entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " print(\"Analytical gradient:\", analytical_grad)\n", " print(\"Numerical gradient:\", numerical_grad)\n", " print(\"Difference:\", np.abs(analytical_grad - numerical_grad))\n", " \n", " return analytical_grad, numerical_grad\n", "```\n", "\n", "Now we implement the main gradient ascent procedure.\n", "\n", "``` python\n", "\n", "\n", "# Perform gradient ascent on entropy\n", "def gradient_ascent_entropy(Lambda_init, n_steps=100, learning_rate=0.01):\n", " \"\"\"\n", " Perform gradient ascent on entropy while respecting uncertainty constraints.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_init: array\n", " Initial precision matrix\n", " n_steps: int\n", " Number of gradient steps\n", " learning_rate: float\n", " Learning rate for gradient ascent\n", " \n", " Returns:\n", " --------\n", " Lambda_history: list\n", " History of precision matrices\n", " entropy_history: list\n", " History of entropy values\n", " \"\"\"\n", " Lambda = Lambda_init.copy()\n", " Lambda_history = [Lambda.copy()]\n", " entropy_history = [compute_entropy(Lambda)]\n", " \n", " for step in range(n_steps):\n", " # Compute gradient of entropy\n", " grad_matrix = compute_entropy_gradient(Lambda)\n", " \n", " # Diagonalize Lambda to work with eigenvalues\n", " eigenvalues, eigenvectors = eigh(Lambda)\n", " \n", " # Transform gradient to eigenvalue space\n", " grad = np.diag(eigenvectors.T @ grad_matrix @ eigenvectors)\n", " \n", " # Project gradient to respect constraints\n", " proj_grad = project_gradient(eigenvalues, grad)\n", " \n", " # Update eigenvalues\n", " eigenvalues += learning_rate * proj_grad\n", " \n", " # Ensure eigenvalues remain positive\n", " eigenvalues = np.maximum(eigenvalues, 1e-10)\n", " \n", " # Reconstruct Lambda from updated eigenvalues\n", " Lambda = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T\n", " \n", " # Store history\n", " Lambda_history.append(Lambda.copy())\n", " 
entropy_history.append(compute_entropy(Lambda))\n", " \n", " return Lambda_history, entropy_history\n", "```\n", "\n", "The `gradient_ascent_entropy` function implements the core optimization\n", "procedure. It performs gradient ascent on the entropy while respecting\n", "the uncertainty constraints. The algorithm works in the eigenvalue space\n", "of the precision matrix, which makes it easier to enforce constraints\n", "and ensure the matrix remains positive definite.\n", "\n", "To analyze the results, we implement functions to track uncertainty\n", "metrics and detect interesting dynamics:\n", "\n", "``` python\n", "\n", "# Track uncertainty products and regime classification\n", "def track_uncertainty_metrics(Lambda_history):\n", " \"\"\"\n", " Track uncertainty products and classify regimes for each conjugate pair.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " \n", " Returns:\n", " --------\n", " metrics: dict\n", " Dictionary containing uncertainty metrics over time\n", " \"\"\"\n", " n_steps = len(Lambda_history)\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " # Initialize tracking arrays\n", " uncertainty_products = np.zeros((n_steps, n_pairs))\n", " regimes = np.zeros((n_steps, n_pairs), dtype=object)\n", " \n", " for step, Lambda in enumerate(Lambda_history):\n", " # Get covariance matrix\n", " V = np.linalg.inv(Lambda)\n", " \n", " # Calculate Fisher information matrix\n", " G = Lambda / 2\n", " \n", " # For each conjugate pair\n", " for i in range(n_pairs):\n", " # Extract 2x2 submatrix for this pair\n", " idx1, idx2 = 2*i, 2*i+1\n", " V_sub = V[np.ix_([idx1, idx2], [idx1, idx2])]\n", " \n", " # Compute uncertainty product (determinant of submatrix)\n", " uncertainty_product = np.sqrt(np.linalg.det(V_sub))\n", " uncertainty_products[step, i] = uncertainty_product\n", " \n", " # Classify regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 
0.1*min_uncertainty_product:\n", " regimes[step, i] = \"Quantum-like\"\n", " else:\n", " regimes[step, i] = \"Classical-like\"\n", " \n", " return {\n", " 'uncertainty_products': uncertainty_products,\n", " 'regimes': regimes\n", " }\n", "```\n", "\n", "The `track_uncertainty_metrics` function analyzes the evolution of\n", "uncertainty products for each position-momentum pair and classifies them\n", "as either “quantum-like” (near minimum uncertainty) or “classical-like”\n", "(with excess uncertainty). This classification helps us understand how\n", "the system transitions between these regimes during entropy\n", "maximization.\n", "\n", "We also implement a function to detect saddle points in the gradient\n", "flow, which are critical for understanding the system’s dynamics:\n", "\n", "``` python\n", "\n", "# Detect saddle points in the gradient flow\n", "def detect_saddle_points(Lambda_history):\n", " \"\"\"\n", " Detect saddle-like behavior in the gradient flow.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " \n", " Returns:\n", " --------\n", " saddle_metrics: dict\n", " Metrics related to saddle point behavior\n", " \"\"\"\n", " n_steps = len(Lambda_history)\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " # Track eigenvalues and their gradients\n", " eigenvalues_history = np.zeros((n_steps, 2*n_pairs))\n", " gradient_ratios = np.zeros((n_steps, n_pairs))\n", " \n", " for step, Lambda in enumerate(Lambda_history):\n", " # Get eigenvalues\n", " eigenvalues, _ = eigh(Lambda)\n", " eigenvalues_history[step] = eigenvalues\n", " \n", " # For each pair, compute ratio of gradients\n", " if step > 0:\n", " for i in range(n_pairs):\n", " idx1, idx2 = 2*i, 2*i+1\n", " \n", " # Change in eigenvalues\n", " delta1 = abs(eigenvalues_history[step, idx1] - eigenvalues_history[step-1, idx1])\n", " delta2 = abs(eigenvalues_history[step, idx2] - eigenvalues_history[step-1, idx2])\n", " \n", " # Ratio of 
max to min (high ratio indicates saddle-like behavior)\n", " max_delta = max(delta1, delta2)\n", " min_delta = max(1e-10, min(delta1, delta2)) # Avoid division by zero\n", " gradient_ratios[step, i] = max_delta / min_delta\n", " \n", " # Identify candidate saddle points (where some gradients are much larger than others)\n", " saddle_candidates = []\n", " for step in range(1, n_steps):\n", " if np.any(gradient_ratios[step] > 10): # Threshold for saddle-like behavior\n", " saddle_candidates.append(step)\n", " \n", " return {\n", " 'eigenvalues_history': eigenvalues_history,\n", " 'gradient_ratios': gradient_ratios,\n", " 'saddle_candidates': saddle_candidates\n", " }\n", "```\n", "\n", "The `detect_saddle_points` function identifies points in the gradient\n", "flow where some eigenvalues change much faster than others, indicating\n", "saddle-like behavior. These saddle points are important because they\n", "represent critical transitions in the system’s evolution.\n", "\n", "Finally, we implement visualization functions to help us understand the\n", "system’s behavior:\n", "\n", "``` python\n", "\n", "# Visualize uncertainty ellipses for multiple pairs\n", "def plot_multidimensional_uncertainty(Lambda_history, step_indices, pairs_to_plot=None):\n", " \"\"\"\n", " Plot the evolution of uncertainty ellipses for multiple position-momentum pairs.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " step_indices: list\n", " Indices of steps to visualize\n", " pairs_to_plot: list, optional\n", " Indices of position-momentum pairs to plot\n", " \"\"\"\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " if pairs_to_plot is None:\n", " pairs_to_plot = range(min(3, n_pairs)) # Plot up to 3 pairs by default\n", " \n", " fig, axes = plt.subplots(len(pairs_to_plot), len(step_indices), \n", " figsize=(4*len(step_indices), 3*len(pairs_to_plot)))\n", " \n", " # Handle case of single pair or single step\n", " if 
len(pairs_to_plot) == 1:\n", " axes = axes.reshape(1, -1)\n", " if len(step_indices) == 1:\n", " axes = axes.reshape(-1, 1)\n", " \n", " for row, pair_idx in enumerate(pairs_to_plot):\n", " for col, step in enumerate(step_indices):\n", " ax = axes[row, col]\n", " Lambda = Lambda_history[step]\n", " covariance = np.linalg.inv(Lambda)\n", " \n", " # Extract 2x2 submatrix for this pair\n", " idx1, idx2 = 2*pair_idx, 2*pair_idx+1\n", " cov_sub = covariance[np.ix_([idx1, idx2], [idx1, idx2])]\n", " \n", " # Get eigenvalues and eigenvectors of submatrix\n", " values, vectors = eigh(cov_sub)\n", " \n", " # Calculate ellipse parameters\n", " angle = np.degrees(np.arctan2(vectors[1, 0], vectors[0, 0]))\n", " width, height = 2 * np.sqrt(values)\n", " \n", " # Create ellipse\n", " ellipse = Ellipse((0, 0), width=width, height=height, angle=angle,\n", " edgecolor='blue', facecolor='lightblue', alpha=0.5)\n", " \n", " # Add to plot\n", " ax.add_patch(ellipse)\n", " ax.set_xlim(-3, 3)\n", " ax.set_ylim(-3, 3)\n", " ax.set_aspect('equal')\n", " ax.grid(True)\n", " \n", " # Add minimum uncertainty circle\n", " min_circle = plt.Circle((0, 0), min_uncertainty_product, \n", " fill=False, color='red', linestyle='--')\n", " ax.add_patch(min_circle)\n", " \n", " # Compute uncertainty product\n", " uncertainty_product = np.sqrt(np.linalg.det(cov_sub))\n", " \n", " # Determine regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", " regime = \"Quantum-like\"\n", " color = 'red'\n", " else:\n", " regime = \"Classical-like\"\n", " color = 'blue'\n", " \n", " # Add labels\n", " if row == 0:\n", " ax.set_title(f\"Step {step}\")\n", " if col == 0:\n", " ax.set_ylabel(f\"Pair {pair_idx+1}\")\n", " \n", " # Add uncertainty product text\n", " ax.text(0.05, 0.95, f\"ΔxΔp = {uncertainty_product:.2f}\",\n", " transform=ax.transAxes, fontsize=10, verticalalignment='top')\n", " \n", " # Add regime text\n", " ax.text(0.05, 0.85, regime, 
transform=ax.transAxes, \n", " fontsize=10, verticalalignment='top', color=color)\n", " \n", " ax.set_xlabel(\"Position\")\n", " ax.set_ylabel(\"Momentum\")\n", " \n", " plt.tight_layout()\n", " return fig\n", "```\n", "\n", "The `plot_multidimensional_uncertainty` function visualizes the\n", "uncertainty ellipses for multiple position-momentum pairs at different\n", "steps of the gradient ascent process. These visualizations help us\n", "understand how the system transitions from quantum-like to\n", "classical-like regimes.\n", "\n", "This implementation builds on the `InformationReservoir` class we saw\n", "earlier, but generalizes to multiple position-momentum pairs and focuses\n", "specifically on the uncertainty relationships. The key connection is\n", "that both implementations track how systems naturally evolve from\n", "minimal entropy states (with quantum-like uncertainty relations) toward\n", "maximum entropy states (with classical-like uncertainty relations).\n", "\n", "As the system evolves through gradient ascent, we observe transitions.\n", "\n", "1. *Uncertainty desaturation*: The system begins with a minimal entropy\n", " state that exactly saturates the uncertainty bound\n", " ($\\Delta x \\cdot \\Delta p = \\hbar/2$). As entropy increases, this\n", " bound becomes less tightly saturated.\n", "\n", "2. *Shape transformation*: The initial highly squeezed uncertainty\n", " ellipse (with small position uncertainty and large momentum\n", " uncertainty) gradually becomes more circular, representing a more\n", " balanced distribution of uncertainty.\n", "\n", "3. 
*Quantum-to-classical transition*: The system transitions from a\n", "    quantum-like regime (where uncertainty is at the minimum allowed by\n", "    quantum mechanics) to a more classical-like regime (where\n", "    statistical uncertainty dominates over quantum uncertainty).\n", "\n", "This evolution reveals how information naturally flows from highly\n", "ordered configurations toward maximum entropy states, while still\n", "respecting the fundamental constraints imposed by the uncertainty\n", "principle.\n", "\n", "In systems with multiple position-momentum pairs, the gradient ascent\n", "process encounters saddle points, near which it naturally slows down:\n", "some eigenvalue pairs evolve quickly while others hardly change. These\n", "saddle points represent partially equilibrated states where some\n", "degrees of freedom have reached maximum entropy while others remain\n", "ordered. At these critical points, some variables maintain quantum-like\n", "characteristics (uncertainty saturation) while others exhibit\n", "classical-like behavior (excess uncertainty).\n", "\n", "This natural separation creates a hybrid system where quantum-like\n", "memory interfaces with classical-like processing - emerging naturally\n", "from the geometry of the entropy landscape under uncertainty\n", "constraints.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "```\n", "\n", "``` python\n", "# Constants\n", "hbar = 1.0  # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "\n", "# Verify gradient calculation\n", "print(\"Testing gradient calculation:\")\n", "test_Lambda = np.array([[2.0, 0.5], [0.5, 1.0]])  # Example precision matrix\n", "analytical_grad, numerical_grad = check_entropy_gradient(test_Lambda)\n", "\n", "# Verify if we're ascending or descending\n", "entropy_before = compute_entropy(test_Lambda)\n", "eigenvalues, eigenvectors = eigh(test_Lambda)\n", 
"step_size = 0.01\n", "eigenvalues_after = eigenvalues + step_size * analytical_grad\n", "test_Lambda_after = eigenvectors @ np.diag(eigenvalues_after) @ eigenvectors.T\n", "entropy_after = compute_entropy(test_Lambda_after)\n", "\n", "print(f\"Entropy before step: {entropy_before}\")\n", "print(f\"Entropy after step: {entropy_after}\")\n", "print(f\"Change in entropy: {entropy_after - entropy_before}\")\n", "if entropy_after > entropy_before:\n", "    print(\"We are ascending the entropy gradient\")\n", "else:\n", "    print(\"We are descending the entropy gradient\")\n", "\n", "test_grad = compute_entropy_gradient(test_Lambda)\n", "print(f\"Precision matrix:\\n{test_Lambda}\")\n", "print(f\"Entropy gradient:\\n{test_grad}\")\n", "print(f\"Entropy: {compute_entropy(test_Lambda):.4f}\")\n", "# Initialize system with 2 position-momentum pairs\n", "n_pairs = 2\n", "Lambda_init = initialize_multidimensional_state(n_pairs, squeeze_factors=[0.1, 0.5])\n", "# Run gradient ascent\n", "n_steps = 100\n", "Lambda_history, entropy_history = gradient_ascent_entropy(Lambda_init, n_steps, learning_rate=0.01)\n", "\n", "# Track metrics\n", "uncertainty_metrics = track_uncertainty_metrics(Lambda_history)\n", "saddle_metrics = detect_saddle_points(Lambda_history)\n", "\n", "# Print results\n", "print(\"\\nFinal entropy:\", entropy_history[-1])\n", "print(\"Initial uncertainty products:\", uncertainty_metrics['uncertainty_products'][0])\n", "print(\"Final uncertainty products:\", uncertainty_metrics['uncertainty_products'][-1])\n", "print(\"Saddle point candidates at steps:\", saddle_metrics['saddle_candidates'])\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Ellipse\n", "import mlai.plot as plot\n", "import mlai\n", "\n", "# Plot entropy evolution\n", "plt.figure(figsize=plot.big_wide_figsize)\n", "plt.plot(entropy_history)\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='entropy-evolution-during-gradient-ascent.svg', \n", "                  
directory='./information-game')\n", "\n", "# Plot uncertainty products evolution\n", "plt.figure(figsize=plot.big_wide_figsize)\n", "for i in range(n_pairs):\n", "    plt.plot(uncertainty_metrics['uncertainty_products'][:, i], \n", "             label=f'Pair {i+1}')\n", "plt.axhline(y=min_uncertainty_product, color='k', linestyle='--', \n", "            label='Minimum uncertainty')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Uncertainty Product (ΔxΔp)')\n", "plt.title('Evolution of Uncertainty Products')\n", "plt.legend()\n", "plt.grid(True)\n", "\n", "mlai.write_figure(filename='uncertainty-products-evolution.svg', \n", "                  directory='./information-game')\n", "\n", "\n", "\n", "# Plot uncertainty ellipses at key steps\n", "step_indices = [0, 20, 50, 99]  # Initial, early, middle, final\n", "plot_multidimensional_uncertainty(Lambda_history, step_indices)\n", "\n", "# Plot eigenvalues evolution\n", "plt.subplots(figsize=plot.big_wide_figsize)\n", "for i in range(2*n_pairs):\n", "    plt.semilogy(saddle_metrics['eigenvalues_history'][:, i], \n", "                 label=f'$\\\\lambda_{i+1}$')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Eigenvalue (log scale)')\n", "plt.title('Evolution of Precision Matrix Eigenvalues')\n", "plt.legend()\n", "plt.grid(True)\n", "plt.tight_layout()\n", "mlai.write_figure(filename='eigenvalue-evolution.svg', \n", "                  directory='./information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Eigenvalue evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Uncertainty products evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Uncertainty ellipses for each position-momentum pair at\n", "selected steps of the gradient ascent.\n", "\n", "## Visualising the Parameter-Capacity Uncertainty Principle\n", "\n", "\\[edit\\]\n", "\n", "The uncertainty principle between parameters $\\theta$ and capacity\n", "variables $c$ is a fundamental feature of information reservoirs. 
We can\n", "visualize this uncertainty relation using phase space plots.\n", "\n", "We can demonstrate how the uncertainty principle manifests in different\n", "regimes:\n", "\n", "1. **Quantum-like regime**: Near minimal entropy, the uncertainty\n", " product $\\Delta\\theta \\cdot \\Delta c$ approaches the lower bound\n", " $k$, creating wave-like interference patterns in probability space.\n", "\n", "2. **Transitional regime**: As entropy increases, uncertainty relations\n", " begin to decouple, with $\\Delta\\theta \\cdot \\Delta c > k$.\n", "\n", "3. **Classical regime**: At high entropy, parameter uncertainty\n", " dominates, creating diffusion-like dynamics with minimal influence\n", " from uncertainty relations.\n", "\n", "The visualization shows probability distributions for these three\n", "regimes in both parameter space and capacity space.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "from matplotlib.patches import Ellipse\n", "```\n", "\n", "``` python\n", "# Visualization of uncertainty ellipses\n", "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "\n", "# Parameters for uncertainty ellipses\n", "k = 1 # Uncertainty constant\n", "centers = [(0, 0), (2, 2), (4, 4)]\n", "widths = [0.25, 0.5, 2]\n", "heights = [4, 2.5, 2]\n", "#heights = [k/w for w in widths]\n", "colors = ['blue', 'green', 'red']\n", "labels = ['Quantum-like', 'Transitional', 'Classical']\n", "\n", "# Plot uncertainty ellipses\n", "for center, width, height, color, label in zip(centers, widths, heights, colors, labels):\n", " ellipse = Ellipse(center, width, height, \n", " edgecolor=color, facecolor='none', \n", " linewidth=2, label=label)\n", " ax.add_patch(ellipse)\n", " \n", " # Add text label\n", " ax.text(center[0], center[1] + height/2 + 0.2, \n", " label, ha='center', color=color)\n", " \n", " # Add area label (uncertainty product)\n", " area = width * 
height\n", "    ax.text(center[0], center[1] - height/2 - 0.3, \n", "            f'Area = {width:.2f} $\\\\times$ {height:.2f} $\\\\pi$', ha='center')\n", "\n", "# Set axis labels and limits\n", "ax.set_xlabel('Parameter $\\\\theta$')\n", "ax.set_ylabel('Capacity $C$')\n", "ax.set_xlim(-3, 7)\n", "ax.set_ylim(-3, 7)\n", "ax.set_aspect('equal')\n", "ax.grid(True, linestyle='--', alpha=0.7)\n", "ax.set_title('Parameter-Capacity Uncertainty Relation')\n", "\n", "# Add hyperbola representing constant uncertainty product\n", "x = np.linspace(0.25, 6, 100)\n", "y = k/x\n", "ax.plot(x, y, 'k--', alpha=0.5, label='Minimum uncertainty: $\\\\Delta \\\\theta \\\\Delta C = k$')\n", "\n", "ax.legend(loc='upper right')\n", "mlai.write_figure(filename='uncertainty-ellipses.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Visualisation of the uncertainty trade-off between parameter\n", "precision and capacity.\n", "\n", "This visualization helps explain why information reservoirs with\n", "quantum-like properties naturally emerge at minimal entropy. The\n", "uncertainty principle is not imposed but arises naturally from the\n", "constraints of Shannon information theory applied to physical systems\n", "operating at minimal entropy.\n", "\n", "## Scaling to Large Systems: Emergent Statistical Behavior\n", "\n", "\\[edit\\]\n", "\n", "We now extend our analysis to much larger systems with thousands of\n", "position-momentum pairs. This allows us to observe emergent statistical\n", "behaviors and phase transitions that aren’t apparent in smaller systems.\n", "\n", "Large-scale systems reveal how microscopic uncertainty constraints lead\n", "to macroscopic statistical patterns. 
By analyzing thousands of\n", "position-momentum pairs simultaneously, we can identify emergent\n", "behaviors and natural clustering of dynamical patterns.\n", "\n", "``` python\n", "# Optimized implementation for very large systems\n", "def large_scale_gradient_ascent(n_pairs, steps=100, learning_rate=1, sample_interval=5):\n", " \"\"\"\n", " Memory-efficient implementation of gradient ascent for very large systems.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps to take\n", " learning_rate: float\n", " Step size for gradient ascent\n", " sample_interval: int\n", " Store state every sample_interval steps to save memory\n", " \n", " Returns:\n", " --------\n", " sampled_states: list\n", " Sparse history of states at sampled intervals\n", " entropy_history: list\n", " Complete history of entropy values\n", " uncertainty_metrics: dict\n", " Metrics tracking uncertainty products over time\n", " \"\"\"\n", " # Initialize with diagonal precision matrix (no need to store full matrix)\n", " dim = 2 * n_pairs\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Initialize with minimal entropy state\n", " for i in range(n_pairs):\n", " squeeze = 0.1 * (1 + (i % 10)) # Cycle through 10 different squeeze factors\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Storage for results (sparse to save memory)\n", " sampled_states = []\n", " entropy_history = []\n", " uncertainty_products = np.zeros((steps+1, n_pairs))\n", " \n", " # Initial entropy and uncertainty\n", " entropy = 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(eigenvalues)))\n", " entropy_history.append(entropy)\n", " \n", " # Track initial uncertainty products\n", " for i in range(n_pairs):\n", " uncertainty_products[0, i] = 1.0 / np.sqrt(eigenvalues[2*i] * eigenvalues[2*i+1])\n", " \n", " # Store initial 
state\n", "    sampled_states.append(eigenvalues.copy())\n", "    \n", "    # Gradient ascent loop\n", "    for step in range(steps):\n", "        # Compute gradient with respect to eigenvalues (diagonal precision)\n", "        grad = -1.0 / (2.0 * eigenvalues)\n", "        \n", "        # Project gradient to respect constraints\n", "        for i in range(n_pairs):\n", "            idx1, idx2 = 2*i, 2*i+1\n", "            \n", "            # Current uncertainty product (in eigenvalue space, this is inverse)\n", "            current_product = eigenvalues[idx1] * eigenvalues[idx2]\n", "            \n", "            # If we're already at minimum uncertainty, project gradient\n", "            if abs(current_product - 1/min_uncertainty_product**2) < 1e-6:\n", "                # Feasible direction: shrink both eigenvalues, so the\n", "                # eigenvalue product decreases and the pair moves off the bound\n", "                tangent = np.array([-eigenvalues[idx2], -eigenvalues[idx1]])\n", "                tangent = tangent / np.linalg.norm(tangent)\n", "                \n", "                # Project the gradient onto this direction\n", "                pair_gradient = np.array([grad[idx1], grad[idx2]])\n", "                projection = np.dot(pair_gradient, tangent) * tangent\n", "                \n", "                grad[idx1] = projection[0]\n", "                grad[idx2] = projection[1]\n", "        \n", "        # Update eigenvalues\n", "        eigenvalues += learning_rate * grad\n", "        \n", "        # Ensure eigenvalues remain positive\n", "        eigenvalues = np.maximum(eigenvalues, 1e-10)\n", "        \n", "        # Compute entropy\n", "        entropy = 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(eigenvalues)))\n", "        entropy_history.append(entropy)\n", "        \n", "        # Track uncertainty products\n", "        for i in range(n_pairs):\n", "            uncertainty_products[step+1, i] = 1.0 / np.sqrt(eigenvalues[2*i] * eigenvalues[2*i+1])\n", "        \n", "        # Store state at sampled intervals\n", "        if step % sample_interval == 0 or step == steps-1:\n", "            sampled_states.append(eigenvalues.copy())\n", "    \n", "    # Compute regime classifications\n", "    regimes = np.zeros((steps+1, n_pairs), dtype=object)\n", "    for step in range(steps+1):\n", "        for i in range(n_pairs):\n", "            if abs(uncertainty_products[step, i] - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", "                regimes[step, i] = 
\"Quantum-like\"\n", " else:\n", " regimes[step, i] = \"Classical-like\"\n", " \n", " uncertainty_metrics = {\n", " 'uncertainty_products': uncertainty_products,\n", " 'regimes': regimes\n", " }\n", " \n", " return sampled_states, entropy_history, uncertainty_metrics\n", "\n", "# Add gradient check function for large systems\n", "def check_large_system_gradient(n_pairs=10, epsilon=1e-6):\n", " \"\"\"\n", " Check the analytical gradient against numerical gradient for a large system.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs to test\n", " epsilon: float\n", " Small perturbation for numerical gradient\n", " \n", " Returns:\n", " --------\n", " max_diff: float\n", " Maximum difference between analytical and numerical gradients\n", " \"\"\"\n", " # Initialize a small test system\n", " dim = 2 * n_pairs\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Initialize with minimal entropy state\n", " for i in range(n_pairs):\n", " squeeze = 0.1 * (1 + (i % 10))\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Compute analytical gradient\n", " analytical_grad = -1.0 / (2.0 * eigenvalues)\n", " \n", " # Compute numerical gradient\n", " numerical_grad = np.zeros_like(eigenvalues)\n", " \n", " # Function to compute entropy from eigenvalues\n", " def compute_entropy_from_eigenvalues(evals):\n", " return 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(evals)))\n", " \n", " # Initial entropy\n", " base_entropy = compute_entropy_from_eigenvalues(eigenvalues)\n", " \n", " # Compute numerical gradient\n", " for i in range(dim):\n", " # Perturb eigenvalue up\n", " eigenvalues_plus = eigenvalues.copy()\n", " eigenvalues_plus[i] += epsilon\n", " entropy_plus = compute_entropy_from_eigenvalues(eigenvalues_plus)\n", " \n", " # Perturb eigenvalue down\n", " eigenvalues_minus = eigenvalues.copy()\n", " eigenvalues_minus[i] -= 
epsilon\n", " entropy_minus = compute_entropy_from_eigenvalues(eigenvalues_minus)\n", " \n", " # Compute numerical gradient\n", " numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " diff = np.abs(analytical_grad - numerical_grad)\n", " max_diff = np.max(diff)\n", " avg_diff = np.mean(diff)\n", " \n", " print(f\"Gradient check for {n_pairs} position-momentum pairs:\")\n", " print(f\"Maximum difference: {max_diff:.8f}\")\n", " print(f\"Average difference: {avg_diff:.8f}\")\n", " \n", " # Verify gradient ascent direction\n", " step_size = 0.01\n", " eigenvalues_after = eigenvalues + step_size * analytical_grad\n", " entropy_after = compute_entropy_from_eigenvalues(eigenvalues_after)\n", " \n", " print(f\"Entropy before step: {base_entropy:.6f}\")\n", " print(f\"Entropy after step: {entropy_after:.6f}\")\n", " print(f\"Change in entropy: {entropy_after - base_entropy:.6f}\")\n", " \n", " if entropy_after > base_entropy:\n", " print(\"✓ Gradient ascent confirmed: entropy increases\")\n", " else:\n", " print(\"✗ Error: entropy decreases with gradient step\")\n", " \n", " return max_diff\n", "\n", "# Analyze statistical properties of large-scale system\n", "def analyze_large_system(uncertainty_metrics, n_pairs, steps):\n", " \"\"\"\n", " Analyze statistical properties of a large-scale system.\n", " \n", " Parameters:\n", " -----------\n", " uncertainty_metrics: dict\n", " Metrics from large_scale_gradient_ascent\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps taken\n", " \n", " Returns:\n", " --------\n", " analysis: dict\n", " Statistical analysis results\n", " \"\"\"\n", " uncertainty_products = uncertainty_metrics['uncertainty_products']\n", " regimes = uncertainty_metrics['regimes']\n", " \n", " # Compute statistics over time\n", " mean_uncertainty = np.mean(uncertainty_products, axis=1)\n", " std_uncertainty = np.std(uncertainty_products, axis=1)\n", " 
min_uncertainty_over_time = np.min(uncertainty_products, axis=1)\n", "    max_uncertainty_over_time = np.max(uncertainty_products, axis=1)\n", "    \n", "    # Count regime transitions\n", "    quantum_count = np.zeros(steps+1)\n", "    for step in range(steps+1):\n", "        quantum_count[step] = np.sum(regimes[step] == \"Quantum-like\")\n", "    \n", "    # Identify clusters of similar behavior\n", "    from sklearn.cluster import KMeans\n", "    \n", "    # Reshape to have each pair as a sample with its uncertainty trajectory as features\n", "    pair_trajectories = uncertainty_products.T  # shape: (n_pairs, steps+1)\n", "    \n", "    # Use fewer clusters for very large systems (but always at least two)\n", "    n_clusters = max(2, min(10, n_pairs // 100))\n", "    kmeans = KMeans(n_clusters=n_clusters, random_state=42)\n", "    cluster_labels = kmeans.fit_predict(pair_trajectories)\n", "    \n", "    # Count pairs in each cluster\n", "    cluster_counts = np.bincount(cluster_labels, minlength=n_clusters)\n", "    \n", "    # Get representative pairs from each cluster (closest to centroid)\n", "    representative_pairs = []\n", "    for i in range(n_clusters):\n", "        cluster_members = np.where(cluster_labels == i)[0]\n", "        if len(cluster_members) > 0:\n", "            # Find pair closest to cluster centroid\n", "            centroid = kmeans.cluster_centers_[i]\n", "            distances = np.linalg.norm(pair_trajectories[cluster_members] - centroid, axis=1)\n", "            closest_idx = cluster_members[np.argmin(distances)]\n", "            representative_pairs.append(closest_idx)\n", "    \n", "    return {\n", "        'mean_uncertainty': mean_uncertainty,\n", "        'std_uncertainty': std_uncertainty,\n", "        'min_uncertainty': min_uncertainty_over_time,\n", "        'max_uncertainty': max_uncertainty_over_time,\n", "        'quantum_count': quantum_count,\n", "        'quantum_fraction': quantum_count / n_pairs,\n", "        'cluster_counts': cluster_counts,\n", "        'representative_pairs': representative_pairs,\n", "        'cluster_labels': cluster_labels\n", "    }\n", "\n", "# Visualize results for large-scale system\n", "def visualize_large_system(sampled_states, 
entropy_history, uncertainty_metrics, analysis, n_pairs, steps):\n", " \"\"\"\n", " Create visualizations for large-scale system results.\n", " \n", " Parameters:\n", " -----------\n", " sampled_states: list\n", " Sparse history of eigenvalues\n", " entropy_history: list\n", " History of entropy values\n", " uncertainty_metrics: dict\n", " Uncertainty metrics over time\n", " analysis: dict\n", " Statistical analysis results\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps taken\n", " \"\"\"\n", " # Plot entropy evolution\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(entropy_history)\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Entropy')\n", " plt.title(f'Entropy Evolution for {n_pairs} Position-Momentum Pairs')\n", " plt.grid(True)\n", " \n", " # Plot uncertainty statistics\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(analysis['mean_uncertainty'], label='Mean uncertainty')\n", " plt.fill_between(range(steps+1), \n", " analysis['mean_uncertainty'] - analysis['std_uncertainty'],\n", " analysis['mean_uncertainty'] + analysis['std_uncertainty'],\n", " alpha=0.3, label='±1 std dev')\n", " plt.plot(analysis['min_uncertainty'], 'g--', label='Min uncertainty')\n", " plt.plot(analysis['max_uncertainty'], 'r--', label='Max uncertainty')\n", " plt.axhline(y=min_uncertainty_product, color='k', linestyle=':', label='Quantum limit')\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Uncertainty Product (ΔxΔp)')\n", " plt.title(f'Uncertainty Evolution Statistics for {n_pairs} Pairs')\n", " plt.legend()\n", " plt.grid(True)\n", " \n", " # Plot quantum-classical transition\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(analysis['quantum_fraction'] * 100)\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Percentage of Pairs (%)')\n", " plt.title('Percentage of Pairs in Quantum-like Regime')\n", " plt.ylim(0, 100)\n", " plt.grid(True)\n", " \n", " # Plot representative pairs from each 
cluster\n", " plt.figure(figsize=(12, 8))\n", " for i, pair_idx in enumerate(analysis['representative_pairs']):\n", " cluster_idx = analysis['cluster_labels'][pair_idx]\n", " count = analysis['cluster_counts'][cluster_idx]\n", " plt.plot(uncertainty_metrics['uncertainty_products'][:, pair_idx], \n", " label=f'Cluster {i+1} ({count} pairs, {count/n_pairs*100:.1f}%)')\n", " \n", " plt.axhline(y=min_uncertainty_product, color='k', linestyle=':', label='Quantum limit')\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Uncertainty Product ($\\Delta x \\Delta p$)')\n", " plt.title('Representative Uncertainty Trajectories from Each Cluster')\n", " plt.legend()\n", " plt.grid(True)\n", " \n", " # Visualize uncertainty ellipses for representative pairs\n", " if len(sampled_states) > 0:\n", " # Get indices of sampled steps\n", " sampled_steps = list(range(0, steps+1, (steps+1)//len(sampled_states)))\n", " if sampled_steps[-1] != steps:\n", " sampled_steps[-1] = steps\n", " \n", " # Only visualize a few representative pairs\n", " pairs_to_visualize = analysis['representative_pairs'][:min(4, len(analysis['representative_pairs']))]\n", " \n", " fig, axes = plt.subplots(len(pairs_to_visualize), len(sampled_states), \n", " figsize=(4*len(sampled_states), 3*len(pairs_to_visualize)))\n", " \n", " # Handle case of single pair or single step\n", " if len(pairs_to_visualize) == 1:\n", " axes = axes.reshape(1, -1)\n", " if len(sampled_states) == 1:\n", " axes = axes.reshape(-1, 1)\n", " \n", " for row, pair_idx in enumerate(pairs_to_visualize):\n", " for col, step_idx in enumerate(range(len(sampled_states))):\n", " ax = axes[row, col]\n", " eigenvalues = sampled_states[step_idx]\n", " \n", " # Extract eigenvalues for this pair\n", " idx1, idx2 = 2*pair_idx, 2*pair_idx+1\n", " pos_eigenvalue = eigenvalues[idx1]\n", " mom_eigenvalue = eigenvalues[idx2]\n", " \n", " # Convert precision eigenvalues to covariance eigenvalues\n", " cov_eigenvalues = np.array([1/pos_eigenvalue, 
1/mom_eigenvalue])\n", " \n", " # Calculate ellipse parameters (assuming principal axes aligned with coordinate axes)\n", " width, height = 2 * np.sqrt(cov_eigenvalues)\n", " \n", " # Create ellipse\n", " ellipse = Ellipse((0, 0), width=width, height=height, angle=0,\n", " edgecolor='blue', facecolor='lightblue', alpha=0.5)\n", " \n", " # Add to plot\n", " ax.add_patch(ellipse)\n", " ax.set_xlim(-3, 3)\n", " ax.set_ylim(-3, 3)\n", " ax.set_aspect('equal')\n", " ax.grid(True)\n", " \n", " # Add minimum uncertainty circle\n", " min_circle = plt.Circle((0, 0), min_uncertainty_product, \n", " fill=False, color='red', linestyle='--')\n", " ax.add_patch(min_circle)\n", " \n", " # Compute uncertainty product\n", " uncertainty_product = np.sqrt(1/(pos_eigenvalue * mom_eigenvalue))\n", " \n", " # Determine regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", " regime = \"Quantum-like\"\n", " color = 'red'\n", " else:\n", " regime = \"Classical-like\"\n", " color = 'blue'\n", " \n", " # Add labels\n", " if row == 0:\n", " step_num = sampled_steps[step_idx]\n", " ax.set_title(f\"Step {step_num}\")\n", " if col == 0:\n", " cluster_idx = analysis['cluster_labels'][pair_idx]\n", " count = analysis['cluster_counts'][cluster_idx]\n", " ax.set_ylabel(f\"Cluster {row+1}\\n({count} pairs)\")\n", " \n", " # Add uncertainty product text\n", " ax.text(0.05, 0.95, f\"ΔxΔp = {uncertainty_product:.2f}\",\n", " transform=ax.transAxes, fontsize=10, verticalalignment='top')\n", " \n", " # Add regime text\n", " ax.text(0.05, 0.85, regime, transform=ax.transAxes, \n", " fontsize=10, verticalalignment='top', color=color)\n", " \n", " ax.set_xlabel(\"Position\")\n", " ax.set_ylabel(\"Momentum\")\n", " \n", " plt.tight_layout()\n", "```\n", "\n", "In large-scale systems, we observe several emergent phenomena that\n", "aren’t apparent in smaller systems:\n", "\n", "1. 
*Statistical phase transitions*: As the system evolves, we observe a\n", " gradual transition from predominantly quantum-like behavior to\n", " predominantly classical-like behavior. This transition resembles a\n", " phase transition in statistical physics.\n", "\n", "2. *Natural clustering*: The thousands of position-momentum pairs\n", " naturally organize into clusters with similar dynamical behaviors.\n", " Some clusters maintain quantum-like characteristics for longer\n", " periods, while others quickly transition to classical-like behavior.\n", "\n", "3. *Scale-invariant patterns*: The statistical properties of the system\n", " show remarkable consistency across different scales, suggesting\n", " underlying universal principles in the entropy-uncertainty\n", " relationship.\n", "\n", "The quantum-classical boundary, which appears sharp in small systems,\n", "becomes a statistical property in large systems. At any given time, some\n", "fraction of the system exhibits quantum-like behavior while the\n", "remainder shows classical-like characteristics. This fraction evolves\n", "over time, creating a dynamic boundary between quantum and classical\n", "regimes.\n", "\n", "The clustering analysis reveals natural groupings of position-momentum\n", "pairs based on their dynamical trajectories. 
These clusters represent\n", "different “modes” of behavior within the large system, with some modes\n", "maintaining quantum coherence for longer periods while others quickly\n", "decohere into classical-like states.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "from sklearn.cluster import KMeans\n", "```\n", "\n", "``` python\n", "# Constants\n", "hbar = 1.0 # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "\n", "# Perform gradient check on a smaller test system\n", "print(\"Performing gradient check for large system implementation:\")\n", "gradient_error = check_large_system_gradient(n_pairs=10)\n", "print(f\"Gradient check completed with maximum error: {gradient_error:.8f}\")\n", "\n", "# Run large-scale simulation\n", "n_pairs = 5000 # 5000 position-momentum pairs (10,000×10,000 matrix)\n", "steps = 100 # Fewer steps for large system\n", "\n", "# Run the optimized implementation\n", "sampled_states, entropy_history, uncertainty_metrics = large_scale_gradient_ascent(\n", " n_pairs=n_pairs, steps=steps, learning_rate=0.01, sample_interval=5)\n", "\n", "# Analyze results\n", "analysis = analyze_large_system(uncertainty_metrics, n_pairs, steps)\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "from matplotlib.patches import Ellipse, Circle\n", "```\n", "\n", "``` python\n", "# Visualize results\n", "visualize_large_system(sampled_states, entropy_history, uncertainty_metrics, \n", " analysis, n_pairs, steps)\n", "\n", "# Additional plot: Phase transition visualization\n", "plt.figure(figsize=(10, 6))\n", "quantum_fraction = analysis['quantum_fraction'] * 100\n", "classical_fraction = 100 - quantum_fraction\n", "\n", "plt.stackplot(range(steps+1), \n", " [quantum_fraction, classical_fraction],\n", " labels=['Quantum-like', 'Classical-like'],\n", " colors=['red', 'blue'], alpha=0.7)\n", "\n", "plt.xlabel('Gradient Ascent Step')\n", 
"plt.ylabel('Percentage of System (%)')\n", "plt.title('Quantum-Classical Phase Transition')\n", "plt.legend(loc='center right')\n", "plt.ylim(0, 100)\n", "plt.grid(True)\n", "\n", "mlai.write_figure(filename='large-scale-gradient-ascent-quantum-classical-phase-transition.svg', \n", " directory='./information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Large-scale gradient ascent reveals a quantum-classical phase\n", "transition.\n", "\n", "The large-scale simulation reveals how microscopic uncertainty\n", "constraints lead to macroscopic statistical patterns. The system\n", "naturally organizes into regions of quantum-like and classical-like\n", "behavior, with a dynamic boundary that evolves over time.\n", "\n", "This perspective provides a new way to understand the quantum-classical\n", "transition not as a sharp boundary, but as a statistical property of\n", "large systems. The fraction of the system exhibiting quantum-like\n", "behavior gradually decreases as entropy increases, creating a smooth\n", "transition between quantum and classical regimes.\n", "\n", "This approach to large-scale quantum-classical systems provides a\n", "powerful framework for understanding how microscopic quantum constraints\n", "manifest in macroscopic statistical behaviors. 
It bridges quantum\n", "mechanics and statistical physics through the common language of\n", "information theory and entropy.\n", "\n", "## Saddle Points\n", "\n", "\\[edit\\]\n", "\n", "Saddle points represent critical transitions in the game’s evolution\n", "where the gradient $\nabla_{\boldsymbol{\theta}}S \approx 0$ but the\n", "game is not at a maximum or minimum. At these points:\n", "\n", "1. The Fisher information matrix $G(\boldsymbol{\theta})$ has\n", " eigenvalues with significantly different magnitudes\n", "2. Some eigenvalues approach zero, creating “critically slowed”\n", " directions in parameter space\n", "3. Other eigenvalues remain large, allowing rapid evolution in certain\n", " directions\n", "\n", "This creates a natural separation between “memory” variables (associated\n", "with near-zero eigenvalues) and “processing” variables (associated with\n", "large eigenvalues). The game’s behavior becomes highly non-isotropic in\n", "parameter space.\n", "\n", "At saddle points, direct gradient ascent stalls, and the game must\n", "leverage the Fourier duality between parameters and capacity variables\n", "to continue entropy production. The duality relationship $$\n", "c(M) = \mathcal{F}[\boldsymbol{\theta}(M)]\n", "$$ allows the game to progress by temporarily increasing uncertainty in\n", "capacity space, which creates gradients in previously flat directions of\n", "parameter space.\n", "\n", "These saddle points often coincide with phase transitions between\n", "parameter-dominated and capacity-dominated regimes, where the game’s\n", "fundamental character changes in terms of information processing\n", "capabilities.\n", "\n", "At saddle points, we see the first manifestation of the uncertainty\n", "principle that will be explored in more detail. The relationship between\n", "parameters and capacity variables becomes important as the game\n", "navigates these critical regions. 
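The memory/processing separation described above can be sketched numerically. This is a toy illustration with hypothetical numbers — the matrix `G` below is not derived from the game itself. Under steepest ascent the update resolved in each eigendirection of the Fisher information matrix scales with that direction’s eigenvalue, so near-zero eigenvalues give slow “memory” directions and large eigenvalues give fast “processing” directions.

``` python
import numpy as np

# Hypothetical Fisher information matrix with one near-zero eigenvalue
# (a "memory" direction) and one large eigenvalue (a "processing" direction).
G = np.array([[1.0, 0.005],
              [0.005, 1e-4]])
eigenvalues, V = np.linalg.eigh(G)  # eigenvalues returned in ascending order

theta = np.array([0.5, 0.5])
g = -G @ theta               # entropy gradient, using the identity g = -G(theta) theta
update_eigenbasis = V.T @ g  # steepest-ascent step resolved in the eigenbasis

for lam, du in zip(eigenvalues, update_eigenbasis):
    role = "memory (critically slowed)" if lam < 1e-2 else "processing"
    print(f"eigenvalue {lam:.1e}: update magnitude {abs(du):.1e} -> {role}")
```

The near-zero eigendirection receives an update several orders of magnitude smaller than the large one, which is the timescale separation the text describes.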
The Fourier duality relationship $$\n", "c(M) = \mathcal{F}[\boldsymbol{\theta}(M)]\n", "$$ is not just a mathematical convenience but represents a constraint on\n", "information processing that emerges from uncertainty\n", "principles. The duality is essential for understanding how the game\n", "maintains both precision in parameters and sufficient capacity for\n", "information storage.\n", "\n", "The emergence of critically slowed directions at saddle points directly\n", "leads to the formation of information reservoirs that we’ll explore in\n", "depth. These reservoirs form when certain parameter combinations become\n", "effectively “frozen” due to near-zero eigenvalues in the Fisher\n", "information matrix. This natural separation of timescales creates a\n", "hierarchical memory structure that resembles biological information\n", "processing systems, where different variables operate at different\n", "temporal scales. The game’s deliberate use of steepest ascent rather\n", "than natural gradient ensures these reservoirs form organically as the\n", "system evolves.\n", "\n", "## Saddle Point Seeking Behaviour\n", "\n", "In the game’s evolution, we follow steepest ascent in parameter space to\n", "maximize entropy. 
Let’s contrast with the *natural gradient* approach\n", "that is often used in information geometry.\n", "\n", "The steepest ascent direction in Euclidean space is given by, $$\n", "\\Delta \\boldsymbol{\\theta}_{\\text{steepest}} = \\eta \\nabla_{\\boldsymbol{\\theta}} S = \\eta \\mathbf{g}\n", "$$ where $\\eta$ is a learning rate and $\\mathbf{g}$ is the entropy\n", "gradient.\n", "\n", "In contrast, the natural gradient adjusts the update direction according\n", "to the Fisher information geometry, $$\n", "\\Delta \\boldsymbol{\\theta}_{\\text{natural}} = \\eta G(\\boldsymbol{\\theta})^{-1} \\nabla_{\\boldsymbol{\\theta}} S = \\eta G(\\boldsymbol{\\theta})^{-1} \\mathbf{g}\n", "$$ where $G(\\boldsymbol{\\theta})$ is the Fisher information matrix. This\n", "represents a Newton step in the natural parameter space. Often the\n", "Newton step is difficult to compute, but for exponential families and\n", "their entropies the Fisher information has a form closely related to the\n", "gradients and would be easy to leverage. The game *explicitly* uses\n", "steepest ascent and this leads to very different behaviour, in\n", "particular near saddle points. In this regime\n", "\n", "1. *Steepest ascent* slows dramatically in directions where the\n", " gradient is small, leading to extremely slow progress along the\n", " critically slowed modes. This actually helps the game by preserving\n", " information in these modes while allowing continued evolution in\n", " other directions.\n", "\n", "2. *Natural gradient* would normalize the updates by the Fisher\n", " information, potentially accelerating progress in critically slowed\n", " directions. This would destroy the natural emergence of information\n", " reservoirs that we desire.\n", "\n", "The use of steepest ascent rather than natural gradient is deliberate in\n", "our game. 
It allows the Fisher information matrix’s eigenvalue structure\n", "to directly influence the temporal dynamics, creating a natural\n", "separation of timescales that preserves information in critically slowed\n", "modes while allowing rapid evolution in others.\n", "\n", "As the game approaches a saddle point\n", "\n", "1. The gradient $\\nabla_{\\boldsymbol{\\theta}} S$ approaches zero in\n", " some directions but remains non-zero in others\n", "\n", "2. The eigendecomposition of the Fisher information matrix\n", " $G(\\boldsymbol{\\theta}) = V \\Lambda V^T$ reveals which directions\n", " are critically slowed\n", "\n", "3. Update magnitudes in different directions become proportional to\n", " their corresponding eigenvalues\n", "\n", "4. This creates the hierarchical timescale separation that forms the\n", " basis of our memory structure\n", "\n", "This behavior creates a computational architecture where different\n", "variables naturally assume different functional roles based on their\n", "update dynamics, without requiring explicit design. The information\n", "geometry of the parameter space, combined with steepest ascent dynamics,\n", "self-organizes the game into memory and processing components.\n", "\n", "The saddle point dynamics in Jaynes’ World provide a mathematical\n", "framework for understanding how the game navigates the information\n", "landscapes. 
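The contrast between the two update rules can be sketched numerically (illustrative numbers only, not derived from the game):

``` python
import numpy as np

# A diagonal Fisher matrix with one critically slowed direction
# (eigenvalue 1e-3) and one fast direction (eigenvalue 1.0).
G = np.diag([1e-3, 1.0])
theta = np.array([1.0, 1.0])
g = -G @ theta  # entropy gradient, using the identity g = -G(theta) theta
eta = 0.1

step_steepest = eta * g                     # slow mode barely moves: information preserved
step_natural = eta * np.linalg.solve(G, g)  # equals -eta * theta: both modes move equally

print("steepest ascent step:", step_steepest)   # slow-mode component ~1e-4, fast ~1e-1
print("natural gradient step:", step_natural)   # uniform components, no timescale separation
```

The steepest-ascent step inherits the Fisher eigenvalue spread, while the natural-gradient step washes it out — exactly the distinction that lets information reservoirs form under steepest ascent.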
The balance between fast-evolving “processing” variables and\n", "slow-evolving “memory” variables offers insights into how complexity\n", "might emerge in environments that instantaneously maximise entropy.\n", "\n", "## Dynamical System\n", "\n", "\\[edit\\]\n", "\n", "Consider a dynamical system governed by the equation, $$\n", "\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta},\n", "$$ where $\\boldsymbol{\\theta} \\in \\mathbb{R}^n$ represents the natural\n", "parameters of an exponential family distribution $\\rho_\\theta$, and\n", "$G(\\boldsymbol{\\theta})$ is the Fisher information matrix with elements:\n", "$$\n", "G_{ij}(\\boldsymbol{\\theta}) = \\mathbb{E}_{\\rho_\\theta}\\left[\\frac{\\partial \\log \\rho_\\theta}{\\partial \\theta_i}\\frac{\\partial \\log \\rho_\\theta}{\\partial \\theta_j}\\right]\n", "$$ This system describes the steepest ascent in the entropy of the\n", "distribution $\\rho_\\theta$, constrained to the manifold of exponential\n", "family distributions. Unlike natural gradient descent, which optimizes a\n", "cost function, this system maximizes the entropy of the underlying\n", "configuration governed by the exponential family distribution or density\n", "matrix.\n", "\n", "## Entropy Bounds and Compactness\n", "\n", "Recall that the entropy of the system is bounded such that, $$\n", "0 \\leq S[\\rho_\\theta] \\leq N\n", "$$ where $S[\\rho_\\theta] = -\\mathbb{E}_{\\rho_\\theta}[\\log \\rho_\\theta]$\n", "is the entropy functional, and $N$ represents the maximum possible\n", "entropy value for the system. 
These bounds create a compact manifold in\n", "the space of distributions, which constrains the parameter evolution.\n", "\n", "## Resolution Constraints\n", "\n", "The system exhibits a minimum resolution constraint, formulated as an\n", "uncertainty relation between parameters $\\boldsymbol{\\theta}$ and their\n", "conjugate variables (the gradients) $$\n", "\\Delta\\theta_i \\cdot \\Delta(\\nabla_{\\theta_i}) \\geq \\frac{c}{2},\n", "$$ where $c$ is a constant representing the minimum resolution of the\n", "system. This constraint imposes limits on the precision with which\n", "parameters can be simultaneously specified with their conjugate\n", "variables.\n", "\n", "# Multi-Scale Dynamics and Parameter Separation\n", "\n", "## Parameter Partitioning\n", "\n", "The parameter vector $\\boldsymbol{\\theta}$ can be partitioned into two\n", "subsets\n", "\n", "- $\\boldsymbol{\\theta}_M$: Parameters with gradients below the\n", " resolution threshold (slow-moving)\n", "- $\\boldsymbol{\\theta}_X$: Parameters with resolvable gradients\n", " (fast-moving)\n", "\n", "The Fisher information matrix can also be partitioned $$\n", "G(\\boldsymbol{\\theta}) = \\begin{bmatrix} G_{XX} & G_{XM} \\\\ G_{MX} & G_{MM} \\end{bmatrix}\n", "$$\n", "\n", "## Schur Complement Analysis\n", "\n", "The Schur complement of $G_{MM}$ in $G(\\boldsymbol{\\theta})$ is defined\n", "as $$\n", "G^\\prime_X = G_{XX} - G_{XM}G_{MM}^{-1}G_{MX}\n", "$$ This matrix $G^\\prime_X$ represents the effective information\n", "geometry for the fast parameters after accounting for their coupling to\n", "the slow parameters. 
It yields a dynamical equation for the fast\n", "parameters,\n", "$$\dot{\boldsymbol{\theta}}_X = -G^\prime_X\boldsymbol{\theta}_X + \text{correction terms}\n", "$$ The Schur complement provides a framework for analyzing how\n", "resolution constraints create a natural separation of time scales in the\n", "system’s evolution.\n", "\n", "## Sparsification Through Entropy Maximization\n", "\n", "*speculative*\n", "\n", "As the system evolves to maximize entropy, it should move toward states\n", "where parameters become more statistically independent, since mutual\n", "information between variables reduces the joint entropy below the sum of\n", "the marginal entropies. Any\n", "tendency toward independence during entropy maximization would cause the\n", "Fisher information matrix $G(\boldsymbol{\theta})$ to trend toward a\n", "more diagonal structure over time, as off-diagonal elements represent\n", "statistical correlations between parameters.\n", "\n", "# Action Functional Representation\n", "\n", "## Action Definition\n", "\n", "The dynamics of the system can be derived from an action functional $$\n", "A[\gamma] = \int_0^1 \dot{\gamma}(t)^\top G(\gamma(t)) \dot{\gamma}(t) \, \text{d}t,\n", "$$ where $\gamma(t)$ represents a path through parameter space.\n", "\n", "## Variational Analysis\n", "\n", "For the path that minimizes this action, the first variation must\n", "vanish, $$\n", "\left. 
\\frac{\\text{d}}{\\text{d}\\epsilon} A[\\gamma + \\epsilon \\eta] \\right|_{\\epsilon=0} = 0,\n", "$$ where $\\eta(t)$ is an arbitrary function with\n", "$\\eta(0) = \\eta(1) = 0$.\n", "\n", "Through variational calculus we recover the Euler-Lagrange equation, $$\n", "\\frac{\\text{d}}{\\text{d}t}(G(\\gamma)\\dot{\\gamma}) = \\frac{1}{2} \\dot{\\gamma}^T \\frac{\\partial G}{\\partial \\gamma} \\dot{\\gamma}\n", "$$\n", "\n", "## Time Parameterization\n", "\n", "To recover the original dynamical equation, we introduce the time\n", "parameterization, $$\n", "\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}}\n", "$$\n", "\n", "Under this parameterization, the Euler-Lagrange equation simplifies to\n", "our original dynamics. To prove this, we start with the parameterized\n", "path $\\gamma(t) = \\boldsymbol{\\theta}(\\tau(t))$, which gives $$\n", "\\dot{\\gamma} = \\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau} \\frac{\\text{d}\\tau}{\\text{d}t}.\n", "$$ Substituting this into the Euler-Lagrange equation and applying our\n", "specific parameterization, $$\n", "\\frac{\\text{d}}{\\text{d}t}(G(\\gamma)\\dot{\\gamma}) = \\frac{\\text{d}}{\\text{d}t}\\left(G(\\boldsymbol{\\theta})\\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau}\\frac{\\text{d}\\tau}{\\text{d}t}\\right) = \\frac{1}{2} \\dot{\\gamma}^\\top \\frac{\\partial G}{\\partial \\gamma} \\dot{\\gamma}\n", "$$\n", "\n", "With our choice of\n", "$\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}}$\n", "and after algebraic manipulation, this reduces to $$\n", "\\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}\n", "$$ and so our original dynamical equation when expressed in terms of the\n", "*system time* $\\tau$ confirming that our action functional correctly\n", "generates the original dynamics.\n", "\n", "## 
Information-Geometric Interpretation of Time Parameterization\n", "\n", "The time parameterization can be rewritten by recognizing that\n", "$G(\\boldsymbol{\\theta})\\boldsymbol{\\theta} = -\\nabla_\\boldsymbol{\\theta} S[\\rho_\\theta]$,\n", "the negative gradient of entropy with respect to the parameters $$\n", "\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}} = \\frac{1}{-\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]}.\n", "$$ The inverse relation is $$\n", "\\frac{\\text{d}t}{\\text{d}\\tau} = \\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta} = -\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]\n", "$$ which expresses the rate at which parameterized time flows relative\n", "to system time as the directional derivative of entropy along the\n", "parameter vector. It measures the entropy production rate of the system\n", "in the direction of the current parameter vector.\n", "\n", "# Information-Theoretic Interpretation\n", "\n", "## Entropy Maximization\n", "\n", "The dynamical system describes the steepest ascent path in entropy\n", "space, constrained by the structure of the density matrix\n", "representation. 
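As a concrete check, consider a minimal one-parameter example of our own (not from the exposition above): a zero-mean Gaussian with natural parameter $\theta = -1/(2\sigma^2)$, for which the Fisher information is $G(\theta) = 1/(2\theta^2)$ and $G(\theta)\theta = -\text{d}S/\text{d}\theta$. Euler-integrating $\dot{\theta} = -G(\theta)\theta$ should then increase the entropy monotonically.

``` python
import numpy as np

def entropy(theta):
    # Differential entropy of a zero-mean Gaussian with sigma^2 = -1/(2 theta)
    sigma2 = -1.0 / (2.0 * theta)  # theta < 0 for a normalizable density
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma2)

theta = -2.0  # sharp, low-entropy starting state
dt = 0.01
entropies = [entropy(theta)]
for _ in range(300):
    theta += dt * (-1.0 / (2.0 * theta))  # theta_dot = -G(theta) theta = -1/(2 theta)
    entropies.append(entropy(theta))

# Entropy increases monotonically along the flow
print(all(b > a for a, b in zip(entropies, entropies[1:])))  # True
```

Here $\theta$ climbs toward zero from below, the variance grows, and the entropy rises at every step, consistent with the steepest-ascent reading of the flow.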
As parameters evolve according to\n", "$\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$,\n", "we expect the system to move toward states of increasing statistical\n", "independence, which generally correspond to higher entropy\n", "configurations.\n", "\n", "## Information Flow and Topography\n", "\n", "The equation\n", "$\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$\n", "can be interpreted as an information flow equation, where the product\n", "$G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$ represents an information\n", "current that indicates how information propagates through the parameter\n", "space as the system evolves. Under this interpretation the Fisher\n", "information matrix represents the *information topography*.\n", "\n", "## Resolution and Uncertainty\n", "\n", "The resolution constraints introduce uncertainty relations into the\n", "classical statistical framework. These constraints alter the convergence\n", "properties of the entropy maximization process, creating bounds on\n", "information extraction and parameter precision.\n", "\n", "## Temporal Information Dynamics\n", "\n", "The time parameterization reveals that the flow of time in the system is\n", "connected to information processing efficiency\n", "\n", "1. In regions where parameters are strongly aligned with entropy change\n", " (high\n", " $\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]$),\n", " parameterized time flows rapidly relative to system time.\n", "\n", "2. In regions where parameters are weakly coupled to entropy change\n", " (low\n", " $\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]$),\n", " parameterized time flows slowly.\n", "\n", "3. 
At critical points where parameters become orthogonal to the entropy\n", " gradient\n", " ($\boldsymbol{\theta}^\top \nabla_\boldsymbol{\theta} S[\rho_\boldsymbol{\theta}] \approx 0$),\n", " the time parameterization approaches a singularity, indicating phase\n", " transitions in the system’s information structure.\n", "\n", "# Connections to Physical Theories\n", "\n", "## Frieden’s Extreme Physical Information\n", "\n", "Our framework connects to Frieden’s Extreme Physical Information (EPI)\n", "principle, which posits that physical systems evolve to extremize the\n", "physical information $I = K - J$, where $K$ represents the observed\n", "Fisher information and $J$ represents the intrinsic or bound\n", "information.\n", "\n", "Frieden (1998) demonstrated that fundamental laws of physics, including\n", "relativistic ones, can emerge from the EPI principle. This suggests our\n", "information-geometric framework is capable of describing a rich set of\n", "underlying “physics”.\n", "\n", "## Conclusion\n", "\n", "Viewing the dynamical system as the gradient flow\n", "$\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta}$\n", "provides a framework for understanding parameter evolution. By\n", "reformulating this system in terms of an action functional and analysing\n", "its behaviour through the Schur complement, we gain insights into the\n", "multi-scale nature of information flow in complex statistical systems.\n", "\n", "The time parameterisation that connects the action to the original\n", "dynamics reveals how the system’s evolution adjusts to information\n", "content, moving slowly through information-rich regions while rapidly\n", "traversing information-sparse areas. This establishes a connection\n", "between information flow and temporal dynamics.\n", "\n", "