{ "cells": [ { "cell_type": "markdown", "id": "7227bb64-33ba-4e47-bc63-82b4c460712c", "metadata": {}, "source": [ "# Jaynes’ World\n", "\n", "### Neil D. Lawrence\n", "\n", "### 2025-04-15" ] }, { "cell_type": "markdown", "id": "dcc04151-27e5-4b45-8a52-7561a7bed051", "metadata": {}, "source": [ "**Abstract**: The relationship between physical systems and intelligence\n", "has long fascinated researchers in computer science and physics. This\n", "talk explores fundamental connections between thermodynamic systems and\n", "intelligent decision-making through the lens of free energy principles.\n", "\n", "We examine how concepts from statistical mechanics - particularly the\n", "relationship between total energy, free energy, and entropy - might\n", "provide novel insights into the nature of intelligence and learning. By\n", "drawing parallels between physical systems and information processing,\n", "we consider how measurement and observation can be viewed as processes\n", "that modify available energy. The discussion encompasses how model\n", "approximations and uncertainties might be understood through\n", "thermodynamic analogies, and explores the implications of treating\n", "intelligence as an energy-efficient state-change process.\n", "\n", "While these connections remain speculative, they offer a potential\n", "shared language for discussing the emergence of natural laws and\n", "societal systems through the lens of information." ] }, { "cell_type": "markdown", "id": "4b773e1c-3155-4361-885b-543764d06ddd", "metadata": {}, "source": [ "$$\n", "$$" ] }, { "cell_type": "markdown", "id": "72d6f976-9730-44ce-ab1d-5105276add1c", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "## Hydrodynamica\n", "\n", "\\[edit\\]\n", "\n", "When Laplace spoke of the curve of a simple molecule of air, he may well\n", "have been thinking of Daniel Bernoulli (1700-1782). Daniel Bernoulli was\n", "one name in a prodigious family. 
His father and brother were both\n", 
    "mathematicians. Daniel’s main work was known as *Hydrodynamica*.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PP7')\n", 
    "```\n", 
    "\n", 
    "Figure: Daniel Bernoulli’s *Hydrodynamica* published in 1738. It was\n", 
    "one of the first works to use the idea of conservation of energy. It\n", 
    "used Newton’s laws to predict the behaviour of gases.\n", 
    "\n", 
    "Daniel Bernoulli described a kinetic theory of gases, but it wasn’t\n", 
    "until 170 years later that these ideas were verified, when Einstein\n", 
    "proposed a model of Brownian motion that was experimentally confirmed\n", 
    "by Jean Baptiste Perrin.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='3yRVAAAAcAAJ', page='PA200')\n", 
    "```\n", 
    "\n", 
    "Figure: Daniel Bernoulli’s chapter on the kinetic theory of gases;\n", 
    "for a review of the context of this chapter see Mikhailov (n.d.). For\n", 
    "1738 this is extraordinary thinking. 
The notion of the kinetic theory of\n", 
    "gases wouldn’t become fully accepted in physics until 1908, when a\n", 
    "model of Einstein’s was verified by Jean Baptiste Perrin.\n", 
    "\n", 
    "## Entropy Billiards\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Figure: Bernoulli’s simple kinetic models of gases assume that the\n", 
    "molecules of air operate like billiard balls.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "# Sample 10,000 points from a standard Gaussian and evaluate its density.\n", 
    "p = np.random.randn(10000, 1)\n", 
    "xlim = [-4, 4]\n", 
    "x = np.linspace(xlim[0], xlim[1], 200)\n", 
    "y = 1/np.sqrt(2*np.pi)*np.exp(-0.5*x*x)\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "import matplotlib.pyplot as plt\n", 
    "import mlai.plot as plot\n", 
    "import mlai\n", 
    "```\n", 
    "\n", 
    "``` python\n", 
    "# Overlay the Gaussian density on a normalised histogram of the samples.\n", 
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", 
    "ax.plot(x, y, 'r', linewidth=3)\n", 
    "ax.hist(p, 100, density=True)\n", 
    "ax.set_xlim(xlim)\n", 
    "\n", 
    "mlai.write_figure('gaussian-histogram.svg', directory='./ml')\n", 
    "```\n", 
    "\n", 
    "Another important figure with Cambridge connections, James Clerk\n", 
    "Maxwell, was the first to derive the probability distribution that\n", 
    "results from small balls banging together in this manner. In doing so,\n", 
    "he founded the field of statistical physics.\n", 
    "\n", 
    "Figure: James Clerk Maxwell (1831-1879), who derived the distribution\n", 
    "of velocities of particles in an ideal gas (elastic fluid).\n", 
    "\n", 
    "
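\n", 
    "The speed distribution Maxwell derived can also be sketched\n", 
    "numerically. The following check is an illustrative addition, not part\n", 
    "of the original notebook: the molecular mass and temperature are\n", 
    "assumed values, roughly nitrogen at room temperature.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "# Maxwell-Boltzmann speed distribution,\n", 
    "# f(v) = 4*pi*(m/(2*pi*k*T))**1.5 * v**2 * exp(-m*v**2/(2*k*T)).\n", 
    "k_B = 1.380649e-23  # Boltzmann's constant (J/K)\n", 
    "m = 4.65e-26        # assumed molecular mass (kg), roughly N2\n", 
    "T = 300.0           # assumed temperature (K)\n", 
    "\n", 
    "v = np.linspace(0, 2000, 20001)  # speeds in m/s\n", 
    "f = 4*np.pi*(m/(2*np.pi*k_B*T))**1.5 * v**2 * np.exp(-m*v**2/(2*k_B*T))\n", 
    "\n", 
    "# The density integrates to (almost) one over this range, and it peaks\n", 
    "# at the most probable speed, sqrt(2*k_B*T/m).\n", 
    "total = np.sum(f) * (v[1] - v[0])\n", 
    "v_mode = v[np.argmax(f)]\n", 
    "print(total, v_mode)\n", 
    "```\n", 
    "\n", 
    "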
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: James Clerk Maxwell (1831-1879), Ludwig Boltzmann (1844-1906)\n", 
    "and Josiah Willard Gibbs (1839-1903)\n", 
    "\n", 
    "Many of the ideas of early statistical physicists were rejected by a\n", 
    "cadre of physicists who didn’t believe in the notion of a molecule. The\n", 
    "stress of trying to have his ideas established caused Boltzmann to\n", 
    "commit suicide in 1906, only two years before the same ideas became\n", 
    "widely accepted.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='Vuk5AQAAMAAJ', page='PA373')\n", 
    "```\n", 
    "\n", 
    "Figure: Boltzmann’s paper (Boltzmann, n.d.), which introduced the\n", 
    "relationship between entropy and probability. A translation with notes\n", 
    "is available in Sharp and Matschinsky (2015).\n", 
    "\n", 
    "The important point about the uncertainty being represented here is that\n", 
    "it is not genuine stochasticity; it is a lack of knowledge about the\n", 
    "system. The techniques proposed by Maxwell, Boltzmann and Gibbs allow us\n", 
    "to represent the state of the system exactly through a set of parameters\n", 
    "that are the sufficient statistics of the physical system. We know\n", 
    "these values as the volume, temperature, and pressure. The challenge for\n", 
    "us, when approximating the physical world with the techniques we will\n", 
    "use, is that we will have to sit somewhere between the deterministic and\n", 
    "purely stochastic worlds that these different scientists described.\n", 
    "\n", 
    "One ongoing characteristic of people who study probability and\n", 
    "uncertainty is the confidence with which they hold opinions about it.\n", 
    "Another leader of the Cavendish laboratory expressed his support of the\n", 
    "second law of thermodynamics (which can be proven through the work of\n", 
    "Gibbs/Boltzmann) with an emphatic statement at the beginning of his\n", 
    "book.\n", 
    "\n", 
    "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: Eddington’s book on the Nature of the Physical World\n", 
    "(Eddington, 1929)\n", 
    "\n", 
    "The same Eddington is also famous for dismissing the ideas of a young\n", 
    "Chandrasekhar who had come to Cambridge to study in the Cavendish lab.\n", 
    "Chandrasekhar demonstrated the limit at which a star would collapse\n", 
    "under its own weight to a singularity, but when he presented the work,\n", 
    "Eddington was dismissive, suggesting that there “must be some natural\n", 
    "law that prevents this abomination from happening”.\n", 
    "\n", 
    "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", 
    "\n", 
    "Figure: Chandrasekhar (1910-1995) derived the limit at which a star\n", 
    "collapses in on itself. Eddington’s confidence in the second law may\n", 
    "have been what drove him to dismiss Chandrasekhar’s ideas, humiliating\n", 
    "a young scientist who would later receive a Nobel prize for the work.\n", 
    "\n", 
    "Figure: Eddington makes his feelings about the primacy of the second\n", 
    "law clear. This primacy is perhaps because the second law can be\n", 
    "demonstrated mathematically, building on the work of Maxwell, Gibbs and\n", 
    "Boltzmann. Eddington (1929)\n", 
    "\n", 
    "Presumably he meant that the creation of a black hole seemed to\n", 
    "transgress the second law of thermodynamics. Later Hawking was able to\n", 
    "show that black holes do evaporate, but the timescale on which this\n", 
    "evaporation occurs is many orders of magnitude longer than that of\n", 
    "other processes in the universe.\n", 
    "\n", 
    "## Maxwell’s Demon\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Maxwell’s demon is a thought experiment described by James Clerk Maxwell\n", 
    "in his book, *Theory of Heat* (Maxwell, 1871) on page 308.\n", 
    "\n", 
    "> But if we conceive a being whose faculties are so sharpened that he\n", 
    "> can follow every molecule in its course, such a being, whose\n", 
    "> attributes are still as essentially finite as our own, would be able\n", 
    "> to do what is at present impossible to us. For we have seen that the\n", 
    "> molecules in a vessel full of air at uniform temperature are moving\n", 
    "> with velocities by no means uniform, though the mean velocity of any\n", 
    "> great number of them, arbitrarily selected, is almost exactly uniform.\n", 
    "> Now let us suppose that such a vessel is divided into two portions, A\n", 
    "> and B, by a division in which there is a small hole, and that a being,\n", 
    "> who can see the individual molecules, opens and closes this hole, so\n", 
    "> as to allow only the swifter molecules to pass from A to B, and\n", 
    "> only the slower ones to pass from B to A. 
He will thus, without\n", 
    "> expenditure of work, raise the temperature of B and lower that of A,\n", 
    "> in contradiction to the second law of thermodynamics.\n", 
    ">\n", 
    "> James Clerk Maxwell in *Theory of Heat* (Maxwell, 1871) page 308\n", 
    "\n", 
    "He goes on to say:\n", 
    "\n", 
    "> This is only one of the instances in which conclusions which we have\n", 
    "> drawn from our experience of bodies consisting of an immense number of\n", 
    "> molecules may be found not to be applicable to the more delicate\n", 
    "> observations and experiments which we may suppose made by one who can\n", 
    "> perceive and handle the individual molecules which we deal with only\n", 
    "> in large masses.\n", 
    "\n", 
    "``` python\n", 
    "import notutils as nu\n", 
    "nu.display_google_book(id='0p8AAAAAMAAJ', page='PA308')\n", 
    "```\n", 
    "\n", 
    "Figure: Maxwell’s demon was designed to highlight the statistical\n", 
    "nature of the second law of thermodynamics.\n", 
    "\n", 
    "Figure: Maxwell’s Demon. The demon decides balls are either cold\n", 
    "(blue) or hot (red) according to their velocity. Balls are allowed to\n", 
    "pass the green membrane from right to left only if they are cold, and\n", 
    "from left to right, only if they are hot.\n", 
    "\n", 
    "Maxwell’s demon allows us to connect thermodynamics with information\n", 
    "theory (see e.g. Hosoya et al. (2015); Hosoya et al. (2011); Bub\n", 
    "(2001); Brillouin (1951); Szilard (1929)). The connection arises due to\n", 
    "a fundamental connection between information erasure and energy\n", 
    "consumption (Landauer, 1961).\n", 
    "\n", 
    "Alemi and Fischer (2019)\n", 
    "\n", 
    "# Information Theory and Thermodynamics\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Information theory provides a mathematical framework for quantifying\n", 
    "information. Many of information theory’s core concepts parallel those\n", 
    "found in thermodynamics. 
The theory was developed by Claude Shannon, who\n", 
    "spoke extensively with MIT’s Norbert Wiener while it was in development\n", 
    "(Conway and Siegelman, 2005). Wiener’s own ideas about information were\n", 
    "inspired by Willard Gibbs, one of the pioneers of the mathematical\n", 
    "understanding of free energy and entropy. Deep connections between\n", 
    "physical systems and information processing have thus linked\n", 
    "information and energy from the start.\n", 
    "\n", 
    "## Entropy\n", 
    "\n", 
    "Shannon’s entropy measures the uncertainty or unpredictability of\n", 
    "information content. This mathematical formulation is inspired by\n", 
    "thermodynamic entropy, which describes the dispersal of energy in\n", 
    "physical systems. Both concepts quantify the number of possible states\n", 
    "and their probabilities.\n", 
    "\n", 
    "Figure: Maxwell’s demon thought experiment illustrates the\n", 
    "relationship between information and thermodynamics.\n", 
    "\n", 
    "In thermodynamics, free energy represents the energy available to do\n", 
    "work. A system naturally evolves to minimize its free energy, finding\n", 
    "equilibrium between total energy and entropy. Free energy principles are\n", 
    "also pervasive in variational methods in machine learning. They emerge\n", 
    "from Bayesian approaches to learning and have been heavily promoted by\n", 
    "e.g. Karl Friston as a model for the brain.\n", 
    "\n", 
    "The relationship between entropy and free energy can be explored through\n", 
    "the Legendre transform. This is most easily reviewed if we restrict\n", 
    "ourselves to distributions in the exponential family.\n", 
    "\n", 
    "## Exponential Family\n", 
    "\n", 
    "The exponential family has the form $$\n", 
    " \\rho(Z) = h(Z) \\exp\\left(\\boldsymbol{\\theta}^\\top T(Z) - A(\\boldsymbol{\\theta})\\right)\n", 
    "$$ where $h(Z)$ is the base measure, $\\boldsymbol{\\theta}$ are the\n", 
    "natural parameters, $T(Z)$ are the sufficient statistics and\n", 
    "$A(\\boldsymbol{\\theta})$ is the log partition function. 
Its entropy can\n", 
    "be computed as $$\n", 
    " S(Z) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta}A(\\boldsymbol{\\theta}) - E_{\\rho(Z)}\\left[\\log h(Z)\\right],\n", 
    "$$ where $E_{\\rho(Z)}[\\cdot]$ is the expectation under the distribution\n", 
    "$\\rho(Z)$.\n", 
    "\n", 
    "## Available Energy\n", 
    "\n", 
    "## Work through Measurement\n", 
    "\n", 
    "In machine learning and Bayesian inference, the Markov blanket is the\n", 
    "set of variables that are conditionally independent of the variable of\n", 
    "interest given the other variables. To introduce this idea into our\n", 
    "information system, we first split the system into two parts, the\n", 
    "variables, $X$, and the memory $M$.\n", 
    "\n", 
    "The variables are the portion of the system that is stochastically\n", 
    "evolving over time. The memory is a low entropy partition of the system\n", 
    "that will give us knowledge about this evolution.\n", 
    "\n", 
    "We can now write the joint entropy of the system in terms of the mutual\n", 
    "information between the variables and the memory. $$\n", 
    "S(Z) = S(X,M) = S(X|M) + S(M) = S(X) - I(X;M) + S(M).\n", 
    "$$ This gives us the first hint at the connection between information\n", 
    "and energy.\n", 
    "\n", 
    "If $M$ is viewed as a measurement then the change in entropy of the\n", 
    "system before and after measurement is\n", 
    "$S(X|M) - S(X) = -I(X;M)$. This implies that measurement increases the\n", 
    "amount of available energy we can obtain from the system (Parrondo et\n", 
    "al., 2015).\n", 
    "\n", 
    "The difference in available energy is given by $$\n", 
    "\\Delta A = A(X|M) - A(X) = I(X;M),\n", 
    "$$ where we note that the resulting system is no longer in thermodynamic\n", 
    "equilibrium due to the low entropy of the memory.\n", 
    "\n", 
    "## The Animal Game\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "The Entropy Game is a framework for understanding efficient uncertainty\n", 
    "reduction. 
To start, think of finding the optimal strategy for\n", 
    "identifying an unknown entity by asking the minimum number of yes/no\n", 
    "questions.\n", 
    "\n", 
    "## The 20 Questions Paradigm\n", 
    "\n", 
    "In the game of 20 Questions player one (Alice) thinks of an object and\n", 
    "player two (Bob) must identify it by asking at most 20 yes/no questions.\n", 
    "The optimal strategy is to divide the possibility space in half with\n", 
    "each question. The binary search approach ensures maximum information\n", 
    "gain with each inquiry and can access $2^{20}$, or about a million,\n", 
    "different objects.\n", 
    "\n", 
    "Figure: The optimal strategy in the Entropy Game resembles a binary\n", 
    "search, dividing the search space in half with each question.\n", 
    "\n", 
    "## Entropy Reduction and Decisions\n", 
    "\n", 
    "From an information-theoretic perspective, decisions can be taken in a\n", 
    "way that efficiently reduces entropy - our uncertainty about the\n", 
    "state of the world. Each observation or action an intelligent agent\n", 
    "takes should maximize expected information gain, optimally reducing\n", 
    "uncertainty given available resources.\n", 
    "\n", 
    "The entropy before the question is $S(X)$. The entropy after the\n", 
    "question is $S(X|M)$. The information gain is the difference between the\n", 
    "two, $I(X;M) = S(X) - S(X|M)$. 
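\n", 
    "\n", 
    "For equally likely objects this arithmetic is easy to check directly. A\n", 
    "minimal sketch, an illustrative addition with helper names of our own\n", 
    "choosing:\n", 
    "\n", 
    "``` python\n", 
    "import math\n", 
    "\n", 
    "# With K equally likely objects the entropy is log2(K) bits, and an\n", 
    "# ideal yes/no question halves the candidate set, gaining one bit.\n", 
    "def questions_needed(K):\n", 
    "    # number of ideal halving questions needed to isolate one object\n", 
    "    return math.ceil(math.log2(K))\n", 
    "\n", 
    "def information_gain(K):\n", 
    "    # S(X) - S(X|M) for one halving question on a uniform space\n", 
    "    return math.log2(K) - math.log2(K / 2)\n", 
    "\n", 
    "print(questions_needed(2**20))  # 20 questions cover about a million objects\n", 
    "print(information_gain(1024))   # 1.0 bit per ideal question\n", 
    "```\n", 
    "\n", 
    "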
Optimal decision-making systems maximize\n", 
    "this information gain per unit cost.\n", 
    "\n", 
    "## Thermodynamic Parallels\n", 
    "\n", 
    "The entropy game connects decision-making to thermodynamics.\n", 
    "\n", 
    "This perspective suggests a profound connection: intelligence might be\n", 
    "understood as a special case of systems that efficiently extract,\n", 
    "process, and utilize free energy from their environments, with\n", 
    "thermodynamic principles setting fundamental constraints on what’s\n", 
    "possible.\n", 
    "\n", 
    "# Information Engines: Intelligence as Energy Efficiency\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "The entropy game shows some parallels between thermodynamics and\n", 
    "measurement. This allows us to imagine *information engines*, simple\n", 
    "systems that convert information to energy. This is our first simple\n", 
    "model of intelligence.\n", 
    "\n", 
    "## Measurement as a Thermodynamic Process: Information-Modified Second Law\n", 
    "\n", 
    "The second law of thermodynamics was generalised to include the effect\n", 
    "of measurement by Sagawa and Ueda (Sagawa and Ueda, 2008). They showed\n", 
    "that the maximum extractable work from a system can be increased by\n", 
    "$k_BTI(X;M)$ where $k_B$ is Boltzmann’s constant, $T$ is temperature and\n", 
    "$I(X;M)$ is the information gained by making a measurement, $M$, $$\n", 
    "I(X;M) = \\sum_{x,m} \\rho(x,m) \\log \\frac{\\rho(x,m)}{\\rho(x)\\rho(m)},\n", 
    "$$ where $\\rho(x,m)$ is the joint probability of the system and\n", 
    "measurement (see e.g. eq 14 in Sagawa and Ueda (2008)). This can be\n", 
    "written as $$\n", 
    "W_\\text{ext} \\leq - \\Delta\\mathcal{F} + k_BTI(X;M),\n", 
    "$$ where $W_\\text{ext}$ is the extractable work and it is upper bounded\n", 
    "by the negative change in free energy, $\\Delta \\mathcal{F}$, plus the\n", 
    "energy gained from measurement, $k_BTI(X;M)$. This is the\n", 
    "information-modified second law.\n", 
    "\n", 
    "Measurement can be seen as a thermodynamic process. 
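\n", 
    "\n", 
    "The size of the information term can be evaluated for a toy joint\n", 
    "distribution. The sketch below is an illustrative addition: the\n", 
    "90%-reliable one-bit measurement is an assumed example, not from the\n", 
    "original notebook.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "k_B = 1.380649e-23  # Boltzmann's constant (J/K)\n", 
    "T = 300.0           # temperature (K)\n", 
    "\n", 
    "# Toy joint distribution rho(x, m): a binary system and a noisy one-bit\n", 
    "# measurement that reports the true state 90% of the time.\n", 
    "rho = np.array([[0.45, 0.05],\n", 
    "                [0.05, 0.45]])  # rows: x, columns: m\n", 
    "rho_x = rho.sum(axis=1)  # marginal over x\n", 
    "rho_m = rho.sum(axis=0)  # marginal over m\n", 
    "\n", 
    "# Mutual information I(X;M) in nats.\n", 
    "I = sum(rho[i, j] * np.log(rho[i, j] / (rho_x[i] * rho_m[j]))\n", 
    "        for i in range(2) for j in range(2))\n", 
    "\n", 
    "# Sagawa-Ueda: the extractable-work bound is raised by k_B * T * I.\n", 
    "print(I, k_B * T * I)\n", 
    "```\n", 
    "\n", 
    "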
In theory\n", 
    "measurement, like computation, is reversible. In practice the process\n", 
    "of measurement is likely to erode the free energy somewhat, but as long\n", 
    "as the energy gained from information, $k_BTI(X;M)$, is greater than\n", 
    "that spent in measurement the process can be thermodynamically\n", 
    "efficient.\n", 
    "\n", 
    "The modified second law shows that the maximum additional extractable\n", 
    "work is proportional to the information gained. So information\n", 
    "acquisition creates extractable work potential. Thermodynamic\n", 
    "consistency is maintained by properly accounting for information-entropy\n", 
    "relationships.\n", 
    "\n", 
    "## Efficacy of Feedback Control\n", 
    "\n", 
    "Sagawa and Ueda extended this relationship to provide a *generalised\n", 
    "Jarzynski equality* for feedback processes (Sagawa and Ueda, 2010). The\n", 
    "Jarzynski equality is an important result from nonequilibrium\n", 
    "thermodynamics that relates the average work done across an ensemble to\n", 
    "the free energy difference between initial and final states (Jarzynski,\n", 
    "1997), $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\right\\rangle = \\exp\\left(-\\frac{\\Delta\\mathcal{F}}{k_BT}\\right),\n", 
    "$$ where $W$ is the work done along a trajectory,\n", 
    "$\\langle \\cdot \\rangle$ denotes an average across an ensemble of\n", 
    "trajectories, $\\Delta\\mathcal{F}$ is the change in free energy and\n", 
    "$k_B$ is Boltzmann’s constant. 
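\n", 
    "\n", 
    "The equality can be checked numerically. For Gaussian work fluctuations,\n", 
    "$W \\sim \\mathcal{N}(\\mu, \\sigma^2)$, it holds with\n", 
    "$\\Delta\\mathcal{F} = \\mu - \\sigma^2/(2k_BT)$; the sketch below, an\n", 
    "illustrative addition with assumed parameter values, recovers this from\n", 
    "samples.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "rng = np.random.default_rng(0)\n", 
    "kT = 1.0  # work measured in units of k_B * T\n", 
    "\n", 
    "# Gaussian work distribution W ~ N(mu, sigma^2), assumed parameters.\n", 
    "mu, sigma = 1.0, 0.5\n", 
    "W = rng.normal(mu, sigma, size=1_000_000)\n", 
    "\n", 
    "# Jarzynski: <exp(-W/kT)> = exp(-dF/kT), so dF = -kT log <exp(-W/kT)>.\n", 
    "dF_estimate = -kT * np.log(np.mean(np.exp(-W / kT)))\n", 
    "dF_exact = mu - sigma**2 / (2 * kT)\n", 
    "print(dF_estimate, dF_exact)  # both close to 0.875\n", 
    "```\n", 
    "\n", 
    "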
Sagawa\n", 
    "and Ueda extended this equality to include information gain from\n", 
    "measurement (Sagawa and Ueda, 2010), $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right) \\exp\\left(-\\mathcal{I}(X;M)\\right)\\right\\rangle = 1,\n", 
    "$$ where $\\mathcal{I}(X;M) = \\log \\frac{\\rho(X|M)}{\\rho(X)}$ is the\n", 
    "information gain from measurement, and the mutual information is\n", 
    "recovered, $I(X;M) = \\left\\langle \\mathcal{I}(X;M) \\right\\rangle$, as\n", 
    "the average information gain.\n", 
    "\n", 
    "Sagawa and Ueda introduce an *efficacy* term that captures the effect of\n", 
    "feedback on the system. They note that in the presence of feedback, $$\n", 
    "\\left\\langle \\exp\\left(-\\frac{W}{k_B T}\\right) \\exp\\left(\\frac{\\Delta\\mathcal{F}}{k_BT}\\right)\\right\\rangle = \\gamma,\n", 
    "$$ where $\\gamma$ is the efficacy.\n", 
    "\n", 
    "## Channel Coding Perspective on Memory\n", 
    "\n", 
    "When viewing $M$ as an information channel between past and future\n", 
    "states, Shannon’s channel coding theorems apply (Shannon, 1948). The\n", 
    "channel capacity $C$ represents the maximum rate of reliable information\n", 
    "transmission, $$\n", 
    "C = \\max_{\\rho(M)} I(X_1;M),\n", 
    "$$ and for a memory of $n$ bits we have $$\n", 
    "C \\leq n,\n", 
    "$$ as the mutual information is upper bounded by the\n", 
    "entropy of $\\rho(M)$, which is at most $n$ bits.\n", 
    "\n", 
    "This relationship seems to align with Ashby’s Law of Requisite Variety\n", 
    "(pg 229 Ashby (1952)), which states that a control system must have at\n", 
    "least as much ‘variety’ as the system it aims to control. In the context\n", 
    "of memory systems, this means that to maintain temporal correlations\n", 
    "effectively, the memory’s state space must be at least as large as the\n", 
    "information content it needs to preserve. 
This provides a lower bound on\n", 
    "the necessary memory capacity that complements the bound we get from\n", 
    "Shannon for channel capacity.\n", 
    "\n", 
    "This helps determine the required memory size for maintaining temporal\n", 
    "correlations, optimal coding strategies, and fundamental limits on\n", 
    "temporal correlation preservation.\n", 
    "\n", 
    "# Decomposition into Past and Future\n", 
    "\n", 
    "## Model Approximations and Thermodynamic Efficiency\n", 
    "\n", 
    "Intelligent systems must balance measurement against energy efficiency\n", 
    "and time requirements. A perfect model of the world would require\n", 
    "infinite computational resources and speed, so approximations are\n", 
    "necessary. This leads to uncertainties. Thermodynamics might be thought\n", 
    "of as the physics of uncertainty: at equilibrium, systems find states\n", 
    "that minimize free energy, equivalent to maximising entropy.\n", 
    "\n", 
    "## Markov Blanket\n", 
    "\n", 
    "To introduce some structure to the model assumption, we split $X$ into\n", 
    "$X_0$ and $X_1$: $X_0$ is the past and present of the system and $X_1$\n", 
    "is the future. We can then consider the conditional mutual information\n", 
    "$I(X_0;X_1|M)$, which is zero if $X_1$ and $X_0$ are independent\n", 
    "conditioned on $M$.\n", 
    "\n", 
    "## At What Scales Does this Apply?\n", 
    "\n", 
    "The equipartition theorem tells us that at equilibrium the average\n", 
    "energy is $k_BT/2$ per degree of freedom. This means that for systems\n", 
    "that operate at “human scale” the energy involved is many orders of\n", 
    "magnitude larger than the amount of information we can store in memory.\n", 
    "For a car engine producing 70 kW of power at 370 Kelvin, this implies $$\n", 
    "\\frac{2 \\times 70,000}{370 \\times k_B} = \\frac{2 \\times 70,000}{370 \\times 1.380649 \\times 10^{-23}} = 2.74 \\times 10^{25}\n", 
    "$$ degrees of freedom per second. 
If we make a conservative assumption\n", 
    "of one bit per degree of freedom, then the mutual information we would\n", 
    "require in one second for comparable energy production would be around\n", 
    "3,400 zettabytes, implying a memory bandwidth of 3,400 zettabytes per\n", 
    "second. For comparison, in 2025 the estimate of all the data in the\n", 
    "world stands at 149 zettabytes.\n", 
    "\n", 
    "## Small-Scale Biochemical Systems and Information Processing\n", 
    "\n", 
    "While macroscopic systems operate in regimes where traditional\n", 
    "thermodynamics dominates, microscopic biological systems operate at\n", 
    "scales where information and thermal fluctuations become critically\n", 
    "important. Here we examine how the framework applies to molecular\n", 
    "machines and processes that have evolved to operate efficiently at these\n", 
    "scales.\n", 
    "\n", 
    "Molecular machines like ATP synthase, kinesin motors, and the\n", 
    "photosynthetic apparatus can be viewed as sophisticated information\n", 
    "engines that convert energy while processing information about their\n", 
    "environment. These systems have evolved to exploit thermal fluctuations\n", 
    "rather than fight against them, using information processing to extract\n", 
    "useful work.\n", 
    "\n", 
    "## ATP Synthase: Nature’s Rotary Engine\n", 
    "\n", 
    "ATP synthase functions as a rotary molecular motor that synthesizes ATP\n", 
    "from ADP and inorganic phosphate using a proton gradient. The system\n", 
    "uses the proton gradient as both an energy source and an information\n", 
    "source about the cell’s energetic state, and exploits Brownian motion\n", 
    "through a ratchet mechanism. 
It converts information about proton\n", 
    "locations into mechanical rotation and ultimately chemical energy, with\n", 
    "approximately 3-4 protons required per ATP.\n", 
    "\n", 
    "``` python\n", 
    "from IPython.lib.display import YouTubeVideo\n", 
    "YouTubeVideo('kXpzp4RDGJI')\n", 
    "```\n", 
    "\n", 
    "Estimates suggest that one synapse firing may require $10^4$ ATP\n", 
    "molecules, so around $4 \\times 10^4$ protons. If we take the human brain\n", 
    "as containing around $10^{14}$ synapses, and if we suggest each synapse\n", 
    "only fires about once every five seconds, we would require approximately\n", 
    "$10^{18}$ protons per second to power the synapses in our brain, with\n", 
    "each proton having six degrees of freedom. Under these rough\n", 
    "calculations the memory capacity distributed across the ATP synthase in\n", 
    "our brain must be of order $6 \\times 10^{18}$ bits per second or 750\n", 
    "petabytes of information per second. Of course this memory capacity\n", 
    "would be distributed across billions of neurons, each containing\n", 
    "hundreds or thousands of mitochondria, each of which can contain\n", 
    "thousands of ATP synthase molecules. By composing extremely small\n", 
    "systems we can see it’s possible to improve efficiencies in ways that\n", 
    "seem very impractical for a car engine.\n", 
    "\n", 
    "A quick note to clarify: here we’re referring to the information\n", 
    "requirements to make our brain more energy efficient in its information\n", 
    "processing, rather than the information processing capabilities of the\n", 
    "neurons themselves!\n", 
    "\n", 
    "## Jaynes’s Maximum Entropy Principle\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "In his seminal 1957 paper (Jaynes, 1957), Ed Jaynes proposed a\n", 
    "foundation for statistical mechanics based on information theory. 
Rather\n", 
    "than relying on ergodic hypotheses or ensemble interpretations, Jaynes\n", 
    "recast the problem of assigning probabilities in statistical mechanics\n", 
    "as a problem of inference with incomplete information.\n", 
    "\n", 
    "A central problem in statistical mechanics is assigning initial\n", 
    "probabilities when our knowledge is incomplete. For example, if we know\n", 
    "only the average energy of a system, what probability distribution\n", 
    "should we use? Jaynes argued that we should use the distribution that\n", 
    "maximizes entropy subject to the constraints of our knowledge.\n", 
    "\n", 
    "Jaynes illustrated the approach with a simple example: Suppose a die has\n", 
    "been tossed many times, with an average result of 4.5 rather than the\n", 
    "expected 3.5 for a fair die. What probability assignment $P_n$\n", 
    "($n=1,2,...,6$) should we make for the next toss?\n", 
    "\n", 
    "We need to satisfy two constraints: normalisation,\n", 
    "$\\sum_{n=1}^6 P_n = 1$, and the observed average,\n", 
    "$\\sum_{n=1}^6 n P_n = 4.5$.\n", 
    "\n", 
    "Many distributions could satisfy these constraints, but which one makes\n", 
    "the fewest unwarranted assumptions? Jaynes argued that we should choose\n", 
    "the distribution that is maximally noncommittal with respect to missing\n", 
    "information - the one that maximizes the entropy. This principle leads\n", 
    "to the exponential family of distributions, which in statistical\n", 
    "mechanics gives us the canonical ensemble and other familiar\n", 
    "distributions.\n", 
    "\n", 
    "## The General Maximum-Entropy Formalism\n", 
    "\n", 
    "For a more general case, suppose a quantity $x$ can take values\n", 
    "$(x_1, x_2, \\ldots, x_n)$ and we know the average values of several\n", 
    "functions $f_k(x)$. 
The problem is to find the probability assignment\n", 
    "$p_i = p(x_i)$ that satisfies the constraints,\n", 
    "$\\sum_{i=1}^n p_i f_k(x_i) = \\langle f_k \\rangle$, and maximizes the\n", 
    "entropy $S_I = -\\sum_{i=1}^n p_i \\log p_i$.\n", 
    "\n", 
    "Using Lagrange multipliers, the solution is the generalized canonical\n", 
    "distribution, $$\n", 
    "p_i = \\frac{1}{Z(\\lambda_1,\\ldots,\\lambda_m)} \\exp\\left(-\\sum_{k=1}^m \\lambda_k f_k(x_i)\\right),\n", 
    "$$ where\n", 
    "$Z(\\lambda_1,\\ldots,\\lambda_m) = \\sum_{i=1}^n \\exp\\left(-\\sum_{k=1}^m \\lambda_k f_k(x_i)\\right)$\n", 
    "is the partition function. The Lagrange multipliers $\\lambda_k$ are\n", 
    "determined by the constraints, $$\n", 
    "\\langle f_k \\rangle = -\\frac{\\partial}{\\partial \\lambda_k} \\log Z.\n", 
    "$$ The maximum attainable entropy is $$\n", 
    "S_I = \\log Z + \\sum_{k=1}^m \\lambda_k \\langle f_k \\rangle.\n", 
    "$$\n", 
    "\n", 
    "## Jaynes’ World\n", 
    "\n", 
    "\\[edit\\]\n", 
    "\n", 
    "Jaynes’ World is a zero-player game that implements a version of the\n", 
    "entropy game. The dynamical system is defined by a distribution,\n", 
    "$\\rho(Z)$, over a state space $Z$. The state space is partitioned into\n", 
    "observable variables $X$ and memory variables $M$. The memory variables\n", 
    "are considered to be in an *information reservoir*, a thermodynamic\n", 
    "system that maintains information in an ordered state (see e.g. Barato\n", 
    "and Seifert (2014)). The entropy of the whole system is bounded below by\n", 
    "0 and above by $N$. So the entropy forms a *compact manifold* with\n", 
    "respect to its parameters.\n", 
    "\n", 
    "Unlike the animal game, where decisions are made by reducing entropy at\n", 
    "each step, our system evolves mathematically by maximising the\n", 
    "instantaneous entropy production. Conceptually we can think of this as\n", 
    "*ascending* the gradient of the entropy, $S(Z)$.\n", 
    "\n", 
    "In the animal game the questioner starts with maximum uncertainty and\n", 
    "targets minimal uncertainty. Jaynes’ world starts with minimal\n", 
    "uncertainty and aims for maximum uncertainty.\n", 
    "\n", 
    "We can phrase this as a thought experiment. Imagine you are in the game,\n", 
    "at a given turn. You want to see where the game came from, so you look\n", 
    "back across turns. The direction the game came from is now the direction\n", 
    "of steepest descent. 
Regardless of where the game actually started, it\n", 
    "looks like it started at a minimal entropy configuration that we call\n", 
    "the *origin*. Similarly, wherever the game is actually stopped, there\n", 
    "will nevertheless appear to be an end point we call the *end* that will\n", 
    "be a configuration of maximal entropy, $N$.\n", 
    "\n", 
    "This speculation allows us to impose the functional form of our\n", 
    "probability distribution. As Jaynes has shown (Jaynes, 1957), the\n", 
    "stationary points of a free-form optimisation (minimum or maximum) will\n", 
    "place the distribution, $\\rho(Z)$, in the *exponential family*, $$\n", 
    "\\rho(Z) = h(Z) \\exp(\\boldsymbol{\\theta}^\\top T(Z) - A(\\boldsymbol{\\theta})),\n", 
    "$$ where $h(Z)$ is the base measure, $T(Z)$ are sufficient statistics,\n", 
    "$A(\\boldsymbol{\\theta})$ is the log-partition function and\n", 
    "$\\boldsymbol{\\theta}$ are the *natural parameters* of the distribution.\n", 
    "\n", 
    "This constraint to the exponential family is highly convenient as we\n", 
    "will rely on it heavily for the dynamics of the game. In particular, by\n", 
    "focussing on the *natural parameters* we find that we are optimising\n", 
    "within an *information geometry* (Amari, 2016). 
In exponential family\n", 
    "distributions, the entropy gradient is given by $$\n", 
    "\\mathbf{g} = \\nabla_{\\boldsymbol{\\theta}}S(Z) = -\\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta})\\,\\boldsymbol{\\theta},\n", 
    "$$ which follows from differentiating\n", 
    "$S = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta})$\n", 
    "(for constant base measure $h(Z)$). The Fisher information matrix,\n", 
    "$G(\\boldsymbol{\\theta})$, is also the *Hessian* of the manifold, $$\n", 
    "G(\\boldsymbol{\\theta}) = \\nabla^2_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}) = \\text{Cov}[T(Z)],\n", 
    "$$ so the gradient can be written\n", 
    "$\\mathbf{g} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$.\n", 
    "Traditionally, when optimising on an information geometry we take\n", 
    "*natural gradient* steps, equivalent to a Newton minimisation step, $$\n", 
    "\\Delta \\boldsymbol{\\theta} = - G(\\boldsymbol{\\theta})^{-1} \\mathbf{g},\n", 
    "$$ but this is not the direction that gives the instantaneous\n", 
    "maximisation of the entropy production; instead our gradient step is\n", 
    "given by $$\n", 
    "\\Delta \\boldsymbol{\\theta} = \\eta \\mathbf{g},\n", 
    "$$ where $\\eta$ is a ‘learning rate’.\n", 
    "\n", 
    "## System Evolution\n", 
    "\n", 
    "We are now in a position to summarise the start state and the end state\n", 
    "of our system, as well as to speculate on the nature of the transition\n", 
    "between the two states.\n", 
    "\n", 
    "## Start State\n", 
    "\n", 
    "The *origin configuration* is a low entropy state, with value near the\n", 
    "lower bound of 0. The information is highly structured; by definition we\n", 
    "place all variables in $M$, the information reservoir, at this time. The\n", 
    "uncertainty principle is present to handle the competing needs of\n", 
    "precision in parameters (giving us the near-singular form for\n", 
    "$\\boldsymbol{\\theta}(M)$) and capacity in the information channel that\n", 
    "$M$ provides (the capacity $c(\\boldsymbol{\\theta})$ is upper bounded by\n", 
    "$S(M)$).\n", 
    "\n", 
    "## End State\n", 
    "\n", 
    "The *end configuration* is a high entropy state, near the upper bound.\n", 
    "Both the minimal entropy and maximal entropy states are revealed by Ed\n", 
    "Jaynes’ variational minimisation approach and are in the exponential\n", 
    "family. 
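\n", 
    "\n", 
    "The ascent dynamics can be sketched for a single Bernoulli variable.\n", 
    "This is an illustrative addition assuming a uniform base measure, for\n", 
    "which differentiating $S(\\theta) = A(\\theta) - \\theta A'(\\theta)$\n", 
    "gives $\\text{d}S/\\text{d}\\theta = -A''(\\theta)\\theta$.\n", 
    "\n", 
    "``` python\n", 
    "import numpy as np\n", 
    "\n", 
    "# Bernoulli in natural coordinates: A(theta) = log(1 + e^theta).\n", 
    "def mean(theta):\n", 
    "    # A'(theta), the sigmoid / mean parameter\n", 
    "    return 1.0 / (1.0 + np.exp(-theta))\n", 
    "\n", 
    "def fisher(theta):\n", 
    "    # A''(theta) = G(theta), the variance of T(Z)\n", 
    "    p = mean(theta)\n", 
    "    return p * (1.0 - p)\n", 
    "\n", 
    "theta = 3.0  # low-entropy start (p close to 1)\n", 
    "eta = 0.5    # learning rate\n", 
    "for _ in range(2000):\n", 
    "    theta += eta * (-fisher(theta) * theta)  # entropy ascent step\n", 
    "\n", 
    "p = mean(theta)\n", 
    "S = -(p * np.log(p) + (1 - p) * np.log(1 - p))\n", 
    "print(theta, S)  # theta approaches 0, entropy approaches log 2\n", 
    "```\n", 
    "\n", 
    "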
In many cases a version of Zeno’s paradox will arise where the\n", "system asymptotes to the final state, taking smaller steps at each time.\n", "At this point the system is at equilibrium.\n", "\n", "## Jaynes’ World\n", "\n", "\\[edit\\]\n", "\n", "This game explores how structure, time, causality, and locality might\n", "emerge within a system governed solely by internal information-theoretic\n", "constraints. The hope is that it can serve as\n", "\n", "- A *research framework* for observer-free dynamics and entropy-based\n", "  emergence,\n", "- A *conceptual tool* for exploring the notion of an information\n", "  topography: a landscape in which information flows under\n", "  constraints.\n", "\n", "## Definitions and Global Constraints\n", "\n", "### System Structure\n", "\n", "Let $Z = \\{Z_1, Z_2, \\dots, Z_n\\}$ be the full set of system variables.\n", "At game turn $t$, define a partition in which $X(t) \\subseteq Z$ are the\n", "active variables (currently contributing to entropy) and\n", "$M(t) = Z \\setminus X(t)$ are the latent or frozen variables, stored in\n", "the form of an *information reservoir* (Barato and Seifert\n", "(2014), Parrondo et al. 
(2015)).\n", "\n", "### Representation via Density Matrix\n", "\n", "We’ll argue that the configuration space must be represented by a\n", "density matrix, $$\n", "\\rho(\\boldsymbol{\\theta}) = \\frac{1}{Z(\\boldsymbol{\\theta})} \\exp\\left( \\sum_i \\theta_i H_i \\right),\n", "$$ where $\\boldsymbol{\\theta} \\in \\mathbb{R}^d$ are the natural\n", "parameters, each $H_i$ is a Hermitian operator associated with the\n", "observables and the partition function is given by\n", "$Z(\\boldsymbol{\\theta}) = \\mathrm{Tr}[\\exp(\\sum_i \\theta_i H_i)]$.\n", "\n", "From this we can see that the *log-partition function*, which has an\n", "interpretation as the cumulant generating function, is $$\n", "A(\\boldsymbol{\\theta}) = \\log Z(\\boldsymbol{\\theta})\n", "$$ and the von Neumann *entropy* is $$\n", "S(\\boldsymbol{\\theta}) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla A(\\boldsymbol{\\theta}).\n", "$$ We can show that the *Fisher Information Matrix* is $$\n", "G_{ij}(\\boldsymbol{\\theta}) = \\frac{\\partial^2 A}{\\partial \\theta_i \\partial \\theta_j}.\n", "$$\n", "\n", "### Entropy Capacity and Resolution\n", "\n", "We define our system to have a *maximum entropy* of $N$ bits. If the\n", "dimension $d$ of the parameter space is fixed, this implies a *minimum\n", "detectable resolution* in natural parameter space, $$\n", "\\varepsilon \\sim \\frac{1}{2^N},\n", "$$ where changes in natural parameters smaller than $\\varepsilon$ are\n", "treated as *invisible* by the system. As a result, system dynamics\n", "exhibit *discrete, detectable transitions* between distinguishable\n", "states.\n", "\n", "Note that if the dimension $d$ scales with $N$ (e.g., $d = \\alpha N$ for some\n", "constant $\\alpha$), then the resolution constraint becomes more complex.\n", "In this case, the number of distinguishable states $(1/\\varepsilon)^d$\n", "must equal $2^N$, which leads to $\\varepsilon = 2^{-1/\\alpha}$, a\n", "constant independent of $N$. 
This suggests that as the system’s entropy\n", "capacity grows, it maintains a constant resolution while exponentially\n", "increasing the number of distinguishable states.\n", "\n", "## Dual Role of Parameters and Variables\n", "\n", "Each variable $Z_i$ is associated with a generator $H_i$, and a natural\n", "parameter $\\theta_i$. When we say a parameter $\\theta_i \\in X(t)$, we\n", "mean that the component of the system associated with $H_i$ is active at\n", "time $t$ and its parameter is evolving with\n", "$|\\dot{\\theta}_i| \\geq \\varepsilon$. This comes from the duality between\n", "*variables*, *observables*, and *natural parameters* that we find in\n", "exponential family representations and also see in a density matrix\n", "representation.\n", "\n", "## Core Axiom: Entropic Dynamics\n", "\n", "Our core axiom is that the system evolves by steepest ascent in entropy.\n", "The gradient of the entropy with respect to the natural\n", "parameters is given by $$\n", "\\nabla S[\\rho] = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}\n", "$$ and so we set $$\n", "\\frac{d\\boldsymbol{\\theta}}{dt} = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}.\n", "$$\n", "\n", "## Histogram Game\n", "\n", "\\[edit\\]\n", "\n", "To illustrate the concept of the Jaynes’ world entropy game we’ll run a\n", "simple example using a four bin histogram. 
The entropy of a four bin\n", "histogram can be computed as, $$\n", "S(p) = - \\sum_{i=1}^4 p_i \\log_2 p_i.\n", "$$\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "First we write some helper code to plot the histogram and compute its\n", "entropy.\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "```\n", "\n", "``` python\n", "def plot_histogram(ax, p, max_height=None):\n", " heights = p\n", " if max_height is None:\n", " max_height = 1.25*heights.max()\n", " \n", " # Safe entropy calculation that handles zeros\n", " nonzero_p = p[p > 0] # Filter out zeros\n", " S = - (nonzero_p*np.log2(nonzero_p)).sum()\n", "\n", " # Define bin edges\n", " bins = [1, 2, 3, 4, 5] # Bin edges\n", "\n", " # Create the histogram\n", " if ax is None:\n", " fig, ax = plt.subplots(figsize=(6, 4)) # Adjust figure size \n", " ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black') # Use weights for probabilities\n", "\n", "\n", " # Customize the plot for better slide presentation\n", " ax.set_xlabel(\"Bin\")\n", " ax.set_ylabel(\"Probability\")\n", " ax.set_title(f\"Four Bin Histogram (Entropy {S:.3f})\")\n", " ax.set_xticks(bins[:-1]) # Show correct x ticks\n", " ax.set_ylim(0,max_height) # Set y limit for visual appeal\n", "```\n", "\n", "We can compute the entropy of any given histogram.\n", "\n", "``` python\n", "\n", "# Define probabilities\n", "p = np.zeros(4)\n", "p[0] = 4/13\n", "p[1] = 3/13\n", "p[2] = 3.7/13\n", "p[3] = 1 - p.sum()\n", "\n", "# Safe entropy calculation\n", "nonzero_p = p[p > 0] # Filter out zeros\n", "entropy = - (nonzero_p*np.log2(nonzero_p)).sum()\n", "print(f\"The entropy of the histogram is {entropy:.3f}.\")\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "fig.tight_layout()\n", "plot_histogram(ax, 
p)\n", "ax.set_title(f\"Four Bin Histogram (Entropy {entropy:.3f})\")\n", "mlai.write_figure(filename='four-bin-histogram.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: The entropy of a four bin histogram.\n", "\n", "We can play the entropy game by starting with a histogram with all the\n", "probability mass in the first bin and then ascending the gradient of the\n", "entropy function.\n", "\n", "## Two-Bin Histogram Example\n", "\n", "The simplest possible example of Jaynes’ World is a two-bin histogram\n", "with probabilities $p$ and $1-p$. This minimal system allows us to\n", "visualize the entire entropy landscape.\n", "\n", "The natural parameter is the log odds, $\\theta = \\log\\frac{p}{1-p}$, and\n", "the update given by the entropy gradient is $$\n", "\\Delta \\theta_{\\text{steepest}} = \\eta \\frac{\\text{d}S}{\\text{d}\\theta} = \\eta p(1-p)(\\log(1-p) - \\log p).\n", "$$ The Fisher information is $$\n", "G(\\theta) = p(1-p).\n", "$$ This creates a dynamic where, as $p$ approaches either 0 or 1 (minimal\n", "entropy states), the Fisher information approaches zero, creating a\n", "“critical slowing” effect. This critical slowing is what leads to the\n", "formation of *information reservoirs*. 
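A quick numerical check of this gradient expression (a sketch added for illustration; the helper names and the finite-difference step size are not from the original code):

``` python
import numpy as np

def two_bin_entropy(theta):
    # Entropy (in nats) of the two-bin histogram as a function of
    # the natural parameter theta = log(p/(1-p))
    p = 1.0 / (1.0 + np.exp(-theta))
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def two_bin_entropy_gradient(theta):
    # Analytic form dS/dtheta = p(1-p)(log(1-p) - log p)
    p = 1.0 / (1.0 + np.exp(-theta))
    return p * (1 - p) * (np.log(1 - p) - np.log(p))

# Central-difference check at a few points on the landscape
h = 1e-6
for theta in [-3.0, -0.5, 0.0, 1.5]:
    numeric = (two_bin_entropy(theta + h) - two_bin_entropy(theta - h)) / (2 * h)
    assert abs(numeric - two_bin_entropy_gradient(theta)) < 1e-6
```

The gradient vanishes both at $\theta = 0$ (the maximum entropy state) and as $\theta \to \pm\infty$, where the Fisher information factor $p(1-p)$ drives the critical slowing described above.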
Note also that in the *natural\n", "gradient* the update is given by multiplying the gradient by the\n", "inverse Fisher information, which would lead to a more efficient update\n", "of the form, $$\n", "\\Delta \\theta_{\\text{natural}} = \\eta(\\log(1-p) - \\log p),\n", "$$ however, it is this efficiency that we want our game to avoid,\n", "because it is the inefficient behaviour in the region of saddle points\n", "that leads to critical slowing and the emergence of information\n", "reservoirs.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Python code for gradients\n", "p_values = np.linspace(0.000001, 0.999999, 10000)\n", "theta_values = np.log(p_values/(1-p_values))\n", "entropy = -p_values * np.log(p_values) - (1-p_values) * np.log(1-p_values)\n", "fisher_info = p_values * (1-p_values)\n", "gradient = fisher_info * (np.log(1-p_values) - np.log(p_values))\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=plot.big_wide_figsize)\n", "\n", "ax1.plot(theta_values, entropy)\n", "ax1.set_xlabel('$\\\\theta$')\n", "ax1.set_ylabel('Entropy $S(p)$')\n", "ax1.set_title('Entropy Landscape')\n", "\n", "ax2.plot(theta_values, gradient)\n", "ax2.set_xlabel('$\\\\theta$')\n", "ax2.set_ylabel('$\\\\nabla_\\\\theta S(p)$')\n", "ax2.set_title('Entropy Gradient vs. Position')\n", "\n", "mlai.write_figure(filename='two-bin-histogram-entropy-gradients.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Entropy gradients of the two bin histogram against\n", "position.\n", "\n", "This example reveals the entropy extrema at $p = 0$, $p = 0.5$, and\n", "$p = 1$. At minimal entropy ($p \\approx 0$ or $p \\approx 1$), the\n", "gradient approaches zero, creating natural information reservoirs. 
The\n", "dynamics slow dramatically near these points - these are the areas of\n", "critical slowing that create information reservoirs.\n", "\n", "## Gradient Ascent in Natural Parameter Space\n", "\n", "We can visualize the entropy maximization process by performing gradient\n", "ascent in the natural parameter space $\\theta$. Starting from a\n", "low-entropy state, we follow the gradient of entropy with respect to\n", "$\\theta$ to reach the maximum entropy state.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Helper functions for two-bin histogram\n", "def theta_to_p(theta):\n", " \"\"\"Convert natural parameter theta to probability p\"\"\"\n", " return 1.0 / (1.0 + np.exp(-theta))\n", "\n", "def p_to_theta(p):\n", " \"\"\"Convert probability p to natural parameter theta\"\"\"\n", " # Add small epsilon to avoid numerical issues\n", " p = np.clip(p, 1e-10, 1-1e-10)\n", " return np.log(p/(1-p))\n", "\n", "def entropy(theta):\n", " \"\"\"Compute entropy for given theta\"\"\"\n", " p = theta_to_p(theta)\n", " # Safe entropy calculation\n", " return -p * np.log2(p) - (1-p) * np.log2(1-p)\n", "\n", "def entropy_gradient(theta):\n", " \"\"\"Compute gradient of entropy with respect to theta\"\"\"\n", " p = theta_to_p(theta)\n", " return p * (1-p) * (np.log2(1-p) - np.log2(p))\n", "\n", "def plot_histogram(ax, theta, max_height=None):\n", " \"\"\"Plot two-bin histogram for given theta\"\"\"\n", " p = theta_to_p(theta)\n", " heights = np.array([p, 1-p])\n", " \n", " if max_height is None:\n", " max_height = 1.25\n", " \n", " # Compute entropy\n", " S = entropy(theta)\n", " \n", " # Create the histogram\n", " bins = [1, 2, 3] # Bin edges\n", " if ax is None:\n", " fig, ax = plt.subplots(figsize=(6, 4))\n", " ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black')\n", " \n", " # Customize the plot\n", " ax.set_xlabel(\"Bin\")\n", " ax.set_ylabel(\"Probability\")\n", " ax.set_title(f\"Two-Bin 
Histogram (Entropy {S:.3f})\")\n", " ax.set_xticks(bins[:-1])\n", " ax.set_ylim(0, max_height)\n", "```\n", "\n", "``` python\n", "# Parameters for gradient ascent\n", "theta_initial = -9.0 # Start with low entropy \n", "learning_rate = 1\n", "num_steps = 1500\n", "\n", "# Initialize\n", "theta_current = theta_initial\n", "theta_history = [theta_current]\n", "p_history = [theta_to_p(theta_current)]\n", "entropy_history = [entropy(theta_current)]\n", "\n", "# Perform gradient ascent in theta space\n", "for step in range(num_steps):\n", " # Compute gradient\n", " grad = entropy_gradient(theta_current)\n", " \n", " # Update theta\n", " theta_current = theta_current + learning_rate * grad\n", " \n", " # Store history\n", " theta_history.append(theta_current)\n", " p_history.append(theta_to_p(theta_current))\n", " entropy_history.append(entropy(theta_current))\n", " if step % 100 == 0:\n", " print(f\"Step {step+1}: θ = {theta_current:.4f}, p = {p_history[-1]:.4f}, Entropy = {entropy_history[-1]:.4f}\")\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "# Create a figure showing the evolution\n", "fig, axes = plt.subplots(2, 3, figsize=(15, 8))\n", "fig.tight_layout(pad=3.0)\n", "\n", "# Select steps to display\n", "steps_to_show = [0, 300, 600, 900, 1200, 1500]\n", "\n", "# Plot histograms for selected steps\n", "for i, step in enumerate(steps_to_show):\n", " row, col = i // 3, i % 3\n", " plot_histogram(axes[row, col], theta_history[step])\n", " axes[row, col].set_title(f\"Step {step}: θ = {theta_history[step]:.2f}, p = {p_history[step]:.3f}\")\n", "\n", "mlai.write_figure(filename='two-bin-histogram-evolution.svg', \n", " directory = './information-game')\n", "\n", "# Plot entropy evolution\n", "plt.figure(figsize=(10, 6))\n", "plt.plot(range(num_steps+1), entropy_history, 'o-')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy 
Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='two-bin-entropy-evolution.svg', \n", " directory = './information-game')\n", "\n", "# Plot trajectory in theta space\n", "plt.figure(figsize=(10, 6))\n", "theta_range = np.linspace(-5, 5, 1000)\n", "entropy_curve = [entropy(t) for t in theta_range]\n", "plt.plot(theta_range, entropy_curve, 'b-', label='Entropy Landscape')\n", "plt.plot(theta_history, entropy_history, 'ro-', label='Gradient Ascent Path')\n", "plt.xlabel('Natural Parameter θ')\n", "plt.ylabel('Entropy')\n", "plt.title('Gradient Ascent Trajectory in Natural Parameter Space')\n", "plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)\n", "plt.legend()\n", "plt.grid(True)\n", "mlai.write_figure(filename='two-bin-trajectory.svg', \n", " directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Evolution of the two-bin histogram during gradient ascent in\n", "natural parameter space.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent for the two-bin\n", "histogram.\n", "\n", "\n", "\n", "Figure: Gradient ascent trajectory in the natural parameter space for\n", "the two-bin histogram.\n", "\n", "The gradient ascent visualization shows how the system evolves in the\n", "natural parameter space $\\theta$. Starting from a negative $\\theta$\n", "(corresponding to a low-entropy state with $p << 0.5$), the system\n", "follows the gradient of entropy with respect to $\\theta$ until it\n", "reaches $\\theta = 0$ (corresponding to $p = 0.5$), which is the maximum\n", "entropy state.\n", "\n", "Note that the maximum entropy occurs at $\\theta = 0$, which corresponds\n", "to $p = 0.5$. 
The gradient of entropy with respect to $\\theta$ is zero\n", "at this point, making it a stable equilibrium for the gradient ascent\n", "process.\n", "\n", "## Four Bin Histogram Entropy Game\n", "\n", "\\[edit\\]\n", "\n", "To play the game with the four bin histogram we represent the histogram\n", "parameters as a vector of length 4,\n", "$\\boldsymbol{\\lambda} = [\\lambda_1, \\lambda_2, \\lambda_3, \\lambda_4]$,\n", "and define the histogram probabilities to be\n", "$p_i = \\lambda_i^2 / \\sum_{j=1}^4 \\lambda_j^2$.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Define the entropy function\n", "def entropy(lambdas):\n", "    p = lambdas**2/(lambdas**2).sum()\n", "\n", "    # Safe entropy calculation\n", "    nonzero_p = p[p > 0]\n", "    nonzero_lambdas = lambdas[p > 0]\n", "    return np.log2(np.sum(lambdas**2))-np.sum(nonzero_p * np.log2(nonzero_lambdas**2))\n", "\n", "# Define the gradient of the entropy function\n", "def entropy_gradient(lambdas):\n", "    denominator = np.sum(lambdas**2)\n", "    p = lambdas**2/denominator\n", "\n", "    # Safe log calculation\n", "    log_terms = np.zeros_like(lambdas)\n", "    nonzero_idx = lambdas != 0\n", "    log_terms[nonzero_idx] = np.log2(np.abs(lambdas[nonzero_idx]))\n", "\n", "    p_times_lambda_entropy = -2*log_terms/denominator\n", "    const = (p*p_times_lambda_entropy).sum()\n", "    gradient = 2*lambdas*(p_times_lambda_entropy - const)\n", "    return gradient\n", "\n", "# Numerical gradient check\n", "def numerical_gradient(func, lambdas, h=1e-5):\n", "    numerical_grad = np.zeros_like(lambdas)\n", "    for i in range(len(lambdas)):\n", "        temp_lambda_plus = lambdas.copy()\n", "        temp_lambda_plus[i] += h\n", "        temp_lambda_minus = lambdas.copy()\n", "        temp_lambda_minus[i] -= h\n", "        numerical_grad[i] = (func(temp_lambda_plus) - func(temp_lambda_minus)) / (2 * h)\n", "    return numerical_grad\n", "```\n", "\n", "We can then ascend the gradient of the entropy function, starting at a\n", "parameter setting where the mass is placed in the first bin; we take\n", 
"$\\lambda_2 = \\lambda_3 = \\lambda_4 = 0.01$ and $\\lambda_1 = 100$.\n", "\n", "First, to check our code, we compare our numerical and analytic gradients.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Initial parameters (lambda)\n", "initial_lambdas = np.array([100, 0.01, 0.01, 0.01])\n", "\n", "# Gradient check\n", "numerical_grad = numerical_gradient(entropy, initial_lambdas)\n", "analytical_grad = entropy_gradient(initial_lambdas)\n", "print(\"Numerical Gradient:\", numerical_grad)\n", "print(\"Analytical Gradient:\", analytical_grad)\n", "print(\"Gradient Difference:\", np.linalg.norm(numerical_grad - analytical_grad)) # Check if close to zero\n", "```\n", "\n", "Now we can run the steepest ascent algorithm.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Steepest ascent algorithm\n", "lambdas = initial_lambdas.copy()\n", "\n", "learning_rate = 1\n", "turns = 15000\n", "entropy_values = []\n", "lambdas_history = []\n", "\n", "for _ in range(turns):\n", "    grad = entropy_gradient(lambdas)\n", "    lambdas += learning_rate * grad # update lambda for steepest ascent\n", "    entropy_values.append(entropy(lambdas))\n", "    lambdas_history.append(lambdas.copy())\n", "```\n", "\n", "We can plot the histogram at a set of chosen turn numbers to see the\n", "progress of the algorithm.\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "plot_at = [0, 100, 1000, 2500, 5000, 7500, 10000, 12500, turns-1]\n", "for i, turn in enumerate(plot_at):\n", "    # index the history by the turn number, not the loop counter\n", "    plot_histogram(ax, lambdas_history[turn]**2/(lambdas_history[turn]**2).sum(), 1)\n", "    # write the figure,\n", "    mlai.write_figure(filename=f'four-bin-histogram-turn-{i:02d}.svg', \n", "                      directory = './information-game')\n", "```\n", "\n", "``` python\n", "import notutils as nu\n", "from ipywidgets import IntSlider\n", 
"```\n", "\n", "``` python\n", "nu.display_plots('four-bin-histogram-turn-{sample:0>2}.svg', \n", "                 './information-game', \n", "                 sample=IntSlider(0, 0, 8, 1))\n", "```\n", "\n", "\n", "\n", "Figure: Intermediate stages of the histogram entropy game. After 0,\n", "1000, 5000, 10000 and 15000 iterations.\n", "\n", "And we can also plot the changing entropy as a function of the number of\n", "game turns.\n", "\n", "``` python\n", "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n", "ax.plot(range(turns), entropy_values)\n", "ax.set_xlabel(\"turns\")\n", "ax.set_ylabel(\"entropy\")\n", "ax.set_title(\"Entropy vs. turns (Steepest Ascent)\")\n", "mlai.write_figure(filename='four-bin-histogram-entropy-vs-turns.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Four bin histogram entropy game. The plot shows the\n", "increasing entropy against the number of turns across 15000 iterations\n", "of gradient ascent.\n", "\n", "Note that the entropy starts at a saddle point, increases rapidly,\n", "and then levels off towards the maximum entropy, with the gradient\n", "decreasing slowly in the manner of Zeno’s paradox.\n", "\n", "## Constructed Quantities and Lemmas\n", "\n", "### Variable Partition\n", "\n", "$$\n", "X(t) = \\left\\{ i \\mid \\left| \\frac{\\text{d}\\theta_i}{\\text{d}t} \\right| \\geq \\varepsilon \\right\\}, \\quad M(t) = Z \\setminus X(t)\n", "$$\n", "\n", "### Fisher Information Matrix Partitioning\n", "\n", "We partition the Fisher Information Matrix $G(\\boldsymbol{\\theta})$\n", "according to the active variables $X(t)$ and latent information\n", "reservoir $M(t)$: $$\n", "G(\\boldsymbol{\\theta}) = \n", "\\begin{bmatrix}\n", "G_{XX} & G_{XM} \\\\\n", "G_{MX} & G_{MM}\n", "\\end{bmatrix}\n", "$$ where $G_{XX}$ represents the information geometry within active\n", "variables, $G_{MM}$ within the latent reservoir, and\n", "$G_{XM} = G_{MX}^\\top$ captures the cross-coupling between active and\n", "latent 
components. This partitioning reveals how information flows\n", "between observable dynamics and the latent structure.\n", "\n", "### Lemma 1: Form of the Minimal Entropy Configuration\n", "\n", "The minimal-entropy state compatible with the system’s resolution\n", "constraint and regularity condition is represented by a density matrix\n", "of the exponential form, $$\n", "\\rho(\\boldsymbol{\\theta}_o) = \\frac{1}{Z(\\boldsymbol{\\theta}_o)} \\exp\\left( \\sum_i \\theta_{oi} H_i \\right),\n", "$$ where all components $\\theta_{oi}$ are sub-threshold, $$\n", "|\\dot{\\theta}_{oi}| < \\varepsilon.\n", "$$ This state minimizes entropy under the constraint that it remains\n", "regular, continuous, and detectable only above a resolution scale $\\varepsilon$.\n", "Its structure can be derived via a *minimum-entropy* analogue of Jaynes’\n", "formalism, using the same density matrix geometry but inverted\n", "optimization.\n", "\n", "### Lemma 2: Symmetry Breaking\n", "\n", "If $\\theta_k \\in M(t)$ and $|\\dot{\\theta}_k| \\geq \\varepsilon$, then $$\n", "\\theta_k \\in X(t + \\delta t).\n", "$$\n", "\n", "## Four-Bin Saddle Point Example\n", "\n", "\\[edit\\]\n", "\n", "To illustrate saddle points and information reservoirs, we need at least\n", "a 4-bin system. This creates a 3-dimensional parameter space where we\n", "can observe genuine saddle points.\n", "\n", "Consider a 4-bin system parameterized by natural parameters $\\theta_1$,\n", "$\\theta_2$, and $\\theta_3$ (with one constraint). 
A saddle point occurs\n", "where the gradient $\\nabla_\\theta S = 0$, but the Hessian has mixed\n", "eigenvalues - some positive, some negative.\n", "\n", "At these points, the eigendecomposition of the Fisher information\n", "matrix $G(\\theta)$ reveals:\n", "\n", "- Fast modes: large positive eigenvalues → rapid evolution\n", "- Slow modes: small positive eigenvalues → gradual evolution\n", "- Critical modes: near-zero eigenvalues → information reservoirs\n", "\n", "The eigenvectors of $G(\\theta)$ at the saddle point determine which\n", "parameter combinations form information reservoirs.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "# Exponential family entropy functions for 4-bin system\n", "def exponential_family_entropy(theta):\n", "    \"\"\"\n", "    Compute entropy of a 4-bin exponential family distribution\n", "    parameterized by natural parameters theta\n", "    \"\"\"\n", "    # Compute the log-partition function (normalization constant)\n", "    log_Z = np.log(np.sum(np.exp(theta)))\n", "\n", "    # Compute probabilities\n", "    p = np.exp(theta - log_Z)\n", "\n", "    # Compute entropy: -sum(p_i * log(p_i))\n", "    entropy = -np.sum(p * np.log(p), where=p>0)\n", "\n", "    return entropy\n", "\n", "def entropy_gradient(theta):\n", "    \"\"\"\n", "    Compute the gradient of the entropy with respect to theta\n", "    \"\"\"\n", "    # Compute the log-partition function (normalization constant)\n", "    log_Z = np.log(np.sum(np.exp(theta)))\n", "\n", "    # Compute probabilities\n", "    p = np.exp(theta - log_Z)\n", "\n", "    # Entropy gradient: -p_i theta_i + p_i (p . theta),\n", "    # i.e. -G(theta) theta with G(theta) = diag(p) - p p^T\n", "    return -p*theta + p*(np.dot(p, theta))\n", "\n", "# Add a gradient check function\n", "def check_gradient(theta, epsilon=1e-6):\n", "    \"\"\"\n", "    Check the analytical gradient against numerical gradient\n", "    \"\"\"\n", "    # Compute analytical gradient\n", "    analytical_grad = entropy_gradient(theta)\n", "\n", "    # Compute numerical gradient\n", "    numerical_grad = 
np.zeros_like(theta)\n", " for i in range(len(theta)):\n", " theta_plus = theta.copy()\n", " theta_plus[i] += epsilon\n", " entropy_plus = exponential_family_entropy(theta_plus)\n", " \n", " theta_minus = theta.copy()\n", " theta_minus[i] -= epsilon\n", " entropy_minus = exponential_family_entropy(theta_minus)\n", " \n", " numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " print(\"Analytical gradient:\", analytical_grad)\n", " print(\"Numerical gradient:\", numerical_grad)\n", " print(\"Difference:\", np.abs(analytical_grad - numerical_grad))\n", " \n", " return analytical_grad, numerical_grad\n", "\n", "# Project gradient to respect constraints (sum of theta is constant)\n", "def project_gradient(theta, grad):\n", " \"\"\"\n", " Project gradient to ensure sum constraint is respected\n", " \"\"\"\n", " # Project to space where sum of components is zero\n", " return grad - np.mean(grad)\n", "\n", "# Perform gradient ascent on entropy\n", "def gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1):\n", " \"\"\"\n", " Perform gradient ascent on entropy for 4-bin system\n", " \"\"\"\n", " theta = theta_init.copy()\n", " theta_history = [theta.copy()]\n", " entropy_history = [exponential_family_entropy(theta)]\n", " \n", " for _ in range(steps):\n", " # Compute gradient\n", " grad = entropy_gradient(theta)\n", " proj_grad = project_gradient(theta, grad)\n", " \n", " # Update parameters\n", " theta += learning_rate * proj_grad\n", " \n", " # Store history\n", " theta_history.append(theta.copy())\n", " entropy_history.append(exponential_family_entropy(theta))\n", " \n", " return np.array(theta_history), np.array(entropy_history)\n", "```\n", "\n", "``` python\n", "# Test the gradient calculation\n", "test_theta = np.array([0.5, -0.3, 0.1, -0.3])\n", "test_theta = test_theta - np.mean(test_theta) # Ensure constraint is satisfied\n", "print(\"Testing gradient calculation:\")\n", "analytical_grad, numerical_grad = 
check_gradient(test_theta)\n", "\n", "# Verify if we're ascending or descending\n", "entropy_before = exponential_family_entropy(test_theta)\n", "step_size = 0.01\n", "test_theta_after = test_theta + step_size * analytical_grad\n", "entropy_after = exponential_family_entropy(test_theta_after)\n", "print(f\"Entropy before step: {entropy_before}\")\n", "print(f\"Entropy after step: {entropy_after}\")\n", "print(f\"Change in entropy: {entropy_after - entropy_before}\")\n", "if entropy_after > entropy_before:\n", " print(\"We are ascending the entropy gradient\")\n", "else:\n", " print(\"We are descending the entropy gradient\")\n", "```\n", "\n", "``` python\n", "# Initialize with asymmetric distribution (away from saddle point)\n", "theta_init = np.array([1.0, -0.5, -0.2, -0.3])\n", "theta_init = theta_init - np.mean(theta_init) # Ensure constraint is satisfied\n", "\n", "# Run gradient ascent\n", "theta_history, entropy_history = gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1.0)\n", "\n", "# Create a grid for visualization\n", "x = np.linspace(-2, 2, 100)\n", "y = np.linspace(-2, 2, 100)\n", "X, Y = np.meshgrid(x, y)\n", "\n", "# Compute entropy at each grid point (with constraint on theta3 and theta4)\n", "Z = np.zeros_like(X)\n", "for i in range(X.shape[0]):\n", " for j in range(X.shape[1]):\n", " # Create full theta vector with constraint that sum is zero\n", " theta1, theta2 = X[i,j], Y[i,j]\n", " theta3 = -0.5 * (theta1 + theta2)\n", " theta4 = -0.5 * (theta1 + theta2)\n", " theta = np.array([theta1, theta2, theta3, theta4])\n", " Z[i,j] = exponential_family_entropy(theta)\n", "\n", "# Compute gradient field\n", "dX = np.zeros_like(X)\n", "dY = np.zeros_like(Y)\n", "for i in range(X.shape[0]):\n", " for j in range(X.shape[1]):\n", " # Create full theta vector with constraint\n", " theta1, theta2 = X[i,j], Y[i,j]\n", " theta3 = -0.5 * (theta1 + theta2)\n", " theta4 = -0.5 * (theta1 + theta2)\n", " theta = np.array([theta1, theta2, theta3, 
theta4])\n", " \n", " # Get full gradient and project\n", " grad = entropy_gradient(theta)\n", " proj_grad = project_gradient(theta, grad)\n", " \n", " # Store first two components\n", " dX[i,j] = proj_grad[0]\n", " dY[i,j] = proj_grad[1]\n", "\n", "# Normalize gradient vectors for better visualization\n", "norm = np.sqrt(dX**2 + dY**2)\n", "# Avoid division by zero\n", "norm = np.where(norm < 1e-10, 1e-10, norm)\n", "dX_norm = dX / norm\n", "dY_norm = dY / norm\n", "\n", "# A few gradient vectors for visualization\n", "stride = 10\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "```\n", "\n", "``` python\n", "fig = plt.figure(figsize=plot.big_wide_figsize)\n", "\n", "# Create contour lines only (no filled contours)\n", "contours = plt.contour(X, Y, Z, levels=15, colors='black', linewidths=0.8)\n", "plt.clabel(contours, inline=True, fontsize=8, fmt='%.2f')\n", "\n", "# Add gradient vectors (normalized for direction, but scaled by magnitude for visibility)\n", "plt.quiver(X[::stride, ::stride], Y[::stride, ::stride], \n", " dX_norm[::stride, ::stride], dY_norm[::stride, ::stride], \n", " color='r', scale=30, width=0.003, scale_units='width')\n", "\n", "# Plot the gradient ascent trajectory\n", "plt.plot(theta_history[:, 0], theta_history[:, 1], 'b-', linewidth=2, \n", " label='Gradient Ascent Path')\n", "plt.scatter(theta_history[0, 0], theta_history[0, 1], color='green', s=100, \n", " marker='o', label='Start')\n", "plt.scatter(theta_history[-1, 0], theta_history[-1, 1], color='purple', s=100, \n", " marker='*', label='End')\n", "\n", "# Add labels and title\n", "plt.xlabel('$\\\\theta_1$')\n", "plt.ylabel('$\\\\theta_2$')\n", "plt.title('Entropy Contours with Gradient Field')\n", "\n", "# Mark the saddle point (approximately at origin for this system)\n", "plt.scatter([0], [0], color='yellow', s=100, marker='*', \n", " edgecolor='black', zorder=10, label='Saddle Point')\n", "plt.legend()\n", 
"\n", "mlai.write_figure(filename='simplified-saddle-point-example.svg', \n", "                  directory = './information-game')\n", "\n", "# Plot entropy evolution during gradient ascent\n", "plt.figure(figsize=plot.big_figsize)\n", "plt.plot(entropy_history)\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='four-bin-entropy-evolution.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Visualisation of a saddle point projected down to two\n", "dimensions.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent on the four-bin\n", "system.\n", "\n", "The animation of system evolution would show initial rapid movement\n", "along high-eigenvalue directions, progressive slowing in directions with\n", "low eigenvalues and formation of information reservoirs in the\n", "critically slowed directions. Parameter-capacity uncertainty emerges\n", "naturally at the saddle point.\n", "\n", "### Entropy-Time\n", "\n", "$$\n", "\\tau(t) := S_{X(t)}(t)\n", "$$\n", "\n", "### Lemma 3: Monotonicity of Entropy-Time\n", "\n", "$$\n", "\\tau(t_2) \\geq \\tau(t_1) \\quad \\text{for all } t_2 > t_1\n", "$$\n", "\n", "### Corollary: Irreversibility\n", "\n", "$\\tau(t)$ increases monotonically, preventing time-reversal globally.\n", "\n", "## Conjecture: Frieden-Analogous Extremal Flow\n", "\n", "At points where the latent-to-active flow functional is locally\n", "extremal, the system may exhibit critical slowing where information\n", "reservoir variables are slow relative to active variables. It may be\n", "possible to separate the system entropy into an active-variable\n", "component, $I = S[\\rho_X]$, and an “intrinsic information”,\n", "$J = S[\\rho_{X|M}]$, allowing\n", "us to create an information measure analogous to B. 
Roy Frieden’s extreme\n", "physical information (Frieden (1998)) which allows derivation of locally\n", "valid differential equations that depend on the *information\n", "topography*.\n", "\n", "## From Maximum to Minimal Entropy\n", "\n", "\\[edit\\]\n", "\n", "Jaynes formulated his principle in terms of maximizing entropy, but we\n", "can also view certain problems as minimizing entropy under appropriate\n", "constraints. The duality becomes apparent when we consider the\n", "relationship between entropy and information.\n", "\n", "The maximum entropy principle finds the distribution that is maximally\n", "noncommittal given certain constraints. Conversely, we can seek the\n", "distribution that minimizes entropy subject to different constraints -\n", "this represents the distribution with maximum structure or information.\n", "\n", "Consider the uncertainty principle. When we seek states that minimize\n", "the product of position and momentum uncertainties, we are seeking\n", "minimal entropy states subject to the constraint of the uncertainty\n", "principle.\n", "\n", "The mathematical formalism remains the same, but with the optimization\n", "direction reversed and constraints placed on the expectations\n", "$\\mathbb{E}[g_k(X)]$, where $g_k$ are functions\n", "representing constraints different from simple averages.\n", "\n", "The solution still takes the form of an exponential family, $$\n", "\\rho(x) \\propto \\exp\\left(\\sum_k \\mu_k g_k(x)\\right),\n", "$$ where $\\mu_k$ are Lagrange multipliers for the constraints.\n", "\n", "## Minimal Entropy States in Quantum Systems\n", "\n", "The pure states of quantum mechanics are those that minimize von Neumann\n", "entropy $S = -\\text{Tr}(\\rho \\log \\rho)$ subject to the constraints of\n", "quantum mechanics.\n", "\n", "For example, coherent states minimize the entropy subject to constraints\n", "on the expectation values of position and momentum operators. 
These\n", "states achieve the minimum uncertainty allowed by quantum mechanics.\n", "\n", "## Uncertainty Principle\n", "\n", "\\[edit\\]\n", "\n", "One challenge is how to parameterise our exponential family. We’ve\n", "mentioned that the variables $Z$ are partitioned into observable\n", "variables $X$ and memory variables $M$. Given the minimal entropy\n", "initial state, the obvious initial choice is that at the origin all\n", "variables, $Z$, should be in the information reservoir, $M$. This\n", "implies that they are well determined and present a sensible choice for\n", "the source of our parameters.\n", "\n", "We define a mapping, $\\boldsymbol{\\theta}(M)$, that maps the information\n", "reservoir to a set of values that are equivalent to the *natural\n", "parameters*. If the entropy of these parameters is low, and the\n", "distribution $\\rho(\\boldsymbol{\\theta})$ is sharply peaked, then we can\n", "move from treating the memory mapping, $\\boldsymbol{\\theta}(\\cdot)$, as\n", "a random process to an assumption that it is a deterministic function.\n", "We can then follow gradients with respect to these $\\boldsymbol{\\theta}$\n", "values.\n", "\n", "This allows us to rewrite the distribution over $Z$ in a conditional\n", "form, $$\n", "\\rho(X|M) = h(X) \\exp(\\boldsymbol{\\theta}(M)^\\top T(X) - A(\\boldsymbol{\\theta}(M))).\n", "$$\n", "\n", "Unfortunately this assumption implies that $\\boldsymbol{\\theta}(\\cdot)$\n", "is a delta function, and since our representation is a compact manifold\n", "(bounded below by $0$ and above by $N$), it does not admit any such\n", "singularities.\n", "\n", "## Formal Derivation of the Uncertainty Principle\n", "\n", "We can derive the uncertainty principle formally from the\n", "information-theoretic properties of the system. 
Consider the mutual\n", "information between parameters $\\boldsymbol{\\theta}(M)$ and capacity\n", "variables $c(M)$: $$\n", "I(\\boldsymbol{\\theta}(M); c(M)) = h(\\boldsymbol{\\theta}(M)) + h(c(M)) - h(\\boldsymbol{\\theta}(M), c(M))\n", "$$ where $h(\\cdot)$ represents differential entropy.\n", "\n", "Since the total entropy of the system is bounded by $N$, we know that\n", "$h(\\boldsymbol{\\theta}(M), c(M)) \\leq N$. Additionally, for any two\n", "random variables, the mutual information satisfies\n", "$I(\\boldsymbol{\\theta}(M); c(M)) \\geq 0$, with equality if and only if\n", "they are independent.\n", "\n", "For our system to function as an effective information reservoir,\n", "$\\boldsymbol{\\theta}(M)$ and $c(M)$ cannot be independent - they must\n", "share information. This gives us, $$\n", "h(\\boldsymbol{\\theta}(M)) + h(c(M)) \\geq h(\\boldsymbol{\\theta}(M), c(M)) + I_{\\min}\n", "$$ where $I_{\\min} > 0$ is the minimum mutual information required for\n", "the system to function.\n", "\n", "For variables with fixed variance, differential entropy is maximized by\n", "Gaussian distributions. For a multivariate Gaussian with covariance\n", "matrix $\\Sigma$, the differential entropy is: $$\n", "h(\\mathcal{N}(0, \\Sigma)) = \\frac{1}{2}\\ln\\left((2\\pi e)^d|\\Sigma|\\right)\n", "$$ where $d$ is the dimensionality and $|\\Sigma|$ is the determinant of\n", "the covariance matrix.\n", "\n", "The Cramér-Rao inequality provides a lower bound on the variance of any\n", "unbiased estimator. If $\\boldsymbol{\\theta}$ is a parameter vector and\n", "$\\hat{\\boldsymbol{\\theta}}$ is an unbiased estimator, then: $$\n", "\\text{Cov}(\\hat{\\boldsymbol{\\theta}}) \\geq G^{-1}(\\boldsymbol{\\theta})\n", "$$ where $G(\\boldsymbol{\\theta})$ is the Fisher information matrix.\n", "\n", "In our context, the relationship between parameters\n", "$\\boldsymbol{\\theta}(M)$ and capacity variables $c(M)$ follows a similar\n", "bound. 
The Fisher information matrix for exponential family\n", "distributions has a special property: it equals the covariance of the\n", "sufficient statistics, which in our case are represented by the capacity\n", "variables $c(M)$. This gives us $$\n", "G(\\boldsymbol{\\theta}(M)) = \\text{Cov}(c(M))\n", "$$\n", "\n", "Applying the Cramér-Rao inequality we have $$\n", "\\text{Cov}(\\boldsymbol{\\theta}(M)) \\cdot \\text{Cov}(c(M)) \\geq G^{-1}(\\boldsymbol{\\theta}(M)) \\cdot G(\\boldsymbol{\\theta}(M)) = \\mathbf{I}\n", "$$ where $\\mathbf{I}$ is the identity matrix.\n", "\n", "For one-dimensional projections, this matrix inequality implies, $$\n", "\\text{Var}(\\boldsymbol{\\theta}(M)) \\cdot \\text{Var}(c(M)) \\geq 1\n", "$$ and converting to standard deviations we have $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq 1.\n", "$$\n", "\n", "When we incorporate the minimum mutual information constraint\n", "$I_{\\min}$, the bound tightens. Using the relationship between\n", "differential entropy and mutual information, we can derive $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k,\n", "$$ where $k = \\frac{1}{2\\pi e}e^{2I_{\\min}}$.\n", "\n", "This is our uncertainty principle, directly derived from\n", "information-theoretic constraints and the Cramér-Rao bound. It\n", "represents the fundamental trade-off between precision in parameter\n", "specification and capacity for information storage.\n", "\n", "## Definition of Capacity Variables\n", "\n", "We now provide a precise definition of the capacity variables $c(M)$.\n", "The capacity variables quantify the potential of memory variables to\n", "store information about observable variables. Mathematically, we define\n", "$c(M)$ as, $$\n", "c(M) = \\nabla_{\\boldsymbol{\\theta}} A(\\boldsymbol{\\theta}(M))\n", "$$ where $A(\\boldsymbol{\\theta})$ is the log-partition function from our\n", "exponential family distribution. 
This definition has a clear\n", "interpretation: $c(M)$ represents the expected values of the sufficient\n", "statistics under the current parameter values.\n", "\n", "This definition also naturally yields the Fourier relationship between\n", "parameters and capacity. In exponential families, the log-partition\n", "function and its derivatives form a Legendre transform pair, which is\n", "the mathematical basis for the Fourier duality we claim. Specifically,\n", "if we define the Fourier transform operator $\\mathcal{F}$ as the mapping\n", "that takes parameters to expected sufficient statistics, then: $$\n", "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)]\n", "$$\n", "\n", "## Capacity $\\leftrightarrow$ Precision Paradox\n", "\n", "This creates an apparent paradox: at minimal entropy states, the\n", "information reservoir must simultaneously maintain precision in the\n", "parameters $\\boldsymbol{\\theta}(M)$ (for accurate system representation)\n", "but it must also provide sufficient capacity $c(M)$ (for information\n", "storage).\n", "\n", "The trade-off can be expressed as, $$\n", "\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k,\n", "$$ where $k$ is a constant. This relationship can be recognised as a\n", "natural *uncertainty principle* that underpins the behaviour of the\n", "game. This principle is a necessary consequence of information theory.\n", "It follows from the requirement for the parameter-like states, $M$, to\n", "have both precision and high capacity (in the Shannon sense). 
The\n", "uncertainty principle ensures that when parameters are sharply defined\n", "(low $\\Delta\\boldsymbol{\\theta}$), the capacity variables have high\n", "uncertainty (high $\\Delta c$), allowing information to be encoded in\n", "their relationships rather than absolute values.\n", "\n", "This trade-off between precision and capacity directly parallels\n", "Shannon’s insights about information transmission (Shannon, 1948), where\n", "he demonstrated that increasing the precision of a signal requires\n", "increasing bandwidth or reducing noise immunity—creating an inherent\n", "trade-off in any communication system. Our formulation extends this\n", "principle to the information reservoir’s parameter space.\n", "\n", "In practice this means that the parameters $\\boldsymbol{\\theta}(M)$ and\n", "capacity variables $c(M)$ must form a Fourier-dual pair, $$\n", "c(M) = \\mathcal{F}[\\boldsymbol{\\theta}(M)].\n", "$$ This duality becomes important at saddle points when direct gradient\n", "ascent stalls.\n", "\n", "The mathematical formulation of the uncertainty principle in\n", "information-theoretic terms comes from Hirschman Jr (1957) and was later\n", "refined by Beckner (1975) and\n", "Białynicki-Birula and Mycielski (1975). These works demonstrated that\n", "Shannon’s information-theoretic entropy provides a natural framework for\n", "expressing the uncertainty principle, establishing a direct bridge\n", "between the mathematical formalism of quantum mechanics and information\n", "theory. Our capacity-precision trade-off follows this tradition,\n", "expressing the fundamental limits of information processing in our\n", "system.\n", "\n", "## Quantum vs Classical Information Reservoirs\n", "\n", "The uncertainty principle means that the game can exhibit quantum-like\n", "information processing regimes during evolution. 
This inspires an\n", "information-theoretic perspective on the quantum-classical transition.\n", "\n", "At minimal entropy states near the origin, the information reservoir has\n", "characteristics reminiscent of quantum systems.\n", "\n", "1. *Wave-like information encoding*: The information reservoir near the\n", " origin necessarily encodes information in distributed,\n", " interference-capable patterns due to the uncertainty principle\n", " between parameters $\\boldsymbol{\\theta}(M)$ and capacity variables\n", " $c(M)$.\n", "\n", "2. *Non-local correlations*: Parameters are highly correlated through\n", " the Fisher information matrix, creating structures where information\n", " is stored in relationships rather than individual variables.\n", "\n", "3. *Uncertainty-saturated regime*: The uncertainty relationship\n", " $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k$ is nearly\n", " saturated (approaches equality), similar to Heisenberg’s uncertainty\n", " principle in quantum systems and the entropic uncertainty relations\n", " established by Białynicki-Birula and Mycielski (1975).\n", "\n", "As the system evolves towards higher entropy states, a transition occurs\n", "where some variables exhibit classical behavior.\n", "\n", "1. *From wave-like to particle-like*: Variables transitioning from $M$\n", " to $X$ shift from storing information in interference patterns to\n", " storing it in definite values with statistical uncertainty.\n", "\n", "2. *Decoherence-like process*: The uncertainty product\n", " $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M)$ for these variables\n", " grows significantly larger than the minimum value $k$, indicating a\n", " departure from quantum-like behavior.\n", "\n", "3. 
*Local information encoding*: Information becomes increasingly\n", " encoded in local variables rather than distributed correlations.\n", "\n", "The saddle points in our entropy landscape mark critical transitions\n", "between quantum-like and classical information processing regimes. Near\n", "these points\n", "\n", "1. The critically slowed modes maintain quantum-like characteristics,\n", " functioning as coherent memory that preserves information through\n", " interference patterns.\n", "\n", "2. The rapidly evolving modes exhibit classical characteristics,\n", " functioning as incoherent processors that manipulate information\n", " through statistical operations.\n", "\n", "3. This natural separation creates a hybrid computational architecture\n", " where quantum-like memory interfaces with classical-like processing.\n", "\n", "The quantum-classical transition can be quantified using the moment\n", "generating function $M_Z(t)$. In quantum-like regimes, the MGF exhibits\n", "oscillatory behavior with complex analytic structure, whereas in\n", "classical regimes, it grows monotonically with simple analytic\n", "structure. The transition between these behaviors identifies variables\n", "moving between quantum-like and classical information processing modes.\n", "\n", "This perspective suggests that what we recognize as “quantum” versus\n", "“classical” behavior may fundamentally reflect different regimes of\n", "information processing - one optimized for coherent information storage\n", "(quantum-like) and the other for flexible information manipulation\n", "(classical-like). 
The emergence of both regimes from our\n", "entropy-maximizing model indicates that nature may exploit this\n", "computational architecture to optimize information processing across\n", "multiple scales.\n", "\n", "This formulation of the uncertainty principle in terms of information\n", "capacity and parameter precision follows the tradition established by\n", "Shannon (1948) and expanded upon by Hirschman Jr (1957) and others who\n", "connected information entropy uncertainty to Heisenberg’s uncertainty.\n", "\n", "## Quantitative Demonstration\n", "\n", "We can demonstrate this principle quantitatively through a simple model.\n", "Consider a two-dimensional system with memory variables $M = (m_1, m_2)$\n", "that map to parameters\n", "$\\boldsymbol{\\theta}(M) = (\\theta_1(m_1), \\theta_2(m_2))$. The capacity\n", "variables are $c(M) = (c_1(m_1), c_2(m_2))$.\n", "\n", "At minimal entropy, when the system is near the origin, the uncertainty\n", "product is exactly: $$\n", "\\Delta\\theta_i(m_i) \\cdot \\Delta c_i(m_i) = k\n", "$$ for each dimension $i$.\n", "\n", "As the system evolves and entropy increases, some variables transition\n", "to classical behavior with: $$\n", "\\Delta\\theta_i(m_i) \\cdot \\Delta c_i(m_i) \\gg k\n", "$$\n", "\n", "This increased product reflects the transition from quantum-like to\n", "classical information processing. The variables that maintain the\n", "minimal uncertainty product $k$ continue to function as coherent\n", "information reservoirs, while those with larger uncertainty products\n", "function as classical processors.\n", "\n", "This principle provides testable predictions for any system modeled as\n", "an information reservoir. 
Specifically, we predict that variables\n", "functioning as effective memory must demonstrate precision-capacity\n", "trade-offs near the theoretical minimum $k$, while processing variables\n", "will show excess uncertainty above this minimum.\n", "\n", "## Maximum Entropy and Density Matrices\n", "\n", "\\[edit\\]\n", "\n", "Jaynes (1957) showed how the maximum entropy formalism is\n", "applied; in later papers, such as Jaynes (1963), he showed how his maximum\n", "entropy formalism could be applied to the von Neumann entropy of a density\n", "matrix.\n", "\n", "As Jaynes noted in his 1962 Brandeis lectures: “Assignment of initial\n", "probabilities must, in order to be useful, agree with the initial\n", "information we have (i.e., the results of measurements of certain\n", "parameters). For example, we might know that at time $t = 0$, a nuclear\n", "spin system having total (measured) magnetic moment $M(0)$, is placed in\n", "a magnetic field $H$, and the problem is to predict the subsequent\n", "variation $M(t)$… What initial density matrix for the spin system\n", "$\\rho(0)$, should we use?”\n", "\n", "Jaynes recognized that we should choose the density matrix that\n", "maximizes the von Neumann entropy, subject to the constraint from our\n", "measurements, $$\n", "\\text{Tr}[\\rho(0) M_{op}] = M(0),\n", "$$ where $M_{op}$ is the operator corresponding to total\n", "magnetic moment.\n", "\n", "The solution is the quantum version of the maximum entropy distribution, $$\n", "\\rho = \\frac{1}{Z}\\exp(-\\lambda_1 A_1 - \\cdots - \\lambda_m A_m),\n", "$$ where $A_i$ are the operators corresponding to measured observables,\n", "$\\lambda_i$ are Lagrange multipliers, and\n", "$Z = \\text{Tr}[\\exp(-\\lambda_1 A_1 - \\cdots - \\lambda_m A_m)]$ is the\n", "partition function.\n", "\n", "This unifies classical entropies and density matrix entropies under the\n", "same information-theoretic principle. 
It clarifies that quantum states\n", "with minimum entropy (pure states) represent maximum information, while\n", "mixed states represent incomplete information.\n", "\n", "Jaynes further noted that “strictly speaking, all this should be\n", "restated in terms of quantum theory using the density matrix formalism.\n", "This will introduce the $N!$ permutation factor, a natural zero for\n", "entropy, alteration of numerical values if discreteness of energy levels\n", "becomes comparable to $k_BT$, etc.”\n", "\n", "## Quantum States and Exponential Families\n", "\n", "\\[edit\\]\n", "\n", "The minimal entropy quantum states provide a connection between density\n", "matrices and exponential family distributions. This connection enables\n", "us to use many of the classical techniques from information geometry and\n", "apply them to the game in the case where the uncertainty principle is\n", "present.\n", "\n", "The minimal entropy density matrix belongs to an exponential family,\n", "just like many classical distributions.\n", "\n", "### Classical Exponential Family\n", "\n", "$$\n", "f(x; \\theta) = h(x) \\cdot \\exp[\\eta(\\theta)^\\top \\cdot T(x) - A(\\theta)]\n", "$$\n", "\n", "### Quantum Minimal Entropy State\n", "\n", "$$\n", "\\rho = \\exp(-\\mathbf{R}^\\top \\cdot \\mathbf{G} \\cdot \\mathbf{R} - Z)\n", "$$\n", "\n", "- Both have an exponential form\n", "- Both involve sufficient statistics (in the quantum case, these are\n", " quadratic forms of operators)\n", "- Both have natural parameters ($\\mathbf{G}$ in the quantum case)\n", "- Both include a normalization term\n", "\n", "The matrix $\\mathbf{G}$ in the minimal entropy state is directly related to the\n", "‘quantum Fisher information matrix’, $$\n", "\\mathbf{G} = \\text{QFIM}/4\n", "$$ where QFIM is the quantum Fisher information matrix, which quantifies\n", "how sensitively the state responds to parameter changes.\n", "\n", "This creates a link between\n", "\n", "1. Minimal entropy (maximum order)\n", "2. 
Uncertainty (fundamental quantum limitations)\n", "3. Information (ability to estimate parameters precisely)\n", "\n", "The relationship implies, $$\n", "V \\cdot \\text{QFIM} \\geq \\frac{\\hbar^2}{4}\n", "$$ which connects the covariance matrix (uncertainties) to the Fisher\n", "information (precision in parameter estimation).\n", "\n", "These minimal entropy states may be physically related to squeezed\n", "states in quantum optics. They are the states\n", "that achieve the ultimate precision allowed by quantum mechanics.\n", "\n", "## Minimal Entropy States\n", "\n", "\\[edit\\]\n", "\n", "In Jaynes’ World, we begin at a minimal entropy configuration - the\n", "“origin” state. Understanding the properties of these minimal entropy\n", "states is crucial for characterizing how the system evolves. These\n", "states are constrained by the uncertainty principle we previously\n", "identified: $\\Delta\\boldsymbol{\\theta}(M) \\cdot \\Delta c(M) \\geq k$.\n", "\n", "This constraint is reminiscent of the Heisenberg uncertainty principle\n", "in quantum mechanics, where $\\Delta x \\cdot \\Delta p \\geq \\hbar/2$. This\n", "isn’t a coincidence - both represent limitations on precision arising\n", "from the mathematical structure of information. The total entropy of the\n", "system is constrained to be between 0 and $N$, forming a compact\n", "manifold with respect to its parameters. This upper bound $N$ ensures\n", "that as the system evolves from minimal to maximal entropy, it remains\n", "within a well-defined entropy space.\n", "\n", "## Structure of Minimal Entropy States\n", "\n", "The minimal entropy configuration under the uncertainty constraint takes\n", "a specific mathematical form. 
It is a pure state (in the sense of having\n", "minimal possible entropy) that exactly saturates the uncertainty bound.\n", "For a system with multiple degrees of freedom, the distribution takes a\n", "Gaussian form, $$\n", "\\rho(Z) = \\frac{1}{Z}\\exp(-\\mathbf{R}^T \\cdot \\mathbf{G} \\cdot \\mathbf{R}),\n", "$$ where $\\mathbf{R}$ represents the vector of all variables,\n", "$\\mathbf{G}$ is a positive definite matrix constrained by the\n", "uncertainty principle, and $Z$ is the normalization constant (partition\n", "function).\n", "\n", "This form is an exponential family distribution, in line with Jaynes’\n", "principle that entropy-optimized distributions belong to the exponential\n", "family. The matrix $\\mathbf{G}$ determines how uncertainty is\n", "distributed among different variables and their correlations.\n", "\n", "## Fisher Information and Minimal Uncertainty\n", "\n", "## Gradient Ascent and Uncertainty Principles\n", "\n", "\\[edit\\]\n", "\n", "In our exploration of information dynamics, we now turn to the\n", "relationship between gradient ascent on entropy and uncertainty\n", "principles. This section demonstrates how systems naturally evolve from\n", "quantum-like states (with minimal uncertainty) toward classical-like\n", "states (with excess uncertainty) through entropy maximization.\n", "\n", "For simplicity, we’ll focus on multivariate Gaussian distributions,\n", "where the uncertainty relationships are particularly elegant. In this\n", "setting, the precision matrix $\\Lambda$ (inverse of the covariance\n", "matrix) fully characterizes the distribution. 
The entropy of a\n", "multivariate Gaussian is directly related to the determinant of the\n", "covariance matrix, $$\n", "S = \\frac{1}{2}\\log\\det(V) + \\text{constant},\n", "$$ where $V = \\Lambda^{-1}$ is the covariance matrix.\n", "\n", "For conjugate variables like position and momentum, the Heisenberg\n", "uncertainty principle imposes constraints on the minimum product of\n", "their uncertainties. In our information-theoretic framework, this\n", "appears as a constraint on the determinant of certain submatrices of the\n", "covariance matrix.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Ellipse\n", "```\n", "\n", "The code below implements gradient ascent on the entropy of a\n", "multivariate Gaussian system while respecting uncertainty constraints.\n", "We’ll track how the system evolves from minimal uncertainty states\n", "(quantum-like) to states with excess uncertainty (classical-like).\n", "\n", "First, we define key functions for computing entropy and its gradient.\n", "\n", "``` python\n", "\n", "# Constants\n", "hbar = 1.0 # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "```\n", "\n", "``` python\n", "# Compute entropy of a multivariate Gaussian with precision matrix Lambda\n", "def compute_entropy(Lambda):\n", " \"\"\"\n", " Compute entropy of multivariate Gaussian with precision matrix Lambda.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " \n", " Returns:\n", " --------\n", " entropy: float\n", " Entropy value\n", " \"\"\"\n", " # Covariance matrix is inverse of precision matrix\n", " V = np.linalg.inv(Lambda)\n", " \n", " # Entropy formula for multivariate Gaussian\n", " n = Lambda.shape[0]\n", " entropy = 0.5 * np.log(np.linalg.det(V)) + 0.5 * n * (1 + np.log(2*np.pi))\n", " \n", " return entropy\n", "\n", "# Compute gradient of entropy with respect to precision 
matrix\n", "def compute_entropy_gradient(Lambda):\n", " \"\"\"\n", " Compute gradient of entropy with respect to precision matrix.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " \n", " Returns:\n", " --------\n", " gradient: array\n", " Gradient of entropy\n", " \"\"\"\n", " # Gradient is -0.5 * inverse of Lambda\n", " V = np.linalg.inv(Lambda)\n", " gradient = -0.5 * V\n", " \n", " return gradient\n", "```\n", "\n", "The `compute_entropy` function calculates the entropy of a multivariate\n", "Gaussian distribution from its precision matrix. The\n", "`compute_entropy_gradient` function computes the gradient of entropy\n", "with respect to the precision matrix, which is essential for our\n", "gradient ascent procedure.\n", "\n", "Next, we implement functions to handle the constraints imposed by the\n", "uncertainty principle:\n", "\n", "``` python\n", "\n", "# Project gradient to respect uncertainty constraints\n", "def project_gradient(eigenvalues, gradient):\n", " \"\"\"\n", " Project gradient to respect minimum uncertainty constraints.\n", " \n", " Parameters:\n", " -----------\n", " eigenvalues: array\n", " Eigenvalues of precision matrix\n", " gradient: array\n", " Gradient vector\n", " \n", " Returns:\n", " --------\n", " projected_gradient: array\n", " Gradient projected to respect constraints\n", " \"\"\"\n", " n_pairs = len(eigenvalues) // 2\n", " projected_gradient = gradient.copy()\n", " \n", " # For each position-momentum pair\n", " for i in range(n_pairs):\n", " idx1, idx2 = 2*i, 2*i+1\n", " \n", " # Check if we're at the uncertainty boundary\n", " product = 1.0 / (eigenvalues[idx1] * eigenvalues[idx2])\n", " \n", " if product <= min_uncertainty_product * 1.01:\n", " # We're at or near the boundary\n", " # Project gradient to maintain the product\n", " avg_grad = 0.5 * (gradient[idx1]/eigenvalues[idx1] + gradient[idx2]/eigenvalues[idx2])\n", " projected_gradient[idx1] = avg_grad * eigenvalues[idx1]\n", " 
projected_gradient[idx2] = avg_grad * eigenvalues[idx2]\n", " \n", " return projected_gradient\n", "\n", "# Initialize a multidimensional state with position-momentum pairs\n", "def initialize_multidimensional_state(n_pairs, squeeze_factors=None, with_cross_connections=False):\n", " \"\"\"\n", " Initialize a precision matrix for multiple position-momentum pairs.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " squeeze_factors: list or None\n", " Factors determining the position-momentum squeezing\n", " with_cross_connections: bool\n", " Whether to initialize with cross-connections between pairs\n", " \n", " Returns:\n", " --------\n", " Lambda: array\n", " Precision matrix\n", " \"\"\"\n", " if squeeze_factors is None:\n", " squeeze_factors = [0.1 + 0.05*i for i in range(n_pairs)]\n", " \n", " # Total dimension (position + momentum)\n", " dim = 2 * n_pairs\n", " \n", " # Initialize with diagonal precision matrix\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Set eigenvalues based on squeeze factors\n", " for i in range(n_pairs):\n", " squeeze = squeeze_factors[i]\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Initialize with identity eigenvectors\n", " eigenvectors = np.eye(dim)\n", " \n", " # If requested, add cross-connections by mixing eigenvectors\n", " if with_cross_connections:\n", " # Create a random orthogonal matrix for mixing\n", " Q, _ = np.linalg.qr(np.random.randn(dim, dim))\n", " \n", " # Apply moderate mixing - not fully random to preserve some structure\n", " mixing_strength = 0.3\n", " eigenvectors = (1 - mixing_strength) * eigenvectors + mixing_strength * Q\n", " \n", " # Re-orthogonalize\n", " eigenvectors, _ = np.linalg.qr(eigenvectors)\n", " \n", " # Construct precision matrix from eigendecomposition\n", " Lambda = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T\n", " \n", " 
return Lambda\n", "```\n", "\n", "The `project_gradient` function ensures that our gradient ascent\n", "respects the uncertainty principle by projecting the gradient to\n", "maintain minimum uncertainty products when necessary. The\n", "`initialize_multidimensional_state` function creates a starting state\n", "with multiple position-momentum pairs, each initialized to the minimum\n", "uncertainty allowed by the uncertainty principle, but with different\n", "“squeeze factors” that determine the shape of the uncertainty ellipse.\n", "\n", "``` python\n", "\n", "# Add gradient check function\n", "def check_entropy_gradient(Lambda, epsilon=1e-6):\n", " \"\"\"\n", " Check the analytical gradient of entropy against numerical gradient.\n", " \n", " Parameters:\n", " -----------\n", " Lambda: array\n", " Precision matrix\n", " epsilon: float\n", " Small perturbation for numerical gradient\n", " \n", " Returns:\n", " --------\n", " analytical_grad: array\n", " Analytical gradient with respect to eigenvalues\n", " numerical_grad: array\n", " Numerical gradient with respect to eigenvalues\n", " \"\"\"\n", " # Get eigendecomposition\n", " eigenvalues, eigenvectors = eigh(Lambda)\n", " \n", " # Compute analytical gradient: dS/d(lambda_i) = -1/(2 lambda_i) for a Gaussian\n", " analytical_grad = -0.5 / eigenvalues\n", " \n", " # Compute numerical gradient\n", " numerical_grad = np.zeros_like(eigenvalues)\n", " for i in range(len(eigenvalues)):\n", " # Perturb eigenvalue up\n", " eigenvalues_plus = eigenvalues.copy()\n", " eigenvalues_plus[i] += epsilon\n", " Lambda_plus = eigenvectors @ np.diag(eigenvalues_plus) @ eigenvectors.T\n", " entropy_plus = compute_entropy(Lambda_plus)\n", " \n", " # Perturb eigenvalue down\n", " eigenvalues_minus = eigenvalues.copy()\n", " eigenvalues_minus[i] -= epsilon\n", " Lambda_minus = eigenvectors @ np.diag(eigenvalues_minus) @ eigenvectors.T\n", " entropy_minus = compute_entropy(Lambda_minus)\n", " \n", " # Compute numerical gradient\n", " numerical_grad[i] = (entropy_plus - 
entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " print(\"Analytical gradient:\", analytical_grad)\n", " print(\"Numerical gradient:\", numerical_grad)\n", " print(\"Difference:\", np.abs(analytical_grad - numerical_grad))\n", " \n", " return analytical_grad, numerical_grad\n", "```\n", "\n", "Now we implement the main gradient ascent procedure.\n", "\n", "``` python\n", "\n", "\n", "# Perform gradient ascent on entropy\n", "def gradient_ascent_entropy(Lambda_init, n_steps=100, learning_rate=0.01):\n", " \"\"\"\n", " Perform gradient ascent on entropy while respecting uncertainty constraints.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_init: array\n", " Initial precision matrix\n", " n_steps: int\n", " Number of gradient steps\n", " learning_rate: float\n", " Learning rate for gradient ascent\n", " \n", " Returns:\n", " --------\n", " Lambda_history: list\n", " History of precision matrices\n", " entropy_history: list\n", " History of entropy values\n", " \"\"\"\n", " Lambda = Lambda_init.copy()\n", " Lambda_history = [Lambda.copy()]\n", " entropy_history = [compute_entropy(Lambda)]\n", " \n", " for step in range(n_steps):\n", " # Compute gradient of entropy\n", " grad_matrix = compute_entropy_gradient(Lambda)\n", " \n", " # Diagonalize Lambda to work with eigenvalues\n", " eigenvalues, eigenvectors = eigh(Lambda)\n", " \n", " # Transform gradient to eigenvalue space\n", " grad = np.diag(eigenvectors.T @ grad_matrix @ eigenvectors)\n", " \n", " # Project gradient to respect constraints\n", " proj_grad = project_gradient(eigenvalues, grad)\n", " \n", " # Update eigenvalues\n", " eigenvalues += learning_rate * proj_grad\n", " \n", " # Ensure eigenvalues remain positive\n", " eigenvalues = np.maximum(eigenvalues, 1e-10)\n", " \n", " # Reconstruct Lambda from updated eigenvalues\n", " Lambda = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T\n", " \n", " # Store history\n", " Lambda_history.append(Lambda.copy())\n", " 
entropy_history.append(compute_entropy(Lambda))\n", " \n", " return Lambda_history, entropy_history\n", "```\n", "\n", "The `gradient_ascent_entropy` function implements the core optimization\n", "procedure. It performs gradient ascent on the entropy while respecting\n", "the uncertainty constraints. The algorithm works in the eigenvalue space\n", "of the precision matrix, which makes it easier to enforce constraints\n", "and ensure the matrix remains positive definite.\n", "\n", "To analyze the results, we implement functions to track uncertainty\n", "metrics and detect interesting dynamics:\n", "\n", "``` python\n", "\n", "# Track uncertainty products and regime classification\n", "def track_uncertainty_metrics(Lambda_history):\n", " \"\"\"\n", " Track uncertainty products and classify regimes for each conjugate pair.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " \n", " Returns:\n", " --------\n", " metrics: dict\n", " Dictionary containing uncertainty metrics over time\n", " \"\"\"\n", " n_steps = len(Lambda_history)\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " # Initialize tracking arrays\n", " uncertainty_products = np.zeros((n_steps, n_pairs))\n", " regimes = np.zeros((n_steps, n_pairs), dtype=object)\n", " \n", " for step, Lambda in enumerate(Lambda_history):\n", " # Get covariance matrix\n", " V = np.linalg.inv(Lambda)\n", " \n", " # Calculate Fisher information matrix\n", " G = Lambda / 2\n", " \n", " # For each conjugate pair\n", " for i in range(n_pairs):\n", " # Extract 2x2 submatrix for this pair\n", " idx1, idx2 = 2*i, 2*i+1\n", " V_sub = V[np.ix_([idx1, idx2], [idx1, idx2])]\n", " \n", " # Compute uncertainty product (determinant of submatrix)\n", " uncertainty_product = np.sqrt(np.linalg.det(V_sub))\n", " uncertainty_products[step, i] = uncertainty_product\n", " \n", " # Classify regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 
0.1*min_uncertainty_product:\n", " regimes[step, i] = \"Quantum-like\"\n", " else:\n", " regimes[step, i] = \"Classical-like\"\n", " \n", " return {\n", " 'uncertainty_products': uncertainty_products,\n", " 'regimes': regimes\n", " }\n", "```\n", "\n", "The `track_uncertainty_metrics` function analyzes the evolution of\n", "uncertainty products for each position-momentum pair and classifies them\n", "as either “quantum-like” (near minimum uncertainty) or “classical-like”\n", "(with excess uncertainty). This classification helps us understand how\n", "the system transitions between these regimes during entropy\n", "maximization.\n", "\n", "We also implement a function to detect saddle points in the gradient\n", "flow, which are critical for understanding the system’s dynamics:\n", "\n", "``` python\n", "\n", "# Detect saddle points in the gradient flow\n", "def detect_saddle_points(Lambda_history):\n", " \"\"\"\n", " Detect saddle-like behavior in the gradient flow.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " \n", " Returns:\n", " --------\n", " saddle_metrics: dict\n", " Metrics related to saddle point behavior\n", " \"\"\"\n", " n_steps = len(Lambda_history)\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " # Track eigenvalues and their gradients\n", " eigenvalues_history = np.zeros((n_steps, 2*n_pairs))\n", " gradient_ratios = np.zeros((n_steps, n_pairs))\n", " \n", " for step, Lambda in enumerate(Lambda_history):\n", " # Get eigenvalues\n", " eigenvalues, _ = eigh(Lambda)\n", " eigenvalues_history[step] = eigenvalues\n", " \n", " # For each pair, compute ratio of gradients\n", " if step > 0:\n", " for i in range(n_pairs):\n", " idx1, idx2 = 2*i, 2*i+1\n", " \n", " # Change in eigenvalues\n", " delta1 = abs(eigenvalues_history[step, idx1] - eigenvalues_history[step-1, idx1])\n", " delta2 = abs(eigenvalues_history[step, idx2] - eigenvalues_history[step-1, idx2])\n", " \n", " # Ratio of 
max to min (high ratio indicates saddle-like behavior)\n", " max_delta = max(delta1, delta2)\n", " min_delta = max(1e-10, min(delta1, delta2)) # Avoid division by zero\n", " gradient_ratios[step, i] = max_delta / min_delta\n", " \n", " # Identify candidate saddle points (where some gradients are much larger than others)\n", " saddle_candidates = []\n", " for step in range(1, n_steps):\n", " if np.any(gradient_ratios[step] > 10): # Threshold for saddle-like behavior\n", " saddle_candidates.append(step)\n", " \n", " return {\n", " 'eigenvalues_history': eigenvalues_history,\n", " 'gradient_ratios': gradient_ratios,\n", " 'saddle_candidates': saddle_candidates\n", " }\n", "```\n", "\n", "The `detect_saddle_points` function identifies points in the gradient\n", "flow where some eigenvalues change much faster than others, indicating\n", "saddle-like behavior. These saddle points are important because they\n", "represent critical transitions in the system’s evolution.\n", "\n", "Finally, we implement visualization functions to help us understand the\n", "system’s behavior:\n", "\n", "``` python\n", "\n", "# Visualize uncertainty ellipses for multiple pairs\n", "def plot_multidimensional_uncertainty(Lambda_history, step_indices, pairs_to_plot=None):\n", " \"\"\"\n", " Plot the evolution of uncertainty ellipses for multiple position-momentum pairs.\n", " \n", " Parameters:\n", " -----------\n", " Lambda_history: list\n", " History of precision matrices\n", " step_indices: list\n", " Indices of steps to visualize\n", " pairs_to_plot: list, optional\n", " Indices of position-momentum pairs to plot\n", " \"\"\"\n", " n_pairs = Lambda_history[0].shape[0] // 2\n", " \n", " if pairs_to_plot is None:\n", " pairs_to_plot = range(min(3, n_pairs)) # Plot up to 3 pairs by default\n", " \n", " fig, axes = plt.subplots(len(pairs_to_plot), len(step_indices), \n", " figsize=(4*len(step_indices), 3*len(pairs_to_plot)))\n", " \n", " # Handle case of single pair or single step\n", " if 
len(pairs_to_plot) == 1:\n", " axes = axes.reshape(1, -1)\n", " if len(step_indices) == 1:\n", " axes = axes.reshape(-1, 1)\n", " \n", " for row, pair_idx in enumerate(pairs_to_plot):\n", " for col, step in enumerate(step_indices):\n", " ax = axes[row, col]\n", " Lambda = Lambda_history[step]\n", " covariance = np.linalg.inv(Lambda)\n", " \n", " # Extract 2x2 submatrix for this pair\n", " idx1, idx2 = 2*pair_idx, 2*pair_idx+1\n", " cov_sub = covariance[np.ix_([idx1, idx2], [idx1, idx2])]\n", " \n", " # Get eigenvalues and eigenvectors of submatrix\n", " values, vectors = eigh(cov_sub)\n", " \n", " # Calculate ellipse parameters\n", " angle = np.degrees(np.arctan2(vectors[1, 0], vectors[0, 0]))\n", " width, height = 2 * np.sqrt(values)\n", " \n", " # Create ellipse\n", " ellipse = Ellipse((0, 0), width=width, height=height, angle=angle,\n", " edgecolor='blue', facecolor='lightblue', alpha=0.5)\n", " \n", " # Add to plot\n", " ax.add_patch(ellipse)\n", " ax.set_xlim(-3, 3)\n", " ax.set_ylim(-3, 3)\n", " ax.set_aspect('equal')\n", " ax.grid(True)\n", " \n", " # Add minimum uncertainty circle\n", " min_circle = plt.Circle((0, 0), min_uncertainty_product, \n", " fill=False, color='red', linestyle='--')\n", " ax.add_patch(min_circle)\n", " \n", " # Compute uncertainty product\n", " uncertainty_product = np.sqrt(np.linalg.det(cov_sub))\n", " \n", " # Determine regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", " regime = \"Quantum-like\"\n", " color = 'red'\n", " else:\n", " regime = \"Classical-like\"\n", " color = 'blue'\n", " \n", " # Add labels\n", " if row == 0:\n", " ax.set_title(f\"Step {step}\")\n", " if col == 0:\n", " ax.set_ylabel(f\"Pair {pair_idx+1}\")\n", " \n", " # Add uncertainty product text\n", " ax.text(0.05, 0.95, f\"ΔxΔp = {uncertainty_product:.2f}\",\n", " transform=ax.transAxes, fontsize=10, verticalalignment='top')\n", " \n", " # Add regime text\n", " ax.text(0.05, 0.85, regime, 
transform=ax.transAxes, \n", " fontsize=10, verticalalignment='top', color=color)\n", " \n", " ax.set_xlabel(\"Position\")\n", " ax.set_ylabel(\"Momentum\")\n", " \n", " plt.tight_layout()\n", " return fig\n", "```\n", "\n", "The `plot_multidimensional_uncertainty` function visualizes the\n", "uncertainty ellipses for multiple position-momentum pairs at different\n", "steps of the gradient ascent process. These visualizations help us\n", "understand how the system transitions from quantum-like to\n", "classical-like regimes.\n", "\n", "This implementation builds on the `InformationReservoir` class we saw\n", "earlier, but generalizes to multiple position-momentum pairs and focuses\n", "specifically on the uncertainty relationships. The key connection is\n", "that both implementations track how systems naturally evolve from\n", "minimal entropy states (with quantum-like uncertainty relations) toward\n", "maximum entropy states (with classical-like uncertainty relations).\n", "\n", "As the system evolves through gradient ascent, we observe transitions.\n", "\n", "1. *Uncertainty desaturation*: The system begins with a minimal entropy\n", " state that exactly saturates the uncertainty bound\n", " ($\\Delta x \\cdot \\Delta p = \\hbar/2$). As entropy increases, this\n", " bound becomes less tightly saturated.\n", "\n", "2. *Shape transformation*: The initial highly squeezed uncertainty\n", " ellipse (with small position uncertainty and large momentum\n", " uncertainty) gradually becomes more circular, representing a more\n", " balanced distribution of uncertainty.\n", "\n", "3. 
*Quantum-to-classical transition*: The system transitions from a\n", "    quantum-like regime (where uncertainty is at the minimum allowed by\n", "    quantum mechanics) to a more classical-like regime (where\n", "    statistical uncertainty dominates over quantum uncertainty).\n", "\n", "This evolution reveals how information naturally flows from highly\n", "ordered configurations toward maximum entropy states, while still\n", "respecting the fundamental constraints imposed by the uncertainty\n", "principle.\n", "\n", "In systems with multiple position-momentum pairs, the gradient ascent\n", "process encounters saddle points, near which it naturally slows down:\n", "some eigenvalue pairs evolve quickly while others hardly change. These\n", "saddle points represent partially equilibrated states where some\n", "degrees of freedom have reached maximum entropy while others remain\n", "ordered. At these critical points, some variables maintain quantum-like\n", "characteristics (uncertainty saturation) while others exhibit\n", "classical-like behavior (excess uncertainty).\n", "\n", "This natural separation creates a hybrid system where quantum-like\n", "memory interfaces with classical-like processing - emerging naturally\n", "from the geometry of the entropy landscape under uncertainty\n", "constraints.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "```\n", "\n", "``` python\n", "# Constants\n", "hbar = 1.0  # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "\n", "# Verify gradient calculation\n", "print(\"Testing gradient calculation:\")\n", "test_Lambda = np.array([[2.0, 0.5], [0.5, 1.0]])  # Example precision matrix\n", "analytical_grad, numerical_grad = check_entropy_gradient(test_Lambda)\n", "\n", "# Verify if we're ascending or descending\n", "entropy_before = compute_entropy(test_Lambda)\n", "eigenvalues, eigenvectors = eigh(test_Lambda)\n", 
"step_size = 0.01\n", "eigenvalues_after = eigenvalues + step_size * analytical_grad\n", "test_Lambda_after = eigenvectors @ np.diag(eigenvalues_after) @ eigenvectors.T\n", "entropy_after = compute_entropy(test_Lambda_after)\n", "\n", "print(f\"Entropy before step: {entropy_before}\")\n", "print(f\"Entropy after step: {entropy_after}\")\n", "print(f\"Change in entropy: {entropy_after - entropy_before}\")\n", "if entropy_after > entropy_before:\n", "    print(\"We are ascending the entropy gradient\")\n", "else:\n", "    print(\"We are descending the entropy gradient\")\n", "\n", "test_grad = compute_entropy_gradient(test_Lambda)\n", "print(f\"Precision matrix:\\n{test_Lambda}\")\n", "print(f\"Entropy gradient:\\n{test_grad}\")\n", "print(f\"Entropy: {compute_entropy(test_Lambda):.4f}\")\n", "# Initialize system with 2 position-momentum pairs\n", "n_pairs = 2\n", "Lambda_init = initialize_multidimensional_state(n_pairs, squeeze_factors=[0.1, 0.5])\n", "# Run gradient ascent\n", "n_steps = 100\n", "Lambda_history, entropy_history = gradient_ascent_entropy(Lambda_init, n_steps, learning_rate=0.01)\n", "\n", "# Track metrics\n", "uncertainty_metrics = track_uncertainty_metrics(Lambda_history)\n", "saddle_metrics = detect_saddle_points(Lambda_history)\n", "\n", "# Print results\n", "print(\"\\nFinal entropy:\", entropy_history[-1])\n", "print(\"Initial uncertainty products:\", uncertainty_metrics['uncertainty_products'][0])\n", "print(\"Final uncertainty products:\", uncertainty_metrics['uncertainty_products'][-1])\n", "print(\"Saddle point candidates at steps:\", saddle_metrics['saddle_candidates'])\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "from matplotlib.patches import Ellipse\n", "import mlai.plot as plot\n", "import mlai\n", "\n", "# Plot entropy evolution\n", "plt.figure(figsize=plot.big_wide_figsize)\n", "plt.plot(entropy_history)\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Entropy')\n", "plt.title('Entropy Evolution During Gradient Ascent')\n", "plt.grid(True)\n", "mlai.write_figure(filename='entropy-evolution-during-gradient-ascent.svg', \n", "                  
directory='./information-game')\n", "\n", "# Plot uncertainty products evolution\n", "plt.figure(figsize=plot.big_wide_figsize)\n", "for i in range(n_pairs):\n", "    plt.plot(uncertainty_metrics['uncertainty_products'][:, i], \n", "             label=f'Pair {i+1}')\n", "plt.axhline(y=min_uncertainty_product, color='k', linestyle='--', \n", "            label='Minimum uncertainty')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Uncertainty Product (ΔxΔp)')\n", "plt.title('Evolution of Uncertainty Products')\n", "plt.legend()\n", "plt.grid(True)\n", "\n", "mlai.write_figure(filename='uncertainty-products-evolution.svg', \n", "                  directory='./information-game')\n", "\n", "\n", "\n", "# Plot uncertainty ellipses at key steps\n", "step_indices = [0, 20, 50, 99]  # Initial, early, middle, final\n", "plot_multidimensional_uncertainty(Lambda_history, step_indices)\n", "\n", "# Plot eigenvalues evolution\n", "plt.subplots(figsize=plot.big_wide_figsize)\n", "for i in range(2*n_pairs):\n", "    plt.semilogy(saddle_metrics['eigenvalues_history'][:, i], \n", "                 label=f'$\\\\lambda_{i+1}$')\n", "plt.xlabel('Gradient Ascent Step')\n", "plt.ylabel('Eigenvalue (log scale)')\n", "plt.title('Evolution of Precision Matrix Eigenvalues')\n", "plt.legend()\n", "plt.grid(True)\n", "plt.tight_layout()\n", "mlai.write_figure(filename='eigenvalue-evolution.svg', \n", "                  directory='./information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Eigenvalue evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Uncertainty products evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Entropy evolution during gradient ascent.\n", "\n", "\n", "\n", "Figure: Uncertainty ellipses for each position-momentum pair at\n", "selected steps of the gradient ascent.\n", "\n", "## Visualising the Parameter-Capacity Uncertainty Principle\n", "\n", "\\[edit\\]\n", "\n", "The uncertainty principle between parameters $\\theta$ and capacity\n", "variables $c$ is a fundamental feature of information reservoirs. 
We can\n", "visualize this uncertainty relation using phase space plots.\n", "\n", "We can demonstrate how the uncertainty principle manifests in different\n", "regimes:\n", "\n", "1. **Quantum-like regime**: Near minimal entropy, the uncertainty\n", " product $\\Delta\\theta \\cdot \\Delta c$ approaches the lower bound\n", " $k$, creating wave-like interference patterns in probability space.\n", "\n", "2. **Transitional regime**: As entropy increases, uncertainty relations\n", " begin to decouple, with $\\Delta\\theta \\cdot \\Delta c > k$.\n", "\n", "3. **Classical regime**: At high entropy, parameter uncertainty\n", " dominates, creating diffusion-like dynamics with minimal influence\n", " from uncertainty relations.\n", "\n", "The visualization shows probability distributions for these three\n", "regimes in both parameter space and capacity space.\n", "\n", "``` python\n", "import numpy as np\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "from matplotlib.patches import Ellipse\n", "```\n", "\n", "``` python\n", "# Visualization of uncertainty ellipses\n", "fig, ax = plt.subplots(figsize=plot.big_figsize)\n", "\n", "# Parameters for uncertainty ellipses\n", "k = 1 # Uncertainty constant\n", "centers = [(0, 0), (2, 2), (4, 4)]\n", "widths = [0.25, 0.5, 2]\n", "heights = [4, 2.5, 2]\n", "#heights = [k/w for w in widths]\n", "colors = ['blue', 'green', 'red']\n", "labels = ['Quantum-like', 'Transitional', 'Classical']\n", "\n", "# Plot uncertainty ellipses\n", "for center, width, height, color, label in zip(centers, widths, heights, colors, labels):\n", " ellipse = Ellipse(center, width, height, \n", " edgecolor=color, facecolor='none', \n", " linewidth=2, label=label)\n", " ax.add_patch(ellipse)\n", " \n", " # Add text label\n", " ax.text(center[0], center[1] + height/2 + 0.2, \n", " label, ha='center', color=color)\n", " \n", " # Add area label (uncertainty product)\n", " area = width * 
height\n", "    ax.text(center[0], center[1] - height/2 - 0.3, \n", "            f'Area = {width:.2f} $\\\\times$ {height:.2f} $\\\\pi$', ha='center')\n", "\n", "# Set axis labels and limits\n", "ax.set_xlabel('Parameter $\\\\theta$')\n", "ax.set_ylabel('Capacity $C$')\n", "ax.set_xlim(-3, 7)\n", "ax.set_ylim(-3, 7)\n", "ax.set_aspect('equal')\n", "ax.grid(True, linestyle='--', alpha=0.7)\n", "ax.set_title('Parameter-Capacity Uncertainty Relation')\n", "\n", "# Add hyperbola representing constant uncertainty product\n", "x = np.linspace(0.25, 6, 100)\n", "y = k/x\n", "ax.plot(x, y, 'k--', alpha=0.5, label='Minimum uncertainty: $\\\\Delta \\\\theta \\\\Delta C = k$')\n", "\n", "ax.legend(loc='upper right')\n", "mlai.write_figure(filename='uncertainty-ellipses.svg', \n", "                  directory = './information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Visualisation of the uncertainty trade-off between parameter\n", "precision and capacity.\n", "\n", "This visualization helps explain why information reservoirs with\n", "quantum-like properties naturally emerge at minimal entropy. The\n", "uncertainty principle is not imposed but arises naturally from the\n", "constraints of Shannon information theory applied to physical systems\n", "operating at minimal entropy.\n", "\n", "## Scaling to Large Systems: Emergent Statistical Behavior\n", "\n", "\\[edit\\]\n", "\n", "We now extend our analysis to much larger systems with thousands of\n", "position-momentum pairs. This allows us to observe emergent statistical\n", "behaviors and phase transitions that aren’t apparent in smaller systems.\n", "\n", "Large-scale systems reveal how microscopic uncertainty constraints lead\n", "to macroscopic statistical patterns. 
By analyzing thousands of\n", "position-momentum pairs simultaneously, we can identify emergent\n", "behaviors and natural clustering of dynamical patterns.\n", "\n", "``` python\n", "# Optimized implementation for very large systems\n", "def large_scale_gradient_ascent(n_pairs, steps=100, learning_rate=1, sample_interval=5):\n", " \"\"\"\n", " Memory-efficient implementation of gradient ascent for very large systems.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps to take\n", " learning_rate: float\n", " Step size for gradient ascent\n", " sample_interval: int\n", " Store state every sample_interval steps to save memory\n", " \n", " Returns:\n", " --------\n", " sampled_states: list\n", " Sparse history of states at sampled intervals\n", " entropy_history: list\n", " Complete history of entropy values\n", " uncertainty_metrics: dict\n", " Metrics tracking uncertainty products over time\n", " \"\"\"\n", " # Initialize with diagonal precision matrix (no need to store full matrix)\n", " dim = 2 * n_pairs\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Initialize with minimal entropy state\n", " for i in range(n_pairs):\n", " squeeze = 0.1 * (1 + (i % 10)) # Cycle through 10 different squeeze factors\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Storage for results (sparse to save memory)\n", " sampled_states = []\n", " entropy_history = []\n", " uncertainty_products = np.zeros((steps+1, n_pairs))\n", " \n", " # Initial entropy and uncertainty\n", " entropy = 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(eigenvalues)))\n", " entropy_history.append(entropy)\n", " \n", " # Track initial uncertainty products\n", " for i in range(n_pairs):\n", " uncertainty_products[0, i] = 1.0 / np.sqrt(eigenvalues[2*i] * eigenvalues[2*i+1])\n", " \n", " # Store initial 
state\n", "    sampled_states.append(eigenvalues.copy())\n", "    \n", "    # Gradient ascent loop\n", "    for step in range(steps):\n", "        # Compute gradient with respect to eigenvalues (diagonal precision)\n", "        grad = -1.0 / (2.0 * eigenvalues)\n", "        \n", "        # Project gradient to respect constraints\n", "        for i in range(n_pairs):\n", "            idx1, idx2 = 2*i, 2*i+1\n", "            \n", "            # Current uncertainty product (in eigenvalue space, this is inverse)\n", "            current_product = eigenvalues[idx1] * eigenvalues[idx2]\n", "            \n", "            # If we're already at minimum uncertainty, project gradient\n", "            if abs(current_product - 1/min_uncertainty_product**2) < 1e-6:\n", "                # Feasible direction: shrink both eigenvalues, so the\n", "                # eigenvalue product decreases and the pair moves off the bound\n", "                tangent = np.array([-eigenvalues[idx2], -eigenvalues[idx1]])\n", "                tangent = tangent / np.linalg.norm(tangent)\n", "                \n", "                # Project the gradient onto this direction\n", "                pair_gradient = np.array([grad[idx1], grad[idx2]])\n", "                projection = np.dot(pair_gradient, tangent) * tangent\n", "                \n", "                grad[idx1] = projection[0]\n", "                grad[idx2] = projection[1]\n", "        \n", "        # Update eigenvalues\n", "        eigenvalues += learning_rate * grad\n", "        \n", "        # Ensure eigenvalues remain positive\n", "        eigenvalues = np.maximum(eigenvalues, 1e-10)\n", "        \n", "        # Compute entropy\n", "        entropy = 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(eigenvalues)))\n", "        entropy_history.append(entropy)\n", "        \n", "        # Track uncertainty products\n", "        for i in range(n_pairs):\n", "            uncertainty_products[step+1, i] = 1.0 / np.sqrt(eigenvalues[2*i] * eigenvalues[2*i+1])\n", "        \n", "        # Store state at sampled intervals\n", "        if step % sample_interval == 0 or step == steps-1:\n", "            sampled_states.append(eigenvalues.copy())\n", "    \n", "    # Compute regime classifications\n", "    regimes = np.zeros((steps+1, n_pairs), dtype=object)\n", "    for step in range(steps+1):\n", "        for i in range(n_pairs):\n", "            if abs(uncertainty_products[step, i] - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", "                regimes[step, i] = 
\"Quantum-like\"\n", " else:\n", " regimes[step, i] = \"Classical-like\"\n", " \n", " uncertainty_metrics = {\n", " 'uncertainty_products': uncertainty_products,\n", " 'regimes': regimes\n", " }\n", " \n", " return sampled_states, entropy_history, uncertainty_metrics\n", "\n", "# Add gradient check function for large systems\n", "def check_large_system_gradient(n_pairs=10, epsilon=1e-6):\n", " \"\"\"\n", " Check the analytical gradient against numerical gradient for a large system.\n", " \n", " Parameters:\n", " -----------\n", " n_pairs: int\n", " Number of position-momentum pairs to test\n", " epsilon: float\n", " Small perturbation for numerical gradient\n", " \n", " Returns:\n", " --------\n", " max_diff: float\n", " Maximum difference between analytical and numerical gradients\n", " \"\"\"\n", " # Initialize a small test system\n", " dim = 2 * n_pairs\n", " eigenvalues = np.zeros(dim)\n", " \n", " # Initialize with minimal entropy state\n", " for i in range(n_pairs):\n", " squeeze = 0.1 * (1 + (i % 10))\n", " eigenvalues[2*i] = 1.0 / (squeeze * min_uncertainty_product)\n", " eigenvalues[2*i+1] = 1.0 / (min_uncertainty_product / squeeze)\n", " \n", " # Compute analytical gradient\n", " analytical_grad = -1.0 / (2.0 * eigenvalues)\n", " \n", " # Compute numerical gradient\n", " numerical_grad = np.zeros_like(eigenvalues)\n", " \n", " # Function to compute entropy from eigenvalues\n", " def compute_entropy_from_eigenvalues(evals):\n", " return 0.5 * (dim * (1 + np.log(2*np.pi)) - np.sum(np.log(evals)))\n", " \n", " # Initial entropy\n", " base_entropy = compute_entropy_from_eigenvalues(eigenvalues)\n", " \n", " # Compute numerical gradient\n", " for i in range(dim):\n", " # Perturb eigenvalue up\n", " eigenvalues_plus = eigenvalues.copy()\n", " eigenvalues_plus[i] += epsilon\n", " entropy_plus = compute_entropy_from_eigenvalues(eigenvalues_plus)\n", " \n", " # Perturb eigenvalue down\n", " eigenvalues_minus = eigenvalues.copy()\n", " eigenvalues_minus[i] -= 
epsilon\n", " entropy_minus = compute_entropy_from_eigenvalues(eigenvalues_minus)\n", " \n", " # Compute numerical gradient\n", " numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)\n", " \n", " # Compare\n", " diff = np.abs(analytical_grad - numerical_grad)\n", " max_diff = np.max(diff)\n", " avg_diff = np.mean(diff)\n", " \n", " print(f\"Gradient check for {n_pairs} position-momentum pairs:\")\n", " print(f\"Maximum difference: {max_diff:.8f}\")\n", " print(f\"Average difference: {avg_diff:.8f}\")\n", " \n", " # Verify gradient ascent direction\n", " step_size = 0.01\n", " eigenvalues_after = eigenvalues + step_size * analytical_grad\n", " entropy_after = compute_entropy_from_eigenvalues(eigenvalues_after)\n", " \n", " print(f\"Entropy before step: {base_entropy:.6f}\")\n", " print(f\"Entropy after step: {entropy_after:.6f}\")\n", " print(f\"Change in entropy: {entropy_after - base_entropy:.6f}\")\n", " \n", " if entropy_after > base_entropy:\n", " print(\"✓ Gradient ascent confirmed: entropy increases\")\n", " else:\n", " print(\"✗ Error: entropy decreases with gradient step\")\n", " \n", " return max_diff\n", "\n", "# Analyze statistical properties of large-scale system\n", "def analyze_large_system(uncertainty_metrics, n_pairs, steps):\n", " \"\"\"\n", " Analyze statistical properties of a large-scale system.\n", " \n", " Parameters:\n", " -----------\n", " uncertainty_metrics: dict\n", " Metrics from large_scale_gradient_ascent\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps taken\n", " \n", " Returns:\n", " --------\n", " analysis: dict\n", " Statistical analysis results\n", " \"\"\"\n", " uncertainty_products = uncertainty_metrics['uncertainty_products']\n", " regimes = uncertainty_metrics['regimes']\n", " \n", " # Compute statistics over time\n", " mean_uncertainty = np.mean(uncertainty_products, axis=1)\n", " std_uncertainty = np.std(uncertainty_products, axis=1)\n", " 
min_uncertainty_over_time = np.min(uncertainty_products, axis=1)\n", "    max_uncertainty_over_time = np.max(uncertainty_products, axis=1)\n", "    \n", "    # Count regime transitions\n", "    quantum_count = np.zeros(steps+1)\n", "    for step in range(steps+1):\n", "        quantum_count[step] = np.sum(regimes[step] == \"Quantum-like\")\n", "    \n", "    # Identify clusters of similar behavior\n", "    from sklearn.cluster import KMeans\n", "    \n", "    # Reshape to have each pair as a sample with its uncertainty trajectory as features\n", "    pair_trajectories = uncertainty_products.T  # shape: (n_pairs, steps+1)\n", "    \n", "    # Use fewer clusters for very large systems (but always at least two)\n", "    n_clusters = max(2, min(10, n_pairs // 100))\n", "    kmeans = KMeans(n_clusters=n_clusters, random_state=42)\n", "    cluster_labels = kmeans.fit_predict(pair_trajectories)\n", "    \n", "    # Count pairs in each cluster\n", "    cluster_counts = np.bincount(cluster_labels, minlength=n_clusters)\n", "    \n", "    # Get representative pairs from each cluster (closest to centroid)\n", "    representative_pairs = []\n", "    for i in range(n_clusters):\n", "        cluster_members = np.where(cluster_labels == i)[0]\n", "        if len(cluster_members) > 0:\n", "            # Find pair closest to cluster centroid\n", "            centroid = kmeans.cluster_centers_[i]\n", "            distances = np.linalg.norm(pair_trajectories[cluster_members] - centroid, axis=1)\n", "            closest_idx = cluster_members[np.argmin(distances)]\n", "            representative_pairs.append(closest_idx)\n", "    \n", "    return {\n", "        'mean_uncertainty': mean_uncertainty,\n", "        'std_uncertainty': std_uncertainty,\n", "        'min_uncertainty': min_uncertainty_over_time,\n", "        'max_uncertainty': max_uncertainty_over_time,\n", "        'quantum_count': quantum_count,\n", "        'quantum_fraction': quantum_count / n_pairs,\n", "        'cluster_counts': cluster_counts,\n", "        'representative_pairs': representative_pairs,\n", "        'cluster_labels': cluster_labels\n", "    }\n", "\n", "# Visualize results for large-scale system\n", "def visualize_large_system(sampled_states, 
entropy_history, uncertainty_metrics, analysis, n_pairs, steps):\n", " \"\"\"\n", " Create visualizations for large-scale system results.\n", " \n", " Parameters:\n", " -----------\n", " sampled_states: list\n", " Sparse history of eigenvalues\n", " entropy_history: list\n", " History of entropy values\n", " uncertainty_metrics: dict\n", " Uncertainty metrics over time\n", " analysis: dict\n", " Statistical analysis results\n", " n_pairs: int\n", " Number of position-momentum pairs\n", " steps: int\n", " Number of gradient steps taken\n", " \"\"\"\n", " # Plot entropy evolution\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(entropy_history)\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Entropy')\n", " plt.title(f'Entropy Evolution for {n_pairs} Position-Momentum Pairs')\n", " plt.grid(True)\n", " \n", " # Plot uncertainty statistics\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(analysis['mean_uncertainty'], label='Mean uncertainty')\n", " plt.fill_between(range(steps+1), \n", " analysis['mean_uncertainty'] - analysis['std_uncertainty'],\n", " analysis['mean_uncertainty'] + analysis['std_uncertainty'],\n", " alpha=0.3, label='±1 std dev')\n", " plt.plot(analysis['min_uncertainty'], 'g--', label='Min uncertainty')\n", " plt.plot(analysis['max_uncertainty'], 'r--', label='Max uncertainty')\n", " plt.axhline(y=min_uncertainty_product, color='k', linestyle=':', label='Quantum limit')\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Uncertainty Product (ΔxΔp)')\n", " plt.title(f'Uncertainty Evolution Statistics for {n_pairs} Pairs')\n", " plt.legend()\n", " plt.grid(True)\n", " \n", " # Plot quantum-classical transition\n", " plt.figure(figsize=(10, 6))\n", " plt.plot(analysis['quantum_fraction'] * 100)\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Percentage of Pairs (%)')\n", " plt.title('Percentage of Pairs in Quantum-like Regime')\n", " plt.ylim(0, 100)\n", " plt.grid(True)\n", " \n", " # Plot representative pairs from each 
cluster\n", " plt.figure(figsize=(12, 8))\n", " for i, pair_idx in enumerate(analysis['representative_pairs']):\n", " cluster_idx = analysis['cluster_labels'][pair_idx]\n", " count = analysis['cluster_counts'][cluster_idx]\n", " plt.plot(uncertainty_metrics['uncertainty_products'][:, pair_idx], \n", " label=f'Cluster {i+1} ({count} pairs, {count/n_pairs*100:.1f}%)')\n", " \n", " plt.axhline(y=min_uncertainty_product, color='k', linestyle=':', label='Quantum limit')\n", " plt.xlabel('Gradient Ascent Step')\n", " plt.ylabel('Uncertainty Product ($\\Delta x \\Delta p$)')\n", " plt.title('Representative Uncertainty Trajectories from Each Cluster')\n", " plt.legend()\n", " plt.grid(True)\n", " \n", " # Visualize uncertainty ellipses for representative pairs\n", " if len(sampled_states) > 0:\n", " # Get indices of sampled steps\n", " sampled_steps = list(range(0, steps+1, (steps+1)//len(sampled_states)))\n", " if sampled_steps[-1] != steps:\n", " sampled_steps[-1] = steps\n", " \n", " # Only visualize a few representative pairs\n", " pairs_to_visualize = analysis['representative_pairs'][:min(4, len(analysis['representative_pairs']))]\n", " \n", " fig, axes = plt.subplots(len(pairs_to_visualize), len(sampled_states), \n", " figsize=(4*len(sampled_states), 3*len(pairs_to_visualize)))\n", " \n", " # Handle case of single pair or single step\n", " if len(pairs_to_visualize) == 1:\n", " axes = axes.reshape(1, -1)\n", " if len(sampled_states) == 1:\n", " axes = axes.reshape(-1, 1)\n", " \n", " for row, pair_idx in enumerate(pairs_to_visualize):\n", " for col, step_idx in enumerate(range(len(sampled_states))):\n", " ax = axes[row, col]\n", " eigenvalues = sampled_states[step_idx]\n", " \n", " # Extract eigenvalues for this pair\n", " idx1, idx2 = 2*pair_idx, 2*pair_idx+1\n", " pos_eigenvalue = eigenvalues[idx1]\n", " mom_eigenvalue = eigenvalues[idx2]\n", " \n", " # Convert precision eigenvalues to covariance eigenvalues\n", " cov_eigenvalues = np.array([1/pos_eigenvalue, 
1/mom_eigenvalue])\n", " \n", " # Calculate ellipse parameters (assuming principal axes aligned with coordinate axes)\n", " width, height = 2 * np.sqrt(cov_eigenvalues)\n", " \n", " # Create ellipse\n", " ellipse = Ellipse((0, 0), width=width, height=height, angle=0,\n", " edgecolor='blue', facecolor='lightblue', alpha=0.5)\n", " \n", " # Add to plot\n", " ax.add_patch(ellipse)\n", " ax.set_xlim(-3, 3)\n", " ax.set_ylim(-3, 3)\n", " ax.set_aspect('equal')\n", " ax.grid(True)\n", " \n", " # Add minimum uncertainty circle\n", " min_circle = plt.Circle((0, 0), min_uncertainty_product, \n", " fill=False, color='red', linestyle='--')\n", " ax.add_patch(min_circle)\n", " \n", " # Compute uncertainty product\n", " uncertainty_product = np.sqrt(1/(pos_eigenvalue * mom_eigenvalue))\n", " \n", " # Determine regime\n", " if abs(uncertainty_product - min_uncertainty_product) < 0.1*min_uncertainty_product:\n", " regime = \"Quantum-like\"\n", " color = 'red'\n", " else:\n", " regime = \"Classical-like\"\n", " color = 'blue'\n", " \n", " # Add labels\n", " if row == 0:\n", " step_num = sampled_steps[step_idx]\n", " ax.set_title(f\"Step {step_num}\")\n", " if col == 0:\n", " cluster_idx = analysis['cluster_labels'][pair_idx]\n", " count = analysis['cluster_counts'][cluster_idx]\n", " ax.set_ylabel(f\"Cluster {row+1}\\n({count} pairs)\")\n", " \n", " # Add uncertainty product text\n", " ax.text(0.05, 0.95, f\"ΔxΔp = {uncertainty_product:.2f}\",\n", " transform=ax.transAxes, fontsize=10, verticalalignment='top')\n", " \n", " # Add regime text\n", " ax.text(0.05, 0.85, regime, transform=ax.transAxes, \n", " fontsize=10, verticalalignment='top', color=color)\n", " \n", " ax.set_xlabel(\"Position\")\n", " ax.set_ylabel(\"Momentum\")\n", " \n", " plt.tight_layout()\n", "```\n", "\n", "In large-scale systems, we observe several emergent phenomena that\n", "aren’t apparent in smaller systems:\n", "\n", "1. 
*Statistical phase transitions*: As the system evolves, we observe a\n", " gradual transition from predominantly quantum-like behavior to\n", " predominantly classical-like behavior. This transition resembles a\n", " phase transition in statistical physics.\n", "\n", "2. *Natural clustering*: The thousands of position-momentum pairs\n", " naturally organize into clusters with similar dynamical behaviors.\n", " Some clusters maintain quantum-like characteristics for longer\n", " periods, while others quickly transition to classical-like behavior.\n", "\n", "3. *Scale-invariant patterns*: The statistical properties of the system\n", " show remarkable consistency across different scales, suggesting\n", " underlying universal principles in the entropy-uncertainty\n", " relationship.\n", "\n", "The quantum-classical boundary, which appears sharp in small systems,\n", "becomes a statistical property in large systems. At any given time, some\n", "fraction of the system exhibits quantum-like behavior while the\n", "remainder shows classical-like characteristics. This fraction evolves\n", "over time, creating a dynamic boundary between quantum and classical\n", "regimes.\n", "\n", "The clustering analysis reveals natural groupings of position-momentum\n", "pairs based on their dynamical trajectories. 
These clusters represent\n", "different “modes” of behavior within the large system, with some modes\n", "maintaining quantum coherence for longer periods while others quickly\n", "decohere into classical-like states.\n", "\n", "``` python\n", "import numpy as np\n", "from scipy.linalg import eigh\n", "from sklearn.cluster import KMeans\n", "```\n", "\n", "``` python\n", "# Constants\n", "hbar = 1.0 # Normalized Planck's constant\n", "min_uncertainty_product = hbar/2\n", "\n", "# Perform gradient check on a smaller test system\n", "print(\"Performing gradient check for large system implementation:\")\n", "gradient_error = check_large_system_gradient(n_pairs=10)\n", "print(f\"Gradient check completed with maximum error: {gradient_error:.8f}\")\n", "\n", "# Run large-scale simulation\n", "n_pairs = 5000 # 5000 position-momentum pairs (10,000×10,000 matrix)\n", "steps = 100 # Fewer steps for large system\n", "\n", "# Run the optimized implementation\n", "sampled_states, entropy_history, uncertainty_metrics = large_scale_gradient_ascent(\n", " n_pairs=n_pairs, steps=steps, learning_rate=0.01, sample_interval=5)\n", "\n", "# Analyze results\n", "analysis = analyze_large_system(uncertainty_metrics, n_pairs, steps)\n", "```\n", "\n", "``` python\n", "import matplotlib.pyplot as plt\n", "import mlai.plot as plot\n", "import mlai\n", "from matplotlib.patches import Ellipse, Circle\n", "```\n", "\n", "``` python\n", "# Visualize results\n", "visualize_large_system(sampled_states, entropy_history, uncertainty_metrics, \n", " analysis, n_pairs, steps)\n", "\n", "# Additional plot: Phase transition visualization\n", "plt.figure(figsize=(10, 6))\n", "quantum_fraction = analysis['quantum_fraction'] * 100\n", "classical_fraction = 100 - quantum_fraction\n", "\n", "plt.stackplot(range(steps+1), \n", " [quantum_fraction, classical_fraction],\n", " labels=['Quantum-like', 'Classical-like'],\n", " colors=['red', 'blue'], alpha=0.7)\n", "\n", "plt.xlabel('Gradient Ascent Step')\n", 
"plt.ylabel('Percentage of System (%)')\n", "plt.title('Quantum-Classical Phase Transition')\n", "plt.legend(loc='center right')\n", "plt.ylim(0, 100)\n", "plt.grid(True)\n", "\n", "mlai.write_figure(filename='large-scale-gradient-ascent-quantum-classical-phase-transition.svg', \n", " directory='./information-game')\n", "```\n", "\n", "\n", "\n", "Figure: Large-scale gradient ascent reveals a quantum-classical phase\n", "transition.\n", "\n", "The large-scale simulation reveals how microscopic uncertainty\n", "constraints lead to macroscopic statistical patterns. The system\n", "naturally organizes into regions of quantum-like and classical-like\n", "behavior, with a dynamic boundary that evolves over time.\n", "\n", "This perspective provides a new way to understand the quantum-classical\n", "transition not as a sharp boundary, but as a statistical property of\n", "large systems. The fraction of the system exhibiting quantum-like\n", "behavior gradually decreases as entropy increases, creating a smooth\n", "transition between quantum and classical regimes.\n", "\n", "This approach to large-scale quantum-classical systems provides a\n", "powerful framework for understanding how microscopic quantum constraints\n", "manifest in macroscopic statistical behaviors. 
It bridges quantum\n", "mechanics and statistical physics through the common language of\n", "information theory and entropy.\n", "\n", "## Saddle Points\n", "\n", "\\[edit\\]\n", "\n", "Saddle points represent critical transitions in the game’s evolution\n", "where the gradient $\nabla_{\boldsymbol{\theta}}S \approx 0$ but the\n", "game is not at a maximum or minimum. At these points:\n", "\n", "1. The Fisher information matrix $G(\boldsymbol{\theta})$ has\n", " eigenvalues with significantly different magnitudes\n", "2. Some eigenvalues approach zero, creating “critically slowed”\n", " directions in parameter space\n", "3. Other eigenvalues remain large, allowing rapid evolution in certain\n", " directions\n", "\n", "This creates a natural separation between “memory” variables (associated\n", "with near-zero eigenvalues) and “processing” variables (associated with\n", "large eigenvalues). The game’s behavior becomes highly non-isotropic in\n", "parameter space.\n", "\n", "At saddle points, direct gradient ascent stalls, and the game must\n", "leverage the Fourier duality between parameters and capacity variables\n", "to continue entropy production. The duality relationship $$\n", "c(M) = \mathcal{F}[\boldsymbol{\theta}(M)]\n", "$$ allows the game to progress by temporarily increasing uncertainty in\n", "capacity space, which creates gradients in previously flat directions of\n", "parameter space.\n", "\n", "These saddle points often coincide with phase transitions between\n", "parameter-dominated and capacity-dominated regimes, where the game’s\n", "fundamental character changes in terms of information processing\n", "capabilities.\n", "\n", "At saddle points, we see the first manifestation of the uncertainty\n", "principle that will be explored in more detail. The relationship between\n", "parameters and capacity variables becomes important as the game\n", "navigates these critical regions. 
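The memory/processing separation described above can be sketched numerically. This is a toy illustration with hypothetical numbers — the matrix `G` below is not derived from the game itself. Under steepest ascent the update resolved in each eigendirection of the Fisher information matrix scales with that direction’s eigenvalue, so near-zero eigenvalues give slow “memory” directions and large eigenvalues give fast “processing” directions.

``` python
import numpy as np

# Hypothetical Fisher information matrix with one near-zero eigenvalue
# (a "memory" direction) and one large eigenvalue (a "processing" direction).
G = np.array([[1.0, 0.005],
              [0.005, 1e-4]])
eigenvalues, V = np.linalg.eigh(G)  # eigenvalues returned in ascending order

theta = np.array([0.5, 0.5])
g = -G @ theta               # entropy gradient, using the identity g = -G(theta) theta
update_eigenbasis = V.T @ g  # steepest-ascent step resolved in the eigenbasis

for lam, du in zip(eigenvalues, update_eigenbasis):
    role = "memory (critically slowed)" if lam < 1e-2 else "processing"
    print(f"eigenvalue {lam:.1e}: update magnitude {abs(du):.1e} -> {role}")
```

The near-zero eigendirection receives an update several orders of magnitude smaller than the large one, which is the timescale separation the text describes.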
The Fourier duality relationship $$\n", "c(M) = \mathcal{F}[\boldsymbol{\theta}(M)]\n", "$$ is not just a mathematical convenience but represents a constraint on\n", "information processing that emerges from uncertainty\n", "principles. The duality is essential for understanding how the game\n", "maintains both precision in parameters and sufficient capacity for\n", "information storage.\n", "\n", "The emergence of critically slowed directions at saddle points directly\n", "leads to the formation of information reservoirs that we’ll explore in\n", "depth. These reservoirs form when certain parameter combinations become\n", "effectively “frozen” due to near-zero eigenvalues in the Fisher\n", "information matrix. This natural separation of timescales creates a\n", "hierarchical memory structure that resembles biological information\n", "processing systems, where different variables operate at different\n", "temporal scales. The game’s deliberate use of steepest ascent rather\n", "than natural gradient ensures these reservoirs form organically as the\n", "system evolves.\n", "\n", "## Saddle Point Seeking Behaviour\n", "\n", "In the game’s evolution, we follow steepest ascent in parameter space to\n", "maximize entropy. 
Let’s contrast with the *natural gradient* approach\n", "that is often used in information geometry.\n", "\n", "The steepest ascent direction in Euclidean space is given by, $$\n", "\\Delta \\boldsymbol{\\theta}_{\\text{steepest}} = \\eta \\nabla_{\\boldsymbol{\\theta}} S = \\eta \\mathbf{g}\n", "$$ where $\\eta$ is a learning rate and $\\mathbf{g}$ is the entropy\n", "gradient.\n", "\n", "In contrast, the natural gradient adjusts the update direction according\n", "to the Fisher information geometry, $$\n", "\\Delta \\boldsymbol{\\theta}_{\\text{natural}} = \\eta G(\\boldsymbol{\\theta})^{-1} \\nabla_{\\boldsymbol{\\theta}} S = \\eta G(\\boldsymbol{\\theta})^{-1} \\mathbf{g}\n", "$$ where $G(\\boldsymbol{\\theta})$ is the Fisher information matrix. This\n", "represents a Newton step in the natural parameter space. Often the\n", "Newton step is difficult to compute, but for exponential families and\n", "their entropies the Fisher information has a form closely related to the\n", "gradients and would be easy to leverage. The game *explicitly* uses\n", "steepest ascent and this leads to very different behaviour, in\n", "particular near saddle points. In this regime\n", "\n", "1. *Steepest ascent* slows dramatically in directions where the\n", " gradient is small, leading to extremely slow progress along the\n", " critically slowed modes. This actually helps the game by preserving\n", " information in these modes while allowing continued evolution in\n", " other directions.\n", "\n", "2. *Natural gradient* would normalize the updates by the Fisher\n", " information, potentially accelerating progress in critically slowed\n", " directions. This would destroy the natural emergence of information\n", " reservoirs that we desire.\n", "\n", "The use of steepest ascent rather than natural gradient is deliberate in\n", "our game. 
It allows the Fisher information matrix’s eigenvalue structure\n", "to directly influence the temporal dynamics, creating a natural\n", "separation of timescales that preserves information in critically slowed\n", "modes while allowing rapid evolution in others.\n", "\n", "As the game approaches a saddle point\n", "\n", "1. The gradient $\\nabla_{\\boldsymbol{\\theta}} S$ approaches zero in\n", " some directions but remains non-zero in others\n", "\n", "2. The eigendecomposition of the Fisher information matrix\n", " $G(\\boldsymbol{\\theta}) = V \\Lambda V^T$ reveals which directions\n", " are critically slowed\n", "\n", "3. Update magnitudes in different directions become proportional to\n", " their corresponding eigenvalues\n", "\n", "4. This creates the hierarchical timescale separation that forms the\n", " basis of our memory structure\n", "\n", "This behavior creates a computational architecture where different\n", "variables naturally assume different functional roles based on their\n", "update dynamics, without requiring explicit design. The information\n", "geometry of the parameter space, combined with steepest ascent dynamics,\n", "self-organizes the game into memory and processing components.\n", "\n", "The saddle point dynamics in Jaynes’ World provide a mathematical\n", "framework for understanding how the game navigates the information\n", "landscapes. 
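The contrast between the two update rules can be sketched numerically (illustrative numbers only, not derived from the game):

``` python
import numpy as np

# A diagonal Fisher matrix with one critically slowed direction
# (eigenvalue 1e-3) and one fast direction (eigenvalue 1.0).
G = np.diag([1e-3, 1.0])
theta = np.array([1.0, 1.0])
g = -G @ theta  # entropy gradient, using the identity g = -G(theta) theta
eta = 0.1

step_steepest = eta * g                     # slow mode barely moves: information preserved
step_natural = eta * np.linalg.solve(G, g)  # equals -eta * theta: both modes move equally

print("steepest ascent step:", step_steepest)   # slow-mode component ~1e-4, fast ~1e-1
print("natural gradient step:", step_natural)   # uniform components, no timescale separation
```

The steepest-ascent step inherits the Fisher eigenvalue spread, while the natural-gradient step washes it out — exactly the distinction that lets information reservoirs form under steepest ascent.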
The balance between fast-evolving “processing” variables and\n", "slow-evolving “memory” variables offers insights into how complexity\n", "might emerge in environments that instantaneously maximise entropy.\n", "\n", "## Dynamical System\n", "\n", "\\[edit\\]\n", "\n", "Consider a dynamical system governed by the equation, $$\n", "\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta},\n", "$$ where $\\boldsymbol{\\theta} \\in \\mathbb{R}^n$ represents the natural\n", "parameters of an exponential family distribution $\\rho_\\theta$, and\n", "$G(\\boldsymbol{\\theta})$ is the Fisher information matrix with elements:\n", "$$\n", "G_{ij}(\\boldsymbol{\\theta}) = \\mathbb{E}_{\\rho_\\theta}\\left[\\frac{\\partial \\log \\rho_\\theta}{\\partial \\theta_i}\\frac{\\partial \\log \\rho_\\theta}{\\partial \\theta_j}\\right]\n", "$$ This system describes the steepest ascent in the entropy of the\n", "distribution $\\rho_\\theta$, constrained to the manifold of exponential\n", "family distributions. Unlike natural gradient descent, which optimizes a\n", "cost function, this system maximizes the entropy of the underlying\n", "configuration governed by the exponential family distribution or density\n", "matrix.\n", "\n", "## Entropy Bounds and Compactness\n", "\n", "Recall that the entropy of the system is bounded such that, $$\n", "0 \\leq S[\\rho_\\theta] \\leq N\n", "$$ where $S[\\rho_\\theta] = -\\mathbb{E}_{\\rho_\\theta}[\\log \\rho_\\theta]$\n", "is the entropy functional, and $N$ represents the maximum possible\n", "entropy value for the system. 
These bounds create a compact manifold in\n", "the space of distributions, which constrains the parameter evolution.\n", "\n", "## Resolution Constraints\n", "\n", "The system exhibits a minimum resolution constraint, formulated as an\n", "uncertainty relation between parameters $\\boldsymbol{\\theta}$ and their\n", "conjugate variables (the gradients) $$\n", "\\Delta\\theta_i \\cdot \\Delta(\\nabla_{\\theta_i}) \\geq \\frac{c}{2},\n", "$$ where $c$ is a constant representing the minimum resolution of the\n", "system. This constraint imposes limits on the precision with which\n", "parameters can be simultaneously specified with their conjugate\n", "variables.\n", "\n", "# Multi-Scale Dynamics and Parameter Separation\n", "\n", "## Parameter Partitioning\n", "\n", "The parameter vector $\\boldsymbol{\\theta}$ can be partitioned into two\n", "subsets\n", "\n", "- $\\boldsymbol{\\theta}_M$: Parameters with gradients below the\n", " resolution threshold (slow-moving)\n", "- $\\boldsymbol{\\theta}_X$: Parameters with resolvable gradients\n", " (fast-moving)\n", "\n", "The Fisher information matrix can also be partitioned $$\n", "G(\\boldsymbol{\\theta}) = \\begin{bmatrix} G_{XX} & G_{XM} \\\\ G_{MX} & G_{MM} \\end{bmatrix}\n", "$$\n", "\n", "## Schur Complement Analysis\n", "\n", "The Schur complement of $G_{MM}$ in $G(\\boldsymbol{\\theta})$ is defined\n", "as $$\n", "G^\\prime_X = G_{XX} - G_{XM}G_{MM}^{-1}G_{MX}\n", "$$ This matrix $G^\\prime_X$ represents the effective information\n", "geometry for the fast parameters after accounting for their coupling to\n", "the slow parameters. 
It yields a dynamical equation for the fast\n", "parameters,\n", "$$\dot{\boldsymbol{\theta}}_X = -G^\prime_X\boldsymbol{\theta}_X + \text{correction terms}\n", "$$ The Schur complement provides a framework for analyzing how\n", "resolution constraints create a natural separation of time scales in the\n", "system’s evolution.\n", "\n", "## Sparsification Through Entropy Maximization\n", "\n", "*speculative*\n", "\n", "As the system evolves to maximize entropy, it should move toward states\n", "where parameters become more statistically independent, since mutual\n", "information between variables reduces the joint entropy below the sum of\n", "the marginal entropies. Any\n", "tendency toward independence during entropy maximization would cause the\n", "Fisher information matrix $G(\boldsymbol{\theta})$ to trend toward a\n", "more diagonal structure over time, as off-diagonal elements represent\n", "statistical correlations between parameters.\n", "\n", "# Action Functional Representation\n", "\n", "## Action Definition\n", "\n", "The dynamics of the system can be derived from an action functional $$\n", "A[\gamma] = \int_0^1 \dot{\gamma}(t)^\top G(\gamma(t)) \dot{\gamma}(t) \, \text{d}t,\n", "$$ where $\gamma(t)$ represents a path through parameter space.\n", "\n", "## Variational Analysis\n", "\n", "For the path that minimizes this action, the first variation must\n", "vanish, $$\n", "\left. 
\\frac{\\text{d}}{\\text{d}\\epsilon} A[\\gamma + \\epsilon \\eta] \\right|_{\\epsilon=0} = 0,\n", "$$ where $\\eta(t)$ is an arbitrary function with\n", "$\\eta(0) = \\eta(1) = 0$.\n", "\n", "Through variational calculus we recover the Euler-Lagrange equation, $$\n", "\\frac{\\text{d}}{\\text{d}t}(G(\\gamma)\\dot{\\gamma}) = \\frac{1}{2} \\dot{\\gamma}^T \\frac{\\partial G}{\\partial \\gamma} \\dot{\\gamma}\n", "$$\n", "\n", "## Time Parameterization\n", "\n", "To recover the original dynamical equation, we introduce the time\n", "parameterization, $$\n", "\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}}\n", "$$\n", "\n", "Under this parameterization, the Euler-Lagrange equation simplifies to\n", "our original dynamics. To prove this, we start with the parameterized\n", "path $\\gamma(t) = \\boldsymbol{\\theta}(\\tau(t))$, which gives $$\n", "\\dot{\\gamma} = \\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau} \\frac{\\text{d}\\tau}{\\text{d}t}.\n", "$$ Substituting this into the Euler-Lagrange equation and applying our\n", "specific parameterization, $$\n", "\\frac{\\text{d}}{\\text{d}t}(G(\\gamma)\\dot{\\gamma}) = \\frac{\\text{d}}{\\text{d}t}\\left(G(\\boldsymbol{\\theta})\\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau}\\frac{\\text{d}\\tau}{\\text{d}t}\\right) = \\frac{1}{2} \\dot{\\gamma}^\\top \\frac{\\partial G}{\\partial \\gamma} \\dot{\\gamma}\n", "$$\n", "\n", "With our choice of\n", "$\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}}$\n", "and after algebraic manipulation, this reduces to $$\n", "\\frac{\\text{d}\\boldsymbol{\\theta}}{\\text{d}\\tau} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}\n", "$$ and so our original dynamical equation when expressed in terms of the\n", "*system time* $\\tau$ confirming that our action functional correctly\n", "generates the original dynamics.\n", "\n", "## 
Information-Geometric Interpretation of Time Parameterization\n", "\n", "The time parameterization can be rewritten by recognizing that\n", "$G(\\boldsymbol{\\theta})\\boldsymbol{\\theta} = -\\nabla_\\boldsymbol{\\theta} S[\\rho_\\theta]$,\n", "the negative gradient of entropy with respect to the parameters $$\n", "\\frac{\\text{d}\\tau}{\\text{d}t} = \\frac{1}{\\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}} = \\frac{1}{-\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]}.\n", "$$ The inverse relation is $$\n", "\\frac{\\text{d}t}{\\text{d}\\tau} = \\boldsymbol{\\theta}^\\top G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta} = -\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]\n", "$$ which expresses the rate at which parameterized time flows relative\n", "to system time as the directional derivative of entropy along the\n", "parameter vector. It measures the entropy production rate of the system\n", "in the direction of the current parameter vector.\n", "\n", "# Information-Theoretic Interpretation\n", "\n", "## Entropy Maximization\n", "\n", "The dynamical system describes the steepest ascent path in entropy\n", "space, constrained by the structure of the density matrix\n", "representation. 
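As a concrete check, consider a minimal one-parameter example of our own (not from the exposition above): a zero-mean Gaussian with natural parameter $\theta = -1/(2\sigma^2)$, for which the Fisher information is $G(\theta) = 1/(2\theta^2)$ and $G(\theta)\theta = -\text{d}S/\text{d}\theta$. Euler-integrating $\dot{\theta} = -G(\theta)\theta$ should then increase the entropy monotonically.

``` python
import numpy as np

def entropy(theta):
    # Differential entropy of a zero-mean Gaussian with sigma^2 = -1/(2 theta)
    sigma2 = -1.0 / (2.0 * theta)  # theta < 0 for a normalizable density
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma2)

theta = -2.0  # sharp, low-entropy starting state
dt = 0.01
entropies = [entropy(theta)]
for _ in range(300):
    theta += dt * (-1.0 / (2.0 * theta))  # theta_dot = -G(theta) theta = -1/(2 theta)
    entropies.append(entropy(theta))

# Entropy increases monotonically along the flow
print(all(b > a for a, b in zip(entropies, entropies[1:])))  # True
```

Here $\theta$ climbs toward zero from below, the variance grows, and the entropy rises at every step, consistent with the steepest-ascent reading of the flow.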
As parameters evolve according to\n", "$\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$,\n", "we expect the system to move toward states of increasing statistical\n", "independence, which generally correspond to higher entropy\n", "configurations.\n", "\n", "## Information Flow and Topography\n", "\n", "The equation\n", "$\\dot{\\boldsymbol{\\theta}} = -G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$\n", "can be interpreted as an information flow equation, where the product\n", "$G(\\boldsymbol{\\theta})\\boldsymbol{\\theta}$ represents an information\n", "current that indicates how information propagates through the parameter\n", "space as the system evolves. Under this interpretation the Fisher\n", "information matrix represents the *information topography*.\n", "\n", "## Resolution and Uncertainty\n", "\n", "The resolution constraints introduce uncertainty relations into the\n", "classical statistical framework. These constraints alter the convergence\n", "properties of the entropy maximization process, creating bounds on\n", "information extraction and parameter precision.\n", "\n", "## Temporal Information Dynamics\n", "\n", "The time parameterization reveals that the flow of time in the system is\n", "connected to information processing efficiency\n", "\n", "1. In regions where parameters are strongly aligned with entropy change\n", " (high\n", " $\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]$),\n", " parameterized time flows rapidly relative to system time.\n", "\n", "2. In regions where parameters are weakly coupled to entropy change\n", " (low\n", " $\\boldsymbol{\\theta}^\\top \\nabla_\\boldsymbol{\\theta} S[\\rho_\\boldsymbol{\\theta}]$),\n", " parameterized time flows slowly.\n", "\n", "3. 
At critical points where parameters become orthogonal to the entropy\n", " gradient\n", " ($\boldsymbol{\theta}^\top \nabla_\boldsymbol{\theta} S[\rho_\boldsymbol{\theta}] \approx 0$),\n", " the time parameterization approaches a singularity, indicating phase\n", " transitions in the system’s information structure.\n", "\n", "# Connections to Physical Theories\n", "\n", "## Frieden’s Extreme Physical Information\n", "\n", "Our framework connects to Frieden’s Extreme Physical Information (EPI)\n", "principle, which posits that physical systems evolve to extremize the\n", "physical information $I = K - J$, where $K$ represents the observed\n", "Fisher information and $J$ represents the intrinsic or bound\n", "information.\n", "\n", "Frieden (1998) demonstrated that fundamental laws of physics, including\n", "relativistic ones, can emerge from the EPI principle. This suggests our\n", "information-geometric framework is capable of describing a rich set of\n", "underlying “physics”.\n", "\n", "## Conclusion\n", "\n", "Viewing the dynamical system as the gradient flow\n", "$\dot{\boldsymbol{\theta}} = -G(\boldsymbol{\theta})\boldsymbol{\theta}$\n", "provides a framework for understanding parameter evolution. By\n", "reformulating this system in terms of an action functional and analysing\n", "its behaviour through the Schur complement, we gain insights into the\n", "multi-scale nature of information flow in complex statistical systems.\n", "\n", "The time parameterisation that connects the action to the original\n", "dynamics reveals how the system’s evolution adjusts to information\n", "content, moving slowly through information-rich regions while rapidly\n", "traversing information-sparse areas. This establishes a connection\n", "between information flow and temporal dynamics.\n", "\n", "