{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Information Topography\n",
    "\n",
    "### Neil D. Lawrence\n",
    "\n",
    "### 2025-04-14"
   ],
   "id": "5deae5a9-1013-42ef-96ce-dda44265d2e3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Abstract**: Physical landscapes are shaped by elevation, valleys, and\n",
    "peaks. We might expect information landscapes are molded by entropy,\n",
    "precision, and capacity constraints. To explore how these ideas might\n",
    "manifest we introduce Jaynes’ world, an entropy game that maximises\n",
    "instantaneous entropy production.\n",
    "\n",
    "In this talk we’ll argue that this landscape has a precision/capacity\n",
    "trade-off that suggests the underlying configuration requires a density\n",
    "matrix representation."
   ],
   "id": "db481d97-1c30-497b-a35d-f46be0dd24cb"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "$$"
   ],
   "id": "1268bc91-740e-4499-ac66-4c461724408d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "::: {.cell .markdown}\n",
    "\n",
    "<!-- Do not edit this file locally. -->\n",
    "<!-- Do not edit this file locally. -->\n",
    "<!---->\n",
    "<!-- Do not edit this file locally. -->\n",
    "<!-- Do not edit this file locally. -->\n",
    "<!-- The last names to be defined. Should be defined entirely in terms of macros from above-->\n",
    "<!--\n",
    "\n",
    "-->"
   ],
   "id": "548e5087-7668-4030-af28-a934f201537b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Jaynes’ World\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-overview-and-definitions.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-overview-and-definitions.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "This game explores how structure, time, causality, and locality might\n",
    "emerge within a system governed solely by internal information-theoretic\n",
    "constraints. The hope is that it can serve as\n",
    "\n",
    "-   A *research framework* for observer-free dynamics and entropy-based\n",
    "    emergence,\n",
    "-   A *conceptual tool* for exploring the notion of an information\n",
    "    topography: A landscape in which information flows under\n",
    "    constraints."
   ],
   "id": "92d175fe-8d89-4c00-84a9-3a7dc7f99a2d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Definitions and Global Constraints"
   ],
   "id": "c8f1ec3b-c2a5-48c0-b869-dcd1cdbe1f5b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### System Structure\n",
    "\n",
    "Let $Z = \\{Z_1, Z_2, \\dots, Z_n\\}$ be the full set of system variables.\n",
    "At game turn $t$, define a partition where $X(t) \\subseteq Z$: are\n",
    "active variables (currently contributing to entropy) and\n",
    "$M(t) = Z \\setminus X(t)$: latent or frozen variables that are stored in\n",
    "the form of an *information reservoir* (Barato and Seifert\n",
    "(2014),Parrondo et al. (2015))."
   ],
   "id": "0fb4b523-84b7-4a8b-9660-8ae6eb0a0827"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Representation via Density Matrix\n",
    "\n",
    "We’ll argue that the configuration space must be represented by a\n",
    "density matrix, $$\n",
    "\\rho(\\boldsymbol{\\theta}) = \\frac{1}{Z(\\boldsymbol{\\theta})} \\exp\\left( \\sum_i \\theta_i H_i \\right),\n",
    "$$ where $\\boldsymbol{\\theta} \\in \\mathbb{R}^d$ are the natural\n",
    "parameters, each $H_i$ is a Hermitian operator associated with the\n",
    "observables and the partition function is given by\n",
    "$Z(\\boldsymbol{\\theta}) = \\mathrm{Tr}[\\exp(\\sum_i \\theta_i H_i)]$.\n",
    "\n",
    "From this we can see that the *log-partition function*, which has an\n",
    "interpretation as the cummulant generating function, is $$\n",
    "A(\\boldsymbol{\\theta}) = \\log Z(\\boldsymbol{\\theta})\n",
    "$$ and the von Neumann *entropy* is $$\n",
    "S(\\boldsymbol{\\theta}) = A(\\boldsymbol{\\theta}) - \\boldsymbol{\\theta}^\\top \\nabla A(\\boldsymbol{\\theta}).\n",
    "$$ We can show that the *Fisher Information Matrix* is $$\n",
    "G_{ij}(\\boldsymbol{\\theta}) = \\frac{\\partial^2 A}{\\partial \\theta_i \\partial \\theta_j}.\n",
    "$$"
   ],
   "id": "9a47612c-a044-47a2-89e5-8730fc5f631b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Entropy Capacity and Resolution\n",
    "\n",
    "We define our system to have a *maximum entropy* of $N$ bits. If the\n",
    "dimension $d$ of the parameter space is fixed, this implies a *minimum\n",
    "detectable resolution* in natural parameter space, $$\n",
    "\\varepsilon \\sim \\frac{1}{2^N},\n",
    "$$ where changes in natural parameters smaller than $\\varepsilon$ are\n",
    "treated as *invisible* by the system. As a result, system dynamics\n",
    "exhibit *discrete, detectable transitions* between distinguishable\n",
    "states.\n",
    "\n",
    "Note if the dimension $d$ scales with $N$ (e.g., $d = \\alpha N$ for some\n",
    "constant $\\alpha$), then the resolution constraint becomes more complex.\n",
    "In this case, the volume of distinguishable states $(\\varepsilon)^d$\n",
    "must equal $2^N$, which leads to $\\varepsilon = 2^{1/\\alpha}$, a\n",
    "constant independent of $N$. This suggests that as the system’s entropy\n",
    "capacity grows, it maintains a constant resolution while exponentially\n",
    "increasing the number of distinguishable states."
   ],
   "id": "ec71481a-5ea9-412d-8604-c78e5b71cb98"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dual Role of Parameters and Variables\n",
    "\n",
    "Each variable $Z_i$ is associated with a generator $H_i$, and a natural\n",
    "parameter $\\theta_i$. When we say a parameter $\\theta_i \\in X(t)$, we\n",
    "mean that the component of the system associated with $H_i$ is active at\n",
    "time $t$ and its parameter is evolving with\n",
    "$|\\dot{\\theta}_i| \\geq \\varepsilon$. This comes from the duality\n",
    "*variables*, *observables*, and *natural parameters* that we find in\n",
    "exponential family representations and we also see in a density matrix\n",
    "representation."
   ],
   "id": "338f58e6-68dc-4b5b-849f-8e692fa8f4bb"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Core Axiom: Entropic Dynamics\n",
    "\n",
    "Our core axiom is that the system evolves by steepest ascent in entropy.\n",
    "The gradient of the density matrix with respect to the natural\n",
    "parameters is given by $$\n",
    "\\nabla S[\\rho] = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}\n",
    "$$ and so we set $$\n",
    "\\frac{d\\boldsymbol{\\theta}}{dt} = -G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}\n",
    "$$"
   ],
   "id": "cf09fd17-0b5e-471c-b1f0-cf57acea4f33"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histogram Game\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-histogram.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynes-world-histogram.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "To illustrate the concept of the Jaynes’ world entropy game we’ll run a\n",
    "simple example using a four bin histogram. The entropy of a four bin\n",
    "histogram can be computed as, $$\n",
    "S(p) = - \\sum_{i=1}^4 p_i \\log_2 p_i.\n",
    "$$"
   ],
   "id": "b8a180d9-345b-4f49-9dee-30417522eb93"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "f77eb203-bdf4-44c6-b2f3-cfcf1eb298e3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we write some helper code to plot the histogram and compute its\n",
    "entropy."
   ],
   "id": "996f6e1e-6f08-4a95-9ea7-f69ae026260e"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot"
   ],
   "id": "4b060c7c-4670-44d3-857a-b70ee2e34fef"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_histogram(ax, p, max_height=None):\n",
    "    heights = p\n",
    "    if max_height is None:\n",
    "        max_height = 1.25*heights.max()\n",
    "    \n",
    "    # Safe entropy calculation that handles zeros\n",
    "    nonzero_p = p[p > 0]  # Filter out zeros\n",
    "    S = - (nonzero_p*np.log2(nonzero_p)).sum()\n",
    "\n",
    "    # Define bin edges\n",
    "    bins = [1, 2, 3, 4, 5]  # Bin edges\n",
    "\n",
    "    # Create the histogram\n",
    "    if ax is None:\n",
    "        fig, ax = plt.subplots(figsize=(6, 4))  # Adjust figure size \n",
    "    ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black') # Use weights for probabilities\n",
    "\n",
    "\n",
    "    # Customize the plot for better slide presentation\n",
    "    ax.set_xlabel(\"Bin\")\n",
    "    ax.set_ylabel(\"Probability\")\n",
    "    ax.set_title(f\"Four Bin Histogram (Entropy {S:.3f})\")\n",
    "    ax.set_xticks(bins[:-1]) # Show correct x ticks\n",
    "    ax.set_ylim(0,max_height) # Set y limit for visual appeal"
   ],
   "id": "bdffaa5d-c513-43b5-af2f-e16dc5dfe27b"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can compute the entropy of any given histogram."
   ],
   "id": "54039f0e-9704-48fd-958d-e459d8a6eabe"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Define probabilities\n",
    "p = np.zeros(4)\n",
    "p[0] = 4/13\n",
    "p[1] = 3/13\n",
    "p[2] = 3.7/13\n",
    "p[3] = 1 - p.sum()\n",
    "\n",
    "# Safe entropy calculation\n",
    "nonzero_p = p[p > 0]  # Filter out zeros\n",
    "entropy = - (nonzero_p*np.log2(nonzero_p)).sum()\n",
    "print(f\"The entropy of the histogram is {entropy:.3f}.\")"
   ],
   "id": "0c9f8d47-c87a-4839-8bfa-ee0bf6d2eda3"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "fc81b366-0bea-422a-9f61-89c014a56e98"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "fig.tight_layout()\n",
    "plot_histogram(ax, p)\n",
    "ax.set_title(f\"Four Bin Histogram (Entropy {entropy:.3f})\")\n",
    "mlai.write_figure(filename='four-bin-histogram.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "574b98cb-0a53-4229-8622-7e9de04f7544"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>The entropy of a four bin histogram.</i>\n",
    "\n",
    "We can play the entropy game by starting with a histogram with all the\n",
    "probability mass in the first bin and then ascending the gradient of the\n",
    "entropy function."
   ],
   "id": "bd271cd9-7181-40c5-bfca-0e14eda2d830"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Two-Bin Histogram Example\n",
    "\n",
    "The simplest possible example of Jaynes’ World is a two-bin histogram\n",
    "with probabilities $p$ and $1-p$. This minimal system allows us to\n",
    "visualize the entire entropy landscape.\n",
    "\n",
    "The natural parameter is the log odds, $\\theta = \\log\\frac{p}{1-p}$, and\n",
    "the update given by the entropy gradient is $$\n",
    "\\Delta \\theta_{\\text{steepest}} = \\eta \\frac{\\text{d}S}{\\text{d}\\theta} = \\eta p(1-p)(\\log(1-p) - \\log p).\n",
    "$$ The Fisher information is $$\n",
    "G(\\theta) = p(1-p)\n",
    "$$ This creates a dynamic where as $p$ approaches either 0 or 1 (minimal\n",
    "entropy states), the Fisher information approaches zero, creating a\n",
    "critical slowing” effect. This critical slowing is what leads to the\n",
    "formation of *information resevoirs*. Note also that in the *natural\n",
    "gradient* the updated is given by multiplying the gradient by the\n",
    "inverse Fisher information, which would lead to a more efficient update\n",
    "of the form, $$\n",
    "\\Delta \\theta_{\\text{natural}} =  \\eta(\\log(1-p) - \\log p),\n",
    "$$ however, it is this efficiency that we want our game to avoid,\n",
    "because it is the inefficient behaviour in the reagion of saddle points\n",
    "that leads to critical slowing and the emergence of information\n",
    "resevoirs."
   ],
   "id": "f14b5ef3-af6d-4020-9855-1f3968b8f4a7"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "480087bf-a012-42f0-8aa2-b2525b50d193"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Python code for gradients\n",
    "p_values = np.linspace(0.000001, 0.999999, 10000)\n",
    "theta_values = np.log(p_values/(1-p_values))\n",
    "entropy = -p_values * np.log(p_values) - (1-p_values) * np.log(1-p_values)\n",
    "fisher_info = p_values * (1-p_values)\n",
    "gradient = fisher_info * (np.log(1-p_values) - np.log(p_values))"
   ],
   "id": "7ceda2e4-d5bd-4241-8216-3705487b174b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "1619b3c1-1096-4a5a-9069-53eae89e79b1"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=plot.big_wide_figsize)\n",
    "\n",
    "ax1.plot(theta_values, entropy)\n",
    "ax1.set_xlabel('$\\\\theta$')\n",
    "ax1.set_ylabel('Entropy $S(p)$')\n",
    "ax1.set_title('Entropy Landscape')\n",
    "\n",
    "ax2.plot(theta_values, gradient)\n",
    "ax2.set_xlabel('$\\\\theta$')\n",
    "ax2.set_ylabel('$\\\\nabla_\\\\theta S(p)$')\n",
    "ax2.set_title('Entropy Gradient vs. Position')\n",
    "\n",
    "mlai.write_figure(filename='two-bin-histogram-entropy-gradients.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "311c8163-59e2-4d07-afdc-80f720715eb3"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/two-bin-histogram-entropy-gradients.svg\" class=\"\" width=\"95%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Entropy gradients of the two bin histogram agains\n",
    "position.</i>\n",
    "\n",
    "This example reveals the entropy extrema at $p = 0$, $p = 0.5$, and\n",
    "$p = 1$. At minimal entropy ($p \\approx 0$ or $p \\approx 1$), the\n",
    "gradient approaches zero, creating natural information reservoirs. The\n",
    "dynamics slow dramatically near these points - these are the areas of\n",
    "critical slowing that create information reservoirs."
   ],
   "id": "f94bd519-6f18-4cfd-b491-0645c96e2b53"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient Ascent in Natural Parameter Space\n",
    "\n",
    "We can visualize the entropy maximization process by performing gradient\n",
    "ascent in the natural parameter space $\\theta$. Starting from a\n",
    "low-entropy state, we follow the gradient of entropy with respect to\n",
    "$\\theta$ to reach the maximum entropy state."
   ],
   "id": "30ab750c-e8a2-4351-ad92-64fb4adaf19d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "fc719b12-21d7-484b-976b-fc4c0f48bf75"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper functions for two-bin histogram\n",
    "def theta_to_p(theta):\n",
    "    \"\"\"Convert natural parameter theta to probability p\"\"\"\n",
    "    return 1.0 / (1.0 + np.exp(-theta))\n",
    "\n",
    "def p_to_theta(p):\n",
    "    \"\"\"Convert probability p to natural parameter theta\"\"\"\n",
    "    # Add small epsilon to avoid numerical issues\n",
    "    p = np.clip(p, 1e-10, 1-1e-10)\n",
    "    return np.log(p/(1-p))\n",
    "\n",
    "def entropy(theta):\n",
    "    \"\"\"Compute entropy for given theta\"\"\"\n",
    "    p = theta_to_p(theta)\n",
    "    # Safe entropy calculation\n",
    "    return -p * np.log2(p) - (1-p) * np.log2(1-p)\n",
    "\n",
    "def entropy_gradient(theta):\n",
    "    \"\"\"Compute gradient of entropy with respect to theta\"\"\"\n",
    "    p = theta_to_p(theta)\n",
    "    return p * (1-p) * (np.log2(1-p) - np.log2(p))\n",
    "\n",
    "def plot_histogram(ax, theta, max_height=None):\n",
    "    \"\"\"Plot two-bin histogram for given theta\"\"\"\n",
    "    p = theta_to_p(theta)\n",
    "    heights = np.array([p, 1-p])\n",
    "    \n",
    "    if max_height is None:\n",
    "        max_height = 1.25\n",
    "    \n",
    "    # Compute entropy\n",
    "    S = entropy(theta)\n",
    "    \n",
    "    # Create the histogram\n",
    "    bins = [1, 2, 3]  # Bin edges\n",
    "    if ax is None:\n",
    "        fig, ax = plt.subplots(figsize=(6, 4))\n",
    "    ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black')\n",
    "    \n",
    "    # Customize the plot\n",
    "    ax.set_xlabel(\"Bin\")\n",
    "    ax.set_ylabel(\"Probability\")\n",
    "    ax.set_title(f\"Two-Bin Histogram (Entropy {S:.3f})\")\n",
    "    ax.set_xticks(bins[:-1])\n",
    "    ax.set_ylim(0, max_height)"
   ],
   "id": "49247dd0-e67b-4484-87da-bc653fe62424"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters for gradient ascent\n",
    "theta_initial = -9.0  # Start with low entropy \n",
    "learning_rate = 1\n",
    "num_steps = 1500\n",
    "\n",
    "# Initialize\n",
    "theta_current = theta_initial\n",
    "theta_history = [theta_current]\n",
    "p_history = [theta_to_p(theta_current)]\n",
    "entropy_history = [entropy(theta_current)]\n",
    "\n",
    "# Perform gradient ascent in theta space\n",
    "for step in range(num_steps):\n",
    "    # Compute gradient\n",
    "    grad = entropy_gradient(theta_current)\n",
    "    \n",
    "    # Update theta\n",
    "    theta_current = theta_current + learning_rate * grad\n",
    "    \n",
    "    # Store history\n",
    "    theta_history.append(theta_current)\n",
    "    p_history.append(theta_to_p(theta_current))\n",
    "    entropy_history.append(entropy(theta_current))\n",
    "    if step % 100 == 0:\n",
    "        print(f\"Step {step+1}: θ = {theta_current:.4f}, p = {p_history[-1]:.4f}, Entropy = {entropy_history[-1]:.4f}\")"
   ],
   "id": "9e0e5311-1bc3-49f7-83fb-0b20759e1727"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "a1d94eab-2dba-4869-b746-871660d88feb"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a figure showing the evolution\n",
    "fig, axes = plt.subplots(2, 3, figsize=(15, 8))\n",
    "fig.tight_layout(pad=3.0)\n",
    "\n",
    "# Select steps to display\n",
    "steps_to_show = [0, 300, 600, 900, 1200, 1500]\n",
    "\n",
    "# Plot histograms for selected steps\n",
    "for i, step in enumerate(steps_to_show):\n",
    "    row, col = i // 3, i % 3\n",
    "    plot_histogram(axes[row, col], theta_history[step])\n",
    "    axes[row, col].set_title(f\"Step {step}: θ = {theta_history[step]:.2f}, p = {p_history[step]:.3f}\")\n",
    "\n",
    "mlai.write_figure(filename='two-bin-histogram-evolution.svg', \n",
    "                  directory = './information-game')\n",
    "\n",
    "# Plot entropy evolution\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(range(num_steps+1), entropy_history, 'o-')\n",
    "plt.xlabel('Gradient Ascent Step')\n",
    "plt.ylabel('Entropy')\n",
    "plt.title('Entropy Evolution During Gradient Ascent')\n",
    "plt.grid(True)\n",
    "mlai.write_figure(filename='two-bin-entropy-evolution.svg', \n",
    "                  directory = './information-game')\n",
    "\n",
    "# Plot trajectory in theta space\n",
    "plt.figure(figsize=(10, 6))\n",
    "theta_range = np.linspace(-5, 5, 1000)\n",
    "entropy_curve = [entropy(t) for t in theta_range]\n",
    "plt.plot(theta_range, entropy_curve, 'b-', label='Entropy Landscape')\n",
    "plt.plot(theta_history, entropy_history, 'ro-', label='Gradient Ascent Path')\n",
    "plt.xlabel('Natural Parameter θ')\n",
    "plt.ylabel('Entropy')\n",
    "plt.title('Gradient Ascent Trajectory in Natural Parameter Space')\n",
    "plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)\n",
    "plt.legend()\n",
    "plt.grid(True)\n",
    "mlai.write_figure(filename='two-bin-trajectory.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "d978bcc0-a6dc-492d-9b29-9885638db887"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/two-bin-histogram-evolution.svg\" class=\"\" width=\"95%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Evolution of the two-bin histogram during gradient ascent in\n",
    "natural parameter space.</i>\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/two-bin-entropy-evolution.svg\" class=\"\" width=\"80%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Entropy evolution during gradient ascent for the two-bin\n",
    "histogram.</i>\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/two-bin-trajectory.svg\" class=\"\" width=\"80%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Gradient ascent trajectory in the natural parameter space for\n",
    "the two-bin histogram.</i>\n",
    "\n",
    "The gradient ascent visualization shows how the system evolves in the\n",
    "natural parameter space $\\theta$. Starting from a negative $\\theta$\n",
    "(corresponding to a low-entropy state with $p << 0.5$), the system\n",
    "follows the gradient of entropy with respect to $\\theta$ until it\n",
    "reaches $\\theta = 0$ (corresponding to $p = 0.5$), which is the maximum\n",
    "entropy state.\n",
    "\n",
    "Note that the maximum entropy occurs at $\\theta = 0$, which corresponds\n",
    "to $p = 0.5$. The gradient of entropy with respect to $\\theta$ is zero\n",
    "at this point, making it a stable equilibrium for the gradient ascent\n",
    "process."
   ],
   "id": "2cdbadce-31ae-4ec4-b303-bdf40391cdea"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Four Bin Histogram Entropy Game\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-example.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-example.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "To do this we represent the histogram parameters as a vector of length\n",
    "4, $\\mathbf{ w}{\\lambda} = [\\lambda_1, \\lambda_2, \\lambda_3, \\lambda_4]$\n",
    "and define the histogram probabilities to be\n",
    "$p_i = \\lambda_i^2 / \\sum_{j=1}^4 \\lambda_j^2$."
   ],
   "id": "ed240e1c-0a4d-46ce-b09e-3aa9e843f408"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "4db1ee82-7fbb-43f4-bb89-2330f5ac1a8d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the entropy function \n",
    "def entropy(lambdas):\n",
    "    p = lambdas**2/(lambdas**2).sum()\n",
    "    \n",
    "    # Safe entropy calculation\n",
    "    nonzero_p = p[p > 0]\n",
    "    nonzero_lambdas = lambdas[p > 0]\n",
    "    return np.log2(np.sum(lambdas**2))-np.sum(nonzero_p * np.log2(nonzero_lambdas**2))\n",
    "\n",
    "# Define the gradient of the entropy function\n",
    "def entropy_gradient(lambdas):\n",
    "    denominator = np.sum(lambdas**2)\n",
    "    p = lambdas**2/denominator\n",
    "    \n",
    "    # Safe log calculation\n",
    "    log_terms = np.zeros_like(lambdas)\n",
    "    nonzero_idx = lambdas != 0\n",
    "    log_terms[nonzero_idx] = np.log2(np.abs(lambdas[nonzero_idx]))\n",
    "    \n",
    "    p_times_lambda_entropy = -2*log_terms/denominator\n",
    "    const = (p*p_times_lambda_entropy).sum()\n",
    "    gradient = 2*lambdas*(p_times_lambda_entropy - const)\n",
    "    return gradient\n",
    "\n",
    "# Numerical gradient check\n",
    "def numerical_gradient(func, lambdas, h=1e-5):\n",
    "    numerical_grad = np.zeros_like(lambdas)\n",
    "    for i in range(len(lambdas)):\n",
    "        temp_lambda_plus = lambdas.copy()\n",
    "        temp_lambda_plus[i] += h\n",
    "        temp_lambda_minus = lambdas.copy()\n",
    "        temp_lambda_minus[i] -= h\n",
    "        numerical_grad[i] = (func(temp_lambda_plus) - func(temp_lambda_minus)) / (2 * h)\n",
    "    return numerical_grad"
   ],
   "id": "e5261ae4-8676-4a18-b503-2eef178688e2"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then ascend the gradeint of the entropy function, starting at a\n",
    "parameter setting where the mass is placed in the first bin, we take\n",
    "$\\lambda_2 = \\lambda_3 = \\lambda_4 = 0.01$ and $\\lambda_1 = 100$.\n",
    "\n",
    "First to check our code we compare our numerical and analytic gradients."
   ],
   "id": "70533d82-c34d-4294-b068-a9f7b3990f05"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "6ad0719a-5097-4076-831e-3efbf02bb898"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial parameters (lambda)\n",
    "initial_lambdas = np.array([100, 0.01, 0.01, 0.01])\n",
    "\n",
    "# Gradient check\n",
    "numerical_grad = numerical_gradient(entropy, initial_lambdas)\n",
    "analytical_grad = entropy_gradient(initial_lambdas)\n",
    "print(\"Numerical Gradient:\", numerical_grad)\n",
    "print(\"Analytical Gradient:\", analytical_grad)\n",
    "print(\"Gradient Difference:\", np.linalg.norm(numerical_grad - analytical_grad))  # Check if close to zero"
   ],
   "id": "0332130d-f82f-4aec-9223-4679bdc3a587"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can run the steepest ascent algorithm."
   ],
   "id": "aa7209b2-31ea-44dc-b064-1ff81c71acac"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "0d21fe72-3e1c-4fba-b4aa-9f0ad4b7bc67"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Steepest ascent algorithm\n",
    "lambdas = initial_lambdas.copy()\n",
    "\n",
    "learning_rate = 1\n",
    "turns = 15000\n",
    "entropy_values = []\n",
    "lambdas_history = []\n",
    "\n",
    "for _ in range(turns):\n",
    "    grad = entropy_gradient(lambdas)\n",
    "    lambdas += learning_rate * grad # update lambda for steepest ascent\n",
    "    entropy_values.append(entropy(lambdas))\n",
    "    lambdas_history.append(lambdas.copy())"
   ],
   "id": "80e39ab1-9534-4d88-a620-2d47f1d6e74a"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can plot the histogram at a set of chosen turn numbers to see the\n",
    "progress of the algorithm."
   ],
   "id": "e9e70506-9986-4d0a-9f41-ecf16e2d11e5"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "888e21b1-6718-4b93-acd7-7656955316d4"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "plot_at = [0, 100, 1000, 2500, 5000, 7500, 10000, 12500, turns-1]\n",
    "for i, iter in enumerate(plot_at):\n",
    "    plot_histogram(ax, lambdas_history[i]**2/(lambdas_history[i]**2).sum(), 1)\n",
    "    # write the figure,\n",
    "    mlai.write_figure(filename=f'four-bin-histogram-turn-{i:02d}.svg', \n",
    "                      directory = './information-game')"
   ],
   "id": "4a705b81-d239-47a2-a5f8-2f7c1e972961"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import notutils as nu\n",
    "from ipywidgets import IntSlider"
   ],
   "id": "b4b4cd37-756b-4c37-bc55-c8c90110664b"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nu.display_plots('two_point_sample{sample:0>3}.svg', \n",
    "                            './information-game', \n",
    "                            sample=IntSlider(5, 5, 5, 1))"
   ],
   "id": "376a95d6-fdc8-4f72-9e7f-db13fccbac88"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### \n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-00.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-02.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-04.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-06.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\"><img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-turn-08.svg\" class=\"\" width=\"20%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Intermediate stages of the histogram entropy game. After 0,\n",
    "1000, 5000, 10000 and 15000 iterations.</i>\n",
    "\n",
    "And we can also plot the changing entropy as a function of the number of\n",
    "game turns."
   ],
   "id": "76345332-3958-4e36-bea2-8e94fb35464a"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=plot.big_wide_figsize)\n",
    "ax.plot(range(turns), entropy_values)\n",
    "ax.set_xlabel(\"turns\")\n",
    "ax.set_ylabel(\"entropy\")\n",
    "ax.set_title(\"Entropy vs. turns (Steepest Ascent)\")\n",
    "mlai.write_figure(filename='four-bin-histogram-entropy-vs-turns.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "197a40d3-c3b6-4301-a1b4-8ebe2eabc6de"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-histogram-entropy-vs-turns.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Four bin histogram entropy game. The plot shows the\n",
    "increasing entropy against the number of turns across 15000 iterations\n",
    "of gradient ascent.</i>\n",
    "\n",
    "Note that the entropy starts at a saddle point, increaseases rapidly,\n",
    "and the levels off towards the maximum entropy, with the gradient\n",
    "decreasing slowly in the manner of Zeno’s paradox."
   ],
   "id": "507ebab2-99d9-430b-8f79-cdfdf9bf403e"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Constructed Quantities and Lemmas"
   ],
   "id": "10dc8a41-2c2b-41b3-b9cd-26742fb240dc"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Variable Partition\n",
    "\n",
    "$$\n",
    "X(t) = \\left\\{ i \\mid \\left| \\frac{\\text{d}\\theta_i}{\\text{d}t} \\right| \\geq \\varepsilon \\right\\}, \\quad M(t) = Z \\setminus X(t)\n",
    "$$"
   ],
   "id": "779e993d-4228-44e7-bd48-c3bc4d616bf2"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fisher Information Matrix Partitioning\n",
    "\n",
    "We partition the Fisher Information Matrix $G(\\boldsymbol{\\theta})$\n",
    "according to the active variables $X(t)$ and latent information\n",
    "reservoir $M(t)$: $$\n",
    "G(\\boldsymbol{\\theta}) = \n",
    "\\begin{bmatrix}\n",
    "G_{XX} & G_{XM} \\\\\n",
    "G_{MX} & G_{MM}\n",
    "\\end{bmatrix}\n",
    "$$ where $G_{XX}$ represents the information geometry within active\n",
    "variables, $G_{MM}$ within the latent reservoir, and\n",
    "$G_{XM} = G_{MX}^\\top$ captures the cross-coupling between active and\n",
    "latent components. This partitioning reveals how information flows\n",
    "between observable dynamics and the latent structure."
   ],
   "id": "e76ee6d4-f47f-4ca1-93c7-88856a7437db"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lemma 1: Form of the Minimal Entropy Configuration\n",
    "\n",
    "The minimal-entropy state compatible with the system’s resolution\n",
    "constraint and regularity condition is represented by a density matrix\n",
    "of the exponential form, $$\n",
    "\\rho(\\boldsymbol{\\theta}_o) = \\frac{1}{Z(\\boldsymbol{\\theta}_o)} \\exp\\left( \\sum_i \\theta_{oi} H_i \\right),\n",
    "$$ where all components $\\theta_{oi}$ are sub-threshold $$\n",
    "|\\dot{\\theta}_{oi}| < \\varepsilon.\n",
    "$$ This state minimizes entropy under the constraint that it remains\n",
    "regular, continuous, and detectable only above a resolution scale \\$\\$.\n",
    "Its structure can be derived via a *minimum-entropy* analogue of Jaynes’\n",
    "formalism, using the same density matrix geometry but inverted\n",
    "optimization."
   ],
   "id": "778fc968-a3a5-4076-81e0-42424857b2dd"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lemma 2: Symmetry Breaking\n",
    "\n",
    "If $\\theta_k \\in M(t)$ and $|\\dot{\\theta}_k| \\geq \\varepsilon$, then $$\n",
    "\\theta_k \\in X(t + \\delta t).\n",
    "$$"
   ],
   "id": "be433c62-7939-4741-bf94-dd04ce3bc55f"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Four-Bin Saddle Point Example\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-saddle-example.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/four-bin-saddle-example.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "To illustrate saddle points and information reservoirs, we need at least\n",
    "a 4-bin system. This creates a 3-dimensional parameter space where we\n",
    "can observe genuine saddle points.\n",
    "\n",
    "Consider a 4-bin system parameterized by natural parameters $\\theta_1$,\n",
    "$\\theta_2$, and $\\theta_3$ (with one constraint). A saddle point occurs\n",
    "where the gradient $\\nabla_\\theta S = 0$, but the Hessian has mixed\n",
    "eigenvalues - some positive, some negative.\n",
    "\n",
    "At these points, the Fisher information matrix $G(\\theta)$\n",
    "eigendecomposition reveals.\n",
    "\n",
    "-   Fast modes: large positive eigenvalues → rapid evolution\n",
    "-   Slow modes: small positive eigenvalues → gradual evolution\n",
    "-   Critical modes: near-zero eigenvalues → information reservoirs\n",
    "\n",
    "The eigenvectors of $G(\\theta)$ at the saddle point determine which\n",
    "parameter combinations form information reservoirs."
   ],
   "id": "abb7bd70-1354-4e9b-976f-c7bee83893b8"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ],
   "id": "67df517a-350f-4fe8-b612-00a41461e578"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exponential family entropy functions for 4-bin system\n",
    "def exponential_family_entropy(theta):\n",
    "    \"\"\"\n",
    "    Compute entropy of a 4-bin exponential family distribution\n",
    "    parameterized by natural parameters theta\n",
    "    \"\"\"\n",
    "    # Compute the log-partition function (normalization constant)\n",
    "    log_Z = np.log(np.sum(np.exp(theta)))\n",
    "    \n",
    "    # Compute probabilities\n",
    "    p = np.exp(theta - log_Z)\n",
    "    \n",
    "    # Compute entropy: -sum(p_i * log(p_i))\n",
    "    entropy = -np.sum(p * np.log(p), where=p>0)\n",
    "    \n",
    "    return entropy\n",
    "\n",
    "def entropy_gradient(theta):\n",
    "    \"\"\"\n",
    "    Compute the gradient of the entropy with respect to theta\n",
    "    \"\"\"\n",
    "    # Compute the log-partition function (normalization constant)\n",
    "    log_Z = np.log(np.sum(np.exp(theta)))\n",
    "    \n",
    "    # Compute probabilities\n",
    "    p = np.exp(theta - log_Z)\n",
    "    \n",
    "    # Gradient is theta times the second derivative of log partition function\n",
    "    return -p*theta + p*(np.dot(p, theta))\n",
    "\n",
    "# Add a gradient check function\n",
    "def check_gradient(theta, epsilon=1e-6):\n",
    "    \"\"\"\n",
    "    Check the analytical gradient against numerical gradient\n",
    "    \"\"\"\n",
    "    # Compute analytical gradient\n",
    "    analytical_grad = entropy_gradient(theta)\n",
    "    \n",
    "    # Compute numerical gradient\n",
    "    numerical_grad = np.zeros_like(theta)\n",
    "    for i in range(len(theta)):\n",
    "        theta_plus = theta.copy()\n",
    "        theta_plus[i] += epsilon\n",
    "        entropy_plus = exponential_family_entropy(theta_plus)\n",
    "        \n",
    "        theta_minus = theta.copy()\n",
    "        theta_minus[i] -= epsilon\n",
    "        entropy_minus = exponential_family_entropy(theta_minus)\n",
    "        \n",
    "        numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)\n",
    "    \n",
    "    # Compare\n",
    "    print(\"Analytical gradient:\", analytical_grad)\n",
    "    print(\"Numerical gradient:\", numerical_grad)\n",
    "    print(\"Difference:\", np.abs(analytical_grad - numerical_grad))\n",
    "    \n",
    "    return analytical_grad, numerical_grad\n",
    "\n",
    "# Project gradient to respect constraints (sum of theta is constant)\n",
    "def project_gradient(theta, grad):\n",
    "    \"\"\"\n",
    "    Project gradient to ensure sum constraint is respected\n",
    "    \"\"\"\n",
    "    # Project to space where sum of components is zero\n",
    "    return grad - np.mean(grad)\n",
    "\n",
    "# Perform gradient ascent on entropy\n",
    "def gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1):\n",
    "    \"\"\"\n",
    "    Perform gradient ascent on entropy for 4-bin system\n",
    "    \"\"\"\n",
    "    theta = theta_init.copy()\n",
    "    theta_history = [theta.copy()]\n",
    "    entropy_history = [exponential_family_entropy(theta)]\n",
    "    \n",
    "    for _ in range(steps):\n",
    "        # Compute gradient\n",
    "        grad = entropy_gradient(theta)\n",
    "        proj_grad = project_gradient(theta, grad)\n",
    "        \n",
    "        # Update parameters\n",
    "        theta += learning_rate * proj_grad\n",
    "        \n",
    "        # Store history\n",
    "        theta_history.append(theta.copy())\n",
    "        entropy_history.append(exponential_family_entropy(theta))\n",
    "    \n",
    "    return np.array(theta_history), np.array(entropy_history)"
   ],
   "id": "ef9703ea-2c96-4df0-aedb-72e8193bce19"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test the gradient calculation\n",
    "test_theta = np.array([0.5, -0.3, 0.1, -0.3])\n",
    "test_theta = test_theta - np.mean(test_theta)  # Ensure constraint is satisfied\n",
    "print(\"Testing gradient calculation:\")\n",
    "analytical_grad, numerical_grad = check_gradient(test_theta)\n",
    "\n",
    "# Verify if we're ascending or descending\n",
    "entropy_before = exponential_family_entropy(test_theta)\n",
    "step_size = 0.01\n",
    "test_theta_after = test_theta + step_size * analytical_grad\n",
    "entropy_after = exponential_family_entropy(test_theta_after)\n",
    "print(f\"Entropy before step: {entropy_before}\")\n",
    "print(f\"Entropy after step: {entropy_after}\")\n",
    "print(f\"Change in entropy: {entropy_after - entropy_before}\")\n",
    "if entropy_after > entropy_before:\n",
    "    print(\"We are ascending the entropy gradient\")\n",
    "else:\n",
    "    print(\"We are descending the entropy gradient\")"
   ],
   "id": "5de283ef-2aad-426f-8b09-d1cfb41e4964"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize with asymmetric distribution (away from saddle point)\n",
    "theta_init = np.array([1.0, -0.5, -0.2, -0.3])\n",
    "theta_init = theta_init - np.mean(theta_init)  # Ensure constraint is satisfied\n",
    "\n",
    "# Run gradient ascent\n",
    "theta_history, entropy_history = gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1.0)\n",
    "\n",
    "# Create a grid for visualization\n",
    "x = np.linspace(-2, 2, 100)\n",
    "y = np.linspace(-2, 2, 100)\n",
    "X, Y = np.meshgrid(x, y)\n",
    "\n",
    "# Compute entropy at each grid point (with constraint on theta3 and theta4)\n",
    "Z = np.zeros_like(X)\n",
    "for i in range(X.shape[0]):\n",
    "    for j in range(X.shape[1]):\n",
    "        # Create full theta vector with constraint that sum is zero\n",
    "        theta1, theta2 = X[i,j], Y[i,j]\n",
    "        theta3 = -0.5 * (theta1 + theta2)\n",
    "        theta4 = -0.5 * (theta1 + theta2)\n",
    "        theta = np.array([theta1, theta2, theta3, theta4])\n",
    "        Z[i,j] = exponential_family_entropy(theta)\n",
    "\n",
    "# Compute gradient field\n",
    "dX = np.zeros_like(X)\n",
    "dY = np.zeros_like(Y)\n",
    "for i in range(X.shape[0]):\n",
    "    for j in range(X.shape[1]):\n",
    "        # Create full theta vector with constraint\n",
    "        theta1, theta2 = X[i,j], Y[i,j]\n",
    "        theta3 = -0.5 * (theta1 + theta2)\n",
    "        theta4 = -0.5 * (theta1 + theta2)\n",
    "        theta = np.array([theta1, theta2, theta3, theta4])\n",
    "        \n",
    "        # Get full gradient and project\n",
    "        grad = entropy_gradient(theta)\n",
    "        proj_grad = project_gradient(theta, grad)\n",
    "        \n",
    "        # Store first two components\n",
    "        dX[i,j] = proj_grad[0]\n",
    "        dY[i,j] = proj_grad[1]\n",
    "\n",
    "# Normalize gradient vectors for better visualization\n",
    "norm = np.sqrt(dX**2 + dY**2)\n",
    "# Avoid division by zero\n",
    "norm = np.where(norm < 1e-10, 1e-10, norm)\n",
    "dX_norm = dX / norm\n",
    "dY_norm = dY / norm\n",
    "\n",
    "# A few gradient vectors for visualization\n",
    "stride = 10"
   ],
   "id": "f0067769-a055-4742-8477-c8f8c14e2908"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import mlai.plot as plot\n",
    "import mlai"
   ],
   "id": "15fa8092-5d46-4428-91f9-b1ac1f5ccc8d"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=plot.big_wide_figsize)\n",
    "\n",
    "# Create contour lines only (no filled contours)\n",
    "contours = plt.contour(X, Y, Z, levels=15, colors='black', linewidths=0.8)\n",
    "plt.clabel(contours, inline=True, fontsize=8, fmt='%.2f')\n",
    "\n",
    "# Add gradient vectors (normalized for direction, but scaled by magnitude for visibility)\n",
    "plt.quiver(X[::stride, ::stride], Y[::stride, ::stride], \n",
    "           dX_norm[::stride, ::stride], dY_norm[::stride, ::stride], \n",
    "           color='r', scale=30, width=0.003, scale_units='width')\n",
    "\n",
    "# Plot the gradient ascent trajectory\n",
    "plt.plot(theta_history[:, 0], theta_history[:, 1], 'b-', linewidth=2, \n",
    "         label='Gradient Ascent Path')\n",
    "plt.scatter(theta_history[0, 0], theta_history[0, 1], color='green', s=100, \n",
    "           marker='o', label='Start')\n",
    "plt.scatter(theta_history[-1, 0], theta_history[-1, 1], color='purple', s=100, \n",
    "           marker='*', label='End')\n",
    "\n",
    "# Add labels and title\n",
    "plt.xlabel('$\\\\theta_1$')\n",
    "plt.ylabel('$\\\\theta_2$')\n",
    "plt.title('Entropy Contours with Gradient Field')\n",
    "\n",
    "# Mark the saddle point (approximately at origin for this system)\n",
    "plt.scatter([0], [0], color='yellow', s=100, marker='*', \n",
    "            edgecolor='black', zorder=10, label='Saddle Point')\n",
    "plt.legend()\n",
    "\n",
    "mlai.write_figure(filename='simplified-saddle-point-example.svg', \n",
    "                  directory = './information-game')\n",
    "\n",
    "# Plot entropy evolution during gradient ascent\n",
    "plt.figure(figsize=plot.big_figsize)\n",
    "plt.plot(entropy_history)\n",
    "plt.xlabel('Gradient Ascent Step')\n",
    "plt.ylabel('Entropy')\n",
    "plt.title('Entropy Evolution During Gradient Ascent')\n",
    "plt.grid(True)\n",
    "mlai.write_figure(filename='four-bin-entropy-evolution.svg', \n",
    "                  directory = './information-game')"
   ],
   "id": "0371b83d-6a92-4eaa-90df-c209900aa89c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/simplified-saddle-point-example.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Visualisation of a saddle point projected down to two\n",
    "dimensions.</i>\n",
    "\n",
    "<img src=\"https://inverseprobability.com/talks/../slides/diagrams//information-game/four-bin-entropy-evolution.svg\" class=\"\" width=\"70%\" style=\"vertical-align:middle;\">\n",
    "\n",
    "Figure: <i>Entropy evolution during gradient ascent on the four-bin\n",
    "system.</i>\n",
    "\n",
    "The animation of system evolution would show initial rapid movement\n",
    "along high-eigenvalue directions, progressive slowing in directions with\n",
    "low eigenvalues and formation of information reservoirs in the\n",
    "critically slowed directions. Parameter-capacity uncertainty emerges\n",
    "naturally at the saddle point."
   ],
   "id": "f74a54ff-7eb5-4820-aa95-615ce64f16ed"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Entropy-Time\n",
    "\n",
    "$$\n",
    "\\tau(t) := S_{X(t)}(t)\n",
    "$$"
   ],
   "id": "ca4062b5-026b-475b-8058-a6564cdd023a"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lemma 3: Monotonicity of Entropy-Time\n",
    "\n",
    "$$\n",
    "\\tau(t_2) \\geq \\tau(t_1) \\quad \\text{for all } t_2 > t_1\n",
    "$$"
   ],
   "id": "0d0783c8-d86f-4b74-94fe-b3cd7579a630"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Corollary: Irreversibility\n",
    "\n",
    "$\\tau(t)$ increases monotonically, preventing time-reversal globally."
   ],
   "id": "e32cafab-4bf4-4cf1-a827-f8c83f94de68"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conjecture: Frieden-Analogous Extremal Flow\n",
    "\n",
    "At points where the latent-to-active flow functional is locally extremal\n",
    "(e.g., \\$ \\$), the system may exhibit critical slowing where information\n",
    "resevoir variables are slow relative to active variables. It may be\n",
    "possible to separate the system entropy into active variables and,\n",
    "$I = S[\\rho_X]$ and “intrinsic information” $J= S[\\rho_{X|M}]$ allowing\n",
    "us to create an information analogous to B. Roy Frieden’s extreme\n",
    "physical information (Frieden (1998)) which allows derivation of locally\n",
    "valid differential equations that depend on the *information\n",
    "topography*."
   ],
   "id": "856ce68d-cd41-4abf-a518-753833e76ed6"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Thanks!\n",
    "\n",
    "For more information on these subjects and more you might want to check\n",
    "the following resources.\n",
    "\n",
    "-   book: [The Atomic\n",
    "    Human](https://www.penguin.co.uk/books/455130/the-atomic-human-by-lawrence-neil-d/9780241625248)\n",
    "-   twitter: [@lawrennd](https://twitter.com/lawrennd)\n",
    "-   podcast: [The Talking Machines](http://thetalkingmachines.com)\n",
    "-   newspaper: [Guardian Profile\n",
    "    Page](http://www.theguardian.com/profile/neil-lawrence)\n",
    "-   blog:\n",
    "    [http://inverseprobability.com](http://inverseprobability.com/blog.html)"
   ],
   "id": "2bca1bae-499c-4ac5-8779-e2b2daa86fc4"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Appendix"
   ],
   "id": "9bd98306-8c89-4021-b42b-f847c9c3847d"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Variational Derivation of the Initial Curvature Structure\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/minimal-entropy-density-matrix-intro.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/minimal-entropy-density-matrix-intro.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "We will determine constraints on the Fisher Information Matrix\n",
    "$G(\\boldsymbol{\\theta})$ that are consistent with the system’s unfolding\n",
    "rules and internal information geometry. We follow Jaynes (Jaynes, 1957)\n",
    "in solving a variational problem that captures the allowed structure of\n",
    "the system’s origin (minimal entropy) state.\n",
    "\n",
    "Hirschman Jr (1957) established a connection between entropy and the\n",
    "Fourier transform, showing that the entropy of a function and its\n",
    "Fourier transform cannot both be arbitrarily small. This result, known\n",
    "as the Hirschman uncertainty principle, was later strengthened by\n",
    "Beckner (Beckner, 1975) who derived the optimal constant in the\n",
    "inequality. Białynicki-Birula and Mycielski (1975) extended these ideas\n",
    "to derive uncertainty relations for information entropy in wave\n",
    "mechanics.\n",
    "\n",
    "From these results we know that there are fundamental limits to how we\n",
    "express the entropy of position and its conjugate space simultaneously.\n",
    "These limits inspire us to focus on the *von Neumann entropy* so that\n",
    "our system respects the Hirschman uncertainty principle.\n",
    "\n",
    "A density matrix has the form $$\n",
    "\\rho(\\boldsymbol{\\theta}) = \\frac{1}{Z(\\boldsymbol{\\theta})} \\exp\\left( \\sum_i \\theta_i H_i \\right)\n",
    "$$ where\n",
    "$Z(\\boldsymbol{\\theta}) = \\mathrm{tr}\\left[\\exp\\left( \\sum_i \\theta_i H_i \\right)\\right]$\n",
    "and $\\boldsymbol{\\theta} \\in \\mathbb{R}^d$, $H_i$ are Hermitian\n",
    "observables.\n",
    "\n",
    "The von Neumman entropy is given by $$\n",
    "S[\\rho] = -\\text{tr} (\\rho \\log \\rho)\n",
    "$$\n",
    "\n",
    "We now derive the minimal entropy configuration inspired by Jaynes’s\n",
    "*free-form* variational approach. This enables us to derive the form of\n",
    "the *density matrix* directly from information-theoretic constraints\n",
    "(Jaynes, 1963)."
   ],
   "id": "0cd0d5c9-f1f0-4baa-b310-2024383995ac"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Jaynesian Derivation of Minimal Entropy Configuration\n",
    "\n",
    "<span class=\"editsection-bracket\"\n",
    "style=\"\">\\[</span><span class=\"editsection\"\n",
    "style=\"\"><a href=\"https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynesian-derivation-minimal-entropy.md\" target=\"_blank\" onclick=\"ga('send', 'event', 'Edit Page', 'Edit', 'https://github.com/lawrennd/snippets/edit/main/_information-game/includes/jaynesian-derivation-minimal-entropy.md', 13);\">edit</a></span><span class=\"editsection-bracket\" style=\"\">\\]</span>\n",
    "\n",
    "Jaynes suggested that statistical mechanics problems should be treated\n",
    "as problems of inference. Assign the probability distribution (or\n",
    "density matrix) that is maximally noncommittal with respect to missing\n",
    "information, subject to known constraints.\n",
    "\n",
    "While Jaynes applied this idea to derive the *maximum entropy*\n",
    "configuration given constraints, here we adapt it to derive the *minimum\n",
    "entropy* configuration, under an assumption of zero initial entropy\n",
    "bounded by a maximum entropy of $N$ bits.\n",
    "\n",
    "Let $\\rho$ be a density matrix describing the state of a system. The von\n",
    "Neumann entropy is, $$\n",
    "S[\\rho] = -\\mathrm{tr}(\\rho \\log \\rho),\n",
    "$$ we wish to *minimize* $S[\\rho]$, subject to constraints that encode\n",
    "the resolution bounds.\n",
    "\n",
    "In the game we assume that the system begins in a state of minimal\n",
    "entropy, the state cannot be a delta function (no singularities, so it\n",
    "must obey a resolution constraint $\\varepsilon$) and the entropy is\n",
    "bounded above by $N$ bits: $S[\\rho] \\leq N$.\n",
    "\n",
    "We apply a variational principle where we minimise $$\n",
    "S[\\rho] = -\\mathrm{tr}(\\rho \\log \\rho)\n",
    "$$ subject to constraints."
   ],
   "id": "6ca56102-a181-4688-a9c9-e067301f92b9"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Constraints\n",
    "\n",
    "1.  The first constraint is normalization, $\\mathrm{tr}(\\rho) = 1$.\n",
    "\n",
    "2.  The resolution constraint is motivated by the entropy being\n",
    "    constrained to be, $$\n",
    "    S[\\rho] \\leq N\n",
    "    $$ with the bound saturated only when the system is at maximum\n",
    "    entropy. This implies that the system is finite in resolution. To\n",
    "    reflect this we introduce a resolution constraint,\n",
    "    $\\mathrm{tr}(\\rho \\hat{Z}^2) \\geq \\epsilon^2$. And/or\n",
    "    $\\mathrm{tr}(\\rho \\hat{P}^2) \\geq \\delta^2$, and other optional\n",
    "    moment or dual-space constraints.\n",
    "\n",
    "We introduce Lagrange multipliers $\\lambda_0$, $\\lambda_z$, $\\lambda_p$\n",
    "for these constraints and define the Lagrangian $$\n",
    "\\mathcal{L}[\\rho] = -\\mathrm{tr}(\\rho \\log \\rho)\n",
    "+ \\lambda_0 (\\mathrm{tr}(\\rho) - 1)\n",
    "- \\lambda_x (\\mathrm{tr}(\\rho \\hat{Z}^2) - \\epsilon^2)\n",
    "- \\lambda_p (\\mathrm{tr}(\\rho \\hat{P}^2) - \\delta^2).\n",
    "$$\n",
    "\n",
    "To find the extremum, we take the functional derivative and set it to\n",
    "zero, $$\n",
    "\\frac{\\delta \\mathcal{L}}{\\delta \\rho} = -\\log \\rho - 1 - \\lambda_x \\hat{Z}^2 - \\lambda_p \\hat{P}^2 + \\lambda_0 = 0\n",
    "$$ and solving for $\\rho$ gives $$\n",
    "\\rho = \\frac{1}{Z} \\exp\\left(-\\lambda_z \\hat{Z}^2 - \\lambda_p \\hat{P}^2\\right)\n",
    "$$ where the partition function (which ensures normalisation) is $$\n",
    "Z = \\mathrm{tr}\\left[\\exp\\left(-\\lambda_z \\hat{Z}^2 - \\lambda_p \\hat{P}^2\\right)\\right]\n",
    "$$ This is a Gaussian state for a density matrix, which is consistent\n",
    "with the minimum entropy distribution under uncertainty constraints.\n",
    "\n",
    "The Lagrange multipliers $\\lambda_z, \\lambda_p$ enforce lower bounds on\n",
    "variance. These define the natural parameters as $\\theta_z = -\\lambda_z$\n",
    "and $\\theta_p = -\\lambda_p$ in the exponential family form\n",
    "$\\rho(\\boldsymbol{\\theta}) \\propto \\exp(\\boldsymbol{\\theta} \\cdot \\mathbf{H})$.\n",
    "The form of $\\rho$ is a density matrix. The curvature (second\n",
    "derivative) of $\\log Z(\\boldsymbol{\\theta})$ gives the Fisher\n",
    "Information matrix $G(\\boldsymbol{\\theta})$. Steepest ascent\n",
    "trajectories in $\\boldsymbol{\\theta}$ space will trace the system’s\n",
    "entropy dynamics.\n",
    "\n",
    "Next we compute $G(\\boldsymbol{\\theta})$ from\n",
    "$\\log Z(\\boldsymbol{\\theta})$ to explore the information geometry. From\n",
    "this we should verify that the following conditions hold, $$\n",
    "\\left| \\left[G(\\boldsymbol{\\theta}) \\boldsymbol{\\theta}\\right]_i \\right| < \\varepsilon \\quad \\text{for all } i\n",
    "$$ which implies that all variables remain latent at initialization.\n",
    "\n",
    "The Hermitians have a *non-commuting observable pair* constraint, $$\n",
    "  [H_i, H_j] \\neq 0,\n",
    "$$ which is equivalent to an *uncertainty relation*, $$\n",
    "  \\mathrm{Var}(H_i) \\cdot \\mathrm{Var}(H_j) \\geq C > 0,\n",
    "$$ and ensures that we have *bounded curvature* $$\n",
    "\\mathrm{tr}(G(\\boldsymbol{\\theta})) \\geq \\gamma > 0.\n",
    "$$\n",
    "\n",
    "We can then use $\\varepsilon$ and $N$ to define initial thresholds and\n",
    "maximum resolution and examine how variables decouple and how\n",
    "saddle-like regions emerge as the landscape unfolds through gradient\n",
    "ascent.\n",
    "\n",
    "This constrained minimization problem yields the *structure of the\n",
    "initial density matrix* $\\rho(\\boldsymbol{\\theta}_o)$ and the\n",
    "*permissible curvature geometry* $G(\\boldsymbol{\\theta}_o)$ and a\n",
    "constraint-consistent basis of observables $\\{H_i\\}$ that have a\n",
    "quadratic form. This ensures the system begins in a *regular, latent,\n",
    "low-entropy state*.\n",
    "\n",
    "This is the configuration from which entropy ascent and\n",
    "symmetry-breaking transitions emerge."
   ],
   "id": "14fd2e10-418e-4dd4-9ce0-808605a617ac"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ],
   "id": "95e05d0f-2f55-4489-8427-2fc1b418815c"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Barato, A.C., Seifert, U., 2014. Stochastic thermodynamics with\n",
    "information reservoirs. Physical Review E 90, 042150.\n",
    "<https://doi.org/10.1103/PhysRevE.90.042150>\n",
    "\n",
    "Beckner, W., 1975. Inequalities in Fourier analysis. Annals of\n",
    "Mathematics 159–182. <https://doi.org/10.2307/1970980>\n",
    "\n",
    "Białynicki-Birula, I., Mycielski, J., 1975. Uncertainty relations for\n",
    "information entropy in wave mechanics. Communications in Mathematical\n",
    "Physics 44, 129–132. <https://doi.org/10.1007/BF01608825>\n",
    "\n",
    "Frieden, B.R., 1998. Physics from fisher information: A unification.\n",
    "Cambridge University Press, Cambridge, UK.\n",
    "<https://doi.org/10.1017/CBO9780511622670>\n",
    "\n",
    "Hirschman Jr, I.I., 1957. A note on entropy. American Journal of\n",
    "Mathematics 79, 152–156. <https://doi.org/10.2307/2372390>\n",
    "\n",
    "Jaynes, E.T., 1963. Information theory and statistical mechanics, in:\n",
    "Ford, K.W. (Ed.), Brandeis University Summer Institute Lectures in\n",
    "Theoretical Physics, Vol. 3: Statistical Physics. W. A. Benjamin, Inc.,\n",
    "New York, pp. 181–218.\n",
    "\n",
    "Jaynes, E.T., 1957. Information theory and statistical mechanics.\n",
    "Physical Review 106, 620–630. <https://doi.org/10.1103/PhysRev.106.620>\n",
    "\n",
    "Parrondo, J.M.R., Horowitz, J.M., Sagawa, T., 2015. Thermodynamics of\n",
    "information. Nature Physics 11, 131–139.\n",
    "<https://doi.org/10.1038/nphys3230>"
   ],
   "id": "ffd2af91-2798-4b98-976c-9f3682a84332"
  }
 ],
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {}
}