{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Broadcasting Rules in Tensorflow Probability\n",
    "\n",
    "> In this post, it will introduce you to numpy's broadcasting rules and show how you can use broadcasting when specifying batches of distributions in TensorFlow, as well as with the `prob` and `log_prob` methods. This is the summary of lecture \"Probabilistic Deep Learning with Tensorflow 2\" from Imperial College London.\n",
    "\n",
    "- toc: true \n",
    "- badges: true\n",
    "- comments: true\n",
    "- author: Chanseok Kang\n",
    "- categories: [Python, Coursera, Tensorflow_probability, ICL]\n",
    "- image: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import tensorflow_probability as tfp\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "tfd = tfp.distributions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensorflow Version:  2.5.0\n",
      "Tensorflow Probability Version:  0.13.0\n"
     ]
    }
   ],
   "source": [
    "print(\"Tensorflow Version: \", tf.__version__)\n",
    "print(\"Tensorflow Probability Version: \", tfp.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Operations on arrays of different sizes in numpy\n",
    "\n",
    "Numpy operations can be applied to arrays that are not of the same shape, but only if the shapes satisfy certain conditions.\n",
    "\n",
    "As a demonstration of this, let us add together two arrays of different shapes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((4, 1), (3,))"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Add two arrays with different shapes\n",
    "\n",
    "a = np.array([[1.],\n",
    "              [2.],\n",
    "              [3.],\n",
    "              [4.]])  # shape (4, 1)\n",
    "\n",
    "b = np.array([0., 1., 2.])  # shape (3,) \n",
    "\n",
    "a.shape, b.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 2., 3.],\n",
       "       [2., 3., 4.],\n",
       "       [3., 4., 5.],\n",
       "       [4., 5., 6.]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a + b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(4, 3)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(a + b).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the addition\n",
    "\n",
    "    [ [1.],    +  [0., 1., 2.]  \n",
    "      [2.],  \n",
    "      [3.],  \n",
    "      [4.] ]\n",
    "\n",
    "To execute it, numpy:\n",
    "1. Aligned the shapes of `a` and `b` on the last axis and prepended 1s to the shape with fewer axes:\n",
    "        a: 4 x 1     --->    a: 4 x 1\n",
    "        b:     3     --->    b: 1 x 3\n",
    "        \n",
    "\n",
    "2. Checked that the sizes of the axes matched or were equal to 1:\n",
    "        a: 4 x 1  \n",
    "        b: 1 x 3\n",
    "`a` and `b` satisfied this criterion. \n",
    "\n",
    "\n",
    "3. Stretched both arrays on their 1-valued axes so that their shapes matched, then added them together.  \n",
    "`a` was replicated 3 times in the second axis, while `b` was replicated 4 times in the first axis.\n",
    "\n",
    "This meant that the addition in the final step was\n",
    "\n",
    "    [ [1., 1., 1.],    +  [ [0., 1., 2.],  \n",
    "      [2., 2., 2.],         [0., 1., 2.],  \n",
    "      [3., 3., 3.],         [0., 1., 2.],  \n",
    "      [4., 4., 4.] ]        [0., 1., 2.] ]\n",
    "      \n",
    "Addition was then carried out element-by-element, as you can verify by referring back to the output of the code cell above.  \n",
    "This resulted in an output with shape 4 x 3.\n",
    "\n",
    "\n",
    "## Numpy's broadcasting rule\n",
    "\n",
    "Broadcasting rules describe how values should be transmitted when the inputs to an operation do not match.  \n",
    "In numpy, the broadcasting rule is very simple:\n",
    "\n",
    "> Prepend 1s to the smaller shape,   \n",
    "check that the axes of both arrays have sizes that are equal or 1,  \n",
    "then stretch the arrays in their size-1 axes.\n",
    "\n",
    "A crucial aspect of this rule is that it does not require the input arrays have the same number of axes.  \n",
    "Another consequence of it is that a broadcasting output will have the largest size of its inputs in each axis.  \n",
    "Take the following multiplication as an example:\n",
    "\n",
    "        a: 3 x 7 x 1  \n",
    "        b:     1 x 5  \n",
    "    a * b: 3 x 7 x 5\n",
    "\n",
    "You can see that the output shape is the maximum of the sizes in each axis.\n",
    "\n",
    "Numpy's broadcasting rule also does not require that one of the arrays has to be bigger in all axes.  \n",
    "This is seen in the following example, where `a` is smaller than `b` in its third axis but is bigger in its second axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((2, 2, 1), (2, 1, 2))"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Multiply two arrays with different shapes\n",
    "\n",
    "a = np.array([[[0.01], [0.1]],\n",
    "              [[1.00], [10.]]])  # shape (2, 2, 1)\n",
    "b = np.array([[[2., 2.]],\n",
    "              [[3., 3.]]])       # shape (2, 1, 2)\n",
    "\n",
    "a.shape, b.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[[2.e-02, 2.e-02],\n",
       "        [2.e-01, 2.e-01]],\n",
       "\n",
       "       [[3.e+00, 3.e+00],\n",
       "        [3.e+01, 3.e+01]]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a * b # shape (2, 2, 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 2, 2)\n"
     ]
    }
   ],
   "source": [
    "print((a * b).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Broadcasting behaviour also points to an efficient way to compute an outer product in numpy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((3,), (4,))"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting to compute an outer product\n",
    "\n",
    "a = np.array([-1., 0., 1.])\n",
    "b = np.array([0., 1., 2., 3.])\n",
    "\n",
    "a.shape, b.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0., -1., -2., -3.],\n",
       "       [ 0.,  0.,  0.,  0.],\n",
       "       [ 0.,  1.,  2.,  3.]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[:, np.newaxis] * b  # outer product ab^T, where a and b are column vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 4)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(a[:, np.newaxis] * b).shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 1)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[:, np.newaxis].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea of numpy stretching the arrays in their size-1 axes is useful and is functionally correct. But this is not what numpy literally does behind the scenes, since that would be an inefficient use of memory. Instead, numpy carries out the operation by looping over singleton (size-1) dimensions.\n",
    "\n",
    "To give you some practice with broadcasting, try predicting the output shapes for the following operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define three arrays with different shapes\n",
    "\n",
    "a = [[1.], [2.], [3.]]\n",
    "b = np.zeros(shape=[10, 1, 1])\n",
    "c = np.ones(shape=[4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((10, 1, 1), (4,))"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b.shape, c.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Actually, `a` is 2D list, not numpy array. But numpy addition can automatically convert list to numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10, 3, 1)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Predict the shape before executing this cell\n",
    "\n",
    "(a + b).shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 4)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Predict the shape before executing this cell\n",
    "\n",
    "(a * c).shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(10, 3, 4)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Predict the shape before executing this cell\n",
    "\n",
    "(a * b + c).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting for univariate TensorFlow Distributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The broadcasting rule for TensorFlow is the same as that for numpy. For example, TensorFlow also allows you to specify the parameters of Distribution objects using broadcasting. \n",
    "\n",
    "What is meant by this can be understood through an example with the univariate normal distribution. Say that we wish to specify a parameter grid for six Gaussians. The parameter combinations to be used, `(loc, scale)`, are:  \n",
    "\n",
    "    (0, 1)  \n",
    "    (0, 10)  \n",
    "    (0, 100)  \n",
    "    (1, 1)  \n",
    "    (1, 10)  \n",
    "    (1, 100)\n",
    "    \n",
    "A laborious way of doing this is to explicitly pass each parameter to `tfd.Normal`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.Normal 'Normal' batch_shape=[6] event_shape=[] dtype=float32>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a batch of Normal distributions without broadcasting\n",
    "batch_of_normals = tfd.Normal(loc=[0., 0., 0., 1., 1., 1.,], scale=[1., 10., 100., 1., 10., 100.])\n",
    "batch_of_normals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(6,), dtype=float32, numpy=array([0., 0., 0., 1., 1., 1.], dtype=float32)>"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the parameter values for loc (mean)\n",
    "batch_of_normals.loc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(6,), dtype=float32, numpy=array([  1.,  10., 100.,   1.,  10., 100.], dtype=float32)>"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the parameter values for scale (std)\n",
    "batch_of_normals.scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A more succinct way to create a batch of distributions for this parameter grid is to use broadcasting.  \n",
    "Consider what would happen if we were to broadcast these arrays according the rule discussed earlier: \n",
    "    \n",
    "    loc = [ [0.],\n",
    "            [1.] ]\n",
    "    scale = [1., 10., 100.]\n",
    "    \n",
    "The shapes would be stretched according to\n",
    "\n",
    "    loc:   2 x 1 ---> 2 x 3\n",
    "    scale: 1 x 3 ---> 2 x 3\n",
    "    \n",
    "resulting in\n",
    "\n",
    "    loc = [ [0., 0., 0.],\n",
    "            [1., 1., 1.] ]\n",
    "    scale = [ [1., 10., 100.],\n",
    "              [1., 10., 100.] ]\n",
    "              \n",
    "which are compatible with the `loc` and `scale` arguments of `tfd.Normal`.  \n",
    "Sure enough, this is precisely what TensorFlow does:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.Normal 'Normal' batch_shape=[2, 3] event_shape=[] dtype=float32>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a batch of Normal distribution with broadcasting\n",
    "\n",
    "loc = [[0.], [1.]] # (2, 1)\n",
    "scale = [1., 10., 100.] # (3, )\n",
    "\n",
    "another_batch_of_normals = tfd.Normal(loc=loc, scale=scale)\n",
    "another_batch_of_normals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 1), dtype=float32, numpy=\n",
       "array([[0.],\n",
       "       [1.]], dtype=float32)>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The stored loc parameter values are what you pass in, not what is used after broadcasting\n",
    "\n",
    "another_batch_of_normals.loc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(3,), dtype=float32, numpy=array([  1.,  10., 100.], dtype=float32)>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The stored scale parameter values are what you pass in, not what is used after broadcasting\n",
    "\n",
    "another_batch_of_normals.scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In summary, TensorFlow broadcasts parameter arrays: it stretches them according to the broadcasting rule, then creates a distribution on an element-by-element basis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Broadcasting with `prob` and `log_prob` methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using `prob` and  `log_prob` with broadcasting, we follow the same principles as before. Let's make a new batch of normals as before but with means which are centered at different locations to help distinguish the results we get."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.Normal 'Normal' batch_shape=[2, 3] event_shape=[] dtype=float32>"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a batch of Normal distributions with broadcasting\n",
    "\n",
    "loc = [[0.], [10.]]\n",
    "scale = [1., 1., 1.]\n",
    "\n",
    "another_batch_of_normals = tfd.Normal(loc=loc, scale=scale)\n",
    "another_batch_of_normals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can feed in samples of any shape as long as it can be broadcast agasint our batch shape for this example. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 1), dtype=float32, numpy=\n",
       "array([[0.30983818],\n",
       "       [0.7367121 ]], dtype=float32)>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting along the second axis with the prob method\n",
    "sample = tf.random.uniform((2, 1))\n",
    "sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[3.8024542e-01, 3.8024542e-01, 3.8024542e-01],\n",
       "       [9.2860834e-20, 9.2860834e-20, 9.2860834e-20]], dtype=float32)>"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "another_batch_of_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or broadcasting along the first axis instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[0.48120987, 0.05952108, 0.13428748]], dtype=float32)>"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting along the first axis with the prob method\n",
    "sample = tf.random.uniform((1, 3))\n",
    "sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[3.5532588e-01, 3.9823624e-01, 3.9536136e-01],\n",
       "       [8.4288889e-21, 1.3928772e-22, 2.9206223e-22]], dtype=float32)>"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "another_batch_of_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or even both axes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.61676073]], dtype=float32)>"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting along both axes with the prob method\n",
    "sample = tf.random.uniform((1, 1))\n",
    "sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[3.2984403e-01, 3.2984403e-01, 3.2984403e-01],\n",
       "       [3.0348733e-20, 3.0348733e-20, 3.0348733e-20]], dtype=float32)>"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "another_batch_of_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`log_prob` works in the exact same way with broadcasting. We can replace `prob` with `log_prob` in any of the previous examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[ -1.2130027 ,  -0.96663547,  -1.0695213 ],\n",
       "       [-43.54405   , -47.878044  , -45.58167   ]], dtype=float32)>"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample = tf.random.uniform((1, 3))\n",
    "another_batch_of_normals.log_prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Broadcasting for multivariate TensorFlow distributions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Broadcasting behaviour for multivariate distributions is only a little more sophisticated than it is for univariate distributions.\n",
    "\n",
    "Recall that `MultivariateNormalDiag` has two parameter arguments: `loc` and `scale_diag`. When specifying a single distribution, these arguments are vectors of the same length:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[] event_shape=[2] dtype=float32>"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a multivariate normal distribution without broadcasting\n",
    "\n",
    "single_mvt_normal = tfd.MultivariateNormalDiag(loc=[0., 0.], scale_diag=[1., 0.5])\n",
    "single_mvt_normal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 0.], dtype=float32)>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print loc parameter\n",
    "single_mvt_normal.loc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 2), dtype=float32, numpy=\n",
       "array([[1.  , 0.  ],\n",
       "       [0.  , 0.25]], dtype=float32)>"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print Covariance matrix\n",
    "single_mvt_normal.covariance()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Covariance Matrix is the diagonal matrix with scale_diag^2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The size of the final axis of the inputs determines the event shape for each distribution in the batch.  This means that if we pass\n",
    "    \n",
    "    loc = [ [0., 0.],\n",
    "            [1., 1.] ]\n",
    "    scale_diag = [1., 0.5]\n",
    "    \n",
    "such that\n",
    "\n",
    "    loc:        2 x 2\n",
    "    scale_diag: 1 x 2\n",
    "                    ^ final dimension is interpreted as event dimension\n",
    "                ^ other dimensions are interpreted as batch dimensions  \n",
    "then a batch of two bivariate normal distributions will be created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a multivariate normal distribution with broadcasting\n",
    "\n",
    "loc = [[0., 0.],\n",
    "       [1., 1.]]\n",
    "scale_diag = [1., 0.5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2] event_shape=[2] dtype=float32>"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "batch_of_mvt_normals = tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale_diag)\n",
    "batch_of_mvt_normals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'loc': ListWrapper([ListWrapper([0.0, 0.0]), ListWrapper([1.0, 1.0])]),\n",
       " 'scale_diag': ListWrapper([1.0, 0.5]),\n",
       " 'scale_identity_multiplier': None,\n",
       " 'validate_args': False,\n",
       " 'allow_nan_stats': True,\n",
       " 'experimental_use_kahan_sum': False,\n",
       " 'name': 'MultivariateNormalDiag'}"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print the distribution paramters\n",
    "# There is a batch of two distributions with different means and same covariance\n",
    "\n",
    "batch_of_mvt_normals.parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Knowing that, for multivariate distributions, TensorFlow \n",
    "\n",
    "- interprets the final axis of an array of parameters as the event shape, \n",
    "\n",
    "\n",
    "- and broadcasts over the remaining axes,  \n",
    "\n",
    "can you predict what the batch and event shapes will if we pass the arguments\n",
    "\n",
    "\n",
    "    loc = [ [ 1.,  1.,  1.],\n",
    "            [-1., -1., -1.] ] # shape (2, 3)\n",
    "    scale_diag = [ [[0.1, 0.1, 0.1]],\n",
    "                   [[10., 10., 10.]] ] # shape (2, 1, 3)\n",
    "                   \n",
    "to `MultivariateNormalDiag`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#collapse-hide\n",
    "\n",
    "Solution:\n",
    "\n",
    "Align the parameter array shapes on their last axis, prepending 1s where necessary:  \n",
    "    \n",
    "           loc: 1 x 2 x 3  \n",
    "    scale_diag: 2 x 1 x 3  \n",
    "\n",
    "The final axis has size 3, so `event_shape = (3)`. The remaining axes are broadcast over to yield  \n",
    "    \n",
    "           loc: 2 x 2 x 3  \n",
    "    scale_diag: 2 x 2 x 3  \n",
    "\n",
    "so `batch_shape = (2, 2)`.\n",
    "\n",
    "Let's see if this is correct!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a multivariate Gaussian distribution with broadcasting\n",
    "\n",
    "loc = [ [ 1.,  1.,  1.],\n",
    "        [-1., -1., -1.] ]  # shape (2, 3)\n",
    "scale_diag = [ [[0.1, 0.1, 0.1]],\n",
    "               [[10., 10., 10.]] ]  # shape (2, 1, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[2, 2] event_shape=[3] dtype=float32>"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "another_batch_of_mvt_normals = tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale_diag)\n",
    "another_batch_of_mvt_normals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'loc': ListWrapper([ListWrapper([1.0, 1.0, 1.0]), ListWrapper([-1.0, -1.0, -1.0])]),\n",
       " 'scale_diag': ListWrapper([ListWrapper([ListWrapper([0.1, 0.1, 0.1])]), ListWrapper([ListWrapper([10.0, 10.0, 10.0])])]),\n",
       " 'scale_identity_multiplier': None,\n",
       " 'validate_args': False,\n",
       " 'allow_nan_stats': True,\n",
       " 'experimental_use_kahan_sum': False,\n",
       " 'name': 'MultivariateNormalDiag'}"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print the distribution parameters\n",
    "\n",
    "another_batch_of_mvt_normals.parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we did before lets also look at broadcasting when we have batches of multivariate distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a batch of Normal distributions with broadcasting\n",
    "\n",
    "loc = [[0.],\n",
    "       [1.],\n",
    "       [0.]]\n",
    "scale = [1., 10., 100., 1., 10, 100.]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.Normal 'Normal' batch_shape=[3, 6] event_shape=[] dtype=float32>"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "another_batch_of_normals = tfd.Normal(loc=loc, scale=scale)\n",
    "another_batch_of_normals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And to refresh our memory of `Independent` we'll use it below to roll the rightmost batch shape into the event shape. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tfp.distributions.Independent 'IndependentNormal' batch_shape=[3] event_shape=[6] dtype=float32>"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a multivariate Independent distribution\n",
    "\n",
    "another_batch_of_mvt_normals = tfd.Independent(another_batch_of_normals)\n",
    "another_batch_of_mvt_normals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, onto the broadcasting: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(3,), dtype=float32, numpy=array([3.6341732e-09, 3.8412056e-09, 2.6659681e-09], dtype=float32)>"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting with the prob method\n",
    "# Batch_size shaped input (broadcast over event)\n",
    "\n",
    "sample = tf.random.uniform((3, 1))\n",
    "another_batch_of_mvt_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(3,), dtype=float32, numpy=array([2.4391147e-09, 2.7616289e-09, 2.4391147e-09], dtype=float32)>"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting with the prob method\n",
    "# Event_shape shaped input (broadcast over batch)\n",
    "\n",
    "sample = tf.random.uniform((1, 6))\n",
    "another_batch_of_mvt_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[2.5010654e-09, 3.3668932e-09, 3.8643830e-09],\n",
       "       [3.0422742e-09, 2.6408655e-09, 3.9074304e-09]], dtype=float32)>"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting with the prob method\n",
    "# [Samples,Batch_size,Events] shaped input (broadcast over samples)\n",
    "\n",
    "sample = tf.random.uniform((2, 3, 6))\n",
    "another_batch_of_mvt_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[3.2072505e-09, 2.8448774e-09, 3.2072505e-09],\n",
       "       [3.9072363e-09, 1.9765971e-09, 3.9072363e-09]], dtype=float32)>"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# [S,b,e] shaped input where [b,e] can be broadcast agaisnt [B,E]\n",
    "sample = tf.random.uniform((2, 1, 6))\n",
    "another_batch_of_mvt_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a final example with `log_prob` instead of `prob`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: shape=(2, 3), dtype=float32, numpy=\n",
       "array([[3.5190979e-09, 2.3076914e-09, 3.0966967e-09],\n",
       "       [1.6677665e-09, 2.7270144e-09, 3.7685051e-09]], dtype=float32)>"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Use broadcasting with the log_prob method\n",
    "# [S,b,e] shaped input where [b,e] can be broadcast agaisnt [B,E]\n",
    "\n",
    "sample = tf.random.uniform((2, 3, 1))\n",
    "another_batch_of_mvt_normals.prob(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should now feel confident specifying batches of distributions using broadcasting. As you may have already guessed, broadcasting is especially useful when specifying grids of hyperparameters. \n",
    "\n",
    "If you don't feel entirely comfortable with broadcasting quite yet, don't worry: re-read this notebook, go through the further reading provided below, and experiment with broadcasting in both numpy and TensorFlow, and you'll be broadcasting in no time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further reading and resources\n",
    "* Numpy documentation on broadcasting: https://numpy.org/devdocs/user/theory.broadcasting.html\n",
    "* https://www.tensorflow.org/xla/broadcasting"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}