{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Thorough Introduction to Boltzmann Machines\n", "\n", "The principal task of machine learning is to fit a model to some data. Thinking on the level of APIs, a model is an object with two methods:\n", "\n", "```python\n", "class Model:\n", "\n", "    def likelihood(self, x):\n", "        pass\n", "\n", "    def sample(self, n_samples):\n", "        pass\n", "```\n", "\n", "## Likelihood\n", "\n", "How likely are the query point(s) $x$ under our model? In other words, how likely was it that our model produced $x$?\n", "\n", "The likelihood gives a value proportional to a valid probability, but is not necessarily a valid probability itself.\n", "\n", "(Finally, the likelihood is often used as an umbrella term for, or interchangeably with, the probability density. The mainstream machine learning community would do well to agree to the use of one of these terms, and to sunset the other; while their definitions may differ slightly, the confusion brought about by their shared use sharply outweighs the pedagogical purity maintained by keeping them distinct.)\n", "\n", "## Sample\n", "\n", "Draw samples from the model.\n", "\n", "## Denotation\n", "\n", "Canonically, we denote an instance of our `Model` in mathematical syntax as follows:\n", "\n", "$$\n", "x \\sim p(x)\n", "$$\n", "\n", "Again, this simple denotation implies two methods: that we can evaluate the likelihood of having observed $x$ under our model $p$, and that we can sample a new value $x$ from our model $p$.\n", "\n", "Often, we work with *conditional* models, such as $y \\sim p(y\\vert x)$, in classification and regression tasks. The same two implicit methods apply.\n", "\n", "## Boltzmann machines\n", "\n", "A Boltzmann machine is one of the simplest mechanisms for modeling $p(x)$. It is an undirected graphical model where every dimension $x_i$ of a given observation $x$ influences every other dimension. 
**As such, we might use it to model data which we believe to exhibit this property, e.g. an image.** For $x \\in R^3$, our model would look as follows:\n", "\n", "![](figures/boltzmann-machine.svg)\n", "\n", "For $x \\in R^n$, a given node $x_i$ would have $n - 1$ outgoing connections in total, one to each of the other nodes $x_j$ for $j \\neq i$.\n", "\n", "Finally, a Boltzmann machine strictly operates on *binary* data. This keeps things simple.\n", "\n", "## Computing the likelihood\n", "\n", "A Boltzmann machine admits the following formula for computing the likelihood of data points $x^{(1)}, ..., x^{(n)}$:\n", "\n", "$$\n", "H(x) = \\sum\\limits_{i \\neq j} w_{i, j} x_i x_j + \\sum\\limits_i b_i x_i\n", "$$\n", "\n", "$$\n", "p(x) = \\frac{\\exp{(H(x))}}{Z}\n", "$$\n", "\n", "$$\n", "\\mathcal{L}(x^{(1)}, ..., x^{(n)}) = \\prod\\limits_{i=1}^n p(x^{(i)})\n", "$$\n", "\n", "Note:\n", "\n", "- Since our weights can be negative, $H(x)$ can be negative. A likelihood, being an optionally-normalized probability, must be non-negative.\n", "- To enforce this constraint, we exponentiate $H(x)$ in the second equation.\n", "- To normalize, we divide by the normalization constant $Z$ (also known as the partition function), i.e. the sum of the unnormalized likelihoods of all possible values of $x^{(1)}, ..., x^{(n)}$.\n", "\n", "## Computing the partition function, with examples\n", "\n", "In the case of 2-dimensional binary $x$, the only possible \"configurations\" of $x$ are: $[0, 0], [0, 1], [1, 0], [1, 1]$, i.e. 4 distinct values. This means that in evaluating the likelihood of one datum $x$, the normalization constant $Z$ would be a sum of 4 terms.\n", "\n", "Now, with two data points $x^{(1)}$ and $x^{(2)}$, there are 16 possible \"configurations\":\n", "\n", "1. $x^{(1)} = [0, 0]$, $x^{(2)} = [0, 0]$\n", "2. $x^{(1)} = [0, 0]$, $x^{(2)} = [0, 1]$\n", "3. $x^{(1)} = [0, 0]$, $x^{(2)} = [1, 0]$\n", "4. $x^{(1)} = [0, 0]$, $x^{(2)} = [1, 1]$\n", "5. $x^{(1)} = [0, 1]$, $x^{(2)} = [0, 0]$\n", "6. 
$x^{(1)} = [0, 1]$, $x^{(2)} = [0, 1]$\n", "7. Etc.\n", "\n", "This means that in evaluating the likelihood $\\mathcal{L}(x^{(1)}, x^{(2)})$, the normalization constant $Z$ would be a sum of 16 terms.\n", "\n", "More generally, given $d$-dimensional $x$, where each $x_i$ can assume one of $v$ distinct values, and $n$ data points $x^{(1)}, ..., x^{(n)}$, the normalization constant $Z$ in the likelihood $\\mathcal{L}(x^{(1)}, ..., x^{(n)})$ would be a sum of $(v^d)^n$ terms. **With a non-trivially large $v$ or $d$ (in the discrete case), or a non-trivially large $k$ in the continuous case, this becomes intractable to compute.**\n", "\n", "In the case of a Boltzmann machine, $v = 2$, which is not large. Below, we will vary $d$ and examine its impact on the tractability (in terms of, \"can we actually compute $Z$ before the end of the universe?\") of inference.\n", "\n", "## The likelihood function in code\n", "\n", "In code, the likelihood function looks as follows:\n", "\n", "```python\n", "def _unnormalized_likelihood(self, x):\n", "    return np.exp(self._H(x))\n", "\n", "def _H(self, x):\n", "    h = 0\n", "    for i, j in self.var_combinations:\n", "        h += self.weights[i, j] * x[i] * x[j]\n", "    h += self.biases @ x\n", "    return h\n", "\n", "def likelihood(self, x, log=False):\n", "    \"\"\"\n", "    :param x: a vector of shape (n_units,) or (n, n_units),\n", "        where the latter is a matrix of multiple data points\n", "        for which to compute the joint likelihood.\n", "    \"\"\"\n", "    x = np.array(x)\n", "    if not (self.n_units in x.shape and len(x.shape) in (1, 2)):\n", "        raise ValueError('Please pass 1 or more points of `n_units` dimensions')\n", "\n", "    # compute unnormalized likelihoods\n", "    multiple_samples = len(x.shape) == 2\n", "    if multiple_samples:\n", "        likelihood = [self._unnormalized_likelihood(point) for point in x]\n", "    else:\n", "        likelihood = [self._unnormalized_likelihood(x)]\n", "\n", "    # compute partition function\n", "    Z = sum([self._unnormalized_likelihood(config) for config in self.all_configs])\n", "\n", "    if log:\n", "        return sum([np.log(lik) - np.log(Z) for lik in likelihood])\n", "    else:\n", "        return reduce(np.multiply, [lik / Z for lik in likelihood])\n", "```\n", "\n", "This code block is longer than you might expect because it includes a few supplementary behaviors, namely:\n", "\n", "- Computing the likelihood of one or more points\n", "- Avoiding redundant computation of `Z`\n", "- Optionally computing the log-likelihood\n", "\n", "Above all, note that the likelihood is a function of the model's parameters, i.e. `self.weights` and `self.biases`, which we can vary, and the data `x`, which we can't.\n", "\n", "## Training the model\n", "\n", "At the outset, the parameters `self.weights` and `self.biases` of our model are initialized at random. For the values returned by `likelihood` and `sample` to be useful, we must first update these parameters by fitting this model to observed data.\n", "\n", "To do so, we will employ the principle of maximum likelihood: compute the parameters that make the observed data maximally likely under the model, via gradient ascent.\n", "\n", "## Gradients\n", "\n", "Since our model is simple, we can derive exact gradients by hand. We will work with the log-likelihood instead of the true likelihood to avoid issues of computational underflow. Below, we simplify this expression, then compute its various gradients.\n", "\n", "### $\\log{\\mathcal{L}}$\n", "\n", "$$\n", "\\mathcal{L}(x^{(1)}, ..., x^{(n)}) = \\prod\\limits_{k=1}^n \\frac{\\exp{(H(x^{(k)}))}}{Z}\n", "$$\n", "\n", "\n", "\\begin{align*}\n", "\\log{\\mathcal{L}(x^{(1)}, ..., x^{(n)})} \n", "&= \\sum\\limits_{k=1}^n \\log{\\frac{\\exp{(H(x^{(k)}))}}{Z}}\\\\\n", "&= \\sum\\limits_{k=1}^n \\Big[ \\log{\\big(\\exp{(H(x^{(k)}))}\\big)} - \\log{Z} \\Big]\\\\\n", "&= \\sum\\limits_{k=1}^n \\Big[ H(x^{(k)}) - \\log{Z} \\Big]\n", "\\end{align*}\n", "\n", "\n", "This gives the total log-likelihood. 
Our aim is to maximize the expected likelihood with respect to the data generating distribution.\n", "\n", "### Expected likelihood\n", "\n", "\n", "\\begin{align*}\n", "\\mathop{\\mathbb{E}}_{x \\sim p_{\\text{data}}}\\big[ \\mathcal{L}(x) \\big]\n", "&= \\sum\\limits_{k=1}^N p_{\\text{data}}(x = x^{(k)}) \\mathcal{L}(x^{(k)})\\\\\n", "&= \\sum\\limits_{k=1}^N \\frac{1}{N} \\mathcal{L}(x^{(k)})\\\\\n", "&= \\frac{1}{N} \\sum\\limits_{k=1}^N \\mathcal{L}(x^{(k)})\n", "\\end{align*}\n", "\n", "\n", "In other words, the average over the $N$ training points (the number of data points, written $n$ above). Applied to the log-likelihood, we will continue to denote this as $\\mathcal{L}$, i.e. $\\mathcal{L} = \\frac{1}{N} \\sum\\limits_{k=1}^N H(x^{(k)}) - \\log{Z}$.\n", "\n", "Now, deriving the gradient with respect to our weights:\n", "\n", "### $\\nabla_{w_{i, j}}\\log{\\mathcal{L}}$\n", "\n", "\n", "\\begin{align*}\n", "\\nabla_{w_{i, j}} \\Big[ \\frac{1}{N} \\sum\\limits_{k=1}^N H(x^{(k)}) - \\log{Z} \\Big]\n", "&= \\frac{1}{N} \\sum\\limits_{k=1}^N \\nabla_{w_{i, j}} H(x^{(k)}) - \\frac{1}{N} \\sum\\limits_{k=1}^N \\nabla_{w_{i, j}} \\log{Z}\n", "\\end{align*}\n", "\n", "\n", "### First term\n", "\n", "\n", "\\begin{align*}\n", "\\frac{1}{N} \\sum\\limits_{k=1}^N \\nabla_{w_{i, j}} H(x^{(k)})\n", "&= \\frac{1}{N} \\sum\\limits_{k=1}^N \\nabla_{w_{i, j}} \\Big[ \\sum\\limits_{i \\neq j} w_{i, j} x_i^{(k)} x_j^{(k)} + \\sum\\limits_i b_i x_i^{(k)} \\Big]\\\\\n", "&= \\frac{1}{N} \\sum\\limits_{k=1}^N x_i^{(k)} x_j^{(k)}\\\\\n", "&= \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{data}}} [x_i x_j]\n", "\\end{align*}\n", "\n", "\n", "### Second term\n", "\n", "NB: $\\sum\\limits_{\\mathcal{x}}$ implies a summation over all $v^d$ possible configurations of values that a single point $x$ can assume, since $Z$ here normalizes $p(x)$ for one point.\n", "\n", "\n", "\\begin{align*}\n", "\\nabla_{w_{i, j}} \\log{Z}\n", "&= \\nabla_{w_{i, j}} \\log{\\sum\\limits_{\\mathcal{x}} \\exp{(H(x))}}\\\\\n", "&= \\frac{1}{\\sum\\limits_{\\mathcal{x}} \\exp{(H(x))}} \\nabla_{w_{i, j}} \\sum\\limits_{\\mathcal{x}} \\exp{(H(x))}\\\\\n", "&= \\frac{1}{Z} 
\\nabla_{w_{i, j}} \\sum\\limits_{\\mathcal{x}} \\exp{(H(x))}\\\\\n", "&= \\frac{1}{Z} \\sum\\limits_{\\mathcal{x}} \\exp{(H(x))} \\nabla_{w_{i, j}} H(x)\\\\\n", "&= \\sum\\limits_{\\mathcal{x}} \\frac{\\exp{(H(x))}}{Z} \\nabla_{w_{i, j}} H(x)\\\\\n", "&= \\sum\\limits_{\\mathcal{x}} p(x) \\nabla_{w_{i, j}} H(x)\\\\\n", "&= \\sum\\limits_{\\mathcal{x}} p(x) [x_i x_j]\\\\\n", "&= \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{model}}} [x_i x_j]\n", "\\end{align*}\n", "\n", "\n", "### Putting it back together\n", "\n", "Combining these constituent parts, we arrive at the following formula:\n", "\n", "$$\n", "\\nabla_{w_{i, j}}\\log{\\mathcal{L}} = \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{data}}} [x_i x_j] - \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{model}}} [x_i x_j]\n", "$$\n", "\n", "Finally, following the same logic, we derive the exact gradient with respect to our biases:\n", "\n", "$$\n", "\\nabla_{b_i}\\log{\\mathcal{L}} = \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{data}}} [x_i] - \\mathop{\\mathbb{E}}_{x \\sim p_{\\text{model}}} [x_i]\n", "$$\n", "\n", "The first and second terms of each gradient are called, respectively, **the positive and negative phases.**\n", "\n", "## Computing the positive phase\n", "\n", "In the following toy example, our data are small: we can compute the positive phase using all of the training data, i.e. $\\frac{1}{N} \\sum\\limits_{k=1}^N x_i^{(k)} x_j^{(k)}$. Were our data bigger, we could approximate this expectation with a mini-batch of training data, as we do in SGD.\n", "\n", "## Computing the negative phase\n", "\n", "Again, this term asks us to compute the likelihood of every possible data configuration in the support of our model, then sum the likelihood-weighted products $x_i x_j$, which is $O(v^d)$. **With non-trivially large $v$ or $d$, this becomes intractable to compute.**\n", "\n", "Below, we'll begin our toy example computing the true negative-phase, $\\mathop{\\mathbb{E}}_{x \\sim p_{\\text{model}}} [x_i x_j]$, with varying data-dimensionalities $d$. 
Then, once this computation becomes slow, we'll look to approximate this expectation.\n", "\n", "## Parameter updates in code\n", "\n", "```python\n", "def update_parameters_with_true_negative_phase(weights, biases, var_combinations, all_configs, data, alpha=alpha):\n", "    model = Model(weights, biases, var_combinations, all_configs)\n", "    model_distribution = [(np.array(config), model.likelihood(config)) for config in all_configs]\n", "\n", "    for i, j in var_combinations:\n", "        # positive phase\n", "        positive_phase = (data[:, i] * data[:, j]).mean()\n", "\n", "        # negative phase\n", "        negative_phase = sum([config[i] * config[j] * likelihood for config, likelihood in model_distribution])\n", "\n", "        # update weights\n", "        weights[i, j] += alpha * (positive_phase - negative_phase)\n", "\n", "    for i, _ in enumerate(biases):\n", "        # positive phase\n", "        positive_phase = data[:, i].mean()\n", "\n", "        # negative phase\n", "        negative_phase = sum([config[i] * likelihood for config, likelihood in model_distribution])\n", "\n", "        # update biases\n", "        biases[i] += alpha * (positive_phase - negative_phase)\n", "\n", "    return np.array(weights), np.array(biases)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train model, visualize model distribution\n", "\n", "Finally, we're ready to train. Using the true negative phase, let's train our model for 100 epochs with $d=3$ then visualize results." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n", "from functools import reduce\n", "from itertools import product, combinations\n", "from time import time\n", "\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "seed = 42\n", "alpha = .01\n", "\n", "\n", "def reset_data_and_parameters(n_units=3, n_obs=100, p=[.8, .1, .5]):\n", "    \"\"\"\n", "    Generate training data, weights, biases, and a list of all data configurations\n", "    in our model's support.\n", "\n", "    In addition, generate a list of tuples of the indices of adjacent nodes, which\n", "    we'll use to update parameters without duplication.\n", "\n", "    For example, with `n_units=3`, we generate a matrix of weights with shape (3, 3);\n", "    however, there are only 3 distinct weights in this matrix that we'll actually\n", "    want to update: those connecting Node 0 --> Node 1, Node 1 --> Node 2, and\n", "    Node 0 --> Node 2. This function returns a list containing these tuples\n", "    named `var_combinations`.\n", "\n", "    :param n_units: the dimensionality of our data `d`\n", "    :param n_obs: the number of observations in our training set\n", "    :param p: a vector of the probabilities of observing a 1 in each index\n", "        of the training data. The length of this vector must equal `n_units`\n", "\n", "    :return: weights, biases, var_combinations, all_configs, data\n", "    \"\"\"\n", "    np.random.seed(seed)\n", "\n", "    # initialize data\n", "    data = np.random.binomial(n=1, p=p, size=(n_obs, n_units))\n", "\n", "    # initialize parameters\n", "    biases = np.random.randn(n_units)\n", "    weights = np.random.randn(n_units, n_units)\n", "\n", "    # a few other pieces we'll need\n", "    var_combinations = list(combinations(range(n_units), 2))\n", "    all_configs = list(product([0, 1], repeat=n_units))\n", "\n", "    return weights, biases, var_combinations, all_configs, data\n", "\n", "\n", "class Model:\n", "\n", "    def __init__(self, weights, biases, var_combinations, all_configs):\n", "        self.weights = weights\n", "        self.biases = biases\n", "        self.var_combinations = var_combinations\n", "        self.all_configs = all_configs\n", "        self.n_units = len(self.biases)\n", "\n", "    @staticmethod\n", "    def _inv_logit(z):\n", "        return 1 / (1 + np.exp(-z))\n", "\n", "    def _unnormalized_likelihood(self, x):\n", "        return np.exp(self._H(x))\n", "\n", "    def _H(self, x):\n", "        h = 0\n", "        for i, j in self.var_combinations:\n", "            h += self.weights[i, j] * x[i] * x[j]\n", "        h += self.biases @ x\n", "        return h\n", "\n", "    def likelihood(self, x, log=False):\n", "        \"\"\"\n", "        :param x: a vector of shape (n_units,) or (n, n_units),\n", "            where the latter is a matrix of multiple data points\n", "            for which to compute the joint likelihood.\n", "        \"\"\"\n", "        x = np.array(x)\n", "        if not (self.n_units in x.shape and len(x.shape) in (1, 2)):\n", "            raise ValueError('Please pass 1 or more points of `n_units` dimensions')\n", "\n", "        # compute unnormalized likelihoods\n", "        multiple_samples = len(x.shape) == 2\n", "        if multiple_samples:\n", "            likelihood = [self._unnormalized_likelihood(point) for point in x]\n", "        else:\n", "            likelihood = [self._unnormalized_likelihood(x)]\n", "\n", "        # compute partition function\n", "        Z = sum([self._unnormalized_likelihood(config) for config in self.all_configs])\n", "\n", "        if log:\n", "            return sum([np.log(lik) - np.log(Z) for lik in likelihood])\n", "        else:\n", "            return reduce(np.multiply, [lik / Z for lik in likelihood])\n", "\n", "    def sample(self, n_samples=100, init_sample=None, burn_in=25, every_n=10, seed=seed) -> np.ndarray:\n", "\n", "        np.random.seed(seed)\n", "\n", "        if burn_in > n_samples:\n", "            raise ValueError(\"Can't burn in for more samples than there are in the chain\")\n", "\n", "        if init_sample is None:\n", "            init_sample = [0 for _ in self.biases]\n", "        samples = [init_sample]\n", "\n", "        def _gibbs_step(sample, i):\n", "            # only the upper triangle (i < j) of `weights` enters `_H` and is trained,\n", "            # so look up each pairwise weight symmetrically\n", "            z = sum([self.weights[min(i, j), max(i, j)] * sample[j] for j in range(len(sample)) if j != i]) + self.biases[i]\n", "            p = self._inv_logit(z)\n", "            return np.random.binomial(n=1, p=p)\n", "\n", "        for _ in range(n_samples):\n", "            sample = list(samples[-1])  # make copy\n", "            for i, _ in enumerate(sample):\n", "                sample[i] = _gibbs_step(sample=sample, i=i)\n", "            samples.append(sample)\n", "\n", "        return np.array([sample for i, sample in enumerate(samples[burn_in:]) if i % every_n == 0])\n", "\n", "    def conditional_likelihood(self, x, cond: dict):\n", "        joint = np.array(x)\n", "        for index, val in cond.items():\n", "            if isinstance(joint[index], int):\n", "                raise ValueError('x must not already specify a value at a conditioned index')\n", "            joint[index] = val\n", "\n", "        evidence = [cond.get(i, ...) for i in range(len(x))]\n", "\n", "        return self._unnormalized_likelihood(joint) / self.marginal_likelihood(evidence)\n", "\n", "    def marginal_likelihood(self, x):\n", "        \"\"\"\n", "        To marginalize, put ellipses (...) in the elements over\n", "        which you wish to marginalize.\n", "        \"\"\"\n", "        unnormalized_lik = 0\n", "        for config in product(*[[0, 1] if el == ... else [el] for el in x]):\n", "            config = np.array(config)\n", "            unnormalized_lik += np.exp(self._H(config))\n", "        return unnormalized_lik\n", "\n", "\n", "def update_parameters_with_true_negative_phase(weights, biases, var_combinations, all_configs, data, alpha=alpha, **kwargs):\n", "    model = Model(weights, biases, var_combinations, all_configs)\n", "    model_distribution = [(np.array(config), model.likelihood(config)) for config in all_configs]\n", "\n", "    for i, j in var_combinations:\n", "        # positive phase\n", "        positive_phase = (data[:, i] * data[:, j]).mean()\n", "\n", "        # negative phase\n", "        negative_phase = sum([config[i] * config[j] * likelihood for config, likelihood in model_distribution])\n", "\n", "        # update weights\n", "        weights[i, j] += alpha * (positive_phase - negative_phase)\n", "\n", "    for i, _ in enumerate(biases):\n", "        # positive phase\n", "        positive_phase = data[:, i].mean()\n", "\n", "        # negative phase\n", "        negative_phase = sum([config[i] * likelihood for config, likelihood in model_distribution])\n", "\n", "        # update biases\n", "        biases[i] += alpha * (positive_phase - negative_phase)\n", "\n", "    return np.array(weights), np.array(biases)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch:  0 | Likelihood: -209.63758306786653\n", "Epoch: 10 | Likelihood: -162.04280784271083\n", "Epoch: 20 | Likelihood: -160.49961381649555\n", "Epoch: 30 | Likelihood: -159.79539070373576\n", "Epoch: 40 | Likelihood: -159.2853717231018\n", "Epoch: 50 | Likelihood: -158.90186293631422\n", "Epoch: 60 | Likelihood: -158.6084020645482\n", "Epoch: 70 | Likelihood: -158.38094343579155\n", "Epoch: 80 | Likelihood: -158.20287017780586\n", "Epoch: 90 | Likelihood: -158.06232196551673\n" ] } ], "source": [ "weights, biases, var_combinations, all_configs, data = reset_data_and_parameters(n_units=3, p=[.8, .1, .5])\n", "\n", "\n", "for i in range(100):\n", "    weights, biases = update_parameters_with_true_negative_phase(weights, biases, var_combinations, all_configs, data, alpha=1)\n", "\n", "    lik = Model(weights, biases, var_combinations, all_configs).likelihood(data, log=True)\n", "    if i % 10 == 0:\n", "        print(f'Epoch: {i:2} | Likelihood: {lik}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualize samples" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def plot_n_samples(n_samples, weights, biases):\n", "    \"\"\"\n", "    NB: We add some jitter to the points so as to better visualize density in a given corner of the model.\n", "    \"\"\"\n", "    fig = plt.figure(figsize=(12, 9))\n", "    ax = fig.add_subplot(111, projection='3d')\n", "\n", "    samples = Model(weights, biases, var_combinations, all_configs).sample(n_samples)\n", "    x, y, z = zip(*np.array(samples))\n", "\n", "    x += np.random.randn(len(x)) * .05\n", "    y += np.random.randn(len(y)) * .05\n", "    z += np.random.randn(len(z)) * .05\n", "\n", "    ax.scatter(x, y, z)\n", "    ax.set_xlabel('Node 0')\n", "    ax.set_ylabel('Node 1')\n", "    ax.set_zlabel('Node 2')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, 