{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "596ff08c-400d-4573-912b-9ecfe361d9fd",
   "metadata": {},
   "source": [
    "# Notebook 7: Final [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mattsankner/micrograd/blob/main/mg7_final.ipynb) [![View in nbviewer](https://img.shields.io/badge/view-nbviewer-orange)](https://nbviewer.jupyter.org/github/mattsankner/micrograd/blob/main/mg7_final.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a2bcf2b-70f4-4753-90c3-3160ed9f01d0",
   "metadata": {},
   "source": [
    "# **Let's build the final micrograd.**\n",
    "\n",
    "### **Run the code, note the common error listed, and view the summary. Keep running forward, backward, and SGD to minimize the loss!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "3dadf990-368f-458c-96ad-744093cfbe5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "8a1e4e7f-ad29-4d7d-9705-0361385a6a25",
   "metadata": {},
   "outputs": [],
   "source": [
    "from graphviz import Digraph\n",
    "\n",
    "def trace(root):\n",
    "  # builds a set of all nodes and edges in a graph\n",
    "  nodes, edges = set(), set()\n",
    "  def build(v):\n",
    "    if v not in nodes:\n",
    "      nodes.add(v)\n",
    "      for child in v._prev:\n",
    "        edges.add((child, v))\n",
    "        build(child)\n",
    "  build(root)\n",
    "  return nodes, edges\n",
    "\n",
    "def draw_dot(root):\n",
    "  dot = Digraph(format='svg', graph_attr={'rankdir': 'LR'}) # LR = left to right\n",
    "  \n",
    "  nodes, edges = trace(root)\n",
    "  for n in nodes:\n",
    "    uid = str(id(n))\n",
    "    # for any value in the graph, create a rectangular ('record') node for it\n",
    "    dot.node(name = uid, label = \"{ %s | data %.4f | grad %.4f }\" % (n.label, n.data, n.grad), shape='record')\n",
    "    if n._op:\n",
    "      # if this value is a result of some operation, create an op node for it\n",
    "      dot.node(name = uid + n._op, label = n._op)\n",
    "      # and connect this node to it\n",
    "      dot.edge(uid + n._op, uid)\n",
    "\n",
    "  for n1, n2 in edges:\n",
    "    # connect n1 to the op node of n2\n",
    "    dot.edge(str(id(n1)), str(id(n2)) + n2._op)\n",
    "\n",
    "  return dot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "19489dcd-2c0b-4ea8-8f29-3dc3cce9e10c",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Value:\n",
    "    def __init__(self, data, _children=(), _op='', label=''):\n",
    "        self.data = data\n",
    "        self.grad = 0.0\n",
    "        self._backward = lambda: None\n",
    "        self._prev = set(_children)\n",
    "        self._op = _op\n",
    "        self.label = label\n",
    "\n",
    "    def __repr__(self):\n",
    "        return f\"Value(data={self.data})\"\n",
    "\n",
    "    def __add__(self, other):\n",
    "        other = other if isinstance(other, Value) else Value(other)\n",
    "        out = Value(self.data + other.data, (self, other), '+')\n",
    "\n",
    "        def _backward():\n",
    "            self.grad += 1.0 * out.grad\n",
    "            other.grad += 1.0 * out.grad\n",
    "        out._backward = _backward\n",
    "\n",
    "        return out\n",
    "\n",
    "    def __mul__(self, other):\n",
    "        other = other if isinstance(other, Value) else Value(other)\n",
    "        out = Value(self.data * other.data, (self, other), '*')\n",
    "\n",
    "        def _backward():\n",
    "            self.grad += other.data * out.grad\n",
    "            other.grad += self.data * out.grad\n",
    "        out._backward = _backward\n",
    "\n",
    "        return out\n",
    "\n",
    "    def __neg__(self):\n",
    "        return self * -1\n",
    "\n",
    "    def __sub__(self, other):\n",
    "        return self + (-other)\n",
    "\n",
    "    def __pow__(self, other):\n",
    "        assert isinstance(other, (int, float)), \"only supporting int/float powers for now\"\n",
    "        out = Value(self.data ** other, (self,), f'**{other}')\n",
    "\n",
    "        def _backward():\n",
    "            self.grad += other * self.data ** (other - 1) * out.grad\n",
    "        out._backward = _backward\n",
    "        return out\n",
    "\n",
    "    def __rmul__(self, other):\n",
    "        return self * other\n",
    "\n",
    "    def __truediv__(self, other):\n",
    "        return self * other ** -1\n",
    "\n",
    "    def tanh(self):\n",
    "        x = self.data\n",
    "        t = (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)\n",
    "        out = Value(t, (self,), 'tanh')\n",
    "\n",
    "        def _backward():\n",
    "            self.grad += (1 - t ** 2) * out.grad\n",
    "        out._backward = _backward\n",
    "\n",
    "        return out\n",
    "\n",
    "    def exp(self):\n",
    "        x = self.data\n",
    "        out = Value(math.exp(x), (self,), 'exp')\n",
    "\n",
    "        def _backward():\n",
    "            self.grad += out.data * out.grad\n",
    "        out._backward = _backward\n",
    "        return out\n",
    "\n",
    "    def backward(self):\n",
    "        topo = []\n",
    "        visited = set()\n",
    "        def build_topo(v):\n",
    "            if v not in visited:\n",
    "                visited.add(v)\n",
    "                for child in v._prev:\n",
    "                    build_topo(child)\n",
    "                topo.append(v)\n",
    "        build_topo(self)\n",
    "\n",
    "        self.grad = 1.0\n",
    "        for node in reversed(topo):\n",
    "            node._backward()\n",
    "\n",
    "import random\n",
    "class Neuron:\n",
    "    def __init__(self, nin):\n",
    "        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]\n",
    "        self.b = Value(random.uniform(-1, 1))\n",
    "\n",
    "    def __call__(self, x):\n",
    "        act = sum((wi * xi for wi, xi in zip(self.w, x)), self.b)\n",
    "        out = act.tanh()\n",
    "        return out\n",
    "    def parameters(self): #calling it this way because pytorch has params on every single nn module. Does same thing. \n",
    "        #Returns the param tensors.\n",
    "        #for us, return the param scalars\n",
    "        return self.w + [self.b]\n",
    "\n",
    "class Layer:\n",
    "    def __init__(self, nin, nout):\n",
    "        self.neurons = [Neuron(nin) for _ in range(nout)]\n",
    "\n",
    "    def __call__(self, x):\n",
    "        outs = [n(x) for n in self.neurons]\n",
    "        return outs[0] if len(outs) == 1 else outs\n",
    "        \n",
    "    def parameters(self):\n",
    "        params = []\n",
    "        for neuron in self.neurons:\n",
    "            ps = neuron.parameters()\n",
    "            params.extend(ps)\n",
    "        return params\n",
    "        # can also write it as return [p for neuron in self.neurons for p in neuron.parameters()\n",
    "\n",
    "class MLP:\n",
    "    def __init__(self, nin, nouts):\n",
    "        sz = [nin] + nouts\n",
    "        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]\n",
    "\n",
    "    def __call__(self, x):\n",
    "        for layer in self.layers:\n",
    "            x = layer(x)\n",
    "        return x\n",
    "    def parameters(self):\n",
    "        return [p for layer in self.layers for p in layer.parameters()]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "1a58cd1d-235d-4f4e-a446-f07ce2325f29",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Value(data=0.9144691472967574)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#let's reinitialize the neural net from scratch\n",
    "x = [2.0,3.0,-1.0]\n",
    "n = MLP(3,[4,4,1])\n",
    "n(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "dfac7fa9-5723-4767-8239-5b0e50543bd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "xs = [\n",
    "    [2.0, 3.0, -1.0],\n",
    "    [3.0, -1.0, 0.5],\n",
    "    [0.5, 1.0, 1.0],\n",
    "    [1.0, 1.0, -1.0],\n",
    "]\n",
    "ys = [1.0, -1.0, -1.0, 1.0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfb00a95-aeeb-41b3-85b5-89f23b115338",
   "metadata": {},
   "source": [
    "## **Common mistake:**\n",
    "\n",
    "**Forgetting to zero_grad() before .backward().**\n",
    "\n",
    "  Everytime we go backward, we now reset the gradients first below, because we're using ```+=``` the gradient in the backpropogation. The weight includes the ```.data``` and ```.grad``` (which starts at $0$).\n",
    "  \n",
    "Before, when we were doing the backpropogation, and updating the gradients, we were not flushing or cancelling the previous gradient, so they were adding, or ```+=``` onto each other. You need to flush the ```.grad``` and reset it to $0$, or else the gradient will accumulate quickly and give you huge step sizes.\n",
    "  \n",
    "In the first few passes in previous notebooks, ours actually worked despite the bugs in the code. It's important to pay attention to these and know that sometimes the code will work but not be correct!\n",
    "\n",
    "In complex problems, we won't get away with optimize the loss step very well with the above bug. Below, we fix it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "7550b892-52c8-4e28-a332-15ab084f8c63",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 0.00021680770832094827\n",
      "1 0.0002166101690439338\n",
      "2 0.00021641297626310104\n",
      "3 0.00021621612908318135\n",
      "4 0.00021601962661196804\n",
      "5 0.00021582346796028453\n",
      "6 0.00021562765224198208\n",
      "7 0.00021543217857393666\n",
      "8 0.00021523704607599964\n",
      "9 0.00021504225387103675\n",
      "10 0.00021484780108487405\n",
      "11 0.00021465368684630507\n",
      "12 0.00021445991028707823\n",
      "13 0.00021426647054186964\n",
      "14 0.0002140733667482915\n"
     ]
    }
   ],
   "source": [
    "for k in range(15):\n",
    "    \n",
    "    #forward pass\n",
    "    ypred = [n(x) for x in xs]\n",
    "    ys_value = [Value(y) for y in ys]\n",
    "    loss = sum([(yout - ygt) ** 2 for ygt, yout in zip(ys_value, ypred)], Value(0.0))\n",
    "\n",
    "    #backward pass\n",
    "    for p in n.parameters():\n",
    "        p.grad = 0.0 #flushing .grad to 0, this will result in a slower descent (so I will increase step size)\n",
    "    loss.backward()\n",
    "    \n",
    "    #update with gradient descent\n",
    "    for p in n.parameters():\n",
    "        p.data += -.10 * p.grad #data goes up\n",
    "\n",
    "    print(k, loss.data) #print step and loss\n",
    "    \n",
    "#notice how the loss gets extremely low and ypred gets extremely close to targets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "ae45b4ee-8b29-4d75-8740-0317e72bbe68",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Value(data=0.9921185717777462),\n",
       " Value(data=-0.9923511277429776),\n",
       " Value(data=-0.9923635441111361),\n",
       " Value(data=0.9938300746501169)]"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ypred"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85afb579-e4b5-4984-a51e-814d926b6a8b",
   "metadata": {},
   "source": [
    "## **Summary of What We've Learned:**\n",
    "\n",
    "### What are Neural Nets:\n",
    "Neural networks are mathematical expressions, very simple in the case of Multi-Layer Perceptrons (MLPs). They take input data, weights, and parameters of the neural network to perform computations.\n",
    "\n",
    "### Mathematical Expressions for Forward Pass:\n",
    "This is followed by the loss function, which tries to measure the accuracy of the predictions. Usually, the loss is low when predictions match targets or when the network is behaving well.\n",
    "\n",
    "### Manipulating the Loss Function:\n",
    "The goal is to adjust the loss function so that when the loss is low, the network performs as desired on the given problem.\n",
    "\n",
    "### Backward Pass and Gradient Descent:\n",
    "We then perform the backward pass to calculate the gradients using backpropagation. These gradients are used to tune all the parameters to decrease the loss locally. This process is iterated many times, which is known as gradient descent.\n",
    "\n",
    "### Minimizing Loss:\n",
    "By following the gradient information, we minimize the loss. When the loss is minimized, the network behaves as intended.\n",
    "\n",
    "### Capabilities of Neural Nets:\n",
    "Neural networks can perform arbitrary tasks. While the example network had 41 parameters, more complex networks can have billions of parameters, resembling a massive blob of neural tissue.\n",
    "\n",
    "### Complex Problems and Emerging Properties:\n",
    "Neural networks can solve extremely complex problems and exhibit fascinating emergent properties. For instance, in the case of GPT, the network is trained on a massive amount of text from the internet to predict the next word in a sequence. This training leads to amazing properties, although it involves billions of parameters.\n",
    "\n",
    "### Fundamental Principles:\n",
    "Despite the complexity, the fundamental principles remain the same. Evaluating the gradient and performing gradient descent are consistent. People may use slightly different stochastic gradient descent (SGD) updates, and the loss function may be cross-entropy loss instead of mean squared error for predicting the next token, but the training setup is fundamentally similar."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cdc1de7a-9450-40ab-a712-2d665794a214",
   "metadata": {},
   "source": [
    "# In the next notebook, we will attack a challenge that builds on the knowledge from the series!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}