{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enabling notebook extension splitcell/splitcell...\n", " - Validating: \u001b[32mOK\u001b[0m\n", "Enabling notebook extension rise/main...\n", " - Validating: \u001b[32mOK\u001b[0m\n" ] } ], "source": [ "!jupyter nbextension enable splitcell/splitcell\n", "!jupyter nbextension enable rise/main" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide_input": false, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "eos access: ✗\n" ] } ], "source": [ "import tensorflow as tf\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "plt.style.use(\"ggplot\")\n", "from matplotlib import colors, cm\n", "\n", "import os\n", "import pandas as pd\n", "from tensorflow import keras\n", "\n", "from tutorial import get_file" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "plt.rcParams[\"axes.grid\"] = False\n", "plt.rcParams.update({'font.size': 20})\n", "plt.rcParams.update({'figure.figsize': (12,9)})\n", "plt.rcParams['lines.markersize'] = 8" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# IML Workshop Tutorial\n", "\n", "### Introduction to the Basics of Deep Learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "
\n", "\n", "Martin Erdmann, **Yannik Rath**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "This first session is an introduction to neural networks and deep learning\n", " * Neural network basics\n", " * Training and generalization\n", " * Fully connected and convolutional architectures\n", " * Physics!\n", "\n", "This is a tutorial, not a talk. Feel free to ask questions throughout!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Plan of Today" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=timetable.png)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Marked tutorials use the same technical setup" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Setup for Tutorial" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "You can follow along locally or use SWAN/Binder:\n", "\n", "* Go to https://github.com/3pia/iml2019\n", "\n", "* Suggested: Click `Open in SWAN`\n", "\n", " $\\qquad \\rightarrow$ Enter `/eos/user/m/mrieger/public/iml2019/intro/setup.sh` as environment script\n", "\n", " $\\qquad \\rightarrow$ Press `Start my session`\n", " \n", "* Alternatives: Binder, local execution with Docker\n", "\n", " $\\qquad \\rightarrow$ Choose TensorFlow v1 image\n", " \n", "* Second tutorial will use v2 image" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", 
"slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=github_page.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Gitter Channel" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Chat channel for the tutorials at https://gitter.im/IMLWorkshop19:\n", "\n", "* For any questions that may arise during the open part or afterwards (especially for remote participants)\n", "\n", "* For us to share code snippets if needed" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=gitter.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Deep Learning Libraries" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Many different frameworks for deep learning, steered mostly from Python\n", "\n", "Two most popular:\n", "* TensorFlow (+ higher-level abstractions with Keras)\n", "* PyTorch" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=dl_popularity.png)\n", "https://towardsdatascience.com/deep-learning-framework-power-scores-2018-23607ddf297a" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In this tutorial, we will use **TensorFlow + Keras**\n", "\n", "Side note: Many changes coming with TensorFlow 2.0. 
Some glimpses into the future later today" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Training by Example" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data = np.random.uniform(size=100)\n", "labels = 3*data + 1 + np.random.normal(loc=0.0, scale=0.1, size=100)\n", "\n", "plt.scatter(data, labels, label=\"data\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Data: $\\{(x_{i}, y_{i})\\}$ with $i = 1, \\dots, N$\n", "\n", "Define model $y_{m}(x, \\theta) = Wx + b$ with free parameters $\\theta = (W, b)$\n", "\n", "Define *objective function (loss/cost)*\n", "\\begin{equation}\n", "J(\\theta|x,y) = \\frac{1}{N} \\sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \\theta)]^{2}\n", "\\end{equation}\n", "\n", "Minimize objective (\"train model\")\n", "\\begin{equation}\n", "\\hat{\\theta} = \\arg\\min_{\\theta} J(\\theta)\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Easy to do numerically (e.g. with `scipy.optimize.minimize`) if we have only a few parameters" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "cell_style": "center", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "def cost(params):\n", "    W, b = params\n", "    return np.mean((labels - (W*data + b))**2)\n", "\n", "from scipy.optimize import minimize\n", "res = minimize(cost, [1., 1.])\n", "W, b = res.x\n", "\n", "points = np.linspace(0, 1, 100)\n", "prediction = W*points + b\n", "\n", "plt.scatter(data, labels, label=\"data\")\n", "plt.plot(points, prediction, label=\"model\", color=\"blue\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Multidimensional Linear Models" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Now, consider multiple inputs $x = (x_{1}, \\dots, x_{n})$ and outputs $y = (y_{1}, \\dots, y_{m})$\n", "\n", "Example: $x \\in \\mathbb{R}^{3}$, $y \\in \\mathbb{R}^{2}$\n", "\n", "\\begin{gather}\n", "\\begin{bmatrix} W_{11} & W_{12} & W_{13} \\\\\n", " W_{21} & W_{22} & W_{23} \\end{bmatrix} \\begin{pmatrix} x_{1} \\\\ x_{2} \\\\ x_{3} \\end{pmatrix} + \\begin{pmatrix} b_{1} \\\\ b_{2} \\end{pmatrix} = \\begin{pmatrix} y_{1} \\\\ y_{2} \\end{pmatrix}\n", " \\end{gather}" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=linear_model.png)\n", "Adapted from Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "# Neural Networks" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "So far we can only describe linear models\n", " \n", "$\\quad \\rightarrow$ Compose model with multiple layers\n", "\n", "\\begin{align}\n", "& h = W^{(1)}x + b^{(1)} \\\\\n", "& y = W^{(2)}h 
+ b^{(2)}\n", "\\end{align}\n", "\n", "Model is still linear\n", "\n", "\\begin{align}\n", "& y = W^{(2)}(W^{(1)}x + b^{(1)}) + b^{(2)} \\\\\n", "& y = W^{(2)}W^{(1)}x + W^{(2)}b^{(1)} + b^{(2)}\n", "\\end{align}\n", "\n", "$\\quad \\rightarrow$ Apply non-linear *activation function* $\\sigma$ to the hidden layer: $h = \\sigma(W^{(1)}x + b^{(1)})$" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=simple_nn.png)\n", "Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Activation Functions\n", "\n", "Applied elementwise to each node in a hidden layer\n", "\n", "Introduces non-linearity\n", "\n", "$\\quad \\rightarrow$ Allows stacking of multiple layers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=activations.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Mostly ReLU variants used in deep learning: saturating functions (sigmoid, tanh) have near-zero gradients for large inputs, which causes the *vanishing gradient problem* in deep networks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Deep Neural Networks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\n", "[Michael Nielsen](http://neuralnetworksanddeeplearning.com/chap5.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hierarchical feature extraction: Higher level of abstraction with each layer\n", "\n", "$\\quad \\rightarrow$ Enables training on lower-level/\"raw\" data" ] }, { "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "slide" } }, "source": [ "# Classification vs Regression" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { 
"slide_type": "-" } }, "source": [ "### Regression\n", "\n", "Predict continuous label " ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "### Classification\n", "\n", "Separate events into multiple categories " ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=regression.png)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=classification.png)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Minimize *mean squared error*\n", "\n", "\\begin{equation}\n", "J(\\theta|x,y) = \\frac{1}{N} \\sum_{i}^{N} [y_{i}-y_{m}(x_{i}, \\theta)]^{2}\n", "\\end{equation}\n", "\n", "Typically linear output layer (no activation function)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Minimize *cross entropy*\n", "\n", "\\begin{equation}\n", "J(\\theta|x,y) = - \\frac{1}{N} \\sum_{i}^{N} \\sum_{j} y_{ij} \\log[y_{m,j}(x_{i}, \\theta)]\n", "\\end{equation}\n", "\n", "with one-hot labels $y_{ij}$ (classes $j$) and softmax activation to constrain outputs to (0, 1) and their sum to 1 (\"probability-like\")\n", "\n", "\\begin{equation}\n", "y_{j}(z) = \\frac{e^{z_{j}}}{\\sum_{k}e^{z_{k}}}\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "# Backpropagation and Gradient Descent\n", "\n", "Can no longer directly minimize objective function. 
Instead, minimize iteratively by updating $\\theta$ in opposite direction of gradient\n", "\n", "\\begin{equation}\n", "\\theta \\rightarrow \\theta - \\alpha\\frac{dJ}{d\\theta}\n", "\\end{equation}\n", "\n", "with learning rate $\\alpha$\n", "\n", "Derivative can be calculated layer by layer using the chain rule (*backpropagation*)\n", "\n", "\\begin{equation}\n", "\\frac{dJ}{d\\theta} = \\frac{dJ}{dy_{m}}\\frac{dy_{m}}{d\\theta} = \\dots\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Stochastic Gradient Descent" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Evaluating the objective function and its gradient on the full dataset is costly.\n", "\n", "
\n", "\n", "Instead, calculate gradient for a small subset (*batch*) of the training data\n", "\n", "$\\rightarrow$ Stochastic updates also help avoid local minima \n", "\n", "
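The batch/epoch logic can be sketched in plain Python for the straight-line fit from the beginning (toy data y = 3x + 1 plus noise). This is a minimal illustration, not the TensorFlow code used later. The helper name `minibatch_sgd`, the learning rate, the batch size and the number of epochs are arbitrary assumptions made here. Each epoch reshuffles the events, and every update uses the mean-squared-error gradient of a single batch only.

```python
import random

# Toy data as in the first example: y = 3x + 1 plus Gaussian noise (sigma 0.1)
random.seed(0)
data = [random.uniform(0.0, 1.0) for _ in range(100)]
labels = [3.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in data]


def minibatch_sgd(xs, ys, lr=0.1, batch_size=10, n_epochs=200):
    # Hypothetical helper: fit y_m = W*x + b to the data with mini-batch SGD on the MSE
    W, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(n_epochs):  # one epoch = one pass over all batches
        random.shuffle(indices)  # new batch composition in every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of J = mean((y - W*x - b)**2), estimated on this batch only
            residuals = [ys[i] - (W * xs[i] + b) for i in batch]
            dW = -2.0 * sum(r * xs[i] for r, i in zip(residuals, batch)) / len(batch)
            db = -2.0 * sum(residuals) / len(batch)
            W -= lr * dW  # step against the gradient
            b -= lr * db
    return W, b


W, b = minibatch_sgd(data, labels)
print(W, b)  # close to 3 and 1, up to noise
```

The fitted W and b should end up near 3 and 1. The small remaining jitter comes from the batch-to-batch fluctuations of the gradient estimate.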
\n", "\n", "One iteration over the full dataset called an *epoch*" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=gradientupdate.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Optimizers" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "More advanced options than fixed learning rate \n", "\n", "* Momentum: Use past gradients as \"velocity\"\n", "\n", "
\n", "\n", "* Adaptive methods: Learning rates based on past gradients, separate for all parameters\n", "\n", "$\\qquad \\rightarrow$ E.g. Adagrad, Adadelta, Adam" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=optimizers.gif)\n", "[Alec Radford](https://www.reddit.com/r/MachineLearning/comments/2gopfa/visualizing_gradient_optimization_techniques/cklhott/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Initialization\n", "\n", "Neural network parameters initialized randomly\n", "\n", "$\\quad \\rightarrow$ Break symmetry\n", "\n", "Suitable values depend on layer sizes and activation functions\n", "\n", "$\\quad \\rightarrow$ Basic idea: $Var[output] = Var[input]$" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Glorot (tanh)\n", "\n", "\\begin{equation}\n", "Var[W] = \\frac{2}{N_{in} + N_{out}}\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "He (ReLU)\n", "\n", "\\begin{equation}\n", "Var[W] = \\frac{2}{N_{in}}\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Values typically sampled from normal or uniform (with range $\\pm \\sqrt{3Var}$ ) distribution" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Preprocessing\n", "\n", "Input features can have vastly different scales (e.g. 
$p_{T}$ vs $\\eta$)\n", "\n", "Most basic strategy: Normalize each feature to mean 0 and variance 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=preprocessing.jpeg)\n", "Andrej Karpathy, http://cs231n.github.io" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Other options possible, e.g. input decorrelation, non-linear transformations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Let's Try it Out\n", "\n", "For now we use plain TensorFlow to see all steps.\n", "Later we will use the higher-level abstractions provided in `tf.keras`\n", "\n", "First, let's generate some toy data:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "n_samples = 50\n", "\n", "np.random.seed(4321)\n", "class1_data = np.random.multivariate_normal([-1., -1.], [[1., 0.], [0., 1.]], n_samples)\n", "class2_data = np.random.multivariate_normal([1., 1.], [[1., 0.], [0., 1.]], n_samples)\n", "\n", "train_data = np.concatenate([class1_data, class2_data])\n", "# one-hot labels: one row per event, one column per class\n", "toy_labels = np.zeros((2*n_samples, 2))\n", "toy_labels[:n_samples, 0] = 1\n", "toy_labels[n_samples:, 1] = 1\n", "\n", "plt.scatter(*class1_data.T)\n", "plt.scatter(*class2_data.T)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Our class labels are *one-hot encoded*, i.e. $\\begin{pmatrix} 1 \\\\ 0 \\end{pmatrix}$ for the first class and $\\begin{pmatrix} 0 \\\\ 1 \\end{pmatrix}$ for the second" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "TensorFlow provides datasets and iterators to handle the data. 
Our model works on batches from this iterator, which are loaded dynamically during training" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Colocations handled automatically by placer.\n" ] } ], "source": [ "tf.reset_default_graph()\n", "\n", "inp_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])\n", "target_placeholder = tf.placeholder(dtype=tf.float32, shape=[None, 2])\n", "\n", "# shuffle individual events first, then group them into batches of 10\n", "dataset = tf.data.Dataset.from_tensor_slices((inp_placeholder, target_placeholder)).shuffle(buffer_size=2*n_samples).batch(10)\n", "iterator = dataset.make_initializable_iterator()\n", "inp, target = iterator.get_next()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We define a simple model with one hidden layer, using the He initialization rule from above ($Var[W] = 2/N_{in}$)\n", "\n", "Since this is a classification task, we use cross entropy as our objective function" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use tf.cast instead.\n" ] } ], "source": [ "n_hidden = 10\n", "n_epochs = 100\n", "\n", "with tf.variable_scope(\"initialization\"):\n", " W1 = tf.get_variable(\"W1\", initializer=tf.random.normal((2, n_hidden), 
stddev=1.))\n", " b1 = tf.get_variable(\"b1\", initializer=tf.constant(0., 
shape=(n_hidden,)))\n", " hidden = tf.nn.relu(tf.add(tf.matmul(inp, W1), b1))\n", "\n", " W2 = tf.get_variable(\"W2\", initializer=tf.random.normal((n_hidden, 2), stddev=(2./n_hidden)**0.5))\n", " b2 = tf.get_variable(\"b2\", initializer=tf.constant(0., shape=(2,)))\n", " out = tf.nn.softmax(tf.add(tf.matmul(hidden, W2), b2))\n", "\n", "# TensorFlow also provides predefined objective functions in tf.losses\n", "# (outputs are clipped to avoid log(0))\n", "cost = -tf.reduce_mean(tf.reduce_sum(target*tf.log(tf.clip_by_value(out, 1e-10, 1.0)), axis=1))\n", "\n", "train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "In our training loop, we iterate over all batches in our training data, then re-initialize the iterator and repeat this for `n_epochs` *epochs*" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "with tf.Session() as sess:\n", "    sess.run(tf.global_variables_initializer())\n", "    for epoch in range(n_epochs):\n", "        # re-initializing the iterator starts a new pass over the data\n", "        sess.run(iterator.initializer,\n", "                 feed_dict={inp_placeholder: train_data, target_placeholder: toy_labels})\n", "        while True:\n", "            try:\n", "                sess.run(train_step)\n", "            except tf.errors.OutOfRangeError:  # all batches used\n", "                break\n", "\n", "    # evaluate the first-class probability on a 41x41 grid in a single run\n", "    x = y = np.linspace(-3, 3, 41)\n", "    grid = np.array([[j, i] for i in x for j in y])\n", "    z = sess.run(out, feed_dict={inp: grid})[:, 0]\n", "Z = z.reshape(41, 41)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(*class1_data.T)\n", "plt.scatter(*class2_data.T)\n", "plt.imshow(Z, interpolation=\"bilinear\", origin=\"lower\", extent=(-3, 3, -3., 3.))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Overtraining and Generalization\n", "\n", "* If modeling capacity too low or training insufficient $\\rightarrow$ Bad performance\n", "\n", "
\n", "\n", "* If modeling capacity high, network can learn to memorize training samples $\\rightarrow$ Bad generalization (*Overtraining*)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=overtraining.png)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Solution 1: Evaluate performance on a statistically independent *validation set* to measure generalization capabilities \n", "\n", "$\\quad \\rightarrow$ Stop training when performance on validation set decreases: *Early stopping*" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=earlystopping.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "**Caution:** Optimization of network hyperparameters itself a form of training\n", "\n", "$\\quad \\rightarrow$ Validation set not unbiased\n", "\n", "Use a third *test set* to measure final performance *once*\n", "\n", "In physics: Validation set is sometimes called test set, test set is called evaluation set\n", "\n", "
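A fixed three-way split can be sketched in plain Python. The helper name `split_dataset`, the 60/20/20 fractions and the seed are illustrative assumptions, not part of the tutorial code. The events are shuffled once before cutting, so the three subsets are disjoint.

```python
import random


def split_dataset(events, val_fraction=0.2, test_fraction=0.2, seed=42):
    # Hypothetical helper: shuffle once, then cut into three disjoint subsets
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# train: fit the weights
# val:   early stopping and hyperparameter tuning (biased after many trials)
# test:  evaluate the final model exactly once
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

Only the validation set is consulted during development. Looking at the test set repeatedly would bias it in the same way.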
\n", "\n", "More sophisticated option: Cross validation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Regularization" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Methods to suppress overtraining\n", "\n", "* Already mentioned: Early stopping\n", "\n", "* L1/L2 regularization: Penalize large weights\n", "\n", "\\begin{align}\n", "& L1: J = J_{0} + \\lambda \\sum |W_{i}| \\\\\n", "& L2: J = J_{0} + \\lambda \\sum W_{i}^{2}\n", "\\end{align}\n", "\n", "with scaling factor $\\lambda$\n", "\n", "* ElasticNet: Combination of L1 and L2 regularization" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=overtraining_single.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Dropout" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Randomly drop a percentage of nodes at each training step\n", "\n", "Learn redundant representations $\\rightarrow$ More robust model\n", "\n", "Can be seen as training an ensemble of loosely coupled networks in parallel\n", "\n", "For evaluation: Disable dropout, scale node outputs accordingly" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=dropout.png)\n", "[Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://jmlr.org/papers/v15/srivastava14a.html)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolutional Neural Networks" ] }, { "cell_type": "markdown", "metadata": { "cell_style": 
"split", "slideshow": { "slide_type": "-" } }, "source": [ "So far, considered only fully connected neural networks\n", "\n", "For images, convolutional neural networks (CNNs) are used\n", "\n", "Exploit structure in data: Local correlations and translational invariance" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=translational_invar.png)\n", "[Udacity Course Deep Learning](https://in.udacity.com/course/deep-learning--ud730-india)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Convolutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Small filter (weight tensor) is applied to an image patch, mapping it to a single value (weighted sum plus bias)\n", "\n", "Convolution: Slide filter over image to create feature map" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\n", "Adapted from Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Feature Maps" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Feature maps of multiple filters are stacked depth-wise" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\n", "Adapted from Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Each filter can learn a different feature" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=filters.png)\n", "Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": 
"slide" } }, "source": [ "# Padding and Pooling" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "Without padding, each convolutional layer decreases the output size due to edge effects (stride 1: a $k \\times k$ filter removes $k-1$ rows and columns)\n", "\n", "$\\quad \\rightarrow$ Can pad with zeros to keep dimensions the same" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=padding.png)" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "On the other hand, can pool outputs together when output size reduction is desired\n", "\n", "E.g. max pooling: Take the maximum of each patch " ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=cnn_maxpool.png)\n", "Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Feature Hierarchy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=featuremaps.png)\n", "[Zeiler & Fergus 2013](https://arxiv.org/abs/1311.2901), adapted by Yann LeCun" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# CNN Structure" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Abstraction increases for later layers\n", "\n", "$\\quad \\rightarrow$ Typically smaller spatial extent (pooling), but larger number of feature maps" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "-" } }, "source": [ 
"![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=cnn_pyramid.png)\n", "Deep Learning in Physics Research Course, RWTH" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Fully connected layers at the end to combine features" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# CNN Example - Top Tagging\n", "\n", "Problem statement: Hadronically decaying top quark, boosted so that all decay products are contained in one fat jet\n", "\n", "$\\quad \\rightarrow$ Distinguish from QCD jets\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=boosted_top.png)\n", "[Top tagging at the LHC experiments with proton-proton collisions at √s = 13TeV](https://www.researchgate.net/publication/280882365_Top_tagging_at_the_LHC_experiments_with_proton-proton_collisions_at_s_13TeV)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "[Open dataset](https://desycloud.desy.de/index.php/s/llbX3zpLhazgPJ6) ([citation](https://arxiv.org/abs/1707.08966)), with comparison of various machine learning methods [here](https://arxiv.org/abs/1902.09914)\n", "\n", "One approach: Image based using CNNs\n", "\n", "Simplified model here based on [Deep-learning Top Taggers or The End of QCD?](https://arxiv.org/abs/1701.08784) and [Pulling Out All the Tops with Computer Vision and Deep Learning](https://arxiv.org/abs/1803.00107)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Input Data and Preprocessing\n", "\n", "The dataset contains 1 Million jets each for QCD and top events. 
For this tutorial we only use 10000 events each for experimentation.\n", "\n", "Features are the cartesian four vectors for the first 200 constituents. (zero-padded if fewer than 200)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " E_0 PX_0 PY_0 PZ_0 E_1 PX_1 \\\n", "436 218.364243 -172.341858 110.129105 -76.503624 153.661118 -111.320465 \n", "440 122.238762 26.738468 -91.613998 76.382225 121.227135 17.644758 \n", "441 383.772308 -97.906456 79.640709 -362.426361 200.625992 -54.921326 \n", "444 132.492752 -77.763947 -87.322601 -62.304600 83.946594 -49.450481 \n", "445 730.786987 -209.120010 -193.454315 -672.973877 225.477325 -75.363350 \n", "452 425.659546 323.020142 -155.901611 -229.213257 83.688065 63.508339 \n", "454 184.878754 -163.308701 -48.425228 71.870857 39.259518 -34.826504 \n", "460 337.542389 -144.905655 52.931824 -300.225647 194.716293 -83.677284 \n", "469 57.453899 8.245859 54.802525 15.153853 48.546764 -30.239859 \n", "476 269.725555 197.983688 180.264572 -32.542664 170.214188 112.955109 \n", "159 279.122498 -186.983612 -184.703552 93.973915 130.233459 -87.081039 \n", "174 293.411957 226.085785 -186.910233 6.353484 56.282650 46.258255 \n", "176 68.190567 42.502636 44.376175 29.567446 50.892151 31.388681 \n", "188 364.401398 0.241121 -186.221115 313.225189 126.724442 -5.135780 \n", "190 839.949280 261.210724 149.367004 784.202332 470.249542 160.376953 \n", "195 158.013107 -58.645222 -111.765907 95.064514 147.360916 -64.619461 \n", "196 242.548203 128.798447 133.625168 156.156647 118.898994 65.832794 \n", "198 155.798996 -76.049294 -74.047806 -114.047157 126.377312 -57.531590 \n", "200 156.450684 70.312820 129.299667 -53.052074 93.001892 34.249096 \n", "211 37.492371 -14.051148 34.111626 -6.681313 29.366516 -4.658844 \n", "835 306.089844 77.324013 -286.238312 76.023705 46.095268 14.454353 \n", "838 72.369453 -42.351864 -47.631145 
-34.277267 55.552345 -33.023098 \n", "839 208.428543 176.771591 -16.152925 -109.239861 169.994568 144.319977 \n", "843 119.918091 -83.957573 -62.511585 58.513050 106.310768 -77.028214 \n", "844 227.227386 186.315811 -46.266666 121.565208 157.918549 129.314774 \n", "851 517.916016 -294.284393 359.200562 -229.365799 44.625099 -22.775585 \n", "853 286.906738 119.059898 -84.216034 247.078690 128.020676 53.864735 \n", "863 226.503021 -90.072205 141.984818 151.759430 215.646698 -78.667992 \n", "865 681.133545 -71.523308 -199.660583 647.273499 110.246689 -6.527587 \n", "876 120.037498 90.969063 53.741943 56.968712 106.599525 81.198769 \n", ".. ... ... ... ... ... ... \n", "78 112.548164 -69.994125 55.432499 -68.521164 92.737343 -75.086960 \n", "79 85.717751 -74.409958 11.056811 -41.090591 68.491020 -59.455765 \n", "81 85.493820 -27.314945 -51.542961 -62.501278 102.863480 -34.857529 \n", "84 127.245735 85.553139 -93.959938 6.608004 124.445580 73.211723 \n", "86 94.716499 -78.742737 -1.188648 -52.624935 40.758934 -37.494156 \n", "87 100.364716 72.360008 -60.275139 34.698883 97.817802 71.583481 \n", "88 119.582886 -44.312881 110.618286 -10.001425 113.191605 -41.405144 \n", "95 94.261490 -26.274246 90.048035 9.286770 62.105225 -14.019166 \n", "96 59.289379 -39.532875 -41.123634 -16.162573 50.016296 -22.099298 \n", "98 114.959221 82.774529 77.833389 -17.491800 70.668648 54.612209 \n", "300 337.563660 130.170883 -161.850128 -266.100159 145.962097 56.285725 \n", "305 186.623291 115.854362 -119.002655 -85.113960 70.260399 51.881416 \n", "315 327.187653 91.097389 -132.940445 -284.745239 226.840576 64.454109 \n", "317 55.485935 -17.748005 51.104897 12.328293 44.846516 0.138648 \n", "324 171.201141 155.533829 43.707058 -56.645840 138.376160 125.712807 \n", "325 143.212662 -54.333054 31.085360 128.807953 79.777313 -52.362625 \n", "335 158.137543 -133.522797 -7.243723 -84.419617 80.231056 -68.161530 \n", "338 168.803345 52.608585 34.850925 156.564102 87.154083 39.945709 \n", "345 
112.950981 64.614891 -4.162764 92.550049 115.284904 63.913273 \n", "347 274.400665 -60.116039 -165.123810 210.750854 121.846008 -68.042580 \n", "791 279.383087 43.128311 131.399200 242.753189 84.100731 -10.180331 \n", "797 229.098892 228.911407 -6.138898 -6.941409 53.032585 49.544853 \n", "810 311.466248 177.566422 -99.088737 -235.929688 64.184464 36.165394 \n", "816 179.288879 120.904938 -132.369537 -2.192201 148.808716 115.180977 \n", "819 153.994751 151.944946 -10.638010 22.670460 70.607399 69.639229 \n", "820 169.716614 -122.102844 36.656090 112.031059 112.158615 -80.692657 \n", "822 287.591492 -81.565056 -140.493576 237.313202 160.357544 -34.246628 \n", "824 91.996681 60.538616 -63.476665 27.734055 93.122139 60.690456 \n", "825 132.474686 35.062359 125.736656 22.593466 61.466259 7.588309 \n", "827 95.355087 -46.626762 25.236219 79.256989 86.610634 -36.374290 \n", "\n", " PY_1 PZ_1 E_2 PX_2 ... E_199 PX_199 \\\n", "436 93.167969 -50.390713 76.708054 -56.523701 ... 0.0 0.0 \n", "440 -93.015450 75.715302 90.420105 21.377417 ... 0.0 0.0 \n", "441 37.994343 -189.184753 123.247223 -33.828953 ... 0.0 0.0 \n", "444 -53.823605 -41.288010 28.072624 -19.964916 ... 0.0 0.0 \n", "445 -66.226990 -201.926651 217.040192 -63.698189 ... 0.0 0.0 \n", "452 -30.651501 -45.065155 35.438320 25.900942 ... 0.0 0.0 \n", "454 -6.211102 17.024878 36.229164 -31.835764 ... 0.0 0.0 \n", "460 30.484045 -173.156784 115.748047 -52.055187 ... 0.0 0.0 \n", "469 37.749592 4.160240 49.887844 -4.153875 ... 0.0 0.0 \n", "476 125.723831 -20.187439 47.387337 35.194256 ... 0.0 0.0 \n", "159 -86.566429 43.403915 35.724968 -23.600866 ... 0.0 0.0 \n", "174 -32.018463 1.651915 47.768581 37.176056 ... 0.0 0.0 \n", "176 36.319347 16.901674 42.756905 23.390104 ... 0.0 0.0 \n", "188 -65.158585 108.568260 95.833519 -0.119313 ... 0.0 0.0 \n", "190 66.286209 437.058350 213.598602 73.139099 ... 0.0 0.0 \n", "195 -96.582558 90.616646 107.360291 -41.913647 ... 
0.0 0.0 \n", "196 62.647701 76.669937 55.116993 31.064240 ... 0.0 0.0 \n", "198 -64.622345 -92.115646 83.377754 -41.058903 ... 0.0 0.0 \n", "200 81.849770 -27.874113 90.545639 34.173183 ... 0.0 0.0 \n", "211 28.639086 4.526615 28.641333 -4.654485 ... 0.0 0.0 \n", "835 -42.235260 11.490357 41.543114 12.639909 ... 0.0 0.0 \n", "838 -36.267849 -26.080288 50.658684 -31.153194 ... 0.0 0.0 \n", "839 -11.911080 -89.039459 118.925804 100.790558 ... 0.0 0.0 \n", "843 -49.156624 54.334709 57.184879 -40.574955 ... 0.0 0.0 \n", "844 -42.471935 80.075539 74.789467 61.393959 ... 0.0 0.0 \n", "851 33.639759 -18.467234 32.496765 -18.493771 ... 0.0 0.0 \n", "853 -37.853809 109.795143 98.720222 38.892624 ... 0.0 0.0 \n", "863 132.869980 150.533752 114.808167 -43.253536 ... 0.0 0.0 \n", "865 -32.490814 105.147850 104.702782 -11.399035 ... 0.0 0.0 \n", "876 47.593819 50.050446 78.952003 59.764385 ... 0.0 0.0 \n", ".. ... ... ... ... ... ... ... \n", "78 42.109306 -34.481441 71.450615 -45.100121 ... 0.0 0.0 \n", "79 8.834720 -32.832600 53.507847 -46.168694 ... 0.0 0.0 \n", "81 -38.749676 -88.680946 96.135536 -31.703087 ... 0.0 0.0 \n", "84 -95.260292 -32.437981 105.382408 56.979771 ... 0.0 0.0 \n", "86 8.191441 -13.725134 38.636875 -36.520794 ... 0.0 0.0 \n", "87 -57.933125 32.983059 88.084023 80.769066 ... 0.0 0.0 \n", "88 104.923485 -9.434802 87.742432 -33.157097 ... 0.0 0.0 \n", "95 59.618134 10.305344 53.662228 11.528226 ... 0.0 0.0 \n", "96 -39.540451 -21.208580 43.462807 -20.643940 ... 0.0 0.0 \n", "98 40.573910 19.113411 50.274555 37.760147 ... 0.0 0.0 \n", "300 -69.983795 -115.061378 73.108635 5.547623 ... 0.0 0.0 \n", "305 -21.762051 -42.086288 60.948166 44.974922 ... 0.0 0.0 \n", "315 -92.687958 -196.751770 168.608826 45.740944 ... 0.0 0.0 \n", "317 43.049892 12.565718 33.957642 6.710351 ... 0.0 0.0 \n", "324 35.326954 -45.784943 41.754326 34.007488 ... 0.0 0.0 \n", "325 -3.214091 60.101952 69.032806 -44.975594 ... 
0.0 0.0 \n", "335 -3.043645 -42.210953 63.538044 -58.925129 ... 0.0 0.0 \n", "338 12.922450 76.375290 73.016518 36.039776 ... 0.0 0.0 \n", "345 -4.656441 95.833298 81.806313 57.135715 ... 0.0 0.0 \n", "347 -61.409584 80.283997 97.850533 -57.151199 ... 0.0 0.0 \n", "791 46.183716 69.543930 60.285049 -7.393208 ... 0.0 0.0 \n", "797 13.902727 -12.824851 50.633774 50.608711 ... 0.0 0.0 \n", "810 -20.800457 -48.775509 40.091709 20.045677 ... 0.0 0.0 \n", "816 -90.013542 -27.837770 77.845200 51.305622 ... 0.0 0.0 \n", "819 -5.500076 10.272882 57.127357 54.653866 ... 0.0 0.0 \n", "820 24.224476 74.036636 86.687820 -60.876274 ... 0.0 0.0 \n", "822 -88.843933 129.028931 105.006508 -30.313942 ... 0.0 0.0 \n", "824 -61.697636 34.377369 67.617706 33.402180 ... 0.0 0.0 \n", "825 55.307018 25.722607 53.369095 17.563345 ... 0.0 0.0 \n", "827 28.903816 73.095024 76.419800 -39.889969 ... 0.0 0.0 \n", "\n", " PY_199 PZ_199 truthE truthPX truthPY truthPZ ttv \\\n", "436 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "440 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "441 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "444 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "445 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "452 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "454 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "460 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "469 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "476 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "159 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "174 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "176 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "188 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "190 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "195 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "196 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "198 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "200 0.0 0.0 0.000000 0.000000 
0.000000 0.000000 1 \n", "211 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "835 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "838 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "839 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "843 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "844 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "851 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "853 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "863 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "865 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", "876 0.0 0.0 0.000000 0.000000 0.000000 0.000000 1 \n", ".. ... ... ... ... ... ... ... \n", "78 0.0 0.0 684.287781 -258.680817 376.150818 -481.489655 1 \n", "79 0.0 0.0 564.532837 -442.659424 201.212677 -230.923553 1 \n", "81 0.0 0.0 869.995117 -400.330170 -440.149078 -611.341797 1 \n", "84 0.0 0.0 503.165894 361.098053 -289.973358 -87.063324 1 \n", "86 0.0 0.0 689.829285 -542.529480 -75.202179 -382.762421 1 \n", "87 0.0 0.0 710.630371 544.073853 -405.004059 125.427887 1 \n", "88 0.0 0.0 550.275146 -185.418213 489.095428 4.628590 1 \n", "95 0.0 0.0 575.489746 -208.970932 479.300079 170.623764 1 \n", "96 0.0 0.0 573.650818 -303.963409 -434.481537 -136.952423 1 \n", "98 0.0 0.0 574.677612 459.333130 298.173401 -36.719540 1 \n", "300 0.0 0.0 971.624207 345.387085 -633.088318 -628.454407 1 \n", "305 0.0 0.0 829.788757 527.661987 -320.467896 -527.385315 1 \n", "315 0.0 0.0 1005.369202 301.333252 -463.502106 -822.074158 1 \n", "317 0.0 0.0 677.365662 35.399406 587.608215 291.583069 1 \n", "324 0.0 0.0 585.305908 513.958679 195.995544 -106.244362 1 \n", "325 0.0 0.0 970.638184 -571.535217 -35.579041 764.569458 1 \n", "335 0.0 0.0 692.826843 -609.769348 -40.451492 -278.393799 1 \n", "338 0.0 0.0 1452.763672 575.779358 302.395813 1287.667969 1 \n", "345 0.0 0.0 703.255371 415.590302 221.872940 493.012695 1 \n", "347 0.0 0.0 860.613403 -257.987335 -486.466919 638.902527 1 \n", "791 0.0 0.0 
1145.094727 139.510925 637.620056 925.251465 1 \n", "797 0.0 0.0 537.091309 489.141541 -43.226307 135.200928 1 \n", "810 0.0 0.0 959.667175 459.227905 -374.334259 -735.301575 1 \n", "816 0.0 0.0 666.544861 546.472229 -329.620361 -88.627724 1 \n", "819 0.0 0.0 653.731995 618.538391 -69.253319 102.905121 1 \n", "820 0.0 0.0 832.899963 -572.147461 249.778381 524.242310 1 \n", "822 0.0 0.0 1014.205322 -213.806396 -476.465851 852.417358 1 \n", "824 0.0 0.0 659.905701 419.718506 -414.999115 239.802658 1 \n", "825 0.0 0.0 659.797302 39.736858 593.325745 230.296585 1 \n", "827 0.0 0.0 1255.107666 -514.749023 269.764374 1099.081665 1 \n", "\n", " is_signal_new \n", "436 0 \n", "440 0 \n", "441 0 \n", "444 0 \n", "445 0 \n", "452 0 \n", "454 0 \n", "460 0 \n", "469 0 \n", "476 0 \n", "159 0 \n", "174 0 \n", "176 0 \n", "188 0 \n", "190 0 \n", "195 0 \n", "196 0 \n", "198 0 \n", "200 0 \n", "211 0 \n", "835 0 \n", "838 0 \n", "839 0 \n", "843 0 \n", "844 0 \n", "851 0 \n", "853 0 \n", "863 0 \n", "865 0 \n", "876 0 \n", ".. ... 
\n", "78 1 \n", "79 1 \n", "81 1 \n", "84 1 \n", "86 1 \n", "87 1 \n", "88 1 \n", "95 1 \n", "96 1 \n", "98 1 \n", "300 1 \n", "305 1 \n", "315 1 \n", "317 1 \n", "324 1 \n", "325 1 \n", "335 1 \n", "338 1 \n", "345 1 \n", "347 1 \n", "791 1 \n", "797 1 \n", "810 1 \n", "816 1 \n", "819 1 \n", "820 1 \n", "822 1 \n", "824 1 \n", "825 1 \n", "827 1 \n", "\n", "[20000 rows x 806 columns]\n" ] } ], "source": [ "label_key = \"is_signal_new\"\n", "n_constituents = 200\n", "n_bins = 40\n", "n_events_per_class = None\n", "R_jet = 0.8\n", "plot_range = ([-R_jet, R_jet], [-R_jet, R_jet])\n", "\n", "vector_keys = [\"PT\", \"ETA\", \"PHI\"]\n", "vector_names = {\n", " key: [\"{}_{}\".format(key, i) for i in range(n_constituents)]\n", " for key in vector_keys\n", "}\n", "\n", "# get input file\n", "input_file = get_file(\"intro/data/top_tagging.h5\")\n", "\n", "df = pd.read_hdf(input_file, key=\"table\")\n", "if n_events_per_class is not None:\n", "    # select the first and last n_events_per_class rows (one block per class) for faster testing\n", "    df = df.iloc[np.r_[0:n_events_per_class, -n_events_per_class:0]]\n", "print(df)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We want to work with images in the $\\eta$-$\\phi$ plane, using $p_{T}$ as the pixel values.\n", "\n", "We start by calculating $p_{T}$, $\\eta$, and $\\phi$ for all constituents." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# convert Cartesian coordinates to pT/eta/phi\n", "for i in range(n_constituents):\n", "    px = \"PX_{}\".format(i)\n", "    py = \"PY_{}\".format(i)\n", "    pz = \"PZ_{}\".format(i)\n", "    pt = \"PT_{}\".format(i)\n", "    df[pt] = (df[px]**2 + df[py]**2)**0.5\n", "    # pseudorapidity: eta = artanh(p_z / |p|)\n", "    df[\"ETA_{}\".format(i)] = np.arctanh(df[pz] / (df[pt]**2 + df[pz]**2)**0.5)\n", "    df[\"PHI_{}\".format(i)] = np.arctan2(df[py], df[px])\n", "\n", "# zero-padded constituents give 0/0 = NaN for eta; set these entries to 0\n", "df.fillna(0, inplace=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "-" } }, "source": [ "First step: center each jet on its leading constituent" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "cell_style": "center", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# shift eta so that the leading constituent sits at eta = 0\n", "# (the first constituent has the highest pT)\n", "df[vector_names[\"ETA\"]] = df[vector_names[\"ETA\"]].subtract(df[\"ETA_0\"], axis=0)\n", "# shift phi the same way and wrap the differences into [-pi, pi)\n", "df[vector_names[\"PHI\"]] = df[vector_names[\"PHI\"]].subtract(df[\"PHI_0\"], axis=0).add(np.pi).mod(2 * np.pi).subtract(np.pi)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "cell_style": "center" }, "outputs": [], "source": [ "def plot_jet_image(jet_histogram):\n", "    fig = plt.figure()\n", "    ax = fig.add_subplot(111)\n", "    # logarithmic color scale from 1e-4 up to the brightest pixel\n", "    norm = colors.LogNorm(10**-4, jet_histogram.max(), clip=True)\n", "    # imshow draws the first histogram axis (eta) vertically and phi horizontally\n", "    im = ax.imshow(jet_histogram, norm=norm)\n", "    fig.colorbar(im)\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "cell_style": "center" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# average pT-weighted eta-phi image over all jets (each jet is weighted by 1/len(df))\n", "average_centered_jet, x_edges, y_edges = \\\n", " np.histogram2d(df[vector_names[\"ETA\"]].values.reshape(-1), df[vector_names[\"PHI\"]].values.reshape(-1),\n", " weights=df[vector_names[\"PT\"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,\n", " )\n", "plot_jet_image(average_centered_jet)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "There are multiple options for further preprocessing. Here, we rotate the jets so that they are all aligned in the same way, and then normalize the pixel values of each image." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# shape: (n_jets, 2, n_constituents), axis 1 holds (eta, phi)\n", "stacked_matrix = np.stack([df[vector_names[\"ETA\"]], df[vector_names[\"PHI\"]]], axis=1)\n", "\n", "# unit vector pointing from the jet center to the constituent with the second-highest p_T\n", "rotation_vectors = stacked_matrix[:, :, 1].copy()\n", "rotation_vectors /= np.linalg.norm(rotation_vectors, axis=-1)[:, None]\n", "\n", "# rotate each jet so that this constituent lies on the positive eta axis\n", "rotation_matrix = np.stack([rotation_vectors, rotation_vectors.dot([[0, 1],[-1, 0]])], axis=-1)\n", "rotated_matrix = np.einsum('ijl,ijk->ilk', rotation_matrix, stacked_matrix)\n", "\n", "# plot the average rotated jet\n", "average_rotated_jet, x_edges, y_edges = \\\n", " np.histogram2d(rotated_matrix[:,0,:].reshape(-1), rotated_matrix[:,1,:].reshape(-1),\n", " weights=df[vector_names[\"PT\"]].values.reshape(-1) / len(df), bins=n_bins, range=plot_range,\n", " )\n", "plot_jet_image(average_rotated_jet)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# create one pT-weighted image per jet\n", "weights = df[vector_names[\"PT\"]].values\n", "images = []\n", "for i in range(len(df)):\n", "    image, x_edges, y_edges = np.histogram2d(rotated_matrix[i, 0], rotated_matrix[i, 1],\n", "        weights=weights[i], bins=n_bins, range=plot_range,\n", "    )\n", "    # normalize so that the brightest pixel of each image equals 1\n", "    image /= np.max(image)\n", "    images.append(image)\n", "images = np.stack(images)\n", "\n", "# plot the average of the images used for training\n", "average_image = np.mean(images, axis=0)\n", "plot_jet_image(average_image)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Network Training" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# add a channel dimension: (n_jets, n_bins, n_bins) -> (n_jets, n_bins, n_bins, 1)\n", "data = np.expand_dims(images, axis=-1)\n", "labels = df[label_key].values\n", "\n", "# the events are ordered by class, but Keras takes validation_split from the end\n", "# of the arrays before shuffling, so shuffle the events once here\n", "permutation = np.random.permutation(len(data))\n", "data, labels = data[permutation], labels[permutation]\n", "\n", "activation = \"relu\"\n", "padding = \"same\"\n", "\n", "from tensorflow.keras import layers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "We use Keras to simplify the network definition. 
There are two options for writing the model:" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "### Functional" ] }, { "cell_type": "markdown", "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "source": [ "### Sequential" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "\n", "inputs = layers.Input(shape=(n_bins, n_bins, 1))\n", "\n", "# convolutional layers: n_filters, kernel size, **kwargs\n", "x = layers.Conv2D(8, 4, activation=activation,\n", " padding=padding)(inputs)\n", "x = layers.Conv2D(8, 4, activation=activation,\n", " padding=padding)(x)\n", "x = layers.MaxPooling2D(pool_size=(2, 2))(x)\n", "x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)\n", "x = layers.Conv2D(8, 4, activation=activation, padding=padding)(x)\n", "x = layers.Flatten()(x)\n", "x = layers.Dense(64, activation=activation)(x)\n", "x = layers.Dense(64, activation=activation)(x)\n", "x = layers.Dense(64, activation=activation)(x)\n", "output = layers.Dense(2, activation=\"softmax\")(x)\n", "\n", "model = keras.models.Model(inputs, output)\n", "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=\"adam\", metrics=[\"accuracy\"])\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "cell_style": "split", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "\n", "model = keras.Sequential([\n", "# convolutional layers: n_filters, kernel size, **kwargs\n", " layers.Conv2D(8, 4, activation=activation, padding=padding,\n", " input_shape=(n_bins, n_bins, 1)),\n", " layers.Conv2D(8, 4, activation=activation, padding=padding),\n", " layers.MaxPooling2D(pool_size=(2, 2)),\n", " layers.Conv2D(8, 4, activation=activation, padding=padding),\n", " layers.Conv2D(8, 4, activation=activation, padding=padding),\n", " layers.Flatten(),\n", " layers.Dense(64, 
activation=activation),\n", " layers.Dense(64, activation=activation),\n", " layers.Dense(64, activation=activation),\n", " layers.Dense(2, activation=\"softmax\")\n", "])\n", "\n", "model.compile(loss=\"sparse_categorical_crossentropy\",\n", " optimizer=\"adam\", metrics=[\"accuracy\"])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The Sequential definition is slightly simpler, but the functional definition gives access to the intermediate layers of the model (and also supports models with several inputs or outputs). Both cells build the same network; whichever cell runs last defines `model`.\n", "\n", "A very helpful feature in Keras is the model summary, which lists the output shape and the number of parameters of each layer:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "conv2d_4 (Conv2D) (None, 40, 40, 8) 136 \n", "_________________________________________________________________\n", "conv2d_5 (Conv2D) (None, 40, 40, 8) 1032 \n", "_________________________________________________________________\n", "max_pooling2d_1 (MaxPooling2 (None, 20, 20, 8) 0 \n", "_________________________________________________________________\n", "conv2d_6 (Conv2D) (None, 20, 20, 8) 1032 \n", "_________________________________________________________________\n", "conv2d_7 (Conv2D) (None, 20, 20, 8) 1032 \n", "_________________________________________________________________\n", "flatten_1 (Flatten) (None, 3200) 0 \n", "_________________________________________________________________\n", "dense_4 (Dense) (None, 64) 204864 \n", "_________________________________________________________________\n", "dense_5 (Dense) (None, 64) 4160 \n", "_________________________________________________________________\n", "dense_6 (Dense) (None, 64) 4160 \n", "_________________________________________________________________\n", "dense_7 (Dense) (None, 2) 130 \n", 
"=================================================================\n", "Total params: 216,546\n", "Trainable params: 216,546\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Now we train the model. Training stops early if the loss on the validation set does not improve for 2 consecutive epochs, and the weights from the best epoch are then restored." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 14000 samples, validate on 6000 samples\n", "Epoch 1/20\n", "14000/14000 [==============================] - 21s 1ms/sample - loss: 0.3727 - acc: 0.8469 - val_loss: 0.3028 - val_acc: 0.8847\n", "Epoch 2/20\n", "14000/14000 [==============================] - 20s 1ms/sample - loss: 0.2967 - acc: 0.8800 - val_loss: 0.2802 - val_acc: 0.8910\n", "Epoch 3/20\n", "14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2700 - acc: 0.8874 - val_loss: 0.2505 - val_acc: 0.8988\n", "Epoch 4/20\n", "14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2532 - acc: 0.8945 - val_loss: 0.2453 - val_acc: 0.8980\n", "Epoch 5/20\n", "14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2448 - acc: 0.8978 - val_loss: 0.2382 - val_acc: 0.9057\n", "Epoch 6/20\n", "14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2313 - acc: 0.9025 - val_loss: 0.2395 - val_acc: 0.9023\n", "Epoch 7/20\n", "14000/14000 [==============================] - 21s 2ms/sample - loss: 0.2211 - acc: 0.9082 - val_loss: 0.2331 - val_acc: 0.9057\n", "Epoch 8/20\n", "14000/14000 [==============================] - 21s 1ms/sample - loss: 0.2099 - acc: 0.9124 - val_loss: 0.2526 - val_acc: 0.8980\n", "Epoch 9/20\n", "14000/14000 [==============================] - 21s 
2ms/sample - loss: 0.2039 - acc: 0.9154 - val_loss: 0.2464 - val_acc: 0.9010\n" ] } ], "source": [ "# stop when val_loss has not improved for 2 epochs and restore the best weights\n", "# note: validation_split=0.3 takes the last 30% of the arrays, before shuffling\n", "early_stopper = keras.callbacks.EarlyStopping(monitor=\"val_loss\", patience=2, mode=\"auto\", restore_best_weights=True)\n", "history = model.fit(data, labels, batch_size=50, epochs=20, shuffle=True, validation_split=0.3, callbacks=[early_stopper])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(history.history[\"loss\"], label=\"training loss\")\n", "plt.plot(history.history[\"val_loss\"], label=\"validation loss\")\n", "plt.xlabel(\"epoch\")\n", "plt.ylabel(\"loss\")\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "A note on metrics (loss, accuracy, ...) in Keras:\n", "\n", "* Training metrics are averaged over all batches of an epoch, i.e. while the weights are still being updated, and with dropout active (if used)\n", "\n", "* Validation metrics are evaluated once at the end of an epoch, using the weights reached at that point\n", "\n", "$\\quad \\rightarrow$ This is why the training loss can be larger than the validation loss" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Target: a validation accuracy of 0.93 when trained on the full dataset" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Open Part\n", "\n", "
\n", "\n", "### *Time for questions*\n", "\n", "
\n", "\n", "Otherwise: explore the top-tagging dataset. Some inspiration:\n", "* How helpful is each part of the preprocessing?\n", "* How does the performance scale with the amount of training data?\n", "* What can be achieved with a fully connected network?\n", "* Can you find better hyperparameters for the architecture?\n", "* Any other ideas to improve the performance?\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Coffee\n", "\n", "![](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=coffee.JPG)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" }, "rise": { "backimage": "https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fintro%2Fimages&files=logo.png", "controls": false, "footer": "3rd IML Workshop
Rath | 18/04/2019 | Introduction to the Basics of Deep Learning", "scroll": true } }, "nbformat": 4, "nbformat_minor": 2 }