{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "pycharm": { "name": "#%% md\n" } }, "source": [ "# Tutorial: Generative Adversarial Networks - Advanced Techniques\n", "### - *Jonas Glombitza, RWTH Aachen University, IML 2019, CERN*\n", "This tutorial is about _generative models_ and especially **Generative Adversarial Networks** (**GANs**).\n", "In this tutorial we will implement different types of GANs, which were proposed recently:\n", "- Vanilla GAN - https://arxiv.org/abs/1406.2661\n", "- Conditional GAN - https://arxiv.org/abs/1610.09585\n", "- Wasserstein GAN (WGAN-GP) - https://arxiv.org/abs/1704.00028\n", "- Spectral Normalization SN-GAN - https://arxiv.org/abs/1802.05957\n", "\n", "and learn about further techniques to stabilize the training of GANs. (DCGANs, conditioning of the generator ...).\n", "To train our generative models, we will have a look on three different data sets (1 from computer vision, 2 physics data sets):\n", "1. [ CIFAR10 ](https://www.cs.toronto.edu/~kriz/cifar.html)\n", "2. [ Footprints of Cosmic Ray induced Air Showers ](https://link.springer.com/article/10.1007/s41781-018-0008-x)\n", "3. [ Electromagnetic Calorimeter Images (multi-layer) ](https://doi.org/10.1007/s41781-018-0019-7)\n", "\n", "As framework, we make use of [TensorFlow](https://www.tensorflow.org/) and especially:\n", "- [Keras](https://keras.io/): Keras API shipped with TensorFlow\n", "- [TensorFlow-GAN](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gan): lightweight library for training GANs\n", "\n", "#### Table of contents\n", "1. [ Basics ](#basics)\n", "\n", "2. [ Generative Adversarial Networks ](#gan)\n", " 1. [ Theory ](#gan_theory)\n", " 2. [ CIFAR10: Example](#gan_code)\n", " 3. [Implementation](#gan_code)\n", " \n", "3. [ Wasserstein GANs ](#wgan)\n", " 1. [ Theory ](#wgan_theory)\n", " 2. [ CIFAR10: Example](#wgan_code)\n", " 3. [ Results](#wgan_result)\n", " \n", "4. 
[ Spectral Normalization for GANs ](#sngan)\n", " 1. [ Theory ](#sngan_theory)\n", " 2. [ Physics Example: Cosmic Ray induced Air Showers](#sngan_code)\n", " 3. [ Results](#sngan_result)\n", " \n", "5. [ Calorimeter Images](#calgan)\n", " 1. [ Generator conditioning ](#calgan_theory)\n", " 2. [ Physics Example: Calorimeter images](#calgan_code)\n", " 3. [ Results](#calgan_result)\n", " \n", " \n", "\n", "# Basics\n", " \n", "\n", "## Generative Models\n", "Before we jump into the implementation of several GANs, we need to introduce _Generative Models_.\n", "\n", "Let us assume we have a bunch of images which form the distribution of real images $P_{r}$.\n", "In our case (the CIFAR10 dataset) the distribution consists of several classes: `horse, airplane, bird, frog, truck, deer, cat, dog, car, ship`.\n", "Instead of training a classifier to label our data, we would now like to generate samples which are \n", "similar to the samples in the distribution $P_{r}$ formed by this bunch of images.\n", "The key point is that we would like to generate **new samples** which were **not part** of the dataset but look very similar.\n", " \n", "![CIFAR 10 images](https://cernbox.cern.ch/index.php/s/xDYiSmbleT3rip4/download?path=%2Fgan%2Fimages&files=CIFAR10_collection.png)\n", "\n", "In mathematical terms, we would like to approximate the real distribution $P_{r}$ with a model $P_{\theta}$.\n", "With this *generative* model, we would then like to generate new samples ($x \sim P_{\theta}$).\n", "\n", "\n", "## Generative Adversarial Networks\n", "The basic idea of Generative Adversarial Networks (GANs) is to train a **generator network** to learn the underlying data distribution.[1](#myfootnote1)\n", "In other words, we would like to design a generator machine that we can feed with noise and that outputs samples following\n", "the distribution of real images $P_{r}$, but which were not part of the training dataset.\n", "So in our case, we would like to generate new 
samples of airplanes, cars, dogs, etc.\n", " \n", " \n", "\n", "In our setup we use a neural network $G(z)$ as the generator machine. \n", "The generator network $G(z)$ gets as input a noise vector $z$ sampled from a multidimensional noise distribution $z \sim p(z)$.\n", "The space of $z$ is often called the latent space. The generator should then map the noise vector $z$ into the data space (the space where our\n", "real data samples lie) and output new samples/images $\tilde{x} \sim G(z)$. We would like these $\tilde{x}$ to be very similar to the real samples $x \sim P_{r}$.\n", "\n", "To train our generator network we need feedback on whether the generated samples are of good or bad quality.\n", "Because a classical supervised loss cannot give good feedback to the generator network, it is trained in an unsupervised manner.\n", "So instead of using \"mean squared error\" or similar metrics, the performance measure is given by a **second** _adversarial_ neural network, called the _discriminator_.\n", "In the vanilla GAN setup, the quality of the generated samples is measured by a classifier which is trained\n", "to discriminate between fake `class=0` (generated by our generator network) and real images `class=1` (from our bunch of images).\n", "\n", "The key point is that the generator should try to fool the discriminator.\n", "By adapting the generator weights, the discriminator should fail to identify the fake images and should output `class=1` (real image)\n", "when generated images are fed into it. It is crucial that we can get this feedback directly by stacking the discriminator on top\n", "of the generator to build our GAN framework. 
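\n", "\n", "In Keras, this stacking can be sketched as follows. This is a minimal illustration with toy fully connected networks; `latent_dim`, the layer sizes, and the optimizer settings are assumptions for this sketch, not the architectures we build later:\n", "\n", "```python\n", "import numpy as np\n", "from tensorflow import keras\n", "\n", "latent_dim = 64  # assumption: dimensionality of the noise vector z\n", "\n", "# toy generator: maps a noise vector z to a (32, 32, 3) image\n", "generator = keras.Sequential([\n", "    keras.Input(shape=(latent_dim,)),\n", "    keras.layers.Dense(128, activation=\"relu\"),\n", "    keras.layers.Dense(32 * 32 * 3, activation=\"tanh\"),\n", "    keras.layers.Reshape((32, 32, 3)),\n", "])\n", "\n", "# toy discriminator: classifies images as real (1) or fake (0)\n", "discriminator = keras.Sequential([\n", "    keras.Input(shape=(32, 32, 3)),\n", "    keras.layers.Flatten(),\n", "    keras.layers.Dense(128),\n", "    keras.layers.LeakyReLU(0.2),\n", "    keras.layers.Dense(1, activation=\"sigmoid\"),\n", "])\n", "discriminator.compile(optimizer=\"adam\", loss=\"binary_crossentropy\")\n", "\n", "# stack the discriminator on top of the generator; freeze the\n", "# discriminator so a generator update only changes the generator weights\n", "discriminator.trainable = False\n", "stacked = keras.Sequential([generator, discriminator])\n", "stacked.compile(optimizer=\"adam\", loss=\"binary_crossentropy\")\n", "\n", "# generator update: label the generated images as \"real\" (class=1),\n", "# so the gradient pushes the generator towards fooling the discriminator\n", "z = np.random.normal(size=(16, latent_dim))\n", "stacked.train_on_batch(z, np.ones((16, 1)))\n", "```\n", "\n", "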
Because both are neural networks, we can simply propagate the gradient \n", "through the discriminator to the generator, which can then adapt its weights to generate samples of better quality.\n", "In simple words, the generator should change its weights in such a way that the discriminator identifies\n", "the generated samples as real images.\n", "By iteratively adapting the weights of the discriminator and generator, the quality of the generated images should increase step by step.\n", "This is the fascinating idea of _adversarial training_.\n", "\n", "\n", "\n", "\n", "In a very figurative sense, the discriminator could be seen as a painter who is able to distinguish between\n", "real and fake images because he knows a bit about colors and art. We would now like to build a generator machine\n", "which produces nice photographs.\n", "The idea is to fool the painter (discriminator) by changing the parameters of our machine.\n", "Because the painter knows what \"real images\" look like, his feedback helps us to modify the generator machine in such a way that it will produce images of better quality.\n", "\n", "\n", "### Adversarial training\n", "After introducing the basic idea of adversarial training, let us now understand the math and focus on the algorithm itself.\n", " \n", "Our adversarial framework consists of 2 networks:\n", "- the generator network $G$ (learns the mapping from noise $z$ to images $\tilde{x}$)\n", "- the discriminator network $D$ (measures the image quality by discriminating whether images are real `class=1` or fake `class=0`)\n", "\n", "The iterative update procedure of the framework is as follows:\n", " 1. Discriminator update: train the discriminator to classify between fake and real images\n", " 2. 
Generator update: train the generator to fool the discriminator\n", " - Repeat from the beginning\n", "\n", "\n", "#### Discriminator update\n", "Sample noise vectors from the latent space $z \sim p(z)$.\n", "Feed the sampled vectors $z$ into the generator $G(z)$ to obtain new fake samples $\tilde{x} \sim G(z)$.\n", "Subsequently, we sample from the real distribution and obtain a bunch of real samples $ x \sim P_{r}$.\n", "We now train the discriminator using the binary cross entropy by changing the weights $w$ of the discriminator:\n", "\n", "$ \mathcal{L}_{Dis} = \min_{w} \left( -\mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} [\log(D_w(\mathbf{x}))] - \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})} [\log(1-D_w(G_{\theta}(\mathbf{z})))] \right)$\n", "\n", "This is just the typical supervised training of a classifier.\n", "\n", "\n", "#### Generator update\n", "First, we sample noise from the latent space $z \sim p(z)$.\n", "We feed the noise into the generator $G(z)$ and obtain a bunch of generated samples $\tilde{x} \sim G(z)$.\n", "Now we freeze the weights $w$ of the discriminator.\n", "Finally, we train the generator to fool the discriminator by adapting the weights $\theta$ of our generator network:\n", "\n", "$ \mathcal{L}_{Gen} = \max_{\theta} \left( -\mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})} [\log(1-D_w(G_{\theta}(\mathbf{z})))] \right)$\n", "\n", "By iteratively repeating these discriminator and generator updates, we train our framework.\n", "\n", "### Architectures\n", "Building powerful discriminator and generator architectures is crucial for successful GAN training.\n", "Remember that the generator maps from the latent space into the data space.\n", "Hence, the input should be a 1D vector of noise variables and the output should have image dimensions.\n", "In this case the output will have the dimension `(32, 32, 3)`. 
(3 color channels: RGB)\n", "\n", "[Radford, Metz, and Chintala](https://arxiv.org/abs/1511.06434) proposed stable architectures for generator and discriminator networks.\n", "These DCGAN \"guidelines\" can be summarized as follows:\n", "- Replace fully connected layers with convolutional layers\n", "- Do not use pooling layers, use striding instead\n", "- Make use of batch normalization in generator and discriminator to stabilize training[2](#myfootnote2)\n", "- Use [LeakyReLU](https://arxiv.org/pdf/1505.00853.pdf) activation in discriminator for better feedback[3](#myfootnote3)\n", "- Use a pyramidal topology in the generator by using transposed convolutions, to support a simple and structured latent space\n", "\n", "\n", "
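\n", "As a sketch of these guidelines, a DCGAN-style generator and discriminator for `(32, 32, 3)` images could look as follows. The layer widths, kernel sizes, and `latent_dim` are illustrative assumptions, not the exact architectures used below:\n", "\n", "```python\n", "import numpy as np\n", "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "\n", "latent_dim = 100  # assumption: dimensionality of the noise vector z\n", "\n", "# generator: project z to a small feature map, then upsample with strided\n", "# transposed convolutions and batch normalization (pyramidal topology)\n", "generator = keras.Sequential([\n", "    keras.Input(shape=(latent_dim,)),\n", "    layers.Dense(4 * 4 * 128),\n", "    layers.Reshape((4, 4, 128)),\n", "    layers.BatchNormalization(),\n", "    layers.ReLU(),\n", "    layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding=\"same\"),  # -> (8, 8, 64)\n", "    layers.BatchNormalization(),\n", "    layers.ReLU(),\n", "    layers.Conv2DTranspose(32, kernel_size=4, strides=2, padding=\"same\"),  # -> (16, 16, 32)\n", "    layers.BatchNormalization(),\n", "    layers.ReLU(),\n", "    layers.Conv2DTranspose(3, kernel_size=4, strides=2, padding=\"same\", activation=\"tanh\"),  # -> (32, 32, 3)\n", "])\n", "\n", "# discriminator: strided convolutions instead of pooling, LeakyReLU activations\n", "discriminator = keras.Sequential([\n", "    keras.Input(shape=(32, 32, 3)),\n", "    layers.Conv2D(32, kernel_size=4, strides=2, padding=\"same\"),\n", "    layers.LeakyReLU(0.2),\n", "    layers.Conv2D(64, kernel_size=4, strides=2, padding=\"same\"),\n", "    layers.LeakyReLU(0.2),\n", "    layers.Flatten(),\n", "    layers.Dense(1, activation=\"sigmoid\"),\n", "])\n", "\n", "z = np.random.normal(size=(2, latent_dim))\n", "fake = generator.predict(z, verbose=0)\n", "print(fake.shape)  # (2, 32, 32, 3)\n", "```\n", "\n", "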