{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

ING3 GEE: Homework Assignment

\n", "\n", "

Gradient descent optimization with LASSO and autonomous driving

\n", " \n", "
Given date: Tuesday Nov 22
\n", "\n", "
Due date: Friday December 13
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 1. (5pts) Local vs global minima and gradient descent\n", "\n", "We consider the following function.\n", "\n", "\\begin{align}\n", "F(x_1, x_2) = 3(1-x_1)^2\\exp(-(x_1^2) - (x_2+1)^2)\\\\ \n", " - 10(x_1/5 - x_1^3 - x_2^5)\\exp(-x_1^2-x_2^2)\\\\\n", " - (1/3)\\exp(-(x_1+1)^2 - x_2^2)\n", "\\end{align}\n", "\n", "The surface plot of this function is given below together with its contour plot. The function has a single global minimum located near $(0.23, -1.62)$, shown in red in the contour plot.\n", "\n", "We want to implement gradient descent iterations on that function. Starting from a random initial point $(x_1^{(0)}, x_2^{(0)})$, code the following updates \n", "\n", "\\begin{align}\n", "x_1^{(k+1)} = x_1^{(k)} - \\eta \\, \\text{grad}_{x_1} F(x_1^{(k)}, x_2^{(k)})\\\\\n", "x_2^{(k+1)} = x_2^{(k)} - \\eta \\, \\text{grad}_{x_2} F(x_1^{(k)}, x_2^{(k)})\n", "\\end{align}\n", "\n", "where $\\text{grad}_{x_i}$ denotes the partial derivative of $F(x_1, x_2)$ with respect to $x_i$, evaluated at the current iterate, and $\\eta$ is the learning rate. Choose a sufficiently small learning rate and plot the iterates (in white) on the contour plot. Repeat your experiments for various initial points. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from mpl_toolkits.mplot3d import Axes3D \n", "\n", "import matplotlib.pyplot as plt\n", "from matplotlib import cm\n", "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", "import numpy as np\n", "\n", "\n", "fig = plt.figure()\n", "# fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases\n", "ax = fig.add_subplot(projection='3d')\n", "\n", "# Make data.\n", "x = np.linspace(-3, 3, 100)\n", "y = np.linspace(-3, 3, 100)\n", "x1, x2 = np.meshgrid(x, y)\n", "F = 3*(1-x1)**2 * np.exp(-(x1**2) - (x2+1)**2)\\\n", " - 10*(np.true_divide(x1,5) - x1**3 - x2**5)*np.exp(-x1**2 - x2**2)\\\n", " - np.true_divide(1,3)*np.exp(-(x1+1)**2 - x2**2)\n", "\n", "# Plot the surface.\n", "surf = ax.plot_surface(x1, x2, F, linewidth=0, alpha=1, cmap = 'viridis')\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Filled contour plot of F with the global minimum marked in red\n", "fig1, ax = plt.subplots(constrained_layout=True)\n", "contour = ax.contourf(x1, x2, F,cmap = 'viridis')\n", "plt.scatter(0.23, -1.62,c='r',marker='X')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# put your solution here\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 2. Solving LASSO (10pts)\n", "\n", "Learning a model through the OLS loss can be done very efficiently through either gradient descent or the normal equations. The same is true for ridge regression. For LASSO however, the non-differentiability of the absolute value at $0$ makes learning trickier.\n", "\n", "\n", "One approach, known as _ISTA (Iterative Shrinkage-Thresholding Algorithm)_, consists in combining traditional gradient descent steps on the least squares term with a soft-thresholding (shrinkage) step that accounts for the $\\ell_1$ penalty. Concretely, the LASSO objective is \n", "\n", "\\begin{align}\n", "\\ell(\\boldsymbol \\beta) = \\|\\boldsymbol X\\boldsymbol \\beta - \\boldsymbol t\\|^2_2 + \\lambda \\|\\boldsymbol \\beta\\|_1\n", "\\end{align}\n", "\n", "where the data has been centered so that $\\beta_0 = 0$. 
That is,\n", "\\begin{align}\n", " \\mathbf{x}^{(i)} \\leftarrow \\mathbf{x}^{(i)}- \\frac{1}{N}\\sum_{j=1}^{N} \\mathbf{x}^{(j)}\\\\\n", "t^{(i)} \\leftarrow t^{(i)} - \\frac{1}{N}\\sum_{j=1}^N t^{(j)}\n", "\\end{align}\n", "\n", "The ISTA update takes the form \n", "\n", "\\begin{align}\n", "\\boldsymbol \\beta^{k+1} \\leftarrow \\mathcal{T}_{\\lambda \\eta} (\\boldsymbol \\beta^{k} - 2\\eta \\mathbf{X}^T(\\mathbf{X}\\boldsymbol \\beta^{k} - \\mathbf{t}))\n", "\\end{align}\n", "\n", "where $\\mathcal{T}_{\\lambda \\eta}(\\boldsymbol \\beta)_i$ is the soft-thresholding operator defined component-wise as\n", "\n", "\\begin{align}\n", "\\mathcal{T}_{\\lambda \\eta}(\\boldsymbol \\beta)_i = (|\\beta_i| - \\lambda \\eta)_+ \\text{sign}(\\beta_i)\n", "\\end{align}\n", "\n", "In the equations above, $\\eta$ is an appropriate step size and $(x)_+ = \\max(x, 0)$. \n", "\n", "##### Question 2.1. (6pts)\n", "\n", "Complete the function 'ISTA' below, which must return a final estimate for the regression vector $\\boldsymbol \\beta$ given an initial guess, a feature matrix $\\mathbf{X}$, a target vector $\\mathbf{t}$, a regularization weight $\\lambda$ and a learning rate $\\eta$. The function should include the centering steps for $\\mathbf{x}^{(i)}$ and $t^{(i)}$. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def ISTA(beta_init, X, t, lbda, eta): \n", " \n", " '''The function takes as input an initial guess for beta, a set \n", " of feature vectors stored in X and their corresponding \n", " targets stored in t, a regularization weight lbda, \n", " step size parameter eta and must return the \n", " regression vector following from the minimization of \n", " the LASSO objective''' \n", " \n", " # put your solution here\n", " \n", " return beta " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Question 2.2. 
(4pts)\n", "\n", "Apply your algorithm to the data (in red) given below for polynomial features up to degree 6 and for various values of $\\lambda$. Display the result on top of the true model (in blue). Note that for $\\beta_0$ to be identically zero in the model including the higher degree features, the centering should be done after generating those features." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np \n", "import matplotlib.pyplot as plt \n", "from math import sqrt\n", "import numpy as np\n", "from scipy import linalg\n", " \n", "x = np.linspace(0,1,10) \n", "xtrue = np.linspace(0,1,100) \n", "t_true = 0.1 + 1.3*xtrue \n", " \n", "t = 0.1 + 1.3*x \n", " \n", "tnoisy = t+np.random.normal(0,.1,len(x)) \n", " \n", "\n", "plt.scatter(x, tnoisy, c='r') \n", "plt.plot(xtrue, t_true) \n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 3: Convolutional Neural Network and Autonomous Driving (10pts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this third part, we will use [the Keras API](https://keras.io/) to build and train a convolutional neural network to discriminate between four types of road signs. To simplify we will consider the detection of 4 different signs: \n", "\n", "- A '30 km/h' sign (folder 1)\n", "- A 'Stop' sign \n", "- A 'Go straight' sign\n", "- A 'Keep left' sign \n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of each sign is given below." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "\n", "img1 = mpimg.imread('1/00001_00000_00012.png')\n", "plt.subplot(141)\n", "plt.imshow(img1)\n", "plt.axis('off')\n", "plt.subplot(142)\n", "img2 = mpimg.imread('2/00014_00001_00019.png')\n", "plt.imshow(img2)\n", "plt.axis('off')\n", "plt.subplot(143)\n", "img3 = mpimg.imread('3/00035_00008_00023.png')\n", "plt.imshow(img3)\n", "plt.axis('off')\n", "plt.subplot(144)\n", "img4 = mpimg.imread('4/00039_00000_00029.png')\n", "plt.imshow(img4)\n", "plt.axis('off')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question 3.1 (5pts) \n", " \n", "- Before building the network, you should start by cropping the images so that they all have a common predefined size (take the smallest size across all images) \n", "\n", "- We will use a __Sequential model__ from Keras but it will be up to you to define the structure of the convolution net. Initialization of the sequential model can be done with the following line " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = Sequential()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1.a. Convolutions. \n", "\n", "- We will use a __convolutional__ architecture. 
You can add convolutional layers to the model by using the following lines " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Conv2D\n", "\n", "model.add(Conv2D(num_units, (filter_size1, filter_size2), padding='same',\n", "                 input_shape=(IMG_SIZE, IMG_SIZE,3),\n", "                 activation='relu'))\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "for the first layer and " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.add(Conv2D(filters, filter_size, activation=activation))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "for all the others. Here 'filters' indicates the number of filters you want to use in the convolutional layer, filter_size is the size of each filter and activation is the usual activation that comes on top of the convolution, i.e.\n", "$x_{\\text{out}} = \\sigma(\\text{filter}*\\text{input})$. Finally, input_shape (used for the first layer only) indicates the size of your input. Note that only the input layer should be given the input size. Subsequent layers will automatically compute the size of their inputs based on previous layers. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1.b Pooling Layers \n", "\n", "On top of the convolutional layers, convolutional neural networks (CNN) also often rely on __Pooling layers__. The addition of such a layer can be done through the following line " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import MaxPooling2D\n", "\n", "model.add(MaxPooling2D(pool_size=(filter_sz1, filter_sz2),strides=None))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The _pooling layers_ usually come with two parameters: the 'pool size' and the 'stride' parameter. The basic choice for the pool size is (2,2) and the stride is usually set to None (in which case it defaults to the pool size, so that the image is split into non-overlapping regions, as in the figure below). You should however feel free to play a little with those parameters. 
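
To make the non-overlapping case concrete, here is a minimal NumPy sketch of max pooling with a (2, 2) pool and a stride equal to the pool size. It is not Keras code: the helper name max_pool_2d and the 4x4 input are made up for illustration, and the sketch assumes that border rows and columns which do not fill a complete window are simply dropped.

```python
import numpy as np

def max_pool_2d(image, pool_size=(2, 2)):
    # Hypothetical helper: non-overlapping max pooling on a 2D array
    # (stride equal to the pool size, incomplete border windows dropped).
    ph, pw = pool_size
    h, w = image.shape
    h_out, w_out = h // ph, w // pw
    # Keep only the part of the image covered by complete windows.
    trimmed = image[:h_out * ph, :w_out * pw]
    # Axes 1 and 3 index the position inside each window.
    windows = trimmed.reshape(h_out, ph, w_out, pw)
    return windows.max(axis=(1, 3))

# Made-up 4x4 input, split into four 2x2 blocks.
image = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 1],
                  [0, 0, 5, 6],
                  [1, 2, 7, 8]])
pooled = max_pool_2d(image)
print(pooled)  # the four block maxima are 4, 2, 2 and 8
```

Each entry of the 2x2 output summarizes one 2x2 block of the input, so the output has a quarter as many entries as the input.
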
The __MaxPool operator__ considers a mask of size 'pool_size' which is slid over the image by a number of pixels equal to the stride parameter (in x and y, there are hence two translation parameters). For each position of the mask, the output only retains the max of the pixels appearing in the mask (this idea is illustrated below). One way to understand the effect of the pooling operator is that if the filter detects an edge in a subregion of the image (thus returning at least one large value), then although max pooling reduces the size of the feature map, it keeps track of this information. \n", "\n", "Adding 'MaxPooling' layers is known to work well in practice. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although it is largely up to you to decide how you want to structure the network, a good start is to add a couple (definitely not exceeding 4) of (convolution, convolution, pooling) combinations with an increasing number of units (for example successive powers of two such as 16, 32, 64, ...). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1.c. Flattening and Fully connected layers\n", "\n", "Once you have stacked the convolutional and pooling layers, you should flatten the output through a line of the form" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Flatten\n", "\n", "model.add(Flatten())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then add a couple (no need for more than 2 or 3) of dense fully connected layers through lines of the form" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Dense\n", "\n", "model.add(Dense(num_units, activation='relu'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1.d. Concluding \n", "\n", "Since there are four possible signs, you need to __finish your network with a dense layer with 4 units__. 
Together, those units should output four numbers $p_1, p_2, p_3, p_4$ between 0 and 1, where $p_i$ represents the likelihood that sign $i$ is detected, such that $p_1 + p_2 + p_3 + p_4 = 1$ (hopefully with one probability much larger than the others). For this reason, a good choice for the __final activation function__ of those four units is the __softmax__ (Why?). \n", "\n", "\n", "Build your model below. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.models import Sequential\n", "\n", "model = Sequential()\n", "\n", "# construct the model using convolutional, pooling, flatten and dense fully connected layers\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question 3.2 (3pts). Setting up the optimizer\n", "\n", "Once you have found a good architecture for your network, split the dataset by retaining about 90% of the images of each folder for training and 10% for testing. To train your network in Keras, we need two more steps. The first step is to set up the optimizer. Here again it is largely up to you to decide how you want to set up the optimization. Two popular approaches are __SGD and ADAM__. You will get to choose the learning rate. This rate should however be between 1e-3 and 1e-2. Once you have set up the optimizer, we need to set up the optimization parameters. This includes the loss (we will take it to be the __categorical cross entropy__, which is the extension of the log loss to the multiclass problem)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.optimizers import SGD\n", "from tensorflow.keras.optimizers import Adam\n", "\n", "# set up the optimizer here (choose a learning rate between 1e-3 and 1e-2)\n", "# Myoptimizer = SGD(learning_rate=...)\n", "# Myoptimizer = Adam(learning_rate=...)\n", "\n", "model.compile(loss='categorical_crossentropy',\n", "              optimizer=Myoptimizer,\n", "              metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Question 3.3 (2pts). 
Optimization\n", "\n", "The last step is to fit the network to your data. Just as with any estimator in scikit-learn, we use a call to the function 'fit'. The training of neural networks can be done by splitting the dataset into minibatches and using a different batch at each SGD step. This process is repeated until the whole dataset has been used. A complete pass over the dataset is called an epoch. We can then repeat this for several epochs. In Keras, the number of epochs is stored in the 'epochs' parameter and the batch size is stored in the 'batch_size' parameter. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = '''set the size of the batch here'''\n", "epochs = '''set number of epochs here'''\n", "\n", "model.fit(X, t,batch_size=batch_size,epochs=epochs, validation_split=0.2)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }