{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### HW 1: Getting started with numpy, matplotlib, pandas and Kaggle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Total: 25 pts__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start date: Tuesday Sept. 3
\n", "Due date: Tuesday Sept. 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you don't already have Anaconda installed, start by downloading and installing it (see for example [here](https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/)). When working on the exercises below, keep in mind that extensive Python documentation is available online. Don't hesitate to check the documentation and examples for the functions you want to use. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__1. (4pts) Numerical Linear Algebra: NumPy__\n", "\n", "- Start by building a 10 by 10 matrix of random Gaussian entries. Then compute the two eigenvalues of largest magnitude (since the matrix is generally not symmetric, its eigenvalues may be complex, so compare them by absolute value)\n", "- Reshape the matrix that you built above first into a 2 by 50 array (call it $v$), then into a single vector (call it $w$). Return the vector obtained by sorting the elements of $w$ in descending order\n", "- Generate two random vectors of the same length (you can choose the distribution you use to generate the entries). Let us call those vectors $v_1$ and $v_2$. Stack those vectors vertically, then horizontally, and store the respective results in two arrays $A$ and $B$.\n", "- Do the same with two random arrays $C_1 \\in \\mathbb{R}^{n\\times n}$ and $C_2 \\in \\mathbb{R}^{n\\times n}$. Store the vertically and horizontally stacked results in the variables $Cv$ and $Ch$, respectively" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# put your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__2. (2pts) Towards multiclass classification: one-hot encoding__\n", "\n", "- Generate a vector (let us call it $v$) of integers taking values between 0 and 9. 
\n", "- Then build the one-hot encoding of each entry in $v$. A one-hot encoding represents each categorical value (here, the digits 0 to 9 in $v$) by a binary vector in which exactly one entry, the one corresponding to the encoded digit, is 1 and all other entries are 0. For example, with 10 classes the digit 3 is encoded as $(0,0,0,1,0,0,0,0,0,0)$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# put your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__3. (6pts) Towards regression: sampling and matplotlib__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__3a. (2pts) One-dimensional__ In this exercise, we will successively generate points according to a function, sample noisy pairs $(t, f)$ around that function and plot the results\n", "\n", "- Using the _linspace( )_ function from NumPy, generate $1000$ pairs $(t, f(t))$ with $f(t) = \\frac{1}{1+e^{-t}}$ for values of $t$ between $-6$ and $6$. What does the function look like? \n", "- Generate 100 random pairs $(t_i, f_i)$ from the curve. Then, using matplotlib, plot the points $(t_i, f_i)$ on top of the line $(t, f(t))$ (you can choose how you randomly generate the points)\n", "- From the pairs " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# put your code here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__3b. (4pts) The two-dimensional hyperplane__\n", "\n", "- As an extension of the previous case, we now want to generate triples $(x, y, t)$ according to the following hyperplane: \n", "\n", "$$t \\equiv \\pi(x, y) = x + y + 1$$\n", "\n", "Using _Axes3D_, _matplotlib_ and _pyplot_, as well as the _meshgrid( )_ and _arange( )_ functions from NumPy and the _plot_surface( )_ and _scatter( )_ methods of a 3D axes, proceed as follows:\n", "\n", "- Generate a regular grid of points $(x, y)$ covering the domain $[-20,20]\\times [-20,20]$, say 200 by 200 points. 
\n", "- As in the 1D case, we now want to generate noisy samples that lie on the plane on average. Start by generating $50\\times 50$ triples $(x, y, \\pi(x, y))$ covering the domain $[-20,20]\\times [-20,20]$. \n", "- Perturb the $50\\times 50$ triples by adding random Gaussian noise of small amplitude to them (for example, with a standard deviation no larger than $0.1$)\n", "- Finally, using the _scatter( )_ method of the 3D axes, plot the noisy samples on top of the plane. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# put your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__4. (3pts) Getting started with pandas and Kaggle datasets__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__4a__ Download the car dataset from [Kaggle](https://www.kaggle.com/toramky/automobile-dataset/downloads/automobile-dataset.zip/2) and open it with pandas. \n", "\n", "- Display a few (5-10) rows of the pandas DataFrame. \n", "- Find the brand that has the highest average price across its cars\n", "- Sort the cars by horsepower and return the corresponding pandas DataFrame. Display the first 10 rows of the sorted frame.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# put your code here. Don't hesitate to check the online \n", "# documentation of the pandas library\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }