{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HW 1: Getting started with numpy, matplotlib, pandas and Kaggle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Total: 25 pts__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start date: Tuesday Sept. 3
\n",
"Due date: Tuesday Sept. 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't already have a version of anaconda installed, start by downloading anaconda and installing it (see for example [here](https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/)). When working on the exercises below, keep in mind that there exists a rich python documentation online. Don't hesitate to check the documentation and examples related to the functions you want to use. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__1. (4pts) Numerical Linear Algebra: Numpy__\n",
"\n",
"- Start by building a 10 by 10 matrix of random Gaussian entries. Then compute the two largest eigenvalues of the matrix\n",
"- Reshape the matrix that you built above into a 2 by 50 array (call it $v$) first and into a single vector then (call it 'w'). Return the vector obtained by sorting the elements of $w$ in descending order\n",
"- Generate two random vectors (you can choose the distribution you use to generate the entries). Let us call those vectors $v1$ and $v2$. Stack those vectors vertically then horizontally. Store the respective results in two matrices $A$ and $B$.\n",
"- Do the same with two random arrays $C_1 \\in \\mathbb{R}^{n\\times n}$ and $C_2^{n\\times n}$. Store the results in the variables $Cv$ and $Ch$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# put your code here\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__2. (2pts) Towards multiclass classification: one-hot encoding__\n",
"\n",
"- Generate a vector (let us call it $v$) of integers taking values between 0 and 9. \n",
"- Then build the vector corresponding to the one-hot encoding of each entry in $v$ (a one-hot encoding represents each categorical variable (0 to 9 digits in your vector $v$ by using binary sequences in which only one entry (for example the one corresponding to the digit that is encoded) is non zero))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# put your code here\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__3. (6pt) Towards regression: sampling and matplolib__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__3a. (2pts) One dimensional__ In this exercise, we will successively generate points according to a function, sample pairs (t,f) from that distribution and plot the results\n",
"\n",
"- Using the 'linspace' function from numpy, generate $1000$ pairs $(t, f(t) = \\frac{1}{1+e^{-t}})$ for values of $t$ between $-6$ and $6$. What does the function look like? \n",
"- Generate 100 random pairs $(t_i, f_i)$ from the plot. Then plot the points $(t_i,x_i)$ on top of the line $(t, f(t))$ using matplotlib (you can choose how you randomly generate the points)\n",
"- From the pairs "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# put your code here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__3b. (4pts) The two dimensional hyperplane__\n",
"\n",
"- An extension of the previous case, we now want to generate triples $(x,y, t)$ according to the following hyperplane: \n",
"\n",
"$$t \\equiv\\pi(x, y) = x + y +1$$\n",
"\n",
"using _Axes3D_, _matplolib_ and _pyplot_, as well as the _meshgrid( )_ and _arrange( )_ functions from numpy and the _plot_surface( )_ and _scatter( )_ functions from pyplot,\n",
"\n",
"- Generate a regular grid of points $(x, y)$ covering the domain $[-20,20]\\times [-20,20]$. Let us say 200 by 200. \n",
"- As in the 1D case, we now want to generate noisy samples that are lying on the plane on average. Start by generating $(50\\times 50)$ triples $(x,y,\\pi(x,y))$ covering the domain $[-20,20]\\times [-20,20]$. \n",
"- Perturb the $50\\times 50$ pairs by adding to them a random gaussian noise of amplitude no larger than $0.1$\n",
"- Finally using the _scatter( )_ function from pyplot, plot the noisy samples on top of the plane. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# put your code here\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__4. (3pts) Getting started with Pandas and Kaggle datasets__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__4a__ Download the car dataset on [Kaggle](https://www.kaggle.com/toramky/automobile-dataset/downloads/automobile-dataset.zip/2) and open this dataset with pandas. \n",
"\n",
"- Display a couple (5-10) of rows from the pandas data frame. \n",
"- Find the brand that has the highest average price across cars\n",
"- Sort the cars according to their horse power and return the corresponding panda frame. Display the first 10 lines from the frame.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# put your code here. Don't hesitate to check the online \n",
"# documentation on the panda library\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}