{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### HW 1: Getting started with numpy, matplotlib, pandas and Kaggle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Total: 25 pts__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start date: <font color=red>Tuesday Sept. 3</font> <br>\n",
    "Due date: <font color=red>Tuesday Sept. 10</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you don't already have a version of Anaconda installed, start by downloading and installing it (see for example [here](https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/)). When working on the exercises below, keep in mind that extensive Python documentation is available online. Don't hesitate to check the documentation and examples for the functions you want to use."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__1. (4pts) Numerical Linear Algebra: Numpy__\n",
    "\n",
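    "- Start by building a 10 by 10 matrix of random Gaussian entries. Then compute the two largest eigenvalues of the matrix (the matrix is not symmetric, so its eigenvalues are in general complex; state how you rank them, e.g. by modulus).\n",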
    "- Reshape the matrix that you built above first into a 2 by 50 array (call it $v$), then into a single vector (call it $w$). Return the vector obtained by sorting the elements of $w$ in descending order.\n",
    "- Generate two random vectors (you can choose the distribution you use to generate the entries). Let us call those vectors $v_1$ and $v_2$. Stack them vertically, then horizontally, and store the respective results in $A$ and $B$.\n",
    "- Do the same with two random arrays $C_1, C_2 \\in \\mathbb{R}^{n\\times n}$ and store the respective results in the variables $Cv$ and $Ch$."
   ]
  },
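  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_A possible starting point (a sketch, not the expected solution)._ Because a Gaussian matrix is not symmetric, `np.linalg.eigvals` generally returns complex eigenvalues; the snippet below assumes they are ranked by modulus. It also shows that `np.vstack` turns two length-$n$ vectors into a $2\\times n$ array, whereas `np.hstack` concatenates them into one vector of length $2n$. The vector length 5 and the uniform distribution are arbitrary choices.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "M = np.random.randn(10, 10)             # 10 x 10 matrix of standard Gaussian entries\n",
    "eigs = np.linalg.eigvals(M)             # complex in general since M is not symmetric\n",
    "order = np.argsort(np.abs(eigs))[::-1]  # indices sorted by decreasing modulus\n",
    "two_largest = eigs[order[:2]]           # ranking by modulus is an assumption\n",
    "\n",
    "v1, v2 = np.random.rand(5), np.random.rand(5)  # uniform entries (arbitrary choice)\n",
    "A = np.vstack((v1, v2))                 # shape (2, 5): the vectors become rows\n",
    "B = np.hstack((v1, v2))                 # shape (10,): the vectors are concatenated\n",
    "```"
   ]
  },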
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put your code here\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__2. (2pts) Towards multiclass classification: one-hot encoding__\n",
    "\n",
    "- Generate a vector (let us call it $v$) of integers taking values between 0 and 9.\n",
    "- Then build the one-hot encoding of each entry of $v$. A one-hot encoding represents each categorical value (here, the digits 0 to 9 in $v$) by a binary vector in which only one entry is nonzero, typically the one whose index corresponds to the encoded digit."
   ]
  },
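  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_One possible approach (a sketch; other encodings are equally valid)._ Indexing the rows of the $10\\times 10$ identity matrix with $v$ gives one one-hot row per entry of $v$. The vector length 8 is an arbitrary choice.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "v = np.random.randint(0, 10, size=8)         # 8 random digits in {0, ..., 9}\n",
    "one_hot = np.eye(10, dtype=int)[v]           # shape (8, 10): row i has a 1 in column v[i]\n",
    "assert (one_hot.argmax(axis=1) == v).all()   # decoding the rows recovers v\n",
    "```"
   ]
  },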
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put your code here\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__3. (6pts) Towards regression: sampling and matplotlib__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__3a. (2pts) One-dimensional__ In this exercise, we will successively generate points according to a function, sample pairs $(t, f)$ from that function, and plot the results.\n",
    "\n",
    "- Using the 'linspace' function from numpy, generate $1000$ pairs $(t, f(t))$ with $f(t) = \\frac{1}{1+e^{-t}}$, for values of $t$ between $-6$ and $6$. What does the function look like?\n",
    "- Generate 100 random pairs $(t_i, f_i)$ from the curve. Then plot the points $(t_i, f_i)$ on top of the line $(t, f(t))$ using matplotlib (you can choose how you randomly generate the points).\n",
    "- From the pairs "
   ]
  },
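  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_A possible starting point (a sketch)._ The snippet below draws the abscissas $t_i$ uniformly in $[-6, 6]$ and adds Gaussian noise to $f(t_i)$; both the sampling scheme and the noise level $0.05$ are assumptions, since the exercise leaves them open.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def f(t):\n",
    "    # logistic sigmoid 1 / (1 + e^{-t})\n",
    "    return 1.0 / (1.0 + np.exp(-t))\n",
    "\n",
    "t = np.linspace(-6, 6, 1000)                 # 1000 evenly spaced values of t\n",
    "t_i = np.random.uniform(-6, 6, 100)          # 100 random abscissas (one possible choice)\n",
    "f_i = f(t_i) + 0.05 * np.random.randn(100)   # noisy samples; 0.05 is an arbitrary noise level\n",
    "\n",
    "plt.plot(t, f(t), label='f(t)')\n",
    "plt.scatter(t_i, f_i, color='red', s=10, label='samples')\n",
    "plt.legend()\n",
    "plt.show()\n",
    "```"
   ]
  },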
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__3b. (4pts) The two-dimensional hyperplane__\n",
    "\n",
    "As an extension of the previous case, we now want to generate triples $(x, y, t)$ according to the following hyperplane:\n",
    "\n",
    "$$t \\equiv\\pi(x, y) = x + y +1$$\n",
    "\n",
    "To do so, use _Axes3D_, _matplotlib_ and _pyplot_, as well as the _meshgrid( )_ and _arange( )_ functions from numpy and the _plot_surface( )_ and _scatter( )_ methods of the 3D axes.\n",
    "\n",
    "- Generate a regular grid of points $(x, y)$ covering the domain $[-20,20]\\times [-20,20]$, say of size $200\\times 200$.\n",
    "- As in the 1D case, we now want to generate noisy samples that lie on the plane on average. Start by generating $50\\times 50$ triples $(x,y,\\pi(x,y))$ covering the domain $[-20,20]\\times [-20,20]$.\n",
    "- Perturb the $50\\times 50$ triples by adding random Gaussian noise of amplitude (standard deviation) no larger than $0.1$.\n",
    "- Finally, using the _scatter( )_ method of the 3D axes, plot the noisy samples on top of the plane.\n"
   ]
  },
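  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_A possible starting point (a sketch)._ Here the noise is added only to the $t$ coordinate, with standard deviation $0.1$; applying it to $t$ alone is an assumption. With step $0.2$, `np.arange(-20, 20, 0.2)` yields 200 values, hence a $200\\times 200$ grid.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib versions\n",
    "\n",
    "def plane(x, y):\n",
    "    # pi(x, y) = x + y + 1\n",
    "    return x + y + 1\n",
    "\n",
    "# regular 200 x 200 grid on [-20, 20] x [-20, 20]\n",
    "X, Y = np.meshgrid(np.arange(-20, 20, 0.2), np.arange(-20, 20, 0.2))\n",
    "\n",
    "# 50 x 50 samples on the plane, perturbed by Gaussian noise (std 0.1) on t\n",
    "xs, ys = np.meshgrid(np.linspace(-20, 20, 50), np.linspace(-20, 20, 50))\n",
    "ts = plane(xs, ys) + 0.1 * np.random.randn(50, 50)\n",
    "\n",
    "fig = plt.figure()\n",
    "ax = fig.add_subplot(111, projection='3d')\n",
    "ax.plot_surface(X, Y, plane(X, Y), alpha=0.3)\n",
    "ax.scatter(xs.ravel(), ys.ravel(), ts.ravel(), color='red', s=2)\n",
    "plt.show()\n",
    "```"
   ]
  },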
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put your code here\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__4. (3pts) Getting started with Pandas and Kaggle datasets__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__4a__ Download the car dataset from [Kaggle](https://www.kaggle.com/toramky/automobile-dataset/downloads/automobile-dataset.zip/2) and load it with pandas.\n",
    "\n",
    "- Display a few (5-10) rows of the pandas data frame.\n",
    "- Find the brand that has the highest average price across its cars.\n",
    "- Sort the cars according to their horsepower and return the corresponding pandas data frame. Display the first 10 rows of the frame.\n"
   ]
  },
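  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_A possible starting point (a sketch)._ The file name `Automobile_data.csv`, the `'?'` missing-value marker and the column names `make`, `price` and `horsepower` are assumptions about the downloaded archive; check them against `df.columns` before relying on this code. Sorting in descending order of horsepower is also an arbitrary choice.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Assumed file name and missing-value marker; adjust to the actual download.\n",
    "df = pd.read_csv('Automobile_data.csv', na_values='?')\n",
    "\n",
    "print(df.head(10))                                      # first 10 rows\n",
    "\n",
    "avg_price = df.groupby('make')['price'].mean()          # mean price per brand (NaNs skipped)\n",
    "print(avg_price.idxmax())                               # brand with the highest mean price\n",
    "\n",
    "by_hp = df.sort_values('horsepower', ascending=False)   # most powerful cars first\n",
    "print(by_hp.head(10))\n",
    "```"
   ]
  },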
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put your code here. Don't hesitate to check the online\n",
    "# documentation of the pandas library\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}