{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 05\n",
    "\n",
    "# Neural networks\n",
    "\n",
    "## 4.1 Little Red Riding Hood Network\n",
    "\n",
    "Train a neural network to solve the  Little Red Riding Hood problem in sklern and Keras. Try the neural networ with different inputs and report the results.\n",
    "\n",
    "________________\n",
    "\n",
    "## 4.2 Boston House Price Prediction\n",
    "\n",
    "In the next questions we are going to work using the dataset *Boston*. This dataset measures the influence of socioeconomical factors on the price of several estates of the city of Boston. This dataset has 506 instances, each one characterized by 13 features:\n",
    "\n",
    "* CRIM - per capita crime rate by town\n",
    "* ZN - proportion of residential land zoned for lots over 25,000 sq.ft.\n",
    "* INDUS - proportion of non-retail business acres per town.\n",
    "* CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)\n",
    "* NOX - nitric oxides concentration (parts per 10 million)\n",
    "* RM - average number of rooms per dwelling\n",
    "* AGE - proportion of owner-occupied units built prior to 1940\n",
    "* DIS - weighted distances to five Boston employment centres\n",
    "* RAD - index of accessibility to radial highways\n",
    "* TAX - full-value property-tax rate per 10,000 USD\n",
    "* PTRATIO - pupil-teacher ratio by town\n",
    "* B - $1000(Bk - 0.63)^2$ where $Bk$ is the proportion of blacks by town\n",
    "* LSTAT - % lower status of the population\n",
    "\n",
    "Output variable:\n",
    "* MEDV - Median value of owner-occupied homes in 1000's USD\n",
    "\n",
    "**Note:** In this exercise we are going to predict the price of each estate, which is represented in the `MEDV` variable. It is important to remember that we are always aiming to predict `MEDV`, no matter which explanatory variables we are using. That means, in some cases we will use a subset of the 13 previously mentioned variables, while in other cases we will use all the 13 variables. But in no case we will change the dependent variable $y$.\n",
    "\n",
    "\n",
    "\n",
    "1. Load the dataset using `from sklearn.datasets import load_boston`.\n",
    "2. Create a DataFrame using the attribute `.data` from the loading function of Scikit-learn.\n",
    "3. Assign the columns of the DataFrame so they match the `.feature_names` attribute from the loading function of Scikit-learn. \n",
    "4. Assign a new column to the DataFrame which holds the value to predict, that means, the `.target` attribute of the loading function of Scikit-learn. The name of this columns must be `MEDV`.\n",
    "5. Use the function `.describe()` from Pandas for obtaining statistics about each column.\n",
    "\n",
    "## 4.3 Feature analysis:\n",
    "\n",
    "Using the DataFrame generated in the previous section:\n",
    "* Filter the dataset to just these features:\n",
    "     * Explanatory: 'LSTAT', 'INDUS', 'NOX', 'RM', 'AGE'\n",
    "     * Dependent: 'MEDV'.\n",
    "* Generate a scatter matrix among the features mentioned above using Pandas (`scatter_matrix`) or Seaborn (` pairplot`).\n",
    "     * Do you find any relationship between the features?\n",
    "* Generate the correlation matrix between these variables using `numpy.corrcoef`. Also include `MEDV`.\n",
    "     * Which characteristics are more correlated?\n",
    "     * BONUS: Visualize this matrix as heat map using Pandas, Matplotlib or Seaborn.\n",
    "\n",
    "## 4.4 Modeling linear and non linear relationships\n",
    "\n",
    "* Generate two new subsets filtering these characteristics:\n",
    "     * $D_1$:  $X = \\textit{'RM'}$, $y = \\textit{'MEDV'}$\n",
    "     * $D_2$:  $X = \\textit{'LSTAT'}$, $y = \\textit{'MEDV'}$\n",
    "* For each subset, generate a training partition and a test partition using a ratio of $ 70 \\% - 30 \\% $\n",
    "* Train a linear regression model on both subsets of data:\n",
    "     * Report the mean square error on the test set\n",
    "     * Print the values of $ w $ and $ w_0 $ of the regression equation\n",
    "     * Generate a graph where you visualize the line obtained by the regression model in conjunction with the training data and the test data\n",
    "* How does the model perform on $ D_1 $ and $ D_2 $? Why?\n",
    "\n",
    "## 4.5 Training a regression model\n",
    "\n",
    "* Generate a 70-30 partitioning of the data **using all the features**. (Do not include the dependent variable `MEDV`)\n",
    "* Train a linear regression model with the objective of predicting the output variable `MEDV`.\n",
    "     * Report the mean square error on the test set\n",
    "* Train a regression model using `MLPRegressor` in order to predict the output variable` MEDV`.\n",
    "     * Report the mean square error on the test set\n",
    "* Scale the data so that they have zero mean variance one per feature (only $ X $). You can use the following piece of code:\n",
    "\n",
    "```python\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "sc_x = StandardScaler()\n",
    "sc_x.fit(X)\n",
    "X_train_s = sc_x.transform(X_train)\n",
    "X_test_s = sc_x.transform(X_test)\n",
    "```\n",
    "Check more information about `StandardScaler` [here](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).\n",
    "\n",
    "* Train the following models:\n",
    "     1. Train a linear regression model using the scaled data.\n",
    "         * Report the mean square error on the test set\n",
    "     2. Train a regression model using a 2-layer MultiLayer Perceptron (128 neurons in the first and 512 in the second) and with the **scaled data**.\n",
    "         * Report the mean square error on the test set\n",
    "     3. Which model has better performance? Why?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}