{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 6 - Random Forest Deep Dive\n", "\n", "> A deep dive into how Random Forests work and some tricks for making them more accurate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lewtun/dslectures/master?urlpath=lab/tree/notebooks%2Flesson05_random-forest-deep-dive.ipynb)\n", "[![slides](https://img.shields.io/static/v1?label=slides&message=2021-lesson06.pdf&color=blue&logo=Google-drive)](https://drive.google.com/open?id=10TDYyUqEDB7oa6ejRy8x_ylUF84Ta2Te)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Understand how to go from simple to complex models\n", "* Understand the concepts of bagging and out-of-bag score\n", "* Gain an introduction to hyperparameter tuning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lesson is adapted from Jeremy Howard's fantastic online course [_Introduction to Machine Learning for Coders_](https://course18.fast.ai/ml), in particular:\n", "\n", "* [2 - Random Forest Deep Dive](https://course18.fast.ai/lessonsml1/lesson2.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Homework\n", "\n", "* Solve the exercises included in this notebook\n", "* Read chapter 7 of _Hands-On Machine Learning with Scikit-Learn and TensorFlow_ by Aurélien Géron" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lesson we will analyse the preprocessed table of cleaned housing data and addresses that we prepared in lesson 3:\n", "\n", "* `housing_processed.csv`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How does a random forest work for regression tasks?"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | city | postal_code | rooms_per_household | bedrooms_per_household | bedrooms_per_room | population_per_household | ocean_proximity_INLAND | ocean_proximity_<1H OCEAN | ocean_proximity_NEAR BAY | ocean_proximity_NEAR OCEAN | ocean_proximity_ISLAND |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | 69 | 94705 | 6.984127 | 1.023810 | 0.146591 | 2.555556 | 0 | 0 | 1 | 0 | 0 |\n",
"| 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | 620 | 94611 | 6.238137 | 0.971880 | 0.155797 | 2.109842 | 0 | 0 | 1 | 0 | 0 |\n",
"| 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | 620 | 94618 | 8.288136 | 1.073446 | 0.129516 | 2.802260 | 0 | 0 | 1 | 0 | 0 |\n",
"| 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | 620 | 94618 | 5.817352 | 1.073059 | 0.184458 | 2.547945 | 0 | 0 | 1 | 0 | 0 |\n",
"| 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | 620 | 94618 | 6.281853 | 1.081081 | 0.172096 | 2.181467 | 0 | 0 | 1 | 0 | 0 |\n", "