{"cells":[{"cell_type":"markdown","metadata":{},"source":["<details><summary><b>LICENSE</b></summary>\n","\n","Copyright 2018 Google LLC.\n","\n","Licensed under the Apache License, Version 2.0 (the \"License\");\n","you may not use this file except in compliance with the License.\n","You may obtain a copy of the License at\n","\n","https://www.apache.org/licenses/LICENSE-2.0\n","\n","Unless required by applicable law or agreed to in writing, software\n","distributed under the License is distributed on an \"AS IS\" BASIS,\n","WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n","See the License for the specific language governing permissions and\n","limitations under the License.\n","</details>"]},{"cell_type":"markdown","metadata":{},"source":["# Introduction"]},{"cell_type":"markdown","metadata":{},"source":["Climate Prediction-Random Forest is a regression model that predicts a day's temperature (the `actual` column) from a small table of 348 daily weather records. The features include the date, the day of the week, the temperatures of the two preceding days (`temp_2` and `temp_1`) and a historical `average`. The model is scikit-learn's `RandomForestRegressor`, an ensemble of decision trees whose averaged predictions can capture non-linear relationships between these variables."]},{"cell_type":"markdown","metadata":{},"source":["## Importing Libraries"]},{"cell_type":"code","execution_count":1,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:09.977471Z","iopub.status.busy":"2021-09-12T07:42:09.976692Z","iopub.status.idle":"2021-09-12T07:42:11.175857Z","shell.execute_reply":"2021-09-12T07:42:11.174872Z","shell.execute_reply.started":"2021-09-12T07:42:09.977341Z"},"hideCode":false,"hidePrompt":false,"id":"d4a2uASN7jbm","outputId":"a31fc9d2-6d41-4f32-f935-4de1392fb75d","trusted":true},"outputs":[],"source":["# Pandas is used for data manipulation\n","import pandas as pd\n","\n","# NumPy is used for numerical operations such as np.mean\n","import numpy as np\n","\n","# Import tools needed for visualization\n","import matplotlib.pyplot as plt\n","%matplotlib inline"]},{"cell_type":"markdown","metadata":{"execution":{"iopub.execute_input":"2021-06-05T03:15:54.659441Z","iopub.status.busy":"2021-06-05T03:15:54.658886Z","iopub.status.idle":"2021-06-05T03:15:54.679235Z","shell.execute_reply":"2021-06-05T03:15:54.677744Z","shell.execute_reply.started":"2021-06-05T03:15:54.659396Z"}},"source":["## Data Exploration"]},{"cell_type":"code","execution_count":2,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.178015Z","iopub.status.busy":"2021-09-12T07:42:11.177625Z","iopub.status.idle":"2021-09-12T07:42:11.198022Z","shell.execute_reply":"2021-09-12T07:42:11.197162Z","shell.execute_reply.started":"2021-09-12T07:42:11.177971Z"},"trusted":true},"outputs":[],"source":["# Read the CSV data into a DataFrame\n","df = 
pd.read_csv('https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/classification/temps.csv')"]},{"cell_type":"code","execution_count":3,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.201049Z","iopub.status.busy":"2021-09-12T07:42:11.200281Z","iopub.status.idle":"2021-09-12T07:42:11.230628Z","shell.execute_reply":"2021-09-12T07:42:11.229917Z","shell.execute_reply.started":"2021-09-12T07:42:11.200999Z"},"trusted":true},"outputs":[{"data":{"text/html":["<div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th></th>\n","      <th>year</th>\n","      <th>month</th>\n","      <th>day</th>\n","      <th>week</th>\n","      <th>temp_2</th>\n","      <th>temp_1</th>\n","      <th>average</th>\n","      <th>actual</th>\n","      <th>friend</th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>0</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>1</td>\n","      <td>Fri</td>\n","      <td>45</td>\n","      <td>45</td>\n","      <td>45.6</td>\n","      <td>45</td>\n","      <td>29</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>2</td>\n","      <td>Sat</td>\n","      <td>44</td>\n","      <td>45</td>\n","      <td>45.7</td>\n","      <td>44</td>\n","      <td>61</td>\n","    </tr>\n","    <tr>\n","      <th>2</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>3</td>\n","      <td>Sun</td>\n","      <td>45</td>\n","      <td>44</td>\n","      <td>45.8</td>\n","      <td>41</td>\n","      <td>56</td>\n","    </tr>\n","    <tr>\n","      <th>3</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      
<td>4</td>\n","      <td>Mon</td>\n","      <td>44</td>\n","      <td>41</td>\n","      <td>45.9</td>\n","      <td>40</td>\n","      <td>53</td>\n","    </tr>\n","    <tr>\n","      <th>4</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>5</td>\n","      <td>Tues</td>\n","      <td>41</td>\n","      <td>40</td>\n","      <td>46.0</td>\n","      <td>44</td>\n","      <td>41</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>"],"text/plain":["   year  month  day  week  temp_2  temp_1  average  actual  friend\n","0  2019      1    1   Fri      45      45     45.6      45      29\n","1  2019      1    2   Sat      44      45     45.7      44      61\n","2  2019      1    3   Sun      45      44     45.8      41      56\n","3  2019      1    4   Mon      44      41     45.9      40      53\n","4  2019      1    5  Tues      41      40     46.0      44      41"]},"execution_count":3,"metadata":{},"output_type":"execute_result"}],"source":["# displaying first 5 rows\n","df.head(5)"]},{"cell_type":"code","execution_count":4,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.232535Z","iopub.status.busy":"2021-09-12T07:42:11.232032Z","iopub.status.idle":"2021-09-12T07:42:11.237917Z","shell.execute_reply":"2021-09-12T07:42:11.236766Z","shell.execute_reply.started":"2021-09-12T07:42:11.232503Z"},"hideCode":false,"hidePrompt":false,"id":"5aXM1w987jbq","outputId":"c9eabdf4-30d9-4df4-b890-b28df3c5287b","trusted":true},"outputs":[{"data":{"text/plain":["(348, 9)"]},"execution_count":4,"metadata":{},"output_type":"execute_result"}],"source":["# the shape of our 
features\n","df.shape"]},{"cell_type":"code","execution_count":5,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.239954Z","iopub.status.busy":"2021-09-12T07:42:11.239514Z","iopub.status.idle":"2021-09-12T07:42:11.253434Z","shell.execute_reply":"2021-09-12T07:42:11.252149Z","shell.execute_reply.started":"2021-09-12T07:42:11.239913Z"},"trusted":true},"outputs":[{"data":{"text/plain":["Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average', 'actual',\n","       'friend'],\n","      dtype='object')"]},"execution_count":5,"metadata":{},"output_type":"execute_result"}],"source":["# column names\n","df.columns"]},{"cell_type":"code","execution_count":6,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.256082Z","iopub.status.busy":"2021-09-12T07:42:11.255489Z","iopub.status.idle":"2021-09-12T07:42:11.271869Z","shell.execute_reply":"2021-09-12T07:42:11.270748Z","shell.execute_reply.started":"2021-09-12T07:42:11.256038Z"},"trusted":true},"outputs":[{"data":{"text/plain":["year       0\n","month      0\n","day        0\n","week       0\n","temp_2     0\n","temp_1     0\n","average    0\n","actual     0\n","friend     0\n","dtype: int64"]},"execution_count":6,"metadata":{},"output_type":"execute_result"}],"source":["# checking for null values\n","df.isnull().sum()"]},{"cell_type":"markdown","metadata":{},"source":["There are no null values in any column, so no imputation is needed."]},{"cell_type":"markdown","metadata":{"id":"Nzu0v5mQ7jbs"},"source":["## One-Hot Encoding"]},{"cell_type":"markdown","metadata":{"execution":{"iopub.execute_input":"2021-06-05T03:26:04.246284Z","iopub.status.busy":"2021-06-05T03:26:04.245896Z","iopub.status.idle":"2021-06-05T03:26:04.252279Z","shell.execute_reply":"2021-06-05T03:26:04.250937Z","shell.execute_reply.started":"2021-06-05T03:26:04.246247Z"}},"source":["One-hot encoding replaces the categorical `week` column with one binary column per day of the week (`week_Fri`, `week_Mon`, and so on). Scikit-learn's random forest only accepts numeric features, and separate indicator columns avoid imposing an artificial order on the days.
"]},{"cell_type":"code","execution_count":7,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.273448Z","iopub.status.busy":"2021-09-12T07:42:11.273117Z","iopub.status.idle":"2021-09-12T07:42:11.308893Z","shell.execute_reply":"2021-09-12T07:42:11.307365Z","shell.execute_reply.started":"2021-09-12T07:42:11.273418Z"},"hideCode":false,"hidePrompt":false,"id":"VURjcTE27jbu","outputId":"12cc15a3-072a-4e40-89c8-009ea27c2622","trusted":true},"outputs":[{"data":{"text/html":["<div>\n","<style scoped>\n","    .dataframe tbody tr th:only-of-type {\n","        vertical-align: middle;\n","    }\n","\n","    .dataframe tbody tr th {\n","        vertical-align: top;\n","    }\n","\n","    .dataframe thead th {\n","        text-align: right;\n","    }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\">\n","      <th></th>\n","      <th>year</th>\n","      <th>month</th>\n","      <th>day</th>\n","      <th>temp_2</th>\n","      <th>temp_1</th>\n","      <th>average</th>\n","      <th>actual</th>\n","      <th>friend</th>\n","      <th>week_Fri</th>\n","      <th>week_Mon</th>\n","      <th>week_Sat</th>\n","      <th>week_Sun</th>\n","      <th>week_Thurs</th>\n","      <th>week_Tues</th>\n","      <th>week_Wed</th>\n","    </tr>\n","  </thead>\n","  <tbody>\n","    <tr>\n","      <th>0</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>1</td>\n","      <td>45</td>\n","      <td>45</td>\n","      <td>45.6</td>\n","      <td>45</td>\n","      <td>29</td>\n","      <td>True</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","    </tr>\n","    <tr>\n","      <th>1</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>2</td>\n","      <td>44</td>\n","      <td>45</td>\n","      <td>45.7</td>\n","      <td>44</td>\n","      <td>61</td>\n","      <td>False</td>\n","      
<td>False</td>\n","      <td>True</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","    </tr>\n","    <tr>\n","      <th>2</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>3</td>\n","      <td>45</td>\n","      <td>44</td>\n","      <td>45.8</td>\n","      <td>41</td>\n","      <td>56</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>True</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","    </tr>\n","    <tr>\n","      <th>3</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>4</td>\n","      <td>44</td>\n","      <td>41</td>\n","      <td>45.9</td>\n","      <td>40</td>\n","      <td>53</td>\n","      <td>False</td>\n","      <td>True</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","    </tr>\n","    <tr>\n","      <th>4</th>\n","      <td>2019</td>\n","      <td>1</td>\n","      <td>5</td>\n","      <td>41</td>\n","      <td>40</td>\n","      <td>46.0</td>\n","      <td>44</td>\n","      <td>41</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>False</td>\n","      <td>True</td>\n","      <td>False</td>\n","    </tr>\n","  </tbody>\n","</table>\n","</div>"],"text/plain":["   year  month  day  temp_2  temp_1  average  actual  friend  week_Fri  \\\n","0  2019      1    1      45      45     45.6      45      29      True   \n","1  2019      1    2      44      45     45.7      44      61     False   \n","2  2019      1    3      45      44     45.8      41      56     False   \n","3  2019      1    4      44      41     45.9      40      53     False   \n","4  2019      1    5      41      40     46.0      44      41     False   \n","\n","   week_Mon  week_Sat  week_Sun  week_Thurs  week_Tues  week_Wed  \n","0     False     False     False       False   
   False     False  \n","1     False      True     False       False      False     False  \n","2     False     False      True       False      False     False  \n","3      True     False     False       False      False     False  \n","4     False     False     False       False       True     False  "]},"execution_count":7,"metadata":{},"output_type":"execute_result"}],"source":["# One-hot encode categorical features\n","df = pd.get_dummies(df)\n","df.head(5)"]},{"cell_type":"code","execution_count":8,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.312525Z","iopub.status.busy":"2021-09-12T07:42:11.312019Z","iopub.status.idle":"2021-09-12T07:42:11.320042Z","shell.execute_reply":"2021-09-12T07:42:11.318836Z","shell.execute_reply.started":"2021-09-12T07:42:11.312458Z"},"id":"zgYBtUrr7jbv","outputId":"69df322f-2e24-4576-9fd2-d34773ac406c","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Shape of features after one-hot encoding: (348, 15)\n"]}],"source":["print('Shape of features after one-hot encoding:', df.shape)"]},{"cell_type":"markdown","metadata":{"id":"mtd7DqrQ7jbw"},"source":["## Features and Labels"]},{"cell_type":"code","execution_count":9,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.322293Z","iopub.status.busy":"2021-09-12T07:42:11.321937Z","iopub.status.idle":"2021-09-12T07:42:11.33496Z","shell.execute_reply":"2021-09-12T07:42:11.333645Z","shell.execute_reply.started":"2021-09-12T07:42:11.322261Z"},"id":"2rYCVrfV7jbx","trusted":true},"outputs":[],"source":["# Labels are the values we want to predict\n","labels = df['actual']\n","\n","# Remove the labels from the features\n","df = df.drop('actual', axis = 1)\n","\n","# Saving feature names for later use\n","feature_list = list(df.columns)"]},{"cell_type":"markdown","metadata":{"id":"Q6SSjx5p7jb0"},"source":["## Train Test 
Split"]},{"cell_type":"code","execution_count":10,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.336918Z","iopub.status.busy":"2021-09-12T07:42:11.336569Z","iopub.status.idle":"2021-09-12T07:42:11.348348Z","shell.execute_reply":"2021-09-12T07:42:11.347294Z","shell.execute_reply.started":"2021-09-12T07:42:11.336886Z"},"id":"11BJUq0s7jb0","trusted":true},"outputs":[],"source":["# Using Scikit-learn to split data into training and testing sets\n","from sklearn.model_selection import train_test_split\n","\n","# Hold out 20% of the rows for testing; random_state makes the split reproducible\n","train_features, test_features, train_labels, test_labels = train_test_split(\n","    df, labels, test_size=0.20, random_state=42)"]},{"cell_type":"code","execution_count":11,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.350455Z","iopub.status.busy":"2021-09-12T07:42:11.350066Z","iopub.status.idle":"2021-09-12T07:42:11.358556Z","shell.execute_reply":"2021-09-12T07:42:11.357489Z","shell.execute_reply.started":"2021-09-12T07:42:11.350426Z"},"id":"KkVnZf4H7jb2","outputId":"3c0a9db7-0f71-44be-bd0a-946fddc7d048","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Training Features Shape: (278, 14)\n","Training Labels Shape: (278,)\n","Testing Features Shape: (70, 14)\n","Testing Labels Shape: (70,)\n"]}],"source":["print('Training Features Shape:', train_features.shape)\n","print('Training Labels Shape:', train_labels.shape)\n","print('Testing Features Shape:', test_features.shape)\n","print('Testing Labels Shape:', test_labels.shape)"]},{"cell_type":"markdown","metadata":{"id":"ny3qdq-i7jb4"},"source":["## Training the Forest"]},{"cell_type":"code","execution_count":12,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.360258Z","iopub.status.busy":"2021-09-12T07:42:11.359962Z","iopub.status.idle":"2021-09-12T07:42:13.842601Z","shell.execute_reply":"2021-09-12T07:42:13.841175Z","shell.execute_reply.started":"2021-09-12T07:42:11.360229Z"},"hideCode":false,"hidePrompt":false,"id":"d_Vboxs77jb5","trusted":true},"outputs":[],"source":["# Import the model we are using\n","from sklearn.ensemble import RandomForestRegressor\n","\n","# Instantiate a forest of 1000 decision trees\n","rf = RandomForestRegressor(n_estimators=1000, random_state=42)\n","\n","# Train the model on training data\n","rf.fit(train_features, train_labels);"]},{"cell_type":"markdown","metadata":{"id":"rJz8X7b77jb6"},"source":["## Make Predictions on Test Data"]},{"cell_type":"code","execution_count":13,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:13.844914Z","iopub.status.busy":"2021-09-12T07:42:13.844471Z","iopub.status.idle":"2021-09-12T07:42:13.975596Z","shell.execute_reply":"2021-09-12T07:42:13.974317Z","shell.execute_reply.started":"2021-09-12T07:42:13.84487Z"},"id":"pssgaBC67jb6","outputId":"5a3a9029-c98b-4ac8-c081-2f7e17c3ca86","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Mean Absolute Error: 3.78 degrees.\n"]}],"source":["# Use the forest's predict method on the test data\n","predictions = rf.predict(test_features)\n","\n","# Calculate the absolute errors\n","errors = abs(predictions - test_labels)\n","\n","# Print out the mean absolute error (MAE)\n","print('Mean Absolute Error:', round(np.mean(errors), 2), 
'degrees.')\n"]},{"cell_type":"code","execution_count":14,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:13.978583Z","iopub.status.busy":"2021-09-12T07:42:13.97822Z","iopub.status.idle":"2021-09-12T07:42:13.985832Z","shell.execute_reply":"2021-09-12T07:42:13.984493Z","shell.execute_reply.started":"2021-09-12T07:42:13.978549Z"},"id":"fDaM3Z677jb7","outputId":"2307bab3-cb96-4a7a-f57d-3a9d80cec129","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 94.02 %.\n"]}],"source":["# Absolute percentage error for each test sample (its mean is the MAPE)\n","mape = 100 * (errors / test_labels)\n","\n","# Report 'accuracy' as 100% minus the MAPE (not classification accuracy)\n","accuracy = 100 - np.mean(mape)\n","print('Accuracy:', round(accuracy, 2), '%.')"]},{"cell_type":"markdown","metadata":{"id":"9U2KQYmS7jb7"},"source":["## Visualizing a Single Decision Tree"]},{"cell_type":"markdown","metadata":{"id":"Cnbb-pTt7jb9"},"source":["![Decision Tree](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/assignment/deep-learning/nn/tree.png)"]},{"cell_type":"markdown","metadata":{},"source":["## Your turn! 🚀\n","You can practice your random-forest skills by following the assignment [Climate Prediction-Random Forest](../../assignments/machine-learning-productionization/random-forest-classifier.ipynb)."]},{"cell_type":"markdown","metadata":{},"source":["## Acknowledgments\n","\n","Thanks to Kaggle for creating the open source course [Climate Prediction-Random Forest](https://www.kaggle.com/code/anandhuh/climate-prediction-random-forest-94-accuracy?scriptVersionId=74560159&cellId=26). It contributes some of the content in this chapter."]}],"metadata":{"kaggle":{"accelerator":"none","dataSources":[{"datasetId":1018620,"sourceId":1717426,"sourceType":"datasetVersion"}],"dockerImageVersionId":30096,"isGpuEnabled":false,"isInternetEnabled":false,"language":"python","sourceType":"notebook"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.5"}},"nbformat":4,"nbformat_minor":4}