{"cells":[{"cell_type":"markdown","metadata":{},"source":["
LICENSE\n","\n","Copyright 2018 Google LLC.\n","\n","Licensed under the Apache License, Version 2.0 (the \"License\");\n","you may not use this file except in compliance with the License.\n","You may obtain a copy of the License at\n","\n","https://www.apache.org/licenses/LICENSE-2.0\n","\n","Unless required by applicable law or agreed to in writing, software\n","distributed under the License is distributed on an \"AS IS\" BASIS,\n","WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n","See the License for the specific language governing permissions and\n","limitations under the License.\n","
"]},{"cell_type":"markdown","metadata":{},"source":["# Introduction"]},{"cell_type":"markdown","metadata":{},"source":["Climate Prediction-Random Forest is a worked example that trains a random forest regressor to predict a day's observed temperature (the `actual` column) from the other columns of a small dataset of 348 daily records: calendar fields (`year`, `month`, `day`, `week`), two earlier temperature readings (`temp_2`, `temp_1`), an `average` temperature, and a `friend` column. A random forest averages the predictions of many decision trees, each fitted to a bootstrap sample of the training data, which lets it capture non-linear relationships between the features and the target."]},{"cell_type":"markdown","metadata":{},"source":["## Importing Libraries"]},{"cell_type":"code","execution_count":1,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:09.977471Z","iopub.status.busy":"2021-09-12T07:42:09.976692Z","iopub.status.idle":"2021-09-12T07:42:11.175857Z","shell.execute_reply":"2021-09-12T07:42:11.174872Z","shell.execute_reply.started":"2021-09-12T07:42:09.977341Z"},"hideCode":false,"hidePrompt":false,"id":"d4a2uASN7jbm","outputId":"a31fc9d2-6d41-4f32-f935-4de1392fb75d","trusted":true},"outputs":[],"source":["# Pandas is used for data manipulation\n","import pandas as pd\n","\n","# Use numpy to convert to arrays\n","import numpy as np\n","\n","# Import tools needed for visualization\n","\n","import matplotlib.pyplot as plt\n","%matplotlib inline"]},{"cell_type":"markdown","metadata":{"execution":{"iopub.execute_input":"2021-06-05T03:15:54.659441Z","iopub.status.busy":"2021-06-05T03:15:54.658886Z","iopub.status.idle":"2021-06-05T03:15:54.679235Z","shell.execute_reply":"2021-06-05T03:15:54.677744Z","shell.execute_reply.started":"2021-06-05T03:15:54.659396Z"}},"source":["## Data 
Exploration"]},{"cell_type":"code","execution_count":2,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.178015Z","iopub.status.busy":"2021-09-12T07:42:11.177625Z","iopub.status.idle":"2021-09-12T07:42:11.198022Z","shell.execute_reply":"2021-09-12T07:42:11.197162Z","shell.execute_reply.started":"2021-09-12T07:42:11.177971Z"},"trusted":true},"outputs":[],"source":["# Reading the data to a dataframe \n","df = pd.read_csv('https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/classification/temps.csv')"]},{"cell_type":"code","execution_count":3,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.201049Z","iopub.status.busy":"2021-09-12T07:42:11.200281Z","iopub.status.idle":"2021-09-12T07:42:11.230628Z","shell.execute_reply":"2021-09-12T07:42:11.229917Z","shell.execute_reply.started":"2021-09-12T07:42:11.200999Z"},"trusted":true},"outputs":[{"data":{"text/html":["
<table>\n","<thead>\n",
"<tr><th></th><th>year</th><th>month</th><th>day</th><th>week</th><th>temp_2</th><th>temp_1</th><th>average</th><th>actual</th><th>friend</th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>0</th><td>2019</td><td>1</td><td>1</td><td>Fri</td><td>45</td><td>45</td><td>45.6</td><td>45</td><td>29</td></tr>\n",
"<tr><th>1</th><td>2019</td><td>1</td><td>2</td><td>Sat</td><td>44</td><td>45</td><td>45.7</td><td>44</td><td>61</td></tr>\n",
"<tr><th>2</th><td>2019</td><td>1</td><td>3</td><td>Sun</td><td>45</td><td>44</td><td>45.8</td><td>41</td><td>56</td></tr>\n",
"<tr><th>3</th><td>2019</td><td>1</td><td>4</td><td>Mon</td><td>44</td><td>41</td><td>45.9</td><td>40</td><td>53</td></tr>\n",
"<tr><th>4</th><td>2019</td><td>1</td><td>5</td><td>Tues</td><td>41</td><td>40</td><td>46.0</td><td>44</td><td>41</td></tr>\n",
"</tbody>\n","</table>
"],"text/plain":[" year month day week temp_2 temp_1 average actual friend\n","0 2019 1 1 Fri 45 45 45.6 45 29\n","1 2019 1 2 Sat 44 45 45.7 44 61\n","2 2019 1 3 Sun 45 44 45.8 41 56\n","3 2019 1 4 Mon 44 41 45.9 40 53\n","4 2019 1 5 Tues 41 40 46.0 44 41"]},"execution_count":3,"metadata":{},"output_type":"execute_result"}],"source":["# displaying first 5 rows\n","df.head(5)"]},{"cell_type":"code","execution_count":4,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.232535Z","iopub.status.busy":"2021-09-12T07:42:11.232032Z","iopub.status.idle":"2021-09-12T07:42:11.237917Z","shell.execute_reply":"2021-09-12T07:42:11.236766Z","shell.execute_reply.started":"2021-09-12T07:42:11.232503Z"},"hideCode":false,"hidePrompt":false,"id":"5aXM1w987jbq","outputId":"c9eabdf4-30d9-4df4-b890-b28df3c5287b","trusted":true},"outputs":[{"data":{"text/plain":["(348, 9)"]},"execution_count":4,"metadata":{},"output_type":"execute_result"}],"source":["# the shape of our features\n","df.shape"]},{"cell_type":"code","execution_count":5,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.239954Z","iopub.status.busy":"2021-09-12T07:42:11.239514Z","iopub.status.idle":"2021-09-12T07:42:11.253434Z","shell.execute_reply":"2021-09-12T07:42:11.252149Z","shell.execute_reply.started":"2021-09-12T07:42:11.239913Z"},"trusted":true},"outputs":[{"data":{"text/plain":["Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average', 'actual',\n"," 'friend'],\n"," dtype='object')"]},"execution_count":5,"metadata":{},"output_type":"execute_result"}],"source":["# column names\n","df.columns"]},{"cell_type":"code","execution_count":6,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.256082Z","iopub.status.busy":"2021-09-12T07:42:11.255489Z","iopub.status.idle":"2021-09-12T07:42:11.271869Z","shell.execute_reply":"2021-09-12T07:42:11.270748Z","shell.execute_reply.started":"2021-09-12T07:42:11.256038Z"},"trusted":true},"outputs":[{"data":{"text/plain":["year 
0\n","month 0\n","day 0\n","week 0\n","temp_2 0\n","temp_1 0\n","average 0\n","actual 0\n","friend 0\n","dtype: int64"]},"execution_count":6,"metadata":{},"output_type":"execute_result"}],"source":["# checking for null values\n","df.isnull().sum()"]},{"cell_type":"markdown","metadata":{},"source":["No column contains null values, so no imputation or row removal is needed."]},{"cell_type":"markdown","metadata":{"id":"Nzu0v5mQ7jbs"},"source":["## One-Hot Encoding"]},{"cell_type":"markdown","metadata":{"execution":{"iopub.execute_input":"2021-06-05T03:26:04.246284Z","iopub.status.busy":"2021-06-05T03:26:04.245896Z","iopub.status.idle":"2021-06-05T03:26:04.252279Z","shell.execute_reply":"2021-06-05T03:26:04.250937Z","shell.execute_reply.started":"2021-06-05T03:26:04.246247Z"}},"source":["One-hot encoding replaces a categorical column with one binary indicator column per category. Here the only categorical column is `week` (the day of the week), which becomes seven `week_*` columns; the numeric columns are left unchanged."]},{"cell_type":"code","execution_count":7,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.273448Z","iopub.status.busy":"2021-09-12T07:42:11.273117Z","iopub.status.idle":"2021-09-12T07:42:11.308893Z","shell.execute_reply":"2021-09-12T07:42:11.307365Z","shell.execute_reply.started":"2021-09-12T07:42:11.273418Z"},"hideCode":false,"hidePrompt":false,"id":"VURjcTE27jbu","outputId":"12cc15a3-072a-4e40-89c8-009ea27c2622","trusted":true},"outputs":[{"data":{"text/html":["
<table>\n","<thead>\n",
"<tr><th></th><th>year</th><th>month</th><th>day</th><th>temp_2</th><th>temp_1</th><th>average</th><th>actual</th><th>friend</th><th>week_Fri</th><th>week_Mon</th><th>week_Sat</th><th>week_Sun</th><th>week_Thurs</th><th>week_Tues</th><th>week_Wed</th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>0</th><td>2019</td><td>1</td><td>1</td><td>45</td><td>45</td><td>45.6</td><td>45</td><td>29</td><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>\n",
"<tr><th>1</th><td>2019</td><td>1</td><td>2</td><td>44</td><td>45</td><td>45.7</td><td>44</td><td>61</td><td>False</td><td>False</td><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>\n",
"<tr><th>2</th><td>2019</td><td>1</td><td>3</td><td>45</td><td>44</td><td>45.8</td><td>41</td><td>56</td><td>False</td><td>False</td><td>False</td><td>True</td><td>False</td><td>False</td><td>False</td></tr>\n",
"<tr><th>3</th><td>2019</td><td>1</td><td>4</td><td>44</td><td>41</td><td>45.9</td><td>40</td><td>53</td><td>False</td><td>True</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td></tr>\n",
"<tr><th>4</th><td>2019</td><td>1</td><td>5</td><td>41</td><td>40</td><td>46.0</td><td>44</td><td>41</td><td>False</td><td>False</td><td>False</td><td>False</td><td>False</td><td>True</td><td>False</td></tr>\n",
"</tbody>\n","</table>
"],"text/plain":[" year month day temp_2 temp_1 average actual friend week_Fri \\\n","0 2019 1 1 45 45 45.6 45 29 True \n","1 2019 1 2 44 45 45.7 44 61 False \n","2 2019 1 3 45 44 45.8 41 56 False \n","3 2019 1 4 44 41 45.9 40 53 False \n","4 2019 1 5 41 40 46.0 44 41 False \n","\n"," week_Mon week_Sat week_Sun week_Thurs week_Tues week_Wed \n","0 False False False False False False \n","1 False True False False False False \n","2 False False True False False False \n","3 True False False False False False \n","4 False False False False True False "]},"execution_count":7,"metadata":{},"output_type":"execute_result"}],"source":["# One-hot encode categorical features\n","df = pd.get_dummies(df)\n","df.head(5)"]},{"cell_type":"code","execution_count":8,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.312525Z","iopub.status.busy":"2021-09-12T07:42:11.312019Z","iopub.status.idle":"2021-09-12T07:42:11.320042Z","shell.execute_reply":"2021-09-12T07:42:11.318836Z","shell.execute_reply.started":"2021-09-12T07:42:11.312458Z"},"id":"zgYBtUrr7jbv","outputId":"69df322f-2e24-4576-9fd2-d34773ac406c","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Shape of features after one-hot encoding: (348, 15)\n"]}],"source":["print('Shape of features after one-hot encoding:', df.shape)"]},{"cell_type":"markdown","metadata":{"id":"mtd7DqrQ7jbw"},"source":["## Features and Labels"]},{"cell_type":"code","execution_count":9,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.322293Z","iopub.status.busy":"2021-09-12T07:42:11.321937Z","iopub.status.idle":"2021-09-12T07:42:11.33496Z","shell.execute_reply":"2021-09-12T07:42:11.333645Z","shell.execute_reply.started":"2021-09-12T07:42:11.322261Z"},"id":"2rYCVrfV7jbx","trusted":true},"outputs":[],"source":["# Labels are the values we want to predict\n","labels = df['actual']\n","\n","# Remove the labels from the features\n","df = df.drop('actual', axis = 1)\n","\n","# Saving feature names 
for later use\n","feature_list = list(df.columns)"]},{"cell_type":"markdown","metadata":{"id":"Q6SSjx5p7jb0"},"source":["## Train Test Split"]},{"cell_type":"code","execution_count":10,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.336918Z","iopub.status.busy":"2021-09-12T07:42:11.336569Z","iopub.status.idle":"2021-09-12T07:42:11.348348Z","shell.execute_reply":"2021-09-12T07:42:11.347294Z","shell.execute_reply.started":"2021-09-12T07:42:11.336886Z"},"id":"11BJUq0s7jb0","trusted":true},"outputs":[],"source":["# Using Scikit-learn to split data into training and testing sets\n","from sklearn.model_selection import train_test_split\n","\n","# Hold out 20% of the rows for testing; random_state makes the split reproducible\n","train_features, test_features, train_labels, test_labels = train_test_split(\n","    df, labels, test_size=0.20, random_state=42\n",")"]},{"cell_type":"code","execution_count":11,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.350455Z","iopub.status.busy":"2021-09-12T07:42:11.350066Z","iopub.status.idle":"2021-09-12T07:42:11.358556Z","shell.execute_reply":"2021-09-12T07:42:11.357489Z","shell.execute_reply.started":"2021-09-12T07:42:11.350426Z"},"id":"KkVnZf4H7jb2","outputId":"3c0a9db7-0f71-44be-bd0a-946fddc7d048","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Training Features Shape: (278, 14)\n","Training Labels Shape: (278,)\n","Testing Features Shape: (70, 14)\n","Testing Labels Shape: (70,)\n"]}],"source":["print('Training Features Shape:', train_features.shape)\n","print('Training Labels Shape:', train_labels.shape)\n","print('Testing Features Shape:', test_features.shape)\n","print('Testing Labels Shape:', test_labels.shape)"]},{"cell_type":"markdown","metadata":{"id":"ny3qdq-i7jb4"},"source":["## Training the 
Forest"]},{"cell_type":"code","execution_count":12,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:11.360258Z","iopub.status.busy":"2021-09-12T07:42:11.359962Z","iopub.status.idle":"2021-09-12T07:42:13.842601Z","shell.execute_reply":"2021-09-12T07:42:13.841175Z","shell.execute_reply.started":"2021-09-12T07:42:11.360229Z"},"hideCode":false,"hidePrompt":false,"id":"d_Vboxs77jb5","trusted":true},"outputs":[],"source":["# Import the model we are using\n","from sklearn.ensemble import RandomForestRegressor\n","\n","# Instantiate model \n","rf = RandomForestRegressor(n_estimators= 1000, random_state=42)\n","\n","# Train the model on training data\n","rf.fit(train_features, train_labels);"]},{"cell_type":"markdown","metadata":{"id":"rJz8X7b77jb6"},"source":["## Make Predictions on Test Data"]},{"cell_type":"code","execution_count":13,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:13.844914Z","iopub.status.busy":"2021-09-12T07:42:13.844471Z","iopub.status.idle":"2021-09-12T07:42:13.975596Z","shell.execute_reply":"2021-09-12T07:42:13.974317Z","shell.execute_reply.started":"2021-09-12T07:42:13.84487Z"},"id":"pssgaBC67jb6","outputId":"5a3a9029-c98b-4ac8-c081-2f7e17c3ca86","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Mean Absolute Error: 3.78 degrees.\n"]}],"source":["# Use the forest's predict method on the test data\n","predictions = rf.predict(test_features)\n","\n","# Calculate the absolute errors\n","errors = abs(predictions - test_labels)\n","\n","# Print out the mean absolute error (mae)\n","print('Mean Absolute Error:', round(np.mean(errors), 2), 
'degrees.')\n"]},{"cell_type":"code","execution_count":14,"metadata":{"execution":{"iopub.execute_input":"2021-09-12T07:42:13.978583Z","iopub.status.busy":"2021-09-12T07:42:13.97822Z","iopub.status.idle":"2021-09-12T07:42:13.985832Z","shell.execute_reply":"2021-09-12T07:42:13.984493Z","shell.execute_reply.started":"2021-09-12T07:42:13.978549Z"},"id":"fDaM3Z677jb7","outputId":"2307bab3-cb96-4a7a-f57d-3a9d80cec129","trusted":true},"outputs":[{"name":"stdout","output_type":"stream","text":["Accuracy: 94.02 %.\n"]}],"source":["# Absolute percentage error of each test prediction\n","mape = 100 * (errors / test_labels)\n","\n","# Report accuracy as 100 minus the mean absolute percentage error (MAPE)\n","accuracy = 100 - np.mean(mape)\n","print('Accuracy:', round(accuracy, 2), '%.')"]},{"cell_type":"markdown","metadata":{"id":"9U2KQYmS7jb7"},"source":["## Visualizing a Single Decision Tree"]},{"cell_type":"markdown","metadata":{"id":"Cnbb-pTt7jb9"},"source":["![Decision Tree](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/assignment/deep-learning/nn/tree.png)"]},{"cell_type":"markdown","metadata":{},"source":["## Your turn! 🚀\n","You can practice your random-forest skills by following the assignment [Climate Prediction-Random Forest](../../assignments/machine-learning-productionization/random-forest-classifier.ipynb)."]},{"cell_type":"markdown","metadata":{},"source":["## Acknowledgments\n","\n","Thanks to Kaggle for creating the open source course [Climate Prediction-Random Forest](https://www.kaggle.com/code/anandhuh/climate-prediction-random-forest-94-accuracy?scriptVersionId=74560159&cellId=26). 
It contributes some of the content in this chapter."]}],"metadata":{"kaggle":{"accelerator":"none","dataSources":[{"datasetId":1018620,"sourceId":1717426,"sourceType":"datasetVersion"}],"dockerImageVersionId":30096,"isGpuEnabled":false,"isInternetEnabled":false,"language":"python","sourceType":"notebook"},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.5"}},"nbformat":4,"nbformat_minor":4}