{ "cells": [ { "cell_type": "markdown", "id": "22fda647", "metadata": {}, "source": [ "

ROAD SURFACE TEMPERATURE ANALYSIS

\n", "\n", "
Influence of weather on road conditions for self-driving cars
\n", "
ECON 323 PROJECT
\n", "\n", "*Authors: Martina Chiesa and Pascal Terpstra*" ] }, { "cell_type": "markdown", "id": "d66492d2", "metadata": {}, "source": [ "# Index\n", "\n", "[Introduction](#Introduction)\n", "\n", "[Objective and method](#Objective-and-method)\n", "\n", "[Libraries](#Libraries)\n", "\n", "[Data](#Data)\n", "\n", "[Data Exploration](#Data-Exploration)\n", "\n", "- [Univariate statistics](#Univariate-statistics)\n", "\n", "- [Missing value imputation](#Missing-value-imputation)\n", "\n", "- [Date time](#Date-time)\n", "\n", "- [Road Surface Temperature graph ](#Road-Surface-Temperature-graph)\n", "\n", "- [Road Surface Temperature in a day](#Road-Surface-Temperature-in-a-day)\n", "\n", "- [Temperature in a day](#Temperature-in-a-day)\n", "\n", "[Regression models](#Regression-models)\n", "\n", "- [Linear regression](#Linear-regression)\n", " - [Prediction RST at 2pm](#Prediction-RST-at-2pm)\n", " - [Prediction RST at 10pm](#Prediction-RST-at-10pm)\n", "- [Lasso regression](#Lasso-regression)\n", " - [Lasso RST at 2pm](#Lasso-RST-at-2pm)\n", " - [Lasso RST at 10pm](#Lasso-RST-at-10pm)\n", "\n", "[Build a new dataframe for classification](#Build-a-new-dataframe-for-classification)\n", " - [Adding response variable](#Adding-response-variable)\n", " \n", "[Pre processing](#Pre-processing)\n", "- [Split in train and test sets](#Split-in-train-and-test-sets)\n", "- [Feature scaling](#Feature-scaling)\n", "\n", "[Classification analysis](#Classification-analysis)\n", "- [Logistic regression classifier](#Logistic-regression-classifier)\n", "- [KNN](#KNN)\n", "- [SVM linear](#SVM-linear)\n", "- [SVM non linear](#SVM-non-linear)\n", "- [Naive bayes classifier](#Naive-bayes-classifier)\n", "- [Decision tree](#Decision-tree)\n", "- [Random forest](#Random-forest)\n", "- [Ada boost](#Ada-boost)\n", "\n", "[Conclusions](#Conclusions)" ] }, { "cell_type": "markdown", "id": "fa4dab29", "metadata": {}, "source": [ "## Introduction\n", "\n", "In the context of the development of 
self-driving cars, predicting road surface temperature is relevant because it strongly influences road safety, so cars need to adapt their driving style to the road conditions (*Sukuvaara, Timo, et al. \"ITS-Enabled advanced road weather services and infrastructures for vehicle winter testing, professional traffic fleets and future automated driving.\" Proceedings of the 2018 ITS World Congress, Copenhagen, Denmark. 2018.*).\n", "\n", "Autonomous vehicles can sense their surroundings and navigate without human intervention. Although they are promising and offer safety features, their trustworthiness has to be examined before they can be widely adopted on the road. Unlike traditional network security, the security of autonomous vehicles depends heavily on how well they sense their surroundings to make driving decisions, so the sensors themselves introduce a security risk.\n", "\n", "However, these sensors may not yet be accurate enough, so companies also collect weather data from other sources, such as observations from Earth-orbiting satellites, to evaluate the road condition and test the related car performance (*Wentz, Frank J., and Matthias Schabel. \"Precise climate monitoring using complementary satellite data sets.\" Nature 403.6768 (2000): 414-416.*).\n", "\n", "\n", "## Objective and method\n", "\n", "Our main goal in this report is to predict road surface temperature, which is usually unknown, based on weather information such as air temperature and solar radiation.\n", "\n", "Specifically, we perform two types of analysis: \n", "\n", "- Regression\n", "\n", "First, we will try to create models that can predict the road surface temperature using linear regression and lasso regularization. 
Self-driving cars may have to adapt their driving style to the temperature of the road, and since the exact road temperature usually cannot be measured directly, decisions have to be based on the data that is accessible. Because we have many predictors, we use lasso regularization to prevent overfitting. Next, we fit a linear model on the predictors that the lasso did not drop, because the lasso shrinks coefficients and is therefore a biased estimator of the true parameters.\n", "\n", "- Classification\n", "\n", "Then, we focus on a classification task whose real-world application is not limited to autonomous cars. Current commercial vehicles are also equipped with Advanced Driver Assistance Systems (ADASs), which use an environment perception module consisting of several sensors that provide the data needed to interpret the scene around the vehicle. In normal weather conditions, ADASs have proven reliable and have gained popularity and confidence. However, in adverse weather conditions, the driver's experience is required to compensate when the ADAS fails to perceive the surrounding environment appropriately, which can result in severe accidents. \n", "In particular, when the road surface temperature is close to 0°C there is a risk of ice. The ability to distinguish this situation from the others could therefore help to prevent accidents.\n", "\n", "We assign a day to the Risk class when its temperature lies within 0°C ± the standard deviation of the distribution of the mean Road Surface Temperature.\n", "\n", "The methods used are: Logistic Regression, KNN, SVM, Naive Bayes classifier, Decision tree, Random Forest, Ada Boost.\n", "\n", "We both worked on all tasks, but Pascal focused mainly on regression and Martina on classification."
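, "\n", "\n", "
To make the Risk rule from the classification paragraph concrete, the following is a minimal sketch in plain Python. It rests on assumptions that the text does not state: that the temperature of a day means the mean of its hourly RST values, and that the standard deviation is the sample standard deviation of those daily means. The function name `label_risk` and the toy readings are made up for illustration.

```python
from statistics import mean, stdev


def label_risk(daily_rst):
    # daily_rst maps a day to the list of its hourly RST readings (degrees Celsius)
    daily_mean = {day: mean(values) for day, values in daily_rst.items()}
    # assumption: the spread is the sample standard deviation of the daily means
    sigma = stdev(list(daily_mean.values()))
    # a day is labelled Risk when its mean lies inside [0 - sigma, 0 + sigma]
    labels = {day: abs(m) <= sigma for day, m in daily_mean.items()}
    return labels, sigma


# made-up readings, three values per day instead of 24 to keep the example short
toy = {
    '2017-11-01': [9.0, 10.0, 11.0],
    '2017-12-10': [-1.0, 0.0, 1.0],
    '2018-01-05': [-5.0, -4.0, -3.0],
    '2018-02-01': [1.0, 2.0, 3.0],
}
labels, sigma = label_risk(toy)
print(labels)
print(round(sigma, 2))
```

On the toy data the daily means are 10, 0, -4 and 2, so the threshold is about 5.9 and every day except the first falls inside the Risk band.
"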
] }, { "cell_type": "markdown", "id": "341dfc99", "metadata": {}, "source": [ "## Libraries" ] }, { "cell_type": "code", "execution_count": 41, "id": "7cb5f9e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# data handling and plotting\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import interpolate\n", "import pandas as pd\n", "import plotly.express as px\n", "import seaborn as sb\n", "\n", "# scikit-learn: regression, preprocessing, classifiers and model selection\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn import (linear_model, metrics, neural_network, pipeline, model_selection)\n", "from sklearn import datasets\n", "from sklearn.linear_model import Lasso\n", "from sklearn.linear_model import LassoCV\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.preprocessing import LabelEncoder\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report, f1_score, accuracy_score, confusion_matrix\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.ensemble import AdaBoostClassifier\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "import statsmodels.api as sm\n", "\n", "# render plotly figures inside the notebook\n", "import plotly.offline as pyo\n", "pyo.init_notebook_mode()" ] }, { "cell_type": "markdown", "id": "665b2f34", "metadata": {}, "source": [ "## Data\n", "\n", "In this report we investigate a data set containing weather information and the road surface temperature for each hour of the day, from November 1st 2017 until April 15th 2022, in the village of Dollerup, Germany (coordinates: 6.6994,54.7717). 
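
Each hourly measurement sits in its own column, named by a variable code followed by the time of day as HHMM (for example `RST_1400`), as the table of variables in this section describes. The sketch below shows one way such a name could be split into its parts. The helpers `parse_column` and `describe_column` are hypothetical and not part of the original analysis.

```python
# variable codes and meanings, copied from the table of variables in this section
MEANING = {
    'RST': 'Road Surface Temperature',
    'TT': 'Air Temperature',
    'TD': 'Dew Point Temperature',
    'GRAD1': 'Solar Radiation',
    'NE': 'Effective Cloud Cover',
    'FF': 'Wind Speed',
}


def parse_column(name):
    # hypothetical helper: split a name such as RST_1400 into (code, hour, minute)
    code, _, hhmm = name.rpartition('_')
    if code not in MEANING or len(hhmm) != 4 or not hhmm.isdigit():
        raise ValueError('unexpected column name: ' + name)
    return code, int(hhmm[:2]), int(hhmm[2:])


def describe_column(name):
    # human-readable description of a measurement column
    code, hour, minute = parse_column(name)
    return '{} at {:02d}:{:02d}'.format(MEANING[code], hour, minute)


print(describe_column('RST_1000'))  # Road Surface Temperature at 10:00
print(describe_column('FF_1800'))   # Wind Speed at 18:00
```

Six variable codes times 24 hours account for 144 measurement columns, which together with `date` matches the 145 columns of the loaded file.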
\n", "\n", "In detail, each row corresponds to a day in which 24 values (one for each hour of the day) are collected for the following variables:\n", "\n", "| Variable Name | Meaning |\n", "| --- | --- |\n", "| RST | Road Surface Temperature |\n", "| TT | Air Temperature|\n", "| TD | Dew Point Temperature|\n", "| GRAD1 | Solar Radiation|\n", "| NE | Effective Cloud Cover |\n", "| FF | Wind Speed |\n", "\n", "Furthermore, the time of the measurement of the variable is given at the end of the variable name. For example, RST_1000 represents the road surface temperature at 10am, and RST_1800 represents the same at 6pm." ] }, { "cell_type": "code", "execution_count": 42, "id": "43fa35c9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date RST_0000 RST_0100 RST_0200 RST_0300 RST_0400 RST_0500 \\\n", "0 20171101 9.8 9.9 10.2 10.5 10.8 11.1 \n", "1 20171102 NaN NaN NaN NaN NaN NaN \n", "2 20171103 6.6 6.9 6.5 7.2 7.1 7.1 \n", "3 20171104 8.6 8.6 8.6 8.5 8.5 8.5 \n", "4 20171105 9.7 10.1 10.2 9.5 8.7 8.1 \n", "\n", " RST_0600 RST_0700 RST_0800 ... FF_1400 FF_1500 FF_1600 FF_1700 \\\n", "0 11.2 11.5 NaN ... 19.0 17.3 16.6 18.7 \n", "1 NaN NaN 8.6 ... 15.8 14.6 12.7 10.6 \n", "2 7.0 7.6 8.6 ... 9.7 11.4 10.5 10.4 \n", "3 8.4 8.6 9.0 ... 9.2 8.4 10.0 11.8 \n", "4 8.1 7.5 8.3 ... 6.0 6.4 7.3 5.9 \n", "\n", " FF_1800 FF_1900 FF_2000 FF_2100 FF_2200 FF_2300 \n", "0 18.0 20.2 20.3 22.1 21.3 19.7 \n", "1 9.4 12.0 9.6 7.8 8.6 8.1 \n", "2 10.5 11.1 11.0 12.4 11.6 12.1 \n", "3 12.4 12.1 11.6 9.7 11.1 13.7 \n", "4 5.9 6.2 6.6 5.2 6.4 8.0 \n", "\n", "[5 rows x 145 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#path = 'C:/Users/pasca/Documents/100356.csv'\n", "path = 'C:/Users/chies/OneDrive/Desktop/ECON_323/project/100356.csv'\n", "df = pd.read_csv(path)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "e002cacd", "metadata": {}, "source": [ "## Data Exploration\n", "\n", "### Univariate statistics" ] }, { "cell_type": "code", "execution_count": 43, "id": "1ad11662", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 1015 entries, 0 to 1014\n", "Columns: 145 entries, date to FF_2300\n", "dtypes: float64(144), int64(1)\n", "memory usage: 1.1 MB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "edcefaf0", "metadata": {}, "source": [ "The dataset contains 145 columns and 1015 rows.\n", "\n", "We now present summary statistics for each variable:" ] }, { "cell_type": "code", "execution_count": 44, "id": "2faca583", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " RST_0000 RST_0100 RST_0200 RST_0300 RST_0400 RST_0500 \\\n", "count 976.000000 970.000000 971.000000 972.000000 970.000000 972.000000 \n", "mean 4.359221 4.199897 4.082389 3.984053 3.933608 3.925823 \n", "std 4.495237 4.476646 4.472077 4.482250 4.495215 4.494730 \n", "min -9.400000 -10.100000 -10.500000 -10.900000 -11.200000 -11.400000 \n", "25% 1.200000 1.100000 0.900000 0.800000 0.800000 0.875000 \n", "50% 4.000000 4.000000 3.700000 3.650000 3.600000 3.600000 \n", "75% 7.100000 7.000000 6.800000 6.825000 6.800000 6.800000 \n", "max 17.300000 17.000000 18.000000 18.000000 18.200000 18.100000 \n", "\n", " RST_0600 RST_0700 RST_0800 RST_0900 ... FF_1400 \\\n", "count 975.000000 978.000000 978.000000 980.000000 ... 1015.000000 \n", "mean 4.132718 4.644376 5.695603 7.058265 ... 12.703448 \n", "std 4.483807 4.667251 5.134104 5.694181 ... 5.731249 \n", "min -11.500000 -11.600000 -10.600000 -8.500000 ... 1.000000 \n", "25% 1.100000 1.700000 2.300000 3.300000 ... 8.600000 \n", "50% 3.800000 4.500000 5.200000 6.100000 ... 12.000000 \n", "75% 7.000000 7.500000 8.800000 10.400000 ... 16.000000 \n", "max 18.100000 19.000000 20.900000 25.000000 ... 
38.600000 \n", "\n", " FF_1500 FF_1600 FF_1700 FF_1800 FF_1900 \\\n", "count 1015.000000 1015.000000 1015.000000 1015.000000 1015.000000 \n", "mean 12.437734 12.197734 12.011232 11.840690 11.739113 \n", "std 5.733469 5.523552 5.376537 5.320602 5.236507 \n", "min 0.900000 1.700000 1.900000 1.400000 1.100000 \n", "25% 8.200000 8.100000 8.200000 8.150000 8.100000 \n", "50% 11.600000 11.400000 11.200000 11.000000 11.000000 \n", "75% 16.000000 15.400000 14.950000 14.800000 14.450000 \n", "max 37.400000 35.600000 34.300000 35.000000 34.800000 \n", "\n", " FF_2000 FF_2100 FF_2200 FF_2300 \n", "count 1015.000000 1015.000000 1015.000000 1015.000000 \n", "mean 11.723054 11.639212 11.613793 11.617537 \n", "std 5.265115 5.251522 5.277942 5.405816 \n", "min 1.500000 1.900000 1.200000 1.100000 \n", "25% 8.000000 8.000000 8.000000 7.900000 \n", "50% 11.100000 11.000000 11.000000 10.800000 \n", "75% 14.600000 14.600000 14.700000 14.900000 \n", "max 34.900000 34.100000 33.800000 34.700000 \n", "\n", "[8 rows x 144 columns]" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df.drop(['date'], axis=1).describe())" ] }, { "cell_type": "markdown", "id": "8d149f50", "metadata": {}, "source": [ "The hourly columns of each variable have approximately the same scale and, in general, similar distributions. This is expected given the meaning of each variable: for example, the distribution of the Air Temperature at 10am should behave similarly to that of the same measure at 6pm. " ] }, { "cell_type": "markdown", "id": "c7fcbdca", "metadata": {}, "source": [ "### Missing value imputation\n", "\n", "Next, we check whether there are missing values:" ] }, { "cell_type": "code", "execution_count": 45, "id": "6726b70a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date RST_0000 RST_0100 RST_0200 RST_0300 RST_0400 RST_0500 \\\n", "0 20171101 9.8 9.9 10.2 10.5 10.8 11.1 \n", "1 20171102 NaN NaN NaN NaN NaN NaN \n", "7 20171108 2.6 3.6 4.0 3.4 4.0 4.2 \n", "8 20171109 NaN NaN NaN NaN NaN NaN \n", "9 20171110 NaN NaN NaN NaN NaN NaN \n", "... ... ... ... ... ... ... ... \n", "984 20220316 3.5 3.1 2.3 2.2 2.7 2.8 \n", "985 20220317 4.6 4.5 4.1 3.7 3.8 4.1 \n", "993 20220325 NaN NaN 4.2 NaN NaN 3.2 \n", "999 20220331 0.0 0.0 0.2 0.3 0.1 0.6 \n", "1000 20220401 NaN NaN NaN 0.8 0.8 0.8 \n", "\n", " RST_0600 RST_0700 RST_0800 ... FF_1400 FF_1500 FF_1600 FF_1700 \\\n", "0 11.2 11.5 NaN ... 19.0 17.3 16.6 18.7 \n", "1 NaN NaN 8.6 ... 15.8 14.6 12.7 10.6 \n", "7 5.3 5.6 6.1 ... 8.7 5.4 6.2 7.4 \n", "8 NaN NaN NaN ... 16.3 17.2 18.4 18.9 \n", "9 NaN NaN NaN ... 15.3 18.9 19.4 16.2 \n", "... ... ... ... ... ... ... ... ... \n", "984 2.9 3.9 5.0 ... 11.7 13.0 11.5 14.1 \n", "985 4.5 5.2 6.0 ... 17.2 15.6 16.4 13.0 \n", "993 3.6 5.4 10.2 ... 11.3 12.6 12.4 11.2 \n", "999 1.0 2.0 3.0 ... 18.9 18.1 16.8 16.8 \n", "1000 1.6 3.2 5.4 ... 13.7 12.8 12.5 12.8 \n", "\n", " FF_1800 FF_1900 FF_2000 FF_2100 FF_2200 FF_2300 \n", "0 18.0 20.2 20.3 22.1 21.3 19.7 \n", "1 9.4 12.0 9.6 7.8 8.6 8.1 \n", "7 6.3 6.1 7.2 6.7 6.9 5.4 \n", "8 9.2 8.4 10.4 9.9 10.0 10.0 \n", "9 16.1 15.0 13.5 16.5 13.0 16.0 \n", "... ... ... ... ... ... ... 
\n", "984 14.5 13.6 13.4 14.9 15.6 16.3 \n", "985 12.6 10.0 10.4 9.8 9.0 9.2 \n", "993 10.5 9.2 10.5 9.1 10.2 10.2 \n", "999 14.4 13.9 14.3 12.6 11.6 13.5 \n", "1000 12.8 13.2 12.2 11.6 10.4 8.6 \n", "\n", "[106 rows x 145 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.isna().any(axis=1)]" ] }, { "cell_type": "code", "execution_count": 46, "id": "8c32844f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of missing values in the dataframe: 1121\n" ] } ], "source": [ "print(\"Total number of missing values in the dataframe:\", df.isnull().sum().sum())" ] }, { "cell_type": "markdown", "id": "d51a832e", "metadata": {}, "source": [ "We chose to replace each missing value with the next observed value in the same column, i.e. the value at the same hour on the next day for which it is available (a backward fill), through the following function:" ] }, { "cell_type": "code", "execution_count": 47, "id": "b9260851", "metadata": {}, "outputs": [], "source": [ "def missing(df):\n", "    \"\"\"\n", "    input = dataframe\n", "    fill each missing value with the next observed value in the same column (backward fill)\n", "    output = dataframe; NaNs after the last observation of a column, if any, remain\n", "    \"\"\"\n", "    # df.bfill() is equivalent to the deprecated df.fillna(method=\"bfill\")\n", "    df = df.bfill()\n", "    return df" ] }, { "cell_type": "code", "execution_count": 48, "id": "8339d812", "metadata": {}, "outputs": [], "source": [ "df = missing(df)\n", "#df.isnull().sum().sum() # to check that now is 0" ] }, { "cell_type": "markdown", "id": "7b9240e3", "metadata": {}, "source": [ "### Date time\n", "\n", "The date column is read as an integer (for example 20171101), so we convert it to a datetime format." ] }, { "cell_type": "code", "execution_count": 49, "id": "62d08a14", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2017-11-01\n", "1 2017-11-02\n", "2 2017-11-03\n", "3 2017-11-04\n", "4 2017-11-05\n", " ... 
\n", "1010 2022-04-11\n", "1011 2022-04-12\n", "1012 2022-04-13\n", "1013 2022-04-14\n", "1014 2022-04-15\n", "Name: date, Length: 1015, dtype: datetime64[ns]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['date'] = pd.to_datetime(df['date'], format='%Y%m%d') # convert to date time\n", "df['date']" ] }, { "cell_type": "markdown", "id": "8dd44780", "metadata": {}, "source": [ "### Road Surface Temperature graph\n", "\n", "To get a better idea of how the road surface temperature varies over time, we plot a time series of the road surface temperature at 2pm for each day from 2017-11-01 until 2022-04-15." ] }, { "cell_type": "code", "execution_count": 50, "id": "62fc92c6", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "date=%{x}
Road Service Temperature=%{y}", "legendgroup": "", "line": { "color": "#636efa", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "", "showlegend": false, "type": "scattergl", "x": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 
379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 
779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014 ], "xaxis": "x", "y": [ 12.6, 12.6, 10.5, 12.3, 11.3, 9, 7.7, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 8.6, 6.7, 4.1, 5.6, 6.7, 6.3, 9.9, 9.7, 6.3, 4.2, 5, 4.5, 5.4, 5.4, 2.7, 4.7, 2.7, 5.4, 4.7, 7.7, 8.2, 4.7, 2.1, 1.7, 0, 1, 0.7, 0.4, 2.1, 3.9, 1.6, 1.1, 2.6, 5.7, 7, 6.4, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 3.4, 1.2, 1, 2.6, 4, 3.8, 2.2, 2.1, 1.7, 1.5, 1.8, 2.1, 1, 1.6, 1.6, 4.1, 1.2, 4.8, 10.6, 7.4, 7.1, 6.2, 8.5, 6.5, 5.3, 6.2, 4, 4.5, -0.3, 2.5, 1.8, 4.5, 4.5, 4.4, 4.3, 2.4, 3.4, 5.5, 7.4, 6.9, 0.8, 6.5, 6.7, 7, 9, 8.6, 8.5, 5.5, 8.4, 7.3, 3.7, 5.4, -1.5, -1.3, -2.6, 0.8, 3.9, 1.8, 6.9, 8.8, 4.9, 8.3, 11.4, 4, 10.6, 9, 7, 7.8, 7.5, 4.6, 6.7, 12.6, 15.3, 15.3, 9.6, 8.2, 10.2, 13, 10, 10, 16.5, 6.8, 5.4, 15.7, 8.3, 16.1, 18.7, 14.3, 19, 8.8, 23.8, 26.1, 
29.3, 28.4, 12.8, 20.7, 22, 12.6, 10.4, 12.8, 27.1, 30.2, 35, 30.6, 24.8, 17, 20.7, 16.8, 21.1, 19.6, 14.9, 20.6, 22.8, 19.5, 21.2, 20.8, 14.6, 20.9, 17.3, 23.7, 19.5, 20.5, 17.5, 17.9, 23.3, 23.9, 25.7, 28.1, 25.7, 26.3, 21.4, 23.7, 20.4, 18.1, 18.7, 16.6, 16.9, 13.2, 12.7, 13.6, 11.2, 9.1, 9.4, 6.5, 10, 10.8, 12.5, 11.6, 11.3, 9.8, 10.6, 11.6, 13.6, 11.8, 9.5, 10.4, 11.6, 10.5, 9.9, 9.7, 11.7, 9.5, 7.2, 6.7, 5.7, 5.3, 6.4, 4.2, 4.1, 3.8, 4.5, 5.2, 3.8, 1.8, 4.7, 8.1, 7.1, 8.2, 9.5, 5.6, 3.5, 6.7, 10.3, 7.4, 6.1, 5.1, 4.1, 4.9, 3.3, 2, 2.1, 0.8, 1.4, 4.9, 3.3, 5.3, 5.8, 5, 5.7, 4.1, 6.1, 7.8, 7.9, 7.6, 7.4, 5.6, 7.1, 5.9, 3.8, 1, 5.8, 7.8, 6.3, 6.3, 6.3, 4.6, 2.2, 6.5, 5.8, 6.4, 3.7, 6.7, 6.7, 4.4, 3.5, 2.4, 3.3, 3, 2.7, -0.4, -2.5, -0.8, 3.3, 5.2, 3.4, 2.3, 3.6, 2.3, 1.5, 2.6, 6.3, 3.5, 6, 4.8, 5.9, 6.4, 5.3, 5.2, 10.7, 9.4, 8.9, 10.3, 14.3, 13, 15.4, 15.5, 8.1, 9.5, 8.4, 10.7, 9.6, 15.7, 15.9, 17.6, 18.2, 15.7, 12.3, 9.2, 11.3, 7.3, 13, 6.9, 9.2, 13.3, 7.7, 10.5, 16.4, 5.1, 5.5, 7, 8.2, 7.2, 13.7, 17.3, 19.7, 11.2, 11.7, 12.9, 20.5, 19.8, 18.9, 15.2, 11.2, 16.8, 20.2, 23.1, 21.7, 21.3, 21.7, 10.8, 24.2, 21.8, 26.6, 28, 26.6, 24.9, 14.6, 11.1, 20.2, 13.7, 16, 23.1, 26.6, 23, 19.1, 26.6, 20.2, 30.7, 29.4, 27, 24.7, 17.2, 20.1, 25.3, 21.1, 14.5, 22.3, 12.2, 17.6, 15.1, 11.9, 13.4, 16.8, 15.4, 16, 15.8, 15.8, 13.8, 15, 12.5, 17.3, 21.8, 15.6, 17.4, 16, 14.8, 14.3, 12.4, 14, 14.8, 14.5, 12.9, 14.8, 13.4, 11.5, 11.2, 9.7, 9.3, 6.2, 11.7, 11.1, 10.8, 7.6, 7.3, 7.4, 9.8, 9.6, 8.2, 6.5, 5.8, 4.8, 7.3, 6.8, 8.2, 6.6, 8.3, 8.1, 6.9, 6.8, 8.4, 8.1, 6.9, 7, 7.2, 7.8, 8.9, 5.1, 5.3, 6.3, 4.8, 6.6, 7.6, 6.8, 6.1, 7.6, 8.1, 6, 5.7, 6, 5.1, 2.8, 4.2, 4.4, 5.7, 7, 7.2, 7.5, 5.9, 7.1, 4.9, 7.4, 6, 6.1, 5.6, 4.1, 2.2, 2.3, 6.1, 6.2, 5, 2, 6.1, 4.7, 4.9, 6.1, 6.8, 8.9, 8.9, 5.9, 5.2, 5.5, 6.1, 7.8, 9.2, 8.2, 8.4, 5.1, 6.4, 6.6, 7.6, 8.2, 6.4, 6.9, 6, 6.3, 7.1, 5.9, 4.5, 7, 7.8, 9.7, 8.5, 8, 8.7, 9.4, 8.6, 6.6, 7.4, 7.2, 7.4, 4.4, 6.2, 4.7, 6.1, 8, 10.5, 9.2, 8.3, 9.8, 7.4, 8.9, 8.3, 
6.1, 6.5, 5.7, 7.4, 8.5, 10.4, 11.1, 9.1, 12.1, 10.7, 8.1, 10.6, 7, 11, 9, 13.8, 9.1, 16.7, 11.6, 15.4, 17.1, 15.1, 13.9, 13.6, 9.8, 22, 18.6, 18.8, 15.9, 18.9, 17.6, 21.4, 20, 22.6, 23.2, 18.7, 21.9, 22.5, 11.5, 12, 13.2, 20.3, 26.3, 30.3, 30, 30, 28, 27.3, 29.2, 31.2, 20.3, 20.8, 26.5, 30.6, 28.4, 28.4, 25.6, 28, 25.9, 30.8, 27.1, 25.1, 20.4, 18.4, 19.4, 15.9, 21.8, 22.8, 23.2, 18.9, 18.2, 18.3, 16.8, 17.1, 19, 12.2, 18.5, 11.6, 12.1, 15.9, 18.1, 13.6, 13.8, 14.8, 16, 15.5, 14, 11.2, 12.6, 15.9, 15.8, 14.6, 11.6, 11.8, 13.1, 13.7, 9.9, 13.5, 14.2, 15.6, 16.3, 13.7, 10.4, 11.2, 15.4, 12.4, 10.8, 9.8, 8.2, 6.5, 8.7, 9.6, 10.4, 13.2, 10.1, 11.7, 11.6, 6.4, 7.1, 7.3, 9.1, 8, 8.5, 6.8, 9.5, 7.1, 5.7, 4.3, 3.7, 5.2, 4.8, 2.9, 2.4, 3.9, 5.4, 6.2, 6.1, 5.2, 3.7, 3, 3.7, 3.8, 6, 7.7, 7.1, 7.5, 7.5, 5.7, 7.2, 6.6, 7.5, 5.2, 4.1, 1.6, 3.3, 2.9, 3, 2.9, 4.5, 3.5, 3.4, 3.5, 3.3, 3, 2.9, 1.7, 2.7, 2, 2.6, 2.4, 4.1, 1.5, 1.9, 2, 0.5, 1.3, 1.7, 3.5, 4.1, 6.7, 7.3, 5.6, 4.4, 3.3, 5, 5, 3.1, 3, 0.9, 2.8, 3.4, 1.3, 1, -0.5, 2.1, 2.5, 2.6, -2.6, -2, 2, 3.2, 1.2, 1.8, 2.6, 2.4, 1.1, 3.6, 5.4, 10.9, 10.3, 14, 16.5, 16.1, 14.5, 15.1, 11.6, 14.5, 14.5, 16.3, 16.3, 16.4, 13.6, 9.3, 13.1, 8.5, 7.4, 14.3, 13.1, 9.5, 11.1, 11.1, 11.1, 11.1, 11.1, 16.3, 17.2, 14.7, 9.5, 6.4, 16.9, 11.7, 10.6, 17.1, 20.3, 18.4, 12.5, 8.2, 12.7, 24.7, 28.7, 17.4, 19.3, 22.4, 10.3, 12.4, 7.8, 10.7, 10.2, 16.7, 21.1, 7.3, 16.8, 19.4, 22.3, 21.5, 27.6, 22.1, 21.3, 20.9, 26.8, 18, 27.9, 18.2, 16.8, 24.1, 27.1, 23.7, 24.6, 22.3, 17.3, 20.9, 20.1, 17.3, 18.5, 14.3, 17.4, 21.3, 17.6, 18.8, 17, 15.1, 11.5, 14, 14, 14, 14.6, 12.7, 14.2, 13.9, 14.5, 10.2, 9.4, 12.4, 13.4, 10.6, 10.5, 13.2, 16.4, 15.7, 13.3, 13.8, 13, 12.6, 10.6, 8.6, 11.3, 10.1, 8.7, 11.1, 11.3, 11.9, 12.5, 10.5, 8.9, 8.8, 8.2, 6.9, 7.4, 10.2, 12.7, 10.8, 8.3, 7.1, 8.4, 8.2, 6.3, 4.6, 4, 4.1, 4.2, 4.1, 6.5, 2.4, 2.3, 4.3, 3.3, 2.5, 2.4, 2.5, 3.3, 4, 2.8, 5.7, 8.5, 5.6, 9, 6.7, 7.5, 7.4, 7, 2.7, 1.4, 1.2, -0.6, -0.7, -2.9, -2.5, -1, 2.1, 2.5, 8.2, 9.5, 
9.2, 9.8, 7.6, 5, 4.1, 2.4, 2.7, 3.3, 2.7, 3.7, 4.4, 6, 7.4, 7.1, 5.5, 5.8, 5.8, 7, 4.4, 2.6, 5.1, 7.3, 8.7, 5.6, 6.5, 5.7, 8.5, 7.1, 8.3, 7.3, 4.8, 6.2, 7.4, 7, 6.8, 6, 5.7, 8.1, 8.4, 8.1, 7.9, 7.9, 6.4, 7.6, 8.3, 8.8, 8.4, 6.6, 4.3, 6, 6.2, 4.9, 7.6, 11.3, 6.7, 11.9, 13.6, 12.4, 13.7, 7.1, 13.8, 10.2, 13.5, 7.5, 13.8, 15.7, 16.9, 17.2, 16.5, 15.4, 17, 16.5, 13.4, 17.3, 9.1, 8.1, 20.9, 19.5, 16.3, 20.5, 21.5, 22.9, 23.3, 25.2, 25.2, 24.8, 16.5, 17.1, 13, 14.3, 14.3, 16.5, 11, 8.7, 14.5, 8.6, 9.8, 9.1, 12.2, 13, 22.5, 21.3, 26.6, 24.3, 14 ], "yaxis": "y" } ], "layout": { "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { 
"colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": 
"parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 
0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { 
"color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "date" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "Road Service Temperature" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "example = pd.DataFrame(\n", " {\n", " \"date\": np.arange(len(df['RST_1400'])),\n", " \"Road Service Temperature\": df['RST_1400'],\n", " }\n", ")\n", "px.line(example, x=\"date\", y=\"Road Service Temperature\")" ] }, { "cell_type": "markdown", "id": "66c33d39", "metadata": {}, "source": [ "From the plot above we can observe and learn a lot. One can notice that the data is homogeneous in the sense that the road surface temperature is usually similar to the day(s) before it. Furthermore, there is a clear presence of seasonality, by noticing that some patterns are similar each day, such as the peeks in each summer." ] }, { "cell_type": "markdown", "id": "5a4be29e", "metadata": {}, "source": [ "### Road Surface Temperature in a day\n", "In the graph below we plot the average road surface temperature for each hour of the day across all the data that we have. From this plot we can get a better idea of the amount of variation and whether there could be some intuitive explanation." ] }, { "cell_type": "code", "execution_count": 51, "id": "d60d9ded", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "hour of day=%{x}
Road Service Temperature=%{y}", "legendgroup": "", "line": { "color": "#636efa", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "", "orientation": "v", "showlegend": false, "type": "scatter", "x": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 ], "xaxis": "x", "y": [ 4.315566502463047, 4.16187192118227, 4.0418719211822625, 3.953004926108376, 3.9154679802955656, 3.9048275862068946, 4.082167487684725, 4.564827586206893, 5.608965517241375, 6.978029556650239, 8.6216748768473, 9.960000000000006, 10.738325123152704, 10.854384236453203, 10.403546798029556, 9.306009852216755, 8.114876847290649, 6.976551724137933, 6.1367487684729065, 5.608866995073886, 5.234778325123158, 4.93221674876847, 4.678719211822659, 4.452315270935958 ], "yaxis": "y" } ], "layout": { "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 
0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 
0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { 
"annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": 
"white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "hour of day" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "Road Service Temperature" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "vRST = np.zeros(24)\n", "for i in range(24):\n", " vRST[i] = df.iloc[:,1+i].mean()\n", "example2 = pd.DataFrame(\n", " {\n", " \"hour of day\": np.arange(24),\n", " \"Road Service Temperature\": vRST,\n", " }\n", ")\n", "px.line(example2, x=\"hour of day\", y=\"Road Service Temperature\")" ] }, { "cell_type": "markdown", "id": "022671cf", "metadata": {}, "source": [ "From the graph above we can see that the road surface temperature differs a lot across the day and we may suspect that there is a correlation between solar radiation and temperature with the road surface temperature, since the average road surface temperature is higher during the day and lower in the night. Furthermore, we can see that for each hour of the day we will probably have to make a seperate model, because the influence of the road surface temperature of the hours before strongly depends on the hour we focus on. For instance, if we know that it was 10 degrees at 11am we will probably predict an even higher temperature at 12pm, whereas if we know that the road surface temperature was 10 degrees at 3pm, we will predict a lower value for the road surface temperature at 4pm, i.e. the parameter value in a time series model will depend the hour of the day and we must therefore distinguish models depending on the hour." ] }, { "cell_type": "markdown", "id": "cddfbb43", "metadata": {}, "source": [ "### Temperature in a day\n", "\n", "Now, we will make a plot of the average air temperature (TT) for each hour of the day to see whether there is a similarity in distribution with the road surface temperature as we expect." ] }, { "cell_type": "code", "execution_count": 52, "id": "01271fd5", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "hour of day=%{x}
Temperature=%{y}", "legendgroup": "", "line": { "color": "#636efa", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "", "orientation": "v", "showlegend": false, "type": "scatter", "x": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 ], "xaxis": "x", "y": [ 5.173596059113302, 5.08177339901477, 4.994187192118222, 4.917832512315266, 4.850640394088672, 4.8095566502463045, 4.852610837438423, 5.096059113300493, 5.531231527093593, 6.072315270935956, 6.592216748768474, 7.022364532019706, 7.329359605911332, 7.475369458128082, 7.411527093596059, 7.172610837438422, 6.765517241379321, 6.353103448275855, 5.998029556650246, 5.746699507389161, 5.5726108374384244, 5.435073891625616, 5.327192118226608, 5.228965517241374 ], "yaxis": "y" } ], "layout": { "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, 
"#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, 
"#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { 
"arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", 
"showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "hour of day" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "Temperature" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "vTT = np.zeros(24)\n", "for i in range(24):\n", " vTT[i] = df.iloc[:,25+i].mean()\n", "example2 = pd.DataFrame(\n", " {\n", " \"hour of day\": np.arange(24),\n", " \"Temperature\": vTT,\n", " }\n", ")\n", "px.line(example2, x=\"hour of day\", y=\"Temperature\")" ] }, { "cell_type": "markdown", "id": "f229d5c8", "metadata": {}, "source": [ "From the plots above we can notice that the average road surface temperature and average temperature seem to have a very similar distribution, which indicates that temperature will be an important predictor." ] }, { "cell_type": "markdown", "id": "ebbf1632", "metadata": {}, "source": [ "# Prediction and classification analysis\n", "\n", "## Regression models\n", "\n", "In this section we predict the road surface temperature on a specific time, namely 2pm and 10pm using as predictors: TT, TD, GRAD1, NE and FF available at the current values of the predictors and ten lags. E.g. for 2pm the predictors will be from 8am until 2pm. \n", "\n", "First, we will use linear regression to get an idea of the predictive power of the model and to designate which variables are important predictors. Secondly, we will apply lasso regularization on the same predictors to delete the variables that are of no importance. To decide which value to use as tuning parameter in lasso regression we will make use of cross validation." ] }, { "cell_type": "markdown", "id": "dce0c2d1", "metadata": {}, "source": [ "### Linear regression\n", "\n", "The method we will try out first is linear regression which can be exploited by the following equation:\n", "$y_i = \\beta_0 + \\beta_1 x_{i1} + ... 
+ \\beta_k x_{ik} + u_i \\qquad (i = 1,2, ..., n)$\n", "\n", "where the parameters are estimated by minimizing the squared loss as follows: $\\hat{\\beta} = \\underset{\\beta\\in R^{k+1}}{\\mathrm{argmin}}\\sum_{i=1}^{n}{(y_i-X_i'\\beta)^2}$.\n", "\n", "The response variable $y$ is the Road Surface Temperature, first at 2pm and then at 10pm." ] }, { "cell_type": "markdown", "id": "67c81ba5", "metadata": {}, "source": [ "#### Prediction RST at 2pm" ] }, { "cell_type": "code", "execution_count": 53, "id": "c2fd19c8", "metadata": {}, "outputs": [], "source": [ "def regression(df, hour, lags):\n", " \"\"\"\n", " input: dataframe, hour for response, number of lags\n", " output: X matrix containing TT, TD, GRAD1, NE, FF up to a certain number of lags, \n", " y vector containing RST for the given hour\n", " \"\"\"\n", " X = pd.concat([pd.DataFrame(df.iloc[:, (df.columns.get_loc(\"TT_0000\")+hour-lags):(df.columns.get_loc(\"TT_0000\")+hour+1)]),\n", " pd.DataFrame(df.iloc[:, (df.columns.get_loc(\"TD_0000\")+hour-lags):(df.columns.get_loc(\"TD_0000\")+hour+1)]),\n", " pd.DataFrame(df.iloc[:, (df.columns.get_loc(\"GRAD1_0000\")+hour-lags):(df.columns.get_loc(\"GRAD1_0000\")+hour+1)]),\n", " pd.DataFrame(df.iloc[:, (df.columns.get_loc(\"NE_0000\")+hour-lags):(df.columns.get_loc(\"NE_0000\")+hour+1)]),\n", " pd.DataFrame(df.iloc[:, (df.columns.get_loc(\"FF_0000\")+hour-lags):(df.columns.get_loc(\"FF_0000\")+hour+1)])\n", " ], axis=1)\n", "\n", " y = pd.DataFrame(df.iloc[:, df.columns.get_loc(\"RST_0000\")+hour])\n", " return X,y\n", "\n", "def lmodel(X,y):\n", " '''\n", " input: X matrix, y vector\n", " output: linear regression model summary on given input\n", " '''\n", " X = sm.add_constant(X)\n", " model = sm.OLS(y, X).fit()\n", " print(model.summary())" ] }, { "cell_type": "code", "execution_count": 54, "id": "b548b069", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", 
"==============================================================================\n", "Dep. Variable: RST_1400 R-squared: 0.962\n", "Model: OLS Adj. R-squared: 0.960\n", "Method: Least Squares F-statistic: 443.9\n", "Date: Thu, 15 Dec 2022 Prob (F-statistic): 0.00\n", "Time: 22:26:25 Log-Likelihood: -1743.7\n", "No. Observations: 1015 AIC: 3599.\n", "Df Residuals: 959 BIC: 3875.\n", "Df Model: 55 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -3.1230 0.329 -9.491 0.000 -3.769 -2.477\n", "TT_0400 0.1657 0.150 1.107 0.268 -0.128 0.459\n", "TT_0500 0.0117 0.209 0.056 0.955 -0.399 0.422\n", "TT_0600 -0.2435 0.201 -1.213 0.225 -0.638 0.150\n", "TT_0700 0.3506 0.213 1.649 0.099 -0.067 0.768\n", "TT_0800 -0.0367 0.194 -0.189 0.850 -0.417 0.344\n", "TT_0900 -0.0754 0.172 -0.437 0.662 -0.414 0.263\n", "TT_1000 0.1722 0.173 0.998 0.318 -0.166 0.511\n", "TT_1100 -0.1582 0.163 -0.968 0.333 -0.479 0.163\n", "TT_1200 0.1043 0.141 0.740 0.459 -0.172 0.381\n", "TT_1300 0.0695 0.143 0.486 0.627 -0.211 0.350\n", "TT_1400 0.7044 0.103 6.837 0.000 0.502 0.907\n", "TD_0400 0.0891 0.154 0.579 0.562 -0.213 0.391\n", "TD_0500 -0.1802 0.241 -0.747 0.455 -0.653 0.293\n", "TD_0600 -0.1853 0.249 -0.745 0.456 -0.673 0.302\n", "TD_0700 0.4173 0.220 1.897 0.058 -0.014 0.849\n", "TD_0800 -0.1214 0.192 -0.632 0.527 -0.498 0.255\n", "TD_0900 0.1807 0.176 1.028 0.304 -0.164 0.526\n", "TD_1000 -0.4439 0.181 -2.459 0.014 -0.798 -0.090\n", "TD_1100 0.3055 0.178 1.714 0.087 -0.044 0.655\n", "TD_1200 -0.3071 0.167 -1.843 0.066 -0.634 0.020\n", "TD_1300 0.1375 0.176 0.780 0.435 -0.208 0.483\n", "TD_1400 0.0201 0.108 0.185 0.853 -0.192 0.233\n", "GRAD1_0400 4.1896 4.817 0.870 0.385 -5.263 13.642\n", "GRAD1_0500 0.1135 0.125 0.906 0.365 -0.132 0.359\n", "GRAD1_0600 -0.0103 0.017 -0.621 0.534 -0.043 
0.022\n", "GRAD1_0700 0.0036 0.007 0.540 0.589 -0.009 0.017\n", "GRAD1_0800 0.0024 0.004 0.596 0.551 -0.005 0.010\n", "GRAD1_0900 0.0015 0.003 0.541 0.588 -0.004 0.007\n", "GRAD1_1000 0.0060 0.002 2.794 0.005 0.002 0.010\n", "GRAD1_1100 -0.0036 0.002 -2.084 0.037 -0.007 -0.000\n", "GRAD1_1200 0.0063 0.002 3.650 0.000 0.003 0.010\n", "GRAD1_1300 0.0045 0.002 2.506 0.012 0.001 0.008\n", "GRAD1_1400 0.0142 0.001 10.008 0.000 0.011 0.017\n", "NE_0400 0.0171 0.044 0.388 0.698 -0.069 0.104\n", "NE_0500 0.0973 0.054 1.792 0.073 -0.009 0.204\n", "NE_0600 -0.0428 0.044 -0.980 0.327 -0.128 0.043\n", "NE_0700 -0.0553 0.053 -1.042 0.298 -0.159 0.049\n", "NE_0800 0.0836 0.058 1.453 0.146 -0.029 0.197\n", "NE_0900 0.0848 0.048 1.749 0.081 -0.010 0.180\n", "NE_1000 -0.0744 0.078 -0.956 0.339 -0.227 0.078\n", "NE_1100 0.1329 0.080 1.654 0.099 -0.025 0.291\n", "NE_1200 0.0735 0.082 0.902 0.367 -0.086 0.234\n", "NE_1300 0.0698 0.081 0.866 0.387 -0.088 0.228\n", "NE_1400 0.1422 0.060 2.383 0.017 0.025 0.259\n", "FF_0400 -0.0632 0.033 -1.924 0.055 -0.128 0.001\n", "FF_0500 0.0593 0.044 1.335 0.182 -0.028 0.147\n", "FF_0600 -0.0120 0.046 -0.261 0.794 -0.102 0.078\n", "FF_0700 -0.0315 0.045 -0.706 0.480 -0.119 0.056\n", "FF_0800 -0.0112 0.041 -0.272 0.786 -0.092 0.070\n", "FF_0900 0.0581 0.042 1.398 0.162 -0.023 0.140\n", "FF_1000 0.0104 0.040 0.258 0.797 -0.069 0.090\n", "FF_1100 -0.0502 0.041 -1.219 0.223 -0.131 0.031\n", "FF_1200 -0.0363 0.037 -0.971 0.332 -0.110 0.037\n", "FF_1300 0.0240 0.039 0.620 0.536 -0.052 0.100\n", "FF_1400 -0.0551 0.030 -1.839 0.066 -0.114 0.004\n", "==============================================================================\n", "Omnibus: 67.138 Durbin-Watson: 1.445\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 288.268\n", "Skew: -0.042 Prob(JB): 2.53e-63\n", "Kurtosis: 5.609 Cond. No. 
6.54e+04\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The condition number is large, 6.54e+04. This might indicate that there are\n", "strong multicollinearity or other numerical problems.\n" ] } ], "source": [ "df = missing(df)\n", "\n", "X1400,y1400 = regression(df, 14, 10)\n", "lmodel(X1400,y1400)" ] }, { "cell_type": "markdown", "id": "0848bdc7", "metadata": {}, "source": [ "From the table above we can see that many variables are statistically insignificant at the 5% level ($P>|t|$ above 0.05). The most important predictors seem to be those whose p-value rounds to 0.000: TT_1400, GRAD1_1200 and GRAD1_1400. Since the large condition number in note [2] hints at multicollinearity among the lagged predictors, the individual p-values should be read with some caution. Furthermore, the adjusted $R^2$ value of 0.960 implies that the model explains most of the variation in road surface temperature." ] }, { "cell_type": "markdown", "id": "3052d71a", "metadata": {}, "source": [ "#### Prediction RST at 10pm" ] }, { "cell_type": "code", "execution_count": 55, "id": "d8cb3567", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: RST_2200 R-squared: 0.962\n", "Model: OLS Adj. R-squared: 0.960\n", "Method: Least Squares F-statistic: 447.5\n", "Date: Thu, 15 Dec 2022 Prob (F-statistic): 0.00\n", "Time: 22:26:25 Log-Likelihood: -1287.6\n", "No. 
Observations: 1015 AIC: 2685.\n", "Df Residuals: 960 BIC: 2956.\n", "Df Model: 54 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -2.8540 0.201 -14.231 0.000 -3.248 -2.460\n", "TT_1200 0.1748 0.064 2.745 0.006 0.050 0.300\n", "TT_1300 -0.0823 0.089 -0.922 0.357 -0.257 0.093\n", "TT_1400 0.0766 0.084 0.909 0.364 -0.089 0.242\n", "TT_1500 -0.2566 0.105 -2.446 0.015 -0.462 -0.051\n", "TT_1600 0.3697 0.113 3.273 0.001 0.148 0.591\n", "TT_1700 -0.1176 0.113 -1.039 0.299 -0.340 0.104\n", "TT_1800 0.1319 0.120 1.100 0.272 -0.103 0.367\n", "TT_1900 -0.0774 0.130 -0.596 0.551 -0.332 0.177\n", "TT_2000 0.1172 0.135 0.870 0.384 -0.147 0.382\n", "TT_2100 -0.0401 0.127 -0.316 0.752 -0.289 0.209\n", "TT_2200 0.6417 0.081 7.914 0.000 0.483 0.801\n", "TD_1200 0.0918 0.071 1.291 0.197 -0.048 0.231\n", "TD_1300 -0.1298 0.112 -1.163 0.245 -0.349 0.089\n", "TD_1400 0.1492 0.107 1.389 0.165 -0.062 0.360\n", "TD_1500 -0.2424 0.105 -2.307 0.021 -0.449 -0.036\n", "TD_1600 0.1700 0.115 1.478 0.140 -0.056 0.396\n", "TD_1700 -0.0835 0.120 -0.698 0.486 -0.318 0.151\n", "TD_1800 0.0919 0.130 0.707 0.480 -0.163 0.347\n", "TD_1900 -0.1215 0.130 -0.934 0.351 -0.377 0.134\n", "TD_2000 -0.0250 0.135 -0.186 0.853 -0.289 0.239\n", "TD_2100 0.1358 0.140 0.967 0.334 -0.140 0.411\n", "TD_2200 0.0208 0.091 0.228 0.820 -0.159 0.200\n", "GRAD1_1200 0.0021 0.001 2.373 0.018 0.000 0.004\n", "GRAD1_1300 0.0004 0.001 0.351 0.726 -0.002 0.003\n", "GRAD1_1400 0.0004 0.001 0.315 0.753 -0.002 0.003\n", "GRAD1_1500 0.0011 0.002 0.719 0.472 -0.002 0.004\n", "GRAD1_1600 0.0010 0.002 0.400 0.689 -0.004 0.006\n", "GRAD1_1700 0.0037 0.004 0.951 0.342 -0.004 0.011\n", "GRAD1_1800 0.0021 0.010 0.216 0.829 -0.017 0.021\n", "GRAD1_1900 0.0526 0.071 0.740 0.459 -0.087 0.192\n", "GRAD1_2000 -2.2381 3.841 -0.583 0.560 -9.775 
5.299\n", "GRAD1_2100 1.0782 1.585 0.680 0.496 -2.032 4.188\n", "GRAD1_2200 -0.8168 1.763 -0.463 0.643 -4.277 2.643\n", "NE_1200 0.0731 0.038 1.908 0.057 -0.002 0.148\n", "NE_1300 0.0343 0.052 0.665 0.506 -0.067 0.136\n", "NE_1400 0.0033 0.049 0.067 0.946 -0.093 0.099\n", "NE_1500 -0.0758 0.050 -1.510 0.131 -0.174 0.023\n", "NE_1600 0.0803 0.046 1.726 0.085 -0.011 0.171\n", "NE_1700 0.0496 0.039 1.257 0.209 -0.028 0.127\n", "NE_1800 -0.0153 0.020 -0.758 0.449 -0.055 0.024\n", "NE_1900 0.0072 0.029 0.248 0.805 -0.050 0.064\n", "NE_2000 0.0084 0.038 0.220 0.826 -0.067 0.084\n", "NE_2100 0.0822 0.042 1.938 0.053 -0.001 0.165\n", "NE_2200 0.1828 0.031 5.908 0.000 0.122 0.243\n", "FF_1200 0.0046 0.018 0.254 0.800 -0.031 0.040\n", "FF_1300 -0.0097 0.025 -0.395 0.693 -0.058 0.038\n", "FF_1400 -0.0610 0.026 -2.331 0.020 -0.112 -0.010\n", "FF_1500 0.0196 0.024 0.805 0.421 -0.028 0.067\n", "FF_1600 0.0208 0.025 0.825 0.409 -0.029 0.070\n", "FF_1700 -0.0240 0.026 -0.930 0.353 -0.075 0.027\n", "FF_1800 0.0089 0.027 0.335 0.738 -0.043 0.061\n", "FF_1900 0.0411 0.028 1.477 0.140 -0.013 0.096\n", "FF_2000 -0.0299 0.027 -1.110 0.267 -0.083 0.023\n", "FF_2100 0.0023 0.028 0.084 0.933 -0.052 0.057\n", "FF_2200 -0.0452 0.021 -2.205 0.028 -0.086 -0.005\n", "==============================================================================\n", "Omnibus: 117.482 Durbin-Watson: 1.248\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 652.087\n", "Skew: 0.361 Prob(JB): 2.52e-142\n", "Kurtosis: 6.860 Cond. No. 1.31e+16\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "[2] The smallest eigenvalue is 1.4e-24. 
This might indicate that there are\n", "strong multicollinearity problems or that the design matrix is singular.\n" ] } ], "source": [ "X2200,y2200 = regression(df, 22, 10)\n", "X2200\n", "lmodel(X2200,y2200)" ] }, { "cell_type": "markdown", "id": "7887fb32", "metadata": {}, "source": [ "The predictive performance of the linear regression model for 10pm is similar to the performance for 2pm, though the parameter values are different. It is interesting to notice that solar radiation (GRAD1) was an important predictor for RST at 2pm but seems to be of little importance at 10pm, which makes sense, because there is no solar radiation around 10pm anymore. This underlines the importance of fitting a separate model for each hour of the day." ] }, { "cell_type": "markdown", "id": "44825798", "metadata": {}, "source": [ "### Lasso regression" ] }, { "cell_type": "markdown", "id": "b12d7bbc", "metadata": {}, "source": [ "Since many of the variables that were included in the linear model are not relevant and should therefore be omitted, we will make use of lasso regularization, which not only minimizes the sum of squared residuals, but also adds a penalty term that helps prevent overfitting:\n", "\n", "$\\hat{\\beta}_{lasso} = \\underset{\\beta}{\\arg\\min} \\sum\\limits_{i=1}^n (y_i-X_i'\\beta)^2 + \\lambda \\sum\\limits_{j=1}^p |\\beta_{j}|$\n", "\n", "Note that $\\lambda$ determines the weight of the added penalty and can take any value in the interval $[0, \\infty)$. The benefit of using lasso regularization is that the coefficients of the less contributing variables are shrunk to exactly zero, so these variables are eliminated from the model. As a result, the model will typically give more accurate out-of-sample predictions. We will make use of Cross-Validation (CV) to choose the tuning parameter $\\lambda$. 
More specifically, we will use K-fold CV, which splits the data into K folds. Each fold is left out in turn: the model is trained on the remaining K-1 folds and predictions are made on the left-out fold. Averaging the prediction errors over the folds gives an estimate of the Mean Squared Error (MSE) for each value in a grid of $\\lambda$ values.\n", "\n", "In the code below we will apply lasso to the same predictors as in the linear regression models above. In the code, the tuning parameter is called alpha." ] }, { "cell_type": "markdown", "id": "063db896", "metadata": {}, "source": [ "#### Lasso RST at 2pm" ] }, { "cell_type": "code", "execution_count": 56, "id": "0f4dcda8", "metadata": {}, "outputs": [], "source": [ "def lassoReg(X,y):\n", " '''\n", " input: matrix X and response vector y\n", " output: dataframe containing the training MSE and the MSE estimated using 10-fold CV for all values of the alpha grid from 0.002 to 1.\n", " '''\n", " n = 500\n", " #Creating alpha grid\n", " alphas = np.linspace(0.002,1,n)\n", " mse = np.zeros(n)\n", " X_train=X\n", " y_train=y\n", " \n", " #creating function that fits lasso regression model and returns MSE for same training set\n", " def fit_and_report_mses(mod, X_train, y_train):\n", " mod.fit(X_train, y_train)\n", " return metrics.mean_squared_error(y_train, mod.predict(X_train))\n", "\n", " msetrain = [np.mean(fit_and_report_mses(linear_model.Lasso(alpha=alpha, max_iter=50000),\n", " X_train, y_train))\n", " for alpha in alphas]\n", " \n", " #Calculating estimated MSE on CV for alpha grid\n", " msecv = [-np.mean(cross_val_score(linear_model.Lasso(alpha=alpha, max_iter=50000),\n", " X_train, y_train, cv=10, scoring='neg_mean_squared_error'))\n", " for alpha in alphas]\n", "\n", " print('Optimal alpha: ', alphas[np.where(msecv==np.min(msecv))], 'with corresponding MSE: ', round(np.min(msecv),2))\n", "\n", " #Creating dataframe, which helps making a nice plot for alpha grid against its corresponding estimated MSE\n", " example3 = pd.DataFrame(\n", " {\n", " \"Alpha\": np.concatenate((alphas,alphas)),\n", " \"MSE\": np.concatenate((msecv,msetrain)),\n", " \"cv-train\": 
np.concatenate((np.repeat(\"CV\", len(alphas)),np.repeat(\"Train\", len(alphas))))\n", " }\n", " )\n", " return example3" ] }, { "cell_type": "code", "execution_count": 57, "id": "13f9324e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal alpha: [0.048] with corresponding MSE: 2.19\n" ] }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "cv-train=CV
Alpha=%{x}
MSE=%{y}", "legendgroup": "CV", "line": { "color": "#636efa", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "CV", "orientation": "v", "showlegend": true, "type": "scatter", "x": [ 0.002, 0.004, 0.006, 0.008, 0.01, 0.012, 0.014, 0.016, 0.018000000000000002, 0.020000000000000004, 0.022, 0.024, 0.026000000000000002, 0.028000000000000004, 0.03, 0.032, 0.034, 0.036000000000000004, 0.038000000000000006, 0.04, 0.042, 0.044000000000000004, 0.046, 0.048, 0.05, 0.052000000000000005, 0.054000000000000006, 0.056, 0.058, 0.060000000000000005, 0.062, 0.064, 0.066, 0.068, 0.07, 0.07200000000000001, 0.07400000000000001, 0.076, 0.078, 0.08, 0.082, 0.084, 0.08600000000000001, 0.08800000000000001, 0.09, 0.092, 0.094, 0.096, 0.098, 0.1, 0.10200000000000001, 0.10400000000000001, 0.10600000000000001, 0.108, 0.11, 0.112, 0.114, 0.116, 0.11800000000000001, 0.12000000000000001, 0.122, 0.124, 0.126, 0.128, 0.13, 0.132, 0.134, 0.136, 0.138, 0.14, 0.14200000000000002, 0.14400000000000002, 0.14600000000000002, 0.148, 0.15, 0.152, 0.154, 0.156, 0.158, 0.16, 0.162, 0.164, 0.166, 0.168, 0.17, 0.17200000000000001, 0.17400000000000002, 0.17600000000000002, 0.178, 0.18, 0.182, 0.184, 0.186, 0.188, 0.19, 0.192, 0.194, 0.196, 0.198, 0.2, 0.202, 0.20400000000000001, 0.20600000000000002, 0.20800000000000002, 0.21000000000000002, 0.212, 0.214, 0.216, 0.218, 0.22, 0.222, 0.224, 0.226, 0.228, 0.23, 0.232, 0.234, 0.23600000000000002, 0.23800000000000002, 0.24000000000000002, 0.242, 0.244, 0.246, 0.248, 0.25, 0.252, 0.254, 0.256, 0.258, 0.26, 0.262, 0.264, 0.266, 0.268, 0.27, 0.272, 0.274, 0.276, 0.278, 0.28, 0.28200000000000003, 0.28400000000000003, 0.28600000000000003, 0.28800000000000003, 0.29000000000000004, 0.292, 0.294, 0.296, 0.298, 0.3, 0.302, 0.304, 0.306, 0.308, 0.31, 0.312, 0.314, 0.316, 0.318, 0.32, 0.322, 0.324, 0.326, 0.328, 0.33, 0.332, 0.334, 0.336, 0.338, 0.34, 0.342, 0.34400000000000003, 0.34600000000000003, 0.34800000000000003, 0.35000000000000003, 
0.35200000000000004, 0.354, 0.356, 0.358, 0.36, 0.362, 0.364, 0.366, 0.368, 0.37, 0.372, 0.374, 0.376, 0.378, 0.38, 0.382, 0.384, 0.386, 0.388, 0.39, 0.392, 0.394, 0.396, 0.398, 0.4, 0.402, 0.404, 0.406, 0.40800000000000003, 0.41000000000000003, 0.41200000000000003, 0.41400000000000003, 0.41600000000000004, 0.41800000000000004, 0.42, 0.422, 0.424, 0.426, 0.428, 0.43, 0.432, 0.434, 0.436, 0.438, 0.44, 0.442, 0.444, 0.446, 0.448, 0.45, 0.452, 0.454, 0.456, 0.458, 0.46, 0.462, 0.464, 0.466, 0.468, 0.47000000000000003, 0.47200000000000003, 0.47400000000000003, 0.47600000000000003, 0.47800000000000004, 0.48000000000000004, 0.482, 0.484, 0.486, 0.488, 0.49, 0.492, 0.494, 0.496, 0.498, 0.5, 0.502, 0.504, 0.506, 0.508, 0.51, 0.512, 0.514, 0.516, 0.518, 0.52, 0.522, 0.524, 0.526, 0.528, 0.53, 0.532, 0.534, 0.536, 0.538, 0.54, 0.542, 0.544, 0.546, 0.548, 0.55, 0.552, 0.554, 0.556, 0.558, 0.56, 0.562, 0.5640000000000001, 0.5660000000000001, 0.5680000000000001, 0.5700000000000001, 0.5720000000000001, 0.5740000000000001, 0.5760000000000001, 0.5780000000000001, 0.58, 0.582, 0.584, 0.586, 0.588, 0.59, 0.592, 0.594, 0.596, 0.598, 0.6, 0.602, 0.604, 0.606, 0.608, 0.61, 0.612, 0.614, 0.616, 0.618, 0.62, 0.622, 0.624, 0.626, 0.628, 0.63, 0.632, 0.634, 0.636, 0.638, 0.64, 0.642, 0.644, 0.646, 0.648, 0.65, 0.652, 0.654, 0.656, 0.658, 0.66, 0.662, 0.664, 0.666, 0.668, 0.67, 0.672, 0.674, 0.676, 0.678, 0.68, 0.682, 0.684, 0.686, 0.6880000000000001, 0.6900000000000001, 0.6920000000000001, 0.6940000000000001, 0.6960000000000001, 0.6980000000000001, 0.7000000000000001, 0.7020000000000001, 0.7040000000000001, 0.706, 0.708, 0.71, 0.712, 0.714, 0.716, 0.718, 0.72, 0.722, 0.724, 0.726, 0.728, 0.73, 0.732, 0.734, 0.736, 0.738, 0.74, 0.742, 0.744, 0.746, 0.748, 0.75, 0.752, 0.754, 0.756, 0.758, 0.76, 0.762, 0.764, 0.766, 0.768, 0.77, 0.772, 0.774, 0.776, 0.778, 0.78, 0.782, 0.784, 0.786, 0.788, 0.79, 0.792, 0.794, 0.796, 0.798, 0.8, 0.802, 0.804, 0.806, 0.808, 0.81, 0.812, 0.8140000000000001, 
0.8160000000000001, 0.8180000000000001, 0.8200000000000001, 0.8220000000000001, 0.8240000000000001, 0.8260000000000001, 0.8280000000000001, 0.8300000000000001, 0.8320000000000001, 0.8340000000000001, 0.836, 0.838, 0.84, 0.842, 0.844, 0.846, 0.848, 0.85, 0.852, 0.854, 0.856, 0.858, 0.86, 0.862, 0.864, 0.866, 0.868, 0.87, 0.872, 0.874, 0.876, 0.878, 0.88, 0.882, 0.884, 0.886, 0.888, 0.89, 0.892, 0.894, 0.896, 0.898, 0.9, 0.902, 0.904, 0.906, 0.908, 0.91, 0.912, 0.914, 0.916, 0.918, 0.92, 0.922, 0.924, 0.926, 0.928, 0.93, 0.932, 0.934, 0.936, 0.9380000000000001, 0.9400000000000001, 0.9420000000000001, 0.9440000000000001, 0.9460000000000001, 0.9480000000000001, 0.9500000000000001, 0.9520000000000001, 0.9540000000000001, 0.9560000000000001, 0.9580000000000001, 0.9600000000000001, 0.962, 0.964, 0.966, 0.968, 0.97, 0.972, 0.974, 0.976, 0.978, 0.98, 0.982, 0.984, 0.986, 0.988, 0.99, 0.992, 0.994, 0.996, 0.998, 1 ], "xaxis": "x", "y": [ 2.2646970342940973, 2.2480075916329625, 2.2332324482811803, 2.221767818746468, 2.2148923341258575, 2.212327283474827, 2.2099379013088187, 2.2090068519538786, 2.208502626318002, 2.2074085190029353, 2.205556839153391, 2.203346545086158, 2.201008308052397, 2.1987825943213872, 2.1969024179571397, 2.1950869241800572, 2.193244589724946, 2.191578629628721, 2.19077462042609, 2.190327447054993, 2.18997033591305, 2.189593561101281, 2.1893367093335625, 2.1892946999032192, 2.189444159792406, 2.1898261003204444, 2.190164450895966, 2.190700362392526, 2.191267140534292, 2.1918288583954095, 2.1924637670911102, 2.193146544378304, 2.1933612722253057, 2.1933439898594616, 2.1933608752558604, 2.193442688589807, 2.193562932719751, 2.1937021305289157, 2.1937775520714573, 2.1936295146686193, 2.19344759427689, 2.1933434599170862, 2.193302421395095, 2.1932154030507776, 2.193148597786693, 2.193100538224181, 2.1930434411972093, 2.1930398174108463, 2.193065600278961, 2.1931562620659792, 2.1932792079204697, 2.193410368190086, 2.1936623375459505, 2.1939243837156637, 
2.1941802803346198, 2.194440173745732, 2.1947103338758263, 2.194985770326418, 2.1952684604721884, 2.1955568410578703, 2.1958524815241085, 2.1961545702649197, 2.1964644461621816, 2.1967794913409477, 2.197103033006349, 2.197431627529814, 2.197769802339337, 2.1981121137469986, 2.1984610439959438, 2.1988172611637316, 2.1991816619830575, 2.19955177164722, 2.1999279943640033, 2.2003096738036283, 2.2006656720965694, 2.201051828114928, 2.2014698958000087, 2.201893768579277, 2.202325016363399, 2.2027568162258833, 2.2031917480731193, 2.2036350251510077, 2.2040992003136823, 2.204585144184043, 2.2050773542307724, 2.2055724813126116, 2.20607575441497, 2.206585576197239, 2.207100513744332, 2.207624729433896, 2.2081550329011534, 2.208692569056244, 2.209236187677086, 2.2097828296755626, 2.210334804134593, 2.2108938719157725, 2.2114597761893338, 2.2120329137144683, 2.2126140097894362, 2.2132076057139427, 2.2138355863668435, 2.214470373751456, 2.2151123092521257, 2.215766182389509, 2.2164421881188616, 2.217115299287154, 2.2177952967193137, 2.2184814297300015, 2.219174936648227, 2.219875021911226, 2.220580800665813, 2.2212960311429373, 2.2220165513136356, 2.22274431593683, 2.2234785849828516, 2.224219769435302, 2.224980608248149, 2.2257466332117, 2.226525364773408, 2.227323868823265, 2.2281214317718616, 2.228914050034286, 2.2297199823406353, 2.2305349456943055, 2.231352369838241, 2.232160308895268, 2.23298472318975, 2.233828596041547, 2.2346645655680155, 2.235509765292485, 2.2363640969780554, 2.237225633314453, 2.2380932164597835, 2.238966587994188, 2.2398493496425558, 2.2407436452053946, 2.2416463998466396, 2.242556276602297, 2.2434735931169443, 2.244393900519938, 2.2453454858937434, 2.246303835739933, 2.2472664363423984, 2.2482490715121273, 2.2492391505303373, 2.250236954604909, 2.2512495962534085, 2.25226608189688, 2.253305449972487, 2.254355133171308, 2.255416490994443, 2.256471079424362, 2.2575190842218786, 2.258573889515358, 2.2596382082062485, 2.26071228199607, 
2.261791489445245, 2.262876639290659, 2.263970676652858, 2.2650708260352266, 2.266175498614732, 2.2672831146142225, 2.2683992106198376, 2.269524907579063, 2.2706605966419486, 2.2717984938255205, 2.2729338026707184, 2.274079274248168, 2.2752292804711223, 2.276391545935722, 2.277550152018371, 2.2787164556923343, 2.27988444678167, 2.281067080090644, 2.2822545315278986, 2.2834460334759603, 2.284641406312343, 2.2858413682354692, 2.287052681148853, 2.2882680189805944, 2.2894822813961637, 2.2907169559776164, 2.2919592095729664, 2.293209536980332, 2.2944501107563116, 2.295676987049178, 2.2969124868868898, 2.298154325423422, 2.299399416192505, 2.300647068697243, 2.301901291703248, 2.3031645949493633, 2.304433414730186, 2.3057085766543906, 2.3069910288501125, 2.30827735509823, 2.3095633945075464, 2.3108571654135863, 2.3121559943815395, 2.313463420944287, 2.3147885055951525, 2.316122086050607, 2.317461330991372, 2.3187947956539765, 2.3201367325757776, 2.321486259903576, 2.322834093615738, 2.324153348759798, 2.325474460077653, 2.326800070783945, 2.328131424016429, 2.3294689461387477, 2.33081492670246, 2.3321661848793602, 2.333505370088495, 2.334852367988506, 2.3362006734438636, 2.3375586307816465, 2.338919393775917, 2.3402893391747535, 2.341666822645401, 2.343048679582893, 2.34443431520567, 2.34583378472509, 2.34722232187931, 2.3486190110679135, 2.3500226328522538, 2.351426226633805, 2.352833422444232, 2.354249098640456, 2.3556653493579787, 2.357085334536058, 2.3585149485196033, 2.3599508882714177, 2.361394586207056, 2.3628469108815646, 2.3643071911682396, 2.3657742846790653, 2.367249310721649, 2.3687291013326712, 2.3702164319641534, 2.3717098568194093, 2.373212541354399, 2.3747200733464995, 2.3762323410452266, 2.377756174308695, 2.379284387225437, 2.380820185681889, 2.3823727845232066, 2.3839453754831803, 2.385522025900024, 2.387108066080773, 2.3887060349807485, 2.390309124899407, 2.391917174736566, 2.3935329392588107, 2.3951522838910857, 2.3967791094127477, 
2.3984191862674047, 2.4000419302729674, 2.4016607008240647, 2.4032746198954547, 2.404884733926206, 2.4064962422945055, 2.408108611512049, 2.4097301809873874, 2.4113588229767986, 2.412989944128454, 2.4146286229873333, 2.4162708096385552, 2.4179323298773485, 2.4196149138265013, 2.4213003150666648, 2.4229934645839033, 2.424690695412785, 2.4263956510503184, 2.4281067954142346, 2.4298240007132086, 2.431546364125146, 2.43327111346596, 2.434997905533611, 2.436726494973629, 2.4384616533257177, 2.440202607950653, 2.441951180973009, 2.4437015383802003, 2.4454629477687053, 2.44722829830683, 2.4490016551349494, 2.4507810731612416, 2.4525842149624477, 2.454388918575281, 2.456199221103726, 2.4580150082237915, 2.459836113984938, 2.461663199157079, 2.4634952178614764, 2.465330817446152, 2.4671738840256316, 2.4690228361280906, 2.470875629717492, 2.4727348397796702, 2.474601198035676, 2.4764760705944364, 2.478354876619569, 2.480239464722888, 2.48213123759903, 2.484021574822098, 2.485906275847989, 2.48773008553971, 2.4895302099002303, 2.4913273722177207, 2.493130019424709, 2.4949367451299542, 2.496744684156554, 2.498546278034863, 2.5003541385756978, 2.502187172581254, 2.504011933863982, 2.5058451216444375, 2.507684173724325, 2.5095299111052585, 2.5113791864469923, 2.513238473934027, 2.515100641621695, 2.5169677867581726, 2.5188479047973606, 2.5207329224127735, 2.522627087763698, 2.5245279819720126, 2.5264308620233202, 2.5283412535073473, 2.5302596976592593, 2.5321824478441575, 2.5341075077690585, 2.5360213786370283, 2.5379280191474125, 2.539832785154413, 2.5417406239399636, 2.543655534303679, 2.545577054605465, 2.547504111874335, 2.549437731639345, 2.5513761290115395, 2.553317226227235, 2.5552640257774586, 2.5572147008222474, 2.5591724536953766, 2.561132268126961, 2.5630885944316075, 2.5650473233758886, 2.5670105505280763, 2.568975862927698, 2.5709504709567264, 2.57292693165411, 2.574912093959612, 2.576902436341377, 2.578897130409057, 2.5808803704511014, 2.5828193282386542, 
2.584764909069977, 2.5867163141069356, 2.5886702652581417, 2.5906292453000477, 2.5925929798641727, 2.5945634365616352, 2.596499214712228, 2.598374995714762, 2.600255537354656, 2.6021393064380107, 2.6040193808674252, 2.605906833353132, 2.6077942524369777, 2.6096908043879283, 2.611511523721671, 2.6132909721351534, 2.615071468759127, 2.616856689969015, 2.6186469654248095, 2.6204431394887453, 2.622243788017639, 2.6240501093065243, 2.6258551322587484, 2.627646137503979, 2.6293829519933096, 2.631122826438186, 2.6328357754152663, 2.634548212101121, 2.636218432265684, 2.637810661211842, 2.6394037415011318, 2.640998866828009, 2.6426042628363997, 2.6442147100731495, 2.645826778937402, 2.647442234529307, 2.649057150993339, 2.6506772374565224, 2.6522970766933827, 2.6539237708478085, 2.655553488923773, 2.657188667826014, 2.658826588315209, 2.6604588371225217, 2.662095284412581, 2.6637357196810987, 2.6653811929900866, 2.6670034719501428, 2.668581334103152, 2.6701643347593227, 2.671750790826306, 2.6733408416835562, 2.674934415395511, 2.676530906529153, 2.67813180521666, 2.6797368794702385, 2.681346772897407, 2.68296014826963, 2.6845751055102602, 2.6861937387466623, 2.6878173389067643, 2.6894447533704415, 2.691076009260006, 2.692710969385761, 2.694349650830669, 2.69599361406013, 2.6976376691928694, 2.6992875806267795, 2.700939945518756, 2.7025971607761337, 2.7041963553674413, 2.7056732434862467, 2.707154810978717, 2.7086353704560393, 2.710122532753744, 2.711612904744349, 2.713108248661459, 2.7146051331735377, 2.7161048690085785, 2.717601623315467, 2.7189299894659604, 2.720262708187976, 2.7215993876857416, 2.7229387851497338, 2.7242816701966945, 2.725626834853654, 2.7269766354080858, 2.7283304362084273, 2.729685492855166, 2.730909182827473, 2.73212122433975, 2.733307147071362, 2.7343796715715665, 2.7354554125570174, 2.736415188008926, 2.7372097668278865, 2.7380070107288965, 2.738803780421803, 2.739478955554243, 2.7401470238425234, 2.740818019879849, 2.7414891840416695, 
2.7421644531024256, 2.7428394818977635, 2.743516252934311, 2.744194719650923, 2.744876881669922, 2.745560046959358, 2.7462446247994756, 2.7469321651850658, 2.747621849285917, 2.7483122086894545, 2.7490051291398254, 2.749698277195968, 2.750393981469906, 2.7510920689118215, 2.7517924499050714, 2.752494045672429, 2.7531974320865444, 2.7539021921147846, 2.7545388849826375, 2.75509297299552, 2.755646376989914, 2.7562001506712215, 2.7567555656591263, 2.757312044280975, 2.7578686437764444, 2.758425771092712, 2.758985101255687, 2.7595451210233644, 2.760105640132469, 2.7606677908878843, 2.761230998011544, 2.7617964689618795, 2.7623199049870406, 2.762845618478882, 2.7633724526048002, 2.7639013510953285, 2.764431464268081, 2.7649604257178004 ], "yaxis": "y" }, { "hovertemplate": "cv-train=Train
Alpha=%{x}
MSE=%{y}", "legendgroup": "Train", "line": { "color": "#EF553B", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "Train", "orientation": "v", "showlegend": true, "type": "scatter", "x": [ 0.002, 0.004, 0.006, 0.008, 0.01, 0.012, 0.014, 0.016, 0.018000000000000002, 0.020000000000000004, 0.022, 0.024, 0.026000000000000002, 0.028000000000000004, 0.03, 0.032, 0.034, 0.036000000000000004, 0.038000000000000006, 0.04, 0.042, 0.044000000000000004, 0.046, 0.048, 0.05, 0.052000000000000005, 0.054000000000000006, 0.056, 0.058, 0.060000000000000005, 0.062, 0.064, 0.066, 0.068, 0.07, 0.07200000000000001, 0.07400000000000001, 0.076, 0.078, 0.08, 0.082, 0.084, 0.08600000000000001, 0.08800000000000001, 0.09, 0.092, 0.094, 0.096, 0.098, 0.1, 0.10200000000000001, 0.10400000000000001, 0.10600000000000001, 0.108, 0.11, 0.112, 0.114, 0.116, 0.11800000000000001, 0.12000000000000001, 0.122, 0.124, 0.126, 0.128, 0.13, 0.132, 0.134, 0.136, 0.138, 0.14, 0.14200000000000002, 0.14400000000000002, 0.14600000000000002, 0.148, 0.15, 0.152, 0.154, 0.156, 0.158, 0.16, 0.162, 0.164, 0.166, 0.168, 0.17, 0.17200000000000001, 0.17400000000000002, 0.17600000000000002, 0.178, 0.18, 0.182, 0.184, 0.186, 0.188, 0.19, 0.192, 0.194, 0.196, 0.198, 0.2, 0.202, 0.20400000000000001, 0.20600000000000002, 0.20800000000000002, 0.21000000000000002, 0.212, 0.214, 0.216, 0.218, 0.22, 0.222, 0.224, 0.226, 0.228, 0.23, 0.232, 0.234, 0.23600000000000002, 0.23800000000000002, 0.24000000000000002, 0.242, 0.244, 0.246, 0.248, 0.25, 0.252, 0.254, 0.256, 0.258, 0.26, 0.262, 0.264, 0.266, 0.268, 0.27, 0.272, 0.274, 0.276, 0.278, 0.28, 0.28200000000000003, 0.28400000000000003, 0.28600000000000003, 0.28800000000000003, 0.29000000000000004, 0.292, 0.294, 0.296, 0.298, 0.3, 0.302, 0.304, 0.306, 0.308, 0.31, 0.312, 0.314, 0.316, 0.318, 0.32, 0.322, 0.324, 0.326, 0.328, 0.33, 0.332, 0.334, 0.336, 0.338, 0.34, 0.342, 0.34400000000000003, 0.34600000000000003, 0.34800000000000003, 0.35000000000000003, 
0.35200000000000004, 0.354, 0.356, 0.358, 0.36, 0.362, 0.364, 0.366, 0.368, 0.37, 0.372, 0.374, 0.376, 0.378, 0.38, 0.382, 0.384, 0.386, 0.388, 0.39, 0.392, 0.394, 0.396, 0.398, 0.4, 0.402, 0.404, 0.406, 0.40800000000000003, 0.41000000000000003, 0.41200000000000003, 0.41400000000000003, 0.41600000000000004, 0.41800000000000004, 0.42, 0.422, 0.424, 0.426, 0.428, 0.43, 0.432, 0.434, 0.436, 0.438, 0.44, 0.442, 0.444, 0.446, 0.448, 0.45, 0.452, 0.454, 0.456, 0.458, 0.46, 0.462, 0.464, 0.466, 0.468, 0.47000000000000003, 0.47200000000000003, 0.47400000000000003, 0.47600000000000003, 0.47800000000000004, 0.48000000000000004, 0.482, 0.484, 0.486, 0.488, 0.49, 0.492, 0.494, 0.496, 0.498, 0.5, 0.502, 0.504, 0.506, 0.508, 0.51, 0.512, 0.514, 0.516, 0.518, 0.52, 0.522, 0.524, 0.526, 0.528, 0.53, 0.532, 0.534, 0.536, 0.538, 0.54, 0.542, 0.544, 0.546, 0.548, 0.55, 0.552, 0.554, 0.556, 0.558, 0.56, 0.562, 0.5640000000000001, 0.5660000000000001, 0.5680000000000001, 0.5700000000000001, 0.5720000000000001, 0.5740000000000001, 0.5760000000000001, 0.5780000000000001, 0.58, 0.582, 0.584, 0.586, 0.588, 0.59, 0.592, 0.594, 0.596, 0.598, 0.6, 0.602, 0.604, 0.606, 0.608, 0.61, 0.612, 0.614, 0.616, 0.618, 0.62, 0.622, 0.624, 0.626, 0.628, 0.63, 0.632, 0.634, 0.636, 0.638, 0.64, 0.642, 0.644, 0.646, 0.648, 0.65, 0.652, 0.654, 0.656, 0.658, 0.66, 0.662, 0.664, 0.666, 0.668, 0.67, 0.672, 0.674, 0.676, 0.678, 0.68, 0.682, 0.684, 0.686, 0.6880000000000001, 0.6900000000000001, 0.6920000000000001, 0.6940000000000001, 0.6960000000000001, 0.6980000000000001, 0.7000000000000001, 0.7020000000000001, 0.7040000000000001, 0.706, 0.708, 0.71, 0.712, 0.714, 0.716, 0.718, 0.72, 0.722, 0.724, 0.726, 0.728, 0.73, 0.732, 0.734, 0.736, 0.738, 0.74, 0.742, 0.744, 0.746, 0.748, 0.75, 0.752, 0.754, 0.756, 0.758, 0.76, 0.762, 0.764, 0.766, 0.768, 0.77, 0.772, 0.774, 0.776, 0.778, 0.78, 0.782, 0.784, 0.786, 0.788, 0.79, 0.792, 0.794, 0.796, 0.798, 0.8, 0.802, 0.804, 0.806, 0.808, 0.81, 0.812, 0.8140000000000001, 
0.8160000000000001, 0.8180000000000001, 0.8200000000000001, 0.8220000000000001, 0.8240000000000001, 0.8260000000000001, 0.8280000000000001, 0.8300000000000001, 0.8320000000000001, 0.8340000000000001, 0.836, 0.838, 0.84, 0.842, 0.844, 0.846, 0.848, 0.85, 0.852, 0.854, 0.856, 0.858, 0.86, 0.862, 0.864, 0.866, 0.868, 0.87, 0.872, 0.874, 0.876, 0.878, 0.88, 0.882, 0.884, 0.886, 0.888, 0.89, 0.892, 0.894, 0.896, 0.898, 0.9, 0.902, 0.904, 0.906, 0.908, 0.91, 0.912, 0.914, 0.916, 0.918, 0.92, 0.922, 0.924, 0.926, 0.928, 0.93, 0.932, 0.934, 0.936, 0.9380000000000001, 0.9400000000000001, 0.9420000000000001, 0.9440000000000001, 0.9460000000000001, 0.9480000000000001, 0.9500000000000001, 0.9520000000000001, 0.9540000000000001, 0.9560000000000001, 0.9580000000000001, 0.9600000000000001, 0.962, 0.964, 0.966, 0.968, 0.97, 0.972, 0.974, 0.976, 0.978, 0.98, 0.982, 0.984, 0.986, 0.988, 0.99, 0.992, 0.994, 0.996, 0.998, 1 ], "xaxis": "x", "y": [ 1.8218995882689908, 1.8271695718249303, 1.8325296212751687, 1.837842790100628, 1.8417801403185197, 1.846374574843671, 1.8504125900564652, 1.853410943291435, 1.8566005541433075, 1.8600531441735701, 1.8637352012017683, 1.8665316895304065, 1.868587885072446, 1.8707994146821867, 1.8729126616708707, 1.8751172445245339, 1.877341567302015, 1.878794033810613, 1.8801263360493639, 1.8814550630514797, 1.8827869252452805, 1.8841839948730814, 1.8856460251938998, 1.8871730288099204, 1.8887649872449352, 1.8904221847787699, 1.8917275088491454, 1.8930285262386013, 1.8943768553312998, 1.8957724658901767, 1.8972156638668343, 1.8987059041815781, 1.900243450817441, 1.9015485919612074, 1.9026811643347743, 1.9038466191647352, 1.9050447355337297, 1.9062758815383656, 1.9075398325137176, 1.9088365874418136, 1.910166018380413, 1.9115285658786563, 1.9120528933621723, 1.9123201015925144, 1.912593403755995, 1.9128728158192958, 1.913158348210847, 1.9134500982463825, 1.9137478755747244, 1.9140518684731351, 1.9143619260517102, 1.9146781304498883, 1.915000514891766, 
1.9153289368907456, 1.9156635114820624, 1.9160043031759322, 1.9163511619886557, 1.9167041482875096, 1.9170633197467373, 1.9174285438784735, 1.91779990397879, 1.9181773893561582, 1.918561093561795, 1.918950844767419, 1.919346729491154, 1.9197487186313436, 1.9201569539358634, 1.920571220007641, 1.9209916149739759, 1.921418142921235, 1.9218507941793166, 1.92228966786583, 1.922734561874176, 1.9231856237490692, 1.9236428013326201, 1.9241061114231617, 1.9245756642098903, 1.9250512346905564, 1.9255329287044547, 1.9260207698273901, 1.9265147458904925, 1.9270148588636988, 1.9275211989784888, 1.928033575504482, 1.928552046135555, 1.929076653375861, 1.9295832228454033, 1.9300756934456353, 1.930573797485272, 1.9310776429857233, 1.9315870347146278, 1.932102020235943, 1.9326226508325912, 1.9331489327564453, 1.9336808621965997, 1.9342183826197676, 1.934761569735646, 1.9353104490034458, 1.9358648682907371, 1.936424884786856, 1.9369905431315493, 1.9375618253531486, 1.9381387352725195, 1.938721290669527, 1.939309440623377, 1.9399032331284256, 1.9405027039128233, 1.9411077390466878, 1.941718366243108, 1.9423345797460607, 1.9429564453390924, 1.943583946024608, 1.944217056708308, 1.9448558114163128, 1.9455001714616265, 1.9461502451541361, 1.946805861608553, 1.9474670753839056, 1.9481339045922799, 1.9488063449776027, 1.949484436355769, 1.9501681049787352, 1.9508574250212918, 1.9515718036292882, 1.9523766481371936, 1.9531861396945527, 1.9539983562513088, 1.9548093007797647, 1.9556188062241524, 1.9564351206188746, 1.957257212779552, 1.9580861623097567, 1.958921437416864, 1.9597624585166886, 1.9606103323527573, 1.9614645320808082, 1.962324472196863, 1.9631912934808498, 1.9640644250251644, 1.9649432755165415, 1.9658290252265544, 1.9667210795037502, 1.9676188472921428, 1.9685235196800899, 1.969434499214494, 1.9703490786132731, 1.971268226671242, 1.9721936517788063, 1.9731246973255065, 1.974062663335426, 1.9750069102714658, 1.975957423546644, 1.9769135345082105, 1.9778765912538383, 
1.978845922429162, 1.9798208095727814, 1.9808026786961321, 1.9817908129990685, 1.9827844919639226, 1.9837851468318113, 1.9847920490889297, 1.9858051901251776, 1.9868229941888913, 1.9878796827502565, 1.988942894420922, 1.9900133915754565, 1.9910903683550598, 1.9921738254470596, 1.993263758111315, 1.9943592795293263, 1.9954621374742847, 1.9965714924303684, 1.9976873083507807, 1.998808708877613, 1.9999374627710829, 2.0010726906176597, 2.0022143907331174, 2.003361624829913, 2.0045161912177254, 2.0056773014229234, 2.0068449295214887, 2.008019033223502, 2.0091986532797463, 2.0103857094057522, 2.0115792506874537, 2.0127792715266195, 2.0139847829223263, 2.015197691251043, 2.016417138851919, 2.0176431120186895, 2.018874548100278, 2.0201134845834496, 2.02135889265621, 2.022610773541519, 2.023868069651209, 2.0251329071030515, 2.0264031545624013, 2.0276798048997615, 2.0289616000436475, 2.030250190363882, 2.0315453982679728, 2.0328446641826714, 2.034146304770409, 2.0354188397206903, 2.0366978092242354, 2.0379830962987344, 2.0392732397958797, 2.0405711110653053, 2.0418752594426883, 2.043185594177946, 2.0445007054054782, 2.0458235997621688, 2.0471528012988203, 2.0484882633755968, 2.0498299884534785, 2.051176425489328, 2.0525306561490524, 2.053891147141818, 2.055257906632004, 2.056630938787049, 2.0580086121626566, 2.059394126957781, 2.060785811834703, 2.0621837676320234, 2.0635879943785906, 2.064996902609912, 2.0664137430024554, 2.067836845745594, 2.0692662064367133, 2.070701829577302, 2.0721419799924443, 2.0735901164224466, 2.0750445079721747, 2.0765051609616716, 2.0779702868911785, 2.0794434489276683, 2.080922842346417, 2.082408433038049, 2.083900299205104, 2.0853965380907273, 2.0868919112419837, 2.088389133483085, 2.089896515467122, 2.0914103337615715, 2.0929305872148447, 2.0944572677567352, 2.095990362957304, 2.09752944439564, 2.099075191436526, 2.100627293945877, 2.102185097433359, 2.1037497484624312, 2.105320695073306, 2.1068979255163875, 2.1084814012486426, 
2.1100701443056082, 2.1116660290142697, 2.113268154525342, 2.1148764848642005, 2.1164909627059907, 2.1180822461764626, 2.11967473729032, 2.121275180539557, 2.1228802022752724, 2.1244934542607776, 2.126113140297672, 2.1277368936622802, 2.1293691645440678, 2.1310052056243918, 2.1326499234126475, 2.1343003264317044, 2.135947401370493, 2.1376034878955243, 2.1392631089862126, 2.1409314044693897, 2.1426057188891248, 2.144286146552337, 2.1459726628000464, 2.147665268358998, 2.1493639483072418, 2.1510687007688527, 2.152779590774944, 2.1544965830687968, 2.1562196427399933, 2.15794603499946, 2.15968122045378, 2.1614224666844213, 2.1631697751329733, 2.1649231416155303, 2.166682565089105, 2.168445309703239, 2.1702168559308244, 2.171994466497768, 2.173778148323155, 2.175565127906273, 2.1773609377710863, 2.1791628149507734, 2.1809707613373988, 2.182784778230708, 2.18460198109547, 2.1864280293413594, 2.1882601525999403, 2.190098350866896, 2.191942627105959, 2.193790093693992, 2.1956464968161375, 2.1975089673076402, 2.199357316461437, 2.201205033104349, 2.2030553826916317, 2.204913135490343, 2.206774245353634, 2.208643618517564, 2.2105189898743487, 2.2123980223866893, 2.214285362931006, 2.2161786918018263, 2.2180780096831603, 2.219980926621419, 2.2218922111446937, 2.2238094890504208, 2.225730318394042, 2.227659561817173, 2.2295947930557047, 2.2315360223505025, 2.2334807377342525, 2.2354339515268733, 2.237393164001113, 2.239358372998662, 2.2413269653468, 2.2433040511760023, 2.245287146853164, 2.247276243025864, 2.2492686789893224, 2.2512697487314197, 2.253276814058492, 2.2552898713542158, 2.2573089391735786, 2.2593312628299946, 2.261362351031112, 2.263399519761675, 2.2654427037022686, 2.267421574840923, 2.2693348573490844, 2.2712569206879674, 2.2731814806350794, 2.2751144570006936, 2.2770504032779613, 2.278992773541445, 2.280939535170153, 2.2828533236194635, 2.284756192979397, 2.28666446208041, 2.2885781322115806, 2.2904935331556513, 2.292417990980668, 2.294348226269907, 
2.2962841699158476, 2.298221803351277, 2.3001685679683437, 2.3021207397950887, 2.3040745634633084, 2.3060375647031295, 2.3080059818692424, 2.3099760125463575, 2.3119552602587343, 2.3139361264535427, 2.3159263243512602, 2.3179219232273103, 2.319919081687615, 2.3219255191360237, 2.323937369455042, 2.3259507458984157, 2.3279734324696686, 2.3300015338913433, 2.3320311141602845, 2.3340700285473206, 2.3361143033575122, 2.3381598401005848, 2.340214765159479, 2.3422750995938832, 2.3443368243210543, 2.3464079597692535, 2.348484493619014, 2.349899018379504, 2.351269446490315, 2.3526432822582453, 2.354017300770173, 2.3553980930933034, 2.3567823667816556, 2.358170126436477, 2.3595614370105142, 2.360952979363603, 2.362351403780699, 2.363753387747081, 2.3651589305534375, 2.3665646656486543, 2.3679773059725586, 2.369393498994245, 2.3708132725458273, 2.3722365939207477, 2.3736600744800875, 2.375090513835856, 2.376524510406378, 2.3779620700940183, 2.379403188316518, 2.3808444272801363, 2.3822926593257066, 2.383744449361242, 2.3851997937786393, 2.386655222821631, 2.388117720737775, 2.389583855120398, 2.391053544802343, 2.3925233201882743, 2.3940001568830143, 2.3954805546303213, 2.396964536690485, 2.3984485391598764, 2.399939635503077, 2.401434292465434, 2.4029289502126447, 2.4044307291324545, 2.4059360345294585, 2.4074446942492287, 2.408956774616811, 2.410468986298161, 2.411988348315909, 2.4135112621724253, 2.4150377294403174, 2.416564119580342, 2.418097694627388, 2.4196348242453594, 2.421175507963692, 2.4227197453386955, 2.424263877931667, 2.425815240354714, 2.427370156006167, 2.4289286404154757, 2.430486984244306, 2.432052582686983, 2.4336217296464806, 2.435194433451705, 2.4367706882599416, 2.4383467613261534, 2.439930116861379, 2.4415170249025673, 2.4431075184137656, 2.4446977766711133, 2.4462953369866316, 2.447896447307171, 2.449501128507984, 2.451109396856469, 2.4527174173207476, 2.454332824684089, 2.4559517796654, 2.4575742882918505, 2.4591964961231843, 2.4608260783266593, 
2.4624592152230513, 2.464095908104011, 2.465454836008675, 2.465904267526873, 2.4663546175581743, 2.466805920724948, 2.467258185056978, 2.4677114028925113, 2.4681655888369485, 2.468620759431011, 2.4690768949554847, 2.4695340911071173, 2.46999227535532, 2.4704514534460507, 2.4709116258064507, 2.471372821420685, 2.4718350409736662, 2.4722982216261684, 2.4727623409513177, 2.4732313885662385, 2.473697360496989, 2.4741643015601853, 2.4746321970210596, 2.4751010508518188, 2.4755708298377743, 2.476041251285179, 2.4765124977468047, 2.4769848737968863, 2.4774623634449098, 2.4779365851700352, 2.4784119703639895, 2.4788883160307496, 2.4793656323801994, 2.4798332865658868, 2.4803018215134607, 2.480770128745363, 2.4812390078765425, 2.481708827648703, 2.4821834574023884, 2.482655166494873, 2.483127823768324 ], "yaxis": "y" } ], "layout": { "legend": { "title": { "text": "cv-train" }, "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 
0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 
0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": 
"table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", 
"gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "Alpha" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "MSE" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')\n", "example3 = lassoReg(np.matrix(X1400),y1400)\n", "px.line(example3, x=\"Alpha\", y=\"MSE\", color=\"cv-train\")" ] }, { "cell_type": "markdown", "id": "047018d8", "metadata": {}, "source": [ "The optimal lasso regression model for RST1400 appears to be the one with $\\alpha=0.048$, since this value minimizes the MSE estimated by cross-validation (CV). Note that the MSE on the training set is lower than on the left-out folds, which makes sense intuitively: the model is fit to the training data, so its error there understates the error on unseen data." ] }, { "cell_type": "code", "execution_count": 58, "id": "040dee94", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lasso
TT_0400 0.040399
TT_0500 0.000000
TT_0600 0.000000
TT_0700 0.175607
TT_0800 0.039599
TT_0900 0.000000
TT_1000 0.000000
TT_1100 0.000000
TT_1200 0.000000
TT_1300 0.068205
TT_1400 0.694896
TD_0400 0.000000
TD_0500 -0.000000
TD_0600 -0.000000
TD_0700 0.000000
TD_0800 0.000000
TD_0900 -0.000000
TD_1000 -0.049887
TD_1100 -0.000000
TD_1200 -0.000000
TD_1300 -0.000000
TD_1400 0.000000
GRAD1_0400 0.000000
GRAD1_0500 0.000000
GRAD1_0600 -0.005161
GRAD1_0700 0.004623
GRAD1_0800 0.002682
GRAD1_0900 0.002266
GRAD1_1000 0.006073
GRAD1_1100 -0.003775
GRAD1_1200 0.005764
GRAD1_1300 0.004337
GRAD1_1400 0.014364
NE_0400 0.012087
NE_0500 0.066627
NE_0600 -0.016707
NE_0700 -0.005721
NE_0800 0.045778
NE_0900 0.057771
NE_1000 0.000000
NE_1100 0.082545
NE_1200 0.040359
NE_1300 0.072016
NE_1400 0.143800
FF_0400 -0.020241
FF_0500 0.000000
FF_0600 -0.000000
FF_0700 -0.000000
FF_0800 -0.000000
FF_0900 0.002726
FF_1000 0.000000
FF_1100 -0.027617
FF_1200 -0.029053
FF_1300 -0.000000
FF_1400 -0.031354
\n", "
" ], "text/plain": [ " lasso\n", "TT_0400 0.040399\n", "TT_0500 0.000000\n", "TT_0600 0.000000\n", "TT_0700 0.175607\n", "TT_0800 0.039599\n", "TT_0900 0.000000\n", "TT_1000 0.000000\n", "TT_1100 0.000000\n", "TT_1200 0.000000\n", "TT_1300 0.068205\n", "TT_1400 0.694896\n", "TD_0400 0.000000\n", "TD_0500 -0.000000\n", "TD_0600 -0.000000\n", "TD_0700 0.000000\n", "TD_0800 0.000000\n", "TD_0900 -0.000000\n", "TD_1000 -0.049887\n", "TD_1100 -0.000000\n", "TD_1200 -0.000000\n", "TD_1300 -0.000000\n", "TD_1400 0.000000\n", "GRAD1_0400 0.000000\n", "GRAD1_0500 0.000000\n", "GRAD1_0600 -0.005161\n", "GRAD1_0700 0.004623\n", "GRAD1_0800 0.002682\n", "GRAD1_0900 0.002266\n", "GRAD1_1000 0.006073\n", "GRAD1_1100 -0.003775\n", "GRAD1_1200 0.005764\n", "GRAD1_1300 0.004337\n", "GRAD1_1400 0.014364\n", "NE_0400 0.012087\n", "NE_0500 0.066627\n", "NE_0600 -0.016707\n", "NE_0700 -0.005721\n", "NE_0800 0.045778\n", "NE_0900 0.057771\n", "NE_1000 0.000000\n", "NE_1100 0.082545\n", "NE_1200 0.040359\n", "NE_1300 0.072016\n", "NE_1400 0.143800\n", "FF_0400 -0.020241\n", "FF_0500 0.000000\n", "FF_0600 -0.000000\n", "FF_0700 -0.000000\n", "FF_0800 -0.000000\n", "FF_0900 0.002726\n", "FF_1000 0.000000\n", "FF_1100 -0.027617\n", "FF_1200 -0.029053\n", "FF_1300 -0.000000\n", "FF_1400 -0.031354" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lasso_model = Lasso(alpha=0.048).fit(X1400,y1400)\n", "lasso_coefs = pd.Series(dict(zip(list(X1400), lasso_model.coef_)))\n", "coefs = pd.DataFrame(dict(lasso=lasso_coefs))\n", "coefs" ] }, { "cell_type": "markdown", "id": "94e2c127", "metadata": {}, "source": [ "The cell above shows the estimated coefficients. They are smaller in absolute value than the corresponding coefficients of the linear regression model. Furthermore, several coefficients are now exactly 0, so the lasso has effectively dropped the corresponding predictors from the model." 
] }, { "cell_type": "markdown", "id": "64de9fe2", "metadata": {}, "source": [ "#### Lasso RST at 10pm" ] }, { "cell_type": "code", "execution_count": 59, "id": "d7fa91a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal alpha: [0.044] with corresponding MSE: 0.84\n" ] }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "cv-train=CV
Alpha=%{x}
MSE=%{y}", "legendgroup": "CV", "line": { "color": "#636efa", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "CV", "orientation": "v", "showlegend": true, "type": "scatter", "x": [ 0.002, 0.004, 0.006, 0.008, 0.01, 0.012, 0.014, 0.016, 0.018000000000000002, 0.020000000000000004, 0.022, 0.024, 0.026000000000000002, 0.028000000000000004, 0.03, 0.032, 0.034, 0.036000000000000004, 0.038000000000000006, 0.04, 0.042, 0.044000000000000004, 0.046, 0.048, 0.05, 0.052000000000000005, 0.054000000000000006, 0.056, 0.058, 0.060000000000000005, 0.062, 0.064, 0.066, 0.068, 0.07, 0.07200000000000001, 0.07400000000000001, 0.076, 0.078, 0.08, 0.082, 0.084, 0.08600000000000001, 0.08800000000000001, 0.09, 0.092, 0.094, 0.096, 0.098, 0.1, 0.10200000000000001, 0.10400000000000001, 0.10600000000000001, 0.108, 0.11, 0.112, 0.114, 0.116, 0.11800000000000001, 0.12000000000000001, 0.122, 0.124, 0.126, 0.128, 0.13, 0.132, 0.134, 0.136, 0.138, 0.14, 0.14200000000000002, 0.14400000000000002, 0.14600000000000002, 0.148, 0.15, 0.152, 0.154, 0.156, 0.158, 0.16, 0.162, 0.164, 0.166, 0.168, 0.17, 0.17200000000000001, 0.17400000000000002, 0.17600000000000002, 0.178, 0.18, 0.182, 0.184, 0.186, 0.188, 0.19, 0.192, 0.194, 0.196, 0.198, 0.2, 0.202, 0.20400000000000001, 0.20600000000000002, 0.20800000000000002, 0.21000000000000002, 0.212, 0.214, 0.216, 0.218, 0.22, 0.222, 0.224, 0.226, 0.228, 0.23, 0.232, 0.234, 0.23600000000000002, 0.23800000000000002, 0.24000000000000002, 0.242, 0.244, 0.246, 0.248, 0.25, 0.252, 0.254, 0.256, 0.258, 0.26, 0.262, 0.264, 0.266, 0.268, 0.27, 0.272, 0.274, 0.276, 0.278, 0.28, 0.28200000000000003, 0.28400000000000003, 0.28600000000000003, 0.28800000000000003, 0.29000000000000004, 0.292, 0.294, 0.296, 0.298, 0.3, 0.302, 0.304, 0.306, 0.308, 0.31, 0.312, 0.314, 0.316, 0.318, 0.32, 0.322, 0.324, 0.326, 0.328, 0.33, 0.332, 0.334, 0.336, 0.338, 0.34, 0.342, 0.34400000000000003, 0.34600000000000003, 0.34800000000000003, 0.35000000000000003, 
0.35200000000000004, 0.354, 0.356, 0.358, 0.36, 0.362, 0.364, 0.366, 0.368, 0.37, 0.372, 0.374, 0.376, 0.378, 0.38, 0.382, 0.384, 0.386, 0.388, 0.39, 0.392, 0.394, 0.396, 0.398, 0.4, 0.402, 0.404, 0.406, 0.40800000000000003, 0.41000000000000003, 0.41200000000000003, 0.41400000000000003, 0.41600000000000004, 0.41800000000000004, 0.42, 0.422, 0.424, 0.426, 0.428, 0.43, 0.432, 0.434, 0.436, 0.438, 0.44, 0.442, 0.444, 0.446, 0.448, 0.45, 0.452, 0.454, 0.456, 0.458, 0.46, 0.462, 0.464, 0.466, 0.468, 0.47000000000000003, 0.47200000000000003, 0.47400000000000003, 0.47600000000000003, 0.47800000000000004, 0.48000000000000004, 0.482, 0.484, 0.486, 0.488, 0.49, 0.492, 0.494, 0.496, 0.498, 0.5, 0.502, 0.504, 0.506, 0.508, 0.51, 0.512, 0.514, 0.516, 0.518, 0.52, 0.522, 0.524, 0.526, 0.528, 0.53, 0.532, 0.534, 0.536, 0.538, 0.54, 0.542, 0.544, 0.546, 0.548, 0.55, 0.552, 0.554, 0.556, 0.558, 0.56, 0.562, 0.5640000000000001, 0.5660000000000001, 0.5680000000000001, 0.5700000000000001, 0.5720000000000001, 0.5740000000000001, 0.5760000000000001, 0.5780000000000001, 0.58, 0.582, 0.584, 0.586, 0.588, 0.59, 0.592, 0.594, 0.596, 0.598, 0.6, 0.602, 0.604, 0.606, 0.608, 0.61, 0.612, 0.614, 0.616, 0.618, 0.62, 0.622, 0.624, 0.626, 0.628, 0.63, 0.632, 0.634, 0.636, 0.638, 0.64, 0.642, 0.644, 0.646, 0.648, 0.65, 0.652, 0.654, 0.656, 0.658, 0.66, 0.662, 0.664, 0.666, 0.668, 0.67, 0.672, 0.674, 0.676, 0.678, 0.68, 0.682, 0.684, 0.686, 0.6880000000000001, 0.6900000000000001, 0.6920000000000001, 0.6940000000000001, 0.6960000000000001, 0.6980000000000001, 0.7000000000000001, 0.7020000000000001, 0.7040000000000001, 0.706, 0.708, 0.71, 0.712, 0.714, 0.716, 0.718, 0.72, 0.722, 0.724, 0.726, 0.728, 0.73, 0.732, 0.734, 0.736, 0.738, 0.74, 0.742, 0.744, 0.746, 0.748, 0.75, 0.752, 0.754, 0.756, 0.758, 0.76, 0.762, 0.764, 0.766, 0.768, 0.77, 0.772, 0.774, 0.776, 0.778, 0.78, 0.782, 0.784, 0.786, 0.788, 0.79, 0.792, 0.794, 0.796, 0.798, 0.8, 0.802, 0.804, 0.806, 0.808, 0.81, 0.812, 0.8140000000000001, 
0.8160000000000001, 0.8180000000000001, 0.8200000000000001, 0.8220000000000001, 0.8240000000000001, 0.8260000000000001, 0.8280000000000001, 0.8300000000000001, 0.8320000000000001, 0.8340000000000001, 0.836, 0.838, 0.84, 0.842, 0.844, 0.846, 0.848, 0.85, 0.852, 0.854, 0.856, 0.858, 0.86, 0.862, 0.864, 0.866, 0.868, 0.87, 0.872, 0.874, 0.876, 0.878, 0.88, 0.882, 0.884, 0.886, 0.888, 0.89, 0.892, 0.894, 0.896, 0.898, 0.9, 0.902, 0.904, 0.906, 0.908, 0.91, 0.912, 0.914, 0.916, 0.918, 0.92, 0.922, 0.924, 0.926, 0.928, 0.93, 0.932, 0.934, 0.936, 0.9380000000000001, 0.9400000000000001, 0.9420000000000001, 0.9440000000000001, 0.9460000000000001, 0.9480000000000001, 0.9500000000000001, 0.9520000000000001, 0.9540000000000001, 0.9560000000000001, 0.9580000000000001, 0.9600000000000001, 0.962, 0.964, 0.966, 0.968, 0.97, 0.972, 0.974, 0.976, 0.978, 0.98, 0.982, 0.984, 0.986, 0.988, 0.99, 0.992, 0.994, 0.996, 0.998, 1 ], "xaxis": "x", "y": [ 2.2646970342940973, 2.2480075916329625, 2.2332324482811803, 2.221767818746468, 2.2148923341258575, 2.212327283474827, 2.2099379013088187, 2.2090068519538786, 2.208502626318002, 2.2074085190029353, 2.205556839153391, 2.203346545086158, 2.201008308052397, 2.1987825943213872, 2.1969024179571397, 2.1950869241800572, 2.193244589724946, 2.191578629628721, 2.19077462042609, 2.190327447054993, 2.18997033591305, 2.189593561101281, 2.1893367093335625, 2.1892946999032192, 2.189444159792406, 2.1898261003204444, 2.190164450895966, 2.190700362392526, 2.191267140534292, 2.1918288583954095, 2.1924637670911102, 2.193146544378304, 2.1933612722253057, 2.1933439898594616, 2.1933608752558604, 2.193442688589807, 2.193562932719751, 2.1937021305289157, 2.1937775520714573, 2.1936295146686193, 2.19344759427689, 2.1933434599170862, 2.193302421395095, 2.1932154030507776, 2.193148597786693, 2.193100538224181, 2.1930434411972093, 2.1930398174108463, 2.193065600278961, 2.1931562620659792, 2.1932792079204697, 2.193410368190086, 2.1936623375459505, 2.1939243837156637, 
2.1941802803346198, 2.194440173745732, 2.1947103338758263, 2.194985770326418, 2.1952684604721884, 2.1955568410578703, 2.1958524815241085, 2.1961545702649197, 2.1964644461621816, 2.1967794913409477, 2.197103033006349, 2.197431627529814, 2.197769802339337, 2.1981121137469986, 2.1984610439959438, 2.1988172611637316, 2.1991816619830575, 2.19955177164722, 2.1999279943640033, 2.2003096738036283, 2.2006656720965694, 2.201051828114928, 2.2014698958000087, 2.201893768579277, 2.202325016363399, 2.2027568162258833, 2.2031917480731193, 2.2036350251510077, 2.2040992003136823, 2.204585144184043, 2.2050773542307724, 2.2055724813126116, 2.20607575441497, 2.206585576197239, 2.207100513744332, 2.207624729433896, 2.2081550329011534, 2.208692569056244, 2.209236187677086, 2.2097828296755626, 2.210334804134593, 2.2108938719157725, 2.2114597761893338, 2.2120329137144683, 2.2126140097894362, 2.2132076057139427, 2.2138355863668435, 2.214470373751456, 2.2151123092521257, 2.215766182389509, 2.2164421881188616, 2.217115299287154, 2.2177952967193137, 2.2184814297300015, 2.219174936648227, 2.219875021911226, 2.220580800665813, 2.2212960311429373, 2.2220165513136356, 2.22274431593683, 2.2234785849828516, 2.224219769435302, 2.224980608248149, 2.2257466332117, 2.226525364773408, 2.227323868823265, 2.2281214317718616, 2.228914050034286, 2.2297199823406353, 2.2305349456943055, 2.231352369838241, 2.232160308895268, 2.23298472318975, 2.233828596041547, 2.2346645655680155, 2.235509765292485, 2.2363640969780554, 2.237225633314453, 2.2380932164597835, 2.238966587994188, 2.2398493496425558, 2.2407436452053946, 2.2416463998466396, 2.242556276602297, 2.2434735931169443, 2.244393900519938, 2.2453454858937434, 2.246303835739933, 2.2472664363423984, 2.2482490715121273, 2.2492391505303373, 2.250236954604909, 2.2512495962534085, 2.25226608189688, 2.253305449972487, 2.254355133171308, 2.255416490994443, 2.256471079424362, 2.2575190842218786, 2.258573889515358, 2.2596382082062485, 2.26071228199607, 
2.261791489445245, 2.262876639290659, 2.263970676652858, 2.2650708260352266, 2.266175498614732, 2.2672831146142225, 2.2683992106198376, 2.269524907579063, 2.2706605966419486, 2.2717984938255205, 2.2729338026707184, 2.274079274248168, 2.2752292804711223, 2.276391545935722, 2.277550152018371, 2.2787164556923343, 2.27988444678167, 2.281067080090644, 2.2822545315278986, 2.2834460334759603, 2.284641406312343, 2.2858413682354692, 2.287052681148853, 2.2882680189805944, 2.2894822813961637, 2.2907169559776164, 2.2919592095729664, 2.293209536980332, 2.2944501107563116, 2.295676987049178, 2.2969124868868898, 2.298154325423422, 2.299399416192505, 2.300647068697243, 2.301901291703248, 2.3031645949493633, 2.304433414730186, 2.3057085766543906, 2.3069910288501125, 2.30827735509823, 2.3095633945075464, 2.3108571654135863, 2.3121559943815395, 2.313463420944287, 2.3147885055951525, 2.316122086050607, 2.317461330991372, 2.3187947956539765, 2.3201367325757776, 2.321486259903576, 2.322834093615738, 2.324153348759798, 2.325474460077653, 2.326800070783945, 2.328131424016429, 2.3294689461387477, 2.33081492670246, 2.3321661848793602, 2.333505370088495, 2.334852367988506, 2.3362006734438636, 2.3375586307816465, 2.338919393775917, 2.3402893391747535, 2.341666822645401, 2.343048679582893, 2.34443431520567, 2.34583378472509, 2.34722232187931, 2.3486190110679135, 2.3500226328522538, 2.351426226633805, 2.352833422444232, 2.354249098640456, 2.3556653493579787, 2.357085334536058, 2.3585149485196033, 2.3599508882714177, 2.361394586207056, 2.3628469108815646, 2.3643071911682396, 2.3657742846790653, 2.367249310721649, 2.3687291013326712, 2.3702164319641534, 2.3717098568194093, 2.373212541354399, 2.3747200733464995, 2.3762323410452266, 2.377756174308695, 2.379284387225437, 2.380820185681889, 2.3823727845232066, 2.3839453754831803, 2.385522025900024, 2.387108066080773, 2.3887060349807485, 2.390309124899407, 2.391917174736566, 2.3935329392588107, 2.3951522838910857, 2.3967791094127477, 
2.3984191862674047, 2.4000419302729674, 2.4016607008240647, 2.4032746198954547, 2.404884733926206, 2.4064962422945055, 2.408108611512049, 2.4097301809873874, 2.4113588229767986, 2.412989944128454, 2.4146286229873333, 2.4162708096385552, 2.4179323298773485, 2.4196149138265013, 2.4213003150666648, 2.4229934645839033, 2.424690695412785, 2.4263956510503184, 2.4281067954142346, 2.4298240007132086, 2.431546364125146, 2.43327111346596, 2.434997905533611, 2.436726494973629, 2.4384616533257177, 2.440202607950653, 2.441951180973009, 2.4437015383802003, 2.4454629477687053, 2.44722829830683, 2.4490016551349494, 2.4507810731612416, 2.4525842149624477, 2.454388918575281, 2.456199221103726, 2.4580150082237915, 2.459836113984938, 2.461663199157079, 2.4634952178614764, 2.465330817446152, 2.4671738840256316, 2.4690228361280906, 2.470875629717492, 2.4727348397796702, 2.474601198035676, 2.4764760705944364, 2.478354876619569, 2.480239464722888, 2.48213123759903, 2.484021574822098, 2.485906275847989, 2.48773008553971, 2.4895302099002303, 2.4913273722177207, 2.493130019424709, 2.4949367451299542, 2.496744684156554, 2.498546278034863, 2.5003541385756978, 2.502187172581254, 2.504011933863982, 2.5058451216444375, 2.507684173724325, 2.5095299111052585, 2.5113791864469923, 2.513238473934027, 2.515100641621695, 2.5169677867581726, 2.5188479047973606, 2.5207329224127735, 2.522627087763698, 2.5245279819720126, 2.5264308620233202, 2.5283412535073473, 2.5302596976592593, 2.5321824478441575, 2.5341075077690585, 2.5360213786370283, 2.5379280191474125, 2.539832785154413, 2.5417406239399636, 2.543655534303679, 2.545577054605465, 2.547504111874335, 2.549437731639345, 2.5513761290115395, 2.553317226227235, 2.5552640257774586, 2.5572147008222474, 2.5591724536953766, 2.561132268126961, 2.5630885944316075, 2.5650473233758886, 2.5670105505280763, 2.568975862927698, 2.5709504709567264, 2.57292693165411, 2.574912093959612, 2.576902436341377, 2.578897130409057, 2.5808803704511014, 2.5828193282386542, 
2.584764909069977, 2.5867163141069356, 2.5886702652581417, 2.5906292453000477, 2.5925929798641727, 2.5945634365616352, 2.596499214712228, 2.598374995714762, 2.600255537354656, 2.6021393064380107, 2.6040193808674252, 2.605906833353132, 2.6077942524369777, 2.6096908043879283, 2.611511523721671, 2.6132909721351534, 2.615071468759127, 2.616856689969015, 2.6186469654248095, 2.6204431394887453, 2.622243788017639, 2.6240501093065243, 2.6258551322587484, 2.627646137503979, 2.6293829519933096, 2.631122826438186, 2.6328357754152663, 2.634548212101121, 2.636218432265684, 2.637810661211842, 2.6394037415011318, 2.640998866828009, 2.6426042628363997, 2.6442147100731495, 2.645826778937402, 2.647442234529307, 2.649057150993339, 2.6506772374565224, 2.6522970766933827, 2.6539237708478085, 2.655553488923773, 2.657188667826014, 2.658826588315209, 2.6604588371225217, 2.662095284412581, 2.6637357196810987, 2.6653811929900866, 2.6670034719501428, 2.668581334103152, 2.6701643347593227, 2.671750790826306, 2.6733408416835562, 2.674934415395511, 2.676530906529153, 2.67813180521666, 2.6797368794702385, 2.681346772897407, 2.68296014826963, 2.6845751055102602, 2.6861937387466623, 2.6878173389067643, 2.6894447533704415, 2.691076009260006, 2.692710969385761, 2.694349650830669, 2.69599361406013, 2.6976376691928694, 2.6992875806267795, 2.700939945518756, 2.7025971607761337, 2.7041963553674413, 2.7056732434862467, 2.707154810978717, 2.7086353704560393, 2.710122532753744, 2.711612904744349, 2.713108248661459, 2.7146051331735377, 2.7161048690085785, 2.717601623315467, 2.7189299894659604, 2.720262708187976, 2.7215993876857416, 2.7229387851497338, 2.7242816701966945, 2.725626834853654, 2.7269766354080858, 2.7283304362084273, 2.729685492855166, 2.730909182827473, 2.73212122433975, 2.733307147071362, 2.7343796715715665, 2.7354554125570174, 2.736415188008926, 2.7372097668278865, 2.7380070107288965, 2.738803780421803, 2.739478955554243, 2.7401470238425234, 2.740818019879849, 2.7414891840416695, 
2.7421644531024256, 2.7428394818977635, 2.743516252934311, 2.744194719650923, 2.744876881669922, 2.745560046959358, 2.7462446247994756, 2.7469321651850658, 2.747621849285917, 2.7483122086894545, 2.7490051291398254, 2.749698277195968, 2.750393981469906, 2.7510920689118215, 2.7517924499050714, 2.752494045672429, 2.7531974320865444, 2.7539021921147846, 2.7545388849826375, 2.75509297299552, 2.755646376989914, 2.7562001506712215, 2.7567555656591263, 2.757312044280975, 2.7578686437764444, 2.758425771092712, 2.758985101255687, 2.7595451210233644, 2.760105640132469, 2.7606677908878843, 2.761230998011544, 2.7617964689618795, 2.7623199049870406, 2.762845618478882, 2.7633724526048002, 2.7639013510953285, 2.764431464268081, 2.7649604257178004 ], "yaxis": "y" }, { "hovertemplate": "cv-train=Train
Alpha=%{x}
MSE=%{y}", "legendgroup": "Train", "line": { "color": "#EF553B", "dash": "solid" }, "marker": { "symbol": "circle" }, "mode": "lines", "name": "Train", "orientation": "v", "showlegend": true, "type": "scatter", "x": [ 0.002, 0.004, 0.006, 0.008, 0.01, 0.012, 0.014, 0.016, 0.018000000000000002, 0.020000000000000004, 0.022, 0.024, 0.026000000000000002, 0.028000000000000004, 0.03, 0.032, 0.034, 0.036000000000000004, 0.038000000000000006, 0.04, 0.042, 0.044000000000000004, 0.046, 0.048, 0.05, 0.052000000000000005, 0.054000000000000006, 0.056, 0.058, 0.060000000000000005, 0.062, 0.064, 0.066, 0.068, 0.07, 0.07200000000000001, 0.07400000000000001, 0.076, 0.078, 0.08, 0.082, 0.084, 0.08600000000000001, 0.08800000000000001, 0.09, 0.092, 0.094, 0.096, 0.098, 0.1, 0.10200000000000001, 0.10400000000000001, 0.10600000000000001, 0.108, 0.11, 0.112, 0.114, 0.116, 0.11800000000000001, 0.12000000000000001, 0.122, 0.124, 0.126, 0.128, 0.13, 0.132, 0.134, 0.136, 0.138, 0.14, 0.14200000000000002, 0.14400000000000002, 0.14600000000000002, 0.148, 0.15, 0.152, 0.154, 0.156, 0.158, 0.16, 0.162, 0.164, 0.166, 0.168, 0.17, 0.17200000000000001, 0.17400000000000002, 0.17600000000000002, 0.178, 0.18, 0.182, 0.184, 0.186, 0.188, 0.19, 0.192, 0.194, 0.196, 0.198, 0.2, 0.202, 0.20400000000000001, 0.20600000000000002, 0.20800000000000002, 0.21000000000000002, 0.212, 0.214, 0.216, 0.218, 0.22, 0.222, 0.224, 0.226, 0.228, 0.23, 0.232, 0.234, 0.23600000000000002, 0.23800000000000002, 0.24000000000000002, 0.242, 0.244, 0.246, 0.248, 0.25, 0.252, 0.254, 0.256, 0.258, 0.26, 0.262, 0.264, 0.266, 0.268, 0.27, 0.272, 0.274, 0.276, 0.278, 0.28, 0.28200000000000003, 0.28400000000000003, 0.28600000000000003, 0.28800000000000003, 0.29000000000000004, 0.292, 0.294, 0.296, 0.298, 0.3, 0.302, 0.304, 0.306, 0.308, 0.31, 0.312, 0.314, 0.316, 0.318, 0.32, 0.322, 0.324, 0.326, 0.328, 0.33, 0.332, 0.334, 0.336, 0.338, 0.34, 0.342, 0.34400000000000003, 0.34600000000000003, 0.34800000000000003, 0.35000000000000003, 
0.35200000000000004, 0.354, 0.356, 0.358, 0.36, 0.362, 0.364, 0.366, 0.368, 0.37, 0.372, 0.374, 0.376, 0.378, 0.38, 0.382, 0.384, 0.386, 0.388, 0.39, 0.392, 0.394, 0.396, 0.398, 0.4, 0.402, 0.404, 0.406, 0.40800000000000003, 0.41000000000000003, 0.41200000000000003, 0.41400000000000003, 0.41600000000000004, 0.41800000000000004, 0.42, 0.422, 0.424, 0.426, 0.428, 0.43, 0.432, 0.434, 0.436, 0.438, 0.44, 0.442, 0.444, 0.446, 0.448, 0.45, 0.452, 0.454, 0.456, 0.458, 0.46, 0.462, 0.464, 0.466, 0.468, 0.47000000000000003, 0.47200000000000003, 0.47400000000000003, 0.47600000000000003, 0.47800000000000004, 0.48000000000000004, 0.482, 0.484, 0.486, 0.488, 0.49, 0.492, 0.494, 0.496, 0.498, 0.5, 0.502, 0.504, 0.506, 0.508, 0.51, 0.512, 0.514, 0.516, 0.518, 0.52, 0.522, 0.524, 0.526, 0.528, 0.53, 0.532, 0.534, 0.536, 0.538, 0.54, 0.542, 0.544, 0.546, 0.548, 0.55, 0.552, 0.554, 0.556, 0.558, 0.56, 0.562, 0.5640000000000001, 0.5660000000000001, 0.5680000000000001, 0.5700000000000001, 0.5720000000000001, 0.5740000000000001, 0.5760000000000001, 0.5780000000000001, 0.58, 0.582, 0.584, 0.586, 0.588, 0.59, 0.592, 0.594, 0.596, 0.598, 0.6, 0.602, 0.604, 0.606, 0.608, 0.61, 0.612, 0.614, 0.616, 0.618, 0.62, 0.622, 0.624, 0.626, 0.628, 0.63, 0.632, 0.634, 0.636, 0.638, 0.64, 0.642, 0.644, 0.646, 0.648, 0.65, 0.652, 0.654, 0.656, 0.658, 0.66, 0.662, 0.664, 0.666, 0.668, 0.67, 0.672, 0.674, 0.676, 0.678, 0.68, 0.682, 0.684, 0.686, 0.6880000000000001, 0.6900000000000001, 0.6920000000000001, 0.6940000000000001, 0.6960000000000001, 0.6980000000000001, 0.7000000000000001, 0.7020000000000001, 0.7040000000000001, 0.706, 0.708, 0.71, 0.712, 0.714, 0.716, 0.718, 0.72, 0.722, 0.724, 0.726, 0.728, 0.73, 0.732, 0.734, 0.736, 0.738, 0.74, 0.742, 0.744, 0.746, 0.748, 0.75, 0.752, 0.754, 0.756, 0.758, 0.76, 0.762, 0.764, 0.766, 0.768, 0.77, 0.772, 0.774, 0.776, 0.778, 0.78, 0.782, 0.784, 0.786, 0.788, 0.79, 0.792, 0.794, 0.796, 0.798, 0.8, 0.802, 0.804, 0.806, 0.808, 0.81, 0.812, 0.8140000000000001, 
0.8160000000000001, 0.8180000000000001, 0.8200000000000001, 0.8220000000000001, 0.8240000000000001, 0.8260000000000001, 0.8280000000000001, 0.8300000000000001, 0.8320000000000001, 0.8340000000000001, 0.836, 0.838, 0.84, 0.842, 0.844, 0.846, 0.848, 0.85, 0.852, 0.854, 0.856, 0.858, 0.86, 0.862, 0.864, 0.866, 0.868, 0.87, 0.872, 0.874, 0.876, 0.878, 0.88, 0.882, 0.884, 0.886, 0.888, 0.89, 0.892, 0.894, 0.896, 0.898, 0.9, 0.902, 0.904, 0.906, 0.908, 0.91, 0.912, 0.914, 0.916, 0.918, 0.92, 0.922, 0.924, 0.926, 0.928, 0.93, 0.932, 0.934, 0.936, 0.9380000000000001, 0.9400000000000001, 0.9420000000000001, 0.9440000000000001, 0.9460000000000001, 0.9480000000000001, 0.9500000000000001, 0.9520000000000001, 0.9540000000000001, 0.9560000000000001, 0.9580000000000001, 0.9600000000000001, 0.962, 0.964, 0.966, 0.968, 0.97, 0.972, 0.974, 0.976, 0.978, 0.98, 0.982, 0.984, 0.986, 0.988, 0.99, 0.992, 0.994, 0.996, 0.998, 1 ], "xaxis": "x", "y": [ 1.8218995882689908, 1.8271695718249303, 1.8325296212751687, 1.837842790100628, 1.8417801403185197, 1.846374574843671, 1.8504125900564652, 1.853410943291435, 1.8566005541433075, 1.8600531441735701, 1.8637352012017683, 1.8665316895304065, 1.868587885072446, 1.8707994146821867, 1.8729126616708707, 1.8751172445245339, 1.877341567302015, 1.878794033810613, 1.8801263360493639, 1.8814550630514797, 1.8827869252452805, 1.8841839948730814, 1.8856460251938998, 1.8871730288099204, 1.8887649872449352, 1.8904221847787699, 1.8917275088491454, 1.8930285262386013, 1.8943768553312998, 1.8957724658901767, 1.8972156638668343, 1.8987059041815781, 1.900243450817441, 1.9015485919612074, 1.9026811643347743, 1.9038466191647352, 1.9050447355337297, 1.9062758815383656, 1.9075398325137176, 1.9088365874418136, 1.910166018380413, 1.9115285658786563, 1.9120528933621723, 1.9123201015925144, 1.912593403755995, 1.9128728158192958, 1.913158348210847, 1.9134500982463825, 1.9137478755747244, 1.9140518684731351, 1.9143619260517102, 1.9146781304498883, 1.915000514891766, 
1.9153289368907456, 1.9156635114820624, 1.9160043031759322, 1.9163511619886557, 1.9167041482875096, 1.9170633197467373, 1.9174285438784735, 1.91779990397879, 1.9181773893561582, 1.918561093561795, 1.918950844767419, 1.919346729491154, 1.9197487186313436, 1.9201569539358634, 1.920571220007641, 1.9209916149739759, 1.921418142921235, 1.9218507941793166, 1.92228966786583, 1.922734561874176, 1.9231856237490692, 1.9236428013326201, 1.9241061114231617, 1.9245756642098903, 1.9250512346905564, 1.9255329287044547, 1.9260207698273901, 1.9265147458904925, 1.9270148588636988, 1.9275211989784888, 1.928033575504482, 1.928552046135555, 1.929076653375861, 1.9295832228454033, 1.9300756934456353, 1.930573797485272, 1.9310776429857233, 1.9315870347146278, 1.932102020235943, 1.9326226508325912, 1.9331489327564453, 1.9336808621965997, 1.9342183826197676, 1.934761569735646, 1.9353104490034458, 1.9358648682907371, 1.936424884786856, 1.9369905431315493, 1.9375618253531486, 1.9381387352725195, 1.938721290669527, 1.939309440623377, 1.9399032331284256, 1.9405027039128233, 1.9411077390466878, 1.941718366243108, 1.9423345797460607, 1.9429564453390924, 1.943583946024608, 1.944217056708308, 1.9448558114163128, 1.9455001714616265, 1.9461502451541361, 1.946805861608553, 1.9474670753839056, 1.9481339045922799, 1.9488063449776027, 1.949484436355769, 1.9501681049787352, 1.9508574250212918, 1.9515718036292882, 1.9523766481371936, 1.9531861396945527, 1.9539983562513088, 1.9548093007797647, 1.9556188062241524, 1.9564351206188746, 1.957257212779552, 1.9580861623097567, 1.958921437416864, 1.9597624585166886, 1.9606103323527573, 1.9614645320808082, 1.962324472196863, 1.9631912934808498, 1.9640644250251644, 1.9649432755165415, 1.9658290252265544, 1.9667210795037502, 1.9676188472921428, 1.9685235196800899, 1.969434499214494, 1.9703490786132731, 1.971268226671242, 1.9721936517788063, 1.9731246973255065, 1.974062663335426, 1.9750069102714658, 1.975957423546644, 1.9769135345082105, 1.9778765912538383, 
1.978845922429162, 1.9798208095727814, 1.9808026786961321, 1.9817908129990685, 1.9827844919639226, 1.9837851468318113, 1.9847920490889297, 1.9858051901251776, 1.9868229941888913, 1.9878796827502565, 1.988942894420922, 1.9900133915754565, 1.9910903683550598, 1.9921738254470596, 1.993263758111315, 1.9943592795293263, 1.9954621374742847, 1.9965714924303684, 1.9976873083507807, 1.998808708877613, 1.9999374627710829, 2.0010726906176597, 2.0022143907331174, 2.003361624829913, 2.0045161912177254, 2.0056773014229234, 2.0068449295214887, 2.008019033223502, 2.0091986532797463, 2.0103857094057522, 2.0115792506874537, 2.0127792715266195, 2.0139847829223263, 2.015197691251043, 2.016417138851919, 2.0176431120186895, 2.018874548100278, 2.0201134845834496, 2.02135889265621, 2.022610773541519, 2.023868069651209, 2.0251329071030515, 2.0264031545624013, 2.0276798048997615, 2.0289616000436475, 2.030250190363882, 2.0315453982679728, 2.0328446641826714, 2.034146304770409, 2.0354188397206903, 2.0366978092242354, 2.0379830962987344, 2.0392732397958797, 2.0405711110653053, 2.0418752594426883, 2.043185594177946, 2.0445007054054782, 2.0458235997621688, 2.0471528012988203, 2.0484882633755968, 2.0498299884534785, 2.051176425489328, 2.0525306561490524, 2.053891147141818, 2.055257906632004, 2.056630938787049, 2.0580086121626566, 2.059394126957781, 2.060785811834703, 2.0621837676320234, 2.0635879943785906, 2.064996902609912, 2.0664137430024554, 2.067836845745594, 2.0692662064367133, 2.070701829577302, 2.0721419799924443, 2.0735901164224466, 2.0750445079721747, 2.0765051609616716, 2.0779702868911785, 2.0794434489276683, 2.080922842346417, 2.082408433038049, 2.083900299205104, 2.0853965380907273, 2.0868919112419837, 2.088389133483085, 2.089896515467122, 2.0914103337615715, 2.0929305872148447, 2.0944572677567352, 2.095990362957304, 2.09752944439564, 2.099075191436526, 2.100627293945877, 2.102185097433359, 2.1037497484624312, 2.105320695073306, 2.1068979255163875, 2.1084814012486426, 
2.1100701443056082, 2.1116660290142697, 2.113268154525342, 2.1148764848642005, 2.1164909627059907, 2.1180822461764626, 2.11967473729032, 2.121275180539557, 2.1228802022752724, 2.1244934542607776, 2.126113140297672, 2.1277368936622802, 2.1293691645440678, 2.1310052056243918, 2.1326499234126475, 2.1343003264317044, 2.135947401370493, 2.1376034878955243, 2.1392631089862126, 2.1409314044693897, 2.1426057188891248, 2.144286146552337, 2.1459726628000464, 2.147665268358998, 2.1493639483072418, 2.1510687007688527, 2.152779590774944, 2.1544965830687968, 2.1562196427399933, 2.15794603499946, 2.15968122045378, 2.1614224666844213, 2.1631697751329733, 2.1649231416155303, 2.166682565089105, 2.168445309703239, 2.1702168559308244, 2.171994466497768, 2.173778148323155, 2.175565127906273, 2.1773609377710863, 2.1791628149507734, 2.1809707613373988, 2.182784778230708, 2.18460198109547, 2.1864280293413594, 2.1882601525999403, 2.190098350866896, 2.191942627105959, 2.193790093693992, 2.1956464968161375, 2.1975089673076402, 2.199357316461437, 2.201205033104349, 2.2030553826916317, 2.204913135490343, 2.206774245353634, 2.208643618517564, 2.2105189898743487, 2.2123980223866893, 2.214285362931006, 2.2161786918018263, 2.2180780096831603, 2.219980926621419, 2.2218922111446937, 2.2238094890504208, 2.225730318394042, 2.227659561817173, 2.2295947930557047, 2.2315360223505025, 2.2334807377342525, 2.2354339515268733, 2.237393164001113, 2.239358372998662, 2.2413269653468, 2.2433040511760023, 2.245287146853164, 2.247276243025864, 2.2492686789893224, 2.2512697487314197, 2.253276814058492, 2.2552898713542158, 2.2573089391735786, 2.2593312628299946, 2.261362351031112, 2.263399519761675, 2.2654427037022686, 2.267421574840923, 2.2693348573490844, 2.2712569206879674, 2.2731814806350794, 2.2751144570006936, 2.2770504032779613, 2.278992773541445, 2.280939535170153, 2.2828533236194635, 2.284756192979397, 2.28666446208041, 2.2885781322115806, 2.2904935331556513, 2.292417990980668, 2.294348226269907, 
2.2962841699158476, 2.298221803351277, 2.3001685679683437, 2.3021207397950887, 2.3040745634633084, 2.3060375647031295, 2.3080059818692424, 2.3099760125463575, 2.3119552602587343, 2.3139361264535427, 2.3159263243512602, 2.3179219232273103, 2.319919081687615, 2.3219255191360237, 2.323937369455042, 2.3259507458984157, 2.3279734324696686, 2.3300015338913433, 2.3320311141602845, 2.3340700285473206, 2.3361143033575122, 2.3381598401005848, 2.340214765159479, 2.3422750995938832, 2.3443368243210543, 2.3464079597692535, 2.348484493619014, 2.349899018379504, 2.351269446490315, 2.3526432822582453, 2.354017300770173, 2.3553980930933034, 2.3567823667816556, 2.358170126436477, 2.3595614370105142, 2.360952979363603, 2.362351403780699, 2.363753387747081, 2.3651589305534375, 2.3665646656486543, 2.3679773059725586, 2.369393498994245, 2.3708132725458273, 2.3722365939207477, 2.3736600744800875, 2.375090513835856, 2.376524510406378, 2.3779620700940183, 2.379403188316518, 2.3808444272801363, 2.3822926593257066, 2.383744449361242, 2.3851997937786393, 2.386655222821631, 2.388117720737775, 2.389583855120398, 2.391053544802343, 2.3925233201882743, 2.3940001568830143, 2.3954805546303213, 2.396964536690485, 2.3984485391598764, 2.399939635503077, 2.401434292465434, 2.4029289502126447, 2.4044307291324545, 2.4059360345294585, 2.4074446942492287, 2.408956774616811, 2.410468986298161, 2.411988348315909, 2.4135112621724253, 2.4150377294403174, 2.416564119580342, 2.418097694627388, 2.4196348242453594, 2.421175507963692, 2.4227197453386955, 2.424263877931667, 2.425815240354714, 2.427370156006167, 2.4289286404154757, 2.430486984244306, 2.432052582686983, 2.4336217296464806, 2.435194433451705, 2.4367706882599416, 2.4383467613261534, 2.439930116861379, 2.4415170249025673, 2.4431075184137656, 2.4446977766711133, 2.4462953369866316, 2.447896447307171, 2.449501128507984, 2.451109396856469, 2.4527174173207476, 2.454332824684089, 2.4559517796654, 2.4575742882918505, 2.4591964961231843, 2.4608260783266593, 
2.4624592152230513, 2.464095908104011, 2.465454836008675, 2.465904267526873, 2.4663546175581743, 2.466805920724948, 2.467258185056978, 2.4677114028925113, 2.4681655888369485, 2.468620759431011, 2.4690768949554847, 2.4695340911071173, 2.46999227535532, 2.4704514534460507, 2.4709116258064507, 2.471372821420685, 2.4718350409736662, 2.4722982216261684, 2.4727623409513177, 2.4732313885662385, 2.473697360496989, 2.4741643015601853, 2.4746321970210596, 2.4751010508518188, 2.4755708298377743, 2.476041251285179, 2.4765124977468047, 2.4769848737968863, 2.4774623634449098, 2.4779365851700352, 2.4784119703639895, 2.4788883160307496, 2.4793656323801994, 2.4798332865658868, 2.4803018215134607, 2.480770128745363, 2.4812390078765425, 2.481708827648703, 2.4821834574023884, 2.482655166494873, 2.483127823768324 ], "yaxis": "y" } ], "layout": { "legend": { "title": { "text": "cv-train" }, "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 
0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 
0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": 
"table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", 
"gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "Alpha" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "MSE" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Lasso MSE curves (cross-validation vs. train) for RST at 10pm\n", "example4 = lassoReg(X2200, y2200)\n", "px.line(example4, x=\"Alpha\", y=\"MSE\", color=\"cv-train\")" ] }, { "cell_type": "markdown", "id": "14e2f9ca", "metadata": {}, "source": [ "The optimal lasso regression model for RST2200 appears to be the one with $\\alpha=0.044$, since this value minimizes the MSE estimated by cross-validation. Moreover, the lasso shrank many coefficients to exactly 0, dropping those variables from the model as uninformative." ] }, { "cell_type": "code", "execution_count": 60, "id": "ed5de812", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lasso\n", "TT_1200 0.104730\n", "TT_1300 0.000000\n", "TT_1400 0.000000\n", "TT_1500 0.000000\n", "TT_1600 0.078908\n", "TT_1700 0.000000\n", "TT_1800 0.062059\n", "TT_1900 0.000000\n", "TT_2000 0.047117\n", "TT_2100 0.000000\n", "TT_2200 0.626788\n", "TD_1200 0.000000\n", "TD_1300 0.000000\n", "TD_1400 0.000000\n", "TD_1500 -0.000000\n", "TD_1600 0.000000\n", "TD_1700 0.000000\n", "TD_1800 0.000000\n", "TD_1900 0.000000\n", "TD_2000 0.000000\n", "TD_2100 0.000000\n", "TD_2200 0.071725\n", "GRAD1_1200 0.001877\n", "GRAD1_1300 0.000171\n", "GRAD1_1400 0.000075\n", "GRAD1_1500 0.001163\n", "GRAD1_1600 0.002237\n", "GRAD1_1700 0.003312\n", "GRAD1_1800 0.005909\n", "GRAD1_1900 0.000000\n", "GRAD1_2000 -0.000000\n", "GRAD1_2100 0.000000\n", "GRAD1_2200 0.000000\n", "NE_1200 0.059796\n", "NE_1300 0.000864\n", "NE_1400 0.000000\n", "NE_1500 0.000000\n", "NE_1600 0.036876\n", "NE_1700 0.037229\n", "NE_1800 -0.000000\n", "NE_1900 0.000000\n", "NE_2000 0.013189\n", "NE_2100 0.082416\n", "NE_2200 0.183437\n", "FF_1200 -0.000000\n", "FF_1300 -0.000000\n", "FF_1400 -0.034338\n", "FF_1500 0.000000\n", "FF_1600 0.000000\n", "FF_1700 0.000000\n", "FF_1800 0.000000\n", "FF_1900 0.005986\n", "FF_2000 -0.000000\n", "FF_2100 -0.000000\n", "FF_2200 -0.038410" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lasso_model = Lasso(alpha=0.044).fit(X2200,y2200)\n", "lasso_coefs = pd.Series(dict(zip(list(X2200), lasso_model.coef_)))\n", "coefs = pd.DataFrame(dict(lasso=lasso_coefs))\n", "coefs" ] }, { "cell_type": "markdown", "id": "3a736042", "metadata": {}, "source": [ "Many variables are set to 0 for the corresponding lasso model. The most important predictor is TT_2200, the current temperature. Furthermore, NE seems to be an important factor as well, whereas GRAD1, FF and TD seem to be close to negligible." 
] }, { "cell_type": "markdown", "id": "6ee1e495", "metadata": {}, "source": [ "# Classification analysis\n", "\n", "In this section we conduct a classification analysis to predict whether a day carries a 'driving risk' or not, according to the other variables (air temperature, dew point temperature, solar radiation, effective cloud cover, wind speed).\n", "\n", "First, we code a new binary output variable which represents the risk. Since we are no longer interested in the hourly values, we consider the daily mean of the road surface temperature (the row-wise mean of the 24 RST columns). If this value is close to 0°C, driving could be risky on that day. \n", "\n", "We then apply several machine learning algorithms to classify that variable. \n", "In order to do so, we create a new dataframe which summarises, for each day, the information regarding each of these variables (TT, TD, GRAD1, NE, FF). Specifically, to summarise them in an informative and compact representation we generate new features (std, skewness, min, max, ...).\n", "This method generates a large set of candidate features by combining information in the original features, with the aim of maximizing predictive performance." 
] }, { "cell_type": "markdown", "id": "b0b30fbf", "metadata": {}, "source": [ "## Build a new dataframe for classification" ] }, { "cell_type": "markdown", "id": "ffb4ae56", "metadata": {}, "source": [ "Functions for feature generation:" ] }, { "cell_type": "code", "execution_count": 61, "id": "f9c29a30", "metadata": {}, "outputs": [], "source": [ "from scipy import stats \n", "def mean(x):\n", "    return np.mean(x)\n", "def std(x):\n", "    return np.std(x)\n", "def ptp(x):\n", "    return np.ptp(x)\n", "def variance(x):\n", "    return np.var(x)\n", "def minim(x):\n", "    return np.min(x)\n", "def maxim(x):\n", "    return np.max(x)\n", "def argminim(x):\n", "    return np.argmin(x)\n", "def argmaxim(x):\n", "    return np.argmax(x)\n", "def rms(x):\n", "    return np.sqrt(np.mean(x**2))\n", "def skewness(x):\n", "    return stats.skew(x)\n", "def kurtosis(x):\n", "    return stats.kurtosis(x)\n", "\n", "def concatenate_features(x):\n", "    '''\n", "    Apply the feature functions defined above to a numpy array.\n", "    Return a tuple with the value of each function: mean, std, ...\n", "    '''\n", "    return mean(x),std(x),ptp(x),variance(x),minim(x),maxim(x),argminim(x),argmaxim(x),rms(x),skewness(x),kurtosis(x)" ] }, { "cell_type": "markdown", "id": "d2df65ba", "metadata": {}, "source": [ "Apply the concatenate_features function to each variable and store the values in a dictionary." ] },
{ "cell_type": "code", "execution_count": 62, "id": "686e8128", "metadata": {}, "outputs": [], "source": [ "variables = [\"RST\", \"TT\", \"TD\", \"GRAD1\", \"NE\", \"FF\"]\n", "date_list = []\n", "final_dic = {}\n", "\n", "for i in range(len(df)): # consider one row at a time\n", "    \n", "    date = df[\"date\"][i]\n", "    date_list.append(date) # add the date to the list\n", "    \n", "    # create an empty dict to collect the generated features (mean, std, var, ...) of each variable\n", "    gen_features = {}\n", "    \n", "    for var in variables: # for each variable (RST, TT, ...)\n", "        # create an empty list to collect all the data in the 24 columns for a fixed variable\n", "        all_values = []\n", "        \n", "        for col in df.columns: # for each column\n", "            if col.split(\"_\")[0] == var: # if the column belongs to this variable, then:\n", "                \n", "                all_values.append(df[col][i])\n", "\n", "        # store the tuple of generated features in the dictionary under the name of the variable\n", "        gen_features[var] = concatenate_features(np.array(all_values))\n", "    \n", "    final_dic[date] = gen_features" ] }, { "cell_type": "markdown", "id": "e3acb872", "metadata": {}, "source": [ "Storing all the values in a new dataframe." ] }, { "cell_type": "code", "execution_count": 63, "id": "7698a628", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date RST_mean TT_mean TT_std TT_ptp TT_var TT_minim \\\n", "0 2017-11-01 10.787500 11.800000 0.675771 2.4 0.456667 10.5 \n", "1 2017-11-02 9.579167 10.325000 0.812019 3.4 0.659375 8.5 \n", "2 2017-11-03 8.758333 8.683333 1.010225 3.9 1.020556 6.4 \n", "3 2017-11-04 9.583333 9.675000 0.600174 2.1 0.360208 8.9 \n", "4 2017-11-05 8.508333 9.133333 1.443279 4.8 2.083056 6.4 \n", "... ... ... ... ... ... ... ... \n", "1010 2022-04-11 9.370833 5.045833 2.704006 7.8 7.311649 1.2 \n", "1011 2022-04-12 10.050000 6.166667 2.755852 8.2 7.594722 1.2 \n", "1012 2022-04-13 15.012500 10.404167 2.791576 8.4 7.792899 6.6 \n", "1013 2022-04-14 14.912500 9.583333 1.878090 6.0 3.527222 7.5 \n", "1014 2022-04-15 10.612500 7.075000 0.784883 2.5 0.616042 5.8 \n", "\n", " TT_maxim TT_argminim TT_argmaxim ... FF_std FF_ptp FF_var \\\n", "0 12.9 0.0 14.0 ... 3.291063 10.8 10.831094 \n", "1 11.9 23.0 12.0 ... 4.002133 14.1 16.017066 \n", "2 10.3 2.0 11.0 ... 2.067389 7.7 4.274097 \n", "3 11.0 7.0 13.0 ... 1.969838 7.2 3.880260 \n", "4 11.2 22.0 1.0 ... 2.477394 8.5 6.137483 \n", "... ... ... ... ... ... ... ... \n", "1010 9.0 5.0 16.0 ... 1.925555 7.1 3.707760 \n", "1011 9.4 1.0 14.0 ... 3.996031 13.6 15.968264 \n", "1012 15.0 1.0 13.0 ... 2.202130 7.7 4.849375 \n", "1013 13.5 23.0 11.0 ... 3.059273 10.3 9.359149 \n", "1014 8.3 21.0 13.0 ... 2.179923 8.0 4.752066 \n", "\n", " FF_minim FF_maxim FF_argminim FF_argmaxim FF_rms FF_skewness \\\n", "0 11.3 22.1 0.0 21.0 17.008931 -0.251824 \n", "1 7.8 21.9 21.0 1.0 14.244604 0.421528 \n", "2 4.7 12.4 5.0 21.0 9.291259 -0.243920 \n", "3 6.5 13.7 10.0 23.0 10.302568 -0.136474 \n", "4 5.2 13.7 21.0 1.0 8.641880 0.977124 \n", "... ... ... ... ... ... ... 
\n", "1010 2.5 9.6 21.0 11.0 6.575745 -0.266277 \n", "1011 4.0 17.6 0.0 18.0 13.512710 -0.874238 \n", "1012 5.7 13.4 19.0 0.0 9.435571 0.322618 \n", "1013 5.7 16.0 2.0 15.0 11.301125 0.109536 \n", "1014 3.8 11.8 23.0 11.0 8.456630 0.101002 \n", "\n", " FF_kurtosis \n", "0 -1.224803 \n", "1 -0.817587 \n", "2 -0.896543 \n", "3 -1.105156 \n", "4 -0.182912 \n", "... ... \n", "1010 -0.512240 \n", "1011 -0.375518 \n", "1012 -0.817434 \n", "1013 -1.170334 \n", "1014 -0.677766 \n", "\n", "[1015 rows x 57 columns]" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create an empty dataframe\n", "clas_df = pd.DataFrame()\n", "\n", "clas_df[\"Date\"] = date_list\n", "\n", "\n", "functions = [\"mean\", \"std\", \"ptp\",\"var\",\"minim\",\"maxim\",\"argminim\",\"argmaxim\",\"rms\",\"skewness\",\"kurtosis\"] \n", "\n", "for i in range(len(clas_df)): # i indicates the row\n", "    \n", "    d = date_list[i]\n", "    \n", "    for var in variables: # for each variable\n", "        # RST is the response, so we don't need all the values but just the mean\n", "        if var == 'RST': \n", "            clas_df.at[i, var + \"_\" + functions[0]] = final_dic[d][var][0] \n", "        \n", "        else: \n", "            for j in range(11): \n", "\n", "                # example: final_dic[pd.Timestamp('2017-11-01')]['RST'][0]\n", "                clas_df.at[i, var + \"_\" + functions[j]] = final_dic[d][var][j] \n", "                \n", "clas_df" ] }, { "cell_type": "markdown", "id": "7873f591", "metadata": {}, "source": [ "\n", "### Adding response variable\n", "\n", "Ice forms at 0°C. Since RST_mean is a daily mean rather than an hourly value, we allow a tolerance band around 0°C: a day is considered risky if RST_mean lies between -4.78°C and 4.78°C, i.e. approximately 0°C +/- std(RST_mean). 
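
As a minimal sketch of how such a band could be derived and applied, the snippet below takes the half-width of the band to be the standard deviation of the daily means and labels each day accordingly. The sample values and the helper name `label_day` are hypothetical and only serve as an illustration; the use of the population standard deviation is also an assumption.

```python
import statistics

def label_day(rst_mean, band):
    # 'Risk' when the daily mean lies within [-band, +band] around 0 degrees C
    return 'Risk' if -band <= rst_mean <= band else 'Safe'

# Hypothetical daily mean road surface temperatures (not taken from the dataset)
daily_means = [10.8, 9.6, 1.2, -0.5, 4.0, -6.3]

# Band half-width taken as the population standard deviation (assumption)
band = statistics.pstdev(daily_means)

labels = [label_day(m, band) for m in daily_means]
print(round(band, 2), labels)
```

Note that a very cold day (mean below the lower bound) is labelled Safe under this rule, exactly as in the two-sided condition used in this notebook.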
" ] }, { "cell_type": "code", "execution_count": 64, "id": "8282aed2", "metadata": {}, "outputs": [], "source": [ "def scale(x):\n", " \"\"\"\n", " classification of a value in Safe and Risky\n", " \"\"\"\n", " if x > 4.78 or x < -4.78:\n", " return \"Safe\"\n", " else:\n", " return \"Risk\" " ] }, { "cell_type": "code", "execution_count": 65, "id": "5ea994d8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date TT_mean TT_std TT_ptp TT_var TT_minim TT_maxim \\\n", "0 2017-11-01 11.800000 0.675771 2.4 0.456667 10.5 12.9 \n", "1 2017-11-02 10.325000 0.812019 3.4 0.659375 8.5 11.9 \n", "2 2017-11-03 8.683333 1.010225 3.9 1.020556 6.4 10.3 \n", "3 2017-11-04 9.675000 0.600174 2.1 0.360208 8.9 11.0 \n", "4 2017-11-05 9.133333 1.443279 4.8 2.083056 6.4 11.2 \n", "... ... ... ... ... ... ... ... \n", "1010 2022-04-11 5.045833 2.704006 7.8 7.311649 1.2 9.0 \n", "1011 2022-04-12 6.166667 2.755852 8.2 7.594722 1.2 9.4 \n", "1012 2022-04-13 10.404167 2.791576 8.4 7.792899 6.6 15.0 \n", "1013 2022-04-14 9.583333 1.878090 6.0 3.527222 7.5 13.5 \n", "1014 2022-04-15 7.075000 0.784883 2.5 0.616042 5.8 8.3 \n", "\n", " TT_argminim TT_argmaxim TT_rms ... FF_ptp FF_var FF_minim \\\n", "0 0.0 14.0 11.819334 ... 10.8 10.831094 11.3 \n", "1 23.0 12.0 10.356882 ... 14.1 16.017066 7.8 \n", "2 2.0 11.0 8.741901 ... 7.7 4.274097 4.7 \n", "3 7.0 13.0 9.693598 ... 7.2 3.880260 6.5 \n", "4 22.0 1.0 9.246666 ... 8.5 6.137483 5.2 \n", "... ... ... ... ... ... ... ... \n", "1010 5.0 16.0 5.724691 ... 7.1 3.707760 2.5 \n", "1011 1.0 14.0 6.754443 ... 13.6 15.968264 4.0 \n", "1012 1.0 13.0 10.772167 ... 7.7 4.849375 5.7 \n", "1013 23.0 11.0 9.765628 ... 10.3 9.359149 5.7 \n", "1014 21.0 13.0 7.118403 ... 8.0 4.752066 3.8 \n", "\n", " FF_maxim FF_argminim FF_argmaxim FF_rms FF_skewness FF_kurtosis \\\n", "0 22.1 0.0 21.0 17.008931 -0.251824 -1.224803 \n", "1 21.9 21.0 1.0 14.244604 0.421528 -0.817587 \n", "2 12.4 5.0 21.0 9.291259 -0.243920 -0.896543 \n", "3 13.7 10.0 23.0 10.302568 -0.136474 -1.105156 \n", "4 13.7 21.0 1.0 8.641880 0.977124 -0.182912 \n", "... ... ... ... ... ... ... 
\n", "1010 9.6 21.0 11.0 6.575745 -0.266277 -0.512240 \n", "1011 17.6 0.0 18.0 13.512710 -0.874238 -0.375518 \n", "1012 13.4 19.0 0.0 9.435571 0.322618 -0.817434 \n", "1013 16.0 2.0 15.0 11.301125 0.109536 -1.170334 \n", "1014 11.8 23.0 11.0 8.456630 0.101002 -0.677766 \n", "\n", " Y \n", "0 Safe \n", "1 Safe \n", "2 Safe \n", "3 Safe \n", "4 Safe \n", "... ... \n", "1010 Safe \n", "1011 Safe \n", "1012 Safe \n", "1013 Safe \n", "1014 Safe \n", "\n", "[1015 rows x 57 columns]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clas_df['Y'] = clas_df[\"RST_mean\"].apply(scale)\n", "clas_df = clas_df.drop([\"RST_mean\"], axis=1)\n", "clas_df" ] }, { "cell_type": "markdown", "id": "c9b35e0b", "metadata": {}, "source": [ "## Pre processing" ] }, { "cell_type": "markdown", "id": "d82978c6", "metadata": {}, "source": [ "### Split in train and test sets" ] }, { "cell_type": "code", "execution_count": 66, "id": "8ca69cff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The shape of X_train is: (812, 55)\n", "The shape of X_test is: (203, 55)\n", "\n", "The shape of y_train is: (812,)\n", "The shape of y_test is: (203,)\n" ] } ], "source": [ "y = clas_df['Y']\n", "X = clas_df.drop([\"Y\", \"Date\"], axis=1)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state = 42)\n", "\n", "print(\"The shape of X_train is:\", X_train.shape)\n", "print(\"The shape of X_test is:\", X_test.shape)\n", "print('')\n", "print(\"The shape of y_train is:\", y_train.shape)\n", "print(\"The shape of y_test is:\", y_test.shape)" ] }, { "cell_type": "markdown", "id": "c80dd6e3", "metadata": {}, "source": [ "### Feature scaling" ] }, { "cell_type": "markdown", "id": "04137a52", "metadata": {}, "source": [ "Coding the response variable as 0 and 1. LabelEncoder assigns the codes in alphabetical order of the class names, so 0=Risk and 1=Safe." ] }, { "cell_type": "code", "execution_count": 67, "id": "c22bb21a", "metadata": { "scrolled": false }, "outputs": [ { "data": { 
"text/plain": [ "array([1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,\n", " 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1,\n", " 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,\n", " 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,\n", " 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,\n", " 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1,\n", " 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,\n", " 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,\n", " 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,\n", " 1, 1, 1, 0, 1])" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Encode the class labels (y) with LabelEncoder\n", "LaEnc = LabelEncoder()\n", "y_train = LaEnc.fit_transform(y_train)\n", "y_train\n", "\n", "y_test = LaEnc.transform(y_test)\n", "y_test" ] }, { "cell_type": "code", "execution_count": 68, "id": "c1e16d2a", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Safe 612\n", "Risk 403\n", "Name: Y, dtype: int64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clas_df['Y'].value_counts()" ] }, { "cell_type": "markdown", "id": "132014b9", "metadata": {}, "source": [ "The dataset is reasonably balanced: about 60% of the days are Safe and 40% are Risk, so neither class is rare." ] }, { "cell_type": "code", "execution_count": 69, "id": "7188c265", "metadata": {}, "outputs": [], "source": [ "sc = StandardScaler()\n", "X_train = sc.fit_transform(X_train)\n", "X_test = sc.transform(X_test)" ] }, { "cell_type": "markdown", "id": "277440d4", "metadata": {}, "source": [ "## Classification analysis" ] }, { "cell_type": "markdown", "id": "6d5bdaa6", "metadata": {}, "source": [ "We fit different models to find the one that could be used in the future to detect a risky (cold) day early. 
\n", "\n", "A good classifier has a minimal test error rate, which is measured by:\n", "\n", "$ {ER = \\frac{1}{n}\\sum\\limits_{i=1}^n I(y_i \\neq \\hat{y}_i )} $\n", "\n", "This error rate can be easily computed by summing the off-diagonal elements of the confusion matrix and dividing by the total number of observations in the table.\n", "\n", "The accuracy is equal to $ acc = 1 - ER $, i.e. the sum of the elements on the main diagonal of the confusion matrix, divided by n. \n", "\n", "$ acc = \\frac{TP + TN}{P + N} $\n", "\n", "There are other interesting measures, like the precision, which is a measure of exactness, and the recall, which measures completeness. " ] }, { "cell_type": "markdown", "id": "ed127e9c", "metadata": {}, "source": [ "### Logistic regression classifier" ] }, { "cell_type": "markdown", "id": "2ebccf38", "metadata": {}, "source": [ "Logistic regression models the probability that Y belongs to a particular category h using the logistic function. $ P(Y=h | X=x)=p(x)= \\frac{\\exp(\\beta_0 + \\beta_1x)}{1+\\exp(\\beta_0 + \\beta_1x)} $\n", "\n" ] }, { "cell_type": "code", "execution_count": 70, "id": "fbb4729a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 76 5]\n", " [ 8 114]]\n", "Accuracy of logistic regression = 93.6 %\n" ] } ], "source": [ "LogitReg = LogisticRegression(max_iter = 10000) # generous iteration cap so that the solver converges\n", "LogitReg.fit(X_train, y_train)\n", "\n", "y_pred=LogitReg.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "print(cm)\n", "print(\"Accuracy of logistic regression =\", round(accuracy_score(y_test, y_pred)*100,2), \"%\")" ] }, { "cell_type": "markdown", "id": "d0a28917", "metadata": {}, "source": [ "### KNN" ] }, { "cell_type": "markdown", "id": "7025656d", "metadata": {}, "source": [ "The K-Nearest-Neighbor classifier first identifies the K points in the training data that are closest (in distance) to the point $x_0$. 
\n", "\n", "Then it estimates the conditional probability for class h as the fraction of points in this group whose response values equal h:\n", "\n", "$ {P(Y=h | X=x_0) = \\frac{1}{K}\\sum\\limits_{i=1}^K I(y_i = h)} $\n", "\n", "Finally, KNN applies the Bayes rule and classifies the test observation $x_0$ to the class with the largest probability. " ] }, { "cell_type": "code", "execution_count": 71, "id": "a325c87a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 66 15]\n", " [ 22 100]]\n", "\n", "\n", "Accuracy of KNN: 81.77 \n", "\n" ] } ], "source": [ "KNNclassifier = KNeighborsClassifier(n_neighbors = 3, metric = 'euclidean', p = 2)\n", "KNNclassifier.fit(X_train, y_train)\n", "\n", "y_pred = KNNclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of KNN:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "4879d3f9", "metadata": {}, "source": [ "We use the Euclidean distance to measure the distance between points and K=3; this is not necessarily the best K, which could be tuned by cross-validation." ] }, { "cell_type": "markdown", "id": "ddc57364", "metadata": {}, "source": [ "### SVM linear" ] }, { "cell_type": "markdown", "id": "7356aedf", "metadata": {}, "source": [ "The support vector classifier is also called the soft margin classifier. 'Soft' comes from the fact that the hyperplane does not have to separate the two classes perfectly: some errors are accepted. The advantage is that it is robust to individual observations. 
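
To make the notion of a soft margin concrete, the sketch below evaluates the slack max(0, 1 - y f(x)) of a few points with respect to a fixed linear decision function f(x) = w . x + b. Points outside the margin on the correct side have zero slack, while points inside the margin or on the wrong side have positive slack. The hyperplane, the points and the helper names are illustrative assumptions, not values fitted in this notebook.

```python
def decision_value(w, b, x):
    # Signed score f(x) = w . x + b of a linear classifier
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_slack(w, b, x, y):
    # Slack of one observation with label y in {-1, +1}:
    # 0 if it lies outside the margin on the correct side, > 0 otherwise
    return max(0.0, 1.0 - y * decision_value(w, b, x))

w, b = [1.0, -1.0], 0.0        # hypothetical hyperplane x1 - x2 = 0
points = [([3.0, 0.0], 1),     # correct side, outside the margin
          ([0.5, 0.0], 1),     # correct side, but inside the margin
          ([0.0, 1.0], 1)]     # wrong side of the hyperplane

slacks = [hinge_slack(w, b, x, y) for x, y in points]
print(slacks)
```

In scikit-learn's SVC, the parameter C sets how strongly such slack is penalised: a smaller C tolerates more margin violations.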
" ] }, { "cell_type": "code", "execution_count": 72, "id": "3747da56", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 78 3]\n", " [ 9 113]]\n", "\n", "\n", "Accuracy of SVM linear: 94.09 \n", "\n" ] } ], "source": [ "SVMclassifier = SVC(kernel = 'linear', random_state = 1)\n", "SVMclassifier.fit(X_train, y_train)\n", "\n", "y_pred = SVMclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of SVM linear:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "98e111b8", "metadata": {}, "source": [ "### SVM non linear" ] }, { "cell_type": "markdown", "id": "7dc7063b", "metadata": {}, "source": [ "The support vector machines are extension of the support vector classifier, resulting from enlarging the feature space using kernels. \n", "\n", "The idea behind it is to accomodate a non linear boundary. 
" ] }, { "cell_type": "code", "execution_count": 73, "id": "868dc0b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 70 11]\n", " [ 8 114]]\n", "\n", "\n", "Accuracy of SVM rbf: 90.64 \n", "\n" ] } ], "source": [ "SVMclassifier = SVC(kernel = 'rbf', random_state = 1)\n", "SVMclassifier.fit(X_train, y_train)\n", "\n", "y_pred = SVMclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of SVM rbf:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "051156ee", "metadata": {}, "source": [ "### Naive bayes classifier" ] }, { "cell_type": "markdown", "id": "797f74d6", "metadata": {}, "source": [ "The Naive bayes classifier is a technique that assumes that the features $X_k$ are independent for a given class Y=h :\n", "$ f_h(X) = {\\prod\\limits_{j=1}^p f_{hj}(X_j)} $" ] }, { "cell_type": "code", "execution_count": 74, "id": "19a13987", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 73 8]\n", " [ 15 107]]\n", "\n", "\n", "Accuracy of Naive bayes Classifier: 88.67 \n", "\n" ] } ], "source": [ "NBclassifier = GaussianNB()\n", "NBclassifier.fit(X_train, y_train)\n", "\n", "y_pred = NBclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of Naive bayes Classifier:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "49502959", "metadata": {}, "source": [ "### Decision tree" ] }, { "cell_type": "markdown", "id": "3bc5f2f8", "metadata": {}, "source": [ "Classification trees stratifies the predictor space into a number of simple regions according to splitting rules. 
\n", "\n", "The approach, called recursive binary splitting, is top-down: it begins at the top of the tree and then successively splits the predictor space. \n", "\n", "Each observation is allocated to the most commonly occurring class of training observations in the region to which it belongs. " ] }, { "cell_type": "code", "execution_count": 75, "id": "681fb78b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 71 10]\n", " [ 13 109]]\n", "\n", "\n", "Accuracy of decision tree: 88.67 \n", "\n" ] } ], "source": [ "Treeclassifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 1)\n", "Treeclassifier.fit(X_train, y_train)\n", "\n", "y_pred = Treeclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of decision tree:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "e067897c", "metadata": {}, "source": [ "### Random forest" ] }, { "cell_type": "markdown", "id": "5a91aef5", "metadata": {}, "source": [ "Random forest improves over bagging by decorrelating the trees. \n", "\n", "First, it generates B samples from the training set using the bootstrap. 
\n", "Then, it trains a tree on each bootstrapped training set, but at each split only a random subset of m predictors is considered as split candidates.\n", "Finally, it aggregates the predictions of all the trees." ] }, { "cell_type": "code", "execution_count": 76, "id": "3a024e7e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 75 6]\n", " [ 11 111]]\n", "\n", "\n", "Accuracy of random forest: 91.63 \n", "\n" ] } ], "source": [ "Forestclassifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 1)\n", "Forestclassifier.fit(X_train, y_train)\n", "\n", "y_pred = Forestclassifier.predict(X_test)\n", "\n", "cm = confusion_matrix(y_test, y_pred)\n", "acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(cm)\n", "print(\"\\n\")\n", "print(\"Accuracy of random forest:\", round(acc*100,2),'\\n')" ] }, { "cell_type": "markdown", "id": "1d09ca63", "metadata": {}, "source": [ "We used entropy as the criterion for the binary splits, since the classification error rate is not sufficiently sensitive for tree growing. The Gini index would have been an alternative.\n", "\n", "$ D = -{\\sum\\limits_{k=1}^K \\hat{p}_{mk}\\log(\\hat{p}_{mk})} $\n", "\n", "where $\\hat{p}_{mk}$ is the proportion of training observations in the $m$-th region that belong to the $k$-th class." ] }, { "cell_type": "markdown", "id": "e074cca3", "metadata": {}, "source": [ "### Ada boost" ] }, { "cell_type": "markdown", "id": "3b575077", "metadata": {}, "source": [ "The AdaBoost procedure trains a sequence of classifiers on weighted versions of the training sample, giving higher weight to cases that are currently misclassified. The final classifier is a linear combination of the classifiers from each stage. 
\n", "\n", "Each successive classifier is forced to concentrate on the training observations that were misclassified by the previous ones in the sequence.\n", "\n", "Below, we search for the best AdaBoost classifier by selecting the learning rate and the number of estimators from a grid of candidate values, using 3-fold cross-validation." ] }, { "cell_type": "code", "execution_count": 77, "id": "ab22fdef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wall time: 1min 26s\n", "Best: 0.923634 using {'learning_rate': 0.45, 'n_estimators': 175}\n", "0.860849 (0.009035) with: {'learning_rate': 0.01, 'n_estimators': 50}\n", "0.859619 (0.008844) with: {'learning_rate': 0.01, 'n_estimators': 75}\n", "0.867013 (0.013652) with: {'learning_rate': 0.01, 'n_estimators': 100}\n", "0.874398 (0.007775) with: {'learning_rate': 0.01, 'n_estimators': 125}\n", "0.869473 (0.009010) with: {'learning_rate': 0.01, 'n_estimators': 150}\n", "0.875637 (0.013419) with: {'learning_rate': 0.01, 'n_estimators': 175}\n", "0.883030 (0.014696) with: {'learning_rate': 0.01, 'n_estimators': 200}\n", "0.902724 (0.014776) with: {'learning_rate': 0.12, 'n_estimators': 50}\n", "0.902724 (0.017321) with: {'learning_rate': 0.12, 'n_estimators': 75}\n", "0.900260 (0.018261) with: {'learning_rate': 0.12, 'n_estimators': 100}\n", "0.910113 (0.012450) with: {'learning_rate': 0.12, 'n_estimators': 125}\n", "0.908879 (0.016535) with: {'learning_rate': 0.12, 'n_estimators': 150}\n", "0.908874 (0.017095) with: {'learning_rate': 0.12, 'n_estimators': 175}\n", "0.910104 (0.018382) with: {'learning_rate': 0.12, 'n_estimators': 200}\n", "0.905184 (0.009093) with: {'learning_rate': 0.23, 'n_estimators': 50}\n", "0.908879 (0.011321) with: {'learning_rate': 0.23, 'n_estimators': 75}\n", "0.910104 (0.009633) with: {'learning_rate': 0.23, 'n_estimators': 100}\n", "0.913790 (0.006316) with: {'learning_rate': 0.23, 'n_estimators': 125}\n", "0.913794 (0.010570) with: 
{'learning_rate': 0.23, 'n_estimators': 150}\n", "0.916259 (0.009179) with: {'learning_rate': 0.23, 'n_estimators': 175}\n", "0.915020 (0.008023) with: {'learning_rate': 0.23, 'n_estimators': 200}\n", "0.907644 (0.013075) with: {'learning_rate': 0.34, 'n_estimators': 50}\n", "0.912564 (0.015148) with: {'learning_rate': 0.34, 'n_estimators': 75}\n", "0.913799 (0.012505) with: {'learning_rate': 0.34, 'n_estimators': 100}\n", "0.912573 (0.016537) with: {'learning_rate': 0.34, 'n_estimators': 125}\n", "0.913794 (0.010570) with: {'learning_rate': 0.34, 'n_estimators': 150}\n", "0.916250 (0.009742) with: {'learning_rate': 0.34, 'n_estimators': 175}\n", "0.912560 (0.013597) with: {'learning_rate': 0.34, 'n_estimators': 200}\n", "0.908870 (0.012155) with: {'learning_rate': 0.45, 'n_estimators': 50}\n", "0.911325 (0.010903) with: {'learning_rate': 0.45, 'n_estimators': 75}\n", "0.917489 (0.004578) with: {'learning_rate': 0.45, 'n_estimators': 100}\n", "0.910095 (0.012201) with: {'learning_rate': 0.45, 'n_estimators': 125}\n", "0.918719 (0.009040) with: {'learning_rate': 0.45, 'n_estimators': 150}\n", "0.923634 (0.009779) with: {'learning_rate': 0.45, 'n_estimators': 175}\n", "0.923634 (0.009779) with: {'learning_rate': 0.45, 'n_estimators': 200}\n", "0.911334 (0.007924) with: {'learning_rate': 0.56, 'n_estimators': 50}\n", "0.911339 (0.013077) with: {'learning_rate': 0.56, 'n_estimators': 75}\n", "0.913803 (0.012109) with: {'learning_rate': 0.56, 'n_estimators': 100}\n", "0.910095 (0.006318) with: {'learning_rate': 0.56, 'n_estimators': 125}\n", "0.919958 (0.008634) with: {'learning_rate': 0.56, 'n_estimators': 150}\n", "0.915020 (0.003148) with: {'learning_rate': 0.56, 'n_estimators': 175}\n", "0.912555 (0.007040) with: {'learning_rate': 0.56, 'n_estimators': 200}\n", "0.906419 (0.017325) with: {'learning_rate': 0.67, 'n_estimators': 50}\n", "0.918714 (0.008021) with: {'learning_rate': 0.67, 'n_estimators': 75}\n", "0.919954 (0.006236) with: {'learning_rate': 0.67, 
'n_estimators': 100}\n", "0.916259 (0.009179) with: {'learning_rate': 0.67, 'n_estimators': 125}\n", "0.918714 (0.008021) with: {'learning_rate': 0.67, 'n_estimators': 150}\n", "0.915015 (0.013198) with: {'learning_rate': 0.67, 'n_estimators': 175}\n", "0.913781 (0.009332) with: {'learning_rate': 0.67, 'n_estimators': 200}\n", "0.903936 (0.005308) with: {'learning_rate': 0.78, 'n_estimators': 50}\n", "0.912564 (0.009178) with: {'learning_rate': 0.78, 'n_estimators': 75}\n", "0.908874 (0.014194) with: {'learning_rate': 0.78, 'n_estimators': 100}\n", "0.913799 (0.009635) with: {'learning_rate': 0.78, 'n_estimators': 125}\n", "0.910091 (0.006389) with: {'learning_rate': 0.78, 'n_estimators': 150}\n", "0.912560 (0.010595) with: {'learning_rate': 0.78, 'n_estimators': 175}\n", "0.913785 (0.011469) with: {'learning_rate': 0.78, 'n_estimators': 200}\n", "0.913785 (0.006385) with: {'learning_rate': 0.89, 'n_estimators': 50}\n", "0.903931 (0.010528) with: {'learning_rate': 0.89, 'n_estimators': 75}\n", "0.908852 (0.010730) with: {'learning_rate': 0.89, 'n_estimators': 100}\n", "0.905157 (0.009381) with: {'learning_rate': 0.89, 'n_estimators': 125}\n", "0.910082 (0.014360) with: {'learning_rate': 0.89, 'n_estimators': 150}\n", "0.910082 (0.012319) with: {'learning_rate': 0.89, 'n_estimators': 175}\n", "0.911316 (0.008142) with: {'learning_rate': 0.89, 'n_estimators': 200}\n", "0.902711 (0.004574) with: {'learning_rate': 1.0, 'n_estimators': 50}\n", "0.911325 (0.010903) with: {'learning_rate': 1.0, 'n_estimators': 75}\n", "0.910091 (0.014290) with: {'learning_rate': 1.0, 'n_estimators': 100}\n", "0.917466 (0.012713) with: {'learning_rate': 1.0, 'n_estimators': 125}\n", "0.917471 (0.009851) with: {'learning_rate': 1.0, 'n_estimators': 150}\n", "0.912537 (0.014103) with: {'learning_rate': 1.0, 'n_estimators': 175}\n", "0.912537 (0.016751) with: {'learning_rate': 1.0, 'n_estimators': 200}\n" ] } ], "source": [ "n_estimators =[int(x) for x in np.linspace(start = 50, stop = 200, 
num =7)]\n", "learning_rate =[float(x) for x in np.linspace(start = 0.01, stop = 1, num =10)]\n", "\n", "param_grid = {'n_estimators':n_estimators,\n", " 'learning_rate':learning_rate}\n", "\n", "ada=AdaBoostClassifier()\n", "\n", "grid_ada = GridSearchCV(estimator = ada, param_grid=param_grid, cv=3)\n", "\n", "%time grid_ada_result = grid_ada.fit(X_train,y_train)\n", "\n", "# summarize results\n", "print(\"Best: %f using %s\" % (grid_ada_result.best_score_, grid_ada_result.best_params_))\n", "means = grid_ada_result.cv_results_['mean_test_score']\n", "stds = grid_ada_result.cv_results_['std_test_score']\n", "params = grid_ada_result.cv_results_['params']\n", "for mean, stdev, param in zip(means, stds, params):\n", " print(\"%f (%f) with: %r\" % (mean, stdev, param))" ] }, { "cell_type": "code", "execution_count": 78, "id": "aec0bf8f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Confusion Matrix\n", "[[ 73 8]\n", " [ 10 112]]\n", "\n", "\n", "Accuracy of Adaboost: 91.13 \n", "\n" ] } ], "source": [ "#results on the test set for the best model\n", "best_model_ada = grid_ada.best_estimator_\n", "y_pred = best_model_ada.predict(X_test)\n", "conf_matrix = confusion_matrix(y_test, y_pred)\n", "ada_acc = accuracy_score(y_test, y_pred)\n", "print('Confusion Matrix')\n", "print(conf_matrix)\n", "print(\"\\n\")\n", "print(\"Accuracy of Adaboost:\",round(ada_acc*100,2),'\\n')" ] }, { "cell_type": "code", "execution_count": 79, "id": "adf02faa", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "importances = best_model_ada.feature_importances_*100\n", "feature_list = X.columns\n", "\n", "sample_df = pd.DataFrame([])\n", "feature_list\n", "sample_df['features'] = feature_list\n", "sample_df['importance_score'] = importances\n", "sample_df = sample_df.sort_values(by = ['importance_score'],ascending=False)\n", "plt.figure(figsize=(30,8))\n", "sb.set(font_scale = 2)\n", "sb.barplot(x='features',y='importance_score',data =sample_df).set(title=\"Predictors importance\")\n", "plt.xticks(rotation=70)\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "id": "db6057fa", "metadata": {}, "source": [ "## Conclusions\n", "\n", "In the regression analysis we constructed prediction models for two different times in the day and saw that we were able to predict the road surface temperature based on the given predictors with an estimated MSE of around 2.19 and 0.84 for RST_1400 and RST_2200, respectively. This suggests that we are better in predicting road surface temperatures in the evening/night rather than during the day. Furthermore, the most important features for predicting the current road surface temperature seems to be the current temperature and - depending on whether it is day or night - solar radiation is an important predictor.\n", "\n", "From the classification analysis we observed that all the classification models works well in predicting the response variable. \n", "The best one is the SVM, which has an accuracy of about 94%. \n", "Thus, in future, it would be possible to aware people about the risk of driving in days with specific values of air temperature, dew point temperature, solar radiation, effective cloud cover, wind speed, without really knowing the road surface temperature. 
Moreover, in the context of self-driving cars, it is necessary to improve the performance of sensors in recognizing these weather situations and then to study the behaviour of the car in terms of road-holding. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }