{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"import pandas as pd\n",
"import numpy as np\n",
"import statsmodels.api as sm # for datasets\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# Hide Numpy warnings from Statsmodels\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Sequence of Appelpy imports\n",
"from appelpy.eda import statistical_moments\n",
"from appelpy.utils import DummyEncoder\n",
"from appelpy.linear_model import OLS\n",
"from appelpy.diagnostics import heteroskedasticity_test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to make a simple **model pipeline** with Appelpy, using the [Cars93 dataset](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/Cars93.html).\n",
"\n",
"There is data available about 93 cars. One variable of interest to model is the price of the cars.\n",
"\n",
"It can be particularly challenging to fit models with many **categorical variables**. If you fit models on data that come from a database, where categories of a variable may shift over time, then model pipelines can help to make the modelling process more robust and easier to maintain.\n",
"\n",
"**Notebook workflow:**\n",
"- Load data\n",
"- Explore data\n",
"- Transform data (set up functions for transforming the raw data)\n",
" - Column renaming\n",
" - Dummy column encoding\n",
" - Log variables\n",
"- Fit model\n",
" - Initial model: set up a basic `ModelPipeline` and fit `model1` using it. Run model diagnostics.\n",
" - Second model: fit `model2` with the pipeline. Run model diagnostics.\n",
" - Second model with different base levels for categorical variables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df_raw = sm.datasets.get_rdataset('Cars93', 'MASS').data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(93, 27)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explore data"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
mean
\n",
"
var
\n",
"
skew
\n",
"
kurtosis
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Min.Price
\n",
"
17.1258
\n",
"
76.493
\n",
"
1.16382
\n",
"
0.9016
\n",
"
\n",
"
\n",
"
Price
\n",
"
19.5097
\n",
"
93.3046
\n",
"
1.50824
\n",
"
3.18369
\n",
"
\n",
"
\n",
"
Max.Price
\n",
"
21.8989
\n",
"
121.671
\n",
"
2.00091
\n",
"
6.9816
\n",
"
\n",
"
\n",
"
MPG.city
\n",
"
22.3656
\n",
"
31.5823
\n",
"
1.67682
\n",
"
3.72841
\n",
"
\n",
"
\n",
"
MPG.highway
\n",
"
29.086
\n",
"
28.4273
\n",
"
1.20997
\n",
"
2.41192
\n",
"
\n",
"
\n",
"
EngineSize
\n",
"
2.66774
\n",
"
1.07612
\n",
"
0.845494
\n",
"
0.297016
\n",
"
\n",
"
\n",
"
Horsepower
\n",
"
143.828
\n",
"
2743.08
\n",
"
0.936308
\n",
"
0.98822
\n",
"
\n",
"
\n",
"
RPM
\n",
"
5280.65
\n",
"
356089
\n",
"
-0.254344
\n",
"
-0.451623
\n",
"
\n",
"
\n",
"
Rev.per.mile
\n",
"
2332.2
\n",
"
246519
\n",
"
0.276984
\n",
"
0.145034
\n",
"
\n",
"
\n",
"
Fuel.tank.capacity
\n",
"
16.6645
\n",
"
10.7543
\n",
"
0.106394
\n",
"
0.0566398
\n",
"
\n",
"
\n",
"
Passengers
\n",
"
5.08602
\n",
"
1.07948
\n",
"
0.061504
\n",
"
0.822782
\n",
"
\n",
"
\n",
"
Length
\n",
"
183.204
\n",
"
213.23
\n",
"
-0.0886349
\n",
"
0.361628
\n",
"
\n",
"
\n",
"
Wheelbase
\n",
"
103.946
\n",
"
46.5079
\n",
"
0.111884
\n",
"
-0.819052
\n",
"
\n",
"
\n",
"
Width
\n",
"
69.3763
\n",
"
14.2807
\n",
"
0.25975
\n",
"
-0.297207
\n",
"
\n",
"
\n",
"
Turn.circle
\n",
"
38.957
\n",
"
10.3894
\n",
"
-0.131405
\n",
"
-0.757256
\n",
"
\n",
"
\n",
"
Rear.seat.room
\n",
"
27.8297
\n",
"
8.93455
\n",
"
0.0769636
\n",
"
0.781057
\n",
"
\n",
"
\n",
"
Luggage.room
\n",
"
13.8902
\n",
"
8.9878
\n",
"
0.225345
\n",
"
0.444448
\n",
"
\n",
"
\n",
"
Weight
\n",
"
3072.9
\n",
"
347978
\n",
"
-0.141341
\n",
"
-0.873658
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mean var skew kurtosis\n",
"Min.Price 17.1258 76.493 1.16382 0.9016\n",
"Price 19.5097 93.3046 1.50824 3.18369\n",
"Max.Price 21.8989 121.671 2.00091 6.9816\n",
"MPG.city 22.3656 31.5823 1.67682 3.72841\n",
"MPG.highway 29.086 28.4273 1.20997 2.41192\n",
"EngineSize 2.66774 1.07612 0.845494 0.297016\n",
"Horsepower 143.828 2743.08 0.936308 0.98822\n",
"RPM 5280.65 356089 -0.254344 -0.451623\n",
"Rev.per.mile 2332.2 246519 0.276984 0.145034\n",
"Fuel.tank.capacity 16.6645 10.7543 0.106394 0.0566398\n",
"Passengers 5.08602 1.07948 0.061504 0.822782\n",
"Length 183.204 213.23 -0.0886349 0.361628\n",
"Wheelbase 103.946 46.5079 0.111884 -0.819052\n",
"Width 69.3763 14.2807 0.25975 -0.297207\n",
"Turn.circle 38.957 10.3894 -0.131405 -0.757256\n",
"Rear.seat.room 27.8297 8.93455 0.0769636 0.781057\n",
"Luggage.room 13.8902 8.9878 0.225345 0.444448\n",
"Weight 3072.9 347978 -0.141341 -0.873658"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"statistical_moments(df_raw)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exploratory data analysis and causal reasoning are not the focus of this notebook.\n",
"\n",
"We are interested in **modelling the `Price` of cars based on these independent variables:**\n",
"- `Type` (categorical variable)\n",
"- `MPG.city` (continuous variable)\n",
"- `Airbags` (categorical variable)\n",
"- `Origin` (categorical variable)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transform data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Rename columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The source dataset has columns in camel case. As we're using Python let's get it closer to **snake case**."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def _rename_columns(df):\n",
" \"Make column names lowercase and more like snake-case.\"\n",
" return (df\n",
" .rename(columns=lambda x: re.sub(\"[ ,.,-]\", '_', x))\n",
" .rename(columns=str.lower))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Manufacturer', 'Model', 'Type', 'Min.Price', 'Price', 'Max.Price',\n",
" 'MPG.city', 'MPG.highway', 'AirBags', 'DriveTrain', 'Cylinders',\n",
" 'EngineSize', 'Horsepower', 'RPM', 'Rev.per.mile', 'Man.trans.avail',\n",
" 'Fuel.tank.capacity', 'Passengers', 'Length', 'Wheelbase', 'Width',\n",
" 'Turn.circle', 'Rear.seat.room', 'Luggage.room', 'Weight', 'Origin',\n",
" 'Make'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw.columns"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df_raw = df_raw.pipe(_rename_columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clean the column names by using the `rename_columns` method."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['manufacturer', 'model', 'type', 'min_price', 'price', 'max_price',\n",
" 'mpg_city', 'mpg_highway', 'airbags', 'drivetrain', 'cylinders',\n",
" 'enginesize', 'horsepower', 'rpm', 'rev_per_mile', 'man_trans_avail',\n",
" 'fuel_tank_capacity', 'passengers', 'length', 'wheelbase', 'width',\n",
" 'turn_circle', 'rear_seat_room', 'luggage_room', 'weight', 'origin',\n",
" 'make'],\n",
" dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encoding of columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many categorical variables in the dataset, so for a linear regression model we will need to encode dummy variables from each categorical variable."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def _process_data(raw_df, cat_base_levels):\n",
" \"Transform raw dataset by encoding columns, e.g. dummy columns from categorical variables.\"\n",
" return (df_raw\n",
" .pipe(_rename_columns)\n",
" .pipe(DummyEncoder,\n",
" categorical_col_base_levels=cat_base_levels)\n",
" .transform()\n",
" .pipe(_rename_columns))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of a categorical variable in the dataset and its values:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['None', 'Driver & Passenger', 'Driver only'], dtype=object)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw['airbags'].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The raw dataset comes with category values that have capital letters and whitespace. When encoding dummy columns from them, it is desirable to have the new columns as close to snake case as possible. The **encoder** will process the columns so that they are more like snake case."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For demonstration, see the columns from a transformed dataset below."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['manufacturer', 'model', 'min_price', 'price', 'max_price', 'mpg_city',\n",
" 'mpg_highway', 'drivetrain', 'cylinders', 'enginesize', 'horsepower',\n",
" 'rpm', 'rev_per_mile', 'man_trans_avail', 'fuel_tank_capacity',\n",
" 'passengers', 'length', 'wheelbase', 'width', 'turn_circle',\n",
" 'rear_seat_room', 'luggage_room', 'weight', 'make', 'type_large',\n",
" 'type_midsize', 'type_small', 'type_sporty', 'type_van',\n",
" 'airbags_driver_&_passenger', 'airbags_driver_only', 'origin_non_usa'],\n",
" dtype='object')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw.pipe(_process_data, {'type': 'Compact',\n",
" 'airbags': 'None',\n",
" 'origin': 'USA'}).columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Log variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a basic function for generating log transformations of variables. This will be particularly useful for positively-skewed variables."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def _create_log_variables(raw_df, cols_list):\n",
" \"Transform dataset to include log variables of the variables in cols_list.\"\n",
" df = raw_df.copy()\n",
" for col in cols_list:\n",
" df['ln_' + col] = np.log(df[col])\n",
" return df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These data cleaning functions could be gathered in a specific class, e.g. `Transformer`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fit model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's set out these key things for a linear model, from the raw dataset:\n",
"- Dependent variable (`raw_y_list`)\n",
"- Independent variables (`raw_X_list`) and the base levels for the categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"raw_y_list = ['price']\n",
"raw_X_list = ['type', 'mpg_city', 'airbags', 'origin']\n",
"cat_base_levels = {'type': 'Compact',\n",
" 'airbags': 'None',\n",
" 'origin': 'USA'}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how many of the variables in `raw_X_list` are categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"type object\n",
"mpg_city int64\n",
"airbags object\n",
"origin object\n",
"dtype: object"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw[raw_X_list].dtypes"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"type 6\n",
"mpg_city 21\n",
"airbags 3\n",
"origin 2\n",
"dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_raw[raw_X_list].nunique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One trick is to make another representation of the independent variables – an X_list – where the categorical variables have the `DummyEncoder` separator appended to them."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['type_', 'mpg_city', 'airbags_', 'origin_']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[str(x + '_') if x in cat_base_levels.keys()\n",
" else x for x in raw_X_list]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `ModelPipeline`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can transform the raw dataset, given lists of its y & X variables (and base levels), and make a dataset to use for modelling.\n",
"\n",
"Let's set up a **`ModelPipeline` class**, which takes those variables from the raw dataset and does the following:\n",
"- Get a final dataset to use for modelling (`get_dataset` method)\n",
"- Get an Appelpy model object that consumes the final dataset (`get_model` method)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"class ModelPipeline:\n",
" def __init__(self, raw_df, raw_y_list, raw_X_list, cat_base_levels):\n",
" self.raw_df = raw_df\n",
" self.raw_y_list = raw_y_list\n",
" self.raw_X_list = raw_X_list\n",
" self.cat_base_levels = cat_base_levels\n",
"\n",
" # Transformed dataset and its columns\n",
" self.df = None\n",
" self.y_list = None\n",
" self.X_list = None\n",
"\n",
" def get_dataset(self):\n",
" \"Return processed dataset, ready to use for modelling\"\n",
" df = (self.raw_df\n",
" .pipe(_process_data, self.cat_base_levels)\n",
" .pipe(_create_log_variables, ['price', 'mpg_city']))\n",
"\n",
" # Append separator to columns that represent categorical variables: \n",
" X_list_prefixes = [str(x + '_') if x in self.cat_base_levels.keys()\n",
" else x for x in self.raw_X_list]\n",
"\n",
" # Get column names to use in model dataset:\n",
" self.y_list = self.raw_y_list\n",
" self.X_list = df.columns[df.columns.str.startswith(tuple(X_list_prefixes))]\n",
" # and filter the source dataset on those columns:\n",
" filter_condition = df.columns.str.startswith(tuple([*self.y_list, *self.X_list]))\n",
" self.df = df.loc[:, filter_condition]\n",
" return self.df\n",
"\n",
" def get_model(self):\n",
" \"Return OLS Appelpy model instance (not fitted)\"\n",
" if self.df:\n",
" return OLS(self.df, self.y_list, self.X_list)\n",
" else:\n",
" self.df = self.get_dataset()\n",
" return OLS(self.df, self.y_list, self.X_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is useful, as it will make it easier to run different specifications of models and inspect the dataset used for modelling."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model results"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(figsize=(8, 5))\n",
"model1.resid_standardized.hist(ax=ax)\n",
"ax.set_title('Histogram of standardized residuals')\n",
"ax.set_xlabel('Standardized residuals')\n",
"ax.set_ylabel('Frequency')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the residual plots, it is clear that the assumptions of OLS do not hold in the model:\n",
"- Residuals are not normally distributed (for y-values close to mean and ones near the tails)\n",
"- RVF plot has a funnel pattern. Perhaps there is heteroskedasticity in the dataset."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Breusch-Pagan test (studentized) :: chi2(1)\n",
"Test statistic: 10.6543\n",
"Test p-value: 0.3002\n"
]
}
],
"source": [
"bps_stats = heteroskedasticity_test('breusch_pagan_studentized', model1)\n",
"print('Breusch-Pagan test (studentized) :: {}'.format(bps_stats['distribution'] + '({})'.format(bps_stats['nu'])))\n",
"print('Test statistic: {:.4f}'.format(bps_stats['test_stat']))\n",
"print('Test p-value: {:.4f}'.format(bps_stats['p_value']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Second model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The functional form of the first model may be inappropriate.\n",
"\n",
"In the statistical moments dataframe we see that the dependent variable `price` is positively skewed. The model estimates were worst at the tail end of the `price` distribution and the residuals did not appear normally distributed.\n",
"\n",
"Let's use the log-transformed price (`ln_price`) as the dependent variable in a second model."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"def log_plots(df, col, figsize=(11,5)): \n",
" plot_df = (df[col].to_frame()\n",
" .pipe(_create_log_variables, [col]))\n",
" \n",
" fig, axarr = plt.subplots(1, 2, figsize=figsize)\n",
" \n",
" plot_df[col].hist(ax=axarr[0])\n",
" axarr[0].set_title('Histogram of {} values'.format(col))\n",
" axarr[0].set_ylabel('Frequency')\n",
" axarr[0].set_xlabel('Value')\n",
" \n",
" plot_df['ln_' + col].hist(ax=axarr[1])\n",
" axarr[1].set_title('Histogram of ln_{} values'.format(col))\n",
" axarr[1].set_ylabel('Frequency')\n",
" axarr[1].set_xlabel('Value')\n",
"\n",
" return fig"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAApUAAAFNCAYAAABR1JR4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de7hkdX3n+/dH8ILAAAbdIhLbKEMG7YjYgxqTzEYUEVR0juPAEAPRTCdRJ5LpOQnx5CijMQ9zIhKjiaZFBkwQNSpChKjIsEOc4w0QbW4OqG2gubSKAq2OpvU7f9TasdhU7a7dq66936/nqWev+/quqtrf+tb6rVq/VBWSJElSGw+adACSJEmafRaVkiRJas2iUpIkSa1ZVEqSJKk1i0pJkiS1ZlEpSZKk1iwqdwFJrk8yP+k4JinJS5LcmmRbkqcOYXsnJfnkMGIbhiTnJvmjScchzQrz4o7zYpJK8sQxx2Ru3YVZVE65JJuTPGfJtFOSfHpxvKqeVFULO9jOmiaB7D6iUCftLcBrqmqvqvpi241V1flVdfQQ4pI0ZObFgQ01Lw6DuXXXZlGpoZiCpPw44PphbGgKjkXSLmAKcsnQ8uIwTMHzoRGzqNwFdH9rT3JEkquS3JvkriRvbRa7svn73aYp5JlJHpTkD5N8I8nWJO9Nsk/Xdn+tmfftJP/vkv2cnuRDSf46yb3AKc2+P5Pku0nuSPKOJA/p2l4leVWSm5Pcl+RNSZ6Q5P9v4v1g9/JLjrFnrEkemmQbsBvwpSRf7bN+JfmdJF9L8q0kf5LkQc28U5L8zyRnJfk2cPrSsx5JnpTksiR3N8/r67riOi3JV5vn6YNJHtEnhhuTvKBrfPck30xyeDP+N0nuTHJPkiuTPKnPdu4XW9fxPbEZfmiStyT5xybWdyXZo5m3f5KPNa/R3Un+YfF5kHYl5sUd58Ul2zo3yZ8nuaSJ43NJnjDAeuZWzK2LVt0BrwJvA95WVf8CeALwwWb6rzR/922aQj4DnNI8jgR+DtgLeAdAkkOBvwBOAg4A9gEOXLKv44EPAfsC5wM/Bn4X2B94JnAU8Kol6zwPeBrwDOD3gI3ArwIHAU8GTuxzXD1jraofVtVezTJPqarlkuBLgHXA4U3sr+ia93Tga8Ac8ObulZLsDXwK+DjwGOCJwOXN7P8EvBj4N8287wB/3mf/Fyw5vucB36qqa5rxvwMOBh4FXEPnOd0ZZwD/EjisifVA4PXNvA3AbcAj6Rzr6wD7atWuzrw4mBOA/wrsB9zCkly4DHOrubWjqnxM8QPYDGwDvtv1+D7w6SXLPKcZvpJOUth/yXbW0HmD79417XLgVV3jhwD/BOxO5x/lgq55Dwd+1LWf04ErdxD7qcCFXeMFPKtr/Grg97vGzwT+tM+2+sbate0nLhNLAcd0jb8KuLwZPgX4xyXLn7L4HNNJVl/ss90bgaO6xg/ojmvJsk8E7gMe3oyfD7y+z3b3bWLepxk/F/ijpbEtOb4nAgG+Bzyha94zga83w28ELlruufLhY9of5sUdx9q17R3lxSc2w+cCZ3fNOxa4aYDXwtxa5tbFh2cqZ8OLq2rfxQcP/Jbb7ZV0vkndlOQL3U0CPTwG+EbX+DfoJM65Zt6tizOq6vvAt5esf2v3SJJ/2Zz+v7Np+vljOt/Ou93VNfyDHuN70dtysQ6qO95vNNvsNW+pg4B+zUePAy5smjy+SycR/rhXXFV1SzP/hUkeDrwIeB9Akt2SnNE09dxL5wMRHvj87cgj6XzQXd0V08eb6QB/QucMxCeb5qrTVrh9aVqYF4eTF7vd2TX8/WX2u5S51dwK2Py9y6mqm6vqRDqn+f8b8KEke9L7NPztdP5xF/0ssJ1OQrsDeOzijOa6kZ9Zursl4+8EbgIOrk4z0+vofLsbhuViHdRBS9a/vWt8uWaKW+k0LfWb9/zuD7eqelhVbemz/GIzzfHADU0yBPgPzbTn0GlSW9NM7/X8fY9OcusskDy6a9636HwIPakrnn2qaQqrqvuqakNV/RydxPufkxy1zLFLM8+8OHLmVnMrYFG5y0nyq0keWVU/odMkBPAT4JvN3+5/4AuA303y+CR70fkG/YGq2k7nmqAXJvnF5iLx09lxItwbuBfYluTngd8e1nHtINZB/d9J9ktyEPBa4AMDrvcx4IAkpzYXau+d5OnNvHcBb07yOIAkj0xy/DLbej9wNJ3n5n1d0/cGfkjnrMfD6RxfP18CnpTksCQPo/PaANC87u8GzkryqCamA5M8rxl+QZInJglwD51v/j8Z6FmQZpR5ceTMreZWwKJyV3QMcH06v/x7G3BCVf2gaaZ5M/A/m1P3zwDOAf6KzvVGXwf+N52Lo6mq65vh99P5dr4N2Ernn7Of/0LnW+F9dP75Bk0sg+gb6wpcROd6pWuBS4D3DLJSVd0HPBd4IZ3moZvpXBgPnef4YjpNHvcBn6VzYXq/bd0BfAb4Re7//LyXTrPRFuCGZjv9tvG/6Fy/86kmlk8vWeT36TTDfLZp7vkUnWutoHOx+qfovJ6fAf6iqq7of/TSLsG8OFrmVnMrAGkuMJWW1XwL/i6dJpyvTzqelUpSdGK/ZYcLS9IAZj0vDoO5Vd08U6m+krwwycOba4/eAmzipxc5S9KqY16U+rOo1HKOp3PB9e10Tu2fUJ7alrS6raq8mOSX07kx/AMek45N08fmb0mSJLXmmUpJkiS1ZlEpSZKk1nafdACD2H///WvNmjUj3cf3vvc99txzz5HuYxRmNW6Y3dhnNW6Y3dhHFffVV1/9rap65I6XnE0rzZ2z+v7YWR7vrms1HSuM/3j75c6ZKCrXrFnDVVddNdJ9LCwsMD8/P9J9jMKsxg2zG/usxg2zG/uo4k7yjR0vNbtWmjtn9f2xszzeXddqOlYY//H2y502f0uSJKk1i0pJkiS1ZlEpSZKk1iwqJUmS1JpFpSRJklqzqJQkSVJrFpWSJElqbWRFZZKDklyR5IYk1yd5bTP99CRbklzbPI4dVQySJEkaj1He/Hw7sKGqrkmyN3B1ksuaeWdV1VtGuG9JkiSN0ciKyqq6A7ijGb4vyY3AgaPanyRJkiZnLNdUJlkDPBX4XDPpNUm+nOScJPuNIwZJkiSNzsj7/k6yF/Bh4NSqujfJO4E3AdX8PRN4RY/11gPrAebm5lhYWFjRfjdtuWdFy8/tAW8//6IVrbP2wH1WtPwobNu2bcXPzbSY1dhnNW6Y3dhnNW5pGNacdslQtrNh7XZO6bOtzWccN5R9aHUbaVGZ5MF0Csrzq+ojAFV1V9f8dwMf67VuVW0ENgKsW7euVtpRer9/nH42rN3OmZtW9nRsPml+RcuPwrg7kR+mWY19VuOG2Y19VuOWpNVklL/+DvAe4MaqemvX9AO6FnsJcN2oYpAkSdJ4jPJM5bOAlwObklzbTHsdcGKSw+g0f28GfnOEMUiSJGkMRvnr708D6THr0lHtU5IkSZNhjzqSJElqzaJSkiRJrVlUSpIkqTWLSkkagyQHJbkiyQ1Jrk/y2mb6I5JcluTm5m/PDiGSnNwsc3OSk8cbvSTtmEWlJI3HdmBDVR0KPAN4dZJDgdOAy6vqYODyZvx+kjwCeAPwdOAI4A32RiZp2lhUStIYVNUdVXVNM3wfcCNwIHA8cF6z2HnAi3us/jzgsqq6u6q+A1wGHDP6qCVpcBaVkjRmSdYATwU+B8xV1R3NrDuBuR6rHAjc2jV+WzNNkqbGyPv+liT9VJK96HRfe2pV3dvpfKyjqipJtdz+emA9wNzc3Ir6TF9tfazPyvFuWLt9KNuZ26P/tmbheViJWXlth2VajteiUpLGJMmD6RSU51fVR5rJdyU5oKruaLqx3dpj1S3AfNf4Y4GFXvuoqo3ARoB169bVSvpMX219rM/K8Z5y2iVD2c6Gtds5c1Pvj/3NJ80PZR/TYlZe22GZluO1+VuSxiCdU5LvAW6sqrd2zboYWPw198nART1W/wRwdJL9mh/oHN1Mk6SpYVEpSePxLODlwLOTXNs8jgXOAJ6b5GbgOc04SdYlORugqu4G3gR8oXm8sZkmSVPD5m9JGoOq+jSQPrOP6rH8VcBvdI2fA5wzmugkqT3PVEqSJKk1i0pJkiS1ZlEpSZKk1iwqJUmS1Jo/1GlhzZDuHdbGhrXbh3IPs81nHDeEaCRJ0mrlmUpJkiS1ZlEpSZKk1iwqJUmS1JpFpSRJklqzqJQkSVJrFpWSJElqzaJSkiRJrVlUSpIkqTWLSkmSJLVmUSlJkqTWLColSZLUmkWlJEmSWrOolCRJUmsWlZIkSWrNolKSJEmtWVRKkiSpNYtKSZIktbb7pAOQpNUgyTnAC4CtVfXkZtoHgEOaRfYFvltVh/VYdzNwH/BjYHtVrRtL0JK0AhaVkjQe5wLvAN67OKGq/v3icJIzgXuWWf/IqvrWyKKTpJYsKiVpDKrqyiRres1LEuBlwLPHGZMkDZPXVErS5P0ycFdV3dxnfgGfTHJ1kvVjjEuSBuaZSkmavBOBC5aZ/0tVtSXJo4DLktxUVVf2WrApOtcDzM3NsbCwMHAQ27ZtW9Hys25WjnfD2u1D2c7cHv23NQvPw0rMyms7LNNyvBaVkjRBSXYH/i3wtH7LVNWW5u/WJBcCRwA9i8qq2ghsBFi3bl3Nz88PHMvCwgIrWX7WzcrxnnLaJUPZzoa12zlzU++P/c0nzQ9lH9NiVl7bYZmW47X5W5Im6znATVV1W6+ZSfZMsvfiMHA0cN0Y45OkgVhUStIYJLkA+AxwSJLbkryymXUCS5q+kzwmyaXN6Bzw6SRfAj4PXFJVHx9X3JI0KJu/JWkMqurEPtNP6THtduDYZvhrwFNGGpwkDcHIzlQmOSjJFUluSHJ9ktc20x+R5LIkNzd/9xtVDJIkSRqPUTZ/bwc2VNWhwDOAVyc5FDgNuLyqDgYub8YlSZI0w0ZWVFbVHVV1TTN8H3AjcCBwPHBes9h5wItHFYMkSZLGYyw/1Gl6kXgq8DlgrqruaGbdSecidEmSJM2wkf9QJ8lewIeBU6vq3k5vZB1VVUmqz3o7fQNfWPnNYpe7Kew0G1bck7hp6rTcrHWlZjVumN3YZzVuSVpNRlpUJnkwnYLy/Kr6SDP5riQHVNUdSQ4AtvZat80NfGHlN4td7qaw02xYcU/ixrfTcrPWlZrVuGF2Y5/VuCVpNRnlr78DvAe4sare2jXrYuDkZvhk4KJRxSBJkqTxGOWpuWcBLwc2Jbm2mfY64Azgg82Nf78BvGyEMUiSJGkMRlZUVtWngfSZfdSo9itJkqTxs5tGSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSa7N3t29JkqbEmhV2tCHtyjxTKUmSpNYsKiVJktSaRaUkSZJas6iUJElSaxaVkiRJas2iUpIkSa1ZVEqSJKk1i0pJGpMk5yTZmuS6rmmnJ9mS5NrmcWyfdY9J8pUktyQ5bXxRS9JgLColaXzOBY7pMf2sqjqseVy6dGaS3YA/B54PHAqcmOTQkUYqSStkUSlJY1JVVwJ378SqRwC3VNXXqupHwPuB44canCS1ZFEpSZP3miRfbprH9+sx/0Dg1q7x25ppkjQ17PtbkibrncCbgGr+ngm8Ymc3lmQ9sB5gbm6OhYWFgdfdtm3bipafdcM43g1rtw8nmDGY26N/vLva6+57eTIsKiVpgqrqrsXhJO8GPtZjsS3AQV3jj22m9dreRmAjwLp162p+fn7gWBYWFljJ8rNuGMd7ymmXDCeYMdiwdjtnbur9sb/5pPnxBjNivpcnw+ZvSZqgJAd0jb4EuK7HYl8ADk7y+CQPAU4ALh5HfJI0KM9UStKYJLkAmAf2T3Ib8AZgPslhdJq/NwO/2Sz7GODsqjq2qrYneQ3wCWA34Jyqun4ChyBJfVlUStKYVNWJPSa/p8+ytwPHdo1fCjzgdkOSNC1s/pYkSVJrFpWSJElqzaJSkiRJrVlUSpIkqTWLSkmSJLVmUSlJkqTWLColSZLUmkWlJEmSWrOolCRJUmsWlZIkSWrNolKSJEmtWVRKkiSpNYtKSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSaxaVkiRJas2iUpIkSa0NVFQmWTvqQCRpVpgTJemBBj1T+RdJPp/kVUn2GWlEkjT9zImStMRARWVV/TJwEnAQcHWS9yV57nLrJDknydYk13VNOz3JliTXNo9jW0UvSROwMzlRknZ1A19TWVU3A38I/D7wb4A/S3JTkn/bZ5VzgWN6TD+rqg5rHpeuNGBJmgYrzYl9vmj/SbPOl5NcmGTfPutuTrKp+TJ+1SiOR5LaGvSayl9IchZwI/Bs4IVV9a+a4bN6rVNVVwJ3DytQSZoWO5MT6f1F+zLgyVX1C8D/Av5gmd0e2XwZX9cqeEkakd0HXO7twNnA66rqB4sTq+r2JH+4wn2+JsmvAVcBG6rqO70WSrIeWA8wNzfHwsLCinayYe32FS0/t8fK15kGw4p7pc/vMGzbtm0i+21rVuOG2Y19CuNecU6sqiuTrFky7ZNdo58FXjr8UCVpPAYtKo8DflBVPwZI8iDgYVX1/ar6qxXs753Am4Bq/p4JvKLXglW1EdgIsG7dupqfn1/BbuCU0y5Z0fIb1m7nzE2DPh3TY1hxbz5pvn0wK7SwsMBKX9dpMKtxw+zGPoVxDysndnsF8IE+8wr4ZJIC/rLJj5I0VQatRj4FPAfY1ow/HPgk8Isr2VlV3bU4nOTdwMdWsr4kTYmh5MRFSf4fYDtwfp9FfqmqtiR5FHBZkpuaS4x6bWunW3mm8IzwSA3jeGephWu5lq1d7XX3vTwZgxaVD6uqxeRJVW1L8vCV7izJAVV1RzP6EuC65ZaXpCk1lJwIkOQU4AXAUVVVvZapqi3N361JLgSOAHoWlW1aeabwjPBIDeN4V9oqNknLtWxNorVqlHwvT8agv/7+XpLDF0eSPA34wTLLk+QC4DPAIUluS/JK4P9rfsH4ZeBI4Hd3Mm5JmqQV58RekhwD/B7woqr6fp9l9kyy9+IwcDR+IZc0hQY9U3kq8DdJbgcCPBr498utUFUn9pj8npWFJ0lTacU5sfmiPQ/sn+Q24A10fu39UDpN2gCfrarfSvIY4OyqOhaYAy5s5u8OvK+qPj6So5KkFgYqKqvqC0l+HjikmfSVqvqn0YUlSdNrZ3LiSr5oV9XtwLHN8NeAp7QIV5LGYiU/G/7XwJpmncOTUFXvHUlUkjT9zImS1GWgojLJXwFPAK4FftxMLsAEKmnVMSdK0gMNeqZyHXBov18mStIqY06UpCUG/fX3dXQuRJckmRMl6QEGPVO5P3BDks8DP1ycWFUvGklUkjTdzImStMSgReXpowxCkmbM6ZMOQJKmzaC3FPr7JI8DDq6qTzU9R+w22tAkaTqZEyXpgQa6pjLJfwQ+BPxlM+lA4KOjCkqSppk5UZIeaNAf6rwaeBZwL0BV3Qw8alRBSdKUMydK0hKDFpU/rKofLY4k2Z3OPdkkaTUyJ0rSEoMWlX+f5HXAHkmeC/wN8LejC0uSppo5UZKWGLSoPA34JrAJ+E3gUuAPRxWUJE05c6IkLTHor79/Ary7eUjSqmZOlKQHGrTv76/T43qhqvq5oUckSVPOnNjOmtMuGfk+Np9x3Mj3Ien+VtL396KHAf8OeMTww5GkmWBOlKQlBrqmsqq+3fXYUlV/Cvg1UNKqZE6UpAcatPn78K7RB9H5lj7oWU5J2qWYEyXpgQZNgmd2DW8HNgMvG3o0kjQbzImStMSgv/4+ctSBSNKsMCdK0gMN2vz9n5ebX1VvHU44kjT9zImS9EAr+fX3vwYubsZfCHweuHkUQUnSlDMnStISgxaVjwUOr6r7AJKcDlxSVb86qsAkaYqZEyVpiUG7aZwDftQ1/qNmmiStRjuVE5Ock2Rrkuu6pj0iyWVJbm7+7tdn3ZObZW5OcnLrI5CkIRu0qHwv8PkkpzffyD8HnDeyqCRpuu1sTjwXOGbJtNOAy6vqYODyZvx+kjwCeAPwdOAI4A39ik9JmpRBb37+ZuDXge80j1+vqj8eZWCSNK12NidW1ZXA3UsmH89PC9LzgBf3WPV5wGVVdXdVfQe4jAcWp5I0UYOeqQR4OHBvVb0NuC3J40cUkyTNgmHlxLmquqMZvpPezegHArd2jd/WTJOkqTHoLYXeQOfXjocA/x14MPDXwLNGF5okTadR5cSqqiTVMrb1wHqAubk5FhYWBl5327ZtK1p+Z21Yu33k+3j7+RftcJm5PQZbbjkb1rZafazm9uj/3I/jdR+ncb2Xp8W0HO+gv/5+CfBU4BqAqro9yd4ji0qSptswc+JdSQ6oqjuSHABs7bHMFmC+a/yxwEKvjVXVRmAjwLp162p+fr7XYj0tLCywkuV31imnXTLyfQxiw9rtnLlp9fSuudzxbj5pfrzBjNi43svTYlqOd9Dm7x9VVQEFkGTP0YUkSVNvmDnxYmDx19wnA71OnX0CODrJfs0PdI5upknS1Bi0qPxgkr8E9k3yH4FPAe8eXViSNNV2KicmuQD4DHBIktuSvBI4A3hukpuB5zTjJFmX5GyAqrobeBPwhebxxmaaJE2NQfv+fkuS5wL30rmG6PVVddlII5OkKbWzObGqTuwz66gey14F/EbX+DnAOTsXsSSN3g6LyiS7AZ+qqiPp3MZCklYtc6Ik9bbD5u+q+jHwkyT7jCEeSZpq5kRJ6m3Qn71tAzYluQz43uLEqvqdkUQlSdPNnChJSwxaVH6keUiSzImS9ADLFpVJfraq/rGq7Odb0qpnTpSk/nZ0TeVHFweSfHjEsUjStDMnSlIfOyoq0zX8c6MMRJJmgDlRkvrY0TWV1WdYklYjc6J2SWvG1HXm5jOOG8t+NBk7KiqfkuReOt/O92iGacarqv7FSKOTpOliTpSkPpYtKqtqt3EFIknTzpwoSf0N2vf3iiU5J8nWJNd1TXtEksuS3Nz83W9U+5ckSdL4jKyoBM4Fjlky7TTg8qo6GLi8GZckSdKMG1lRWVVXAncvmXw8sHh/t/OAF49q/5IkSRqfUZ6p7GWuqu5ohu8E5sa8f0mSJI3AoN00Dl1VVZK+t+RIsh5YDzA3N8fCwsKKtr9h7fYVLT+3x8rXmQbDinulz+8wbNu2bSL7bWtW44bZjX1W45ak1WTcReVdSQ6oqjuSHABs7bdgVW0ENgKsW7eu5ufnV7SjU1Z4z60Na7dz5qaJ1dg7bVhxbz5pvn0wK7SwsMBKX9dpMKtxw+zGPqtxS9JqMu7m74uBk5vhk4GLxrx/SZIkjcAobyl0AfAZ4JAktyV5JXAG8NwkNwPPacYlSZI040bW3ltVJ/aZddSo9ilJkqTJmL2LCCVJI7Fpyz0rvh5dWolx9TG+Ye32kb6X7cO8t3FfUylJkqRdkEWlJEmSWrOolKQJS3JIkmu7HvcmOXXJMvNJ7ula5vWTileSevGaSkmasKr6CnAYQJLdgC3AhT0W/YeqesE4Y5OkQXmmUpKmy1HAV6vqG5MORJJWwjOVkjRdTgAu6DPvmUm+BNwO/Jequn7pAm26uJ3V7mp3lse76xr1sU5bt7HT0pWtRaUkTYkkDwFeBPxBj9nXAI+rqm1JjgU+Chy8dKE2Xdy+/fyLZrK72p01q93z7qzVdLyjPtZJdG28nGnpytbmb0maHs8Hrqmqu5bOqKp7q2pbM3wp8OAk+487QEnqx6JSkqbHifRp+k7y6CRpho+gk7+/PcbYJGlZq+M8uCRNuSR7As8FfrNr2m8BVNW7gJcCv51kO/AD4ISqqknEKkm9WFRK0hSoqu8BP7Nk2ru6ht8BvGPccUnSoGz+liRJUmsWlZIkSWrNolKSJEmtWVRKkiSpNX+oIwDWnHbJyPex+YzjRr4PSZI0GZ6plCRJUmsWlZIkSWrNolKSJEmtWVRKkiSpNYtKSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSaxaVkiRJas2iUpIkSa1ZVEqSJKk1i0pJkiS1ZlEpSZKk1iwqJUmS1JpFpSRJklqzqJSkKZBkc5JNSa5NclWP+UnyZ0luSfLlJIdPIk5J6mf3SQcgSfpnR1bVt/rMez5wcPN4OvDO5q8kTQXPVErSbDgeeG91fBbYN8kBkw5KkhZZVErSdCjgk0muTrK+x/wDgVu7xm9rpknSVLD5W2Oz5rRL7je+Ye12Tlkyra3NZxw31O1JY/RLVbUlyaOAy5LcVFVXrnQjTUG6HmBubo6FhYWB153bo/N/uVp4vLuuUR/rSv6vxmHbtm1TEZNFpSRNgara0vzdmuRC4Aigu6jcAhzUNf7YZtrS7WwENgKsW7eu5ufnB47h7edfxJmbVs/Hwoa12z3eXdSoj3XzSfMj2/bOWFhYYCX/66Ni87ckTViSPZPsvTgMHA1ct2Sxi4Ffa34F/gzgnqq6Y8yhSlJfq+MriyRNtzngwiTQycvvq6qPJ/ktgKp6F3ApcCxwC/B94NcnFKsk9WRRKUkTVlVfA57SY/q7uoYLePU445KklbD5W5IkSa1N5Exlks3AfcCPge1VtW4ScUiSJGk4Jtn8vVzPEZIkSZohNn9LkiSptUkVlTvqOUKSJEkzZFLN3zvsOaJNrxCw8jvpz2pPA7MaN4wm9reff9FQt9fL3B7T15vCoKal14WVmtW4JWk1mUhROUDPEa16hQBW3P3frPY0MKtxw+zGvmHtdl42BT0X7Ixp6XVhpWY1bklaTcbe/D1gzxGSJEmaIZM4TdSz54gJxCFJkqQhGXtR2a/nCEmSJM0ubykkSZKk1iwqJUmS1JpFpSRJklqzqJQkSVJrFpWSJElqzaJSkiRJrVlUSpIkqTWLSkmSJLVmUSlJkqTWLColSZLUmkWlJEmSWrOolCRJUmsWlZI0QUkOSnJFkhuSXJ/ktT2WmU9yT5Jrm8frJxGrJC1n90kHIEmr3HZgQ1Vdk2Rv4Ookl1XVDUuW+4eqesEE4pOkgXimUpImqKruqKprmuH7gBuBAycblSStnEWlJE2JJGuApwKf6zH7mUm+lOTvkjxprIFJ0gBs/pakKZBkL+DDwKlVde+S2dcAj6uqbUmOBYIUNbMAAAqGSURBVD4KHNxnO+uB9QBzc3MsLCwMHMPcHrBh7fadiH42eby7rlEf60r+r8Zh27ZtUxGTRaUkTViSB9MpKM+vqo8snd9dZFbVpUn+Isn+VfWtHstuBDYCrFu3rubn5weO4+3nX8SZm1bPx8KGtds93l3UqI9180nzI9v2zlhYWGAl/+ujYvO3JE1QkgDvAW6sqrf2WebRzXIkOYJO7v72+KKUpB1bHV9ZJGl6PQt4ObApybXNtNcBPwtQVe8CXgr8dpLtwA+AE6qqJhGsJPVjUSlJE1RVnwayg2XeAbxjPBFJ0s6x+VuSJEmtWVRKkiSpNYtKSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSaxaVkiRJas2iUpIkSa1ZVEqSJKk1i0pJkiS1ZlEpSZKk1iwqJUmS1Nrukw5AkiRplqw57ZJJh3A/G9Zu55SdjGnzGccNLQ7PVEqSJKk1z1RKO2HavqUOqte32WF+S+2n7fM1yLfwcRyHJKk/z1RKkiSpNYtKSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSaxMpKpMck+QrSW5JctokYpCkabKjvJjkoUk+0Mz/XJI1449Skvobe1GZZDfgz4HnA4cCJyY5dNxxSNK0GDAvvhL4TlU9ETgL+G/jjVKSljeJM5VHALdU1deq6kfA+4HjJxCHJE2LQfLi8cB5zfCHgKOSZIwxStKyJlFUHgjc2jV+WzNNklarQfLiPy9TVduBe4CfGUt0kjSAVNV4d5i8FDimqn6jGX858PSqes2S5dYD65vRQ4CvjDi0/YFvjXgfozCrccPsxj6rccPsxj6quB9XVY8cwXZXZJC8mOS6ZpnbmvGvNst8a8m22uTOWX1/7CyPd9e1mo4Vxn+8PXPnJPr+3gIc1DX+2Gba/VTVRmDjuIJKclVVrRvX/oZlVuOG2Y19VuOG2Y19VuNegUHy4uIytyXZHdgH+PbSDbXJnavgeb4fj3fXtZqOFabneCfR/P0F4OAkj0/yEOAE4OIJxCFJ02KQvHgxcHIz/FLgf9S4m5okaRljP1NZVduTvAb4BLAbcE5VXT/uOCRpWvTLi0neCFxVVRcD7wH+KsktwN10Ck9JmhqTaP6mqi4FLp3Evpcxtqb2IZvVuGF2Y5/VuGF2Y5/VuAfWKy9W1eu7hv838O9GHMYu/zwv4fHuulbTscKUHO/Yf6gjSZKkXY/dNEqSJKm1VVlUJjknydbmFh2L0x6R5LIkNzd/95tkjL0kOSjJFUluSHJ9ktc206c69iQPS/L5JF9q4v6vzfTHN93N3dJ0P/eQScfaT5Ldknwxycea8amPPcnmJJuSXJvkqmbaVL9XFiXZN8mHktyU5MYkz5yV2KddvzyyZJkk+bPm/f3lJIdPItZhGPB455Pc0/yvXJvk9b22NQv65dsly+wSXX4OeKynJPlm12v7G5OIdZiWfh4tmTfR13ZVFpXAucAxS6adBlxeVQcDlzfj02Y7sKGqDgWeAbw6na7cpj32HwLPrqqnAIcBxyR5Bp1u5s5qup37Dp1u6KbVa4Ebu8ZnJfYjq+qwrltNTPt7ZdHbgI9X1c8DT6Hz3M9K7NOuXx7p9nzg4OaxHnjneEMcqkGOF+Afmv+Vw6rqjeMNcaj65dtuu0qXn4McK8AHul7bs8cb4kgs/TzqNtHXdlUWlVV1JZ1fT3br7gLtPODFYw1qAFV1R1Vd0wzfR+dNdSBTHnt1bGtGH9w8Cng2ne7mYArjXpTkscBxwNnNeJiR2HuY6vcKQJJ9gF+h82tnqupHVfVdZiD2WbBMHul2PPDe5n/3s8C+SQ4Yc6hDMeDx7jKWybfddokuPwc81l3K0s+jHib62q7KorKPuaq6oxm+E5ibZDA70pzSfirwOWYg9uZ0/bXAVuAy4KvAd5vu5mC6u+v8U+D3gJ804z/DbMRewCeTXJ1OLyswA+8V4PHAN4H/3jTxnJ1kT2Yj9pmyJI902yW7013meAGe2TSj/l2SJ401sCFbmm+rqu/rO+tdfg5wrAD/V3MZx4eSHNRj/ixZ+nm01ERfW4vKHpobCk/tt50kewEfBk6tqnu7501r7FX146o6jE5PIUcAPz/hkAaS5AXA1qq6etKx7IRfqqrD6TRlvjrJr3TPnNb3Cp1bnR0OvLOqngp8jyVN3VMc+8xYLo/sinZwvNfQ6XbuKcDbgY+OO75hWppvkzx50jGNygDH+rfAmqr6BTonNM5buo1ZMQufRxaVP3XXYvNO83frhOPpKcmD6STG86vqI83kmYgdoGnGvAJ4Jp0mtcV7pfbsrnMKPAt4UZLNwPvpNHu/jRmIvaq2NH+3AhfSKeZn4b1yG3Bb1xmHD9EpMmch9pnQJ490G6g73Vmxo+OtqnsXm1Gb+4U+OMn+Yw5z6Lry7dLfEPzz65tluvycJf2Otaq+XVU/bEbPBp427tiG6AGfR0n+eskyE31tLSp/qrsLtJOBiyYYS0/NdRHvAW6sqrd2zZrq2JM8Msm+zfAewHPpXNd0BZ3u5mAK4waoqj+oqsdW1Ro6PZj8j6o6iSmPPcmeSfZeHAaOBq5jyt8rAFV1J3BrkkOaSUcBNzADsc+CZfJIt4uBX2t+Bf4M4J6uSw9myiDHm+TRi9edJTmCzmfjTBZZffLtTUsW2yW6/BzkWJdcC/wi+v/AZer1+Tz61SWLTfS1nUiPOpOW5AJgHtg/yW3AG4AzgA8meSXwDeBlk4uwr2cBLwc2NdeQALyO6Y/9AOC8JLvRSdYfrKqPJbkBeH+SPwK+SPPDjBnx+0x37HPAhc3n5O7A+6rq40m+wHS/Vxb9J+D8dG7V9DXg12neOzMQ+7Trl0d+FqCq3kWnZ59jgVuA79N5/mfVIMf7UuC3k2wHfgCcMItFVqNfvt0Vu/wc5Fh/J8mL6NwF4G7glIlFOyLT9Nrao44kSZJas/lbkiRJrVlUSpIkqTWLSkmSJLVmUSlJkqTWLColSZLUmkWlplKSK5I8b8m0U5O8c5l1tvWbJ0mrgblTk2RRqWl1AQ+8v9YJzXRJUm/mTk2MRaWm1YeA45qbX5NkDfAY4ItJLk9yTZJNSY5fumKS+SQf6xp/R5JTmuGnJfn7JFcn+cSS3hYkadaZOzUxFpWaSlV1N/B54PnNpBOAD9Lp7eIlVXU4cCRw5mL3ajvS9P/7duClVfU04BzgzcOOXZImxdypSVqV3TRqZiw241zU/H0lEOCPk/wK8BPgQDpdEt45wPYOAZ4MXNbk0t2AmezPWJKWYe7URFhUappdBJyV5HDg4VV1ddMU80jgaVX1T0k2Aw9bst527n8WfnF+gOur6pmjDVuSJsrcqYmw+VtTq6q2AVfQaWpZvMh8H2BrkxSPBB7XY9VvAIcmeWiSfYGjmulfAR6Z5JnQadJJ8qSRHoQkjZm5U5PimUpNuwuAC/nprxnPB/42ySbgKuCmpStU1a1JPghcB3wd+GIz/UdJXgr8WZJ96Lz//xS4fuRHIUnjZe7U2KWqJh2DJEmSZpzN35IkSWrNolKSJEmtWVRKkiSpNYtKSZIktWZRKUmSpNYsKiVJktSaRaUkSZJas6iUJElSa/8HYxcQ10MQFdkAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, axarr = plt.subplots(2, 2, figsize=(10, 10))\n",
"model2.diagnostic_plot('pp_plot', ax=axarr[0][0])\n",
"model2.diagnostic_plot('qq_plot', ax=axarr[1][0])\n",
"model2.diagnostic_plot('rvf_plot', ax=axarr[1][1])\n",
"axarr[0, 1].axis('off')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The residual plots look much better for this model:\n",
"- The residuals are closer to the 45-degree lines in the P-P and Q-Q plots.\n",
"- The funnel-like appearance of the RVF plot has disappeared."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAfgAAAFNCAYAAADsL325AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de5wlZX3n8c8X0DgyyEWkRSSOGoKLjCK0oHGz9ggSwAvqKisxBuJlYuIl7o4bJ/ECaszLxKDRJWpQCRqV8UrCgrfR2CKJtxmCDAgGVFwYcAh3Gog68ts/TnU89JyePtMzp0+f8vN+vc6rq+p56qlfPX26f6eeqlOVqkKSJLXLTsMOQJIk7XgmeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPAaSUkuSzIx7DiGKcmzk1yTZCrJ44Ydz7Qkpyb5yA5s7+QkF3bNTyV5xI5qv2lzMslLdmSbs2znc0lOmqVsWZJKsssO2M5ZSf5se9vRaDPBa9FJcnWSo2Ysu9c/+ap6dFVNztHODvuHuUj9FfCKqlpaVf/a70qj/s+/2d8fDDuO+aiqY6vqQ8OOQ78cTPDSPC2CDw4PAy4bcgw7VJKdhx3DXBbB713qiwleI6n7KD/J4UnWJbk9yaYk72iqXdD8vLUZ1n1ikp2SvD7Jj5LckOTDSXbvavd3m7KbkrxhxnZOTfKpJB9JcjtwcrPtrye5Ncn1SU5Pct+u9irJHya5MskdSd6S5JFJ/qWJ9xPd9WfsY89Yk/xKkilgZ+A7Sb7fY90keWez3u1JNiQ5OMlK4AXAHzd98n+b+quTfL+J8btJnt3V1slJLkzyV0luSfLDJMd2lT88yVebddcCe8+I5ZNJfpzktiQXJHl0V9lZSd6b5LNJ7gRWJHlgknObuL8FPHJGe5Xk15I8pNmH6dddSaqr3ouSXN7E/IUkD+sqe2qSK5qYTgfS63ewld/7Tl19dlPze9yrqX+/pu5Nzfvi20nGmrL/PBWQZOemT29M8gPgaTO2e6+RrMw49bG1fp3Rzt5JzmtiuTnJ15L4v/+XgL9ktcG7gHdV1QPoJINPNMv/W/Nzj2ZY9+vAyc1rBfAIYClwOkCSg4D30EmA+wK7A/vN2NbxwKeAPYCPAj8H/iedpPZE4EjgD2es81vAYcATgD8GzgB+B9gfOBg4cZb96hlrVf2kqpY2dR5bVY/sse7Rzf7/erMfJwA3VdUZTdx/2fTJM5r63wd+s6n7JuAjSfbtau8I4HvNfv4l8MEk00nxY8D6puwtwMxzzJ8DDgD2AS5qtt/tt4G3ArsBFwJ/A/wHnd/Bi5rXFqrqumYfljb9cQ6wBiDJ8cCfAs8BHgR8DTi7Kdsb+Azw+ibm7wNP6rWNLjN/768EngU8GXgIcEsTN83+707n9/tA4GXA3T3afCnwdOBxwDjw3DlimGmufp22CriWTj+M0ekX71H+y6CqfPlaVC/gamAKuLXrdRdw4Yw6RzXTF9BJSnvPaGcZnX9ku3Qt+zLwh13zBwI/A3YB3gic3VV2f+CnXds5FbhgjthfDZzTNV/Ak7rm1wOv7Zo/DfjrWdqaNdautn9tlnWfAvwbnQ8VO80oOwv4szn242Lg+Gb6ZOCqGf1SwIOBXwU2A7t2lX8M+Mgs7e7RrLt7Vywf7irfudnHR3Ut+/MZv/st9ht4bdO3S5r5zwEv7irfqXkPPQz4XeAbXWWhkwBfMkvMW/zegcuBI7vm9+16H70I+BfgMT3ampzeDvBPwMu6yo7ufr/S9R7vimNb+vXPmuk3A/8423vFV3tfHsFrsXpWVe0x/WLLo+JuL6ZzpHpFMxz69K3UfQjwo675H9H5pzzWlF0zXVBVdwE3zVj/mu6ZJL/eDH/+uBm+/XNmDFEDm7qm7+4xv5TethbrVlXVP9EZmfgb4IYkZyR5wGz10zk1cXEzjHsrnZGF7v34cVfbdzWTS5sYb6mqO2fEOd3uzkne1gxl304naTGj7e4+fVCzj93LuvugV+zHAn9E5z0zfaT8MOBdXftzM51Evh9b/p5rxvZ6mVn+MOCcrvYvpzOaMwb8PfAFYE2S65L8ZZL79GjzXnHMtZ/d+uzXaW8HrgK+mOQHSVb3ux2NNhO8Rl5VXVlVJ9IZqvwL4FNJdqX3MOR1dP45T5s+At0EXA88dLogyRI6Q6z32tyM+fcCVwAHVOcUwZ+ylfO522hrsc6pqt5dVYcBB9H5APS/p4u66zXnpt8PvAJ4YPOB6lL624/rgT2b/u6Oc9pv0xnePorOsPWy6c12h9o1/e909nH/Wdq7lyQHAh8CTqiq7mR5DfD73R8Sq2pJVf1LE/P+XW1kxvZ6mfl7vwY4dkb796uqjVX1s6p6U1UdBPwGnWH43+3R5r3i6LGfd9IZLZn24K7pfvq1E3jVHVW1qqoeATwT+F9JjtzazqodTPAaeUl+J8mDquoeOsP5APfQSRb30Dl/Pe1s4H82F4YtpXPE/fGq2kznHOszkvxGOhe+ncrcSW434HZgKsmjgD/YUfs1R6xbleTxSY5ojhzvpHNO+56meBP37pPpD0P/3qz7e3SO4OdUVT8C1gFvSnLfJP8VeEZXld2An9AZCbl/sw9ba+/ndM6Pn5rk/s11EbN9b/wBdIaeX1dVF84ofh/wJ9MXnqVzceLzmrLzgUcneU46V8S/insnz368D3jr9IV7SR7UnPcnyYoky9P5RsDtdIbu7+nRxieAVyV5aJI9gZlH1hcDz09ynyQzz9H33a9Jnp7ORYkBbqMz0tArHrWMCV5tcAxwWTpXlr8LeH5V3d0MJb8V+OdmKPUJwJl0hlAvAH5IJ/G9EqCqLmum19A5upoCbqDzj3Q2r6FzNHUHnaPgj+/A/Zo11j48oInnFjpDvzfRGaoF+CBwUNMn/1BV36VzLcDX6ST/5cA/b0Ocv03nIrybgVOAD3eVfbjZ/kbgu8A3+mjvFXSG/39M51zy381S71A61yW8M11X0wNU1Tl0RnPWNEPYlwLHNmU3As8D3kanXw5g2/YXOu+zc+kMe9/R7NcRTdmD6XxYvJ3O0P1X6fweZ3o/naH879C5SO4zM8rfQOei0VvoXGPysa6ybenXA4Av0Xk/fx14T1V9pZ+d1GhL5/STpJmao+Zb6Qy//3DY8UjStvAIXuqS5BnN0PCudO4Ut4FfXMAkSSPDBC/d2/F0Lm67js7Q5vPLYS5JI8ghekmSWsgjeEmSWsgEL0lSC7XqqUh77713LVu2bCBt33nnney6665zV9QW7Lv5s+/mz77bPvbf/C1k361fv/7GqnpQr7JWJfhly5axbt26gbQ9OTnJxMTEQNpuO/tu/uy7+bPvto/9N38L2XdJZr3FsUP0kiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCrboXvTQfy1afP+wQZrVq+WYmhh2EpJHkEbwkSS1kgpckqYVM8JIktZAJXpKkFjLBS5LUQiZ4SZJayAQvSVILDSzBJ9k/yVeSfDfJZUn+qFm+V5K1Sa5sfu45y/onNXWuTHLSoOKUJKmNBnkEvxlYVVUHAU8AXp7kIGA18OWqOgD4cjN/L0n2Ak4BjgAOB06Z7YOAJEna0sASfFVdX1UXNdN3AJcD+wHHAx9qqn0IeFaP1X8LWFtVN1fVLcBa4JhBxSpJUtssyDn4JMuAxwHfBMaq6vqm6MfAWI9V9gOu6Zq/tlkmSZL6MPB70SdZCnwaeHVV3Z7kP8uqqpLUdra/ElgJMDY2xuTk5PY0N6upqamBtd12i73vVi3fPOwQZjW2hEXdd4vZYn/fLXb23/wtlr4baIJPch86yf2jVfWZZvGmJPtW1fVJ9gVu6LHqRrjXMzYeCkz22kZVnQGcATA+Pl4TExO9qm23yclJBtV22y32vjt5kT9s5oRF3HeL2WJ/3y129t/8LZa+G+RV9AE+CFxeVe/oKjoXmL4q/iTgH3us/gXg6CR7NhfXHd0skyRJfRjkOfgnAS8EnpLk4uZ1HPA24KlJrgSOauZJMp7kAwBVdTPwFuDbzevNzTJJktSHgQ3RV9WFQGYpPrJH/XXAS7rmzwTOHEx0kiS1m3eykySphUzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLTTwh83ol9uy1eezavnmRX2/d0lqI4/gJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EImeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYa2NPkkpwJPB24oaoObpZ9HDiwqbIHcGtVHdJj3auBO4CfA5uranxQcUqS1EaDfFzsWcDpwIenF1TV/5ieTnIacNtW1l9RVTcOLDpJklpsYAm+qi5IsqxXWZIAJwBPGdT2JUn6ZTasc/C/CWyqqitnKS/gi0nWJ1m5gHFJktQKqarBNd45gj9v+hx81/L3AldV1WmzrLdfVW1Msg+wFnhlVV0wS92VwEqAsbGxw9asWbMD9+AXpqamWLp06UDabrMNG29jbAlsunvYkYymsSWwz167DzuMkeTf7Pax/+ZvIftuxYoV62e7Tm2Q5+B7SrIL8BzgsNnqVNXG5ucNSc4BDgd6JviqOgM4A2B8fLwmJiZ2dMgATE5OMqi22+zk1eezavlmTtuw4G+1Vli1fDMn+L6bF/9mt4/9N3+Lpe+GMUR/FHBFVV3bqzDJrkl2m54GjgYuXcD4JEkaeQNL8EnOBr4OHJjk2iQvboqeD5w9o+5Dkny2mR0DLkzyHeBbwPlV9flBxSlJUhsN8ir6E2dZfnKPZdcBxzXTPwAeO6i4JEn6ZeCd7CRJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCA0vwSc5MckOSS7uWnZpkY5KLm9dxs6x7TJLvJbkqyepBxShJUlsN8gj+LOCYHsvfWVWHNK/PzixMsjPwN8CxwEHAiUkOGmCckiS1zsASfFVdANw8j1UPB66qqh9U1U+BNcDxOzQ4SZJabhjn4F+R5JJmCH/PHuX7Add0zV/bLJMkSX1KVQ2u8WQZcF5VHdzMjwE3AgW8Bdi3ql40Y53nAsdU1Uua+RcCR1TVK2bZxkpgJcDY2Nhha9asGci+TE1NsXTp0oG03WYbNt7G2BLYdPewIxlNY0tgn712H3YYI8m/2e1j/83fQvbdihUr1lfVeK+yXRYkgkZVbZqeTvJ+4Lwe1TYC+3fNP7RZNlubZwBnAIyPj9fExMQOiXWmyclJBtV2m528+nxWLd/MaRsW9K3WGquWb+YE33fz4t/s9rH/5m+x9N2CDtEn2bdr9tnApT2qfRs4IMnDk9wXeD5w7kLEJ0lSWwzssCrJ2cAEsHeSa4FTgIkkh9AZor8a+P2m7kOAD1TVcVW1OckrgC8AOwNnVtVlg4pTkqQ2GliCr6oTeyz+4Cx1rwOO65r/LLDFV+gkSVJ/vJOdJEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EImeEmSWsgEL0lSC3n/UEnbbdnq84cdwhZWLd/MyU1cV7/taUOORlp4HsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUgv1leCTLB90IJIkacfp9wj+PUm+leQPk+w+0IgkSdJ26yvBV9VvAi8A9gfWJ/lYkqcONDJJkjRvfZ+Dr6orgdcDrwWeDLw7yRVJntOrfpIzk9yQ5NKuZW9v1rkkyTlJ9phl3auTbEhycZJ127ZLkiSp33Pwj0nyTuBy4CnAM6rqvzTT75xltbOAY2YsWwscXFWPAf4N+JOtbHZFVR1SVeP9xChJkn6h3yP4/wNcBDy2ql5eVRcBVNV1dI7qt1BVFwA3z1j2xara3Mx+A3jovKKWJElbtUuf9Z4G3F1VPwdIshNwv6q6q6r+fp7bfhHw8VnKCvhikgL+tqrOmOc2JEn6pZSqmrtS8g3gqKqaauaXAl+sqt+YY71lwHlVdfCM5a8DxoHnVI8AkuxXVRuT7ENnWP+VzYhAr22sBFYCjI2NHbZmzZo592c+pqamWLp06UDabrMNG29jbAlsunvYkYymsSWwz16L/4srGzbeNuwQttD9vlu+3+Lvw8XG/3nzt5B9t2LFivWzncru9wj+ftPJHaCqppLcfz7BJDkZeDpwZK/k3rS/sfl5Q5JzgMOBngm+Obo/A2B8fLwmJibmE9acJicnGVTbbXby6vNZtXwzp23o962mbquWb+aEEXjfnbz6/GGHsIXu993VL5gYbjAjyP9587dY+q7fc/B3Jjl0eibJYcA2H5MlOQb4Y+CZVXXXLHV2TbLb9DRwNHBpr7qSJKm3fg+rXg18Msl1QIAHA/9jayskORuYAPZOci1wCp2r5n8FWJsE4BtV9bIkDwE+UFXHAWPAOU35LsDHqurz27pjkiT9MusrwVfVt5M8CjiwWfS9qvrZHOuc2GPxB2epex1wXDP9A+Cx/cQlSZJ625YTo48HljXrHJqEqvrwQKKSJEnbpa8En+TvgUcCFwM/bxYXYIKXJGkR6vcIfhw4aLar3iVJ0uLS71X0l9K5sE6SJI2Afo/g9wa+m+RbwE+mF1bVMwcSlSRJ2i79JvhTBxmEJEnasfr9mtxXkzwMOKCqvtTcxW7nwYYmSZLmq9/Hxb4U+BTwt82i/YB/GFRQkiRp+/R7kd3LgScBtwNU1ZXAPoMKSpIkbZ9+E/xPquqn0zNJdqHzPXhJkrQI9Zvgv5rkT4ElSZ4KfBL4v4MLS5IkbY9+E/xq4N+BDcDvA58FXj+ooCRJ0vbp9yr6e4D3Ny9JkrTI9Xsv+h/S45x7VT1ih0ekvi1bff6wQ5AkLVLbci/6afcDngfstePDkSRJO0Jf5+Cr6qau18aq+mvgaQOOTZIkzVO/Q/SHds3uROeIflueJS9JkhZQv0n6tK7pzcDVwAk7PBpJkrRD9HsV/YpBByJJknacfofo/9fWyqvqHTsmHEmStCNsy1X0jwfObeafAXwLuHIQQUmSpO3Tb4J/KHBoVd0BkORU4Pyq+p1BBSZJkuav31vVjgE/7Zr/abNsq5KcmeSGJJd2LdsrydokVzY/95xl3ZOaOlcmOanPOCVJEv0n+A8D30pyanP0/k3gQ32sdxZwzIxlq4EvV9UBwJeb+XtJshdwCnAEcDhwymwfBCRJ0pb6vdHNW4HfA25pXr9XVX/ex3oXADfPWHw8v/hw8CHgWT1W/S1gbVXdXFW3AGvZ8oOCJEmaRb9H8AD3B26vqncB1yZ5+Dy3OVZV1zfTP6b3UP9+wDVd89c2yyRJUh9StcUzZLaslJxC50r6A6vq15M8BPhkVT2pj3WXAedV1cHN/K1VtUdX+S1VteeMdV4D3K+q/qyZfwNwd1X9VY/2VwIrAcbGxg5bs2bNnPszH1NTUyxdunQgbc/Xho23DTuEvowtgU13DzuK0WTfzV933y3fb/fhBjOCFuP/vFGxkH23YsWK9VU13qus36vonw08DrgIoKquS7LbPOPZlGTfqro+yb7ADT3qbAQmuuYfCkz2aqyqzgDOABgfH6+JiYle1bbb5OQkg2p7vk4ekafJrVq+mdM2eGfj+bDv5q+7765+wcRwgxlBi/F/3qhYLH3X7xD9T6tzqF8ASXbdjm2eC0xfFX8S8I896nwBODrJns3FdUc3yyRJUh/6TfCfSPK3wB5JXgp8CXj/XCslORv4OnBgkmuTvBh4G/DUJFcCRzXzJBlP8gGAqroZeAvw7eb15maZJEnqQ7/3ov+rJE8FbgcOBN5YVWv7WO/EWYqO7FF3HfCSrvkzgTP7iU+SJN3bnAk+yc7Al5oHzsyZ1CVJ0vDNOURfVT8H7kniZaiSJI2Ifi/PnQI2JFkL3Dm9sKpeNZCoJEnSduk3wX+meUmSpBGw1QSf5Fer6v9VVT/3nZckSYvEXOfg/2F6IsmnBxyLJEnaQeZK8OmafsQgA5EkSTvOXAm+ZpmWJEmL2FwX2T02ye10juSXNNM081VVDxhodJIkaV62muCraueFCkSSJO042/I8eEmSNCJM8JIktZAJXpKkFjLBS5LUQiZ4SZJayAQvSVILmeAlSWohE7wkSS1kgpckqYVM8JIktZAJXpKkFjLBS5LUQgue4JMcmOTirtftSV49o85Ektu66rxxoeOUJGmUzfW42B2uqr4HHAKQZGdgI3BOj6pfq6qnL2RskiS1xbCH6I8Evl9VPxpyHJIktUqqangbT84ELqqq02csnwA+DVwLXAe8pqoum6WNlcBKgLGxscPWrFkzkFinpqZYunTpQNqerw0bbxt2CH0ZWwKb7h52FKPJvpu/7r5bvt/uww1mBC3G/3mjYiH7bsWKFeurarxX2dASfJL70knej66qTTPKHgDcU1VTSY4D3lVVB8zV5vj4eK1bt24g8U5OTjIxMTGQtudr2erzhx1CX1Yt38xpGxb8bFAr2Hfz1913V7/taUOOZvQsxv95o2Ih+y7JrAl+mEP0x9I5et80s6Cqbq+qqWb6s8B9kuy90AFKkjSqhpngTwTO7lWQ5MFJ0kwfTifOmxYwNkmSRtpQxv6S7Ao8Ffj9rmUvA6iq9wHPBf4gyWbgbuD5NcyLBSRJGjFDSfBVdSfwwBnL3tc1fTpw+sz1JElSf4b9NTlJkjQAXp4rqfVG4RsnXumvHc0jeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EImeEmSWsgEL0lSC5ngJUlqIe9FL0mLwGK7X/6q5Zs5eUZM3i9/tHgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EJDS/BJrk6yIcnFSdb1KE+Sdye5KsklSQ4dRpySJI2iYd/oZkVV3ThL2bHAAc3rCOC9zU9JkjSHxTxEfzzw4er4BrBHkn2HHZQkSaNgmAm+gC8mWZ9kZY/y/YBruuavbZZJkqQ5pKqGs+Fkv6ramGQfYC3wyqq6oKv8POBtVXVhM/9l4LVVtW5GOyuBlQBjY2OHrVmzZiDxTk1NsXTp0oG0PV8bNt427BD6MrYENt097ChGk303f/bd9unVf8v32304wYyYhcwXK1asWF9V473KhnYOvqo2Nj9vSHIOcDhwQVeVjcD+XfMPbZbNbOcM4AyA8fHxmpiYGEi8k5OTDKrt+Zr5IIjFatXyzZy2YdiXe4wm+27+7Lvt06v/rn7BxHCCGTGLJV8MZYg+ya5JdpueBo4GLp1R7Vzgd5ur6Z8A3FZV1y9wqJIkjaRhfbwdA85JMh3Dx6rq80leBlBV7wM+CxwHXAXcBfzekGKVJGnkDCXBV9UPgMf2WP6+rukCXr6QcUmS1BaL+WtykiRpnkzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphUzwkiS1kAlekqQWMsFLktRCuww7gMVs2erz/3N61fLNnNw1L0nSYuYRvCRJLbTgCT7J/km+kuS7SS5L8kc96kwkuS3Jxc3rjQsdpyRJo2wYQ/SbgVVVdVGS3YD1SdZW1Xdn1PtaVT19CPFJkjTyFvwIvqqur6qLmuk7gMuB/RY6DkmS2myo5+CTLAMeB3yzR/ETk3wnyeeSPHpBA5MkacSlqoaz4WQp8FXgrVX1mRllDwDuqaqpJMcB76qqA2ZpZyWwEmBsbOywNWvW7LAYN2y87T+nx5bAprt3WNO/VOy7+bPv5s++2z69+m/5frsPJ5gRMzU1xdKlSxdkWytWrFhfVeO9yoaS4JPcBzgP+EJVvaOP+lcD41V149bqjY+P17p163ZMkGz5NbnTNvitwvmw7+bPvps/+2779Oq/q9/2tCFFM1omJyeZmJhYkG0lmTXBD+Mq+gAfBC6fLbkneXBTjySH04nzpoWLUpKk0TaMj7dPAl4IbEhycbPsT4FfBaiq9wHPBf4gyWbgbuD5NaxzCZIkjaAFT/BVdSGQOeqcDpy+MBFJktQ+3slOkqQW8goUSVIrLFskzwvZ2rNLFvJCRY/gJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EImeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPCSJLWQCV6SpBYywUuS1EImeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXJKmFTPCSJLXQUBJ8kmOSfC/JVUlW9yj/lSQfb8q/mWTZwkcpSdLoWvAEn2Rn4G+AY4GDgBOTHDSj2ouBW6rq14B3An+xsFFKkjTahnEEfzhwVVX9oKp+CqwBjp9R53jgQ830p4Ajk2QBY5QkaaQNI8HvB1zTNX9ts6xnnaraDNwGPHBBopMkqQVSVQu7weS5wDFV9ZJm/oXAEVX1iq46lzZ1rm3mv9/UubFHeyuBlc3sgcD3BhT63sAW21df7Lv5s+/mz77bPvbf/C1k3z2sqh7Uq2CXBQqg20Zg/675hzbLetW5NskuwO7ATb0aq6ozgDMGEOe9JFlXVeOD3k4b2XfzZ9/Nn323fey/+VssfTeMIfpvAwckeXiS+wLPB86dUedc4KRm+rnAP9VCDzVIkjTCFvwIvqo2J3kF8AVgZ+DMqrosyZuBdVV1LvBB4O+TXAXcTOdDgCRJ6tMwhuipqs8Cn52x7I1d0/8BPG+h45rDwE8DtJh9N3/23fzZd9vH/pu/RdF3C36RnSRJGjxvVStJUguZ4PuU5O1JrkhySZJzkuwx7JhGSZLnJbksyT1Jhn516SiY65bO6i3JmUluaL5uq22QZP8kX0ny3ebv9Y+GHdOoSHK/JN9K8p2m79407JhM8P1bCxxcVY8B/g34kyHHM2ouBZ4DXDDsQEZBn7d0Vm9nAccMO6CRdcEAAAaKSURBVIgRtRlYVVUHAU8AXu77rm8/AZ5SVY8FDgGOSfKEYQZkgu9TVX2xuasewDfofH9ffaqqy6tqUDchaqN+bumsHqrqAjrfvtE2qqrrq+qiZvoO4HK2vNOoeqiOqWb2Ps1rqBe5meDn50XA54YdhFqtn1s6SwPTPMXzccA3hxvJ6Eiyc5KLgRuAtVU11L4bytfkFqskXwIe3KPodVX1j02d19EZxvroQsY2CvrpP0mLX5KlwKeBV1fV7cOOZ1RU1c+BQ5prtM5JcnBVDe1aEBN8l6o6amvlSU4Gng4c6Z31tjRX/2mb9HNLZ2mHS3IfOsn9o1X1mWHHM4qq6tYkX6FzLcjQErxD9H1Kcgzwx8Azq+quYcej1uvnls7SDtU8lvuDwOVV9Y5hxzNKkjxo+ttVSZYATwWuGGZMJvj+nQ7sBqxNcnGS9w07oFGS5NlJrgWeCJyf5AvDjmkxay7onL6l8+XAJ6rqsuFGNRqSnA18HTgwybVJXjzsmEbIk4AXAk9p/s9dnOS4YQc1IvYFvpLkEjof0NdW1XnDDMg72UmS1EIewUuS1EImeEmSWsgEL0lSC5ngJUlqIRO8JEktZIKXBizJ65qnS13SfO3oiGb5q5Pcfwdu5+oke2/H+hNJzmumn7kjnmCXZHJHPz0wyXiSd89SNu8+SHJqktdsX3TS4uGd7KQBSvJEOnc/PLSqftIkn/s2xa8GPgIM5cZJSXZubq25hao6lwW6sc7W4uilqtYB6wYYktQKHsFLg7UvcGNV/QSgqm6squuSvAp4CJ0bY3wFIMl7k6yb+Szp5qj0TUkuSrIhyaOa5Q9M8sWm/geAdK3zD0nWN2Uru5ZPJTktyXeAJzbPnL8iyUV0Huc7Xe/kJKc30xd3ve5O8uQkuzbPXf9Wkn9NcnxTd0mSNUkuT3IOsKRXpzT79BfNdp+X5JFJPt/E/LWufXxekkubZ2xf0CzrHmno2QdJlqXrefBJXpPk1Gb6pUm+3bT56V6jKElelc4z0S9Jsqa/X7W0uJjgpcH6IrB/kn9L8p4kTwaoqncD1wErqmpFU/d1VTUOPAZ4cpLHdLVzY1UdCrwXmB5GPgW4sKoeDZwD/GpX/RdV1WHAOPCqJA9slu8KfLN5ZvU64P3AM4DD6P2gIKrqkKo6BHhDs86/AK8D/qmqDgdWAG9PsivwB8BdVfVfmvgO20rf3FRVh1bVGuAM4JVNzK8B3tPUeSPwW028z+zRxtb6YDafqarHN21eDvS6091q4HFV9RjgZX20KS06JnhpgJrnQx8GrAT+Hfh489CiXk5ojmj/FXg0cFBX2fRDP9YDy5rp/0ZniJ+qOh+4pav+q5qj9G/QeWjNAc3yn9N5kAjAo4AfVtWVzcOTPjLbfiQ5AHg7cEJV/Qw4GlidzqMxJ4H70Umu3TFdAlwyW5vAx5u2lwK/AXyyae9v6Yx8APwzcFaSlwI792hja30wm4ObUYINwAvo9PVMlwAfTfI7dJ4eKY0cz8FLA9acX54EJpukchJwVnedJA+nc+T6+Kq6JclZdJLmtJ80P3/OHH+3SSaAo4AnVtVdSSa72vqPbTnf3bS3FPgE8NKqun56MfDfq+p7M+puS9N3Nj93Am5tRgnupape1lyU+DRgfZKtjQh028y9D2C6+/Is4FlV9Z3mw9ZEj/WfRufDwzOA1yVZ3jwfQBoZHsFLA5TkwObod9ohwI+a6TvoPMAI4AF0Et5tScaAY/to/gLgt5vtHAvs2SzfHbilSe6PAp4wy/pXAMuSPLKZP3GWemcCf1dVX+ta9gXglWkyepLH9YjpYDqnG7aqed74D5M8r1kvSR7bTD+yqr5ZVW+kMwKy/4zVZ+uDTcA+zTn6X6FzoeO03YDr03ks6gtmxpNkJ2D/qvoK8Fo6/bl0rv2QFhuP4KXBWgr8n3QeI7kZuIrOcD10zjt/Psl1VbUiyb/SSbrX0BmansubgLOTXEbnvPj/a5Z/HnhZksuB79EZpt9CVf1HcwHe+UnuAr7GLz5wAJDkYcBzgV9P8qJm8UuAtwB/DVzSJMQf0kmi7wX+rtn25XROKfTjBcB7k7weuA+wBvgOnXP7B9AZMfhys+zJc/VBVf0syZuBbwEbufdjO98AfJPOB4ZvztxnOqcCPpJk92a7766qW/vcD2nR8GlykiS1kEP0kiS1kAlekqQWMsFLktRCJnhJklrIBC9JUguZ4CVJaiETvCRJLWSClySphf4/Xc1eTqUWSgYAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(figsize=(8, 5))\n",
"model2.resid_standardized.hist(ax=ax)\n",
"ax.set_title('Histogram of standardized residuals')\n",
"ax.set_xlabel('Standardized residuals')\n",
"ax.set_ylabel('Frequency')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Breusch-Pagan test (studentized) :: chi2(1)\n",
"Test statistic: 5.8346\n",
"Test p-value: 0.7564\n"
]
}
],
"source": [
"bps_stats = heteroskedasticity_test('breusch_pagan_studentized', model2)\n",
"print('Breusch-Pagan test (studentized) :: {}'.format(bps_stats['distribution'] + '({})'.format(bps_stats['nu'])))\n",
"print('Test statistic: {:.4f}'.format(bps_stats['test_stat']))\n",
"print('Test p-value: {:.4f}'.format(bps_stats['p_value']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Second model - change base levels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose we want to run a model with different base levels for the categorical variables.\n",
"\n",
"This is relatively easy now that we have made the `ModelPipeline`.\n",
"\n",
"It's possible to do so without the model pipeline class, e.g. by manually listing the final columns in an `X_list`, but it is difficult to check whether column names are correct or if columns are missing, especially for datasets where there are multiple categorical variables or categorical variables that have multiple values."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's set up different base levels for categorical variables in an alternative form of model 2, `model2_alt`."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"raw_y_list = ['ln_price']\n",
"raw_X_list = ['type', 'ln_mpg_city', 'airbags', 'origin']\n",
"cat_base_levels = {'type': 'Small', # changed level\n",
" 'airbags': 'None',\n",
" 'origin': 'USA'}"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
ln_price
R-squared:
0.789
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.766
\n",
"
\n",
"
\n",
"
Method:
Least Squares
F-statistic:
34.46
\n",
"
\n",
"
\n",
"
Date:
Fri, 03 Jan 2020
Prob (F-statistic):
1.98e-24
\n",
"
\n",
"
\n",
"
Time:
21:40:34
Log-Likelihood:
14.159
\n",
"
\n",
"
\n",
"
No. Observations:
93
AIC:
-8.318
\n",
"
\n",
"
\n",
"
Df Residuals:
83
BIC:
17.01
\n",
"
\n",
"
\n",
"
Df Model:
9
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
const
6.1811
0.602
10.271
0.000
4.984
7.378
\n",
"
\n",
"
\n",
"
type_compact
0.1280
0.090
1.428
0.157
-0.050
0.306
\n",
"
\n",
"
\n",
"
type_large
0.2169
0.127
1.703
0.092
-0.036
0.470
\n",
"
\n",
"
\n",
"
type_midsize
0.2673
0.104
2.574
0.012
0.061
0.474
\n",
"
\n",
"
\n",
"
type_sporty
0.1140
0.099
1.153
0.252
-0.083
0.311
\n",
"
\n",
"
\n",
"
type_van
0.0085
0.131
0.065
0.949
-0.252
0.269
\n",
"
\n",
"
\n",
"
airbags_driver_&_passenger
0.3635
0.080
4.550
0.000
0.205
0.522
\n",
"
\n",
"
\n",
"
airbags_driver_only
0.2386
0.058
4.088
0.000
0.122
0.355
\n",
"
\n",
"
\n",
"
origin_non_usa
0.2322
0.050
4.632
0.000
0.132
0.332
\n",
"
\n",
"
\n",
"
ln_mpg_city
-1.2106
0.178
-6.789
0.000
-1.565
-0.856
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
Omnibus:
5.184
Durbin-Watson:
1.891
\n",
"
\n",
"
\n",
"
Prob(Omnibus):
0.075
Jarque-Bera (JB):
4.821
\n",
"
\n",
"
\n",
"
Skew:
0.555
Prob(JB):
0.0898
\n",
"
\n",
"
\n",
"
Kurtosis:
3.101
Cond. No.
95.1
\n",
"
\n",
"
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: ln_price R-squared: 0.789\n",
"Model: OLS Adj. R-squared: 0.766\n",
"Method: Least Squares F-statistic: 34.46\n",
"Date: Fri, 03 Jan 2020 Prob (F-statistic): 1.98e-24\n",
"Time: 21:40:34 Log-Likelihood: 14.159\n",
"No. Observations: 93 AIC: -8.318\n",
"Df Residuals: 83 BIC: 17.01\n",
"Df Model: 9 \n",
"Covariance Type: nonrobust \n",
"==============================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"----------------------------------------------------------------------------------------------\n",
"const 6.1811 0.602 10.271 0.000 4.984 7.378\n",
"type_compact 0.1280 0.090 1.428 0.157 -0.050 0.306\n",
"type_large 0.2169 0.127 1.703 0.092 -0.036 0.470\n",
"type_midsize 0.2673 0.104 2.574 0.012 0.061 0.474\n",
"type_sporty 0.1140 0.099 1.153 0.252 -0.083 0.311\n",
"type_van 0.0085 0.131 0.065 0.949 -0.252 0.269\n",
"airbags_driver_&_passenger 0.3635 0.080 4.550 0.000 0.205 0.522\n",
"airbags_driver_only 0.2386 0.058 4.088 0.000 0.122 0.355\n",
"origin_non_usa 0.2322 0.050 4.632 0.000 0.132 0.332\n",
"ln_mpg_city -1.2106 0.178 -6.789 0.000 -1.565 -0.856\n",
"==============================================================================\n",
"Omnibus: 5.184 Durbin-Watson: 1.891\n",
"Prob(Omnibus): 0.075 Jarque-Bera (JB): 4.821\n",
"Skew: 0.555 Prob(JB): 0.0898\n",
"Kurtosis: 3.101 Cond. No. 95.1\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model2_alt = (ModelPipeline(df_raw, raw_y_list, raw_X_list, cat_base_levels)\n",
" .get_model()\n",
" .fit())\n",
"model2_alt.results_output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the alternative base levels, of course we still have the same model performance (e.g. R-squared) but different dummy columns are shown."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"