{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [
"# Fitting Linear Models with Custom Loss Functions and Regularization in Python\n",
"\n",
"### by [Alex P. Miller](https://alex.miller.im/) ([@alexpmil](https://twitter.com/alexpmil))\n",
"\n",
"As part of a predictive model competition I participated in earlier this month, I found myself trying to accomplish a peculiar task. The challenge organizers were going to use \"mean absolute percentage error\" (MAPE) as their criterion for model evaluation. Since this is not a standard loss function built into most software, I decided to write my own code to train a model that would use the MAPE in its objective function. \n",
"\n",
"I started by searching through the SciKit-Learn documentation on [linear models](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) to see if the model I needed had already been developed somewhere. I thought that the `sklearn.linear_model.RidgeCV` class would accomplish what I wanted (MAPE minimization with L2 regularization), but I could not get the `scoring` argument (which supposedly lets you pass a custom loss function to the model class) to behave as I expected it to.\n",
"\n",
"While I highly recommend searching through existing packages to see if the model you want already exists, you should (in theory) be able to use this notebook as a template for building linear models with an arbitrary loss function and regularization scheme."
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Python Code\n",
"\n",
"I'll be using a Jupyter Notebook (running Python 3) to build my model. 
If you're reading this on my website, you can find [the raw .ipynb file linked here](https://github.com/alexmill/website_notebooks/blob/master/custom-loss-function-regularization-python.ipynb); you can also run a fully executable version of the notebook on the Binder platform by [clicking here](https://mybinder.org/v2/gh/alexmill/website_notebooks/master?filepath=custom-loss-function-regularization-python.ipynb).\n",
"\n",
"We'll start with some basic imports:"
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import pandas as pd\n",
"from matplotlib import pyplot as plt\n",
"from sklearn.preprocessing import StandardScaler"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Load Your Data\n",
"\n",
"For the purposes of this walkthrough, I'll need to generate some raw data. Presumably, if you've found yourself here, you will want to substitute this step with one where you load your own data. \n",
"\n",
"I am simulating a scenario where I have 100 observations on 10 features (9 features and an intercept). The \"true\" function will simply be a linear function of these features: $y=X\\beta$. However, we want to simulate observing these data with noise. 
Because I'm mostly going to be focusing on the \"mean absolute percentage error\" loss function, I want my noise to be on an exponential scale, which is why I am taking exponents/logs below:\n",
"\n",
"$$y = e^{\\log(X\\beta) + \\varepsilon}, \\; \\varepsilon \\sim \\mathcal{N}(0, 0.2^2)$$"
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [
"# Generate predictors\n",
"X_raw = np.random.random(100*9)\n",
"X_raw = np.reshape(X_raw, (100, 9))\n",
"\n",
"# Standardize the predictors\n",
"scaler = StandardScaler().fit(X_raw)\n",
"X = scaler.transform(X_raw)\n",
"\n",
"# Add an intercept column to the model.\n",
"# (np.abs makes every entry non-negative, so X*beta stays\n",
"# positive for the positive betas defined below)\n",
"X = np.abs(np.concatenate((np.ones((X.shape[0],1)), X), axis=1))\n",
"\n",
"# Define my \"true\" beta coefficients\n",
"beta = np.array([2,6,7,3,5,7,1,2,2,8])\n",
"\n",
"# Y = Xb\n",
"Y_true = np.matmul(X,beta)\n",
"\n",
"# Observed data with noise\n",
"Y = Y_true*np.exp(np.random.normal(loc=0.0, scale=0.2, size=100))"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Define your custom loss function\n",
"\n",
"I am mainly going to focus on the MAPE loss function in this notebook, but this is where you would substitute in your own loss function (if applicable). MAPE is defined as follows:\n",
"\n",
"### Mean Absolute Percentage Error (MAPE)\n",
"\n",
"$$ \\text{error}(\\beta) = \\frac{100}{n} \\sum_{i=1}^{n}\\left| \\frac{y_i - X_i\\beta}{y_i} \\right|$$\n",
"\n",
"While I won't go into too much detail here, I ended up using a *weighted* MAPE criterion to fit the model I used in the data science competition. 
Given a set of sample weights $w_i$, you can define the weighted MAPE loss function using the following formula:\n",
"\n",
"### Weighted MAPE\n",
"\n",
"$$\\text{error}(\\beta) = 100 \\left( \\sum_{i=1}^N w_i \\right)^{-1} \\sum_{i=1}^N w_i \\left| \\frac{y_i - X_i\\beta}{y_i} \\right|$$"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"In Python, the MAPE can be calculated with the function below:"
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [
"def mean_absolute_percentage_error(y_pred, y_true, sample_weights=None):\n",
"    y_true = np.array(y_true)\n",
"    y_pred = np.array(y_pred)\n",
"    assert len(y_true) == len(y_pred)\n",
"\n",
"    if np.any(y_true==0):\n",
"        print(\"Found zeroes in y_true. MAPE undefined. Removing from set...\")\n",
"        idx = np.where(y_true==0)\n",
"        y_true = np.delete(y_true, idx)\n",
"        y_pred = np.delete(y_pred, idx)\n",
"        if sample_weights is not None:\n",
"            sample_weights = np.array(sample_weights)\n",
"            sample_weights = np.delete(sample_weights, idx)\n",
"\n",
"    if sample_weights is None:\n",
"        return(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)\n",
"    else:\n",
"        sample_weights = np.array(sample_weights)\n",
"        assert len(sample_weights) == len(y_true)\n",
"        return(100/sum(sample_weights)*np.dot(\n",
"            sample_weights, (np.abs((y_true - y_pred) / y_true))\n",
"        ))\n",
"\n",
"loss_function = mean_absolute_percentage_error"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Fitting a simple linear model with custom loss function\n",
"\n",
"You may know that the traditional method for fitting linear models, ordinary least squares, has a nice analytic solution. 
This means that the \"optimal\" model parameters that minimize the squared error of the model can be calculated directly from the input data:\n",
"\n",
"$$ \\hat\\beta = \\arg\\min_\\beta \\frac{1}{n} \\sum_{i=1}^n (y_i - X_i\\beta)^2 = (X^\\mathrm{T}X)^{-1}X^\\mathrm{T}y $$\n",
"\n",
"However, with an arbitrary loss function, there is no guarantee that finding the optimal parameters can be done so easily. To keep this notebook as generalizable as possible, I'm going to be minimizing our custom loss functions using numerical optimization techniques (similar to the \"solver\" functionality in Excel). In general terms, the $\\beta$ we want to fit can be found as the solution to the following optimization problem (where I've substituted in the MAPE for the error function in the last expression):\n",
"\n",
"$$ \\hat\\beta = \\arg\\min_\\beta \\; \\text{error}(\\beta) = \\arg\\min_\\beta \\frac{100}{n} \\sum_{i=1}^{n}\\left| \\frac{y_i - X_i\\beta}{y_i} \\right| $$\n",
"\n",
"Essentially we want to search over the space of all $\\beta$ values and find the value that minimizes our chosen error function. To get a flavor for what this looks like in Python, I'll fit a simple MAPE model below, using the `minimize` function from SciPy. "
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"[ 9.08252394 5.54995839 8.75233095 1.1883712 3.29482497 5.03886496\n",
" -0.22556182 0.38830739 3.15524308 5.24599191]\n"
] } ], "source": [
"from scipy.optimize import minimize\n",
"\n",
"def objective_function(beta, X, Y):\n",
"    error = loss_function(np.matmul(X,beta), Y)\n",
"    return(error)\n",
"\n",
"# You must provide a starting point at which to initialize\n",
"# the parameter search space\n",
"beta_init = np.array([1]*X.shape[1])\n",
"result = minimize(objective_function, beta_init, args=(X,Y),\n",
"                  method='BFGS', options={'maxiter': 500})\n",
"\n",
"# The optimal values for the input parameters are stored\n",
"# in result.x\n",
"beta_hat = result.x\n",
"print(beta_hat)"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"We can compare the estimated betas to the true model betas that we initialized at the beginning of this notebook:"
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>true_beta</th><th>estimated_beta</th><th>error</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2</td><td>9.082524</td><td>-7.082524</td></tr>\n",
"    <tr><th>1</th><td>6</td><td>5.549958</td><td>0.450042</td></tr>\n",
"    <tr><th>2</th><td>7</td><td>8.752331</td><td>-1.752331</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>1.188371</td><td>1.811629</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>3.294825</td><td>1.705175</td></tr>\n",
"    <tr><th>5</th><td>7</td><td>5.038865</td><td>1.961135</td></tr>\n",
"    <tr><th>6</th><td>1</td><td>-0.225562</td><td>1.225562</td></tr>\n",
"    <tr><th>7</th><td>2</td><td>0.388307</td><td>1.611693</td></tr>\n",
"    <tr><th>8</th><td>2</td><td>3.155243</td><td>-1.155243</td></tr>\n",
"    <tr><th>9</th><td>8</td><td>5.245992</td><td>2.754008</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
], "text/plain": [
" true_beta estimated_beta error\n",
"0 2 9.082524 -7.082524\n",
"1 6 5.549958 0.450042\n",
"2 7 8.752331 -1.752331\n",
"3 3 1.188371 1.811629\n",
"4 5 3.294825 1.705175\n",
"5 7 5.038865 1.961135\n",
"6 1 -0.225562 1.225562\n",
"7 2 0.388307 1.611693\n",
"8 2 3.155243 -1.155243\n",
"9 8 5.245992 2.754008"
] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [
"pd.DataFrame({\n",
"    \"true_beta\": beta, \n",
"    \"estimated_beta\": beta_hat,\n",
"    \"error\": beta-beta_hat\n",
"})[[\"true_beta\", \"estimated_beta\", \"error\"]]"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"It's obviously not perfect, but we can see that our estimated values are at least in the ballpark of the true values. We can also calculate the final MAPE of our estimated model using our loss function:"
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14.354033248368872" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [
"loss_function(np.matmul(X,beta_hat), Y)"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Incorporating Regularization into Model Fitting\n",
"\n",
"The process described above fits a simple linear model to the data provided by directly minimizing a custom loss function (MAPE, in this case). However, in many machine learning problems, you will want to [regularize](https://en.wikipedia.org/wiki/Regularization_(mathematics)) your model parameters to prevent overfitting. In this notebook, I'm going to walk through the process of incorporating L2 regularization, which amounts to penalizing your model's parameters by the square of their magnitude. \n",
"\n",
"In precise terms, rather than minimizing our loss function directly, we will augment our loss function by adding a squared penalty term on our model's coefficients. 
With L2 regularization, our new loss function becomes:\n",
"\n",
"$$ L(\\beta) = \\frac{100}{N} \\sum_{i=1}^N \\left| \\frac{y_i - X_i\\beta}{y_i} \\right| + \\lambda \\sum_{k=1}^K \\beta_k^2 $$\n",
"\n",
"Or, in the case that sample weights are provided:\n",
"\n",
"$$ L(\\beta) = 100 \\left( \\sum_{i=1}^N w_i \\right)^{-1} \\sum_{i=1}^N w_i \\left| \\frac{y_i - X_i\\beta}{y_i} \\right| + \\lambda \\sum_{k=1}^K \\beta_k^2 $$\n",
"\n",
"For now, we will assume that the $\\lambda$ coefficient (the regularization parameter) is already known. However, later we will use cross validation to find the optimal $\\lambda$ value for our data.\n"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"Since our model is getting a little more complicated, I'm going to define a Python class with an attribute and method scheme very similar to those found in SciKit-Learn (e.g., `sklearn.linear_model.Lasso` or `sklearn.ensemble.RandomForestRegressor`)."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [
"class CustomLinearModel:\n",
"    \"\"\"\n",
"    Linear model: Y = XB, fit by minimizing the provided loss_function\n",
"    with L2 regularization\n",
"    \"\"\"\n",
"    def __init__(self, loss_function=mean_absolute_percentage_error, \n",
"                 X=None, Y=None, sample_weights=None, beta_init=None, \n",
"                 regularization=0.00012):\n",
"        self.regularization = regularization\n",
"        self.beta = None\n",
"        self.loss_function = loss_function\n",
"        self.sample_weights = sample_weights\n",
"        self.beta_init = beta_init\n",
"\n",
"        self.X = X\n",
"        self.Y = Y\n",
"\n",
"\n",
"    def predict(self, X):\n",
"        prediction = np.matmul(X, self.beta)\n",
"        return(prediction)\n",
"\n",
"    def model_error(self):\n",
"        error = self.loss_function(\n",
"            self.predict(self.X), self.Y, sample_weights=self.sample_weights\n",
"        )\n",
"        return(error)\n",
"\n",
"    def l2_regularized_loss(self, beta):\n",
"        self.beta = beta\n",
"        return(self.model_error() + \\\n",
"               sum(self.regularization*np.array(self.beta)**2))\n",
"\n",
"    def fit(self, maxiter=500):\n",
"        # Initialize beta estimates (you may need to normalize\n",
"        # your data and choose smarter initialization values\n",
"        # depending on the shape of your loss function)\n",
"        if self.beta_init is None:\n",
"            # set beta_init = 1 for every feature\n",
"            self.beta_init = np.array([1]*self.X.shape[1])\n",
"        else:\n",
"            # Use provided initial values\n",
"            pass\n",
"\n",
"        # `is not None` avoids an ambiguous elementwise comparison\n",
"        # when self.beta is already a NumPy array\n",
"        if self.beta is not None and all(self.beta_init == self.beta):\n",
"            print(\"Model already fit once; continuing fit with more iterations.\")\n",
"\n",
"        res = minimize(self.l2_regularized_loss, self.beta_init,\n",
"                       method='BFGS', options={'maxiter': maxiter})\n",
"        self.beta = res.x\n",
"        self.beta_init = self.beta"
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [
"array([ 8.70454035, 5.56955027, 8.82671937, 1.10660836, 3.36271348,\n",
" 5.16710648, -0.08675964, 0.4776243 , 3.12646051, 5.28643399])"
] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [
"l2_mape_model = CustomLinearModel(\n",
"    loss_function=mean_absolute_percentage_error,\n",
"    X=X, Y=Y, regularization=0.00012\n",
")\n",
"l2_mape_model.fit()\n",
"l2_mape_model.beta"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"Just to confirm that our regularization did work, let's make sure that the estimated betas found with regularization are different from those found without regularization (which we calculated earlier):"
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>true_beta</th><th>estimated_beta</th><th>regularized_beta</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2</td><td>9.082524</td><td>8.704540</td></tr>\n",
"    <tr><th>1</th><td>6</td><td>5.549958</td><td>5.569550</td></tr>\n",
"    <tr><th>2</th><td>7</td><td>8.752331</td><td>8.826719</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>1.188371</td><td>1.106608</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>3.294825</td><td>3.362713</td></tr>\n",
"    <tr><th>5</th><td>7</td><td>5.038865</td><td>5.167106</td></tr>\n",
"    <tr><th>6</th><td>1</td><td>-0.225562</td><td>-0.086760</td></tr>\n",
"    <tr><th>7</th><td>2</td><td>0.388307</td><td>0.477624</td></tr>\n",
"    <tr><th>8</th><td>2</td><td>3.155243</td><td>3.126461</td></tr>\n",
"    <tr><th>9</th><td>8</td><td>5.245992</td><td>5.286434</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " true_beta estimated_beta regularized_beta\n", "0 2 9.082524 8.704540\n", "1 6 5.549958 5.569550\n", "2 7 8.752331 8.826719\n", "3 3 1.188371 1.106608\n", "4 5 3.294825 3.362713\n", "5 7 5.038865 5.167106\n", "6 1 -0.225562 -0.086760\n", "7 2 0.388307 0.477624\n", "8 2 3.155243 3.126461\n", "9 8 5.245992 5.286434" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({\n", " \"true_beta\": beta, \n", " \"estimated_beta\": beta_hat,\n", " \"regularized_beta\": l2_mape_model.beta\n", "})[[\"true_beta\", \"estimated_beta\", \"regularized_beta\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since our regularization parameter is so small, we can see that it didn't affect our coefficient estimates dramatically. But the fact that the betas are different between the two models indicates that our regularization does seem to be working." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to make sure things are in the realm of common sense, it's never a bad idea to plot your predicted Y against our observed Y." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Predicted Y vs. observed Y\n", "plt.scatter(l2_mape_model.predict(X), Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Important Caveat: Standardize Your Predictors\n", "> In most applications, your features will be measured on many different scales; however you'll notice in the loss function described above, each $\\beta_k$ parameter is being penalized by the same amount ($\\lambda$). Best practice when using L2 regularization is to **standardize your feature matrix** (subtract the mean off of each column and divide the result by the column standard deviation). This will ensure that all features are on approximately the same scale and that the regularization parameter has an equal impact on all $\\beta_k$ coefficients.\n", "\n", "> I standardized my data at the very beginning of this notebook, but typically you will need to work standardization into your data pipeline. Use `sklearn.preprocessing.StandardScaler` and keep track of your intercept when going through this process!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross Validation to Identify Optimal Regularization Parameter\n", "\n", "At this point, we have a model class that will find the optimal beta coefficients to minimize the loss function described above with a given regularization parameter. Of course, your regularization parameter $\\lambda$ will not typically fall from the sky. Below I've included some code that uses cross validation to find the optimal $\\lambda$, among the set of candidates provided by the user." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [
"from sklearn.model_selection import KFold\n",
"\n",
"# Used to cross-validate models and identify optimal lambda\n",
"class CustomCrossValidator:\n",
"\n",
"    \"\"\"\n",
"    Cross validates arbitrary model using MAPE criterion on\n",
"    list of lambdas.\n",
"    \"\"\"\n",
"    def __init__(self, X, Y, ModelClass,\n",
"                 sample_weights=None,\n",
"                 loss_function=mean_absolute_percentage_error):\n",
"\n",
"        self.X = X\n",
"        self.Y = Y\n",
"        self.ModelClass = ModelClass\n",
"        self.loss_function = loss_function\n",
"        self.sample_weights = sample_weights\n",
"\n",
"    def cross_validate(self, lambdas, num_folds=10):\n",
"        \"\"\"\n",
"        lambdas: set of regularization parameters to try\n",
"        num_folds: number of folds to cross-validate against\n",
"        \"\"\"\n",
"\n",
"        self.lambdas = lambdas\n",
"        self.cv_scores = []\n",
"        X = self.X\n",
"        Y = self.Y\n",
"\n",
"        # Beta values are not likely to differ dramatically\n",
"        # between different folds. Keeping track of the estimated\n",
"        # beta coefficients and passing them as starting values\n",
"        # to the .fit() operator on our model class can significantly\n",
"        # lower the time it takes for the minimize() function to run\n",
"        beta_init = None\n",
"\n",
"        for lam in self.lambdas:\n",
"            print(\"Lambda: {}\".format(lam))\n",
"\n",
"            # Split data into training/holdout sets\n",
"            kf = KFold(n_splits=num_folds, shuffle=True)\n",
"            kf.get_n_splits(X)\n",
"\n",
"            # Keep track of the error for each holdout fold\n",
"            k_fold_scores = []\n",
"\n",
"            # Iterate over folds, using k-1 folds for training\n",
"            # and the k-th fold for validation\n",
"            f = 1\n",
"            for train_index, test_index in kf.split(X):\n",
"                # Training data\n",
"                CV_X = X[train_index,:]\n",
"                CV_Y = Y[train_index]\n",
"                CV_weights = None\n",
"                if self.sample_weights is not None:\n",
"                    CV_weights = self.sample_weights[train_index]\n",
"\n",
"                # Holdout data\n",
"                holdout_X = X[test_index,:]\n",
"                holdout_Y = Y[test_index]\n",
"                holdout_weights = None\n",
"                if self.sample_weights is not None:\n",
"                    holdout_weights = self.sample_weights[test_index]\n",
"\n",
"                # Fit model to training sample\n",
"                lambda_fold_model = self.ModelClass(\n",
"                    regularization=lam,\n",
"                    X=CV_X,\n",
"                    Y=CV_Y,\n",
"                    sample_weights=CV_weights,\n",
"                    beta_init=beta_init,\n",
"                    loss_function=self.loss_function\n",
"                )\n",
"                lambda_fold_model.fit()\n",
"\n",
"                # Extract beta values to pass as beta_init \n",
"                # to speed up estimation of the next fold\n",
"                beta_init = lambda_fold_model.beta\n",
"\n",
"                # Calculate holdout error (predictions first, observed\n",
"                # values second, matching the function signature)\n",
"                fold_preds = lambda_fold_model.predict(holdout_X)\n",
"                fold_mape = mean_absolute_percentage_error(\n",
"                    fold_preds, holdout_Y, sample_weights=holdout_weights\n",
"                )\n",
"                k_fold_scores.append(fold_mape)\n",
"                print(\"Fold: {}. Error: {}\".format( f, fold_mape))\n",
"                f += 1\n",
"\n",
"            # Error associated with each lambda is the average\n",
"            # of the errors across the k folds\n",
"            lambda_scores = np.mean(k_fold_scores)\n",
"            print(\"LAMBDA AVERAGE: {}\".format(lambda_scores))\n",
"            self.cv_scores.append(lambda_scores)\n",
"\n",
"        # Optimal lambda is that which minimizes the cross-validation error\n",
"        self.lambda_star_index = np.argmin(self.cv_scores)\n",
"        self.lambda_star = self.lambdas[self.lambda_star_index]\n",
"        print(\"\\n\\n**OPTIMAL LAMBDA: {}**\".format(self.lambda_star))"
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
"Lambda: 1\n",
"Fold: 1. Error: 281.675016261853\n",
"Fold: 2. Error: 288.38164186977633\n",
"Fold: 3. Error: 252.80554916202297\n",
"Fold: 4. Error: 274.83039775112576\n",
"Fold: 5. Error: 249.91013817877382\n",
"LAMBDA AVERAGE: 269.5205486447104\n",
"Lambda: 0.1\n",
"Fold: 1. Error: 21.494107244928493\n",
"Fold: 2. Error: 20.40295905215266\n",
"Fold: 3. Error: 18.66240844943417\n",
"Fold: 4. Error: 25.116437906551965\n",
"Fold: 5. Error: 22.90006062583064\n",
"LAMBDA AVERAGE: 21.715194655779584\n",
"Lambda: 0.01\n",
"Fold: 1. Error: 17.89961806856108\n",
"Fold: 2. Error: 20.60101543589219\n",
"Fold: 3. Error: 15.300577288722952\n",
"Fold: 4. Error: 16.103828700399553\n",
"Fold: 5. Error: 24.36922875047351\n",
"LAMBDA AVERAGE: 18.854853648809858\n",
"Lambda: 0.001\n",
"Fold: 1. Error: 22.120515998293445\n",
"Fold: 2. Error: 12.805498902418814\n",
"Fold: 3. Error: 17.399272285579485\n",
"Fold: 4. Error: 18.907906539945323\n",
"Fold: 5. Error: 15.496265314676894\n",
"LAMBDA AVERAGE: 17.34589180818279\n",
"Lambda: 0.0001\n",
"Fold: 1. Error: 18.90513051308685\n",
"Fold: 2. Error: 17.138574318436756\n",
"Fold: 3. Error: 24.855574956251054\n",
"Fold: 4. Error: 18.797927727509116\n",
"Fold: 5. 
Error: 18.42840150874861\n", "LAMBDA AVERAGE: 19.625121804806476\n", "Lambda: 1e-05\n", "Fold: 1. Error: 23.92337561024042\n", "Fold: 2. Error: 17.214572854892992\n", "Fold: 3. Error: 17.61002265462196\n", "Fold: 4. Error: 21.151712858887713\n", "Fold: 5. Error: 16.60731580859065\n", "LAMBDA AVERAGE: 19.301399957446744\n", "Lambda: 1e-06\n", "Fold: 1. Error: 17.106159456538276\n", "Fold: 2. Error: 15.874651443835047\n", "Fold: 3. Error: 15.931341544180578\n", "Fold: 4. Error: 27.61864882348346\n", "Fold: 5. Error: 13.632013387958775\n", "LAMBDA AVERAGE: 18.032562931199227\n", "\n", "\n", "**OPTIMAL LAMBDA: 0.001**\n" ] } ], "source": [ "# User must specify lambdas over which to search\n", "lambdas = [1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]\n", "\n", "cross_validator = CustomCrossValidator(\n", " X, Y, CustomLinearModel,\n", " loss_function=mean_absolute_percentage_error\n", ")\n", "cross_validator.cross_validate(lambdas, num_folds=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After identifying the optimal $\\lambda$ for your model/dataset, you will want to fit your final model using this value on the entire training dataset." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7.75056473, 5.66556818, 8.82981011, 1.08610394, 3.58484621,\n", " 5.52614915, 0.21737665, 0.63868509, 3.04998495, 5.39175218])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lambda_star = cross_validator.lambda_star\n", "final_model = CustomLinearModel(\n", " loss_function=mean_absolute_percentage_error,\n", " X=X, Y=Y, regularization=lambda_star\n", ")\n", "final_model.fit()\n", "final_model.beta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can then generate out-of-sample predictions using this final, fully optimized model." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([23.28912996, 30.44766829, 32.25209686, 25.3555125 , 21.03657229,\n", " 16.94769912, 20.11264239, 28.44548273, 20.53333071, 28.27208959])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_data = np.random.random((10,10))\n", "final_model.predict(test_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }