{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classification metrics\n", "Author: Geraldine Klarenberg\n", "\n", "Based on the Google Machine Learning Crash Course" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Thresholds\n", "In previous lessons, we talked about using regression models to predict values. But sometimes we are interested in **classifying** things: \"spam\" vs \"not spam\", \"barking\" vs \"not barking\", etc. \n", "\n", "Logistic regression is a great tool to use in ML classification models. We can use the outputs from these models by defining **classification thresholds**. For instance, with a threshold of 0.8: if our model tells us there is a probability of 0.8 or more that an email is spam (based on some characteristics), the model classifies it as \"spam\". If the probability estimate is less than 0.8, the model classifies it as \"not spam\". The threshold allows us to map a logistic regression value (a probability) to a binary category (the prediction).\n", "\n", "Thresholds are problem-dependent, so they have to be tuned for the specific problem you are dealing with.\n", "\n", "In this lesson we will look at metrics you can use to evaluate a classification model's predictions, and at what changing the threshold does to your model and predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## True, false, positive, negative...\n", "\n", "Now, we could simply look at \"accuracy\": the ratio of all correct predictions to all predictions. This is simple, intuitive and straightforward. \n", "\n", "But there are some problems with this approach:\n", "* it does not work well if there is class imbalance, i.e. situations where positive or negative outcomes are rare; \n", "* and, most importantly, different kinds of mistakes can have different costs..." 
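, "\n", "\n", "To make the class-imbalance problem concrete, here is a minimal sketch with made-up labels (purely illustrative, not from a real data set): 95 emails are \"not spam\" (0), 5 are \"spam\" (1), and a useless model simply predicts \"not spam\" for every email.\n", "\n", "```python\n", "import numpy as np\n", "\n", "# hypothetical labels: 95 negatives (not spam) and 5 positives (spam)\n", "y_true = np.array([0] * 95 + [1] * 5)\n", "# a 'model' that always predicts the negative class\n", "y_pred = np.zeros_like(y_true)\n", "\n", "# accuracy = correct predictions / all predictions\n", "accuracy = np.mean(y_true == y_pred)\n", "print(accuracy)  # 0.95\n", "```\n", "\n", "The accuracy is 0.95, yet the model did not catch a single spam email: the high score comes entirely from the dominant negative class."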
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The boy who cried wolf...\n", "\n", "We all know the story!\n", "\n", "![Illustration of the boy who cried wolf](../nb-images/wolfpic.jpg)\n", "\n", "For this example, we define \"there actually is a wolf\" as the positive class, and \"there is no wolf\" as the negative class. The predictions that a model makes can be true or false for both classes, generating 4 outcomes:\n", "\n", "![A table showing a confusion matrix based on the story of the boy who cried wolf](../nb-images/confusionmatrix_wolf.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This table is also called a *confusion matrix*.\n", "\n", "Two metrics we can derive from these outcomes are precision and recall.\n", "\n", "## Precision\n", "Precision answers the question: what proportion of positive predictions was actually correct?\n", "\n", "To calculate the precision of your model, divide the true positives by *all* positive predictions (true positives plus false positives):\n", "$$\\text{Precision} = \\frac{TP}{TP+FP}$$\n", "\n", "Basically: **when the model cried 'wolf', how often was there really a wolf?**\n", "\n", "**NB** If your model produces no false positives, the precision is 1.0. Precision can never exceed 1.0: the more false positives the model produces, the lower the precision." 
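, "\n", "\n", "As a minimal sketch, the formula can be written as a small Python function. `precision` is a hypothetical helper name (not a library function), and the counts are made-up illustration values, not the exercise values below.\n", "\n", "```python\n", "def precision(tp, fp):\n", "    # proportion of positive predictions that were actually positive\n", "    # (undefined, i.e. division by zero, if the model makes no positive predictions)\n", "    return tp / (tp + fp)\n", "\n", "# hypothetical counts: the model cried 'wolf' 4 times and was right 3 times\n", "print(precision(tp=3, fp=1))  # 0.75\n", "# no false positives gives a precision of 1.0\n", "print(precision(tp=3, fp=0))  # 1.0\n", "```"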
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "Calculate the precision of a model with the following outcomes:\n", "\n", "true positives (TP): 1 | false positives (FP): 1 \n", "-------|--------\n", "**false negatives (FN): 8** | **true negatives (TN): 90** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recall\n", "Recall answers the question: what proportion of actual positives was identified correctly?\n", "\n", "To calculate recall, divide the true positives by the true positives plus the false negatives:\n", "$$\\text{Recall} = \\frac{TP}{TP+FN}$$\n", "\n", "Basically: **of all the wolves that tried to get into the village, how many did the model detect?**\n", "\n", "**NB** If the model produces no false negatives, recall equals 1.0." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "For the same confusion matrix as above, calculate the recall." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Balancing precision and recall\n", "To evaluate your model, you should look at **both** precision and recall. They are often in tension, though: improving one typically reduces the other.\n", "Lowering the classification threshold improves recall (your model will cry wolf at every little sound it hears) but hurts precision (it will cry wolf too often)." 
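, "\n", "\n", "A minimal sketch of this trade-off, assuming made-up predicted probabilities and labels (they do not come from any model in this notebook): the same scores are classified at three thresholds, and precision and recall are computed from the resulting counts.\n", "\n", "```python\n", "import numpy as np\n", "\n", "# hypothetical predicted probabilities of 'wolf' and the true labels (1 = wolf)\n", "scores = np.array([0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05])\n", "labels = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0])\n", "\n", "for threshold in [0.8, 0.5, 0.25]:\n", "    # cry 'wolf' for every score at or above the threshold\n", "    predicted = scores >= threshold\n", "    tp = np.sum(predicted & (labels == 1))\n", "    fp = np.sum(predicted & (labels == 0))\n", "    fn = np.sum(~predicted & (labels == 1))\n", "    print(threshold, 'precision:', tp / (tp + fp), 'recall:', tp / (tp + fn))\n", "```\n", "\n", "Lowering the threshold from 0.8 to 0.25 raises recall from 0.4 to 0.8, while precision drops from 1.0 to about 0.57."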
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise\n", "#### Part 1\n", "Look at the outputs of a model that classifies incoming emails as \"spam\" or \"not spam\".\n", "\n", "![Image of outcomes of a spam/not spam classification model](../nb-images/PrecisionVsRecallBase.svg)\n", "\n", "The confusion matrix looks as follows:\n", "\n", "true positives (TP): 8 | false positives (FP): 2 \n", "-------|--------\n", "**false negatives (FN): 3** | **true negatives (TN): 17** \n", "\n", "Calculate the precision and recall for this model.\n", "\n", "#### Part 2\n", "Now see what happens to the outcomes (below) if we increase the threshold.\n", "\n", "![Image of outcomes of a spam/not spam classification model](../nb-images/PrecisionVsRecallRaiseThreshold.svg)\n", "\n", "The confusion matrix now looks as follows:\n", "\n", "true positives (TP): 7 | false positives (FP): 1 \n", "-------|--------\n", "**false negatives (FN): 4** | **true negatives (TN): 18** \n", "\n", "Calculate the precision and recall again.\n", "\n", "**Compare the precision and recall at the first and the second threshold. What do you notice?** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate model performance\n", "We can evaluate the performance of a classification model at all classification thresholds. For each threshold, calculate the *true positive rate* and the *false positive rate*. The true positive rate is synonymous with recall (and is sometimes called *sensitivity*), and is thus calculated as\n", "\n", "$$\\text{TPR} = \\frac{TP}{TP + FN}$$\n", "\n", "The false positive rate is the proportion of actual negatives that are incorrectly classified as positive (it equals 1 minus the *specificity*):\n", "\n", "$$\\text{FPR} = \\frac{FP}{FP + TN}$$\n", "\n", "When you plot the pairs of TPR and FPR for all the different thresholds, you get a Receiver Operating Characteristic (ROC) curve. Below is a typical ROC curve.\n", "\n", "![Image of an ROC curve](../nb-images/ROCCurve.svg)\n", "\n", "To evaluate the model, we look at the area under the curve (AUC). 
The AUC has a probabilistic interpretation: it represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example when the examples are ranked by their logistic regression score.\n", "\n", "![Image with predictions ranked according to logistic regression score](../nb-images/AUCPredictionsRanked.svg)\n", "\n", "So an AUC of 0.9 means there is a 90% probability that the model ranks a randomly chosen positive example above a randomly chosen negative example. Below are a few visualizations of AUC results. On top are the distributions of the predicted scores for the negative and positive classes; below is the corresponding ROC curve.\n", "\n", "![Image with distributions of positive and negative classes - perfect](../nb-images/TowardsDataScienceAUC_perfect.png) \n", "![Image with AUC - perfect](../nb-images/TowardsDataScienceAUC_perfect2.png)\n", "**This AUC suggests a perfect model** (which is suspicious!)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image with distributions of positive and negative classes - normal](../nb-images/TowardsDataScienceAUC_normal.png)\n", "![Image with AUC - normal](../nb-images/TowardsDataScienceAUC_normal2.png)\n", "**This is what most AUCs look like**. In this case, AUC = 0.7 means that there is a 70% chance that the model will correctly distinguish a random positive example from a random negative one." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Image with distributions of positive and negative classes - worst](../nb-images/TowardsDataScienceAUC_worst.png)\n", "![Image with AUC - worst](../nb-images/TowardsDataScienceAUC_worst2.png)\n", "**This is actually the worst case scenario.** This model (AUC = 0.5) has no discrimination capacity at all: it does no better than random guessing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction bias\n", "Logistic regression predictions should be unbiased, meaning that the average of the predictions should be more or less equal to the average of the observations. 
**Prediction bias** is the difference between the average of the predictions and the average of the labels in a data set.\n", "\n", "This check is not perfect: a model that almost always predicts the average label will show little prediction bias, even though it is not a good model. However, if there **is** bias (\"significant nonzero bias\"), something is going on that needs to be checked; specifically, the model is wrong about the frequency of positive labels.\n", "\n", "Possible root causes of prediction bias are:\n", "* Incomplete feature set\n", "* Noisy data set\n", "* Buggy pipeline\n", "* Biased training sample\n", "* Overly strong regularization\n", "\n", "### Buckets and prediction bias\n", "For logistic regression, this process is a bit more involved, as the label assigned to an example is either 0 or 1. So you cannot accurately determine the prediction bias from a single example. You need to group the data in \"buckets\" and examine the prediction bias per bucket. Prediction bias for logistic regression only makes sense when grouping enough examples together to be able to compare a predicted value (for example, 0.392) to an observed value (for example, 0.394). \n", "\n", "You can create buckets by splitting the range of predictions into equal-width intervals (linear bucketing), or by using quantiles. \n", "\n", "The plot below is a calibration plot. Each dot represents a bucket with 1000 values. On the x-axis is the average prediction for that bucket and on the y-axis the average of the actual observations. Note that both axes are on logarithmic scales.\n", "\n", "![Image of a calibration plot with buckets](../nb-images/BucketingBias.svg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coding\n", "Recall the logistic regression model we made in the previous lesson. 
That was a perfect fit, so it is not that useful for illustrating the metrics we just discussed.\n", "\n", "In the scatter plot of sepal length against petal width, it is clear that the other two iris species are less well separated. Let's use one of these as an example. We'll rework the example so that we classify irises as \"virginica\" or \"not virginica\". " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sepal_length(cm) sepal_width(cm) petal_length(cm) petal_width(cm) \\\n", "0 5.1 3.5 1.4 0.2 \n", "1 4.9 3.0 1.4 0.2 \n", "2 4.7 3.2 1.3 0.2 \n", "3 4.6 3.1 1.5 0.2 \n", "4 5.0 3.6 1.4 0.2 \n", "\n", " species_id species_name \n", "0 0 setosa \n", "1 0 setosa \n", "2 0 setosa \n", "3 0 setosa \n", "4 0 setosa " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import load_iris\n", "import pandas as pd\n", "\n", "iris = load_iris()\n", "\n", "X = iris.data\n", "y = iris.target\n", "\n", "df = pd.DataFrame(X, \n", " columns = ['sepal_length(cm)',\n", " 'sepal_width(cm)',\n", " 'petal_length(cm)',\n", " 'petal_width(cm)'])\n", "\n", "df['species_id'] = y\n", "\n", "species_map = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}\n", "\n", "df['species_name'] = df['species_id'].map(species_map)\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now extract the two features we need (sepal length and petal width) and create the labels again." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# keep only sepal length (column 0) and petal width (column 3)\n", "X = np.c_[X[:,0], X[:,3]]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# label virginica (species_id 2) as 1 and the other two species as 0\n", "y = (iris.target == 2).astype(int)\n", "\n", "plt.scatter(X[:,0], X[:,1], c = y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create our training and test data, and fit a model. The default classification threshold is 0.5: if the predicted probability is above 0.5, the predicted result is 'virginica'; otherwise, the predicted result is 'not virginica'." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# shuffle the row indices: the first 30 go to the test set, the remaining 120 to the training set\n", "shuffled_idx = np.random.permutation(len(X))\n", "x_train = X[shuffled_idx][30:]\n", "x_test = X[shuffled_idx][:30]\n", "\n", "y_train = y[shuffled_idx][30:]\n", "y_test = y[shuffled_idx][:30]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "log_reg = LogisticRegression()\n", "log_reg.fit(x_train,y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of looking at the probabilities and the plot, as in the last lesson, let's compute some classification metrics on the training dataset. \n", "\n", "If you use `.score`, you get the mean accuracy." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.95" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_reg.score(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's predict values and see what this output means, and how we can look at other metrics." 
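, "\n", "\n", "For a binary problem like this one, `predict` corresponds to applying the default 0.5 threshold to the probability of class 1. If you need a different threshold, you can apply it yourself to the output of `predict_proba`, whose second column holds the probability of class 1. A minimal sketch, using a made-up probability array in place of real model output and a hypothetical helper `classify`:\n", "\n", "```python\n", "import numpy as np\n", "\n", "# made-up array shaped like predict_proba output:\n", "# column 0 = P(not virginica), column 1 = P(virginica)\n", "proba = np.array([[0.9, 0.1],\n", "                  [0.4, 0.6],\n", "                  [0.3, 0.7],\n", "                  [0.2, 0.8]])\n", "\n", "def classify(proba, threshold=0.5):\n", "    # predict 1 ('virginica') when P(virginica) is above the threshold, else 0\n", "    return (proba[:, 1] > threshold).astype(int)\n", "\n", "print(classify(proba))                  # [0 1 1 1]\n", "print(classify(proba, threshold=0.75))  # [0 0 0 1]\n", "```\n", "\n", "Raising the threshold makes the model more conservative about predicting 'virginica', which tends to increase precision at the expense of recall."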
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,\n", " 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,\n", " 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,\n", " 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,\n", " 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,\n", " 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]),\n", " array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,\n", " 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,\n", " 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,\n", " 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,\n", " 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,\n", " 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions = log_reg.predict(x_train)\n", "predictions, y_train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a way to look at the confusion matrix. 
The output does **not** have the same layout as the confusion matrices we showed earlier. In scikit-learn, the rows are the actual classes and the columns the predicted classes, both in the order 0 (not virginica), 1 (virginica), so the output is structured as:\n", "\n", "true negatives (TN) | false positives (FP) \n", "-------|--------\n", "**false negatives (FN)** | **true positives (TP)** " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[81, 1],\n", " [ 5, 33]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "confusion_matrix(y_train, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, for the accuracy calculation: we predicted 81 + 33 = 114 correctly (true negatives and true positives), and 114/120 (remember, our training data had 120 points) = 0.95.\n", "\n", "There are also functions to calculate recall and precision. From the confusion matrix, recall = 33/(33+5) and precision = 33/(33+1):" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.868421052631579" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import recall_score\n", "recall_score(y_train, predictions)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9705882352941176" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import precision_score\n", "precision_score(y_train, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, of course, there are also built-in functions to compute the ROC curve and AUC! For these functions, the inputs are the labels of the original dataset and the predicted probabilities, not the predicted labels (**why?**). Remember what the two columns of `predict_proba` mean?" 
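, "\n", "\n", "The probabilistic interpretation of the AUC discussed earlier can be checked numerically. A minimal sketch with made-up labels and scores (not the output of our model): it compares every (positive, negative) pair, counts how often the positive example gets the higher score (ties count as one half), and compares the result with scikit-learn's `roc_auc_score`.\n", "\n", "```python\n", "import numpy as np\n", "from sklearn.metrics import roc_auc_score\n", "\n", "# hypothetical labels and predicted probabilities for the positive class\n", "y_true = np.array([0, 0, 1, 1, 0, 1])\n", "scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])\n", "\n", "pos = scores[y_true == 1]\n", "neg = scores[y_true == 0]\n", "# compare every positive score with every negative score via broadcasting\n", "wins = np.sum(pos[:, None] > neg[None, :])\n", "ties = np.sum(pos[:, None] == neg[None, :])\n", "pairwise_auc = (wins + 0.5 * ties) / (len(pos) * len(neg))\n", "\n", "print(pairwise_auc, roc_auc_score(y_true, scores))\n", "```\n", "\n", "Both values should be about 0.889: 8 of the 9 (positive, negative) pairs are ranked correctly, the exception being the positive scored 0.35 against the negative scored 0.4."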
] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.79088354, 0.20911646],\n", " [0.76441507, 0.23558493],\n", " [0.21376472, 0.78623528],\n", " [0.68298146, 0.31701854],\n", " [0.98434495, 0.01565505],\n", " [0.98202253, 0.01797747],\n", " [0.59645687, 0.40354313],\n", " [0.99162653, 0.00837347],\n", " [0.98955069, 0.01044931],\n", " [0.69536435, 0.30463565]])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proba_virginica = log_reg.predict_proba(x_train)\n", "proba_virginica[0:10]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import roc_curve\n", "fpr_model, tpr_model, thresholds_model = roc_curve(y_train, proba_virginica[:,1])" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0. , 0. , 0. , 0. ,\n", " 0. , 0. , 0. , 0.01219512, 0.01219512,\n", " 0.03658537, 0.04878049, 0.04878049, 0.07317073, 0.07317073,\n", " 0.1097561 , 0.1097561 , 0.13414634, 0.15853659, 0.17073171,\n", " 0.19512195, 0.19512195, 0.20731707, 0.24390244, 0.35365854,\n", " 0.37804878, 0.41463415, 0.43902439, 0.57317073, 0.6097561 ,\n", " 0.62195122, 0.64634146, 0.68292683, 0.69512195, 0.7195122 ,\n", " 0.75609756, 0.76829268, 0.81707317, 0.84146341, 0.85365854,\n", " 0.87804878, 0.90243902, 0.92682927, 0.96341463, 0.98780488,\n", " 1. ])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fpr_model" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.02631579, 0.21052632, 0.26315789, 0.42105263, 0.47368421,\n", " 0.5 , 0.55263158, 0.63157895, 0.63157895, 0.86842105,\n", " 0.86842105, 0.86842105, 0.89473684, 0.89473684, 0.92105263,\n", " 0.92105263, 0.94736842, 0.94736842, 0.94736842, 0.97368421,\n", " 0.97368421, 1. , 1. , 1. , 1. 
,\n", " 1. , 1. , 1. , 1. , 1. ,\n", " 1. , 1. , 1. , 1. , 1. ,\n", " 1. , 1. , 1. , 1. , 1. ,\n", " 1. , 1. , 1. , 1. , 1. ,\n", " 1. ])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tpr_model" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.9543821 , 0.88400175, 0.86877788, 0.78623528, 0.76976277,\n", " 0.75242062, 0.74626989, 0.70166775, 0.6840347 , 0.52035581,\n", " 0.49338248, 0.47269031, 0.46453574, 0.45209154, 0.44708893,\n", " 0.37018773, 0.35394216, 0.32247005, 0.31701854, 0.30463565,\n", " 0.30198749, 0.29512957, 0.29252928, 0.28736925, 0.20911646,\n", " 0.20705139, 0.1887098 , 0.18115478, 0.01842535, 0.01797747,\n", " 0.01698478, 0.01584934, 0.01565505, 0.01497238, 0.01460719,\n", " 0.01362849, 0.01346105, 0.0118611 , 0.01106433, 0.01044931,\n", " 0.01032052, 0.00950749, 0.00897822, 0.0088674 , 0.00837347,\n", " 0.00719249])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "thresholds_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the ROC curve as follows" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(fpr_model, tpr_model,label='our model')\n", "plt.plot([0,1],[0,1],label='random')\n", "plt.plot([0,0,1,1],[0,1,1,1],label='perfect')\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The AUC:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9815468549422336" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import roc_auc_score\n", "auc_model = 
roc_auc_score(y_train, proba_virginica[:,1])\n", "auc_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the ROC curve and the AUC to compare competing models. Many people prefer these metrics for analyzing a model's performance because they do not require selecting a threshold, and they take both the true positive rate and the false positive rate into account." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's do the same for our test data (but again, this dataset is fairly small, and K-fold cross-validation is recommended)." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9333333333333333" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_reg.score(x_test, y_test)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,\n", " 0, 1, 1, 1, 0, 1, 0, 0]),\n", " array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,\n", " 0, 1, 1, 0, 0, 1, 0, 0]))" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predictions = log_reg.predict(x_test)\n", "predictions, y_test" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[17, 1],\n", " [ 1, 11]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "confusion_matrix(y_test, predictions)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9166666666666666" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recall_score(y_test, predictions)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9166666666666666" ] }, "execution_count": 
59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "precision_score(y_test, predictions)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "proba_virginica = log_reg.predict_proba(x_test)\n", "fpr_model, tpr_model, thresholds_model = roc_curve(y_test, proba_virginica[:,1])" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(fpr_model, tpr_model,label='our model')\n", "plt.plot([0,1],[0,1],label='random')\n", "plt.plot([0,0,1,1],[0,1,1,1],label='perfect')\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9907407407407408" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auc_model = roc_auc_score(y_test, proba_virginica[:,1])\n", "auc_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Learn more about the logistic regression function and options at https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }