{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Class Likelihood Ratios to measure classification performance\n\nThis example demonstrates the :func:`~sklearn.metrics.class_likelihood_ratios`\nfunction, which computes the positive and negative likelihood ratios (`LR+`,\n`LR-`) to assess the predictive power of a binary classifier. As we will see,\nthese metrics are independent of the class proportions in the test set, which\nmakes them very useful when the available data for a study has different\nclass proportions than the target application.\n\nA typical use is a case-control study in medicine, which has nearly balanced\nclasses while the general population has large class imbalance. In such an\napplication, the pre-test probability of an individual having the target\ncondition can be chosen to be the prevalence, i.e. the proportion of a\nparticular population found to be affected by a medical condition. The\npost-test probability then represents the probability that the condition is\ntruly present given a positive test result.\n\nIn this example we first discuss the link between pre-test and post-test odds\ngiven by the `class_likelihood_ratios` function. Then we evaluate their\nbehavior in some controlled scenarios. In the last section we plot them as a\nfunction of the prevalence of the positive class.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-test vs. post-test analysis\n\nSuppose we have a population of subjects with physiological measurements `X`\nthat can hopefully serve as indirect bio-markers of the disease, and actual\ndisease indicators `y` (ground truth). Most of the people in the population do\nnot carry the disease, but a minority (in this case around 10%) does:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.datasets import make_classification\n\nX, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)\nprint(f\"Percentage of people carrying the disease: {100*y.mean():.2f}%\")" ] },
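{ "cell_type": "markdown", "metadata": {}, "source": [ "Before fitting any model, let us make the link between pre-test and post-test\nquantities explicit. Writing the odds of a probability `p` as\n`odds(p) = p / (1 - p)`, a positive prediction updates the pre-test odds\nmultiplicatively: `post-test odds = LR+ * pre-test odds` (a negative\nprediction uses `LR-` instead). The following minimal sketch converts the\nprevalence computed above into a post-test probability using a hypothetical\nplaceholder value for `LR+`; the likelihood ratios of an actually fitted\nclassifier are computed below:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A minimal sketch of the pre-test to post-test conversion. The likelihood\n# ratio used here is a hypothetical placeholder; the `LR+` of a fitted\n# classifier is computed below.\npre_test_prob = y.mean()  # prevalence of the disease\npre_test_odds = pre_test_prob / (1 - pre_test_prob)\nhypothetical_pos_LR = 12.0  # assumed value, for illustration only\npost_test_odds = hypothetical_pos_LR * pre_test_odds\npost_test_prob = post_test_odds / (1 + post_test_odds)\nprint(f\"Pre-test probability: {pre_test_prob:.2f}\")\nprint(f\"Post-test probability: {post_test_prob:.2f}\")" ] },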
{ "cell_type": "markdown", "metadata": {}, "source": [ "A machine learning model is built to diagnose if a person with some given\nphysiological measurements is likely to carry the disease of interest. To\nevaluate the model, we need to assess its performance on a held-out test set:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can fit our diagnosis model and compute the positive likelihood\nratio to evaluate the usefulness of this classifier as a disease diagnosis\ntool:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import class_likelihood_ratios\n\nestimator = LogisticRegression().fit(X_train, y_train)\ny_pred = estimator.predict(X_test)\npos_LR, neg_LR = class_likelihood_ratios(y_test, y_pred)\nprint(f\"LR+: {pos_LR:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the positive class likelihood ratio is much larger than 1.0, the\nmachine learning-based diagnosis tool is useful: the post-test odds that the\ncondition is truly present given a positive test result are more than 12\ntimes larger than the pre-test odds.\n\n## Cross-validation of likelihood ratios\n\nWe assess the variability of the class likelihood ratio measurements in some\nparticular cases.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n\n\ndef scoring(estimator, X, y):\n y_pred = estimator.predict(X)\n pos_lr, neg_lr = class_likelihood_ratios(y, y_pred, raise_warning=False)\n return {\"positive_likelihood_ratio\": pos_lr, \"negative_likelihood_ratio\": neg_lr}\n\n\ndef extract_score(cv_results):\n lr = pd.DataFrame(\n {\n \"positive\": cv_results[\"test_positive_likelihood_ratio\"],\n \"negative\": cv_results[\"test_negative_likelihood_ratio\"],\n }\n )\n return lr.aggregate([\"mean\", \"std\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first validate the :class:`~sklearn.linear_model.LogisticRegression` model\nwith default hyperparameters as used in the previous section.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.model_selection import cross_validate\n\nestimator = LogisticRegression()\nextract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We confirm that the model is useful: the post-test odds are between 12 and 20\ntimes larger than the pre-test odds.\n\nBy contrast, let's consider a dummy model that outputs random predictions\nwith class frequencies that match the average disease prevalence of the\ntraining set:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.dummy import DummyClassifier\n\nestimator = DummyClassifier(strategy=\"stratified\", random_state=1234)\nextract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here both class likelihood ratios are compatible with 1.0, which makes this\nclassifier useless as a diagnostic tool to improve disease detection.\n\n" ] },
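{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, plugging a likelihood ratio of exactly 1.0 into the\npre-test/post-test update rule confirms that such a model leaves the\npost-test probability at the prevalence:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# With likelihood ratios of about 1.0 the post-test odds equal the pre-test\n# odds: a prediction from this dummy model carries no information.\nprevalence = y.mean()\npre_test_odds = prevalence / (1 - prevalence)\npost_test_odds = 1.0 * pre_test_odds  # LR ~= 1.0 for the dummy model\npost_test_prob = post_test_odds / (1 + post_test_odds)\nprint(f\"Pre-test probability: {prevalence:.2f}\")\nprint(f\"Post-test probability: {post_test_prob:.2f}\")" ] },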
"code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "estimator = DummyClassifier(strategy=\"most_frequent\")\nextract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The absence of positive predictions means there will be no true positives nor\nfalse positives, leading to an undefined `LR+` that by no means should be\ninterpreted as an infinite `LR+` (the classifier perfectly identifying\npositive cases). In such situation the\n:func:`~sklearn.metrics.class_likelihood_ratios` function returns `nan` and\nraises a warning by default. Indeed, the value of `LR-` helps us discard this\nmodel.\n\nA similar scenario may arise when cross-validating highly imbalanced data with\nfew samples: some folds will have no samples with the disease and therefore\nthey will output no true positives nor false negatives when used for testing.\nMathematically this leads to an infinite `LR+`, which should also not be\ninterpreted as the model perfectly identifying positive cases. Such event\nleads to a higher variance of the estimated likelihood ratios, but can still\nbe interpreted as an increment of the post-test odds of having the condition.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "estimator = LogisticRegression()\nX, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)\nextract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invariance with respect to prevalence\n\nThe likelihood ratios are independent of the disease prevalence and can be\nextrapolated between populations regardless of any possible class imbalance,\n**as long as the same model is applied to all of them**. Notice that in the\nplots below **the decision boundary is constant** (see\n`sphx_glr_auto_examples_svm_plot_separating_hyperplane_unbalanced.py` for\na study of the boundary decision for unbalanced classes).\n\nHere we train a :class:`~sklearn.linear_model.LogisticRegression` base model\non a case-control study with a prevalence of 50%. It is then evaluated over\npopulations with varying prevalence. We use the\n:func:`~sklearn.datasets.make_classification` function to ensure the\ndata-generating process is always the same as shown in the plots below. The\nlabel `1` corresponds to the positive class \"disease\", whereas the label `0`\nstands for \"no-disease\".\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom sklearn.inspection import DecisionBoundaryDisplay\n\npopulations = defaultdict(list)\ncommon_params = {\n \"n_samples\": 10_000,\n \"n_features\": 2,\n \"n_informative\": 2,\n \"n_redundant\": 0,\n \"random_state\": 0,\n}\nweights = np.linspace(0.1, 0.8, 6)\nweights = weights[::-1]\n\n# fit and evaluate base model on balanced classes\nX, y = make_classification(**common_params, weights=[0.5, 0.5])\nestimator = LogisticRegression().fit(X, y)\nlr_base = extract_score(cross_validate(estimator, X, y, scoring=scoring, cv=10))\npos_lr_base, pos_lr_base_std = lr_base[\"positive\"].values\nneg_lr_base, neg_lr_base_std = lr_base[\"negative\"].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now show the decision boundary for each level of prevalence. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "We will now show the decision boundary for each level of prevalence. Note that\nwe only plot a subset of the original data to better assess the linear model\ndecision boundary.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(15, 12))\n\nfor ax, weight in zip(axs.ravel(), weights):\n X, y = make_classification(\n **common_params,\n weights=[weight, 1 - weight],\n )\n prevalence = y.mean()\n populations[\"prevalence\"].append(prevalence)\n populations[\"X\"].append(X)\n populations[\"y\"].append(y)\n\n # down-sample for plotting\n rng = np.random.RandomState(1)\n plot_indices = rng.choice(np.arange(X.shape[0]), size=500, replace=True)\n X_plot, y_plot = X[plot_indices], y[plot_indices]\n\n # plot fixed decision boundary of base model with varying prevalence\n disp = DecisionBoundaryDisplay.from_estimator(\n estimator,\n X_plot,\n response_method=\"predict\",\n alpha=0.5,\n ax=ax,\n )\n scatter = disp.ax_.scatter(X_plot[:, 0], X_plot[:, 1], c=y_plot, edgecolor=\"k\")\n disp.ax_.set_title(f\"prevalence = {y_plot.mean():.2f}\")\n disp.ax_.legend(*scatter.legend_elements())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define a function that evaluates the `scoring` metrics on bootstrap\nresamples of a dataset.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def scoring_on_bootstrap(estimator, X, y, rng, n_bootstrap=100):\n results_for_prevalence = defaultdict(list)\n for _ in range(n_bootstrap):\n bootstrap_indices = rng.choice(\n np.arange(X.shape[0]), size=X.shape[0], replace=True\n )\n for key, value in scoring(\n estimator, X[bootstrap_indices], y[bootstrap_indices]\n ).items():\n results_for_prevalence[key].append(value)\n return pd.DataFrame(results_for_prevalence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We score the base model for each prevalence using bootstrapping.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "results = defaultdict(list)\nn_bootstrap = 100\nrng = np.random.default_rng(seed=0)\n\nfor prevalence, X, y in zip(\n populations[\"prevalence\"], populations[\"X\"], populations[\"y\"]\n):\n results_for_prevalence = scoring_on_bootstrap(\n estimator, X, y, rng, n_bootstrap=n_bootstrap\n )\n results[\"prevalence\"].append(prevalence)\n results[\"metrics\"].append(\n results_for_prevalence.aggregate([\"mean\", \"std\"]).unstack()\n )\n\nresults = pd.DataFrame(results[\"metrics\"], index=results[\"prevalence\"])\nresults.index.name = \"prevalence\"\nresults" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the plots below we observe that the class likelihood ratios re-computed\nwith different prevalences are indeed constant within one standard deviation\nof those computed on balanced classes.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))\nresults[\"positive_likelihood_ratio\"][\"mean\"].plot(\n ax=ax1, color=\"r\", label=\"extrapolation through populations\"\n)\nax1.axhline(y=pos_lr_base + pos_lr_base_std, color=\"r\", linestyle=\"--\")\nax1.axhline(\n y=pos_lr_base - pos_lr_base_std,\n color=\"r\",\n linestyle=\"--\",\n label=\"base model confidence band\",\n)\nax1.fill_between(\n results.index,\n results[\"positive_likelihood_ratio\"][\"mean\"]\n - results[\"positive_likelihood_ratio\"][\"std\"],\n 
results[\"positive_likelihood_ratio\"][\"mean\"]\n + results[\"positive_likelihood_ratio\"][\"std\"],\n color=\"r\",\n alpha=0.3,\n)\nax1.set(\n title=\"Positive likelihood ratio\",\n ylabel=\"LR+\",\n ylim=[0, 5],\n)\nax1.legend(loc=\"lower right\")\n\nax2 = results[\"negative_likelihood_ratio\"][\"mean\"].plot(\n ax=ax2, color=\"b\", label=\"extrapolation through populations\"\n)\nax2.axhline(y=neg_lr_base + neg_lr_base_std, color=\"b\", linestyle=\"--\")\nax2.axhline(\n y=neg_lr_base - neg_lr_base_std,\n color=\"b\",\n linestyle=\"--\",\n label=\"base model confidence band\",\n)\nax2.fill_between(\n results.index,\n results[\"negative_likelihood_ratio\"][\"mean\"]\n - results[\"negative_likelihood_ratio\"][\"std\"],\n results[\"negative_likelihood_ratio\"][\"mean\"]\n + results[\"negative_likelihood_ratio\"][\"std\"],\n color=\"b\",\n alpha=0.3,\n)\nax2.set(\n title=\"Negative likelihood ratio\",\n ylabel=\"LR-\",\n ylim=[0, 0.5],\n)\nax2.legend(loc=\"lower right\")\n\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }