{ "cells": [ { "attachments": { "image-3.png": { "image/png": "" }, "image-4.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "# Intro\n", "\n", "When we published the [GHOST paper](https://doi.org/10.1021/acs.jcim.1c00160) on shifting the decision boundary to improve the predictive performance of classification models built on imbalanced datasets, we only considered binary classifiers (e.g. active/inactive or soluble/insoluble). I was recently asked if the method could be extended to ternary (three-class) classifiers. This post is about doing that.\n", "\n", "The code here isn't set up for easy re-use at the moment. It will eventually find its way into the [open-source ghostml package](https://github.com/rinikerlab/GHOST) once we've had a chance to review and test it more thoroughly.\n", "\n", "> *Aside*: the ghostml package is now pip installable; run `python -m pip install ghostml` to install it in your environment.\n", "\n", "For this to make sense, I should start by explaining how I've approached the problem:\n", "\n", "## Using thresholds in ternary problems\n", "\n", "Things are a bit more complicated here than with binary classifiers. For the binary case we have a single threshold which determines whether an instance is predicted to be in class 0 or 1. 
So, assuming that we optimize based on the probability of class 1, we can formulate the decision as:\n", "```\n", "if probabilities[1] >= threshold:\n", "    prediction = 1\n", "else:\n", "    prediction = 0\n", "```\n", "Before any optimization, `threshold` is 0.5.\n", "\n", "For ternary predictions we have two different decision boundaries and there's no longer a single threshold; instead, the default decision rule can be expressed as:\n", "```\n", "prediction = argmax(probabilities)\n", "```\n", "i.e., the prediction is the class with the highest predicted probability.\n", "\n", "> *Aside*: the same decision rule can be used for a binary classifier with the default threshold. It's just easier to explain using the threshold of 0.5.\n", "\n", "If we want to introduce two thresholds for the ternary classifier, and assuming that we optimize the thresholds for classes 0 and 2 (`thresholds[0]` is the threshold for class 0 and `thresholds[1]` is the threshold for class 2), we have to use a more complex decision rule:\n", "```\n", "if probabilities[0] >= thresholds[0]:\n", "    # we might still be in class 2 if the probability of class 2 exceeds its\n", "    # threshold by more than the probability of class 0 exceeds its threshold\n", "    if (probabilities[2] - thresholds[1]) > (probabilities[0] - thresholds[0]):\n", "        prediction = 2\n", "    else:\n", "        prediction = 0\n", "elif probabilities[2] >= thresholds[1]:\n", "    prediction = 2\n", "else:\n", "    prediction = 1\n", "```\n", "\n", "## Optimizing thresholds for ternary problems\n", "\n", "For the sake of this post, let's assume that we're optimizing the thresholds for classes 0 and 2; we could also do 0 and 1, or 1 and 2, and the results should still be the same.\n", "\n", "In this post I explore two different approaches for optimizing these thresholds.\n", "\n", "### Greedy optimization\n", "\n", "Here I optimize the two thresholds independently of each other by constructing two binary classification problems and optimizing the threshold for each of them. Here's the process:\n", "\n", "1. 
Create a binary classification set by setting the training-set `y` values to 1 if the original value is 0 and to 0 otherwise.\n", "2. Use the original `ghostml` approach with that binary classification data and each training point's (out-of-bag) predicted probability of being in class 0 to set `threshold0`, the threshold for the predicted probability of class 0.\n", "3. Create a binary classification set by setting the training-set `y` values to 1 if the original value is 2 and to 0 otherwise.\n", "4. Use the original `ghostml` approach with that binary classification data and each training point's (out-of-bag) predicted probability of being in class 2 to set `threshold2`, the threshold for the predicted probability of class 2.\n", "\n", "Since the current `ghostml` code doesn't support using balanced accuracy for optimization, I only use kappa for the greedy optimization.\n", "\n", "### Grid search\n", "\n", "Explore the full grid of `(threshold0, threshold2)` pairs (0.05 to 0.95 in steps of 0.05 in the code below) and pick the one which produces the highest Cohen's kappa value. I also try a variant of this which optimizes balanced accuracy instead of Cohen's kappa.\n", "\n", "## TL;DR Results summary\n", "\n", "Both approaches work well on simulated data and on a couple of datasets from ChEMBL. There doesn't seem to be a large or consistent difference in the quality of the results generated with the two different methods. The greedy optimization approach is, however, quite a bit faster.\n", "\n", "Here's the improvement in three scoring metrics (kappa, balanced accuracy, and overall accuracy) when using the greedy optimization procedure on 50 simulated datasets with a 10-80-10 class split; the threshold shift improves both kappa and balanced accuracy on all datasets:\n", "![image-4.png](attachment:image-4.png)\n", "\n", "And here's the same plot for 20 different random stratified train/test splits for target CHEMBL205 (carbonic anhydrase II), with activity thresholds chosen to give a 19-72-9 class split. 
Once again, the threshold shift improves predictive performance:\n", "![image-3.png](attachment:image-3.png)\n", "\n", "\n", "Note: the original version of this [notebook](https://github.com/greglandrum/rdkit_blog/blob/master/notebooks/ghost_multiclass.ipynb) and the two ChEMBL data files ([file1](https://github.com/greglandrum/rdkit_blog/blob/master/data/target_CHEMBL205.csv.gz), [file2](https://github.com/greglandrum/rdkit_blog/blob/master/data/target_CHEMBL217.csv.gz)) are all on GitHub in the [older rdkit blog repo](https://github.com/greglandrum/rdkit_blog).\n", "\n", "\n", "## Beyond ternary problems\n", "\n", "I put some thought into figuring out how to extend this to the general multi-class prediction case, but that turned out to be more difficult than I'd anticipated. If you have suggestions, ideally accompanied by code, please let me know in the comments!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now on to the code and a more detailed exploration." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "from rdkit import Chem\n", "from rdkit.Chem import rdMolDescriptors\n", "from rdkit.Chem import rdFingerprintGenerator\n", "from rdkit.Chem import PandasTools\n", "# note that you can install ghostml using pip: python -m pip install ghostml\n", "import ghostml\n", "import pandas as pd\n", "from sklearn import metrics\n", "import numpy as np\n", "\n", "\n", "%pylab inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Code we'll use" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def ternary_rebin(probs,thresholds):\n", "    ''' returns a list of classifications based on the provided predicted probabilities and thresholds\n", "    (thresholds[0] is the threshold for class 0, thresholds[1] is the threshold for class 2) '''\n", "    res = []\n", "    for prob in probs:\n", "        if prob[0]>=thresholds[0]:\n", "            # we might still be in class 2 if the probability of class 2 exceeds its\n", "            # threshold by more than the probability of class 0 exceeds its threshold\n", "            if (prob[2]-thresholds[1])>(prob[0]-thresholds[0]):\n", "                res.append(2)\n", "            else:\n", "                res.append(0)\n", "        elif prob[2]>=thresholds[1]:\n", "            res.append(2)\n", "        else:\n", "            res.append(1)\n", "    return res\n", "\n", "def run_ternary_oob_optimization(oob_probs, labels_train, thresholds, ThOpt_metrics = 'Kappa'):\n", "    ''' does a grid search to optimize the decision thresholds for a ternary problem '''\n", "    tscores = []\n", "    for t1 in thresholds:\n", "        for t2 in thresholds:\n", "            preds = ternary_rebin(oob_probs,(t1,t2))\n", "            if ThOpt_metrics == 'Kappa':\n", "                tgt = metrics.cohen_kappa_score(labels_train,preds)\n", "            elif ThOpt_metrics == 'BalancedAccuracy':\n", "                tgt = metrics.balanced_accuracy_score(labels_train,preds)\n", "            elif ThOpt_metrics == 'F1':\n", "                # f1_score's default average='binary' raises a ValueError for three classes,\n", "                # so macro-averaging is used here\n", "                tgt = metrics.f1_score(labels_train,preds,average='macro')\n", "            else:\n", "                raise ValueError(f'unsupported ThOpt_metrics: {ThOpt_metrics}')\n", "            tscores.append((np.round(tgt,3),(t1,t2)))\n", "    # sort by (rounded) score; ties go to the larger threshold pair\n", "    tscores.sort(reverse=True)\n", "    thresh = tscores[0][-1]\n", "    return thresh\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.model_selection import train_test_split\n", "\n", "def run_ternary_experiment(X,y,accum,random_state=0):\n", "    ''' experiment wrapper for the ternary bounds optimization '''\n", "    n_classes = max(y)+1\n", "    local = {}\n", "\n", "    # --------------------\n", "    # Train - test split\n", "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, stratify = y,\n", "                                                        random_state=random_state)\n", "\n", "    # --------------------\n", "    # Train a RF classifier\n", "    cls = RandomForestClassifier(n_estimators=500,max_depth=10,oob_score=True,n_jobs=8)\n", "    cls.fit(X_train, y_train)\n", "\n", "\n", "    # --------------------\n", "    # Calculate the baseline accuracy values\n", "    test_preds = cls.predict(X_test)\n", "    test_probs = cls.predict_proba(X_test)\n", "    kappa = metrics.cohen_kappa_score(y_test,test_preds)\n", "    balanced = metrics.balanced_accuracy_score(y_test,test_preds)\n", "    accuracy = metrics.accuracy_score(y_test,test_preds)\n", "    confusion = metrics.confusion_matrix(y_test,test_preds,labels=list(set(y_test)))\n", "    print('original')\n", "    print(f'accuracy: {accuracy:.3f} balanced accuracy: {balanced:.3f} kappa: {kappa:.3f}')\n", "    print(confusion)\n", "    local['orig-accuracy'] = accuracy\n", "    local['orig-balanced'] = balanced\n", "    local['orig-kappa'] = kappa\n", "    local['orig-confusion'] = confusion\n", "\n", "    # --------------------\n", "    # optimize the two thresholds individually\n", "    thresholds = [0]*(n_classes-1)\n", "    for i,clsv in enumerate((0,2)):\n", "        # one-vs-rest labels: 1 for class clsv, 0 for everything else\n", "        d_tform = [1 if lbl==clsv else 0 for lbl in y_train]\n", "        # out-of-bag predicted probabilities for class clsv\n", "        d_probs = [x[clsv] for x in cls.oob_decision_function_]\n", "        thresholds[i] = ghostml.optimize_threshold_from_oob_predictions(d_tform,d_probs,thresholds=np.arange(0.05,1.0,0.05))\n", "    local['thresholds'] = thresholds\n", "\n", "    # calculate the accuracy values for those thresholds:\n", "    test_preds = ternary_rebin(test_probs,thresholds)\n", "    kappa = metrics.cohen_kappa_score(y_test,test_preds)\n", "    balanced = metrics.balanced_accuracy_score(y_test,test_preds)\n", "    accuracy = metrics.accuracy_score(y_test,test_preds)\n", "    confusion = metrics.confusion_matrix(y_test,test_preds,labels=list(set(y_test)))\n", "    print('rebalanced')\n", "    print(f'thresholds: {thresholds}')\n", "    print(f'accuracy: {accuracy:.3f} balanced accuracy: {balanced:.3f} kappa: {kappa:.3f}')\n", "    print(confusion)\n", "    local['shift-accuracy'] = accuracy\n", "    local['shift-balanced'] = balanced\n", "    local['shift-kappa'] = kappa\n", "    local['shift-confusion'] = confusion\n", "\n", "\n", "    # --------------------\n", "    # grid-search optimization of the threshold values based on kappa\n", "    thresholds = run_ternary_oob_optimization(cls.oob_decision_function_,y_train,\n", "                                              thresholds=np.arange(0.05,1.00,0.05),\n", "                                              ThOpt_metrics = 'Kappa')\n", "    test_preds = ternary_rebin(test_probs,thresholds)\n", "    kappa = metrics.cohen_kappa_score(y_test,test_preds)\n", "    balanced = metrics.balanced_accuracy_score(y_test,test_preds)\n", "    accuracy = metrics.accuracy_score(y_test,test_preds)\n", "    confusion = metrics.confusion_matrix(y_test,test_preds,labels=list(set(y_test)))\n", "    print('global kappa rebalanced')\n", "    print(f'thresholds: {thresholds}')\n", "    print(f'accuracy: {accuracy:.3f} balanced accuracy: {balanced:.3f} kappa: {kappa:.3f}')\n", "    print(confusion)\n", "    local['global-k-shift-accuracy'] = accuracy\n", "    local['global-k-shift-balanced'] = balanced\n", "    local['global-k-shift-kappa'] = kappa\n", "    local['global-k-shift-confusion'] = confusion\n", "\n", "    # --------------------\n", "    # grid-search optimization of the threshold values based on the balanced accuracy\n", "    thresholds = run_ternary_oob_optimization(cls.oob_decision_function_,y_train,\n", "                                              thresholds=np.arange(0.05,1.00,0.05),\n", "                                              ThOpt_metrics = 'BalancedAccuracy')\n", "    test_preds = ternary_rebin(test_probs,thresholds)\n", "    kappa = metrics.cohen_kappa_score(y_test,test_preds)\n", "    balanced = metrics.balanced_accuracy_score(y_test,test_preds)\n", "    accuracy = metrics.accuracy_score(y_test,test_preds)\n", "    confusion = metrics.confusion_matrix(y_test,test_preds,labels=list(set(y_test)))\n", "    print('global balanced_accuracy rebalanced')\n", "    print(f'thresholds: {thresholds}')\n", "    print(f'accuracy: {accuracy:.3f} balanced accuracy: {balanced:.3f} kappa: {kappa:.3f}')\n", "    print(confusion)\n", "    local['global-ba-shift-accuracy'] = accuracy\n", "    local['global-ba-shift-balanced'] = balanced\n", "    local['global-ba-shift-kappa'] = kappa\n", "    local['global-ba-shift-confusion'] = confusion\n", "\n", "    accum.append(local)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Synthetic datasets\n", "\n", "\n", "I will try out a couple of real datasets
below, but I want to start by verifying that the process works with some synthetic datasets. Scikit-learn's [make_classification() function](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html) makes this really easy.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Try a 10-80-10 split\n", "\n", "I will test this with multiple different forms of imbalance, just to be sure that it generalizes. Let's start with an example where the majority class is in the middle:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------\n", "original\n", "accuracy: 0.865 balanced accuracy: 0.569 kappa: 0.483\n", "[[ 54 69 1]\n", " [ 1 950 1]\n", " [ 4 86 34]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.893 balanced accuracy: 0.763 kappa: 0.682\n", "[[ 77 42 5]\n", " [ 14 906 32]\n", " [ 7 28 89]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.893 balanced accuracy: 0.763 kappa: 0.682\n", "[[ 77 42 5]\n", " [ 14 906 32]\n", " [ 7 28 89]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.844 balanced accuracy: 0.820 kappa: 0.619\n", "[[100 19 5]\n", " [ 66 814 72]\n", " [ 10 15 99]]\n", "--------------\n", "original\n", "accuracy: 0.885 balanced accuracy: 0.628 kappa: 0.584\n", "[[ 49 67 7]\n", " [ 1 953 0]\n", " [ 7 56 60]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.920 balanced accuracy: 0.775 kappa: 0.750\n", "[[ 93 26 4]\n", " [ 11 939 4]\n", " [ 16 35 72]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.918 balanced accuracy: 0.798 kappa: 0.756\n", "[[101 18 4]\n", " [ 22 927 5]\n", " [ 19 30 74]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.15000000000000002)\n", "accuracy: 0.913 balanced accuracy: 0.827 kappa: 0.753\n", "[[ 94 18 11]\n", " [ 21 908
25]\n", " [ 9 20 94]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.565 kappa: 0.476\n", "[[ 61 61 0]\n", " [ 2 954 0]\n", " [ 4 94 24]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.899 balanced accuracy: 0.753 kappa: 0.690\n", "[[ 81 29 12]\n", " [ 19 921 16]\n", " [ 5 40 77]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.899 balanced accuracy: 0.753 kappa: 0.690\n", "[[ 81 29 12]\n", " [ 19 921 16]\n", " [ 5 40 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.821 balanced accuracy: 0.792 kappa: 0.571\n", "[[102 9 11]\n", " [126 797 33]\n", " [ 12 24 86]]\n", "--------------\n", "original\n", "accuracy: 0.878 balanced accuracy: 0.603 kappa: 0.542\n", "[[ 67 51 4]\n", " [ 1 955 0]\n", " [ 4 86 32]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.893 balanced accuracy: 0.803 kappa: 0.700\n", "[[ 95 14 13]\n", " [ 13 892 51]\n", " [ 6 31 85]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.893 balanced accuracy: 0.803 kappa: 0.700\n", "[[ 95 14 13]\n", " [ 13 892 51]\n", " [ 6 31 85]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.845 balanced accuracy: 0.823 kappa: 0.624\n", "[[109 2 11]\n", " [ 55 817 84]\n", " [ 14 20 88]]\n", "--------------\n", "original\n", "accuracy: 0.868 balanced accuracy: 0.570 kappa: 0.484\n", "[[ 65 57 0]\n", " [ 1 954 1]\n", " [ 3 97 22]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.910 balanced accuracy: 0.765 kappa: 0.715\n", "[[ 95 23 4]\n", " [ 17 931 8]\n", " [ 5 51 66]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.910 balanced accuracy: 0.765 kappa: 0.715\n", "[[ 95 23 4]\n", " [ 17 931 8]\n", " [ 5 51 66]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.858 balanced 
accuracy: 0.833 kappa: 0.645\n", "[[106 8 8]\n", " [ 54 831 71]\n", " [ 5 24 93]]\n", "--------------\n", "original\n", "accuracy: 0.868 balanced accuracy: 0.574 kappa: 0.503\n", "[[ 24 87 12]\n", " [ 0 953 1]\n", " [ 2 56 65]]\n", "rebalanced\n", "thresholds: [0.2, 0.3]\n", "accuracy: 0.885 balanced accuracy: 0.699 kappa: 0.637\n", "[[ 64 49 10]\n", " [ 26 923 5]\n", " [ 13 35 75]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.886 balanced accuracy: 0.718 kappa: 0.649\n", "[[ 60 47 16]\n", " [ 26 916 12]\n", " [ 8 28 87]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.775 balanced accuracy: 0.785 kappa: 0.515\n", "[[ 93 11 19]\n", " [168 735 51]\n", " [ 15 6 102]]\n", "--------------\n", "original\n", "accuracy: 0.874 balanced accuracy: 0.601 kappa: 0.533\n", "[[ 57 67 0]\n", " [ 1 949 2]\n", " [ 6 75 43]]\n", "rebalanced\n", "thresholds: [0.3, 0.2]\n", "accuracy: 0.897 balanced accuracy: 0.751 kappa: 0.686\n", "[[ 74 43 7]\n", " [ 14 917 21]\n", " [ 7 31 86]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.2)\n", "accuracy: 0.897 balanced accuracy: 0.751 kappa: 0.686\n", "[[ 74 43 7]\n", " [ 14 917 21]\n", " [ 7 31 86]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.864 balanced accuracy: 0.798 kappa: 0.647\n", "[[ 96 23 5]\n", " [ 63 851 38]\n", " [ 17 17 90]]\n", "--------------\n", "original\n", "accuracy: 0.851 balanced accuracy: 0.530 kappa: 0.423\n", "[[ 32 81 10]\n", " [ 1 948 7]\n", " [ 4 76 41]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.877 balanced accuracy: 0.694 kappa: 0.613\n", "[[ 61 48 14]\n", " [ 7 916 33]\n", " [ 4 41 76]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.877 balanced accuracy: 0.722 kappa: 0.630\n", "[[ 74 37 12]\n", " [ 19 904 33]\n", " [ 8 38 75]]\n", "global balanced_accuracy rebalanced\n", "thresholds: 
(0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.844 balanced accuracy: 0.744 kappa: 0.583\n", "[[ 79 28 16]\n", " [ 37 849 70]\n", " [ 7 29 85]]\n", "--------------\n", "original\n", "accuracy: 0.885 balanced accuracy: 0.632 kappa: 0.582\n", "[[ 64 58 2]\n", " [ 2 951 0]\n", " [ 5 71 47]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.909 balanced accuracy: 0.758 kappa: 0.715\n", "[[ 77 40 7]\n", " [ 4 931 18]\n", " [ 7 33 83]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.906 balanced accuracy: 0.758 kappa: 0.709\n", "[[ 81 36 7]\n", " [ 9 926 18]\n", " [ 10 33 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.848 balanced accuracy: 0.807 kappa: 0.623\n", "[[ 92 19 13]\n", " [ 43 825 85]\n", " [ 12 11 100]]\n", "--------------\n", "original\n", "accuracy: 0.895 balanced accuracy: 0.671 kappa: 0.628\n", "[[ 67 54 1]\n", " [ 6 949 0]\n", " [ 3 62 58]]\n", "rebalanced\n", "thresholds: [0.3, 0.3]\n", "accuracy: 0.924 balanced accuracy: 0.812 kappa: 0.767\n", "[[ 96 24 2]\n", " [ 20 930 5]\n", " [ 4 36 83]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.3)\n", "accuracy: 0.924 balanced accuracy: 0.812 kappa: 0.767\n", "[[ 96 24 2]\n", " [ 20 930 5]\n", " [ 4 36 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.885 balanced accuracy: 0.900 kappa: 0.715\n", "[[112 7 3]\n", " [ 62 839 54]\n", " [ 4 8 111]]\n", "--------------\n", "original\n", "accuracy: 0.867 balanced accuracy: 0.570 kappa: 0.488\n", "[[ 52 71 1]\n", " [ 1 952 0]\n", " [ 5 82 36]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.887 balanced accuracy: 0.739 kappa: 0.656\n", "[[ 81 39 4]\n", " [ 20 909 24]\n", " [ 6 42 75]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.887 balanced accuracy: 0.739 kappa: 0.656\n", "[[ 81 39 4]\n", " [ 20 909 24]\n", " [ 6 42 
75]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.728 balanced accuracy: 0.780 kappa: 0.452\n", "[[106 12 6]\n", " [137 672 144]\n", " [ 15 12 96]]\n", "--------------\n", "original\n", "accuracy: 0.884 balanced accuracy: 0.629 kappa: 0.577\n", "[[ 57 64 3]\n", " [ 1 951 1]\n", " [ 3 67 53]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.907 balanced accuracy: 0.778 kappa: 0.715\n", "[[ 86 37 1]\n", " [ 18 919 16]\n", " [ 8 32 83]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.889 balanced accuracy: 0.780 kappa: 0.680\n", "[[ 83 36 5]\n", " [ 18 894 41]\n", " [ 6 27 90]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.809 balanced accuracy: 0.800 kappa: 0.554\n", "[[ 94 25 5]\n", " [ 72 775 106]\n", " [ 7 14 102]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.573 kappa: 0.489\n", "[[ 50 70 3]\n", " [ 2 950 2]\n", " [ 4 80 39]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.888 balanced accuracy: 0.724 kappa: 0.655\n", "[[ 82 33 8]\n", " [ 23 917 14]\n", " [ 11 45 67]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.2)\n", "accuracy: 0.882 balanced accuracy: 0.724 kappa: 0.642\n", "[[ 68 41 14]\n", " [ 9 909 36]\n", " [ 4 37 82]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.831 balanced accuracy: 0.780 kappa: 0.583\n", "[[ 99 15 9]\n", " [ 70 814 70]\n", " [ 15 24 84]]\n", "--------------\n", "original\n", "accuracy: 0.881 balanced accuracy: 0.621 kappa: 0.558\n", "[[ 61 60 2]\n", " [ 1 951 4]\n", " [ 1 75 45]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.907 balanced accuracy: 0.796 kappa: 0.720\n", "[[ 88 31 4]\n", " [ 25 913 18]\n", " [ 3 31 87]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.907 balanced accuracy: 0.796 kappa: 0.720\n", "[[ 
88 31 4]\n", " [ 25 913 18]\n", " [ 3 31 87]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.847 balanced accuracy: 0.819 kappa: 0.621\n", "[[ 96 14 13]\n", " [ 73 821 62]\n", " [ 5 17 99]]\n", "--------------\n", "original\n", "accuracy: 0.877 balanced accuracy: 0.606 kappa: 0.546\n", "[[ 52 67 4]\n", " [ 0 952 2]\n", " [ 5 69 49]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.898 balanced accuracy: 0.773 kappa: 0.695\n", "[[ 76 40 7]\n", " [ 18 910 26]\n", " [ 4 27 92]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.25)\n", "accuracy: 0.898 balanced accuracy: 0.773 kappa: 0.695\n", "[[ 76 40 7]\n", " [ 18 910 26]\n", " [ 4 27 92]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.828 balanced accuracy: 0.821 kappa: 0.595\n", "[[ 98 14 11]\n", " [ 73 792 89]\n", " [ 8 12 103]]\n", "--------------\n", "original\n", "accuracy: 0.858 balanced accuracy: 0.542 kappa: 0.440\n", "[[ 36 83 4]\n", " [ 1 953 0]\n", " [ 1 81 41]]\n", "rebalanced\n", "thresholds: [0.2, 0.25]\n", "accuracy: 0.885 balanced accuracy: 0.730 kappa: 0.649\n", "[[ 80 38 5]\n", " [ 38 910 6]\n", " [ 12 39 72]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.2)\n", "accuracy: 0.888 balanced accuracy: 0.753 kappa: 0.669\n", "[[ 78 37 8]\n", " [ 37 905 12]\n", " [ 11 29 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.730 balanced accuracy: 0.790 kappa: 0.459\n", "[[106 8 9]\n", " [191 671 92]\n", " [ 13 11 99]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.657 kappa: 0.622\n", "[[ 74 43 6]\n", " [ 2 951 0]\n", " [ 9 69 46]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.917 balanced accuracy: 0.787 kappa: 0.749\n", "[[ 92 22 9]\n", " [ 6 929 18]\n", " [ 15 30 79]]\n", "global kappa 
rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.916 balanced accuracy: 0.796 kappa: 0.751\n", "[[ 98 18 7]\n", " [ 11 924 18]\n", " [ 18 29 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.873 balanced accuracy: 0.825 kappa: 0.677\n", "[[109 3 11]\n", " [ 32 852 69]\n", " [ 21 17 86]]\n", "--------------\n", "original\n", "accuracy: 0.879 balanced accuracy: 0.609 kappa: 0.544\n", "[[ 42 79 1]\n", " [ 1 953 0]\n", " [ 0 64 60]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.925 balanced accuracy: 0.787 kappa: 0.761\n", "[[ 80 37 5]\n", " [ 7 941 6]\n", " [ 2 33 89]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.2)\n", "accuracy: 0.932 balanced accuracy: 0.835 kappa: 0.796\n", "[[ 90 26 6]\n", " [ 12 931 11]\n", " [ 2 24 98]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.895 balanced accuracy: 0.850 kappa: 0.720\n", "[[ 96 19 7]\n", " [ 37 873 44]\n", " [ 5 14 105]]\n", "--------------\n", "original\n", "accuracy: 0.874 balanced accuracy: 0.601 kappa: 0.531\n", "[[ 50 67 5]\n", " [ 2 950 3]\n", " [ 2 72 49]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.900 balanced accuracy: 0.787 kappa: 0.706\n", "[[ 86 28 8]\n", " [ 14 907 34]\n", " [ 6 30 87]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.900 balanced accuracy: 0.787 kappa: 0.706\n", "[[ 86 28 8]\n", " [ 14 907 34]\n", " [ 6 30 87]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.858 balanced accuracy: 0.843 kappa: 0.649\n", "[[ 98 15 9]\n", " [ 50 825 80]\n", " [ 6 11 106]]\n", "--------------\n", "original\n", "accuracy: 0.902 balanced accuracy: 0.692 kappa: 0.661\n", "[[ 75 45 3]\n", " [ 3 951 2]\n", " [ 5 59 57]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.931 balanced accuracy: 0.813 kappa: 
0.786\n", "[[103 17 3]\n", " [ 12 939 5]\n", " [ 8 38 75]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.928 balanced accuracy: 0.831 kappa: 0.785\n", "[[103 17 3]\n", " [ 12 928 16]\n", " [ 7 31 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.890 balanced accuracy: 0.875 kappa: 0.718\n", "[[115 3 5]\n", " [ 50 857 49]\n", " [ 8 17 96]]\n", "--------------\n", "original\n", "accuracy: 0.865 balanced accuracy: 0.574 kappa: 0.497\n", "[[ 59 57 8]\n", " [ 2 948 2]\n", " [ 5 88 31]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.889 balanced accuracy: 0.715 kappa: 0.653\n", "[[ 81 31 12]\n", " [ 8 921 23]\n", " [ 7 52 65]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.889 balanced accuracy: 0.727 kappa: 0.663\n", "[[ 88 24 12]\n", " [ 14 916 22]\n", " [ 13 48 63]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.829 balanced accuracy: 0.769 kappa: 0.580\n", "[[ 95 13 16]\n", " [ 52 815 85]\n", " [ 15 24 85]]\n", "--------------\n", "original\n", "accuracy: 0.886 balanced accuracy: 0.632 kappa: 0.583\n", "[[ 35 79 8]\n", " [ 1 953 1]\n", " [ 1 47 75]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.920 balanced accuracy: 0.796 kappa: 0.756\n", "[[ 79 33 10]\n", " [ 13 931 11]\n", " [ 6 23 94]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.912 balanced accuracy: 0.826 kappa: 0.749\n", "[[ 87 25 10]\n", " [ 24 908 23]\n", " [ 7 16 100]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.2)\n", "accuracy: 0.873 balanced accuracy: 0.842 kappa: 0.677\n", "[[ 98 14 10]\n", " [ 62 846 47]\n", " [ 10 10 103]]\n", "--------------\n", "original\n", "accuracy: 0.855 balanced accuracy: 0.531 kappa: 0.416\n", "[[ 21 103 1]\n", " [ 0 953 0]\n", " [ 0 70 52]]\n", "rebalanced\n", "thresholds: [0.2, 
0.25]\n", "accuracy: 0.893 balanced accuracy: 0.746 kappa: 0.676\n", "[[ 81 39 5]\n", " [ 26 914 13]\n", " [ 13 32 77]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.893 balanced accuracy: 0.746 kappa: 0.676\n", "[[ 81 39 5]\n", " [ 26 914 13]\n", " [ 13 32 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.764 balanced accuracy: 0.791 kappa: 0.500\n", "[[ 99 14 12]\n", " [124 717 112]\n", " [ 14 7 101]]\n", "--------------\n", "original\n", "accuracy: 0.906 balanced accuracy: 0.691 kappa: 0.673\n", "[[ 75 43 5]\n", " [ 0 956 0]\n", " [ 10 55 56]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.929 balanced accuracy: 0.818 kappa: 0.787\n", "[[ 96 17 10]\n", " [ 5 935 16]\n", " [ 13 24 84]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.929 balanced accuracy: 0.818 kappa: 0.787\n", "[[ 96 17 10]\n", " [ 5 935 16]\n", " [ 13 24 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.888 balanced accuracy: 0.824 kappa: 0.703\n", "[[101 8 14]\n", " [ 26 876 54]\n", " [ 14 18 89]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.569 kappa: 0.485\n", "[[ 39 79 6]\n", " [ 0 952 2]\n", " [ 1 73 48]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.892 balanced accuracy: 0.698 kappa: 0.645\n", "[[ 56 55 13]\n", " [ 6 933 15]\n", " [ 3 38 81]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.887 balanced accuracy: 0.745 kappa: 0.662\n", "[[ 81 35 8]\n", " [ 33 906 15]\n", " [ 11 34 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.853 balanced accuracy: 0.802 kappa: 0.627\n", "[[ 89 22 13]\n", " [ 57 836 61]\n", " [ 7 16 99]]\n", "--------------\n", "original\n", "accuracy: 0.906 balanced accuracy: 0.700 kappa: 0.676\n", "[[ 67 45 9]\n", " [ 1 952 2]\n", " [ 3 
53 68]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.925 balanced accuracy: 0.807 kappa: 0.773\n", "[[ 79 26 16]\n", " [ 4 933 18]\n", " [ 4 22 98]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.926 balanced accuracy: 0.814 kappa: 0.778\n", "[[ 83 23 15]\n", " [ 8 931 16]\n", " [ 5 22 97]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.849 balanced accuracy: 0.825 kappa: 0.632\n", "[[ 95 13 13]\n", " [ 74 821 60]\n", " [ 11 10 103]]\n", "--------------\n", "original\n", "accuracy: 0.928 balanced accuracy: 0.767 kappa: 0.763\n", "[[ 81 38 4]\n", " [ 0 954 0]\n", " [ 4 40 79]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.939 balanced accuracy: 0.847 kappa: 0.818\n", "[[ 93 22 8]\n", " [ 12 935 7]\n", " [ 6 18 99]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.939 balanced accuracy: 0.847 kappa: 0.818\n", "[[ 93 22 8]\n", " [ 12 935 7]\n", " [ 6 18 99]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.900 balanced accuracy: 0.871 kappa: 0.740\n", "[[102 10 11]\n", " [ 39 871 44]\n", " [ 7 9 107]]\n", "--------------\n", "original\n", "accuracy: 0.862 balanced accuracy: 0.556 kappa: 0.468\n", "[[ 59 63 2]\n", " [ 3 951 0]\n", " [ 7 91 24]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.891 balanced accuracy: 0.710 kappa: 0.650\n", "[[ 91 29 4]\n", " [ 21 926 7]\n", " [ 11 59 52]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.893 balanced accuracy: 0.738 kappa: 0.671\n", "[[ 89 28 7]\n", " [ 20 918 16]\n", " [ 9 48 65]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.1)\n", "accuracy: 0.782 balanced accuracy: 0.791 kappa: 0.521\n", "[[ 96 10 18]\n", " [ 54 742 158]\n", " [ 9 13 100]]\n", "--------------\n", 
"original\n", "accuracy: 0.866 balanced accuracy: 0.570 kappa: 0.488\n", "[[ 41 76 5]\n", " [ 2 952 2]\n", " [ 5 71 46]]\n", "rebalanced\n", "thresholds: [0.2, 0.25]\n", "accuracy: 0.906 balanced accuracy: 0.794 kappa: 0.723\n", "[[ 92 22 8]\n", " [ 12 913 31]\n", " [ 11 29 82]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.906 balanced accuracy: 0.794 kappa: 0.723\n", "[[ 92 22 8]\n", " [ 12 913 31]\n", " [ 11 29 82]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.863 balanced accuracy: 0.824 kappa: 0.654\n", "[[ 95 14 13]\n", " [ 27 842 87]\n", " [ 10 13 99]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.632 kappa: 0.582\n", "[[ 38 80 3]\n", " [ 0 954 2]\n", " [ 2 49 72]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.899 balanced accuracy: 0.805 kappa: 0.708\n", "[[ 82 35 4]\n", " [ 38 899 19]\n", " [ 4 21 98]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.899 balanced accuracy: 0.805 kappa: 0.708\n", "[[ 82 35 4]\n", " [ 38 899 19]\n", " [ 4 21 98]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.833 balanced accuracy: 0.840 kappa: 0.605\n", "[[ 98 17 6]\n", " [104 794 58]\n", " [ 4 11 108]]\n", "--------------\n", "original\n", "accuracy: 0.898 balanced accuracy: 0.686 kappa: 0.657\n", "[[ 56 53 14]\n", " [ 2 947 5]\n", " [ 6 42 75]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.917 balanced accuracy: 0.795 kappa: 0.754\n", "[[ 84 25 14]\n", " [ 10 927 17]\n", " [ 11 22 90]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.917 balanced accuracy: 0.802 kappa: 0.757\n", "[[ 79 23 21]\n", " [ 10 923 21]\n", " [ 9 16 98]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.874 balanced accuracy: 0.808 kappa: 
0.675\n", "[[ 84 13 26]\n", " [ 33 862 59]\n", " [ 11 9 103]]\n", "--------------\n", "original\n", "accuracy: 0.893 balanced accuracy: 0.660 kappa: 0.619\n", "[[ 72 48 2]\n", " [ 3 952 1]\n", " [ 6 68 48]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.924 balanced accuracy: 0.820 kappa: 0.773\n", "[[110 9 3]\n", " [ 19 927 10]\n", " [ 13 37 72]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.924 balanced accuracy: 0.842 kappa: 0.781\n", "[[107 8 7]\n", " [ 19 918 19]\n", " [ 10 28 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.874 balanced accuracy: 0.869 kappa: 0.689\n", "[[113 1 8]\n", " [ 58 838 60]\n", " [ 11 13 98]]\n", "--------------\n", "original\n", "accuracy: 0.884 balanced accuracy: 0.630 kappa: 0.574\n", "[[ 60 61 2]\n", " [ 1 951 2]\n", " [ 1 72 50]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.908 balanced accuracy: 0.761 kappa: 0.714\n", "[[ 84 27 12]\n", " [ 14 929 11]\n", " [ 3 43 77]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.908 balanced accuracy: 0.761 kappa: 0.714\n", "[[ 84 27 12]\n", " [ 14 929 11]\n", " [ 3 43 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.1)\n", "accuracy: 0.825 balanced accuracy: 0.820 kappa: 0.590\n", "[[ 96 13 14]\n", " [ 54 789 111]\n", " [ 4 14 105]]\n", "--------------\n", "original\n", "accuracy: 0.914 balanced accuracy: 0.729 kappa: 0.713\n", "[[ 79 31 12]\n", " [ 0 951 4]\n", " [ 2 54 67]]\n", "rebalanced\n", "thresholds: [0.3, 0.3]\n", "accuracy: 0.936 balanced accuracy: 0.807 kappa: 0.799\n", "[[ 90 19 13]\n", " [ 2 948 5]\n", " [ 4 34 85]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.939 balanced accuracy: 0.839 kappa: 0.817\n", "[[ 94 15 13]\n", " [ 3 939 13]\n", " [ 4 25 94]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 
0.15000000000000002)\n", "accuracy: 0.908 balanced accuracy: 0.855 kappa: 0.751\n", "[[100 8 14]\n", " [ 30 890 35]\n", " [ 5 18 100]]\n", "--------------\n", "original\n", "accuracy: 0.867 balanced accuracy: 0.568 kappa: 0.487\n", "[[ 53 62 7]\n", " [ 2 954 0]\n", " [ 3 86 33]]\n", "rebalanced\n", "thresholds: [0.25, 0.2]\n", "accuracy: 0.897 balanced accuracy: 0.730 kappa: 0.676\n", "[[ 80 28 14]\n", " [ 16 927 13]\n", " [ 8 45 69]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.897 balanced accuracy: 0.730 kappa: 0.676\n", "[[ 80 28 14]\n", " [ 16 927 13]\n", " [ 8 45 69]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.859 balanced accuracy: 0.800 kappa: 0.638\n", "[[ 98 10 14]\n", " [ 72 846 38]\n", " [ 12 23 87]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.620 kappa: 0.560\n", "[[ 36 86 1]\n", " [ 0 953 1]\n", " [ 0 53 70]]\n", "rebalanced\n", "thresholds: [0.2, 0.3]\n", "accuracy: 0.892 balanced accuracy: 0.750 kappa: 0.671\n", "[[ 76 44 3]\n", " [ 26 912 16]\n", " [ 7 33 83]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.3)\n", "accuracy: 0.892 balanced accuracy: 0.750 kappa: 0.671\n", "[[ 76 44 3]\n", " [ 26 912 16]\n", " [ 7 33 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.835 balanced accuracy: 0.810 kappa: 0.595\n", "[[ 88 28 7]\n", " [ 72 807 75]\n", " [ 2 14 107]]\n", "--------------\n", "original\n", "accuracy: 0.899 balanced accuracy: 0.674 kappa: 0.644\n", "[[ 72 48 3]\n", " [ 1 953 0]\n", " [ 5 64 54]]\n", "rebalanced\n", "thresholds: [0.3, 0.3]\n", "accuracy: 0.925 balanced accuracy: 0.803 kappa: 0.769\n", "[[ 97 21 5]\n", " [ 12 934 8]\n", " [ 7 37 79]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.919 balanced accuracy: 0.846 kappa: 0.770\n", "[[107 11 5]\n", " [ 22 908 24]\n", " [ 7 28 88]]\n", "global 
balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.15000000000000002)\n", "accuracy: 0.875 balanced accuracy: 0.877 kappa: 0.693\n", "[[109 4 10]\n", " [ 31 834 89]\n", " [ 5 11 107]]\n", "--------------\n", "original\n", "accuracy: 0.895 balanced accuracy: 0.666 kappa: 0.629\n", "[[ 61 61 2]\n", " [ 2 951 1]\n", " [ 7 53 62]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.933 balanced accuracy: 0.819 kappa: 0.796\n", "[[ 92 29 3]\n", " [ 11 939 4]\n", " [ 13 20 89]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.923 balanced accuracy: 0.831 kappa: 0.776\n", "[[ 91 28 5]\n", " [ 10 920 24]\n", " [ 11 14 97]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.15000000000000002)\n", "accuracy: 0.894 balanced accuracy: 0.826 kappa: 0.712\n", "[[ 92 25 7]\n", " [ 23 882 49]\n", " [ 12 11 99]]\n", "--------------\n", "original\n", "accuracy: 0.858 balanced accuracy: 0.542 kappa: 0.435\n", "[[ 50 72 0]\n", " [ 0 953 1]\n", " [ 1 96 27]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.902 balanced accuracy: 0.758 kappa: 0.697\n", "[[ 88 30 4]\n", " [ 19 921 14]\n", " [ 9 42 73]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.889 balanced accuracy: 0.763 kappa: 0.675\n", "[[ 98 20 4]\n", " [ 38 902 14]\n", " [ 15 42 67]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.15000000000000002)\n", "accuracy: 0.847 balanced accuracy: 0.832 kappa: 0.627\n", "[[ 94 15 13]\n", " [ 37 814 103]\n", " [ 3 13 108]]\n", "--------------\n", "original\n", "accuracy: 0.863 balanced accuracy: 0.554 kappa: 0.458\n", "[[ 35 86 2]\n", " [ 1 954 0]\n", " [ 2 74 46]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.895 balanced accuracy: 0.776 kappa: 0.689\n", "[[ 87 34 2]\n", " [ 30 905 20]\n", " [ 9 31 82]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.895 balanced accuracy: 0.776 kappa: 0.689\n", "[[ 87 34 2]\n", " [ 30 905 
20]\n", " [ 9 31 82]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.814 balanced accuracy: 0.818 kappa: 0.570\n", "[[106 13 4]\n", " [121 776 58]\n", " [ 10 17 95]]\n", "--------------\n", "original\n", "accuracy: 0.906 balanced accuracy: 0.703 kappa: 0.670\n", "[[ 89 33 0]\n", " [ 3 951 2]\n", " [ 0 75 47]]\n", "rebalanced\n", "thresholds: [0.3, 0.25]\n", "accuracy: 0.927 balanced accuracy: 0.810 kappa: 0.773\n", "[[101 18 3]\n", " [ 8 935 13]\n", " [ 6 40 76]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.921 balanced accuracy: 0.810 kappa: 0.760\n", "[[103 16 3]\n", " [ 16 927 13]\n", " [ 9 38 75]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.874 balanced accuracy: 0.838 kappa: 0.676\n", "[[107 10 5]\n", " [ 47 851 58]\n", " [ 11 20 91]]\n", "--------------\n", "original\n", "accuracy: 0.855 balanced accuracy: 0.538 kappa: 0.429\n", "[[ 51 70 2]\n", " [ 4 950 0]\n", " [ 3 95 25]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.892 balanced accuracy: 0.730 kappa: 0.660\n", "[[ 86 34 3]\n", " [ 20 919 15]\n", " [ 8 50 65]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.890 balanced accuracy: 0.756 kappa: 0.670\n", "[[ 84 33 6]\n", " [ 20 906 28]\n", " [ 6 39 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.861 balanced accuracy: 0.828 kappa: 0.649\n", "[[109 10 4]\n", " [ 57 835 62]\n", " [ 11 23 89]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.618 kappa: 0.564\n", "[[ 48 70 5]\n", " [ 2 953 0]\n", " [ 4 61 57]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.900 balanced accuracy: 0.735 kappa: 0.684\n", "[[ 81 36 6]\n", " [ 22 929 4]\n", " [ 11 41 70]]\n", "global kappa 
rebalanced\n", "thresholds: (0.25, 0.25)\n", "accuracy: 0.902 balanced accuracy: 0.750 kappa: 0.695\n", "[[ 77 35 11]\n", " [ 22 925 8]\n", " [ 7 35 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.784 balanced accuracy: 0.789 kappa: 0.524\n", "[[ 98 9 16]\n", " [125 747 83]\n", " [ 13 13 96]]\n", "--------------\n", "original\n", "accuracy: 0.888 balanced accuracy: 0.638 kappa: 0.588\n", "[[ 53 67 3]\n", " [ 0 954 1]\n", " [ 0 63 59]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.919 balanced accuracy: 0.769 kappa: 0.740\n", "[[ 83 37 3]\n", " [ 10 941 4]\n", " [ 5 38 79]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.3)\n", "accuracy: 0.919 balanced accuracy: 0.769 kappa: 0.740\n", "[[ 83 37 3]\n", " [ 10 941 4]\n", " [ 5 38 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.867 balanced accuracy: 0.833 kappa: 0.660\n", "[[103 11 9]\n", " [ 56 842 57]\n", " [ 6 21 95]]\n", "--------------\n", "original\n", "accuracy: 0.848 balanced accuracy: 0.509 kappa: 0.385\n", "[[ 35 86 1]\n", " [ 1 953 0]\n", " [ 5 89 30]]\n", "rebalanced\n", "thresholds: [0.2, 0.2]\n", "accuracy: 0.883 balanced accuracy: 0.729 kappa: 0.642\n", "[[ 80 39 3]\n", " [ 27 908 19]\n", " [ 9 43 72]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.2)\n", "accuracy: 0.883 balanced accuracy: 0.729 kappa: 0.642\n", "[[ 80 39 3]\n", " [ 27 908 19]\n", " [ 9 43 72]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.835 balanced accuracy: 0.775 kappa: 0.580\n", "[[ 94 24 4]\n", " [ 87 822 45]\n", " [ 9 29 86]]\n", "--------------\n", "original\n", "accuracy: 0.868 balanced accuracy: 0.579 kappa: 0.500\n", "[[ 36 84 3]\n", " [ 1 951 2]\n", " [ 4 64 55]]\n", "rebalanced\n", "thresholds: [0.2, 0.25]\n", "accuracy: 0.897 balanced accuracy: 0.763 kappa: 0.687\n", "[[ 74 48 1]\n", " [ 23 912 19]\n", " 
[ 9 24 90]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.897 balanced accuracy: 0.763 kappa: 0.687\n", "[[ 74 48 1]\n", " [ 23 912 19]\n", " [ 9 24 90]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.860 balanced accuracy: 0.802 kappa: 0.635\n", "[[ 79 38 6]\n", " [ 42 845 67]\n", " [ 8 7 108]]\n", "--------------\n", "original\n", "accuracy: 0.856 balanced accuracy: 0.530 kappa: 0.417\n", "[[ 24 95 3]\n", " [ 1 955 0]\n", " [ 0 74 48]]\n", "rebalanced\n", "thresholds: [0.25, 0.25]\n", "accuracy: 0.901 balanced accuracy: 0.725 kappa: 0.678\n", "[[ 66 48 8]\n", " [ 9 935 12]\n", " [ 6 36 80]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.902 balanced accuracy: 0.761 kappa: 0.697\n", "[[ 81 37 4]\n", " [ 25 921 10]\n", " [ 10 32 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.857 balanced accuracy: 0.802 kappa: 0.634\n", "[[ 89 19 14]\n", " [ 47 842 67]\n", " [ 11 14 97]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.564 kappa: 0.482\n", "[[ 24 92 6]\n", " [ 0 954 1]\n", " [ 3 59 61]]\n", "rebalanced\n", "thresholds: [0.2, 0.25]\n", "accuracy: 0.907 balanced accuracy: 0.760 kappa: 0.715\n", "[[ 76 36 10]\n", " [ 14 929 12]\n", " [ 12 27 84]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.25)\n", "accuracy: 0.907 balanced accuracy: 0.760 kappa: 0.715\n", "[[ 76 36 10]\n", " [ 14 929 12]\n", " [ 12 27 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.1)\n", "accuracy: 0.827 balanced accuracy: 0.806 kappa: 0.590\n", "[[ 87 13 22]\n", " [ 44 798 113]\n", " [ 6 10 107]]\n", "--------------\n", "original\n", "accuracy: 0.888 balanced accuracy: 0.639 kappa: 0.592\n", "[[ 42 78 3]\n", " [ 1 953 0]\n", " [ 2 50 71]]\n", "rebalanced\n", "thresholds: [0.25, 0.3]\n", "accuracy: 0.910 balanced accuracy: 
0.747 kappa: 0.709\n", "[[ 73 47 3]\n", " [ 11 937 6]\n", " [ 6 35 82]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.2)\n", "accuracy: 0.905 balanced accuracy: 0.769 kappa: 0.708\n", "[[ 69 46 8]\n", " [ 11 921 22]\n", " [ 3 24 96]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.811 balanced accuracy: 0.812 kappa: 0.565\n", "[[105 15 3]\n", " [147 773 34]\n", " [ 14 14 95]]\n", "--------------\n", "original\n", "accuracy: 0.913 balanced accuracy: 0.722 kappa: 0.704\n", "[[ 74 43 6]\n", " [ 1 953 1]\n", " [ 3 50 69]]\n", "rebalanced\n", "thresholds: [0.3, 0.3]\n", "accuracy: 0.927 balanced accuracy: 0.792 kappa: 0.771\n", "[[ 89 28 6]\n", " [ 6 943 6]\n", " [ 7 34 81]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.25)\n", "accuracy: 0.927 balanced accuracy: 0.804 kappa: 0.776\n", "[[ 87 27 9]\n", " [ 6 938 11]\n", " [ 5 29 88]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.15000000000000002)\n", "accuracy: 0.915 balanced accuracy: 0.855 kappa: 0.766\n", "[[ 97 14 12]\n", " [ 16 899 40]\n", " [ 7 13 102]]\n" ] } ], "source": [ "from sklearn.datasets import make_classification\n", "\n", "accum_10_80_10 = []\n", "\n", "for rep in range(50):\n", " print('--------------')\n", " # Generate a ternary imbalanced classification problem\n", " X, y = make_classification(n_samples=6000, n_features=20,\n", " n_informative=10, n_redundant=0, n_classes=3, \n", " random_state=0xf00d+rep, shuffle=False, weights = [0.1, 0.8, 0.1])\n", " run_ternary_experiment(X,y,accum_10_80_10)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start by comparing the model-performance metrics kappa, balanced accuracy, and accuracy between the model with the greedy threshold shift based on kappa and the model with \"default thresholds\"." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "accum = accum_10_80_10\n", "figsize(9,6)\n", "scatter([x['orig-kappa'] for x in accum],[x['shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('greedy shift');\n", "title('10-80-10');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The greedy shift improves all three metrics for every dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now compare the results of a grid search based on Cohen's kappa to the greedy-shift results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-k-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-k-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-k-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-kappa');\n", "title('10-80-10');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the changes are reasonably small, but they do tend to slightly favor the grid search." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, make the equivalent plot comparing the results from using balanced accuracy in the grid search to the results from the greedy shift:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-balanced');\n", "title('10-80-10');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That plot makes it look like doing the threshold shifts using balanced accuracy doesn't improve kappa, but it's important to remember that this plot compares the balanced accuracy shift to the kappa shift, not to the original model.\n", "\n", "Using balanced accuracy to do the shift instead of kappa does still improve kappa relative to the original model, as this plot shows:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['orig-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('grid-balanced');\n", "title('10-80-10');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Still, for these datasets it looks like optimizing the thresholds with kappa instead of balanced accuracy is a better idea." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0 is the majority class\n", "\n", "Now let's make sure that the code doesn't have some \"feature\" which causes it to only work when the middle class is the majority:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------\n", "original\n", "accuracy: 0.883 balanced accuracy: 0.623 kappa: 0.571\n", "[[953 0 1]\n", " [ 64 54 4]\n", " [ 67 4 53]]\n", "rebalanced\n", "thresholds: [0.6000000000000001, 0.3]\n", "accuracy: 0.911 balanced accuracy: 0.745 kappa: 0.723\n", "[[939 10 5]\n", " [ 30 76 16]\n", " [ 29 17 78]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.911 balanced accuracy: 0.745 kappa: 0.723\n", "[[939 10 5]\n", " [ 30 76 16]\n", " [ 29 17 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.3)\n", "accuracy: 0.812 balanced accuracy: 0.756 kappa: 0.556\n", "[[798 151 5]\n", " [ 9 97 16]\n", " [ 13 32 79]]\n", "--------------\n", "original\n", "accuracy: 0.875 balanced accuracy: 0.596 kappa: 0.526\n", "[[953 1 0]\n", " [ 71 51 1]\n", " [ 74 3 46]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", 
"accuracy: 0.904 balanced accuracy: 0.778 kappa: 0.714\n", "[[916 30 8]\n", " [ 21 94 8]\n", " [ 37 11 75]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.904 balanced accuracy: 0.778 kappa: 0.714\n", "[[916 30 8]\n", " [ 21 94 8]\n", " [ 37 11 75]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.891 balanced accuracy: 0.796 kappa: 0.698\n", "[[890 55 9]\n", " [ 12 103 8]\n", " [ 26 21 76]]\n", "--------------\n", "original\n", "accuracy: 0.879 balanced accuracy: 0.607 kappa: 0.549\n", "[[954 0 0]\n", " [ 69 52 2]\n", " [ 68 6 49]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.905 balanced accuracy: 0.762 kappa: 0.710\n", "[[924 23 7]\n", " [ 32 79 12]\n", " [ 32 8 83]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.25)\n", "accuracy: 0.906 balanced accuracy: 0.736 kappa: 0.699\n", "[[936 13 5]\n", " [ 41 71 11]\n", " [ 37 6 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.868 balanced accuracy: 0.772 kappa: 0.645\n", "[[869 78 7]\n", " [ 23 88 12]\n", " [ 20 18 85]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.646 kappa: 0.608\n", "[[956 1 0]\n", " [ 51 64 7]\n", " [ 66 5 50]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.900 balanced accuracy: 0.760 kappa: 0.708\n", "[[920 29 8]\n", " [ 17 76 29]\n", " [ 22 15 84]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.905 balanced accuracy: 0.747 kappa: 0.711\n", "[[932 17 8]\n", " [ 22 71 29]\n", " [ 29 9 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.3)\n", "accuracy: 0.841 balanced accuracy: 0.746 kappa: 0.597\n", "[[844 108 5]\n", " [ 11 93 18]\n", " [ 14 35 72]]\n", "--------------\n", "original\n", "accuracy: 0.895 balanced accuracy: 0.674 kappa: 0.635\n", "[[948 4 3]\n", " [ 52 66 
5]\n", " [ 55 7 60]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.906 balanced accuracy: 0.799 kappa: 0.731\n", "[[910 26 19]\n", " [ 18 88 17]\n", " [ 19 14 89]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.906 balanced accuracy: 0.799 kappa: 0.731\n", "[[910 26 19]\n", " [ 18 88 17]\n", " [ 19 14 89]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.873 balanced accuracy: 0.807 kappa: 0.672\n", "[[861 74 20]\n", " [ 11 95 17]\n", " [ 8 23 91]]\n", "--------------\n", "original\n", "accuracy: 0.832 balanced accuracy: 0.453 kappa: 0.271\n", "[[954 1 0]\n", " [100 21 1]\n", " [100 0 23]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.856 balanced accuracy: 0.664 kappa: 0.554\n", "[[898 33 24]\n", " [ 55 57 10]\n", " [ 41 10 72]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.856 balanced accuracy: 0.664 kappa: 0.554\n", "[[898 33 24]\n", " [ 55 57 10]\n", " [ 41 10 72]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.15000000000000002)\n", "accuracy: 0.749 balanced accuracy: 0.736 kappa: 0.450\n", "[[721 138 96]\n", " [ 28 74 20]\n", " [ 15 4 104]]\n", "--------------\n", "original\n", "accuracy: 0.868 balanced accuracy: 0.589 kappa: 0.513\n", "[[945 5 2]\n", " [ 71 51 3]\n", " [ 70 8 45]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.875 balanced accuracy: 0.693 kappa: 0.617\n", "[[911 24 17]\n", " [ 43 63 19]\n", " [ 38 9 76]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.875 balanced accuracy: 0.693 kappa: 0.617\n", "[[911 24 17]\n", " [ 43 63 19]\n", " [ 38 9 76]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.843 balanced accuracy: 0.714 kappa: 0.577\n", "[[857 77 18]\n", " [ 28 78 19]\n", " [ 27 20 76]]\n", 
"--------------\n", "original\n", "accuracy: 0.876 balanced accuracy: 0.596 kappa: 0.528\n", "[[955 1 0]\n", " [ 78 42 3]\n", " [ 64 3 54]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.912 balanced accuracy: 0.782 kappa: 0.732\n", "[[926 23 7]\n", " [ 35 76 12]\n", " [ 22 7 92]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.912 balanced accuracy: 0.782 kappa: 0.732\n", "[[926 23 7]\n", " [ 35 76 12]\n", " [ 22 7 92]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.875 balanced accuracy: 0.805 kappa: 0.670\n", "[[866 81 9]\n", " [ 20 91 12]\n", " [ 11 17 93]]\n", "--------------\n", "original\n", "accuracy: 0.874 balanced accuracy: 0.596 kappa: 0.531\n", "[[952 2 0]\n", " [ 79 36 8]\n", " [ 59 3 61]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.899 balanced accuracy: 0.771 kappa: 0.701\n", "[[912 28 14]\n", " [ 27 77 19]\n", " [ 32 1 90]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.25)\n", "accuracy: 0.902 balanced accuracy: 0.740 kappa: 0.694\n", "[[930 12 12]\n", " [ 39 65 19]\n", " [ 34 1 88]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.863 balanced accuracy: 0.794 kappa: 0.647\n", "[[852 86 16]\n", " [ 13 91 19]\n", " [ 16 15 92]]\n", "--------------\n", "original\n", "accuracy: 0.930 balanced accuracy: 0.792 kappa: 0.775\n", "[[947 2 7]\n", " [ 35 81 5]\n", " [ 31 4 88]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", "accuracy: 0.918 balanced accuracy: 0.830 kappa: 0.765\n", "[[915 32 9]\n", " [ 22 84 15]\n", " [ 14 6 103]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.923 balanced accuracy: 0.820 kappa: 0.773\n", "[[926 21 9]\n", " [ 27 79 15]\n", " [ 16 4 103]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7000000000000001, 0.3)\n", "accuracy: 
0.902 balanced accuracy: 0.828 kappa: 0.731\n", "[[894 53 9]\n", " [ 20 86 15]\n", " [ 11 9 103]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.627 kappa: 0.583\n", "[[950 3 1]\n", " [ 64 54 5]\n", " [ 53 15 55]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.894 balanced accuracy: 0.738 kappa: 0.687\n", "[[919 27 8]\n", " [ 28 66 29]\n", " [ 23 12 88]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.899 balanced accuracy: 0.755 kappa: 0.701\n", "[[919 29 6]\n", " [ 28 80 15]\n", " [ 23 20 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.35000000000000003)\n", "accuracy: 0.869 balanced accuracy: 0.747 kappa: 0.646\n", "[[881 70 3]\n", " [ 20 93 10]\n", " [ 14 40 69]]\n", "--------------\n", "original\n", "accuracy: 0.877 balanced accuracy: 0.609 kappa: 0.547\n", "[[951 2 2]\n", " [ 76 42 4]\n", " [ 58 5 60]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.898 balanced accuracy: 0.753 kappa: 0.692\n", "[[919 25 11]\n", " [ 35 70 17]\n", " [ 29 5 89]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.898 balanced accuracy: 0.753 kappa: 0.692\n", "[[919 25 11]\n", " [ 35 70 17]\n", " [ 29 5 89]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.875 balanced accuracy: 0.786 kappa: 0.661\n", "[[873 53 29]\n", " [ 22 80 20]\n", " [ 21 5 97]]\n", "--------------\n", "original\n", "accuracy: 0.868 balanced accuracy: 0.581 kappa: 0.504\n", "[[950 2 2]\n", " [ 72 48 3]\n", " [ 73 6 44]]\n", "rebalanced\n", "thresholds: [0.6000000000000001, 0.3]\n", "accuracy: 0.891 balanced accuracy: 0.704 kappa: 0.653\n", "[[929 15 10]\n", " [ 39 74 10]\n", " [ 42 15 66]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.891 balanced accuracy: 0.704 kappa: 0.653\n", "[[929 15 10]\n", " [ 39 74 
10]\n", " [ 42 15 66]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.3)\n", "accuracy: 0.862 balanced accuracy: 0.741 kappa: 0.626\n", "[[873 71 10]\n", " [ 19 94 10]\n", " [ 22 34 67]]\n", "--------------\n", "original\n", "accuracy: 0.876 balanced accuracy: 0.595 kappa: 0.532\n", "[[955 1 0]\n", " [ 63 53 7]\n", " [ 75 3 43]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.914 balanced accuracy: 0.828 kappa: 0.757\n", "[[910 35 11]\n", " [ 11 95 17]\n", " [ 20 9 92]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.930 balanced accuracy: 0.825 kappa: 0.792\n", "[[933 13 10]\n", " [ 15 91 17]\n", " [ 23 6 92]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.897 balanced accuracy: 0.836 kappa: 0.725\n", "[[883 44 29]\n", " [ 9 89 25]\n", " [ 10 7 104]]\n", "--------------\n", "original\n", "accuracy: 0.875 balanced accuracy: 0.613 kappa: 0.542\n", "[[946 8 1]\n", " [ 69 52 1]\n", " [ 66 5 52]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.882 balanced accuracy: 0.739 kappa: 0.647\n", "[[902 36 17]\n", " [ 41 70 11]\n", " [ 31 6 86]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.882 balanced accuracy: 0.739 kappa: 0.647\n", "[[902 36 17]\n", " [ 41 70 11]\n", " [ 31 6 86]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.847 balanced accuracy: 0.767 kappa: 0.605\n", "[[842 74 39]\n", " [ 25 77 20]\n", " [ 14 12 97]]\n", "--------------\n", "original\n", "accuracy: 0.858 balanced accuracy: 0.537 kappa: 0.430\n", "[[955 0 0]\n", " [ 83 38 1]\n", " [ 85 1 37]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.2]\n", "accuracy: 0.887 balanced accuracy: 0.742 kappa: 0.667\n", "[[909 35 11]\n", " [ 30 75 17]\n", " [ 30 12 
81]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.2)\n", "accuracy: 0.894 balanced accuracy: 0.716 kappa: 0.665\n", "[[929 16 10]\n", " [ 38 67 17]\n", " [ 39 7 77]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.873 balanced accuracy: 0.759 kappa: 0.649\n", "[[881 61 13]\n", " [ 23 82 17]\n", " [ 21 18 84]]\n", "--------------\n", "original\n", "accuracy: 0.893 balanced accuracy: 0.663 kappa: 0.626\n", "[[950 1 3]\n", " [ 36 80 6]\n", " [ 75 7 42]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", "accuracy: 0.903 balanced accuracy: 0.760 kappa: 0.708\n", "[[923 16 15]\n", " [ 8 104 10]\n", " [ 51 16 57]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.907 balanced accuracy: 0.742 kappa: 0.703\n", "[[935 5 14]\n", " [ 16 96 10]\n", " [ 59 8 57]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.903 balanced accuracy: 0.760 kappa: 0.708\n", "[[923 16 15]\n", " [ 8 104 10]\n", " [ 51 16 57]]\n", "--------------\n", "original\n", "accuracy: 0.873 balanced accuracy: 0.592 kappa: 0.529\n", "[[951 1 2]\n", " [ 75 38 9]\n", " [ 60 6 58]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.909 balanced accuracy: 0.796 kappa: 0.738\n", "[[915 20 19]\n", " [ 26 75 21]\n", " [ 13 10 101]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.909 balanced accuracy: 0.796 kappa: 0.738\n", "[[915 20 19]\n", " [ 26 75 21]\n", " [ 13 10 101]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.864 balanced accuracy: 0.820 kappa: 0.662\n", "[[843 89 22]\n", " [ 9 92 21]\n", " [ 7 15 102]]\n", "--------------\n", "original\n", "accuracy: 0.890 balanced accuracy: 0.646 kappa: 0.603\n", "[[953 0 3]\n", " [ 73 42 6]\n", " [ 47 3 73]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", 
"accuracy: 0.905 balanced accuracy: 0.776 kappa: 0.715\n", "[[919 22 15]\n", " [ 34 75 12]\n", " [ 23 8 92]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.902 balanced accuracy: 0.768 kappa: 0.705\n", "[[918 18 20]\n", " [ 34 68 19]\n", " [ 23 4 96]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.2)\n", "accuracy: 0.838 balanced accuracy: 0.777 kappa: 0.600\n", "[[827 94 35]\n", " [ 16 77 28]\n", " [ 11 10 102]]\n", "--------------\n", "original\n", "accuracy: 0.914 balanced accuracy: 0.721 kappa: 0.709\n", "[[955 0 1]\n", " [ 42 76 5]\n", " [ 45 10 66]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", "accuracy: 0.937 balanced accuracy: 0.830 kappa: 0.810\n", "[[940 12 4]\n", " [ 13 100 10]\n", " [ 24 13 84]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.937 balanced accuracy: 0.830 kappa: 0.810\n", "[[940 12 4]\n", " [ 13 100 10]\n", " [ 24 13 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7000000000000001, 0.3)\n", "accuracy: 0.931 balanced accuracy: 0.840 kappa: 0.799\n", "[[928 24 4]\n", " [ 8 105 10]\n", " [ 21 16 84]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.650 kappa: 0.607\n", "[[946 5 3]\n", " [ 53 59 10]\n", " [ 56 9 59]]\n", "rebalanced\n", "thresholds: [0.6000000000000001, 0.3]\n", "accuracy: 0.902 balanced accuracy: 0.742 kappa: 0.699\n", "[[929 17 8]\n", " [ 31 70 21]\n", " [ 33 7 84]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.35000000000000003)\n", "accuracy: 0.900 balanced accuracy: 0.734 kappa: 0.691\n", "[[929 18 7]\n", " [ 31 78 13]\n", " [ 34 17 73]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.3)\n", "accuracy: 0.812 balanced accuracy: 0.758 kappa: 0.557\n", "[[797 148 9]\n", " [ 10 91 21]\n", " [ 11 27 86]]\n", "--------------\n", "original\n", "accuracy: 0.852 balanced accuracy: 0.518 kappa: 0.395\n", "[[954 1 0]\n", " [ 93 
27 2]\n", " [ 82 0 41]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.901 balanced accuracy: 0.768 kappa: 0.701\n", "[[916 27 12]\n", " [ 40 73 9]\n", " [ 23 8 92]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.901 balanced accuracy: 0.768 kappa: 0.701\n", "[[916 27 12]\n", " [ 40 73 9]\n", " [ 23 8 92]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.888 balanced accuracy: 0.803 kappa: 0.692\n", "[[884 42 29]\n", " [ 29 77 16]\n", " [ 12 6 105]]\n", "--------------\n", "original\n", "accuracy: 0.855 balanced accuracy: 0.525 kappa: 0.408\n", "[[956 0 0]\n", " [ 91 30 1]\n", " [ 81 1 40]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.2]\n", "accuracy: 0.886 balanced accuracy: 0.711 kappa: 0.645\n", "[[920 17 19]\n", " [ 42 61 19]\n", " [ 34 6 82]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.892 balanced accuracy: 0.731 kappa: 0.664\n", "[[921 23 12]\n", " [ 42 72 8]\n", " [ 36 8 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.15000000000000002)\n", "accuracy: 0.822 balanced accuracy: 0.744 kappa: 0.564\n", "[[818 93 45]\n", " [ 14 76 32]\n", " [ 13 17 92]]\n", "--------------\n", "original\n", "accuracy: 0.912 balanced accuracy: 0.720 kappa: 0.701\n", "[[953 2 1]\n", " [ 45 75 3]\n", " [ 48 6 67]]\n", "rebalanced\n", "thresholds: [0.6000000000000001, 0.3]\n", "accuracy: 0.924 balanced accuracy: 0.804 kappa: 0.768\n", "[[934 15 7]\n", " [ 23 92 8]\n", " [ 30 8 83]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.924 balanced accuracy: 0.804 kappa: 0.768\n", "[[934 15 7]\n", " [ 23 92 8]\n", " [ 30 8 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.3)\n", "accuracy: 0.893 balanced accuracy: 0.824 kappa: 0.714\n", "[[883 66 7]\n", " [ 10 105 8]\n", " [ 15 22 84]]\n", "--------------\n", 
"original\n", "accuracy: 0.863 balanced accuracy: 0.552 kappa: 0.456\n", "[[954 0 0]\n", " [ 67 57 0]\n", " [ 96 2 24]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.904 balanced accuracy: 0.740 kappa: 0.695\n", "[[932 19 3]\n", " [ 32 87 5]\n", " [ 47 9 66]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.904 balanced accuracy: 0.740 kappa: 0.695\n", "[[932 19 3]\n", " [ 32 87 5]\n", " [ 47 9 66]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.881 balanced accuracy: 0.787 kappa: 0.672\n", "[[880 57 17]\n", " [ 16 94 14]\n", " [ 30 9 83]]\n", "--------------\n", "original\n", "accuracy: 0.859 balanced accuracy: 0.557 kappa: 0.468\n", "[[947 3 2]\n", " [ 61 55 8]\n", " [ 91 4 29]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.874 balanced accuracy: 0.720 kappa: 0.635\n", "[[898 35 19]\n", " [ 27 67 30]\n", " [ 35 5 84]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.874 balanced accuracy: 0.720 kappa: 0.635\n", "[[898 35 19]\n", " [ 27 67 30]\n", " [ 35 5 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.823 balanced accuracy: 0.745 kappa: 0.566\n", "[[816 89 47]\n", " [ 16 72 36]\n", " [ 16 9 99]]\n", "--------------\n", "original\n", "accuracy: 0.899 balanced accuracy: 0.683 kappa: 0.647\n", "[[950 4 1]\n", " [ 36 83 3]\n", " [ 73 4 46]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.917 balanced accuracy: 0.835 kappa: 0.765\n", "[[911 20 24]\n", " [ 8 103 11]\n", " [ 26 10 87]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.25)\n", "accuracy: 0.922 balanced accuracy: 0.823 kappa: 0.770\n", "[[922 12 21]\n", " [ 12 99 11]\n", " [ 29 9 85]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.904 balanced 
accuracy: 0.837 kappa: 0.739\n", "[[892 36 27]\n", " [ 7 104 11]\n", " [ 19 15 89]]\n", "--------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "original\n", "accuracy: 0.877 balanced accuracy: 0.597 kappa: 0.543\n", "[[955 0 0]\n", " [ 67 51 5]\n", " [ 64 12 46]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.907 balanced accuracy: 0.765 kappa: 0.722\n", "[[927 27 1]\n", " [ 25 74 24]\n", " [ 25 9 88]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.910 balanced accuracy: 0.754 kappa: 0.722\n", "[[935 19 1]\n", " [ 30 76 17]\n", " [ 29 12 81]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.25)\n", "accuracy: 0.861 balanced accuracy: 0.783 kappa: 0.643\n", "[[855 99 1]\n", " [ 9 90 24]\n", " [ 14 20 88]]\n", "--------------\n", "original\n", "accuracy: 0.869 balanced accuracy: 0.578 kappa: 0.493\n", "[[953 0 2]\n", " [ 79 43 1]\n", " [ 75 0 47]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.894 balanced accuracy: 0.775 kappa: 0.691\n", "[[904 36 15]\n", " [ 22 94 7]\n", " [ 35 12 75]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.2)\n", "accuracy: 0.893 balanced accuracy: 0.775 kappa: 0.690\n", "[[903 25 27]\n", " [ 22 87 14]\n", " [ 33 7 82]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.874 balanced accuracy: 0.788 kappa: 0.661\n", "[[871 56 28]\n", " [ 12 96 15]\n", " [ 29 11 82]]\n", "--------------\n", "original\n", "accuracy: 0.900 balanced accuracy: 0.677 kappa: 0.649\n", "[[953 1 0]\n", " [ 73 40 10]\n", " [ 35 1 87]]\n", "rebalanced\n", "thresholds: [0.6000000000000001, 0.35000000000000003]\n", "accuracy: 0.910 balanced accuracy: 0.747 kappa: 0.719\n", "[[937 15 2]\n", " [ 45 62 16]\n", " [ 19 11 93]]\n", "global kappa rebalanced\n", "thresholds: (0.55, 0.3)\n", "accuracy: 0.906 balanced accuracy: 0.724 kappa: 0.698\n", "[[941 6 7]\n", " 
[ 52 49 22]\n", " [ 22 4 97]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.3)\n", "accuracy: 0.849 balanced accuracy: 0.793 kappa: 0.627\n", "[[834 111 9]\n", " [ 17 84 22]\n", " [ 4 18 101]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.571 kappa: 0.481\n", "[[951 2 1]\n", " [ 71 50 2]\n", " [ 85 0 38]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.911 balanced accuracy: 0.783 kappa: 0.730\n", "[[923 24 7]\n", " [ 21 94 8]\n", " [ 39 8 76]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.2)\n", "accuracy: 0.913 balanced accuracy: 0.761 kappa: 0.726\n", "[[936 10 8]\n", " [ 30 81 12]\n", " [ 42 2 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.889 balanced accuracy: 0.812 kappa: 0.698\n", "[[881 51 22]\n", " [ 10 101 12]\n", " [ 29 9 85]]\n", "--------------\n", "original\n", "accuracy: 0.873 balanced accuracy: 0.588 kappa: 0.518\n", "[[954 0 0]\n", " [ 67 56 1]\n", " [ 79 5 38]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.2]\n", "accuracy: 0.915 balanced accuracy: 0.799 kappa: 0.750\n", "[[922 21 11]\n", " [ 16 82 26]\n", " [ 27 1 94]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.2)\n", "accuracy: 0.915 balanced accuracy: 0.799 kappa: 0.750\n", "[[922 21 11]\n", " [ 16 82 26]\n", " [ 27 1 94]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.882 balanced accuracy: 0.795 kappa: 0.683\n", "[[879 60 15]\n", " [ 12 86 26]\n", " [ 22 6 94]]\n", "--------------\n", "original\n", "accuracy: 0.879 balanced accuracy: 0.611 kappa: 0.549\n", "[[953 0 3]\n", " [ 65 56 1]\n", " [ 70 6 46]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", "accuracy: 0.899 balanced accuracy: 0.753 kappa: 0.700\n", "[[921 26 9]\n", " [ 23 91 8]\n", " [ 27 28 67]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", 
"accuracy: 0.906 balanced accuracy: 0.775 kappa: 0.720\n", "[[921 20 15]\n", " [ 23 86 13]\n", " [ 26 16 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.863 balanced accuracy: 0.790 kappa: 0.648\n", "[[855 86 15]\n", " [ 10 99 13]\n", " [ 11 30 81]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.638 kappa: 0.590\n", "[[952 2 1]\n", " [ 43 79 0]\n", " [ 80 10 33]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.912 balanced accuracy: 0.766 kappa: 0.726\n", "[[932 16 7]\n", " [ 19 90 13]\n", " [ 44 7 72]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.901 balanced accuracy: 0.712 kappa: 0.678\n", "[[940 12 3]\n", " [ 26 89 7]\n", " [ 55 16 52]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.882 balanced accuracy: 0.782 kappa: 0.675\n", "[[884 61 10]\n", " [ 13 96 13]\n", " [ 27 18 78]]\n", "--------------\n", "original\n", "accuracy: 0.872 balanced accuracy: 0.583 kappa: 0.511\n", "[[954 0 1]\n", " [ 77 43 3]\n", " [ 68 5 49]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.887 balanced accuracy: 0.725 kappa: 0.660\n", "[[916 31 8]\n", " [ 30 70 23]\n", " [ 34 9 79]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.887 balanced accuracy: 0.725 kappa: 0.660\n", "[[916 31 8]\n", " [ 30 70 23]\n", " [ 34 9 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.25)\n", "accuracy: 0.807 balanced accuracy: 0.746 kappa: 0.541\n", "[[797 150 8]\n", " [ 9 91 23]\n", " [ 21 20 81]]\n", "--------------\n", "original\n", "accuracy: 0.853 balanced accuracy: 0.528 kappa: 0.410\n", "[[952 1 2]\n", " [114 6 2]\n", " [ 57 0 66]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.869 balanced accuracy: 0.703 kappa: 0.597\n", "[[900 39 16]\n", " [ 64 54 4]\n", " [ 
27 7 89]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.2)\n", "accuracy: 0.870 balanced accuracy: 0.715 kappa: 0.605\n", "[[896 34 25]\n", " [ 64 52 6]\n", " [ 24 3 96]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.2)\n", "accuracy: 0.812 balanced accuracy: 0.803 kappa: 0.562\n", "[[780 146 29]\n", " [ 20 96 6]\n", " [ 13 11 99]]\n", "--------------\n", "original\n", "accuracy: 0.903 balanced accuracy: 0.684 kappa: 0.661\n", "[[955 0 0]\n", " [ 50 67 5]\n", " [ 55 6 62]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.922 balanced accuracy: 0.789 kappa: 0.764\n", "[[937 8 10]\n", " [ 28 72 22]\n", " [ 20 5 98]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.921 balanced accuracy: 0.770 kappa: 0.753\n", "[[943 3 9]\n", " [ 33 67 22]\n", " [ 23 5 95]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.906 balanced accuracy: 0.794 kappa: 0.731\n", "[[912 31 12]\n", " [ 23 77 22]\n", " [ 13 12 98]]\n", "--------------\n", "original\n", "accuracy: 0.879 balanced accuracy: 0.612 kappa: 0.550\n", "[[952 3 0]\n", " [ 60 59 4]\n", " [ 77 1 44]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.898 balanced accuracy: 0.744 kappa: 0.687\n", "[[923 20 12]\n", " [ 24 86 13]\n", " [ 44 9 69]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.898 balanced accuracy: 0.744 kappa: 0.687\n", "[[923 20 12]\n", " [ 24 86 13]\n", " [ 44 9 69]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.871 balanced accuracy: 0.775 kappa: 0.648\n", "[[872 63 20]\n", " [ 14 88 21]\n", " [ 32 5 85]]\n", "--------------\n", "original\n", "accuracy: 0.877 balanced accuracy: 0.604 kappa: 0.542\n", "[[953 0 0]\n", " [ 57 62 4]\n", " [ 83 3 38]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 
0.908 balanced accuracy: 0.741 kappa: 0.716\n", "[[937 9 7]\n", " [ 25 74 24]\n", " [ 35 10 79]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.908 balanced accuracy: 0.741 kappa: 0.716\n", "[[937 9 7]\n", " [ 25 74 24]\n", " [ 35 10 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.897 balanced accuracy: 0.753 kappa: 0.700\n", "[[917 29 7]\n", " [ 19 80 24]\n", " [ 28 16 80]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.636 kappa: 0.590\n", "[[953 1 0]\n", " [ 62 59 3]\n", " [ 65 4 53]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.3]\n", "accuracy: 0.916 balanced accuracy: 0.774 kappa: 0.740\n", "[[934 17 3]\n", " [ 33 81 10]\n", " [ 28 10 84]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.916 balanced accuracy: 0.774 kappa: 0.740\n", "[[934 17 3]\n", " [ 33 81 10]\n", " [ 28 10 84]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.25)\n", "accuracy: 0.860 balanced accuracy: 0.781 kappa: 0.637\n", "[[854 92 8]\n", " [ 20 88 16]\n", " [ 13 19 90]]\n", "--------------\n", "original\n", "accuracy: 0.906 balanced accuracy: 0.703 kappa: 0.679\n", "[[951 3 2]\n", " [ 41 78 3]\n", " [ 52 12 58]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.901 balanced accuracy: 0.765 kappa: 0.706\n", "[[918 31 7]\n", " [ 20 84 18]\n", " [ 30 13 79]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.25)\n", "accuracy: 0.912 balanced accuracy: 0.763 kappa: 0.728\n", "[[934 15 7]\n", " [ 22 82 18]\n", " [ 34 10 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.25)\n", "accuracy: 0.873 balanced accuracy: 0.775 kappa: 0.661\n", "[[876 73 7]\n", " [ 12 92 18]\n", " [ 17 25 80]]\n", "--------------\n", "original\n", "accuracy: 0.908 balanced accuracy: 
0.707 kappa: 0.686\n", "[[952 0 2]\n", " [ 55 64 4]\n", " [ 43 6 74]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.3]\n", "accuracy: 0.922 balanced accuracy: 0.811 kappa: 0.770\n", "[[926 24 4]\n", " [ 19 94 10]\n", " [ 21 16 86]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.3)\n", "accuracy: 0.931 balanced accuracy: 0.796 kappa: 0.785\n", "[[945 5 4]\n", " [ 27 86 10]\n", " [ 26 11 86]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.3)\n", "accuracy: 0.902 balanced accuracy: 0.812 kappa: 0.727\n", "[[898 52 4]\n", " [ 15 98 10]\n", " [ 18 19 86]]\n", "--------------\n", "original\n", "accuracy: 0.864 balanced accuracy: 0.567 kappa: 0.475\n", "[[951 5 0]\n", " [ 64 56 2]\n", " [ 89 3 30]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.890 balanced accuracy: 0.716 kappa: 0.652\n", "[[924 28 4]\n", " [ 34 80 8]\n", " [ 48 10 64]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.890 balanced accuracy: 0.716 kappa: 0.652\n", "[[924 28 4]\n", " [ 34 80 8]\n", " [ 48 10 64]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.839 balanced accuracy: 0.718 kappa: 0.565\n", "[[853 75 28]\n", " [ 26 78 18]\n", " [ 33 13 76]]\n", "--------------\n", "original\n", "accuracy: 0.894 balanced accuracy: 0.659 kappa: 0.622\n", "[[953 0 2]\n", " [ 59 60 3]\n", " [ 57 6 60]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.907 balanced accuracy: 0.788 kappa: 0.725\n", "[[917 23 15]\n", " [ 33 77 12]\n", " [ 22 6 95]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.907 balanced accuracy: 0.788 kappa: 0.725\n", "[[917 23 15]\n", " [ 33 77 12]\n", " [ 22 6 95]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.25)\n", "accuracy: 0.845 balanced accuracy: 0.814 kappa: 0.625\n", "[[820 119 16]\n", " [ 12 98 12]\n", " [ 
7 20 96]]\n", "--------------\n", "original\n", "accuracy: 0.859 balanced accuracy: 0.542 kappa: 0.444\n", "[[954 0 0]\n", " [ 70 47 5]\n", " [ 92 2 30]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.2]\n", "accuracy: 0.884 balanced accuracy: 0.710 kappa: 0.645\n", "[[917 21 16]\n", " [ 31 66 25]\n", " [ 41 5 78]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.25)\n", "accuracy: 0.882 balanced accuracy: 0.700 kappa: 0.637\n", "[[918 29 7]\n", " [ 31 74 17]\n", " [ 41 17 66]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.2)\n", "accuracy: 0.808 balanced accuracy: 0.717 kappa: 0.530\n", "[[810 125 19]\n", " [ 18 79 25]\n", " [ 20 23 81]]\n", "--------------\n", "original\n", "accuracy: 0.848 balanced accuracy: 0.502 kappa: 0.369\n", "[[955 0 0]\n", " [105 14 3]\n", " [ 75 0 48]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.895 balanced accuracy: 0.738 kappa: 0.676\n", "[[921 27 7]\n", " [ 38 74 10]\n", " [ 33 11 79]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.2)\n", "accuracy: 0.896 balanced accuracy: 0.745 kappa: 0.681\n", "[[919 20 16]\n", " [ 38 68 16]\n", " [ 31 4 88]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.872 balanced accuracy: 0.775 kappa: 0.650\n", "[[873 61 21]\n", " [ 23 83 16]\n", " [ 23 10 90]]\n", "--------------\n", "original\n", "accuracy: 0.856 balanced accuracy: 0.532 kappa: 0.420\n", "[[954 1 0]\n", " [ 83 37 3]\n", " [ 86 0 36]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.891 balanced accuracy: 0.717 kappa: 0.657\n", "[[924 17 14]\n", " [ 47 66 10]\n", " [ 31 12 79]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.25)\n", "accuracy: 0.891 balanced accuracy: 0.717 kappa: 0.657\n", "[[924 17 14]\n", " [ 47 66 10]\n", " [ 31 12 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 
0.848 balanced accuracy: 0.749 kappa: 0.595\n", "[[851 73 31]\n", " [ 33 74 16]\n", " [ 18 12 92]]\n", "--------------\n", "original\n", "accuracy: 0.846 balanced accuracy: 0.508 kappa: 0.390\n", "[[950 0 2]\n", " [ 96 22 7]\n", " [ 74 6 43]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.25]\n", "accuracy: 0.863 balanced accuracy: 0.707 kappa: 0.599\n", "[[888 44 20]\n", " [ 46 62 17]\n", " [ 29 9 85]]\n", "global kappa rebalanced\n", "thresholds: (0.7000000000000001, 0.2)\n", "accuracy: 0.861 balanced accuracy: 0.711 kappa: 0.597\n", "[[884 38 30]\n", " [ 46 58 21]\n", " [ 29 3 91]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.8, 0.2)\n", "accuracy: 0.795 balanced accuracy: 0.734 kappa: 0.513\n", "[[783 135 34]\n", " [ 24 80 21]\n", " [ 20 12 91]]\n", "--------------\n", "original\n", "accuracy: 0.858 balanced accuracy: 0.558 kappa: 0.454\n", "[[946 2 5]\n", " [ 88 36 0]\n", " [ 73 2 48]]\n", "rebalanced\n", "thresholds: [0.7000000000000001, 0.3]\n", "accuracy: 0.895 balanced accuracy: 0.749 kappa: 0.688\n", "[[915 29 9]\n", " [ 36 83 5]\n", " [ 22 25 76]]\n", "global kappa rebalanced\n", "thresholds: (0.6500000000000001, 0.3)\n", "accuracy: 0.902 balanced accuracy: 0.738 kappa: 0.695\n", "[[930 14 9]\n", " [ 42 77 5]\n", " [ 31 16 76]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.887 balanced accuracy: 0.781 kappa: 0.683\n", "[[890 39 24]\n", " [ 29 81 14]\n", " [ 18 12 93]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.644 kappa: 0.588\n", "[[949 4 2]\n", " [ 56 67 0]\n", " [ 72 2 48]]\n", "rebalanced\n", "thresholds: [0.6500000000000001, 0.25]\n", "accuracy: 0.912 balanced accuracy: 0.800 kappa: 0.737\n", "[[919 21 15]\n", " [ 28 87 8]\n", " [ 31 2 89]]\n", "global kappa rebalanced\n", "thresholds: (0.6000000000000001, 0.25)\n", "accuracy: 0.909 balanced accuracy: 0.768 kappa: 0.716\n", "[[928 13 14]\n", " [ 39 76 8]\n", " [ 33 2 87]]\n", "global 
balanced_accuracy rebalanced\n", "thresholds: (0.7500000000000001, 0.2)\n", "accuracy: 0.890 balanced accuracy: 0.828 kappa: 0.704\n", "[[876 47 32]\n", " [ 13 97 13]\n", " [ 21 6 95]]\n" ] } ], "source": [ "accum_80_10_10 = []\n", "\n", "for rep in range(50):\n", " print('--------------')\n", " # Generate a ternary imbalanced classification problem\n", " X, y = make_classification(n_samples=6000, n_features=20,\n", " n_informative=10, n_redundant=0, n_classes=3, \n", " random_state=0xf00d+rep, shuffle=False, weights = [0.8, 0.1, 0.1])\n", " run_ternary_experiment(X,y,accum_80_10_10)\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "accum = accum_80_10_10\n", "figsize(9,6)\n", "scatter([x['orig-kappa'] for x in accum],[x['shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('greedy shift');\n", "title('80-10-10');" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-k-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-k-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-k-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-kappa');\n", "title('80-10-10');" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-balanced');\n", "title('80-10-10');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same conclusions as before (good thing!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 is the majority class" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------------\n", "original\n", "accuracy: 0.877 balanced accuracy: 0.604 kappa: 0.534\n", "[[ 54 1 69]\n", " [ 1 46 74]\n", " [ 3 0 952]]\n", "rebalanced\n", "thresholds: [0.3, 0.7000000000000001]\n", "accuracy: 0.893 balanced accuracy: 0.742 kappa: 0.681\n", "[[ 74 25 25]\n", " [ 8 81 32]\n", " [ 10 28 917]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.7000000000000001)\n", "accuracy: 0.897 balanced accuracy: 0.755 kappa: 0.694\n", "[[ 84 15 25]\n", " [ 13 76 32]\n", " [ 16 22 917]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.840 balanced accuracy: 0.767 kappa: 0.597\n", "[[ 95 16 13]\n", " [ 21 80 20]\n", " [ 34 88 833]]\n", "--------------\n", "original\n", "accuracy: 0.857 balanced accuracy: 0.534 kappa: 0.425\n", "[[ 47 0 76]\n", " [ 3 27 93]\n", " [ 0 0 954]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.897 balanced accuracy: 0.791 kappa: 0.705\n", "[[ 79 10 34]\n", " [ 11 97 15]\n", " [ 11 43 900]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.906 
balanced accuracy: 0.767 kappa: 0.712\n", "[[ 78 6 39]\n", " [ 11 86 26]\n", " [ 10 21 923]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.870 balanced accuracy: 0.797 kappa: 0.658\n", "[[ 89 9 25]\n", " [ 18 94 11]\n", " [ 27 66 861]]\n", "--------------\n", "original\n", "accuracy: 0.869 balanced accuracy: 0.580 kappa: 0.497\n", "[[ 54 0 67]\n", " [ 2 37 86]\n", " [ 1 1 952]]\n", "rebalanced\n", "thresholds: [0.2, 0.7000000000000001]\n", "accuracy: 0.898 balanced accuracy: 0.781 kappa: 0.705\n", "[[ 93 4 24]\n", " [ 20 78 27]\n", " [ 21 26 907]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.7000000000000001)\n", "accuracy: 0.898 balanced accuracy: 0.781 kappa: 0.705\n", "[[ 93 4 24]\n", " [ 20 78 27]\n", " [ 21 26 907]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.839 balanced accuracy: 0.798 kappa: 0.610\n", "[[ 95 14 12]\n", " [ 20 94 11]\n", " [ 23 113 818]]\n", "--------------\n", "original\n", "accuracy: 0.878 balanced accuracy: 0.610 kappa: 0.543\n", "[[ 50 1 72]\n", " [ 1 52 69]\n", " [ 2 1 952]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.903 balanced accuracy: 0.763 kappa: 0.701\n", "[[ 78 9 36]\n", " [ 2 84 36]\n", " [ 15 18 922]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.904 balanced accuracy: 0.775 kappa: 0.707\n", "[[ 86 3 34]\n", " [ 5 81 36]\n", " [ 25 12 918]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.863 balanced accuracy: 0.791 kappa: 0.637\n", "[[ 87 12 24]\n", " [ 5 94 23]\n", " [ 30 70 855]]\n", "--------------\n", "original\n", "accuracy: 0.870 balanced accuracy: 0.583 kappa: 0.503\n", "[[ 61 1 60]\n", " [ 3 31 89]\n", " [ 2 1 952]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.893 balanced accuracy: 0.732 kappa: 0.669\n", "[[ 77 15 30]\n", " [ 4 74 45]\n", " [ 8 26 
921]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.893 balanced accuracy: 0.732 kappa: 0.669\n", "[[ 77 15 30]\n", " [ 4 74 45]\n", " [ 8 26 921]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.840 balanced accuracy: 0.788 kappa: 0.604\n", "[[ 98 15 9]\n", " [ 16 86 21]\n", " [ 30 101 824]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.617 kappa: 0.569\n", "[[ 35 11 77]\n", " [ 3 69 50]\n", " [ 0 0 955]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.881 balanced accuracy: 0.701 kappa: 0.641\n", "[[ 72 10 41]\n", " [ 33 68 21]\n", " [ 9 29 917]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.892 balanced accuracy: 0.682 kappa: 0.646\n", "[[ 56 13 54]\n", " [ 18 74 30]\n", " [ 3 12 940]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.8)\n", "accuracy: 0.812 balanced accuracy: 0.697 kappa: 0.531\n", "[[ 58 40 25]\n", " [ 18 92 12]\n", " [ 4 126 825]]\n", "--------------\n", "original\n", "accuracy: 0.859 balanced accuracy: 0.552 kappa: 0.451\n", "[[ 53 0 71]\n", " [ 1 29 95]\n", " [ 0 2 949]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.887 balanced accuracy: 0.762 kappa: 0.677\n", "[[ 83 16 25]\n", " [ 13 84 28]\n", " [ 13 41 897]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.7000000000000001)\n", "accuracy: 0.887 balanced accuracy: 0.762 kappa: 0.677\n", "[[ 83 16 25]\n", " [ 13 84 28]\n", " [ 13 41 897]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.8)\n", "accuracy: 0.823 balanced accuracy: 0.772 kappa: 0.579\n", "[[ 83 30 11]\n", " [ 13 100 12]\n", " [ 13 134 804]]\n", "--------------\n", "original\n", "accuracy: 0.865 balanced accuracy: 0.566 kappa: 0.484\n", "[[ 40 6 77]\n", " [ 4 46 73]\n", " [ 0 2 952]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.877 balanced 
accuracy: 0.734 kappa: 0.650\n", "[[ 70 22 31]\n", " [ 19 85 19]\n", " [ 9 47 898]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.886 balanced accuracy: 0.716 kappa: 0.654\n", "[[ 61 21 41]\n", " [ 13 85 25]\n", " [ 4 33 917]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.851 balanced accuracy: 0.727 kappa: 0.603\n", "[[ 70 30 23]\n", " [ 19 87 17]\n", " [ 9 81 864]]\n", "--------------\n", "original\n", "accuracy: 0.881 balanced accuracy: 0.613 kappa: 0.559\n", "[[ 49 7 68]\n", " [ 3 54 65]\n", " [ 0 0 954]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.896 balanced accuracy: 0.739 kappa: 0.682\n", "[[ 78 12 34]\n", " [ 13 76 33]\n", " [ 13 20 921]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.896 balanced accuracy: 0.739 kappa: 0.682\n", "[[ 78 12 34]\n", " [ 13 76 33]\n", " [ 13 20 921]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.863 balanced accuracy: 0.756 kappa: 0.632\n", "[[ 78 27 19]\n", " [ 13 89 20]\n", " [ 16 70 868]]\n", "--------------\n", "original\n", "accuracy: 0.922 balanced accuracy: 0.747 kappa: 0.738\n", "[[ 87 2 33]\n", " [ 7 64 50]\n", " [ 1 0 956]]\n", "rebalanced\n", "thresholds: [0.3, 0.7000000000000001]\n", "accuracy: 0.948 balanced accuracy: 0.868 kappa: 0.847\n", "[[104 11 7]\n", " [ 9 93 19]\n", " [ 5 11 941]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.7000000000000001)\n", "accuracy: 0.948 balanced accuracy: 0.868 kappa: 0.847\n", "[[104 11 7]\n", " [ 9 93 19]\n", " [ 5 11 941]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.8)\n", "accuracy: 0.899 balanced accuracy: 0.869 kappa: 0.737\n", "[[104 13 5]\n", " [ 9 102 10]\n", " [ 5 79 873]]\n", "--------------\n", "original\n", "accuracy: 0.858 balanced accuracy: 0.545 kappa: 0.446\n", "[[ 48 4 72]\n", " [ 1 31 92]\n", " [ 1 0 
951]]\n", "rebalanced\n", "thresholds: [0.3, 0.7000000000000001]\n", "accuracy: 0.879 balanced accuracy: 0.729 kappa: 0.645\n", "[[ 83 25 16]\n", " [ 3 71 50]\n", " [ 9 42 901]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.7000000000000001)\n", "accuracy: 0.883 balanced accuracy: 0.745 kappa: 0.658\n", "[[ 96 12 16]\n", " [ 10 64 50]\n", " [ 19 33 900]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.830 balanced accuracy: 0.807 kappa: 0.597\n", "[[109 7 8]\n", " [ 19 87 18]\n", " [ 46 106 800]]\n", "--------------\n", "original\n", "accuracy: 0.907 balanced accuracy: 0.705 kappa: 0.686\n", "[[ 79 11 34]\n", " [ 6 59 58]\n", " [ 1 2 950]]\n", "rebalanced\n", "thresholds: [0.35000000000000003, 0.6500000000000001]\n", "accuracy: 0.907 balanced accuracy: 0.773 kappa: 0.724\n", "[[ 84 18 22]\n", " [ 8 83 32]\n", " [ 2 29 922]]\n", "global kappa rebalanced\n", "thresholds: (0.35000000000000003, 0.55)\n", "accuracy: 0.916 balanced accuracy: 0.753 kappa: 0.734\n", "[[ 83 16 25]\n", " [ 8 74 41]\n", " [ 2 9 942]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.35000000000000003, 0.7500000000000001)\n", "accuracy: 0.868 balanced accuracy: 0.769 kappa: 0.645\n", "[[ 84 24 16]\n", " [ 8 88 27]\n", " [ 2 81 870]]\n", "--------------\n", "original\n", "accuracy: 0.864 balanced accuracy: 0.565 kappa: 0.475\n", "[[ 45 1 77]\n", " [ 3 41 79]\n", " [ 3 0 951]]\n", "rebalanced\n", "thresholds: [0.3, 0.7000000000000001]\n", "accuracy: 0.872 balanced accuracy: 0.724 kappa: 0.627\n", "[[ 68 24 31]\n", " [ 5 84 34]\n", " [ 12 48 894]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.6500000000000001)\n", "accuracy: 0.882 balanced accuracy: 0.719 kappa: 0.640\n", "[[ 86 3 34]\n", " [ 19 62 42]\n", " [ 31 13 910]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.858 balanced accuracy: 0.754 kappa: 0.617\n", "[[ 92 11 20]\n", " [ 19 75 29]\n", " [ 38 54 862]]\n", 
"--------------\n", "original\n", "accuracy: 0.880 balanced accuracy: 0.612 kappa: 0.558\n", "[[ 59 3 61]\n", " [ 8 44 71]\n", " [ 1 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.901 balanced accuracy: 0.779 kappa: 0.708\n", "[[ 86 7 30]\n", " [ 14 84 25]\n", " [ 16 27 911]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.901 balanced accuracy: 0.779 kappa: 0.708\n", "[[ 86 7 30]\n", " [ 14 84 25]\n", " [ 16 27 911]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.8)\n", "accuracy: 0.832 balanced accuracy: 0.790 kappa: 0.595\n", "[[104 4 15]\n", " [ 31 83 9]\n", " [ 64 79 811]]\n", "--------------\n", "original\n", "accuracy: 0.881 balanced accuracy: 0.611 kappa: 0.558\n", "[[ 51 4 68]\n", " [ 7 51 64]\n", " [ 0 0 955]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.901 balanced accuracy: 0.742 kappa: 0.691\n", "[[ 87 7 29]\n", " [ 13 67 42]\n", " [ 4 24 927]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.907 balanced accuracy: 0.731 kappa: 0.699\n", "[[ 85 4 34]\n", " [ 13 63 46]\n", " [ 2 12 941]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.863 balanced accuracy: 0.755 kappa: 0.627\n", "[[ 96 7 20]\n", " [ 25 70 27]\n", " [ 20 66 869]]\n", "--------------\n", "original\n", "accuracy: 0.859 balanced accuracy: 0.540 kappa: 0.440\n", "[[ 36 1 86]\n", " [ 5 40 77]\n", " [ 0 0 955]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.884 balanced accuracy: 0.764 kappa: 0.673\n", "[[ 82 19 22]\n", " [ 15 84 23]\n", " [ 22 38 895]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.897 balanced accuracy: 0.739 kappa: 0.687\n", "[[ 82 14 27]\n", " [ 15 71 36]\n", " [ 15 16 924]]\n", "global balanced_accuracy 
rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.846 balanced accuracy: 0.762 kappa: 0.607\n", "[[ 82 26 15]\n", " [ 15 90 17]\n", " [ 22 90 843]]\n", "--------------\n", "original\n", "accuracy: 0.877 balanced accuracy: 0.623 kappa: 0.549\n", "[[ 53 1 69]\n", " [ 0 55 67]\n", " [ 4 7 944]]\n", "rebalanced\n", "thresholds: [0.3, 0.7000000000000001]\n", "accuracy: 0.860 balanced accuracy: 0.728 kappa: 0.598\n", "[[ 77 14 32]\n", " [ 2 78 42]\n", " [ 21 57 877]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.875 balanced accuracy: 0.708 kappa: 0.609\n", "[[ 78 4 41]\n", " [ 3 66 53]\n", " [ 23 26 906]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.840 balanced accuracy: 0.774 kappa: 0.585\n", "[[ 95 7 21]\n", " [ 3 83 36]\n", " [ 56 69 830]]\n", "--------------\n", "original\n", "accuracy: 0.886 balanced accuracy: 0.632 kappa: 0.583\n", "[[ 60 5 57]\n", " [ 3 50 70]\n", " [ 2 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.912 balanced accuracy: 0.766 kappa: 0.728\n", "[[ 91 3 28]\n", " [ 20 71 32]\n", " [ 7 16 932]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.912 balanced accuracy: 0.766 kappa: 0.728\n", "[[ 91 3 28]\n", " [ 20 71 32]\n", " [ 7 16 932]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.885 balanced accuracy: 0.784 kappa: 0.682\n", "[[ 92 11 19]\n", " [ 20 82 21]\n", " [ 8 59 888]]\n", "--------------\n", "original\n", "accuracy: 0.898 balanced accuracy: 0.675 kappa: 0.643\n", "[[ 75 6 41]\n", " [ 7 50 64]\n", " [ 2 2 953]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.916 balanced accuracy: 0.795 kappa: 0.748\n", "[[ 94 13 15]\n", " [ 12 78 31]\n", " [ 9 21 927]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.919 balanced 
accuracy: 0.774 kappa: 0.747\n", "[[ 94 9 19]\n", " [ 12 69 40]\n", " [ 8 9 940]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7000000000000001)\n", "accuracy: 0.907 balanced accuracy: 0.807 kappa: 0.732\n", "[[103 4 15]\n", " [ 20 76 25]\n", " [ 21 27 909]]\n", "--------------\n", "original\n", "accuracy: 0.905 balanced accuracy: 0.712 kappa: 0.676\n", "[[ 65 0 57]\n", " [ 5 75 42]\n", " [ 2 8 946]]\n", "rebalanced\n", "thresholds: [0.3, 0.6000000000000001]\n", "accuracy: 0.918 balanced accuracy: 0.789 kappa: 0.748\n", "[[ 85 5 32]\n", " [ 9 85 28]\n", " [ 6 18 932]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.918 balanced accuracy: 0.789 kappa: 0.748\n", "[[ 85 5 32]\n", " [ 9 85 28]\n", " [ 6 18 932]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.881 balanced accuracy: 0.817 kappa: 0.684\n", "[[ 94 12 16]\n", " [ 13 94 15]\n", " [ 17 70 869]]\n", "--------------\n", "original\n", "accuracy: 0.865 balanced accuracy: 0.568 kappa: 0.478\n", "[[ 53 1 70]\n", " [ 2 34 86]\n", " [ 2 1 951]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.889 balanced accuracy: 0.722 kappa: 0.654\n", "[[ 77 9 38]\n", " [ 8 71 43]\n", " [ 12 23 919]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.894 balanced accuracy: 0.741 kappa: 0.671\n", "[[ 84 4 36]\n", " [ 8 71 43]\n", " [ 20 16 918]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.849 balanced accuracy: 0.767 kappa: 0.605\n", "[[ 87 16 21]\n", " [ 8 87 27]\n", " [ 23 86 845]]\n", "--------------\n", "original\n", "accuracy: 0.902 balanced accuracy: 0.683 kappa: 0.656\n", "[[ 55 6 61]\n", " [ 4 73 45]\n", " [ 1 0 955]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.910 balanced accuracy: 0.767 kappa: 0.726\n", "[[ 86 11 25]\n", " [ 18 76 28]\n", " [ 8 18 
930]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.915 balanced accuracy: 0.745 kappa: 0.727\n", "[[ 75 12 35]\n", " [ 12 77 33]\n", " [ 5 5 946]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.894 balanced accuracy: 0.763 kappa: 0.694\n", "[[ 95 8 19]\n", " [ 29 68 25]\n", " [ 15 31 910]]\n", "--------------\n", "original\n", "accuracy: 0.859 balanced accuracy: 0.548 kappa: 0.450\n", "[[ 40 3 82]\n", " [ 2 40 81]\n", " [ 1 0 951]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.881 balanced accuracy: 0.716 kappa: 0.641\n", "[[ 66 18 41]\n", " [ 8 82 33]\n", " [ 7 36 909]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.890 balanced accuracy: 0.697 kappa: 0.648\n", "[[ 66 16 43]\n", " [ 8 72 43]\n", " [ 7 15 930]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.8)\n", "accuracy: 0.769 balanced accuracy: 0.710 kappa: 0.478\n", "[[ 66 43 16]\n", " [ 8 99 16]\n", " [ 7 187 758]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.652 kappa: 0.607\n", "[[ 65 1 57]\n", " [ 1 53 69]\n", " [ 1 1 952]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.909 balanced accuracy: 0.790 kappa: 0.728\n", "[[ 93 5 25]\n", " [ 10 80 33]\n", " [ 15 21 918]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.909 balanced accuracy: 0.790 kappa: 0.728\n", "[[ 93 5 25]\n", " [ 10 80 33]\n", " [ 15 21 918]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.875 balanced accuracy: 0.822 kappa: 0.677\n", "[[ 95 15 13]\n", " [ 10 98 15]\n", " [ 16 81 857]]\n", "--------------\n", "original\n", "accuracy: 0.866 balanced accuracy: 0.567 kappa: 0.485\n", "[[ 30 6 88]\n", " [ 1 57 66]\n", " [ 0 0 952]]\n", "rebalanced\n", "thresholds: [0.2, 0.6500000000000001]\n", "accuracy: 
0.883 balanced accuracy: 0.691 kappa: 0.629\n", "[[ 65 3 56]\n", " [ 18 72 34]\n", " [ 16 13 923]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.6500000000000001)\n", "accuracy: 0.883 balanced accuracy: 0.691 kappa: 0.629\n", "[[ 65 3 56]\n", " [ 18 72 34]\n", " [ 16 13 923]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.834 balanced accuracy: 0.741 kappa: 0.578\n", "[[ 72 21 31]\n", " [ 18 95 11]\n", " [ 22 96 834]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.650 kappa: 0.606\n", "[[ 47 2 72]\n", " [ 1 70 53]\n", " [ 0 2 953]]\n", "rebalanced\n", "thresholds: [0.2, 0.7000000000000001]\n", "accuracy: 0.894 balanced accuracy: 0.790 kappa: 0.697\n", "[[ 85 7 29]\n", " [ 13 90 21]\n", " [ 30 27 898]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.6500000000000001)\n", "accuracy: 0.898 balanced accuracy: 0.765 kappa: 0.695\n", "[[ 81 5 35]\n", " [ 13 83 28]\n", " [ 26 15 914]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.841 balanced accuracy: 0.805 kappa: 0.612\n", "[[ 86 15 20]\n", " [ 13 105 6]\n", " [ 34 103 818]]\n", "--------------\n", "original\n", "accuracy: 0.896 balanced accuracy: 0.674 kappa: 0.634\n", "[[ 57 4 62]\n", " [ 3 69 50]\n", " [ 1 5 949]]\n", "rebalanced\n", "thresholds: [0.2, 0.6500000000000001]\n", "accuracy: 0.918 balanced accuracy: 0.800 kappa: 0.754\n", "[[ 85 5 33]\n", " [ 12 90 20]\n", " [ 13 15 927]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.915 balanced accuracy: 0.784 kappa: 0.741\n", "[[ 75 11 37]\n", " [ 8 94 20]\n", " [ 4 22 929]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.899 balanced accuracy: 0.830 kappa: 0.726\n", "[[ 90 14 19]\n", " [ 12 101 9]\n", " [ 18 49 888]]\n", "--------------\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "original\n", "accuracy: 0.839 balanced accuracy: 0.479 
kappa: 0.326\n", "[[ 41 0 83]\n", " [ 3 13 107]\n", " [ 0 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.879 balanced accuracy: 0.705 kappa: 0.623\n", "[[ 76 9 39]\n", " [ 6 67 50]\n", " [ 13 28 912]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.7000000000000001)\n", "accuracy: 0.880 balanced accuracy: 0.710 kappa: 0.627\n", "[[ 83 3 38]\n", " [ 11 62 50]\n", " [ 22 20 911]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.800 balanced accuracy: 0.740 kappa: 0.521\n", "[[ 88 20 16]\n", " [ 11 84 28]\n", " [ 24 141 788]]\n", "--------------\n", "original\n", "accuracy: 0.887 balanced accuracy: 0.634 kappa: 0.581\n", "[[ 52 1 70]\n", " [ 1 59 63]\n", " [ 1 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.913 balanced accuracy: 0.789 kappa: 0.740\n", "[[ 93 8 22]\n", " [ 13 79 31]\n", " [ 12 18 924]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.907 balanced accuracy: 0.739 kappa: 0.705\n", "[[ 74 12 37]\n", " [ 8 78 37]\n", " [ 7 10 937]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.876 balanced accuracy: 0.799 kappa: 0.675\n", "[[104 11 8]\n", " [ 26 79 18]\n", " [ 31 55 868]]\n", "--------------\n", "original\n", "accuracy: 0.899 balanced accuracy: 0.674 kappa: 0.646\n", "[[ 63 3 55]\n", " [ 11 62 50]\n", " [ 1 1 954]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.938 balanced accuracy: 0.826 kappa: 0.817\n", "[[ 87 21 13]\n", " [ 16 95 12]\n", " [ 1 11 944]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.938 balanced accuracy: 0.826 kappa: 0.817\n", "[[ 87 21 13]\n", " [ 16 95 12]\n", " [ 1 11 944]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7500000000000001)\n", "accuracy: 0.916 balanced accuracy: 0.831 kappa: 0.768\n", "[[ 87 26 8]\n", " [ 16 101 6]\n", 
" [ 1 44 911]]\n", "--------------\n", "original\n", "accuracy: 0.895 balanced accuracy: 0.662 kappa: 0.623\n", "[[ 43 2 78]\n", " [ 3 78 41]\n", " [ 2 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.903 balanced accuracy: 0.813 kappa: 0.728\n", "[[ 80 14 29]\n", " [ 11 103 8]\n", " [ 22 32 901]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.913 balanced accuracy: 0.774 kappa: 0.731\n", "[[ 77 6 40]\n", " [ 10 88 24]\n", " [ 15 9 931]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.880 balanced accuracy: 0.808 kappa: 0.681\n", "[[ 87 11 25]\n", " [ 18 98 6]\n", " [ 41 43 871]]\n", "--------------\n", "original\n", "accuracy: 0.870 balanced accuracy: 0.578 kappa: 0.497\n", "[[ 28 2 92]\n", " [ 1 62 60]\n", " [ 0 1 954]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.894 balanced accuracy: 0.773 kappa: 0.690\n", "[[ 80 10 32]\n", " [ 10 88 25]\n", " [ 17 33 905]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.908 balanced accuracy: 0.762 kappa: 0.716\n", "[[ 77 8 37]\n", " [ 10 84 29]\n", " [ 13 13 929]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.874 balanced accuracy: 0.796 kappa: 0.664\n", "[[100 5 17]\n", " [ 21 81 21]\n", " [ 40 47 868]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.621 kappa: 0.562\n", "[[ 61 2 60]\n", " [ 1 45 76]\n", " [ 0 2 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.904 balanced accuracy: 0.796 kappa: 0.725\n", "[[ 91 14 18]\n", " [ 14 85 23]\n", " [ 3 43 909]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.920 balanced accuracy: 0.788 kappa: 0.756\n", "[[ 91 10 22]\n", " [ 14 79 29]\n", " [ 1 20 934]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 
0.7500000000000001)\n", "accuracy: 0.868 balanced accuracy: 0.840 kappa: 0.669\n", "[[102 8 13]\n", " [ 15 99 8]\n", " [ 9 106 840]]\n", "--------------\n", "original\n", "accuracy: 0.861 balanced accuracy: 0.571 kappa: 0.472\n", "[[ 33 0 89]\n", " [ 3 56 64]\n", " [ 1 10 944]]\n", "rebalanced\n", "thresholds: [0.2, 0.6500000000000001]\n", "accuracy: 0.876 balanced accuracy: 0.730 kappa: 0.627\n", "[[ 67 2 53]\n", " [ 10 86 27]\n", " [ 25 32 898]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.881 balanced accuracy: 0.727 kappa: 0.634\n", "[[ 63 2 57]\n", " [ 8 88 27]\n", " [ 14 35 906]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.8)\n", "accuracy: 0.801 balanced accuracy: 0.765 kappa: 0.533\n", "[[ 91 5 26]\n", " [ 23 90 10]\n", " [ 65 110 780]]\n", "--------------\n", "original\n", "accuracy: 0.864 balanced accuracy: 0.561 kappa: 0.474\n", "[[ 48 1 73]\n", " [ 6 36 82]\n", " [ 1 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.892 balanced accuracy: 0.722 kappa: 0.663\n", "[[ 80 7 35]\n", " [ 13 67 44]\n", " [ 13 17 924]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.892 balanced accuracy: 0.722 kappa: 0.663\n", "[[ 80 7 35]\n", " [ 13 67 44]\n", " [ 13 17 924]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.851 balanced accuracy: 0.734 kappa: 0.594\n", "[[ 80 16 26]\n", " [ 13 80 31]\n", " [ 14 79 861]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.650 kappa: 0.614\n", "[[ 48 7 68]\n", " [ 6 69 48]\n", " [ 0 1 953]]\n", "rebalanced\n", "thresholds: [0.3, 0.6000000000000001]\n", "accuracy: 0.902 balanced accuracy: 0.709 kappa: 0.691\n", "[[ 70 18 35]\n", " [ 23 70 30]\n", " [ 0 11 943]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.902 balanced accuracy: 0.709 kappa: 0.691\n", 
"[[ 70 18 35]\n", " [ 23 70 30]\n", " [ 0 11 943]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7000000000000001)\n", "accuracy: 0.894 balanced accuracy: 0.748 kappa: 0.697\n", "[[ 71 30 22]\n", " [ 23 87 13]\n", " [ 0 39 915]]\n", "--------------\n", "original\n", "accuracy: 0.885 balanced accuracy: 0.629 kappa: 0.576\n", "[[ 56 0 67]\n", " [ 5 53 64]\n", " [ 1 1 953]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.912 balanced accuracy: 0.764 kappa: 0.727\n", "[[ 84 8 31]\n", " [ 11 77 34]\n", " [ 7 14 934]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.916 balanced accuracy: 0.753 kappa: 0.729\n", "[[ 87 1 35]\n", " [ 16 69 37]\n", " [ 7 5 943]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.896 balanced accuracy: 0.809 kappa: 0.714\n", "[[102 6 15]\n", " [ 24 81 17]\n", " [ 19 44 892]]\n", "--------------\n", "original\n", "accuracy: 0.879 balanced accuracy: 0.609 kappa: 0.550\n", "[[ 37 5 82]\n", " [ 2 65 56]\n", " [ 0 0 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.913 balanced accuracy: 0.797 kappa: 0.745\n", "[[ 83 8 33]\n", " [ 15 93 15]\n", " [ 13 20 920]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.913 balanced accuracy: 0.797 kappa: 0.745\n", "[[ 83 8 33]\n", " [ 15 93 15]\n", " [ 13 20 920]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.879 balanced accuracy: 0.822 kappa: 0.687\n", "[[ 97 9 18]\n", " [ 19 96 8]\n", " [ 28 63 862]]\n", "--------------\n", "original\n", "accuracy: 0.865 balanced accuracy: 0.564 kappa: 0.469\n", "[[ 47 0 75]\n", " [ 0 38 85]\n", " [ 1 1 953]]\n", "rebalanced\n", "thresholds: [0.25, 0.7000000000000001]\n", "accuracy: 0.888 balanced accuracy: 0.735 kappa: 0.659\n", "[[ 82 9 31]\n", " [ 10 71 42]\n", " [ 11 31 913]]\n", "global kappa 
rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.900 balanced accuracy: 0.728 kappa: 0.677\n", "[[ 80 2 40]\n", " [ 10 68 45]\n", " [ 11 12 932]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.860 balanced accuracy: 0.747 kappa: 0.615\n", "[[ 92 8 22]\n", " [ 19 71 33]\n", " [ 27 59 869]]\n", "--------------\n", "original\n", "accuracy: 0.860 balanced accuracy: 0.559 kappa: 0.459\n", "[[ 34 2 87]\n", " [ 1 50 72]\n", " [ 3 3 948]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.882 balanced accuracy: 0.747 kappa: 0.651\n", "[[ 87 7 29]\n", " [ 10 73 40]\n", " [ 24 32 898]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6000000000000001)\n", "accuracy: 0.887 balanced accuracy: 0.726 kappa: 0.649\n", "[[ 83 5 35]\n", " [ 10 67 46]\n", " [ 18 22 914]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.823 balanced accuracy: 0.756 kappa: 0.560\n", "[[ 95 10 18]\n", " [ 19 79 25]\n", " [ 52 89 813]]\n", "--------------\n", "original\n", "accuracy: 0.902 balanced accuracy: 0.685 kappa: 0.651\n", "[[ 78 2 42]\n", " [ 1 51 70]\n", " [ 2 1 953]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.923 balanced accuracy: 0.818 kappa: 0.768\n", "[[101 6 15]\n", " [ 4 80 38]\n", " [ 11 18 927]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.922 balanced accuracy: 0.815 kappa: 0.766\n", "[[103 5 14]\n", " [ 7 77 38]\n", " [ 15 14 927]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.851 balanced accuracy: 0.842 kappa: 0.641\n", "[[108 11 3]\n", " [ 12 96 14]\n", " [ 28 111 817]]\n", "--------------\n", "original\n", "accuracy: 0.860 balanced accuracy: 0.547 kappa: 0.449\n", "[[ 54 0 69]\n", " [ 5 25 93]\n", " [ 1 0 953]]\n", "rebalanced\n", "thresholds: 
[0.25, 0.7000000000000001]\n", "accuracy: 0.901 balanced accuracy: 0.772 kappa: 0.706\n", "[[101 10 12]\n", " [ 13 66 44]\n", " [ 17 23 914]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.902 balanced accuracy: 0.737 kappa: 0.693\n", "[[101 7 15]\n", " [ 13 51 59]\n", " [ 14 9 931]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.8)\n", "accuracy: 0.819 balanced accuracy: 0.797 kappa: 0.573\n", "[[111 5 7]\n", " [ 18 81 24]\n", " [ 35 128 791]]\n", "--------------\n", "original\n", "accuracy: 0.851 balanced accuracy: 0.518 kappa: 0.396\n", "[[ 29 2 92]\n", " [ 2 39 81]\n", " [ 1 1 953]]\n", "rebalanced\n", "thresholds: [0.2, 0.7000000000000001]\n", "accuracy: 0.861 balanced accuracy: 0.702 kappa: 0.600\n", "[[ 76 10 37]\n", " [ 28 68 26]\n", " [ 33 33 889]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.7000000000000001)\n", "accuracy: 0.871 balanced accuracy: 0.725 kappa: 0.623\n", "[[ 69 13 41]\n", " [ 13 83 26]\n", " [ 19 43 893]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.846 balanced accuracy: 0.724 kappa: 0.587\n", "[[ 79 12 32]\n", " [ 28 77 17]\n", " [ 35 61 859]]\n", "--------------\n", "original\n", "accuracy: 0.876 balanced accuracy: 0.604 kappa: 0.540\n", "[[ 58 3 62]\n", " [ 7 42 73]\n", " [ 2 2 951]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.901 balanced accuracy: 0.773 kappa: 0.707\n", "[[ 89 12 22]\n", " [ 12 78 32]\n", " [ 10 31 914]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.901 balanced accuracy: 0.773 kappa: 0.707\n", "[[ 89 12 22]\n", " [ 12 78 32]\n", " [ 10 31 914]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7000000000000001)\n", "accuracy: 0.892 balanced accuracy: 0.784 kappa: 0.695\n", "[[ 89 19 15]\n", " [ 12 84 26]\n", " [ 10 48 897]]\n", "--------------\n", "original\n", "accuracy: 0.833 balanced accuracy: 
0.451 kappa: 0.275\n", "[[ 21 2 99]\n", " [ 3 22 97]\n", " [ 0 0 956]]\n", "rebalanced\n", "thresholds: [0.2, 0.7000000000000001]\n", "accuracy: 0.899 balanced accuracy: 0.731 kappa: 0.692\n", "[[ 86 10 26]\n", " [ 30 63 29]\n", " [ 9 17 930]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.7000000000000001)\n", "accuracy: 0.899 balanced accuracy: 0.731 kappa: 0.692\n", "[[ 86 10 26]\n", " [ 30 63 29]\n", " [ 9 17 930]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.15000000000000002, 0.8)\n", "accuracy: 0.817 balanced accuracy: 0.728 kappa: 0.557\n", "[[101 11 10]\n", " [ 53 61 8]\n", " [ 34 104 818]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.623 kappa: 0.580\n", "[[ 44 14 65]\n", " [ 6 63 54]\n", " [ 1 1 952]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.909 balanced accuracy: 0.759 kappa: 0.726\n", "[[ 72 22 29]\n", " [ 14 88 21]\n", " [ 7 16 931]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6500000000000001)\n", "accuracy: 0.909 balanced accuracy: 0.759 kappa: 0.726\n", "[[ 72 22 29]\n", " [ 14 88 21]\n", " [ 7 16 931]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7500000000000001)\n", "accuracy: 0.876 balanced accuracy: 0.766 kappa: 0.667\n", "[[ 73 36 14]\n", " [ 14 96 13]\n", " [ 8 64 882]]\n", "--------------\n", "original\n", "accuracy: 0.857 balanced accuracy: 0.534 kappa: 0.436\n", "[[ 37 3 82]\n", " [ 8 37 79]\n", " [ 0 0 954]]\n", "rebalanced\n", "thresholds: [0.2, 0.7000000000000001]\n", "accuracy: 0.881 balanced accuracy: 0.700 kappa: 0.640\n", "[[ 78 14 30]\n", " [ 28 62 34]\n", " [ 16 21 917]]\n", "global kappa rebalanced\n", "thresholds: (0.2, 0.7000000000000001)\n", "accuracy: 0.881 balanced accuracy: 0.700 kappa: 0.640\n", "[[ 78 14 30]\n", " [ 28 62 34]\n", " [ 16 21 917]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.2, 0.7500000000000001)\n", "accuracy: 0.863 balanced accuracy: 0.721 kappa: 0.620\n", "[[ 81 21 
20]\n", " [ 28 71 25]\n", " [ 18 53 883]]\n", "--------------\n", "original\n", "accuracy: 0.886 balanced accuracy: 0.633 kappa: 0.588\n", "[[ 46 7 68]\n", " [ 5 65 55]\n", " [ 2 0 952]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.895 balanced accuracy: 0.731 kappa: 0.684\n", "[[ 71 17 33]\n", " [ 21 80 24]\n", " [ 8 23 923]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.902 balanced accuracy: 0.720 kappa: 0.691\n", "[[ 71 13 37]\n", " [ 21 74 30]\n", " [ 6 11 937]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7500000000000001)\n", "accuracy: 0.871 balanced accuracy: 0.740 kappa: 0.646\n", "[[ 71 29 21]\n", " [ 21 88 16]\n", " [ 8 60 886]]\n", "--------------\n", "original\n", "accuracy: 0.882 balanced accuracy: 0.625 kappa: 0.569\n", "[[ 46 4 73]\n", " [ 2 63 60]\n", " [ 1 2 949]]\n", "rebalanced\n", "thresholds: [0.3, 0.6500000000000001]\n", "accuracy: 0.902 balanced accuracy: 0.776 kappa: 0.714\n", "[[ 76 26 21]\n", " [ 3 94 28]\n", " [ 6 34 912]]\n", "global kappa rebalanced\n", "thresholds: (0.3, 0.6000000000000001)\n", "accuracy: 0.907 balanced accuracy: 0.762 kappa: 0.715\n", "[[ 75 17 31]\n", " [ 3 88 34]\n", " [ 5 22 925]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.7500000000000001)\n", "accuracy: 0.869 balanced accuracy: 0.799 kappa: 0.663\n", "[[ 76 31 16]\n", " [ 3 110 12]\n", " [ 6 89 857]]\n", "--------------\n", "original\n", "accuracy: 0.892 balanced accuracy: 0.671 kappa: 0.620\n", "[[ 54 3 66]\n", " [ 0 72 51]\n", " [ 8 2 944]]\n", "rebalanced\n", "thresholds: [0.25, 0.6500000000000001]\n", "accuracy: 0.907 balanced accuracy: 0.796 kappa: 0.725\n", "[[ 84 8 31]\n", " [ 7 92 24]\n", " [ 20 22 912]]\n", "global kappa rebalanced\n", "thresholds: (0.25, 0.6500000000000001)\n", "accuracy: 0.907 balanced accuracy: 0.796 kappa: 0.725\n", "[[ 84 8 31]\n", " [ 7 92 24]\n", " [ 20 22 912]]\n", "global balanced_accuracy rebalanced\n", 
"thresholds: (0.25, 0.7500000000000001)\n", "accuracy: 0.883 balanced accuracy: 0.814 kappa: 0.690\n", "[[ 88 19 16]\n", " [ 7 100 16]\n", " [ 21 61 872]]\n" ] } ], "source": [ "accum_10_10_80 = []\n", "\n", "for rep in range(50):\n", " print('--------------')\n", " # Generate a ternary imbalanced classification problem\n", " X, y = make_classification(n_samples=6000, n_features=20,\n", " n_informative=10, n_redundant=0, n_classes=3, \n", " random_state=0xf00d+rep, shuffle=False, weights = [0.1, 0.1, 0.8])\n", " run_ternary_experiment(X,y,accum_10_10_80)\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "accum = accum_10_10_80\n", "figsize(9,6)\n", "scatter([x['orig-kappa'] for x in accum],[x['shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('greedy shift');\n", "title('10-10-80');" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-k-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-k-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-k-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-kappa');\n", "title('10-10-80');" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-balanced');\n", "title('10-10-80');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same conclusions as before (good thing!)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Some ChEMBL datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's just be sure that this approach works with bioactivity data too. I don't think it's necessary to do a comprehensive evaluation here, but I want to show a couple of examples. I didn't cherry-pick these." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CHEMBL205: Carbonic Anhydrase II" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th><th>compound_chembl_id</th><th>canonical_smiles</th><th>standard_value</th><th>standard_units</th><th>standard_relation</th><th>standard_type</th><th>year</th><th>ROMol</th><th>pKi</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>CHEMBL1054</td><td>NS(=O)(=O)c1cc2c(cc1Cl)NC(C(Cl)Cl)NS2(=O)=O</td><td>91.0</td><td>nM</td><td>=</td><td>Ki</td><td>2009</td><td>Mol</td><td>7.040959</td></tr>\n",
"    <tr><th>1</th><td>CHEMBL1055</td><td>NS(=O)(=O)c1cc(C2(O)NC(=O)c3ccccc32)ccc1Cl</td><td>138.0</td><td>nM</td><td>=</td><td>Ki</td><td>2009</td><td>Mol</td><td>6.860121</td></tr>\n",
"    <tr><th>2</th><td>CHEMBL1060</td><td>O=P([O-])([O-])O.[Na+].[Na+]</td><td>13200000.0</td><td>nM</td><td>=</td><td>Ki</td><td>2004</td><td>Mol</td><td>1.879426</td></tr>\n",
"    <tr><th>3</th><td>CHEMBL106848</td><td>NS(=O)(=O)c1ccc(SCCO)cc1</td><td>21.0</td><td>nM</td><td>=</td><td>Ki</td><td>2013</td><td>Mol</td><td>7.677781</td></tr>\n",
"    <tr><th>4</th><td>CHEMBL107217</td><td>CCN(CC)C(=S)[S-].[Na+]</td><td>3100.0</td><td>nM</td><td>=</td><td>Ki</td><td>2009</td><td>Mol</td><td>5.508638</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " compound_chembl_id canonical_smiles \\\n", "0 CHEMBL1054 NS(=O)(=O)c1cc2c(cc1Cl)NC(C(Cl)Cl)NS2(=O)=O \n", "1 CHEMBL1055 NS(=O)(=O)c1cc(C2(O)NC(=O)c3ccccc32)ccc1Cl \n", "2 CHEMBL1060 O=P([O-])([O-])O.[Na+].[Na+] \n", "3 CHEMBL106848 NS(=O)(=O)c1ccc(SCCO)cc1 \n", "4 CHEMBL107217 CCN(CC)C(=S)[S-].[Na+] \n", "\n", " standard_value standard_units standard_relation standard_type year \\\n", "0 91.0 nM = Ki 2009 \n", "1 138.0 nM = Ki 2009 \n", "2 13200000.0 nM = Ki 2004 \n", "3 21.0 nM = Ki 2013 \n", "4 3100.0 nM = Ki 2009 \n", "\n", " ROMol pKi \n", "0 \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

3 rows × 24 columns

\n", "" ], "text/plain": [ " standard_value \\\n", " count mean std min 25% \n", "activity \n", "0 968.0 1.242009e+18 3.864224e+19 10000.000 10000.0000 \n", "1 3582.0 7.292523e+02 1.778519e+03 3.200 13.5000 \n", "2 427.0 1.309327e+00 8.709364e-01 0.008 0.6355 \n", "\n", " year ... \\\n", " 50% 75% max count mean ... 75% \n", "activity ... \n", "0 50000.0 196700.000 1.202264e+21 968.0 2012.994835 ... 2016.0 \n", "1 73.4 417.750 9.900000e+03 3582.0 2013.261307 ... 2017.0 \n", "2 1.0 2.035 3.100000e+00 427.0 2014.962529 ... 2017.0 \n", "\n", " pKi \\\n", " max count mean std min 25% 50% \n", "activity \n", "0 2020.0 968.0 4.069107 1.200449 -12.080000 3.706216 4.301030 \n", "1 2020.0 3582.0 7.050231 0.915651 5.004365 6.379084 7.134306 \n", "2 2020.0 427.0 9.050659 0.500779 8.508638 8.691437 9.000000 \n", "\n", " \n", " 75% max \n", "activity \n", "0 5.000000 5.00000 \n", "1 7.869666 8.49485 \n", "2 9.196895 11.09691 \n", "\n", "[3 rows x 24 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def binner(act,bins=(5,8.5)):\n", " for i,bin in enumerate(bins):\n", " if act<=bin:\n", " return i\n", " return len(bins)\n", "data['activity'] = [binner(x) for x in data.pKi]\n", "data.groupby('activity').describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, that's imbalanced :-)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate fingerprints:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from rdkit.Chem import SaltRemover\n", "sr = SaltRemover.SaltRemover()\n", "stripped = [sr.StripMol(m) for m in data.ROMol]\n", "fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2)\n", "fps = [fpgen.GetFingerprint(m) for m in stripped]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now run the experiment with 20 random splits:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "original\n", "accuracy: 0.833 balanced accuracy: 0.547 kappa: 0.530\n", "[[126 68 0]\n", " [ 14 703 0]\n", " [ 0 84 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.851 balanced accuracy: 0.719 kappa: 0.647\n", "[[154 38 2]\n", " [ 39 656 22]\n", " [ 0 47 38]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.851 balanced accuracy: 0.719 kappa: 0.647\n", "[[154 38 2]\n", " [ 39 656 22]\n", " [ 0 47 38]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.712 balanced accuracy: 0.776 kappa: 0.493\n", "[[177 12 5]\n", " [ 96 467 154]\n", " [ 3 17 65]]\n", "original\n", "accuracy: 0.822 balanced accuracy: 0.538 kappa: 0.497\n", "[[117 77 0]\n", " [ 17 699 1]\n", " [ 0 82 3]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.837 balanced accuracy: 0.723 kappa: 0.622\n", "[[149 45 0]\n", " [ 41 642 34]\n", " [ 1 41 43]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.837 balanced accuracy: 0.723 kappa: 0.622\n", "[[149 45 0]\n", " [ 41 642 34]\n", " [ 1 41 43]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.745 balanced accuracy: 0.785 kappa: 0.528\n", "[[172 18 4]\n", " [ 93 505 119]\n", " [ 2 18 65]]\n", "original\n", "accuracy: 0.810 balanced accuracy: 0.518 kappa: 0.463\n", "[[114 80 0]\n", " [ 23 693 1]\n", " [ 1 84 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.827 balanced accuracy: 0.704 kappa: 0.596\n", "[[146 48 0]\n", " [ 49 638 30]\n", " [ 0 45 40]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.827 balanced accuracy: 0.704 kappa: 0.596\n", "[[146 48 0]\n", " [ 49 638 30]\n", " [ 0 45 40]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.35000000000000003, 0.1)\n", "accuracy: 0.728 balanced accuracy: 0.758 
kappa: 0.494\n", "[[162 28 4]\n", " [ 83 500 134]\n", " [ 0 22 63]]\n", "original\n", "accuracy: 0.820 balanced accuracy: 0.534 kappa: 0.491\n", "[[115 79 0]\n", " [ 17 699 1]\n", " [ 1 81 3]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.841 balanced accuracy: 0.729 kappa: 0.631\n", "[[152 42 0]\n", " [ 48 643 26]\n", " [ 1 41 43]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.841 balanced accuracy: 0.729 kappa: 0.631\n", "[[152 42 0]\n", " [ 48 643 26]\n", " [ 1 41 43]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.735 balanced accuracy: 0.780 kappa: 0.518\n", "[[177 12 5]\n", " [ 96 492 129]\n", " [ 1 21 63]]\n", "original\n", "accuracy: 0.834 balanced accuracy: 0.549 kappa: 0.535\n", "[[127 67 0]\n", " [ 14 703 0]\n", " [ 2 82 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.852 balanced accuracy: 0.737 kappa: 0.652\n", "[[149 45 0]\n", " [ 34 655 28]\n", " [ 1 39 45]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.852 balanced accuracy: 0.737 kappa: 0.652\n", "[[149 45 0]\n", " [ 34 655 28]\n", " [ 1 39 45]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.772 balanced accuracy: 0.792 kappa: 0.566\n", "[[176 15 3]\n", " [ 75 531 111]\n", " [ 2 21 62]]\n", "original\n", "accuracy: 0.814 balanced accuracy: 0.511 kappa: 0.462\n", "[[107 87 0]\n", " [ 13 704 0]\n", " [ 3 82 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.842 balanced accuracy: 0.707 kappa: 0.619\n", "[[140 53 1]\n", " [ 37 658 22]\n", " [ 0 44 41]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.842 balanced accuracy: 0.707 kappa: 0.619\n", "[[140 53 1]\n", " [ 37 658 22]\n", " [ 0 44 41]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.749 balanced accuracy: 
0.783 kappa: 0.533\n", "[[174 17 3]\n", " [ 81 509 127]\n", " [ 1 21 63]]\n", "original\n", "accuracy: 0.820 balanced accuracy: 0.530 kappa: 0.494\n", "[[120 74 0]\n", " [ 20 697 0]\n", " [ 2 83 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.832 balanced accuracy: 0.703 kappa: 0.607\n", "[[151 42 1]\n", " [ 54 641 22]\n", " [ 0 48 37]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.832 balanced accuracy: 0.703 kappa: 0.607\n", "[[151 42 1]\n", " [ 54 641 22]\n", " [ 0 48 37]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.747 balanced accuracy: 0.769 kappa: 0.526\n", "[[175 15 4]\n", " [108 510 99]\n", " [ 3 23 59]]\n", "original\n", "accuracy: 0.825 balanced accuracy: 0.539 kappa: 0.507\n", "[[122 72 0]\n", " [ 18 699 0]\n", " [ 0 84 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.835 balanced accuracy: 0.690 kappa: 0.605\n", "[[145 48 1]\n", " [ 45 652 20]\n", " [ 1 49 35]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.835 balanced accuracy: 0.690 kappa: 0.605\n", "[[145 48 1]\n", " [ 45 652 20]\n", " [ 1 49 35]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.750 balanced accuracy: 0.788 kappa: 0.536\n", "[[172 17 5]\n", " [ 93 510 114]\n", " [ 3 17 65]]\n", "original\n", "accuracy: 0.829 balanced accuracy: 0.539 kappa: 0.514\n", "[[121 73 0]\n", " [ 13 704 0]\n", " [ 0 84 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.843 balanced accuracy: 0.695 kappa: 0.624\n", "[[149 44 1]\n", " [ 31 657 29]\n", " [ 1 50 34]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.843 balanced accuracy: 0.695 kappa: 0.624\n", "[[149 44 1]\n", " [ 31 657 29]\n", " [ 1 50 34]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.35000000000000003, 0.1)\n", "accuracy: 0.794 
balanced accuracy: 0.777 kappa: 0.586\n", "[[164 26 4]\n", " [ 51 568 98]\n", " [ 1 25 59]]\n", "original\n", "accuracy: 0.826 balanced accuracy: 0.536 kappa: 0.509\n", "[[122 72 0]\n", " [ 15 701 1]\n", " [ 1 84 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.846 balanced accuracy: 0.717 kappa: 0.637\n", "[[152 41 1]\n", " [ 37 652 28]\n", " [ 0 46 39]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.846 balanced accuracy: 0.717 kappa: 0.637\n", "[[152 41 1]\n", " [ 37 652 28]\n", " [ 0 46 39]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.735 balanced accuracy: 0.766 kappa: 0.512\n", "[[177 13 4]\n", " [107 496 114]\n", " [ 0 26 59]]\n", "original\n", "accuracy: 0.820 balanced accuracy: 0.530 kappa: 0.490\n", "[[117 77 0]\n", " [ 18 699 0]\n", " [ 1 83 1]]\n", "rebalanced\n", "thresholds: [0.45, 0.15000000000000002]\n", "accuracy: 0.831 balanced accuracy: 0.696 kappa: 0.592\n", "[[132 62 0]\n", " [ 30 654 33]\n", " [ 1 42 42]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.834 balanced accuracy: 0.720 kappa: 0.616\n", "[[150 44 0]\n", " [ 45 639 33]\n", " [ 1 42 42]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.715 balanced accuracy: 0.768 kappa: 0.490\n", "[[175 16 3]\n", " [101 474 142]\n", " [ 1 21 63]]\n", "original\n", "accuracy: 0.827 balanced accuracy: 0.537 kappa: 0.509\n", "[[120 74 0]\n", " [ 14 703 0]\n", " [ 1 83 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.838 balanced accuracy: 0.700 kappa: 0.616\n", "[[152 42 0]\n", " [ 47 648 22]\n", " [ 0 50 35]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.838 balanced accuracy: 0.700 kappa: 0.616\n", "[[152 42 0]\n", " [ 47 648 22]\n", " [ 0 50 35]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 
0.746 balanced accuracy: 0.784 kappa: 0.531\n", "[[176 15 3]\n", " [ 96 504 117]\n", " [ 2 20 63]]\n", "original\n", "accuracy: 0.828 balanced accuracy: 0.538 kappa: 0.514\n", "[[123 71 0]\n", " [ 15 702 0]\n", " [ 1 84 0]]\n", "rebalanced\n", "thresholds: [0.45, 0.15000000000000002]\n", "accuracy: 0.845 balanced accuracy: 0.711 kappa: 0.623\n", "[[134 60 0]\n", " [ 24 664 29]\n", " [ 0 41 44]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.833 balanced accuracy: 0.725 kappa: 0.614\n", "[[149 45 0]\n", " [ 51 637 29]\n", " [ 0 41 44]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.736 balanced accuracy: 0.778 kappa: 0.518\n", "[[181 12 1]\n", " [102 491 124]\n", " [ 1 23 61]]\n", "original\n", "accuracy: 0.819 balanced accuracy: 0.525 kappa: 0.485\n", "[[116 78 0]\n", " [ 17 700 0]\n", " [ 1 84 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.820 balanced accuracy: 0.685 kappa: 0.577\n", "[[144 50 0]\n", " [ 47 637 33]\n", " [ 0 49 36]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.820 balanced accuracy: 0.685 kappa: 0.577\n", "[[144 50 0]\n", " [ 47 637 33]\n", " [ 0 49 36]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.727 balanced accuracy: 0.757 kappa: 0.495\n", "[[170 21 3]\n", " [100 494 123]\n", " [ 0 25 60]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "original\n", "accuracy: 0.822 balanced accuracy: 0.534 kappa: 0.502\n", "[[122 72 0]\n", " [ 20 697 0]\n", " [ 3 82 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.827 balanced accuracy: 0.701 kappa: 0.598\n", "[[149 45 0]\n", " [ 53 637 27]\n", " [ 2 45 38]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.827 balanced accuracy: 0.701 kappa: 0.598\n", "[[149 45 0]\n", " [ 53 637 27]\n", " [ 2 45 38]]\n", "global 
balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.757 balanced accuracy: 0.791 kappa: 0.543\n", "[[175 19 0]\n", " [ 98 515 104]\n", " [ 2 19 64]]\n", "original\n", "accuracy: 0.827 balanced accuracy: 0.541 kappa: 0.517\n", "[[126 68 0]\n", " [ 19 698 0]\n", " [ 2 83 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.831 balanced accuracy: 0.709 kappa: 0.614\n", "[[159 34 1]\n", " [ 54 633 30]\n", " [ 3 46 36]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.831 balanced accuracy: 0.709 kappa: 0.614\n", "[[159 34 1]\n", " [ 54 633 30]\n", " [ 3 46 36]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.746 balanced accuracy: 0.781 kappa: 0.529\n", "[[174 17 3]\n", " [101 506 110]\n", " [ 4 18 63]]\n", "original\n", "accuracy: 0.826 balanced accuracy: 0.533 kappa: 0.505\n", "[[117 77 0]\n", " [ 12 705 0]\n", " [ 4 80 1]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.812 balanced accuracy: 0.663 kappa: 0.556\n", "[[141 53 0]\n", " [ 51 636 30]\n", " [ 3 50 32]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.812 balanced accuracy: 0.663 kappa: 0.556\n", "[[141 53 0]\n", " [ 51 636 30]\n", " [ 3 50 32]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.739 balanced accuracy: 0.742 kappa: 0.506\n", "[[170 21 3]\n", " [ 92 512 113]\n", " [ 3 28 54]]\n", "original\n", "accuracy: 0.837 balanced accuracy: 0.564 kappa: 0.557\n", "[[141 53 0]\n", " [ 24 693 0]\n", " [ 0 85 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.853 balanced accuracy: 0.735 kappa: 0.662\n", "[[169 25 0]\n", " [ 54 644 19]\n", " [ 0 48 37]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.853 balanced accuracy: 0.735 kappa: 0.662\n", "[[169 25 0]\n", " [ 54 644 19]\n", " [ 0 48 37]]\n", 
"global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.754 balanced accuracy: 0.810 kappa: 0.552\n", "[[186 6 2]\n", " [102 499 116]\n", " [ 0 19 66]]\n", "original\n", "accuracy: 0.809 balanced accuracy: 0.506 kappa: 0.447\n", "[[105 89 0]\n", " [ 16 701 0]\n", " [ 2 83 0]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.813 balanced accuracy: 0.671 kappa: 0.557\n", "[[138 56 0]\n", " [ 48 637 32]\n", " [ 2 48 35]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.813 balanced accuracy: 0.671 kappa: 0.557\n", "[[138 56 0]\n", " [ 48 637 32]\n", " [ 2 48 35]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.730 balanced accuracy: 0.758 kappa: 0.502\n", "[[170 20 4]\n", " [ 83 497 137]\n", " [ 2 23 60]]\n", "original\n", "accuracy: 0.821 balanced accuracy: 0.538 kappa: 0.499\n", "[[120 74 0]\n", " [ 20 696 1]\n", " [ 1 82 2]]\n", "rebalanced\n", "thresholds: [0.4, 0.15000000000000002]\n", "accuracy: 0.838 balanced accuracy: 0.716 kappa: 0.623\n", "[[154 40 0]\n", " [ 42 642 33]\n", " [ 0 46 39]]\n", "global kappa rebalanced\n", "thresholds: (0.4, 0.15000000000000002)\n", "accuracy: 0.838 balanced accuracy: 0.716 kappa: 0.623\n", "[[154 40 0]\n", " [ 42 642 33]\n", " [ 0 46 39]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.3, 0.1)\n", "accuracy: 0.771 balanced accuracy: 0.814 kappa: 0.572\n", "[[180 14 0]\n", " [ 78 521 118]\n", " [ 1 17 67]]\n" ] } ], "source": [ "accum_chembl205 = []\n", "for i in range(20):\n", " run_ternary_experiment(fps,data.activity,accum_chembl205,random_state=0xf00d+i)\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "accum = accum_chembl205\n", "figsize(9,6)\n", "scatter([x['orig-kappa'] for x in accum],[x['shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('greedy shift');\n", "title('CHEMBL205');" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-k-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-k-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-k-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-kappa');\n", "title('CHEMBL205');" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-balanced');\n", "title('CHEMBL205');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see the same behavior as before: shifting the decision thresholds using either the greedy approach or the grid-based approach improves the balanced accuracy (and, in most splits, kappa) over the default decision thresholds.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CHEMBL217: Dopamine D2" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "

3 rows × 24 columns

\n", "
" ], "text/plain": [ " standard_value \\\n", " count mean std min 25% \n", "activity \n", "0 356.0 143415.189354 781194.668326 10000.000 10000.0000 \n", "1 4014.0 830.546163 1471.610125 10.000 63.1875 \n", "2 607.0 3.715942 2.786155 0.027 1.2150 \n", "\n", " year ... \\\n", " 50% 75% max count mean ... 75% \n", "activity ... \n", "0 10000.00 24234.5 10000000.00 356.0 2011.679775 ... 2017.0 \n", "1 238.51 931.0 9906.00 4014.0 2011.100648 ... 2015.0 \n", "2 3.00 5.9 9.86 607.0 2011.957166 ... 2016.0 \n", "\n", " pKi \\\n", " max count mean std min 25% 50% \n", "activity \n", "0 2019.0 356.0 4.672916 0.581865 2.000000 4.615626 5.000000 \n", "1 2020.0 4014.0 6.620074 0.724919 5.004102 6.031050 6.622494 \n", "2 2019.0 607.0 8.614671 0.475862 8.006123 8.229148 8.522879 \n", "\n", " \n", " 75% max \n", "activity \n", "0 5.000000 5.000000 \n", "1 7.199370 8.000000 \n", "2 8.915457 10.568636 \n", "\n", "[3 rows x 24 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('../data/target_CHEMBL217.csv.gz')\n", "PandasTools.AddMoleculeColumnToFrame(data,smilesCol='canonical_smiles')\n", "data['pKi'] = [-math.log10(x*1e-9) for x in data['standard_value']]\n", "def binner(act,bins=(5,8)):\n", " for i,bin in enumerate(bins):\n", " if act<=bin:\n", " return i\n", " return len(bins)\n", "data['activity'] = [binner(x) for x in data.pKi]\n", "data.groupby('activity').describe()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from rdkit.Chem import SaltRemover\n", "sr = SaltRemover.SaltRemover()\n", "stripped = [sr.StripMol(m) for m in data.ROMol]\n", "fpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2)\n", "fps = [fpgen.GetFingerprint(m) for m in stripped]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "original\n", "accuracy: 0.832 balanced accuracy: 0.436 
kappa: 0.239\n", "[[ 9 62 0]\n", " [ 0 797 6]\n", " [ 0 99 23]]\n", "rebalanced\n", "thresholds: [0.1, 0.15000000000000002]\n", "accuracy: 0.810 balanced accuracy: 0.657 kappa: 0.453\n", "[[ 33 38 0]\n", " [ 33 696 74]\n", " [ 0 44 78]]\n", "global kappa rebalanced\n", "thresholds: (0.15000000000000002, 0.15000000000000002)\n", "accuracy: 0.819 balanced accuracy: 0.601 kappa: 0.434\n", "[[ 19 52 0]\n", " [ 10 719 74]\n", " [ 0 44 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.706 balanced accuracy: 0.665 kappa: 0.353\n", "[[ 33 29 9]\n", " [ 31 570 202]\n", " [ 0 22 100]]\n", "original\n", "accuracy: 0.832 balanced accuracy: 0.426 kappa: 0.234\n", "[[ 5 66 0]\n", " [ 3 798 2]\n", " [ 0 96 26]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.845 balanced accuracy: 0.693 kappa: 0.524\n", "[[ 43 28 0]\n", " [ 36 730 37]\n", " [ 0 53 69]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.845 balanced accuracy: 0.693 kappa: 0.524\n", "[[ 43 28 0]\n", " [ 36 730 37]\n", " [ 0 53 69]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.752 balanced accuracy: 0.725 kappa: 0.429\n", "[[ 42 24 5]\n", " [ 35 606 162]\n", " [ 0 21 101]]\n", "original\n", "accuracy: 0.838 balanced accuracy: 0.452 kappa: 0.276\n", "[[ 10 61 0]\n", " [ 2 798 3]\n", " [ 0 95 27]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.832 balanced accuracy: 0.701 kappa: 0.507\n", "[[ 45 26 0]\n", " [ 41 713 49]\n", " [ 1 50 71]]\n", "global kappa rebalanced\n", "thresholds: (0.15000000000000002, 0.2)\n", "accuracy: 0.850 balanced accuracy: 0.645 kappa: 0.508\n", "[[ 30 41 0]\n", " [ 8 746 49]\n", " [ 0 51 71]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.747 balanced accuracy: 0.747 kappa: 0.440\n", "[[ 45 19 7]\n", " [ 41 593 169]\n", " [ 1 15 106]]\n", "original\n", "accuracy: 0.841 balanced accuracy: 0.447 kappa: 0.282\n", "[[ 7 64 
0]\n", " [ 1 801 1]\n", " [ 0 92 30]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.845 balanced accuracy: 0.700 kappa: 0.538\n", "[[ 38 33 0]\n", " [ 36 723 44]\n", " [ 1 40 81]]\n", "global kappa rebalanced\n", "thresholds: (0.15000000000000002, 0.2)\n", "accuracy: 0.849 balanced accuracy: 0.633 kappa: 0.510\n", "[[ 22 49 0]\n", " [ 16 743 44]\n", " [ 0 41 81]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.714 balanced accuracy: 0.701 kappa: 0.379\n", "[[ 38 28 5]\n", " [ 34 568 201]\n", " [ 1 16 105]]\n", "original\n", "accuracy: 0.838 balanced accuracy: 0.448 kappa: 0.271\n", "[[ 9 62 0]\n", " [ 0 799 4]\n", " [ 0 95 27]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.811 balanced accuracy: 0.650 kappa: 0.446\n", "[[ 35 36 0]\n", " [ 54 702 47]\n", " [ 0 51 71]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.811 balanced accuracy: 0.650 kappa: 0.446\n", "[[ 35 36 0]\n", " [ 54 702 47]\n", " [ 0 51 71]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.705 balanced accuracy: 0.689 kappa: 0.365\n", "[[ 35 32 4]\n", " [ 48 560 195]\n", " [ 0 15 107]]\n", "original\n", "accuracy: 0.840 balanced accuracy: 0.469 kappa: 0.290\n", "[[ 16 55 0]\n", " [ 0 798 5]\n", " [ 0 99 23]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.838 balanced accuracy: 0.703 kappa: 0.514\n", "[[ 47 24 0]\n", " [ 42 721 40]\n", " [ 0 55 67]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.837 balanced accuracy: 0.747 kappa: 0.548\n", "[[ 47 24 0]\n", " [ 42 701 60]\n", " [ 0 36 86]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.746 balanced accuracy: 0.760 kappa: 0.442\n", "[[ 47 20 4]\n", " [ 39 588 176]\n", " [ 0 14 108]]\n", "original\n", "accuracy: 0.842 balanced accuracy: 0.450 kappa: 0.293\n", "[[ 6 65 0]\n", " [ 1 800 2]\n", " [ 0 89 33]]\n", "rebalanced\n", 
"thresholds: [0.1, 0.2]\n", "accuracy: 0.853 balanced accuracy: 0.699 kappa: 0.551\n", "[[ 38 33 0]\n", " [ 25 733 45]\n", " [ 1 42 79]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.853 balanced accuracy: 0.699 kappa: 0.551\n", "[[ 38 33 0]\n", " [ 25 733 45]\n", " [ 1 42 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.742 balanced accuracy: 0.713 kappa: 0.413\n", "[[ 38 30 3]\n", " [ 23 596 184]\n", " [ 0 17 105]]\n", "original\n", "accuracy: 0.827 balanced accuracy: 0.426 kappa: 0.223\n", "[[ 6 65 0]\n", " [ 2 793 8]\n", " [ 0 97 25]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.833 balanced accuracy: 0.683 kappa: 0.492\n", "[[ 45 26 0]\n", " [ 39 722 42]\n", " [ 0 59 63]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.833 balanced accuracy: 0.683 kappa: 0.492\n", "[[ 45 26 0]\n", " [ 39 722 42]\n", " [ 0 59 63]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.753 balanced accuracy: 0.745 kappa: 0.440\n", "[[ 45 23 3]\n", " [ 37 601 165]\n", " [ 0 18 104]]\n", "original\n", "accuracy: 0.845 balanced accuracy: 0.457 kappa: 0.313\n", "[[ 5 66 0]\n", " [ 0 800 3]\n", " [ 0 85 37]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.840 balanced accuracy: 0.691 kappa: 0.521\n", "[[ 38 33 0]\n", " [ 32 721 50]\n", " [ 0 44 78]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.840 balanced accuracy: 0.691 kappa: 0.521\n", "[[ 38 33 0]\n", " [ 32 721 50]\n", " [ 0 44 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.724 balanced accuracy: 0.703 kappa: 0.391\n", "[[ 37 29 5]\n", " [ 28 578 197]\n", " [ 0 16 106]]\n", "original\n", "accuracy: 0.846 balanced accuracy: 0.476 kappa: 0.329\n", "[[ 11 60 0]\n", " [ 3 798 2]\n", " [ 0 88 34]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.835 balanced accuracy: 0.715 kappa: 0.518\n", "[[ 48 
23 0]\n", " [ 39 713 51]\n", " [ 0 51 71]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.835 balanced accuracy: 0.715 kappa: 0.518\n", "[[ 48 23 0]\n", " [ 39 713 51]\n", " [ 0 51 71]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.740 balanced accuracy: 0.755 kappa: 0.428\n", "[[ 48 21 2]\n", " [ 36 584 183]\n", " [ 0 17 105]]\n", "original\n", "accuracy: 0.832 balanced accuracy: 0.423 kappa: 0.220\n", "[[ 7 64 0]\n", " [ 2 801 0]\n", " [ 0 101 21]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.839 balanced accuracy: 0.663 kappa: 0.495\n", "[[ 38 33 0]\n", " [ 40 732 31]\n", " [ 0 56 66]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.839 balanced accuracy: 0.663 kappa: 0.495\n", "[[ 38 33 0]\n", " [ 40 732 31]\n", " [ 0 56 66]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.750 balanced accuracy: 0.697 kappa: 0.410\n", "[[ 38 30 3]\n", " [ 38 612 153]\n", " [ 0 25 97]]\n", "original\n", "accuracy: 0.842 balanced accuracy: 0.456 kappa: 0.294\n", "[[ 9 62 0]\n", " [ 1 800 2]\n", " [ 0 92 30]]\n", "rebalanced\n", "thresholds: [0.1, 0.25]\n", "accuracy: 0.829 balanced accuracy: 0.648 kappa: 0.461\n", "[[ 40 31 0]\n", " [ 39 728 36]\n", " [ 0 64 58]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.814 balanced accuracy: 0.693 kappa: 0.480\n", "[[ 40 31 0]\n", " [ 39 691 73]\n", " [ 0 42 80]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.725 balanced accuracy: 0.712 kappa: 0.395\n", "[[ 40 26 5]\n", " [ 39 578 186]\n", " [ 0 18 104]]\n", "original\n", "accuracy: 0.846 balanced accuracy: 0.486 kappa: 0.337\n", "[[ 14 57 0]\n", " [ 1 796 6]\n", " [ 0 89 33]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.845 balanced accuracy: 0.712 kappa: 0.542\n", "[[ 42 29 0]\n", " [ 43 721 39]\n", " [ 0 43 79]]\n", "global kappa rebalanced\n", 
"thresholds: (0.1, 0.2)\n", "accuracy: 0.845 balanced accuracy: 0.712 kappa: 0.542\n", "[[ 42 29 0]\n", " [ 43 721 39]\n", " [ 0 43 79]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.732 balanced accuracy: 0.712 kappa: 0.400\n", "[[ 41 26 4]\n", " [ 41 587 175]\n", " [ 0 21 101]]\n", "original\n", "accuracy: 0.836 balanced accuracy: 0.439 kappa: 0.262\n", "[[ 6 65 0]\n", " [ 2 798 3]\n", " [ 0 93 29]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.835 balanced accuracy: 0.669 kappa: 0.497\n", "[[ 36 35 0]\n", " [ 43 723 37]\n", " [ 0 49 73]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.835 balanced accuracy: 0.669 kappa: 0.497\n", "[[ 36 35 0]\n", " [ 43 723 37]\n", " [ 0 49 73]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.747 balanced accuracy: 0.699 kappa: 0.415\n", "[[ 36 28 7]\n", " [ 40 606 157]\n", " [ 0 20 102]]\n", "original\n", "accuracy: 0.839 balanced accuracy: 0.457 kappa: 0.291\n", "[[ 9 62 0]\n", " [ 2 796 5]\n", " [ 0 91 31]]\n", "rebalanced\n", "thresholds: [0.1, 0.15000000000000002]\n", "accuracy: 0.829 balanced accuracy: 0.725 kappa: 0.527\n", "[[ 41 30 0]\n", " [ 36 696 71]\n", " [ 1 32 89]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.829 balanced accuracy: 0.725 kappa: 0.527\n", "[[ 41 30 0]\n", " [ 36 696 71]\n", " [ 1 32 89]]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.829 balanced accuracy: 0.725 kappa: 0.527\n", "[[ 41 30 0]\n", " [ 36 696 71]\n", " [ 1 32 89]]\n", "original\n", "accuracy: 0.837 balanced accuracy: 0.452 kappa: 0.274\n", "[[ 10 61 0]\n", " [ 2 797 4]\n", " [ 0 95 27]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.846 balanced accuracy: 0.697 kappa: 0.534\n", "[[ 41 30 0]\n", " [ 32 728 43]\n", " [ 2 46 74]]\n", "global kappa 
rebalanced\n", "thresholds: (0.1, 0.2)\n", "accuracy: 0.846 balanced accuracy: 0.697 kappa: 0.534\n", "[[ 41 30 0]\n", " [ 32 728 43]\n", " [ 2 46 74]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.730 balanced accuracy: 0.709 kappa: 0.395\n", "[[ 41 28 2]\n", " [ 31 586 186]\n", " [ 2 20 100]]\n", "original\n", "accuracy: 0.840 balanced accuracy: 0.465 kappa: 0.290\n", "[[ 14 57 0]\n", " [ 2 798 3]\n", " [ 0 97 25]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.836 balanced accuracy: 0.669 kappa: 0.497\n", "[[ 37 34 0]\n", " [ 41 725 37]\n", " [ 0 51 71]]\n", "global kappa rebalanced\n", "thresholds: (0.15000000000000002, 0.2)\n", "accuracy: 0.855 balanced accuracy: 0.638 kappa: 0.515\n", "[[ 28 43 0]\n", " [ 13 753 37]\n", " [ 0 51 71]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.725 balanced accuracy: 0.694 kappa: 0.385\n", "[[ 37 29 5]\n", " [ 39 583 181]\n", " [ 0 20 102]]\n", "original\n", "accuracy: 0.827 balanced accuracy: 0.417 kappa: 0.213\n", "[[ 4 67 0]\n", " [ 3 795 5]\n", " [ 0 97 25]]\n", "rebalanced\n", "thresholds: [0.15000000000000002, 0.2]\n", "accuracy: 0.858 balanced accuracy: 0.649 kappa: 0.530\n", "[[ 28 43 0]\n", " [ 11 752 40]\n", " [ 0 47 75]]\n", "global kappa rebalanced\n", "thresholds: (0.15000000000000002, 0.2)\n", "accuracy: 0.858 balanced accuracy: 0.649 kappa: 0.530\n", "[[ 28 43 0]\n", " [ 11 752 40]\n", " [ 0 47 75]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.734 balanced accuracy: 0.726 kappa: 0.411\n", "[[ 43 23 5]\n", " [ 36 585 182]\n", " [ 1 18 103]]\n", "original\n", "accuracy: 0.839 balanced accuracy: 0.448 kappa: 0.277\n", "[[ 8 63 0]\n", " [ 0 799 4]\n", " [ 0 93 29]]\n", "rebalanced\n", "thresholds: [0.1, 0.15000000000000002]\n", "accuracy: 0.819 balanced accuracy: 0.699 kappa: 0.490\n", "[[ 42 29 0]\n", " [ 42 696 65]\n", " [ 1 43 78]]\n", "global kappa rebalanced\n", "thresholds: 
(0.1, 0.15000000000000002)\n", "accuracy: 0.819 balanced accuracy: 0.699 kappa: 0.490\n", "[[ 42 29 0]\n", " [ 42 696 65]\n", " [ 1 43 78]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.819 balanced accuracy: 0.699 kappa: 0.490\n", "[[ 42 29 0]\n", " [ 42 696 65]\n", " [ 1 43 78]]\n", "original\n", "accuracy: 0.830 balanced accuracy: 0.436 kappa: 0.226\n", "[[ 12 59 0]\n", " [ 2 797 4]\n", " [ 0 104 18]]\n", "rebalanced\n", "thresholds: [0.1, 0.2]\n", "accuracy: 0.837 balanced accuracy: 0.692 kappa: 0.511\n", "[[ 43 28 0]\n", " [ 38 721 44]\n", " [ 1 51 70]]\n", "global kappa rebalanced\n", "thresholds: (0.1, 0.15000000000000002)\n", "accuracy: 0.826 balanced accuracy: 0.718 kappa: 0.516\n", "[[ 43 27 1]\n", " [ 38 697 68]\n", " [ 1 38 83]]\n", "global balanced_accuracy rebalanced\n", "thresholds: (0.1, 0.1)\n", "accuracy: 0.729 balanced accuracy: 0.724 kappa: 0.403\n", "[[ 43 24 4]\n", " [ 37 580 186]\n", " [ 0 19 103]]\n" ] } ], "source": [ "accum_chembl217 = []\n", "for i in range(20):\n", " run_ternary_experiment(fps,data.activity,accum_chembl217,random_state=0xf00d+i)\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "accum = accum_chembl217\n", "figsize(9,6)\n", "scatter([x['orig-kappa'] for x in accum],[x['shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['orig-balanced'] for x in accum],[x['shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['orig-accuracy'] for x in accum],[x['shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.2,1],[.2,1]);\n", "legend();\n", "xlabel('orig')\n", "ylabel('greedy shift');\n", "title('CHEMBL217');" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-k-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-k-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-k-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-kappa');\n", "title('CHEMBL217');" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "scatter([x['shift-kappa'] for x in accum],[x['global-ba-shift-kappa'] for x in accum],label='kappa');\n", "scatter([x['shift-balanced'] for x in accum],[x['global-ba-shift-balanced'] for x in accum],label='balanced accuracy');\n", "scatter([x['shift-accuracy'] for x in accum],[x['global-ba-shift-accuracy'] for x in accum],label='accuracy');\n", "plot([.4,1],[.4,1]);\n", "legend();\n", "xlabel('greedy shift')\n", "ylabel('grid-balanced');\n", "title('CHEMBL217');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, the same conclusions hold here." ] } ], "metadata": { "interpreter": { "hash": "3a73ed6500393b2402507cd16c43dee9c9d335abda3b90ec79b3c9d5565d2650" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }