{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Importance Permutation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function to estimate the feature importance of classifiers and regressors based on *permutation importance*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> `from mlxtend.evaluate import feature_importance_permutation` " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The *permutation importance* is an intuitive, model-agnostic method to estimate the feature importance for classification and regression models. The approach is relatively simple and straightforward:\n", "\n", "1. Take a model that was fit to the training dataset\n", "2. Estimate the predictive performance of the model on an independent dataset (e.g., validation dataset) and record it as the baseline performance\n", "3. For each feature *i*:\n", " - randomly permute feature column *i* in the independent dataset from step 2\n", " - record the predictive performance of the model on the dataset with the permuted column\n", " - compute the feature importance as the difference between the baseline performance (step 2) and the performance on the permuted dataset\n", "\n", "Permutation importance is generally considered a relatively efficient technique that works well in practice [1], while a drawback is that the importance of correlated features may be overestimated [2].\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References\n", "\n", "- [1] Terence Parr, Kerem Turgutlu, Christopher Csiszar, and Jeremy Howard. *Beware Default Random Forest Importances* (http://parrt.cs.usfca.edu/doc/rf-importance/index.html)\n", "- [2] Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). 
*Conditional variable importance for random forests*. BMC Bioinformatics, 9(1), 307." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1 -- Feature Importance for Classifiers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following example illustrates feature importance estimation via permutation importance for classification models." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.svm import SVC\n", "from sklearn.model_selection import train_test_split\n", "from mlxtend.evaluate import feature_importance_permutation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generate a toy dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import make_classification\n", "from sklearn.ensemble import RandomForestClassifier\n", "\n", "# Build a classification task using 3 informative features;\n", "# with shuffle=False, these are the first 3 columns (0, 1, and 2)\n", "X, y = make_classification(n_samples=10000,\n", " n_features=10,\n", " n_informative=3,\n", " n_redundant=0,\n", " n_repeated=0,\n", " n_classes=2,\n", " random_state=0,\n", " shuffle=False)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.3, random_state=1, stratify=y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Feature importance via random forest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we compute the feature importance directly from the random forest via *mean impurity decrease* (described after the code cell):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training accuracy: 100.0\n", "Test accuracy: 95.0666666667\n", "[ 0.283357 0.30846795 0.24204291 0.02229767 0.02364941 0.02390578\n", " 0.02501543 0.0234225 0.02370816 0.0241332 ]\n" ] } ], 
"source": [ "forest = RandomForestClassifier(n_estimators=250,\n", " random_state=0)\n", "\n", "forest.fit(X_train, y_train)\n", "\n", "print('Training accuracy:', np.mean(forest.predict(X_train) == y_train)*100)\n", "print('Test accuracy:', np.mean(forest.predict(X_test) == y_test)*100)\n", "\n", "importance_vals = forest.feature_importances_\n", "print(importance_vals)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several strategies for computing feature importance in random forests. The method implemented in scikit-learn (used in the previous code cell) is based on Breiman and Friedman's CART (Breiman, Friedman, \"Classification and regression trees\", 1984) and is the so-called *mean impurity decrease*. Here, the importance value of a feature is computed by summing the impurity decreases of all splits on that feature (i.e., splits of a parent node into two child nodes) within a tree, and then averaging these sums across all the trees in the ensemble. Note that the impurity decrease values are weighted by the number of samples in the respective nodes. This is done for all features in the dataset, and the feature importance values are then normalized so that they sum up to 1. In CART, the authors also note that this fast way of computing feature importance values is relatively consistent with the permutation importance." 
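] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the *mean impurity decrease* more concrete, the cell below sketches it by hand for a single fitted scikit-learn decision tree. This is an illustration under stated assumptions, not the scikit-learn source: it relies on the fitted-tree arrays exposed via `tree_` (`children_left`, `children_right`, `feature`, `impurity`, and `weighted_n_node_samples`), and the helper `mdi_importance` as well as the names `X_demo`, `y_demo`, and `demo_tree` are hypothetical and not part of mlxtend. For every split, the sketch adds the sample-weighted impurity decrease to the importance of the split feature, normalizes the totals to sum up to 1, and compares the result with the tree's own `feature_importances_`; we expect the two to agree up to floating-point error. The forest-level values above are, roughly speaking, the average of such per-tree values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import make_classification\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "\n", "def mdi_importance(fitted_tree):\n", "    # Hypothetical helper: mean impurity decrease of a single fitted tree\n", "    t = fitted_tree.tree_\n", "    importances = np.zeros(t.n_features)\n", "    for node in range(t.node_count):\n", "        left, right = t.children_left[node], t.children_right[node]\n", "        if left == -1:  # leaf node: no split, hence no impurity decrease\n", "            continue\n", "        # impurity decrease of this split, weighted by the node sample counts\n", "        decrease = (t.weighted_n_node_samples[node] * t.impurity[node]\n", "                    - t.weighted_n_node_samples[left] * t.impurity[left]\n", "                    - t.weighted_n_node_samples[right] * t.impurity[right])\n", "        importances[t.feature[node]] += decrease\n", "    return importances / importances.sum()  # normalize to sum up to 1\n", "\n", "\n", "# Small hypothetical demo dataset and tree (independent of X, y above)\n", "X_demo, y_demo = make_classification(n_samples=500, n_features=5,\n", "                                     n_informative=2, n_redundant=0,\n", "                                     random_state=0)\n", "demo_tree = DecisionTreeClassifier(max_depth=4, random_state=0)\n", "demo_tree.fit(X_demo, y_demo)\n", "\n", "print(mdi_importance(demo_tree))\n", "print(np.allclose(mdi_importance(demo_tree), demo_tree.feature_importances_))" 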
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's visualize the feature importance values from the random forest, including a measure of their variability across the individual trees of the ensemble (here: the standard deviation):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "std = np.std([tree.feature_importances_ for tree in forest.estimators_],\n", " axis=0)\n", "indices = np.argsort(importance_vals)[::-1]\n", "\n", "# Plot the feature importances of the forest\n", "plt.figure()\n", "plt.title(\"Random Forest feature importance\")\n", "plt.bar(range(X.shape[1]), importance_vals[indices],\n", " yerr=std[indices], align=\"center\")\n", "plt.xticks(range(X.shape[1]), indices)\n", "plt.xlim([-1, X.shape[1]])\n", "plt.ylim([0, 0.5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, features 1, 0, and 2 are estimated to be the most informative ones for the random forest classifier, which matches how the toy dataset was generated (with `shuffle=False`, the 3 informative features are the first 3 columns). Next, let's compute the feature importance via the permutation importance approach." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Permutation Importance" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.26833333, 0.26733333, 0.261 , -0.002 , -0.00033333,\n", " 0.00066667, 0.00233333, 0.00066667, 0.00066667, -0.00233333])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "imp_vals, _ = feature_importance_permutation(\n", " predict_method=forest.predict, \n", " X=X_test,\n", " y=y_test,\n", " metric='accuracy',\n", " num_rounds=1,\n", " seed=1)\n", "\n", "imp_vals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `feature_importance_permutation` returns two arrays. 
The first array (here: `imp_vals`) contains the actual importance values we are interested in. If `num_rounds > 1`, the permutation is repeated multiple times (each time with a different random permutation), and in this case the first array contains the average importance across the different runs. The second array (here, assigned to `_` because we are not using it) contains all individual values from these runs (more about that later)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's also visualize the importance values in a bar plot:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "indices = np.argsort(imp_vals)[::-1]\n", "plt.figure()\n", "plt.title(\"Random Forest feature importance via permutation importance\")\n", "plt.bar(range(X.shape[1]), imp_vals[indices])\n", "plt.xticks(range(X.shape[1]), indices)\n", "plt.xlim([-1, X.shape[1]])\n", "plt.ylim([0, 0.5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, the same three features (0, 1, and 2) are estimated to be the most important ones, which is consistent with the feature importance values that we computed via the *mean impurity decrease* method earlier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Note that in the context of random forests, permutation importance is typically computed using the out-of-bag samples of the forest, whereas this implementation uses an independent dataset.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previously, it was mentioned that the permutation is repeated multiple times if `num_rounds > 1`. 
In this case, the second array returned by `feature_importance_permutation` contains the importance values of the individual runs (the array has shape [num_features, num_rounds]), which we can use to quantify the variability across these runs, for example via the standard deviation:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imp_vals, imp_all = feature_importance_permutation(\n", " predict_method=forest.predict, \n", " X=X_test,\n", " y=y_test,\n", " metric='accuracy',\n", " num_rounds=10,\n", " seed=1)\n", "\n", "\n", "std = np.std(imp_all, axis=1)\n", "indices = np.argsort(imp_vals)[::-1]\n", "\n", "plt.figure()\n", "plt.title(\"Random Forest feature importance via permutation importance w. std. dev.\")\n", "plt.bar(range(X.shape[1]), imp_vals[indices],\n", " yerr=std[indices])\n", "plt.xticks(range(X.shape[1]), indices)\n", "plt.xlim([-1, X.shape[1]])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the permutation importance values do not sum up to one, since they are not normalized; each value is the decrease in the chosen metric (here: accuracy) caused by permuting the respective feature. If you'd like, you can normalize them by dividing them by the sum of the importance values. Here, the main point is to look at the importance values relative to each other and not to over-interpret the absolute values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Support Vector Machines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the permutation importance approach yields results that are generally consistent with the *mean impurity decrease* feature importance values from a random forest, it is model-agnostic and can be used with any kind of classifier or regressor. 
The example below applies the `feature_importance_permutation` function to a support vector machine:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training accuracy 95.0857142857\n", "Test accuracy 94.9666666667\n" ] } ], "source": [ "from sklearn.svm import SVC\n", "\n", "\n", "svm = SVC(C=1.0, kernel='rbf')\n", "svm.fit(X_train, y_train)\n", "\n", "print('Training accuracy', np.mean(svm.predict(X_train) == y_train)*100)\n", "print('Test accuracy', np.mean(svm.predict(X_test) == y_test)*100)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "imp_vals, imp_all = feature_importance_permutation(\n", " predict_method=svm.predict, \n", " X=X_test,\n", " y=y_test,\n", " metric='accuracy',\n", " num_rounds=10,\n", " seed=1)\n", "\n", "\n", "std = np.std(imp_all, axis=1)\n", "indices = np.argsort(imp_vals)[::-1]\n", "\n", "plt.figure()\n", "plt.title(\"SVM feature importance via permutation importance\")\n", "plt.bar(range(X.shape[1]), imp_vals[indices],\n", " yerr=std[indices])\n", "plt.xticks(range(X.shape[1]), indices)\n", "plt.xlim([-1, X.shape[1]])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2 -- Feature Importance for Regressors" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.43676245, 0.22231268, 0.00146906, 0.01611528, -0.00522067])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from mlxtend.evaluate import feature_importance_permutation\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.datasets import make_regression\n", "from sklearn.svm import SVR\n", "\n", "\n", "X, y = 
make_regression(n_samples=1000,\n", " n_features=5,\n", " n_informative=2,\n", " n_targets=1,\n", " random_state=123,\n", " shuffle=False)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.3, random_state=123) \n", "\n", "svm = SVR(kernel='rbf')\n", "svm.fit(X_train, y_train)\n", "\n", "imp_vals, _ = feature_importance_permutation(\n", " predict_method=svm.predict, \n", " X=X_test,\n", " y=y_test,\n", " metric='r2',\n", " num_rounds=1,\n", " seed=1)\n", "\n", "imp_vals" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAADGxJREFUeJzt3XGoXvddx/H3Z8mi0g0Fd/8YSboEDdYwtWXXtDCYUiqkVhLBCilMVqiEwcKqG2hEKRr/qR1s/mH+WNSiqFtWp39c10gYW4cIruZ2q9M0Ri8hmkuEZm5uDrE17usf97Y83t30nic5d0/y7fsFF55zzq/P/R5C3pyce5/TVBWSpF7eMOsBJEnjM+6S1JBxl6SGjLskNWTcJakh4y5JDQ2Ke5L9Sc4nWUpydJ3jDye5kuT51a+fH39USdJQWzdakGQLcBz4CWAZOJNkoapeWLP0E1V1ZBNmlCRNaciV+z5gqaouVNXLwEng4OaOJUm6ERteuQPbgUsT28vA3eus+5kk7wL+CfjFqrq0dkGSw8BhgNtuu+0dd9xxx/QTS9Lr2HPPPfflqprbaN2QuGedfWufWfAXwMer6qUk7wX+ELj3W/6jqhPACYD5+flaXFwc8O0lSa9I8i9D1g25LbMM7JzY3gFcnlxQVf9eVS+tbv4u8I4h31yStDmGxP0MsCfJ7iTbgEPAwuSCJG+d2DwAnBtvREnStDa8LVNVV5McAU4DW4Anq+pskmPAYlUtAO9PcgC4CnwFeHgTZ5YkbSCzeuSv99wlaXpJnquq+Y3W+QlVSWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNbR11gPM2q6jT896hKlcfPyBWY8g6RbglbskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJamhQ3JPsT3I+yVKSo6+x7sEklWR+vBElSdPaMO5JtgDHgfuBvcBDSfaus+7NwPuBZ8ceUpI0nSFX7vuApaq6UFUvAyeBg+us+03gCeC/R5xPknQdhsR9O3BpYnt5dd+rktwF7KyqT73WGyU5nGQxyeKVK1emHlaSNMyQuGedffXqweQNwEeAD270RlV1oqrmq2p+bm5u+JSSpKkMifsysHNiewdweWL7zcDbgc8luQjcAyz4Q1VJmp0hcT8D7EmyO8k24BCw8MrBqvpaVb2lqnZV1S7g88CBqlrclIklSRvaMO5VdRU4ApwGzgFPVdXZJMeSHNjsASVJ0xv0P8iuq
lPAqTX7HrvG2h+/8bEkSTfCT6hKUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8ZdkhoaFPck+5OcT7KU5Og6x9+b5O+TPJ/kr5PsHX9USdJQG8Y9yRbgOHA/sBd4aJ14f6yqfqiq7gSeAD48+qSSpMGGXLnvA5aq6kJVvQycBA5OLqiqr09s3gbUeCNKkqa1dcCa7cClie1l4O61i5K8D/gAsA24d703SnIYOAxw++23TzurJGmgIVfuWWfft1yZV9Xxqvo+4JeBX1vvjarqRFXNV9X83NzcdJNKkgYbEvdlYOfE9g7g8musPwn89I0MJUm6MUPifgbYk2R3km3AIWBhckGSPRObDwD/PN6IkqRpbXjPvaquJjkCnAa2AE9W1dkkx4DFqloAjiS5D/gf4KvAezZzaEnSaxvyA1Wq6hRwas2+xyZePzryXJKkG+AnVCWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkNbZz2ANs+uo0/PeoSpXXz8gVmPILXglbskNWTcJakh4y5JDRl3SWrIuEtSQ8ZdkhoaFPck+5OcT7KU5Og6xz+Q5IUkX0rymSRvG39USdJQG8Y9yRbgOHA/sBd4KMneNcu+CMxX1Q8DnwSeGHtQSdJwQ67c9wFLVXWhql4GTgIHJxdU1TNV9V+rm58Hdow7piRpGkPivh24NLG9vLrvWh4B/nK9A0kOJ1lMsnjlypXhU0qSpjIk7llnX627MHk3MA98aL3jVXWiquaran5ubm74lJKkqQx5tswysHNiewdwee2iJPcBvwr8WFW9NM54kqTrMeTK/QywJ8nuJNuAQ8DC5IIkdwEfBQ5U1YvjjylJmsaGca+qq8AR4DRwDniqqs4mOZbkwOqyDwFvAv40yfNJFq7xdpKkb4NBj/ytqlPAqTX7Hpt4fd/Ic0mSboCfUJWkhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNTQo7kn2JzmfZCnJ0XWOvyvJF5JcTfLg+GNKkqaxYdyTbAGOA/cDe4GHkuxds+xfgYeBj409oCRpelsHrNkHLFXVBYAkJ4GDwAuvLKiqi6vHvrkJM0qSpjTktsx24NLE9vLqvqklOZxkMcnilStXructJEkDDIl71tlX1/PNqupEVc1X1fzc3Nz1vIUkaYAhcV8Gdk5s7wAub844kqQxDIn7GWBPkt1JtgGHgIXNHUuSdCM2jHtVXQWOAKeBc8BTVXU2ybEkBwCS/GiSZeBngY8mObuZQ0uSXtuQ35ahqk4Bp9bse2zi9RlWbtdIkm4CfkJVkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJamjrrAeQtL5dR5+e9QhTu/j4A7MeQau8cpekhoy7JDVk3CWpoUFxT7I/yfkkS0mOrnP8O5J8YvX4s0l2jT2oJGm4DeOeZ
AtwHLgf2As8lGTvmmWPAF+tqu8HPgL81tiDSpKGG/LbMvuApaq6AJDkJHAQeGFizUHg11dffxL4nSSpqhpxVun/8bdJpGsbEvftwKWJ7WXg7mutqaqrSb4GfC/w5clFSQ4Dh1c3v5Hk/PUMfYt4C2vOfwy5ef5N1Pn8NuXcwPP7Ntm087tJvG3IoiFxzzr71l6RD1lDVZ0ATgz4nre8JItVNT/rOTZL5/PrfG7g+b1eDPmB6jKwc2J7B3D5WmuSbAW+G/jKGANKkqY3JO5ngD1JdifZBhwCFtasWQDes/r6QeCz3m+XpNnZ8LbM6j30I8BpYAvwZFWdTXIMWKyqBeD3gT9KssTKFfuhzRz6FtH99lPn8+t8buD5vS7EC2xJ6sdPqEpSQ8Zdkhoy7iPb6FENt7okTyZ5Mck/zHqWsSXZmeSZJOeSnE3y6KxnGlOS70zyt0n+bvX8fmPWM40tyZYkX0zyqVnPMmvGfUQDH9Vwq/sDYP+sh9gkV4EPVtUPAvcA72v25/cScG9V/QhwJ7A/yT0znmlsjwLnZj3EzcC4j+vVRzVU1cvAK49qaKOq/oqmn2Goqn+rqi+svv5PViKxfbZTjadWfGN1842rX21+oyLJDuAB4PdmPcvNwLiPa71HNbSJw+vJ6pNN7wKene0k41q9bfE88CLw6arqdH6/DfwS8M1ZD3IzMO7jGvQYBt3ckrwJ+DPgF6rq67OeZ0xV9b9VdScrnzTfl+Tts55pDEl+Cnixqp6b9Sw3C+M+riGPatBNLMkbWQn7n1TVn896ns1SVf8BfI4+Pz95J3AgyUVWbofem+SPZzvSbBn3cQ15VINuUknCyqetz1XVh2c9z9iSzCX5ntXX3wXcB/zjbKcaR1X9SlXtqKpdrPy9+2xVvXvGY82UcR9RVV0FXnlUwzngqao6O9upxpXk48DfAD+QZDnJI7OeaUTvBH6Olau+51e/fnLWQ43orcAzSb7EyoXIp6vqdf8rg135+AFJasgrd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJamh/wOqoc/oAlft6gAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure()\n", "plt.bar(range(X.shape[1]), imp_vals)\n", "plt.xticks(range(X.shape[1]))\n", "plt.xlim([-1, X.shape[1]])\n", "plt.ylim([0, 0.5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "## feature_importance_permutation\n", "\n", "*feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, seed=None)*\n", "\n", "Feature importance imputation via permutation importance\n", "\n", "**Parameters**\n", "\n", "\n", "- `X` : NumPy array, shape = [n_samples, n_features]\n", "\n", " Dataset, where n_samples is the number of samples and\n", " n_features is the number of features.\n", "\n", "\n", "- `y` : NumPy array, shape = 
[n_samples]\n", "\n", " Target values.\n", "\n", "\n", "- `predict_method` : prediction function\n", "\n", " A callable function that predicts the target values\n", " from X.\n", "\n", "\n", "- `metric` : str, callable\n", "\n", " The metric for evaluating the feature importance through\n", " permutation. The string 'accuracy' is recommended for\n", " classifiers and the string 'r2' for regressors.\n", " Optionally, a custom scoring function (e.g.,\n", " `metric=scoring_func`) can be provided that accepts two\n", " arguments, y_true and y_pred, which have a shape\n", " similar to the `y` array.\n", "\n", "\n", "- `num_rounds` : int (default=1)\n", "\n", " Number of rounds the feature columns are permuted to\n", " compute the permutation importance.\n", "\n", "\n", "- `seed` : int or None (default=None)\n", "\n", " Random seed for permuting the feature columns.\n", "\n", "**Returns**\n", "\n", "\n", "- `mean_importance_vals, all_importance_vals` : NumPy arrays.\n", "\n", " The first array, mean_importance_vals, has shape [n_features, ] and\n", " contains the importance values for all features.\n", " The shape of the second array is [n_features, num_rounds] and contains\n", " the feature importance for each repetition. 
If num_rounds=1,\n", " it contains the same values as the first array, mean_importance_vals.\n", "\n", "**Examples**\n", "\n", "For usage examples, please see\n", " [http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/](http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/)\n", "\n", "\n" ] } ], "source": [ "with open('../../api_modules/mlxtend.evaluate/feature_importance_permutation.md', 'r') as f:\n", " s = f.read() \n", "print(s)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }