{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Rulefit Boston Housing Demo\n", "\n", "Rulefit algorithm aims for a compromise between interpretability and complexity of the resulting model. While simpler ML algorithms usually miss interaction effects or require advanced methods to uncover interaction effects, rulefit learns a sparse linear model that include automatically detected interaction effects in the form of decision rules. After that, new features are created in the form of decision rules and a transparent model is built using these features.\n", "\n", "Example: IF the number of rooms > 2 AND the age of the house < 15 THEN 1 ELSE 0 (lower than medium)\n", "\n", "The general algorithm flow:\n", "\n", "1. Algorithm fits a tree ensemble to the data, builds a rule ensemble by traversing each tree. This results in many rules but majority of them not informative.\n", "2. After that, it evaluates the rules on the data to build a rule feature set and fits a sparse linear model (LASSO) to the rule feature set joined with the original feature set, to select the best ones. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Boston house prices dataset:\n", "\n", "The response variable is the price of the houses and the goal is to produce a model with significant variables that can predict the house price using the given explanatory variables. Each record describes a Boston suburb or town. The data was created from the Boston Standard Metropolitan Statistical Area (SMSA) in the 70s. The attributes are defined as follows:\n", "\n", "- CRIM: per capita crime rate by town\n", "- ZN: proportion of residential land zoned for lots over 25000 sq. ft.\n", "- INDUS: proportion of non retail business acres per town\n", "- CHAS: Charles River dummy var (= 1 if tract bounds rivers; 0 othervise)\n", "- NOX: nitric oxides concentration (parts per 10 milion)\n", "- RM: average number of rooms per dweling\n", "- AGE: proportion of owner-occupied units built prior to 1940\n", "- DIS: weighed distances to five Boston employment centers\n", "- RAD: index of accesibility to radial highways\n", "- TAX: full value property=tax rate per 10000 USD\n", "- PTRATIO: pupil-teacher ratio by town\n", "- B: 1000(Bk - 0.63)2 where Bk is the proportion of blacks by town\n", "- LSTAT: % lower status of the population\n", "- MEDV: Median value of owner-occupied homes in 1000s USD" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "versionFromGradle='3.36.1',projectVersion='3.36.1.99999',branch='zuzana/rulefit_boston_demo',lastCommitHash='49ae4e81f2be4ece461ddccfb5ed1ec923cb4415',gitDescribe='jenkins-3.36.1.2-7-g49ae4e81f2-dirty',compiledOn='2022-06-02 14:19:03',compiledBy='zuzanaolajcova'\n", "Checking whether there is an H2O instance running at http://localhost:54321 . connected.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
H2O_cluster_uptime:08 secs
H2O_cluster_timezone:Europe/Prague
H2O_data_parsing_timezone:UTC
H2O_cluster_version:3.36.1.99999
H2O_cluster_version_age:8 minutes
H2O_cluster_name:zuzanaolajcova
H2O_cluster_total_nodes:1
H2O_cluster_free_memory:3.548 Gb
H2O_cluster_total_cores:12
H2O_cluster_allowed_cores:12
H2O_cluster_status:locked, healthy
H2O_connection_url:http://localhost:54321
H2O_connection_proxy:{\"http\": null, \"https\": null}
H2O_internal_security:False
Python_version:3.8.1 final
" ], "text/plain": [ "-------------------------- -----------------------------\n", "H2O_cluster_uptime: 08 secs\n", "H2O_cluster_timezone: Europe/Prague\n", "H2O_data_parsing_timezone: UTC\n", "H2O_cluster_version: 3.36.1.99999\n", "H2O_cluster_version_age: 8 minutes\n", "H2O_cluster_name: zuzanaolajcova\n", "H2O_cluster_total_nodes: 1\n", "H2O_cluster_free_memory: 3.548 Gb\n", "H2O_cluster_total_cores: 12\n", "H2O_cluster_allowed_cores: 12\n", "H2O_cluster_status: locked, healthy\n", "H2O_connection_url: http://localhost:54321\n", "H2O_connection_proxy: {\"http\": null, \"https\": null}\n", "H2O_internal_security: False\n", "Python_version: 3.8.1 final\n", "-------------------------- -----------------------------" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import h2o\n", "from h2o.estimators.random_forest import H2ORandomForestEstimator\n", "h2o.init()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n" ] } ], "source": [ "train = h2o.import_file(\"https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv\")\n", "x = train.columns\n", "y = \"medv\"\n", "x.remove(y)\n", "x.remove(\"b\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZ/0lEQVR4nO3de/wddX3n8ddbolxEReQnRUIMKuqi9UIjYq0WL1UUJd7Wgrqi8jBa0aq1q2Ddyq5lH9paEdeVGiuCynIRb9TLVsQLDy2C4SIQwJJykcRA4gWR6oKBz/4xk+nhx++XnIScMyf5vZ6Px3mcme/MOfM5Aznv33xnzndSVUiSBHCvvguQJE0OQ0GS1DEUJEkdQ0GS1DEUJEkdQ0GS1DEUNLGSLE9yYN919CnJi5PckOTWJE8c87ZPSvI349ym+mcoqBdJrkvy7Gltr0nyvfXzVfWYqvrORt5nYZJKMm9Epfbtg8Cbq2rnqrq472K07TMUpA2YgLB5KLC85xo0hxgKmliDRxNJ9k+yLMktSW5K8qF2tXPb55vbLpanJLlXkvckuT7JmiSfTvKAgfd9dbvs50n+27TtHJPkzCSfTXIL8Jp22+cluTnJ6iQfTXKfgferJG9KcnWSXyd5X5KHJ/mXtt4zBtef9hlnrDXJ9kluBbYDfpTk32Z5/SZtO8kLklzSfpZ/SfK4gWVPTHJR+z6nAzsMLLsyyQsG5uclWZtkv2H/e2rrYChoa3E8cHxV3R94OHBG2/709nmXtovlPOA17eMZwMOAnYGPAiTZF/gY8EpgD+ABwJ7TtrUYOBPYBTgFuAN4O7Ab8BTgWcCbpr3mucAfAAcA7wSWAq8C9gIeCxw2y+easdaquq2qdm7XeXxVPXzWPTPktttzEicCbwAeBHwcOKsNoPsAXwI+A+wKfA546cA2Tp32GZ4L/KyqLtpAXdoKGQrq05fav1hvTnIzzZf1bH4HPCLJblV1a1X9YAPrvhL4UFVdU1W3AkcDh7ZdQS8D/qmqvldVtwN/DUwfAOy8qvpSVd1ZVb+tqgur6gdVta6qrqP5Mv3jaa/526q6paqWA5cD32i3/yvg68BsJ4k3VOuwht32EuDjVXV+Vd1RVScDt9GEyQHAvYEPV9XvqupM4IcD2/g/wCFJdmrnX0ETFNrGGArq04uqapf1D+7+1/egI4BHAlcl+eFgV8YMHgJcPzB/PTAP2L1ddsP6BVX1G+Dn015/w+BMkkcm+UqSG9supf9Jc9Qw6KaB6d/OML8zM9tQrcMadtsPBd4xLYj3amt4CLCq7jpCZldXVa0ArgRe2AbDITRBoW2MoaCtQlVdXVWHAQ8GPgCcmeS+3P2vfICf0nwBrrcAWEfzZbkamL9+QZIdabpS7rK5afMnAFcB+7TdV+8GsvmfZuhat7QbgGMHg7iqdqqqU2n2y55JBj/XgmmvX9+FtBi4og0KbWMMBW0VkrwqyVRV3Qnc3DbfCaxtnx82sPqpwNuT7J1kZ5q/7E+vqnU05wpemOQP2370Y9j4F/z9gFuAW5M8GvizLfSxNlbrlvYJ4I1JnpzGfZMcnOR+wHk0YfTnSe6d5CXA/tNefxrwHJrP71HCNspQ0NbiIGB5e0XO8cChbX//b4Bjge+3XSIH0JxM/QzNlUnXAv8PeAtA2+/+FpovuNXArcAamr712fwlTR/6r2m+WE/fgp9r1lq3tKpaBrye5qT7L4EVNCe5ac+vvKSd/wXwp8AXpr1+NU14/CFbdh9ogsSb7Ggua/86v5mma+jansuReueRguacJC9MslN7TuKDwGXAdf1WJU0GQ0Fz0WKaE7w/Bfah6YrykFnC7iNJ0oCRHSkkObH92f7l09rfkuSqNCNg/u1A+9FJViT5cZLnjqouSdLsRjnY10k0Vzl8en1DkmfQHLo/vqpuS/Lgtn1f4FDgMTQ/ovlmkkdW1R0b2sBuu+1WCxcuHE31krSNuvDCC39WVVMzLRtZKFTVuUkWTmv+M+D9VXVbu86atn0xcFrbfm2SFTTXSJ+3oW0sXLiQZcuWbdnCJWkbl+T62ZaN+0TzI4GnJTk/yXeTPKlt35O7Di2wkrsPUiZJGrFxjxU/j2YExgOAJwFnJHnYhl9yV0mW0AzsxYIF03+FL0m6J8Z9pLAS+EI1LqAZnmA3YBXNwFzrzW/b7qaqllbVoqpaNDU1Y5eYJGkzjTsUvkQzbjxJHgncB/gZcBbNcMHbJ9mb5trxC8ZcmyTNeSPrPkpyKnAgsFuSlcB7acZ5ObG9TPV24PD2R0PLk5wBXEEzKNeRG7vySJK05W3VP15btGhRefWRJG2aJBdW1aKZljnMhSSpYyhIkjqGgiSpM+7fKUgbtPCor/ZdAgDXvf/gvkuQeuGRgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjojC4UkJyZZ096PefqydySpJLu180nykSQrklyaZL9R1SVJmt0ojxROAg6a3phkL+A5wE8Gmp8H7NM+lgAnjLAuSdIsRhYKVXUu8IsZFh0HvBOogbbFwKer8QNglyR7jKo2SdLMxnpOIcliYFVV/Wjaoj2BGwbmV7ZtkqQxGtvtOJPsBLybpuvonrzPEpouJhYsWLAFKpMkrTfOI4WHA3sDP0pyHTAfuCjJ7wGrgL0G1p3ftt1NVS2tqkVVtWhqamrEJUvS3DK2UKiqy6rqwVW1sKoW0nQR7VdVNwJnAa9ur0I6APhVVa0eV22SpMYoL0k9FTgPeFSSlUmO2MDqXwOuAVYAnwDeNKq6JEmzG9k5hao6bCPLFw5MF3DkqGqRJA3HXzRLkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpYyhIkjqGgiSpM8p7NJ+YZE2Sywfa/i7JVUkuTfLFJLsMLDs6yYokP07y3FHVJUma3SiPFE4CDprWdjbw2Kp6HPCvwNEASfYFDgUe077mY0m2G2FtkqQZjCwUqupc4BfT2r5RVeva2R8A89vpxcBpVXVbVV0LrAD2H1VtkqSZ9XlO4XXA19vpPYEbBpatbNvuJsmSJMuSLFu7du2IS5SkuaWXUEjyV8A64JRNfW1VLa2qRVW1aGpqassXJ0lz2LxxbzDJa4AXAM+qqmqbVwF7Daw2v22TJI3RWI8UkhwEvBM4pKp+M7DoLODQJNsn2RvYB7hgnLVJkkZ4pJDkVOBAYLckK4H30lxttD1wdhKAH1TVG6tqeZIzgCtoupWOrKo7RlWbJGlmIwuFqjpshuZPbmD9Y4FjR1WPJGnj/EWzJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOiMLhSQnJlmT5PKBtl2TnJ3k6vb5gW17knwkyYoklybZb1R1SZJmN8ojhZOAg6a1HQWcU1X7AOe08wDPA/ZpH0uAE0ZYlyRpFiMLhao6F/jFtObFwMnt9MnAiwbaP12NHwC7JNljVLVJkmY27nMKu1fV6nb6RmD3dnpP4IaB9Va2bXeTZEmSZUmWrV27dnSVStIc1NuJ5qoqoDbjdUuralFVLZqamhpBZZI0d407FG5a3y3UPq9p21cBew2sN79tkySN0VChkOT3t9D2zgIOb6cPB7480P7q9iqkA4BfDXQzSZLGZN6Q630syfY0VxSdUlW/2tgLkpwKHAjslmQl8F7g/cAZSY4Argde3q7+NeD5wArgN8BrN+EzSJK2kKFCoaqelmQf4HXAhUkuAD5VVWdv4DWHzbLoWTOsW8CRw9QiSRqdoc8pVNXVwHuAdwF/DHwkyVVJXjKq4iRJ4zXsOYXHJTkOuBJ4JvDCqvpP7fRxI6xPkjRGw55T+F/APwLvrqrfrm+sqp8mec9IKpMkjd2woXAw8NuqugMgyb2AHarqN1X1mZFVJ0kaq2HPKXwT2HFgfqe2TZK0DRk2FHaoqlvXz7TTO42mJElSX4YNhX8fHM46yR8Av93A+pKkrdCw5xTeBnwuyU+BAL8H/OmoipIk9WPYH6/9MMmjgUe1TT+uqt+NrixJUh+GPVIAeBKwsH3Nfkmoqk+PpCpJUi+GCoUknwEeDlwC3NE2F2AoSNI2ZNgjhUXAvu0YRZKkbdSwVx9dTnNyWZK0DRv2SGE34Ip2dNTb1jdW1SEjqUqS1IthQ+GYURYhSZoMw16S+t0kDwX2qapvJtkJ2G60pUmSxm3Yq49eDywBdqW5CmlP4B+Y4YY50rZg4VFf7bsEAK57/8F9l6A5ZtgTzUcCTwVuge6GOw8eVVGSpH4MGwq3VdXt62eSzKP5ncJmSfL2JMuTXJ7k1CQ7JNk7yflJViQ5Pcl9Nvf9JUmbZ9gTzd9N8m5gxyR/ArwJ+KfN2WCSPYE/p/ndw2+TnAEcCjwfOK6qTkvyD8ARwAmbsw1tuknpLpHUr2GPFI4C1gKXAW8AvkZzv+bNNY8mYObRDMG9mubWnme2y08GXnQP3l+StBmGvfroTuAT7eMeqapVST4I/IRm+O1vABcCN1fVuna1lTQns+8myRKak94sWLDgnpYjSRow1JFCkmuTXDP9sTkbTPJAYDGwN/AQ4L7AQcO+vqqWVtWiqlo0NTW1OSVIkmaxKWMfrbcD8J9pLk/dHM8Grq2qtQBJvkBzZdMuSea1RwvzgVWb+f6SpM001JFCVf184LGqqj4MbO4F1D8BDkiyU5LQ/NbhCuDbwMvadQ4HvryZ7y9J2kzD/nhtv4HZe9EcOWzKvRg6VXV+kjOBi4B1wMXAUuCrwGlJ/qZt++TmvL8kafMN+8X+9wPT64DrgJdv7kar6r3Ae6c1XwPsv7nvKUm654a9+ugZoy5EktS/YbuP/mJDy6vqQ1umHElSnzbl6qMnAWe18y8ELgCuHkVRkqR+DBsK84H9qurXAEmOAb5aVa8aVWGSpPEbdpiL3YHbB+Zvb9skSduQYY8UPg1ckOSL7fyLaMYnkiRtQ4a9+ujYJF8HntY2vbaqLh5dWZKkPgzbfQTNaKa3VNXxwMoke4+oJklST4YdEO+9wLuAo9umewOfHVVRkqR+DHuk8GLgEODfAarqp8D9RlWUJKkfw4bC7VVVtLfgTHLf0ZUkSerLsKFwRpKP0wxv/Xrgm2yBG+5IkibLRq8+aoe3Ph14NHAL8Cjgr6vq7BHXJkkas42GQlVVkq9V1e8DBoEkbcOG7T66KMmTRlqJJKl3w/6i+cnAq5JcR3MFUmgOIh43qsIkSeO3wVBIsqCqfgI8d0z1SJJ6tLEjhS/RjI56fZLPV9VLx1CTJKknGzunkIHph22pjSbZJcmZSa5KcmWSpyTZNcnZSa5unx+4pbYnSRrOxkKhZpm+p44H/m9VPRp4PHAlcBRwTlXtA5zTzkuSxmhj3UePT3ILzRHDju00/MeJ5vtv6gaTPAB4OvAamje5Hbg9yWLgwHa1k4Hv0Iy3JEkakw2GQlVtN4Jt7g2sBT6V5PHAhcBbgd2ranW7zo3MchOfJEuAJQALFiwYQXmSNHdtytDZW8o8YD/ghKp6Is0lrnfpKhocZ2m6qlpaVYuqatHU1NTIi5WkuaSPUFgJrKyq89v5M2lC4qYkewC0z2t6qE2S5rSxh0JV3QjckORRbdOzgCuAs4DD27bDgS+PuzZJmuuG/UXzlvYW4JQk9wGuAV5LE1BnJDkCuB54eU+1SdKc1UsoVNUlwKIZFj1rzKVIkgb0cU5BkjSh+uo+UmvhUV/tuwRJ6nikIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpE5voZBkuyQXJ/lKO793kvOTrEhyenv/ZknSGPV5pPBW4MqB+Q8Ax1XVI4BfAkf0UpUkzWG93I4zyXzgYOBY4C+SBHgm8Ip2lZOBY4AT+qhPmhSTcLvW695/cN8laIz6OlL4MPBO4M52/kHAzVW1rp1fCew50wuTLEmyLMmytWvXjrxQSZpLxh4KSV4ArKmqCzfn9VW1tKoWVdWiqampLVydJM1tfXQfPRU4JMnzgR2A+wPHA7skmdceLcwHVvVQmyTNaWM/Uqiqo6tqflUtBA4FvlVVrwS+DbysXe1w4Mvjrk2S5rpJ+p3Cu2hOOq+gOcfwyZ7rkaQ5p5erj9arqu8A32mnrwH277MeSZrrJulIQZLUM0NBktQxFCRJHUNBktQxFCRJHUNBktTp9ZJUSdraTMIghTC6gQoNBUkbtK1/Cequ7D6SJHUMBUlSx1CQJHUMBUlSx1CQJHUMBUlSx1CQJHUMBUlSx1CQJHUMBUlSZ+yhkGSvJN9OckWS5Une2rbvmuTsJFe3zw8cd22SNNf1caSwDnhHVe0LHAAcmWRf4CjgnKraBzinnZckjdHYB8SrqtXA6nb610muBPYEFgMHtqudDHwHeNeo6piUQb4kaZL0ek4hyULgicD5wO5tYADcCOw+y2uWJFmWZNnatWvHU6gkzRG9hUKSnYHPA2+rqlsGl1VVATXT66pqaVUtqqpFU1NTY6hUkuaOXkIhyb1pAuGUqvpC23xTkj3a5XsAa/qoTZLmsj6uPgrwSeDKqvrQwKKzgMPb6cOBL4+7Nkma6/q489pTgf8CXJbkkrbt3cD7gTOSHAFcD7y8h9okaU7r4+qj7wGZZfGzxlmLJOmu/EWzJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOoaCJKljKEiSOn2MfSRJm8wbY42HRwqSpI6hIEnqGAqSpI6hIEnqGAqSpI6hIEnqGAqSpM7EhUKSg5L8OMmKJEf1XY8kzSUTFQpJtgP+N/A8YF/gsCT79luVJM0dExUKwP7Aiqq6pqpuB04DFvdckyTNGZM2zMWewA0D8yuBJw+ukGQJsKSdvTXJj4HdgJ+NpcJNM4l1TWJNYF2bYhJrgsmsaxJrgi1QVz5wj7b/0NkWTFoobFRVLQWWDrYlWVZVi3oqaVaTWNck1gTWtSkmsSaYzLomsSaY3Lpg8rqPVgF7DczPb9skSWMwaaHwQ2CfJHsnuQ9wKHBWzzVJ0pwxUd1HVbUuyZuBfwa2A06squVDvHTpxlfpxSTWNYk1gXVtikmsCSazrkmsCSa3LlJVfdcgSZoQk9Z9JEnqkaEgSepsdaGQ5MQka5JcPtC2a5Kzk1zdPj9wQuo6JsmqJJe0j+ePuaa9knw7yRVJlid5a9ve2/7aQE1976sdklyQ5EdtXf+9bd87yfntsCuntxdATEJdJyW5dmB/PWGcdbU1bJfk4iRfaed73Vez1DQJ++m6JJe121/WtvX+nTWbrS4UgJOAg6a1HQWcU1X7AOe08+N2EnevC+C4qnpC+/jamGtaB7yjqvYFDgCObIcN6XN/zVYT9LuvbgOeWVWPB54AHJTkAOADbV2PAH4JHDEhdQH814H9dcmY6wJ4K3DlwHzf+2qmmqD//QTwjHb763+bMAnfWTPa6kKhqs4FfjGteTFwcjt9MvCicdYEs9bVq6paXVUXtdO/pvnHsic97q8N1NSratzazt67fRTwTODMtn3s/29toK5eJZkPHAz8Yzsfet5X02uacL1/Z81mqwuFWexeVavb6RuB3fssZpo3J7m07V7q7RAxyULgicD5TMj+mlYT9Lyv2q6HS4A1wNnAvwE3V9W6dpWV9BBg0+uqqvX769h2fx2XZPsxl/Vh4J3Ane38g+h/X02vab0+9xM0If6NJBe2w/TAhPwbnMm2Egqdaq6x7f0vqdYJwMNpDvtXA3/fRxFJdgY+D7ytqm4ZXNbX/pqhpt73VVXdUVVPoPkl/f7Ao8ddw0ym15XkscDRNPU9CdgVeNe46knyAmBNVV04rm1uzAZq6m0/DfijqtqPZvTnI5M8fXDhhH1nbTOhcFOSPQDa5zU91wNAVd3U/oO+E/gEzRfNWCW5N82X7ylV9YW2udf9NVNNk7Cv1quqm4FvA08Bdkmy/keevQ67MlDXQW03XFXVbcCnGO/+eipwSJLraEYyfiZwPP3uq7vVlOSzPe8nAKpqVfu8BvhiW8NEfmfBthMKZwGHt9OHA1/usZbO+v/orRcDl8+27oi2H+CTwJVV9aGBRb3tr9lqmoB9NZVkl3Z6R+BPaM53fBt4Wbva2P/fmqWuqwa+UELTHz22/VVVR1fV/KpaSDMUzbeq6pX0uK9mqelVfe6ndrv3TXK/9dPAc9oaJvI7C4Cq2qoewKk03Qu/o+m3PIKmP/Mc4Grgm8CuE1LXZ4DLgEtp/ifYY8w1/RHNYemlwCXt4/l97q8N1NT3vnoccHG7/cuBv27bHwZcAKwAPgdsPyF1favdX5cDnwV2HmddA/UdCHxlEvbVLDX1up/affKj9rEc+Ku2vffvrNkeDnMhSepsK91HkqQtwFCQJHUMBUlSx1CQJHUMBUlSx1CQxijJd5JM5A3bJTAUJEkDDAVpI5IsTHJVOzb/vyY5Jcmzk3y/HQ9///aXqye29z64OMni9rU7JjktyZVJvgjs2La/McnfDWzjNUk+2tNHlDr+eE3aiHY01xU0I7ouB35I8wvVI4BDgNcCVwBXVNVn22EpLmjXfwPw2Kp6XZLHARfR3EfieuC8au49QJKvA8dW1ffG+NGku5m38VUkAddW1WUASZbT3CClklwGLKQZAO6QJH/Zrr8DsAB4OvARgKq6NMml7fTaJNe0N8y5mmYkz++P8wNJMzEUpOHcNjB958D8nTT/ju4AXlpVPx58UTMO26xOA14OXAV8sTxs1wTwnIK0Zfwz8JZ2NE6SPLFtPxd4Rdv2WJoB7tb7Is0duA6jCQipd4aCtGW8j+ZWmZe23Uvva9tPAHZOciXwP4DuJjBV9Uua4bkfWlUXjLleaUaeaJYkdTxSkCR1DAVJUsdQkCR1DAVJUsdQkCR1DAVJUsdQkCR1/j8hBwyFqvb5AAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
breaks counts mids_true mids widths
9.5 nan nan nan nan
14 21 2.5 11.75 4.5
18.5 55 4.75 16.25 4.5
23 82 7 20.75 4.5
27.5 154 9.25 25.25 4.5
32 84 11.5 29.75 4.5
36.5 41 13.75 34.25 4.5
41 30 16 38.75 4.5
45.5 8 18.25 43.25 4.5
50 31 20.65 47.75 4.5
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train['medv'].hist()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
0.0063218 2.31 00.5386.575 65.24.09 1 296 15.3396.9 4.98 24
0.02731 0 7.07 00.4696.421 78.94.9671 2 242 17.8396.9 9.14 21.6
0.02729 0 7.07 00.4697.185 61.14.9671 2 242 17.8392.83 4.03 34.7
0.03237 0 2.18 00.4586.998 45.86.0622 3 222 18.7394.63 2.94 33.4
0.06905 0 2.18 00.4587.147 54.26.0622 3 222 18.7396.9 5.33 36.2
0.02985 0 2.18 00.4586.43 58.76.0622 3 222 18.7394.12 5.21 28.7
0.0882912.5 7.87 00.5246.012 66.65.5605 5 311 15.2395.6 12.43 22.9
0.1445512.5 7.87 00.5246.172 96.15.9505 5 311 15.2396.9 19.15 27.1
0.2112412.5 7.87 00.5245.631100 6.0821 5 311 15.2386.63 29.93 16.5
0.1700412.5 7.87 00.5246.004 85.96.5921 5 311 15.2386.71 17.1 18.9
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before training a rulefit model, we will train a tree based model for comparison. By default, rulefit uses H2ORandomForestEstimator in the following configuration:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%\n", "RMSE: \n", "4.239620141024194\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0MAAAJTCAYAAADKYc3AAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAA2wUlEQVR4nO3debgkd10v/veHTCCE4LBEEAMyygXDEhLIhH0JCF4wCorcy3aBABoxFwFBf8QNogIGvQoIEg0IYYkriATDpiFhNZABQhYIizAKYQ0kg0lIyPL9/VF1oNPT58yZmXNO58z39Xqefrq71k91V1fXu+tb1dVaCwAAQG+uN+8CAAAA5kEYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQ7ISqOryqWlUdu5vTOXKczpE7Mc6J4zibdmfewOKq6tjxc3b4nOY/c9tQVVuraus8alote+IyTVup7wxg9QhDrAtVddL4hXL0MoZ9zzjsL6xFbXuKiS/t0+ddy2rblTDag6o6fUdBYCKUHznRrarqYVX1iqo6q6ouqqrLq+ozVfWyqrrlDuZ7h6r6i6o6v6ouqapLx3FfVVU/uRvLsXC7aqzp/Kr6h6p6SlXtt7PTXea81+2PFtf1z8UYnibf12uqaltVnVFVz66qvedd43o3sQ4sept3jbuih+DNrtsw7wJgmV6d5PFJfinJqxYbaNwBeUiSryZ5+yrU8dEkd0xy4SpMG9arGyR5Z5LvJXl/kn9LsleSByd5VpLHVtX9W2ufmx6xqp6Z5M8y/Dj3viT/kqQlOTTJ05McVVXPaa39+S7U9fokW5NUkhsn+YkM24f/leTFVfW01to7psZ5ZZK/S/JfuzC/lfDWJGdk2IYx28uTXJxhHfuxJI9K8tIkP5Xk5+ZX1h7lk0n+ed5FwFoQhlgXWmunV9Vnk9ytqu7eWvv4IoM+LcOOz+taa1etQh2XJTl/pacL69zVSX43yataaxctdKyq62X48eJXMgSea+2oVtWTMuzYfjvJL7TW3j/V//4ZdsheXlUXtdbeuJN1ndhaO31qmvskeW6SP0jy1qp66OR8W2sXZo4/drTWtiXZNq/5rxMva61tXXhSVX+Y5KwkP1tVD2ytvW9ehe1BzmqtHTvvImAtaCbHevLq8f6XZ/Wsqr2SPCXDr8qvGbv9fFW9qao+Oza9ubSqPlZVzxx31KansdDE5Seq6teq6uyq+u5C07HF2n9X1aFV9fKq+mRVfXtsIvS5qvrTqrrpUgtVVUdU1YfH2i6qqjdX1e135oWpqnuO432tqr5XVV+qqr+qqh/dmeksMu3vN52pqodW1QfGpkzfrKrXVdVNxuHuVlX/Mi7DJVV18qymQhNNmG5QVS+sqi9W1RVV9R9V9YKquv4idfxUVb1rfH2vGN/T46pq4xLzuH5VPb+GJldXjO/v6UleNw76uqkmIJvG8X90HO9DE6/pV6rqb6rqTjPmt2kc/8Tx8d9V1YXjerClqn52idf3MVV16sR6s7Wq/raqNs8Y9nFVdVpVXTwO++mq+t2qusFi018LrbUrW2svmgxCY/drMoSOJDl8sl9V3TjJy8anj58OQuP4H0jyhPHpy8ZxdrfWy1trL0rywiTXzxDGJuuaec5QVd2/qt5eVV8e16Wv1dA86wUTw7QkTx6ffnFivdo6Mcyi6+bYf8mmalW1sapeWVUXjOvAp2rYntXUcEueq1JTzYaW87kYh9tQVUePy/6dqrqsqj5RVc+o2dvUGvudN9Z7wVj/dp/bXdVa+3yGo4pJctjU/O8wbie21LDNuqKq/rOqTqiqW8+o9/uvW1UdUlWnjJ+3y6rqfVV1n1k1VNUtq+qvq+rrNXxnnFVVT5417MQ4t6+qN4yvycI25g01Y/s/uV6O24GPjTV9par+bGEbUFUPHtex79SwLX5jVd18ua/lztqNZXh8VX2khu+KrRPD7FtVvzW+fpeO/f+9qh43Y3pVVU+u4fvzm+P69aWqendVPWYc5vAaPpe3TXLbqfX6xNV6XVhfHBliPXl9khcleVxVPXc8SjPp4UkOSPKvrbUvjt2OS3JNko8kuSDJxgxNd16e4UvziYvM6+VJ7p/klCTvyPDL91J+OckvZPhC/rcMPzQcmuQ5SR5eVfdsrf33jPEeNdb91iSnJzkkyS8meVBV3ae19pkdzDdV9dQkJyS5IsnJSb6U5PYZmhT+XFXdq7W2Ek1+HpHkZzM0Y/rLJPdJcmSSTVX1W0lOTfKBJH+d5KAMRwF+oqruOu4UT/uHDO/Bm5NcmeSRSY5NsrmqHtFa+37b9Kr6lSTHJ7k0yT8m+UaGnevnjct439baxTPm8ZZxHu/McIThGxle54vH+b0twy/KCxam8YAkxyQ5bZzGJRle00cnecQ4v0/OmN9tMzSl/EKSNya5WZLHJHlbVT2ktXbaxDJVhp3PJ2c4EvFPSb6Z5NZJHpTkM0m2TAz/2gxh/8tjTRcnuVeSP0zyUzUc4bhqYvhjk7wgye/P+RfeK8f76SO1j05y0yQfba29e7GRW2vvqqozM7yPj84Pdth31/9L8ptJDqmqO7fWzltswKp6WIZtwXcyfMYuyPDe3jHJ0Ul+fxz095P8fJKD84OmXJm4nzRr3dyR62fYvtwkQ1O+62fYXrw8yU8m+b/LmMZiTswOPhc1nJPz9iT/M8P6+TdJLs+wvr4iyT2z/Tb1ZUmemaHZ3wn5wWf9nmP939uNmme5cur5ozI0tzwtyYfH+d05P9g+bm6tXTBjOpuT/H9J/j3Dj2s/luG1PrWqDpncNlfV/uO0fyLJB8fbrTJsJ98zq8iqOizDe3njDOvUp5IcmOT/JHnkuL04c8aov5bhO+OfM2zLfjrJrye5WVW9LcN6cUqG1/o+4/T2H8dZUbuxDM9N8tAM69JpGb6XU8MPa+9NcrckH0/y2gzfpf8zyd+Mn9PfnZjOi5L8VpIvZvg+2ZbhdT8sQ1PYv8/QTPb3kzx7HOdlE+OftWtLzh6ntebmtm5uGTZuLcmRM/q9bez36Ilut5sx3PUyBKuW5J5T/U4cu1+Q5MdnjHv42P/Yqe63TbLXjOGfNg7/vKnuR47dW5Kfner3rLH7qYvUtmmi2x0yfLl/PskBU8P/VIYQ99ZlvrYLy3b6IrVeleSBU6/jv479vp3kCVPj/fXY75FT3U8fu382yU0nuu+TYcejJXni1Gt7RYYd0QOnpvWqcfgTFpnH2Un2n7GsC8u03Xo09r9FkhvP6H5whmD0zqnumybezxdM9fufY/d3THU/auz+0SQbp/rtleRWM+r9pyQ3nBr22LHfsxbpfuysZVxkuRdetxPH8WfdzlrqtZsxzeeNw//tIuvHi5YxjReNw75mJ5fj8B0M94FxuKfMeN0On+j2lrHbwTOmsf/U8xMz9TldiXUzw05dy7CjfYOJ7jdL8h9jvwdMdD98qfd/nN7WnfxcLLw2r8jE9m5cX7f7vGfYGW8Ztk83m+g++VnfOmteS9S83WubIQheOvY7dKrfAZOv10T3n86wfTx+qvvC6zbrPfiVsfurprqfMHZ/6VT3zRnC2bXehwxNuT89dp/ebj5m7H5+kuvNeO23JbnjRPcbJDlvXJZvZfFt9CHLfI0X1oGzMvvzf8gKLMOlSe42Y94njv3/v6nu+yR5V4YfNg+Z6P6tDD8O7buMz+XWnVnX3Pq6zb0AN7eduWXYwW9JPjjV/Vbjl87Xk+y9jOncfZzO86e6L2yMn7XIeAtflMcus94av7zeO9V94Qvn1Bnj7JVh56Elue2M2jZNdHvp2O2IReb/1gwhZrsd+yWW7fRFan3jjHGeNPZ7/4x+D8zscHB6pgLPjBpOm+j2O2O3F88Y/qYZQtJ3c+0dxIV5PHKRZV1YpiN3YR08OcOv4XtPdNs0Tm9rZofi/0xy4VS3c8ZxttspmDH+J8b1+yaLrC8XZjjCMtl9/wy/0m63w73EfBZet+XcdvjaZfiF9rLxPbrdVL93jNN5+jKm8/TMCJTLWI7DdzDc32Vq5ytLh6E7LGPeJ2Z5YWin1s38IAjcf4lxXjfRbeGzdOwi89manQhDGXasv5XhCM+GGf1vkmFn9R8mur06U2FzRn1bZ9W3RM0tw6/7x2Y4Kvr6DD9QtCR/stxpjdM7O8kXFqnrgzOG3zvD53DLVLdLx3V84xLrw7ET3e47dvvwInUthPTJcLuwXv7hjOGfP/Z7w4x+Tx77PXmZr8nCOrDk5343l+GlM4a/eYbvqjMXmd7B47h/PNHtWxmOCm0Xdpezvru5Ldw0k2O9eW+GX0HvW1V3bK19euz+lAzNPk9srX2/mcTYVvo3k/xMhiYMN5qa3gGLzOejO1PU2HzkV5I8NsmdMhz2n2w/v9h8tjvRt7V2dVV9MMntMjQX+M8lZn3v8f6BY5OFabfIsLN8hyQfW2oZlmHLjG5fGe9nTXuh6cl27fJHs05y/mCGXzjvNtHt7uP9e6cHbq1dVFWfyNCs7cAMV0CatFPv46SqOiLDTvjmDMFienu5f7a/4tdZrbVZTSq/lB+8V6mqGyW5S5Kvt9Y+sYM69s2wI3BhkmfXtU8NWXBFhiZb39d270IAD2pTFx6YqOfE/OC8mEVV1R0yNIPZO8ljW2v/sYu1rKaFF7PtYLiTMjS3+khV/X2Gpj0faq19eTfmvSvr5lUZmmNNO328v9uMfivlDhmOQn0uye8ush5+N9deDxc+u0t91nfFs2Z0O7a19vvTHcfmqE/IsJN/cIYfUPaaGGSxZnrbbe9aa1dW1dfHaSw4MMm+ST7QhotfTDs9239eFt2mTXS/X4b3c/pcupXeDi/m9a21I5fovzvLMGvdPyzD+7LYeW4Ll02fXL9OytBs8FNV9Q8Z1rN/X+R9gEUJQ6wrrbVWVa9J8kcZ2nw/d/yyW2iOtnCRhYX2x2cm+fEMG983ZGjOdVWGXzGflaGJwSxf28nS/j7DOUNfyNBc72sZdlCToa3yYvP5+g7mv3EH8104MfY3dzDcSvynyqwvmKuW0W+x//7Ybtlba1dV1YUZQtyChddgsUsNL3S/yYx+O/s+Jkmq6lkZfn2+KEMzk//KcISj5QfnhMx6Ty9eZJJX5drheKHWWecqTLtphp32H85wDtB13hiETsuw8/zY1trJMwZbeG9us4xJLgzzlSWH2nkLFxj55lIDtdb+qYaLYDw3yVMz/PCRqvpYkt9qrf3rLsx7V9bNCxcJ28vdXuyOhW3N7bP0eji5rVmoZ6nP+q748dba1hquDHhIhnNzXlBVX2jbX3HwzzJsg7+a5N0ZPnPfHfsdmaEZ7iwXL9L9qlw7TC26jKNZ7/PubNNWeju8q1Z6u7ywfh2WqYtgTJlcv349w3fuUzKc43lMkquq6h1JntuGC2vADglDrEevy3CFqieNJ+7fP8NRn/dObfx+KUMQ2u4E8qq6d2b/urhgR78UT05rc4Yg9G9JHt6ufRL79TKchLuYxf6M8kfG+x39wrXQf2Nr7TvLKPe65JaZ+i+XqtqQ4YjL5LIsLOOPZGgbP+1WU8N9X2tt2e/jVA3HZvjCvntr7atT/e89a7yddPF4v9gRw0kLy/WJ1trdlxzyOqCq7pjhYho3T/K/WmtvW2TQD2bYiXlIhqaQS3nIeP+hFSky37+a3aHj04/saPjW2ilJThmP6t0zw8VEfjXJv1TV3Vprn9qZ+e/Kuplk/6raa0YgmrW9WLhoyWLf8zfJ4jv8syxM+62ttUft5Di3zLDT+n0Tn/VdPrrWWrs8yRlV9fAM56ccX1Wntta+Ms7jFhku3nBukvu0qYvYzLpC2S6YXMZZfmRGt21L9EuW2KZdh+zOMsxa9xeGe2lr7TnLKWD8HLwsw5Umb5HhSNRjM1w84c7jBReuWGISkMSltVmHWmtfz3Dexv4ZfqX/pbHXCVOD/o/x/i0zJvPAFSxpYT4nt+3/2+geSW64xLjb1VHDJcLvNz5dsglVhj9nTIZAuN7Meg/ul+FX18nlXnh8+PTA49G/QzKcw/Pp6f5LWNiZ3GtGv/0z7Ch+eEYQ2i8/aB6yy1prl2bYQbtlVS3ZtKm1dkmGEHjnqrrZ7s57NVXVQRmaBd0syaOWCELJcBXBi5Pco6oeusQ0H5rhc/TtcZyV8psZPpsfn2huu0OttUtba+8dd9henOGKaJNX6lpq3dpdGzJclGDa4eP95OfmovF+uyNvVfU/Mvso0lK1n5/xCoZjs+DlWPg/uKU+67tt/Jy+OEMz6Mmmcj+RYT/nPTOC0K3H/rvr/AxHjQ+p2ZcLP3xGt0W3aaMHjfeL/Z/edcFKL8NHMwT4Xfoua619o7X2T621/52hid7tMjRFXnB1VuczyR5AGGK9WmgO99wMR2UuzHCxgElbx/vDJzuOO5+/tYK1LDafWyT5ix2M++Da/j9onpFhQ35aa22p84WS5JUZTuh96dg06Vpq+C+T62pQ+r2a+A+mscnLH41PXzcx3JsyLOOvjTtxk/4wyQ8ledNO/gL4rfH+x2b0+0aGnZtDx/CzUN/eGS5hvP9OzGcpfz7e/9X0TlRVXa+qbjXR6c8y7HS/dgyAmRr+plV196lu+1fVgeNlf1ddVR2SoWncjTNcHOCUpYYfj2Q+d3z6N1V13xnTvE+Gyzcnya9P79DuYp37VNVvZzga9b0sfYR4YZwHjEcypi0cDZi8zP9S69ZK+KOa+F+pMSAvXG548nNzfoYjrI8ct0ULw98wP1j3pi1a+/hDzysy/OL/5+N0rqWqblXX/h+uE8f735kM8lOf9ZXyigxN1Y6sH/zHzdbx/n7jj0wL898vw3fIbreOGc9RPSnDen/sZL+x1cATZoz2oQyXJr9fVT16apxHZwgEn81w9PS6akWXobX2jQyv4+aq+r3J92tiurerqh8fH99gkW3G3hl+jEm2/1z+8Kz1FjSTY716T4YvunuMz1/ZWps+EfYNGX79fVlVPSjDib+3z9C85Z8yXP5zJZyZ4YvhUVX14Qwb/1tm+LX4M1n6PIe3J3lrVb01wxXkDhnH+3aG/y9ZUmvt/Br+Z+i1Sc6rqndl+ALaO8MOzf0znA9x4C4t2er6dIaaJ/9n6HYZ/iPj++3+x3MDnp0hWH58PFH2mxl+bb53hp2+5+3kvP89wxfls8eLbCy0YX9Fa21bVf15hvbn59Tw3x3Xz/BL580y7PA/aMY0d9ZrMrw/T0zyuXE+38xwHsuDM7ynxyZJa+21VXVohnXiP6rq3RmaGN4sQ1PQB2TYEX76xPSfkfF/hjK1k7bSxlB76ljPqUnuvUhzwpe1if+DGpfrJkn+OMkHavjjz49lvERyhtf5miTPbq29YRdKO7KqDh8f3zjDkYAHjHV+NclTW2vL2Vn78yQHVNWHMmx3vjfW9+AMFzj5u4lhT82w3Xl1Vb0lyX8nubi19spdqH/aVzOcq3ZuVZ2c4XP+6AwB5VVt4o9rx5P9X57k95J8YtzGbMjw/y5fyezt0pKfiww/PhycYT37uap6b4ZzcG6RYdt63wwh81NjDR+qqldkOMn93KnP+kVZ/HyTndZau6yqjstwhc0/SPK41trXqurvMjSdOquq3pPhiNhDMxxNPivDNnd3/XaGK50+ewxAC/8z9JgMV018xFStrYY/ZP3XJH8/fvbPz3CJ8J/PsM48qc3+f7brhFVahmdkWI/+IMkTa7iQ0NczbBPvmOFcosdluILcDZN8sKo+n2Gb8Z8ZLsH90HHYk6eO+J46jv+uqnp/hnN6P9lae/suLD57mrW+fJ2b20rd8oNLLrckP7nIMHfK0KTuGxkuf/qxDM3qNo3jnTg1/IlZ+rK4h2fG5Woz7Fy9KsOO0uUZrnj34gxXGdqaJS5hmyGc/ftY38UZmvVtdwnfpWrL8CenJ2b4QrgiQ5g6N8lfJXnwMl/PhWU7fbFal/t6jP0We41PH7vfIMkLM3yxXZHhnIIXZJHLpGb4X5D3ZNiJuiJDePzjzL7c9OkZT8tYYnkfNr7uC5fl/f5rm2Gn8TkZduq+m2Gn8I0ZTrbe7n1YbFmXU0+GX47fl6HN/OXj63FShvOVpodd+NPbb2TYIf9ahuYlL8z2/8F07GLvzRKvycJ7c/gSwyws/5ET3RaWf0e3xT5XB2b4U93PZNgZvyxDqD9+erl2cjkWbldl+Gydn+FiJ0cmudEi4y68bodPdPvfSf42ww8ql2Q44nJuhv8/+uEZ03hOhrB/RaYuH72jdTNLX1p7a4ad+b/IEEKuGOfzzCQ1Y1qVIdT/x7i+/FeGz8zM7dKOPhcT03xihp3Lb4/TvSBDAPjtJLeZUcMzJl6Pr4z1b1yshiVem607WI/2GWu5Jsldx277ju/T5zN8vr40zv/ms96L7MIlycfuP5LhB4xvZthmnDW+l4tOL0NweGOGUHjleP+mzPg+y4z1ckfrzHKWZ4lpnbjM4VdkGSaGuf64vnw4wzbxinG9PTXDhTBuPg63d4bzcd859r98fO3PyBDWrz813Rtl2J58OcP2YNnL6Lbn36q1FoC1Mv76/8DW2sxr8wIArBXnDAEAAF0ShgAAgC4JQwAAQJecMwQAAHRpXV9a+/Wvf3178pOfPO8yAACA67aZF25a183kLr300nmXAAAArFPrOgwBAADsKmEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6tGHeBeyOcy7Ylk3HnDLvMgAAgCRbjzti3iXsFEeGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC7NNQzVQCADAADW3JoHkaraVFWfqao3JLkkyX9U1YlV9dmqOqmqHlJVH6qqz1XVPda6PgAAoA/zOipz+ySvSnLnJLdJ8qdJDhxvj09yvyS/keS3p0esqqOqaktVbbn6sm1rVzEAALBHmVcY+s/W2hnj4y+21s5prV2T5Lwkp7bWWpJzkmyaHrG1dkJrbXNrbfNe+25cu4oBAIA9yrzC0KUTj6+YeHzNxPNrkmxYs4oAAICuuHgBAADQJWEIAADo0po3Q2utbU1yl+nH4/MjZw0HAACw0hwZAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLG+ZdwO446ICNOf7oI+ZdBgAAsA45MgQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdGnDvAvYHedcsC2bjjll3mUAALBObT3uiHmXwBw5MgQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADo0oqGoaq6ZAf9f3uZ01nWcAAAALtqrY8MLTfkCEMAAMCqWpUwVFW3qqr3V9VZVXVuVd2/qo5LcsOx20njcP9cVR+rqvOq6qix23bDAQAArLTVOjL0+CTvbq0dkuTgJGe11o5J8t3W2iGttSeMwz21tXZoks1JnllVN19kuO+rqqOqaktVbbn6sm2rVD4AALCnW60wdGaSp1TVsUkOaq399yLDPbOqPpnkjCS3SXL7HU24tXZCa21za23zXvtuXLGCAQCAvqxKGGqtvT/JA5JckOTEqnrS9DBVdXiShyS5d2vt4CSfSLLPatQDAAAwbbXOGbptkq+31l6d5DVJ7j72urKq9h4fb0xyUWvtsqo6MMm9JiYxORwAAMCK27BK0z08yW9W1ZVJLkmycGTohCRnV9XHkzw1ydOr6tNJPpOhqVymh5t13hAAAMDuqtbavGvYZb/6O3/U3nn1XeddBgAA69TW446YdwmsjZrVca3/ZwgAAOA6QRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEsb5l3A7jjogI05/ugj5l0GAACwDjkyBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0acO8C9gd51ywLZuOOWXeZcCa2HrcEfMuAQBgj+LIEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXdjkMVdWHd3L4w6vqX3Z1fgAAACtpl8NQa+0+K1kIAADAWtqdI0OXjPeHV9XpVfXmqjq/qk6qqhr7PWzs9vEkj5oY99iq+o2J5+dW1aaqulFVnVJVnxy7PWY3lg0AAGBRK3XO0N2SPDvJnZL8RJL7VtU+SV6d5OeSHJrkR5YxnYcl+Upr7eDW2l2SvGt6gKo6qqq2VNWWqy/btkLlAwAAvVmpMPTR1tqXW2vXJDkryaYkByb5Ymvtc621luRNy5jOOUkeWlUvqar7t9a2SzuttRNaa5tba5v32nfjCpUPAAD0ZqXC0BUTj69OsmEHw181Ne99kqS19tkkd88Qil5YVc9fofoAAACuZTUvrX1+kk1Vdbvx+eMm+m3NEHpSVXdP8uPj4x9Ncllr7U1J/mRhGAAAgJW2oyM4u6y1dnlVHZXklKq6LMkHktx47P2WJE+qqvOSfCTJZ8fuByX5k6q6JsmVSX51teoDAAD6tsthqLW233h/epLTJ7o/Y+LxuzKcOzQ97neT/PSMyW5N8u5drQkAAGC5VrOZHAAAwHWWMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAlzbMu4DdcdABG3P80UfMuwwAAGAdcmQIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOjShnkXsDvOuWBbNh1zyrzLYDdsPe6IeZcAAECnHBkCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXVrVMFRVm6rq01X16qo6r6reU1U3rKpDquqMqjq7qt5aVTetqttW1eeqav+qul5VfaCqfno16wMAAPq1FkeGbp/kL1prd05ycZJfTPKGJM9rrd01yTlJXtBa+88kL0lyfJLnJvlUa+090xOrqqOqaktVbbn6sm1rUD4AALAnWosw9MXW2lnj448luV2Sm7TW3jd2e32SByRJa+01SX4oydOT/MasibXWTmitbW6tbd5r342rWjgAALDnWoswdMXE46uT3GSxAatq3yS3Hp/ut4o1AQAAnZvHBRS2Jbmoqu4/Pn9ikoWjRC9JclKS5yd59RxqAwAAOrFhTvN9cpK/HI8EfSHJU6rqgUkOS3Lf1trVVfWLVfWU1trr5lQjAACwB1vVMNRa25rkLhPP/99E73tNDf6+yW6ttUetZm0AAEDf/M8QAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC5tmHcBu+OgAzbm+KOPmHcZAADAOuTIEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQpQ3zLmB3nHPBtmw65pR5l9GtrccdMe8SAABglzkyBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANClXQ5DVfXsqtp3F8Y7sqp+dOL5a6rqTrtaBwAAwK7YnSNDz04yMwxV1V5LjHdkku+HodbaL7XWPrUbdQAAAOy0HYahqtpUVedX1UlV9emqenNVPTNDoDmtqk4bh7ukqv60qj6Z5N5V9fyqOrOqzq2qE2rw6CSbk5xUVWdV1Q2r6vSq2jxO43FVdc44zktWcbkBAIDOLffI0E8meVVr7Y5JvpPk+km+kuRBrbUHjcPcKMlHWmsHt9Y+mOSVrbXDWmt3SXLDJD/bWntzki1JntBaO6S19t2FGYxN516S5MFJDklyWFX9/HQhVXVUVW2pqi1XX7ZtFxYZAABg+WHoS621D42P35TkfjOGuTrJWyaeP6iqPlJV52QIOHfewTwOS3J6a+2brbWrkpyU5AHTA7XWTmitbW6tbd5r343LLB8AAODaNixzuLaD50lyeWvt6iSpqn2SvCrJ5tbal6rq2CT77HKVAAAAK2y5R4Z+rKruPT5+fJIPJvnvJDdeZPiF4HNhVe2X5NET/RYb76NJHlhV+48XYHhckvctsz4AAICdstwjQ59J8n+r6rVJPpXk+CTfS/KuqvrKxHlDSZLW2sVV9eok5yb5WpIzJ3qfmOQvq+q7Se49Mc5Xq+qYJKclqSSntNbetmuLBQAAsLRqbVaLt4kBqjYl+ZfxQgjXKb/6O3/U3nn1XeddRre2HnfEvEsAAIDlqFkdd+d/hgAAANatHTaTa61tTXKdOyoEAACwOxwZAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXNsy7gN1x0AEbc/zRR8y7DAAAYB1yZAgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6NKGeRewO865YFs2HXPKvMvYY2097oh5lwAAAKvGkSEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl9Y8DFXVI6rqmLWeLwAAwKQNazmzqtrQWjs5yclrOV8AAIBpKx6GqupJSX4jSUtydpKrk1ye5G5JPlRVZyfZ3Fp7RlWdmOS7Y79bJHlqkicluXeSj7TWjlzp+gAAAJIVbiZXVXdO8rtJHtxaOzjJs8Zet05yn9bac2aMdtMM4efXMxwxemmSOyc5qKoOmTGPo6pqS1VtufqybStZPgAA0JGVPmfowUn+sbV2YZK01r49dv/H1trVi4zz9tZaS3JOkq+31s5prV2T5Lwkm6YHbq2d0Frb3FrbvNe+G1e4fAAAoBdrdQGFS5fod8V4f83E44Xna3pOEwAA0I+VDkPvTfK/qurmSVJVN1vh6QMAAKyIFT3y0lo7r6pelOR9VXV1kk+s5PQBAABWyoo3Q2utvT7J65fof2KSE8fHR05035rkLhPPjwwAAMAqWfM/XQUAALguEIYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0KUN8y5gdxx0wMYcf/QR8y4DAABYhxwZAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6tGHeBeyOcy7Ylk3HnDLvMnbb1uOOmHcJAADQHUeGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF3aMK8ZV9WxSS5J8kNJ3t9a+7d51QIAAPRnbmFoQWvt+fOuAQAA6M+aNpOrqt+pqs9W1QeT/OTY7cSqevT4+Liq+lRVnV1V/28tawMAAPqyZmGoqg5N8tgkhyT5mSSHTfW/eZJfSHLn1tpdk7xwkekcVVVbqmrL1ZdtW92iAQCAPdZaHhm6f5K3ttYua619J8nJU/23Jbk8yV9X1aOSXDZrIq21E1prm1trm/fad+PqVgwAAOyxrjNXk2utXZXkHknenORnk7xrvhUBAAB7srUMQ+9P8vNVdcOqunGSn5vsWVX7JdnYWntHkl9PcvAa1gYAAHRmza4m11r7eFX9fZJPJvlGkjOnBrlxkrdV1T5JKslz1qo2AACgP2t6ae3W2ouSvGiJQe6xVrUAAAB9u86cMwQAALCWhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAurRh3gXsjoMO2Jjjjz5i3mUAAADrkCNDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXNsy7gN1xzgXbsumYU1Z1HluPO2JVpw8AAMyHI0MAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALq1JGKqqm1TV0WsxLwAAgOVYqyNDN0kiDAEAANcZaxWGjktyu6o6q6peWlWnVtXHq+qcqnpkklTVYVV1dlXtU1U3qqrzquoua1QfAADQmQ1rNJ9jktyltXZIVW1Ism9r7TtVtX+SM6rq5NbamVV1cpIXJrlhkje11s6dnlBVHZXkqCT55Wc/L7nBGi0BAACwR5nHBRQqyYur6uwk/5bkgCS3HPv9QZKHJtmc5I9njdxaO6G1trm1tnmvfTeuRb0AAMAeaK2ODE16QpIfTnJoa+3KqtqaZJ+x382T7Jdk77HbpXOoDwAA6MBaHRn67yQ3Hh9vTPKNMQg9KMltJ4b7qyS/l+SkJC9Zo9oAAIAOrcmRodbat6rqQ1V1bpIzkxxYVeck2ZLk/CSpqiclubK19jdVtVeSD1fVg1tr712LGgEAgL6sWTO51trjdzDI1iRvGIe9Osk9V7smAACgX/O4gAIAAMDcCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6NKGeRewOw46YGOOP/qIeZcBAACsQ44MAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABd2jDvAnbHORdsy6ZjTtmlcbced8QKVwMAAKwnjgwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALq1qGKqqf66qj1XVeVV11NjtaVX12ar6aFW9uqpeOXb/4ap6S1WdOd7uu5q1AQAAfVvtI0NPba0dmmRzkmdW1QFJfi/JvZLcN8mBE8O+PMlLW2uHJfnFJK+ZNcGqOqqqtlTVlqsv27a61QMAAHus1Q5Dz6yqTyY5I8ltkjwxyftaa99urV2Z5B8nhn1IkldW1VlJTk7yQ1W13/QEW2sntNY2t9Y277XvxlUuHwAA2FNtWK0JV9XhGQLOvVtrl1XV6UnOT3LHRUa5XpJ7tdYuX62aAAAAFqzmkaGNSS4ag9CBGZrG3SjJA6vqplW1IUNzuAXvSfJrC0+q6pBVrA0AAOjcaoahdyXZUFWfTnJchqZyFyR5cZKPJvlQkq1JFk78eWaSzVV1dlV9KsnTV7E2AACgc6vWTK61dkWSh093r6otrbUTxiNDb03yz+PwFyZ5zGrVAwAAMGke/zN07HiRhHOTfDFjGAIAAFhLq3ZkaDGttd9Y63kCAABMm8eRIQAAgLkThgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgSxvmXcDuOOiAjTn+6CPmXQYAALAOOTIEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRpw7wL2B3nXLAtm445ZWa/rccdscbVAAAA64kjQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECX5hKGqurpVXXWePtiVZ1WVZdU1Yuq6pNVdUZV3XIetQEAAH2YSxhqrf1la+2QJIcl+XKSP0tyoyRntNYOTvL+JL88a9yqOqqqtlTVlqsv27ZWJQMAAHuYeTeTe3mS97bW3p7ke0n+Zez+sSSbZo3QWjuhtba5tbZ5r303rk2VAADAHmfDvGZcVUcmuW2SZ4ydrmyttfHx1ZljbQAAwJ5vLoGjqg5N8htJ7t9au2YeNQAAAH2b19GXZyS5WZLTqipJtsypDgAAoFNzCUOttafM6PxLE/3fnOTNa1cRAADQm3lfQAEAAGAuhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAurRh3gXsjoMO2Jjjjz5i3mUAAADrkCNDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAuCUMAAECXhCEAAKBLwhAAANAlYQgAAOiSMAQAAHRJGAIAALokDAEAAF0ShgAAgC4JQwAAQJeEIQAAoEvCEAAA0CVhCAAA6JIwBAAAdEkYAgAAuiQMAQAAXRKGAACALglDAABAl4QhAACgS8IQAADQJWEIAADokjAEAAB0qVpr865hlz3vec/777333vsz866DPccll1yy/3777XfhvOtgz2GdYqVZp1hp1ilW0nV4fbrwhS984cOmO67rMFRVW1prm+ddB3sO6xQrzTrFSrNOsdKsU6yk9bY+aSYHAAB0SRgCAAC6tN7D0AnzLoA9jnWKlWadYqVZp1hp1ilW0rpan9b1OUMAAAC7ar0fGQIAANglwhAAANCldRGGquphVfWZqvp8VR0zo/8Nqurvx/4fqapNcyiTdWQZ69RzqupTVXV2VZ1aVbedR52sHztapyaG+8WqalW1bi47ytpbzvpUVf973E6dV1V/s9Y1sr4s43vvx6rqtKr6xPjd9zPzqJP1o6peW1XfqKpzF+lfVfXn4zp3dlXdfa1rXI7rfBiqqr2S/EWShye5U5LHVdWdpgZ7WpKLWmv/I8lLk7xkbatkPVnmOvWJJJtba3dN8uYkf7y2VbKeLHOdSlXdOMmzknxkbStkPVnO+lRVt0/yW0nu21q7c5Jnr3WdrB/L3Eb9bpJ/aK3dLcljk7xqbatkHToxyXZ/Yjrh4UluP96OSnL8GtS0067zYSjJPZJ8vrX2hdba95L8XZJHTg3zyCSvHx+/OclPVVWtYY2sLztcp1prp7XWLhufnpHk1mtcI+vLcrZTSfKHGX6suXwti2PdWc769MtJ/qK1dlGStNa+scY1sr4sZ51qSX5ofLwxyVfWsD7Wodba+5N8e4lBHpnkDW1wRpKbVNWt1qa65VsPYeiAJF+aeP7lsdvMYVprVyXZluTma1Id69Fy1qlJT0vyzlWtiPVuh+vU2DzgNq21U9ayMNal5Wyj7pDkDlX1oao6o6qW+nUWlrNOHZvk/1TVl5O8I8mvrU1p7MF2dn9rLjbMuwC4Lquq/5Nkc5IHzrsW1q+qul6SP0ty5JxLYc+xIUPTk8MzHLl+f1Ud1Fq7eJ5Fsa49LsmJrbU/rap7J3ljVd2ltXbNvAuD1bQejgxdkOQ2E89vPXabOUxVbchwePdba1Id69Fy1qlU1UOS/E6SR7TWrlij2lifdrRO3TjJXZKcXlVbk9wryckuosAilrON+nKSk1trV7bWvpjksxnCEcyynHXqaUn+IUlaa/+eZJ8k+69JdeyplrW/NW/rIQydmeT2VfXjVXX9DCf1nTw1zMlJnjw+fnSS9zb/JsvidrhOVdXdkvxVhiCkLT47suQ61Vrb1lrbv7W2qbW2KcN5aI9orW2ZT7lcxy3ne++fMxwVSlXtn6HZ3BfWsEbWl+WsU/+V5KeSpKrumCEMfXNNq2RPc3KSJ41XlbtXkm2tta/Ou6hp1/lmcq21q6rqGUnenWSvJK9trZ1XVX+QZEtr7eQkf53hcO7nM5zI9dj5Vcx13TLXqT9Jsl+SfxyvxfFfrbVHzK1ortOWuU7BsixzfXp3kp+uqk8luTrJb7bWtIhgpmWuU89N8uqq+vUMF1M40g/LLKWq/jbDjzL7j+eavSDJ3knSWvvLDOee/UySzye5LMlT5lPp0sp6DgAA9Gg9NJMDAABYccIQAADQJWEIAADokjAEAAB0SRgCAAC6JAwBAABdEoYAAIAu/f9wWFKKWYJmNgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "drf = H2ORandomForestEstimator(ntrees=50, seed=123, max_depth=3)\n", "drf.train(y=y,x=x,training_frame=train)\n", "print(\"RMSE: \")\n", "print(drf.training_model_metrics()['RMSE'])\n", "drf.varimp_plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, RM and LSTAT are the most significant features affecting the house price. \n", "Now, let's create a rulefit model to a create custom rules for more explainability and interpretability:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rulefit Model Build progress: |██████████████████████████████████████████████████| (done) 100%\n", "RMSE: \n", "4.816685889071532\n", "\n", "Rule Importance: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
variablecoefficientsupportrule
0linear.rm3.5554561.000000
1linear.chas1.7502861.000000
2M0T41N181.4555340.132411(age < 95.2587890625 or age is NA) & (nox < 0.6593242287635803 or ...
3M0T42N161.2002940.084980(crim < 7.391515254974365 or crim is NA) & (lstat < 5.127500057220...
4linear.ptratio-0.6322231.000000
5linear.lstat-0.5205121.000000
6M0T31N21-0.3951650.264822(dis >= 1.2369916439056396 or dis is NA) & (ptratio >= 19.90244102...
7M0T20N21-0.2777950.199605(indus >= 6.66726541519165 or indus is NA) & (lstat >= 4.278124809...
8M0T27N22-0.2104250.199605(indus >= 6.66726541519165 or indus is NA) & (lstat >= 4.702812671...
9M0T22N140.1630210.092885(crim < 6.696437835693359 or crim is NA) & (ptratio < 18.346485137...
10linear.dis-0.0831961.000000
11linear.crim-0.0251051.000000
\n", "
" ], "text/plain": [ " variable coefficient support \\\n", "0 linear.rm 3.555456 1.000000 \n", "1 linear.chas 1.750286 1.000000 \n", "2 M0T41N18 1.455534 0.132411 \n", "3 M0T42N16 1.200294 0.084980 \n", "4 linear.ptratio -0.632223 1.000000 \n", "5 linear.lstat -0.520512 1.000000 \n", "6 M0T31N21 -0.395165 0.264822 \n", "7 M0T20N21 -0.277795 0.199605 \n", "8 M0T27N22 -0.210425 0.199605 \n", "9 M0T22N14 0.163021 0.092885 \n", "10 linear.dis -0.083196 1.000000 \n", "11 linear.crim -0.025105 1.000000 \n", "\n", " rule \n", "0 \n", "1 \n", "2 (age < 95.2587890625 or age is NA) & (nox < 0.6593242287635803 or ... \n", "3 (crim < 7.391515254974365 or crim is NA) & (lstat < 5.127500057220... \n", "4 \n", "5 \n", "6 (dis >= 1.2369916439056396 or dis is NA) & (ptratio >= 19.90244102... \n", "7 (indus >= 6.66726541519165 or indus is NA) & (lstat >= 4.278124809... \n", "8 (indus >= 6.66726541519165 or indus is NA) & (lstat >= 4.702812671... \n", "9 (crim < 6.696437835693359 or crim is NA) & (ptratio < 18.346485137... \n", "10 \n", "11 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from h2o.estimators.rulefit import H2ORuleFitEstimator\n", "rf = H2ORuleFitEstimator(seed=123, lambda_=.5 )#, model_type=\"rules\")\n", "rf.train(y=y, x=x, training_frame=train)\n", "print(\"RMSE: \")\n", "print(rf.training_model_metrics()['RMSE'])\n", "rf.rule_importance()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rulefits's rule importance table shows all the predictors with non-zero LASSO coefficients. Predictors can be linear (variable name prefixed with \"linear.\") or a rule (rule identificator as a variable name, e.g. M0T43N16 means this rule comes from the tree based model n.0, tree n.43, node n.16). This table holds values of LASSO coefficient and support of undrlying predictor as a factors of predictor importances. In case of rule predictor, also it's language representation is present.\n", "\n", "Let's look closer on resulting predictors. From the table we can see that the rules selected by rulefit as most important have LSTAT or RM variables present. Those which were by far the top 2 most important variables of previous H2ORandomForestEstimator model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to interpret significant rules:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full text of M0T41N18 is:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(age < 95.2587890625 or age is NA) & (nox < 0.6593242287635803 or nox is NA) & (rm >= 6.940098762512207)'" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rf.rule_importance()[4][2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which means that the most expensive houses have 7 or more rooms and are in areas with low nitric oxides concentration and with possibly high proportion of historical buildings." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full text of M0T42N16 is:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'(crim < 7.391515254974365 or crim is NA) & (lstat < 5.127500057220459 or lstat is NA) & (rm >= 6.940098762512207)'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rf.rule_importance()[4][3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which means that big houses (7 or more rooms) in areas with very-low-to-zero crime rate and very-low-to-zero percentage of lower status of the population are more expensive to live in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full text of M0T31N21 is:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(dis >= 1.2369916439056396 or dis is NA) & (ptratio >= 19.902441024780273) & (rm >= 5.864699363708496 or rm is NA)'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rf.rule_importance()[4][6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Which can be interpreted as: \"with increasing distance from Boston employment centers and increasing pupil-teacher ratio, the price of majority of the house degrades\" (since majority of the houses in our dataset have 6 or more rooms)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also find out at which specific rows the certain rules apply or not:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
M0T31N21 M0T42N16 linear.rm
0 0 1
0 0 1
0 1 1
0 1 1
0 0 1
0 0 1
0 0 1
0 0 1
0 0 1
0 0 1
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predicted_rules = rf.predict_rules(train, [\"M0T31N21\", \"M0T42N16\", \"linear.rm\"])\n", "predicted_rules.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note, that linear predictor applies to all the observations and that col-wise sums of this output divided by the number of observations represents a support of each rule." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apart of that, Friedman and Popescu defines (https://arxiv.org/abs/0811.1679) the rulefit-specific global measure of predictors importance as follows:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "def FPimportance(data, lasso_coef, support, is_rule):\n", " if is_rule:\n", " import math\n", " return abs(lasso_coef) * math.sqrt(support * (1 - support))\n", " else:\n", " return abs(lasso_coef) * data.sd()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hence, we can get global importances calculated out of combined importance factors like:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "def calculate_FPimportance(input, data):\n", " result = dict()\n", " for x in range(len(input[1])):\n", " if input[1][x].startswith('linear.'):\n", " result[input[1][x]] = FPimportance(data[input[1][x][len(\"linear.\"):]], input[2][x], input[3][x], False)\n", " else:\n", " result[input[1][x]] = FPimportance(None, input[2][x], input[3][x], True)\n", " return result " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "FPimportances = calculate_FPimportance(rf.rule_importance(), train)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import operator\n", "sorted_FPimportances = sorted(FPimportances.items(), key=operator.itemgetter(1), reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which gives us a slightly reordered list of importances:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('linear.lstat', 3.7170067016740407),\n", " ('linear.rm', 2.498124117997926),\n", " ('linear.ptratio', 1.368729326964665),\n", " ('M0T41N18', 0.49333452353437895),\n", " ('linear.chas', 0.4445622162226629),\n", " ('M0T42N16', 0.33470481934298546),\n", " ('linear.crim', 0.21593853846749478),\n", " ('linear.dis', 0.1751857632048295),\n", " ('M0T31N21', 0.1743620600544106),\n", " ('M0T20N21', 0.11103558646592059),\n", " ('M0T27N22', 0.0841076264935881),\n", " ('M0T22N14', 0.04732040851755857)]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted_FPimportances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Stay tuned for the future improvements, mainly rulefit-specific tools for importance and interaction examination!" ] } ], "metadata": { "kernelspec": { "display_name": "h2o3pyenv", "language": "python", "name": "h2o3pyenv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 4 }