{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Random Forests == Awesome" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Random Forests should be the hammer of your data science toolkit. \n", "\n", "What are they?\n", "* A machine learning algorithm built for prediction tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pros\n", "* Automatically model non-linear relationships and interactions between variables. Perfect collinearity doesn't matter.\n", "* Easy to tune\n", "* Relatively easy to understand everything about them\n", "* Flexible enough to handle regression and classification tasks\n", "* Useful as a step in exploratory data analysis\n", "* Can handle high-dimensional data\n", "* Have a built-in method for estimating model accuracy\n", "* In general, beat most models at most prediction tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cons\n", "* ?\n", "* ?\n", "* ?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple example: Boston Housing dataset" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Load the Boston Housing dataset\n", "# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.\n", "from sklearn.datasets import load_boston\n", "boston = load_boston()\n", "X, y = boston.data, boston.target\n", "\n", "# Make train and test datasets\n", "# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np\n", "np.random.seed(100)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(506, 13)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linear Regression" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2: 0.76\n" ] } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "model = LinearRegression()\n", "model.fit(X_train, y_train)\n", "print(\"R^2:\", model.score(X_test, y_test).round(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Decision Tree" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2: 0.8\n" ] } ], "source": [ "from sklearn.tree import DecisionTreeRegressor\n", "model = DecisionTreeRegressor()\n", "model.fit(X_train, y_train)\n", "print(\"R^2:\", model.score(X_test, y_test).round(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Forest with defaults" ] }, { "cell_type": "code", "execution_count": 6, 
"metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R^2: 0.87\n" ] } ], "source": [ "from sklearn.ensemble import RandomForestRegressor\n", "model = RandomForestRegressor(random_state=42)\n", "model.fit(X_train, y_train)\n", "print(\"R^2:\", model.score(X_test, y_test).round(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "* Decision trees\n", "* Bootstrap sampling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Random Forest Algorithm\n", "\n", "The big idea: Combine a bunch of terrible decision trees into one awesome model. \n", "\n", "For each tree in the forest:\n", "1. Take a bootstrap sample of the data.\n", "2. Randomly select some variables.\n", "3. For each variable selected, find the split point which minimizes MSE (or, for classification, minimizes Gini impurity or maximizes information gain).\n", "4. Split the data using the variable and split point with the best score (lowest MSE, or the best value of the classification criterion).\n", "5. Repeat steps 2 through 4 (randomly selecting a new set of variables at each split) until some stopping condition is satisfied or no further splits are possible.\n", "\n", "Repeat this process to build several trees. \n", "\n", "To make a prediction, run an observation down every tree in the forest and average the predicted values from all the trees (for regression) or find the most popular class predicted (for classification)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Most important parameters (and what they mean)\n", " * ### Parameters that will make your model better\n", " * n_estimators: The number of trees in the forest. Choose as high a number as your computer can handle.\n", " * max_features: The number of features to consider when looking for the best split. Try [\"auto\", None, \"sqrt\", \"log2\", 0.9, and 0.2]\n", " * min_samples_leaf: The minimum number of samples in newly created leaves. Try [1, 2, 3]. 
If 3 is the best, try higher numbers.\n", " * ### Parameters that will make it easier to train your model\n", " * n_jobs: The number of processors used to train the model and make predictions. Set this to -1 to use all of them, then use %%timeit to compare against n_jobs=1. It should be much faster (especially when many trees are trained).\n", " * random_state: Set this to 42 if you want to be cool AND want others to be able to replicate your results.\n", " * oob_score: THE BEST THING EVER. Random Forest's custom validation method: out-of-bag predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## OOB predictions\n", "About a third of observations (roughly 37%) don't show up in any given bootstrap sample. \n", "\n", "Because each individual tree in the forest is built from a bootstrap sample, about a third of the data was not used to build that tree. We can track which observations were used to build which trees.\n", "\n", "#### Here is the magic. \n", "\n", "After the forest is built, we take each observation in the dataset and identify which trees used the observation and which trees did not (based on the bootstrap sample). We then predict the observation's value using only the trees that were built without it. About a third of the trees in the forest will not use any specific observation from the dataset. \n", "\n", "OOB predictions are similar to the following awesome, but computationally expensive, method: \n", "1. Train a model with n_estimators trees, but exclude one observation from the dataset.\n", "2. Use the trained model to predict the excluded observation. Record the prediction.\n", "3. Repeat this process for every single observation in the dataset.\n", "4. Collect all your final predictions. These will be similar to your oob predictions. \n", "\n", "The leave-one-out method will take n_estimators\\*time_to_train_one_model\\*n_observations to run. 
 \n", "\n", "The oob method will take n_estimators\\*time_to_train_one_model\\*3 to run (the \\*3 is because if you want to get an accuracy estimate of a 100-tree forest, you will need to train 300 trees. Why? Because with 300 trees, each observation will have about 100 trees it was not used to build, and those trees can be used for the oob_predictions).\n", "\n", "This means the oob method is n_observations/3 times faster to train than the leave-one-out method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Full example: Titanic dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Your first goal should always be getting a generalized prediction as fast as possible.\n", "* This doesn't mean you should skip exploratory data analysis (EDA). It just means you shouldn't get caught up in it. Initially do only what is needed to get a generalized prediction.\n", "* Getting a prediction first lets you set a benchmark for yourself. As you make improvements to the model, you should be able to see your desired error metric improve." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# With the goal above, I will import just what I need. \n", "# The model to use (I already imported it above, but will do it again here so each example is self-contained)\n", "from sklearn.ensemble import RandomForestRegressor\n", "\n", "# The error metric. In this case, we will use c-stat (aka ROC AUC)\n", "from sklearn.metrics import roc_auc_score\n", "\n", "# An efficient data structure. \n", "import pandas as pd\n", "\n", "# Import the data\n", "X = pd.read_csv(\"../data/train.csv\")\n", "y = X.pop(\"Survived\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Pclass Age SibSp Parch Fare\n", "count 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000\n", "mean 446.000000 2.308642 29.699118 0.523008 0.381594 32.204208\n", "std 257.353842 0.836071 14.526497 1.102743 0.806057 49.693429\n", "min 1.000000 1.000000 0.420000 0.000000 0.000000 0.000000\n", "25% 223.500000 2.000000 20.125000 0.000000 0.000000 7.910400\n", "50% 446.000000 3.000000 28.000000 0.000000 0.000000 14.454200\n", "75% 668.500000 3.000000 38.000000 1.000000 0.000000 31.000000\n", "max 891.000000 3.000000 80.000000 8.000000 6.000000 512.329200" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I know that there are categorical variables in the dataset, but I will skip them for the moment. I will impute age though, because it will be fast." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Pclass Age SibSp Parch Fare\n", "count 891.000000 891.000000 891.000000 891.000000 891.000000 891.000000\n", "mean 446.000000 2.308642 29.699118 0.523008 0.381594 32.204208\n", "std 257.353842 0.836071 13.002015 1.102743 0.806057 49.693429\n", "min 1.000000 1.000000 0.420000 0.000000 0.000000 0.000000\n", "25% 223.500000 2.000000 22.000000 0.000000 0.000000 7.910400\n", "50% 446.000000 3.000000 29.699118 0.000000 0.000000 14.454200\n", "75% 668.500000 3.000000 35.000000 1.000000 0.000000 31.000000\n", "max 891.000000 3.000000 80.000000 8.000000 6.000000 512.329200" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Impute Age with mean (assign back rather than filling in place on a column selection)\n", "X[\"Age\"] = X[\"Age\"].fillna(X[\"Age\"].mean())\n", "\n", "# Confirm the imputation worked (the Age count should now be 891)\n", "X.describe()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Pclass Age SibSp Parch Fare\n", "0 1 3 22.0 1 0 7.2500\n", "1 2 1 38.0 1 0 71.2833\n", "2 3 3 26.0 0 0 7.9250\n", "3 4 1 35.0 1 0 53.1000\n", "4 5 3 35.0 0 0 8.0500" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get just the numeric variables by selecting only the variables that are not \"object\" datatypes.\n", "numeric_variables = list(X.dtypes[X.dtypes != \"object\"].index)\n", "X[numeric_variables].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I notice PassengerId looks like a worthless variable. I leave it in for two reasons. First, I don't want to go through the effort of dropping it (although that would be very easy). Second, I am interested in seeing if it is useful for prediction. It might be useful if the PassengerId was assigned in some non-random way. For example, perhaps PassengerId was assigned based on when the ticket was purchased in which case there might be something predictive about people who purchased their tickets early or late." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,\n", " max_features='auto', max_leaf_nodes=None, min_samples_leaf=1,\n", " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", " n_estimators=100, n_jobs=1, oob_score=True, random_state=42,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's build our first model. I always have oob_score=True. It is a good idea to increase n_estimators to a number higher than \n", "# the default. In this case the oob_predictions will be based on a forest of 33 trees. 
I set random_state=42 so that you all can\n", "# replicate the model exactly.\n", "model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=42)\n", "\n", "# I only use numeric_variables because I have yet to dummy out the categorical variables\n", "model.fit(X[numeric_variables], y)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.1361695005913669" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For regression, the oob_score_ attribute gives the R^2 based on the oob predictions. We want to use c-stat, but I mention this \n", "# for awareness. By the way, attributes in sklearn that have a trailing underscore are only available after the model has been fit.\n", "model.oob_score_" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "c-stat: 0.73995515504\n" ] } ], "source": [ "y_oob = model.oob_prediction_\n", "print(\"c-stat: \", roc_auc_score(y, y_oob))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a benchmark. This isn't very good for this dataset; however, it provides us a benchmark for improvement. Before changing parameters for the Random Forest, let's whip this dataset into shape." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Here is a simple function to show descriptive stats on the categorical variables\n", "def describe_categorical(X):\n", "    \"\"\"\n", "    Just like .describe(), but displays the results for\n", "    categorical variables only.\n", "    \"\"\"\n", "    from IPython.display import display, HTML\n", "    display(HTML(X[X.columns[X.dtypes == \"object\"]].describe().to_html()))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "  <tr><th></th><th>Name</th><th>Sex</th><th>Ticket</th><th>Cabin</th><th>Embarked</th></tr>\n", "  <tr><th>count</th><td>891</td><td>891</td><td>891</td><td>204</td><td>889</td></tr>\n", "  <tr><th>unique</th><td>891</td><td>2</td><td>681</td><td>147</td><td>3</td></tr>\n", "  <tr><th>top</th><td>Andersson, Mr. Anders Johan</td><td>male</td><td>CA. 2343</td><td>B96 B98</td><td>S</td></tr>\n", "  <tr><th>freq</th><td>1</td><td>577</td><td>7</td><td>4</td><td>644</td></tr>\n", "</table>
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "describe_categorical(X)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Drop the variables I don't feel like dealing with for this tutorial\n", "X.drop([\"Name\", \"Ticket\", \"PassengerId\"], axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Change the Cabin variable to be only the first letter, or \"None\" when it is missing\n", "def clean_cabin(x):\n", "    try:\n", "        return x[0]\n", "    except TypeError:\n", "        return \"None\"\n", "\n", "X[\"Cabin\"] = X.Cabin.apply(clean_cabin)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "categorical_variables = ['Sex', 'Cabin', 'Embarked']\n", "\n", "for variable in categorical_variables:\n", "    # Fill missing data with the word \"Missing\"\n", "    X[variable] = X[variable].fillna(\"Missing\")\n", "    # Create array of dummies\n", "    dummies = pd.get_dummies(X[variable], prefix=variable)\n", "    # Update X to include dummies and drop the main variable\n", "    X = pd.concat([X, dummies], axis=1)\n", "    X.drop([variable], axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "

891 rows × 20 columns

\n", "
" ], "text/plain": [ " Pclass Age SibSp Parch Fare Sex_female Sex_male Cabin_A \\\n", "0 3 22.000000 1 0 7.2500 0.0 1.0 0.0 \n", "1 1 38.000000 1 0 71.2833 1.0 0.0 0.0 \n", "2 3 26.000000 0 0 7.9250 1.0 0.0 0.0 \n", "3 1 35.000000 1 0 53.1000 1.0 0.0 0.0 \n", "4 3 35.000000 0 0 8.0500 0.0 1.0 0.0 \n", "5 3 29.699118 0 0 8.4583 0.0 1.0 0.0 \n", "6 1 54.000000 0 0 51.8625 0.0 1.0 0.0 \n", "7 3 2.000000 3 1 21.0750 0.0 1.0 0.0 \n", "8 3 27.000000 0 2 11.1333 1.0 0.0 0.0 \n", "9 2 14.000000 1 0 30.0708 1.0 0.0 0.0 \n", "10 3 4.000000 1 1 16.7000 1.0 0.0 0.0 \n", "11 1 58.000000 0 0 26.5500 1.0 0.0 0.0 \n", "12 3 20.000000 0 0 8.0500 0.0 1.0 0.0 \n", "13 3 39.000000 1 5 31.2750 0.0 1.0 0.0 \n", "14 3 14.000000 0 0 7.8542 1.0 0.0 0.0 \n", "15 2 55.000000 0 0 16.0000 1.0 0.0 0.0 \n", "16 3 2.000000 4 1 29.1250 0.0 1.0 0.0 \n", "17 2 29.699118 0 0 13.0000 0.0 1.0 0.0 \n", "18 3 31.000000 1 0 18.0000 1.0 0.0 0.0 \n", "19 3 29.699118 0 0 7.2250 1.0 0.0 0.0 \n", "20 2 35.000000 0 0 26.0000 0.0 1.0 0.0 \n", "21 2 34.000000 0 0 13.0000 0.0 1.0 0.0 \n", "22 3 15.000000 0 0 8.0292 1.0 0.0 0.0 \n", "23 1 28.000000 0 0 35.5000 0.0 1.0 1.0 \n", "24 3 8.000000 3 1 21.0750 1.0 0.0 0.0 \n", "25 3 38.000000 1 5 31.3875 1.0 0.0 0.0 \n", "26 3 29.699118 0 0 7.2250 0.0 1.0 0.0 \n", "27 1 19.000000 3 2 263.0000 0.0 1.0 0.0 \n", "28 3 29.699118 0 0 7.8792 1.0 0.0 0.0 \n", "29 3 29.699118 0 0 7.8958 0.0 1.0 0.0 \n", ".. ... ... ... ... ... ... ... ... 
\n", "861 2 21.000000 1 0 11.5000 0.0 1.0 0.0 \n", "862 1 48.000000 0 0 25.9292 1.0 0.0 0.0 \n", "863 3 29.699118 8 2 69.5500 1.0 0.0 0.0 \n", "864 2 24.000000 0 0 13.0000 0.0 1.0 0.0 \n", "865 2 42.000000 0 0 13.0000 1.0 0.0 0.0 \n", "866 2 27.000000 1 0 13.8583 1.0 0.0 0.0 \n", "867 1 31.000000 0 0 50.4958 0.0 1.0 1.0 \n", "868 3 29.699118 0 0 9.5000 0.0 1.0 0.0 \n", "869 3 4.000000 1 1 11.1333 0.0 1.0 0.0 \n", "870 3 26.000000 0 0 7.8958 0.0 1.0 0.0 \n", "871 1 47.000000 1 1 52.5542 1.0 0.0 0.0 \n", "872 1 33.000000 0 0 5.0000 0.0 1.0 0.0 \n", "873 3 47.000000 0 0 9.0000 0.0 1.0 0.0 \n", "874 2 28.000000 1 0 24.0000 1.0 0.0 0.0 \n", "875 3 15.000000 0 0 7.2250 1.0 0.0 0.0 \n", "876 3 20.000000 0 0 9.8458 0.0 1.0 0.0 \n", "877 3 19.000000 0 0 7.8958 0.0 1.0 0.0 \n", "878 3 29.699118 0 0 7.8958 0.0 1.0 0.0 \n", "879 1 56.000000 0 1 83.1583 1.0 0.0 0.0 \n", "880 2 25.000000 0 1 26.0000 1.0 0.0 0.0 \n", "881 3 33.000000 0 0 7.8958 0.0 1.0 0.0 \n", "882 3 22.000000 0 0 10.5167 1.0 0.0 0.0 \n", "883 2 28.000000 0 0 10.5000 0.0 1.0 0.0 \n", "884 3 25.000000 0 0 7.0500 0.0 1.0 0.0 \n", "885 3 39.000000 0 5 29.1250 1.0 0.0 0.0 \n", "886 2 27.000000 0 0 13.0000 0.0 1.0 0.0 \n", "887 1 19.000000 0 0 30.0000 1.0 0.0 0.0 \n", "888 3 29.699118 1 2 23.4500 1.0 0.0 0.0 \n", "889 1 26.000000 0 0 30.0000 0.0 1.0 0.0 \n", "890 3 32.000000 0 0 7.7500 0.0 1.0 0.0 \n", "\n", " Cabin_B Cabin_C Cabin_D Cabin_E Cabin_F Cabin_G Cabin_None \\\n", "0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "3 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "5 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "6 0.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "8 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "9 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "10 0.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "11 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "12 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "13 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "14 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "15 0.0 
0.0 0.0 0.0 0.0 0.0 1.0 \n", "16 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "17 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "18 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "19 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "20 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "21 0.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "22 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "23 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "24 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "25 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "26 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "27 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "28 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "29 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", ".. ... ... ... ... ... ... ... \n", "861 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "862 0.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "863 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "864 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "865 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "866 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "867 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "868 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "869 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "870 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "871 0.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "872 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "873 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "874 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "875 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "876 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "877 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "878 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "879 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "880 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "881 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "882 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "883 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "884 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "885 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "886 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "887 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "888 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "889 0.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "890 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "\n", " Cabin_T Embarked_C Embarked_Missing Embarked_Q Embarked_S \n", "0 0.0 0.0 0.0 0.0 1.0 \n", "1 0.0 1.0 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 0.0 1.0 \n", "3 0.0 0.0 0.0 0.0 1.0 \n", "4 0.0 0.0 0.0 0.0 1.0 \n", "5 0.0 0.0 0.0 1.0 0.0 \n", "6 0.0 0.0 0.0 0.0 1.0 \n", "7 0.0 
0.0 0.0 0.0 1.0 \n", "8 0.0 0.0 0.0 0.0 1.0 \n", "9 0.0 1.0 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 0.0 1.0 \n", "11 0.0 0.0 0.0 0.0 1.0 \n", "12 0.0 0.0 0.0 0.0 1.0 \n", "13 0.0 0.0 0.0 0.0 1.0 \n", "14 0.0 0.0 0.0 0.0 1.0 \n", "15 0.0 0.0 0.0 0.0 1.0 \n", "16 0.0 0.0 0.0 1.0 0.0 \n", "17 0.0 0.0 0.0 0.0 1.0 \n", "18 0.0 0.0 0.0 0.0 1.0 \n", "19 0.0 1.0 0.0 0.0 0.0 \n", "20 0.0 0.0 0.0 0.0 1.0 \n", "21 0.0 0.0 0.0 0.0 1.0 \n", "22 0.0 0.0 0.0 1.0 0.0 \n", "23 0.0 0.0 0.0 0.0 1.0 \n", "24 0.0 0.0 0.0 0.0 1.0 \n", "25 0.0 0.0 0.0 0.0 1.0 \n", "26 0.0 1.0 0.0 0.0 0.0 \n", "27 0.0 0.0 0.0 0.0 1.0 \n", "28 0.0 0.0 0.0 1.0 0.0 \n", "29 0.0 0.0 0.0 0.0 1.0 \n", ".. ... ... ... ... ... \n", "861 0.0 0.0 0.0 0.0 1.0 \n", "862 0.0 0.0 0.0 0.0 1.0 \n", "863 0.0 0.0 0.0 0.0 1.0 \n", "864 0.0 0.0 0.0 0.0 1.0 \n", "865 0.0 0.0 0.0 0.0 1.0 \n", "866 0.0 1.0 0.0 0.0 0.0 \n", "867 0.0 0.0 0.0 0.0 1.0 \n", "868 0.0 0.0 0.0 0.0 1.0 \n", "869 0.0 0.0 0.0 0.0 1.0 \n", "870 0.0 0.0 0.0 0.0 1.0 \n", "871 0.0 0.0 0.0 0.0 1.0 \n", "872 0.0 0.0 0.0 0.0 1.0 \n", "873 0.0 0.0 0.0 0.0 1.0 \n", "874 0.0 1.0 0.0 0.0 0.0 \n", "875 0.0 1.0 0.0 0.0 0.0 \n", "876 0.0 0.0 0.0 0.0 1.0 \n", "877 0.0 0.0 0.0 0.0 1.0 \n", "878 0.0 0.0 0.0 0.0 1.0 \n", "879 0.0 1.0 0.0 0.0 0.0 \n", "880 0.0 0.0 0.0 0.0 1.0 \n", "881 0.0 0.0 0.0 0.0 1.0 \n", "882 0.0 0.0 0.0 0.0 1.0 \n", "883 0.0 0.0 0.0 0.0 1.0 \n", "884 0.0 0.0 0.0 0.0 1.0 \n", "885 0.0 0.0 0.0 1.0 0.0 \n", "886 0.0 0.0 0.0 0.0 1.0 \n", "887 0.0 0.0 0.0 0.0 1.0 \n", "888 0.0 0.0 0.0 0.0 1.0 \n", "889 0.0 1.0 0.0 0.0 0.0 \n", "890 0.0 0.0 0.0 1.0 0.0 \n", "\n", "[891 rows x 20 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PclassAgeSibSpParchFareSex_femaleSex_maleCabin_ACabin_BCabin_CCabin_DCabin_ECabin_FCabin_GCabin_NoneCabin_TEmbarked_CEmbarked_MissingEmbarked_QEmbarked_S
0322.000000107.25000.01.00.00.00.00.00.00.00.01.00.00.00.00.01.0
1138.0000001071.28331.00.00.00.01.00.00.00.00.00.00.01.00.00.00.0
2326.000000007.92501.00.00.00.00.00.00.00.00.01.00.00.00.00.01.0
3135.0000001053.10001.00.00.00.01.00.00.00.00.00.00.00.00.00.01.0
4335.000000008.05000.01.00.00.00.00.00.00.00.01.00.00.00.00.01.0
...............................................................
886227.0000000013.00000.01.00.00.00.00.00.00.00.01.00.00.00.00.01.0
887119.0000000030.00001.00.00.01.00.00.00.00.00.00.00.00.00.00.01.0
888329.6991181223.45001.00.00.00.00.00.00.00.00.01.00.00.00.00.01.0
889126.0000000030.00000.01.00.00.01.00.00.00.00.00.00.01.00.00.00.0
890332.000000007.75000.01.00.00.00.00.00.00.00.01.00.00.00.01.00.0
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Look at all the columns in the dataset\n", "def printall(X, max_rows=10):\n", "    from IPython.display import display, HTML\n", "    display(HTML(X.to_html(max_rows=max_rows)))\n", "\n", "printall(X)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C-stat: 0.863521128261\n" ] } ], "source": [ "# A regressor fit on the 0/1 target yields continuous out-of-bag scores,\n", "# which is what roc_auc_score needs to compute the C-statistic.\n", "model = RandomForestRegressor(100, oob_score=True, n_jobs=-1, random_state=42)\n", "model.fit(X, y)\n", "print(\"C-stat:\", roc_auc_score(y, model.oob_prediction_))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An out-of-bag C-statistic of about 0.86 makes this a pretty good model. Before we try different parameters, let's use the Random Forest to help with some exploratory data analysis (EDA)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variable importance measures" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 9.11384671e-02, 2.38891052e-01, 4.43567267e-02,\n", " 2.15831071e-02, 2.15047796e-01, 1.43423437e-01,\n", " 1.58822440e-01, 2.95342368e-03, 3.79055011e-03,\n", " 6.47116172e-03, 4.30998991e-03, 8.59480266e-03,\n", " 1.02403226e-03, 8.12054428e-04, 2.67741854e-02,\n", " 6.64265010e-05, 1.06189189e-02, 0.00000000e+00,\n", " 6.00379221e-03, 1.53176370e-02])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.feature_importances_" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAgEAAAFrCAYAAABIYVrAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucXVV9///XGwQDQcALTKTfrxMrFKRRSQiChcIBxYp+\nS7mqXKpWufQnKlZsbdFKgkrsRbQF+SpI02oUgQpt5SYiHC+ESy4kBEKU/AigrRlqqzUG5JK8v3+c\nNeEwnDMzZ+YMc9nv5+NxmH1Za+219znkfM5aa+8l20RERET1bDXeFYiIiIjxkSAgIiKiohIERERE\nVFSCgIiIiIpKEBAREVFRCQIiIiIq6nnjXYEYnKTcwxkREc9iW6MtIy0Bk4DtvEbwOuecc8a9DpPx\nleuW65brNvFf3ZIgICIioqISBERERFSUutmsUHWSjgKuAvay/aMulZk3KCJiCunp6WX9+gdHVYYk\n3IUxAQkCukjS14GXAjfbnt+lMg15jyIipg6Nul+/W0FAugO6RNJ04EDgPcAJZZskXSRptaRvSbpW\n0jFl3xxJdUlLJF0vqWccqx8RERWUIKB7/gC4wfZa4GeSZgPHAC+zvTfwDuB1AJKeB1wAHGt7P2Ah\ncN74VDsiIqoqzwnonhOAz5Xly4ETaVzfKwFs90m6pezfE5gFfFuSaARj/9G+6HlNy7XyioiIqqjX\n69Tr9a6XmzEBXSDphcBPgEdodOBvXf5eDayw/U8l3TeArwI/Ar5o+8BhlJ0xARERU0rGBEw1xwNf\ntv1y279puxdYB/wcOLaMDejh6Z/wPwR2kXQANLoHJO09HhWPiIjqShDQHW+j8au/2TeAHhotBPcC\nXwaWAf9j+0ngOOCvJK0A7qKMF4iIiHiupDtgjEmabnujpBcBdwAH2n6kg/zpDoiImFImTndABgaO\nvWsk7QxsA5zbSQDwtFG/zxERMUH09PSOdxW2SEvABCfJeY8iIqJZBgZGRETEqCQIiIiIqKgEARER\nERWVICAiIqKiEgRERERUVG4RHAVJm4CVNO7hM3CU7YfHt1YRERHDk1sER0HSL23vOIJ8W9veNMy0\neYMiIkagp6eX9esfHO9qjIk8LGhieNYbIKkX+Aqwfdn0Ptu3SzoE+ASN+QT2BPaSdBLwARoPEroD\neG/rhwIkDoiI6FRfXx60NpQEAaOznaTlNIKBB2wfC/QBb7D9hKTdgcuA/Ur62cBv235Y0l405hz4\nHdubJH0eOAlY9NyfRkREVFGCgNF51PacAdu2BS6UtA+wCdijad+dTWMGXg/MAZZIEjCNRgARERHx\nnEgQ0H1/Aqy3/WpJWwOPNe3b2LQs4J9sf3ToIuc1Ldd4ekbiiIiognq9Tr1e73q5GRg4CpI22H7B\ngG3nAz+2/VlJfwR8yfbWZUzAWbaPLOleCfwLcJDt/5T0QuAFA+8uyCyCEREjNfrZ+iaqzB0wMbT6\ndF0EvEvSXcBv8cxf/09ntO8DPgbcKGklcCMwY6wqGhERMVBaAia4tARERIxUWgKGkjEBk0Juc4mI\n6FRPT+94V2HCSxAwCUzVSDYiIsZXxgRERERUVIKAiIiIikoQEBERUVEJAiIiIioqQUBERERFJQiI\niIioqAQBERERFTVlnxMg6aPACTRm8tsEnG57yfjWqvV8A8PIM1bViYhx1tPTy/r1D453NaKipmQQ\nIOkA4M3APrafkvQiGlP8TgQjePJPHhYUMVX19SXIj/EzVbsDXgr8zPZTALb/2/Z6SXMk1SUtkXS9\npB5JW0u6U9LBAJIWSPpEu4IlrZN0nqS7Sr7Zkm6QdL+k00ua6ZJukrRU0kpJR7Yp68OljBWSzhmD\n6xAREdHWVA0CbgReJmmNpM9LOljS84ALgGNt7wcsBM6zvQl4F/B/Jb0eeCMwf4jyH7Q9G/hBKecY\n4HVN+X4NHGV7LnAY8JmBBUg6HNjD9muB2cBcSQeN6qwjIiI6M
CW7A2xvlDQH+F0aX8JfBz4FzAK+\nrUYn+1bAT0v61ZIWAdcA+/e3IAzim+XvKmC67UeBRyX9WtKOwKPAgtK6sBnYTdKuth9pKuONwOGS\nltOYIWg6sAeNwGKAeU3LtfKKiIiqqNfr1Ov1rpdbiamEJR0LnAE83/aBbdJ8jca36zttf3uQstYB\n+9r+b0nvLMsfKPseAOYCvw+8CTjJ9uaS5xDbD0v6pe0dJf0t8EPblwxR90wlHDGlTd3pbmPsdGsq\n4SnZHSDptyTt3rRpH2A1sEsZNIik50nauywfA7wQOBi4sPyaH9Ghy9+dgEdKAHAo0NsizbeAd0ua\nXuqwm6RdRnjciIiIjk3J7gBgB+ACSTsBTwFrgdOAi5u2bw18TlIfcB5wmO3/kHQB8HfAH7Upe7CQ\nvX/fV4FvSloJLAXuG5jG9rcl7QXcVm4B3ACcDPxnpycbERExEpXoDpjMGt0BETFV5TkBMRLd6g6Y\nqi0BU0oCtYiIGAsJAtqQdBUws3+VRjP+RwYbNBgRETGZpDtggpPkvEcREdEsdwdERETEqCQIiIiI\nqKgEARERERWVICAiIqKiJt3dAZI+CpwAbCqv020v6UK5BwFfAJ4AXmf78dGW2eIYhwAftv37Hebr\ndlUiYphyH39MZZMqCCiP/H0zsI/tpyS9CNi2S8WfRGNWwa91qbx2RjDUP3cHRIyXvr4E4TF1Tbbu\ngJcCP+uf5c/2f9teL2mOpLqkJZKul9QjaWtJd5aZ/JC0QNInWhUq6T3AW4FPSPpK2fbhkn+FpHPK\ntl5J90laKOmHkhZJer2kH5T1uSXdfpIWS1pW9u3R4pjbS7pU0u0lXUetAxEREaM1qZ4TUCbb+QGw\nHfAd4HJgMfBd4Ejb/yXprcDv2X5PmSDoSuADwF8zyDTBkhYC37R9laTDgeNsn16mHf434K+AHwP3\n02iJWC1pKbDC9imSjgT+yPbRknYAHi0TCL0e+P9sH1e6A86yfaSkTwH32v5amcvgzlLuYwPqlVkE\nI8ZVZvmLiaeSjw22vVHSHOB3gcOArwOfAmYB3y5f2FsBPy3pV0taBFzDIAFAC28EDpe0nMbTAqcD\ne9AIAtbZXl3S3UsjGAFYxdOzBe4MfLm0AJjW1/mNwO9L+tOyvi3wMuCHw6xjRETEqEyqIACgPD7v\ne8D3JK0CzgDusX1gmyyvAn4O9HRwGAELbF/yjI1SL9A8YHBz0/pmnr6enwButn1MyXNLm2Mca/v+\noaszr2m5Vl4REVEV9Xqder3e9XInVRAg6beAzbbXlk37AKuBN0o6wPbtkp4H/FZpBTgGeCFwMHCt\npP1s/3IYh/oWcK6kr5XWh92AJ/urMYz8OwH/XpbbTUn8LRrdFO8v57aP7RWtk84bxiEjImKqqtVq\n1Gq1Levz58/vSrmTbWDgDsA/SbpH0grglcDHgeOAvyrb7gJeJ+nFwHnAe0rQcAHwd4OUvaXTr0wS\n9DXgNkl30xhXsMPAdLTvrP9r4NOSltH+Gn8C2EbS3aVF49xB6hYREdF1k2pgYBVlYGDEeMvAwJh4\nKjkwsLpyn3LEeOnp6R06UcQkVbkgQNJVwMz+VRo/sz9SugAmpPwKiYiIsZDugAlOkvMeRUREs251\nB0y2gYERERHRJQkCIiIiKipBQEREREUlCIiIiKioBAEREREVVblbBIdL0iZgJbANjUcTv9P2r9uk\nPQfYYPv8MarLWBQbU1xPTy/r1z843tWIiAksQUB7G23PASgzEf4x8LnxqUpuEYzO9fUleIyIwaU7\nYHi+D+wOIOkdklZKukvSPw1MKOkUSXeW/VdKmla2Hy9pVdleL9v2lnSHpOWSVkh6xXN5UhERUW1p\nCWhPAGVWwiOA6yXtDXwUOMD2zyXt3CLfN2x/qeT9BPAe4PPAXwJvtP1TSTuWtH8MfM72ZeU4W4/t\nKUVERDwtQUB720laXpa/B
1xK40v7Cts/B7D9ixb5Xl2+/HcGptOYMhjgBzRmQLwCuKpsuw34qKT/\nBVzdNEXyAPOalmvlFRERVVGv16nX610vN48NbkPSL23vOGDb+4Ae2385YPuWgYGSHgCOtH2PpHcC\nh9h+d0m3H/B/gHcAc0prwsvLtvcDp9muDyg7swjGCGX2u4ipKo8NHnutLu7NwPGSXgQg6YUt0uwA\nrJe0DXDSlsKk37S9xPY5wCPA/5b0ctvrbF8A/Cvw6q6fRURERBvpDmjvWT+hbK+W9Cngu5KeAu4C\n3j0g2ceBO2l80d8BvKBs/xtJe5Tlm2zfLekjkv4QeBL4KfCpMTiPiIiIltIdMMGlOyBGLt0BEVNV\nt7oD0hIwKeR+7+hcT0/veFchIia4BAGTQH7NRUTEWMjAwIiIiIpKEBAREVFRCQIiIiIqKkFARERE\nRSUIiIiIqKgEAcMk6aOS7ikzCC6X9FpJF0vaq+zf0Cbf/pJuL7MH3ivp489tzSMiIlrLLYLDIOkA\n4M3APrafKo8N3tb2aU3J2t3H90/AcWUuAQF7juD4Hdd5quvp6WX9+gfHuxoREZNaWgKG56XAz2w/\nBWD7v22vl3SLpDkljSSdX1oLvi3pxWX7LkBfyWfba0ricyR9WdJiST+UdEr7wzuvAa++vofaX66I\niBiWBAHDcyPwMklrJH1e0sEt0kwH7rQ9i8bUw+eU7Z8DfijpG5JOk/T8pjyvojEv8O8AH5c0Y+xO\nISIi4pkSBAyD7Y3AHOA04D+Br5dpgpttAq4oy4uAg0reTwD70ggkTgSub8rzr7afsP1fNGYofO2Y\nnURERMQAGRMwTG48u/d7wPckrQLeyeAz+2zZZ3sd8EVJXwL+s2kK4ub8al/evKblWnlFRERV1Ot1\n6vV618vNLILDIOm3gM2215b1TwA7AbOAD9teLmkz8HbbV0j6GLCL7TMlvdn2dSXfK4HvAj00phz+\nA+AAGtMNLwMOsL1+wLEzi2BLmSEvIqorswg+t3YALpC0E/AUsJZG18A/N6X5FfBaSX9JYyDg28r2\nP5R0PvBoyXuibZcR/3cDdeDFwLkDA4CIiIixlJaAcSLpHGCD7fOHSJeWgJbSEhAR1ZWWgErJcwIG\n6unpHe8qRERMemkJmOAkOe9RREQ061ZLQG4RjIiIqKgEARERERWVICAiIqKiEgRERERUVIKAiIiI\nikoQEBERUVGTJgiQ1CPpMkn3S1oi6RpJu7dJ21ue799q38WS9hrB8edJ2ijpJU3bNnRaTkRExEQx\naYIA4GrgZtt72N4P+Asaz+Bvp+XN9bZPs71mBMc3jRkEzxrqGN0macK/ZsyY+VxcioiI6KJJEQRI\nOhR4wvYl/dtsrwJWSLpJ0lJJKyUd2ZRtG0mLJK2WdIWkaaWsWyTNKcsbJH1S0gpJiyXtMkRVFgJv\nk7Rzizp+SNIqSXdLOrNs6y3Hv1jSPZJukPT8su83JV1fWjW+WyYpasMT/tXX99AQly4iIiaaSREE\n0Jitb1mL7Y8BR9meCxwGfKZp357Ahbb3BjYA722Rfzqw2PY+wPeBU4eoxwbgH4APlnUBSNqXxtTC\n+wGvA06V9JqSZnfgAtuzgP8Bji3bLwbeV1o1/hT4v0McOyIioqsmSxDQzlbAAkkrgZuA3STtWvY9\nbPv2srwIOKhF/sf7p/mlEWTMHMYxLwDeIWkHnu4OOBC42vavbW8ErgJ+t+xbV1otthxD0nTgd4Ar\nJd0FfJHBuzYiIiK6brJMIHQvcFyL7ScBLwFm294saR0wrewb2F/fqv/+yablTQzjetj+H0lfA84Y\nstYNjw84xjQawcvPbc8ZXhHzmpZr5RUREVVRr9ep1+tdL3dSBAG2b5b0KUmn2P4SgKRXAb3AIyUA\nOLSs9+uVtL/tO4ATaTT3DzTSyRc+Cyzh6ev3fWChpE8DWwNHAye3O4btDZLWSTrO9j+X83m
17btb\nH27eCKsZERFTQa1Wo1arbVmfP39+V8qdTN0BRwOHS1pbbv87D7gW2K90B5wM3NeUfg1whqTVwM7A\nF8r25haBEY3ut/1fNO5W2Las3wX8I43A4DbgYtsrhzjGycB7yqDEe4Aj26SLiIgYE5lKeIKT5Ofo\nTsRREvksRUQ8N7o1lfCk6A6IUb/PY66np3foRBERMaEkCBhA0tnA8TR+fqv8vdL2gvGqU35hR0TE\nWEh3wAQnyXmPIiKiWbe6AybTwMCIiIjoogQBERERFZUgICIioqISBERERFRUgoCIiIiKShAwDJI2\nSVpepgq+vH9a4lGW+U5JFwwz7bi/ZsyYOdpTjoiICSZBwPBstD3H9qtoTDr0x8PNKGmwazzMe/88\n7q++voeGV9WIiJg0EgR07vvA7gCSrpa0pLQQnNKfQNIGSX9bpgk+QNJcSbeWeQJuL1MJA/yGpOsl\n/VDSX43DuURERIXliYHDIwBJzwOOAK4v2//I9i9K98ASSd+w/XNgOnCb7Q9L2obGZEbH214uaQfg\n1yX/a4B9aLQu/FDS39v+9+fwvCIiosLSEjA820laDtwJPARcWrZ/UNIK4HbgfwF7lO1PAVeV5T2B\n/7C9HMD2r2xvKvu+U9YfB1bzzKmQIyIixlRaAobnUdtzmjdIOgQ4DNjf9uOSbgH6Bwz+esCzfts9\n2vHxpuVNtH0/5jUt18orIiKqol6vU6/Xu15ugoDhafUlvhPw8xIA7AUc0Cb9D4EZkva1vax0BzzW\n2eHndZY8IiKmlFqtRq1W27I+f/78rpSbIGB4Wo3ivwH4Y0n30viiv61VettPSnobcKGk7YBHgTcM\n8xgRERFjJrMITnCSPDHiA2VK44iICaJbswimJWBSGPX7PGo9PRmzGBEx1SQImATyCzwiIsZCbhGM\niIioqAQBERERFZUgICIioqISBERERFRUgoCIiIiKShAQERFRURM+CJC0SdJySXeVv3/WQd5DJH1z\nlMe/RdKcoVN2fnxJu0r6Zpli+F5J17RJN2avGTNmjuTUIiJiCpgMzwnYOHDyng6N+CZ7Sd0IkgY7\n/rnAjbYvKMeb1XkRo9PXN/4PIoqIiPEx4VsCaPO4PEnrJJ1XWgjulDRb0g2S7pd0WlPSnSRdI2mN\npIua8l9U8q2SdM6Acj8taSlwfNN2SVoo6dyyfrikxZKWSrpc0vZl+5sk3VfyHzPEub0U+En/iu17\nOrguERERozIZgoDtBnQHHN+070Hbs4EfAAtpfOm+jsYv7H77AWcArwR2l9T/xXy27dcCrwFqA36F\n/8z2XNuXl/VtgK8CP7L9cUkvBj4GvN72XGAZ8CFJzwcuBt5Sts8Y4tw+D/yDpO9IOlvSSzu5MBER\nEaMxGboDHh2kO6C/v30VMN32o8Cjkn4tacey707bDwFIugw4CLgKeLukU2lcgxnA3kD/L/H+L/9+\nXwQut72grB9Q0t8qSTSChNuAvYAHbD9Q0i0CTm13YrZvlPRy4E3Am4HlkmbZ/q9nppzXtFwrr4iI\nqIp6vU69Xu96uZMhCBjM4+Xv5qbl/vX+cxvYoW5JM4GzgH1t/1LSQmBaU5qNA/LcChwq6Xzbj9Po\norjR9knNiSS9hg5n+7H9C+DrwNfLIMKDgaufmWpeJ0VGRMQUU6vVqNVqW9bnz5/flXInQ3fASEau\nNefZX1JvGeT3NhpdBzsCvwI2SOoBjhiivEuB64ErSjm3AwdKegWApO0l7QGsAXrLr3uAEwatpHSo\npO3K8guAVwAPd3CeERERIzYZWgKmSVpO44vdwA22z2bwIfPN++4ELgR2B262fTWApBXAfcCPaQQG\nrfJuWbf9WUk7AV+xfZKkdwGXlXEABj5m+35JpwPXSdoIfB/YYZB67gtcKOlJGgHZxbaXDZI+IiKi\na5Rpaic2SWP6BvX09LJ+/YNjeYiIiOgySdge9T3ek6E
loPISqEVExFhIEPAcKF0HZ/LMroZbbb9/\nfGoUERGR7oAJT5LzHkVERLNudQdMhrsDIiIiYgwkCIiIiKioBAEREREVlSAgIiKiohIEREREVNSE\nDgIkbRowg+CfdZD3kPIs/tEc/xZJ7SYvGvXxJR0haYmkeyQtk/Q3bdKN6jVjxsyRnEJERExxE/05\nARsHmUFwOEZ8b12ZI2C02h6/TF18AXBEedywgNM6LGZY+vpGfRdJRERMQRO6JYA2kwdJWifpvNJC\ncKek2ZJukHS/pOYv0p0kXSNpjaSLmvJfVPKtknTOgHI/LWkpcHzTdklaKOncsn64pMWSlkq6XNL2\nZfubJN1X8h8zxLn9KfBJ2/cDuOGLHV6fiIiIEZvoQcB2A7oDjm/a96Dt2TQm/1lI40v3dcC5TWn2\nA84AXgnsLqn/i/ls268FXgPUyq/yfj+zPdf25WV9G+CrwI9sf1zSi4GPAa+3PRdYBnyoTCR0MfCW\nsn3GEOc2q+SNiIgYFxO9O+DRQboD+vvbVwHTbT8KPCrp15J2LPvutP0QgKTLgIOAq4C3SzqVxvnP\nAPYG7il5+r/8+30RuNz2grJ+QEl/a2nC3wa4DdgLeMD2AyXdIuDUkZz0s81rWq6VV0REVEW9Xqde\nr3e93IkeBAzm8fJ3c9Ny/3r/eT1rWmBJM4GzgH1t/1LSQmBaU5qNA/LcChwq6Xzbj9PoorjR9knN\niSS9hjbdF23cA8ylEcQMYV4HxUZExFRTq9Wo1Wpb1ufPn9+Vcid6d8BIRrQ159lfUm8Z5Pc2Gl0H\nOwK/AjZI6gGOGKK8S4HrgStKObcDB0p6BYCk7SXtAawBeiW9vOQ7YYhy/xb4i5IXSVtJOn3YZxkR\nETFKE70lYJqk5TS+2A3cYPtsBh8u37zvTuBCYHfgZttXA0haAdwH/JhGYNAq75Z125+VtBPwFdsn\nlVkBLyvjAAx8rIzwPx24TtJG4PvADm0raa+S9MFSznalnGsGOa+IiIiuyiyCE5ykUb9BPT29rF//\nYBdqExERE0G3ZhGc6C0BASRQi4iIsZAgYIyVroMzeWZXw6223z8+NYqIiGhId8AEJ8l5jyIiolm3\nugMm+t0BERERMUYSBERERFRUgoCIiIiKShAQERFRUZUNAiT1SLqszDy4pMw2uHubtL2SWj7eV9LF\nkvYawfHPkfSTARMk7Th0zoiIiO6o8i2CVwMLbZ8AIOlVQA+wtk36lkP0bZ/WavswnW/7/KESNeYp\n6kweEBQREUOpZEuApEOBJ2xf0r/N9ipghaSbJC2VtFLSkU3ZtpG0SNJqSVdImlbKukXSnLK8QdIn\nJa2QtFjSLkNVZXg1dsevvr6Hhld0RERUViWDAGAWsKzF9seAo2zPBQ4DPtO0b0/gQtt7AxuA97bI\nPx1YbHsfGnMHDDWV8J80dQd8p9OTiIiIGI2qBgHtbAUskLQSuAnYTdKuZd/Dtm8vy4uAg1rkf9z2\ndWV5GTBziOOdb3uO7dm2Xz/KukdERHSkqmMC7gWOa7H9JOAlwGzbmyWtA6aVfS1nGBzgyablTXTt\n+s5rWq6VV0REVEW9Xqder3e93EoGAbZvlvQpSafY/hJsGRjYCzxSAoBDy3q/Xkn7274DOJFGc/9A\nnY7gG2b6eR0WGxERU0mtVqNWq21Znz9/flfKrXJ3wNHA4ZLWltv/zgOuBfYr3QEnA/c1pV8DnCFp\nNbAz8IWyvblFoNOH/H9wwC2CLxvRmURERIxAJhCa4CS589gCQJmCOCJiiurWBEKV7A6YfEb2nICI\niIjBJAgYY5LOBo6n8XNe5e+VthcMt4z8oo+IiLGQ7oAJTpLzHkVERLNudQdUeWBgREREpSUIiIiI\nqKgEARERERWVICAiIqKiEgRERERUVGWDAEk9ki6TdL+kJZKukbR7m7S95amCrfZdLGmvEdbhHZJW\nlWmLl0n6UJt0w3r
NmDFzJNWIiIiKqvJzAq4GFto+AbbMHdADrG2TvuV9erZPG8nBJR0BfAB4g+0+\nSdsA7+jg0M/S1zfqu0UiIqJCKtkSUCYHesL2Jf3bbK8CVki6SdLS8uv8yKZs20haJGm1pCskTStl\n3SJpTlneIOmTklZIWixpl0Gq8efAWbb7yvGftH1p1082IiKijUoGAcAsYFmL7Y8BR9meCxwGfKZp\n357Ahbb3BjYA722Rfzqw2PY+NGYZPHWIOiwfQd0jIiK6oqpBQDtbAQvKLII3AbtJ2rXse9j27WV5\nEXBQi/yP276uLC8DZg5yrDwGMCIixlVVxwTcCxzXYvtJwEuA2bY3S1oHTCv7Bn5pt/oSf7JpeROD\nX997gX2B+tDVnde0XCuviIioinq9Tr1e73q5lZ07QNJtwKW2v1TWXwUcDbzY9pll3MB3aPyaF7AO\neJ3tOyRdAtxr+3OSbqHRt79c0gbbLyjlHQu8xfa72xz/COBc4P+UgYHbAn84cFxAZ1MJZ/rgiIgq\nyNwBo3c0cLikteX2v/OAa4H9SnfAycB9TenXAGdIWg3sDHyhbG/+1h32N7Dt64ELgZvK8ZcCLxjp\nyURERHSqsi0Bk0VaAiIiYqButQRUdUzAJDO897mnp3eM6xEREVNJgoAxJuls4HgaP+dV/l5pe8Fw\ny8iv+4iIGAvpDpjgJDnvUURENMvAwIiIiBiVBAEREREVlSAgIiKiohIEREREVFSCgIiIiIqasEGA\npE2Slku6q/z9sw7yHiLpm6M8/pYpgkeQd8jjSzqqTFe8WtLd5THD7dIO+ZoxY+ZIqhoRERU2kZ8T\nsNH2iL6EixHfVyepG8FR2+NLeg3w18AbbD8saSaNxwc/YPuuDoraoq9v1HeKRERExUzYlgDaPCZP\n0jpJ55UWgjslzZZ0g6T7JZ3WlHQnSddIWiPpoqb8F5V8qySdM6DcT0taSuPhPv3bJWmhpHPL+uGS\nFktaKulySduX7W+SdF/Jf8wQ53YWcJ7thwFsP0hj7oIPd3KBIiIiRmMiBwHbDegOOL5p34O2ZwM/\nABbS+NJ9HY1Z+frtB5wBvBLYXVL/F/PZtl8LvAaoSZrVlOdntufavrysbwN8FfiR7Y9LejHwMeD1\ntucCy4APSXo+cDGNWQPnAjOGOLffLnmbLS11jYiIeE5M5O6ARwfpDujvb18FTLf9KPCopF9L2rHs\nu9P2QwCSLgMOAq4C3i7pVBrnPgPYG7in5On/8u/3ReDypkf8HlDS3ypJNIKE24C9gAdsP1DSLQJO\nHclJtzbKjmQOAAAWoUlEQVSvablWXhERURX1ep16vd71cidyEDCYx8vfzU3L/ev95zSwI92l7/0s\nYF/bv5S0EJjWlGbjgDy3AodKOt/24zS6KG60fVJzotLH30mn/L3AXBpBTL+5NFoDWpjXQdERETHV\n1Go1arXalvX58+d3pdyJ3B0wkpFuzXn2l9RbBvm9jUbXwY7Ar4ANknqAI4Yo71LgeuCKUs7twIGS\nXgEgaXtJewBrgF5JLy/5Thii3M8Afy6pt5QzE/gA8DfDOsuIiIgumMgtAdMkLefpmfdusH02gw+V\nb953J3AhsDtws+2rASStAO4DfkwjMGiVd8u67c9K2gn4iu2TJL0LuKyMAzDwMdv3SzoduE7SRuD7\nwA5tK2mvlPQR4JulnF7gUNv3D3JuERERXZVZBCcASecB+wO/Z/upAfuG9Qb19PSyfv2DY1C7iIiY\naLo1i2CCgAkuUwlHRMRA3QoCJnJ3wKRXug7O5JldDbfafv/41CgiIuJpaQmY4NISEBERA3WrJWAi\n3x0QERERYyhBQEREREUlCIiIiKioBAEREREVlSAgIiKioioZBEjqkXRZmX54SZlyePc2aXslrWqz\n72JJe43g+OdI+kmZHfGHkv5ZUtsZBCW1fM2YMbPTQ0dERGxR1ecEXA0stH0CgKRXA
T3A2jbpW96j\nZ/u0UdThfNvnl+O/FbhZ0izb/zXMw9PXN+q7QyIiosIq1xIg6VDgCduX9G+zvQpYIekmSUslrZR0\nZFO2bSQtkrRa0hWSppWybpE0pyxvkPRJSSskLZa0y3DrZPsK4FvAiV05yYiIiGGoXBAAzAKWtdj+\nGHCU7bnAYTRm+uu3J3Ch7b2BDcB7W+SfDiy2vQ+NCYRO7bBedwEddy1ERESMVFW7A1rZClgg6WBg\nM7CbpF3Lvodt316WFwHvB84fkP9x29eV5WXAGzo8/iBt+/OalmvlFRERVVGv16nX610vt4pBwL3A\ncS22nwS8BJhte7OkdcC0sq/lNMMDPNm0vInOr+1sYEnrXfM6LCoiIqaSWq1GrVbbsj5//vyulFu5\n7gDbNwPbSjqlf1sZGNgLPFICgEPLer9eSfuX5RNpNPcP1OkovS3pJR0LHA5c1mEZERERI1a5IKA4\nGjhc0tpy+995wLXAfpJWAicD9zWlXwOcIWk1sDPwhbK9uUWg01l+Pth/iyCNwOKw1ncGREREjI3M\nIjjBSWr7BvX09LJ+/YPPYW0iImIi6NYsglUcEzDpJFCLiIixkCBgDEk6GzieRleByt8rbS8Y14pF\nRESQ7oAJT5LzHkVERLNudQdUdWBgRERE5SUIiIiIqKgEARERERWVICAiIqKiKhsESOqRdJmk+yUt\nkXSNpN3bpO0tDxVqte9iSR1P/CPpHEk/KQ8MWi3p852WERERMRqVDQKAq4Gbbe9hez/gL4CeQdK3\nHKJv+zTba0ZYh/NtzymzE75a0iGtEkl61mvGjJkjPGRERERDJYOAMjfAE7Yv6d9mexWwQtJNkpZK\nWinpyKZs20haVH61XyFpWinrFklzyvIGSZ+UtELSYkm7DFWVkm8a8Hzg562T+Vmvvr6HRnLqERER\nW1QyCABm0Zjud6DHgKNszwUOAz7TtG9P4MLyq30D8N4W+acDi23vQ2OSoVOHqMefSFoO/DvwI9t3\nd3YaERERI1fVIKCdrYAFZRKhm4DdJO1a9j1s+/ayvAg4qEX+x21fV5aXATOHON75tucAuwI7SHrr\nqGofERHRgao+Nvhe4LgW208CXgLMLlMKrwOmlX0DxwS0GiPwZNPyJoZ5fW1vknQDcDBwxbNTzGta\nrpVXRERURb1ep16vd73cSgYBtm+W9ClJp9j+EoCkVwG9wCMlADi0rPfrlbS/7TtoTP37/RZFd/oI\nx/4xAQIOBJa3Tjavw2IjImIqqdVq1Gq1Levz58/vSrlV7g44Gjhc0tpy+995wLXAfqU74GTgvqb0\na4AzJK0Gdga+ULY3twh0+pD/D5YxAXfTeC8u6vw0IiIiRiYTCE1wktw6tlCmGI6IqKhuTSBUye6A\nyefZ73NPT2+LdBEREcOXIGCMSTobOJ7Gz3mVv1faXjDcMvKLPyIixkK6AyY4Sc57FBERzbrVHVDl\ngYERERGVliAgIiKiohIEREREVFSCgIiIiIpKEBAREVFRlQ0CJPVIukzS/ZKWSLpG0u5t0vaWpwq2\n2nexpL1GUY8Vkr42RJpnvGbMmDnSw0VERGxR5ecEXA0stH0CbJk7oAdY2yZ9y/v0bJ820gqU4GEr\n4HclbWf7seEcuq9v1HeFREREVLMloEwO9ITtS/q32V4FrJB0k6SlklZKOrIp2zaSFklaLekKSdNK\nWbdImlOWN0j6ZPl1v1jSLkNU5QTgy8CNwB909SQjIiKGUMkgAJgFLGux/THgKNtzgcOAzzTt2xO4\n0PbewAbgvS3yTwcW296HxiyDpw5Rj7cBXy+vEzs6g4iIiFGqcndAK1sBCyQdDGwGdpO0a9n3sO3b\ny/Ii4P3A+QPyP277urK8DHhDuwNJ2hf4me2fSPop8A+Sdrb9i2ennte0XOvohCIiYvKr1+vU6/Wu\nl1vVIOBe4LgW208CXgLMtr1Z0jpgWtk3cExAq
zECTzYtb2Lw63sCsKekB2jMKfAC4Fjg0mcnnTdI\nMRERMdXVajVqtdqW9fnz53el3Ep2B9i+GdhW0in928rAwF7gkRIAHFrW+/VK2r8sn0ijuX+gYY3Y\nkyTgrcAs279p++XAUaRLICIinkOVDAKKo4HDJa0tt/+dB1wL7CdpJXAycF9T+jXAGZJWAzsDXyjb\nm1sEhjvTz+8CP7Hd17Tte8ArJfV0fioRERGdyyyCE5wkPzu2UKYXjoiosG7NIljVMQGTzDPf556e\n3jbpIiIihi9BwBiTdDZwPI2f8yp/r7S9YLhl5Fd/RESMhXQHTHCSnPcoIiKadas7oMoDAyMiIiot\nQUBERERFJQiIiIioqAQBERERFZUgICIioqIqHQRI6pF0maT7JS2RdI2k3duk7S1PFmy172JJe43g\n+OdI+omk5eV1Xpt0zJgxs9PiIyIiBlX15wRcDSy0fQJsmT+gB1jbJn3Le/VsnzaKOpxve+BshM86\nbF/fqO8EiYiIeIbKtgSUCYKesH1J/zbbq4AVkm6StFTSSklHNmXbRtIiSaslXSFpWinrFklzyvIG\nSZ+UtELSYkm7DFWVbp9bRETEcFQ2CABmActabH8MOMr2XOAw4DNN+/YELrS9N7ABeG+L/NOBxbb3\noTHT4KlD1ONPmroDDu/0JCIiIkaq6t0BrWwFLJB0MLAZ2E3SrmXfw7ZvL8uLgPcDA5vyH7d9XVle\nBrxhiOMNoztgXuO/8+Y9a07piIiY+ur1OvV6vevlVjkIuBc4rsX2k4CXALNtb5a0DphW9g0cE9Bq\njMCTTcub6Mo1ngfMZ968eaMvKiIiJp2BPwDnz5/flXIr2x1g+2ZgW0mn9G8rAwN7gUdKAHBoWe/X\nK2n/snwijeb+gdLHHxERk0Jlg4DiaOBwSWvL7X/nAdcC+0laCZwM3NeUfg1whqTVwM7AF8r25haB\nzPYTERGTQmYRnOAkGaCnp5f16x8c59pERMRE0K1ZBKs8JmDSSKAWERFjIUHAc0DS2cDxNLoKVP5e\naXvBuFYsIiIqLd0BE5wk5z2KiIhm3eoOqPrAwIiIiMpKEBAREVFRCQIiIiIqKkFARERERSUIiIiI\nqKhKBgGSeiRdJul+SUskXSNp9zZpe8vTBFvtu1jSXiOsw8llquJVku4qZe3YJi0zZswcyWEiIiLa\nqupzAq4GFto+AbbMGdADrG2TvuU9erZPG8nBJb0JOBP4PdvrJQl4Z6nDL1sdvq8vUxJERER3Va4l\noEwK9ITtS/q32V4FrJB0k6Sl5Rf6kU3ZtpG0SNJqSVdImlbKukXSnLK8QdInJa2QtFjSLoNU42zg\nLNvry/Ft+x9t39/1E46IiGijckEAMAtY1mL7Y8BRtucChwGfadq3J3Ch7b2BDcB7W+SfDiy2vQ+N\n2QVPHaQOvw3cNYK6R0REdE1VuwNa2QpYIOlgYDOwm6Rdy76Hbd9elhcB7wfOH5D/cdvXleVlwBsG\nOdaW7gVJs4CvAC8A/sL2lc9OPq/x33nznjWndERETH31ep16vd71civ32GBJhwHn2D5kwPZ3Am8C\nTrK9WdI64BAaz/qv2355SXco8D7bx0q6hUaz/nJJv7S9Y0lzLPAW2+9uU4fvAh+3/d2mbRcAS2x/\neUBa9085ULX3KiIiWstjg0fI9s3AtpJO6d9WBgb2Ao+UAODQst6vV9L+ZflEGs39A3XyZnwa+FtJ\nv9G0bbsO8kdERIxa5YKA4mjgcElry+1/5wHXAvtJWgmcDNzXlH4NcIak1cDOwBfK9uaf5sP+mW77\neuDvgesl3SPpB8BTwLdGekIRERGdqlx3wGTT6A6Anp5e1q9/cJxrExERE0G3ugMyMHASSKAWERFj\nIUHAGJJ0NnA8/SP7Gn+vtL1gXCsWERFBugMmPEnOexQREc1yd0BERESMSoKAiIiIikoQEBERUVEJ\nAiIiIiqqs
kGApB5Jl0m6X9ISSddI2r1N2t7yUKFW+y6WtFeHxz5b0l3l9ZSk5eX1vpGcS0RExEhU\n9u4ASYuBhf1TCpdHB+9o+9YWaXuBb9p+9RjUY8ucA232Ow8KioiIZrk7YBTK3ABP9AcAALZXASsk\n3SRpqaSVko5syraNpEWSVku6QtK0UtYtkuaU5Q2SPilphaTFknbpRn37+h7qRjERERHPUMkgAJhF\nY7rfgR4DjrI9FzgM+EzTvj2BC23vDWwA3tsi/3Rgse19aEwydGpXax0REdFFVQ0C2tkKWFAmEboJ\n2E3SrmXfw7ZvL8uLgINa5H/c9nVleRkwcywrGxERMRpVfWzwvcBxLbafBLwEmF2mFF4HTCv7Bg6e\naDWY4smm5U108frOmzcPgFqtRq1W61axERExCdTrder1etfLrfLAwNuAS21/qay/isYUwy+2fWYZ\nN/AdGr/mBawDXmf7DkmXAPfa/pykW4CzbC+XtMH2C0p5xwJvsf3uIeqxJU+b/YZMIhQREU/LwMDR\nOxo4XNLacvvfecC1wH6lO+Bk4L6m9GuAMyStBnYGvlC2N387j+SbOt/uERExLirbEjBZpCUgIiIG\nSktAhfT09I53FSIiYgpKS8AYk3Q2cDyNZn+Vv1faXjDM/JlKOCIinqFbLQEJAia4BAERETFQugMi\nIiJiVBIEREREVFSCgIiIiIpKEBAREVFRCQIiIiIqasggQNImScsl3VX+/tlwC5d0iKRvjqaCzVP1\njiDvoMeX9E5JmyUd1rTtqLLtmLJ+iaS9Ojzu73dynSIiIsbDcCa42Wh7RF/CxYjvb5PUjZaKoY5/\nN/B24Oay/nZgxZbMdsfTAdv+JjCq4CciImKsDedLtuV9iJLWSTqvtBDcKWm2pBsk3S/ptKakO0m6\nRtIaSRc15b+o5Fsl6ZwB5X5a0lIaD9np3y5JCyWdW9YPl7RY0lJJl0vavmx/k6T7Sv5jhnF+PwBe\nK2lrSdOB3WkKAvpbIiRtVY5/t6SVks4s+z8g6V5JKyR9rWx7p6QLyvJCSX8n6dYyT0F/C4PKNVgt\n6VuSru3fFxER8VwYTkvAdpKW8/TT7hbYvrLse9D2bEnnAwuB3wG2B+4BLi5p9gNeCTwMfEvSMbav\nAs62/Yvya/87kr5h+56S52e25wJI+mNgG+CrwCrbCyS9GPgY8Hrbj5Wm9w9J+pty3JrtByRdPozz\nM3AT8CZgJ+BfgZe3SLcP8Bu2X13qtWPZ/hFgpu0nm7b1l9tvhu0DJb0S+DfgKuBY4GW295bUQ2Oy\nokuHUd+IiIiuGE4Q8Ogg3QH9Td6rgOm2HwUelfTrpi/EO20/BCDpMuAgGl+Cb5d0aqnDDGBvGsED\nwMAv7y8Clzc9aveAkv5WSaIRJNwG7AU8YPuBkm4RMFRzvoGvA2cCOwJnAR9tke4B4OWS/g64Drix\nbF8JfE3SvwD/0uYY/wJg+z5Ju5ZtBwJXlu19ZUrilubNm7dluVarUavVhjiliIiYSur1OvV6vevl\nDicIGMzj5e/mpuX+9f6yB/bJW9JMGl+2+9r+paSFwLSmNBsH5LkVOFTS+bYfp9EqcaPtk5oTSXoN\nbbovBmN7qaRXAb+yvbYRVzwrzS9K+b8HnA68FXgP8BbgYOBI4KOSZrU4RPO16bh+zUFARERUz8Af\ngPPnz+9KuSMeE9BBnv0l9ZZm/7fR6IPfEfgVsKE0hR8xRHmXAtcDV5RybgcOlPQKAEnbS9oDWAP0\nSupvzj+hgzp/hNYtAJRjvBjY2vbVwF8Cs8uul9n+LvDn5bx2GOI4/dfmVuDYMjagB6h1UNeIiIhR\nG05LwLQBYwJusH02g4+6b953J3AhjQF3N5cvUSStoNEP/mMagUGrvFvWbX9W0k7AV2yfJOldwGWS\nnl/SfMz2/ZJOB66TtBH4PkN/KVPK/1abOvQv/wawsAQhBv5c0vOARaXrQ8D
flZaNdteief0bwGHA\nveUaLAP+Zzh1jYiI6IbMIjiOJE23vVHSi4A7gANtPzIgTWYRjIiIZ1BmEZwSrpF0F/A94NyBAUCM\nzlgMoqmCXLeRyXUbmVy38VWJIEDSu/T0Ew/7XxeMd71sH2p7tu1Ztr8y3vWZavKPy8jkuo1MrtvI\n5LqNr9HeHTAp2P5H4B/HuRoRERETSiVaAiIiIuLZMjBwgpOUNygiIp6lGwMDEwRERERUVLoDIiIi\nKipBQEREREUlCBhHZdrjNZJ+JOkjbdL8vRrTM6+QtE8neaeqEVy32U3bHyxTQd8l6c7nrtbjb6jr\nJmlPNabn/rWkD3WSdyob5XXL5639dTuxXJuVkn4g6dXDzTuVjfK6df55s53XOLxoBGBrgV4asyCu\nAPYakOYI4NqyvD9w+3DzTtXXaK5bWX8AeOF4n8cEvW4vAfYFPgF8qJO8U/U1muuWz9uQ1+0AYKey\n/Kb8+za66zbSz1taAsbPa4H7bT9k+0ka0xn/wYA0fwB8GcD2HcBOZbKh4eSdqkZz3aAxx0MVP/dD\nXjfbP7O9DHiq07xT2GiuG+TzNth1u912/3wpt9OYn2VYeaew0Vw3GMHnrYofzoniN2hMHNTvJzzz\nzRwszXDyTlUjuW7/3pTGwLclLZF06pjVcuIZzWcmn7endXru+bw1DHXdTqExU+xI8k4lo7luMILP\nWyWeGDiFjPqe0OBA2z+VtAuN/1nus/2DIXNFjEw+b0OQdCjwR8BB412XyaTNdev485aWgPHz78DL\nmtb/V9k2MM3/bpFmOHmnqtFcN2z/tPz9T+BqGs1vVTCaz0w+b0/r6Nzzedui5XUrg9ouBo60/fNO\n8k5Ro7luI/q8JQgYP0uA3SX1StoWeDvwbwPS/BvwDgBJBwC/sN03zLxT1Yivm6TtJe1Qtk8H3gjc\n89xVfVx1+plpbnXK520E1y2ft8Gvm6SXAd8A/tD2/99J3ilsxNdtpJ+3dAeME9ubJL0PuJFGMHap\n7fsknd7Y7YttXyfpzZLWAhtpNP20zTtOp/KcGs11A3qAq9V4FPPzgK/avnE8zuO5NpzrVgZPLgVe\nAGyWdCawt+1f5fPW+XUDdiGft7bXDfhL4EXARZIEPGn7tfn3bWTXjRH++5bHBkdERFRUugMiIiIq\nKkFARERERSUIiIiIqKgEARERERWVICAiIqKiEgRERERUVIKAiIiIikoQEBERUVH/D0fp/Hca+rNg\nAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Simple version that shows all of the variables\n", "feature_importances = pd.Series(model.feature_importances_, index=X.columns)\n", "feature_importances.sort_values(inplace=True)\n", "feature_importances.plot(kind=\"barh\", figsize=(7,6));" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAnMAAACMCAYAAAAX6y/rAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE/VJREFUeJzt3XuwZlV55/HvT8CALSAacpjRsZ0MIiIoNLdWmeFIboZM\ngdeoMAoa6Tjq6Iw6k6qo0EjUyWSGWIrWhEhRGlIIiZJJECzU8AoIDUgDcg8tF1HpVgxq2y1IN8/8\n8e42r4dzec853We/+5zvp2pX78tae6+997vrPL3W3mulqpAkSVI3PantAkiSJGnuDOYkSZI6zGBO\nkiSpwwzmJEmSOsxgTpIkqcMM5iRJkjps57YL0JYk9skiSZI6o6oy2folXTNXVU4tTaeddlrrZVjq\nk/eg/cl74PVf6pP3YPhpOp0I5pK8P8mtSW5OsjbJ4W2XSZIkaRSMfDNrkpXAscDBVbUlydOBJ7dc\nLEmSpJGQmaru2pbklcDJVXX8hPUrgDOBZcBDwMnNv9cA76uqK5J8FNhSVR+cZL+jfeKSJEkDaop3\n5roQzC0DrgJ2A74KXABcDXwNOK6qfpjk94Hfqao/SHIA8DfAu4D/BRxZVVsm2W/BaJ+7JElSX6YM\n5ka+mbWqNjW1cP8eOAb4HPBh4EDgy0lC/92/B5v0tyc5D7iYKQI5SZKkxWLkgzmA6lcfXgFckeQW\n4B3ArVX10imyHAQ8DIxNv+fVA/PjzSRJktS2XjPNrAvNrPsBj1fVumb5DGAv4LeBN1XVmiQ7A/s1\ntXKvAk4B/gvwReDwqvrJJPu1mVWSJHXE1M2sXQjmVgCfAPYEtgDrgFXAswbW7wR8DPg74OvAMVX1\nvSTvBA6tqjdPsl+DOUmS1BEdDuZ2FIM5SZLUHR3+AGLHmvSaSJIkdcaSDuaWaq2kJEnqln7nHZPr\nxHBekiRJmpzBnCRJUocZzEmSJHVYJ4K5JK9I8njT55wkSZIanQjmgNcDVwJvaLsgkiRJo2Tk+5lL\nsgy4E3gZcHFV7d+Mx/pJ+uNvPUC/M+FzquoLTSfDZwLLgIeAk6tqwyT7He0TlyRpRIyNLWf9+vva\nLsaSlnS7n7njgS9V1bokDyU5BPh14NlVdUCSMeAO4JxmWK9PAMdV1Q+T/D7wEeAPJt+18ZwkSTPZ\nsMF+WUdZF4K5N9AfqgvgAuAE+uX+G4Cq2pDk8mb784ADgS83tXdPAr63sMWVJElaOCMdzCXZCzgG\nOLBpFt2JfnXaRVNlAW6tqpcOd4TVA/PjzSRJktSuXq9Hr9cbKu1IvzOXZBVwSFX954F1lwOXA4fR\nb4L9NeB24BTgH4DbgDdV1Zqm2XW/qrp9kn07NqskSUOJoya1bLp35kb9a9bX8cRauM8DY8B36Adu\nnwVuAH5cVY8BrwH+NMlNwI3AixeuuJIkSQtrpGvmppNkWVVtSvJ04FrgpVX1/Vnkt2ZOkqShWDPX\ntq5/zTqVi5M8DdgF+NBsArl/4dc5kiTNZGxsedtF0DQ6WzM3X0lqqZ67JEnqli6/MydJkqRpGMxJ\nkiR1mMGcJElSh43kBxBJtgI30/9CoYBXVNW32y2VJEnS6BnJDyCS/KSq9phDvp2qauuQaf0AQpIk\ndUIXuyZ5QmGTLAf+CnhKs+qdzSgPRwNnAA/TH5t1/yQnAu+i323JtcDbJ4vc+sO3SpK0MMbGlrN+\n/X1tF0OLzKgGc7slWUs/qLunql4NbAB+s6p+nmRf4Hzg8Cb9IcALqurbSfanP3LES6pqa5JPAicC\n5z3xMNbMSZIWzoYNViJo+xvVYG5zVa2YsO7JwFlJDga2As8d2HbdwDt1vwGsAK5Pv+ptV/qBoCRJ\n0qIzqsHcZP4bsL6qXphkJ+BnA9s2DcwH+ExVvX/mXa4emB9vJkmSpHb1ej16vd5QaUf1A4iNVbX7\nhHVnAg9U1Z8neTPw6araqXln7r1VdVyT7vnA3wFHVdUPkuwF7
D7xa1jHZpUkLTzHONXcdHEEiMl+\n6Z8CTk5yI7Afv1wb9y8Zq+4APgBcluRm4DJgnx1VUEmSpDaNZM3cQrBmTpK08KyZ09x0sWuSBeJX\nRZKkhTM2trztImgRWtLBnP87kiRJXTeq78xJkiRpCAZzkiRJHWYwJ0mS1GELHswl2ZpkbZJbklyQ\nZNdp0p6W5D0LWT5JkqQuaaNmblNVraiqg4DHgLe1UAZJkqRFoe2vWa8EDgJI8ibgvcDjwDer6qTB\nhEneCqwCdgHWAW+sqkeSvBY4FdgC/LiqxpMcAJzbpH0S8Oqq+tbEg/eHbpW6Z2xsOevX39d2MSRJ\nI2DBOw3eNlRXkp2BvwUupR/UXQSsrKqHkzytqn6U5DRgY1WdmWSvqnq42ccZ9Mdp/WSSbwK/U1UP\nJtmjqn6S5OPANVV1fnOcnarq0QnlsNNgdZgdj0rSUjJqw3ntlmQtcB1wH3AOcAxw4bZgrap+NEm+\nFya5ogneTgBe0Ky/CvhMU3O3rabxGuD9Sf478JyJgZwkSdJi0UYz6+aqWjG4YsjmznOB46rq1iQn\nAUcDVNXbkxwO/EfghiQrmhq5Nc26S5KsqqreE3e5emB+vJkkSZLa1ev16PV6Q6VtrZl1wroDgC8A\nL6mqf97WpDqhmfX7wAHAj4EvAt+pqrck+fWquqfZz7XAKU2ee5t1fwY8UFUfn3BMm1nVYTazStJS\nMmpjsz7hL1BV3Z7kw8DXkmwBbgTeMiHZqfSbZr8PXAtsCwj/LMlzm/mvVNU3k/xRkjfS/1r2QeDD\nO+A8JEmSWrfgNXOjwpo5dZs1c5K0lIxazdwIsWsSddPY2PK2iyBJGhFLOpizZkOSJHWdY7NKkiR1\nmMGcJElShxnMSZIkdVirwVySsSTnJ7k7yfVJLk6y7xRplye5ZYptZyfZf8eWVpIkafS0/QHERcC5\nVfUGgCQHAWPAuinST/rFQlWtmsvBhxx5Qi1yQHlJkqbXWs1ckpcBP6+qv9y2rqpuAW5K8pUk30hy\nc5LjBrLtkuS8JLcnuTDJrs2+Lk+yopnfmORPktyU5Ooke09dinIa8WnDhvunvn2SJKnVZtYDgRsm\nWf8z4BVVdRhwDPB/BrY9Dzirqg4ANgJvnyT/MuDqqjoYuJL+8F6SJEmL0ih+APEk4KNJbga+Avzr\nJL/WbPt2Va1p5s8Djpok/6NVdUkzfwPwnB1ZWEmSpDa1+c7cbcBrJll/IvCrwCFV9XiSe4Fdm20T\n35mb7B26xwbmtzLtOa4emB9vJkmSpHb1ej16vd5QaVsdmzXJNcA5VfXpZvkg4JXAM6rq3c17dV+l\nX7sW4F7gxVV1bZK/BG6rqo8luRx4b1WtTbKxqnZv9vdq4Peq6i2THNuxWTvBMUglSZpubNa2m1lf\nCfxWknVNtyMfAb4IHN40s/4n4I6B9HcC70hyO/A04P826wf/2vuXX5IkLRmt1sy1qV8zp1Fn1ySS\nJE1fM9d2P3OtWqqBrCRJWjzabmaVJEnSPBjMSZIkdZjBnCRJUocZzEmSJHVY68FckvcnubUZh3Vt\nkiOSnJ1k/2b7xinyHZlkTZIbk9yW5NSFLbkkSVL7Wv2aNclK4Fjg4KrakuTpwJOratVAsqk+Of0M\n8JqqujVJ6I/bOtvjz7rMmp5diUiStLDarpn7V8BDVbUFoKr+uarWJ7k8yYomTZKc2dTefTnJM5r1\newMbmnxVVXc2iU9L8tkkVye5K8lbpz58OW3nacOG+6e+3JIkabtrO5i7DHh2kjuTfDLJf5gkzTLg\nuqo6ELgCOK1Z/zHgriSfT7Iqya8M5DmI/kCrLwFOTbLPjjsFSZKk9rQazFXVJmAFsAr4AfC5JCdN\nSLYVuLCZPw84qsl7BnAo/YDwBODSgTz/r6p+XlU/BP4ROGKHnYQkSVKLWh8BovrDMFwBXNGMz3oS\n04+v+ottVXUv8BdJPg38I
MleE9MAmXp/qwfmx5tJkiSpXb1ej16vN1TaVsdmTbIf8HhVrWuWzwD2\nBA4E3ldVa5M8Dry+qi5M8gFg76p6d5Jjq+qSJt/zga8BY8CpwPHASmB34AZgZVWtn3Dsmj5m1NzE\nYdIkSdrORnls1qcCn0iyJ7AFWEe/yfVvB9L8FDgiyQfpf/Dwumb9G5OcCWxu8p5QVdV8ofpNoAc8\nA/jQxEBOkiRpsWi1Zm5HSHIasLGqzpwh3eI68RFh1ySSJG1/o1wz16rFFshKkqSlZ9HVzA0rSS3V\nc5ckSd0yXc1c2/3MSZIkaR4M5iRJkjrMYE6SJKnDZgzmkmxNsjbJjc2//2PYnSc5Osk/zKeAE8Zp\nnW3eeR9fkiRplA3zNeumqppTMNWY81cGSbZHzeGUx2/6pFtS7DpEkqTFZZhgadKIJ8m9ST7S1Nhd\nl+SQJF9KcneSVQNJ90xycZI7k3xqIP+nmny3NH3DDe73fyb5BvDagfVJcm6SDzXLv5Xk6iTfSHJB\nkqc061+e5I4m/6umP7VactOGDfdPf0kkSVKnDBPM7TahmfW1A9vuq6pDgKuAc+kHTy8GPjSQ5nDg\nHcDzgX2TbAuw/riqjgBeBIwnOXAgz0NVdVhVXdAs7wL8NfBPVXVqkmcAHwB+o6oOoz9k13uS/Apw\nNvB7zfp9hr4SkiRJHTRMM+vmaZpZt72PdguwrKo2A5uTPJJkj2bbdVV1P0CS84GjgC8Ar09ySlOG\nfYADgFubPNuCuG3+Arigqj7aLK9s0n89/bbSXYBrgP2Be6rqnibdecApQ5yjJElSJ813BIhHm38f\nH5jftrxt3xPfWaskzwHeCxxaVT9Jci6w60CaTRPyfB14WZIzq+pR+k2/l1XViYOJkryIKZqFJ7d6\nYH68mSRJktrV6/Xo9XpDpR0mmJvLVwKDeY5Mshx4AHgd/Vq2PYCfAhuTjAG/C1w+zf7OAY4GLkzy\nSmANcFaSf1dV32rel3smcCewPMm/rap7gTdMX8zVczg1SZKkHWt8fJzx8fFfLJ9++ulTph0mmNs1\nyVr6AVoBX6qqP2b6r1QHt10HnAXsC/xjVV0EkOQm4A76Qd5VU+T9xXJV/XmSPYG/qqoTk5wMnN+8\nJ1fAB6rq7iR/CFySZBNwJfDUIc5RkiSpk5b02Kxtl6ENdk0iSVL3TDc263zfmeu0pRrISpKkxcPh\nvCRJkjrMYE6SJKnDDOYkSZI6zGBOkiSpw1oN5pJsbYYIu6UZX3XXmXPNuM+Tknxie5RPkiRp1LVd\nM7epqlZU1UHAY8Dbhs2YZLqyD/WZapKRn/bZ5znDXhJJkrQEtR3MDbqSfsfCJLkoyfVNjd1btyVI\nsjHJ/05yI7AyyWFJvp7kpiRrkixrkj4zyaVJ7kryp1MfskZ+2rDh/jlcSkmStFS03c9cAJLsTH9I\nr0ub9W+uqh81za7XJ/l8VT0MLAOuqar3JdmF/vBdr62qtUmeCjzS5H8RcDD92r67kny8qr67gOcl\nSZK0INqumdutGSrsOuB++mOwAvzXZrivNcCzgOc267cAX2jmnwd8r6rWAlTVT6tqa7Ptq83yo8Dt\nwPIdfyqSJEkLr+2auc1VtWJwRZKjgWOAI6vq0SSXA9s+jHikfnnYhkmHtQAeHZjfypTnuXpgfryZ\nJEmS2tXr9ej1ekOlbTuYmywY2xN4uAnk9gdWTpH+LmCfJIdW1Q1NM+vPZnf41bNLLkmStADGx8cZ\nHx//xfLpp58+Zdq2g7nJvjr9EvC2JLfRD9iumSx9VT2W5HXAWUl2AzYDvznkMSRJkhaFVt+Zq6o9\nJln386o6tqpeUFWvqqpjquqKydJX1Q1V9eKqOriqXlJVm6vqM1X1roE0x23L/0QZ+WlsbHG+7jds\n1bF2HO9B+7wH7fL6t897sH20/QFEq6pq5Kf16+9r+zLtED7A7fMetM970C6vf/u8B9vHkg7
mJEmS\nus5gTpIkqcPyyz19LB1JluaJS5KkTqqqSbtkW7LBnCRJ0mJgM6skSVKHGcxJkiR12KIL5pK8PMmd\nSf4pyR9NkebjSe5OclOSg2eTVzObwz04ZGD9fUluTnJjkusWrtSLx0zXP8nzklyd5JEk75lNXg1n\nnvfAZ2A7GOIenNBc55uTXJXkhcPm1czmef19Bmar7X7UtudEPzhdBywHdgFuAvafkOZ3gS8280cC\na4bN67Rj70GzfA+wV9vn0dVpyOv/q8ChwBnAe2aT12nH3oNmm8/AwtyDlcCezfzL/VswGte/WfYZ\nmOW02GrmjgDurqr7q+ox4HPA8RPSHA98FqCqrgX2TDI2ZF7NbD73APpDXyy23+VCmvH6V9VDVXUD\nsGW2eTWU+dwD8BnYHoa5B2uq6sfN4hrgmcPm1Yzmc/3BZ2DWFtvFeibwwMDyd/jlH8h0aYbJq5nN\n5R58dyBNAV9Ocn2SU3ZYKRev+fyOfQa2j/leR5+B+ZvtPXgrcOkc8+qJ5nP9wWdg1nZuuwAjYNI+\nW9Sal1bVg0n2pv8w31FVV7VdKGkB+QwsoCQvA94MHNV2WZaiKa6/z8AsLbaaue8Czx5YflazbmKa\nfzNJmmHyambzuQdU1YPNvz8ALqJfXa/hzed37DOwfczrOvoMbBdD3YPmpfuzgeOq6uHZ5NW05nP9\nfQbmYLEFc9cD+yZZnuTJwOuBv5+Q5u+BNwEkWQn8qKo2DJlXM5vzPUjylCRPbdYvA34buHXhir4o\nzPZ3PFgz7TOwfcz5HvgMbDcz3oMkzwY+D7yxqr41m7ya0Zyvv8/A3CyqZtaq2prkncBl9APVc6rq\njiR/2N9cZ1fVJUmOTbIO2ES/enfKvC2dSmfN5x4AY8BF6Q+1tjPw11V1WRvn0VXDXP/mY5NvALsD\njyd5N3BAVf3UZ2D+5nMPgL3xGZi3Ye4B8EHg6cCnkgR4rKqO8G/B/M3n+uPfgTlxOC9JkqQOW2zN\nrJIkSUuKwZwkSVKHGcxJkiR1mMGcJElShxnMSZIkdZjBnCRJUocZzEmSJHWYwZwkSVKH/X+AHV9l\n19yVMQAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Complex version that shows the summary view\n", "\n", "def graph_feature_importances(model, feature_names, autoscale=True, headroom=0.05, width=10, summarized_columns=None):\n", " \"\"\"\n", " By Mike Bernico\n", " \n", " Graphs the feature importances of a random decision forest using a horizontal bar chart. 
\n", " Probably works on other sklearn.ensemble estimators, but this is untested.\n", " \n", " Parameters\n", " ----------\n", " model = The fitted ensemble whose feature importances you would like graphed.\n", " feature_names = A list of the names of those features, displayed on the Y axis.\n", " autoscale = True (automatically adjust the X axis size to the largest feature importance + headroom) / False (scale from 0 to 1)\n", " headroom = used with autoscale, .05 default\n", " width = figure width in inches\n", " summarized_columns = a list of column prefixes to summarize on, for dummy variables (e.g. [\"day_\"] would summarize all day_ vars)\n", " \"\"\"\n", " \n", " if autoscale:\n", "     x_scale = model.feature_importances_.max() + headroom\n", " else:\n", "     x_scale = 1\n", " \n", " feature_dict = dict(zip(feature_names, model.feature_importances_))\n", " \n", " if summarized_columns:\n", "     # some dummy columns need to be summarized\n", "     for col_name in summarized_columns:\n", "         # sum all the features that contain col_name, store in temp sum_value\n", "         sum_value = sum(x for i, x in feature_dict.items() if col_name in i)\n", " \n", "         # now remove all keys that are part of col_name\n", "         keys_to_remove = [i for i in feature_dict.keys() if col_name in i]\n", "         for i in keys_to_remove:\n", "             feature_dict.pop(i)\n", "         # lastly, add the summarized field back\n", "         feature_dict[col_name] = sum_value\n", " \n", " results = pd.Series(feature_dict)\n", " results.sort_values(inplace=True)\n", " results.plot(kind=\"barh\", figsize=(width, len(results)/4), xlim=(0, x_scale))\n", " \n", "graph_feature_importances(model, X.columns, summarized_columns=categorical_variables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parameter tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parameters to test\n", "\n", " * ### Parameters that will make your model better\n", " * n_estimators: The number of trees in the forest.
Choose as high a number as your computer can handle; the gains level off once the forest is large enough.\n", " * max_features: The number of features to consider when looking for the best split. Try [\"auto\", None, \"sqrt\", \"log2\", 0.9, and 0.2]\n", " * min_samples_leaf: The minimum number of samples in newly created leaves. Try [1, 2, 3]. If 3 is the best, try higher numbers such as 1 through 10.\n", " * ### Parameters that will make it easier to train your model\n", " * n_jobs: Determines whether multiple processors are used to train and test the model. Set this to -1 to use all available cores, and use %%timeit to compare it against n_jobs=1. It should be much faster (especially when many trees are trained)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### n_jobs" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 1.21 s per loop\n" ] } ], "source": [ "%%timeit\n", "model = RandomForestRegressor(1000, oob_score=True, n_jobs=1, random_state=42)\n", "model.fit(X, y)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 loop, best of 3: 708 ms per loop\n" ] } ], "source": [ "%%timeit\n", "model = RandomForestRegressor(1000, oob_score=True, n_jobs=-1, random_state=42)\n", "model.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### n_estimators" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30 trees\n", "C-stat: 0.853875733657\n", "\n", "50 trees\n", "C-stat: 0.860698345743\n", "\n", "100 trees\n", "C-stat: 0.863521128261\n", "\n", "200 trees\n", "C-stat: 0.862192290076\n", "\n", "500 trees\n", "C-stat: 0.863739494456\n", "\n", "1000 trees\n", "C-stat: 0.864043076726\n", "\n", "2000 trees\n", "C-stat: 0.863449227197\n", "\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYoAAAEACAYAAACtVTGuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHFFJREFUeJzt3XuQFfWd9/H3B7moMKCJBI1GAyIaWa+PQTbejooyZkvZ\njY8JbpUupmIsg6tldrloUsU8te4Cum4kS7Ib4gU33ipmH3WKNRFvJ1tSEvEREHEQNsSRixpFBS8B\nGfg+f3RPbE9mes4MZ+ZwZj6vqlN0//rXfX7dzJzP/H59OYoIzMzM2tOv2g0wM7O9m4PCzMxyOSjM\nzCyXg8LMzHI5KMzMLJeDwszMcpUVFJLqJa2RtFbSjDaWD5XUKGmFpFWSpmSWDZP0oKQmSaslnZpZ\n9rdp+SpJcyqyR2ZmVlH9O6ogqR8wHzgX2Awsk/RIRKzJVJsKrI6IiyQdBLwi6Z6IaAHmAY9GxCWS\n+gP7p9stABcCx0VES7qemZntZcrpUYwD1kVEc0TsBB4AJpXUCaAuna4DtqQf/kOBMyLiLoCIaImI\nbWm9q4E5aZgQEW/v4b6YmVk3KCcoDgU2ZOY3pmVZ84FjJW0GVgLXpeUjgbcl3SXpBUkLJO2XLhsD\nnClpqaSnJZ3S9d0wM7PuUqmT2ROB5RHxeeAk4EeShpAMbZ0M/CgiTgY+Amam6/QHDoyI8cB04OcV\naouZmVVQh+cogE3A4Zn5w9KyrCuA2QAR8VtJvwOOIemJbIiI59N6vwBaT4ZvBP5vus4ySbslfTYi\ntmQ3LMkPozIz64KIUCW2U06PYhkwWtIRkgYCk4HGkjrNwAQASSNIhpXWR8SbwAZJY9J65wIvp9MP\nA+ek64wBBpSGRKuI8KuLr1mzZlW9Db3t5WPqY1oLr0rqsEcREbskXQMsJgmWOyKiSdJVyeJYANwE\nLJT0Yrra9Ih4J52+FrhX0gBgPUnvA+BO4E5Jq4AdwOUV2yszM6uYcoaeiIhfAUeXlP0kM/06yXmK\nttZdCXy5jfKdwGWdaayZmfU835ndyxUKhWo3odfxMa08H9O9myo9llVpkmJvb6OZ2d5GEtGDJ7PN\nzKwPc1CYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ\n5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnlclCYmVkuB4WZmeUq\nKygk1UtaI2mtpBltLB8qqVHSCkmrJE3JLBsm6UFJTZJWSzq1ZN2/k7Rb0mf2eG/MzKziOgwKSf2A\n+cBEYCxwqaRjSqpNBVZHxInA2cCtkvqny+YBj0bEl4ATgKbMtg8DzgOa93RHzMyse5TToxgHrIuI\n5ojYCTwATCqpE0BdOl0HbImIFklDgTMi4i6AiGiJiG2Z9X4ATNujPTAzs27Vv+MqHApsyMxvJAmP\nrPlAo6TNwBDgG2n5SOBtSXeR9CaeB66LiD9IugjYEBGrJO3JPph12a5d8MEHsHVr8tq2re1/W6e3\nbYMBA2Do0OQ1bFjb09n5wYOhn88GWg0rJyjKMRFYHhHnSDoSeFzS8en2TwamRsTzkm4DZkqaA9xI\nMuzUqt20aGho+ON0oVCgUChUqNlWqyJg+/byPtzzyj78EIYM+fQHe/bf1umjjvokAHbu/CQ0tm2D\n5uZPz2dDZds2+Oij5D3aC5JyQ2fwYPDfVNaeYrFIsVjslm0rIvIrSOOBhoioT+dnAhERczN1FgGz\nI2JJOv8kMIOkJ/JsRIxKy09Py28AngA+IgmIw4BNwLiI+H3J+0dHbbTa0tIC77/f9Q/31n/79cv/\ncC9nWV1d9/+1v2tXsr9thUhn5v/wh6S9HYVMR/MOnL5BEhFRkf/pcnoUy4DRko4AXgcmA5eW1GkG\nJgBLJI0AxgDrI+IdSRskjYmItcC5wMsR8RJwcOvKkn4HnBwR7
+75Lll3iUj+Oi73g7y9Za0feB19\nuB9ySP4H/6BB1T4i5dlnHzjggOS1J1oDtqNg+d3v8kNnx45PAmdPQmf//R04fUWHPQpILo8luXqp\nH3BHRMyRdBVJz2KBpEOAhcAh6SqzI+L+dN0TgNuBAcB64IqI2Fqy/fXAKRHxThvv7R5FBWSHS7ry\n4d46PXBg1/5yz/7rMfvqKu3RdaV30xo42QDpaujst58DpztUskdRVlBUk4Oi8yJg8WKYOxeampJf\n8I8/bv+Du9wP/rq6JCjMIPnjoxJDaqU/m10NHQfOpzkorE27d0NjI9x0UzK8c+ONcPbZyS+Uhwls\nb1V6cUBXQ6elZc+H04YOhX337R2/Kw4K+5Rdu+DBB+Ef/zH5i//734dJkzy8Y33Lxx93fkitrWW7\nd+cPqZUbOoMGVTdwHBTteOUVGD4cPtNHHgaycyfcey/80z8l+/3970N9fe/4a8isWnbsqMyQWmvg\ndPYy6NL5rl600dNXPdWM73wnubb+6ad791j69u1w113JOYjRo2HBAjjrLAeEWSUMGpS8Djpoz7az\nY0d5obJxY/vLt25Nfq+7MoRWSb2mRxGR/McefzwcfTT8+7/3QON62IcfJqHwz/8MJ50E3/se/Pmf\nV7tVZtadWgOns0Nq//3f7lH8iY0bk0crPPIIjB+ffKB++9vVblVlbNsGP/oR3HYbnHEGLFqUBIWZ\n9X6DBiVDy8OHd269So4w9JqgWLEi+fAcOhQefhhOPx3+7M/gK1+pdsu6bssW+OEPk5Cor4ennoKx\nY6vdKjPra3rNdTHLl8OJJybTY8YkY/iXXAKbN1e3XV3xxhswfXqyH5s3w9KlcM89Dgkzq45eExQr\nVnwSFAB/8RfJye2LL07G+GrBhg1w7bVw7LHJfRDLl8NPf5qcsDYzq5ZeFRSl4/Y33gif/zxcc01y\nsntvtX59cj7lhBOS8cjVq+Ff/xUOP7zaLTMz6yVB8d578NZbcOSRny6XYOFCePZZ+MlPqtK0XE1N\ncPnlMG4cjBgBa9fCLbckD8MzM9tb9IqgWLkSjjsueUpnqbq65OT2rFnwzDM937a2rFiRnD856yw4\n5hj47W/hH/5hz6/bNjPrDr0iKNoadsoaPRruvhu+/vXkMtpqWboULrwQvvrV5P6H9euT4bFhw6rX\nJjOzjvSKoMhe8dSe+vrkRPHFFyd3NveUCPj1r+G88+Ab34ALLkgC4rvfTb71zMxsb9crgqL0iqf2\nzJgBRxwBU6d2/8ntCPjVr5Ib5L71Lfjrv4Z165Irsfbdt3vf28yskmr+ER4ff5x8c9iWLcnz6Dvy\nwQfJTXhXXZUERqVlH/W9fXvymI1LLoH+vebWRjOrBX4oYMbq1TBqVHkhAclwz8MPJ+cIjjsOzjyz\nMu3wo77NrLeq+aAod9gpa9Qo+NnPYPJk+M1v4Atf6Pr7lz7q++ab/ahvM+tdav7v3Y6ueGrP+efD\n9dfD176W3AXdWdu3w7/9Gxx1VPJ4jQULkstvL7jAIWFmvUvNB0U5Vzy15+//Prl09uqryz+5/eGH\n8IMfJDf3/dd/wf33wxNPQKHggDCz3qmmg2L37uRmu64GhQS33570SubPz6+7dSvMnp0MWy1Zkjzq\ne9Eifx+EmfV+NX2O4tVXk5vVPvvZrm9j8GB46KFPTm4XCp9evmULzJsHP/6xH/VtZn1TTfco9mTY\nKWvkyOQ8w6WXwmuvJWWtj/o+6ih4/XU/6tvM+q6ygkJSvaQ1ktZKmtHG8qGSGiWtkLRK0pTMsmGS\nHpTUJGm1pFPT8pvTshWS/lNSp7/ltStXPLVnwgSYNg3+6q8+/ajvFSv8qG8z69s6DApJ/YD5wERg\nLHCppGNKqk0FVkfEicDZwK2SWoe15gGPRsSXgBOAprR8MTA2XWcdcENnG9/VK57ac/31yX0VgwbB\nyy/7Ud9mZlDeOYpxwLqIa
AaQ9AAwCViTqRNAXTpdB2yJiJa0l3BGREwBiIgWYFs6/URm/aXAxZ1t\nfKWGnlpJyRVNZmb2iXKGng4FNmTmN6ZlWfOBYyVtBlYC16XlI4G3Jd0l6QVJCyS1dQ/1N4Ffdqbh\nb72VPI7ji1/szFpmZtZZlbrqaSKwPCLOkXQk8Lik49PtnwxMjYjnJd0GzARmta4o6XvAzoi4r72N\nNzQ0/HG6UChQKBT+eFms710wM4NisUixWOyWbXf4UEBJ44GGiKhP52cCERFzM3UWAbMjYkk6/yQw\ng6Qn8mxEjErLTwdmRMSF6fwU4ErgnIho85ut23so4C23wKZNcNttndthM7O+oJIPBSxn6GkZMFrS\nEZIGApOBxpI6zcCEtHEjgDHA+oh4E9ggaUxa71zg5bRePTANuKi9kMhTySuezMysfWU9Zjz9UJ9H\nEix3RMQcSVeR9CwWSDoEWAi0ftvz7Ii4P133BOB2YACwHrgiIrZKWgcMBLak6yyNiO+08d5t9ijG\njoX77oMTTujU/pqZ9QmV7FHU5PdRfPRR8v3S772XPNLbzMw+raeHnvY6L70ERx/tkDAz6wk1GRSV\nvtHOzMzaV5NBUekb7czMrH01GRS+4snMrOfU3MnsXbuSR4tv2pT8a2Zmf6pPn8xetw5GjHBImJn1\nlJoLCg87mZn1rJoMCl/xZGbWc2ouKHzFk5lZz6qpoIhwUJiZ9bSaCoo33kjC4tDSb8MwM7NuU1NB\n0dqb8HdQmJn1nJoKCl/xZGbW82ouKHzFk5lZz6qpoPCJbDOznlczj/B4/304+GDYuhX6V+qbvs3M\neqk++QiP116Dww93SJiZ9bSaCYrt22G//ardCjOzvqemgmLffavdCjOzvsdBYWZmuRwUZmaWy0Fh\nZma5HBRmZparrKCQVC9pjaS1kma0sXyopEZJKyStkjQls2yYpAclNUlaLenUtPxASYslvSLpMUm5\n31nnoDAzq44Og0JSP2A+MBEYC1wq6ZiSalOB1RFxInA2cKuk1jse5gGPRsSXgBOAprR8JvBERBwN\nPAXckNcOB4WZWXWU06MYB6yLiOaI2Ak8AEwqqRNAXTpdB2yJiBZJQ4EzIuIugIhoiYhtab1JwN3p\n9N3AX+Y1wkFhZlYd5QTFocCGzPzGtCxrPnCspM3ASuC6tHwk8LakuyS9IGmBpNbb5j4XEW8CRMQb\nwOfyGuGgMDOrjko9EGMisDwizpF0JPC4pOPT7Z8MTI2I5yXdRjLkNAsofQZJuw+damho4KmnYJ99\noFgsUCgUKtRsM7PeoVgsUiwWu2XbHT4UUNJ4oCEi6tP5mUBExNxMnUXA7IhYks4/Ccwg6Yk8GxGj\n0vLTgRkRcaGkJqAQEW9KOhh4Oj2PUfr+ERFMmwbDh8P06ZXYbTOz3q2nHwq4DBgt6QhJA4HJQGNJ\nnWZgQtq4EcAYYH06tLRB0pi03rnAy+l0IzAlnf4b4JG8RnjoycysOjoceoqIXZKuARaTBMsdEdEk\n6apkcSwAbgIWSnoxXW16RLyTTl8L3CtpALAeuCItnwv8XNI3SYLm63ntcFCYmVVHzXwfxWWXwXnn\nweWXV7tFZmZ7vz75fRTuUZiZVYeDwszMcjkozMwsl4PCzMxyOSjMzCyXg8LMzHI5KMzMLJeDwszM\ncjkozMwsl4PCzMxyOSjMzCxXTQRFSwvs2gUDBlS7JWZmfU9NBMWOHUlvQhV5vJWZmXVGTQWFmZn1\nvJoICp+fMDOrHgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnlclCYmVmusoJC\nUr2kNZLWSprRxvKhkholrZC0StKUzLJXJa2UtFzSc5nyEyQ921ou6ZT23t9BYWZWPf07qiCpHzAf\nOBfYDCyT9EhErMlUmwqsjoiLJB0EvCLpnohoAXYDhYh4t2TTNwOzImKxpAuAW4Cz22qDg8L
MrHrK\n6VGMA9ZFRHNE7AQeACaV1AmgLp2uA7akIQGgdt5nNzAsnT4A2NReAxwUZmbV02GPAjgU2JCZ30gS\nHlnzgUZJm4EhwDcyywJ4XNIuYEFE/DQtvx54TNKtJGHylfYa4KAwM6uecoKiHBOB5RFxjqQjSYLh\n+Ij4ADgtIl6XNDwtb4qIZ4Crgesi4mFJ/xu4EzivrY0/9lgD/fpBQwMUCgUKhUKFmm1m1jsUi0WK\nxWK3bFsRkV9BGg80RER9Oj8TiIiYm6mzCJgdEUvS+SeBGRHxfMm2ZgHvR8S/SHovIg7ILNsaEcMo\nISmmTQsOOgimT+/6jpqZ9SWSiIiKfItPOecolgGjJR0haSAwGWgsqdMMTEgbNwIYA6yXtL+kIWn5\nYOB8YFW6ziZJZ6XLzgXWttcADz2ZmVVPh0NPEbFL0jXAYpJguSMimiRdlSyOBcBNwEJJL6arTY+I\ndySNBB6SFOl73RsRj6d1rgR+KGkfYDvw7fba4KAwM6ueDoeeqk1SXHZZMGECXH55tVtjZlYbenro\nqercozAzqx4HhZmZ5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnl\nclCYmVmumgiKlhYYMKDarTAz65tqIij23RdUkUdbmZlZZ9VMUJiZWXU4KMzMLJeDwszMcjkozMws\nl4PCzMxyOSjMzCyXg8LMzHI5KMzMLJeDwszMcjkozMwsV1lBIale0hpJayXNaGP5UEmNklZIWiVp\nSmbZq5JWSlou6bmS9f5WUlO6zpz23t9BYWZWPf07qiCpHzAfOBfYDCyT9EhErMlUmwqsjoiLJB0E\nvCLpnohoAXYDhYh4t2S7BeBC4LiIaEnXa5ODwsysesrpUYwD1kVEc0TsBB4AJpXUCaAuna4DtqQh\nAaB23udqYE5rvYh4u70GOCjMzKqnnKA4FNiQmd+YlmXNB46VtBlYCVyXWRbA45KWSboyUz4GOFPS\nUklPSzqlvQY4KMzMqqfDoacyTQSWR8Q5ko4kCYbjI+ID4LSIeF3S8LS8KSKeSd/7wIgYL+nLwM+B\nUW1t/NlnG2hoSKYLhQKFQqFCzTYz6x2KxSLFYrFbtq2IyK8gjQcaIqI+nZ8JRETMzdRZBMyOiCXp\n/JPAjIh4vmRbs4D3I+JfJP2SZOjp1+my/wFOjYgtJevE3LnB9Ol7uqtmZn2HJCKiIt/kU87Q0zJg\ntKQjJA0EJgONJXWagQlp40aQDCutl7S/pCFp+WDgfOCldJ2HgXPSZWOAAaUh0cpDT2Zm1dPh0FNE\n7JJ0DbCYJFjuiIgmSVcli2MBcBOwUNKL6WrTI+IdSSOBhyRF+l73RsTitM6dwJ2SVgE7gMvba4OD\nwsysejoceqo2SXH33cHl7caImZmV6umhp6pzj8LMrHocFGZmlstBYWZmuRwUZmaWy0FhZma5HBRm\nZpbLQWFmZrkcFGZmlstBYWZmuRwUZmaWy0FhZma5aiIoBgyodgvMzPqumggKVeSxVmZm1hU1ERRm\nZlY9DgozM8vloDAzs1wOCjMzy+WgMDOzXA4KMzPL5aAwM7NcDgozM8vloDAzs1wOCjMzy1VWUEiq\nl7RG0lpJM9pYPlRSo6QVklZJmpJZ9qqklZKWS3qujXX/TtJuSZ/Zoz0xM7Nu0b+jCpL6AfOBc4HN\nwDJJj0TEmky1qcDqiLhI0kHAK5LuiYgWYDdQiIh329j2YcB5QHMF9sXMzLpBOT2KccC6iGiOiJ3A\nA8CkkjoB1KXTdcCWNCQAlPM+PwCmda7JZmbWk8oJikOBDZn5jWlZ1nzgWEmbgZXAdZllATwuaZmk\nK1sLJV0EbIiIVV1quZmZ9YgOh57KNBFYHhHnSDqSJBiOj4gPgNMi4nVJw9PyJuD/ATeSDDu1avdh\n4g0NDX+cLhQKFAqFCjXbzKx3KBaLFIvFbtm2IiK/gjQ
eaIiI+nR+JhARMTdTZxEwOyKWpPNPAjMi\n4vmSbc0C3gcWA08AH5EExGHAJmBcRPy+ZJ3oqI1mZvZpkoiIinybTzlDT8uA0ZKOkDQQmAw0ltRp\nBiakjRsBjAHWS9pf0pC0fDBwPvBSRLwUEQdHxKiIGEkynHVSaUiYmVn1dTj0FBG7JF1D0gvoB9wR\nEU2SrkoWxwLgJmChpBfT1aZHxDuSRgIPSYr0ve6NiMVtvQ05Q09mZlY9HQ49VZuHnszMOq+nh57M\nzKwPc1CYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ\n5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVyUJiZWS4HhZmZ5XJQmJlZLgeFmZnlclCYmVkuB4WZmeVy\nUJiZWa6ygkJSvaQ1ktZKmtHG8qGSGiWtkLRK0pTMslclrZS0XNJzmfKbJTWl6/ynpKEV2SMzM6uo\nDoNCUj9gPjARGAtcKumYkmpTgdURcSJwNnCrpP7pst1AISJOiohxmXUWA2PTddYBN+zZrlhbisVi\ntZvQ6/iYVp6P6d6tnB7FOGBdRDRHxE7gAWBSSZ0A6tLpOmBLRLSk82rrfSLiiYjYnc4uBQ7rbOOt\nY/4FrDwf08rzMd27lRMUhwIbMvMb07Ks+cCxkjYDK4HrMssCeFzSMklXtvMe3wR+WV6TzcysJ/Xv\nuEpZJgLLI+IcSUeSBMPxEfEBcFpEvC5peFreFBHPtK4o6XvAzoi4r0JtMTOzSoqI3BcwHvhVZn4m\nMKOkziKSQGidfxI4pY1tzQK+m5mfAiwBBuW8f/jll19++dX5V0ef7+W+yulRLANGSzoCeB2YDFxa\nUqcZmAAskTQCGAOsl7Q/0C8iPpA0GDgf+D+QXEkFTAPOjIgd7b15RKiMNpqZWTdR+ld7fqXkQ30e\nyTmNOyJijqSrSBJrgaRDgIXAIekqsyPifkkjgYdI0q0/cG9EzEm3uQ4YCGxJ11kaEd+p3K6ZmVkl\nlBUUZmbWd/nO7BrX1g2Nkg6UtFjSK5IekzQsU/8GSevSmx3Pr17L9x6S7pD0pqQXM2WdPoaSTpb0\nYnpj6m09vR97k3aO6SxJGyW9kL7qM8t8TDsg6TBJT0land7YfG1a3v0/q5U62eFXdV7AeuDAkrK5\nwPR0egYwJ50+FlhOMgz4ReB/SHuVffkFnA6cCLy4J8cQ+A3w5XT6UWBitfdtLzums8hczJIp/5KP\naVnH9GDgxHR6CPAKcExP/Ky6R1H72rqhcRJwdzp9N/CX6fRFwAMR0RIRr5LcET+OPi6Sy7XfLSnu\n1DGUdDBQFxHL0nr/kVmnz2nnmELy81pqEj6mHYqINyJiRTr9AdBEcqNyt/+sOihqX/DJDY3fSstG\nRMSbkPxwAZ9Ly0tvntzEn948aYnPdfIYHkpyM2qrtm5MNbgmfb7b7ZkhEh/TTpL0RZIe21I6//ve\n6ePqoKh9p0XEycBXgamSziAJjyxfsbDnfAz33I+BUZE83+0N4NYqt6cmSRoC/AK4Lu1ZdPvvu4Oi\nxkXE6+m/bwEPkwwlvZnez0Lazfx9Wn0T8IXM6oelZfanOnsMfWw7EBFvRTooDvyUT4Y9fUzLlD5s\n9RfAzyLikbS4239WHRQ1TNL+6V8XZG5oXAU0ktz1DvA3QOsPVCMwWdLA9B6X0cBzGCRj59nx804d\nw7TLv1XSOEkCLs+s01d96pimH2Ktvga8lE77mJbvTuDliJiXKev+n9Vqn8n3a4+ughgJrCC5smEV\nMDMt/wzwBMlVEYuBAzLr3EBy9UMTcH6192FveAH3AZuBHcBrwBXAgZ09hsD/Sv8f1gHzqr1fe+Ex\n/Q/gxfRn9mGSsXUf0/KP6WnArszv/AtAfVd+3zt7XH3DnZmZ5fLQk5mZ5XJQmJlZLgeFmZnlclCY\nmVkuB4WZmeVyUJi
ZWS4HhZmZ5XJQmJlZrv8PkEXkUeq6ZWYAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "results = []\n", "n_estimator_options = [30, 50, 100, 200, 500, 1000, 2000]\n", "\n", "for trees in n_estimator_options:\n", " model = RandomForestRegressor(trees, oob_score=True, n_jobs=-1, random_state=42)\n", " model.fit(X, y)\n", " print (trees, \"trees\")\n", " roc = roc_auc_score(y, model.oob_prediction_)\n", " print (\"C-stat: \", roc)\n", " results.append(roc)\n", " print (\"\")\n", " \n", "pd.Series(results, n_estimator_options).plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### max_features" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "auto option\n", "C-stat: 0.864043076726\n", "\n", "None option\n", "C-stat: 0.864043076726\n", "\n", "sqrt option\n", "C-stat: 0.86337466313\n", "\n", "log2 option\n", "C-stat: 0.86337466313\n", "\n", "0.9 option\n", "C-stat: 0.863534443273\n", "\n", "0.2 option\n", "C-stat: 0.86337466313\n", "\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYkAAAD7CAYAAACfQGjDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEQhJREFUeJzt3XuwbnVdx/H3h9sgd1HZKgw7TA1xEKREFMoT3o6aoNgF\nNEMU8Y8YLM2gmjxHbcbQdLKUJtQhzLyRaJRTYOrDpEmg3JWb4kEuHTA1QCtE/PbHXke3m/M7l72f\ntZ+1n/N+zTzDuq/fb36b83nW7/estVJVSJK0MdtNugCSpOEyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS\n1LTDpAuwMUn8Xa4kLUJVZZzHG+yVRFVN7WfNmjUTL4P1s37bWt22hfr1YbAhIUmaPENCktRkSEzA\nqlWrJl2EXlm/lWua6wbTX78+pK9+rKVIUkMslyQNWRJqWxm4liRNniEhSWoyJCRJTYO8mQ7m+tak\ncZmZmWX9+nWTLoa04gx24BqGVy6tZOntZiNpKBy4liQtK0NCktRkSEiSmsYaEklWJ7k+yY1JTt/I\n+pcmuar7fD7JweM8vyRpvMY2cJ1kO+BG4JnAHcBlwPFVdf28bY4Arququ5OsBtZW1REbOZYD1xoz\nB641/YY+cH04cFNV3VJV9wMfAY6dv0FVXVJVd3ezlwD7jvH8kqQxG2dI7AvcOm/+NjYdAicD/zzG\n80uSxmwiN9Ml+WXgJOCo9lZr502v6j6SpA1GoxGj0ajXc4xzTOII5sYYVnfzZwBVVWcu2O5JwMeB\n1VX19caxHJPQmDkmoek39DGJy4DHJplNshNwPHDB/A2S7M9cQLy8FRCSpOEYW3dTVT2Q5FTgIubC\n5/1VdV2S18ytrrOBPwb2Bs7K3MOZ7q+qw8dVBknSePnsJm0j7G7S9Bt6d5MkacoYEpKkJkNCktRk\nSEiSmgwJSVLTYF9fCr6+VOMzMzM76SJIK9JgQ8KfK0rS5NndJElqMiQkSU2GhCSpyZCQJDUZEpKk\nJkNCktRkSEiSmgwJSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlS02BfOpT4\nZjqN38zMLOvXr5t0MaQVI0N8A1ySguGVS9MgvvVQUysJVTXWb9h2N0mSmgwJSVKTISFJahprSCRZ\nneT6JDcmOX0j6/dKcn6Sq5JckuSgcZ5fkjReYwuJJNsB7waeCzwROCHJgQs2+0Pgiqo6BDgR+Itx\nnV+SNH7jvJI4HLipqm6pqvuBjwDHLtjmIOCzAFV1A/AzSR4xxjJIksZonCGxL3DrvPnbumXzXQUc\nB5DkcGB/YL8xlkGSNEbLfTPdnwLvSnI5cA1wBfDAxjddO296VfeRJG0wGo0YjUa9nmNsN9MlOQJY\nW1Wru/kzgKqqMzexzzeAg6vqewuWezOdeuLNdJpeQ7+Z7jLgsUlmk+wEHA9cMH+DJHsm2bGbfjVw\n8cKAkCQNx9i6m6rqgSSnAhcxFz7vr6rrkrxmbnWdDTwBODfJj4CvAK8a1/klSePns5u0jbG7SdNr\n6N1NkqQpY0hIkpoMCUlSkyEhSWoyJCRJTYN9fSn4+lKN38zM7KSLIK0ogw0Jf6YoSZNnd5MkqcmQ\nkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJ\nUpMhIUlqMiQkSU2DfelQ4pvpND4zM7OsX79u0sWQVpwM8Q1wSQqGVy6tZPFth5p6SaiqsX7DtrtJ\nktRkSEiSmgwJSVLTokIiyb2LPWGSDya5PsnVSd6XZPvFHkuS1K/FXkksZQTwg1V1YFU9CdgFOHkJ\nx5Ik9WjJ3U1J3p7kmiRXJfn1blmSnJXkq0kuTPKpJMcBVNW/zNv9UmC/pZZBktSPJd0nkeQlwJOq\n6uAk+wCXJbkYOArYv6oOSjIDXAe8f8G+OwAvB05bShkkSf1Z6
pXEkcCHAarqLmAEHM5cSJzXLb8T\n+NxG9j0LuLiqvrDEMkiSejLuO67DFoxXJHkj8PCqOqW91dp506u6jyRpg9FoxGg06vUci7rjOsm9\nVbV7khcDpwAvAB7G3BjDU4FfAk4EjgH2Ab4KvLqqzk9yMnAScHRV3dc4vndca8y841rTr487rhd7\nJVEAVfWJJEcAVwE/At5QVXcl+ThwNPAV4Fbgy8Dd3b5/BawDLpkLA86vqj9ZfBUkSX3p7dlNSXat\nqu8n2Rv4D+DIbtxiS/b1SkJj5pWEpt+QriS2xD8l2QvYEXjzlgaEJGk4fAqsthFeSWj6+RRYSdKy\nMiQkSU2GhCSpabCvL527L08aj5mZ2UkXQVqRBhsSDjJK0uTZ3SRJajIkJElNhoQkqcmQkCQ1GRKS\npCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlq\nGuyb6RJfX6rxmZmZZf36dZMuhrTiZIivCU1SMLxyaSWLr8TV1EtCVY31G7bdTZKkJkNCktQ0kZBI\nckiS503i3JKkLbfsIZFke+BQ4PnLfW5J0tZZ9MB1kl2AjwH7AtsDbwHuAf4c+D7wBeAxVfXCJGuA\nnwUOAG4FjgR2Bm4H3lpV5y04tgPXGjMHrjX9+hi4XspPYFcDt1fVrwAk2QO4FlhVVTcn+Sg//S/9\nE4Ajq+oHSU4Efr6qTlvC+SVJPVtKd9M1wLOTvDXJUcxdJdxcVTd36z+4YPsLquoHSzifJGmZLfpK\noqpuSnIYc2MLbwE+u5ldvr91Z1g7b3pV95EkbTAajRiNRr2eYyljEo8CvlNV9yV5AXAqc11KR3fd\nTR8CdquqY7oxiXur6p3dvscBx1TVKxrHdkxCY+aYhKbf0G6mOxi4NMkVwBuBPwJOAT6V5EvAnZvY\n93PAQUkuT/JrSyiDJKlHvT2WI8kzgNdX1TGL2NcrCY2ZVxKafkO7kpAkTTkf8KdthFcSmn5eSUiS\nlpUhIUlqMiQkSU2DfTMd+GY6jc/MzOykiyCtSIMNCQcZJWny7G6SJDUZEpKkJkNCktRkSEiSmgwJ\nSVKTISFJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEhSWoyJCRJTYaEJKnJkJAk\nNQ32zXSJry/V8pmZmWX9+nWTLoY0OBnia0KTFAyvXJpm8ZW5WvGSUFVj/YZtd5MkqcmQkCQ1GRKS\npKbNhkSSHyV5+7z51yd5Y7/FkiQNwZZcSdwHHJdk774LI0kali0JiR8CZwOvW7giyWySzyS5Msmn\nk+zXLT8nybuSfCHJ15IcN2+f30tyabfPmrHVRJI0dlsSEgW8B3hZkt0XrPtL4JyqOhT4UDe/wSOr\n6kjghcCZAEmeDTyuqg4Hngz8QpKjllgHSVJPtuhmuqr6XpJzgdcC/ztv1dOAF3fTf0sXBp1Pdvte\nl2SfbtlzgGcnuRwIsCvwOODzDz7r2nnTq7qPJGmD0WjEaDTq9RybvZkuyT1VtUeShwKXA+cAVNWb\nk9wFPKqqHkiyA3BHVe2T5BzgH6vq/AXH+DPghqp672bO6c10WmbeTKeVb1I30wWgqr4LfAx41bx1\n/w6c0E3/JvBvmzoGcCHwyiS7AiR5dJJHbG2hJUnLY0vHJDZ4B/CwectOA05KciXwMua6oxbu8+P5\nqvo0c2MXX0xyNXAesNviii5J6pvPbpIAu5s0DXx2kyRpWRkSkqQmQ0KS1GRISJKaBvtmup/8albq\n38zM7KSLIA3SYEPCX5pI0uTZ3SRJajIkJElNhoQkqcmQkCQ1GRKSpCZDQpLUZEhIkpoMCUlSkyEh\nSWoyJCRJTYaEJKnJkJAkNRkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqGuyb6RJfX6rlMzMzy/r16yZd\nDGlwMsTXhCYpGF65NM3iK
3O14iWhqsb6DdvuJklSkyEhSWoyJCRJTb2GRJJjkxzY5zkkSf3p+0ri\nRcATez6HJKknWx0SST6R5LIk1yQ5uVt277z1L0lyTpKnAccAb0tyeZIDkhyS5ItJrkzy8SR7jq8q\nkqRxW8yVxElV9RTgKcBrk+zNg3+vWlX1ReAC4A1VdVhVfQP4QDd/KHAtsHbxRZck9W0xN9P9TpIX\nddP7AY/bkp2S7AHsWVWf7xadC3ysvcfaedOruo8kaYPRaMRoNOr1HFsVEkmeARwNPLWq7kvyOWBn\nfvpKYufxFG3teA4jSVNq1apVrFq16sfzb3rTm8Z+jq3tbtoT+G4XEAcCR3TL70zyc0m2A148b/t7\ngT0Aquoe4LtJjuzWvRy4ePFFlyT1basey5FkJ+CTwCxwA7AXc1/5Hw68DbgL+BKwW1W9MsnTgfcC\n/wf8KrA78NfAQ4CbmRvfuHsj5/GxHFpmPpZDK18fj+Xw2U0SYEhoGvjsJknSsjIkJElNhoQkqcmQ\nkCQ1GRKSpKbBvr4UfH2pls/MzOykiyAN0mBDwp8jStLk2d0kSWoyJCRJTYaEJKnJkJiAvh/tO2nW\nb+Wa5rrB9NevD4bEBEz7H6r1W7mmuW4w/fXrgyEhSWoyJCRJTQN+VLgkaWttE++TkCQNg91NkqQm\nQ0KS1NR7SCRZneT6JDcmOX0j6/dIckGSK5Nck+QV89atS3JVkiuSXDpv+UOTXJTkhiQXJtmz73q0\n9FS/NUluS3J591m9TNV5kCXWb88k5yW5LslXkjy1Wz4t7deq34pvvySP7/4uL+/+e3eS07p1g2i/\nnuq24tuuW/e7Sa5NcnWSv0uyU7d869uuqnr7MBdCXwNmgR2BK4EDF2zzB8Bbu+mHA98GdujmbwYe\nupHjngn8fjd9OvCnfdZjAvVbA7xuEnUac/3+Bjipm94B2GPK2q9Vv6lovwXHuQPYbyjt12PdVnzb\nAY/u/m3ZqVv3UeC3Ftt2fV9JHA7cVFW3VNX9wEeAYxdsU8Du3fTuwLer6ofdfNj41c6xwLnd9LnA\ni8Za6i3XV/02rJu0RdcvyR7AL1bVOQBV9cOquqfbbsW332bqByu8/RZs8yzg61V1Wzc/hPbrq24w\nHW23PbBrkh2AXYDbu+Vb3XZ9h8S+wK3z5m/rls33buCgJHcAVwGvnbeugE8nuSzJq+ct36eq7gSo\nqvXAPmMv+Zbpq34Ap3aXke+bYHfMUup3APBfSc7pLtvPTvKQbt00tN+m6gcrv/3m+w3gw/Pmh9B+\nfdUNVnjbVdUdwDuAbzIXDv9dVZ/p9tnqthvCwPVzgSuq6tHAk4H3JNmtW3dkVR0GPB/47SRHNY4x\n5N/xLqZ+ZwGPqapDgfXAO5e70FuhVb8dgMOA93R1/B/gjG6fhd/UVmL7bap+09B+ACTZETgGOG8T\nxxhq+y2mbiu+7ZLsxdwVwyxzXU+7JXlp4xibbbu+Q+J2YP958/vxk8ueDU4Czgeoqq8D3wAO7Ob/\ns/vvt4BPMHcJBnBnkhmAJI8E7uqp/JvTS/2q6lvVdRoC7wWe0lP5N2cp9bsNuLWqvtRt9/fM/aMK\nsH4K2q9Zvylpvw2eB3y5+xvdYAj///VStylpu2cBN1fVd6rqgW6bp3f7bHXb9R0SlwGPTTLbja4f\nD1ywYJtbmKsUXeEfD9ycZJcNqZ9kV+A5wLXdPhcAr+imTwT+oc9KbEIv9esab4Pj+Em9l9ui69dd\n0t6a5PHdds8EvtpNr/j221T9pqH95q0/gQd3xwyh/Xqp25S03TeBI5LsnCTM/W1e1+2z9W23DKP0\nq4EbgJuAM7plrwFO6aYfBVwIXN19TuiWH8DciP4VwDUb9u3W7Q38a3fci4C9+q7HMtfvA922VwKf\nBGZWWv26dYcw98d+JXPfZvaclvbbTP2mpf12Ab4F7L7gmINov57qNi1tt4a5YLiauQHqHRf
bdj6W\nQ5LUNISBa0nSQBkSkqQmQ0KS1GRISJKaDAlJUpMhIUlqMiQkSU2GhCSp6f8BbkhdwN1JIzQAAAAA\nSUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "results = []\n", "max_features_options = [\"auto\", None, \"sqrt\", \"log2\", 0.9, 0.2]\n", "\n", "for max_features in max_features_options:\n", " model = RandomForestRegressor(n_estimators=1000, oob_score=True, n_jobs=-1, random_state=42, max_features=max_features)\n", " model.fit(X, y)\n", " print (max_features, \"option\")\n", " roc = roc_auc_score(y, model.oob_prediction_)\n", " print (\"C-stat: \", roc)\n", " results.append(roc)\n", " print (\"\")\n", " \n", "pd.Series(results, max_features_options).plot(kind=\"barh\", xlim=(.85,.88));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### min_samples_leaf" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 min samples\n", "C-stat: 0.864043076726\n", "\n", "2 min samples\n", "C-stat: 0.869654022731\n", "\n", "3 min samples\n", "C-stat: 0.871571384442\n", "\n", "4 min samples\n", "C-stat: 0.873478094142\n", "\n", "5 min samples\n", "C-stat: 0.874269005848\n", "\n", "6 min samples\n", "C-stat: 0.874029335634\n", "\n", "7 min samples\n", "C-stat: 0.873304998988\n", "\n", "8 min samples\n", "C-stat: 0.871866977705\n", "\n", "9 min samples\n", "C-stat: 0.869294517411\n", "\n", "10 min samples\n", "C-stat: 0.867430415748\n", "\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYQAAAEACAYAAACznAEdAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VPW5x/HPAyKKrG7IJooUFa0CKuKCjYCCVnGtgBbX\noq0LtvVWxLaCvd4r2utVW+xtsSAWwu4C4gKiRrCtAgqKCoJAMRBFQRCpioE894/fiYwxkEkyyZnl\n+3695pVzzpxz5hlI5pnfbu6OiIhInbgDEBGR9KCEICIigBKCiIhElBBERARQQhARkYgSgoiIAEkm\nBDPrY2bLzGy5mQ0p5/nGZjbDzBab2RIzuzI63sHMFpnZG9HPz8xscMJ1N5nZ0uiaESl7VyIiUmlW\n0TgEM6sDLAd6AkXAAqC/uy9LOGco0Njdh5rZ/sB7QHN3317mPmuBru6+1szygNuBs919u5nt7+4b\nUvv2REQkWcmUELoCK9x9jbsXA5OA88qc40CjaLsRsDExGUR6ASvdfW20/zNgROl5SgYiIvFKJiG0\nAgoT9tdGxxKNBDqaWRHwJnBzOffpB0xM2O8AnGZmr5rZS2Z2fPJhi4hIqqWqUbk3sMjdWwKdgYfM\nrGHpk2ZWD+gLTE24Zg+gmbt3A24FpqQoFhERqYI9kjhnHXBwwn7r6Fiiq4C7Adx9pZmtBo4AFkbP\nnwW87u6fJFxTCDweXbPAzErMbD9335h4YzPTZEsiIlXg7laZ85MpISwA2ptZWzPbE+gPzChzzhpC\nGwFm1pxQHbQq4fkBfLu6COBJoEd0TQegXtlkUMrd0+oxbNiw2GPIhJjSNS7FpJhyIa6qqLCE4O47\nzOxGYDYhgYx296Vmdl142kcBdwFjzeyt6LJb3f3T6MO+ASFZXFvm1o8AY8xsCbANuLxK70BERFIi\nmSoj3P054PAyx/6SsP0hoR2hvGu/AA4o53gxMLAywYqISM3RSOUqyMvLizuE70jHmCA941JMyVFM\nyUvXuCqrwoFpcTMzT/cYRUTSjZnhNdCoLCIiOUAJQUREACUEERGJKCGIiAighCAiIhElBBERAZQQ\nREQkooQgIiKAEoKIiESUEEREBFBCEBGRiBKCiIgASggiIhJRQhAREUAJQUREIkoIIiICKCGIiEhE\nCUFERAAlBBERiSghiIgIoIQgIiIRJQQREQGUEEREJKKEICIigBKCiIhE9og7AJHa4A4vvwzjxkFJ\nCbRtGx4HHxx+tmkD9evHHaVIvMzdKz7JrA/wAKFEMdrd7ynzfGNgPHAwUBe4z93HmlkHYDLggAHt\ngN+6+x8Srr0F+D2wv7t/Ws5rezIxipTn449h7Fj4619hzz3hmmugUSP44ANYs2bno6gI9ttvZ4JI\nTBal202bxv1uRJJnZri7Veqaij5szawOsBzoCRQBC4D+7r4s4ZyhQGN3H2pm+wPvAc3dfXuZ+6wF\nTnT3wuhYa+CvwOHAcUoIkgolJTBnDjz8MDz/PFxwAVx7LXTrBraLP48dO+DDD7+dKMomjTp1yk8U\npdsHHRTOEUkHVUkIyVQZdQVWuPua6EUmAecByxLOcaBRtN0I2JiYDCK9gJWlySByP/ArYEZlghYp\nT1ERjBkDo0dDs2YwaFAoGTRpUvG1detC69bhcfLJ333eHTZv/m6iWLhw5/amTeH6XSWNNm1gr71S\n/75FUiWZhNAKSPwQX0tIEolGAjPMrAhoCPQr5z79gImlO2bWFyh09yW2q69tIhXYsQOefTaUBubO\nhUsugWnT4LjjUvs6ZiHJNGsGnTqVf85XX0Fh4beTxty5O7fXrg3Xl1e66No1lDBE4pSqRuXewCJ3\n72FmhwHPm9kx7r4VwMzqAX2B26L9vYHbgTMS7qGsIElbsyaUBsaMgVatQmkgPx8aNowvpr32gu99\nLzzKs2MHrF//7SqpZcvguefgiivgxBNh4EA4/3zYZ5/ajV0Ek
ksI6wiNxaVaR8cSXQXcDeDuK81s\nNXAEsDB6/izgdXf/JNo/DDgEeNNC8aA18LqZdXX3j8sGMHz48G+28/LyyMvLSyJsyTbFxfDUU6E0\nMH8+XHopPP00HHNM3JElp25daNkyPE466dvPffEFTJ8O48fDjTfCueeG5NCjR7hOpCIFBQUUFBRU\n6x7JNCrXJTQS9wQ+BOYDA9x9acI5DwEfu/udZtackAiOLW0kNrOJwHPu/uguXmM10MXdN5XznBqV\nc9zKlaEtYOxYaN8+NBBffDHsvXfckdWM9eth0qTQRbaoKCS+gQPh2GPjjkwySY30Mopu3Ad4kJ3d\nTkeY2XWAu/soM2sBjAVaRJfc7e4To2sbAGuAdu7++S7uvwo4Xr2MpNS2bfDkkzBqFCxZEj4Qf/IT\nOPLIuCOrXUuXhlLD+PGhcXzgwJAgWrWKOzJJdzWWEOKkhJBbli0LVULjxsH3vx9KA+efr0FjJSXw\nyivh3+Wxx6BLl5AcLrwwjKsQKUsJQTLSl1+GnkEPPwwrVsCVV4YBZO3bxx1Zevrqq9CWMn58GH19\n9tkhOZxxBuyhuQckooQgGeWtt0ISmDAhdLscNCg0ptarF3dkmWPDBpg8OSSH1auhf/+QHLp02fUg\nPMkNSgiS9rZuDR9gDz8c+uVfcw1cfXXoiy/Vs2LFzvaG+vV3tjfo3zY3KSFI2nr99ZAEpkyB7t1D\naaBPH1Vx1AR3+Oc/Q3vD1Klw9NEhOVx8cXKjtiU7KCFIWvnss1Ad9PDDYVqHa66Bq65SD5natG1b\nGMk9bhy88AKceWZIDr17h8n+JHspIUjaeOWVMKlcXl4oDfTqpYnf4rZpUyihjR8P770XpvkYODC0\n36i9IfsoIUhamDMHBgwIU0mceWbc0Uh5Vq0K/z/jxoX9H/84PNq1izcuSR0lBIndU0+FqqHHHgtt\nBZLe3GHBgpAYJk+GDh1CqeFHP4J99407OqkOJQSJ1eTJMHgwzJwJJ5wQdzRSWcXFMGtWSA6zZsFF\nF8H990PjxnFHJlVRlYSgWl1JiTFj4Be/CAvSKBlkpnr14JxzQmJfsybsd+oEr74ad2RSW1RCkGr7\n4x/h978PyeDww+OORlLpiSfgpz+Fm26CoUM182omUZWR1LoRI0K30hdegEMOiTsaqQlr14Z2hZKS\n0EOpTZu4I5JkqMpIao07/OY38Le/wbx5SgbZrHXr0HPsrLPg+OPDvFOSnVRCkEpzD+0FL78Ms2fD\nAQfEHZHUltKFiU4/HR54QCu7pTOVEKTG7dgRpqR+7TV46SUlg1zTtSssWgRffx0m0HvjjbgjklRS\nQpCkFRfD5ZfD+++HkkHTpnFHJHFo1AgefRSGDQtTYNx3X2hfkMynKiNJyrZt0K9fSArTpmXv8pVS\nOatXw2WXQcOGIUm0aFHxNVI7VGUkNeKLL6Bv3zAz6RNPKBnIToceCnPnwkknhSqkmTPjjkiqQyUE\n2a0tW8JgpUMOCYPPNF217Mq8eWE+pL59w7iUvfaKO6LcphKCpNSnn4ZZSo86CsaOVTKQ3eveHRYv\nhvXrw2j1t9+OOyKpLCUEKdf69WHq6tNOgz/9SVNXS3KaNQtTX/zyl6Fr6p/+FLopS2ZQlZF8R2Fh\nKBlceinccYfmypeqWb48/A61bBmqG/ffP+6IcouqjKTaVq4MpYJrrw3dCpUMpKo6dIB//AOOOCJM\nkjdnTtwRSUVUQpBvLF0aFrT59a/DhGYiqTJnDlxxReiietddWr6zNqiEIFW2eDH06AH/9V9KBpJ6\nvXqF37Fly+Dkk0N1kqQfJQTh1VfDiNORI8NIZJGacMABMH06XH01nHJKaFdQ4T+9qMooxxUUhOUS\nH30Uzj477mgkV7z9dlh3u2NH+POfQ+8kSS1VGUmlPPNMSAZTpigZSO06+ugwc2rz5qHBed68uCMS\nUAkhZz32GFx/PTz5ZJh2Q
CQuM2fCoEHhcccdGgCZKjVWQjCzPma2zMyWm9mQcp5vbGYzzGyxmS0x\nsyuj4x3MbJGZvRH9/MzMBkfP3WtmS6NrHjMzLeVdS8aNgxtvhOeeUzKQ+J1zTphG+9VXQ5fn1avj\njih3VZgQzKwOMBLoDRwFDDCzI8qcdgPwjrt3Ak4H7jOzPdx9ubt3dvcuwHHAv4HHo2tmA0dF16wA\nhqbkHclu/fnPYW3cF16Azp3jjkYkaNEifEG56KKw5sKECXFHlJuSKSF0BVa4+xp3LwYmAeeVOceB\nRtF2I2Cju28vc04vYKW7rwVw9znuXjqL+qtA66q8AUnefffBPfeElc46dow7GpFvq1MHbrkFZs2C\nO+8MPd62bIk7qtySTEJoBRQm7K+NjiUaCXQ0syLgTeDmcu7TD5i4i9e4Gng2iVikCtzDH9ioUWGq\n4sMOizsikV0rXYmtfv2wPX9+3BHljlQ13/QGFrl7DzM7DHjezI5x960AZlYP6AvcVvZCM/s1UOzu\nuywkDh8+/JvtvLw88vLyUhR29nOHW28N37rmzg29OkTS3T77wMMPh84P554LP/95+D2uWzfuyNJX\nQUEBBQUF1bpHhb2MzKwbMNzd+0T7twHu7vcknDMTuNvd/x7tvwAMcfeF0X5f4PrSeyRcdyUwCOjh\n7tt28frqZVRFJSWh8XjhwlA/u+++cUckUnmFhTBwYJhXa9w4aK3K5aTUVC+jBUB7M2trZnsC/YEZ\nZc5ZQ2gjwMyaAx2AVQnPD6BMdZGZ9QF+BfTdVTKQqtu+Ha66KgwAmjNHyUAyV5s2oRNEr15w4olQ\nVBR3RNkrqXEI0Yf3g4QEMtrdR5jZdYSSwigzawGMBUpXVL3b3SdG1zYgJIx27v55wj1XAHsCG6ND\nr7r79eW8tkoIlfT112ESsS1bwpKXDRrEHZFIavzudzB7Nrz0EtSrF3c06a0qJQQNTMsyX34JF18c\n/lgmTw4NcyLZoqQktCl06AD33x93NOlNU1fkuK1b4Yc/hCZNYOpUJQPJPnXqhHaE6dPDlCuSWkoI\nWWLz5rCWwWGHhT8YFaclW+27b+h9dMMN8O67cUeTXZQQssAnn4S1DLp2DWMN1DVPsl3nznDvvWFk\n8+efV3y+JEdtCBnuww9D74vzzw8rUWnJS8kl114LmzaF6iP97n+b2hByzAcfhMnALrssrHSmPwjJ\nNX/4Q5gMTw3MqaESQoZ6//1QMvj5z8NDJFetWRPGJ0yZEr4gSaASQo54913Iy4Pbb1cyEGnbNqz4\nN2BAqEKVqlNCyDCLF0PPnjBiRKg/FZGwJvh118Ell0BxcdzRZC5VGWWQ116Dvn3hoYfC4DMR2al0\n0Nrhh8P//m/c0cRPVUZZbO7c8Ms+ZoySgUh5SgetPfmkBq1VlUoIGWD27NCTaNKkUF0kIrv2xhuh\nCmnuXDjyyLijiY9KCFloxgz48Y/DJHVKBiIV69IlrAx44YUatFZZKiGkscmTYfBgmDkTTjgh7mhE\nMsugQWFKl1wdtKYSQhZ59FH4xS/g+eeVDESq4o9/1KC1ylIJIQ393//Bf/93SAZHHBF3NCKZ61//\nCoPWpk7NvUFrKiFkgfvuC5N2vfyykoFIdR1yiAatVYZKCGnCPUxON358WPKyTZu4IxLJHnfeGf6u\nXnwxd6aG14ppGcodhg6Fp58O1UQHHRR3RCLZpaQEzjknlLpzZdCaqowyUEkJ3HxzSAQFBUoGIjWh\nTp1Q+n7iCQ1a2x2VEGK0Y0eYf+Xdd+GZZ6Bp07gjEsluuTRoTSWEDFJcDJdfDqtWhZHISgYiNa9L\nlzAxpAatlU8lhBhs2xZ6PXz1VVgbdu+9445IJLf85CewZUsY/Jmtg9ZUQsgAX34ZlruEUJ+pZCBS\n+0aOhJUr4YEH4o4kvaiEUIu2bg3TV7doEfpG77FH3BGJ5K7Vq6FbN5g2Dbp3jzua1FMJIY1
t3gxn\nngmHHQZ/+5uSgUjcDj0Uxo6F/v01aK2UEkIt2LAhzFR6/PHwl79A3bpxRyQiAGedFVYe7NdPK62B\nEkKN++ijsP7xmWfCgw+G/tAikj5++1to2BBuuy3uSOKnj6caVFgYJtTq3z9MVpetvRlEMlnpoLXH\nHw+T4OWypBKCmfUxs2VmttzMhpTzfGMzm2Fmi81siZldGR3vYGaLzOyN6OdnZjY4eq6Zmc02s/fM\nbJaZNUnpO4vZqlUhGfz0p/Cb3ygZiKSzffcNXcCvvx6WLo07mvhU2MvIzOoAy4GeQBGwAOjv7ssS\nzhkKNHb3oWa2P/Ae0Nzdt5e5z1qgq7uvNbN7gI3ufm+UZJq5+3cKbZnYy2jZMjjjDLj9dvjZz+KO\nRkSSNXp0mHF4/vxQjZTJaqqXUVdghbuvcfdiYBJwXplzHGgUbTcifNBvL3NOL2Clu6+N9s8DHo22\nHwXOr0zg6eqtt6BHjzBzqZKBSGa55ho4+eTwM8O+h6ZEMgmhFVCYsL82OpZoJNDRzIqAN4Gby7lP\nP2Biwv6B7r4ewN0/Ag5MNuh0tWBBKBk88ABccUXc0YhIVYwcCe+/HzqB5JpU9YbvDSxy9x5mdhjw\nvJkd4+5bAcysHtAX2F07/i7z8fDhw7/ZzsvLIy8vLxUxp9Qrr4T5UUaPhnPPjTsaEamqvfYKg9W6\ndQtdxU89Ne6IklNQUEBBQUG17pFMG0I3YLi794n2bwPc3e9JOGcmcLe7/z3afwEY4u4Lo/2+wPWl\n94iOLQXy3H29mR0EvOTu35l/MBPaEObMCXMTTZgQSggikvmefRYGDYKFCzNzWvqaakNYALQ3s7Zm\ntifQH5hR5pw1hDYCzKw50AFYlfD8AL5dXUR0jyuj7SuA6ZUJPF08/TRcemnooaBkIJI9zjorTIKX\nS4PWkprLyMz6AA8SEshodx9hZtcRSgqjzKwFMBZoEV1yt7tPjK5tQEgY7dz984R77gtMAdpEz1/i\n7pvLee20LSFMmwY33ABPPQVdu8YdjYikWkkJ/PCHcNRR8D//E3c0laMlNGvRuHFw662hWNmpU9zR\niEhN2bgxtCX8/vdw8cVxR5M8JYRaMmoU/O53YdnLbF91SUTg9dehTx+YNy+sy5wJNNtpLXj66TAN\nRUGBkoFIrjjuOLj77tCTcOvWuKOpOSohVNJ554VfCo0zEMk911wD//43TJyY/tPRqMqohn36KbRr\nBx98AI0bxx2NiNS2L7+EU04JXwhvLm/4bRqpSkLQMi2VMHUq9O6tZCCSq/beO3Qx79YtVCNlyqC1\nZKkNoRLGj4fLLos7ChGJ06GHwiOPhGntP/oo7mhSS1VGSVqzJnwjKCqCPfeMOxoRiduwYaFzyZw5\nUK9e3NF8l3oZ1aAJE+BHP1IyEJHgjjtCFdLQoXFHkjpKCElwV3WRiHxb3bqQnw/Tp4eu6GlQkVFt\nalROwptvhq5mJ58cdyQikk722w/mzg1rpm/aBPfem/7dUXdHJYQk5OeH0kEd/WuJSBktWsDLL4cp\n8AcNgh074o6o6tSoXIEdO6BtW5g9Gzp2jC0MEUlzW7fCBRdAkybhS2T9+vHGo0blGjB3LhxwgJKB\niOxew4Ywc2ZoS+jbN1QzZxolhAqoMVlEklW/PkyeDK1ahfVRNm2KO6LKUZXRbnz1FbRsCUuWhP9g\nEZFkuMN//EeYEXn27HhWXFOVUYo9/TR07qxkICKVYxYW1OnXL0xvsXp13BElR91Od6O0d5GISGWZ\nwa9/DU2bwmmnwXPPhZXX0pmqjHZh0yY45JAws2mTJrX+8iKSRfLz4ZZbwnK7J5xQO6+pKqMUmjYt\nDDZRMhCR6rrsMnj44bA+80svxR3Nrikh7IKqi0Qklc49N0yh369fmO4iHanKqBwffBAak4uK4h9c\nIiLZZeHCkBzuuQcuv7zmXkcL5KTIxIlw0UVKBiKSesc
fDy++GBbb2rwZBg+OO6KdlBDKkZ8PI0fG\nHYWIZKsjj4R583YOXrvjjvSYFE9tCGW89VbI2tm2NJ6IpJe2bUNSeOIJ+PnPoaQk7oiUEL4jPx8u\nvVQzm4pIzWvePKy69vrrcPXVsH17vPGoUTlBSUkYe/D00/D979fKS4qI8MUXO9stJ02Cvfaq/j01\nDqGa5s2DZs2UDESkdjVoELqi1q8fxip8/nk8cSghJNDYAxGJy557hrXb27eHnj1h48bajyGphGBm\nfcxsmZktN7Mh5Tzf2MxmmNliM1tiZlcmPNfEzKaa2VIze8fMToyOH2tm/zSzRWY238yOT9m7qoJt\n2+Cxx2DAgDijEJFcVrcu/PnP0KNHmP9o3braff0KE4KZ1QFGAr2Bo4ABZnZEmdNuAN5x907A6cB9\nZlbapfVB4Bl3PxI4FlgaHb8XGObunYFhwO+r+2aq45lnQlVRmzZxRiEiuc4MRowIg9a6d4eVK2vv\ntZMZh9AVWOHuawDMbBJwHrAs4RwHGkXbjYCN7r7dzBoD3d39SgB33w5sic4rAUpnCmoK1HIu/Lb8\nfPjxj+OMQERkpyFDQpvmaafBs8/CMcfU/GsmkxBaAYUJ+2sJSSLRSGCGmRUBDYF+0fFDgQ1m9gih\ndLAQuNndvwR+Acwys/sAA06u8ruops2bw0IWf/1rXBGIiHzXtdeGCTbPOAOefBJOOqlmXy9VI5V7\nA4vcvYeZHQY8b2bHRPfvAtzg7gvN7AHgNkIV0c8IyeFJM7sYGAOcUd7Nhw8f/s12Xl4eeXl5KQo7\neOyx0IjTtGlKbysiUm39+kHjxmGd5gkTQnIoT0FBAQUFBdV6rQrHIZhZN2C4u/eJ9m8D3N3vSThn\nJnC3u/892n8BGEIoWfzT3dtFx08Fhrj7uWa22d2bJtzjM3f/zmTTtTEOoUcPuOGG0A9YRCQdvfJK\n+Iz605+S+6yqqXEIC4D2ZtbWzPYE+gMzypyzBugVBdEc6ACscvf1QKGZdYjO6wm8G22vM7MfRNf0\nBJZXJvBUWbcOFi8OfX9FRNLVqafCrFlw000wZkzNvEaFVUbuvsPMbgRmExLIaHdfambXhad9FHAX\nMNbM3oouu9XdP422BwP5ZlYPWAVcFR2/FnjQzOoCX0X7tW7iRLjwwtSMDBQRqUmdOoWpLs48M0yK\nd8stqb1/zk9d0akT3H8/nH56jb2EiEhKFRaGpHDRRfCf/1n+TKmauqKS3nkHNmyAH/wg7khERJLX\npg3MnRu6o954Y+pmSs3phKCZTUUkUx1wQFif+e23YeBAKC6u/j1ztsqopATatQsTSh17bMpvLyJS\nK778Ei65BNzDms177x2Oq8qoEv7+d2jYsHZG/4mI1JS994bHHw8D2Pr0gc8+q/q9cjYhlM5smg7L\n1omIVEe9ejBuHBx9dBhX9cknVbtPTlYZff01tGwZVilq2zaltxYRiY07/Pa3MG0avPeeqoyS8txz\n0LGjkoGIZBczuOsu+OUvq3h9LpYQLrkkzF103XUpva2ISNqoSqNyziWELVtCH97Vq2HffVN2WxGR\ntKJeRkl4/PEwKlnJQETk23IuIWjdZBGR8uVUlVFRERx1VPhZOnhDRCQbqcqoApMmwfnnKxmIiJQn\npxKC1k0WEdm1nEkIS5fCRx9BilffFBHJGjmTEPLzoX9/qFs37khERNJThSumZQP3sDj1tGlxRyIi\nkr5yooTwz3+GJTI7d447EhGR9JUTCUEzm4qIVCzrxyEUF4eZTefPh0MPTWFgIiJpTOMQyjFrFnTo\noGQgIlKRrE8IGnsgIpKcrK4y+vzzMLPp++/D/vunODARkTSmKqMynngCundXMhARSUZWJwTNbCoi\nkrysrTL66CM48khYtw4aNKiBwERE0piqjBJMngx9+yoZiIgkK2sTwvjxqi4SEamMpBKCmfUxs2Vm\nttzMhpTzfGMzm2F
mi81siZldmfBcEzObamZLzewdMzsx4bmbouNLzGxESt4RsHw5rF0LPXqk6o4i\nItmvwsntzKwOMBLoCRQBC8xsursvSzjtBuAdd+9rZvsD75nZeHffDjwIPOPuPzKzPYAG0X3zgHOB\n77v79ui6lCid2XSPnJi6T0QkNZIpIXQFVrj7GncvBiYB55U5x4FG0XYjYGP0Id8Y6O7ujwC4+3Z3\n3xKd9zNgRJQ0cPcN1XwvIRBX7yIRkapIJiG0AgoT9tdGxxKNBDqaWRHwJnBzdPxQYIOZPWJmb5jZ\nKDMrXcCyA3Camb1qZi+Z2fFVfxs7vfZaWPPguONScTcRkdyRqkbl3sAid28JdAYeMrOGhCqpLsBD\n7t4F+AK4LbpmD6CZu3cDbgWmpCIQzWwqIlI1ydSyrwMOTthvHR1LdBVwN4C7rzSz1cARhJJFobsv\njM6bBpQ2Sq8FHo+uWWBmJWa2n7tvLBvA8OHDv9nOy8sjbxfrYBYXw5Qp8I9/JPGuRESySEFBAQUF\nBdW6R4UD08ysLvAeoVH5Q2A+MMDdlyac8xDwsbvfaWbNgYXAse7+qZm9DAxy9+VmNgxo4O5DzOw6\noKW7DzOzDsDz7t62nNdPemDas8/C734XFsQREcllVRmYVmEJwd13mNmNwGxCFdNod18afaC7u48C\n7gLGmtlb0WW3uvun0fZgIN/M6gGrCKUJgDHAGDNbAmwDLq9M4OXR2AMRkarLmqkrtm6F1q3DGIQD\nD6yFwERE0lhOT10xfTqccoqSgYhIVWVNQtDYAxGR6smKKqOPPw7LZK5bB/vsU0uBiYiksZytMpo8\nGc45R8lARKQ6siIhaN1kEZHqy/gqo/ffD43J69ZpMjsRkVI5WWWUnw/9+ikZiIhUV0YnBM1sKiKS\nOhmdEBYuDEmha9e4IxERyXwZnRA0s6mISOpkbKPy9u1hqop58+B734shMBGRNJZTjcovvAAHH6xk\nICKSKhmbEDT2QEQktTKyyujf/w7VRcuWQfPmMQUmIpLGcqbKaMYMOPFEJQMRkVTKyISgsQciIqmX\ncVVGGzZA+/awdi00bBhjYCIiaSwnqoymTIGzz1YyEBFJtYxLCFo3WUSkZmRUldGqVdCtW5jZtF69\nmAMTEUljWV9lNGECXHKJkoGISE3ImISgmU1FRGpWxiSEN96Ar78OVUYiIpJ6GZMQ8vPh0ks1s6mI\nSE3JiEbl7dudNm3gxRfhiCPijkhEJP1lbaPySy9By5ZKBiIiNSkjEoLGHoiI1LyMqDJq2tR5911o\n0SLuaESOolSeAAAFzUlEQVREMkONVRmZWR8zW2Zmy81sSDnPNzazGWa22MyWmNmVCc81MbOpZrbU\nzN4xsxPLXHuLmZWY2b67ev0TTlAyEBGpaRUmBDOrA4wEegNHAQPMrGxt/g3AO+7eCTgduM/M9oie\nexB4xt2PBI4FlibcuzVwBrBmdzGkW3VRQUFB3CF8RzrGBOkZl2JKjmJKXrrGVVnJlBC6AivcfY27\nFwOTgPPKnONAo2i7EbDR3bebWWOgu7s/AuDu2919S8J19wO/qiiACy5IIspalI7/+ekYE6RnXIop\nOYopeekaV2UlkxBaAYUJ+2ujY4lGAh3NrAh4E7g5On4osMHMHjGzN8xslJntDWBmfYFCd19SUQCN\nGycRpYiIVEuqehn1Bha5e0ugM/CQmTUE9gC6AA+5exfgC+C2KCncDgxLuIeGnImIxMndd/sAugHP\nJezfBgwpc85M4JSE/ReA44HmwKqE46cCTwFHAx8Bq4DVQDHwL+DAcl7f9dBDDz30qPyjos/3so/S\nht/dWQC0N7O2wIdAf2BAmXPWAL2Av5tZc6ADIRF8amaFZtbB3ZcDPYF33f1t4KDSi81sNdDF3TeV\nffHKdpsSEZGqqTAhuPsOM7sRmE2oYhrt7kvN7LrwtI8C7gLGmtlb0WW3uvun0fZgI
N/M6hFKBFeV\n9zKoykhEJFZpPzBNRERqR9pOXWFmo81sfUKpI3Zm1trMXowG2C0xs8FpEFN9M3vNzBZFMQ2LO6ZS\nZlYn6l02I+5YAMzsX2b2ZvRvNT/ueEpVNHgzhng6RP9Gb0Q/P0uT3/VfmNnbZvaWmeWb2Z5pENPN\n0d9dbJ8H5X1WmlkzM5ttZu+Z2Swza5LMvdI2IQCPEHovpZPtwC/d/SjgJOCGcgbp1Sp33wac7u6d\ngU7AWWbWNc6YEtwMvBt3EAlKgDx37+zu6fJvBLsZvBkHd18e/Rt1AY4D/g08EWdMZtYSuInQ1ngM\nobq7f8wxHQVcQ+hA0wk4x8zaxRBKeZ+VtwFz3P1w4EVgaDI3StuE4O6vAN9pZI6Tu3/k7ouj7a2E\nP9yyYzJqnbt/EW3WJ/yhxF4PGI1CPxv4a9yxJDDS7Hc+icGbcesFrHT3wgrPrHl1gX2iWRAaAEUx\nx3Mk8Jq7b3P3HcBc4MLaDmIXn5XnAY9G248C5ydzr7T648gkZnYI4VvBa/FG8k3VzCJCV97n3X1B\n3DGxcxR67MkpgQPPm9kCMxsUdzCRXQ7eTBP9gIlxB+HuRcB9wAfAOmCzu8+JNyreBrpH1TMNCF+A\n2sQcU6kD3X09hC+ywIHJXKSEUAXRoLtpwM1RSSFW7l4SVRm1Bk40s45xxmNmPwTWR6UpI316kJ0S\nVYOcTajuOzXugNjF4M14QwqinoF9galpEEtTwrfetkBLoKGZXRpnTO6+DLgHeB54BlgE7Igzpt1I\n6ouZEkIlRcXVacA4d58edzyJoqqGl4A+MYdyCtDXzFYRvl2ebmZ/izkm3P3D6OcnhDrxdGhHWEuY\nwmVhtD+NkCDSwVnA69G/V9x6EY1tiqpnHgdOjjkm3P0Rdz/e3fOAzcDymEMqtT4aE4aZHQR8nMxF\n6Z4Q0unbZakxhMF1D8YdCICZ7V/agyCqajgDWBZnTO5+u7sf7O7tCA1/L7r75XHGZGYNopIdZrYP\ncCahyB+rqFhfaGYdokM9SZ+G+AGkQXVR5AOgm5ntZWZG+HeKtfEdwMwOiH4eDFwATIgrFL79WTkD\nuDLavgJI6strMiOVY2FmE4A8YD8z+wAYVtrwFmNMpwCXAUuiOnsHbnf352IMqwXwaDRNeR1gsrs/\nE2M86ao58ISZOeH3Pt/dZ8ccU6lkBm/WqqhOvBdwbdyxALj7fDObRqiWKY5+joo3KgAei9ZyKQau\nj6NDQHmflcAIYKqZXU2YSeKSpO6lgWkiIgLpX2UkIiK1RAlBREQAJQQREYkoIYiICKCEICIiESUE\nEREBlBBERCSihCAiIgD8PyYYv/RexgaHAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "results = []\n", "min_samples_leaf_options = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", "\n", "for min_samples in min_samples_leaf_options:\n", " model = RandomForestRegressor(n_estimators=1000, \n", " oob_score=True, \n", " n_jobs=-1, \n", " random_state=42, \n", " max_features=\"auto\", \n", " min_samples_leaf=min_samples)\n", " model.fit(X, y)\n", " print (min_samples, \"min samples\")\n", " roc = roc_auc_score(y, model.oob_prediction_)\n", " print (\"C-stat: \", roc)\n", " results.append(roc)\n", " print (\"\")\n", " 
\n", "pd.Series(results, min_samples_leaf_options).plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final model" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C-stat: 0.874269005848\n" ] } ], "source": [ "# Refit on all rows with min_samples_leaf=5; out-of-bag predictions give the score\n", "model = RandomForestRegressor(n_estimators=1000,\n", "                              oob_score=True,\n", "                              n_jobs=-1,\n", "                              random_state=42,\n", "                              # 1.0 = consider every feature at each split, which is what\n", "                              # \"auto\" meant for regressors before newer scikit-learn removed it\n", "                              max_features=1.0,\n", "                              min_samples_leaf=5)\n", "model.fit(X, y)\n", "# oob_prediction_ is continuous, so it can be ranked as a score for ROC AUC\n", "roc = roc_auc_score(y, model.oob_prediction_)\n", "print(\"C-stat:\", roc)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }