{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# OPTaaS Scikit-learn Pipelines\n", "\n", "### Note: To run this notebook, you need an API Key. You can get one here.\n", "\n", "Using the OPTaaS Python Client, you can optimize any scikit-learn pipeline. For each step or estimator in the pipeline, OPTaaS only needs to know which parameters to optimize and which constraints apply to them.\n", "\n", "Your pipeline can even include **optional** steps (such as feature selection), **choice** steps (such as choosing between a set of classifiers) and **nested** pipelines.\n", "\n", "We have provided pre-defined parameters and constraints for some of the most widely used estimators, such as Random Forest and XGBoost. The example below demonstrates how to use them. See also our [tutorial on defining your own custom optimizable estimators](07.%20Custom%20Scikit-learn%20Estimators.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load your dataset\n", "\n", "We will run a classification pipeline using the German Credit Data available [here](https://newonlinecourses.science.psu.edu/stat857/node/215/). The data contains 1000 rows, with 20 feature columns and 1 target column (`Creditability`) that takes 2 classes." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "data = pd.read_csv('../../data/german_credit.csv')\n", "\n", "# Split the feature columns from the 'Creditability' target column\n", "features = data[data.columns.drop(['Creditability'])]\n", "target = data['Creditability']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create your OptimizablePipeline\n", "\n", "Our pipeline will include:\n", "\n", "- An optional feature selection step using PCA\n", "\n", "- A choice of classifier from: Random Forest, Extra Trees and Gradient Boosting" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from mindfoundry.optaas.client.sklearn_pipelines.estimators.pca import PCA\n", "from mindfoundry.optaas.client.sklearn_pipelines.estimators.ensemble import RandomForestClassifier, ExtraTreesClassifier, GradientBoostingClassifier\n", "from mindfoundry.optaas.client.sklearn_pipelines.mixin import OptimizablePipeline, choice, optional_step\n", "\n", "optimizable_pipeline = OptimizablePipeline([\n", "    ('feature_selection', optional_step(PCA())),\n", "    ('classification', choice(\n", "        RandomForestClassifier(),\n", "        ExtraTreesClassifier(),\n", "        GradientBoostingClassifier()\n", "    ))\n", "])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connect to the OPTaaS server using your API Key\n", "\n", "We now create a client and connect to the web service that will perform our optimization. You will need to input your personal API key. Keep your key private and don't commit it to your version control system. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from mindfoundry.optaas.client.client import OPTaaSClient\n", "\n", "client = OPTaaSClient('https://optaas.mindfoundry.ai', '')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create your Sklearn Task\n", "\n", "We don't need to worry about specifying all the parameters and constraints - they are generated based on our OptimizablePipeline. Sometimes we will need to provide additional kwargs, e.g. `feature_count` which is required by PCA.\n", "\n", "If we do need to optimize any additional parameters that are outside of our pipeline, we can include them in `additional_parameters` and `additional_constraints`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 'pipeline',\n", " 'name': 'pipeline',\n", " 'type': 'group',\n", " 'items': [{'id': 'pipeline__feature_selection',\n", " 'name': 'feature_selection',\n", " 'type': 'group',\n", " 'optional': True,\n", " 'items': [{'id': 'pipeline__feature_selection__n_components',\n", " 'name': 'n_components',\n", " 'type': 'integer',\n", " 'minimum': 1,\n", " 'maximum': 20},\n", " {'id': 'pipeline__feature_selection__whiten',\n", " 'name': 'whiten',\n", " 'type': 'boolean',\n", " 'default': False}]},\n", " {'id': 'classification',\n", " 'name': 'classification',\n", " 'type': 'choice',\n", " 'choices': [{'id': 'pipeline__classification__0',\n", " 'name': '0',\n", " 'type': 'group',\n", " 'items': [{'id': 'pipeline__classification__0__max_features',\n", " 'name': 'max_features',\n", " 'type': 'categorical',\n", " 'default': 'auto',\n", " 'enum': ['auto', 'sqrt', 'log2']},\n", " {'id': 'pipeline__classification__0__min_samples_split',\n", " 'name': 'min_samples_split',\n", " 'type': 'integer',\n", " 'default': 2,\n", " 'minimum': 2,\n", " 'maximum': 20,\n", " 'distribution': 'Uniform'},\n", " {'id': 'pipeline__classification__0__min_samples_leaf',\n", " 
'name': 'min_samples_leaf',\n", " 'type': 'integer',\n", " 'default': 1,\n", " 'minimum': 1,\n", " 'maximum': 20},\n", " {'id': 'pipeline__classification__0__criterion',\n", " 'name': 'criterion',\n", " 'type': 'categorical',\n", " 'default': 'gini',\n", " 'enum': ['gini', 'entropy']},\n", " {'id': 'pipeline__classification__0__max_leaf_nodes',\n", " 'name': 'max_leaf_nodes',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'includeInDefault': False,\n", " 'minimum': 10,\n", " 'maximum': 10000,\n", " 'distribution': 'LogUniform'},\n", " {'id': 'pipeline__classification__0__max_depth',\n", " 'name': 'max_depth',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'includeInDefault': False,\n", " 'minimum': 1,\n", " 'maximum': 100,\n", " 'distribution': 'LogUniform'},\n", " {'id': 'pipeline__classification__0__min_weight_fraction_leaf',\n", " 'name': 'min_weight_fraction_leaf',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0.0,\n", " 'maximum': 0.5},\n", " {'id': 'pipeline__classification__0__min_impurity_decrease',\n", " 'name': 'min_impurity_decrease',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0,\n", " 'maximum': 1},\n", " {'id': 'pipeline__classification__0__bootstrap',\n", " 'name': 'bootstrap',\n", " 'type': 'boolean',\n", " 'default': True},\n", " {'id': 'pipeline__classification__0__n_estimators',\n", " 'name': 'n_estimators',\n", " 'type': 'integer',\n", " 'default': 10,\n", " 'minimum': 10,\n", " 'maximum': 500}]},\n", " {'id': 'pipeline__classification__1',\n", " 'name': '1',\n", " 'type': 'group',\n", " 'items': [{'id': 'pipeline__classification__1__max_features',\n", " 'name': 'max_features',\n", " 'type': 'categorical',\n", " 'default': 'auto',\n", " 'enum': ['auto', 'sqrt', 'log2']},\n", " {'id': 'pipeline__classification__1__min_samples_split',\n", " 'name': 'min_samples_split',\n", " 'type': 'integer',\n", " 'default': 2,\n", " 'minimum': 2,\n", " 'maximum': 20,\n", " 'distribution': 'LogUniform'},\n", " 
{'id': 'pipeline__classification__1__min_samples_leaf',\n", " 'name': 'min_samples_leaf',\n", " 'type': 'integer',\n", " 'default': 1,\n", " 'minimum': 1,\n", " 'maximum': 20},\n", " {'id': 'pipeline__classification__1__criterion',\n", " 'name': 'criterion',\n", " 'type': 'categorical',\n", " 'default': 'gini',\n", " 'enum': ['gini', 'entropy']},\n", " {'id': 'pipeline__classification__1__max_leaf_nodes',\n", " 'name': 'max_leaf_nodes',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'includeInDefault': False,\n", " 'minimum': 10,\n", " 'maximum': 10000,\n", " 'distribution': 'LogUniform'},\n", " {'id': 'pipeline__classification__1__max_depth',\n", " 'name': 'max_depth',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'includeInDefault': False,\n", " 'minimum': 1,\n", " 'maximum': 100,\n", " 'distribution': 'Uniform'},\n", " {'id': 'pipeline__classification__1__min_weight_fraction_leaf',\n", " 'name': 'min_weight_fraction_leaf',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0.0,\n", " 'maximum': 0.5},\n", " {'id': 'pipeline__classification__1__min_impurity_decrease',\n", " 'name': 'min_impurity_decrease',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0,\n", " 'maximum': 1},\n", " {'id': 'pipeline__classification__1__bootstrap',\n", " 'name': 'bootstrap',\n", " 'type': 'boolean',\n", " 'default': False},\n", " {'id': 'pipeline__classification__1__n_estimators',\n", " 'name': 'n_estimators',\n", " 'type': 'integer',\n", " 'default': 10,\n", " 'minimum': 10,\n", " 'maximum': 500}]},\n", " {'id': 'pipeline__classification__2',\n", " 'name': '2',\n", " 'type': 'group',\n", " 'items': [{'id': 'pipeline__classification__2__max_features',\n", " 'name': 'max_features',\n", " 'type': 'categorical',\n", " 'enum': ['auto', 'sqrt', 'log2']},\n", " {'id': 'pipeline__classification__2__min_samples_split',\n", " 'name': 'min_samples_split',\n", " 'type': 'integer',\n", " 'default': 2,\n", " 'minimum': 2,\n", " 'maximum': 20,\n", " 
'distribution': 'Uniform'},\n", " {'id': 'pipeline__classification__2__min_samples_leaf',\n", " 'name': 'min_samples_leaf',\n", " 'type': 'integer',\n", " 'default': 1,\n", " 'minimum': 1,\n", " 'maximum': 20},\n", " {'id': 'pipeline__classification__2__criterion',\n", " 'name': 'criterion',\n", " 'type': 'categorical',\n", " 'default': 'friedman_mse',\n", " 'enum': ['mse', 'friedman_mse', 'mae']},\n", " {'id': 'pipeline__classification__2__max_leaf_nodes',\n", " 'name': 'max_leaf_nodes',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'includeInDefault': False,\n", " 'minimum': 10,\n", " 'maximum': 10000,\n", " 'distribution': 'LogUniform'},\n", " {'id': 'pipeline__classification__2__max_depth',\n", " 'name': 'max_depth',\n", " 'type': 'integer',\n", " 'optional': True,\n", " 'default': 3,\n", " 'minimum': 1,\n", " 'maximum': 100,\n", " 'distribution': 'Uniform'},\n", " {'id': 'pipeline__classification__2__min_weight_fraction_leaf',\n", " 'name': 'min_weight_fraction_leaf',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0.0,\n", " 'maximum': 0.5},\n", " {'id': 'pipeline__classification__2__min_impurity_decrease',\n", " 'name': 'min_impurity_decrease',\n", " 'type': 'number',\n", " 'default': 0.0,\n", " 'minimum': 0,\n", " 'maximum': 1},\n", " {'id': 'pipeline__classification__2__learning_rate',\n", " 'name': 'learning_rate',\n", " 'type': 'number',\n", " 'default': 0.1,\n", " 'minimum': 5e-324,\n", " 'maximum': 1},\n", " {'id': 'pipeline__classification__2__n_estimators',\n", " 'name': 'n_estimators',\n", " 'type': 'integer',\n", " 'default': 100,\n", " 'minimum': 10,\n", " 'maximum': 500},\n", " {'id': 'pipeline__classification__2__subsample',\n", " 'name': 'subsample',\n", " 'type': 'number',\n", " 'default': 1.0,\n", " 'minimum': 0,\n", " 'maximum': 1}]}]}]},\n", " {'id': '2544445067952',\n", " 'name': 'additional',\n", " 'type': 'group',\n", " 'items': [{'id': 'extra',\n", " 'name': 'extra',\n", " 'type': 'integer',\n", " 'minimum': 
0,\n", " 'maximum': 10}]}]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "['#extra != 7']" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from mindfoundry.optaas.client.parameter import IntParameter\n", "from mindfoundry.optaas.client.constraint import Constraint\n", "\n", "# An extra parameter outside the pipeline, constrained so that it never equals 7\n", "my_extra_param = IntParameter('extra', id='extra', minimum=0, maximum=10)\n", "my_extra_constraint = Constraint(my_extra_param != 7)\n", "\n", "task = client.create_sklearn_task(\n", "    title='My Sklearn Task',\n", "    pipeline=optimizable_pipeline,\n", "    feature_count=len(features.columns),\n", "    additional_parameters=[my_extra_param],\n", "    additional_constraints=[my_extra_constraint],\n", "    min_known_score=0, max_known_score=1  # optional: define the min and max known score values\n", ")\n", "\n", "display(task.parameters)\n", "display(task.constraints)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define your scoring function\n", "\n", "We define a function that runs our pipeline with 5-fold cross-validation and returns the mean and variance of the micro-averaged F1 scores:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score\n", "\n", "def scoring_function(pipeline):\n", "    # One f1_micro score per fold; OPTaaS receives their mean and variance\n", "    scores = cross_val_score(pipeline, features, target, scoring='f1_micro', cv=5)\n", "    return scores.mean(), scores.var()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run your task\n", "\n", "We run the task for 20 iterations and review the results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running task \"My Sklearn Task\" for 20 iterations\n", "(or until score is 1.0 or better)\n", "\n", "Iteration: 0 Score: 0.7139774505043966 Variance: 0.0011199704714439302\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, 
iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_lea...n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 1 Score: 0.7429884974795155 Variance: 0.0013301807783011328\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', GradientBoostingClassifier(criterion='friedman_mse', init=None,\n", " learning_rate=0.1, loss='deviance', max_depth=3,\n", " max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " presort='auto', random_state=None, subsample=1.0, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 2 Score: 0.6969754185323048 Variance: 0.000789097736972498\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_...n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 3 Score: 0.7449755144365922 Variance: 0.0014781269404066037\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 4 Score: 
0.7159884435333538 Variance: 0.0015111632909465863\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 5 Score: 0.7209964455473438 Variance: 0.0005489016935114717\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_no...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 6 Score: 0.6999694305083527 Variance: 0.0009542203697347021\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 7 Score: 0.7089844335353318 Variance: 0.0004593334249857999\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_nodes=5005,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, 
n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 8 Score: 0.7079744415073757 Variance: 0.0006647942864664427\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=5005,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 9 Score: 0.7279794764824704 Variance: 0.0017496840000436487\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 10 Score: 0.7289804774834715 Variance: 0.00163495165607608\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=12, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 11 Score: 0.7029784275293257 Variance: 0.0022902989142203427\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', 
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_lea...n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 12 Score: 0.7209874545203886 Variance: 0.0010947255754004488\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=8, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_n...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 13 Score: 0.7159644674614735 Variance: 0.0017269709319498695\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_nodes=5005,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 14 Score: 0.699999400598203 Variance: 1.7964125712735343e-07\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=11, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='entropy',\n", " max_depth=None, max_features='auto', max_le...,\n", " n_jobs=1, oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 15 Score: 0.6849873825921731 Variance: 0.0004643686436742\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', 
GradientBoostingClassifier(criterion='friedman_mse', init=None,\n", " learning_rate=0.1, loss='deviance', max_depth=None,\n", " max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samp... presort='auto', random_state=None, subsample=1.0, verbose=0,\n", " warm_start=False))])\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iteration: 16 Score: 0.7339884794974615 Variance: 0.001510748319642625\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', GradientBoostingClassifier(criterion='friedman_mse', init=None,\n", " learning_rate=0.1, loss='deviance', max_depth=3,\n", " max_features='sqrt', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=100,\n", " presort='auto', random_state=None, subsample=1.0, verbose=0,\n", " warm_start=False))])\n", "\n", "Iteration: 17 Score: 0.723010435585286 Variance: 0.0013184775096818067\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=50, max_features='auto', max_leaf_no...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))])\n", "\n", "Iteration: 18 Score: 0.6789903676131223 Variance: 0.0005889760651514416\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=11, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', GradientBoostingClassifier(criterion='friedman_mse', init=None,\n", " learning_rate=0.9301432147001057, loss='deviance',\n", " ...uto', random_state=None,\n", " subsample=0.26687567221016584, verbose=0, 
warm_start=False))])\n", "\n", "Iteration: 19 Score: 0.6979794165422907 Variance: 0.0004282720144984198\n", "Pipeline: Pipeline(memory=None,\n", " steps=[('classification', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=5005,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False))])\n", "\n", "Task Completed\n", "\n", "Best Result: { 'pipeline': Pipeline(memory=None,\n", " steps=[('feature_selection', PCA(copy=True, iterated_power='auto', n_components=10, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)), ('classification', ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_...timators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False))]),\n", " 'score': 0.7449755144365922,\n", " 'user_defined_data': None}\n" ] } ], "source": [ "best_result = task.run(scoring_function, 20)\n", "print(\"Best Result: \", best_result)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }