{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![](logo.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Welcome to the automatminer basic tutorial!\n", "#### Versions used to make this notebook (`automatminer 2019.10.14` and `matminer 0.6.2`, `python 3.7.3` on macOS Mojave `10.14.6`)\n", "\n", "---\n", "\n", "[Automatminer](https://github.com/hackingmaterials/automatminer) is a package for *automatically* creating ML pipelines using matminer's featurizers, feature reduction techniques, and Automated Machine Learning (AutoML). Automatminer works end to end - raw data to prediction - without *any* human input necessary. \n", "\n", "#### Put in a dataset, get out a machine that predicts materials properties.\n", "\n", "Automatminer is competitive with state-of-the-art hand-tuned machine learning models across multiple domains of materials informatics. Automatminer also includes utilities for running MatBench, a materials science ML benchmark. \n", "\n", "#### Learn more about Automatminer and MatBench from the [official documentation](http://hackingmaterials.lbl.gov/automatminer/). \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How does automatminer work?\n", "Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline. Once a pipeline has been fit, it can be summarized in a text file, saved to disk, or used to make predictions on new materials.\n", "\n", "![](pipe.png)\n", "\n", "Materials primitives (e.g., crystal structures) go in one end, and property predictions come out the other. MatPipe handles the intermediate operations such as assigning descriptors, cleaning problematic data, data conversions, imputation, and machine learning.\n", "\n", "### MatPipe is the main Automatminer object\n", "`MatPipe` is the central object in Automatminer. 
It follows the scikit-learn `BaseEstimator` syntax for `fit` and `predict` operations. Simply `fit` on your training data, then `predict` on your testing data.\n", "\n", "### MatPipe uses [pandas](https://pandas.pydata.org) dataframes as inputs and outputs. \n", "Put dataframes (of materials) in, get dataframes (of property predictions) out.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What's in this notebook?\n", "\n", "In this notebook, we walk through the basic steps of using Automatminer to train and predict on data. We'll also view the internals of our AutoML pipeline using Automatminer's API. \n", "\n", "* First, we'll load a dataset of ~4,600 band gaps collected from experimental sources.\n", "* Next, we'll fit an Automatminer `MatPipe` (pipeline) to the data.\n", "* Then, we'll predict experimental band gap from chemical composition, and see how our predictions do (note, this is not an easy problem!).\n", "* We'll examine our pipeline with `MatPipe`'s introspection methods.\n", "* Finally, we look at how to save and load pipelines for reproducible predictions.\n", "\n", "*Note: for the sake of brevity, we will use a single train-test split in this notebook. To run a full Automatminer benchmark, see the documentation for `MatPipe.benchmark`.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Preparing a dataset\n", "\n", "Let's load a dataset to play around with. For this example, we will use matminer to load one of the MatBench v0.1 datasets. If you have been through some of the machine learning or data retrieval tutorials in this repo, you will be familiar with the commands needed to fetch our dataset as a dataframe.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " gap expt\n", "count 4604.000000\n", "mean 0.975951\n", "std 1.445034\n", "min 0.000000\n", "25% 0.000000\n", "50% 0.000000\n", "75% 1.812500\n", "max 11.700000" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from matminer.datasets import load_dataset\n", "\n", "df = load_dataset(\"matbench_expt_gap\")\n", "\n", "# Let's look at our dataset\n", "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Looking at the data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " composition gap expt\n", "0 Ag(AuS)2 0.00\n", "1 Ag(W3Br7)2 0.00\n", "2 Ag0.5Ge1Pb1.75S4 1.83\n", "3 Ag0.5Ge1Pb1.75Se4 1.51\n", "4 Ag2BBr 0.00" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Seeing how many unique compositions are present\n", "We should find all the compositions are unique." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4604" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many unique compositions do we have?\n", "df[\"composition\"].unique().shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate a train-test split" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "train_df, test_df = train_test_split(df, test_size=0.2, shuffle=True, random_state=20191014)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Remove the target property from the test_df\n", "\n", "Let's remove the testing dataframe's target property so we can be sure we are not giving Automatminer any test information.\n", "\n", "Our target variable is `\"gap expt\"`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " composition\n", "4514 ZnSb\n", "834 Co1Te1.88\n", "4481 Zn2Ni9O13\n", "3958 TiAlAu2\n", "3087 Pr(MnSi)2" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "target = \"gap expt\"\n", "prediction_df = test_df.drop(columns=[target])\n", "prediction_df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " composition\n", "count 921\n", "unique 921\n", "top La2GeSe5\n", "freq 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prediction_df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fitting and predicting with Automatminer's MatPipe\n", "\n", "Our dataset contains 4,604 unique stoichiometries and experimentally measured band gaps. We have everything we need to start our AutoML pipeline.\n", "\n", "For simplicity, we will use a `MatPipe` preset. `MatPipe` is highly customizable and has hundreds of configuration options, but most use cases will be satisfied by using one of the preset configurations. We use the `from_preset` method.\n", "\n", "In this example, we'll use the \"express\" preset, which will take approximately an hour.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/ardunn/alex/lbl/projects/common_env/common_env3/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning:\n", "\n", "sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.\n", "\n" ] } ], "source": [ "from automatminer import MatPipe\n", "\n", "pipe = MatPipe.from_preset(\"express\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fitting the pipeline\n", "\n", "To fit an Automatminer `MatPipe` to the data, pass in your training data and desired target." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-10-14 20:51:56 INFO Problem type is: regression\n", "2019-10-14 20:51:56 INFO Fitting MatPipe pipeline to data.\n", "2019-10-14 20:51:56 INFO AutoFeaturizer: Starting fitting.\n", "2019-10-14 20:51:56 INFO AutoFeaturizer: Compositions detected as strings. Attempting conversion to Composition objects...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "922915b82f5040b28fad1d6f20dea9ef", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='StrToComposition', max=3683, style=ProgressStyle(description_…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:51:56 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4686c83e5f1c458e96601d7f6b79a586", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='CompositionToOxidComposition', max=3683, style=ProgressStyle(…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Will remove YangSolidSolution because it's fraction passing the precheck for this dataset (0.4051045343469997) was less than the minimum (0.9)\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Will remove Miedema because it's fraction passing the precheck for this dataset (0.4051045343469997) was less than the minimum (0.9)\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Featurizer type structure not in the dataframe to be fitted. Skipping...\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe to be fitted. 
Skipping...\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Featurizer type dos not in the dataframe to be fitted. Skipping...\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Finished fitting.\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Starting transforming.\n", "2019-10-14 20:52:55 INFO AutoFeaturizer: Featurizing with ElementProperty.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a671f3fa242244529ff99bfacf51d0a5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='ElementProperty', max=3683, style=ProgressStyle(description_w…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:53:03 INFO AutoFeaturizer: Featurizing with OxidationStates.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "75d12247bf8d4680aff919838addc8d8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='OxidationStates', max=3683, style=ProgressStyle(description_w…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:53:03 INFO AutoFeaturizer: Featurizing with ElectronAffinity.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e671628382cf44859fe8aaec0e33e336", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='ElectronAffinity', max=3683, style=ProgressStyle(description_…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:53:03 INFO AutoFeaturizer: Featurizing with IonProperty.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3b2d03fa5e824df4a4ac975704166675", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, 
description='IonProperty', max=3683, style=ProgressStyle(description_width…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 20:53:15 INFO AutoFeaturizer: Featurizer type structure not in the dataframe. Skipping...\n", "2019-10-14 20:53:15 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe. Skipping...\n", "2019-10-14 20:53:15 INFO AutoFeaturizer: Featurizer type dos not in the dataframe. Skipping...\n", "2019-10-14 20:53:15 INFO AutoFeaturizer: Finished transforming.\n", "2019-10-14 20:53:15 INFO DataCleaner: Starting fitting.\n", "2019-10-14 20:53:15 INFO DataCleaner: Cleaning with respect to samples with sample na_method 'drop'\n", "2019-10-14 20:53:15 INFO DataCleaner: Replacing infinite values with nan for easier screening.\n", "2019-10-14 20:53:16 INFO DataCleaner: Before handling na: 3683 samples, 141 features\n", "2019-10-14 20:53:16 INFO DataCleaner: 0 samples did not have target values. 
They were dropped.\n", "2019-10-14 20:53:16 INFO DataCleaner: Handling feature na by max na threshold of 0.01 with method 'drop'.\n", "2019-10-14 20:53:16 INFO DataCleaner: These 8 features were removed as they had more than 1.0% missing values: {'avg ionic char', 'compound possible', 'std_dev oxidation state', 'minimum oxidation state', 'avg anion electron affinity', 'maximum oxidation state', 'max ionic char', 'range oxidation state'}\n", "2019-10-14 20:53:16 INFO DataCleaner: After handling na: 3683 samples, 133 features\n", "2019-10-14 20:53:16 INFO DataCleaner: Finished fitting.\n", "2019-10-14 20:53:16 INFO FeatureReducer: Starting fitting.\n", "2019-10-14 20:53:16 INFO FeatureReducer: 18 features removed due to cross correlation more than 0.95\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/ardunn/alex/lbl/projects/common_env/common_env3/lib/python3.7/site-packages/sklearn/ensemble/forest.py:245: FutureWarning:\n", "\n", "The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "2019-10-14 20:54:21 INFO TreeFeatureReducer: Finished tree-based feature reduction of 114 initial features to 48\n", "2019-10-14 20:54:21 INFO FeatureReducer: Finished fitting.\n", "2019-10-14 20:54:21 INFO FeatureReducer: Starting transforming.\n", "2019-10-14 20:54:21 INFO FeatureReducer: Finished transforming.\n", "2019-10-14 20:54:21 INFO TPOTAdaptor: Starting fitting.\n", "28 operators have been imported by TPOT.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e189724fde454719a3d13af30004d4a2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Optimization Progress', max=20, style=ProgressStyle(descripti…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "_pre_test decorator: _random_mutation_operator: num_test=0 
Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=1 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required by StandardScaler..\n", "Skipped pipeline #38 due to time out. Continuing to the next pipeline.\n", "Generation 1 - Current Pareto front scores:\n", "-3\t-0.5019658700076691\tRandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)\n", "\n", "Generation 2 - Current Pareto front scores:\n", "-3\t-0.49662437839581725\tRandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)\n", "\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "Generation 3 - Current Pareto front scores:\n", 
"-3\t-0.4704077886260397\tRandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)\n", "\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "Generation 4 - Current Pareto front scores:\n", "-3\t-0.4684578070615303\tRandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)\n", "\n", "Pipeline encountered that has previously been evaluated during the optimization process. 
Using the score from the previous evaluation.\n", "Generation 5 - Current Pareto front scores:\n", "-3\t-0.46727537793493007\tRandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)\n", "\n", "Generation 6 - Current Pareto front scores:\n", "-3\t-0.46136176692378017\tRandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)\n", "\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "Generation 7 - Current Pareto front scores:\n", "-3\t-0.455318603917173\tRandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)\n", "\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "Skipped pipeline #164 due to time out. Continuing to the next pipeline.\n", "Skipped pipeline #167 due to time out. 
Continuing to the next pipeline.\n", "Generation 8 - Current Pareto front scores:\n", "-3\t-0.455318603917173\tRandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)\n", "\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "_pre_test decorator: _random_mutation_operator: num_test=0 Found array with 0 feature(s) (shape=(50, 0)) while a minimum of 1 is required..\n", "\n", "61.48513473333333 minutes have elapsed. TPOT will close down.\n", "TPOT closed during evaluation in one generation.\n", "WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.\n", "\n", "\n", "TPOT closed prematurely. 
Will use the current best pipeline.\n", "2019-10-14 21:56:26 INFO TPOTAdaptor: Finished fitting.\n", "2019-10-14 21:56:26 INFO MatPipe successfully fit.\n" ] }, { "data": { "text/plain": [ "MatPipe(autofeaturizer=AutoFeaturizer(bandstructure_col=None, cache_src=None,\n", " composition_col='composition',\n", " do_precheck=True, dos_col='dos',\n", " drop_inputs=True, exclude=[],\n", " featurizers={'bandstructure': [BandFeaturizer(find_method='nearest',\n", " kpoints=None,\n", " nbands=2),\n", " BranchPointEnergy(atol=1e-05,\n", " calculate_band_edges=True,\n", " n_cb=1,\n", " n_vb=1)],\n", " 'composition': [ElementPr...\n", " max_na_frac=0.01, na_method_fit='drop',\n", " na_method_transform='fill'),\n", " learner=TPOTAdaptor(logger=),\n", " logger=,\n", " reducer=FeatureReducer(corr_threshold=0.95, keep_features=None,\n", " logger=,\n", " n_pca_features='auto', n_rebate_features=0.3,\n", " reducers=('corr', 'tree'), remove_features=None,\n", " tree_importance_percentile=0.99))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pipe.fit(train_df, target)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predicting new data\n", "\n", "Our MatPipe is now fit. Let's predict our test data with `MatPipe.predict`. This should only take a few minutes." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-10-14 21:56:56 INFO Beginning MatPipe prediction using fitted pipeline.\n", "2019-10-14 21:56:56 INFO AutoFeaturizer: Starting transforming.\n", "2019-10-14 21:56:56 INFO AutoFeaturizer: Compositions detected as strings. 
Attempting conversion to Composition objects...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8a5f51325bec4682ba91096aedfc2f94", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='StrToComposition', max=921, style=ProgressStyle(description_w…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:56:57 INFO AutoFeaturizer: Guessing oxidation states of compositions, as they were not present in input.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "038c5d7339ae4552a2d2b589c2849e06", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='CompositionToOxidComposition', max=921, style=ProgressStyle(d…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:57:08 INFO AutoFeaturizer: Featurizing with ElementProperty.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "142e77a5105b4e1aa86baf5535214ee4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='ElementProperty', max=921, style=ProgressStyle(description_wi…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:57:10 INFO AutoFeaturizer: Featurizing with OxidationStates.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8b2602a37a264ff4854cd0aeac675443", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='OxidationStates', max=921, style=ProgressStyle(description_wi…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:57:10 INFO AutoFeaturizer: Featurizing with 
ElectronAffinity.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "eee01aef9bc341f0bea3b02078f807c5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='ElectronAffinity', max=921, style=ProgressStyle(description_w…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:57:10 INFO AutoFeaturizer: Featurizing with IonProperty.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f8a8f177fe974fc4ae07658325852936", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='IonProperty', max=921, style=ProgressStyle(description_width=…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "2019-10-14 21:58:45 INFO AutoFeaturizer: Featurizer type structure not in the dataframe. Skipping...\n", "2019-10-14 21:58:45 INFO AutoFeaturizer: Featurizer type bandstructure not in the dataframe. Skipping...\n", "2019-10-14 21:58:45 INFO AutoFeaturizer: Featurizer type dos not in the dataframe. 
Skipping...\n", "2019-10-14 21:58:45 INFO AutoFeaturizer: Finished transforming.\n", "2019-10-14 21:58:45 INFO DataCleaner: Starting transforming.\n", "2019-10-14 21:58:45 INFO DataCleaner: Cleaning with respect to samples with sample na_method 'fill'\n", "2019-10-14 21:58:45 INFO DataCleaner: Replacing infinite values with nan for easier screening.\n", "2019-10-14 21:58:45 INFO DataCleaner: Before handling na: 921 samples, 140 features\n", "2019-10-14 21:58:45 WARNING DataCleaner: Mismatched columns found in dataframe used for fitting and argument dataframe.\n", "2019-10-14 21:58:45 WARNING DataCleaner: Coercing mismatched columns...\n", "2019-10-14 21:58:45 WARNING DataCleaner: Following columns are being dropped:\n", "['minimum oxidation state', 'maximum oxidation state', 'range oxidation state', 'std_dev oxidation state', 'avg anion electron affinity', 'compound possible', 'max ionic char', 'avg ionic char']\n", "2019-10-14 21:58:45 INFO DataCleaner: After handling na: 921 samples, 132 features\n", "2019-10-14 21:58:45 INFO DataCleaner: Target not found in df columns. 
Ignoring...\n", "2019-10-14 21:58:45 INFO DataCleaner: Finished transforming.\n", "2019-10-14 21:58:45 INFO FeatureReducer: Starting transforming.\n", "2019-10-14 21:58:45 WARNING FeatureReducer: Target not found in columns to transform.\n", "2019-10-14 21:58:45 INFO FeatureReducer: Finished transforming.\n", "2019-10-14 21:58:45 INFO TPOTAdaptor: Starting predicting.\n", "2019-10-14 21:58:46 INFO TPOTAdaptor: Prediction finished successfully.\n", "2019-10-14 21:58:46 INFO TPOTAdaptor: Finished predicting.\n", "2019-10-14 21:58:46 INFO MatPipe prediction completed.\n" ] } ], "source": [ "prediction_df = pipe.predict(prediction_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Examine predictions\n", "\n", "`MatPipe` places the predictions in a column called `\"{target} predicted\"`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " MagpieData maximum Number MagpieData minimum MendeleevNumber \\\n", "4514 51.0 69.0 \n", "834 52.0 58.0 \n", "4481 30.0 61.0 \n", "3958 79.0 43.0 \n", "3087 59.0 17.0 \n", "\n", " MagpieData range MendeleevNumber MagpieData avg_dev MendeleevNumber \\\n", "4514 16.0 8.000000 \n", "834 32.0 14.506173 \n", "4481 26.0 12.187500 \n", "3958 30.0 9.500000 \n", "3087 61.0 18.080000 \n", "\n", " MagpieData avg_dev AtomicWeight MagpieData maximum MeltingT \\\n", "4514 28.190000 903.78 \n", "834 31.127892 1768.00 \n", "4481 21.802408 1728.00 \n", "3958 79.771150 1941.00 \n", "3087 31.806681 1687.00 \n", "\n", " MagpieData range MeltingT MagpieData mean MeltingT \\\n", "4514 211.10 798.230000 \n", "834 1045.34 1085.625278 \n", "4481 1673.20 735.406667 \n", "3958 1007.53 1387.282500 \n", "3087 483.00 1523.200000 \n", "\n", " MagpieData avg_dev MeltingT MagpieData mean Column ... \\\n", "4514 105.550000 13.500000 ... \n", "834 473.871335 13.569444 ... \n", "4481 744.445000 13.416667 ... \n", "3958 276.858750 9.750000 ... \n", "3087 131.040000 9.000000 ... 
\n", "\n", " MagpieData mean GSvolume_pa MagpieData avg_dev GSvolume_pa \\\n", "4514 22.760000 8.800000 \n", "834 26.250023 11.114599 \n", "4481 9.965208 0.931892 \n", "3958 16.642500 0.081250 \n", "3087 19.506034 7.214759 \n", "\n", " MagpieData mode GSvolume_pa MagpieData maximum GSbandgap \\\n", "4514 13.960000 0.000 \n", "834 34.763333 0.464 \n", "4481 9.105000 0.000 \n", "3958 16.700000 0.000 \n", "3087 10.487586 0.773 \n", "\n", " MagpieData mean GSbandgap MagpieData avg_dev GSbandgap \\\n", "4514 0.000000 0.00000 \n", "834 0.302889 0.21034 \n", "4481 0.000000 0.00000 \n", "3958 0.000000 0.00000 \n", "3087 0.309200 0.37104 \n", "\n", " MagpieData avg_dev GSmagmom MagpieData mean SpaceGroupNumber \\\n", "4514 0.000000 180.000000 \n", "834 0.701950 166.583333 \n", "4481 0.279091 107.041667 \n", "3958 0.000008 217.250000 \n", "3087 0.000149 216.400000 \n", "\n", " MagpieData avg_dev SpaceGroupNumber gap expt predicted \n", "4514 14.000000 0.92970 \n", "834 19.039352 0.84045 \n", "4481 102.961806 0.84395 \n", "3958 11.625000 1.65345 \n", "3087 8.960000 0.00000 \n", "\n", "[5 rows x 49 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prediction_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Score predictions\n", "\n", "Now let's score our predictions using mean absolute error (MAE) and compare them to sklearn's `DummyRegressor`, a baseline that always predicts the mean of the training targets." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dummy MAE: 1.1256546688824407 eV\n", "MatPipe MAE: 0.4591923995656894 eV\n" ] } ], "source": [ "from sklearn.metrics import mean_absolute_error\n", "from sklearn.dummy import DummyRegressor\n", "\n", "# Fit the dummy baseline (it ignores the inputs and predicts the training mean)\n", "dr = DummyRegressor()\n", "dr.fit(train_df[\"composition\"], train_df[target])\n", "dummy_test = dr.predict(test_df[\"composition\"])\n", "\n", "\n", "# Score the dummy and MatPipe against the true test-set values\n", "true = test_df[target]\n", "matpipe_test = prediction_df[target + \" predicted\"]\n", "\n", "mae_matpipe = mean_absolute_error(true, matpipe_test)\n", "mae_dummy = mean_absolute_error(true, dummy_test)\n", "\n", "print(\"Dummy MAE: {} eV\".format(mae_dummy))\n", "print(\"MatPipe MAE: {} eV\".format(mae_matpipe))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Examining the internals of MatPipe\n", "\n", "Inspect `MatPipe` internals with a dict/text digest from either `MatPipe.inspect` (a long, comprehensive listing of all attributes under their proper names) or `MatPipe.summarize` (an executive summary). 
" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'data_cleaning': {'drop_na_targets': 'True',\n", " 'encoder': 'one-hot',\n", " 'feature_na_method': 'drop',\n", " 'na_method_fit': 'drop',\n", " 'na_method_transform': 'fill'},\n", " 'feature_reduction': {'reducer_params': \"{'tree': {'importance_percentile': \"\n", " \"0.99, 'mode': 'regression', \"\n", " \"'random_state': 0}}\",\n", " 'reducers': \"('corr', 'tree')\"},\n", " 'features': ['MagpieData maximum Number',\n", " 'MagpieData minimum MendeleevNumber',\n", " 'MagpieData range MendeleevNumber',\n", " 'MagpieData avg_dev MendeleevNumber',\n", " 'MagpieData avg_dev AtomicWeight',\n", " 'MagpieData maximum MeltingT',\n", " 'MagpieData range MeltingT',\n", " 'MagpieData mean MeltingT',\n", " 'MagpieData avg_dev MeltingT',\n", " 'MagpieData mean Column',\n", " 'MagpieData avg_dev Column',\n", " 'MagpieData mean Row',\n", " 'MagpieData range CovalentRadius',\n", " 'MagpieData mean CovalentRadius',\n", " 'MagpieData avg_dev CovalentRadius',\n", " 'MagpieData minimum Electronegativity',\n", " 'MagpieData range Electronegativity',\n", " 'MagpieData mean Electronegativity',\n", " 'MagpieData avg_dev Electronegativity',\n", " 'MagpieData mode Electronegativity',\n", " 'MagpieData avg_dev NsValence',\n", " 'MagpieData mean NpValence',\n", " 'MagpieData avg_dev NpValence',\n", " 'MagpieData maximum NdValence',\n", " 'MagpieData range NdValence',\n", " 'MagpieData mean NdValence',\n", " 'MagpieData avg_dev NdValence',\n", " 'MagpieData minimum NValence',\n", " 'MagpieData range NValence',\n", " 'MagpieData mean NValence',\n", " 'MagpieData avg_dev NValence',\n", " 'MagpieData mean NsUnfilled',\n", " 'MagpieData mean NpUnfilled',\n", " 'MagpieData avg_dev NpUnfilled',\n", " 'MagpieData avg_dev NdUnfilled',\n", " 'MagpieData mean NUnfilled',\n", " 'MagpieData avg_dev NUnfilled',\n", " 'MagpieData minimum GSvolume_pa',\n", " 'MagpieData 
range GSvolume_pa',\n", " 'MagpieData mean GSvolume_pa',\n", " 'MagpieData avg_dev GSvolume_pa',\n", " 'MagpieData mode GSvolume_pa',\n", " 'MagpieData maximum GSbandgap',\n", " 'MagpieData mean GSbandgap',\n", " 'MagpieData avg_dev GSbandgap',\n", " 'MagpieData avg_dev GSmagmom',\n", " 'MagpieData mean SpaceGroupNumber',\n", " 'MagpieData avg_dev SpaceGroupNumber'],\n", " 'featurizers': {'bandstructure': [BandFeaturizer(find_method='nearest', kpoints=None, nbands=2),\n", " BranchPointEnergy(atol=1e-05, calculate_band_edges=True, n_cb=1, n_vb=1)],\n", " 'composition': [ElementProperty(data_source=,\n", " features=['Number', 'MendeleevNumber', 'AtomicWeight',\n", " 'MeltingT', 'Column', 'Row', 'CovalentRadius',\n", " 'Electronegativity', 'NsValence', 'NpValence',\n", " 'NdValence', 'NfValence', 'NValence', 'NsUnfilled',\n", " 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled',\n", " 'GSvolume_pa', 'GSbandgap', 'GSmagmom',\n", " 'SpaceGroupNumber'],\n", " stats=['minimum', 'maximum', 'range', 'mean', 'avg_dev',\n", " 'mode']),\n", " OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev']),\n", " ElectronAffinity(),\n", " IonProperty(data_source=,\n", " fast=False)],\n", " 'dos': [DOSFeaturizer(contributors=1, decay_length=0.1, gaussian_smear=0.05,\n", " sampling_resolution=100),\n", " DopingFermi(T=300, dopings=[-1e+20, 1e+20], eref='midgap', return_eref=False),\n", " Hybridization(decay_length=0.1, gaussian_smear=0.05, sampling_resolution=100,\n", " species=[]),\n", " DosAsymmetry(decay_length=0.5, gaussian_smear=0.05, sampling_resolution=100)],\n", " 'structure': [DensityFeatures(desired_features=None),\n", " GlobalSymmetryFeatures(desired_features=None),\n", " EwaldEnergy(accuracy=4),\n", " SineCoulombMatrix(diag_elems=True, flatten=True),\n", " GlobalInstabilityIndex(disordered_pymatgen=False, r_cut=4.0),\n", " StructuralComplexity(symprec=0.1)]},\n", " 'ml_model': 
'Pipeline(memory=Memory(location=/var/folders/4z/3vrw2wq10kzfh29c4x35qk3m0000gp/T/tmp79ge0rli/joblib),\\n'\n", " \" steps=[('variancethreshold', \"\n", " 'VarianceThreshold(threshold=0.2)),\\n'\n", " \" ('normalizer', Normalizer(copy=True, \"\n", " \"norm='max')),\\n\"\n", " \" ('randomforestregressor',\\n\"\n", " ' RandomForestRegressor(bootstrap=False, '\n", " \"criterion='mse',\\n\"\n", " ' max_depth=None,\\n'\n", " ' '\n", " 'max_features=0.7500000000000002,\\n'\n", " ' max_leaf_nodes=None,\\n'\n", " ' '\n", " 'min_impurity_decrease=0.0,\\n'\n", " ' min_impurity_split=None,\\n'\n", " ' min_samples_leaf=1, '\n", " 'min_samples_split=2,\\n'\n", " ' '\n", " 'min_weight_fraction_leaf=0.0,\\n'\n", " ' n_estimators=200, '\n", " 'n_jobs=None,\\n'\n", " ' oob_score=False, '\n", " 'random_state=None,\\n'\n", " ' verbose=0, '\n", " 'warm_start=False))],\\n'\n", " ' verbose=False)'}\n" ] } ], "source": [ "import pprint\n", "\n", "# Get a summary and save a copy to json\n", "summary = pipe.summarize(filename=\"MatPipe_predict_experimental_gap_from_composition_summary.json\")\n", "\n", "pprint.pprint(summary)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'autofeaturizer': {'autofeaturizer': {'cache_src': None, 'preset': 'express', '_logger': , 'featurizers': {'composition': [ElementProperty(data_source=,\n", " features=['Number', 'MendeleevNumber', 'AtomicWeight',\n", " 'MeltingT', 'Column', 'Row', 'CovalentRadius',\n", " 'Electronegativity', 'NsValence', 'NpValence',\n", " 'NdValence', 'NfValence', 'NValence', 'NsUnfilled',\n", " 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled',\n", " 'GSvolume_pa', 'GSbandgap', 'GSmagmom',\n", " 'SpaceGroupNumber'],\n", " stats=['minimum', 'maximum', 'range', 'mean', 'avg_dev',\n", " 'mode']), OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev']), ElectronAffinity(), IonProperty(data_source=,\n", " fast=False)], 'structure': 
[DensityFeatures(desired_features=None), GlobalSymmetryFeatures(desired_features=None), EwaldEnergy(accuracy=4), SineCoulombMatrix(diag_elems=True, flatten=True), GlobalInstabilityIndex(disordered_pymatgen=False, r_cut=4.0), StructuralComplexity(symprec=0.1)], 'bandstructure': [BandFeaturizer(find_method='nearest', kpoints=None, nbands=2), BranchPointEnergy(atol=1e-05, calculate_band_edges=True, n_cb=1, n_vb=1)], 'dos': [DOSFeaturizer(contributors=1, decay_length=0.1, gaussian_smear=0.05,\n", " sampling_resolution=100), DopingFermi(T=300, dopings=[-1e+20, 1e+20], eref='midgap', return_eref=False), Hybridization(decay_length=0.1, gaussian_smear=0.05, sampling_resolution=100,\n", " species=[]), DosAsymmetry(decay_length=0.5, gaussian_smear=0.05, sampling_resolution=100)]}, 'exclude': [], 'functionalize': False, 'ignore_cols': [], 'fitted_input_df': {'obj': , 'columns': 2, 'samples': 3683}, 'converted_input_df': {'obj': , 'columns': 2, 'samples': 3683}, 'ignore_errors': True, 'drop_inputs': True, 'multiindex': False, 'do_precheck': True, 'n_jobs': None, 'guess_oxistates': True, 'features': ['MagpieData minimum Number', 'MagpieData maximum Number', 'MagpieData range Number', 'MagpieData mean Number', 'MagpieData avg_dev Number', 'MagpieData mode Number', 'MagpieData minimum MendeleevNumber', 'MagpieData maximum MendeleevNumber', 'MagpieData range MendeleevNumber', 'MagpieData mean MendeleevNumber', 'MagpieData avg_dev MendeleevNumber', 'MagpieData mode MendeleevNumber', 'MagpieData minimum AtomicWeight', 'MagpieData maximum AtomicWeight', 'MagpieData range AtomicWeight', 'MagpieData mean AtomicWeight', 'MagpieData avg_dev AtomicWeight', 'MagpieData mode AtomicWeight', 'MagpieData minimum MeltingT', 'MagpieData maximum MeltingT', 'MagpieData range MeltingT', 'MagpieData mean MeltingT', 'MagpieData avg_dev MeltingT', 'MagpieData mode MeltingT', 'MagpieData minimum Column', 'MagpieData maximum Column', 'MagpieData range Column', 'MagpieData mean Column', 'MagpieData 
avg_dev Column', 'MagpieData mode Column', 'MagpieData minimum Row', 'MagpieData maximum Row', 'MagpieData range Row', 'MagpieData mean Row', 'MagpieData avg_dev Row', 'MagpieData mode Row', 'MagpieData minimum CovalentRadius', 'MagpieData maximum CovalentRadius', 'MagpieData range CovalentRadius', 'MagpieData mean CovalentRadius', 'MagpieData avg_dev CovalentRadius', 'MagpieData mode CovalentRadius', 'MagpieData minimum Electronegativity', 'MagpieData maximum Electronegativity', 'MagpieData range Electronegativity', 'MagpieData mean Electronegativity', 'MagpieData avg_dev Electronegativity', 'MagpieData mode Electronegativity', 'MagpieData minimum NsValence', 'MagpieData maximum NsValence', 'MagpieData range NsValence', 'MagpieData mean NsValence', 'MagpieData avg_dev NsValence', 'MagpieData mode NsValence', 'MagpieData minimum NpValence', 'MagpieData maximum NpValence', 'MagpieData range NpValence', 'MagpieData mean NpValence', 'MagpieData avg_dev NpValence', 'MagpieData mode NpValence', 'MagpieData minimum NdValence', 'MagpieData maximum NdValence', 'MagpieData range NdValence', 'MagpieData mean NdValence', 'MagpieData avg_dev NdValence', 'MagpieData mode NdValence', 'MagpieData minimum NfValence', 'MagpieData maximum NfValence', 'MagpieData range NfValence', 'MagpieData mean NfValence', 'MagpieData avg_dev NfValence', 'MagpieData mode NfValence', 'MagpieData minimum NValence', 'MagpieData maximum NValence', 'MagpieData range NValence', 'MagpieData mean NValence', 'MagpieData avg_dev NValence', 'MagpieData mode NValence', 'MagpieData minimum NsUnfilled', 'MagpieData maximum NsUnfilled', 'MagpieData range NsUnfilled', 'MagpieData mean NsUnfilled', 'MagpieData avg_dev NsUnfilled', 'MagpieData mode NsUnfilled', 'MagpieData minimum NpUnfilled', 'MagpieData maximum NpUnfilled', 'MagpieData range NpUnfilled', 'MagpieData mean NpUnfilled', 'MagpieData avg_dev NpUnfilled', 'MagpieData mode NpUnfilled', 'MagpieData minimum NdUnfilled', 'MagpieData maximum NdUnfilled', 
'MagpieData range NdUnfilled', 'MagpieData mean NdUnfilled', 'MagpieData avg_dev NdUnfilled', 'MagpieData mode NdUnfilled', 'MagpieData minimum NfUnfilled', 'MagpieData maximum NfUnfilled', 'MagpieData range NfUnfilled', 'MagpieData mean NfUnfilled', 'MagpieData avg_dev NfUnfilled', 'MagpieData mode NfUnfilled', 'MagpieData minimum NUnfilled', 'MagpieData maximum NUnfilled', 'MagpieData range NUnfilled', 'MagpieData mean NUnfilled', 'MagpieData avg_dev NUnfilled', 'MagpieData mode NUnfilled', 'MagpieData minimum GSvolume_pa', 'MagpieData maximum GSvolume_pa', 'MagpieData range GSvolume_pa', 'MagpieData mean GSvolume_pa', 'MagpieData avg_dev GSvolume_pa', 'MagpieData mode GSvolume_pa', 'MagpieData minimum GSbandgap', 'MagpieData maximum GSbandgap', 'MagpieData range GSbandgap', 'MagpieData mean GSbandgap', 'MagpieData avg_dev GSbandgap', 'MagpieData mode GSbandgap', 'MagpieData minimum GSmagmom', 'MagpieData maximum GSmagmom', 'MagpieData range GSmagmom', 'MagpieData mean GSmagmom', 'MagpieData avg_dev GSmagmom', 'MagpieData mode GSmagmom', 'MagpieData minimum SpaceGroupNumber', 'MagpieData maximum SpaceGroupNumber', 'MagpieData range SpaceGroupNumber', 'MagpieData mean SpaceGroupNumber', 'MagpieData avg_dev SpaceGroupNumber', 'MagpieData mode SpaceGroupNumber', 'minimum oxidation state', 'maximum oxidation state', 'range oxidation state', 'std_dev oxidation state', 'avg anion electron affinity', 'compound possible', 'max ionic char', 'avg ionic char'], 'auto_featurizer': True, 'removed_featurizers': [YangSolidSolution(), Miedema(data_source='Miedema', ss_types=['min'],\n", " struct_types=['inter', 'amor', 'ss'])], 'composition_col': 'composition', 'structure_col': 'structure', 'bandstruct_col': 'bandstructure', 'dos_col': 'dos', 'is_fit': True, 'fittable_fcls': {'PartialRadialDistributionFunction', 'BagofBonds', 'BondFractions'}, 'needs_fit': False, 'min_precheck_frac': 0.9}}, 'cleaner': {'cleaner': {'_logger': , 'max_na_frac': 0.01, 'feature_na_method': 'drop', 
'encoder': 'one-hot', 'encode_categories': True, 'drop_na_targets': True, 'na_method_fit': 'drop', 'na_method_transform': 'fill', 'dropped_features': ['avg ionic char', 'std_dev oxidation state', 'maximum oxidation state', 'avg anion electron affinity', 'range oxidation state', 'minimum oxidation state', 'max ionic char', 'compound possible'], 'object_cols': [], 'number_cols': ['MagpieData minimum Number', 'MagpieData maximum Number', 'MagpieData range Number', 'MagpieData mean Number', 'MagpieData avg_dev Number', 'MagpieData mode Number', 'MagpieData minimum MendeleevNumber', 'MagpieData maximum MendeleevNumber', 'MagpieData range MendeleevNumber', 'MagpieData mean MendeleevNumber', 'MagpieData avg_dev MendeleevNumber', 'MagpieData mode MendeleevNumber', 'MagpieData minimum AtomicWeight', 'MagpieData maximum AtomicWeight', 'MagpieData range AtomicWeight', 'MagpieData mean AtomicWeight', 'MagpieData avg_dev AtomicWeight', 'MagpieData mode AtomicWeight', 'MagpieData minimum MeltingT', 'MagpieData maximum MeltingT', 'MagpieData range MeltingT', 'MagpieData mean MeltingT', 'MagpieData avg_dev MeltingT', 'MagpieData mode MeltingT', 'MagpieData minimum Column', 'MagpieData maximum Column', 'MagpieData range Column', 'MagpieData mean Column', 'MagpieData avg_dev Column', 'MagpieData mode Column', 'MagpieData minimum Row', 'MagpieData maximum Row', 'MagpieData range Row', 'MagpieData mean Row', 'MagpieData avg_dev Row', 'MagpieData mode Row', 'MagpieData minimum CovalentRadius', 'MagpieData maximum CovalentRadius', 'MagpieData range CovalentRadius', 'MagpieData mean CovalentRadius', 'MagpieData avg_dev CovalentRadius', 'MagpieData mode CovalentRadius', 'MagpieData minimum Electronegativity', 'MagpieData maximum Electronegativity', 'MagpieData range Electronegativity', 'MagpieData mean Electronegativity', 'MagpieData avg_dev Electronegativity', 'MagpieData mode Electronegativity', 'MagpieData minimum NsValence', 'MagpieData maximum NsValence', 'MagpieData range 
NsValence', 'MagpieData mean NsValence', 'MagpieData avg_dev NsValence', 'MagpieData mode NsValence', 'MagpieData minimum NpValence', 'MagpieData maximum NpValence', 'MagpieData range NpValence', 'MagpieData mean NpValence', 'MagpieData avg_dev NpValence', 'MagpieData mode NpValence', 'MagpieData minimum NdValence', 'MagpieData maximum NdValence', 'MagpieData range NdValence', 'MagpieData mean NdValence', 'MagpieData avg_dev NdValence', 'MagpieData mode NdValence', 'MagpieData minimum NfValence', 'MagpieData maximum NfValence', 'MagpieData range NfValence', 'MagpieData mean NfValence', 'MagpieData avg_dev NfValence', 'MagpieData mode NfValence', 'MagpieData minimum NValence', 'MagpieData maximum NValence', 'MagpieData range NValence', 'MagpieData mean NValence', 'MagpieData avg_dev NValence', 'MagpieData mode NValence', 'MagpieData minimum NsUnfilled', 'MagpieData maximum NsUnfilled', 'MagpieData range NsUnfilled', 'MagpieData mean NsUnfilled', 'MagpieData avg_dev NsUnfilled', 'MagpieData mode NsUnfilled', 'MagpieData minimum NpUnfilled', 'MagpieData maximum NpUnfilled', 'MagpieData range NpUnfilled', 'MagpieData mean NpUnfilled', 'MagpieData avg_dev NpUnfilled', 'MagpieData mode NpUnfilled', 'MagpieData minimum NdUnfilled', 'MagpieData maximum NdUnfilled', 'MagpieData range NdUnfilled', 'MagpieData mean NdUnfilled', 'MagpieData avg_dev NdUnfilled', 'MagpieData mode NdUnfilled', 'MagpieData minimum NfUnfilled', 'MagpieData maximum NfUnfilled', 'MagpieData range NfUnfilled', 'MagpieData mean NfUnfilled', 'MagpieData avg_dev NfUnfilled', 'MagpieData mode NfUnfilled', 'MagpieData minimum NUnfilled', 'MagpieData maximum NUnfilled', 'MagpieData range NUnfilled', 'MagpieData mean NUnfilled', 'MagpieData avg_dev NUnfilled', 'MagpieData mode NUnfilled', 'MagpieData minimum GSvolume_pa', 'MagpieData maximum GSvolume_pa', 'MagpieData range GSvolume_pa', 'MagpieData mean GSvolume_pa', 'MagpieData avg_dev GSvolume_pa', 'MagpieData mode GSvolume_pa', 'MagpieData minimum 
GSbandgap', 'MagpieData maximum GSbandgap', 'MagpieData range GSbandgap', 'MagpieData mean GSbandgap', 'MagpieData avg_dev GSbandgap', 'MagpieData mode GSbandgap', 'MagpieData minimum GSmagmom', 'MagpieData maximum GSmagmom', 'MagpieData range GSmagmom', 'MagpieData mean GSmagmom', 'MagpieData avg_dev GSmagmom', 'MagpieData mode GSmagmom', 'MagpieData minimum SpaceGroupNumber', 'MagpieData maximum SpaceGroupNumber', 'MagpieData range SpaceGroupNumber', 'MagpieData mean SpaceGroupNumber', 'MagpieData avg_dev SpaceGroupNumber', 'MagpieData mode SpaceGroupNumber', 'minimum oxidation state', 'maximum oxidation state', 'range oxidation state', 'std_dev oxidation state', 'avg anion electron affinity', 'compound possible', 'max ionic char', 'avg ionic char'], 'fitted_df': {'obj': , 'columns': 133, 'samples': 3683}, 'fitted_target': 'gap expt', 'dropped_samples': {'obj': , 'columns': 141, 'samples': 0}, 'max_problem_col_warning_threshold': 0.3, 'warnings': [], 'is_fit': True}}, 'reducer': {'reducer': {'reducers': ('corr', 'tree'), 'corr_threshold': 0.95, 'n_pca_features': 'auto', 'tree_importance_percentile': 0.99, 'n_rebate_features': 0.3, '_logger': , '_keep_features': [], '_remove_features': [], 'removed_features': {'corr': ['MagpieData minimum Number', 'MagpieData mean Number', 'MagpieData avg_dev Number', 'MagpieData mode Number', 'MagpieData minimum AtomicWeight', 'MagpieData maximum AtomicWeight', 'MagpieData range AtomicWeight', 'MagpieData mean AtomicWeight', 'MagpieData mode AtomicWeight', 'MagpieData range NsValence', 'MagpieData range NfValence', 'MagpieData maximum NsUnfilled', 'MagpieData range NdUnfilled', 'MagpieData range NfUnfilled', 'MagpieData maximum GSvolume_pa', 'MagpieData range GSbandgap', 'MagpieData range GSmagmom', 'MagpieData range SpaceGroupNumber'], 'tree': ['MagpieData range Number', 'MagpieData maximum MendeleevNumber', 'MagpieData mean MendeleevNumber', 'MagpieData mode MendeleevNumber', 'MagpieData minimum MeltingT', 'MagpieData mode 
MeltingT', 'MagpieData minimum Column', 'MagpieData maximum Column', 'MagpieData range Column', 'MagpieData mode Column', 'MagpieData minimum Row', 'MagpieData maximum Row', 'MagpieData range Row', 'MagpieData avg_dev Row', 'MagpieData mode Row', 'MagpieData minimum CovalentRadius', 'MagpieData maximum CovalentRadius', 'MagpieData mode CovalentRadius', 'MagpieData maximum Electronegativity', 'MagpieData minimum NsValence', 'MagpieData maximum NsValence', 'MagpieData mean NsValence', 'MagpieData mode NsValence', 'MagpieData minimum NpValence', 'MagpieData maximum NpValence', 'MagpieData range NpValence', 'MagpieData mode NpValence', 'MagpieData minimum NdValence', 'MagpieData mode NdValence', 'MagpieData minimum NfValence', 'MagpieData maximum NfValence', 'MagpieData mean NfValence', 'MagpieData avg_dev NfValence', 'MagpieData mode NfValence', 'MagpieData maximum NValence', 'MagpieData mode NValence', 'MagpieData minimum NsUnfilled', 'MagpieData range NsUnfilled', 'MagpieData avg_dev NsUnfilled', 'MagpieData mode NsUnfilled', 'MagpieData minimum NpUnfilled', 'MagpieData maximum NpUnfilled', 'MagpieData range NpUnfilled', 'MagpieData mode NpUnfilled', 'MagpieData minimum NdUnfilled', 'MagpieData maximum NdUnfilled', 'MagpieData mean NdUnfilled', 'MagpieData mode NdUnfilled', 'MagpieData minimum NfUnfilled', 'MagpieData maximum NfUnfilled', 'MagpieData mean NfUnfilled', 'MagpieData avg_dev NfUnfilled', 'MagpieData mode NfUnfilled', 'MagpieData minimum NUnfilled', 'MagpieData maximum NUnfilled', 'MagpieData range NUnfilled', 'MagpieData mode NUnfilled', 'MagpieData minimum GSbandgap', 'MagpieData mode GSbandgap', 'MagpieData minimum GSmagmom', 'MagpieData maximum GSmagmom', 'MagpieData mean GSmagmom', 'MagpieData mode GSmagmom', 'MagpieData minimum SpaceGroupNumber', 'MagpieData maximum SpaceGroupNumber', 'MagpieData mode SpaceGroupNumber']}, 'retained_features': ['MagpieData range NValence', 'MagpieData mean GSbandgap', 'MagpieData avg_dev MeltingT', 'MagpieData range 
CovalentRadius', 'MagpieData maximum Number', 'MagpieData range GSvolume_pa', 'MagpieData mean CovalentRadius', 'MagpieData mean NValence', 'MagpieData avg_dev NdValence', 'MagpieData avg_dev CovalentRadius', 'MagpieData mean NsUnfilled', 'MagpieData avg_dev NsValence', 'MagpieData avg_dev GSmagmom', 'MagpieData avg_dev NpValence', 'MagpieData mean NdValence', 'MagpieData mean NUnfilled', 'MagpieData avg_dev GSbandgap', 'MagpieData mode Electronegativity', 'MagpieData range Electronegativity', 'MagpieData range MeltingT', 'MagpieData mean GSvolume_pa', 'MagpieData avg_dev GSvolume_pa', 'MagpieData avg_dev AtomicWeight', 'MagpieData mean Column', 'MagpieData avg_dev SpaceGroupNumber', 'MagpieData avg_dev Column', 'MagpieData avg_dev NValence', 'MagpieData maximum GSbandgap', 'MagpieData range MendeleevNumber', 'MagpieData avg_dev NUnfilled', 'MagpieData maximum NdValence', 'MagpieData avg_dev MendeleevNumber', 'MagpieData mean NpUnfilled', 'MagpieData avg_dev Electronegativity', 'MagpieData range NdValence', 'MagpieData mean Row', 'MagpieData avg_dev NpUnfilled', 'MagpieData mode GSvolume_pa', 'MagpieData minimum GSvolume_pa', 'MagpieData avg_dev NdUnfilled', 'MagpieData mean MeltingT', 'MagpieData minimum MendeleevNumber', 'MagpieData minimum Electronegativity', 'MagpieData maximum MeltingT', 'MagpieData mean Electronegativity', 'MagpieData minimum NValence', 'MagpieData mean NpValence', 'MagpieData mean SpaceGroupNumber'], 'reducer_params': {'tree': {'importance_percentile': 0.99, 'mode': 'regression', 'random_state': 0}}, '_pca': None, '_pca_feats': None, 'is_fit': True}}, 'learner': {'learner': {'mode': 'regression', 'tpot_kwargs': {'max_time_mins': 60, 'population_size': 20, 'cv': 5, 'n_jobs': -1, 'verbosity': 3, 'memory': 'auto', 'template': 'Selector-Transformer-Regressor', 'config_dict': {'sklearn.linear_model.ElasticNetCV': {'l1_ratio': array([0. 
, 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ]), 'tol': [1e-05, 0.0001, 0.001, 0.01, 0.1]}, 'sklearn.ensemble.ExtraTreesRegressor': {'n_estimators': [20, 100, 200, 500, 1000], 'max_features': array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]), 'min_samples_split': range(2, 21, 3), 'min_samples_leaf': range(1, 21, 3), 'bootstrap': [True, False]}, 'sklearn.ensemble.GradientBoostingRegressor': {'n_estimators': [20, 100, 200, 500, 1000], 'loss': ['ls', 'lad', 'huber', 'quantile'], 'learning_rate': [0.01, 0.1, 0.5, 1.0], 'max_depth': range(1, 11, 2), 'min_samples_split': range(2, 21, 3), 'min_samples_leaf': range(1, 21, 3), 'subsample': array([0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55,\n", " 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ]), 'max_features': array([0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 , 0.55,\n", " 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. 
]), 'alpha': [0.75, 0.8, 0.85, 0.9, 0.95, 0.99]}, 'sklearn.tree.DecisionTreeRegressor': {'max_depth': range(1, 11, 2), 'min_samples_split': range(2, 21, 3), 'min_samples_leaf': range(1, 21, 3)}, 'sklearn.neighbors.KNeighborsRegressor': {'n_neighbors': range(1, 101), 'weights': ['uniform', 'distance'], 'p': [1, 2]}, 'sklearn.linear_model.LassoLarsCV': {'normalize': [True, False]}, 'sklearn.svm.LinearSVR': {'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'], 'dual': [True, False], 'tol': [1e-05, 0.0001, 0.001, 0.01, 0.1], 'C': [0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 5.0, 10.0, 15.0, 20.0, 25.0], 'epsilon': [0.0001, 0.001, 0.01, 0.1, 1.0]}, 'sklearn.ensemble.RandomForestRegressor': {'n_estimators': [20, 100, 200, 500, 1000], 'max_features': array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]), 'min_samples_split': range(2, 21, 3), 'min_samples_leaf': range(1, 21, 3), 'bootstrap': [True, False]}, 'sklearn.linear_model.RidgeCV': {}, 'xgboost.XGBRegressor': {'n_estimators': [20, 100, 200, 500, 1000], 'max_depth': range(1, 11, 2), 'learning_rate': [0.01, 0.1, 0.5, 1.0], 'subsample': array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]), 'min_child_weight': range(1, 21, 4), 'nthread': [1]}, 'sklearn.preprocessing.Binarizer': {'threshold': array([0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ])}, 'sklearn.decomposition.FastICA': {'tol': array([0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. 
])}, 'sklearn.cluster.FeatureAgglomeration': {'linkage': ['ward', 'complete', 'average'], 'affinity': ['euclidean', 'l1', 'l2', 'manhattan', 'cosine']}, 'sklearn.preprocessing.MaxAbsScaler': {}, 'sklearn.preprocessing.MinMaxScaler': {}, 'sklearn.preprocessing.Normalizer': {'norm': ['l1', 'l2', 'max']}, 'sklearn.kernel_approximation.Nystroem': {'kernel': ['rbf', 'cosine', 'chi2', 'laplacian', 'polynomial', 'poly', 'linear', 'additive_chi2', 'sigmoid'], 'gamma': array([0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ]), 'n_components': range(1, 11)}, 'sklearn.decomposition.PCA': {'svd_solver': ['randomized'], 'iterated_power': range(1, 11)}, 'sklearn.preprocessing.PolynomialFeatures': {'degree': [2], 'include_bias': [False], 'interaction_only': [False]}, 'sklearn.kernel_approximation.RBFSampler': {'gamma': array([0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ])}, 'sklearn.preprocessing.RobustScaler': {}, 'sklearn.preprocessing.StandardScaler': {}, 'tpot.builtins.ZeroCount': {}, 'tpot.builtins.OneHotEncoder': {'minimum_fraction': [0.05, 0.1, 0.15, 0.2, 0.25], 'sparse': [False], 'threshold': [10]}, 'sklearn.feature_selection.SelectFwe': {'alpha': array([0. 
, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008,\n", " 0.009, 0.01 , 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017,\n", " 0.018, 0.019, 0.02 , 0.021, 0.022, 0.023, 0.024, 0.025, 0.026,\n", " 0.027, 0.028, 0.029, 0.03 , 0.031, 0.032, 0.033, 0.034, 0.035,\n", " 0.036, 0.037, 0.038, 0.039, 0.04 , 0.041, 0.042, 0.043, 0.044,\n", " 0.045, 0.046, 0.047, 0.048, 0.049]), 'score_func': {'sklearn.feature_selection.f_regression': None}}, 'sklearn.feature_selection.SelectPercentile': {'percentile': range(1, 100), 'score_func': {'sklearn.feature_selection.f_regression': None}}, 'sklearn.feature_selection.VarianceThreshold': {'threshold': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2]}, 'sklearn.feature_selection.SelectFromModel': {'threshold': array([0. , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n", " 0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1. ]), 'estimator': {'sklearn.ensemble.ExtraTreesRegressor': {'n_estimators': [100], 'max_features': array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])}}}}, 'scoring': 'neg_mean_absolute_error'}, 'models': OrderedDict([('XGBRegressor', [{'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('XGBRegressor(MinMaxScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=65)), XGBRegressor__learning_rate=0.01, XGBRegressor__max_depth=3, XGBRegressor__min_child_weight=13, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.35000000000000003)',), 'operator_count': 3, 'internal_cv_score': -0.569127814761169}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('XGBRegressor(StandardScaler(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesRegressor__max_features=0.7500000000000002, SelectFromModel__ExtraTreesRegressor__n_estimators=100, SelectFromModel__threshold=0.05)), XGBRegressor__learning_rate=0.01, XGBRegressor__max_depth=7, XGBRegressor__min_child_weight=17, 
XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.6500000000000001)',), 'operator_count': 3, 'internal_cv_score': -0.575087151522238}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('XGBRegressor(StandardScaler(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesRegressor__max_features=0.7500000000000002, SelectFromModel__ExtraTreesRegressor__n_estimators=100, SelectFromModel__threshold=0.0)), XGBRegressor__learning_rate=0.01, XGBRegressor__max_depth=7, XGBRegressor__min_child_weight=17, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.6500000000000001)',), 'operator_count': 3, 'internal_cv_score': -0.6215060869696674}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('XGBRegressor(MinMaxScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=65)), XGBRegressor__learning_rate=0.01, XGBRegressor__max_depth=3, XGBRegressor__min_child_weight=13, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.35000000000000003)',), 'operator_count': 3, 'internal_cv_score': -0.6669258792711521}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6969706764396285}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('XGBRegressor(MinMaxScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=65)), XGBRegressor__learning_rate=0.1, XGBRegressor__max_depth=3, XGBRegressor__min_child_weight=13, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.35000000000000003)',), 'operator_count': 3, 'internal_cv_score': -0.7385281412308007}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.7480645948173358}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 
'predecessor': ('XGBRegressor(StandardScaler(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesRegressor__max_features=0.7500000000000002, SelectFromModel__ExtraTreesRegressor__n_estimators=100, SelectFromModel__threshold=0.05)), XGBRegressor__learning_rate=0.1, XGBRegressor__max_depth=7, XGBRegressor__min_child_weight=17, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.6500000000000001)',), 'operator_count': 3, 'internal_cv_score': -0.8026821366120996}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('XGBRegressor(StandardScaler(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesRegressor__max_features=0.7500000000000002, SelectFromModel__ExtraTreesRegressor__n_estimators=100, SelectFromModel__threshold=0.05)), XGBRegressor__learning_rate=0.01, XGBRegressor__max_depth=7, XGBRegressor__min_child_weight=17, XGBRegressor__n_estimators=200, XGBRegressor__nthread=1, XGBRegressor__subsample=0.6500000000000001)',), 'operator_count': 3, 'internal_cv_score': -0.8089374047850457}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -1.7837394544333396}]), ('ElasticNetCV', [{'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.8164626453233076}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('ElasticNetCV(RobustScaler(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05)), ElasticNetCV__l1_ratio=1.0, ElasticNetCV__tol=0.001)',), 'operator_count': 3, 'internal_cv_score': -0.8164660633330266}]), ('ExtraTreesRegressor', [{'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('ExtraTreesRegressor(ZeroCount(SelectPercentile(input_matrix, SelectPercentile__percentile=32)), ExtraTreesRegressor__bootstrap=True, ExtraTreesRegressor__max_features=0.6500000000000001, 
ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=17, ExtraTreesRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.6001026044965307}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('ExtraTreesRegressor(ZeroCount(SelectPercentile(input_matrix, SelectPercentile__percentile=74)), ExtraTreesRegressor__bootstrap=True, ExtraTreesRegressor__max_features=0.6500000000000001, ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=17, ExtraTreesRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.6023846638211461}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 1, 'predecessor': ('ExtraTreesRegressor(StandardScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=33)), ExtraTreesRegressor__bootstrap=False, ExtraTreesRegressor__max_features=0.6500000000000001, ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=11, ExtraTreesRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.6045251072197617}, {'generation': 'INVALID', 'mutation_count': 0, 'crossover_count': 1, 'predecessor': ('ExtraTreesRegressor(StandardScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=20)), ExtraTreesRegressor__bootstrap=False, ExtraTreesRegressor__max_features=0.6500000000000001, ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=11, ExtraTreesRegressor__n_estimators=500)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)'), 'operator_count': 3, 'internal_cv_score': -0.6158514341896033}, {'generation': 0, 'mutation_count': 0, 
'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6375123860905978}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6509010828842549}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('ExtraTreesRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=66), Nystroem__gamma=0.65, Nystroem__kernel=additive_chi2, Nystroem__n_components=8), ExtraTreesRegressor__bootstrap=True, ExtraTreesRegressor__max_features=0.45000000000000007, ExtraTreesRegressor__min_samples_leaf=19, ExtraTreesRegressor__min_samples_split=8, ExtraTreesRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.7833717352531547}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.8384927269928548}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('ExtraTreesRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=66), Nystroem__gamma=0.65, Nystroem__kernel=additive_chi2, Nystroem__n_components=8), ExtraTreesRegressor__bootstrap=True, ExtraTreesRegressor__max_features=0.45000000000000007, ExtraTreesRegressor__min_samples_leaf=19, ExtraTreesRegressor__min_samples_split=8, ExtraTreesRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.8576068260773978}]), ('RandomForestRegressor', [{'generation': 'INVALID', 'mutation_count': 9, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)', 
'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)'), 'operator_count': 3, 'internal_cv_score': -0.455318603917173}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.45842830981210525}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.45912305078608934}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4594158770131556}, {'generation': 'INVALID', 'mutation_count': 8, 'crossover_count': 1, 'predecessor': 
('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.45977194554893497}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4605703481726741}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)', 'RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)'), 'operator_count': 3, 'internal_cv_score': -0.46136176692378017}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), 
RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.46176736844433963}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.46179249348121043}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.4620840399681434}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.4636897739587635}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), 
Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4647406961609935}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.46527932953808043}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4653152674989677}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.46563285018582967}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), 
Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4661545082738482}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.46727537793493007}, {'generation': 'INVALID', 'mutation_count': 8, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.468052936596661}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.46806860712494835}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), 
Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)'), 'operator_count': 3, 'internal_cv_score': -0.4682111621364521}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4682698397218453}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4684578070615303}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, 
RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4693576708232553}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4694256948336381}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4704077886260397}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)'), 'operator_count': 3, 'internal_cv_score': -0.4704287379948086}, 
{'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.4706860033700077}, {'generation': 'INVALID', 'mutation_count': 9, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.4712993735104124}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.47190471902468883}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': 
-0.47253502577281575}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.47256092221083323}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.4736661653552886}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.47395881953867036}, {'generation': 'INVALID', 'mutation_count': 10, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 
'internal_cv_score': -0.4755418318978232}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.4767649465160435}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4770011241962126}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.4771079125862781}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 
'internal_cv_score': -0.47736442567696874}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4775844756577783}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)'), 'operator_count': 3, 'internal_cv_score': -0.4788696819140464}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4790047978364108}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': 
('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4790520868975282}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.4794093061234272}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4813523796531178}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.4814206338121645}, {'generation': 'INVALID', 'mutation_count': 9, 'crossover_count': 1, 'predecessor': 
('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.48288021967730527}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4830894376464305}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.4848959183862999}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.45000000000000007, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.48573855595856125}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': 
('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.486632824201072}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.48686893877942305}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.48808093210332115}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.4883693989166911}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': 
('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.48964958977346473}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.4896931615833874}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.49068955754032634}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4916322394327768}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': 
('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.4944770445819323}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.4949934013332545}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)'), 'operator_count': 3, 'internal_cv_score': -0.495442817414375}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), 
RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.4965168745575482}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.49662437839581725}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.49785791904644255}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.498404746131628}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), 
RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5006650431759189}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.501016326273348}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=7, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5015576419135714}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.45000000000000007, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5017029800970444}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), 
RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5017239416184885}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.5019658700076691}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.45000000000000007, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.502132009247242}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.502173717074214}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.5028369660168946}, {'generation': 'INVALID', 'mutation_count': 6, 
'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.503488141295253}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5043467494687167}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.506085745863076}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5066423203277093}, {'generation': 'INVALID', 'mutation_count': 4, 
'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5082377148963679}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5108597894961291}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5113743494461277}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5138761448307937}, {'generation': 'INVALID', 'mutation_count': 4, 
'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)'), 'operator_count': 3, 'internal_cv_score': -0.5140737071551884}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.5142631825324426}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5143240529651936}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), 
Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.45000000000000007, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.5164723844402882}, {'generation': 'INVALID', 'mutation_count': 6, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=83), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=200)',), 'operator_count': 3, 'internal_cv_score': -0.5185720052718958}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5197089440712135}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=45), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5256653512083849}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), 
Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5256704665329128}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=37), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5256704665329128}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=37), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, RandomForestRegressor__n_estimators=100)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)'), 'operator_count': 3, 'internal_cv_score': -0.5285409298914097}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, 
RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5290078620113645}, {'generation': 'INVALID', 'mutation_count': 7, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.529713660310341}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5314490979883193}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5323142612284427}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, 
RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5344570254494924}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=13, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5372115527408662}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.5382520161974218}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.539747793345718}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, 
RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5419200265242919}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.6500000000000001, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5551007043567818}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5575491468660686}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5578647227402184}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, 
RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=1000)',), 'operator_count': 3, 'internal_cv_score': -0.5580485680169603}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -0.558141826934892}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 2, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=13, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5596290082737609}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5625875480737101}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 2, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, 
RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=13, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)', 'RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=32), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)'), 'operator_count': 3, 'internal_cv_score': -0.5625875480737101}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.05), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5645996661332664}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=47), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5650904927981569}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, 
RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5659222817539376}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=37), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5660384787526581}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5664588621662168}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=89), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.56672074045975}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, 
RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5668011133999042}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5669404523220268}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=13, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5672280982980369}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.5684071259693165}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)', 'ExtraTreesRegressor(ZeroCount(SelectPercentile(input_matrix, SelectPercentile__percentile=32)), ExtraTreesRegressor__bootstrap=True, ExtraTreesRegressor__max_features=0.6500000000000001, ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=17, 
ExtraTreesRegressor__n_estimators=200)'), 'operator_count': 3, 'internal_cv_score': -0.5684071259693165}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=37), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=8, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5726874801214599}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5738818779775587}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5760918962300835}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=32), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, 
RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5760918962300835}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5771892863113667}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5792735807780065}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.45000000000000007, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5841747490929737}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, 
RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5867055144239277}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5867056463769564}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=13, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5876694192685129}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=33), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.5885524671574142}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=43), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, 
RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5908714583830116}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.5994665910197039}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=36), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)', 'ExtraTreesRegressor(StandardScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=20)), ExtraTreesRegressor__bootstrap=False, ExtraTreesRegressor__max_features=0.6500000000000001, ExtraTreesRegressor__min_samples_leaf=13, ExtraTreesRegressor__min_samples_split=11, ExtraTreesRegressor__n_estimators=500)'), 'operator_count': 3, 'internal_cv_score': -0.6038520140105461}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=20), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=20, RandomForestRegressor__n_estimators=100)',), 'operator_count': 3, 'internal_cv_score': -0.6085274872092868}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 
'predecessor': ('RandomForestRegressor(FastICA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.0001), FastICA__tol=0.65), RandomForestRegressor__bootstrap=True, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.6462788409484199}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(FastICA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.0001), FastICA__tol=0.65), RandomForestRegressor__bootstrap=True, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.6545899218581008}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6634546572350697}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(FastICA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.0001), FastICA__tol=0.65), RandomForestRegressor__bootstrap=True, RandomForestRegressor__max_features=0.35000000000000003, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.6634546572350697}, {'generation': 'INVALID', 'mutation_count': 5, 'crossover_count': 1, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=71), Normalizer__norm=max), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=11, 
RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.7734237644225525}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.7996890896172534}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=14), Nystroem__gamma=0.15000000000000002, Nystroem__kernel=additive_chi2, Nystroem__n_components=6), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=10, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.7996890896172534}, {'generation': 'INVALID', 'mutation_count': 2, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=14), Nystroem__gamma=0.75, Nystroem__kernel=additive_chi2, Nystroem__n_components=6), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=10, RandomForestRegressor__min_samples_split=11, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -0.7996890896172534}, {'generation': 'INVALID', 'mutation_count': 4, 'crossover_count': 0, 'predecessor': ('RandomForestRegressor(Normalizer(SelectPercentile(input_matrix, SelectPercentile__percentile=96), Normalizer__norm=l1), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.25000000000000006, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=500)',), 'operator_count': 3, 'internal_cv_score': -inf}, {'generation': 'INVALID', 'mutation_count': 3, 'crossover_count': 0, 'predecessor': 
('RandomForestRegressor(Normalizer(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), Normalizer__norm=l2), RandomForestRegressor__bootstrap=False, RandomForestRegressor__max_features=0.7500000000000002, RandomForestRegressor__min_samples_leaf=1, RandomForestRegressor__min_samples_split=2, RandomForestRegressor__n_estimators=20)',), 'operator_count': 3, 'internal_cv_score': -inf}]), ('GradientBoostingRegressor', [{'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GradientBoostingRegressor(PolynomialFeatures(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.0001), PolynomialFeatures__degree=2, PolynomialFeatures__include_bias=False, PolynomialFeatures__interaction_only=False), GradientBoostingRegressor__alpha=0.75, GradientBoostingRegressor__learning_rate=0.01, GradientBoostingRegressor__loss=ls, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=1.0, GradientBoostingRegressor__min_samples_leaf=1, GradientBoostingRegressor__min_samples_split=14, GradientBoostingRegressor__n_estimators=100, GradientBoostingRegressor__subsample=0.2)',), 'operator_count': 3, 'internal_cv_score': -0.7340601028290766}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.7398949682708912}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.9083310211639299}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GradientBoostingRegressor(PCA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.2), PCA__iterated_power=5, PCA__svd_solver=randomized), GradientBoostingRegressor__alpha=0.8, GradientBoostingRegressor__learning_rate=0.01, GradientBoostingRegressor__loss=lad, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=0.45, GradientBoostingRegressor__min_samples_leaf=16, 
GradientBoostingRegressor__min_samples_split=20, GradientBoostingRegressor__n_estimators=20, GradientBoostingRegressor__subsample=0.45)',), 'operator_count': 3, 'internal_cv_score': -0.9087905235167509}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GradientBoostingRegressor(FastICA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.1), FastICA__tol=0.8), GradientBoostingRegressor__alpha=0.75, GradientBoostingRegressor__learning_rate=0.5, GradientBoostingRegressor__loss=quantile, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=0.1, GradientBoostingRegressor__min_samples_leaf=1, GradientBoostingRegressor__min_samples_split=14, GradientBoostingRegressor__n_estimators=20, GradientBoostingRegressor__subsample=0.6000000000000001)',), 'operator_count': 3, 'internal_cv_score': -0.9559240000630987}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.9773121723225457}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -1.644452799130757}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GradientBoostingRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=91), Nystroem__gamma=0.55, Nystroem__kernel=chi2, Nystroem__n_components=3), GradientBoostingRegressor__alpha=0.8, GradientBoostingRegressor__learning_rate=0.01, GradientBoostingRegressor__loss=quantile, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=0.45, GradientBoostingRegressor__min_samples_leaf=4, GradientBoostingRegressor__min_samples_split=8, GradientBoostingRegressor__n_estimators=100, GradientBoostingRegressor__subsample=0.7500000000000001)',), 'operator_count': 3, 'internal_cv_score': -1.645012338359622}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 
'predecessor': ('GradientBoostingRegressor(Nystroem(SelectPercentile(input_matrix, SelectPercentile__percentile=91), Nystroem__gamma=0.55, Nystroem__kernel=chi2, Nystroem__n_components=3), GradientBoostingRegressor__alpha=0.8, GradientBoostingRegressor__learning_rate=0.01, GradientBoostingRegressor__loss=quantile, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=0.45, GradientBoostingRegressor__min_samples_leaf=4, GradientBoostingRegressor__min_samples_split=8, GradientBoostingRegressor__n_estimators=100, GradientBoostingRegressor__subsample=0.7500000000000001)',), 'operator_count': 3, 'internal_cv_score': -1.6458734739624976}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GradientBoostingRegressor(PolynomialFeatures(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.0001), PolynomialFeatures__degree=2, PolynomialFeatures__include_bias=False, PolynomialFeatures__interaction_only=False), GradientBoostingRegressor__alpha=0.75, GradientBoostingRegressor__learning_rate=0.01, GradientBoostingRegressor__loss=ls, GradientBoostingRegressor__max_depth=9, GradientBoostingRegressor__max_features=1.0, GradientBoostingRegressor__min_samples_leaf=1, GradientBoostingRegressor__min_samples_split=14, GradientBoostingRegressor__n_estimators=100, GradientBoostingRegressor__subsample=0.2)',), 'operator_count': 3, 'internal_cv_score': -inf}]), ('DecisionTreeRegressor', [{'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6392929675471104}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('DecisionTreeRegressor(PolynomialFeatures(SelectPercentile(input_matrix, SelectPercentile__percentile=40), PolynomialFeatures__degree=2, PolynomialFeatures__include_bias=False, PolynomialFeatures__interaction_only=False), DecisionTreeRegressor__max_depth=5, DecisionTreeRegressor__min_samples_leaf=16, 
DecisionTreeRegressor__min_samples_split=17)',), 'operator_count': 3, 'internal_cv_score': -0.6596436463837325}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('DecisionTreeRegressor(RBFSampler(SelectPercentile(input_matrix, SelectPercentile__percentile=55), RBFSampler__gamma=0.8500000000000001), DecisionTreeRegressor__max_depth=3, DecisionTreeRegressor__min_samples_leaf=4, DecisionTreeRegressor__min_samples_split=17)',), 'operator_count': 3, 'internal_cv_score': -1.1518206595613913}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('DecisionTreeRegressor(RBFSampler(SelectPercentile(input_matrix, SelectPercentile__percentile=55), RBFSampler__gamma=0.8500000000000001), DecisionTreeRegressor__max_depth=3, DecisionTreeRegressor__min_samples_leaf=4, DecisionTreeRegressor__min_samples_split=17)',), 'operator_count': 3, 'internal_cv_score': -1.1554775152101997}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -1.1571825434056886}]), ('KNeighborsRegressor', [{'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.6580176053988469}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('KNeighborsRegressor(MaxAbsScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=97)), KNeighborsRegressor__n_neighbors=82, KNeighborsRegressor__p=1, KNeighborsRegressor__weights=uniform)',), 'operator_count': 3, 'internal_cv_score': -0.7266140656857427}]), ('LassoLarsCV', [{'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('LassoLarsCV(PCA(VarianceThreshold(input_matrix, VarianceThreshold__threshold=0.005), PCA__iterated_power=10, PCA__svd_solver=randomized), LassoLarsCV__normalize=False)',), 'operator_count': 3, 'internal_cv_score': -0.8109762291081436}, {'generation': 0, 
'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.8125928360909143}, {'generation': 0, 'mutation_count': 0, 'crossover_count': 0, 'predecessor': ('ROOT',), 'operator_count': 3, 'internal_cv_score': -0.9044340233523197}, {'generation': 'INVALID', 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('LassoLarsCV(Binarizer(SelectPercentile(input_matrix, SelectPercentile__percentile=17), Binarizer__threshold=1.0), LassoLarsCV__normalize=False)',), 'operator_count': 3, 'internal_cv_score': -0.9048426620159453}])]), 'random_state': None, 'greater_score_is_better': True, '_fitted_target': 'gap expt', '_backend': [\"('variancethreshold', VarianceThreshold(threshold=0.2))\", \"('normalizer', Normalizer(copy=True, norm='max'))\", \"('randomforestregressor', RandomForestRegressor(bootstrap=False, criterion='mse', max_depth=None,\\n max_features=0.7500000000000002, max_leaf_nodes=None,\\n min_impurity_decrease=0.0, min_impurity_split=None,\\n min_samples_leaf=1, min_samples_split=2,\\n min_weight_fraction_leaf=0.0, n_estimators=200,\\n n_jobs=None, oob_score=False, random_state=None,\\n verbose=0, warm_start=False))\"], '_features': ['MagpieData maximum Number', 'MagpieData minimum MendeleevNumber', 'MagpieData range MendeleevNumber', 'MagpieData avg_dev MendeleevNumber', 'MagpieData avg_dev AtomicWeight', 'MagpieData maximum MeltingT', 'MagpieData range MeltingT', 'MagpieData mean MeltingT', 'MagpieData avg_dev MeltingT', 'MagpieData mean Column', 'MagpieData avg_dev Column', 'MagpieData mean Row', 'MagpieData range CovalentRadius', 'MagpieData mean CovalentRadius', 'MagpieData avg_dev CovalentRadius', 'MagpieData minimum Electronegativity', 'MagpieData range Electronegativity', 'MagpieData mean Electronegativity', 'MagpieData avg_dev Electronegativity', 'MagpieData mode Electronegativity', 'MagpieData avg_dev NsValence', 'MagpieData mean NpValence', 'MagpieData avg_dev NpValence', 'MagpieData maximum 
NdValence', 'MagpieData range NdValence', 'MagpieData mean NdValence', 'MagpieData avg_dev NdValence', 'MagpieData minimum NValence', 'MagpieData range NValence', 'MagpieData mean NValence', 'MagpieData avg_dev NValence', 'MagpieData mean NsUnfilled', 'MagpieData mean NpUnfilled', 'MagpieData avg_dev NpUnfilled', 'MagpieData avg_dev NdUnfilled', 'MagpieData mean NUnfilled', 'MagpieData avg_dev NUnfilled', 'MagpieData minimum GSvolume_pa', 'MagpieData range GSvolume_pa', 'MagpieData mean GSvolume_pa', 'MagpieData avg_dev GSvolume_pa', 'MagpieData mode GSvolume_pa', 'MagpieData maximum GSbandgap', 'MagpieData mean GSbandgap', 'MagpieData avg_dev GSbandgap', 'MagpieData avg_dev GSmagmom', 'MagpieData mean SpaceGroupNumber', 'MagpieData avg_dev SpaceGroupNumber'], '_logger': , 'from_serialized': True, '_best_models': OrderedDict([('RandomForestRegressor', -0.455318603917173), ('XGBRegressor', -0.569127814761169), ('ExtraTreesRegressor', -0.6001026044965307), ('DecisionTreeRegressor', -0.6392929675471104), ('KNeighborsRegressor', -0.6580176053988469), ('GradientBoostingRegressor', -0.7340601028290766), ('LassoLarsCV', -0.8109762291081436), ('ElasticNetCV', -0.8164626453233076)]), 'is_fit': True}}, '_logger': , 'pre_fit_df': {'obj': , 'columns': 2, 'samples': 3683}, 'post_fit_df': {'obj': , 'columns': 49, 'samples': 3683}, 'ml_type': 'regression', 'target': 'gap expt', 'version': '2019.10.11', 'is_fit': True}\n" ] } ], "source": [ "# Explain the MatPipe's internals more comprehensively\n", "details = pipe.inspect(filename=\"MatPipe_predict_experimental_gap_from_composition_details.json\")\n", "\n", "print(details)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Access MatPipe's internal objects directly.\n", "\n", "You can access MatPipe's internal objects directly, instead of via a text digest; you just need to know which attributes to access. See the online API docs or the source code for more info." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pipeline(memory=Memory(location=/var/folders/4z/3vrw2wq10kzfh29c4x35qk3m0000gp/T/tmp79ge0rli/joblib),\n", " steps=[('variancethreshold', VarianceThreshold(threshold=0.2)),\n", " ('normalizer', Normalizer(copy=True, norm='max')),\n", " ('randomforestregressor',\n", " RandomForestRegressor(bootstrap=False, criterion='mse',\n", " max_depth=None,\n", " max_features=0.7500000000000002,\n", " max_leaf_nodes=None,\n", " min_impurity_decrease=0.0,\n", " min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=200, n_jobs=None,\n", " oob_score=False, random_state=None,\n", " verbose=0, warm_start=False))],\n", " verbose=False)\n" ] } ], "source": [ "# Access some attributes of MatPipe directly, instead of via a text digest\n", "print(pipe.learner.best_pipeline)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ElementProperty(data_source=,\n", " features=['Number', 'MendeleevNumber', 'AtomicWeight',\n", " 'MeltingT', 'Column', 'Row', 'CovalentRadius',\n", " 'Electronegativity', 'NsValence', 'NpValence',\n", " 'NdValence', 'NfValence', 'NValence', 'NsUnfilled',\n", " 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled',\n", " 'GSvolume_pa', 'GSbandgap', 'GSmagmom',\n", " 'SpaceGroupNumber'],\n", " stats=['minimum', 'maximum', 'range', 'mean', 'avg_dev',\n", " 'mode']), OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev']), ElectronAffinity(), IonProperty(data_source=,\n", " fast=False)]\n" ] } ], "source": [ "print(pipe.autofeaturizer.featurizers[\"composition\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Persistence of pipelines\n", "\n", "### Being able to reproduce your results is a crucial aspect of materials informatics.\n", "`MatPipe` provides 
methods for easily saving and loading **entire pipelines** for use by others.\n", "\n", "Save a fitted `MatPipe` for later with `MatPipe.save`, and load it again with `MatPipe.load`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save the pipeline for later\n", "\n", "filename = \"MatPipe_predict_experimental_gap_from_composition.p\"\n", "pipe.save(filename)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load your saved pipeline later, or on another machine\n", "pipe_loaded = MatPipe.load(filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# This concludes the Automatminer basic tutorial\n", "\n", "Congrats! You've made it through the basic Automatminer tutorial!\n", "\n", "In this tutorial, you learned how to:\n", "\n", "1. Access a MatBench benchmarking dataset with matminer.\n", "2. Fit and make production predictions with `MatPipe`.\n", "3. Inspect the `MatPipe` pipeline.\n", "4. Save and share your pipeline for reproducible science.\n", "\n", "\n", "If you encounter any problems running this notebook, please open an issue on the repo or post on our [support forum](https://hackingmaterials.discourse.group)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }