{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train a Model to Predict Formation Energy using the OQMD\n", "This notebook recreates a 2016 paper by [Ward et al.](https://www.nature.com/articles/npjcompumats201628) on predicting the formation enthalpy of materials based on their composition. We will use the [Materials Data Facility](http://materialsdatafacility.org) to retrieve a training set from the [OQMD](http://oqmd.org), compute features based on the composition of each entry, and then train a random forest model.\n", "\n", "This example was last updated on 06/07/2021 for Matminer v.0.7.0" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from matminer.data_retrieval import retrieve_MDF\n", "from matminer.featurizers.base import MultipleFeaturizer\n", "from matminer.featurizers import composition as cf\n", "from matminer.featurizers.conversions import StrToComposition\n", "from matplotlib import pyplot as plt\n", "from matplotlib.colors import LogNorm\n", "import numpy as np\n", "import pandas as pd\n", "import pickle as pkl\n", "from sklearn import metrics\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.model_selection import cross_val_score, cross_val_predict, GridSearchCV, ShuffleSplit, KFold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Settings to change" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "quick_demo = True # Whether to run a faster version of this demo. \n", "# The full OQMD model takes about an hour to test and ~8GB of RAM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Training Set\n", "Ward _et al._ trained their machine learning models on the formation enthalpies of crystalline compounds from the [OQMD](http://oqmd.org). 
Here, we extract the data using the copy of the OQMD available through the MDF." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the Data\n", "We first create an `MDFDataRetrieval` instance, which simplifies performing search queries against the MDF." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to create a tool for reading from the MDF's search index." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "mdf = retrieve_MDF.MDFDataRetrieval(anonymous=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we assemble a query that gets only the converged static calculations from the OQMD. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "query_string = 'mdf.source_name:oqmd AND (oqmd.configuration:static OR '\\\n", " 'oqmd.configuration:standard) AND dft.converged:True'\n", "if quick_demo:\n", " query_string += \" AND mdf.scroll_id:<10000\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data = mdf.get_data(query_string, unwind_arrays=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tool creates a DataFrame object with the metadata for each entry in the OQMD." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>crystal_structure.cross_reference.icsd</th><th>crystal_structure.number_of_atoms</th><th>crystal_structure.space_group_number</th><th>crystal_structure.volume</th><th>dft.converged</th><th>dft.cutoff_energy</th><th>dft.exchange_correlation_functional</th><th>files</th><th>material.composition</th><th>material.elements</th><th>...</th><th>oqmd.delta_e.units</th><th>oqmd.delta_e.value</th><th>oqmd.magnetic_moment.units</th><th>oqmd.magnetic_moment.value</th><th>oqmd.stability.units</th><th>oqmd.stability.value</th><th>oqmd.total_energy.units</th><th>oqmd.total_energy.value</th><th>oqmd.volume_pa.units</th><th>oqmd.volume_pa.value</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>90433.0</td><td>12</td><td>62</td><td>185.154</td><td>True</td><td>520.0</td><td>PBE</td><td>[{'data_type': 'ASCII text, with very long lin...</td><td>Nb1Pt1Si1</td><td>[Nb, Pt, Si]</td><td>...</td><td>eV/atom</td><td>-0.805020</td><td>bohr/atom</td><td>-0.000119</td><td>eV/atom</td><td>-0.105391</td><td>eV/atom</td><td>-7.996541</td><td>angstrom^3/atom</td><td>15.4295</td></tr>\n",
"    <tr><th>1</th><td>639016.0</td><td>3</td><td>139</td><td>59.293</td><td>True</td><td>520.0</td><td>PBE</td><td>[{'data_type': 'ASCII text, with very long lin...</td><td>Hf2Zn1</td><td>[Hf, Zn]</td><td>...</td><td>eV/atom</td><td>-0.173969</td><td>bohr/atom</td><td>-0.000561</td><td>eV/atom</td><td>-0.042780</td><td>eV/atom</td><td>-7.232890</td><td>angstrom^3/atom</td><td>19.7643</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2 rows × 29 columns</p>\n",
"</div>
" ], "text/plain": [ " crystal_structure.cross_reference.icsd crystal_structure.number_of_atoms \\\n", "0 90433.0 12 \n", "1 639016.0 3 \n", "\n", " crystal_structure.space_group_number crystal_structure.volume \\\n", "0 62 185.154 \n", "1 139 59.293 \n", "\n", " dft.converged dft.cutoff_energy dft.exchange_correlation_functional \\\n", "0 True 520.0 PBE \n", "1 True 520.0 PBE \n", "\n", " files material.composition \\\n", "0 [{'data_type': 'ASCII text, with very long lin... Nb1Pt1Si1 \n", "1 [{'data_type': 'ASCII text, with very long lin... Hf2Zn1 \n", "\n", " material.elements ... oqmd.delta_e.units oqmd.delta_e.value \\\n", "0 [Nb, Pt, Si] ... eV/atom -0.805020 \n", "1 [Hf, Zn] ... eV/atom -0.173969 \n", "\n", " oqmd.magnetic_moment.units oqmd.magnetic_moment.value oqmd.stability.units \\\n", "0 bohr/atom -0.000119 eV/atom \n", "1 bohr/atom -0.000561 eV/atom \n", "\n", " oqmd.stability.value oqmd.total_energy.units oqmd.total_energy.value \\\n", "0 -0.105391 eV/atom -7.996541 \n", "1 -0.042780 eV/atom -7.232890 \n", "\n", " oqmd.volume_pa.units oqmd.volume_pa.value \n", "0 angstrom^3/atom 15.4295 \n", "1 angstrom^3/atom 19.7643 \n", "\n", "[2 rows x 29 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only need two columns: `delta_e` and `material.composition`" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "data = data[['oqmd.delta_e.value', 'material.composition']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Renaming the columns to make the rest of the code more succinct" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "data = data.rename(columns={'oqmd.delta_e.value': 'delta_e', 'material.composition':'composition'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compile the Training Set\n", "Our next 
step is to get only the lowest-energy entry for each composition." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d282e929093a4199bc386dce782b93c3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "StrToComposition: 0%| | 0/4849 [00:00= -20, data['delta_e'] <= 5)]\n", "print('Removed %d/%d entries'%(original_count - len(data), original_count))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build an ML model\n", "In this part of the notebook, we build an ML model using [scikit-learn](http://scikit-learn.org/stable/) and evaluate its performance using cross-validation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 1: Compute Representation\n", "The first step in building an ML model is to convert the raw materials data (here: the composition) into the required input for an ML model: a finite list of quantitative attributes. In this example, we use the \"general-purpose\" attributes of [Ward *et al.* 2016](https://www.nature.com/articles/npjcompumats201628)." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "feature_calculators = MultipleFeaturizer([cf.Stoichiometry(), cf.ElementProperty.from_preset(\"magpie\"),\n", " cf.ValenceOrbital(props=['avg']), cf.IonProperty(fast=True)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the feature names" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "feature_labels = feature_calculators.feature_labels()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the features" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a83b306b07e548a1a9e25f809a87e18b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "MultipleFeaturizer: 0%| | 0/4415 [00:00" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "\n", "# Plot the cross-validation RMSE as a function of max_features\n", "ax.scatter(model.cv_results_['param_max_features'].data,\n", " np.sqrt(-1 * model.cv_results_['mean_test_score']))\n", "ax.scatter([model.best_params_['max_features']], np.sqrt([-1*model.best_score_]), marker='o', color='r', s=40)\n", "ax.set_xlabel('Max. 
Features')\n", "ax.set_ylabel('RMSE (eV/atom)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save our best model" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "model = model.best_estimator_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 3: Cross-validation Test\n", "Quantify the performance of this model using 10-fold cross-validation" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "cv_prediction = cross_val_predict(model, data[feature_labels], data['delta_e'], cv=KFold(10, shuffle=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute aggregate statistics" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "r2_score 0.8229782353297522\n", "mean_absolute_error 0.19136257807725038\n", "mean_squared_error 0.08679513226277784\n" ] } ], "source": [ "for scorer in ['r2_score', 'mean_absolute_error', 'mean_squared_error']:\n", " score = getattr(metrics,scorer)(data['delta_e'], cv_prediction)\n", " print(scorer, score)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestRegressor(max_features=14, n_estimators=20, n_jobs=-1)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot the individual predictions" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAANwAAADQCAYAAABsmA/6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAArY0lEQVR4nO2deXhV1dW43xUICSAQ5iEMYRIQApFBcMQBlVBRZPgEVLRALa0WWqoVtXX25/hVHKtC1SIU5FMEFEgVpIIyCFEmAWUKkDCFIYQhIQlZvz/Ozb3nhtzk3uSOyX6f5zw5w977rJt711n7rL322qKqGAyG4BAVagEMhqqEUTiDIYgYhTMYgohROIMhiBiFMxiCSPVQC1ARGjVqpAkJCaEWwxCBpKamHlXVxsG+b0QrXEJCAuvXrw+1GIYIYe/evbRu3RoRQUT2hkIG06U0VAm2bt3KpZdeyjPPPBNSOYzCGSo96enpDBw4kJiYGO6+++6QyhLRXUqDoSyysrJITk4mKyuLFStW0LZt25DKYxTOUGlRVYYPH87PP/9MSkoKSUlJoRbJdCkNlRcRYfLkyXz00Udcf/31oRYHMBbOUAlRVTZt2kSPHj0YNGhQqMVxw1g4Q6XjhRdeoGfPnqxatSrUolxA2CiciLQSkeUislVEfhKRSaGWyRB5fPjhhzz66KOMGjWKfv36hVqcCwinLmUB8GdV/UFE6gCpIvKVqm4NtWCGyGDJkiWMHz+eG2+8kffff5+oqLCxJ07CRiJVPaiqPzj2TwHbgPjQSmWIFPbu3cvw4cPp3r07n376KTVq1Ai1SCUSNgpnR0QSgEuBtSVcu09E1ovI+szMzKDLZghPWrduzcsvv8zixYupU6dOqMXxiIRbigURuQj4BnhOVeeVVrZ3795qYimrNocPH+bo0aN07drVp3oikqqqvQMklkfC6R0OEYkGPgVmlaVsBsOpU6cYNGgQhw8fZufOncTGxoZapDIJG4UTEQH+CWxT1b+HWh5DeJOXl8fw4cPZuHEjCxcujAhlg/B6h7sSuBu4XkQ2OLbwGrU0hAWFhYWMGzeOL7/8kunTp4fd4HZphI2FU9VvAQm1HIbwZ/r06cycOZPnnnuOe++9N9Ti+ETYKJzB4C333HMPMTExjBkzptRyuQWu/dgw+aWHU5fSYCiV//znPxw7doyYmBjuuecerNf+yMIonCEiSPnqawYPHsyfH/yL13Viq7u2cMEonCHs2bBhA/8zbAgdL76Y5196JdTiVAijcIawJi0tjeTkZOrWq8eCL1KoX79+qEWqEGFkbA1VnZKcHL///e/Jzc3l22+/pUNCy9AI5keMwhnCmvfff599+/b5HLoVrpgupSGk5Ba4tiIHR3UKePKFv/PLwVOcrVaPRm0TPdaJNIzCGcIKVWXChAk89cif+WbZf0Itjt8xCmcIK5588kn++c9/cv/kh7nh5l+FWhy/Y97hDEHHUwTIs6+8wdNPP82I0WN4+YXn3Qa2N+zNdu4ntalbofuEEmPhDGHB8ePHefHpv3LdjQN59pU3IzKKxBvCRO8NVYkDJ3Kc++0a1wSgQYMGTP94Ma0T2nE2Ty+wSCfz8p37npwlxeuEi1WzYyycIaRs3bqVd999F4BOlyRSs1btEEsUWMLwGWCoKhw8kM6owQPJz8/njjvuoCo8/43CGUJC9sksxo663bnIRlxcHPvTDzmvJ+HuGGkVV6tC9wsXB4pROEPQOZeby4R77mDPzl9YsmRJWCyyESyMwlVSQv1EL+7YsDtK5i9ezLo13/HYK/+gYYc+Tpd/16b1nGW+2XHMrX7ftg1LvI+3ny1cHChhIoahKnHF9QP5cPF3tG7XMdSiBJ3K/5ZqCBs+ePdNVn/7DUCVVDYwFq7SEsgu1O7MC8fRip/fn3XWrc6/Zv6LD57+C1f8ajhvvXWF87x9fM2+X7wLaf88kRi0XITPFk5EaotINX8LIiLvi8gREdni77YNoWXtimX867m
/0KXPlYx55PlQixNSynwOikgUMBK4E+gDnANiROQosAh4V1V3+kGWD4E3gRl+aKtKEixHiSerVhLbN/3AExPH0iThYm6b8gYHT8P+Uy7rd+RsnnN/3GWtnfulWbGSIlUiBW8s3HKgPfAI0ExVW6lqE+AqYA3woojcVVFBVHUFcLyi7RjCi68WfkJcg4aMfno6MbXDd5GNYOHNc3CAquYXP6mqx7HWAfjUsSZAUBCR+4D7wFoxxRDe3P/os4z+7SQ2nQrP5aOCTZkKV6RsItIbeAxo46gn1mXtXpJCBgpVfQ94D6zVc4J130jA226kr13P4t07e53le1xLhp05V2iVP3uaz/7+GCMnTqFJfGuoVY9OtkCRVnVKjhopTS77tUjrRtrxpac/C3gI2AwUBkYcQ6RTkJ/HB3/7Pb/8sIr+t46wFM7gxBeFy1TVhQGTxBA0fHWo2J0Uxdl/wuX02Lz3GOs/eIp961Zy6+Tnie3Qj58PW3V7xV/kLFe3lusNpBUua1ceC20nXKJJSsMXEZ8QkenAMixPJQD+WsdNRGYD1wKNRCQdeEJV/+mPtg3BYcu8t9i3Zgldh0zg0puGh1qcsMQXhfs10BmIxtWlVMAvCqeqo/zRjiE05J/LJXP7etpdO5xOg34danHCFl8Uro+qdgqYJIYKU55xOHsde9exRf2aJe4DzPphn3N/95HTzv1+E9+iWnQN8vMKiY12T5GQmuEqZw9Sto/J2afklOaoKQ+hDuYuwpdIk1UicknAJDFEJIe2rmP51D+Rn3uW6jE1kSi/ByFVKnzR9X7ABhHZg/UO5xwWCIhkBp+p6JPbHv9ot2p/+sw92m7Tdmso4MyhXWx5/0/E1mvMgYNZNG7R2Flmd6Z7LOWdPVs499dmuOIbRiXFO/cDaYXCxaHiixgDAyaFIeLIPXGIrR9NoXpMLZLGv0x0LRNF4g1eK5yq7g2kIIbIIf/sSbZ99DCF+efoef9bxMY1DbVIEYNPhlZEegBXOw5XqupG/4tkKI3yTE0prTu1PcOVYNU+JWbcrB+c+wcPn3Krczx9HwV552g1+BGq1WtFTo4lVIumrvG1alHuTpPsc65gpNu7ld2N9LfTJFzw2mkiIpOwok2aOLaZIvKHQAlmCD+00BoNim3Uhg73vEXt+Mqxok0w8eW5MQ7oq6pnAETkRWA18EYgBDP4TnmsQMouVyzks1/97Nzfs9uVU+R01mlUlSPfvEtUTG0SBk5AJAaAWraokWrVXFbtmo7eLZxYWSyXt/gyLCDAedvxecc5QxXg+PpPOLltGRJVvdKmIQ8Gvjxf3gfWishnjuMhgAm9qgJkbf2KY+vnUrfzdTS8bGSoxYlofFG4ZljhXVc5jn+NNRPcEGAqGkFSHHtESW6+a4ZTVlauc//4AaureXZfKkdWTKNmqyQaXjGW8wXniY+Pc5arHu3qJNWp6epedqnvPkxgD1i24yk/SqCmGoUaX0S8UVUfBpzuKxH5F/Cw36UyhA1aWEBs0040vX4SEhUBv+gwx5ucJr8Dfg+0E5FNtkt1gO8CJZjBRUXjIovz9mrXkOov+7Oc+7u3pzv39XweUq06tRP60LD79VipbSzO5ble5Zs1KnnxjeIWLfusa1jAbskqOpk0EqyaHW/E/TewBHgemGI7f8qRZsFQySg4c5yDi56hfs9h1E64zE3ZDBXDmxQLJ4GTgJk+UwU4f+4Mh/7zIudzT1K9jokg8Te+RprUBzoCsUXnHNm2DH7Am5nMpUVg2LuR9kDklGJ5+jfuOOrc3/n9Bud+Yd5ZDn/7HvlZB2g28GFqNrsYgJq1Y93qx9VzHfdt65pqc11bV/By8Sk99mN/OjoqrdNERMYDk4CWwAas2QOrgesDIpkhqKgWcnTdvzl3dBeN+txJrfjEUItUKfHlmTAJKxHsGlW9TkQ6A/8vMGJVHXzNz1HcGWJ3Otit2oEzLhf/oSz3OjvX2Xxfxw9Yf1WJrtuU+o1u56J2V1DrIle
ukSv7uCcCurFjA+f+4G7NSpSz+Oo3/Tu64jQjOVV5RfFF4XJVNVdEEJEYVd0uImYGeCWgsOAcUdVjqN/9tlCLUunxxf2ULiJxwHzgKxFZAJgpOxHO6YObSV/1LvlnjpVd2FBhvBmHuxyrG3m749STIrIcqAekBFK4qkBFX/TtkRp258jWvSec++n7TrjVKepG5hzdxdGti4mt35rqNetxUfOWziJtOzRx7ndo5J641Z7ItWgxRYDO8a6cJPYuZHEqmnzW17bCCW8s3BggVUTmiMi9ItJMVb9R1YWqmldmbR8QkYEi8rOI7BSRKWXXMJSXcycPcGTTp9So3Zgm3YeaKJIg4c043O8AHE6SZOBDEamHtchHCvCdqp4vpQmvcCyB9RZwI5AOrBORhaq6taJtRwrlcXHb3e3HTjnThXLqlC0uct8+tzr5+Wc4vGEu1WrUpu2oF4i+yHKC1Ih15f//Tf8E537dGPeoEXsUid1p48nage+WKNIsl7d4/Q6nqttV9VVVHYg1FPAtMAJY6ydZLgN2qupuh+WcA5i3+ABQrcZF1GzcgaZ9xjiVzRAcvHmHewuYrarfFp1T1RxgsWPzF/HAfttxOtDXj+1XeQrzLasXVb0GjXsMC7E0VRNvDPcvwMsi0hyYi6V8PwZWLM9U5uWqPEWU2POOFGfaOtczarctEPlQhs1RcvwAWnieI6mzKSzIofngZ5Aoq3PTsk0jZ7GubV3WztMKN3BhFEkR9m5kZZ1eU1HK7FKq6muqejnQHzgGvC8i20XkCRG52I+yZACtbMctHeeKy/OeqvZW1d6NGzcuftlQAqrK0S0LyT22izqtejuVzRB8fE2T9yLWiqeXYs0AfxzwV6rddUBHEWmLpWgjgdF+ajts8eYJn3o4y7m/7ZB7gtVtNvf/gf2uYYGcY654yRN7V3HmwCbiuv2KOpfcSNO2rudaQguXVerV2jVp1O4YKW7R/OmirwpWzY4vWbuqi8hgEZmFNV3nZ2CovwRR1QLgAeA/wDZgrqr+5K/2qyqndqwge/tS6rS/inpdbg61OFUeb5wmN2JNzRkEfI/lPbyvKHuXP1FVfztiqjw1m3el7sXXUb/7EJP8JwzwxqA/gjUJ9c+qeqKswgbfsHep7FEj9hnSTWq5xse+Pe3+FRw96nruFRa4hkOj5Bw14pojDRrRrm8/tzrX9XQlYh1j2/fkDAkFVTYRrKper6rTgSwRuUtEHgcQkdYiclnAJTT4TO7RNNLm/oXM1f8OtSiGYvjy3HgbayHG64GngVPAp1hTdgyl4CkzFbg/ye1W7WSea3/VPtewQHQ1z8/I6Jho8rKPkLHoOarH1qL9DXcQG1ef5L4t3cq1aeCaQOppxRx/u/V9df9XFotWHF8+Vl9V7SkiPwKo6gkRqVFWJUPwKMjJZu9nT1FYkEePcW8QG9ek7EqGoOLLgEy+I95RAUSkMa6lhw0hRlXZv+gl8k4eps2tj1G7WbtQi2QoAV8s3OvAZ0ATEXkOGA78NSBSRSieZjJ7mwrOHqnx2RbXmH9+geu5dvi4+zhc3bqu7mGfUQ+Qdyablj2v49KEOOf5ge3dAwQ8BR+XB2+7fpW1i+grvgx8zxKRVOAGrDUFhqjqtoBJZvAKVeXk3p+IS+hGk069Qi2OoQy8GYcTVVWwZgwA20srU5Xx5ilud6AUx+7A+HKrK1LkbClJQDJX/ZtNn73DTY9OZ8gNrkVq7WuwBZLSHEKGC/HmHW65iPxBRNwihUWkhohc70h3fk9gxDOUxt5v57Pps3dof/WtNOnUM9TiGLzAmy7lQGAsMNsR55gF1MRS1i+BqaGcPVBVObRpBRtnv0SL7lfS79d/NVEkEYI3M75zscbg3haRaKARkKOqWQGWLaLx1AssLRB4bYYrc7x9JZpjttR41aOjyDlxmNR//o36bToz/LHXiY61ptK0KJaw1RP2VHsV7Qba61e1qTblwad/i6rmAwcDJIvBS2rWb0rSXVNo2vV
yp7IZIgPzHAoQniaTFk/kaneUXFTdNdOpdoyrgaKJoaeOHSYm5zhtuvSALvcC0KOZa0pN37auTFmlWRh/OjeqclLX8uDzTEQ/Tzo1eEnu6WzmPj6eaY+MJ+9cbtkVDGFJeab+/saxZpwhSBTk5zHv2fs5tn83dz32d2rEePeuZgg/ytOlPA5McKTN2whsUNUfyqhTpfHWgXDaNr3mitZW1ElhYSHPTr6PfZu/Z/xTU7lvmHsiM2/yiARrqotxlJSNNwPfnYBfbIPfz4vIMqwZ30lYa34bhQsQSz+bzfplixjxh0fpN/D2sisYwhpvnkmfAG1E5BdgE7DZ8TdGVb8BvgmgfGFHeaaj2B0l9ik4APVqlLzYfJED5LqHJ9GvU2uGDhuOiFzgdPHGxR9Iy2Osmm94Mw6XKCIxQHes9AdngFuAriKCqpa8XpGhQnzx+UJ69e5D8+bNGTZ8RKjFMfgJr5wmqnpOVdcBp1X1D6p6g0PRugRWvKrJ9yu/ZtT/DOPxxx4JtSgGP+Nrh8AtQNnkOPGMvatl7+rtznQvZ+9iHjiTy+6fNvLMA/fSsVMXHnzyRQ6cyHGrbwKEI5syLZyIvCUi4xy5KAMSsCciI0TkJxEpFJHegbhHJHB4fxovTbyHOnENeH/OfOrUqVt2JUNE4Y2F24jljRwD1BGRrcBPwFZgq6p+7Ac5tmDluHzXD215TUXzcZTmbvc2ltJ+PGn8c0RpIdNmzqdJ0+bO82YKTOXBG6fJe/ZjEWkJJGI5UW4BKqxwRRNZq3rE+9OvvM3B9P0ktO8YalEMAcLnSBNVTVfVJcAqwPMqEwavyM/P55WXXyQ3N5d6cQ3o3K1HqEUyBBCfnCaO97jRwP8Ah4DOwP1e1l0KlDSE8JiqLvBBBr+tnuNt99BOeaageBqHU1WmPjGRGTNm0L1rF2699dYS72OvX3zGuKfkreE8PlaVp/F4E2lyMVaq89FYuSj/D7hWVfeIyB5vb6SqA8otpXs77wHvAfTu3Tui0zq88dJTzJgxg6efftpN2QyVF2+eL9uxVrYZrqqbi12L6B+8neJPWk8WxtOES3Bfxy2pTckexqKMWR9Oe5sP/vEqo+8Zz+jf/pndme7uf0/DCt4SSCtS0barmlWz48073FBgD/CliHzkWEGn5HikciIit4tIOnA5sEhE/uPP9sONk1knePN/n+fG5ME8+cKrVd5ZVJXwxks5H5gvIrWx1ty+D5guIosBvwwUqepnWDkvqwT14urzf4u+pnmLllSr5q/l9QyRgC95Kc9graLzbxGpD4wA2gRKsGDj7czl0rpT9qky9nJFjo0tmzfz6YIvGP/7SbR1uP49dRc9yRMOiVercpewopTrX+cI6XI6Lwxls2/fPm69ZSCqMGL0GOLqNyi7kqHSUSWeVd5Gitjx5LTwto6dA0eOc9stAzl75gxLl6/kknauJK2eZPN2Mqk39zeED+YrCjA5OTkMGzKY3bt28cWSL+mWmBhqkQwhpDw5TZyIyB/9JEelZcWKFaSuX8cHM2Zx9TX9Qy2OIcRU1MJNBqb6QY6A4qkb6W33siLcfPPN7NixgzZtLP9S8UgRXxOpmm5jZFMhC0eAputUBl544QU+//xzAKeyGQwVfV5GXKSJN9auPPXt16ZNm8YjjzzC2LFjGTx4sFuZ0pYcNlR+vImlPIWlWCVZM5NnuxgLFy5kwoQJJCcn88477wTtvgkJCezduzdo96sE9BKRoBmM2NjYwzk5Oc28iTSpU1YZg8Xq1asZOXIkvXr1Yu7cuURH+zUCrlT27t2LWaIvfBGRpuCdhVtY2nVVDVmYu6qrS1aeGdueyoDnQOTS2lqwYAHx8fEsWrSIiy66yKv7GCdI1ULKeiqKSCaQjhXWtZZiXUtHbsqQ0KtXb/1u7XrAPykS7JRH4VSVEydO0KCB5yiSQCmcI2Whfxoz+B3H9yPeeCmbAY8A3YDXgBuBo6r6TSiVDUDE+sHGVrd+yPbNTlEZX37
cSW3qOjd7u8XbOnnyJIMHD2bbtm2cOy/UqtugzGiQ8shjqByUqXCqel5VU1T1HqAfsBP4r4g8EHDpwpxz584xZMgQUlJSSE9PD7U4FyAi3HXXXc7jgoICGjduzC233OJWbsiQIfTr18/t3JNPPkl8fDxJSUnOLSsrq9T7paamkpiYSIcOHZg4cWKJFnf79u1cfvnlxMTE8Morr7hde+211+jWrRtdu3Zl6tSpXn/OtLQ0WrZsSWFhodv5pKQk1q5dC8DBgwe56aabPLaRlZXF22+/7fU9y4tX43AiEiMiQ4GZWCkVXqcKTacpicLCQsaMGcN///tfPvjgA2688cZQi3QBtWvXZsuWLeTkWIPtX331FfHx8W5lsrKySE1N5eTJk+zevdvt2p/+9Cc2bNjg3OLi4kq93+9+9zumTZvGjh072LFjBykpKReUadCgAa+//joPPvig2/ktW7Ywbdo0vv/+ezZu3MgXX3zBzp07vfqcCQkJtG7dmpUrVzrPbd++nVOnTtG3b18AUlJSuPnmmz22ETYKJyIzgNVAT+ApVe2jqs+oakbApfOB4l01T93L3Zk5zs1efntGtttm58CJHOcG1rva5MmTmTt3Li+99JKbFSmieBe3JFmKl/OEp7a8GcMbNGgQixYtAmD27NmMGjXK7fq8efMYPHgwI0eOZM6cOWU36IGDBw+SnZ1Nv379EBHGjBnD/PnzLyjXpEkT+vTpc4EHd9u2bfTt25datWpRvXp1+vfvz7x58y6on5mZybBhw+jTpw99+vThu+++A2DUqFFu8s+ZM4eRI0c6j1NSUkhOTub06dPccMMN9OzZk8TERBYssNLpTJkyhV27dpGUlMRDDz2EqvLQQw/RrVs3EhMT+fhjKzndf//7X/r3789tt91Gu3btmDJlCrNmzeKyyy4jMTGRXbt2lf6PUtVSN6AQK5fJKawsXUXbKSC7rPqB3Hr16qWeyMl3bXZ2HTnr3Oz8mHbSbSutTk5Ojl5zzTU6adIkLSwsLPGenjZv5fRUxlN71lfpTu3atXXjxo06bNgwzcnJ0R49eujy5cv1V7/6lbPMgAEDdMWKFfrzzz9rt27dnOefeOIJbdGihfbo0UN79Oih1157raqqZmRkaHJy8gX3Wrdund5www3O4xUrVrjdpzhPPPGEvvzyy87jrVu3aseOHfXo0aN65swZ7devnz7wwAMX1Bs1apSuXLlSVVX37t2rnTt3VlXVQ4cOabNmzTQ/3/qndO7cWTdv3qyqqgUFBdqjRw9VVc3Pz9eTJ63vNzMzU9u3b6+FhYW6Z88e7dq1q/M+n3zyiQ4YMEALCgr00KFD2qpVKz1w4IAuX75c69WrpwcOHNDc3Fxt0aKFPv7446qqOnXqVJ00aVKJn9fx/Xg147ui4V8hwZM30p7lyh7XWDwHSfGYxyJy8hWpHsvCxV+SebqAPUdLXo3U2zwk3jhOKuJc6d69O2lpacyePZtBgwa5XTt8+DA7duzgqquuQkSIjo5my5YtdOvWDbC6lMW7fi1atGDx4sXlF8gDXbp04eGHH+amm26idu3aJCUllTgbfunSpWzdutV5nJ2dzenTp2natCndunVj2bJlNG3alOrVqzs/x9q1a51dS1Xl0UcfZcWKFURFRZGRkcHhw4cvuM+3337LqFGjqFatGk2bNqV///6sW7eOunXr0qePtcgKQPv27Z3vhomJiSxfvrzUzxmRyhQqVi5fyqCbB3DixAliYmKIioqMf9+tt97Kgw8+eEF3cu7cuZw4cYK2bduSkJDgVMzyEB8f7+Y4Sk9Pv+B9sSzGjRtHamoqK1asoH79+lx88YWrWxcWFrJmzRrne2VGRoZzzLOoWzlnzhy3z7pkyRIGDhwIwKxZs8jMzCQ1NZUNGzbQtGlTcnN9W8I5JibGuR8VFeU8joqKoqCg9H5+ZPxiwoDNG3/g/rGjOXb0aMQoWhFjx47liSeeILHYXLzZs2eTkpJCWloaaWlppKa
mlvs9rnnz5tStW5c1a9agqsyYMYPbbrut7Io2jhw5Aliz4+fNm8fo0aMvKHPTTTfxxhtvOI83bNjg3B86dCiLFy/m448/dnt/W7ZsGQMGWFkaT548SZMmTYiOjmb58uXOcLg6depw6tQpZ52rr76ajz/+mPPnz5OZmcmKFSu47LLLfPo8JRHRI0HnCgqdXb/SunCeumSekqiCe8LWGrmZ3HfnUBo2asj8L5ZQr169EutXNHlsoFLbtWzZkokTJ7qdS0tLY+/evW7DAW3btqVevXpOV/qrr77KzJkzndfnz59PjRo1GD9+fIndyrfffpt7772XnJwckpOTSU5OBnDGlE6YMIFDhw7Ru3dvsrOziYqKYurUqWzdupW6desybNgwjh07RnR0NG+99VaJXtHXX3+d+++/n+7du1NQUMA111zjbD8uLo7LL7+cQ4cO0a5dO8ByssTGxlKnjhWheOeddzJ48GASExPp3bs3nTt3BqBhw4ZceeWVdOvWjeTkZF566SVWr15Njx49EBFeeuklmjVrxvbt28v1HRRRZqRJMBCRl4HBQB6wC/i1qmaVVS8xqacu+MryUvk7d+OGvZan8vjRTH5zx82czMri62++4+JOnTy2F0qFM5EmJTNz5kzS09OZMmVKSOUoijQJFwv3FfCIqhaIyItYkS0Pl1UppnqUU9HKlWuklOV6izJw7ThziJgaNfjiiy/o3rVTha1QaaFd/kyvbrAoacgmlITF16eqX9oO1wDDQyVLEQUFBVSrVo2OF1/M5s2bTf5Ig18Ix7f/scCSUAqgqkz4zTgm3v87VNUom8FvBM3CebN6jog8BhQAs0ppx6vVc3xdHNFe5m+PPcqsmTN46qmnqBktXq0z4C2lrWFQ0dnohvAnaF+llrF6jojci7XA4w1aytu/Bnj1nLfffINXXnqB++67j7/97W/+bt5QxQmLZ6eIDAT+AvRX1bPe1rMPCxTHbr3sFsLT8r27M3NYvHAeD06exICBt/C/r73FufNyQf3ShhI8EY4Wav78+SxatIjs7GzGjRtXaiS9wX+Eyzvcm0Ad4CsR2SAiwUsGYqNmrVpcec31vPbuv6hePQy1pBy8++67NGvWjB49etC+fXtmzJgBWFNypk2bxjvvvOMMzPWFlJQUOnXqRIcOHXjhhRdKLPPqq6/StWtXunXrxqhRo8jNzWX//v1cd911XHLJJXTt2pXXXnutQp8v4tAQBh9XdOvW41K3wGL75kvw8pkzZ5zndh4+U2p9b4KNfcFf96GE4GVV1fvvv1//8Y9/qKrq2rVrtWHDhm7XJ0+erKmpqT7JXFBQoO3atdNdu3bpuXPntHv37vrTTz+5lUlPT9eEhAQ9e9b6P48YMUI/+OADPXDggPN+2dnZ2rFjxwvqVkbwNng5nKlRLcrZxSut2+apG5lbYIURDbjuap595hnGjBlTYv1QpLLzthtalmybNm1i2LBhgBVJUqNGDcB60E6ZMoXk5GR69uzpk2zff/89HTp0cEZzjBw5kgULFnDJJZe4lSsoKCAnJ4fo6GjOnj1LixYtaN68uTPwt06dOnTp0oWMjIwL6lZWwqVLGRKOH7cW2cg+eZJLL7001OIEhM2bN9OpUydUlTfffJPnnnsOgDfeeIOlS5fyySefuKXzGzRoEAcOHCi1zYyMDFq1auU8btmyJRkZ7tMj4+PjefDBB2ndujXNmzenXr16F7wnpqWl8eOPPzoj+asCEW3hykORtcvNyWHcSNciG8UDe+2E81prpdXfv38/p06dYtCgQWRkZNC9e3eefPJJACZOnHhBfCXgt6k3J06cYMGCBezZs4e4uDhGjBjBzJkznZEfp0+fZtiwYUydOpW6df2yrmdEUCUt3Pnz5/njhHtZu2Z1pV5kY/PmzVxzzTVs2LCBX375he3bt7N69eoKtxsfH8/+/fudxyVNxVm6dClt27alcePGREdHM3ToUFatWgVAfn4+w4YN484772To0KEVlie
SqJIKFxUVRVKvPrzy99cYOizkUWQBY9OmTc6ucv369Rk9erQz3UJF6NOnDzt27GDPnj3k5eUxZ84cbr3VPT1p69atWbNmDWfPnkVVWbZsGV26dEFVGTduHF26dGHy5MkVliXi0DDwNpZ3Ky3FgicyMzN9ruNvz2Qg7kkJXsrRo0frRx995Dz+5ptvNCkpqdR2kpOTNSMjo8z7LVq0SDt27Kjt2rXTZ599tsT6jz/+uHbq1Em7du2qd911l+bm5urKlSsV0MTERGf6hkWLFnn3ISMYx/cTHtNzykvv3r11/fr1XpefPn06Dz30EN99951PXrFQhFn5ek8zPSe88SURbNii6n32qs8//5zf/va3JF7am2pxLT1GqJSEp8StvmbQ8gWTLLZyEtEK5y1r1qzhjjvuoGfPnrz1z38HdZENg8FOpX9+7t69m1tuucW5yMZpKXmRDYMhGFQahfPUpWvYNJ7hI+5g4h8nU7dBE+p6qFOe1Aemu2fwlUr7k8nOzqagoIAGDRow9Y23Qi2OwQBEuMLlnS90Tgi1T5s5d+4co0bcztGjR0lNTa1w5L+3a8qF0uK1adMGEbPkergSGxt7GCJc4UqisLCQ8WPv4euvv2bGjBmVZppNWaSlpYVahIhCRFJVtXew71upvJSqysMP/ZlP5n7Miy++yN133x1qkQwGNyL68W9Pkwfwzjvv8ubrU5k0aRIPPfRQwO9vnCYGX6lUP5nbbruN9PR0nn76afM+YwhLKkWXcvPmzRQUFNC8eXOeffbZiMv9b6g6RPwv84cffuCKK67g4YfLTNRsMISciA5eTkxM1CNHjlCzZk1WrVpFixYtQi2SIUIwXspysGPHDgoKCkhJSfG7sgUqKNlQtYlop0leXh5ff/21c8khgyHciegupYhkAnsDeItGwNEAth8qzOeCNqraOJDClEREK1ygEZH1oejnBxrzuUJHRL/DGQyRhlE4gyGIGIUrnfdCLUCAMJ8rRJh3OIMhiBgLZzAEEaNwBkMQMQpXBiLysohsF5FNIvKZiMSFWqaKICIDReRnEdkpIlNCLY8/EJFWIrJcRLaKyE8iMinUMnnCvMOVgYjcBHytqgUi8iKAqkZkpLSIVAN+AW4E0oF1wChV3RpSwSqIiDQHmqvqDyJSB0gFhoTj5zIWrgxU9UtVLYqoXAO0DKU8FeQyYKeq7lbVPGAOcFuIZaowqnpQVX9w7J8CtgHxpdcKDUbhfGMssCTUQlSAeGC/7TidMP1hlhcRSQAuBdaGWJQSiejgZX8hIkuBZiVcekxVFzjKPAYUALOCKZvBe0TkIuBT4I+qmh1qeUrCKBygqgNKuy4i9wK3ADdoZL/0ZgCtbMctHeciHhGJxlK2Wao6L9TyeMI4TcpARAYCfwf6q2pmqOWpCCJSHctpcgOWoq0DRqvqTyEVrIKIlcDmX8BxVf1jiMUpFaNwZSAiO4EY4Jjj1BpVnRBCkSqEiAwCpgLVgPdV9bnQSlRxROQqYCWwGSh0nH5UVf2zfrIfMQpnMAQR46U0GIKIUTiDIYgYhTMYgohROIMhiBiFMxiCiFE4gyGIGIUzGIKIUTgvEJHzIrLBMddqo4j8WUSiil0r2nrZ9g+JSIbtuEYJbQ8RERWRzsXO/1ZE/lHs3BYR6VKGrDVF5BvHVBxPZZaLyM3Fzv2x6H4i8o6IXFlK/TgR+X1pcviCiNQQkRWOSJhKjVE478hR1SRV7Yo1lywZeKLYtaIttWgfeAd41XYtr4S2RwHrHX/tJAI/FB2ISCyQgBWaVRpjgXmqer6UMrOBkcXOjXScB+iHNRXJE3GA3xTO8X9ZBtzhrzbDFaNwPqKqR4D7gAekgovQOaLbrwXGc6HCdcemcFgK+EsZigRwJ1A0w+EuEfneYV3ftVm9T4BfFVlcx5SWFsBKhwV13kdE5otIqsO63+eo/wLQ3tHuy45ykx0WeIuI/LGoXcds+Q9F5BcRmSUiA0TkOxHZISKX2eSe75C9cqO
qZitjA06XcC4LaAqcBzY4ts+KlXkSeLCUdu/Eim4HS7l62a4dw0rjnubYjgIfliFnDeCQY78L8DkQ7Th+GxhjK/sFcJtjfwrwimN/MjDWVq6B429NYAvQEMvSbrGV6YUVx1gbuAj4CWtOWgLWlKZErId7KvA+IFgTX+fb2qgGZIb6uw70Vun7zEEgR63uY3kYBUxz7M91HKeKSCusH5/zvU5E3gT22I4fwfrxT1fV7Y7TjbAeBGDNCOgFrHMY4prAEdu9i7qVCxx/xznO3wz82lZuoojc7thvBXQEDhX7HFdhPWzOOGSbB1wNLAT2qOpmx/mfgGWqqiKyGUshAVDV8yKSJyJ11Jq1XSkxXcpyICLtsCzbkbLKltJGA6AvkOI4NRe4w9FNTcSyEnYuATY56vbFUs40m7IB5ACxRbcA/qWu98dOqvqkrewC4AYR6QnUUtVUEakFxKnqAcd9rgUGAJerag/gR1v73nLOtl9oOy7kwvmYMUCuj+1HFEbhfEREGmM5Q95UR1+onAwHFqvqOQBV3Q0cxLIM3YHiCXC6YnXbAH4GvlHVN+0FVPUEUM3hYFkGDBeRJg65G4hIG1vZ08ByrC5ekbPkOse5IuoBJ1T1rMOL2s9x/hRQx1ZuJTBERGqJSG3gdsc5rxGRhsBRVc33pV6kYbqU3lFTRDYA0VjvJB9hTUqtCKOAHiKSZjvX0HG+LtY7FuC0hqKqRV25JGCjh3a/BK5S1aUi8lfgS8cQRj5wP+7Le80GPsPlsUzGcqgUkQJMEJFtWEq+BkBVjzkcH1uAJar6kIh8CHzvqDddVX90OGO85TpgkQ/lIxIzHy4CcXgBv1XV9SVc6wn8SVXvLke7PwB9Q2FlHO99U1S1rGGPiMYoXCVERMZivb+VNYQQFjiGJ0aq6oxQyxJojMIZDEHEOE0MhiBiFM5gCCJG4QyGIGIUzmAIIkbhDIYgYhTOYAgiRuEMhiDy/wHO/or1ucsZpAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "\n", "ax.hist2d(pd.to_numeric(data['delta_e']), cv_prediction, norm=LogNorm(), bins=64, cmap='Blues', alpha=0.9)\n", "\n", "ax.set_xlim(ax.get_ylim())\n", "ax.set_ylim(ax.get_xlim())\n", "\n", "mae = metrics.mean_absolute_error(data['delta_e'], cv_prediction)\n", "r2 = metrics.r2_score(data['delta_e'], cv_prediction)\n", "ax.text(0.5, 0.1, 'MAE: {:.2f} eV/atom\\n$R^2$: {:.2f}'.format(mae, r2),\n", " transform=ax.transAxes,\n", " bbox={'facecolor': 'w', 'edgecolor': 'k'})\n", "\n", "ax.plot(ax.get_xlim(), ax.get_xlim(), 'k--')\n", "\n", "ax.set_xlabel('DFT $\\Delta H_f$ (eV/atom)')\n", "ax.set_ylabel('ML $\\Delta H_f$ (eV/atom)')\n", "\n", "fig.set_size_inches(3, 3)\n", "fig.tight_layout()\n", "fig.savefig('oqmd_cv.png', dpi=320)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }