{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*author: Joseph Montoya*\n", "\n", "This notebook demonstrates a few basic examples of matminer's data retrieval features. Matminer supports data retrieval from the following sources:\n", "\n", "* [Materials Project](https://materialsproject.org)\n", "* [Citrine Informatics](https://citrination.com)\n", "* [The Materials Platform for Data Science (MPDS)](https://mpds.io) \n", "* [The Materials Data Facility](https://materialsdatafacility.org/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook was last updated 11/15/18 for version 0.4.5 of matminer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each resource has a corresponding object in matminer designed for retrieving data and preprocessing it into a pandas dataframe. Matminer can also access and aggregate data from your own [MongoDB database](https://www.mongodb.com/), if you have one.\n", "\n", "![data retrieval](data_retrieval.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Materials Project" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Materials Project data retrieval tool, `matminer.data_retrieval.retrieve_MP.MPDataRetrieval`, is initialized with an API key, which you can find on your personal dashboard page on [materialsproject.org](https://materialsproject.org) once you have created an account. If you have set your API key via pymatgen (e.g. `pmg config --add PMG_MAPI_KEY YOUR_API_KEY_HERE`), the data retrieval tool can be initialized without any arguments." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from matminer.data_retrieval.retrieve_MP import MPDataRetrieval" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "mpdr = MPDataRetrieval() # or MPDataRetrieval(api_key=YOUR_API_KEY)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Getting a dataframe from the Materials Project is essentially equivalent to using the `query` method of MPRester (see [`pymatgen.ext.matproj.MPRester`](http://pymatgen.org/_modules/pymatgen/ext/matproj.html)). The inputs are `criteria`, a mongo-style dictionary used to filter the data, and `properties`, a list of supported properties to return. See the [MAPI documentation](https://github.com/materialsproject/mapidoc/tree/master/materials) for a list of supported properties and information about them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example 1: Get densities of all elemental materials, i.e. those that contain exactly one element" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 565 entries on MP with 1 element\n" ] } ], "source": [ "df = mpdr.get_dataframe(criteria={\"nelements\": 1}, properties=['density', 'pretty_formula'])\n", "print(\"There are {} entries on MP with 1 element\".format(df['density'].count()))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
densitypretty_formula
material_id
mp-8626908.281682Ac
mp-100188.305509Ac
mp-1249.948341Ag
mp-9897379.922633Ag
mp-85669.909385Ag
\n", "
" ], "text/plain": [ " density pretty_formula\n", "material_id \n", "mp-862690 8.281682 Ac\n", "mp-10018 8.305509 Ac\n", "mp-124 9.948341 Ag\n", "mp-989737 9.922633 Ag\n", "mp-8566 9.909385 Ag" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example 2: Get all bandgaps larger than 4.0 eV" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df = mpdr.get_dataframe({\"band_gap\": {\"$gt\": 4.0}}, ['pretty_formula', 'band_gap'])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 6232 entries on MP with a band gap larger than 4.0\n" ] } ], "source": [ "print(\"There are {} entries on MP with a band gap larger than 4.0\".format(df['band_gap'].count()))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df.to_csv('ss.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example 3: Get all VRH shear and bulk moduli from the \"elasticity\" sub-document for which no warnings are found" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "df = mpdr.get_dataframe({\"elasticity\": {\"$exists\": True}, \"elasticity.warnings\": []},\n", " ['pretty_formula', 'elasticity.K_VRH', 'elasticity.G_VRH'])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 13934 elastic entries on MP with no warnings\n" ] } ], "source": [ "print(\"There are {} elastic entries on MP with no warnings\".format(df['elasticity.K_VRH'].count()))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
elasticity.K_VRHelasticity.G_VRH
count13934.0000013934.000000
mean100.5629441.112315
std71.93579122.037330
min-0.00000-8480.000000
25%44.0000016.000000
50%84.0000033.000000
75%144.0000063.000000
max591.000005303.000000
\n", "
" ], "text/plain": [ " elasticity.K_VRH elasticity.G_VRH\n", "count 13934.00000 13934.000000\n", "mean 100.56294 41.112315\n", "std 71.93579 122.037330\n", "min -0.00000 -8480.000000\n", "25% 44.00000 16.000000\n", "50% 84.00000 33.000000\n", "75% 144.00000 63.000000\n", "max 591.00000 5303.000000" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let us do a more sophisticated query and ask for more properties such as \"bandstructure\" and \"dos\" (density of states) that are stored as pymatgen objects. The query commands under criteria are common MongoDB syntax." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "df = mpdr.get_dataframe(criteria={\"elasticity\": {\"$exists\": True}, \n", " \"elasticity.warnings\": [],\n", " \"elements\": {\"$all\": [\"Pb\", \"Te\"]},\n", " \"e_above_hull\": {\"$lt\": 1e-6}}, # to limit the number of hits for the sake of time\n", " properties = [\"elasticity.K_VRH\", \"elasticity.G_VRH\", \"pretty_formula\", \n", " \"e_above_hull\", \"bandstructure\", \"dos\"])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 3 elastic entries on MP with no warnings that contain Pb and Te with energy above hull ~ 0.0 eV\n" ] } ], "source": [ "print(\"There are {} elastic entries on MP with no warnings that contain \"\n", " \"Pb and Te with energy above hull ~ 0.0 eV\".format(df['elasticity.K_VRH'].count()))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
elasticity.K_VRHelasticity.G_VRHpretty_formulae_above_hullbandstructuredos
material_id
mp-1971740.024.0TePb0<pymatgen.electronic_structure.bandstructure.B...Complete DOS for Full Formula (Te1 Pb1)\\nReduc...
mp-2074025.013.0Tl4Te3Pb0<pymatgen.electronic_structure.bandstructure.B...Complete DOS for Full Formula (Tl8 Te6 Pb2)\\nR...
mp-60502834.016.0Te2Pd3Pb20<pymatgen.electronic_structure.bandstructure.B...Complete DOS for Full Formula (Te4 Pd6 Pb4)\\nR...
\n", "
" ], "text/plain": [ " elasticity.K_VRH elasticity.G_VRH pretty_formula e_above_hull \\\n", "material_id \n", "mp-19717 40.0 24.0 TePb 0 \n", "mp-20740 25.0 13.0 Tl4Te3Pb 0 \n", "mp-605028 34.0 16.0 Te2Pd3Pb2 0 \n", "\n", " bandstructure \\\n", "material_id \n", "mp-19717 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "from pymatgen.electronic_structure.plotter import BSDOSPlotter\n", "\n", "mpid = 'mp-20740'\n", "idx = df.index[df.index==mpid][0]\n", "plt = BSDOSPlotter().get_plot(bs=df.loc[idx, 'bandstructure'], dos=df.loc[idx, 'dos']);\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Citrine informatics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Citrination data retrieval tool, `matminer.data_retrieval.retrieve_Citrine.CitrineDataRetrieval` is initialized using an api_key that can be found on your \"Account Settings\" tab under your username in the upper right hand corner of the user interface at [citrination.com](citrination.com). You can also set an environment variable, `CITRINE_KEY` to have your API key read automatically by the citrine informatics python API, (e. g. put `export CITRINE_KEY=YOUR_API_KEY_HERE` into your .bashrc)." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example 1: Get experimental band gaps of various entries with formula Si" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "cdr = CitrineDataRetrieval() # or CitrineDataRetrieval(api_key=YOUR_API_KEY) if $CITRINE_KEY is not set" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 7/7 [00:00<00:00, 73.09it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "all available fields:\n", "['chemicalFormula', 'Band gap-dataType', 'references', 'Band gap', 'Band gap-conditions', 'category', 'uid', 'Crystallinity', 'Band gap-units', 'Band gap-methods']\n", "\n", "suggested common fields:\n", "['chemicalFormula', 'references', 'Band gap', 'Band gap-conditions', 'Band gap-dataType', 'Band gap-methods', 'Band gap-units', 'Crystallinity']\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "df = cdr.get_dataframe(criteria={'formula':'Si', 'data_type': 'EXPERIMENTAL'}, \n", " properties=['Band gap'],\n", " secondary_fields=True)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "cdr.get_dataframe?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Example 2: Get adsorption energies of O\\* and OH\\*" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 9/9 [00:00<00:00, 15.47it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "all available fields:\n", "['chemicalFormula', 'Surface facet', 'Adsorption energy of OH-conditions', 'references', 'Morphology', 'category', 'uid', 'Adsorption energy of OH-units', 'Adsorption energy of OH', 'Adsorption energy of OH-dataType']\n", "\n", "suggested common fields:\n", "['chemicalFormula', 'references', 'Adsorption energy of OH', 'Adsorption energy of OH-conditions', 'Adsorption energy of OH-dataType', 'Adsorption energy of OH-units', 'Morphology', 'Surface facet']\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 21/21 [00:00<00:00, 35.11it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "all available fields:\n", "['Reconstruction', 'chemicalFormula', 'Surface facet', 'references', 'Adsorption energy of O-units', 'category', 'uid', 'Adsorption energy of O-conditions', 'Adsorption energy of O']\n", "\n", "suggested common fields:\n", "['chemicalFormula', 'references', 'Adsorption energy of O', 'Adsorption energy of O-conditions', 'Adsorption energy of O-units', 'Reconstruction', 'Surface facet']\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "df_OH = cdr.get_dataframe(criteria={}, properties=['adsorption energy of OH'], secondary_fields=True)\n", "df_O = cdr.get_dataframe(criteria={}, properties=['adsorption energy of O'], secondary_fields=True)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chemicalFormulareferencesAdsorption energy of OHAdsorption energy of OH-conditionsAdsorption energy of OH-dataTypeAdsorption energy of OH-unitsMorphologySurface facet
1Pt[{'citation': '10.1039/c2cc30281k', 'doi': '10...2.44NaNNaNeVNaN(111)
2Cu[{'citation': '10.1016/s1872-2067(12)60642-1',...-3.55NaNCOMPUTATIONALeVNaN(211)
3ZnO[{'citation': '10.1016/s1872-2067(12)60642-1',...-3.03NaNCOMPUTATIONALeVThin filmNaN
4Fe[{'citation': '10.1016/j.corsci.2012.11.011', ...-3.95NaNNaNeVNaN(100)
5Pt[{'citation': '10.1021/jp807094m', 'doi': '10....2.71[{'name': 'Site', 'scalars': [{'value': 'Top s...NaNeVNaN(111)
\n", "
" ], "text/plain": [ " chemicalFormula references \\\n", "1 Pt [{'citation': '10.1039/c2cc30281k', 'doi': '10... \n", "2 Cu [{'citation': '10.1016/s1872-2067(12)60642-1',... \n", "3 ZnO [{'citation': '10.1016/s1872-2067(12)60642-1',... \n", "4 Fe [{'citation': '10.1016/j.corsci.2012.11.011', ... \n", "5 Pt [{'citation': '10.1021/jp807094m', 'doi': '10.... \n", "\n", " Adsorption energy of OH Adsorption energy of OH-conditions \\\n", "1 2.44 NaN \n", "2 -3.55 NaN \n", "3 -3.03 NaN \n", "4 -3.95 NaN \n", "5 2.71 [{'name': 'Site', 'scalars': [{'value': 'Top s... \n", "\n", " Adsorption energy of OH-dataType Adsorption energy of OH-units Morphology \\\n", "1 NaN eV NaN \n", "2 COMPUTATIONAL eV NaN \n", "3 COMPUTATIONAL eV Thin film \n", "4 NaN eV NaN \n", "5 NaN eV NaN \n", "\n", " Surface facet \n", "1 (111) \n", "2 (211) \n", "3 NaN \n", "4 (100) \n", "5 (111) " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_OH.head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chemicalFormulareferencesAdsorption energy of OAdsorption energy of O-conditionsAdsorption energy of O-unitsReconstructionSurface facet
1Fe[{'citation': '10.1016/j.jcat.2007.04.018', 'd...-5.42NaNeVNaN(111)
2Pt[{'citation': '10.1002/cctc.201100308', 'doi':...1.53NaNeVNaN(111)
3Pt[{'citation': '10.1021/jp307055j', 'doi': '10....-4.54NaNeVNaN(111)
4Co[{'citation': '10.1021/jp710674q', 'doi': '10....2.37[{'name': 'Site', 'scalars': [{'value': 'FCC s...eVNaN(0001)
5Rh[{'citation': '10.1007/bf00806980', 'doi': '10...-300NaNkJ/molNaN(110)
\n", "
" ], "text/plain": [ " chemicalFormula references \\\n", "1 Fe [{'citation': '10.1016/j.jcat.2007.04.018', 'd... \n", "2 Pt [{'citation': '10.1002/cctc.201100308', 'doi':... \n", "3 Pt [{'citation': '10.1021/jp307055j', 'doi': '10.... \n", "4 Co [{'citation': '10.1021/jp710674q', 'doi': '10.... \n", "5 Rh [{'citation': '10.1007/bf00806980', 'doi': '10... \n", "\n", " Adsorption energy of O Adsorption energy of O-conditions \\\n", "1 -5.42 NaN \n", "2 1.53 NaN \n", "3 -4.54 NaN \n", "4 2.37 [{'name': 'Site', 'scalars': [{'value': 'FCC s... \n", "5 -300 NaN \n", "\n", " Adsorption energy of O-units Reconstruction Surface facet \n", "1 eV NaN (111) \n", "2 eV NaN (111) \n", "3 eV NaN (111) \n", "4 eV NaN (0001) \n", "5 kJ/mol NaN (110) " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_O.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MPDS - The Materials Platform for Data Science\n", "\n", "The [Materials Platform for Data Science](https://mpds.io/) interface is contained in `matminer.data_retrieval.retrieve_MPDS.MPDSDataRetrieval`, and is invoked using an API key and an optional endpoint. Similarly to the Citrine and MP interfaces, MPDS can be invoked without specifying your API key if MPDS_KEY is set as an environment variable (e. g. put `export MPDS_KEY=YOUR_MPDS_KEY` into your .bashrc or .bash_profile)." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "from matminer.data_retrieval.retrieve_MPDS import MPDSDataRetrieval" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": true }, "outputs": [], "source": [ "mpdsdr = MPDSDataRetrieval() # or MPDSDataRetrieval(api_key=YOUR_API_KEY)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `get_dataframe` method of the MPDSDataRetrieval class uses a search functionality documented on the [MPDS website](http://developer.mpds.io/#Categories). 
The `criteria` keyword argument takes a dictionary whose keys and values correspond to MPDS search categories and their values. Note that the search functionality of the MPDS interface may be severely limited without full (i.e. paid subscription) access to the database." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "Got 5 hits\r\n" ] } ], "source": [ "df = mpdsdr.get_dataframe(criteria={\"elements\": \"K-Ag\", \"props\": \"heat capacity\"})" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PhaseFormulaSGEntryPropertyUnitsValue
030000KAg4I5 rt213P1201629-3heat capacity at constant pressureJ K-1 g-at.-130.5
179286K2NaAg3[CN]6 lt12P1307433-3heat capacity at constant pressureJ K-1 g-at.-183.0
279286K2NaAg3[CN]6 lt12P1307433-4heat capacity at constant pressureJ K-1 g-at.-1112.0
379286K2NaAg3[CN]6 lt12P1307434-3heat capacity at constant pressureJ K-1 g-at.-178.0
479286K2NaAg3[CN]6 lt12P1307434-4heat capacity at constant pressureJ K-1 g-at.-111.0
\n", "
" ], "text/plain": [ " Phase Formula SG Entry \\\n", "0 30000 KAg4I5 rt 213 P1201629-3 \n", "1 79286 K2NaAg3[CN]6 lt 12 P1307433-3 \n", "2 79286 K2NaAg3[CN]6 lt 12 P1307433-4 \n", "3 79286 K2NaAg3[CN]6 lt 12 P1307434-3 \n", "4 79286 K2NaAg3[CN]6 lt 12 P1307434-4 \n", "\n", " Property Units Value \n", "0 heat capacity at constant pressure J K-1 g-at.-1 30.5 \n", "1 heat capacity at constant pressure J K-1 g-at.-1 83.0 \n", "2 heat capacity at constant pressure J K-1 g-at.-1 112.0 \n", "3 heat capacity at constant pressure J K-1 g-at.-1 78.0 \n", "4 heat capacity at constant pressure J K-1 g-at.-1 11.0 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MDF - The Materials Data Facility\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MDF data retrieval tool, `matminer.data_retrieval.retrieve_MDF.MDFDataRetrieval` is initialized using a Globus initialization key. Upon the first invocation of a MDFDataRetrieval object, you should be prompted with a string of numbers and letters you can enter on the MDF Globus authentication web site. One advantage of this system is that it doesn't actually require authentication at all. You can use `anonymous=True` and several of the MDF datasets will be available. However, a number of them will not, and you will have to authenticate using the web to access the entirety of MDF." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "from matminer.data_retrieval.retrieve_MDF import MDFDataRetrieval" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "mdf_dr = MDFDataRetrieval(anonymous=True) # Or anonymous=False if you have a Globus login" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "df = mdf_dr.get_dataframe(criteria={'elements': ['Ag', 'Be'], 'sources': [\"oqmd\"]})" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
crystal_structure.cross_reference.icsdcrystal_structure.number_of_atomscrystal_structure.space_group_numbercrystal_structure.volumedft.convergeddft.cutoff_energydft.exchange_correlation_functionalfiles.0.data_typefiles.0.filenamefiles.0.globus...oqmd.delta_e.unitsoqmd.delta_e.valueoqmd.magnetic_moment.unitsoqmd.magnetic_moment.valueoqmd.stability.unitsoqmd.stability.valueoqmd.total_energy.unitsoqmd.total_energy.valueoqmd.volume_pa.unitsoqmd.volume_pa.value
0NaN222124.0794True520.0PBEASCII text, with very long lines, with no line...86132.jsonglobus://e38ee745-6d04-11e5-ba46-22000b92c6ec/......NaNNaNbohr/atomNaNNaNNaNeV/atom-3.105143angstrom^3/atom12.0397
1NaN422540.8748True249.8PBEASCII text, with very long lines, with no line...113626.jsonglobus://e38ee745-6d04-11e5-ba46-22000b92c6ec/......NaNNaNbohr/atomNaNNaNNaNeV/atom-3.272963angstrom^3/atom10.2187
2NaN222125.2675True249.8PBEASCII text, with very long lines, with no line...537497.jsonglobus://e38ee745-6d04-11e5-ba46-22000b92c6ec/......NaNNaNbohr/atomNaNNaNNaNeV/atom-3.125454angstrom^3/atom12.6338
3NaN413940.6980True520.0PBEASCII text, with very long lines, with no line...71045.jsonglobus://e38ee745-6d04-11e5-ba46-22000b92c6ec/......eV/atom0.201601bohr/atomNaNeV/atom0.201601eV/atom-3.320249angstrom^3/atom10.1745
4NaN422540.8748True520.0PBEASCII text, with very long lines, with no line...113627.jsonglobus://e38ee745-6d04-11e5-ba46-22000b92c6ec/......eV/atom0.222443bohr/atomNaNeV/atom0.222443eV/atom-3.299407angstrom^3/atom10.2187
\n", "

5 rows × 47 columns

\n", "
" ], "text/plain": [ " crystal_structure.cross_reference.icsd crystal_structure.number_of_atoms \\\n", "0 NaN 2 \n", "1 NaN 4 \n", "2 NaN 2 \n", "3 NaN 4 \n", "4 NaN 4 \n", "\n", " crystal_structure.space_group_number crystal_structure.volume \\\n", "0 221 24.0794 \n", "1 225 40.8748 \n", "2 221 25.2675 \n", "3 139 40.6980 \n", "4 225 40.8748 \n", "\n", " dft.converged dft.cutoff_energy dft.exchange_correlation_functional \\\n", "0 True 520.0 PBE \n", "1 True 249.8 PBE \n", "2 True 249.8 PBE \n", "3 True 520.0 PBE \n", "4 True 520.0 PBE \n", "\n", " files.0.data_type files.0.filename \\\n", "0 ASCII text, with very long lines, with no line... 86132.json \n", "1 ASCII text, with very long lines, with no line... 113626.json \n", "2 ASCII text, with very long lines, with no line... 537497.json \n", "3 ASCII text, with very long lines, with no line... 71045.json \n", "4 ASCII text, with very long lines, with no line... 113627.json \n", "\n", " files.0.globus ... \\\n", "0 globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/... ... \n", "1 globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/... ... \n", "2 globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/... ... \n", "3 globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/... ... \n", "4 globus://e38ee745-6d04-11e5-ba46-22000b92c6ec/... ... 
\n", "\n", " oqmd.delta_e.units oqmd.delta_e.value oqmd.magnetic_moment.units \\\n", "0 NaN NaN bohr/atom \n", "1 NaN NaN bohr/atom \n", "2 NaN NaN bohr/atom \n", "3 eV/atom 0.201601 bohr/atom \n", "4 eV/atom 0.222443 bohr/atom \n", "\n", " oqmd.magnetic_moment.value oqmd.stability.units oqmd.stability.value \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN eV/atom 0.201601 \n", "4 NaN eV/atom 0.222443 \n", "\n", " oqmd.total_energy.units oqmd.total_energy.value oqmd.volume_pa.units \\\n", "0 eV/atom -3.105143 angstrom^3/atom \n", "1 eV/atom -3.272963 angstrom^3/atom \n", "2 eV/atom -3.125454 angstrom^3/atom \n", "3 eV/atom -3.320249 angstrom^3/atom \n", "4 eV/atom -3.299407 angstrom^3/atom \n", "\n", " oqmd.volume_pa.value \n", "0 12.0397 \n", "1 10.2187 \n", "2 12.6338 \n", "3 10.1745 \n", "4 10.2187 \n", "\n", "[5 rows x 47 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 421 entries in the Ag-Be chemical system\n" ] } ], "source": [ "print(\"There are {} entries in the Ag-Be chemical system\".format(len(df)))" ] } ], "metadata": { "kernelspec": { "display_name": "matminer python", "language": "python", "name": "matminer" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }