{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Programmatically Access Materials Project Electrolyte Genome Data\n", "Donny Winston and Xiaohui Qu
\n", "Created: November 18, 2015
\n", "Last Update: April 19, 2018" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook documents URL patterns to access Electrolyte Genome data and provides examples of access using the Python `requests` library.\n", "\n", "If you have questions, please contact the Materials Project team. Contact information is available at https://materialsproject.org." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## URL patterns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is one way to query for results given search criteria (`results`), and there are a few ways to obtain data for individual molecules, either in full with metadata (`json`) or simply the structure for display (`svg`) or analysis (`xyz`). Below are the four corresponding URL patterns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "urlpattern = {\n", " \"results\": \"https://materialsproject.org/molecules/results?query={spec}\",\n", " \"mol_json\": \"https://materialsproject.org/molecules/{mol_id}/json\",\n", " \"mol_svg\": \"https://materialsproject.org/molecules/{mol_id}/svg\",\n", " \"mol_xyz\": \"https://materialsproject.org/molecules/{mol_id}/xyz\",\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "import sys\n", "if sys.version_info[0] == 2:\n", " from urllib import quote_plus\n", "else:\n", " from urllib.parse import quote_plus\n", "\n", "import requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Ensure you have an API key, which is located on your dashboard\n", "# (https://materialsproject.org/dashboard).\n", "\n", "MAPI_KEY = \"fAkEaP1K4y\" # <-- replace with your api key\n", "\n", "# Please do NOT share a notebook with others with your API key hard-coded in it.\n", "# One alternative: Load API key from a set environment variable, e.g.\n", "#\n", "# MAPI_KEY = os.environ['PMG_MAPI_KEY']\n", "#\n", "# Best alternative: Store and load API key using pymatgen, e.g.\n", "### Do once, on command line (without \"!\" in front) or in notebook\n", "# !pmg config --add PMG_MAPI_KEY \"your_api_key_goes_here\"\n", "### Then, in notebook/script:\n", "# from pymatgen import SETTINGS\n", "# MAPI_KEY = SETTINGS.get(\"PMG_MAPI_KEY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting a set of molecules" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Here is a function we'll use to get results. We'll walk though some examples that use it.\n", "\n", "def get_results(spec, fields=None):\n", " \"\"\"Take a specification document (a `dict`), and return a list of matching molecules.\n", " \"\"\"\n", " # Stringify `spec`, ensure the string uses double quotes, and percent-encode it...\n", " str_spec = quote_plus(str(spec).replace(\"'\", '\"'))\n", " # ...because the spec is the value of a \"query\" key in the final URL.\n", " url = urlpattern[\"results\"].format(spec=str_spec)\n", " return (requests.get(url, headers={'X-API-KEY': MAPI_KEY})).json()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Find molecules containing oxygen and phosphorous,\n", "# and collect the ionization energies (relative to a lithium electrode) of the results.\n", "\n", "# Separate elements with a \"-\"\n", "spec = {\"elements\": \"O-P\"}\n", "\n", "results = get_results(spec)\n", "\n", "# Not all molecules have data for all available properties\n", "ionization_energies = [molecule[\"IE\"] for molecule in results if \"IE\" in molecule]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "smiles\n", "oxidation_magnesium\n", "task_id\n", "oxidation_lithium\n", "svg\n", "oxidation_hydrogen\n", "EA\n", "point_group\n", "reduction_hydrogen\n", "MW\n", "charge\n", "reduction_magnesium\n", "formula\n", "IE\n", "reduction_lithium\n" ] } ], "source": [ "# Molecules with ionization energies (\"IE\") will have oxidation potentials relative to metallic electrodes,\n", "# available as \"oxidation_\" keys. \"IE\" itself is relative to lithium.\n", "# There is an analogous relationship between the presence of electron affinity (\"EA\") values\n", "# and corresponding \"reduction_\" keys for reduction potentials using a reference metal.\n", "\n", "# `task_id` is the molecule's identifier, which we'll use later in this notebook.\n", "\n", "# `MW` is molecular weight\n", "# `smiles`: https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system\n", "\n", "for key in results[0]:\n", " print(key)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# A \"silly\" example specification that demonstrates many keys available to query, and\n", "# the expected format of their value specifications.\n", "#\n", "# The \"$\"-prefixed keys are MongoDB syntax (https://docs.mongodb.org/manual/reference/operator/query/).\n", "\n", "spec = {\n", " \"elements\": \"C-H-O-F\",\n", " \"notelements\": [\"Al\", \"Br\"], # a list (inconsistent for now with \"elements\" -- sorry)\n", " \"charge\": {\"$in\": [0, -1]}, # {0, 1, -1}\n", " \"pointgroup\": \"C1\",\n", " \"functional_groups\": {\"$in\": [\"-COOH\"]},\n", " \"base_molecule\": {\"$in\": [\"s3\"]},\n", " \"nelements\": 4,\n", " \"EA\": {\"$gte\": 0.4}, # >= 0.4\n", " \"IE\": {\"$lt\": 5}, # < 5\n", " \"formula\": \"H11 C11 O4 F1\", # \"H11C11O4F\" works too\n", "}\n", "\n", "results = get_results(spec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we just want \"everything\"? Let's use an empty spec." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21954 molecules in total right now\n" ] } ], "source": [ "results = get_results({})\n", "print(\"{} molecules in total right now\".format(len(results)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above request might take some time, but hopefully not much more than a few seconds. Why do we allow this? Well, we don't return all the data for each molecule, and the total size of what we send right now is less than 10 MB.\n", "\n", "As our collection of molecules grows in size, this policy may change. So, please use targeted query specifications to get the results you need, *especially* if you want to periodically check for new molecules that meet some specification." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting data for individual molecules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get all data for a molecule given its ID." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def get_molecule(mol_id, fmt='json'):\n", " url = urlpattern[\"mol_\" + fmt].format(mol_id=mol_id)\n", " response = requests.get(url, headers={'X-API-KEY': MAPI_KEY})\n", " if fmt == 'json':\n", " return response.json()\n", " else:\n", " return response.content" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ID: mol-64060\n", "There are 29 key/value pairs in molecule mol-64060. Have a look around!\n", "scalable vector graphic saved\n", "XYZ file saved. Can load into molecule-viewer software.\n" ] } ], "source": [ "first_result = results[0]\n", "mol_id = first_result['task_id']\n", "print(\"ID: {}\".format(mol_id))\n", "\n", "# Get all data by default\n", "molecule = get_molecule(mol_id)\n", "print(\"There are {} key/value pairs in molecule {}. Have a look around!\".format(len(molecule), mol_id))\n", "\n", "# The SVG format provides a two-dimensional \"pretty picture\" of the molecular structure.\n", "svg_of_molecule = get_molecule(mol_id, fmt='svg')\n", "with open('molecule.svg','w') as f:\n", " f.write(svg_of_molecule)\n", " print(\"scalable vector graphic saved\")\n", "\n", "# The XYZ representation provided is the optimized geometry of the molecule in a charge-neutral state.\n", "xyz_of_molecule = get_molecule(mol_id, fmt='xyz')\n", "with open('molecule.xyz','w') as f:\n", " f.write(xyz_of_molecule)\n", " print(\"XYZ file saved. Can load into molecule-viewer software.\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 1 }