{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gathering and Investigating Materials Project Data\n", "\n", "This notebooks will show how you can use `requests` and `pandas` so gather and explore your data. Often times you will need to suply your data by other methods.\n", "\n", "The `api` that we will be using is the material project. Link to the [api description](https://materialsproject.org/docs/api#materials_.28calculated_materials_data.29)\n", "\n", "![Materials Projnect](../images/materials_project.png)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "base_url = 'https://materialsproject.org/rest/v2/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Materials Project Api Key\n", "\n", "This [link](https://www.materialsproject.org/open) details the steps necissary. \n", "\n", "1. Visit [dashboard](https://materialsproject.org/dashboard) you may need to login\n", "2. Generate API key if it has not already been generated and set `API_KEY` to this value.\n", "\n", "The subprocess method is a way that I store my passwords on my computer and will not work for you.\n", "\n", "Afterwards in the next cell we will test that our API key works. \n", "\n", "This is done by performing a `GET` or `POST` request to `https://www.materialsproject.org/rest/v1/api_check`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "API_KEY = subprocess.check_output('gopass www/materialsproject.com apikey'.split()).decode('utf-8')\n", "# API_KEY = \"\"\n", "\n", "session = requests.Session()\n", "session.headers.update({'X-API-KEY': API_KEY})" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'valid_response': True, 'api_key_valid': True}\n" ] } ], "source": [ "# for some reason the v2 API does not include an API check method??\n", "response = session.get(f'https://www.materialsproject.org/rest/v1/api_check')\n", "data = response.json()\n", "print(data)\n", "\n", "if not data['api_key_valid']:\n", " raise ValueError('You are not authenticated!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Materials Project API\n", "\n", "The materials project provides a RESTfull API for getting material properties which is detailed [here](https://www.materialsproject.org/docs/api#materials_.28calculated_materials_data.29).\n", "\n", "If you have followed the steps above you should be ready to parse materials project data.\n", "\n", "A RESTfull API is a nice way to expose data over the web. While they provide convenient methods for getting each individual material property they have a limit of 500 queries per day so we need to be efficient in our queries. To do this we will use the `npquery` to get properties in batch.\n", "\n", "Lets start by getting a list of materials that are compossed of the following elements `Fe`, `Ti`, `O`, `C`, `N`, `He`. This does not affect your API limit" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def get_materials(elements):\n", " elements_str = '-'.join(elements)\n", " response = session.get(f'{base_url}/materials/{elements_str}/mids')\n", " data = response.json()\n", " print(f'Found {len(data[\"response\"])} Materials in the Materials Project with the elements: {elements}')\n", " return data['response']\n", "\n", "def get_material_experimental_properties(mid):\n", " response = session.get(f'{base_url}/materials/{mid}/exp/')\n", " print(response.content)\n", " data = response.json()['response'][0]\n", " print(data)\n", " return data\n", "\n", "def get_material_vasp_properties(mid, piezoelectric=False, dielelectric=False):\n", " response = session.get(f'{base_url}/materials/{mid}/vasp/')\n", " material_data = response.json()['response'][0]\n", " \n", " if piezoelectric:\n", " response = session.get(f'{base_url}/materials/{mid}/vasp/piezo')\n", " data = response.json()\n", " if not data['valid_response']:\n", " material_data['piezoelectric'] = None\n", " else:\n", " material_data['piezoelectric'] = data['response']\n", " \n", " if dielelectric:\n", " response = session.get(f'{base_url}/materials/{mid}/vasp/diel')\n", " data = response.json()\n", " if not data['valid_response']:\n", " material_data['dielelectric'] = None\n", " else:\n", " material_data['dielelectric'] = data['response']\n", " \n", " return material_data" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 385 Materials in the Materials Project with the elements: ['Fe', 'O', 'Ni', 'He', 'Zn', 'Cu']\n" ] } ], "source": [ "material_ids = get_materials(['Fe', 'O', 'Ni', 'He', 'Zn', 'Cu'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic VASP properties\n", "\n", "Includes:\n", "\n", " - `energy`, `energy_per_atom`, `volume`, `formation_energy_per_atom`, `nsites`, `unit_cell_formula`, `pretty_formula`, `e_above_hull`, `spacegroup`, `icsd_ids`, `cif`, \n", "\n", " - properties: `band_gap`, `density`, `energry`, `energy_per_atom`, `formation_energy_per_atom`, `elascticity`, `total_magnetization`\n", " \n", "But some properties are still not included:\n", " \n", " - `piezo`, `diel`" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "# MgO\n", "\n", "material_id = 'mp-1265'\n", "\n", "# Na2O\n", "\n", "material_id = 'mp-776952'" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['energy', 'energy_per_atom', 'volume', 'formation_energy_per_atom', 'nsites', 'unit_cell_formula', 'pretty_formula', 'is_hubbard', 'elements', 'nelements', 'e_above_hull', 'hubbards', 'is_compatible', 'spacegroup', 'task_ids', 'band_gap', 'density', 'icsd_id', 'icsd_ids', 'cif', 'total_magnetization', 'material_id', 'oxide_type', 'tags', 'elasticity', 'full_formula', 'piezoelectric', 'dielelectric'])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = get_material_vasp_properties(material_id, piezoelectric=True, dielelectric=True)\n", "data.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basic Experimental properties\n", "\n", "Turns out to be thermochemical data and not worth looking at" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_material_experimental_properties(material_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Let's gather the material data\n", "\n", "The Material Project definently is not enforcing their `500` materials per day rate limit.\n", "\n", "Also if you have a query that get greater than 3,000 materials it fails. Thus why some are commented out." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "materials_data = {}" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 2661 Materials in the Materials Project with the elements: ['H', 'He', 'O', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']\n", "Number of materials 2661\n" ] } ], "source": [ "# Lets just grab a bunch of materials\n", "material_ids = get_materials(['H', 'He', \n", " #'Li', 'Be', \n", " #'B', 'C', 'N', \n", " 'O', \n", " #'F', 'Ne', \n", " #'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar'\n", " 'K', 'Ca', \n", " 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn',\n", " # 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr',\n", " ])\n", "print('Number of materials', len(material_ids))" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# store the results\n", "for mid in material_ids:\n", " if mid in materials_data:\n", " continue\n", " materials_data[mid] = get_material_vasp_properties(mid)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6928" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(materials_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Save all of the downloaded data to a json file" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "import json" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "json.dump(materials_data, open('mpdata.json', 'w'))" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12K\t1-gather-data.ipynb\n", "4.0K\tOverview.ipynb\n", "22M\tmpdata.json\n" ] } ], "source": [ "! du -sh *" ] } ], "metadata": { "kernelspec": { "display_name": "scratch", "language": "python", "name": "scratch" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }