{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Submit Structures to MPComplete" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook documents the process of\n", "1. Taking and validating a collection of CIFs (e.g. in a ZIP file), creating pymatgen Structure objects\n", "3. Filtering for structures that are submittable to MP (e.g. they are not duplicates)\n", "4. Taking and validating common metadata for the structures (authors, references, etc.)\n", "5. Submitting the resulting StructureNL objects to MP-Complete via the pymatgen MPRester" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Taking and validating a collection of CIFs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first need a filename for the ZIP archive." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "zipfilename = '/Users/dwinston/Dropbox/best/structures/ever.zip'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a list of structures from the ZIP archive's CIF files. Anything invalid about the ZIP archive or CIF files will raise an exception here." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from zipfile import ZipFile\n", "from pymatgen.io.cif import CifParser\n", "\n", "structures = []\n", "\n", "myzip = ZipFile(zipfilename, 'r')\n", "for name in myzip.namelist():\n", " with myzip.open(name) as cif_file:\n", " structures.extend(CifParser(cif_file).get_structures())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "len(structures)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering for structures that are submittable to MP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reject structures already on MP web site." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from pymatgen import MPRester\n", "\n", "mpr = MPRester()\n", "\n", "mp_ids = []\n", "new_structures = []\n", "\n", "for s in structures:\n", " found = mpr.find_structure(s)\n", " if len(found) > 0:\n", " mp_ids.extend(found)\n", " else:\n", " new_structures.append(s)\n", "\n", "if len(mp_ids) > 0:\n", " print(\"Filtered out structures already on MP: {}\".format(mp_ids))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "len(new_structures)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a mock \"job\" for each structure, and then simulate the checks the submission processor does to reject jobs. The structures that pass here will actually spawn a ready workflow, so we will filter for such structures." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from pymatgen import Composition\n", "from pymatgen.matproj.snl import StructureNL\n", "\n", "def get_meta_from_structure(structure):\n", " \"\"\"Used by `structure_to_mock_job`, to \"fill out\" a job document.\"\"\"\n", " comp = structure.composition\n", " elsyms = sorted(set([e.symbol for e in comp.elements]))\n", " meta = {'nsites': len(structure),\n", " 'elements': elsyms,\n", " 'nelements': len(elsyms),\n", " 'formula': comp.formula,\n", " 'reduced_cell_formula': comp.reduced_formula,\n", " 'reduced_cell_formula_abc': Composition(comp.reduced_formula)\n", " .alphabetical_formula,\n", " 'anonymized_formula': comp.anonymized_formula,\n", " 'chemsystem': '-'.join(elsyms),\n", " 'is_ordered': structure.is_ordered,\n", " 'is_valid': structure.is_valid()}\n", " return meta\n", "\n", "def structure_to_mock_job(structure):\n", " # Needs at least one author. This is for a mock job, so can put whatever.\n", " snl = StructureNL(structure, [{\"name\": \"Evgraf Fedorov\", \"email\": \"symmetry@ftw.org\"}])\n", " job = snl.as_dict()\n", " if 'is_valid' not in job: job.update(get_meta_from_structure(snl.structure))\n", " sorted_structure = snl.structure.get_sorted_structure()\n", " job.update(sorted_structure.as_dict())\n", " return job\n", "\n", "# mpworks.processors.process_submissions.SubmissionProcessor#submit_new_workflow \n", "MAX_SITES = 200 # SubmissionProcessor.MAX_SITES above\n", "\n", "# from mpworks.workflows.wf_utils import NO_POTCARS\n", "NO_POTCARS = ['Po', 'At', 'Rn', 'Fr', 'Ra', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr']\n", "\n", "def job_is_submittable(job):\n", " snl = StructureNL.from_dict(job)\n", " if len(snl.structure.sites) > MAX_SITES:\n", " print 'REJECTED WORKFLOW FOR {} - too many sites ({})'.format(\n", " snl.structure.formula, len(snl.structure.sites))\n", " elif not job['is_valid']:\n", " print 'REJECTED WORKFLOW FOR {} - invalid structure (atoms too close)'.format(\n", " snl.structure.formula)\n", " elif len(set(NO_POTCARS) & set(job['elements'])) > 0:\n", " print 'REJECTED WORKFLOW FOR {} - invalid element (No POTCAR)'.format(\n", " snl.structure.formula)\n", " elif not job['is_ordered']:\n", " print 'REJECTED WORKFLOW FOR {} - invalid structure (disordered)'.format(\n", " snl.structure.formula)\n", " else:\n", " return True\n", " return False\n", "\n", "# No longer need separate reference for new_structures \n", "structures = new_structures\n", "\n", "submittables = []\n", "for s in structures:\n", " if job_is_submittable(structure_to_mock_job(s)):\n", " submittables.append(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Taking and validating common metadata for the structures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If there are issues with the metadata, an exception will be raised on attempting to create `snl_list`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# No longer need separate reference for submittables\n", "structures = submittables\n", "\n", "# List of (name, email) pairs\n", "authors = [\n", " ('Evgraf Fedorov', 'symmetry@ftw.org'),\n", " ('Arthur Schoenflies', 'art@berlin.de'),\n", "]\n", "\n", "# BiBTeX string of references\n", "references = \"\"\"\n", "@article{Graf1961,\n", "author = {Graf, Donald L},\n", "journal = {American Mineralogist},\n", "number = {11},\n", "pages = {1283--1316},\n", "title = {{Crystallographic tables for the rhombohedral carbonates}},\n", "volume = {46},\n", "year = {1961}\n", "}\n", "@article{Akao_1977,\n", "author = {Akao, M and Iwai, S},\n", "doi = {10.1107/s0567740877005834},\n", "journal = {Acta Crystallogr Sect B},\n", "month = {apr},\n", "number = {4},\n", "pages = {1273--1275},\n", "publisher = {International Union of Crystallography ({\\{}IUCr{\\}})},\n", "title = {{The hydrogen bonding of hydromagnesite}},\n", "url = {http://dx.doi.org/10.1107/s0567740877005834},\n", "volume = {33},\n", "year = {1977}\n", "}\n", "\"\"\"\n", "\n", "# Projects? List of strings.\n", "projects = []\n", "\n", "# Remarks? List of strings.\n", "remarks = []\n", "\n", "snl_list = StructureNL.from_structures(structures, authors, references=references,\n", " projects=projects, remarks=remarks)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submitting the structures to MP-Complete" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Using v1 endpoint\n", "mpr = MPRester(endpoint=\"https://www.materialsproject.org/rest/v1\")\n", "\n", "#mpr.submit_snl(snl_list)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }