{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "When the RDKit has problems processing a molecule, it outputs information to the error console about what those problems were. Here's an example:\n", "```\n", "In [23]: m = Chem.MolFromSmiles('CO(C)C')\n", "[06:18:04] Explicit valence for atom # 1 O, 3, is greater than permitted\n", "```\n", "It's sometimes useful to have programmatic access to this information for later use in reporting.\n", "\n", "It also would be great if these types of messages were visible in the jupyter notebook.\n", "\n", "Brian Kelley recently added functionality to the RDKit to enable both of these things. For anyone interested, the two pull requests for those changes are: [#736](https://github.com/rdkit/rdkit/pull/736) and [#739](https://github.com/rdkit/rdkit/pull/739).\n", "\n", "This is a short note on how to take advantage of that.\n", "\n", "A couple of things to note: \n", " - This is currently in git and will be available in the 2016.03 release.\n", " - This post was written using Python3, some adaptation would be required for Python2." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Let's start by showing the standard state of affairs:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2016.03.1.dev1\n" ] } ], "source": [ "from rdkit import Chem\n", "from rdkit import rdBase\n", "print(rdBase.rdkitVersion)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "m = Chem.MolFromSmiles('CO(C)C')\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point there is an error message in the console I launched Jupyter from, but it sure would be nice if it were visible here.\n", "\n", "We can enable that by just importing the usual RDKit Jupyter integration code:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from rdkit.Chem.Draw import IPythonConsole" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "RDKit ERROR: [06:46:02] Explicit valence for atom # 1 O, 3, is greater than permitted\n" ] } ], "source": [ "m = Chem.MolFromSmiles('CO(C)C')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "RDKit ERROR: [06:46:11] Can't kekulize mol \n", "RDKit ERROR: \n" ] } ], "source": [ "Chem.MolFromSmiles('c1cc1')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "RDKit ERROR: [06:46:18] SMILES Parse Error: unclosed ring for input: 'c1'\n" ] } ], "source": [ "Chem.MolFromSmiles('c1')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "RDKit ERROR: [06:46:41] SMILES Parse Error: syntax error for input: 'Ch'\n" ] } ], "source": [ "Chem.MolFromSmiles('Ch')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far so good. What if I want to have access to the error messages as strings in Python?" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from io import StringIO\n", "import sys\n", "Chem.WrapLogs()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "error message: RDKit ERROR: [06:49:14] SMILES Parse Error: syntax error for input: 'Ch'\n", "\n" ] } ], "source": [ "sio = sys.stderr = StringIO()\n", "Chem.MolFromSmiles('Ch')\n", "print(\"error message:\",sio.getvalue())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can use this to write a bit of code that processes all of the molecules in an SDF and captures the errors:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def readmols(suppl):\n", " ok=[]\n", " failures=[]\n", " sio = sys.stderr = StringIO()\n", " for i,m in enumerate(suppl):\n", " if m is None:\n", " failures.append((i,sio.getvalue()))\n", " sio = sys.stderr = StringIO() # reset the error logger\n", " else:\n", " ok.append((i,m))\n", " return ok,failures\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import gzip,os\n", "from rdkit import RDConfig\n", "inf = gzip.open(os.path.join(RDConfig.RDDataDir,'PubChem','Compound_000200001_000225000.sdf.gz'))\n", "suppl = Chem.ForwardSDMolSupplier(inf)\n", "ok,failures = readmols(suppl)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2035 RDKit ERROR: [07:31:28] Explicit valence for atom # 0 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:28] ERROR: Could not sanitize molecule ending on line 404864\n", "\n", "11460 RDKit ERROR: [07:31:28] ERROR: Explicit valence for atom # 0 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:32] Explicit valence for atom # 2 Te, 4, is greater than permitted\n", "RDKit ERROR: [07:31:32] ERROR: Could not sanitize molecule ending on line 2344967\n", "\n", "17016 RDKit ERROR: [07:31:32] ERROR: Explicit valence for atom # 2 Te, 4, is greater than permitted\n", "RDKit ERROR: [07:31:34] Explicit valence for atom # 1 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:34] ERROR: Could not sanitize molecule ending on line 3489884\n", "\n" ] } ], "source": [ "for i,fail in failures:\n", " print(i,fail)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }