{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "When the RDKit has problems processing a molecule, it writes information about those problems to the error console. Here's an example:\n", "```\n", "In [23]: m = Chem.MolFromSmiles('CO(C)C')\n", "[06:18:04] Explicit valence for atom # 1 O, 3, is greater than permitted\n", "```\n", "It's sometimes useful to have programmatic access to this information for later use in reporting.\n", "\n", "It would also be great if these messages were visible in the Jupyter notebook.\n", "\n", "Brian Kelley recently added functionality to the RDKit to enable both of these things. For anyone interested, the two pull requests for those changes are: [#736](https://github.com/rdkit/rdkit/pull/736) and [#739](https://github.com/rdkit/rdkit/pull/739).\n", "\n", "This is a short note on how to take advantage of that.\n", "\n", "A couple of things to note:\n", " - This functionality is currently in git and will be available in the 2016.03 release.\n", " - This post was written using Python 3; some adaptation would be required for Python 2." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Let's start by showing the standard state of affairs:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2025.03.2\n" ] } ], "source": [ "from rdkit import Chem\n", "import rdkit\n", "\n", "print(rdkit.__version__)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[15:44:04] Explicit valence for atom # 1 O, 3, is greater than permitted\n" ] } ], "source": [ "m = Chem.MolFromSmiles('CO(C)C')\n", "m" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[15:44:15] Can't kekulize mol. 
Unkekulized atoms: 0 1 2\n" ] } ], "source": [ "Chem.MolFromSmiles('c1cc1')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[15:44:15] SMILES Parse Error: unclosed ring for input: 'c1'\n" ] } ], "source": [ "Chem.MolFromSmiles('c1')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[15:44:16] SMILES Parse Error: syntax error while parsing: Ch\n", "[15:44:16] SMILES Parse Error: check for mistakes around position 2:\n", "[15:44:16] Ch\n", "[15:44:16] ~^\n", "[15:44:16] SMILES Parse Error: Failed parsing SMILES 'Ch' for input: 'Ch'\n" ] } ], "source": [ "Chem.MolFromSmiles('Ch')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far so good. What if I want access to the error messages as strings in Python?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from io import StringIO\n", "import sys\n", "from rdkit import rdBase\n", "# Chem.WrapLogs() is not available in recent RDKit releases (it raises AttributeError);\n", "# rdBase.LogToPythonStderr() sends the RDKit log messages to Python's sys.stderr instead.\n", "rdBase.LogToPythonStderr()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"error message: RDKit ERROR: [06:49:14] SMILES Parse Error: syntax error for input: 'Ch'\n", "\n" ] } ], "source": [ "sio = sys.stderr = StringIO()\n", "Chem.MolFromSmiles('Ch')\n", "print(\"error message:\", sio.getvalue())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can use this to write a bit of code that processes all of the molecules in an SDF and captures the errors:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def readmols(suppl):\n", "    \"\"\"Return (index, mol) pairs for the molecules that parsed and (index, error text) pairs for the failures.\"\"\"\n", "    ok = []\n", "    failures = []\n", "    old_stderr = sys.stderr\n", "    sio = sys.stderr = StringIO()\n", "    try:\n", "        for i, m in enumerate(suppl):\n", "            if m is None:\n", "                failures.append((i, sio.getvalue()))\n", "                sio = sys.stderr = StringIO()  # reset the error logger\n", "            else:\n", "                ok.append((i, m))\n", "    finally:\n", "        sys.stderr = old_stderr  # restore stderr so later messages are not swallowed\n", "    return ok, failures\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import gzip\n", "import os\n", "from rdkit import RDConfig\n", "\n", "fn = os.path.join(RDConfig.RDDataDir, 'PubChem', 'Compound_000200001_000225000.sdf.gz')\n", "with gzip.open(fn) as inf:  # the file is closed once the molecules have been read\n", "    suppl = Chem.ForwardSDMolSupplier(inf)\n", "    ok, failures = readmols(suppl)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2035 RDKit ERROR: [07:31:28] Explicit valence for atom # 0 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:28] ERROR: Could not sanitize molecule ending on line 404864\n", "\n", "11460 RDKit ERROR: [07:31:28] ERROR: Explicit valence for atom # 0 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:32] Explicit valence for atom # 2 Te, 4, is greater than permitted\n", "RDKit ERROR: [07:31:32] ERROR: Could not sanitize molecule ending on line 2344967\n", "\n", "17016 RDKit ERROR: [07:31:32] ERROR: Explicit valence for atom # 2 Te, 4, is greater than permitted\n", "RDKit ERROR: [07:31:34] Explicit valence for atom # 1 Br, 5, is greater than permitted\n", "RDKit ERROR: [07:31:34] ERROR: Could not sanitize molecule ending on line 3489884\n", "\n" ] } ], "source": [ "for i, fail in failures:\n", "    print(i, fail)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }