{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Back in 2010 Rajarshi Guha [blogged](http://blog.rguha.net/?p=850) about converting the [PAINS substructure filters](http://dx.doi.org/10.1021/jm901137j) from [SLN](http://pubs.acs.org/doi/abs/10.1021/ci960109j) to SMARTS. Given the paucity of tools that effectively support SLN and the usefulness of the PAINS filters, these SMARTS patterns really caught on. Rajarshi also ended up publishing [a paper](http://doi.wiley.com/10.1002/minf.201100076) together with J. Baell, one of the original authors of the PAINS paper, that provided KNIME workflows for using the RDKit and Indigo to filter PAINS compounds. A big focus of the paper was the difference in the number of matches found by the two toolkits. The RDKit matched far fewer structures than Indigo did, but neither matched as many as expected.\n", "\n", "With the advent of the [FilterCatalog functionality](https://github.com/rdkit/rdkit/pull/536) in the RDKit, we included the \"standard\" SMARTS version of PAINS. Axel Pahl quickly [pointed out](https://github.com/rdkit/rdkit/pull/536#issuecomment-123706922) some problems that he had experienced using the PAINS filters with the RDKit. Axel also pointed out that we really didn't have a good test set for the filters.\n", "\n", "This all led me to spend some time looking at the SMARTS versions of the PAINS filters and the way the RDKit handles them. It's fortunate that I had a good-sized block of time, because this turned into a larger task than I had anticipated. I ended up making a number of improvements to the way Hs from SMARTS are handled, fixing [some](https://github.com/rdkit/rdkit/issues/544) [bugs](https://github.com/rdkit/rdkit/issues/557), and modifying a lot of the PAINS SMARTS definitions. 
This required a bunch of iterative tweaking and testing that I won't describe in detail here, but I think it's worthwhile getting into a bit of what I did and why the changes were necessary.\n", "\n", "An aside here: SLN is an interesting format that never really caught on. This set provides a good opportunity for testing and refining the RDKit's SLN support, which has been there for a while at least for standard molecules, but is not extensively tested and would need to be extended to support more query features." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2015.09.1.dev1\n", "Sun Aug 9 15:04:15 2015\n" ] } ], "source": [ "from rdkit import Chem\n", "import time,random\n", "from rdkit.Chem.Draw import IPythonConsole\n", "IPythonConsole.molSize = (450,250)\n", "from rdkit import rdBase\n", "from __future__ import print_function\n", "%load_ext sql\n", "print(rdBase.rdkitVersion)\n", "print(time.asctime())" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## H atoms in SMARTS\n", "\n", "In order for any of this to make sense, it's first important to understand what the RDKit does with H atoms in SMARTS.\n", "\n", "Here's one of the SMARTS from the PAINS set:\n", "\n", "```\"[#8]=[#16](=[#8])-[#6](-[#6]#[#7])=[#7]-[#7]-[#1]\",\"\"```\n", "\n", "Here's a rendering of the pattern:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAcIAAAD6CAYAAAAyRkcxAAAX4ElEQVR4nO3deXBV9d3H8U8SsxAg\n", "bCGsRkgEcilRURFUKqKAlQFFmMieErFgaatitW5TtTpKRJ2RWm3FgpiwCARBNtfSCiqKIBAaQoSE\n", "sGQRZElIzX7P88cZH0WSC4Sbe27u7/2aYYRzT8798gfz9uxBlmVZAgDAUMFODwAAgJMIIQDAaIQQ\n", "AGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxG\n", "CAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDA\n", 
"aIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRAC\n", "AIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEI\n", "IQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAY\n", "jRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohhP9au1bq00cKD7f/u26d0xMBCECE\n", "EP5p507prrukWbOko0el1FQpJUXKzHR6MgABJsiyLMvpIYAzjB0rDRokzZjx47JXX5U2bZLeftu5\n", "uQAEHEII/9S9u7Rxo3TxxT8uO3TIjmNennNzAQg4hBD+KSJCKimxzw/+oLJSat1aKi93bCwAgYdz\n", "hPBPnTpJR46cvuzIEXs5AHgRIYR/6tdPWrPm9GWrV9vLAcCLLnJ6AKBOjz0mDRsmxcZKN9xgXyTz\n", "l79IH37o9GQAAgznCOG/Vq+WHn1U2rtX6tHDvoVi5EinpwIQYAghAMBonCNE07J2rTRzptNTAAgg\n", "hBD+5+WXpa+/rvuzigppxQqfjgMgsBFC+J+MDOnf/677M5dLOnxYKi317UwAAhYhhP9xuaTs7Lo/\n", "69FDCgmRvvnGtzMBCFiEEP7HUwjDwqS4uPo/B4DzRAjhf1wuafduz58TQgBeQgjhf1wu6eRJ6dtv\n", "6/+cEALwEkII/xMbKzVvXn/sCCEkvfTSS+e1HKgPIYT/CQ6Wevb0HMLcXPttFABwgQgh/JOnvb6E\n", "BKm21o4hAFwgQgj/5CmELVtKXbpweBSAV/D2Cfgnl0v6xz88f04Ijcf5QHgDIYR/crmkggL7LfWt\n", "Wp3xsfuqRNW4CxXmwGjwH3/84x/PWEYccb44NAr/1KOHFBoq5eTU+fGxB3ooN+kLHw8FIBARQvin\n", "0FDlfH2ljsXXffgzIsKlioocSW7fzgUg4BBC+K2LmnVSRUXdIWzWzCW3+3tVVR308VRwQlVVlT79\n", "9FPV1tY6PQoCEC/mhd8qKHhMFRVZio9/t87Pd+6MVrdu6WrV6lYfTwZfOnnypFatWqWQkBCNGTNG\n", "kZGRTo+EAMMeIfxWRIRL5eX1XxkaEZFQ7x5jIAsKkvr3r3t5oDl8+LAWL16s6OhojRs3jgiiURBC\n", "+K1mzVyqqsqTZdX9BBn7PKF5IZSkNm2kpUudnqJxZWZmKiMjQ4mJiRoxYoRCQ0OdHgkBihDCb0VE\n", "JMiy3Kqo2FvP5+aG8MUXpT//WaqqcnoS76utrdUHH3ygTz75RCNGjNAvf/lLp0dCgCOE8FvBwS0U\n", "Fta13tjZh049vK4pgPXpIw0eLL36qtOTeFdZWZmWLl2qw4cPa8KECbr00kudHgkGIITwa572+po1\n", "c6m29oRqao74eCr/8PTT0pw59hurAkFhYaEWLlyo8PBwTZw4Ue3atXN6JBiCEMKveQphWFisgoOb\n", "e7ygJpB16CBNmyY9+6zTk1y47OxsLV++XL169dLo0aMVERHh9EgwCCGEX/N85WiwIiJ6GnueUJJm\n", "zpTefVc6cODHZX/9q7R5s3MznY+aGum++6T1649p2LBhGjx4sIIC8fJX+DVCCL/WrJlLlZU5sqy6\n", "b6Q2+YIZSWrWTHrySemxx+w/u93SZ59JN94o9e4tvfCCVFzs6Ij1OnZM+tWvpBUrpBtuGCiXy+X0\n", "SDAUIYRfi4hwye2uUFXVgTo/b9duiqKihvl4Kt+qrZVOnKj/8wkTpH377N8HB9u3VRQXS/ffLy1Z\n", 
"InXtKg0dKi1fLlVX+2Tks9q1S+rXT6qokLZutX8POIUny8Dv5ecnq2PHxxUR0cvpUXzu+HFp3Dgp\n", "KkrKyGjYNrKypPR0ad48O6pJSdJvfytdcYVXRz1na9dKEyfaf69XXpHCeIUIHEYIAT+VkyONGmXf\n", "PL9ihdSp04Vtr7JSWr1aSkuT3nvPDuHkydKkSZIvLtC0LGn2bOmJJ+wrXh9+uPG/EzgXhBBNSnV1\n", "sQoK/qTS0o9UU3NCzZtfo44d/6RWrUY4PZpXrVtn7zWNGCG98YZ9LtCbCgqkhQvtbRcUSCNH2lEc\n", "PlwKCfHud0lSWZmUnCxt3CgtWybddJP3vwNoKM4RoknJz0/WRRfFyOXaob59S9S58zM6evTvTo/l\n", "NZYlPf+8NHq09Oijdqy8HUFJ6tLF3iP75hvpo4/svc7x46Vu3aRHHpHy8rz3Xbm50rXX2tvcupUI\n", "wv+wR4gmZfv2Frr88mIFB7dwehSvKyuTpkyR/vMf+4KXm2/27feXlkqrVtnnEzdssOOVnGzvmTZv\n", "3rBtbt0q3XKLfbHO/PkSz8yGPyKEaFJycgYpMvJyxcTcq/DwwHn81qFD0h132M8OffddqXt3Z+fZ\n", "s0dasMD+VVEhjR1rHzodOPDMdYOCpGuukb788szlJSXSokXSPfcE5tsxEBg4NAq/VN99g3Fxy2RZ\n", "1crJuVE7drTW/v0TVV1d6OPpvGvTJunqq+2LYT791PkISlJCgpSaagd6wQKpqMh+tulll0kvv2zf\n", "A/hT9b0NIyrKvkKVCMKfEUL4nerqIuXkDNTx42+f8VloaAfFxv5dl112WH36fKPQ0I7av3+CJKm8\n", "/L9yuyt8Pe4FmTtXGjJESkmx9wSjoupf1+2WPv/cd7NJUmiofeXq6tV2FKdOtcP49denrxfIb8NA\n", "4OPQKPzK999vV27uKEVEJCgu7m2FhLTxuH5t7SllZnZS375lys7up8rKXLVtO17R0SmKjLzaR1Of\n", "v8pK6Xe/s/ei0tLsw6KelJbatzls22bfVtHC4VOklvXjXl5QkP3n6dPtPcmZM09fDvg79gjhN44f\n", "X6KcnOsVFfUrXXrpujojuG/fCJ069Ync7grV1Hyn4uJnFRl5lSQpIeELxcUtU03NUe3Zc52ysnqr\n", "uPh5v3s7RVFRkYYPr9Unn0hffHH2CO7bJ113nb1H9tlnzkdQqvtQZ6C9DQPmIIRwnGXVqqDgER04\n", "cJdiY1/TJZe8rqCgi+pct33736qw8M/asaO1srJ6q7Jyn7p3T5ckBQWFKCpqiOLilikx8aDat5+u\n", "48cXKzPzYu3bN1InTiyXZTn7jLEtW7aoX79+iotbrC+/lH7xC8/rf/CBfSFKnz52BLt188mYDRJI\n", "b8OAWTg0CkfV1BzX/v3jVF6epfj4d9S8eX+vf8f332/TsWNpOn58oYKCwtWmTZKio+9Ws2aJXv8u\n", "T5YsWaKpU6dq8uTJ+tvf/qbQ0FCP68+dK/3hD/b9hE8+6b8XnPz0EGh5uXT55fa9id26cWgUTQMh\n", "hGMqKnKUmztKISFtFB+/QqGhnp8hVl39rUJDOzT4+9zuCpWUrNF3381Vaem/FBl5pdq3n6Y2bcYr\n", "JKRlg7d7NrW1tXr88cc1Z84cvfbaa0pJSfG4fmWlfbvBihX2PX23395oo3nFz88FLlokrV8vLV5M\n", "CNE0EEI4oqRknfbvn6hWrUbokkveUHCw58enfPfdGzp06H717p2p8PD4C/7+yspcHTu2QMeOLVBN\n", "zUkFB09RaemdGjhwoFffh1daWqpJkyZp27Zteuedd9S/v+c93sJC+5zhiRP2VaRN4c1EPw+hZUkD\n", 
"BkhbthBCNA2cI4RvWZaKDz2r3NzR6tz5GXXvvtBjBC2rRgUFj+jQoXsVG/sPr0RQksLD49W58zNK\n", "TDyg+PgM7doVpqFDh6pXr16aNWuWCgoKLvg7vvnmG/Xv319Hjx7V1q1bzxrBzZvt+wnbtrUj0hQi\n", "KJ0Zu6Ag++Z6IoimghDCd8rKpKQktX5mk3r0eF8xMX/wuHpNzTHt3XuLjh9fpF69Nqldu8mNMFSw\n", "oqJu0aRJL6m4uFgPPvig1q1bp9jYWA0dOlRpaWkqLy8/762uX79e11xzja666ipt2LBBnc7y6oiF\n", "Cz/S0KH2I83WrZNat27gXwfA+bMAXzh40LKuvNKyLrvMsvLyzrr699/vtHbt6m7t2XO9VVVV7IMB\n", "T5eVlWU9/PDDVvv27a3WrVtb06ZNs7Zt23bWn3O73VZqaqoVFhZmpaamnnX96upq6/e//70VGRlp\n", "vfPOLm+MDuA8cY4QjW/jRvttsP37269T8PT4FEknTixXfn6K2radqNjYVxQU5NybW6uqqvTBBx8o\n", "PT1dK1euVM+ePZWcnKypU6cqOjr6tHX/97//acqUKdqwYYOWLl2qIUOGeNz2sWPHNHbsWO3Zs0cr\n", "V65UP17TDjiCEKJx/XAPwMyZ0nPPScGejsZbKi6ercLCJ9S16wuKibnXZ2Oei8LCQqWnp2vevHk6\n", "ePCgbrvtNk2ePFm33nqrioqKdMcdd6iqqkqrVq1SXFycx21lZmZq1KhR6ty5szIyMtSxY0cf/S0A\n", "/BwhROOorJRmzLDfwnoOzxCrrT2l/PxklZV9qri4ZWrZcrCPBj1/lmVp06ZNmj9/vjIyMtSmTRuV\n", "l5dr0KBBeuutt9TiLI9+WbNmjSZNmqRx48bplVdeUViYc3u8AAghGkNRkf1m2aNH7XsAzvL4FPeB\n", "fdpTNkpBweGKj1+lsLCLfTTohTt16pQeeughvffee8rPz/d464VlWZo9e7aeeOIJPf3003r44Yd9\n", "OCmA+nDVKLxr+3b7JrKoKOmrr87pGWLBfa/RJV/dql69Pm1SEZSkli1baurUqSosLFSVh1cvlJWV\n", "acyYMXrxxRf1/vvvE0HAjxBCeM+KFfbToZOS7EeLtPH85gjNnSvddpt0771q/uvZZ72p3l8lJCSo\n", "trZWubm5dX6em5urAQMGKC8vT1999ZUGD/bfw76Aiep+sjHQED17Sm+8Yb8vyJOfPkNs2TL/f4bY\n", "WbRs2VJdunRRdna2evfufdpnlmXpzjvvVGJioubNm6fIyEiHpgRQH0II70lMtH95UlBgnz88ccJ+\n", "/EhTeXzKWbhcLmVnZ5+xPCgoSOvXr1dMTIxXH90GwHs4NArf+fzzpvkMsXNQXwglqUOHDkQQ8GOE\n", "EA23dq39orzwcPu/69bVv+7ChdKQIdKvf23/XIA9Q8xTCAH4N0KIhtm5U7rrLmnWLPs2idRUKSVF\n", "ysw8fb2aGumRR6Tf/EZ6/XV7vZAQZ2ZuRC6XSzk5OXK73U6PAuA8cR8hGmbsWGnQIPum+R+8+qq0\n", "aZP09ts/LhszRtq6VVq1Surb1+dj+sqRI0fUoUMH7d+/X938+TXyAM7AHiEaZssWaeTI05fddpu9\n", "/Kceesi+nzCAIyhJMTExateuHYdHgSaIEKJhioqkmJjTl8XE2Mt/asCAM9cLUAkJCYQQaIIIIRqm\n", "UyfpyJHTlx05Yi83FBfMAE0TIUTD9OsnrVlz+rLVq+3lhiKEQNPEDfVomMcek4YNk2JjpRtusC+S\n", "+ctfpA8/dHoyx7hcLu3evdvpMQCcJ64aRcOtXi09+qi0d6/Uo4d9a8TPL6AxSH5+vrp3767i4mJ1\n", 
"6NDB6XEAnCNCCHiJ2+1WVFSU1q5dqxtvvNHpcQCcI84RAl4SHBysnj17cp4QaGIIIeBFXDADND2E\n", "EPAiQgg0PYQQ8CJCCDQ9hBDwIpfLpYKCApWUlDg9CoBzRAgBL+rRo4dCQ0OVk5Pj9CgAzhEhBLwo\n", "NDRUcXFxHB4FmhBCCHgZ5wmBpoUQAl5GCIGmhRACXkYIgaaFEAJe5nK5lJeXp8rKSqdHAXAOCCHg\n", "ZQkJCXK73dq7d6/TowA4B4QQ8LIWLVqoa9euHB4FmghCCDQCzhMCTQchBBoBIQSaDkIINAJPIayt\n", "rVVqaqry8/N9OxSAOhFCoBG4XC7l5OSotrb2jM+Kior07rvvKj4+XkOGDNGiRYtUXl7uwJQAJEII\n", "NAqXy6WKigodOHDgjM+6du2qzZs3a/fu3br66qv14IMPqlOnTkpOTtbHH3/swLSA2Qgh0Ajat2+v\n", "6Ohoj+cJe/XqpdTUVB0+fFgZGRmqqKjQ8OHD1bt3bz3//PP69ttvfTgxYC5CCDSShISEc7pgJiQk\n", "REOGDNGyZctUXFys+++/X0uWLFGXLl00dOhQLV++XNXV1T6YGDATIQQaSUOuHG3btq2mTZumHTt2\n", "6Msvv1Tv3r01Y8bvdN11tbrvPikzs5GGBQxGCIFGcu211yorK0sLFixQWVnZef/8VVddpTlz5ujg\n", "wcO6994I7dol9e0rXX+9NG+edOpUIwwNGCjIsizL6SGAQPXUU09p3rx5KikpUVJSklJSUjRw4MAG\n", "b+/wYWnRIun116WiImnkSGnaNOnmm6WgIC8ODhiEEAKNzO12a8OGDUpLS9OKFSvUtWtXjR8/Xikp\n", "KbrkkksauE3p88+l9HRp4UKpXTtpwgTpnnukbt28Oz8Q6Agh4EMnT57UsmXLNHfuXG3fvl033XST\n", "Jk+erKSkJDVr1qxB2zxxQlq8WHrzTWnHDmnoUGnWLOmKK7w6OhCwCCHgkKysLKWnp2v+/PmqqalR\n", "UlKS7rnnHvXt27fB28zMlObPl6ZPl1wuLw4LBDBCCDissrJSH374odLT07Vy5Ur17NlTycnJmjp1\n", "qqKjo50eDwh4hBDwI4WFhUpPT9c///lPxcamq127AZo8Wbr1Vumii5yeDghMhBDwQ5ZlaePGGs2f\n", "H6qMDPtimClT7F9xcU5PBwQWQgg0ou3btys7O1ujR49WREREg7ZRWiqtWmVfIfqvf0lXXmnfMjFh\n", "gtSihXfnBUzEDfVAI9qzZ48eeOABderUSTNmzNDWrVvPextRUVJysvTRR1J2tjRihPTss1KXLvby\n", "jz+W+N9ZoOEIIdCIxo8fr4KCAi1fvlzfffedrrvuuv9/qPaRI0fOe3u9eklPPSXl5UkrVkgVFdLw\n", "4fYVok89JR08eObPBAVJ/fvXvRwAIQQa3U8fqn3w4EFNnz5dS5YsUefOnRv8UO2QEGnIEGnZMvtp\n", "M9OmSRkZUny8dPvt0nvvnb5+mzbS0qVe/EsBAYRzhIBDtm3bprS0NC1cuFDh4eFKSkrS3XffrcTE\n", "xAZvc8sW+8b6Fi2kF16wlwUFSbt2SaNHS//9rxQW9uNy/vUDhBBwXEVFhdasWaO5c+equLibwsPf\n", "0LRp0vjxUsuWF779H4I3fbqUkCDNnHn6csB0hBDwI3l5lXrzzXC99ZZ0/LiUlCTddZc0cGDDz+n9\n", "ELxvv7XPFe7YIbVuTQiBHxBCwA+53dKGDVJamn1RTHS0vYfYkIdq/zR4zz1nP5v0hRcIIfADQgj4\n", "uZIS+0KXtDRp82bpppukyZPtvcVzeU73T4NXXi5dfrl9K0a3boQQkAgh0KTs3Gk/VHvRIvvPEydK\n", 
"s2dL4eH1/8zP9/wWLZLWr7ffWMG/foAQAk1SZaW0Zo19+PS11zyv+/MQWpY0YIB9hSn/+gFCCAAw\n", "HDfUAwCMRggBAEYjhAAAoxFCAIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYjhAAAoxFC\n", "AIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYjhAAAoxFCAIDRCCEAwGiEEABgNEIIADAa\n", "IQQAGI0QAgCMRggBAEYjhAAAoxFCAIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYjhAAA\n", "oxFCAIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYjhAAAoxFCAIDRCCEAwGiEEABgNEII\n", "ADAaIQQAGI0QAgCMRggBAEYjhAAAoxFCAIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYj\n", "hAAAoxFCAIDRCCEAwGiEEABgNEIIADAaIQQAGI0QAgCMRggBAEYjhAAAoxFCAIDR/g+p5yT8wRIt\n", "egAAAABJRU5ErkJggg==\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sma = '[#8]=[#16](=[#8])-[#6](-[#6]#[#7])=[#7]-[#7]-[#1]'\n", "pains8 = Chem.MolFromSmarts(sma)\n", "pains8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And a molecule that should match:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAcIAAAD6CAYAAAAyRkcxAAA4n0lEQVR4nO3deVxU9f4/8NcMO7II\n", "IrvIIJmiqIihJto10TbRrxp5KdTKwqUbtvnDuhW23HunXc3yYovBzeVq3m6uFbYYaLkBCgYpCsgi\n", "u+wMwzDv3x/nuuAMyDIzZ4Z5Px8PHsn5zJx58Uh5zefM55wjISICY4wxZqakYgdgjDHGxMRFyBhj\n", "zKxxETLGGDNrXISMMcbMGhchY4wxs8ZFyBhjzKxxETLGGDNrXISMMcbMGhchY4wxs8ZFyBhjzKxx\n", "ETLGGDNrXISMMcbMGhchY4wxs8ZFyBhjzKxxETLGGDNrXISMMcbMGhchY4wxs8ZFyBhjzKxxETLG\n", "GDNrXISMMcbMGhchY4wxs8ZFyBhjzKxxETLGGDNrXISMMcbMGhchY4wxs8ZFyJgJkkiAL77Q3MYY\n", "6zkuQsZMVHIyUFgodgrGTB8XIWMmauNGYPlyQK0WOwljpo2LkDETFRQEzJoFfPCB2EkYM21chIyZ\n", "sGeeAQ4dAs6eFTsJY6ZLQkQkdgjGWM9IJMDVf7mFhcDixcAvv1zfxhjrPp4RMmbihg4FHntM7BSM\n", "mS6eETLGGDNrPCNkzET9/jtw//1ip2DM9HERMmaibGyAgweBK1fETsKYaeMiZMxE+fkBFhZAfr7m\n", "WEIC8Nprhs/EmCniImTMyCmVwJEjmitCrawAHx/tRahSARkZhsnHmKnjImTMyFVUAOHhQHm55phM\n", "Bly8qH27toJkjGniImTMyHl7C58Haiu2zgovIICLkLHu4iJkzMhJpcLngT0pQpkMaGgAqqv1n48x\n", 
"U8dFyJgJ6OwQaGczvyFDAEtLnhUy1h1chIyZgK5mfoWFmnegsLQEfH25CBnrDi5CHarm41BMT7oq\n", "QoUCKCvr/nMYYx1xEepIRUUFvLy8cO7cObGjsH6os1Lz8gJsbXv2+SFjrCMuQh1xd3fHHXfcgW3b\n", "tokdhfVDAQFAcTHQ1tZxu0QiXHSbi5Cx3uMi1KHo6Ghs3bpV7BisH5LJhJPki4u1j3VWhGVlfPt6\n", "xm6Fi1CHFi5ciIKCApw8eVLsKKyfGTQIcHLSXnhjxjSivl7zQ0KZ7Dfk5tpDffNKGsZYB1yEOjR4\n", "8GBERERg+/btYkdh/dB9951HeXmuxnY3t004eTJaY7tMNhStra24fPmyIeIxZrK4CHUsOjoa27dv\n", "R3t7u9hRWD+jUKzG2bNfamyXyWTI1zJV9PT0hL29vdYxxth1XIQ6Nm/ePNTV1eHw4cNiR2H9TGeF\n", "J5PJUFxcDKVS2WG7RCKBn58fFyFjt8BFqGOOjo6YPXs2Hx5lOtdVEba3t6OoqKjbz2GMXcdFqAfR\n", "0dH46quv0NraKnYUw2hrA1avFk5q8/ICXnhBc50/67POSs3V1RXOzs6dliQXIWNd4yLUg/vvvx9S\n", "qRQHDx4UO4phyOVAZiZw4oTwlZ4OvPWW2Kn6HZlMhvLycjQ3N2uM+fv7cxEy1ktchHpgbW2N+fPn\n", "m8/h0eRkYP164eKWvr7Ahg1AUpLYqfodmUwGIkJhYaHGWEBAABchY73ERagn0dHR2LNnD+rr68WO\n", "on/FxcKlT64KDARKSsTL008NGDAAgwcP7lHhXV1IYzaH6RnrBS5CPbnrrruwePx4VH33ndhR9M/H\n", "p+M9gvLyhG1M57oqvM62q9VqrQtpGGMCLkI9sbCwQOLEiQjYskXsKPq3aBGwapUwMywuFv68ZInY\n", "qfqlnhahi4sL/vSnP2n9XJExJuAi1KfoaCAlBSgvFztJ3xEBH34IaPuF+uKLwJgxwIQJwB13ACEh\n", "QHy84TOaga6KsKKiAo2NjRpjP/30E8aMGWOIeIyZJAkRkdgh+rXbbwfi4oCnnhI7Sd+8+irw0UfA\n", "sWPCZ4BMFBkZGaioqMA999zTYXtLSwsGDBiArKwsjBo1SqR0jJkmnhHqW3Q0YOqrR3fsEE6H2LWL\n", "S1BkISEhGiUIAHZ2dnB3d+cVooz1AhehvsXEAEePmu6N4VJTgcceAz79FLj7brHTsC7MmDEDpaWl\n", "YsdgzORwEepbYCAwfrwwqzI1Fy4A8+cLn/ctWqQ5rlQCe/YYPhfTioiQm6t5dwrGWNe4CA0hOhr4\n", "UvOuAUatuhq47z5gxgwgIUFznAiIjQWeeUb7AhpmcHzyPGO9w0VoCNHRwB9/AFlZYifpHqUSePBB\n", "wMNDuEKMRKL5mDffBL7+WpgR2tsbPiPTwEXIWO9wERqCtzcwbZppLJohApYuBS5dAv7zH8DGRvMx\n", "O3cKRbh7NzB6tOEzMq1kMhku3nhhA8ZYt3ARGkp0NLB1q1A0xuyVV4CDB4WvwYM1x9PShJPlN2wA\n", "IiIMn491SiaToaGhATU1NWJHYcykcBEayoMPCifWHz0qdpLOffEF8M47wmkSw4drjufnAwsWAM8/\n", "DyxbZvB4rGt+fn6wtLTkw6OM9RAXoaG4uAD33mu0h0d/+ukn/CspCdiyBZg+XfMBNTXC4pm77gJe\n", "f93wAdktWVpawsfHh4uQsR7iIjSk5583ysOJubm5WLBgAfKnTwcefljzAW1tQFQU4OQkzBql/NfG\n", "WPGCGcZ6zlLsAGZl6lSxE2ioqqpCZGQk7rnnHrzyyiuaDyACnnhCOCz622+8QtTIcREy1nP81t7Q\n", 
"2tqA1asBLy/h64UXhG0iUCgUmDt3Lry8vPDFF19Aou00iddfB775RjhNwt3d8CFZj3ARMtZzXISG\n", "JpcDmZnAiRPCV3q6cB1PAyMiPPHEEygvL8d//vMf2Gg5TWLHtm0o+uEH4XxBPk3CJJhcERrRG0Nm\n", "vrgIDS05GVi/HvD1Fb42bBBOWjewv/71r/j2229x8OBBuLm5aYynpaXh0ccfx8GYGO2LZ5hRkslk\n", "KCwshFqtFjtK9xjJG0Nm3vg2TIZmZwdcuQLY2grfKxSAqyvwyy/CpcrGjgWcnfUaYcuWLVi5ciUO\n", "HTqEKVOmaIxfvHgRkyZNwvLly/E6rxA1KaWlpfDx8UFJSQm8vb3FjnNrt90mHHoPChK+z84G5s0D\n", "zp8XNxczKzwjNDQfH+DGq3/k5QnbvvpKOL3CxQUYNkw4X++NN4C9e4WrvOjI4cOHsWLFCnzyySda\n", "S7Cmpgb33Xcfpk+fjtdee01nr8sMw8vLC7a2tqZzeLS4GAgIuP59YCBQUiJeHmaWuAgNbdEiYNUq\n", "4RdAcbHw5yVLhENEDQ1CMa5bBwQHA8ePCxe2HjoUGDgQCA8XHr95s3CFF4WiRy+dk5ODefPm4ZVX\n", "XkFMTIzGuFKpxIMPPojBgwcjKSlJ++IZZtQkEgmGDh1qOpda6+yNIWMGxKdPGNqLLwpfEyYIF7N+\n", "5BHhNkcAYGEhvDsOCAAiI68/p6RE+Bzl9GnhvwcOABcvQm1jg2njxmFEUBDGjRuHsWPHYuzYsXBy\n", "ctJ42bq6OjzwwAOYO3cu/vrXv2qMX108U1BQgN9++w22Vw/dMpNjUgtmrr4x3LJF+P7qG0PGDIg/\n", "IzRVDQ1ozcrCpxkZOH36NDIyMpCdnY3W1lYEBAQgJCQEY8eOxbhx4zBu3Dj4+vpi69atiIqKgrW1\n", "tcbuEhISsHHjRhw9ehS33367CD8Q05WVK1eipaUFW66WizFTKoU3hlu3Xn9j+I9/AFZWYidjZoSL\n", "sB9RqVTIzc1FZmYmTp8+jczMTGRkZKC6uhpubm7XSvGee+5BxA1XuNmxYweWLFmCgwcP4m6+C73J\n", "e+edd7B//378/PPPYkfR9PnnQGiosCiMMSPBRWgGioqKrhVjZmYmCgoK8Oqrr2LOnDlITU3FrFmz\n", "kJiYiMWLF4sdlenAV199heeffx6FhYViR+mooUH4/O/f/xauW8uYkeAiNEPLly9HVVUV/v73v2Py\n", "5Ml4+umnsXbtWrFjMR1JT09HWFgYmpubtR4GF82GDcLXuXOa16utrNR+2y/GDIBXjZqh6Oho7N+/\n", "HzY2NlizZg0SEhLEjsR0SCaTob29HUVFRWJH6WjzZmD5cs0SrK8XThky5luUsX6Ni9AMTZs2DR4e\n", "Hvjxxx+xevVqPk2in3FxccHAgQONa+XoDz8Ip0k89pjm2BdfCLPBSZMMHosxgIvQLEkkEjz00EPY\n", "bqT3RmR95+/vb1xF+NFHwJ//DAwapDm2eTOwYgXf3ouJhv/mmano6Gj88MMPKC0tFTsK0wOjOpew\n", "tBTYtw9YuVJzrKuZImMGwkVopkJCQjBy5Ejs2rVL7ChMD4yqCDdtAkJChItI3KyrmSJjBsJFaMYW\n", "LlzIh0f7KaMpQqUS+PRT4KmnNMe6mikyZkBchGbs4YcfxvHjx3Ger/Tf7xhNEe7eDahUwEMPaY51\n", "NVNkzIC4CM3YsGHDEBYWhh07dogdhemYTCZDRUUFGhsbRc1RbfktVK8/c/22Y1d1NVNkzMC4CM1c\n", "dHQ0tm3bJnYMpmMymQwSiUTUq8u0tJxBQcCXaH/8Yc3BrmaKjBkYF6GZi46ORl5eHjIzM8WOwnTI\n", 
"zs4O7u7uoh4eraj4CM7O98PGRqYxpt63C3j8cc2ZImMi4CI0c+7u7pg+fTovmumHfH198csvv4jy\n", "2u3tdaip2Qp3d81Dny0tZ5D53F60vbxChGSMaeIiZIiOjsb27duhVqvFjsJ0QK1W4/XXX0dOTg7W\n", "rVuH8ePH49NPP0Vzc7PBMlRXfwErK084Oc3SGKuo+AhOzvfCytHfYHkY6woXIcOCBQtQVVWFtLQ0\n", "saOwPqqursYDDzyAjRs34uuvv0ZRUREWLlyIN954A15eXli2bBnOnj2r5xSEyspNGDx4BW7+FdPV\n", "TJExsXARMjg5OeG+++7jw6Mm7uTJk5gwYQKampqQmZmJWbNmwcPDA/Hx8cjPz8eXX36JixcvYsyY\n", "MZg5cyZ27doFlUql8xz19YegVF7CoEGPaox1NVM0dhKJcFnUm7cx08dFyAAIh0d37twJpVIpdhTW\n", "C5s3b0Z4eDjmzJmDH374Ad7e3h3GpVIpIiMjkZKSgpycHISGhmLZsmUYOnQo1q5di4qKCp1lqaz8\n", "GK6u0bC0vPlqMZ3PFE1FcjJgbLd5ZDpAjBFRS0sLOTs70969e8WOYhAA0ZYtmtu0/bmrbWKrr6+n\n", "hx56iJycnGjXrl09fu7HH39Mo0ePJhsbG4qJiaHjx3/rU57W1kt06pQlNTWd0Birq/ue0tPtqK2t\n", "qk+vIRaA6OxZonvvJWpvv76NmT7TfFvGdM7W1hbz5s0zq8Ojpv7uPicnB5MmTcL58+eRnp6OBx98\n", "sEfPd3R0xIoVK5CVlYUjR45AIpEgJ+cl5OSMR1XVZqjVTT3OVFn5T9jbT4C9vebVYjqfKZqOoCBg\n", "1izggw/ETsJ0ie9Qz675/vvvMW/ePJSXl8PBwUHsOHolkQBnzwLPPw/s3y/cAUgiAa7+a7jxzzc+\n", "x1j+tSQnJ2PFihVYsGABEhMTYWdnp/Vx58+fR3Z2NubNm9et/ba1laG6OgmVlR+jvb0OLi4L4eGx\n", "Cra2Qd16fm3tHkilNnByuqfDdqWyCNnZARgx4letJWkKrv7/JwLuvx94911g9Gjg0iXAx4fvImXK\n", "+H8du2bGjBlwcnLCnj17xI5iELd6dy+RdPwyBi0twNKlT2DlypVITExEcnJypyW4e/duTJgwoUf/\n", "P62sPOHpGY/g4HzIZP+CUnkRZ88G4/z5mbhyZReIul5cM3DgHI0SBLqeKZoaiQT45z+vXyt8wQJg\n", "+HChGGtqxM3GeoeLkF1jYWFhdjfsfeYZ4NAhYXZ4s6vv/q9+ie3cOeEm7g0NYTh27BhiYmK0Pk6l\n", "UmHNmjWIjo7GSy+9hC1btvTi1aRwdo7EbbelYNSoHNjbh6KwcBmysoaitHQtVKrKHu2tsTH1f4tk\n", "TMfPPwOdXY9g6NDrt1A8dAj4f/8PSEoCvL2Fq8YdOWKwmEwXxP6QkhmXX3/9laysrKiyslLsKHp1\n", "49/8ggKiadOMe7HM118TDRxINH8+UW1t548rKiqiKVOm0JAhQ+jo0aM6zaBS1VNlZSKdPTuGTp2y\n", "pgsXoqiuLqXDY+rrf6Dffx9P6em2lJ09nKqrt/5vRE1qtUqnefRFrSb6xz+IrKyI3nuvZ89NTSWK\n", "iiKytCQKDSVKTCRqatJPTqY7XIRMw2233UabNm0SO4ZOtbcTnTlz/fubS23LFuMswrY2ovh4Ihsb\n", "onXrun7sjz/+SB4eHvTAAw9QdXW1XnPV1/9EFy5E0alTVlRf/+O17adPe1FVVRK1t7eQQnGeLl58\n", "WK85dK22VnizMWgQ0YEDvd9PaSmRXE7k6yu8gYmLI7p4UXc5mW5xETINL7/8Mk2bNk3sGDpTUUE0\n", 
"cybRbbcJxdIXBQVEjY26yXUrRUVEd95J5OdH9OuvnT9OrVaTXC4na2trSkhIoPara/sNQKksIaLr\n", "r5edPZwuX5ZTQ8MRam83ranQqVNEAQFEEyborrRaW4l27iSKiCCSSoX/7txJpDKNybHZ4CJkGnJy\n", "ckgikVBBQYHYUfrs+HEif3/h0GdJSd/3FxMjvMN/5hmiP/7o+/4688MPRB4eRLNnE3U1uauoaKKZ\n", "M2eSp6cn/fjjj50/0EAUijwqLHyKcnLuoIyMgVRdvV3sSN2SlERkZ0cUGyuUlz7k5AgzQ3//Nhox\n", "IpjkcjlVVZnmOZX9DRch02rs2LH09ttvix2jTxIThUOKcXFESqVu9tneTpSSIhSUREI0ZYrwDr+v\n", "M82rVCqihAQia2vhv11N7tLShENvjz76JpWWluomgA41N2fR6dOeRETU3t4schrtGhqIoqOJHB2J\n", "duzo3T6ys7t+s3KzK1fq6YMPPqDhw4eTvb09LV26lE6dOtW7F2c6wUXItJLL5RQSEiJ2jF6pqxMW\n", "LDg5EX31lf5e59w54fM7V1cib2/hz8XFvd9fRQXRrFlE7u5C2XZGrRY+L9R1yevCxYuPkEJxjtRq\n", "BVVXb6fMTHdqa6umjAwXys9fQo2Nx8SOeE1ODtHo0UQjRwpXjOmthQuF2eTjjxOdPNmz56amplJU\n", "VBRZWlpSaGgoJSYmUnOzcb5p6M+4CJlWly5dIqlUStnZ2WJH6ZGMDKLAQKLx44kuXDDMa7a0CIfW\n", "xo4VZnJRUV0XmTZHjwplOn06UVlZ548zVMn3VnX1NsrOHk6nTtnQ2bOjqLZ2DxERNTWdpPz8RXTq\n", "lBX9/nsoVVYmivoZ4r/+RTRgANGiRbpZ1dnX1aIlJSUkl8vJx8eHBg4cSHFxcZSfn9/3YKxbuAhZ\n", "p6ZOnUqvvPKK2DG6LSmJyN5e+OUm1pvqkyeF17eyIgoJEX4h3ri4prNrnGZlEb36ateLKMQo+a60\n", "thbS77+HkkrVxfkcN1EqL9Ply3I6c2YIZWQ4U0FBLLW0/K7HlB21tLTQyy9vJTs7os8/1/3++7pa\n", "tLW1lXbu3EkREREklUopIiKC9uzZQ2q1Wvdh2TVchKxTzz77LI0cOZIWLVpE//73v6m2qxPYRNTS\n", "QvTEE8Lhqc8+EzuNoLSU6LXXiHx8iNzcrpchIMz6blyH1J3TMoyh5G9WXPwi5eRM7tVz1WoV1dbu\n", "oXPnIujkSSmdOxdBNTU79XquYV5eHoWEhNDw4cMpK0u/f5dbW4m2bhU+Q5ZKiSIjib7/vrlHK3rT\n", "09PpiSeeIHt7e1q4cCE1Gmq5shniImRaFRUVkYuLC61atYqWLl1KHh4eZGVlRTNmzKAPPviAzp8/\n", "L3ZEIiLKzc2lmTMbKSiI6HfDTSy6ra2N6MiR69/39A4GV0vewYFo2zb9Zu0JtbqVTp/2oOrqLzXG\n", "WluL6MqVr7u9r5aWXLp0KY7S0x0oKyuALl+WU1ubbi/osGfPHnJxcaG5c+fSlStXdLrvW8nIIHry\n", "SaIZM7ZSYGAgvffee1RTU9Pt51dXV5OdnR2lpqbqL6SZ4yJkGtrb22n69Ok0Z86cDtuzs7NJLpfT\n", "lClTSCKRUEBAAMXFxVFKSgopRVixsWPHDnJ0dKSnn15vsHP7+upq6b3/PtG773bcps3GjUSjRhlf\n", "yVdX/4syMwdTe3uLxlhx8RrKzb2zx/tUqeqovHwDZWePoKSkSfToo4/S8ePH+5Szra2NEhISyNra\n", "muRyuaiHGOvq6igxMZFGjRpFNjY2FBUVRUdufJfUhREjRlBycrKeE5ovLkKm4d133yU3Nze6fPly\n", 
"p48pLy+npKQkioqKIgcHB3JxcaGoqChKSkrq0bvd3mhra6P4+HiytbWldbe63IqRuVp6arUwK8zO\n", "7roI29uN8xJdOTmTqLj4JY3tarWCMjPdtc4Uu09NJ0/+TPPnzydLS0sKCwujL774glpaNEu3K8XF\n", "xRQeHk6+vr7dLhxD6elq0fvuu49ee+01AyY0L1yErIOzZ8+SnZ1dj27y2tzcTCkpKRQXF0e+vr5k\n", "YWFBoaGhlJCQQCd7up78Fi5dukSTJ08mPz8/+u23vt1EVgy3usapKWhqyqBTpyyotTVfY6yqKrnT\n", "mWJvlJaWklwuJ19f32urKS90Y6XQTz/9RJ6ennT33XdTWVfLcEXW3dWiK1eupEcffdTwAc2Eif0T\n", "ZPqkVCppwoQJff4Hp+0QamxsLO3Zs4da+3DZjkOHDpG7uzvNnj1b77NOfbnVNU5NQUHBUsrLm6t1\n", "rLOZYl9pW025c+dOUt20zPbq5eYsLS0pPj7eoJeb64sbf77HH39cY/ydd97pV5c9NDYm9k+Q6dNL\n", "L71Evr6+Oi2ZioqKa4dQHR0dacCAATR79mxKTEzs8tDrjVQq1bXPeQx9LU3WkUp1hdLTB1Bd3fca\n", "Y13NFHUpJyeH4uLiyMHBgYYNG0ZyuZwqKyupqqqK7r33Xho8eDB99913es2gT9reLH711Vc0ZMgQ\n", "EdKYBy5CRkRER48eJSsrK63Xq9y4caNODnW2tLRcO4Q6ZMiQbh1CraiooJkzZ5K7uzsdOnSo16/N\n", "dKOs7D3KygqkGy+0fVVXM0V9qK2tpfXr19OIESPIzs6Ohg4dSlOnTqUSXVxU1sikp6eTVColhUIh\n", "dpR+iYuQUWNjIw0fPpyef/55reM5OTm0du1amjBhAkkkEgoMDKRnn32WDh061KfVojceQpVKpSST\n", "ya4dQlUoFHT48GHy9vamu+66yyivpWl+1JSVdRuVl3+gMdLVTFHvqdRqev/998nGxkaU1cuGUFtb\n", "SwCM5rSl/kZCZAz33mZiWr58OVJTU3Hy5EnY2dl1+djKykocPHgQ+/btw7fffgu1Wo3p06cjMjIS\n", "c+bMgaenZ68ylJWVYf/+/di3bx9SUlIgkUigVqvx3HPPYe3atbCwsOjVfpkOffcd1K+vAR3+CRaW\n", "AzsMlZe/j8rKTRg9+g8AUoNHKy4uxpAhQ3D58uVe/x00di4uLti5cydmzpwpdpR+h4vQzH3//feI\n", "jIzEkSNHMGHChB49V6FQIC0tDXv37sV///tfFBcXIyQkBLNnz0ZkZCRCQ0N7lUmhUCA2NhalpaU4\n", "dOhQr/bB9GDOHMDTE9i8+aYBwtmzIzB48Aq4uz8jRjKo1WrY29vjp59+wuTJkzuMbdmyBQ0NDYiL\n", "ixMlm66MHz8ey5cvR2xsrNhR+h3Dv3VjRkOlqsagQW9i82Z5j0sQAGxtbREREYH169ejsLAQZ86c\n", "QVRUFA4dOoSwsDDIZDIsW7YMe/fuRWtra4/2O3LkSEil/NfTaFy6BBw4ACxfrjHU/tuPGHR8MAYN\n", "WiJCMIFUKoWfnx/y8/M1xoqKivDtt9+KkEq3ZDKZ1p+P9R3/pjFjly6txIABKixerJt3yqNGjUJ8\n", "fDzS0tJQXl6Ot99+Gy0tLYiJiYGrqysiIyOxefNmlJaW3nJf1tbWUCqVOsnFdODjj4FJk4Dx4zWG\n", "LP6+Hp6/BsHCwkWEYNd1VhT9pUD6y89hjLgIzVRNzZeoq9sPf/8vIJHo/vM3Nzc3REVFITk5GdXV\n", "1fjuu+8watQovP/++xgyZAgmTJiAtWvX4tSpU9B2dJ6L0Ii0tgJbtgArV2qOdTFTNLSuirCgoEDr\n", 
"3zNTwkWoP1yEZqitrQRFRavg6/sebG2H6/31LC0tER4eDrlcjtzcXGRmZmLBggX4/vvvrx1CTU9P\n", "7/AcLkIjsnMnQAQsWKA5tmkTMHGi1pmioXVVhAqFAmVlZSKk0h0uQv2xFDsAMzRCQcETsLefgMGD\n", "xfnQPTg4GMHBwXjxxRdRWVmJAwcOYNiwYR0ew0VoRD76CIiNBWxsOm5vbQU+/xz44ANxct2ks6Lw\n", "9vaGra0t8vPz4eXlJUIy3ZDJZKisrERDQwMcHR3FjtOv8IzQzFRUfIimpmMYOvQzABKx42Dw4MFY\n", "smQJnJ2dO2znIjQSGRnAyZPAk09qjnU1UxSBTCZDUVERVCpVh+0SiaTThTSmxN/fHxKJBIWFhWJH\n", "6Xe4CM2IQpGLkpI1GDr0n7C29hU7Tpe4CI3Exo1AZCQwdKjmWGczRZHIZDKoVCoUFxdrHTP1IrSz\n", "s4OHh4fJ/xzGiIvQTBCpUFCwBAMHzoeLy0Nix7klLkIjUF8P7NihfZFMVzNFkbi5ucHR0bHfrxy9\n", "ePGi2DH6HS5CM3H58ptQKkswZMgGsaN0CxehEXByAo4eBSIiNMe6mimKyN/fv98XYX/4OYwNL5Yx\n", "A83N6Sgr+wcCA7+BpaWr2HG6hYvQSIwdq7mttlaYKX7zjcHj3EpXK0f7w0wqICAAZ86cETtGv8Mz\n", "wn6OSIX8/BgMHrwMTk73ih2n27gIjVhNDRATA8yYIXYSDV0VYXFxMdra2kRIpTs8I9QPLsJ+TiKx\n", "hJ/fRvj4yMWO0iPW1tY9uiwb04O2NmD1asDLS/h64QVhW0AAkJgISMRfdXyzroqwvb0dRUVFIqTS\n", "nasXB2C6xUVoBhwd74ZUai92jB6xtraGWq1Ge3u72FHMl1wOZGYCJ04IX+npwFtviZ2qS50V4aBB\n", "g+Ds7GzysymZTIaGhgZUVVWJHaVf4SLsZxoafkROTigyMuxw9uztqKnZJnakXrG2tgYAPjwqpuRk\n", "YP16wNdX+NqwAUhKEjtVl2QyGcrKytDc3Kwx1tlCGlPi6+sLKysrk/85jA0XYT+Tnx8Dd/dVGDv2\n", "CgID96Oubr/YkXrlahHy4VERFRcLh0GvCgwESkrEy9MNMpkMRKT1pPP+8PmapaUlfH19Tf7nMDZc\n", "hP2MhYUj2touo7k5HVZW3pDJtoodqVd4RmgEfHyAG1da5uUJ24yYg4MD3Nzc+BQK1iNchP1MYOAB\n", "KJVFKC5+BmfO+KCmZofYkXrF5n9XK+EiFNGiRcCqVcLMsLhY+PMS8e452F18OybWU1yE/YyNzTD4\n", "+W3EiBHHcfvtqSguflbsSL3CM0Ij8OKLwJgxwIQJwB13ACEhQHy82KluKSAggIuQ9QgXYT+Tnx+D\n", "1tbzIGpFS0s2iNRiR+oVLkIjYG0NvPceUFYGXL4MvPsuYGUldqpb6mpGWFFRoXUhjbFRq9V4/fXX\n", "NW5PBnAR6gMXYT/j7PwA8vJmIyPDGWVlb8Lf/1OxI/WKSRZhZ+fdMYPqrCj8/f0BwOjPw6usrMS9\n", "996LTZs2aS1tmUyGwsJCqNWm+SbXGHER9jOurtEYNeoPjB+vQFBQNpydI6FWK6BUmtaJxFZWVpBI\n", "JKZVhCZ43l1/1Nnl1AYMGIDBgwcb9WzqxIkTCAsLQ2trK06dOoXw8HCNx8hkMiiVSpSWloqQsH/i\n", "IjQDZWV/R17efVCrm8SO0m0SiQRWVlamVYQmeN5dfySTyVBXV4fa2lqtY8ZYhESE9evXY+rUqZgz\n", "Zw4OHToEb29vrY8tKyvDqFGjTOIQr6ngIjQDnp4vQiKxQmGh8dwypztM7nqjJnjeXX80dOhQWFhY\n", 
"mMyCmfr6eixcuBCvvvoqtm7divXr18Oqk89iP//8c9x5552YNm0ahg8fbuCk/RcXoRmQSu0QELAT\n", "dXX7UVW1Wew43WZyRWiC5931R1ZWVvD29jaJIszMzERoaCguXLiAjIwMLFiwQOvjFAoFli1bhr/8\n", "5S/48MMP8fHHHxs4af/GRWgmbGxuw9Chn6CoaBWam0+JHadbTK4ITfS8u/6os8IbOXIkLC2N4+5z\n", "ycnJmDJlCiZPnoy0tDQE3Hg04Qbnz5/HxIkTcfjwYRw7dgxLly41cNL+j4vQjLi4PIRBg5biwoUF\n", "UKlqxI5zTUlJCY4ePaqx3WiLsK0NOHtWc7uJnnfXH3VWhIsWLcLOnTtFSHSdQqHAk08+iaeeegqf\n", "fvopkpOTYWdnp/Wx33zzDcLCwhAYGIhjx44hODjYwGnNAxehmRky5ANYWXmjoGAJABItx5UrV/Dp\n", "p5/i7rvvhp+fH97SsrrSaIvwzTeBuXOBm++MYaLn3fVHxnYI9Kpz585h4sSJSEtLw6+//oro6Git\n", "j1OpVFizZg0WLlyItWvXYvfu3XB2djZwWvPBRWhmJBIrBATsQFPTrygvf9+gr61QKLB371489NBD\n", "8PT0xOuvv47g4GAcPnwY32i527lRFuGpU8JpEh99BFhYiJ2GdcIYi/Drr79GWFgYgoODcfLkSYwe\n", "PVrr44qLi3HXXXdh27Zt+Pnnn7Fq1SoDJzU/XIRmyNraD/7+SSgpeQmNjal6fa329nakpaVh2bJl\n", "8PDwwKOPPgpbW1vs378fhYWFWL9+vdZzpYScRlaEzc3AI48AK1YA99wjdhrWhcDAQLS2tmL79u2i\n", "/x26Ort7+OGH8dprr+HLL7/EgAEDtD72xx9/xIQJE+Di4oLMzExMmjTJwGnNFDGzVVy8hk6f9iSl\n", "8rLO933y5EmKi4sjDw8PsrOzo6ioKNqzZw8plcpu7yMsLIw2btyo82y99tRTRCNGEDU3a441NBA9\n", "+SRRdbXhczGt1q1bR25ubuTp6Ukvv/wyFRUVGTzDpUuXaPLkyeTn50e//fZbp49Tq9Ukl8vJ2tqa\n", "EhISqL293YApGRehGVOr2+iPP+6ic+ciiKjv//Cys7MpISGBAgMDycbGhmbPnk1JSUnU0NDQq30N\n", "Hz6cVq5c2edcOpGSQmRlRXTsmPbxJ54gCgoiamkxbC7WJYVCQTt37qSIiAiSSqUUERFBe/bsIbVa\n", "rffX3rdvH7m6ulJkZCTV1NR0+riKigqaNWsWubu7U0pKit5zMU1chGZOqSylZ599gBISEnr1/EuX\n", "LtG6deto/PjxJJVKacqUKbRu3TqqqKjo8b4KCwtJLpfTmDFjSCqVUlhYGFlbW9OyZct6VaY6c+UK\n", "0ZAhRK+9pn3822+JbGyIMjMNGov1zMmTJyk2Npbs7e3ptttuI7lcTtV6mMGrVCpKSEjo1uzul19+\n", "IW9vb5o2bRqVlpbqPAvrHi5CRj///DNZW1vTwYMHu/X46upqSkpKooiICJJIJBQUFEQJCQl08eLF\n", "Hr92TU2N1n1duHCBiIiysrJo3Lhx5O/vT4cPH+7x/nUiOpooNJRI22HdykoiT08iudzwuViv1NbW\n", "UmJiIo0cOZJsbW1p0aJFlKmjNzEVFRU0c+ZMcnd3p0OHDnX6OLVaTevWrSNra2uKi4vr0UcGTPe4\n", "CBkREf3tb38jV1dXys/P1zre3NxMO3fupNmzZ5OVlRX5+/tTfHw85ebm9vi1btyXtbU1DR06lOLi\n", "4igjI0Pr41taWig+Pp4sLS0pLi6OFApFj1+z13b+m8jenqiznzMqiujOO4lUKsNlYjrR3t5OKSkp\n", 
"FBUVRRYWFhQaGkqJiYnU0svD2ydOnCAvLy+aMWMGlZeXd/q4uro6evDBB8nZ2Zn+85//9DY+0yEu\n", "QkZEwjvUuXPn0sSJE6m1tVVjfMWKFeTj40PPPfccnTx5ssf7V6lUlJKSQosWLSIHBwcaNGgQxcbG\n", "Umpqarc/rzly5AgFBgbS6NGjOy1NXVIqSyj3l9tI+d/PtD8gKYlowACic+f0noXp14ULFyg+Pp7c\n", "3NzIw8OD4uPjqbCwsEf7uHz5Mv39738nVRdvitLT02nYsGEUGhraqyMoTD+4CNk1NTU1JJPJ6Nln\n", "n9UYq62t7fFKtvb2dkpNTaW4uDhyd3cnZ2dnWrRoEe3Zs4fa2tp6lbGuro5iY2PJ1taW5HJ5l790\n", "+kZN58/f/7+FRFqKuqiIyMWF6JNP9PT6TAxXF9dMmTKFpFIpzZ49m1JSUnSyuCYpKYns7OwoNjZW\n", "65tNJh4uQtbB8ePHycbGhnbt2tXrfVxdPRoQENBh9WhTU5POch44cIC8vLzozjvvpPPnz+tsv1dV\n", "VHxEGRkDqbX1kuZgezvR3XcT3XMPkQFWHzJx3Li4Zvjw4SSXy7tc/dmZ5uZmWrp0KTk4OND27dv1\n", "kJT1FRch0/Dhhx+So6Mj5eTkdPs5BQUFJJfLacSIEddWjyYmJlJdXZ3eclZUVND//d//kZOTEyUm\n", "JupsvwpFHqWnO1B1tfZfWk2/fELk4010WffnXzLjc+XKFVq3bh3JZDJydHSk2NhYOn36dLeem5ub\n", "S8HBwTRixAjKzs7Wc1LWW1yETKuYmBgKDg7uchZXVVVFiYmJNGXKlGsrPuVyOV02cEF89tlndOed\n", "gXTu3J+pra3zRQrdoVa3UU7OJMrLm691vKXld0pPt6O6sq/79DrM9GhbXJOUlNTpis/du3eTs7Mz\n", "xcTEUGNjo4HTsp7gImRaNTQ00MiRI2np0qUdttfW1lJSUhLNnj2bLC0taeTIkZSQkEDnRF4w0tyc\n", "T7m50ygz052uXPlvr/dTWvo6nT7tTW1tVRpjQkneQfn5i/sSlfUDeXl51xbXeHp6dlhco1AoKC4u\n", "jmxtbWndunUiJ2XdISEi8W5BwIxadnY2Jk2ahPfffx9eXl7YtWsXdu/eDVdXV8yfPx9RUVGdXidU\n", "HISKig0oLo7HwIFzMHRoIiwsXLr97ObmDOTmTsKwYV/D2fl+jfHS0ldQXf0FgoLO9Gi/rP9qaWnB\n", "9u3b8fHHH+PMmTO4//77kZ+fj6amJnz11VcYN26c2BFZN3ARsi59/PHHeOutt9DU1ISHHnoI0dHR\n", "CA8Ph0QiETtap1pazqKgYDFUqmr4+38BR8c/3fI5RK3IyZkAB4dp8PP7SGO8ufkUcnPvxG23HYSj\n", "4916SM1M3bFjx/DJJ59AJpPhL3/5C982yYQYx62amdGyt7cHEeHy5cuwMpF769nZjcKIEb/h8uW/\n", "4fz5mRg8eCV8fd+GRGLT6XPa2i7DxiYQvr5va4yp1c3Iz38Y7u5PcQmyTk2cOBETJ04UOwbrBZ4R\n", "si4tWLAAPj4+2LBhg9hReqWp6Tfk5y+GVGoNf/9k2NuP7/E+Ll1aiYaGnzFy5ClIpdrvJM4YM118\n", "P0LWKaVSiZSUFMyePVvsKL02YMAkBAVlwtFxBnJzJ6G0dC2I2m/9xP+pr09BVdVn8PdP4hJkrJ/i\n", "ImSd+vnnn0FEuOuuu8SO0idSqT2GDFmPYcO+RlVVIs6duwvt7Vdu+Ty1uhkFBY/By+tlDBhwhwGS\n", "MsbEwEXIOrVv3z7MmjULNjadf7ZmSpydH0BQUBacnO6FhcVAAEBDw4/IyQlFRoYdzp69HTU12649\n", 
"Xiq1h0yWBE/PF0VKzBgzBC5C1ql9+/YhMjJS7Bg6ZWnpBi+vlwEIq17z82Pg7r4KY8deQWDgftTV\n", "7e/weEfHGZBIeE0ZY/0ZFyHTKjs7G4WFhbjvvvvEjqJXFhaOaGu7jObmdFhZeUMm2yp2JMaYgXER\n", "Mq327duHsLAweHh4iB1FrwIDD0CpLEJx8TM4c8YHNTU7xI7EGDMwPubDtNq7d69JrxbtLhubYfDz\n", "2wgAaGnJxvnzM+Hq+meRUzHGDIlnhExDdXU1jh071u8+H9QmPz8Gra3nQdSKlpZsEKn/N0JQqxWi\n", "ZmOMGQYXIdOwf/9+eHt7Izg4WOwoOlVe/i4qKtZ32Obs/ADy8mYjI8MZZWVvwt//UwBAdfW/kJMT\n", "iubmU2JEZYwZEBch03B1tagxX0+0p1paTqOk5GXY2AzrsN3VNRqjRv2B8eMVCArKhrOzMAt2cYnC\n", "wIGRyM29EyUla0DUJkbsPktNTcXatWvFjsGYUeMiZB0QteHee+uxcOEcsaPoDFEr8vMXY9CgJXB2\n", "7t7nnlKpHXx85AgM3Ieamq34449wKBR/6DmpbhARDhw4gKlTpyIiIgJFRUVQqVRix2LMaHERsg4a\n", "Gg4jJCQN4eGmfTWZG5WUvAy1ugG+vu9qjKnVLVAqizt9rpPTTAQFZcPW9nbk5ISgrOwtAOpOHy8m\n", "tVqNvXv3YuLEiZg/fz6CgoKQl5eHzz77DJaWvC6Osc5wEbIO6ur2wclpJqRSW7Gj6ERj4xFUVKyH\n", "v/8XsLBw1BgvKVmDixcf7HIfFhbO8PdPhr9/EsrL38H58/d2WZ6GplQqkZycjKCgIMTExGDy5MnI\n", "z89HYmIihgwZInY8xoweFyHroK5uf7cPHxo7tboJBQWPwsPjeTg4TNMYb2j4AZWVmzBkyHotz9bk\n", "4hKFoKBsSCTW+P33YFRX/0vXkXukqakJ69evx7Bhw7B69Wr8+c9/RmFhIdavXw8vLy9RszFmSvh4\n", "CbtGochBa+sFODv3j6vJFBXFQSq1gZdXgsZYe3sdCgoeh6fnSxgwoPv3kLOy8kRg4F5UVX2CS5dW\n", "oK5uL/z8NsHScpAuo3epvr4emzZtwjvvvANHR0e88MILiI2NhZ0d3x2Dsd7g+xGya8rK3kZt7VcY\n", "MeK42FH6rLZ2Dy5efAgjRx6Hnd0YjfH8/BgoFDkYMeI3SCS9u+GwQvEHCgoWo66uHo2NH2LGjIi+\n", "xu5SeXk5Nm3ahHXr1mHIkCFYvXo1Hn74Yf78j7E+4kOj7Jp+c1i0shKqHZvg4/OG1hKsrf0atbW7\n", "4e+f3OsSBABb29tx++1HcOnSctx//wN4+umn0dzc3JfkWuXn52PVqlXw9/fHnj17sGHDBpw+fRqL\n", "Fy/mEmRMB7gIGQBApapBU9PRa+fRmbTYWLhta4KH+/MaQypVBQoLl8PH5y3Y2Y3q80tJJJaIilqF\n", "EydO4JdffkFwcDDS0tL6vF8AyMrKwuLFizF8+HCcOnUKO3fuRHp6OhYvXgyplP/pMqYr/K+JAQDq\n", "6w/C0tId9vbjxI7SN1u2ACkpwOefAxLNv94FBY/Dzi4Y7u5P6/Rlx4wZg2PHjiEqKgozZszAmjVr\n", "oFQqe7WvI0eOIDIyEiEhIbhy5QrS0tKQlpZmFpe8Y0wMXIQMgHDaxMCBkbh6nz6TVFwMPPccsGED\n", "EBioMVxVtRmNjUfg778F+vg5bW1tIZfLceDAAWzbtg133HEHzpw50+3nXy276dOnw8XFBVlZWdfO\n", "C2SM6Q8XIQORCnV135n254NqNbBoETBtGvD445rjFy7AddYGBDj/E9bW+j23bsaMGcjKysKkSZMw\n", 
"ceJEvPXWW1CrtZ+Ef/Uk+LCwMERERMDb2xt5eXlITk7GyJEj9ZqTMSbgImRobEwDkQKOjneLHaX3\n", "3n8fyM4GEhM1x9Rq4PHHIfUfDifZQoPEcXZ2RmJiIr788ku8++67mDVrFoqKiq6NXz0JftSoUXjk\n", "kUc6nATv5+dnkIyMMQEvOWOoq9sHR8cZkErtxY7SO7//Drz6KpCcDHh6ao7L5UBuLpCVZfBoCxYs\n", "QHh4OGJjYxEcHIw33ngDarUa7733HhQKBVauXIlnnnkGAwcONHg2xpiAzyNkqKxMhJWVBwYO/D+x\n", "o/RcWxtw551AcLCwQOZmmZnApEnA7t3AAw8YPN5VRITNmzfjzTffBBFh9erVfBI8Y0aCi5CZtvXr\n", "gXXrgNOnASenjmOtrcAddwBTpgCbNokS72be3t7YtGkT5s6dK3YUxtj/8GeEZqqh4Ufk5IQiI8MO\n", "Z8/ejpqabWJH6p2VK4XTJW4uQQB46SWgqQl4+23D59KCiFBdXQ0PDw+xozDGbsBFaKby82Pg7r4K\n", "Y8deQWDgftTV7Rc7Uu9YWWk9VQJpacCHHwrnFTpq3nVCDHV1dVAqlXBzcxM7CmPsBlyEZsrCwhFt\n", "bZfR3JwOKytvyGRbxY6kO/X1QEwMsHq1cDqFkaiqqgIADB48WOQkjLEbcRGalevnsgUGHoBSWYTi\n", "4mdw5owPamp2iJirF9rahKLz8hK+XnhB2AYAFy4Ao0YBCZp3nRBTZWUlrK2t4aTtMC5jTDR8+kQ/\n", "plY3orHxNzQ2pqGp6QhaWwswevR5AICNzTD4+W0EALS0ZOP8+Zlwdf2zmHF7Ri4XVoSeOCF8v3gx\n", "8NZbwMsvAyEhwH7jO9RbWVkJNzc3SCQmfPUexvohLsJ+pK2tFI2NqWhsTENDwy9oacmGhYUzHBym\n", "wNFxJry8wgEQAAny82Pg7Z0Aa2s/tLRkg0j7lU+MVnIy8M03gK+v8P2GDcC8eUIRGqnKyko+LMqY\n", "EeIiNGGtrRfR2JiGxsYjaGxMg0LxO6ysvODgEA43t8fh4BAOe/sQaDsC7uz8APLyZqO1tRC2toHw\n", "9//U8D9AXxQXAwEB178PDARKSsTL0w1VVVVchIwZIS5CE6FSqZCRkYG0tDSkpqYiIqIFEyd+B1vb\n", "kXBwCIen5xo4Ok6FtbV/t/bn6hoNV9do/YbWJx8f4OJFIChI+D4vT9hmxK4eGmWMGRcuQiPV3NyM\n", "Y8eOITU1FWlpafj111+hUCgwfvx4hIeHY/z4CIwd+y9YWprpL9ZFi4BVq4TTIwDhz0uWiJvpFvjQ\n", "KGPGiYvQSDQ0NODYsWNIS0vDkSNHkJqaCgsLC4SEhCA8PBxPP/00pk2bBmdnZ7GjGocXXxS+JkwA\n", "JBLgkUeA+HixU3WpsrISgdrOeWSMiYovsSaypqYmTJo0CWfPnsWgQYMwZcoUTJ06FeHh4QgNDYWl\n", "Jb9X6S/CwsLw2GOPYcWKFWJHYYzdgH/LimzAgAGIj4/H+PHjMXLkSF5a34/xoVHGjBMXoRGIiYkR\n", "OwIzAC5CxowTX1mGMQNoaWlBU1MTFyFjRoiLkDEDqKysBAA+fYIxI8RFyJgBVFVVQSqVwtXVVewo\n", "jLGbcBEyZgCVlZVwcXHhVcCMGSEuQsYMgBfKMGa8uAgZMwAuQsaMFxchYwbAF9xmzHhxETJmAHzB\n", "bcaMFxchYwbAh0YZM15chIwZABchY8aLi5AxA6iqquJDo4wZKS5CxgyAZ4SMGS8uQsb0rL29HbW1\n", "tVyEjBkpLkLG9KyqqgpqtZqLkDEjxUXImJ7xBbcZM25chIzpWVVVFRwcHGBrayt2FMaYFlyEjOkZ\n", 
"L5RhzLhxETKmZ15eXnj44YfFjsEY64SEiEjsEIwxxphYeEbIGGPMrHERMsYYM2tchIwxxswaFyFj\n", "jDGzxkXIGGPMrHERMsYYM2tchIwxxswaFyFjjDGzxkXIGGPMrHERMsYYM2tchIwxxswaFyFjjDGz\n", "xkXIGGPMrHERMsYYM2tchIwxxswaFyFjjDGzxkXIGGPMrHERMsYYM2tchIwxxswaFyFjjDGzxkXI\n", "GGPMrHERMsYYM2tchIwxxswaFyFjjDGzxkXIGGPMrHERMsYYM2tchIwxxswaFyFjjDGz9v8B4J0Q\n", "mSrjxh4AAAAASUVORK5CYII=\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol8 = Chem.MolFromSmiles(r'COC(=O)c1sc(SC)c(S(=O)(=O)C(C)C)c1N/N=C(\\C#N)S(=O)(=O)c1ccccn1') #CHEMBL3211428\n", "mol8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though we can clearly see that there should be a substructure match, we don't get one:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol8.HasSubstructMatch(pains8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The problem is that explicit H in the SMARTS. 
In order to get a match we need to either add hydrogens:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol8h = Chem.AddHs(mol8)\n", "mol8h.HasSubstructMatch(pains8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or merge the H atom into the atom it's attached to:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pains8h = Chem.MergeQueryHs(pains8)\n", "mol8.HasSubstructMatch(pains8h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to note that this still matches the molecule with Hs:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mol8h.HasSubstructMatch(pains8h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what did this do to the query?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[#8]=[#16](=[#8])-[#6](-[#6]#[#7])=[#7]-[#7&!H0]'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Chem.MolToSmarts(pains8h)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MergeQueryHs()` finds explicit H atom queries and merges them with the query on the attached atom to add a hydrogen-count query. 
In this case the query now specifies that the nitrogen has at least one H atom.\n", "\n", "The handling of explicit Hs in queries was, I suspect, a large part of the reason that the RDKit didn't generate many matches in the [KNIME PAINS paper](http://doi.wiley.com/10.1002/minf.201100076): 391 of the 480 PAINS SMARTS patterns contain an explicit H atom or an explicit H as part of an atom query. Some of those would still generate matches, but many would not.\n", "\n", "I'll show a few more examples of how the merging works. Rather than using `MergeQueryHs()` explicitly in these examples, I will tell the RDKit to perform the merge as part of building the molecule from the SMARTS using the `mergeHs` argument to `MolFromSmarts()`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[#6&!H0&!H1]'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "patt = Chem.MolFromSmarts('[#6]([#1])[#1]',mergeHs=True)\n", "Chem.MolToSmarts(patt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code can handle recursive SMARTS properly:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[$([#6]-[#7]),$([#6&!H0]),$([#6])]'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "patt = Chem.MolFromSmarts('[$([#6]-[#7]),$([#6]-[#1]),$([#6])]',mergeHs=True)\n", "Chem.MolToSmarts(patt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there's nothing it can do about OR queries (this is a solvable problem but the logic is complex, [here's the bug](https://github.com/rdkit/rdkit/issues/558)):" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'[#6]-[#1,#6]'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "patt = 
Chem.MolFromSmarts('[#6]-[#1,#6]',mergeHs=True)\n", "Chem.MolToSmarts(patt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While fixing the PAINS SMARTS definitions (see below) I worked around this shortcoming in the current version of the code by manually editing the affected SMARTS, so something like the above example would become:\n", "`[#6;!H0,$([#6]-[#6])]`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Back to the PAINS\n", "\n", "The `SLN->SMARTS` translation that Rajarshi did resulted in SMARTS that contain explicit Hs. So in order to have the PAINS be useful, I needed to make sure that the H merging was working as well as possible.\n", "\n", "The first step in testing and cleaning up the SMARTS was to find molecules that match when explicit Hs are present, but which don't match when the Hs are implicit. Here's some code I used for doing that:\n", "```\n", "import csv\n", "from rdkit import Chem\n", "\n", "print(\" reading patts\")\n", "smas = [x[0] for x in csv.reader(open('./wehi_pains.csv'))]\n", "# opatts keep the explicit H atoms; patts have them merged into H-count queries\n", "opatts = [Chem.MolFromSmarts(x,mergeHs=False) for x in smas]\n", "patts = [Chem.MolFromSmarts(x,mergeHs=True) for x in smas]\n", "\n", "print(\" reading mols\")\n", "smis = [x[0] for x in csv.reader(open('./test_data/wehi_mols.csv'))]\n", "ms = [Chem.MolFromSmiles(x) for x in smis]\n", "mhs = [Chem.AddHs(x) for x in ms]\n", "\n", "print(\" filtering\")\n", "matches=[]\n", "found=0\n", "for i,(m,mh) in enumerate(zip(ms,mhs)):\n", "    for j,(patt,opatt) in enumerate(zip(patts,opatts)):\n", "        t1 = m.HasSubstructMatch(patt)\n", "        t2 = mh.HasSubstructMatch(opatt)\n", "        if t1:\n", "            found+=1\n", "        # record the cases where exactly one of the two forms matches\n", "        if t1^t2:\n", "            matches.append((i,j,smis[i],smas[j]))\n", "            print(i,j,smis[i],smas[j])\n", "    if not (i+1)%100:\n", "        print(\" Done: \",i+1,\" matches: \",len(matches),\" found: \",found)\n", "print(\" Done: \",i+1,\" matches: \",len(matches),\" found: \",found)\n", "```\n", "The idea is simple: find cases where one form of the pattern matches and the other doesn't.\n", "\n", 
"Starting from the SMARTS definitions and 10K test molecules provided as part of the [KNIME PAINS paper](http://doi.wiley.com/10.1002/minf.201100076) I did a number of passes of pattern-tweaking and bug-fixing until the above code produced no matches.\n", "\n", "The next step, more time-consuming, was to repeat the process for a larger set of molecules: the 1.4 million molecules in the [ChEMBL20 SDF](ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_20/chembl_20.sdf.gz). \n", "\n", "At this point I had a set of SMARTS definitions that produced the same results on molecules without Hs as the original SMARTS did for molecules with Hs. This had been tested on ChEMBL20 and the 10K test compounds." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Producing a test set\n", "\n", "Another nice thing to have would be a test set for the PAINS filters: a set of molecules where we know which PAINS filters they should match (and which they should *not* match!).\n", "\n", "Building a good test set is hard and takes more time than I had available, but I wanted to get a start on it by finding at least one molecule that matched each of the PAINS filters.\n", "\n", "I started using the 10K test molecules and ChEMBL20. The 10K set produced matches for 144 PAINS, ChEMBL produced matches for 293. Taking duplicates into account I now had examples for 309 of 480 PAINS. Running the set I now had across around 16 million compounds from the [ZINC full set](http://zinc.docking.org/db/bysubset/6/6_p0.smi.gz) turned up another 106 matches, bringing the total to 399; 81 to go!\n", "\n", "I figured my best bet to get the remaining 81 was PubChem, but I really didn't feel like pulling down the full set of 40+ million compounds and processing it. So I opted to use the PubChem web services API. 
Here's a bit of the code I used for that:\n", "```\n", "import json,traceback\n", "import requests\n", "from urllib.parse import quote  # Python 3; on Python 2 use urllib.quote\n", "base = 'http://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastsubstructure/smarts/%s/property/CanonicalSmiles/json'\n", "pubchem={}\n", "# loop over the indices of the PAINS SMARTS that still need an example\n", "for key in missing_keys2:\n", "    sma = smas[key]\n", "    url = base%quote(sma)\n", "    print(sma)\n", "    try:\n", "        r = requests.post(url)\n", "        pubchem[key]=json.loads(r.text)\n", "    except KeyboardInterrupt:\n", "        break\n", "    except Exception:\n", "        traceback.print_exc()\n", "```\n", "\n", "This returned matches that led to [another round of SMARTS editing](https://github.com/rdkit/rdkit/commit/e916171537a78c046a89f801ecbb030cc979314a), most of which were due to differences in aromaticity models. After finishing the process, I had examples for an additional 63 SMARTS where the RDKit could generate matches. There are 18 left that I still don't have a working example for:\n", "```\n", "\"[#8]-[#6](=[#8])-[#6](-[#1])(-[#1])-[#16;X2]-[#6](=[#7]-[#6]#[#7])-[#7](-[#1])-c:1:c:c:c:c:c:1\",\"\"\n", "\"c:1-3:c(:c:c:c:c:1)-[#16]-[#6](=[#7]-[#7]=[#6]-2-[#6]=[#6]-[#6]=[#6]-[#6]=[#6]-2)-[#7]-3-[#6](-[#1])-[#1]\",\"\"\n", "\"c:1(:c(:c:2:c(:n:c:1-[#7](-[#1])-[#1]):c:c:c(:c:2-[#7](-[#1])-[#1])-[#6]#[#7])-[#6]#[#7])-[#6]#[#7]\",\"\"\n", "\"[#6](-[#1])-[#6]:2:[#7]:[#7](-c:1:c:c:c:c:c:1):[#16]:3:[!#6&!#1]:[!#1]:[#6]:[#6]:2:3\",\"\"\n", "\"[#6]-2(=[#16])-[#7]-1-[#6]:[#6]-[#7]=[#7]-[#6]-1=[#7]-[#7]-2-[#1]\",\"\"\n", "\"[#7](-[#1])(-[#1])-c:1:c(:c(:c(:c(:c:1-[#7](-[#1])-[#16](=[#8])=[#8])-[#1])-[#7](-[#1])-[#6](-[#1])-[#1])-[F,Cl,Br,I])-[#1]\",\"\"\n", "\"[#7]-4(-c:1:c:c:c:c:c:1)-[#6](=[#7+](-c:2:c:c:c:c:c:2)-[#6](=[#7]-c:3:c:c:c:c:c:3)-[#7]-4)-[#1]\",\"\"\n", "\"c:1:3:c(:c:c:c:c:1)-[#7]-2-[#6](=[#8])-[#6](=[#6](-[F,Cl,Br,I])-[#6]-2=[#8])-[#7](-[#1])-[#6]:[#6]:[#6]:[#6](-[#8]-[#6](-[#1])-[#1]):[#6]:[#6]:3\",\"\"\n", "\"[#6]-1(=[#6](-!@[#6]=[#7])-[#16]-[#6](-[#7]-1)=[#8])-[$([F,Cl,Br,I]),$([#7+](:[#6]):[#6])]\",\"\"\n", 
"\"s:1:c(:c(-[#1]):c(:c:1-[#6]-3=[#7]-c:2:c:c:c:c:c:2-[#6](=[#7]-[#7]-3-[#1])-c:4:c:c:n:c:c:4)-[#1])-[#1]\",\"\"\n", "\"[#7]=[#6]-1-[#7](-[#1])-[#6](=[#6](-[#7]-[#1])-[#7]=[#7]-1)-[#7]-[#1]\",\"\"\n", "\"[#6]-2(=[#7]-c1c(c(nn1-[#6](-[#6]-2(-[#1])-[#1])=[#8])-[#7](-[#1])-[#1])-[#7](-[#1])-[#1])-[#6]\",\"\"\n", "\"c:1:2(:c(:c(:c(:o:1)-[#6])-[#1])-[#1])-[#6](=[#8])-[#7](-[#1])-[#6]:[#6](-[#1]):[#6](-[#1]):[#6](-[#1]):[#6](-[#1]):[#6]:2-[#6](=[#8])-[#8]-[#1]\",\"\"\n", "\"[#6](-[#1])(-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[Cl])-[#1])-[#1])(-c:2:c(:c(:c(:c(:c:2-[#1])-[#1])-[Cl])-[#1])-[#1])-[#8]-[#6](-[#1])(-[#1])-[#6](-[#1])(-[#1])-[#6](-[#1])(-[#1])-c3nc(c(n3-[#6](-[#1])(-[#1])-[#1])-[#1])-[#1]\",\"\"\n", "\"c2(c-1n(-[#6](-[#6]=[#6]-[#7]-1)=[#8])nc2-c3cccn3)-[#6]#[#7]\",\"\"\n", "\"[#7](-[#1])(-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[#1])-[#1])-[#8]-[#1])-[#6]-2=[#6](-[#8]-[#6](-[#7]=[#7]-2)=[#7])-[#7](-[#1])-[#1]\",\"\"\n", "\"[#8]=[#6]-3-c:1:c(:c:c:c:c:1)-[#6]-2=[#6](-[#8]-[#1])-[#6](=[#8])-[#7]-c:4:c-2:c-3:c:c:c:4\",\"\"\n", "\"c:1:c:c-2:c(:c:c:1)-[#7](-[#6](-[#8]-[#6]-2)(-[#6](=[#8])-[#8]-[#1])-[#6](-[#1])-[#1])-[#6](=[#8])-[#6](-[#1])-[#1]\",\"\"\n", "```\n", "\n", "These, along with having a set of molecules which are known *not* to match each PAINS, and looking into the aromaticity model changes in some more detail, are things to come back to. In the meantime, the test set, along with code that runs it, is here: https://github.com/rdkit/rdkit/tree/master/Data/Pains/test_data\n", "The updated version of the PAINS filters is here:\n", "https://github.com/rdkit/rdkit/blob/master/Data/Pains/wehi_pains.csv\n", "\n", "A request: getting this put together was a fair amount of work and it would be very nice to get some feedback on it/credit for it. If you use these, please let me know either via email, posts to the mailing list, a comment here, or a message on github. 
The data files and tests are under the same BSD license as the rest of the RDKit, so there's no requirement that you do so, but it would be a nice gesture. :-)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }