{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fingerprint-based substructure screening 1\n", "\n", "Some form of fingerprint is often used to make substructure searching more efficient. The idea is to use a fingerprinting algorithm with the following property: if `query` is a substructure of `mol`, then\n", "```\n", "FP(query) & FP(mol) = FP(query)\n", "```\n", "In words: every bit set in the fingerprint of `query` must also be set in the fingerprint of `mol`.\n", "\n", "A bunch of different approaches to this have been developed; I'm not going to review them all here. :-)\n", "\n", "Andrew Dalke has done some writing on the topic: http://dalkescientific.com/writings/diary/archive/2012/06/11/optimizing_substructure_keys.html\n", "\n", "One of the best-known approaches is the Daylight algorithm:\n", "http://www.daylight.com/dayhtml/doc/theory/theory.finger.html\n", "\n", "As Andrew mentions in the post I link to above, one of the big problems here is the lack of a reasonable query dataset for benchmarking. The approach I've taken is to use collections of small molecules from ZINC as well as some fragments of PubChem molecules. The database I query is a set of ChEMBL molecules from an earlier post.\n", "\n", "The atom-count distributions of the queries and the database molecules are plotted further down in the notebook. The query sets aren't perfect, but they're a start.\n", "\n", "## TL;DR summary\n", "\n", "Using the RDKit pattern fingerprinter (the default used in the PostgreSQL cartridge, more on that in another post), a screenout accuracy of around 60% is achievable with these three query sets. 
This means that 60% of the molecules that pass the fingerprint screen actually contain the query as a substructure.\n", "\n", "To put that in perspective: with the leads query set and the pattern fingerprint, 4650 substructure matches are found in a total of 7601 searches, so the fingerprint's accuracy is 61%. Without the fingerprint pre-screen, 25000000 (500 queries * 50000 molecules) searches would have been needed, so the screen reduces the number of substructure searches by 99.97%. By the same logic, even a comparatively poor fingerprint with an accuracy of 5% would help enormously: finding the same 4650 matches would take about 93000 searches, still a reduction of roughly 99.6%. This will become important when actually using the fingerprints in a database index, but that's a different post.\n", "\n", "On the technical side: there's some info below about using IPython's ability to run code in parallel.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function\n", "from rdkit import Chem\n", "from rdkit.Chem import rdMolDescriptors\n", "from rdkit.Avalon import pyAvalonTools\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit import rdBase\n", "from rdkit import DataStructs\n", "import cPickle,random,gzip,time\n", "print(rdBase.rdkitVersion)\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "2014.03.1pre\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read in the data\n", "\n", "## Database molecules\n", "\n", "Here we will use the 50K molecules that make up the set of 25K reference pairs generated in an earlier post: http://rdkit.blogspot.ch/2013/10/building-similarity-comparison-set-goal.html\n", "\n", "As a quick reminder: these are pairs of molecules taken from ChEMBL with MW<600 and a count-based MFP0 similarity of at least 0.7 to each other." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# each line of the pairs file describes one pair; columns 1 and 3 hold the SMILES of the two molecules\n", "ind = [x.split() for x in gzip.open('../data/chembl16_25K.pairs.txt.gz')]\n", "mols = []\n", "for row in ind:\n", "    mols.append(Chem.MolFromSmiles(row[1]))\n", "    mols.append(Chem.MolFromSmiles(row[3]))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Query molecules\n", "\n", "We'll use three sets:\n", "\n", " 1. Fragments: 500 diverse molecules taken from the ZINC Fragments set\n", " 2. Leads: 500 diverse molecules taken from the ZINC Lead-like set\n", " 3. Pieces: 823 pieces of molecules obtained by doing a BRICS fragmentation of some molecules from the PubChem screening set.\n", " \n", "These sets were discussed in this thread on the mailing list:\n", "http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02066.html\n", "and this presentation:\n", "http://www.hinxton.wellcome.ac.uk/advancedcourses/MIOSS%20Greg%20Landrum.pdf" ] }, { "cell_type": "code", "collapsed": false, "input": [ "frags = [Chem.MolFromSmiles(x.split()[0]) for x in file('../data/zinc.frags.500.q.smi')]\n", "leads = [Chem.MolFromSmiles(x.split()[0]) for x in file('../data/zinc.leads.500.q.smi')]\n", "pieces = [Chem.MolFromSmiles(x) for x in file('../data/fragqueries.q.txt')]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the size distributions:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "figure(figsize=(16,4))\n", "subplot('141')\n", "hist([x.GetNumAtoms() for x in pieces],bins=20)\n", "title('Pieces')\n", "xlabel('NumAtoms')\n", "subplot('142')\n", "hist([x.GetNumAtoms() for x in frags],bins=20)\n", "title('Fragments')\n", "xlabel('NumAtoms')\n", "subplot('143')\n", "hist([x.GetNumAtoms() for x in leads],bins=20)\n", "title('Leads')\n", "_=xlabel('NumAtoms')\n", 
"subplot('144')\n", "hist([x.GetNumAtoms() for x in mols],bins=20)\n", "title('Mols')\n", "_=xlabel('NumAtoms')\n" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAA6YAAAEVCAYAAADzQ/gZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3X9UU/f9P/BnkHStlQCZkrBEjS1QjOLEH+i2OtPSoKWV\nYbV01En81Xbabdrts865036lZ5O4tp/OH+Vz9umoo/azoj2fDWlP5UNtjbXthIq2dcVK6lAhAtPy\nKyo2Avf7B/NWlACBJPdy83yck3Pg/sh93SSve+/73vcPlSAIAoiIiIiIiIgkEiZ1AERERERERBTa\nWDAlIiIiIiIiSbFgSkRERERERJJiwZSIiIiIiIgkxYIpERERERERSYoFUyIiIiIiIpIUC6YKERER\ngVOnTkkdBhERUcj785//jDlz5kgdBhH9G3NyeGDBdJgxmUwYOXIkIiIioNfrsXz5cly8eBFutxsm\nk0nq8IgU5dp8i4iIgEajQUNDg9RhDcqpU6cQFhaGrq4uqUMhkpzJZMI777wjdRhENEAmkwnf+MY3\n8OWXX/aYnpycjLCwMJw5c0aiyMifWDAdZlQqFd5880243W4cOXIEhw8fxm9/+1upwyJSpGvzze12\no62tDXq9Xpzf0dEhYXSDIwiC1CEQSU6lUkGlUkkdBhENkEqlwm233YbXXntNnHbs2DG0t7czlxWE\nBdNh7Fvf+hbuvfde/OMf/0BYWBj++c9/AgC++uor/Md//AfGjx8PvV6P1atX4/Lly+J6e/bswdSp\nUxEZGYm4uDj83//9HwCgtbUVK1euxLe+9S0YjUY89dRT4tOVL774AnPnzkVUVBTGjBmDH/7wh8Hf\nYSIZCAsLQ35+PuLj43HHHXcAANauXYtx48YhMjISM2bMwPvvvy8u397eDpvNBq1WC7PZjN///vcY\nO3asON9kMuG5557DlClTEBERgZUrV6KxsRH33nsvIiMjYbVa0dLSIi5/6NAhfPe730V0dDSmTp2K\nAwcOiPMsFguefvpp3HnnndBoNJg3b554d/n73/8+ACAqKgoREREoLy9nXhNdQxAE2O12xMXFYfTo\n0XjooYfQ3Nwszn/wwQcRGxuLqKgozJ07F1VVVeK8L7/8EhkZGYiMjMSsWbNw8uTJHu/7xBNPQKfT\nITIyElOmTMFnn30W1H0jUoIf/ehHeOWVV8T/CwsLkZOTI95wbW1tRU5ODmJiYmAymfC73/2u15ux\nzEn5YsF0GLqaZLW1tXjrrbeQnJzcY/769evxxRdf4JNPPsEXX3wBl8uFZ555BgBQUVEBm82G559/\nHq2trXjvvffEKsDLli3DTTfdhJMnT+Lo0aMoKyvDn/70JwDAU089hfnz56OlpQUulws/+9nPgrfD\nRBLq7aS2Z88efPTRR+KFaUpKCj755BM0Nzfj4YcfxoMPPgiPxwMAyM3NxZkzZ1BTU4O3334br776\nao+7uyqVCn/961/xzjvv4MSJE3jzzTdx7733wm6341//+he6urqwdetWAIDL5cL999+Pp59+Gs3N\nzXjuueewaNGiHlWbXnvtNfz5z3/Gv/71L3g8Hjz33HMAgIMHDwLoPnG73W7MmjWLeU10ja1bt6Kk\npATvvfce6uvrER0djccff1ycf9999+GLL77AuXPnMG3aNCxZskSc9/jjj2PkyJFoaGjAyy+/jB07\ndoh5XlZWhoMHD8LpdKK1tRWvv/46vvnNbwZ9/4iGu9mzZ6OtrQ2ff/45Oj
s7sWvXLvzoRz8C0H2u\n/ulPfwq3242amhocOHAAr7zyCnbs2HHD+zAnZUygYWX8+PHCqFGjhKioKGH8+PHC448/LrS3twsq\nlUo4efKk0NXVJdx6663CyZMnxXU+/PBDYcKECYIgCMKjjz4q/PznP7/hfRsaGoRvfOMbQnt7uzjt\nL3/5i3DXXXcJgiAIOTk5wqOPPirU1dUFeA+J5OPafIuKihIyMzMFlUol7N+/v8/1oqOjhU8//VQQ\nBEG47bbbhLKyMnHen/70J8FoNIr/m0wm4S9/+Yv4/6JFi4Q1a9aI/2/btk3IzMwUBEEQ7Ha7sHTp\n0h7bmjdvnlBYWCgIgiBYLBbhd7/7nTgvPz9fmD9/viAIglBTUyOoVCqhs7NTnM+8plBlMpmEd955\np8e0iRMn9ph29uxZQa1W98iZq5qbmwWVSiW0tbUJHR0dglqtFk6cOCHO37Bhg3DnnXcKgiAI77zz\njpCQkCAcOnSo1/ciov6ZTCZh3759wm9/+1vh17/+tbB3714hLS1N6OjoEK+Bb7rpJuH48ePiOn/8\n4x8Fi8UiCIIg7Nixgzk5DPCJ6TCjUqmwZ88eNDc349SpU9i+fTtuvvlmcf65c+dw6dIlTJ8+HdHR\n0YiOjsa9996L8+fPAwDq6upw++233/C+p0+fxpUrVxAbGyuu9+Mf/xjnzp0DAPz+97+HIAhISUnB\n5MmTe70DRaQ01+Zbc3Mz/va3vwFAj6q4APDcc8/BbDYjKioK0dHRaG1tFXPu7NmzPZY3Go03bEen\n04l/33LLLT3+v/nmm3HhwgUA3Xn6+uuvizkaHR2NDz74oEeHTNe2gb3lllvEdXvDvCb62qlTp7Bw\n4UIxt8xmM8LDw9HY2IjOzk6sX78ecXFxiIyMxIQJE6BSqXD+/HmcO3cOHR0dPfJ83Lhx4t933303\nfvKTn+Dxxx+HTqfDY489BrfbLcUuEg1rKpUKS5cuxf/8z//cUI33/PnzuHLlCsaPHy8uP27cOLhc\nrhvehzkpXyyYKszo0aNxyy23oKqqSryYbmlpQVtbG4DuC+ovvvjihvXGjh0r9nZ2db3W1lYcO3YM\nQPeF83//93/D5XLhj3/8I9asWSO2aSUKNddWxT148CCeffZZvP7662hpaUFzczMiIyPFk2VsbCxq\na2vF5a/92xvBSwdF48aNw9KlS8UcbW5uhtvtxpNPPulTzFcxr4m+Nm7cOJSWlvbIr0uXLiE2NhZ/\n+ctfUFJSgnfeeQetra2oqamBIAgQBAFjxoxBeHh4j15Br+8h9Kc//SkOHz6MqqoqVFdX49lnnw32\n7hEpwrhx43Dbbbdh7969eOCBB8Tpo0ePhlqt7jF04pkzZ3q9GQwwJ+WKBVOFCQsLwyOPPIJ169aJ\nTztdLhfKysoAACtXrsSOHTvw7rvvoqurCy6XCydOnEBsbCzS0tLw85//HG63G11dXTh58iTee+89\nAMDrr7+Ouro6AN2dp6hUKoSF8edD5Ha7ER4ejtGjR8Pj8eCZZ54RbwQBQFZWFvLy8sR2nNu3bx90\nD4I/+tGP8MYbb6CsrAydnZ24fPkyHA5HjzvC3gq1Y8aMQVhYWI9OWZjXFMo8Hg8uX74svlatWoUN\nGzaIhcpz586hpKQEAHDhwgV84xvfgFarxcWLF7FhwwbxfUaMGIEHHngAGzduRHt7O6qqqlBYWCjm\n+eHDh1FeXo4rV65g5MiRuPnmmzFixIjg7zCRQhQUFODdd9/FLbfcIk4bMWIEsrKy8Jvf/AYXLlzA\n6dOn8cILL4htUK/FnJQvXoEoxLUXups3b0ZcXBxmz54t9upZXV0NAJg5cyZ27NiBJ554AlFRUbBY\nLOJJ+JVXXoHH44HZbIZWq8WDDz4oVhE8fPgwZs+ejYiICPzgBz/A1q1bOW4qhaTrC5Xz58/H/Pnz\nkZCQAJPJhFtuuaVHNb6nn34aRqMREy
ZMQFpaGh588EHcdNNNA97GtcNaGI1G7NmzB5s2bUJMTAzG\njRuH559/vkdh1Nu6I0eOxG9+8xt873vfg1arRXl5OfOaQlp6ejpGjhwpvlpaWpCRkYG0tDRoNBp8\n5zvfQUVFBQAgJycH48ePh8FgwOTJk/Gd73ynR65t374dFy5cgF6vx4oVK7BixQpxXltbGx599FFo\ntVqYTCaMHj0av/zlL4O+v0RKcdttt2HatGni/1fPddu2bcOtt96K2267DXPmzMGSJUuwfPnyHssA\nzEk5Uwnebq8TEZHf/dd//Rd2796N/fv3Sx0KERERkWz0+cS0trYWd911FyZNmoTJkyeLQxY0NTXB\narUiISEBaWlpPcbYy8vLQ3x8PBITE8Xqo0QUfCdOnEBycrL4ioyMxNatW/vMX/K/hoYGfPDBB+jq\n6sKJEyfwn//5n1i4cKHUYZEMtLS0YPHixZg4cSLMZjPKy8uZn0QBsmXLFiQlJWHy5MnYsmULgMFd\nz1ZWViIpKQnx8fFYu3Zt0PeDSMn6LJiq1Wq88MIL+Oyzz3Do0CG8+OKLOH78OOx2u1g9NDU1FXa7\nHQBQVVWFXbt2oaqqCqWlpVizZg26urqCsiNE1NMdd9yBo0eP4ujRo6isrMTIkSOxcOFCr/lLgeHx\nePDjH/8YGo0GqampyMzMxJo1a6QOi2Rg7dq1SE9Px/Hjx/Hpp58iMTGR+UkUAP/4xz/wpz/9CR99\n9BE++eQTvPnmmzh58qRP17NXKxiuXr0aBQUFcDqdcDqdKC0tlXLXiBSlz4KpXq/H1KlTAQCjRo3C\nxIkT4XK5UFJSApvNBgCw2WwoLi4G0D3ofHZ2NtRqNUwmE+Li4sT2GUQknX379iEuLg5jx471mr8U\nGOPGjcOxY8dw4cIF1NXV4dlnn0V4eLjUYZHEWltbcfDgQbEtYnh4OCIjI5mfRAHw+eefY9asWWIn\nN3PnzsX//u//+nQ9W15ejvr6erjdbqSkpADobnvMHCXynwF3fnTq1CkcPXoUs2bNQmNjozjOnk6n\nQ2NjI4Du8fqu7ZbZaDT2On4QEQVXUVERsrOzAcBr/hJR8NTU1GDMmDFYvnw5pk2bhkceeQQXL15k\nfhIFwOTJk3Hw4EE0NTXh0qVLeOutt1BXV+fz9ez10w0GA69zifxoQLftL1y4gEWLFmHLli2IiIjo\nMe/aXq5609u8wQ6VQKQ0weh7zOPx4I033sDmzZtvmOctf5mjRN0ClaMdHR04cuQItm/fjpkzZ2Ld\nunU3VNvt6/zKHCXqNpAcTUxMxK9+9SukpaXh1ltvxdSpU28YHqS/61lfMUeJuvlyHu33iemVK1ew\naNEiLF26FJmZmQC67ypdHUakvr4eMTExALrvHF07eHxdXR0MBoPXIIP1+n//7/8pcltK356S900Q\ngtcZ9t69ezF9+nSMGTMGgPf8DWaOBvqzDsZ3Odz3Ybi/fzC2EUhGoxFGoxEzZ84EACxevBhHjhyB\nXq8fUH4GOkfl+p0wNsZ27csXK1aswOHDh3HgwAFER0cjISHBp+tZo9EIg8Egjv18dbq361zmKGNj\nbL6fR/ssmAqCgJUrV8JsNmPdunXi9IyMDBQWFgIACgsLxQJrRkYGioqK4PF4UFNTA6fTKdbDJyJp\nvPbaa2I1XsB7/hJR8Oj1eowdO1YcY3rfvn2YNGkSFixYwPwkCoB//etfAIAzZ87gr3/9Kx5++GGf\nr2f1ej00Gg3Ky8shCAJ27tzJHCXyoz6r8n7wwQd49dVXMWXKFCQnJwPo7j57/fr1yMrKQkFBAUwm\nE3bv3g0AMJvNyMrKgtlsRnh4OPLz81mVgUhCFy9exL59+/DSSy+J07zlLxEF17Zt27BkyRJ4PB7c\nfvvt2LFjBzo7O5mfRAGwePFifPnll1Cr1cjPz0dkZOSgrmfz8/OxbNkytLe3Iz09HfPnz5dyt4gU\npc
+C6Z133ul1uJd9+/b1On3Dhg3YsGHD0CPzI4vFoshtKX17St63YLn11ltx/vz5HtO0Wq3X/A2W\nQH/Wwfguh/s+DPf3D9Y2Aunb3/42PvrooxumS52fQyHn74SxDY6cY/PFe++9d8O0vs6H3q5np0+f\njmPHjvk9vmCR8/fJ2AZHzrH5SiUMpgLwUDeqUg2q3jGRksg5D+QcG1GwyDkP5BwbUbDIOQ/kHBtR\nsPiaBwMeLoaIiIiIiIgoEFgwJSIiIiIiIkmxYEpERERERESSYsGUiIiIiIiIJMWCKRERERERDWsa\njRYqlcrrS6PRSh0i9YO98hJJRM55IOfYiIJFznkg59iIgkXOeSDn2JSqe6zZvj5zfifBxl55iYiI\niIiIaFhhwZSIiIiIiIgkpYiCaX91ylmvnIiIiIiISL4U0ca0/zrlAOuVk9zIuf2JnGMjChY554Gc\nYyMKFjnngZxjUyq2MZUftjElIiIiIiKiYYUFUyIiIiIiIpIUC6ZEREREREQkKRZMiYiIiIiISFIs\nmBIREREREZGkWDAlIiIiIkXLy8vDpEmTkJSUhIcffhhfffUVmpqaYLVakZCQgLS0NLS0tPRYPj4+\nHomJiSgrKxOnV1ZWIikpCfHx8Vi7dq0Uu0KkWCyYEhEREZFinTp1Ci+99BKOHDmCY8eOobOzE0VF\nRbDb7bBaraiurkZqairsdjsAoKqqCrt27UJVVRVKS0uxZs0acciL1atXo6CgAE6nE06nE6WlpVLu\nGpGisGBKRERERIql0WigVqtx6dIldHR04NKlS/jWt76FkpIS2Gw2AIDNZkNxcTEAYM+ePcjOzoZa\nrYbJZEJcXBzKy8tRX18Pt9uNlJQUAEBOTo64DhENXbjUARARERERBYpWq8UvfvELjBs3Drfccgvm\nzZsHq9WKxsZG6HQ6AIBOp0NjYyMA4OzZs5g9e7a4vtFohMvlglqthtFoFKcbDAa4XC6v2924caP4\nt8VigcVi8e+OEcmMw+GAw+EY9PosmBIRERGRYp08eRJ/+MMfcOrUKURGRuLBBx/Eq6++2mMZlUoF\nlUrl1+1eWzAlCgXX34DJzc31aX1W5SUiIiK6jkajFQsrvb00Gq3UIdIAHT58GN/97nfxzW9+E+Hh\n4XjggQfw97//HXq9Hg0NDQCA+vp6xMTEAOh+ElpbWyuuX1dXB6PRCIPBgLq6uh7TDQZDcHeGSMFY\nMCUiIiK6jtvdDEDw+uqeT8NBYmIiDh06hPb2dgiCgH379sFsNmPBggUoLCwEABQWFiIzMxMAkJGR\ngaKiIng8HtTU1MDpdCIlJQV6vR4ajQbl5eUQBAE7d+4U1yGioWNVXiKFamlpwapVq/DZZ59BpVJh\nx44diI+Px0MPPYTTp0/DZDJh9+7diIqKkjpUkhGNRtvvBXdERDTa2pqCFBER0dB8+9vfRk5ODmbM\nmIGwsDBMmzYNjz76KNxuN7KyslBQUCCeEwHAbDYjKysLZrMZ4eHhyM/PF6v55ufnY9myZWhvb0d6\nejrmz58v5a4RKYpKuNr/dTA3qlLBn5vtPlj0937+3SbRUPk7D65ns9kwd+5crFixAh0dHbh48SJ+\n97vfYfTo0XjyySexefNmNDc3i93jBzM2ki8eT78m5zyQc2xK0X8u8DuQmpzzQM6xKRVzVn58zQMW\nTIkkEsiTVmtrK5KTk/HPf/6zx/TExEQcOHAAOp0ODQ0NsFgs+Pzzz4MaG8kbj6dfk3MeyDk2peBF\nrvzJOQ/kHJtSMWflx9c8YFVeIgWqqanBmDFjsHz5cnzyySeYPn06/vCHP3jtGr837OZeeQZSTTeU\nDbWbeyIiIho8PjElkkgg76YePnwY3/nOd/Dhhx9i5syZWLduHSIiIrB9+3Y0N39dMNFqtWhqurGt\nIO/0KtNAj5U8nnaTcx7IOTal4NMX+ZNzHsg5NqVizsqPr3nAXnmJ
FMhoNMJoNGLmzJkAgMWLF+PI\nkSNeu8YnouAzmUyYMmUKkpOTkZKSAgBoamqC1WpFQkIC0tLS0NLSInGUREREwcGCKZEC6fV6jB07\nFtXV1QCAffv2YdKkSV67xiei4FOpVHA4HDh69CgqKioAAHa7HVarFdXV1UhNTe21czIiIiIlYlVe\nIokEuprPJ598glWrVsHj8eD222/Hjh070NnZiaysLJw5c6bP4WJYBUmZWJXXN4HOgwkTJuDw4cP4\n5je/KU5jB2XywWqB8ifnPJBzbErFnJUf9srrfSn+GElW5HzSknNsNHgsmPom0Hlw2223ITIyEiNG\njMBjjz2GRx55BNHR0WI7cEEQoNVqe7QLD1ZsxIvc4UDOeSDn2JSKOSs/7JWXiIhoGPjggw8QGxuL\nc+fOwWq1IjExscd8lUr17wut3rHnbAo17DmbSNn4xJRIInK+myrn2GjwgvnEdCBD00RERKOt7cZe\noeUimHmQm5uLUaNG4aWXXoLD4YBer0d9fT3uuusuVuWVCJ++yJ+c80DOsSkVc1Z+2CsvERFJrrtQ\nKvT5CuUxVS9dugS32w0AuHjxIsrKypCUlISMjAx2UEZERCGJVXmJiIiCrLGxEQsXLgQAdHR0YMmS\nJUhLS8OMGTOQlZWFgoICsYMyIiKiUMCqvEQSkXM1HznHRoMXzKq8SjguyzkP5BybUrBaoPzJOQ/k\nHJtSMWflh1V5iYiIiIiIaFhhwZSIiIiIiIgkxYIpERERERERSYoFUyIiIiIiIpIUC6ZEREREpGgn\nTpxAcnKy+IqMjMTWrVvR1NQEq9WKhIQEpKWloaWlRVwnLy8P8fHxSExMRFlZmTi9srISSUlJiI+P\nx9q1a6XYHSJFYsGUiChANBotVCpVny+NRit1mEREinfHHXfg6NGjOHr0KCorKzFy5EgsXLgQdrsd\nVqsV1dXVSE1Nhd1uBwBUVVVh165dqKqqQmlpKdasWSP2Lrp69WoUFBTA6XTC6XSitLRUyl0jUgwW\nTImIAsTtbkZ31/XeX93LEBFRsOzbtw9xcXEYO3YsSkpKYLPZAAA2mw3FxcUAgD179iA7OxtqtRom\nkwlxcXEoLy9HfX093G43UlJSAAA5OTniOkQ0NOFSB0BEREREFCxFRUXIzs4GADQ2NkKn0wEAdDod\nGhsbAQBnz57F7NmzxXWMRiNcLhfUajWMRqM43WAwwOVy9bqdjRs3in9bLBZYLBY/7wmRvDgcDjgc\njkGvz4IpEREREYUEj8eDN954A5s3b75h3tUmFv5ybcGUKBRcfwMmNzfXp/VZlZeIiIiIQsLevXsx\nffp0jBkzBkD3U9KGhgYAQH19PWJiYgB0Pwmtra0V16urq4PRaITBYEBdXV2P6QaDIYh7QKRc/RZM\nV6xYAZ1Oh6SkJHHaxo0bYTQaxZ7N9u7dK87z1oMZEREREZGUXnvtNbEaLwBkZGSgsLAQAFBYWIjM\nzExxelFRETweD2pqauB0OpGSkgK9Xg+NRoPy8nIIgoCdO3eK6xDR0KiEq12MeXHw4EGMGjUKOTk5\nOHbsGIDux7IRERH4+c9/3mPZqqoqPPzww/joo4/gcrlwzz33oLq6GmFhPcu/KpUK/WzWt51QqdDd\nkUifS/l1m0RD5e888Cc5xzacyO3YNNB4/BGz3PZ9MOScB3KOTSn6/w3zO5Car3lw8eJFjB8/HjU1\nNYiIiAAANDU1ISsrC2fOnIHJZMLu3bsRFRUFANi0aRNefvllhIeHY8uWLZg3bx6A7uFili1bhvb2\ndqSnp2Pr1q1Djo2GjjkrP77mQb9tTOfMmYNTp07dML23jfTWg1lFRUWPxuNERETdwvttzxUREY22\ntqYgxUNESnbrrbfi/PnzPaZptVrs27ev1+U3bNiADRs23DB9+vTp4sMaIvKfQXd+tG3bNrzyyiuY\nMWMGnn/+eURFRXntwaw37KmM
Qs1QeyojUp4O9PdU1e32X0ckREREJF+DKpiuXr0aTz/9NADgqaee\nwi9+8QsUFBT0uqy3u+HsqYxCzVB7KiMiIiIiUqpB9cobExMjdqm9atUqVFRUAOi9BzP2VEZERERE\nRER9GVTBtL6+Xvz7b3/7m9hjr7cezIiIiIiIiIi86bcqb3Z2Ng4cOIDz589j7NixyM3NhcPhwMcf\nfwyVSoUJEybgj3/8IwDAbDYjKysLZrMZ4eHhyM/P9+tAxURERET+oNFo4XY3Sx0GERH9W7/DxQRk\noxwuhigoXcmbTCZoNBqMGDECarUaFRUVaGpqwkMPPYTTp0/f0DV+MGMLBXI7NslxuBg5fT43bFnG\neSDn2IaLgQwtwaEn5E3OeSDn2JSKw8XIj695MKiqvEQ0PKhUKjgcDhw9elRsC26322G1WlFdXY3U\n1FTY7XaJoyQiIiKiUMeCKZHCXX+nqqSkBDabDQBgs9lQXFwsRVhERERERKJBj2NKRPKnUqlwzz33\nYMSIEXjsscfwyCOPoLGxETqdDgCg0+nQ2NjY67oca5hCDccaJiIikg7bmBJJJBjtT+rr6xEbG4tz\n587BarVi27ZtyMjIQHPz1x1+aLVaNDU1BT22UCC3YxPbmPpGznkg59iGC7YxHf7knAdyjk2p2MZU\nftjGlIhEsbGxAIAxY8Zg4cKFqKiogE6nQ0NDA4DugmtMTIyUIRIRERERsWBKpFSXLl2C2+0GAFy8\neBFlZWVISkpCRkYGCgsLAQCFhYXIzMyUMkwiIiIiIrYxJVKqxsZGLFy4EADQ0dGBJUuWIC0tDTNm\nzEBWVhYKCgrE4WKIiIiIiKQk+zamAx8AW77tlIh6I+f2J3KObThhG1O2MQ0UOcc2XLCN6fAn5zyQ\nc2xKxTam8uNrHsj+iWl3oXQgFzdEREREREQ0HLGNKREREREREUmKBVMiIiKJdHZ2Ijk5GQsWLAAA\nNDU1wWq1IiEhAWlpaWhpaZE4QiLlaGlpweLFizFx4kSYzWaUl5f3mXN5eXmIj49HYmIiysrKxOmV\nlZVISkpCfHw81q5dK8WuECkSC6ZEREQS2bJlC8xm87/bRgF2ux1WqxXV1dVITU2F3W6XOEIi5Vi7\ndi3S09Nx/PhxfPrpp0hMTPSac1VVVdi1axeqqqpQWlqKNWvWiG3lVq9ejYKCAjidTjidTpSWlkq5\nW0SKwYIpERGRBOrq6vDWW29h1apV4gVvSUkJbDYbAMBms6G4uFjKEIkUo7W1FQcPHsSKFSsAAOHh\n4YiMjPSac3v27EF2djbUajVMJhPi4uJQXl6O+vp6uN1upKSkAABycnKYp0R+IvvOj/wnXLwj7U1E\nRDTa2pqCFA8REYWyJ554As8++yza2trEaY2NjdDpdAAAnU6HxsZGr+tv3LhR/NtiscBisQQqVCJZ\ncDgccDgcg1q3pqYGY8aMwfLly/HJJ59g+vTp+MMf/uA1586ePYvZs2eL6xuNRrhcLqjVahiNRnG6\nwWCAy+V99cuSAAAgAElEQVTqdZvMUQo1Q8lRIKQKph3or3dft5u9+xIRUeC9+eabiImJQXJysteT\nuEql6vOG6rUXvUSh4PrCXW5u7oDX7ejowJEjR7B9+3bMnDkT69atu6GqfH855yvmKIWaoeQowKq8\nREREQffhhx+ipKQEEyZMQHZ2Nt59910sXboUOp0ODQ0NAID6+nrExMRIHCmRMhiNRhiNRsycORMA\nsHjxYhw5cgR6vb7XnDMYDKitrRXXr6urg9FohMFgQF1dXY/pBoMhiHtCpFwsmBIREQXZpk2bUFtb\ni5qaGhQVFeHuu+/Gzp07kZGRgcLCQgBAYWEhMjMzJY6USBn0ej3Gjh2L6upqAMC+ffswadIkLFiw\noNecy8jIQFFRETweD2pqauB0OpGSkgK9Xg+NRoPy8nIIgoCdO3cyT4n8JISq8hIREcnT1eqD69
ev\nR1ZWFgoKCmAymbB7926JIyNSjm3btmHJkiXweDy4/fbbsWPHDnR2dvaac2azGVlZWTCbzQgPD0d+\nfr6Yp/n5+Vi2bBna29uRnp6O+fPnS7lbRIqhEq52BRjMjapUGOhmuw8C/S3rv2Uk+DgoRPmSB8Em\n59iGk4Eev/zxWWs0WrjdzQNYMjjHSn8eu6X6Lco5D+Qc23DR/2+0//n8DqQl5zyQc2xKNZCc5ncS\nXL7mAZ+YEhEpQHehdCAFQSIiIiL5YRtTIiIiIiIikhQLpkRERERERCQpFkyJiIiIiEjhwsWxant7\naTRaqQMMeWxjSkRERERECteBvvpicLvZD4PU+MSUiIiIyGd9P33pft3EJzRERAPEJ6ZERJIKF8fG\n8yYiIhptbU1BioeIBqbvpy/d+h6+gk9oiIi+xoIpEZGk+r+45cUrERERKR2r8hIREREREZGkWDAl\nIiIixdFotH227yQiInlhVV4iIiJSHLe7GX1Xk2fhlIhITvjElIiIiIiIiCTFgimRgnV2diI5ORkL\nFiwAADQ1NcFqtSIhIQFpaWloaWmROEIiIiIiIhZMiRRty5YtMJvNYnsqu90Oq9WK6upqpKamwm63\nSxwhERERERELpkSKVVdXh7feegurVq2CIHS3syopKYHNZgMA2Gw2FBcXSxkiEREREREAdn5EpFhP\nPPEEnn32WbS1tYnTGhsbodPpAAA6nQ6NjY1e19+4caP4t8VigcViCVSoRLLgcDjgcDikDoOIAsRk\nMkGj0WDEiBFQq9WoqKhAU1MTHnroIZw+fRomkwm7d+9GVFQUACAvLw8vv/wyRowYga1btyItLQ0A\nUFlZiWXLluHy5ctIT0/Hli1bpNwtIsVQCVcfpQRzoyoVBrrZ7iqI/S3rv2Uk+DgoRPmSB7568803\nsXfvXrz44otwOBx4/vnn8cYbbyA6OhrNzc3iclqtFk1NTUGNLZT48/jV3/cht2NlMPc9UOScB3KO\nTS76/w0Gev7A3qOv71Gj0f67d+HeRUREo63txmN4qPA1DyZMmIDKykpotVpx2pNPPonRo0fjySef\nxObNm9Hc3Ay73Y6qqio8/PDD+Oijj+ByuXDPPffA6XRCpVIhJSUF27dvR0pKCtLT0/Gzn/0M8+fP\nH1JsNHT+yHl+Z/7lax6wKi+RAn344YcoKSnBhAkTkJ2djXfffRdLly6FTqdDQ0MDAKC+vh4xMTES\nR0pERN58PeRN76++Cq3Uu+svkr01cdmzZw+ys7OhVqthMpkQFxeH8vJy1NfXw+12IyUlBQCQk5PD\nZjFEfsKCKZECbdq0CbW1taipqUFRURHuvvtu7Ny5ExkZGSgsLAQAFBYWIjMzU+JIiYiIgkOlUuGe\ne+7BjBkz8NJLLwHw3sTl7NmzMBqN4rpGoxEul+uG6QaDAS6XK4h7QaRcbGNKFAKu9sq7fv16ZGVl\noaCgQGxLQ0REFAo++OADxMbG4ty5c7BarUhMTOwxX6VSiedLf2BfDRRqhtpXAwumRAo3d+5czJ07\nF0B3m9J9+/ZJHBEREVHwxcbGAgDGjBmDhQsXoqKiQmziotfrezRxMRgMqK2tFdetq6uD0WiEwWBA\nXV1dj+kGg6HX7V1bMKWh66/NNUnv+hswubm5Pq3PqrxEREREpGiXLl2C2+0GAFy8eBFlZWVISkry\n2sQlIyMDRUVF8Hg8qKmpgdPpREpKCvR6PTQaDcrLyyEIAnbu3MlmMUHSX5trGv74xJSIiIiIFK2x\nsRELFy4EAHR0dGDJkiVIS0vDjBkzem3iYjabkZWVBbPZjPDwcOTn54vVfPPz87Fs2TK0t7cjPT39\nhh55iWhwOFzMdcuwm2gKFjl3JS/n2IYTDhczvI/Lcs4DOccmF0oYLmYg+xDKvwM554GcYxuugpHT\n/M78i8PFEBERERER0bDCgikRERERERFJigVTIiIiIiIikh
Q7PyIiIiKSRLhfx80kIhrO+n1iumLF\nCuh0OiQlJYnTmpqaYLVakZCQgLS0NLS0tIjz8vLyEB8fj8TERJSVlQUmaiIiomHs8uXLmDVrFqZO\nnQqz2Yxf//rXAPo+v5ISdYDDXxARdeu3YLp8+XKUlpb2mGa322G1WlFdXY3U1FTY7XYAQFVVFXbt\n2oWqqiqUlpZizZo16OrqCkzkREREw9TNN9+M/fv34+OPP8ann36K/fv34/333/d6fiUiUjKNRguV\nStXni5Sv34LpnDlzEB0d3WNaSUkJbDYbAMBms6G4uBgAsGfPHmRnZ0OtVsNkMiEuLg4VFRUBCJuI\niGh4GzlyJADA4/Ggs7MT0dHRXs+vRERK5nY3o+/aA6xBEAoG1ca0sbEROp0OAKDT6dDY2AgAOHv2\nLGbPni0uZzQa4XK5en2PjRs3in9bLBZYLJbBhEI0bDgcDjgcDqnDICKZ6OrqwrRp03Dy5EmsXr0a\nkyZN8np+7Y3Sz6MajfbfF6u9i4iIRltbUxAjIqnxPEqB1Xebbx5zAm/InR/193jd27xrT6hEoeD6\nC8fc3FzpgiEiyYWFheHjjz9Ga2sr5s2bh/379/eY39/5Venn0a+foHibz6p9oYbnUQqsq22+e8dj\nTuANargYnU6HhoYGAEB9fT1iYmIAAAaDAbW1teJydXV1MBgMfgiTiIhImSIjI3HfffehsrLS6/mV\niIhI6QZVMM3IyEBhYSEAoLCwEJmZmeL0oqIieDwe1NTUwOl0IiUlxX/REhERKcD58+fFHnfb29vx\n9ttvIzk52ev5lYiISOn6rcqbnZ2NAwcO4Pz58xg7diyeeeYZrF+/HllZWSgoKIDJZMLu3bsBAGaz\nGVlZWTCbzQgPD0d+fj570SIiIrpOfX09bDYburq60NXVhaVLlyI1NRXJycm9nl+JiIiUTiUIQtC7\nuVKpVBjoZrsLtv0t679lJPg4KET5kgfBJufYhhN/Hr/6+z7kdqwM5r4HipzzQM6x+Uv/v6G+P4OB\nrB/Y+cHYhvJ/B32Rcx7IOTY58s85I9Dz1ehuh9o7do50I1/zYMidHxERERERESkbO0cKtEG1MSUi\nIiIiIiLyFxZMiYiIyK80Gq043I23l0ajHeJWwvt8fyIiGl5YMCUiIiK/+noMUu+v7mWG4mq1Om8v\nop46OzuRnJyMBQsWAACamppgtVqRkJCAtLQ0sadsAMjLy0N8fDwSExNRVlYmTq+srERSUhLi4+Ox\ndu3aoO8DkZKxYEpENAgDeSJERETysWXLFpjNZvH4bLfbYbVaUV1djdTUVNjtdgBAVVUVdu3ahaqq\nKpSWlmLNmjViBy6rV69GQUEBnE4nnE4nSktLJdsfkpu+a3EMvZaI8rFgSkQ0CAN5IkRERPJQV1eH\nt956C6tWrRILmSUlJbDZbAAAm82G4uJiAMCePXuQnZ0NtVoNk8mEuLg4lJeXo76+Hm63GykpKQCA\nnJwccR2i/mpxDL2WiPKxYEpEREREivbEE0/g2WefRVjY15e+jY2N0Ol0AACdTofGxkYAwNmzZ2E0\nGsXljEYjXC7XDdMNBgNcLleQ9oBI+ThcDJECXb58GXPnzsVXX30Fj8eDH/zgB8jLy0NTUxMeeugh\nnD59GiaTCbt370ZUVJTU4RIREQXMm2++iZiYGCQnJ8PhcPS6TCCaYGzcuFH822KxwGKx+PX9ieTG\n4XB4zbGBYMGUSIFuvvlm7N+/HyNHjkRHRwfuvPNOvP/++ygpKYHVasWTTz6JzZs3w263i21qiIiI\nlOjDDz9ESUkJ3nrrLVy+fBltbW1YunQpdDodGhoaoNfrUV9fj5iYGADdT0Jra2vF9evq6mA0GmEw\nGFBXV9djusFg8LrdawumRKHg+hswubm5Pq3PqrxECjVy5EgAgMfjQWdnJ6Kjo722pyEiCj4O9zJ0\n7GxlIDZt2oTa2lrU1N
SgqKgId999N3bu3ImMjAwUFhYCAAoLC5GZmQkAyMjIQFFRETweD2pqauB0\nOpGSkgK9Xg+NRoPy8nIIgoCdO3eK6xDR0PGJKZFCdXV1Ydq0aTh58iRWr16NSZMmeW1P0xtWQaJQ\nM9QqSOSrqx2FeMPCaf/6/gzdbn6Gvbl642P9+vXIyspCQUGB2LwFAMxmM7KysmA2mxEeHo78/Hxx\nnfz8fCxbtgzt7e1IT0/H/PnzJdsPIqVRCVe7JgvmRlUqDHSz3QeC/pb13zISfBwUonzJg6FobW3F\nvHnzkJeXhwceeADNzV/3CqfVatHU1CRZbMOZ3I5NSo5Hqt+inPNAzrEB/vr+h/t8OcQg79/JUMk5\nD+QcmxyFyjEj1H4TvuYBq/ISKVxkZCTuu+8+VFZWiu1pAPRoT0NEREREJCUWTP1Mo9H22d6DbT4o\nGM6fP4+WlhYAQHt7O95++20kJyd7bU9DRERERCQltjH1s+7Bc/t+ZM02HxRo9fX1sNls6OrqQldX\nF5YuXYrU1FQkJyf32p6GiIiIiEhKbGN63TJD/TgGGm+o1TGnG8m5/YmcY5MLuR2blBwP25jeSM6x\nAaHTXmw47KOcfydDJec8kHNschQqx4xQ+02wjSkRERERERENKyyYEhERERERkaRYMCUiIiIiIiJJ\nsWDqg4H0uEtERERERES+YcHUB1/3uNvXi4iI/Ce83xuCHIaLiIho+ONwMUREJGMdGMhNPw7DRURE\nNLzxiSkRERERERFJigVTIiIiIiIikhQLpkRERESK1HcbbbbNJiI5YRtTIiIiIkXqu40222YTkZzw\niSkRERERERFJigVTIiIiIiIKGI1G22e1ciKABVMiIqKgq62txV133YVJkyZh8uTJ2Lp1KwCgqakJ\nVqsVCQkJSEtLQ0tLi8SREinD5cuXMWvWLEydOhVmsxm//vWvAfSdc3l5eYiPj0diYiLKysrE6ZWV\nlUhKSkJ8fDzWrl0b9H0ZjtzuZnRXK/f2ImLB9Dp9dxJARETkD2q1Gi+88AI+++wzHDp0CC+++CKO\nHz8Ou90Oq9WK6upqpKamwm63Sx0qkSLcfPPN2L9/Pz7++GN8+umn2L9/P95//32vOVdVVYVdu3ah\nqqoKpaWlWLNmDQShuwC1evVqFBQUwOl0wul0orS0VMpdI1IMFkx7uNpJAO/mEBFR4Oj1ekydOhUA\nMGrUKEycOBEulwslJSWw2WwAAJvNhuLiYinDJFKUkSNHAgA8Hg86OzsRHR3tNef27NmD7OxsqNVq\nmEwmxMXFoby8HPX19XC73UhJSQEA5OTkME+J/IS98hIREUno1KlTOHr0KGbNmoXGxkbodDoAgE6n\nQ2Njo9f1Nm7cKP5tsVhgsVgCHCmRtBwOBxwOx6DX7+rqwrRp03Dy5EmsXr0akyZN8ppzZ8+exezZ\ns8V1jUYjXC4X1Go1jEajON1gMMDlcvW6PeYohZqh5igLpkRERBK5cOECFi1ahC1btiAiIqLHvP6a\nkVx70UsUCq4v3OXm5vq0flhYGD7++GO0trZi3rx52L9/f4/5/m66xRylUDPkHPVzPERERDQAV65c\nwaJFi7B06VJkZmYC6H5i09DQAACor69HTEyMlCESKVJkZCTuu+8+VFZWes05g8GA2tpacZ26ujoY\njUYYDAbU1dX1mG4wGIK7A0QKxYIpERFRkAmCgJUrV8JsNmPdunXi9IyMDBQWFgIACgsLxQIrEQ3N\n+fPnxR5329vb8fbbbyM5OdlrzmVkZKCoqAgejwc1NTVwOp1ISUmBXq+HRqNBeXk5BEHAzp07madE\nfqISrnYxFsyNqlQY6Ga7q1T0t2ywlvHfdiT42ElmfMmDYJNzbHIR7GNTf99HaMczsJh8Fcg8eP/9\n9/H9738fU6ZMEasO5uXlISUlBVlZWThz5gxMJhN2796NqKiooMbmD/75/of7fDnE0P98
Of+O+uNL\nHhw7dgw2mw1dXV3o6urC0qVL8ctf/hJNTU1ec27Tpk14+eWXER4eji1btmDevHkAuoeLWbZsGdrb\n25Geni4O9zTY2EJB/8cEZeSTkvNtMHzNAxZMfVqGBVPyHzmftOQcm1yEdkFQbvEMLCZfyTkP5Bwb\nwIKpfGJQ9oWynPNAzrFJgQXT7vmh9pvwNQ9YlZeIiIiIiIgkxYIpERERERERSYoFUyIFqq2txV13\n3YVJkyZh8uTJYvuXpqYmWK1WJCQkIC0tTewIIlRoNFpxOABvL41GK3WYRERERCGHbUx9WoZtTMl/\nAtn+pKGhAQ0NDZg6dSouXLiA6dOno7i4GDt27MDo0aPx5JNPYvPmzWhubobdbg9qbFIa6PFkIPse\n2m065RbPwGLylZzzQM6xAWxjKp8YlN3mTc55IOfYpMA2pt3zQ+03wTamRAS9Xo+pU6cCAEaNGoWJ\nEyfC5XKhpKQENpsNAGCz2VBcXCxlmEREREREAIBwKTd+6tQpNDU1SRkCkeKdOnUKR48exaxZs9DY\n2AidTgcA0Ol0aGxs9Lrexo0bxb8tFgssFkuAIyWSlsPhgMPhkDoMIiKikCRpVd4xY8biq6+iEBam\n7nW5zs52XLjwOViVl5QoGNV8Lly4gLlz5+Kpp55CZmYmoqOj0dzcLM7XarW93hxSahUkVuVVajwD\ni8lXcs4DOccGsCqvfGJQdtVCOeeBnGOTAqvyds8Ptd+Er3kg6RNTj6cDbncZgFgvS3wGYHIQIyJS\njitXrmDRokVYunQpMjMzAXQ/JW1oaIBer0d9fT1iYmIkjpKIiIiIiG1MiRRJEASsXLkSZrMZ69at\nE6dnZGSgsLAQAFBYWCgWWImIiIiIpCRpVd7IyFi0tR1B/09MWZWXlCeQ1Xzef/99fP/738eUKVP+\nXX0GyMvLQ0pKCrKysnDmzBmYTCbs3r0bUVFRQY1NSqzKq9R4BhaTr+ScB1LHptFo4XY397OU8qvl\nKWEf5fobHwip86Avco5NCqzK2z0/1H4TvubBkAqmJpMJGo0GI0aMgFqtRkVFBZqamvDQQw/h9OnT\nXi98WTANvR8m3UjOJy05xzYULJgqNZ6BxeQrOeeB1LHxInMg8+UQg7IvlKXOg77IOTYp8JjRPT/U\nfhNBHS5GpVLB4XDg6NGjqKioAADY7XZYrVZUV1cjNTW11zESiYiIiIiIQkc4VCpVny+NRit1kJIa\nchvT60vBHCeRiIiIiIjoWh3ofqLq/dV/EwllG1KvvCqVCvfccw9GjBiBxx57DI888siAx0ncuHEj\nvvrqAoDnAdwPwDKUUIhkj2MkEhGRvISL/RB4ExERjbY2jjlPRIE3pDam9fX1iI2Nxblz52C1WrFt\n2zZkZGT0O04i25iGXh1zupGc25/IObahYBtTpcYzsJh8Jec8kDo2thcbyHw5xOCffWAe+E7OsUmB\nx4yBzO9eRkm/m6C2MY2N7S5QjhkzBgsXLkRFRYU4TiIAjpNIRERERERE/Rp0wfTSpUtwu90AgIsX\nL6KsrAxJSUkcJ5GIiIiIZKW2thZ33XUXJk2ahMmTJ2Pr1q0AgKamJlitViQkJCAtLQ0tLS3iOnl5\neYiPj0diYiLKysrE6ZWVlUhKSkJ8fDzWrl0b9H0hUqpBF0wbGxsxZ84cTJ06FbNmzcL999+PtLQ0\nrF+/Hm+//TYSEhLw7rvvYv369f6Ml4iIiIjIJ2q1Gi+88AI+++wzHDp0CC+++CKOHz/udTSJqqoq\n7Nq1C1VVVSgtLcWaNWvEKomrV69GQUEBnE4nnE4nSktLpdw1WdBotH32Nks0EIPu/GjChAn4+OOP\nb5iu1Wqxb9++IQVFROQrjUYb8r3ZERFR7/R6PfR6PQBg1KhRmDhxIlwuF0pKSnDgwAEA3aNJWCwW\n2O127NmzB9nZ2VCr1TCZTIiLi0N5eTnGjx8Pt9uN
lJQUAEBOTg6Ki4sxf/58yfZNDrrPv/21ryTq\n25B65SUikov+T4oAT4xERHTq1CkcPXoUs2bN8jqaxNmzZzF79mxxHaPRCJfLBbVaDaPRKE43GAxw\nuVy9bmfjxo3i3xaLBRaLxf87QyQjQx2BggVTIiIiIgoJFy5cwKJFi7BlyxZERET0mOfvaqfXFkyJ\nQsH1N2Byc3N9Wn9IvfISEREREQ0HV65cwaJFi7B06VKxc05vo0kYDAbU1taK69bV1cFoNMJgMKCu\nrq7HdIPBEMS9IFIuFkyJiIiISNEEQcDKlSthNpuxbt06cbq30SQyMjJQVFQEj8eDmpoaOJ1OpKSk\nQK/XQ6PRoLy8HIIgYOfOnRyBgshPWJWXiIiIiBTtgw8+wKuvvoopU6YgOTkZQPdwMOvXr0dWVhYK\nCgpgMpmwe/duAIDZbEZWVhbMZjPCw8ORn58vVvPNz8/HsmXL0N7ejvT09JDv+IjIX1TC1b6vg7lR\nlQqCICAyMhZtbUcAxHpZ8jMAkzGwDk2CsYz/tiPBx04yczUP5EjOsXnTfcEQvPz05/b8EVNoxzOw\nmHwl5zyQOrb+v99gnHPlPl8OMfhjH9QAOrzOjYiIRltbUz/vERhS50Ff5BxbIAz9mBAK+RR6ZQRf\n84BPTImIiIjIiw70dTHtdrO3cyLyD7YxJSIiIiIiIkmxYEpERERERESSYsGUiIhIAitWrIBOp0NS\nUpI4rampCVarFQkJCUhLS0NLS4sksWk0WnFMx95eRERE/saCKRERkQSWL1+O0tLSHtPsdjusViuq\nq6uRmpoKu90uSWxudzO62xV6exEREfkXC6aSCO/zTrRGo5U6QCIiCrA5c+YgOjq6x7SSkhLYbDYA\ngM1mQ3FxsRShERGRJEK7jMBeeSXBHu6IiOhGjY2N0Ol0AACdTofGxkaJIyIiouAJ7TICC6ZEREQy\n1F97zo0bN4p/WywWWCyWwAdFJCGHwwGHwyF1GEQUICyYEhERyYROp0NDQwP0ej3q6+sRExPjddlr\nC6ZEoeD6GzC5ubnSBUNEfsc2pkQKJeceP4modxkZGSgsLAQAFBYWIjMzU+KIiIiIgoMFUyKFknOP\nn0QEZGdn47vf/S5OnDiBsWPHYseOHVi/fj3efvttJCQk4N1338X69eulDpOIiCgoVIIgBL3fd5VK\nBUEQEBkZi7a2IwBivSz5GYDJ6L9relWQlgnediT4WijIruZBIJ06dQoLFizAsWPHAACJiYk4cOCA\nWF3QYrHg888/lyQ2f+tui+ef/BzIvvtze/6IKbTjGVhMvpJzHgQ6tv6/v2CcL4f7fDnEEJx9kCpP\nQjlH5YbHDH/MH9h7DKffla95wDamRCHElx4/2bEKhRp2rEJERCQdPjH1aRk+MSX/keKJaXR0NJqb\nm8X5Wq0WTU1NksTmb3xiGsrxDCwmX8k5D1QqFTZv3tzrvLCwMKxYsQJa7eDHu+PTD3/Ml0MMyn6C\nI/cclWtsgcBjhj/mD+w9htPvik9MicgrX3r8JCJ5+81vzvc6fcSI1zFp0iTce++9QY6IiIho8Fgw\nJQohV3v8/NWvfhWUHj81Gi3c7uY+l4mIiEZb241PbYmobx0dv+91+q23HgtyJESkdAM5nxMNFXvl\nJVIoOfT42X0SE/p88URHRESB5usQanl5eYiPj0diYiLKysrE6ZWVlUhKSkJ8fDzWrl0b1H2QUv/n\ncwqOcKhUKq8vjWbwTTjkgAVTIoV67bXXcPbsWXg8HtTW1mL58uXQarXYt28fqqurUVZWhqioKKnD\nJCIiCjhfhlCrqqrCrl27UFVVhdLSUqxZs0ZsJ7d69WoUFBTA6XTC6XTe8J5EgdUBJd/sZ8FUlvq+\nG6KEOyI0/Gk02n5/p0RERHIwZ84cREdH95hWUlICm80GALDZbCguLgYA7NmzB9nZ2VCr1TCZTIiL\ni0N5eTnq6+vh
druRkpICAMjJyRHXIaKhYxtTWbp6N8Q7t5sX/SStr6v19IW/UyIikidvQ6idPXsW\ns2fPFpczGo1wuVxQq9UwGo3idIPBAJfL5fX9OewahZqhDrvGgikRERERhbRA1PS5tmBKFAquvwGT\nm5vr0/qsyktEssdqw0RE5G9Xh1AD0GMINYPBgNraWnG5uro6GI1GGAwG1NXV9ZhuMBiCGzSRgrFg\nSkSyN5DefYmIiHxxdQg1AD2GUMvIyEBRURE8Hg9qamrgdDqRkpICvV4PjUaD8vJyCIKAnTt3BnzY\nNaJQwqq8RERERKRo2dnZOHDgAM6fP4+xY8fimWeewfr165GVlYWCggKYTCbs3r0bAGA2m5GVlQWz\n2Yzw8HDk5+eLNXPy8/OxbNkytLe3Iz09HfPnz5dyt4gURSVc7f86mBtVqSAIAiIjY9HWdgRArJcl\nPwMwGQPrYCUYy8hrOxJ8deRHV/NAjgYSW/dJ2j/5EMxt+Su35BZTaMczsJh8Jfcc9fa5REbei9de\n+xnuvffegLz/v5cY4nx/vIfc58shhuDsg1R5IvcclWtsgxH4Y0Io5FNw9lFOvztf84BVeYmIiEJM\nf+22iYiIgo1VeYmIiEJM/8M9sXBKRETBxSemREREREREJCkWTImIiIiIQhir9ytFeJ/fo0ajlTrA\nPrEqLxERERFRCGP1fqXoQF/fo9st7++RT0yJiIiIiIhIUiyYEhERERERkaRYMCUiIiIiIiJJsY3p\nsP62H04AABIESURBVBXeb2P0iIhotLU1BSkeUpp9+/Z5nceOEIiIiIjIn1gwHbb6btwMyL+BM8nb\n4sV2r/MuX/40iJEQERERkdKxYEpEvWpt9f7EVKNZjK+++t8gRkNERERESsY2pkRERERERCQpFkyJ\niEgB+h5UfDgMLE5EFCgajbbP4yORHLAqLxERKQDb3RMReeN2N6PvYySPjyQ9PjElIiIiIiJSvL5r\nF0ldsyggBdPS0lIkJiYiPj4emzdvDsQmfOSQOoAAcvQxr/+qbSrVTT79QB2OvrbnX8HclhTbk5L0\nOeoY5u8fDI5h/v7B4JA6gIAZao4uWvRDiartOQL43kPlkDqAPjikDoB8JP15dLAcUgfQB4fUAfTB\nIXUAfXD4sOzV2kW9v7qfrEvH7wXTzs5O/OQnP0FpaSmqqqrw2muv4fjx4/7ejI8cEm8/kBx9zOv7\nx9f9utLn/Ot/oCyYDn/yyFHHMH//YHAM8/cPBofUAQSEP3K0vb0NfR/7A8URwPceKofUAfTBIXUA\n5AN5nEe/1l/70Z43oxxShTkADqkD6IND6gD64JA6AL/xe8G0oqICcXFxMJlMUKvV+OEPf4g9e/b4\nezMUND2fuubm5srusX8g9HeQH877zBwlkjfmKJG8yS1Hv24/KsXNKCL/8XvnRy6XC2PHjhX/NxqN\nKC8v733j4WGIiMhBWNjNvc7v7HTjwgV/R0i+ub5DkY3/fn1NiR2K9NdJwHDe54HmaGTkAq/v8dVX\nhwMSGxENPUcvX2Z+EgWSL9e6/fnwww/xve99r89lwsJuQleXZ1DvTzSc+L1gOtC2K18vd3YgS8to\nGTluJzfIsdy4vUC1WcrN7WvfAr29vvdpuHavPtC4W1vfHMi7DWGZrz/rgcXk6zLefjv9v8/Av9tA\nHw+G+2cUjGUGnrPA8Mhb/+XoUM8zg52f28/8YMTgbf71v/lAb9+X9xhsPkq/D8Mhr/zJ92vdoRlY\nodQf141S/9Zy0fc1bTBikHr+YN/Dl3OhfPPZ7wVTg8GA2tpa8f/a2loYjcYeywgCqxQQSYU5SiRv\nzFEieWOOEgWG39uYzpgxA06nE6dOnYLH48GuXbuQkZHh780Q0SAxR4nkjTlKJG/MUaLA8PsT0/Dw\ncGzfvh3z5s1DZ2cnVq5ciYkTJ/p7M0Q0SMxRInljjhLJG3OUKECEINq7d69wxx
13CHFxcYLdbvf7\n+y9fvlyIiYkRJk+eLE778ssvhXvuuUeIj48XrFar0Nzc7LftnTlzRrBYLILZbBYmTZokbNmyJWDb\nbG9vF1JSUoRvf/vbwsSJE4X169cHbFvX6ujoEKZOnSrcf//9Ad/e+PHjhaSkJGHq1KnCzJkzA769\n5uZmYdGiRUJiYqIwceJE4dChQwHZ3ueffy5MnTpVfGk0GmHLli0B/+4G6/rv3J+u/8z//ve/+30b\nmzZtEsxmszB58mQhOztbuHz58pDe7/+3d28xUV1dHMD/g0Jpiqkt4WanhAkEiDAMKBRi2iCCtbGl\ngCIRFWJLNbFpExq8vUkfCigYhaZPDUZrLLS1VaQBKgoCKUHUAVuVxFaGeOFimJYil3QYWN+D4Xyg\ngx/DnHNm+2X9npiBWWufffbaszcM5yg9r9iKv3v3bgoNDaWIiAhKS0ujoaEh2Y9hWklJCWk0GjKb\nzbLHLysro9DQUAoLC6O9e/fKGv/y5csUExNDkZGRFB0dTe3t7QuOr+Zc7ihb86SzqP2e62jbDhw4\nQK+99po0F9fW1qreLpHH2lxtE6HfnLUGWgil17r2ELlGRa2F52Gsqbk2t4cc63jVNqZWq5UCAwPJ\nZDKRxWIhg8FAt27dkjVHc3MzGY3GWQW4Z88eOnjwIBERFRUV0b59+2TL19fXRx0dHURE9OjRIwoO\nDqZbt24plnN0dJSIiCYmJig2NpZaWloUPT4iosOHD9OWLVsoOTmZiJTtz4CAgKcWx0rmy87OpvLy\nciJ63KdDQ0OK9+fk5CT5+vrS3bt3Fc+1UE+ecznZ6nM5mUwm0ul00mY0IyODjh8/7lBMpecVW/HP\nnz9Pk5OTRES0b98+h8eGrRxEjxcG69ats1l7jsZvaGigpKQkslgsRET08OFDWePHx8dTXV0dERHV\n1NTQ6tWrFxxf7bncEY6eKzmp/Z7raNvy8/Pp8OHDTmnPNJHH2lxtE6HfiJyzBrKXGmtde4hcoyLX\nguhjTc21uT3kWMertjFtbW2ldevWSY8LCwupsLBQ9jwmk2lWAYaEhFB/fz8RPS6CkJAQ2XNOS0lJ\nofr6esVzjo6OUnR0NN24cUPRXPfu3aPExERqaGiQfiujZL6AgAAaHByc9ZxS+YaGhkin0z31vNLn\n7pdffqE333xTlVwLYeucy2WuPpeT2Wym4OBg+uuvv2hiYoLee+89qq+vdziu0vPKk/Fn+umnn2jr\n1q0OxZ8rR3p6Ol2/fl2Wzc6T8Tdt2kQXL150KOaz4m/evJm+++47IiL69ttvZemjaWrN5Qtha550\nJme+5/4vT7YtPz+fSkpKnNYeW0Qea9NtE63f1FoDLYRaa117iFyjM4lYCyKONbXX5vaQYx0v+8WP\n5mLrnk8PHjxQPO/AwAB8fHwAAD4+PhgYGFAkT09PDzo6OhAbG6tYzqmpKURGRsLHxwcJCQkICwtT\n9Pg+++wzFBcXw8Xlv8NEyXwajQZJSUmIjo7G119/rWg+k8kELy8vfPDBB1ixYgV27NiB0dFRxcdL\nZWUlMjMzAag3Nu1h65zLxVafj42NyZrj1VdfRV5eHvz9/bFs2TIsXboUSUlJsuYA1D13x44dw/r1\n62WPW1VVBa1Wi4iICNljA8Aff/yB5uZmxMXFYfXq1bh6Vd57axYVFUnnes+ePSgsLJQlrhpzuSNs\nzZMiEbHPZvryyy9hMBiQk5ODoaEhp7ZF5LE23ba4uDgAYvSb2mughXDWWtceovUZIF4tiDzW1F6b\n20OOdbxqG1MR7nGl0WgUacfIyAg2btyI0tJSLFmyRLGcLi4u6OzsxP3799Hc3IzGxkbFcv3888/w\n9vZGVFTUnJc8l7s/f/31V3R0dKC2thZfffUVWlpaFMtntVphNBrx8ccfw2g04qWXXkJRUZFi+QDA\nYrGguroamzZteup7So1Ne8znnDtiPn3uqD
t37uDo0aPo6elBb28vRkZGcOrUKVlzPEnJc/fFF1/A\nzc0NW7ZskTXu2NgYCgoKZt23V+5zbrVa8ffff6OtrQ3FxcXIyMiQNX5OTg7Kyspw9+5dHDlyBB9+\n+KHDMdWayx3xv+ZJkYjSZ9N27doFk8mEzs5O+Pn5IS8vz2ltEXmsjYyMID09HaWlpfDw8BCm39Rc\nAy2Us/PbS4Q+E7EWRB1rzlib20OOdbxqG9P53PNJCT4+Pujv7wcA9PX1wdvbW9b4ExMT2LhxI7Ky\nspCamqpKzpdffhnvvvsurl27pliu1tZWnDt3DjqdDpmZmWhoaEBWVpaix+bn5wcA8PLyQlpaGtrb\n2xXLp9VqodVqERMTAwBIT0+H0WiEr6+vYsdXW1uLlStXwsvLC4Dy48Rets55dna2bPHn6nM5Xb16\nFatWrYKnpycWL16MDRs2oLW1VdYcgDrn7vjx46ipqVFkY33nzh309PTAYDBAp9Ph/v37WLlyJR4+\nfChbDq1Wiw0bNgAAYmJi4OLiArPZLFv89vZ2pKWlAXg8ltrb2x2K54y5fCFszZMiEbHPpnl7e0sL\no48++shpfSfyWJtu27Zt26S2idJv09RYAy2Us9a69hCpz0SuBUC8seaMtbk95FjHq7YxddY9n95/\n/32cOHECAHDixAlp4MuBiJCTk4Ply5cjNzdX0ZyDg4PSx2fGx8dRX1+PqKgoxY6voKAA9+7dg8lk\nQmVlJdasWYOTJ08qlm9sbAyPHj0CAIyOjuL8+fPQ6/WK5fP19cXrr7+O27dvAwAuXLiAsLAwJCcn\nKzZeKioqpI/xAsqOzYWwdc6/+eYb2eLP1edyCg0NRVtbG8bHx0FEuHDhApYvXy5rDkD5c1dXV4fi\n4mJUVVXB3d1d1tgAoNfrMTAwAJPJBJPJBK1WC6PRKOubWWpqKhoaGgAAt2/fhsVigaenp2zxg4KC\n0NTUBABoaGhAcHDwgmOpOZc7Yq55UiSi9dlMfX190tdnzpxxSt+JPNbmapsI/ab2Gmihnof7m4rS\nZ6LWgshjTe21uT1kW8fL/H+vz1RTU0PBwcEUGBhIBQUFssffvHkz+fn5kaurK2m1Wjp27BiZzWZK\nTExU5BLKLS0tpNFoyGAwzLqMuhI5f/vtN4qKiiKDwUB6vZ4OHTpERKTo8U27dOmSdOUvpfJ1d3eT\nwWAgg8FAYWFh0vhQ8vg6OzspOjp61i05lMo3MjJCnp6eNDw8LD2nxrlbqJnnXE62+lxuBw8elG4X\nk52dLV0VdqGUnleejF9eXk5BQUHk7+8vzSu7du2S5Rjc3NykY5hJp9M5dPEjW/EtFgtt27aNwsPD\nacWKFdTY2Ohw/Jnn4MqVK9Il/ePi4shoNC44vppzuSPmmiedRe33XEfaVl5eTllZWaTX6ykiIoJS\nUlKkC3KoSeSxZqttNTU1QvSbM9dA9lJ6rWsPkWtU1Fp4XsaaGmtze8i1jtcQKfDPZIwxxhhjjDHG\n2Dyp9lFexhhjjDHGGGPMFt6YMsYYY4wxxhhzKt6YMsYYY4wxxhhzKt6YMsYYY4wxxhhzKt6YCsTF\nxQW7d++WHpeUlODzzz+XNUdkZOSsW5YAwNGjRzE+Pi5rHsb+H3GNMiY2rlHGxMY1yp6FN6YCcXNz\nw5kzZ6Sb0Gs0Glnjd3V1wd3dHZcvX8bY2Jj0fGlp6azHjDHbuEYZExvXKGNi4xplz8IbU4G4urpi\n586dOHLkyFPf2759O3788UfpsYeHBwDg0qVLiI+PR2pqKgIDA7F//36cPHkSb7zxBiIiItDd3S29\npqKiApmZmXj77bdRVVUFACgrK0Nvby8SEhKQmJgo/VxERAT0ej32798/K+fevXsRHh6OtWvXoq2t\nDfHx8QgMDER1dTUA4ObNm4iNjUVUVBQMBgP+/PNP+TuKMSfhGmVMbFyjjImNa5Q9k/K3XGXz5eHh\nQcPDwx
QQEED//PMPlZSUUH5+PhERbd++nU6fPj3rZ4mIGhsbaenSpdTf30///vsvLVu2jA4cOEBE\nRKWlpZSbmyu9JiQkhHp7e+nixYvSTXmJiAICAshsNhMR0YMHD8jf358GBwfJarXSmjVr6OzZs0RE\npNFoqK6ujoiI0tLSaO3atWS1Wun69esUGRlJRESffPIJnTp1ioiIJiYmaHx8XImuYswpuEYZExvX\nKGNi4xplz8J/MRXMkiVLkJ2djbKysnm/JiYmBj4+PnBzc0NQUBDWrVsHAAgPD0dPTw8A4OrVq/Dy\n8oKfnx/i4+PR2dmJoaGhp2JduXIFCQkJ8PT0xKJFi7B161Y0NzcDePzxi+nYer0eCQkJWLRo0aw8\nq1atQkFBAQ4dOoSenh64u7s70BuMiYdrlDGxcY0yJjauUTYX3pgKKDc3F+Xl5RgdHZWeW7x4Maam\npgAAU1NTsFgs0vdeeOEF6WsXFxfpsYuLC6xWK4DHH1no6uqCTqdDUFAQhoeHcfr06adyazQaEJH0\nmIikz/+7urrOyuPm5vZUnszMTFRXV+PFF1/E+vXr0djY6FhnMCYgrlHGxMY1ypjYuEaZLbwxFdAr\nr7yCjIwMlJeXS4USEBCAa9euAQDOnTuHiYmJeccjIvzwww+4ceMGTCYTTCYTzp49i4qKCgCPf3M1\nPDwM4PFvpJqammA2mzE5OYnKykrEx8fPO1d3dzd0Oh0+/fRTpKSk4Pfff5/3axl7XnCNMiY2rlHG\nxMY1ymzhjalAZl6ZLC8vD4ODg9LjHTt2oKmpCZGRkWhra5P+IfzJ1z0ZT6PRoKWlBVqtFr6+vtL3\n3nrrLXR1daG/vx87d+7EO++8g8TERPj5+aGoqAgJCQmIjIxEdHQ0kpOTbeaZ+Xj66++//x7h4eGI\niorCzZs3kZ2d7UCPMCYWrlHGxMY1ypjYuEbZs2ho5t+yGWOMMcYYY4wxlfFfTBljjDHGGGOMORVv\nTBljjDHGGGOMORVvTBljjDHGGGOMORVvTBljjDHGGGOMORVvTBljjDHGGGOMORVvTBljjDHGGGOM\nOdV/ACIsw152wS8BAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 97 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up the parallelization\n", "\n", "I'm going to be doing a fair amount of embarassingly parallel computation, so I'll take advantage of IPython's support for parallel computing. For this to work, you need to have a cluster of workers set up. This is easy in the Notebook (there's a tab in the dashboard), and it's doable from the command line (http://ipython.org/ipython-doc/stable/parallel/parallel_process.html)\n", "\n", "It took a bit of time to figure out how to do this, but once over that hurdle, this thing is super easy and useful." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.parallel import Client\n", "rc = Client()\n", "dview = rc[:]\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "with dview.sync_imports():\n", " from rdkit import Chem\n", " from rdkit import DataStructs\n", " from rdkit.Avalon import pyAvalonTools" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "importing Chem from rdkit on engine(s)\n", "importing DataStructs from rdkit on engine(s)\n", "importing pyAvalonTools from rdkit.Avalon on engine(s)\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify that it works:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "t1=time.time()\n", "_ = [Chem.RDKFingerprint(m) for m in mols[:10000]]\n", "t2=time.time()\n", "print('Serial: %.2f'%(t2-t1))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Serial: 15.39\n" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "fn = lambda x:Chem.RDKFingerprint(x)\n", "t1=time.time()\n", "_ = dview.map_sync(fn,mols[:10000])\n", "t2=time.time()\n", "print('Parallel: %.2f'%(t2-t1))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Parallel: 5.24\n" ] } ], "prompt_number": 70 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The runtimes above were collected with a cluster of 4 workers. The parallel run is roughly 3x faster than the serial one (5.24 s vs. 15.39 s), so it looks like the parallelization is working."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The test harness" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def subTest(mol,queries,qfps,fpf,verify,silent):\n", " nFound=0\n", " nScreened=0\n", " nTot=0\n", " nFailed = 0\n", " molfp = fpf(mol,False)\n", " for j,queryfp in enumerate(qfps):\n", " nTot += 1 \n", " if DataStructs.AllProbeBitsMatch(queryfp,molfp):\n", " nScreened += 1\n", " if mol.HasSubstructMatch(queries[j]):\n", " nFound += 1\n", " elif verify:\n", " if mol.HasSubstructMatch(queries[j]):\n", " nFailed += 1\n", " if not silent:\n", " print('Failure: %s %s'%(Chem.MolToSmiles(mol,True),Chem.MolToSmiles(queries[j],True)))\n", " return nFound,nScreened,nTot,nFailed\n", "\n", "def testScreenout(mols,queries,fpf,dview,verify=False,silent=False):\n", " nFound=0\n", " nScreened=0\n", " nTot=0\n", " nFailed = 0\n", " #if not silent: print(\"Building query fingerprints\")\n", " qfps = [fpf(m,True) for m in queries]\n", " #if not silent: print(\"Running Queries\")\n", " t = lambda x,qfps=qfps,queries=queries,fpf=fpf,verify=verify,silent=silent,subTest=subTest:subTest(x,queries,qfps,fpf,verify,silent)\n", " res=dview.map_async(t,mols)\n", " for entry in res:\n", " nFound+=entry[0]\n", " nScreened+=entry[1]\n", " nTot+=entry[2]\n", " nFailed+=entry[3]\n", " if not silent:\n", " accuracy = float(nFound)/nScreened\n", " print(\"Found %d matches in %d searches with %d failures. 
Accuracy: %.3f\"%(nFound,nScreened,nFailed,accuracy))\n", " return nTot,nScreened,nFound,nFailed\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tests" ] }, { "cell_type": "code", "collapsed": false, "input": [ "methods = {\n", " 'Avalon-2K':lambda x,y: pyAvalonTools.GetAvalonFP(x,nBits=2048,isQuery=y,bitFlags=pyAvalonTools.avalonSSSBits),\n", "\n", " 'Pattern-1K':lambda x,y: Chem.PatternFingerprint(x,fpSize=1024),\n", " 'Pattern-2K':lambda x,y: Chem.PatternFingerprint(x,fpSize=2048),\n", " 'Pattern-4K':lambda x,y: Chem.PatternFingerprint(x,fpSize=4096),\n", "\n", " 'Layered': lambda x,y:Chem.LayeredFingerprint(x,layerFlags=Chem.LayeredFingerprint_substructLayers),\n", " }" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "leadResults={}\n", "fragResults={}\n", "pieceResults={}\n", "for method,func in methods.iteritems():\n", " print(\"----------------------------\")\n", " print(\"Doing %s\"%method)\n", " print(\"Leads\")\n", " leadResults[method]=testScreenout(mols,leads,func,dview)\n", " print(\"Frags\")\n", " fragResults[method]=testScreenout(mols,frags,func,dview)\n", " print(\"Pieces\")\n", " pieceResults[method]=testScreenout(mols,pieces,func,dview)\n", " " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "----------------------------\n", "Doing Avalon-2K\n", "Leads\n", "Found 886 matches in 2044 searches with 0 failures. Accuracy: 0.433" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4479 matches in 13656 searches with 0 failures. Accuracy: 0.328" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Pieces\n", "Found 1926728 matches in 4141259 searches with 0 failures. 
Accuracy: 0.465" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "----------------------------\n", "Doing Layered\n", "Leads\n", "Found 1274 matches in 6797 searches with 0 failures. Accuracy: 0.187" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 44258 searches with 0 failures. Accuracy: 0.105" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Pieces\n", "Found 1935315 matches in 4642461 searches with 0 failures. Accuracy: 0.417" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "----------------------------\n", "Doing Pattern-2K\n", "Leads\n", "Found 1274 matches in 1781 searches with 0 failures. Accuracy: 0.715" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 7892 searches with 0 failures. Accuracy: 0.590" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Pieces\n", "Found 1935315 matches in 3381436 searches with 0 failures. Accuracy: 0.572" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "----------------------------\n", "Doing Pattern-1K\n", "Leads\n", "Found 1274 matches in 7527 searches with 0 failures. Accuracy: 0.169" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 16372 searches with 0 failures. Accuracy: 0.285" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Pieces\n", "Found 1935315 matches in 3874208 searches with 0 failures. Accuracy: 0.500" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "----------------------------\n", "Doing Pattern-4K\n", "Leads\n", "Found 1274 matches in 1748 searches with 0 failures. Accuracy: 0.729" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 7742 searches with 0 failures. 
Accuracy: 0.602" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Pieces\n", "Found 1935315 matches in 3048867 searches with 0 failures. Accuracy: 0.635" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarize that:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "mns = sorted(methods.keys())\n", "for mn in mns:\n", " print(mn,'Pieces','%.3f'%(float(pieceResults[mn][2])/pieceResults[mn][1]),'Fragments','%.3f'%(float(fragResults[mn][2])/fragResults[mn][1]),\n", " 'Leads','%.3f'%(float(leadResults[mn][2])/leadResults[mn][1]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Avalon-2K Pieces 0.465 Fragments 0.328 Leads 0.433\n", "Layered Pieces 0.417 Fragments 0.105 Leads 0.187\n", "Pattern-1K Pieces 0.500 Fragments 0.285 Leads 0.169\n", "Pattern-2K Pieces 0.572 Fragments 0.590 Leads 0.715\n", "Pattern-4K Pieces 0.635 Fragments 0.602 Leads 0.729\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A caveat\n", "\n", "The results from the Avalon fingerprint shown above are not directly comparable to the others since structures are being filtered out that shouldn't be.\n", "\n", "To demonstrate this, we need a serial form of the screenout test:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def testScreenoutSerial(mols,queries,fpf,verify=False,silent=False):\n", " nFound=0\n", " nScreened=0\n", " nTot=0\n", " nFailed = 0\n", " #if not silent: print(\"Building query fingerprints\")\n", " qfps = [fpf(m,True) for m in queries]\n", " #if not silent: print(\"Running Queries\")\n", " t = lambda x,qfps=qfps,queries=queries,fpf=fpf,verify=verify,silent=silent,subTest=subTest:subTest(x,queries,qfps,fpf,verify,silent)\n", " res=map(t,mols)\n", " for entry in res:\n", " nFound+=entry[0]\n", " nScreened+=entry[1]\n", " 
nTot+=entry[2]\n", " nFailed+=entry[3]\n", " if not silent:\n", " accuracy = float(nFound)/nScreened\n", " print(\"Found %d matches in %d searches with %d failures. Accuracy: %.3f\"%(nFound,nScreened,nFailed,accuracy))\n", " return nTot,nScreened,nFound,nFailed\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running it with `verify=True` also checks the molecules that the fingerprint screen rejected. The parallel runs above used the default `verify=False`, which is why they report 0 failures. This turns up some examples of molecules that the Avalon screen rejects even though they contain the query:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "testScreenoutSerial(mols[:500],leads,methods['Avalon-2K'],verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Failure: Cc1ccc(C)c(S(=O)(=O)c2nnn3c4ccsc4c(Nc4ccc(C)c(C)c4)nc23)c1 c1cn[nH]n1\n", "Failure: COc1ccc(OCC#Cc2cn([C@H](C)C[C@@H]3CC[C@H]([C@@H](C)C(=O)N(C)Cc4ccccc4)O3)nn2)cc1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Cc1c(C(=O)Nc2ccccc2)nnn1Cc1ccccc1O c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: c1c(CCc2ccccc2)nnn1C[C@H]1CC[C@@H]([C@@H]2CC[C@H](Cn3cc(CCc4ccccc4)nn3)O2)O1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C(NCc1ccc(F)cc1)[C@@H](CCN1[C@@H]2CC[C@@H]1CC(n1nnc3cccnc31)C2)c1ccccc1 c1cn[nH]n1\n", "Found 10 matches in 22 searches with 5 failures.
Accuracy: 0.455\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "(250000, 22, 10, 5)" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at a specific example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "mfp = pyAvalonTools.GetAvalonFP('Cc1ccc(C)c(S(=O)(=O)c2nnn3c4ccsc4c(Nc4ccc(C)c(C)c4)nc23)c1',True,2048,isQuery=False)\n", "qfp = pyAvalonTools.GetAvalonFP('c1cn[nH]n1',True,2048,isQuery=True)\n", "DataStructs.AllProbeBitsMatch(qfp,mfp)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "False" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolFromSmiles('Cc1ccc(C)c(S(=O)(=O)c2nnn3c4ccsc4c(Nc4ccc(C)c(C)c4)nc23)c1')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAYAAABNcIgQAAAfE0lEQVR4nO3de1iVVb4H8C8iDCoI\nhIp5QUQEQUzBKyBegI1nUhtvGGrglAmNebTmyUtTk9lt1Kw5JlrYZOHJUVGTQkG5CQSKCBLIRREV\nuSjqiG0uAsLe6/yxjyjeuL2bDe7v53l6hmTv3/q9fzjf1nrXu14dIYQAERGRluqi6QaIiIg0iUFI\nRERajUFIRERajUFIRERajUFIRERajUFIRERajUFIRERajUFIRERajUFIRERajUFIRERajUFIRERa\njUFIRERajUFIRERajUFIRERajUFIRERajUFIRERajUFI1EIpKSn49ddfNd0GEUmEQUjUQlFRUfjg\ngw803QYRSYRBSNRCXl5eOHHiBCorKzXdChFJgEFI1EJOTk4wMjJCfHy8plshIgkwCIlaSFdXF+7u\n7oiKitJ0K0QkAQYhUSvIZDJERkZqug0ikoCOEEJougmizubKlSuwtLTElStXYGFhoel2iKgNOCMk\naoVBgwbBxsaGy6NEzwAGIVEryWSydg/C0NBQKBSKdh2T6FnHICRqJZlMhujoaCiVynYZ76OPPsKf\n//xn5OXltct4RNqCQUjUSu7u7igvL8eZM2fUPtaqVauwZcsWxMXFwc7OTu3jEWkTBiFRKxkZGWH8\n+PFq3z26evVq7Ny5E9HR0Rg1apRaxyLSRgxCojZQ933CNWvW4F//+heioqLg6OiotnGItBmDkKgN\nZDKZ2o5b+/DDD/HNN98gPDwcTk5OktcnIhUGIVEbjBs3Dj169JD8uLX169fjyy+/xNGjRzFhwgRJ\naxNRYwxCojbQ1dXFhAkTsHPnTly9elWSmh999BG++OILHD16FM7OzpLUJKIn66rpBog6s4yMDCQn\nJ8PU1BT9+/eHnZ0dPD094enpiSlTpqBnz54tqvfxxx/j888/x9GjR+Hi4qKmronoQZwRErXS6dOn\nMXnyZPj6+iI/Px83b97E+vXrUVtbi7feegvGxsYYMmQIAgICs
H//fpSXlz+13ieffIJNmzYhIiIC\nrq6u7XQVRMSzRolaITU1FTKZDL6+vtiyZQt0dHQe+cylS5cQHR2N6OhoxMbGQi6XY+TIkQ0zxsmT\nJ0NPTw+AKgQ3bNiAiIgIuLm5tfflEGk1BiFRC6WlpcHT07MhBAHg7bffhre39xNncnV1dTh16lRD\nMJ46dQpmZmZwd3eHkZERgoODceDAAcyYMaM9L4WIwCAkapGzZ8/C3d0dPj4++Oqrr6BUKvHKK68g\nISEBsbGxsLW1bVadiooKxMfHNwTjn/70J3z66adq7p6IHodBSNRMWVlZcHd3x/z587F161YIIbBk\nyRJERka2KAQftmTJEigUCvzwww/SNkxEzcLNMkTNkJ2dDQ8PD3h7ezeE4Ouvv97qEDx//jwOHz4M\n4P7pNPxvUiLNYBASNSEnJwfu7u6YN28eAgMDIYTA0qVLcezYsVbPBDMzMxEQEAAhBDw9PVFaWors\n7Gw1dE9ETWEQEj3FvRCcO3duoxA8evRom5ZDPT09cf36deTk5KBXr14YNWoUX/JLpCEMQqInyM3N\nhbu7O+bMmYNt27ZBCAF/f/82hyAAmJqawsnJqeHNFZp4yS8RqTAIiR7j4sWL8PLywuzZsxtCMCAg\nABEREW0OwXseDD+ZTIb4+HjU1ta2uS4RtQyDkOgxgoODMWLECAQGBgIA3njjDYSHh0sWgkDj8HNz\nc4OOjg6SkpIkqU1EzccgJHqM27dvw8TEBLq6uigrK0NGRgbCw8MlC0EAcHFxQZcuXZCUlAR9fX24\nublxeZRIAxiERI9xb9lSqVTCzMwMp06dwsiRIyUdQ19fH5MmTWq0PMogJGp/DEKix5g6dSrkcjnS\n09PVOs7D9wnT09Nx8+ZNtY5JRI0xCIkew8jICBMmTGjY1aku98Lvxo0bcHBwgLm5OWJjY9U6JhE1\nxiAkeoL2WKocPnw4+vXrh9jYWOjo6HB5lEgDGIRETyCTyZCUlITKykq1juPh4dFoefTYsWNqHY+I\nGmMQEj3B2LFjYWhoiISEBLWO4+Hhgbi4OACqICwpKcG5c+fUOiYR3ccgJHoCXV1dTJ06Ve1LlXPn\nzsWZM2cAAGZmZrCzs0NxcbFaxySi+xiERE8hk8nUvmGme/fuMDY2Rn19PRYtWoTa2lrJH9Ugoifj\n+wiJnqKgoACDBw9GYWEhBg4cqLZxamtrMWfOHBQUFCA2Nhbm5uZqG4uIGuOMkOgpLC0tMWTIELUu\nj9bV1WHhwoW4fPkyQ5BIAxiERE3w8vJqFIR1dXWS1a6rq4OPjw8yMjIQGRnJECTSAAYhURNkMhmi\no6OhVCoBAK+88gr69euH+fPnY8eOHa3e2FJXV4cFCxYgIyMDx48fx4ABA6Rsm4iaifcIiZogl8vR\nq1cvJCcnY/To0SguLkZMTEzDP9evX8fo0aPh4eEBDw8PuLi4oFu3bk+tWV9fDx8fH6SnpyMuLk6t\n9x+J6OkYhETN4OjoCHt7e6xfvx7W1taNfnft2jUkJiYiOjoa4eHhuHbtGkaNGgVPT094enpi4sSJ\nMDAwaPh8fX09FixYgLS0NMTFxcHCwqK9L0dSNTU1SExMxKRJk6Cvr6/pdohajEFI1ISff/4Z8+fP\nR+/evVFSUgJLS8uGkHN3d0fv3r0bPqtQKJCWltYwW0xKSoK9vT3S0tIAqEJw4cKFOH36NOLi4jBo\n0CBNXZZkLly4ABsbG8jlcvTs2VPT7RC1GIOQ6Cn27t0LPz8/fP3111iyZAmuX7+OhIQEREdHIzIy\nEgUFBbCysmoIRk9PT5iamjZ8v6amBoWFhbCxsWkIwRMnTiAuLu6RmWVnlZ6ejtGjR6O+vh5dunDb\nAXU+DEKiJzh27A78/IZgzZpV+Otf//rYz1y6dAnR0dFITExsuF/4uGXRew/LJyUl4fjx4xg6dGg7\nX436JCQk4MUXX1T7maxE6
sIgJHqMY8eAWbOAzz+/geXL+zTrO/X19UhJSUFsbCxiYmJw8uRJdOvW\nDVOmTIFCoUBycjJiY2Ph4OCg3ubbWXh4OF577TWUlpZquhWiVmEQEj3kl18Ab29g0yZg5crW17lz\n5w6SkpIQExOD7t27Y9asWXjhhReka7SD2LdvH95//31cuHBB060QtUpXTTdA1NGEhAB//3vbQhBQ\nnSEqk8kgk8mkaayDqqyshKGhoabbIGo1BiHRQ3btArjno/kqKipgZGSk6TaIWo1/3YkewhBsGc4I\nqbPjX3kiapPKykrOCKlT49IoaZ29e4EvvwRSUoDDh4F7u/4NDFQ7RQ8fBn7/HXjlFU122XlUVFRw\nRkidGmeEpJWsrYFDhzTdxbOBM0Lq7BiEpJV8fID9+4EHHx765hvgjTeA7ds111dnxBkhdXYMQtJK\nXboA8+aplknveeMNVRguW6a5vjojbpahzo5BSFpr9myAz4C3HR+foM6Om2VI6/j43P85JeXR38+Y\n0X69PAs4I6TOjjNCImq1iIgI1NbWIiMjAzytkTorBiERtVhMTAxcXV0xd+5cODk5ITg4GFOmTEFe\nXp6mWyNqMQYhabW8PGDu3Ma7R+nJwsLCMGHCBMycORNubm4oKirC3r17cfHiRQwbNgwODg5Yu3Yt\nampqNN0qUbMxCEmr9ekD/Pwz8Ntvmu6kY4uNjYWrqyvmz5+PKVOmoKCgABs2bICZmRnkcjlMTEwQ\nFBSE6OhohIaGYsSIEYiNjdV020TNwiAkrWZiAowZA0RFabqTjik+Ph4TJ07EtGnT4ODggPPnz2PD\nhg3o0+f+Oxrnzp2LmTNnori4GJMmTUJ6ejoWLVqEF198EX5+figrK9PgFRA1jUFIWk8mYxA+LDU1\nFTKZDB4eHhg+fDjy8vIQFBQECwuLRz67Y8cO1NbWwsbGBhs3boS+vj4+/PBDnDp1Crm5uXBwcMDB\ngwc1cBVEzcMgJK0nkwGJicCdO5ruRPPS09Mhk8kwbtw4mJqaIjMzE0FBQRg8ePATv2NlZYXIyEgE\nBwdj8+bNGDt2LNLT0zFy5EicPHkSa9asweLFizFz5kyUlJS049UQNQ+DkLTehAmAg8NyJCfHaboV\njfrb3/6GcePG4e7du4iPj0dISAjs7e2b/X1vb29kZWXBwcEB48ePx9q1a6FQKLBy5UpkZmaipqYG\nDg4O2LFjBx+1oA6FQUhaT18feP75QkREHNF0Kxrz22+/ITg4GEeOHEF8fDzc3NxaVcfc3By7du3C\noUOHsGfPHowePRrJycmwsrLCsWPH8PHHH+Odd97BjBkzUFhYKPFVELUOg5AIgJeXF6K0+EbhrVu3\ncOfOHXh5eaG+vh6XLl2CUqlsdb3p06fj7Nmz8PDwgJubGwICAnDnzh0sX74c2dnZqKysxD//+U8J\nr4Co9RiERFAFYWZmJkpLSzXdikY8eF5ofn4+hgwZ0uZnAXv27IktW7YgIiIC0dHRGDNmDM6ePYuB\nAwdi/vz5SE5OlqJ1ojZjEBIBsLGxQd++ffHuu+8iMzNT6+5hPXheaEVFBXR1ddGtWzdJant6euLs\n2bOYN29ew2MXPKibOhIGIRGA7du3Qy6X48SJExg5ciT69u2LhQsX4rvvvkNBQYGm21O7B4OpsrIS\nPXr0gI6OjmT1u3fvjk8++QTm5uYNY/CgbuooGISk9TZu3Ig1a9bg2LFjOH/+PEpKSrB58+aG5+GG\nDx8Le3uBgAAgJAT4z3803bH0Hp4Rqnu2xrfaU0fCICSttmXLFnz66aeIiIjAxIkTUV5ejn79+sHX\n1xc//PADioqKcObMb3jzTR3cvKl6ea+5OeDkBKxaBZSXq17uO26cqt7hw41f9ttZPBhM7TFb41vt\nqSPh+whJa3311Vd4//33ER4ejokTJwIAZs+ejby8PLi7u8PDwwMeHh6wte0PW1vgzTcBhQI
4cwaI\niQGSkoDu3VW1rK2BQ4cAPT01NVtTA/z1r6oBfv8dWLcOsLKSrPyDwdReM8LevXurdQyi5mIQklba\nunUr3nvvPYSHhzd6Zm7//v2Ii4tDTEwM/vGPf2Dx4sWQyeSwtu4JDw9gyhRg7FjVPw/y8VHNBBcs\nUFPD334LzJwJ/PGPwK1bwFtvAf/7v5KV54yQtBmXRknrbN26FWvXrsWhQ4ceeXD8ueeew5w5c7Bt\n2zbk5uaipKQEixYZQS4Hli1Tva1i9GhALm9cs0sXYN48NS6L5uYCjo6qn83MgKoqSctrYkbIIKSO\ngkFIWiUwMBBr165FaGgoPD09m/x8v379sHixDnbvBkpLgdOngVdfBYyNH/3s7NnAhQtqaBoAhg0D\n0tNVP9+6dX9NViIPBlN7zQi5WYY6Ci6NktbYtm0bVq9ejZ9//hkymazF39fRUW2ScXJq/Oc+Pvd/\nTklpY5NPsnSpajn06FGgrAz48ENJyz/8+ARnhKRNGITUIVVVJaO0dBP09PrB0NAZzz23qE31tm/f\njlWrViE0NLRVIahx3boBQUFqK//w4xN9+/ZV21j3xuCMkDoKBiF1SHJ5OPr2fRc9eoxt+sNN2L59\nO9555x2EhobCy8tLgu6erqIC6Gz/H88ZIWkzBiF1SH36/DdKSzfg5s2tMDV9GcbG03Ht2seQy8Oh\nq2sIXV0TdOliCF1dQwhhhD17jGFoaIgePXrA0NAQJiYmMDIyQkxMDD755BMcOnSoXUJw2zbgxx+B\nkyfVPpRKdTWwcCHwxRdtepziwWCytbWFnZ2dVB0+QqFQ4M6dO5wRUofBIKQOqWvX3hgw4AsAAhcu\n/BeMjafD0NAVurqmUCqroFDchkJRCYWiAtXVZYiOvoGKigpUVVWhsrIScrkc5eXlGDhwIDZv3oxp\n06a1S98TJwIrV6r2s5iZqXGgvDzAxATo0wcCStTEfYtuVv9oVanq6mrI5XLo6uoCADZt2iRho8C5\nc+fwwQcfoFevXti+fTuqqqoghOCMkDoOQdQB3br1b3HlypuioOB1cf36/7S6jrOzs/j888+FEEIE\nBwcLHx8fqVp8LKVSiOefF2LfPrUOI8TkyUK8+aYQQoiqqtMiLa2rqKnJa1GJ2tpaERgYKPr16ycs\nLS3FgAEDxOHDhyVrsbCwUAQEBAg9PT3h5eUlkpOThRBClJSUCADi2rVrko1F1BYMQnqmrVu3Tnh5\neQkhhPj111+FgYGBuHPnjlrH9PUV4vXX1TqEqP/tV5EXbi1qawuEEEJUVCQKIZTN+m51dbXYsGGD\n6N27t+jfv78ICgoS1dXVIigoSBgaGooZM2aIoqKiVveWn58vfH19hZ6envD29hYZGRkNv7t06ZJY\nsGCBmDZtmqiurm71GERSYhDSMy0xMbEh/Orq6oSxsbGIiopS65jBwXXCyytRrWMIIURBwRJRUZHQ\ngm8oxLlz+4W1tbUwMzMTn332maioqGj0iUuXLgmZTCaMjY1FUFCQUCqbF65CCFFcXCz8/f2Fnp6e\nmDFjhjh58mTD7x6eHZ4+fboFfROpF4OQnmn3wi8yMlIIIcRLL70kVq9erdYxS0tLhY6Ojjh37pxa\nx2k+pSgr2y+ysuzFb7/1EYGBm0R5eflTvxESEiLMzMyEm5tbk9dx9epV4e/vL/T19cXUqVNFYuL9\n/wh42uyQqKPgyTLU4cjl4Sgr2yNJra5du2Lq1KmIiooCAMhkMkRGRkpS+0nMzc3h4ODQMKamCKHA\nrVu7kJ1tj6Ki5ejd2x8jRlzGm2+uanLHpre3N7Kzs2FpaQlHR0ds3LgRCoWi0Wfu3r2L9evXw87O\nDsnJydi3bx9iYmLg6uqKkpISBAQEwM7ODrdv30ZCQgJCQkLwwgsvqPOSiVqFQUgdzrVrH6O2Nk+y\nejKZrFEQZmRk4MaNG5LVb2rM9lBVlYyLF+egsHA5ysp
2Q7Xb1hOFhctgYjIL9vbZ6NNnJbp0af7R\nbObm5ti1axdCQkIQGBiIMWPGIC0treH3enp6yMvLw44dO5Ceno5Zs2ahtLQUAQEBsLKywoULF3D8\n+HGEhYVhwoQJarhqIoloekpK9KDq6hyRmtqlYROIFPLy8oSOjk7DLkULCwvx448/Slb/cSIiIoSR\nkZG4e/euWse5p6Tk76KyMqXRn92+/bOoq7suSf3ff/+94f7fmjVrHtnoUlpaKlasWCF69Oghxo0b\np/b7sERS4oyQOpRbt4JhZDQF+vqDJKs5dOhQWFpaIiYmBkD7zNaef/55mJmZ4bXXXsOVK1fUOhag\nOoDg9u29KCjwg1x+BABgYvISunbtI0l9Y2NjBAUFISYmBqGhoRgxYgSOHz+O33//HWvXroW1tTWS\nkpIQGhqKU6dONetAc6KOgkFIHYYQSpSV7YGZma/ktR9eHo2KioIQQvJxACA/Px/Tp0/H0KFDce3a\nNdjY2OAvf/kLioqKJB+ruvosamsvNxxAYGkZjBs3vpJ8nHvc3Nxw+vRpeHl5Ydq0abC1tcXBgwex\nfft2BiB1WgxC6jAiI6MQEKAHY2NvyWs/GH6enp4oLS1FTk6O5OPk5OTA1dUV7u7uiIiIQHR0NE6f\nPo1bt25h8ODBmD9/PvLz89s8Tk3NOVy+vAC5uY6Qyw+jrGwPCguX48oVfxgbvyjBlTyZkZERtm3b\nhiVLlsDa2hq5ubnw9fVtOJmGqLNhEFKH8e2338LBwR1du/aQvLaHhweuX7+OnJwcmJmZwdHRUfLd\nozk5OXB3d4eXlxe+//57VFVVobi4GC+88AJCQkKQmJiI6upq2Nvbw8/PDxcvXmzxGNXVGcjPn4mc\nnBHQ0dGDvX0W+vT5bzz33AJYWARi0KBv0afPSkmv60l69uyJgQMHomtXntRInRuDkDqEmzdvIiws\nDK+++ioAoLy8XNL6pqamGD16dEP4SX2f8PJl4L33yuDi4oKdO3dCV1cX+/btg7W1NVasWIGrV69i\nwoQJCAsLQ3x8PG7fvg07Ozv4+fnh0qVLTdYvKirCwYMfIDd3LISoh63tCVha7oKBwTDJrqGlHnyr\nPVFnxiCkDmHPnj0YMmQInJ2dkZeXB0tLSxw4cEDSMby8vBrC76WXXsKoUaMkqVtQAEydCujoTMS+\nfQegp6cHAFi6dCni4uJw+fJlWFpaNswCnZ2dERYWhri4OFy9ehV2dnYICAhASUnJI7VLS0uxYsUK\nDB06FLt3p8PGJhZDh0Y0+/VUN278D+TycEmu82Ht8bomovbAIKQOobi4GLa2thBCwMbGBoGBgfDz\n80NAQADq6uokGUMmkyE+Ph61tbVwdnbGZ5991uaa+fmAmxswaRKwfz+gp9f4r9S9WWBcXNwjs0AX\nFxdER0cjJiYG+fn5sLKyQkBAAK5evYqrV68iICAAgwYNQm5uLuLi4vDTT2EwNJzYZE9K5R3U1l4G\nANy9exUVp74B1LAxiDNCemZo+PENIiGEEPHx8cLMzEzMmzev4fzLM2fOiEGDBolJkyaJ69fb/jzc\n9evXRffu3cWyZctEcnKyqK+vb1O9/HwhBgwQws9PiOaWSkxMFB4eHkJfX1/4+/uL4uJiIYQQSqVS\n/PLLL8LJyUn06NFDGBoaCicnJ3HkyJEW91VS8p7IzR0rhFAKZfl/hDA2FuLgwRbXaYqnp6fYsGGD\n5HWJ2htnhNQhTJo0CRkZGSgqKoKTkxOys7Ph6OiI1NRU6OrqYsyYMUhNTW11/bCwMDg6OsLKygqn\nTp2Ci4sLzMzMMGvWLGzdurVVO0gTElTvH/zuO6C5GyZdXV2fOAucOXMmUlNT4ejoiHnz5iE1NRUv\nvtjyHaB9+qyE1bruwC+/QMfIDNi9Gxg3rsV1mvLgW+2JOjMGIXUY/fv3R3x8PNzc3ODs7IxDhw6h\nV69eiIyMxMKFC+H
m5oZdu3a1qGZxcTFmzJiBhQsXYt26dcjMzERqairKy8tx4MABDBs2DMHBwXjp\npXno2ROQyYCNG4EHThJ7oldfBfbsAVqzaXLixImIiYnBwYMHkZaWhmHDhiEtLQ06Ojq4ceMGPDw8\noKOj0/LCUL3UWN/KGcjJVf3B9OnAgAGtqvU0D77VnqhT0/SUlOhxgoKChL6+vlizZo1QKBRCCCF2\n7dolDAwMxIoVK5pc1lQqlSIoKEiYmJgId3d3cfHixad+vrBQLnbuFGLRIiH69hUCEMLWVojly1XL\nnnv2CDF2rOqzYWGqf5eKUqkUR44cEXV1daKmpkZ07dq1U7ymyMLCQvz000+aboOozfgAEHVI/v7+\nsLOzg7e3N7KysrB79274+vrC2toac+fOxfnz57Fnzx6Ympo+8t0rV64gICAAJ06cwObNm7F06dIm\nZ1cDB/bEq6+qZnkAkJUFxMQAFy7cX/a0tgYOHQL+f1OoZHR0dBqWQPPy8qBQKGBrayvtIGrAGSE9\nK7g0Sh2Wm5sbUlNTcePGDYwbNw65ublwdnZGcnIybt68CVdX10bneCqVSmzcuBH29vZQKBTIzMyE\nv79/q5YYHRyAlSuBwMD7f+bjo9oZqqaT2QAAFy/egIWFjbT33mpqgGXLVBe0eDHQjOcWm4P3COlZ\nwSCkDm3AgAFISEiAi4sLxo8fj9DQUFhYWCAxMRGzZ89Gr169AADnz5/HpEmTsHHjRnz99deIjIyE\npaWlpL106QLMmwfs3Stp2Uays6dg8OBz0hb99ltg5kxgyxbgyy+BdevaXLK2thZ1dXWcEdIzgUFI\nHZ6BgQG+//57bN68GS+//DLWrl2LP/zhD/j0009hYGCAjRs3wtHREb169UJ2djb8/PxavdGkKbNn\nq5ZL1eXcOcDOTuKiubmAo6PqZzMzoKqqzSUrKysBgDNCeibwHiF1Gv7+/hg2bFjD29NXr16NFStW\noKioCMHBwfD2lv6w7nt8fO7/nJKitmFw7hywYIHERYcNA9LTgT/+Ebh1C+je/JfzPklFRQUAcEZI\nzwQdIdR5x4NIenl5eZg1axYqKythZ2eHHTt2YNAg6d5fqEkmJsC+fcC0aRIWra4G3noLMDAAyspU\nS6PW1m0qmZWVhREjRqC2thb6+vrS9EmkIZwRUqdjY2ODlJQUVFRUoG/fvmpbBm1KTg5gby9dvbIy\n1f/a2EhXEwDQrRsQFCRpycrKSujr6zME6ZnAGSFRK8jlqmfU//1v1T4UIuq8uFmGqBWMjYG33wbe\nfRdQKjXdDRG1BZdGiVrprbeAigrVLbgeLXiXcEUFcPs2YGGhttaIqAW4NEqkJuXlqicXsrJUu0Gz\nslT/fuUKMHkyEBen+tzevarH+1JSgMOHgcrKxrtUiUi9OCMkUpPPPlOdTDNqFDB8ODBliuqAl+HD\nH50Nquv4NiJqGmeERGoil6s2bD5uY6VCAVy+DFhaAgcOqB7t27tX9QxhVRVnhETtiZtliNpo7977\nr/s7fPj+EWzGxqpzSc+eBUJCgPXrgZdfBkaOVN1THDoUOH9e9dn2OL6NiB6PS6NEEnh4aVMIYMQI\nVdDV1wP9+6uOTrOzA6ZOVR32Mnw40Lu3KigB1fFtGzZo7hqItBWXRonaaO/exy9t/vSTKgCHDVPN\nDomoY+LSKJEEHre0OWcOMH48Q5Coo2MQEklE3W+mICL14NIoERFpNc4IiYhIqzEIiYhIqzEIiYhI\nqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEIiYhIqzEI\niYhIqzEIiYhIq/0fKXgSLh0ZenMAAAAASUVORK5CYII=\n", "prompt_number": 18, "text": [ "" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "The normal explanation for this would be differences in the aromaticity model. In this case, however, that turns out not be exactly it.\n", "\n", "Here's the molecule we're querying with:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolFromSmiles('c1cn[nH]n1')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAYAAABNcIgQAAAMGUlEQVR4nO3de0zV9R/H8RcUolY2\njdScNi9UOFxgcy2Dpf0UQ1BMAfGSbphTWfpj6DDmXEgrQkcMsDWcmS4hIS5WqOVwzpoiURsXy2Zp\nFy31uMlyiiUG5/fHV6b85HhBOLfP87Ext/P9nnPe/7jnPud8v5/jY7fb7QIAwFC+rh4AAABXIoQA\nAKMRQgCA0QghAMBohBAAYDRCCAAwGiEEABiNEAIAjEYIAQBGI4QAAKMRQgCA0QghAMBohBAAYDRC\nCAAwGiEEABiNEAIAjEYIAQBGI4QAAKMRQgCA0QghAMBohBAAYDRCCAAwGiEEABiNEAIAjEYIAQBG\nI4QAAKMRQgCA0QghAMBohBAAYDRCCAAwGiEEABiNEAIAjEYIAQBGI4QAAKMRQgCA0QghAMBohBAA\nYDRCCAAwGiEEABiNEAIAjEYIAQBGI4QAAKMRQsANnTlzRuXl5bLZbK4eBfB697t6AACWkydPqqKi\nQuXl5aqurtbQoUN1+fJlNTY26rHHHnP1eIDXYkUIuNDBgweVlpam4OBgjRgxQmVlZYqLi9Mvv/yi\nX3/9VWPHjlVCQoJaW1tdPSrgtXzsdrvd1UMAprDb7Tp06JBKS0tVWVmpU6dOKTIyUvHx8Zo6daoe\nffTRDuefO3dOISEhSkpK0htvvOGiqQHvRgiBHtbW1qbq6mqVlpbq008/1dmzZzVlyhTFx8crOjpa\njzzyyC2f/+WXX2r69OmqqqrSxIkTnTM0YBBCCPQAu1365hvpwIGftHnzVJ08eVITJ05UbGysXn75\nZQ0ePPiuXm/lypUqLS1VQ0ODBgwY0ENTA2YihEA3aW2VDh2Sysuligrp3Dlp1qx/FBHxsWbMmHHb\nld+tXLlyRc8//7yGDx+u8vLybpwaACEE7kFzs7Rnj1RZKe3aJbW1STExUny8NHmy1KdP973X8ePH\n9cwzzyg7O1tLlizpvhcGDMdVozDamDFSv37S+fPXHysulsaNc/ycixeljz6Spk+XBg6Uli+X+veX\nPvlEstmuH7uXCNbV1envv//u8FhgYKDy8vKUnJysxsbGrr84gA4IIYzn5yetX3/rcy5c6Bi/tDRp\n5Ehp717p9GkpL89aAfr7d20Gu92umpoapaamatSoUXr22Wf17bff3nReYmKiYmNjNW/evJtCCaBr\nCCGMl5IiFRRIZ892fvzDD6UhQ6RVq6TBg6WdO6XffrPiFx4u3Xdf1963paVFlZWVWrhwoQICAjRl\nyhTZbDbl5ubqwoULeuGFFzp9XkFBga5evarVq1d37Y0BdMDOMjBeUJA0c6aUmSnl5998PCzM+g5w\nwoSuR6/dpUuXVFFRoV27dqmqqkp2u10xMTHatm2bIiIi1Lt379u+xoMPPqiioiKFhYVp8uTJmjFj\nxr0NBRiOEAKS0tOlkBApNfXmY089Zf111T//SFVV/6q8fLE+//xztba2Kjo6Wps3b1ZUVJT69u17\n1685btw4ZWRkKDExUfX19Xr8
8ce7PiBgOK4ahdHGjJHWrZPi4qSlS62rPidNkrKzpe++6/rrXrok\nffGFdSvF7t1Sr17SwoU5mjTpSUVERMi/q18m3qCtrU2RkZG6cuWK9u/fr/vudbkKGIoVIXDN2rXS\n6NHS8OFde/6ZM9aVo6WlUk2NdTHNvHnWKnPsWMnXd2W3zuvr66vt27crNDRU77zzjtauXdutrw+Y\nghAC1wwbJi1aJOXkSCNG3Nlz/vjDWvWVlkqHD0tPPiklJEibNknBwT07ryQNGjRIW7duVUxMjF58\n8UWFhYX1/JsCXoarRoEbrFljfad3O6dPS//5j7V6fO896+rRw4elo0etj1qdEcF2kZGReu211zR3\n7lw1NTU5740BL8F3hEAX/PuvlJVl3VcYEuLqaawt2MaPH6+RI0eqrKzM1eMAHoUQAl6ifQu2nJwc\nLV682NXjAB6Dj0YBLxEYGKjc3FwlJyfrxx9/dPU4gMdgRQh4mfnz56uxsVG1tbXq0527fgNeihUh\n4GXef/99NTc3Ky0tzdWjAB6BEAKyfktw2zbpr79cPcm9e/jhh1VSUqKCggJ99tlnrh4HcHt8NArI\numWiTx/pyBFrtxlvkJmZqezsbLZgA26DEALyzhC2tbXppZdeUktLC1uwAbfAR6OAl/L19VVhYaGO\nHTumrKwsV48DuC1CCHix9i3YMjIyVF1d7epxALdECAEvN3XqVCUlJWnOnDlswQZ0ghACBtiwYYMC\nAgK0dOlSV48CuB1CCHiZ1tZWff311x0e8/f3V0lJifbu3astW7a4aDLAPRFCwMu89dZbSkhIUHNz\nc4fHn3jiCeXk5LAFG/B/uH0CkPfcPrFv3z5FRUWpqqpKEyZM6PScadOmycfHR5WVlU6eDnBPrAgB\nL2Gz2fTKK68oNTXVYQQB3IxfqAe8gN1uV2JiokaNGqWMjAyH5+Xm5urw4cOqr6933nCAmyOEgBfY\nuHGjampq1NDQoPvv7/y/dV1dndLS0lRUVKRhw4Y5eULAfRFCwMPV1dVp9erVKiwsdBi4ixcvavbs\n2VqwYIFiY2OdPCHg3viOEPBg7YGbP3++4uLiHJ6XnJwsPz8/5eXlOXE6wDOwIgQ8WEpKivz8/JSf\nn+/wnKKiIhUXF6u2tlZ9+/Z14nSAZyCEgIfavfu4duzYoQMHDuiBBx7o9Jyff/5Zy5Yt0/r16zXG\nk+8LAXoQ9xEC8rz7CE+dkkJDpXXrbFqxYlCn57S0tCg8PFxDhgzRzp075ePj49whAQ/BihDwMFev\nSrGxUni4tHx55xGUpPT0dNlsNu3du5cIArdACAFJfn7S1q3S0KGunuT21q2T/vxT2rNHctS3qqoq\nvfvuu9q3b5/69+/v1PkAT8NHo4AH2bdPioqSqqokR5vHNDU1KTQ0VHPmzNGGDRucOyDggQgh4CFs\nNikkRHr1Venttx2fN3t2i/z987Vly3/Vq1cv5w0IeChCCHgAu12KjpYuXJC++kpysHmM8vOljAyp\nvl5i8xjgzvAdIeABNm6UamqkhgbHEayvl1avloqKiCBwN1gRAm6urk4aP14qLJQcbR5z+bI0bpwU\nFiZt3uzc+QBPxxZrMNqYMVK/ftL589cfKy62onLjOYWFHZ9XXCwFBTlnxrQ0KT7ecQQlKSXF+pcd\n1IC7x0ejMJ6fn7R+veSuF1hu327d7O9ISYl1Tm2txA5qwN1jRQjjpaRIBQXS2bOunqRzAwdKDz3U\n+bHff5eWLZOysjxjRxzAHRFCGC8oSJo5U8rMdPUkd6e1VVqwwLqfcMUKV08DeC5CCEhKT7d2ljl1\nqvPjSUlSQMD1vyVLnDtfZzIzpRMnpA8+cLzDDIDbI4SApJEjpXnzpDff7Px4ZqZ1e0L7X1aWE4fr\nxKFD1k31H39shRlA13GxDHDN2rXS6NHS8OE3H+vfv+M+pAMGOG2smzQ1SXPnSqtWOd5mDcCdY0
UI\nXDNsmLRokZST4+pJbu3IESkw0Po4F8C9I4TADdassX6b8G41Nkrl5daN7T1twgRp/36JbUSB7sHO\nMkA3KCqybsNobrZ+HWLWLGnaNMe3PQBwH4QQ6EY//CCVlko7dlhXdD73nLUrTHy8NGSIq6cD0Bk+\nGgW6UXCw9cO5x45ZG2RPnixt2mR9/xgebu1gc+KE4+ffyZZvALoXIQR6SHsUjx61vkOcPl2qrLQu\ndGk/9tNPNz+vfcs3AM5BCAEnCA6WXn9dOnjQWinGxUllZdbtGuHhHS/Qcfct3wBvQwgBJ3v6aevH\nc7//3vpOMTFR6t37+nFP3fIN8FTcUA+4UFBQ5z/nlJ4uhYRIqanOnwkwDStCwA3dbss3AN2HFSHg\npm615RuA7sOKEHBTnrLlG+DpCCHgxrq65RuAO8fOMgAAo7EiBAAYjRACAIxGCAEARiOEAACjEUIA\ngNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBoh\nBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACj\nEUIAgNEIIQDAaIQQAGA0QggAMBohBAAYjRACAIxGCAEARiOEAACjEUIAgNEIIQDAaIQQAGC0/wFD\nb1pVsmEqIgAAAABJRU5ErkJggg==\n", "prompt_number": 19, "text": [ "" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "qfp2 = pyAvalonTools.GetAvalonFP('c1cnn[nH]1',True,2048,isQuery=True)\n", "qfp==qfp2" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "False" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The deciding factor is the tautomer. 
The Avalon fingerprinter considers the position of the H significant,\n", "and that difference explains the different screenout.\n", "\n", "If we use the fingerprint of the other tautomer, we see the expected screenout behavior:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "DataStructs.AllProbeBitsMatch(qfp2,mfp)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 102, "text": [ "True" ] } ], "prompt_number": 102 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This matters because, when the query is built from SMILES, the RDKit's substructure matcher ignores the H completely. The query therefore matches either tautomer, and even an N-methylated triazole:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolFromSmiles('c1cnn[nH]1').HasSubstructMatch(Chem.MolFromSmiles('c1cn[nH]n1'))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 103, "text": [ "True" ] } ], "prompt_number": 103 }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolFromSmiles('c1cnnn1C').HasSubstructMatch(Chem.MolFromSmiles('c1cn[nH]n1'))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 104, "text": [ "True" ] } ], "prompt_number": 104 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unless, of course, you construct the query from SMARTS, where `[nH]` requires that nitrogen to carry exactly one H:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolFromSmiles('c1cnnn1C').HasSubstructMatch(Chem.MolFromSmarts('c1cn[nH]n1'))" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 105, "text": [ "False" ] } ], "prompt_number": 105 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Other fingerprints" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The RDKit fingerprint" ] }, { "cell_type": "code", "collapsed": false, "input": [ "fn = lambda x,v:Chem.RDKFingerprint(x,minPath=1,maxPath=5,nBitsPerHash=1,useHs=False)\n", 
"testScreenoutSerial(mols,leads,fn,verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Failure: COc1ccc(-n2c(SC(C)=O)nc3sc4c(c3c2=O)CCCC4)cc1 CSC(C)=O\n", "Failure: CCCCCC[C@H]1SC(=O)c2ccccc2[C@@H]1C(=O)O CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccccc1SC(=O)c1cccc(C=O)n1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-n2c(SC(C)=O)nc3sc4c(c3c2=O)CCCC4)cc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(C)(C)OC(=O)n1c(-c2ccc3c(c2)CC(NS(=O)(=O)c2ccccc2)C3)cc2cc(O)ccc21 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Cc1c2cc(O)ccc2n(Cc2ccc(OC[C@H](C)N3CCCC3)cc2)c1-c1ccc(O)cc1 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC1(C)Oc2ccc(-c3nc(-c4cccc5c(CCC(=O)O)c[nH]c54)no3)cc2O1 Cc1noc(-c2ccc(O)cc2)n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(CC(=O)Nc2ccc(-c3noc(-c4cc(OC)c(OC)c(OC)c4)n3)cc2)cc1 Cc1noc(-c2ccc(O)cc2)n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CO[C@H]1C(=O)N2C(C(=O)C(C)(C)C)=C(C)[C@@H](S(=O)(=O)c3ccccc3)S(=O)(=O)C12 CSCSC" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Cc1c2c(nc3ccc(OCCN4CCCCC4)cc13)-c1cc3cc(O)ccc3n1C2 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc2[nH]c(-c3ccc4ccccc4c3)c(-c3cc(OC)c(OC)c(OC)c3)c2c1 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C(O)C1CCCN1C(=O)CC(SC(=O)c1ccccc1)C(=O)c1ccccc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(C)(C)OC(=O)n1c(-c2ccc3c(c2)CC(NS(=O)(=O)c2ccccc2)C3)cc2cc(O)ccc21 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", 
"stream": "stdout", "text": [ "\n", "Failure: O=Cc1cccc(C(=O)Sc2ccc(Cl)cc2)n1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C1NCc2cccc(-c3cc4cc(OCCN5CCCCC5)ccc4[nH]3)c21 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(C#CCC2(S(=O)(=O)c3ccc(C)cc3)SC(=O)NC2=O)cc1 CSCSC" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CCCC(=O)Sc1nc2sc3c(c2c(=O)n1-c1ccc(OC)cc1)CCCC3 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-n2c(SC(C)=O)nc3sc4c(c3c2=O)CCCC4)cc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-c2nc(-c3ccc(NC(=O)c4ccc(Br)o4)cc3)no2)cc1OC Cc1noc(-c2ccc(O)cc2)n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1cc2c(cc1OC)C(=O)N1CSCC1C(=O)S2 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Oc1nc2ccccc2cc1-c1cc2cc(OCCN3CCCCC3)ccc2[nH]1 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(C)(C)OC(=O)[C@@H]1CCCN1C(=O)[C@H]1CCC[C@@H]1SC(=O)c1ccccc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C1Sc2ccccc2C(=O)N2CCSC12 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(C)(C)OC(=O)n1c(-c2ccc3c(c2)CC(NS(=O)(=O)c2ccccc2)C3)cc2cc(O)ccc21 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=c1n(Cc2ccccc2)nc2n1-c1ccccc1OC2 CCn1nc2n(c1=O)-c1ccccc1OC2" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-n2c(SC(C)=O)nc3sc4c(c3c2=O)CCCC4)cc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CCCOC(=O)c1c(CCC)c(C(=O)SCC)c(CC)nc1-c1cccc(F)c1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: 
O=C1Sc2sc(=S)sc2SC1c1ccccc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-n2c(SC(C)=O)nc3sc4c(c3c2=O)CCCC4)cc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc2c(c1)c(CCN(C)C)c1n2S(=O)(=O)c2ccccc2-1 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(C)(C)OC(=O)n1c(-c2ccc3c(c2)CC(NS(=O)(=O)c2ccccc2)C3)cc2cc(O)ccc21 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CCCC(=O)Sc1nc2sc3c(c2c(=O)n1-c1ccc(OC)cc1)CCCC3 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(-c2cc3cc(OC)ccc3[nH]2)cc1 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C(O)[C@@H](Cc1ccc(-c2ccccc2)cc1)SC(=O)c1ccccc1 CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Found 1240 matches in 1399 searches with 34 failures. Accuracy: 0.886" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "(25000000, 1399, 1240, 34)" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The failures happen here because the substructure match does not require query and molecule atoms to agree on aromaticity (in several of the failures above, for example, one of the methyl carbons of the query `CSC(C)=O` is matched to an aromatic carbon in the molecule), while the RDKit fingerprinter uses aromaticity information both when calculating the atom invariants and when hashing the bonds. 
That extra information is a big part of why the accuracy is so high.\n", "\n", "Turning off all bond order information (finer-grained control is unfortunately not possible) and using the atomic number as the atom invariant gives no failures, but a much lower screenout accuracy (0.010 rather than 0.886):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def func(mol,v):\n", "    invs = [x.GetAtomicNum() for x in mol.GetAtoms()]\n", "    return Chem.RDKFingerprint(mol,minPath=1,maxPath=5,nBitsPerHash=1,useHs=False,useBondOrder=False,atomInvariants=invs)\n", "testScreenoutSerial(mols,leads,func,verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Found 1274 matches in 125651 searches with 0 failures. Accuracy: 0.010\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ "(25000000, 125651, 1274, 0)" ] } ], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a shorter maxPath length also hurts accuracy:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "fn = lambda x,v:Chem.RDKFingerprint(x,minPath=1,maxPath=4,nBitsPerHash=1,useHs=False)\n", "testScreenout(mols,leads,fn,dview)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Found 1240 matches in 2331 searches with 0 failures. Accuracy: 0.532\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 35, "text": [ "(25000000, 2331, 1240, 0)" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "No failures are reported here, but that should not be taken to mean this fingerprint is safe. This run finds only 1240 matches, 34 fewer than the 1274 found in the verified atomic-number run above, so some true matches are most likely being screened out. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chemfp\n", "\n", "Andrew Dalke's chemfp package (http://chemfp.com/) includes a pattern-based fingerprinter which is based on the PubChem/CACTVS substructure keys:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from chemfp import rdkit_patterns\n", "dalke_fp=rdkit_patterns.SubstructRDKitFingerprinter_v1.make_fingerprinter()\n", "fn = lambda x,v,fp=dalke_fp:DataStructs.CreateFromBinaryText(fp(x))\n", "testScreenoutSerial(mols[:500],leads,fn,verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Failure: CC(C)(C)c1cc2c(cc1SC1=C(O)OC(CCc3ccccc3)(c3ccccc3)CC1=O)CCCC2 C1=COCCC1\n", "Failure: CC(C)(C)c1cc2c(cc1SC1=C(O)OC(CCc3ccccc3)(c3ccccc3)CC1=O)CCCC2 O=C1C=CO[C@H](c2ccccc2)C1\n", "Failure: Cc1ccc(C)c(S(=O)(=O)c2nnn3c4ccsc4c(Nc4ccc(C)c(C)c4)nc23)c1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CCOC(=O)Cc1c(C)n(CC2CCCCC2)c2ccc(OC)cc12 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(OCC#Cc2cn([C@H](C)C[C@@H]3CC[C@H]([C@@H](C)C(=O)N(C)Cc4ccccc4)O3)nn2)cc1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COC1=C(OC)C(=O)C2=C(C[C@@H]3[C@@]4(C)C[C@H]4C[C@@]3(C)C2)C1=O COC1=CCC=CC1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc2[nH]c3c(c2c1)CC(NC(=O)C1CC1)CC3 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Cc1c(C(=O)Nc2ccccc2)nnn1Cc1ccccc1O c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1c(=O)[nH]c2ccccc2c1-c1ccccc1 Oc1cnc2ccccc2c1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: Cc1ccc2c(nc(-c3ccc(NC(=O)C(F)(F)F)cc3)c(O)c2C(=O)O)c1C Oc1cnc2ccccc2c1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", 
"Failure: c1c(CCc2ccccc2)nnn1C[C@H]1CC[C@@H]([C@@H]2CC[C@H](Cn3cc(CCc4ccccc4)nn3)O2)O1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C(NCc1ccc(F)cc1)[C@@H](CCN1[C@@H]2CC[C@@H]1CC(n1nnc3cccnc31)C2)c1ccccc1 c1cn[nH]n1\n", "Found 3 matches in 108 searches with 12 failures. Accuracy: 0.028\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "(250000, 108, 3, 12)" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's not particularly effective as a screening fingerprint (an accuracy of 0.028 on the first 500 molecules). As the failures above show, it also misses some true substructure matches." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MACCS keys\n", "\n", "Another key-based fingerprint that isn't particularly effective and results in some misses:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "fn = lambda x,v:rdMolDescriptors.GetMACCSKeysFingerprint(x)\n", "testScreenoutSerial(mols[:500],leads,fn,verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Failure: CC(C)(C)c1cc2c(cc1SC1=C(O)OC(CCc3ccccc3)(c3ccccc3)CC1=O)CCCC2 C1=COCCC1\n", "Failure: Cc1ccc(C)c(S(=O)(=O)c2nnn3c4ccsc4c(Nc4ccc(C)c(C)c4)nc23)c1 c1cn[nH]n1\n", "Failure: CCOC(=O)C1=COC(C)(CCc2ccccc2)CC1=O C1=COCCC1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CCOC(=O)Cc1c(C)n(CC2CCCCC2)c2ccc(OC)cc12 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc(OCC#Cc2cn([C@H](C)C[C@@H]3CC[C@H]([C@@H](C)C(=O)N(C)Cc4ccccc4)O3)nn2)cc1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COC1=C(OC)C(=O)C2=C(C[C@@H]3[C@@]4(C)C[C@H]4C[C@@]3(C)C2)C1=O COC1=CCC=CC1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1ccc2[nH]c3c(c2c1)CC(NC(=O)C1CC1)CC3 Cc1cc2cc(O)ccc2[nH]1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", 
"Failure: Cc1c(C(=O)Nc2ccccc2)nnn1Cc1ccccc1O c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: COc1c(=O)[nH]c2ccccc2c1-c1ccccc1 Oc1cnc2ccccc2c1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: CC(=O)NCCCC[C@H](NC(C)=O)C(=O)NCC(=O)S[C@H](C)C(=O)O CSC(C)=O" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: c1c(CCc2ccccc2)nnn1C[C@H]1CC[C@@H]([C@@H]2CC[C@H](Cn3cc(CCc4ccccc4)nn3)O2)O1 c1cn[nH]n1" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Failure: O=C(NCc1ccc(F)cc1)[C@@H](CCN1[C@@H]2CC[C@@H]1CC(n1nnc3cccnc31)C2)c1ccccc1 c1cn[nH]n1\n", "Found 3 matches in 134 searches with 12 failures. Accuracy: 0.022\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "(250000, 134, 3, 12)" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Improving things\n", "\n", "A first pass at improving things is to combine a set of selected substructure patterns with an algorithmic fingerprinter such as the pattern fingerprinter. 
Here's a thread describing an experiment:\n", "\n", "http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02078.html\n", "\n", "And a reproduction of that with this test harness:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "qs=\"\"\"0 2 times O largest 55458\n", "1 2 times Ccc largest 29602\n", "2 1 times CCN largest 16829\n", "3 1 times cnc largest 11439\n", "4 1 times cN largest 8998\n", "5 1 times C=O largest 7358\n", "6 1 times CCC largest 6250\n", "7 1 times S largest 4760\n", "8 1 times c1ccccc1 largest 4524\n", "9 2 times N largest 2854\n", "10 1 times C=C largest 2162\n", "11 1 times nn largest 1840\n", "12 2 times CO largest 1248\n", "13 1 times Ccn largest 964\n", "14 1 times CCCCC largest 857\n", "15 1 times cc(c)c largest 653\n", "16 3 times O largest 653\n", "17 1 times O largest 466\n", "18 2 times CNC largest 464\n", "19 1 times s largest 457\n", "20 1 times CC(C)C largest 335\n", "21 1 times o largest 334\n", "22 1 times cncnc largest 334\n", "23 1 times C=N largest 321\n", "24 2 times CC=O largest 238\n", "25 4 times Ccc largest 238\n", "26 1 times Cl largest 230\n", "27 4 times O largest 149\n", "28 2 times ccncc largest 149\n", "29 6 times CCCCCC largest 76\n", "30 2 times c1ccccc1 largest 76\n", "31 1 times F largest 75\n", "32 3 times CCOC largest 44\n", "33 3 times N largest 44\n", "34 1 times c(cn)n largest 44\n", "35 1 times N largest 41\n", "36 9 times C largest 41\n", "37 1 times CC=C(C)C largest 33\n", "38 1 times c1ccncc1 largest 26\n", "39 1 times CC(C)N largest 26\n", "40 1 times CC largest 26\n", "41 4 times CCC(C)O largest 25\n", "42 2 times ccc(cc)n largest 21\n", "43 6 times C largest 21\n", "44 1 times C1CCCC1 largest 18\n", "45 1 times C largest 18\n", "46 5 times O largest 18\n", "47 2 times Ccn largest 14\n", "48 1 times CNCN largest 13\n", "49 3 times cncn largest 13\n", "50 1 times CSC largest 13\n", "51 3 times CC=O largest 11\n", "52 1 times CCNCCCN largest 11\n", "53 1 times CccC largest 
11\n", "54 3 times ccccc(c)c largest 10\"\"\"\n", "\n", "def _initPatts():\n", "    ssPatts=[]\n", "    for q in qs.split('\\n'):\n", "        # fields used from each line: q[1] is the minimum match count, q[3] the pattern SMILES\n", "        q = q.split(' ')\n", "        count = int(q[1])\n", "        patt = Chem.MolFromSmiles(q[3],sanitize=False)\n", "        patt.UpdatePropertyCache(strict=False) # <- github #149, without this we cannot pickle the queries\n", "        ssPatts.append((patt,count))\n", "    return ssPatts\n", "\n", "_ssPatts=None\n", "def GetCombinedSubstructFP(m,ssPatts=None,fpSize=1024,verbose=False):\n", "    global _ssPatts\n", "    if ssPatts is None:\n", "        # build the patterns only once and cache them in _ssPatts\n", "        if _ssPatts is None:\n", "            _ssPatts = _initPatts()\n", "        ssPatts = _ssPatts\n", "    sz = len(ssPatts)\n", "    lfp=Chem.PatternFingerprint(m,fpSize=fpSize)\n", "    # bits 0..sz-1 hold the pattern-count keys; the pattern fingerprint bits are shifted up by sz\n", "    res = DataStructs.ExplicitBitVect(fpSize+sz)\n", "    obls = [x+sz for x in lfp.GetOnBits()]\n", "    res.SetBitsFromList(obls)\n", "    for i,(p,count) in enumerate(ssPatts):\n", "        # set key bit i if the molecule has at least count unique matches of pattern i\n", "        matches = m.GetSubstructMatches(p,uniquify=True)\n", "        if len(matches)>=count:\n", "            res.SetBit(i)\n", "    return res\n", "\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "ssPatts = _initPatts()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "fn = lambda x,v,ssPatts=ssPatts:GetCombinedSubstructFP(x,ssPatts=ssPatts,fpSize=2048)\n", "testScreenoutSerial(mols[:500],leads,fn,verify=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Found 15 matches in 23 searches with 0 failures. 
Accuracy: 0.652\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 65, "text": [ "(250000, 23, 15, 0)" ] } ], "prompt_number": 65 }, { "cell_type": "code", "collapsed": false, "input": [ "leadResults={}\n", "fragResults={}\n", "pieceResults={}\n", "print(\"----------------------------\")\n", "fn = lambda x,v,ssPatts=ssPatts,fpf=GetCombinedSubstructFP:fpf(x,ssPatts=ssPatts,fpSize=1024)\n", "method='1024'\n", "print(\"Doing %s\"%method)\n", "print(\"Pieces\")\n", "pieceResults[method]=testScreenout(mols,pieces,fn,dview)\n", "print(\"Frags\")\n", "fragResults[method]=testScreenout(mols,frags,fn,dview)\n", "print(\"Leads\")\n", "leadResults[method]=testScreenout(mols,leads,fn,dview)\n", "\n", "print(\"----------------------------\")\n", "fn = lambda x,v,ssPatts=ssPatts,fpf=GetCombinedSubstructFP:fpf(x,ssPatts=ssPatts,fpSize=2048)\n", "method='2048'\n", "print(\"Doing %s\"%method)\n", "print(\"Pieces\")\n", "pieceResults[method]=testScreenout(mols,pieces,fn,dview)\n", "print(\"Frags\")\n", "fragResults[method]=testScreenout(mols,frags,fn,dview)\n", "print(\"Leads\")\n", "leadResults[method]=testScreenout(mols,leads,fn,dview)\n", " \n", " \n", " " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "----------------------------\n", "Doing 1024\n", "Pieces\n", "Found 1935315 matches in 3023990 searches with 0 failures. Accuracy: 0.640" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 13765 searches with 0 failures. Accuracy: 0.338" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Leads\n", "Found 1274 matches in 6857 searches with 0 failures. Accuracy: 0.186" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "----------------------------\n", "Doing 2048\n", "Pieces\n", "Found 1935315 matches in 2810633 searches with 0 failures. 
Accuracy: 0.689" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Frags\n", "Found 4659 matches in 7669 searches with 0 failures. Accuracy: 0.608" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Leads\n", "Found 1274 matches in 1761 searches with 0 failures. Accuracy: 0.723" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the screenout accuracies to the previous pattern fingerprint results:\n", "```\n", "            Pieces  Fragments  Leads\n", "Pattern-1K  0.500   0.285      0.169\n", "Combine-1K  0.640   0.338      0.186\n", "Pattern-2K  0.572   0.590      0.715\n", "Combine-2K  0.689   0.608      0.723\n", "Pattern-4K  0.635   0.602      0.729\n", "```\n", "The extra substructure keys help somewhat, particularly with the pieces (accuracy rises from 0.500 to 0.640 with 1K bits and from 0.572 to 0.689 with 2K bits), but not a whole lot.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }