{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Comparing Fraggle to other fingerprints\n", "\n", "The Fraggle similarity algorithm from Jameed Hussain and Gavin Harper has been available in the RDKit since the 2013_09 release.\n", "\n", "The algorithm, which is described in this presentation: https://github.com/rdkit/UGM_2013/blob/master/Presentations/Hussain.Fraggle.pdf?raw=true , uses the similarity between fragments of the query molecule and the database molecule, which makes it an interesting complement to standard fingerprint similarity.\n", "\n", "Here I will take a look at Fraggle using the same tools I applied to the other fingerprinting methods in these two posts:\n", "\n", "http://rdkit.blogspot.ch/2013/10/fingerprint-thresholds.html\n", "\n", "http://rdkit.blogspot.ch/2013/10/comparing-fingerprints-to-each-other.html\n", "\n", "## TL;DR Summary\n", "\n", "The baseline similarity values for Fraggle (the values exceeded by only 10%, 5%, and 1% of random compound pairs) are quite high:\n", "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
FingerprintMetric90% level95% level99% level
Fraggle0.4830.5380.650
\n", "\n", "As expected from its definition, Fraggle similarity tends to be higher than RDKit5 similarity:\n", "\n", "\n", "This is a nice example of a case where the RDKit5 fingerprint rates the molecules as quite dissimilar, but Fraggle gives the expected high similarity score:\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mol1mol2FraggleRDKit5FragmentFragMol
15634 \"Mol\" \"Mol\" 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 \"Mol\"
\n", "
\n", "\n", "Another interesting point about Fraggle is that it pulls back compound pairs that are quite different from those found by the other methods we've looked at, which makes it a useful complement to them. To demonstrate, here is the fraction of the top 100 pairs that Fraggle has in common with a few other fingerprints, along with the same overlaps among those other fingerprints for comparison:\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Fingerprint 1Fingerprint 2Fraction in common (top 100)
FraggleAP0.18
FraggleAvalon-10240.16
FraggleRDKit50.24
FraggleTT0.21
APAvalon-10240.58
APRDKit50.69
APTT0.86
Avalon-1024RDKit50.56
Avalon-1024TT0.60
RDKit5TT0.70
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Move on to actually do the work" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from rdkit import Chem\n", "from rdkit.Chem import rdMolDescriptors\n", "from rdkit.Avalon import pyAvalonTools\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem.Fraggle import FraggleSim\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit import rdBase\n", "from rdkit import DataStructs\n", "from collections import defaultdict\n", "import cPickle,random,gzip,time\n", "import scipy as sp\n", "import pandas\n", "from rdkit.Chem import PandasTools\n", "PandasTools.RenderImagesInAllDataFrames()\n", "from scipy import stats\n", "# explicit imports for the plotting functions used below\n", "from matplotlib.pyplot import hist, scatter, plot, xlabel, ylabel\n", "from IPython.core.display import display,HTML,Javascript\n", "\n", "print rdBase.rdkitVersion\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "2014.03.1pre\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Start with finding the baseline similarity value\n", "\n", "## Read in the molecule pairs, then shuffle a copy of the second list so that we have random pairs" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ind = [x.split() for x in gzip.open('../data/chembl16_25K.pairs.txt.gz')]\n", "ms1 = []\n", "ms2 = []\n", "for i,row in enumerate(ind):\n", "    m1 = Chem.MolFromSmiles(row[1])\n", "    ms1.append((row[0],m1))\n", "    m2 = Chem.MolFromSmiles(row[3])\n", "    ms2.append((row[2],m2))\n", " " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "random.seed(23)\n", "# shuffle a copy so that ms2 keeps the original (related) pairing used further down\n", "shuffled_ms2 = list(ms2)\n", "random.shuffle(shuffled_ms2)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "t1=time.time()\n", "sims=[]\n", "for i,(m1,m2) in enumerate(zip(ms1,shuffled_ms2)):\n", "    sim,frag= FraggleSim.GetFraggleSimilarity(m1[-1],m2[-1])\n", "    sims.append((sim,i))\n", "    if not (i%200):\n", "        print 'Done: %d in %.2f seconds'%(i,time.time()-t1)\n", "t2=time.time()\n", "print 'Finished in %.2f seconds'%(t2-t1)\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "cPickle.dump(sims,gzip.open('../data/chembl16_25K.fraggle_randompairs.sims.pkl.gz','wb+'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Here's the analysis" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sl = sorted(sims)\n", "n = len(sl)\n", "# print the similarity value at each percentile of the sorted random-pair list\n", "for frac in (.7,.8,.9,.95,.99):\n", "    print frac,sl[int(frac*n)]\n", "hist([x[0] for x in sims],bins=20)\n", "xlabel(\"Fraggle\")\n", " " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.7 (0.37727272727272726, 11580)\n", "0.8 (0.4196078431372549, 17489)\n", "0.9 (0.4826254826254826, 393)\n", "0.95 (0.5377358490566038, 3077)\n", "0.99 (0.65, 17818)\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAYAAAAEMCAYAAADNtWEcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9Mm/eBx/G3W3xaf4Q0uQ3Ts7ORBafUhRD3MifV1skd\nJUmpiphScWW3Qtok25HuSn9Ip/VObeGkBe666pSlh5SryI2y3SDaXSHqEotpV/e2tnNaStqtri5u\nj3S2MahZQkNTMpLw3B8sz0J+gLHBEJ7PS7LkPH6+z/f7fGOez+Pn19dmGIaBiIhYzlVz3QAREZkb\nCgAREYtSAIiIWJQCQETEohQAIiIWpQAQEbGopALg7NmzeL1e7rnnHgDq6+txuVx4vV68Xi8HDhww\n521sbMTtdlNQUEB3d7c5vaenh6KiItxuN3V1dTO8GiIiMl1JBcDOnTvxeDzYbDYAbDYbjz32GL29\nvfT29nLXXXcBEA6H6ejoIBwOEwgE2L59O+duM6itraWlpYVIJEIkEiEQCMzSKomISDKmDIBYLMb+\n/fvZunWruTE3DINL3T/W1dVFVVUVdrudvLw88vPzCYVCJBIJhoeH8fl8AFRXV9PZ2TnDqyIiItMx\nZQA8+uijPPPMM1x11Z9mtdls7Nq1i+LiYrZs2cLQ0BAA/f39uFwucz6Xy0U8Hr9outPpJB6Pz+R6\niIjINGVN9uFLL71ETk4OXq+XYDBoTq+treWpp54C4Mknn+Txxx+npaVlRhp07jCTiIgkL5Wn+kz6\nC+C1115j3759LF++nKqqKv77v/+b6upqcnJysNls2Gw2tm7dysGDB4HxPftoNGqWj8ViuFwunE4n\nsVhswnSn0znpilj99fTTT895G+bLS32hvlBfTP5K1aQBsGPHDqLRKH19fbS3t/O1r32NF154gUQi\nYc7z4osvUlRUBEB5eTnt7e2Mjo7S19dHJBLB5/ORm5tLdnY2oVAIwzBoa2ujoqIi5UaLiEj6Jj0E\ndD7DMMzDM3/3d3/H22+/jc1mY/ny5ezevRsAj8dDZWUlHo+HrKwsmpubzTLNzc1s3ryZkZERysrK\n2Lhx4yysjoiIJMtmpPP7YRbYbLa0ftIsFMFgEL/fP9fNmBfUF3+ivvgT9cWfpLrdVACIiFzhUt1u\n6lEQIiIWpQAQEbEoBYCQnb3UvKx3uq/s7KVz3XwRSZHOAcgfr9RKtc/1/yUy13QOQEREpkUBICJi\nUQoAERGLUgCIiFiUAkBExKIUACIiFqUAEBGxKAWAiIhFKQBERCxKASAiYlEKABERi0oqAM6ePYvX\n6+Wee+4B4NixY5SWlrJy5UrWr1/P0NCQOW9jYyNut5uCggK6u7vN6T09PRQVFeF2u6mrq5vh1RAR\nkelKKgB27tyJx+Mxh3dsamqitLSUw4cPU1JSQlNTEwDhcJiOjg7C4TCBQIDt27ebDyiqra2lpaWF\nSCRCJBIhEAjM0iqJiEgypgyAWCzG/v372bp1q7kx37dvHzU1NQDU1NTQ2dkJQFdXF1VVVdjtdvLy\n8sjPzycUCpFIJBgeHsbn8wFQXV1tlhERkbkx5aDwjz76KM888wwnTpwwpw0ODuJwOABwOBwMDg4C\n0N/fz7p168z5XC4X8Xgcu92Oy+UypzudTuLx+GXrrK+vN9/7/X6N+ykicp5gMEgwGEx7OZMGwEsv\nvUROTg5er/eylZ0bGGQmnR8AIiIy0YU7xg0NDSktZ9IAeO2119i3bx/79+/n1KlTnDhxgvvvvx+H\nw8HAwAC5ubkkEglycnKA8T37aDRqlo/FYrhcLpxOJ7FYbMJ0p9OZUoNFRGRmTHoOYMeOHUSjUfr6\n+mhvb+drX/sabW1tlJeX09raCkBraysVFRUAlJeX097ezujoKH19fUQiEXw+H7m5uWRnZxMKhTAM\ng7a2NrOMiIjMjSnPAZzv3KGe7373u1RWVtLS0kJeXh579+4Fw
OPxUFlZicfjISsri+bmZrNMc3Mz\nmzdvZmRkhLKyMjZu3DjDqyIiItOhMYFFYwKLXOE0JrCIiEyLAkBExKIUACIiFqUAEBGxKAWAiIhF\nKQBERCxKASAiYlEKABERi5rWncAyP2VnL2V4+Pgc1Z6V1sMAFy1awokTx2awPSKSLN0JvACkdycv\nQHp3Aqdbt/6/RdKjO4FFRGRaFAAiIhalABARsSgFgIiIRSkAREQsSgEgImJRkwbAqVOnWLt2LatX\nr8bj8fDEE08A44O2u1wuvF4vXq+XAwcOmGUaGxtxu90UFBTQ3d1tTu/p6aGoqAi3201dXd0srY6I\niCRryvsAPv30U6699lrOnDnDV77yFb7//e/zi1/8gkWLFvHYY49NmDccDvONb3yDN954g3g8zp13\n3kkkEsFms+Hz+Xjuuefw+XyUlZXx8MMPX3JYSN0HMH26D0DE2mbtPoBrr70WgNHRUc6ePcuSJUsA\nLllZV1cXVVVV2O128vLyyM/PJxQKkUgkGB4exufzAVBdXU1nZ+e0GysiIjNnykdBjI2Nceutt/LB\nBx9QW1vLLbfcwk9/+lN27drFCy+8wJo1a3j22We54YYb6O/vZ926dWZZl8tFPB7HbrfjcrnM6U6n\nk3g8ftk66+vrzfd+vx+/35/a2omILEDBYJBgMJj2cqYMgKuuuopDhw7x8ccfs2HDBoLBILW1tTz1\n1FMAPPnkkzz++OO0tLSk3Zhzzg8AERGZ6MId44aGhpSWk/RVQIsXL+buu+/mzTffJCcnB5vNhs1m\nY+vWrRw8eBAY37OPRqNmmVgshsvlwul0EovFJkx3Op0pNVhERGbGpAFw9OhRhoaGABgZGeHnP/85\nXq+XgYEBc54XX3yRoqIiAMrLy2lvb2d0dJS+vj4ikQg+n4/c3Fyys7MJhUIYhkFbWxsVFRWzuFoi\nIjKVSQ8BJRIJampqGBsbY2xsjPvvv5+SkhKqq6s5dOgQNpuN5cuXs3v3bgA8Hg+VlZV4PB6ysrJo\nbm42HxXc3NzM5s2bGRkZoays7JJXAImISObocdALgC4DFbE2PQ5aRESmRQEgImJRCgAREYtSAIiI\nWJQCQETEohQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIiFqUA\nEBGxKAWAiIhFTRoAp06dYu3ataxevRqPx8MTTzwBwLFjxygtLWXlypWsX7/eHDYSoLGxEbfbTUFB\nAd3d3eb0np4eioqKcLvd1NXVzdLqiIhIsiYNgM985jO8/PLLHDp0iHfeeYeXX36ZX/3qVzQ1NVFa\nWsrhw4cpKSmhqakJgHA4TEdHB+FwmEAgwPbt281Rampra2lpaSESiRCJRAgEArO/diIicllTHgK6\n9tprARgdHeXs2bMsWbKEffv2UVNTA0BNTQ2dnZ0AdHV1UVVVhd1uJy8vj/z8fEKhEIlEguHhYXw+\nHwDV1dVmGRERmRuTDgoPMDY2xq233soHH3xAbW0tt9xyC4ODgzgcDgAcDgeDg4MA9Pf3s27dOrOs\ny+UiHo9jt9txuVzmdKfTSTwev2yd9fX15nu/34/f75/ueomILFjBYJBgMJj2cqYMgKuuuopDhw7x\n8ccfs2HDBl5++eUJn9tstj8OSj5zzg8AERGZ6MId44aGhpSWk/RVQIsXL+buu++mp6cHh8PBwMAA\nAIlEgpycHGB8zz4ajZplYrEYLpcLp9NJLBabMN3pdKbUYBERmRmTBsDRo0fNK3xGRkb4+c9/jtfr\npby8nNbWVgBaW1upqKgAoLy8nPb2dkZHR+nr6yMSieDz+cjNzSU7O5tQKIRhGLS1tZllRERkbkx6\nCCiRSFBTU8PY2BhjY2Pcf//9lJSU4PV6qayspKWlhby8PPbu3QuAx+OhsrISj8dDVlYWzc3N5uGh\n5uZmNm/ezMjICGVlZWzcu
HH2105ERC7LZpy7TnOesNlszLMmzXvjIZtOn6VTPv269f8tkp5Ut5u6\nE1hExKIUACIiFqUAEBGxKAWAiIhFKQBkjmWZNxNO95WdvXSuGy9yRdNVQAvAlX4VUDp167sioquA\nRERkmhQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKKmDIBoNModd9zB\nLbfcQmFhIT/4wQ+A8XF7XS4XXq8Xr9fLgQMHzDKNjY243W4KCgro7u42p/f09FBUVITb7aaurm4W\nVkdERJI15aMgBgYGGBgYYPXq1XzyySf85V/+JZ2dnezdu5dFixbx2GOPTZg/HA7zjW98gzfeeIN4\nPM6dd95JJBLBZrPh8/l47rnn8Pl8lJWV8fDDD180MpgeBTF9ehSEiLXN2qMgcnNzWb16NQDXX389\nN998M/F4HOCSFXZ1dVFVVYXdbicvL4/8/HxCoRCJRILh4WF8Ph8A1dXVdHZ2TrvBC1V29tKUH4om\nIpKKaZ0DOHLkCL29vaxbtw6AXbt2UVxczJYtW8zB4/v7+3G5XGYZl8tFPB6/aLrT6TSDRGB4+Djj\ne8KpvEREpm/SQeHP98knn3Dvvfeyc+dOrr/+empra3nqqacAePLJJ3n88cdpaWmZkUbV19eb7/1+\nP36/f0aWKyKyEASDQYLBYNrLSSoATp8+zaZNm/jmN79JRUUFADk5OebnW7du5Z577gHG9+yj0aj5\nWSwWw+Vy4XQ6icViE6Y7nc5L1nd+AIiIyEQX7hg3NDSktJwpDwEZhsGWLVvweDw88sgj5vREImG+\nf/HFFykqKgKgvLyc9vZ2RkdH6evrIxKJ4PP5yM3NJTs7m1AohGEYtLW1mWEiIiKZN+UvgFdffZUf\n/ehHrFq1Cq/XC8COHTv4yU9+wqFDh7DZbCxfvpzdu3cD4PF4qKysxOPxkJWVRXNzs3misrm5mc2b\nNzMyMkJZWdlFVwCJiEjmaESweSK9Szl1GaiIlWlEMBERmRYFgIiIRSkAREQsSgEgImJRCgAREYtS\nAIiIWJQCQETEohQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIi\nFjVlAESjUe644w5uueUWCgsL+cEPfgDAsWPHKC0tZeXKlaxfv56hoSGzTGNjI263m4KCArq7u83p\nPT09FBUV4Xa7qaurm4XVERGRZE0ZAHa7nX/5l3/h3Xff5de//jX/+q//ynvvvUdTUxOlpaUcPnyY\nkpISmpqaAAiHw3R0dBAOhwkEAmzfvt0cqaa2tpaWlhYikQiRSIRAIDC7ayciIpc1ZQDk5uayevVq\nAK6//npuvvlm4vE4+/bto6amBoCamho6OzsB6OrqoqqqCrvdTl5eHvn5+YRCIRKJBMPDw/h8PgCq\nq6vNMiIiknnTOgdw5MgRent7Wbt2LYODgzgcDgAcDgeDg4MA9Pf343K5zDIul4t4PH7RdKfTSTwe\nn4l1EBGRFGQlO+Mnn3zCpk2b2LlzJ4sWLZrwmc1m++Og5jOjvr7efO/3+/H7/TO2bBGRK10wGCQY\nDKa9nKQC4PTp02zatIn777+fiooKYHyvf2BggNzcXBKJBDk5OcD4nn00GjXLxmIxXC4XTqeTWCw2\nYbrT6bxkfecHgMjlZaW147Fo0RJOnDg2g+0RyYwLd4wbGhpSWs6Uh4AMw2DLli14PB4eeeQRc3p5\neTmtra0AtLa2msFQXl5Oe3s7o6Oj9PX1EYlE8Pl85Obmkp2dTSgUwjAM2trazDIiqTkDGCm/hoeP\nz0GbReYPm3HuEp3L+NWvfsVXv/pVVq1aZe5tNTY24vP5qKys5He/+x15eXns3buXG264AYAdO3aw\nZ88esrKy2LlzJxs2bADGLwPdvHkzIyMjlJWVmZeUTmiQzcYUTVqQxvs21fVOp2y65a/suq3
4XZOF\nJ9Xt5pQBkGkKgJRKp1E23fJXdt1W/K7JwpPqdlN3AouIWJQCQETEohQAIiIWpQAQEbEoBYCIiEUp\nAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIiFqUAEBGxKAWAiIhFKQBERCxKASAiYlEKABER\ni5oyAB588EEcDgdFRUXmtPr6elwuF16vF6/Xy4EDB8zPGhsbcbvdFBQU0N3dbU7v6emhqKgIt9tN\nXV3dDK+GiIhM15QB8MADDxAIBCZMs9lsPPbYY/T29tLb28tdd90FQDgcpqOjg3A4TCAQYPv27eYo\nNbW1tbS0tBCJRIhEIhctU0REMmvKALj99ttZsmTJRdMvNfxYV1cXVVVV2O128vLyyM/PJxQKkUgk\nGB4exufzAVBdXU1nZ+cMNF9ERFKVlWrBXbt28cILL7BmzRqeffZZbrjhBvr7+1m3bp05j8vlIh6P\nY7fbcblc5nSn00k8Hr/ssuvr6833fr8fv9+fajNFRBacYDBIMBhMezkpBUBtbS1PPfUUAE8++SSP\nP/44LS0taTfmnPMDQEREJrpwx7ihoSGl5aR0FVBOTg42mw2bzcbWrVs5ePAgML5nH41GzflisRgu\nlwun00ksFpsw3el0ptTg+So7e6nZJ6m8REQyLaUASCQS5vsXX3zRvEKovLyc9vZ2RkdH6evrIxKJ\n4PP5yM3NJTs7m1AohGEYtLW1UVFRMTNrME8MDx8HjDReIiKZNeUhoKqqKl555RWOHj3KsmXLaGho\nIBgMcujQIWw2G8uXL2f37t0AeDweKisr8Xg8ZGVl0dzcbO7dNjc3s3nzZkZGRigrK2Pjxo2zu2Yi\nIjIpm3Gpy3nmkM1mu+QVRvPdeNCl0+50yqvuVMtfid81kQulut3UncAiIhalABARsSgFgIiIRSkA\nxMKyUr5sNzt76Vw3XiRtKd8JLHLlO0OqJ5GHh3Xvhlz59AtARMSiFAAiIhalABARsSgFgIiIRSkA\nREQsSgEgImJRCgAREYtSAIiIWJQCQETEohQAIiIWpQAQEbGoKQPgwQcfxOFwmMM+Ahw7dozS0lJW\nrlzJ+vXrGRoaMj9rbGzE7XZTUFBAd3e3Ob2np4eioiLcbjd1dXUzvBoiIjJdUwbAAw88QCAQmDCt\nqamJ0tJSDh8+TElJCU1NTQCEw2E6OjoIh8MEAgG2b99ujlJTW1tLS0sLkUiESCRy0TJFRCSzpgyA\n22+/nSVLlkyYtm/fPmpqagCoqamhs7MTgK6uLqqqqrDb7eTl5ZGfn08oFCKRSDA8PIzP5wOgurra\nLCMiInMjpcdBDw4O4nA4AHA4HAwODgLQ39/PunXrzPlcLhfxeBy73Y7L5TKnO51O4vH4ZZdfX19v\nvvf7/fj9/lSaKSKyIAWDQYLBYNrLSXs8gHMDZMyk8wNAREQmunDHuKGhIaXlpHQVkMPhYGBgAIBE\nIkFOTg4wvmcfjUbN+WKxGC6XC6fTSSwWmzDd6XSm1GAREZkZKQVAeXk5ra2tALS2tlJRUWFOb29v\nZ3R0lL6+PiKRCD6fj9zcXLKzswmFQhiGQVtbm1lGRETmiDGF++67z7jxxhsNu91uuFwuY8+ePcbv\nf/97o6SkxHC73UZpaalx/Phxc/7vfe97xooVK4ybbrrJCAQC5vQ333zTKCwsNFasWGH87d/+7WXr\nS6JJ8xJggJHGK53yqnsu6haZL1L9Ptr+WHjesNlszLMmJWX8PEg67U6nvOqei7qvxO+pLEypbjd1\nJ7CIiEUpAERELEoBICJiUQoAkZRkmffApPLKzl461ysgkv6NYAtJdvZShoePz3Uz5IpwhnROQA8P\nz+zNkyKp0FVAF9R9pV6RorqvvLrn2Z+eXMF0FZCIiEyLAkBExKIUACIiFqUAEBGxKAWAiIhFKQBE\nRCxKASAiYlEKABERi1IAiIhYlAJARMSi0gqAvLw8Vq1
ahdfrxefzAXDs2DFKS0tZuXIl69evZ2ho\nyJy/sbERt9tNQUEB3d3d6bVcRETSklYA2Gw2gsEgvb29HDx4EICmpiZKS0s5fPgwJSUlNDU1ARAO\nh+no6CAcDhMIBNi+fTtjY2Ppr4GIiKQk7UNAFz6AaN++fdTU1ABQU1NDZ2cnAF1dXVRVVWG328nL\nyyM/P98MDRERyby0Hgdts9m48847ufrqq/n2t7/Ntm3bGBwcxOFwAOBwOBgcHASgv7+fdevWmWVd\nLhfxePySy62vrzff+/1+/H5/Os0UEVlQgsEgwWAw7eWkFQCvvvoqN954Ix999BGlpaUUFBRM+Pzc\n4BeXc7nPzg8AERGZ6MId44aGhpSWk9YhoBtvvBGAz33uc3z961/n4MGDOBwOBgYGAEgkEuTk5ADg\ndDqJRqNm2VgshtPpTKd6kStY6iOKaTQxmSkpB8Cnn37K8PAwACdPnqS7u5uioiLKy8tpbW0FoLW1\nlYqKCgDKy8tpb29ndHSUvr4+IpGIeeWQiPWcG1Fs+i+NWiczJeVDQIODg3z9618H4MyZM/z1X/81\n69evZ82aNVRWVtLS0kJeXh579+4FwOPxUFlZicfjISsri+bm5kkPD4mIyOzSkJAX1G3V4QlV95VV\n9zz7s5U5piEhRURkWhQAIiIWpQAQEbGotO4DmG+ys5fqCgkRkSQtqAAY3/ine2JORMQadAhIRMSi\nFAAiV5zU7yLWncRyvgV1CEjEGs7dRZya4WEd6pRx+gUgImJR8/IXwNat35nrJoiILHjz8lEQsCuF\nkgHgZ1j10QCqW3VPp/w8+7OXNKX6KIh5+QsAUvkFMMR4AIiISDJ0DkDEcjQWgYybp78ARGT2pH4V\nka4gWlj0C0BExKIyHgCBQICCggLcbjf/9E//lOnqryDBuW7APBKc6wbMI8G5bsC8MRODoltdRgPg\n7NmzfOc73yEQCBAOh/nJT37Ce++9l8kmXEGCc92AeSQ41w2YR4JzXP/8uQtZAZC+jAbAwYMHyc/P\nJy8vD7vdzn333UdXV1cmmyAiaUl9LOPx8YyH50V4yLiMngSOx+MsW7bM/LfL5SIUCl003+LF90x7\n2adOHeYPf0ireSIy69I5AW2/aBzxhoaGpMsvWrSEEyeOpVR3uo+aT6fu2ZTRAEh2EPiPP34pnVrS\nKJtu+ZmuO/kv98Ja70u5XF8s9PVW3TNlePh40tughVT3ZDIaAE6nk2g0av47Go3icrkmzKM7FEVE\nMiOj5wDWrFlDJBLhyJEjjI6O0tHRQXl5eSabICIif5TRXwBZWVk899xzbNiwgbNnz7JlyxZuvvnm\nTDZBRET+KOP3Adx1113s3LmTrKws9uzZc9l7AR5++GHcbjfFxcX09vZmuJWZM9V9ET/+8Y8pLi5m\n1apVfPnLX+add96Zg1ZmRrL3iLzxxhtkZWXxX//1XxlsXWYl0xfBYBCv10thYSF+vz+zDcygqfri\n6NGjbNy4kdWrV1NYWMgPf/jDzDcyAx588EEcDgdFRUWXnWfa200jw86cOWOsWLHC6OvrM0ZHR43i\n4mIjHA5PmOdnP/uZcddddxmGYRi//vWvjbVr12a6mRmRTF+89tprxtDQkGEYhnHgwAFL98W5+e64\n4w7j7rvvNn7605/OQUtnXzJ9cfz4ccPj8RjRaNQwDMP46KOP5qKpsy6Zvnj66aeN7373u4ZhjPfD\n0qVLjdOnT89Fc2fV//zP/xhvvfWWUVhYeMnPU9luZvwXQDL3Auzbt4+amhoA1q5dy9DQEIODg5lu\n6qxLpi9uu+02Fi9eDIz3RSwWm4umzrpk7xHZtWsX9957L5/73OfmoJWZkUxf/Md//AebNm0yL6L4\n7Gc/OxdNnXXJ9MWNN97IiRMnADhx4gR//ud/TlbWwnvM2e23386SJUsu+3kq282MB8Cl7gWIx+NT\nzrMQN3zJ9MX5Wlp
aKCsry0TTMi7Z70VXVxe1tbVA8pcVX2mS6YtIJMKxY8e44447WLNmDW1tbZlu\nZkYk0xfbtm3j3Xff5S/+4i8oLi5m586dmW7mvJDKdjPjMZnsH61xweWgC/GPfTrr9PLLL7Nnzx5e\nffXVWWzR3EmmLx555BGamprMwS8u/I4sFMn0xenTp3nrrbf4xS9+waeffsptt93GunXrcLvdGWhh\n5iTTFzt27GD16tUEg0E++OADSktLefvtt1m0aFEGWji/THe7mfEASOZegAvnicViOJ3OjLUxU5Lp\nC4B33nmHbdu2EQgEJv0JeCVLpi96enq47777gPETfwcOHMButy+4S4mT6Ytly5bx2c9+lmuuuYZr\nrrmGr371q7z99tsLLgCS6YvXXnuNf/iHfwBgxYoVLF++nP/93/9lzZo1GW3rXEtpuzljZyiSdPr0\naeOLX/yi0dfXZ/zhD3+Y8iTw66+/vmBPfCbTFx9++KGxYsUK4/XXX5+jVmZGMn1xvs2bNxv/+Z//\nmcEWZk4yffHee+8ZJSUlxpkzZ4yTJ08ahYWFxrvvvjtHLZ49yfTFo48+atTX1xuGYRgDAwOG0+k0\nfv/7389Fc2ddX19fUieBk91uZvwXwOXuBdi9ezcA3/72tykrK2P//v3k5+dz3XXX8e///u+ZbmZG\nJNMX//iP/8jx48fN4952u52DBw/OZbNnRTJ9YRXJ9EVBQQEbN25k1apVXHXVVWzbtg2PxzPHLZ95\nyfTF3//93/PAAw9QXFzM2NgY//zP/8zSpQvvwXFVVVW88sorHD16lGXLltHQ0MDp06eB1Leb825Q\neBERyQyNCCYiYlEKABERi1IAiIhYlAJARMSiFt790iIXuPrqq1m1apX5766uLj7/+c/PWn3XX389\nn3zyyawtX2Sm6CogWfAWLVrE8PDwJT879/WfyTvNJ6tPZD7RISCxnCNHjnDTTTdRU1NDUVER0WiU\n7du386UvfYnCwkLq6+vNeffv38/NN9/MmjVrePjhh7nnnvHxqj/66CNKS0spLCxk27Zt5OXlcezY\nxWO+PvPMM/h8PoqLiycsV2Q+UADIgjcyMoLX68Xr9bJp0yZsNhvvv/8+Dz30EL/97W/5/Oc/z/e+\n9z3eeOMN3n77bV555RV+85vfcOrUKf7mb/6GQCDAm2++ydGjR81fCg0NDdx555389re/5d577+V3\nv/vdRfV2d3fz/vvvc/DgQXp7e+np6eGXv/xlpldf5LJ0DkAWvGuuuWbC4BhHjhzhC1/4Aj6fz5zW\n0dHB8892g7HPAAABy0lEQVQ/z5kzZ0gkEoTDYc6ePcsXv/hFvvCFLwDjd2L+27/9GwCvvvoqnZ2d\nAGzYsOGSz2jq7u6mu7sbr9cLwMmTJ3n//fe5/fbbZ21dRaZDASCWdN1115nv+/r6ePbZZ3nzzTdZ\nvHgxDzzwAKdOnbrovMCFp8uSOX32xBNP8K1vfWtmGi0yw3QISCzvxIkTXHfddWRnZzM4OMiBAwew\n2WzcdNNN/N///R8ffvghMP4r4VwofPnLX2bv3r3A+J7+8ePHL1ruhg0b2LNnDydPngTGn9f+0Ucf\nZWitRKamXwCy4F3qCp/zpxUXF+P1eikoKGDZsmV85StfAeAzn/kMzc3NbNy4keuuu44vfelLZrmn\nn36aqqoq2trauO2228jNzTWfP39untLSUt577z1uu+02YPzqoB/96EcLejQzubLoMlCRSZw8edI8\nXPTQQw+xcuVK6urqGB0d5eqrr+bqq6/m9ddf56GHHuKtt96a49aKTI9+AYhM4vnnn6e1tZXR0VFu\nvfVW87HUH374IX/1V3/F2NgYf/Znf8bzzz8/xy0VmT79AhARsSidBBYRsSgFgIiIRSkAREQsSgEg\nImJRCgAREYtSAIiIWNT/A2Y3uW6G2JxcAAAAAElFTkSuQmCC\n", "text": [ 
"" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Now do the same thing for the related compound pairs." ] }, { "cell_type": "code", "collapsed": false, "input": [ "scoredLists = cPickle.load(gzip.open('../data/chembl16_25K.pairs.sims.pkl.gz','rb'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": true, "input": [ "t1=time.time()\n", "rl=[]\n", "for i,(m1,m2) in enumerate(zip(ms1,ms2)):\n", " sim,frag= FraggleSim.GetFraggleSimilarity(m1[-1],m2[-1])\n", " rl.append((sim,i))\n", " if not (i%200):\n", " print 'Done: %d in %.2f seconds'%(i,time.time()-t1)\n", "t2=time.time()\n", "print 'Finished in %.2f seconds'%(t2-t1)\n", "scoredLists['Fraggle']=rl" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Done: 0 in 0.10 seconds\n", "Done: 200 in 37.79 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 400 in 83.12 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 600 in 133.13 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 800 in 174.72 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 1000 in 226.38 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 1200 in 274.27 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 1400 in 322.24 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 1600 in 366.22 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 1800 in 408.76 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 2000 in 460.27 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 2200 in 504.41 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 2400 in 
543.93 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 2600 in 591.81 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 2800 in 635.61 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 3000 in 681.73 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 3200 in 728.26 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 3400 in 771.40 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 3600 in 813.29 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 3800 in 861.38 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 4000 in 906.49 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 4200 in 954.90 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 4400 in 997.52 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 4600 in 1041.82 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 4800 in 1088.03 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 5000 in 1134.05 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 5200 in 1170.79 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 5400 in 1211.28 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 5600 in 1257.09 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 5800 in 1301.07 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 6000 in 1343.60 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 6200 in 1385.13 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 6400 in 1425.09 
seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 6600 in 1471.67 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 6800 in 1513.88 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 7000 in 1560.01 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 7200 in 1603.64 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 7400 in 1647.56 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 7600 in 1692.30 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 7800 in 1737.26 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 8000 in 1781.66 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 8200 in 1828.17 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 8400 in 1871.50 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 8600 in 1915.69 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 8800 in 1956.71 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 9000 in 1997.96 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 9200 in 2040.47 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 9400 in 2085.69 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 9600 in 2133.86 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 9800 in 2185.82 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 10000 in 2234.24 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 10200 in 2284.10 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 10400 in 
2333.33 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 10600 in 2375.41 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 10800 in 2418.13 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 11000 in 2470.55 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 11200 in 2512.55 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 11400 in 2553.32 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 11600 in 2598.75 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 11800 in 2646.64 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 12000 in 2692.88 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 12200 in 2741.21 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 12400 in 2783.86 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 12600 in 2828.30 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 12800 in 2872.25 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 13000 in 2918.02 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 13200 in 2959.99 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 13400 in 3007.89 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 13600 in 3050.15 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 13800 in 3099.61 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 14000 in 3145.79 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 14200 in 3190.81 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ 
"\n", "Done: 14400 in 3234.20 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 14600 in 3275.04 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 14800 in 3314.82 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 15000 in 3358.80 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 15200 in 3400.57 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 15400 in 3441.54 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 15600 in 3494.32 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 15800 in 3533.18 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 16000 in 3578.51 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 16200 in 3623.28 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 16400 in 3664.12 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 16600 in 3711.36 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 16800 in 3751.84 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 17000 in 3797.13 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 17200 in 3844.04 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 17400 in 3881.47 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 17600 in 3928.48 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 17800 in 3971.64 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 18000 in 4016.54 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 18200 in 4060.79 seconds" ] }, { "output_type": "stream", "stream": 
"stdout", "text": [ "\n", "Done: 18400 in 4106.77 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 18600 in 4149.58 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 18800 in 4190.75 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 19000 in 4237.42 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 19200 in 4279.87 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 19400 in 4328.97 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 19600 in 4373.51 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 19800 in 4415.70 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 20000 in 4458.43 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 20200 in 4505.40 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 20400 in 4549.35 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 20600 in 4591.15 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 20800 in 4632.82 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 21000 in 4675.24 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 21200 in 4722.26 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 21400 in 4763.14 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 21600 in 4804.33 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 21800 in 4850.55 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 22000 in 4893.26 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 22200 in 4935.05 seconds" ] }, { "output_type": 
"stream", "stream": "stdout", "text": [ "\n", "Done: 22400 in 4980.35 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 22600 in 5021.81 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 22800 in 5063.18 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 23000 in 5103.84 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 23200 in 5146.18 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 23400 in 5187.49 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 23600 in 5232.10 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 23800 in 5275.02 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 24000 in 5318.88 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 24200 in 5360.90 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 24400 in 5404.27 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 24600 in 5443.92 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Done: 24800 in 5488.28 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Finished in 5535.28 seconds" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "cPickle.dump(scoredLists,gzip.open('../data/chembl16_25K.pairs.sims2.pkl.gz','wb+'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the lists" ] }, { "cell_type": "code", "collapsed": false, "input": [ "scoredLists = cPickle.load(gzip.open('../data/chembl16_25K.pairs.sims2.pkl.gz','rb'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 
}, { "cell_type": "code", "collapsed": false, "input": [ "def directCompare(scoredLists,fp1,fp2,plotIt=True,silent=False):\n", "    # compare the similarity values fp1 and fp2 assign to the same compound pairs;\n", "    # returns Kendall's tau, Spearman's rho, and Pearson's r\n", "    l1 = scoredLists[fp1]\n", "    l2 = scoredLists[fp2]\n", "    vl1=[x[0] for x in l1]\n", "    vl2=[x[0] for x in l2]\n", "    if plotIt:\n", "        _=scatter(vl1,vl2,edgecolors='none')\n", "        maxvl1=max(vl1)\n", "        minvl1=min(vl1)\n", "        maxvl2=max(vl2)\n", "        minvl2=min(vl2)\n", "        # reference line joining the corners of the data range\n", "        _=plot((minvl1,maxvl1),(minvl2,maxvl2),color='k',linestyle='-')\n", "        xlabel(fp1)\n", "        ylabel(fp2)\n", "\n", "    tau,tau_p=stats.kendalltau(vl1,vl2)\n", "    spearman_rho,spearman_p=stats.spearmanr(vl1,vl2)\n", "    pearson_r,pearson_p = stats.pearsonr(vl1,vl2)\n", "    if not silent:\n", "        print fp1,fp2,tau,tau_p,spearman_rho,spearman_p,pearson_r,pearson_p\n", "    return tau,spearman_rho,pearson_r" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Fraggle algorithm makes use of the RDKit5 fingerprint, so let's look at the comparison to that."
] }, { "cell_type": "code", "collapsed": false, "input": [ "_=directCompare(scoredLists,'Fraggle','RDKit5')\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Fraggle RDKit5 0.510174399518 0.0 0.676266099876 0.0 0.734593163378 0.0\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEMCAYAAAA1VZrrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8TPf+P/DXZJHFmqARSQiJiIgklEQojRJBW9fFbaOt\nraiiVHdd0ful3C63i6La0pRW9XaLFvlpEVpKLAm1tLUEEaVI0hBksrx/f3w6yZzZM5mcM2fm/Xw8\nzkPOnDNn3omZ857PriEiAmOMMWaBh9IBMMYYc36cLBhjjFnFyYIxxphVnCwYY4xZxcmCMcaYVZws\nGGOMWaVIsnjwwQcRFBSEbt26mTz+ySefID4+HnFxcejbty8OHTokc4SMMcb0KZIsJk6ciKysLLPH\nO3bsiB07duDQoUN48cUX8dBDD8kYHWOMMUOKJIt+/fohICDA7PHk5GQ0b94cAJCUlIRz587JFRpj\njDETnL7N4sMPP8SwYcOUDoMxxtyal9IBWLJt2zasXLkSO3fuNDqm0WgUiIgxxtTPnlmenLZkcejQ\nIUyZMgXr1683W2VFRKrd5s6dq3gMHL/ycXD86tvUHDuR/VMBOmWyOHv2LEaOHIk1a9YgMjJS6XAY\nY8ztKVINNWbMGGzfvh2XL19GWFgY5s+fj4qKCgDA1KlT8fLLL6O4uBjTpk0DAHh7eyMnJ0eJUBlj\njEGhZLF27VqLxz/44AN88MEHMkWjjJSUFKVDqBeOX1kcv3LUHHt9aKg+lVgK0mg09ap/Y4wxd2Tv\nvdMp2ywYY4w5F04WjDHGrOJkwRhjzCpOFowxxqziZMEYY8wqThaMMcas4mTBGGPMKk4WjDHGrOJk\nwRhjzCpOFowxxqziZMEYY8wqThaMMcas4mTBGGPMKk4WjDHGrOJkwRhjzCpOFowxxqziZMEYY8wq\nThaMMcas4mTBGGPMKk4WjDHGrOJkwRhjzCpOFowxxqySPVk8+OCDCAoKQrdu3cyeM2vWLHTq1Anx\n8fHIzc2VMTrGGGOmyJ4sJk6ciKysLLPHN27ciBMnTuD48eNYsWIFpk2bJmN0DW/DBsDTE9BogJAQ\npaNhjNlKo5Fu7kb2ZNGvXz8EBASYPb5+/XqMHz8eAJCUlISSkhJcvHhRrvAa3F13AdXV4ufz54HE\nRGXjYYxZ9/LL+nvXALhfwvBSOgBDhYWFCAsLq9kPDQ3FuXPnEBQUZHTuvHnzan5OSUlBSkqKDBE6\n1uHDSkfAGDPn//0/4MEHxRc7QAtgAYB1AA7DCW+fJmVnZyM7O7ve13HK35aIJPsaMylcP1mola+v\n0hEwxkxJTwfWrdPt5QKYACAMwFY46a3TJMMv0vPnz7frOk7XGyokJAQFBQU1++fOnUOIC1fuu1tR\nljFnV1UFPP20LlFoAcwFkAbgCQDfAmirYHTKcbpkMXz4cHz88ccAgN27d6NFixYmq6DUyjA59Oyp\nTByMMWN79gD+/sCrrwKiNNELwH4AeQDGAXDfb3eyl6XGjBmD7du34/LlywgLC8P8+fNRUVEBAJg6\ndSqGDRuGjRs3IjIyEo0bN8aqVavkDrFBbdokGrkrK4HISMBCxzDGmEzefBN46inxuaxtm1gG4DUA\nY+
HOSUJHQ4YNBCqh0WiM2jYYY6wu8vKA3r2B8nLdI/ptEytgrcpJjbcge++dnCwYY26rSROgrAyw\npzTh7Q1otQ0doePZe+90ujYLxhhrSFVVwLJlQECALlHY1zahGy/lLjhZMMbcQlUV8MILgJ8fMH06\nUFJSv55OVVUNE6ezUk9nYcYYs1NRERAVBVy5ontEv20iD+7aHbYuuGTBGHNp69YBQUG6RMHjJuzF\nyYIx5pKqqoCFC8VIbNEllsdN1AdXQzHGXNKMGcB77wE8bsIxOFkwxlzGoUPAzp3AwYO6RMFtE47C\nyYIxpnpVVcDjjwNLlui6tHJpwtE4WTDGVO3DD0WX2AsXdI9waaIhcLJgjKlSVRWQkKC/JgyXJhoS\n94aS2YkTQL9+QPv2wHPPKR0NY+r0ww/A0KH6iYJ7OjU0nhtKZvHxohFOZ+1a0bWPMWabhQuB55/X\n7SlbmlDhLYjnhlKLo0el+5mZysTBmNpUVADjxgEvvaR7hEsTcuI2C5l5eekGCAmtWysXC2NqUVEB\ntGwJXL0KKF2a0HG3VS65ZCGz+++v/blRI2DiROViYczZrVkDBAeL1etEonCe0oQaq6Dqg0sWMnvj\nDeDsWaCwEJg0CejeXemIGHM+1dXAZ5+JaidxU3aO0oQ74wZumY0eDXz5pfjZwwPIzha9oxhjAhEw\nYACwfbvukbqtXicXjUada1pwA7dKZGfX/lxdDezYoVgojDmlgwd1icK5Z4j18VE6AnlxspBZjx6W\n9xlzV3/+CTzwgK4dz3naJsxxt8WPuBpKZpcvA088ARQUAPfdB0yerHREjClnxw4xbsLTE9i9Gygq\nUk/bRGCg/mJK6mHvvZMbuGUWEAD06SMaublUwdxZYSEwbJhuHWxAbXM6uVu3dy5ZyOzhh3VTJwO+\nvuLbVHy8sjExJrfSUmDvXmDQIECtPZ28vQGtVuko6k5VDdxZWVmIjo5Gp06dsHjxYqPjly9fxpAh\nQ5CQkIDY2Fh89NFH8gfZQHQ9oQDg5k1g40blYmFMbhcuiC9HzZvrqmCdv23CHDX2hKoP2ZNFVVUV\nHnnkEWRlZeHo0aNYu3Ytjh07JjlnyZIl6N69O/Ly8pCdnY0nnngClfrDnlWsQwfpfseOysTBmNzK\nyoAnn9TNjabF6dPO29PJFv7+SkcgL9mTRU5ODiIjIxEeHg5vb2+kp6cj02CCpODgYJSWlgIASktL\n0bJlS3h5uUbzyi231P6s0QBhYcrFwpgc/voLuO02oEkT4JNPADWXJvSVlysdgbxkvwMXFhYiTO8O\nGRoaij179kjOmTJlCu644w60bdsWV69exeeff27yWvPmzav5OSUlBSkpKQ0RskPt3l37MxGwbZto\n8GbMlVy/DmRkiESRnS2WOlVr24Q5ammvyM7ORrb+AC87yZ4sNDbMvrVw4UIkJCQgOzsbJ0+eRGpq\nKg4ePIimTZtKztNPFmoRHw9s3SrdZ8yVVFQAMTHAmTP6j6qrp5MrMfwiPX/+fLuuI3s1VEhICAoK\nCmr2CwoKEBoaKjln165d+Ne//gUAiIiIQIcOHfDbb7/JGmdD+fRT4N57geRksV7wXXcpHRFjjrNj\nBxAXp58onHsUdn34+SkdgbxkL1n07NkTx48fx+nTp9G2bVusW7cOa9eulZwTHR2NH374AX379sXF\nixfx22+/oaOLtAQHBYkJ0hhzNU8/Dbz6qv4jrl2aCAhQOgJ5yZ4svLy8sGTJEqSlpaGqqgqTJk1C\nly5d8N7fgw+mTp2K5557DhMnTkR8fDyqq6vxn//8B4GBgXKHyhizkf74oYZum/D1Fd3OlXb5stIR\nyIsH5THG6s3bW7eoV8POEOvjA3z3HZCa6tDL2oVnnWWMsTooKQGqq+VpmygvB4YOFW1+TF6cLBhj\ndVJaCtx9txiU1qwZEBiYi+pq+cZNVFYCR44A7ds32EvYxMPN7p5c
DcUYs8mNG8D+/aJ94sgRwNXG\nTdSVh4c6pynnWWcZYw1mzx6xet2NG7pHXLunky14ug/W4CoqgKIipaNgzDaLFgF9++oShWuOm7Bn\nzIS7TdXDyUJm2dlAq1ZAy5bA4MHuN78MU5f584Fnn9VVt7jGnE6m1JaYbGcw/6nL4zYLmQUEiN4j\nOgsXig8jY87I3x+4ccO92yYsUeEtiLvOqoV+ogCAFSuUiYMxc8aOBVq00CUK1y1NsLrhZKEwT0+l\nI2Cs1rPPAmvWAH/9pcWNG/Vvm2jTxvgx/blE7Xn/N2lS9+c0BHf77HJvKJm1bQucP1+7//zzysXC\nmM6XX4rJ/8TClY7r6XThgvFjRCJhpKYCmzfX/ZrXrtkdjkO52ySgnCxkFhMjTRadOysXC2Nnz4p1\nsI8fB+QcN0EEbNnSIJeWjbstL8AN3DLz8pIO5Bk9Gvjf/5SLh7mv/fuBpCT9nk4T0FBzOrmiyEhd\nklUXHpSnEoajPt1tTnymrKwsYNYsMdZHowGqqtTb00mjUbY3kqkqNlfGJQuZ6U/l7OMD5OYCXboo\nGxNzD3/9BYSEAGVluke4NGGNl5duNl3TVHgL4q6zarF8uUgWTzwB7N3LiYLJo7QUKCjQJQrXHIXt\naE8/bTlRuBuuhpLZJ58A06eLqqivvwby8gCDpcUZc6jRo0VvJ0F9czq1aiX/QkOJiWKzxFm68MqF\nq6Fk5usrneJj2jRg6VLl4mGuiwgYMwZYtw5w9xli68LPT5TAiouB6Gjg0iXT57VpA/zxh7yxOQJX\nQ6mE4VxQO3cqEwdzTZcvAxcvip9HjtQlCh6FXRc3bojqp8BA0aYYHW36PHerEeBkITPDUZ/duikT\nB3M9CxcCt9wivvEmJgLffMNtE/bSzdcWEiLW7vjtN+PR6O426yxXQ8ksPV33bU90/fvlF6BrV2Vj\nYuq3bx/Qq5f+I+ro6eTj47wzL//wg1hbPD8fePFF0UFAX0QEcOKEMrHVB4+zUAn96Q2IRCM3JwtW\nH3v2AAMH6vbU1Tah1SodgXmDBlk+7izTjsiFq6FkVlws3f/mG2XiYOpXVATMmwfcf7+uS6z62iYa\nNVI6Avt5eysdgby4ZCEzw1GnwcHKxcLU68wZMVWHaMxWV2lCnzNUQXl5icbq7t2B7dttX1fby83u\nnoqULLKyshAdHY1OnTphsZjm0kh2dja6d++O2NhYpKSkyBtgAxowQLo/c6YycTD1mjoVCA/XJQp5\nSxMadeQgkwICTD9eWSlK/Fu32p4oAODcOcfEpRoks8rKSoqIiKD8/HzSarUUHx9PR48elZxTXFxM\nMTExVFBQQEREly5dMrqOAqE7xKxZRKJsQeTjQ3TihNIRMbVYu5Zo8mTd+6ecgJcIaE1ABgHVNe8r\n3kxvvXo5/ppqZO+9U/aSRU5ODiIjIxEeHg5vb2+kp6cjMzNTcs6nn36KUaNGITQ0FADQqlUrucNs\nMMuX1/5cXg4sWqRcLMy5VVcDTz0FREUBQUFigN0HHwBqbJtQmoeH6AJrjZpLTg1N9lq3wsJChOl1\nUA4NDcWePXsk5xw/fhwVFRUYMGAArl69ikcffRRjx441uta8efNqfk5JSVFFdZXhXDNqHAHK5LFs\nGfDaa/qPqLdtQmnV1cD169bPq0uPUrW0WWRnZyM7O7ve17H51z116hRyc3PRtWtXRJsb0mgDjQ2p\nu6KiAgcOHMCWLVtw/fp1JCcno3fv3ujUqZPkPP1koRaenuKNq2Nq2UnGAGkpVI1zOjkzwxUr7dG/\nv2NiaWiGX6Tnz59v13XMVkONGDGi5ufMzEwMHDgQ3333HYYPH45Vq1bZ9WIAEBISggK90S0FBQU1\n1U06YWFhGDx4MPz8/NCyZUv0798fBw8etPs1nYnh+hUG+Y8xAOIb7tGjgLPPEKvWyfSeeKL+nz1z\nc0a5LHONGQkJCTU/9+7dm06d
OkVEorG5W7dudjWQEBFVVFRQx44dKT8/n8rLy002cB87dowGDhxI\nlZWVVFZWRrGxsXTkyBHJORZCd2qentIGspEjlY6IOaP9+4mAAwTEEXAnAYWKNxCb2nx9lY+hrltA\nAFFFBdHq1fW7zi23KP0usY+9906bqqG0Wi06dOgAQDQ2e3jY3y7u5eWFJUuWIC0tDVVVVZg0aRK6\ndOmC9/5eEWjq1KmIjo7GkCFDEBcXBw8PD0yZMgUxMTF2v6Yz0a+CAoCbN5WJgzmfy5eBY8eA//1P\ni2XLnL9tQqNR9v3r4WH8ebJFWZkYOZ6RUb/Xr11Eyj2YnRvK09MT/v7+AICbN2/i7NmzCA4ORnl5\nOXr16oVDhw7JGqghtc4N1aEDcPp07f6XX4rZQZl7W70aGD8eIHK+OZ1SUgBT7aPWVpGTw4gRlmdB\niInRVedJeXqK8oFhsmnRAigpse21PT2V//3tYe+9s84TCZaUlODYsWNITk6u84s5klqThbe39A2W\nng6sXatcPEwZmzcDf/4JDB0qpsL299fi5k3nLE3cf79YtEsuHTqIyfvqS7d2TEPdJjw86jaIz1k0\n2HoWzzzzjGS/RYsW+IYnNLKb4TcRd1v0nQFPPgmkpQFjxwKtWwNBQbm4edN5x03I/WXmzBnHXOfm\nzfonimbNzB+rR228Kln9dTfrT5P6t40bNzZIMO6otFTpCJiciIB33tHtaUE0F5cuOWdPJx172gXU\n9HqWDB0KDB9u+pi7DeAz28C9bNkyLF26FCdPnkQ3vRV6rl69ir59+8oSnDuwtX6UuYaLF3VVF64z\nbsLbG6ioUDoKMfdTz57Ajh3WJyj09xdjnE6dMj7m5ycSQUQE8MwzopowIMC4QdxwITNXZ7bN4q+/\n/kJxcTHmzJmDxYsX19RxNW3aFC1btpQ1SFPU2mZh+G2kRw9g/35lYmENT6sFJkwA1q8XYxIuXdKi\nuto52ybcTY8eYlT3r79KH4+MFGMo/vpLJBV/f9FTzVC3boDC/Xzs4vDFjzQaDcLDw/Huu+8ajbou\nKipCYGBg3aNkRt/CevdWLhbW8N56q7bOv6zMdUoTruDAAdOlA/3V765fNz9NiC3Th7gSs8lizJgx\n2LBhA2699VaTU3TkO6K7ghuaORN44w3xs78/MGuWsvGwhpWVBfCcTo7j6enYHkj1uZa7JQu71uB2\nhpKFWquhAOCLL4CzZ4G77+bpPlxNfj7w6qtinfWiIkAta2Ez88wlqFat1Dnlh8O7zk6ePNnk4wUF\nBeivlhm0nNC+fWJemqefBubPV2c/bWba4cOivnvZMqCoyLnndGK2M9c7y1Q7hiszmywqKirwwAMP\noFrvL3X06FHcfvvtePLJJ2UJzhVNmCBKFVVVYqDTRx8pHRFzBCLRE0d8XHi9CWdiaayELZo0AXx8\nHBOLmplNFqtWrYK/vz/uvfdeVFVVYdeuXUhLS8Pbb7+NCRMmyBiiazEstqqxGMuM/f47UF7OpQln\ndPWq/c/18xNTsfw9dZ1EVJT911Ujq20WM2fORG5uLs6ePYt169YpPs2HjlrbLB59FHj7bfGzr69Y\nvatjR2VjYvY7exa4cQO46648nDgxAUAouG1CWcHB9i0q1q6d+P/U8fcHHn9c/KzRANeuic9uVRXQ\nuLGY9FFvHTfVcPjcUDNnzqy56KeffooePXrULHqk0Wjwtu6OpxC1Jov27aVvyKVLgWnTlIuH2Wf7\ndrGK3XffcU8nV6KbydbHR/x844b5c5csAWbMkC82R3F4svjoo49quswanqLRaDB+/Hg7wnQctSYL\nDw/pfDVDhgCbNikXD6ub3btFu9NvvwGiPWICuDThnjp3Nh7QpwYOH5TH7RINo1kzMTJU59ZblYuF\n1c25c2IpzYoKLk24grAwQG/RzjpzgoksZGVxIsGPPvoIPXr0gL+/P/z9/dGzZ09k1HfFEDf3zT
ei\nrQIAkpOBl15SNh5m3eHDYnDdiy8CFRV5ABLBPZ3ULzFRLBEQHW1+UkBz8z9pNKYbvV2auSX0Pvro\nI0pISKCtW7dScXExFRUV0ZYtW6hHjx6UkZFh17J8jmQhdKe2bBmRRiOWZQwJIbpyRemImCW1/1/l\nBLxEQGsCMgioVnx5UN4sb7rPmbnNw0Msr5qRQdSsmelzOnQgiogwfeyOO5R+d9rH3nun2WclJibW\nrLutLz8/nxITE+16MUdSa7IwXLN46lSlI2KWBAcTAbkExJMzr4XNW903jYZoyhTL57z9NlH79qYT\nj7e30u9O+9h77zRbDXX16tWadbf1hYeH42p9Oi67OcOpk/WXWGXO4Y8/RN/6zZu1uHFjLoDBAB4H\nj5tQlzZtgKlTRTfXRo1EF/WgoNrjRMD771u+xscfi8WYiIyPOcO07HIy28Dtq6tYr+MxZpnhur3t\n2ikXC5P66Sfgl1+AuXOBS5f0ezrxDLHOQqMxfeM25cIFYOVKsVa4Vlu7dkWTJmLMhC3On7cvTldk\nNlkcO3ZMsuiRvpMnTzZYQK7OcFnVgweViYPVKi0VvdLE1NTc08lZRUUBx4/X7TkVFcYlAMPSffv2\nQEgIsGuX9HEvLzEwzxxeKe9vx44dM/skU1OWM/u42zq+zujVV3WJgksTSvrnP4GvvzZ//PffHfM6\nMTHSL2nTp4seUf/4h/S8ykrxvjBXEvEye/d0TWZ/3fDwcJOPExE+//xztG/fvqFicmleXtLSRUyM\ncrEwYdcuLk04A0uJwlGmTAFef110WT9wABgwAHjySTHQ0pwOHUSJ5uZN6eM8zuJv165dw+uvv47p\n06dj6dKlqK6uxtdff42uXbvik08+qdeLZmVlITo6Gp06dcLixYvNnrd37154eXnhq6++qtfrOZMW\nLaT7Zmr6mEz+8588bN3K4ybcwapVwIoVQNOmwH//K6Zsad0aCA0FvvvO/POGDDE9JY+7zURrdrqP\nkSNHolmzZkhOTsbmzZtRUFAAX19fvP3220hISLD7BauqqtC5c2f88MMPCAkJQa9evbB27Vp06dLF\n6LzU1FT4+/tj4sSJGDVqlDRwlU73YViDd9ttwI8/KhOLO8rKAjZvBgoLtdi0aQGuXuXShLvw8QH6\n9gUeeUSsiV5YCPzwg/kG80aNxESCiYnAzp3Am29K15/h6T7+duLECRz6ezXyyZMnIzg4GGfOnIGf\nn5/9UQLIyclBZGRkTTVXeno6MjMzjZLFO++8g9GjR2Pv3r31ej1n56h6WGbdp58C998PcNuEeyov\nB7ZuFZstRo8G4uKAkSNNH3e3laXNJgtPvXHunp6eCAkJqXeiAIDCwkKE6c3rGxoaij179hidk5mZ\nia1bt2Lv3r1mG9TnzZtX83NKSgpSUlLqHZ/c4uKUjsD1abXiG+Hy5dw24Qr8/Rt+/et27cSKhw89\n1LCvI4fs7GxkZ2fX+zpmk8WhQ4fQtGnTmv0bN27U7Gs0GpSWltr1grb0pJo9ezYWLVpUU1wyV2TS\nTxZqpdUqHYHr0mrF/D1PPglotVyacBWOThSmks9nn4m1KywNmq1HbbysDL9Iz58/367rmE0WVQ20\nOHRISAgK9KZ6LCgoQGhoqOSc/fv3Iz09HQBw+fJlbNq0Cd7e3hg+fHiDxKSko0eVjsA1bdggJom7\ndo1LE8wyU8ln1ixg3z7xs7e3GIdx9qx0PW53G7BndaU8R6usrETnzp2xZcsWtG3bFomJiSYbuHUm\nTpyIu+++GyMNKg5dpYG7Y0eAxzg61okTonrvxg1eb4I5xscfA+PGSR9r1Mh4gJ8aOLyBu6F4eXlh\nyZIlSEtLQ1VVFSZNmoQuXbrgvb/n+506darcIclKtxKXTo8eysXiio4dA3r00OLmTS5NMPs1bSpd\nuzsqSkzVo1/hYml0tyuSvWThKGotWXh6SpPFqFHAF18oF4
8rqK4Wk721aAHcd18esrImgEsTrD7a\ntBFzSwGAnx+QmwsMHixdEvkf/xDr06iNvfdOnmxCZob/R/qr5rG602qBpCSgY0ctAgPnIiuLZ4hl\ntktLM/24LlEAYh3uzz6TJgoA2Lix4eJyRpwsZGY4F1RwsDJxqF1FhShRzJgB7NvHq9cx+0yZYn41\nPH233GL8mOGkoK6Ok4XM9KugAPfrUeEIzz4rqgaaNNEiI4PXm2D227pV2g5hSlCQSCqGnVP0Rha4\nBTebN1F5htVQV64oE4faaLXAjh3A888DOTkAkIcbNyaAx00we7VuLabssKZDBzEBqLe3dFyUuyUL\nLlnIzPAN9q9/KROHmqxbBzRrBqSmAjk5WgBcmmD28/YGZs4Efv5ZjNC+7TbL5+t6PRl+0eNqKNag\nmjWT7uvNfMJMqKgQ00eL/uzcNuHuDD8/1jRuDMTGSmeIragAzp0DIiJEb6azZ0UCMae6GigpAQwX\nCOUpylmDKiyU7r/5pjJxqMX588DNm1yaYEBgoKgSqovnnxeN04aD577+Gli7FrjvPpEs9FfTi4mR\nJo/sbKBPH+O2DXebqofHWcjMsJGsfXvL88+4s6VLgUceyQPRBPC4CWYrw5XtDBcc0xk4ENiyRfrY\n8uVA//6mFyVr1EiaIEJDAb2Zi1SDx1mohGGy6N5dmTicXUmJFjNmzAURlyZY3RgugWqubSE21vix\nwYOB334zfrxxY6CtwdvP1PNdGfeGkhkPyjMtJwf45RegqAh47bU8/PnnBHBPJ9ZQDMc7AaIhW6sV\ns83qa94c+N//RBvF7beLZNS2LfDhh/LE6iw4WSisqEjpCJS3di3wwANAdTXPEMscz3A+NkDs79gh\nfez6dbHMsWFpf9w40RPv/Hlx/NAhoFcv4yWSXR1XQ8nMsBrK1MhQd1JRATz2GFBdzT2dmGP07i3d\nN0wUOkVFxg3mFRWi2kr3eEIC8MIL4udHHxXdbcvKgMxMYNEix8bt7DhZyMywGspwvhl38+qrWly8\nyD2dmH1MdXlNTbXtuWfOiGrgO++UPt6ihVg2oLhYTCCo+0Jn2JPRcN/VcbJQmGHfbXeSl5eHN97g\n0gQzzdqcTY0bS7u86h5r2lQsVmSLoiJgxAhAfzmdI0eAX381rmb65z+l+/fea9truApOFjLTHxwE\niO577kar1WLOnLlITByMK1e4NMGkfHyAW28VjcrmVmH28RHVQfpuvRWYOxd4+mnz3/oNP38A8O67\n0uRy8aK4hqGFC6X7ps5xZdzALTPDN38DrV7rtPLy8nD33RNw8WIoKiq4pxMzNmIE8MwzwKBBxtW2\nOuXl4rOkf3zAAOPzfH2Bnj2Bn34S+97eQKdOQH4+cPOm+Pzl5Rk/z7D7LSBGces7dsy238dVcMlC\nZjdvSvd//VWZOOSm1Woxd+5cDBgwGOfOPY6KCi5NMNPWrRMrSFrrKWiYSF5/Xbq6HQAMHQocOFC7\nf+0acPiwKJUYflHTdadt1Ei0V0RGAi++WHvcsH2Ee0MxWR09qnQEDWvRIiAyMg+tWiUiO3s/NBpu\nm2ANgwj4979r95s0EWtn2zqfVHW1KK20bw8cPCgauf/v/8TCR4BxtdPy5Y6JWy04WSjMlQflff65\nFs8+OxeqAagAAAAeYklEQVQnTw7G1auPIyfnWxQXc2mCyePaNdEAnpFh2wJHgEg4f/whfez4cfHv\ntm3Sx3mlPCYrV50TPy8vD7NnS3s63bzJpQlmn+bNbTtPv02wTx8gIEBM4XH8ONCvn1iZMiZGrJ9t\nav4nT0/RxqGvdWvxr+Ecbq5eK2CIk4XCIiKUjsCxdG0TgwcPRv/+3NOJOYa5EriPjxhA5+sLtGkD\nPPGEWJ9i/Hgxbce994oR2ffdJ5YDqK4WN/kDB4CoKOm1oqLElOVBQdLHp00Dpk83bvQ2bB9xdTzr\nrMwMe0OFhIi59V1BXl
4eJkyYgODgUBw6tALnz3OSYPbx8TGeVtwSPz+RFH7+WVQlaTSiVGGpkfye\ne0SV09694rlr14r2irFjgTVrrL9m48ame005O551VqVKS5WOoP50pYlBgwZj3LjHkZ//LScKZhfd\nl6m6JAoAuHED2LWrtocUkfXeVJ6ewMqVYp2Mn38WS6xOniymnzGcYRYQU53ra9y4bjGqHilg06ZN\n1LlzZ4qMjKRFixYZHV+zZg3FxcVRt27dqE+fPnTw4EGjcxQKvd7E27h2a9VK6YjqJzc3l2Jj48nf\n/04CCo1+P954s7S1bEnk60vk4UEUHEz02GMN+3re3uLfwECivDyisWONz+nbl6isjOiee2ofS0oi\nmjxZet6aNUp/+uxj773TvmfVQ2VlJUVERFB+fj5ptVqKj4+no0ePSs7ZtWsXlZSUEJFILElJSUbX\ncZVk0b690hHZp7y8nF566SVq3bo19euXQUC14jce3pxz8/ExfywmRrrftm39Xisw0PyxsDCiU6eI\ntm8nunRJJItGjUyfW1oq3ufbthF9+y3RjRtEQUHSc3r3VvQjaDd7752yV0Pl5OQgMjIS4eHh8Pb2\nRnp6OjIzMyXnJCcno/nf3R+SkpJwzlUq9U0wbExTg7y8PCQmJmL//v3Iy8vDtWs8boKZZ6lKybBH\n0fnztl2zc2fg22+N2wDfe8989VBBgZhF9qWXxHTkL71kemnUsDAxRgMAUlKAu+4SDehXrkjPO3HC\ntlhdhezTfRQWFiIsLKxmPzQ0FHv27DF7/ocffohhw4aZPDZv3ryan1NSUpCSkuKoMGVDpHQEttNq\ntViwYAGWLVuGhx9+DSdPjsWCBRqjUemM1ZePj+j8ceqU9HE/P2DMGGDBAtFLyfDz89dfwGuvATNn\nml4hr7QU2L5d9Jjq3Nn4eECAGD9hak6qFi2Ay5dr9yMj6/57KSE7OxvZ2dn1vo7syUJjbmYwE7Zt\n24aVK1di586dJo/rJwu1UsMavpcuAWlpecjLm4DmzUPx2mt5mDq1rdvNa8Xkc+edQHq66LGk78YN\nMRVHmzaAqdvCDz8AP/5ofilVnYICIDpa+lhcnHiuuRHfcXHA1q21+8nJ1n8PZ2D4RXr+/Pl2XUf2\naqiQkBAU6N0hCwoKEBoaanTeoUOHMGXKFKxfvx4BAQFyhigrZ18pT6vV4o475iI3dzCIHkdJybeY\nPJkTBbOPrSOp770XWLbM9LHvvhP/JiUZH4uPN57wz5SgIODCBeljPj6WpwYxnOW2uNj667gS2ZNF\nz549cfz4cZw+fRparRbr1q3D8OHDJeecPXsWI0eOxJo1axCplrKenUzVmToLXdvE6dO83gSrv4ED\nRVdVf38xaZ/h8qX6oqLMT9SnWwNm4ULRxdXLS1QfzZkDPPUU8PjjluPw8hID9gzbUnJzxfxP8+eb\n/hJnGG9CguXXcTkObmi3ycaNGykqKooiIiJo4cKFRES0fPlyWr58ORERTZo0iQIDAykhIYESEhKo\nV69eRtdQKPR6M9Xzwtno93T6978zKDmZezrxZnpr3tz6OY8/TrR6NVFlJdG1a0QrVxK1ayeOaTTG\n52s0RIsXix5LAQHSY15eRF99Jd6nfftKj332We17eNs2ohUrzMd0112iN5O54926EWm10s9FWRnR\nP/5B1KED0dSpRFVVsn0kHcree6d9z3ICrpIsvLyUjkjqyy9zqV27eEpJuZO+/bbQYrdH3txna9LE\n+DEvL6KJE22/hq9v3V7z+++JysuJfviBaM8e8TMR0V9/EQ0ZYnz+8OHG7+d580xf+847jbvCGm6/\n/iq91gcf1B7z9hYJSY04WaiEqQ+cMygvL6f773+JgNYE8LgJ3qxvpkoFjtyWLTP9Xp092/T5ERGm\nzz98WJQ62rQR5wUEEP30k+Xfp2lTor+HetXo00d6/qRJjv0MysXeeyevlKew6mqlIwCW
LMnDE09M\nQEVFKETbBE/V4c4aNbKtLY3I/LHgYNGIbGoVOltjGDTI9DFzS6aePy/matKNkdDp2lVsQ4cCv/8O\ndOwoGrK9vKS9piZNAnbvFoscvfaa8Uy3bdpY3nd5Dk5aslFr6Ka+zSilvLycnnmGSxO81W7+/kTP\nPlu77+lp33UCA8VoaVvONWz36NyZ6MABw/dq7c+ffWb+Wrt32/7+/+ST2uk/Rowgqq62fH5BgZj2\nw9dXtHlcu2b7azkTe++dPJGgwgyXapSLrqfT1q3c04nVun5dfLv+/ntgyRLglVekx/XG01pUVARM\nnGh9HQp/f+OpvpOSxCp0+/aJQXlduohurf36iW6xOTmmr+XrC7RrZ1t8gJi2vKREjCP6+mvTA/H0\ntW4tYtBt/v62v5ZLcHDSko1aQzf8JnTLLfK+vn5Pp4yMDOrcmUsTvEm3mBjpe+bLL4lmzSL6+GOi\n11+v37UbN5buR0ZaPnfQIOljTz1FNHq09DEPD6IuXYg2bDD/vt+1i+jJJ4neeUf0yrLHzJnS1126\n1L7rKM3eeye3WShMzkF5uvUm/P1DMXhwHo4fb2u0+hdzHr6+UGQqlTZtxBgEHx+xP3Kk2ABRxz9v\nnuWFfzw9IRm02by5mIajZUvp/Eoajfi2XlgoRmYbKisznuGgqEiUCL74ovax6mpRoggLA44cEe0T\n+vbtA26/XSyxCgCHD9u3frZhiSYnR0w54i64GkphcjRwa7VaPPfcXPTpMxg3bz6O3bu/xSeftMX/\n/V/d1w1g8mmoRBEcbPn41q3As8+aPublBQwZYvn5gYHS/QceEMngn/+UPk4k1pHQTyyGVUFxcdJj\nI0aI64wYIT1v3z5xbmwsMGOG9NimTbWJAgAM5i21Wb9+lvddnoNLOLJRa+iGRe2AgIZ9vdzcXIqP\nj6fQUF5vwtm3pk2tn6PRiO7W9lzf11dUJ1maxlu3BQSI9Rv+/FP6frp+nejHH4mio80/9+mniebM\nEY3BDz8sBrMREa1fb7677ZIlRJ9+SvTCC9LH/fyk+7NmiWvt3Wt8TH/7/ffamNeulR7r18++z9Lp\n02IwoZcXUUJC7e+lNvbeO+17lhNwlWTxj380zOsYtk1ERnLbhNxbly5EzZrZPhjNy4to2jRxkzUc\nuRwYWPdBbbZukyebv/Hefnvte+riRdFTCajtRaS/aTREzz1n3Kto82bRg6m4mCgri+jRR8WiX7rn\ntWhB9Mcf4ty33pJe08NDuj9lSu11f/2V6P33iT76yDgWwwF1L75IFBVFlJpKdOaMfZ8pw7aSF1+0\n7zpK42ShEoZv6m7dHP8autLEnXfeSYWFhURElJys/M3TVbf27U1/Y966tfb/5LvvxLftOXPE6GFT\ni+60bVt7/okTYkRyv35En39OlJtb97j8/MSXEUvneHsT/fvf5o/7+tbGZPit39TWtSvR+fO1z5k1\nq/ZYp05ERUXi8VOnxOjvMWOI9u+vPf/4cWnJKTCw9m/r4UE0cqTowmrIsOF85cr6f44MGX6GHnzQ\n8a8hB04WKmH44WrSxHHXNixNVOt9xTP8MPEmNkeMQvbxEQlD/7E2baT/N/n50m/To0ZJX7t5c6Jf\nfjH/f3v1qnR6Co1GLAn6yCPiC8fttxvHdffdIjbDb+e6rV07oh07iLZsMf+7DRwo4hozRiQCW/4e\nzZqJaiKt1vi1V6+2/B7+8Ufr1+/c2bhHk+FYjbfeqssnxzbLltVe38tLTEeiRpwsVMLwje+oNgvD\n0sT69aLYPHMm0fz5yt+UDbdGjYgeesj6eYbVHRoNUY8eYqI5R9zomzcnio+3fI5hlZCpLThYuj9n\njvT/55VXpMeDgsSynv/7nyhF2GLECOk1HnhAevy//xXtHgEB0oF1+lvTpiKxjB9PdPly7XOff158\noQgNFe+XkSOJpk8XJYDWrWufrxuk5+srzgkJMV0l
de+9ojqqWTPp499+a/l3/O032/5f9Usvhn/f\n9u2JLlyw7W9qKDNT/A1atyb6e15Tiawsov/8R8xVpVacLFTC8E3fsWP9rmdYmsjOrqb09Iaft6cu\nm6lvts2bi1lFDW+yhlt6OtHCheJbsP51HnhAfJg7dhSNjXfdZfr5uuoeH5/auYEMt+nT6/f7xcZK\nq2gaNyY6dEj6//T++9LnGI5lsMU//ym9xogR5s/99FPbb7SW5OQYP3/16to2BiKRdPr3l54zbpw4\ntn69SFAajWhvsDZKmkg0duuS3j33GL9++/ZEFRXGz/v5ZzEm5MoV238/fSUl0rYbDw+iY8dqj5eV\niZJcv34iodryuzgjThYqYfjGNzf5mS0MSxM7d9o/PYOjt5YtiXr1InrzTaJ9+8TvqX+8Z0/xO5w5\nI74F61fR6DaNprao/9pr0mNNm0r/FqYGi2k04sP/yCPinCeeMD6nRQvRHjBrVm0JwlR7guHm6Ska\noxctqq2Hz8wU3+71bzA6FRXi27aHh2ibsOeb6fbtYjoOQPy7fbv5c69cEVNpG8bdo0fdpta+ckXa\ne6pVK+MJ9ohEg3JoqDgnPJzo5MnaY1VVRDdu2P6ahr76SiSa224TpWX9nk6OZGp6Ev12p2nTpMfe\nfrth4mhonCxUwvDN2Lp13a9hrm3ClmodR2zWpnbWLwEsWKCLWXwj02hEIjG80X3zjfQajRqJKhqd\nzEzp8fbtRRxhYeLYggWWY/r2W6JVq0wf8/ER9fa6pKW76eknBt3PTZqIqj1LN2pLTH0jrotTp4i+\n/lr8a82ffxK9955IYLNni6omXWKriwMHRClm5EhRdWbOzZsiLv15nNSkqkpaQurcWTr/U1KS9H0x\ncaJysdYHJwuVMLxRRUfX7fmmejrphIcb37QffJDo5Zdrv5E6anvsMVHXr+u50qeP6PEzbpz0vPBw\nEVt1NdEdd9Q+3qGD6EqpLytLnDNqlOlv56+8Ij7AvXtLq9n8/ERdt+7312iMq74++EBc46WXxDd7\nwwZ/w4VwWrQQiUPXVVR/s9QQzdTt+nXRkP3mm8bVWXPmSN8HGRnKxFhfnCxUwrAxcNo0255nqadT\ncbH4Vmyqr/xff4lzsrPJoQsZpaVJ9ydPFq/zzjvSx5OSxOOFhcbXsDSXjyWmeu+cOiWqRzZvFolm\n8uTaY8HBxvX0hkktMVG67+cnzjt9WlqyaNTI/sZTpm6VlaJjxX33NUzXXLlwslCJ+++XfvO3ZUpl\nS6UJIuPisf7Wv7+48YWFmT4eH080Y4ZoE7jrLtGW8Mgjom7bUrIYOlS6HxsrYtFqRTdLHx9Rajp8\nWDxeViYdoazREB08aN/f8OpV6Tf+fv2M6+Grq8X4hHffFYnKUEGBaGQGRJfQn36Sdr/UX9gmI0M0\njrdtS7RunX0xM+Ys7L13av5+supoNBqoMfTycuD118UEaffcAwwYYP5crVaLBQsWYNmyZXjttdcw\nduxYaAwmzykpEYvV26NnT2D9etNzBV2/DixYIOIcNUrMrfP112Iuq7vvFvP03Hdf7fkPPQS8957l\n1/v+e2DsWDEltL8/8MYbwJQp9sV+5QqwerWY7G7CBMDPz77rlJaKhXAAsTDOV1+JifTGjQM8eOY0\n5oLsvXdysnBSuhliQ0NDsWLFCrRta3r1uupqoH174Nw5sa/RiO/GgFgH4Nix2nN1s5j6+4ubYlpa\n/WJcsQLYuBGIjhYzkfr6Wj6/qAgICamdIM/DA/j1V6BTp/rFwRizHScLF2FLacLQkSPA7NniW/Jj\njwHx8eLGfeiQmFq6ulrcmFetAqKiRHKxNvNoQzh+XLy+vu3bgf795Y+FMXfFycIF2FqaqIvdu8XW\nqxfQt68DgqyHqiogJQX46Sex36WLmFra7VYcY0xBnCxUzJ7ShFpdvw5kZABarWgXsLe9hTFmH3vv\nnYo04WVlZSE6
OhqdOnXC4sWLTZ4za9YsdOrUCfHx8cjNzZU5woZ1+DCQlSVWD9Othb1//37k5eVh\n3LhxLpsoAFGKmDYNePRRThSMqUo9emDZpbKykiIiIig/P5+0Wi3Fx8fT0aNHJeds2LCBhg4dSkRE\nu3fvpiRdZ309CoTuEMuW6QaUlVOLFi9Ry5bG4yYYY6yh2HvvlL1kkZOTg8jISISHh8Pb2xvp6enI\nNFjncP369Rg/fjwAICkpCSUlJbh48aLcoTaIl18GiE4CSERJyX48/LDrlyYYY+rnJfcLFhYWIiws\nrGY/NDQUe/bssXrOuXPnEBQUJDlv3rx5NT+npKQgJSWlQWJ2JDEeoDWAZwCko00bThKMsYaTnZ2N\n7Ozsel9H9mRh6zdoMmiAMfU8/WShFkuXAqNHN8O1a2OQkgJMmqR0RIwxV2b4RXr+/Pl2XUf2ZBES\nEoKCgoKa/YKCAoSGhlo859y5cwgJCZEtxoaUlgZcvAgUFwNt24pBdIwx5uxkb7Po2bMnjh8/jtOn\nT0Or1WLdunUYPny45Jzhw4fj448/BgDs3r0bLVq0MKqCUjN/fzGSmRMFY0wtZC9ZeHl5YcmSJUhL\nS0NVVRUmTZqELl264L2/JxaaOnUqhg0bho0bNyIyMhKNGzfGqlWr5A6TMcaYHh6UxxhjbkRVg/IY\nY4ypCycLxhhjVnGyYIwxZhUnC8YYY1ZxsmCMMWYVJwvGGGNWcbJgjDFmFScLxhhjVnGyYIwxZhUn\nC8YYY1ZxsmCMMWYVJwvGGGNWcbJgjDFmFScLxhhjVnGyYIwxZhUnC8YYY1ZxsmCMMWYVJwvGGGNW\ncbJgjDFmFScLxhhjVnGyYIwxZhUnC8YYY1bJmiyKioqQmpqKqKgoDB48GCUlJUbnFBQUYMCAAeja\ntStiY2Px9ttvyxmibLKzs5UOoV44fmVx/MpRc+z1IWuyWLRoEVJTU/H7779j4MCBWLRokdE53t7e\n+O9//4sjR45g9+7dePfdd3Hs2DE5w5SF2t9wHL+yOH7lqDn2+pA1Waxfvx7jx48HAIwfPx7ffPON\n0Tlt2rRBQkICAKBJkybo0qULzp8/L2eYjDHGDMiaLC5evIigoCAAQFBQEC5evGjx/NOnTyM3NxdJ\nSUlyhMcYY8wMDRGRIy+YmpqKCxcuGD2+YMECjB8/HsXFxTWPBQYGoqioyOR1rl27hpSUFLzwwgsY\nMWKE0XGNRuO4oBljzI3Yc9v3cnQQ33//vdljQUFBuHDhAtq0aYM//vgDt9xyi8nzKioqMGrUKDzw\nwAMmEwVg3y/LGGPMPrJWQw0fPhwZGRkAgIyMDJOJgIgwadIkxMTEYPbs2XKGxxhjzAyHV0NZUlRU\nhHvuuQdnz55FeHg4Pv/8c7Ro0QLnz5/HlClTsGHDBvz000/o378/4uLiaqqaXnnlFQwZMkSuMBlj\njBkilbhy5QoNGjSIOnXqRKmpqVRcXGx0ztmzZyklJYViYmKoa9eu9NZbbykQqdSmTZuoc+fOFBkZ\nSYsWLTJ5zsyZMykyMpLi4uLowIEDMkdombX416xZQ3FxcdStWzfq06cPHTx4UIEozbPl709ElJOT\nQ56envTll1/KGJ1ltsS+bds2SkhIoK5du9Ltt98ub4BWWIv/0qVLlJaWRvHx8dS1a1datWqV/EGa\nMXHiRLrlllsoNjbW7DnO/Lm1Fr89n1vVJIunnnqKFi9eTEREixYtomeeecbonD/++INyc3OJiOjq\n1asUFRVFR48elTVOfZWVlRQREUH5+fmk1WopPj7eKJ4NGzbQ0KFDiYho9+7dlJSUpESoJtkS/65d\nu6ikpISIxM1BbfHrzhswYADdeeed9MUXXygQqTFbYi8uLqaYmBgqKCggInHzdRa2xD937lyaM2cO\nEYnYAwMDqaKiQolwjezYsYMOHDhg9mbrzJ9bIuvx2/O5Vc10H2oco5GTk4PIyE
iEh4fD29sb6enp\nyMzMlJyj/3slJSWhpKTEapdiudgSf3JyMpo3bw5AxH/u3DklQjXJlvgB4J133sHo0aPRunVrBaI0\nzZbYP/30U4waNQqhoaEAgFatWikRqkm2xB8cHIzS0lIAQGlpKVq2bAkvL4f3ubFLv379EBAQYPa4\nM39uAevx2/O5VU2yUOMYjcLCQoSFhdXsh4aGorCw0Oo5znLDtSV+fR9++CGGDRsmR2g2sfXvn5mZ\niWnTpgFwni7ZtsR+/PhxFBUVYcCAAejZsydWr14td5hm2RL/lClTcOTIEbRt2xbx8fF466235A7T\nbs78ua0rWz+3zpHG/2ZpjIY+jUZj8UN97do1jB49Gm+99RaaNGni8DhtZeuNhwz6GDjLDasucWzb\ntg0rV67Ezp07GzCiurEl/tmzZ2PRokXQaDQgUS0rQ2TW2RJ7RUUFDhw4gC1btuD69etITk5G7969\n0alTJxkitMyW+BcuXIiEhARkZ2fj5MmTSE1NxcGDB9G0aVMZIqw/Z/3c1kVdPrdOlSzkGqMhl5CQ\nEBQUFNTsFxQU1FQZmDvn3LlzCAkJkS1GS2yJHwAOHTqEKVOmICsry2LRV262xL9//36kp6cDAC5f\nvoxNmzbB29sbw4cPlzVWQ7bEHhYWhlatWsHPzw9+fn7o378/Dh486BTJwpb4d+3aheeffx4AEBER\ngQ4dOuC3335Dz549ZY3VHs78ubVVnT+3DmtRaWBPPfVUTY+KV155xWQDd3V1NY0dO5Zmz54td3gm\nVVRUUMeOHSk/P5/Ky8utNnD//PPPTtVQZkv8Z86coYiICPr5558VitI8W+LXN2HCBKfpDWVL7MeO\nHaOBAwdSZWUllZWVUWxsLB05ckShiKVsif+xxx6jefPmERHRhQsXKCQkhK5cuaJEuCbl5+fb1MDt\nbJ9bHUvx2/O5VU2yuHLlCg0cONCo62xhYSENGzaMiIh+/PFH0mg0FB8fTwkJCZSQkECbNm1SMmza\nuHEjRUVFUUREBC1cuJCIiJYvX07Lly+vOWfGjBkUERFBcXFxtH//fqVCNcla/JMmTaLAwMCav3ev\nXr2UDNeILX9/HWdKFkS2xf7qq69STEwMxcbGOkVXcX3W4r906RLdddddFBcXR7GxsfTJJ58oGa5E\neno6BQcHk7e3N4WGhtKHH36oqs+ttfjt+dzKOiiPMcaYOqmmNxRjjDHlcLJgjDFmFScLxhhjVnGy\nYIwxZpVTjbNgTGmenp6Ii4ur2c/MzES7du0a7PWaNGmCa9euNdj1GXMU7g3FmJ6mTZvi6tWrJo/p\nPiqOHKlr6fUYcyZcDcWYBadPn0bnzp0xfvx4dOvWDQUFBZg+fTp69eqF2NhYzJs3r+bcjRs3okuX\nLujZsydmzZqFu+++GwBw6dIlpKamIjY2FlOmTEF4eLjJ5YRfffVVJCYmIj4+XnJdxpwBJwvG9Ny4\ncQPdu3dH9+7dMWrUKGg0Gpw4cQIzZszA4cOH0a5dOyxYsAB79+7FwYMHsX37dvzyyy+4efMmHn74\nYWRlZWHfvn24fPlyTQlk/vz5GDRoEA4fPozRo0fj7NmzRq+7efNmnDhxAjk5OcjNzcX+/fvx448/\nyv3rM2YWt1kwpsfPzw+5ubk1+6dPn0b79u2RmJhY89i6devw/vvvo7KyEn/88QeOHj2KqqoqdOzY\nEe3btwcAjBkzBitWrAAA7Ny5s2ZK/bS0NJPz8GzevBmbN29G9+7dAQBlZWU4ceIE+vXr12C/K2N1\nwcmCMSsaN25c83N+fj5ef/117Nu3D82bN8fEiRNx8+ZNo3YMw6ZAW5oGn332WTz00EOOCZoxB+Nq\nKMbqoLS0FI0bN0azZs1w8eJFbNq0CRqNBp07d8apU6dw5swZAKL0oUsgffv2xeeffw5AlCCKi4uN\nrpuWloaVK1eirKwMgFgv4dKlSzL9VoxZxy
ULxvSY6umk/1h8fDy6d++O6OhohIWF4bbbbgMA+Pr6\nYunSpRgyZAgaN26MXr161Txv7ty5GDNmDFavXo3k5GS0adOmZs0G3Tmpqak4duwYkpOTAYheUmvW\nrHGq1fuYe+Ous4w5SFlZWU2V1YwZMxAVFYVHH30UWq0Wnp6e8PT0xM8//4wZM2bgwIEDCkfLWN1w\nyYIxB3n//feRkZEBrVaLHj16YOrUqQCAM2fO4N5770V1dTUaNWqE999/X+FIGas7Llkwxhizihu4\nGWOMWcXJgjHGmFWcLBhjjFnFyYIxxphVnCwYY4xZxcmCMcaYVf8fnPuLke8Q55sAAAAASUVORK5C\nYII=\n", "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's an interesting shape... \n", "\n", "### Let's a look at the points where the Fraggle similarity is high but the RDKit similarity is low.\n", "\n", "We'll get ready by loading the data into a Pandas data frame." ] }, { "cell_type": "code", "collapsed": false, "input": [ "df = pandas.DataFrame(index=range(len(ms1)),columns=['mol1','mol2','Fraggle','RDKit5'])\n", "df.mol1 = [x[1] for x in ms1]\n", "df.mol2 = [x[1] for x in ms2]\n", "df.Fraggle = [x[0] for x in scoredLists['Fraggle']]\n", "df.RDKit5 = [x[0] for x in scoredLists['RDKit5']]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now do the subset" ] }, { "cell_type": "code", "collapsed": false, "input": [ "subset = df[df.RDKit5<0.2][df.Fraggle>0.8]\n", "subset.sort(columns=['Fraggle'],ascending=False,inplace=True)\n", "len(subset)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ "62" ] } ], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add the fragment that Fraggle is using to each row:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "frags = []\n", "for row in subset.itertuples():\n", " m1 = row[1]\n", " m2 = row[2]\n", " sim,frag= FraggleSim.GetFraggleSimilarity(m1,m2)\n", " frags.append(frag) \n", "mfrags = [Chem.MolFromSmiles(x) for x in frags]\n", "subset['Fragment']=frags\n", "subset['FragMol']=mfrags" ], "language": "python", 
"metadata": {}, "outputs": [], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "subset" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "
mol1 mol2 Fraggle RDKit5 Fragment FragMol
2768 Mol Mol 1.000000 0.198157 [*]C(F)(F)Cl.[*]C(F)(Cl)C(F)(F)F Mol
2937 Mol Mol 1.000000 0.128205 [*]C[Se](=O)O Mol
7696 Mol Mol 1.000000 0.157738 [*]c1ncnc(N)c1[*] Mol
21156 Mol Mol 1.000000 0.184080 [*]CCC.[*]c1c2ccccc2nc2ccccc12 Mol
3347 Mol Mol 1.000000 0.104478 [*]CC(C)(C)CO.[*]C(C)(C)CO Mol
6534 Mol Mol 1.000000 0.071942 [*]CCNC.[*]CNC Mol
10494 Mol Mol 0.969231 0.079365 [*]CCCCCCCCCCCCCCCCC Mol
23207 Mol Mol 0.964602 0.164706 [*]SCCO.[*][N+](=O)[O-] Mol
6250 Mol Mol 0.952096 0.172185 [*]c1ccccc1[*] Mol
24245 Mol Mol 0.950000 0.185687 [*]c1cccc[n+]1[O-].[*]C(C)c1cc(C)ccc1C Mol
15887 Mol Mol 0.950000 0.176136 [*][C@@H]1CCCNC1 Mol
17667 Mol Mol 0.949580 0.161392 [*]c1ccccc1.[*]N1C(=O)CNC1=O Mol
17500 Mol Mol 0.948718 0.156951 [*]CCCCCCCCCCC.[*]CP(=O)(OC)OC Mol
19213 Mol Mol 0.931034 0.190476 [*]NC(=N)CN.[*]C(=O)O Mol
21961 Mol Mol 0.929412 0.168790 [*]CSC#N.[*]c1ccccc1[*] Mol
15634 Mol Mol 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 Mol
17356 Mol Mol 0.925926 0.120805 [*]CCCCCC.[*]CC(N)=O Mol
19129 Mol Mol 0.919355 0.174863 [*]c1cncc(Cl)c1 Mol
13401 Mol Mol 0.918750 0.184275 [*]CC1CC1.[*]c1ccccc1Br Mol
22933 Mol Mol 0.916667 0.183784 [*]CC#C.[*]c1ncccn1 Mol
4404 Mol Mol 0.907216 0.186667 [*]CSC.[*]c1ccccc1 Mol
12294 Mol Mol 0.894737 0.099010 [*]c1ccccc1.[*]N(C)C Mol
16786 Mol Mol 0.893617 0.112426 [*]c1sc[n+](C)c1C.[*]c1sc[n+](C)c1C Mol
13760 Mol Mol 0.887218 0.190283 [*]COC(N)=O.[*][N+](=O)[O-] Mol
4473 Mol Mol 0.885417 0.145985 [*]c1ccccc1.[*]c1ccccc1 Mol
19112 Mol Mol 0.883721 0.157598 [*]c1cn2ccsc2n1.[*]n1nc(C)cc1C Mol
17148 Mol Mol 0.882353 0.166667 [*]CCCCCCCCCCCCC Mol
6334 Mol Mol 0.882353 0.190678 [*]c1ccccc1[*] Mol
16077 Mol Mol 0.879518 0.123684 [*]c1nc(C)nn1[*].[*]c1ccccc1 Mol
8779 Mol Mol 0.878505 0.152685 [*]c1nc2nnnc-2c(O)n1[*] Mol
2002 Mol Mol 0.875000 0.154667 [*]c1ccco1.[*]c1ncnn1[*] Mol
5529 Mol Mol 0.875000 0.135714 [*]C(N)=O.[*]C(CC)CCCC Mol
15573 Mol Mol 0.859813 0.090196 [*]C(CSCCCCCCCCCCCCCCCC)OC.[*][n+]1ccsc1 Mol
17492 Mol Mol 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 Mol
20831 Mol Mol 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 Mol
2570 Mol Mol 0.853659 0.111842 [*]C(=O)CCCCCCC.[*]C(=O)CCCCCCC Mol
23156 Mol Mol 0.853659 0.130081 [*]CCCCCCCCC.[*]OC(=O)C=C Mol
13103 Mol Mol 0.853333 0.091892 [*]CCCCCCCCCCC.[*]C(N)=O Mol
18140 Mol Mol 0.851852 0.197101 [*]C(=O)OC(C)(C)C.[*]C(=O)OC(C)(C)C Mol
24087 Mol Mol 0.851351 0.180851 [*]c1c(C)ncn1[*] Mol
6051 Mol Mol 0.850000 0.182927 [*]C(=O)OCC.[*]C(=O)OCC Mol
7595 Mol Mol 0.839161 0.094955 [*]OC=O.[*]C(C(=O)O)C(=O)O Mol
3185 Mol Mol 0.838323 0.111111 [*]N1CCOCC1.[*]S(C)(=O)=O Mol
15940 Mol Mol 0.835821 0.127490 [*]/C=C(\\O)C(=O)O.[*]C(C)C Mol
20472 Mol Mol 0.831858 0.160458 [*]C(=O)Cc1ccsc1.[*]C(=O)NC1CCCCCC1 Mol
16457 Mol Mol 0.829787 0.197581 [*]C(=O)OCC.[*]C(=O)C(C)N1CCOCC1 Mol
1283 Mol Mol 0.828571 0.158301 [*]c1ccccc1.[*]C(C)C Mol
8629 Mol Mol 0.827273 0.183333 [*]CCC(N)=O.[*]NCc1ccccc1 Mol
10367 Mol Mol 0.827273 0.161812 [*]CCc1ccccc1.[*]C(CCS)C(=O)O Mol
9572 Mol Mol 0.826446 0.164080 [*]c1ccc(Cl)c(Cl)c1.[*]c1cc(Cl)c(Cl)cc1[*] Mol
13684 Mol Mol 0.824074 0.130233 [*]C1CN2CCC1C2 Mol
12972 Mol Mol 0.823529 0.154762 [*]c1c(Br)cnn1[*] Mol
22001 Mol Mol 0.822785 0.136364 [*]C(=O)O.[*]c1ncsc1[*] Mol
5087 Mol Mol 0.822222 0.173913 [*]CC(C)(C)C(=O)O Mol
19447 Mol Mol 0.816327 0.184507 [*]c1cc2ccccc2o1.[*]c1nnnn1C Mol
8002 Mol Mol 0.810345 0.115672 [*]CCCC.[*]C(O)(P(=O)(O)O)P(=O)(O)O Mol
18304 Mol Mol 0.809091 0.153226 [*]c1ccco1.[*]c1ccccn1 Mol
17612 Mol Mol 0.809091 0.162791 [*]c1ccco1.[*]c1ccncc1 Mol
225 Mol Mol 0.806122 0.198330 [*]c1ccccc1[*] Mol
6688 Mol Mol 0.804598 0.138122 [*]CCO.[*]c1ccccc1[*] Mol
16812 Mol Mol 0.802083 0.188732 [*]CCCCC.[*]c1nnc(N)s1 Mol
14552 Mol Mol 0.801980 0.157407 [*]NC(N)=S.[*]Oc1cccc2ccccc21 Mol
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ " mol1 mol2 Fraggle RDKit5 Fragment FragMol\n", "2768 \"Mol\"/ \"Mol\"/ 1.000000 0.198157 [*]C(F)(F)Cl.[*]C(F)(Cl)C(F)(F)F \"Mol\"/\n", "2937 \"Mol\"/ \"Mol\"/ 1.000000 0.128205 [*]C[Se](=O)O \"Mol\"/\n", "7696 \"Mol\"/ \"Mol\"/ 1.000000 0.157738 [*]c1ncnc(N)c1[*] \"Mol\"/\n", "21156 \"Mol\"/ \"Mol\"/ 1.000000 0.184080 [*]CCC.[*]c1c2ccccc2nc2ccccc12 \"Mol\"/\n", "3347 \"Mol\"/ \"Mol\"/ 1.000000 0.104478 [*]CC(C)(C)CO.[*]C(C)(C)CO \"Mol\"/\n", "6534 \"Mol\"/ \"Mol\"/ 1.000000 0.071942 [*]CCNC.[*]CNC \"Mol\"/\n", "10494 \"Mol\"/ \"Mol\"/ 0.969231 0.079365 [*]CCCCCCCCCCCCCCCCC \"Mol\"/\n", "23207 \"Mol\"/ \"Mol\"/ 0.964602 0.164706 [*]SCCO.[*][N+](=O)[O-] \"Mol\"/\n", "6250 \"Mol\"/ \"Mol\"/ 0.952096 0.172185 [*]c1ccccc1[*] \"Mol\"/\n", "24245 \"Mol\"/ \"Mol\"/ 0.950000 0.185687 [*]c1cccc[n+]1[O-].[*]C(C)c1cc(C)ccc1C \"Mol\"/\n", "15887 \"Mol\"/ \"Mol\"/ 0.950000 0.176136 [*][C@@H]1CCCNC1 \"Mol\"/\n", "17667 \"Mol\"/ \"Mol\"/ 0.949580 0.161392 [*]c1ccccc1.[*]N1C(=O)CNC1=O \"Mol\"/\n", "17500 \"Mol\"/ \"Mol\"/ 0.948718 0.156951 [*]CCCCCCCCCCC.[*]CP(=O)(OC)OC \"Mol\"/\n", "19213 \"Mol\"/ \"Mol\"/ 0.931034 0.190476 [*]NC(=N)CN.[*]C(=O)O \"Mol\"/\n", "21961 \"Mol\"/ \"Mol\"/ 0.929412 0.168790 [*]CSC#N.[*]c1ccccc1[*] \"Mol\"/\n", "15634 \"Mol\"/ \"Mol\"/ 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 \"Mol\"/\n", "17356 \"Mol\"/ \"Mol\"/ 0.925926 0.120805 [*]CCCCCC.[*]CC(N)=O \"Mol\"/\n", "19129 \"Mol\"/ \"Mol\"/ 0.919355 0.174863 [*]c1cncc(Cl)c1 \"Mol\"/\n", "13401 \"Mol\"/ \"Mol\"/ 0.918750 0.184275 [*]CC1CC1.[*]c1ccccc1Br \"Mol\"/\n", "22933 \"Mol\"/ \"Mol\"/ 0.916667 0.183784 [*]CC#C.[*]c1ncccn1 \"Mol\"/\n", "4404 \"Mol\"/ \"Mol\"/ 0.907216 0.186667 [*]CSC.[*]c1ccccc1 \"Mol\"/\n", "12294 \"Mol\"/ \"Mol\"/ 0.894737 0.099010 [*]c1ccccc1.[*]N(C)C \"Mol\"/\n", "16786 \"Mol\"/ \"Mol\"/ 0.893617 0.112426 [*]c1sc[n+](C)c1C.[*]c1sc[n+](C)c1C \"Mol\"/\n", "13760 \"Mol\"/ \"Mol\"/ 0.887218 0.190283 
[*]COC(N)=O.[*][N+](=O)[O-] \"Mol\"/\n", "4473 \"Mol\"/ \"Mol\"/ 0.885417 0.145985 [*]c1ccccc1.[*]c1ccccc1 \"Mol\"/\n", "19112 \"Mol\"/ \"Mol\"/ 0.883721 0.157598 [*]c1cn2ccsc2n1.[*]n1nc(C)cc1C \"Mol\"/\n", "17148 \"Mol\"/ \"Mol\"/ 0.882353 0.166667 [*]CCCCCCCCCCCCC \"Mol\"/\n", "6334 \"Mol\"/ \"Mol\"/ 0.882353 0.190678 [*]c1ccccc1[*] \"Mol\"/\n", "16077 \"Mol\"/ \"Mol\"/ 0.879518 0.123684 [*]c1nc(C)nn1[*].[*]c1ccccc1 \"Mol\"/\n", "8779 \"Mol\"/ \"Mol\"/ 0.878505 0.152685 [*]c1nc2nnnc-2c(O)n1[*] \"Mol\"/\n", "2002 \"Mol\"/ \"Mol\"/ 0.875000 0.154667 [*]c1ccco1.[*]c1ncnn1[*] \"Mol\"/\n", "5529 \"Mol\"/ \"Mol\"/ 0.875000 0.135714 [*]C(N)=O.[*]C(CC)CCCC \"Mol\"/\n", "15573 \"Mol\"/ \"Mol\"/ 0.859813 0.090196 [*]C(CSCCCCCCCCCCCCCCCC)OC.[*][n+]1ccsc1 \"Mol\"/\n", "17492 \"Mol\"/ \"Mol\"/ 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 \"Mol\"/\n", "20831 \"Mol\"/ \"Mol\"/ 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 \"Mol\"/\n", "2570 \"Mol\"/ \"Mol\"/ 0.853659 0.111842 [*]C(=O)CCCCCCC.[*]C(=O)CCCCCCC \"Mol\"/\n", "23156 \"Mol\"/ \"Mol\"/ 0.853659 0.130081 [*]CCCCCCCCC.[*]OC(=O)C=C \"Mol\"/\n", "13103 \"Mol\"/ \"Mol\"/ 0.853333 0.091892 [*]CCCCCCCCCCC.[*]C(N)=O \"Mol\"/\n", "18140 \"Mol\"/ \"Mol\"/ 0.851852 0.197101 [*]C(=O)OC(C)(C)C.[*]C(=O)OC(C)(C)C \"Mol\"/\n", "24087 \"Mol\"/ \"Mol\"/ 0.851351 0.180851 [*]c1c(C)ncn1[*] \"Mol\"/\n", "6051 \"Mol\"/ \"Mol\"/ 0.850000 0.182927 [*]C(=O)OCC.[*]C(=O)OCC \"Mol\"/\n", "7595 \"Mol\"/ \"Mol\"/ 0.839161 0.094955 [*]OC=O.[*]C(C(=O)O)C(=O)O \"Mol\"/\n", "3185 \"Mol\"/ \"Mol\"/ 0.838323 0.111111 [*]N1CCOCC1.[*]S(C)(=O)=O \"Mol\"/\n", "15940 \"Mol\"/ \"Mol\"/ 0.835821 0.127490 [*]/C=C(\\O)C(=O)O.[*]C(C)C \"Mol\"/\n", "20472 \"Mol\"/ \"Mol\"/ 0.831858 0.160458 [*]C(=O)Cc1ccsc1.[*]C(=O)NC1CCCCCC1 \"Mol\"/\n", "16457 \"Mol\"/ \"Mol\"/ 0.829787 0.197581 [*]C(=O)OCC.[*]C(=O)C(C)N1CCOCC1 \"Mol\"/\n", "1283 \"Mol\"/ \"Mol\"/ 0.828571 0.158301 [*]c1ccccc1.[*]C(C)C \"Mol\"/\n", "8629 \"Mol\"/ \"Mol\"/ 0.827273 0.183333 [*]CCC(N)=O.[*]NCc1ccccc1 \"Mol\"/\n", 
"10367 \"Mol\"/ \"Mol\"/ 0.827273 0.161812 [*]CCc1ccccc1.[*]C(CCS)C(=O)O \"Mol\"/\n", "9572 \"Mol\"/ \"Mol\"/ 0.826446 0.164080 [*]c1ccc(Cl)c(Cl)c1.[*]c1cc(Cl)c(Cl)cc1[*] \"Mol\"/\n", "13684 \"Mol\"/ \"Mol\"/ 0.824074 0.130233 [*]C1CN2CCC1C2 \"Mol\"/\n", "12972 \"Mol\"/ \"Mol\"/ 0.823529 0.154762 [*]c1c(Br)cnn1[*] \"Mol\"/\n", "22001 \"Mol\"/ \"Mol\"/ 0.822785 0.136364 [*]C(=O)O.[*]c1ncsc1[*] \"Mol\"/\n", "5087 \"Mol\"/ \"Mol\"/ 0.822222 0.173913 [*]CC(C)(C)C(=O)O \"Mol\"/\n", "19447 \"Mol\"/ \"Mol\"/ 0.816327 0.184507 [*]c1cc2ccccc2o1.[*]c1nnnn1C \"Mol\"/\n", "8002 \"Mol\"/ \"Mol\"/ 0.810345 0.115672 [*]CCCC.[*]C(O)(P(=O)(O)O)P(=O)(O)O \"Mol\"/\n", "18304 \"Mol\"/ \"Mol\"/ 0.809091 0.153226 [*]c1ccco1.[*]c1ccccn1 \"Mol\"/\n", "17612 \"Mol\"/ \"Mol\"/ 0.809091 0.162791 [*]c1ccco1.[*]c1ccncc1 \"Mol\"/\n", "225 \"Mol\"/ \"Mol\"/ 0.806122 0.198330 [*]c1ccccc1[*] \"Mol\"/\n", "6688 \"Mol\"/ \"Mol\"/ 0.804598 0.138122 [*]CCO.[*]c1ccccc1[*] \"Mol\"/\n", "16812 \"Mol\"/ \"Mol\"/ 0.802083 0.188732 [*]CCCCC.[*]c1nnc(N)s1 \"Mol\"/\n", "14552 \"Mol\"/ \"Mol\"/ 0.801980 0.157407 [*]NC(N)=S.[*]Oc1cccc2ccccc21 \"Mol\"/" ] } ], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a particularly nice example where a small change in the middle of the molecule (N->S) destroys what would otherwise be a fairly high RDKit similarity, but where Fraggle still produces a high score:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "subset[subset.index==15634]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mol1mol2FraggleRDKit5FragmentFragMol
15634 \"Mol\"/ \"Mol\"/ 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 \"Mol\"/
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 69, "text": [ " mol1 mol2 Fraggle RDKit5 Fragment FragMol\n", "15634 \"Mol\"/ \"Mol\"/ 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 \"Mol\"/" ] } ], "prompt_number": 69 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate the disproportionate influence of the central S, replace it with an N and repeat the similarity calculations:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Chem.MolToSmiles(subset.ix[15634]['mol2'],True)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 70, "text": [ "'CCCSc1ncnc2[nH]ncc21'" ] } ], "prompt_number": 70 }, { "cell_type": "code", "collapsed": false, "input": [ "tmol = Chem.MolFromSmiles('CCCNc1ncnc2[nH]ncc21')\n", "fp1 = Chem.RDKFingerprint(subset.ix[15634]['mol1'],maxPath=5)\n", "fp2 = Chem.RDKFingerprint(tmol,maxPath=5)\n", "print 'RDKit5: ',DataStructs.TanimotoSimilarity(fp1,fp2)\n", "print 'Fraggle: ',FraggleSim.GetFraggleSimilarity(subset.ix[15634]['mol1'],tmol)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RDKit5: 0.501992031873\n", "Fraggle: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "(0.927710843373494, '[*]c1ncnc2[nH]cnc21')\n" ] } ], "prompt_number": 73 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The RDKit5 similarity is now well above the random threshold (0.29 at the 95% level), but the Fraggle similarity is unchanged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What about the cases where the Fraggle similarity is zero but the RDKit5 similarity is high?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "subset2 = df[(df.RDKit5>.5)&(df.Fraggle<0.1)].copy()\n", "subset2.sort(columns=['RDKit5'],ascending=False,inplace=True)\n", "len(subset2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 59, "text": [ "38" ] } ], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "frags = []\n", "for row in subset2.itertuples():\n", "    m1 = row[1]\n", "    m2 = row[2]\n", "    sim,frag = FraggleSim.GetFraggleSimilarity(m1,m2)\n", "    frags.append(frag)\n", "subset2['Fragment']=frags\n", "subset2" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mol1mol2FraggleRDKit5Fragment
2958 \"Mol\"/ \"Mol\"/ 0 1.000000 None
4568 \"Mol\"/ \"Mol\"/ 0 1.000000 None
21906 \"Mol\"/ \"Mol\"/ 0 1.000000 None
11718 \"Mol\"/ \"Mol\"/ 0 1.000000 None
11745 \"Mol\"/ \"Mol\"/ 0 1.000000 None
1581 \"Mol\"/ \"Mol\"/ 0 0.987552 None
9838 \"Mol\"/ \"Mol\"/ 0 0.987552 None
20969 \"Mol\"/ \"Mol\"/ 0 0.987552 None
10615 \"Mol\"/ \"Mol\"/ 0 0.986063 None
16125 \"Mol\"/ \"Mol\"/ 0 0.881890 None
17498 \"Mol\"/ \"Mol\"/ 0 0.875472 None
22214 \"Mol\"/ \"Mol\"/ 0 0.849057 None
17812 \"Mol\"/ \"Mol\"/ 0 0.816901 None
22514 \"Mol\"/ \"Mol\"/ 0 0.798701 None
23507 \"Mol\"/ \"Mol\"/ 0 0.720000 None
19071 \"Mol\"/ \"Mol\"/ 0 0.714744 None
19067 \"Mol\"/ \"Mol\"/ 0 0.713948 None
7808 \"Mol\"/ \"Mol\"/ 0 0.711111 None
12405 \"Mol\"/ \"Mol\"/ 0 0.692000 None
21204 \"Mol\"/ \"Mol\"/ 0 0.687764 None
18018 \"Mol\"/ \"Mol\"/ 0 0.657895 None
4167 \"Mol\"/ \"Mol\"/ 0 0.654867 None
8750 \"Mol\"/ \"Mol\"/ 0 0.650000 None
7008 \"Mol\"/ \"Mol\"/ 0 0.630252 None
12990 \"Mol\"/ \"Mol\"/ 0 0.630000 None
8355 \"Mol\"/ \"Mol\"/ 0 0.623377 None
21753 \"Mol\"/ \"Mol\"/ 0 0.602941 None
9773 \"Mol\"/ \"Mol\"/ 0 0.598361 None
3180 \"Mol\"/ \"Mol\"/ 0 0.589474 None
19333 \"Mol\"/ \"Mol\"/ 0 0.576119 None
8083 \"Mol\"/ \"Mol\"/ 0 0.575758 None
22375 \"Mol\"/ \"Mol\"/ 0 0.559633 None
20954 \"Mol\"/ \"Mol\"/ 0 0.555556 None
5359 \"Mol\"/ \"Mol\"/ 0 0.548173 None
20305 \"Mol\"/ \"Mol\"/ 0 0.542601 None
22193 \"Mol\"/ \"Mol\"/ 0 0.542601 None
15618 \"Mol\"/ \"Mol\"/ 0 0.536585 None
12669 \"Mol\"/ \"Mol\"/ 0 0.531056 None
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 60, "text": [ " mol1 mol2 Fraggle RDKit5 Fragment\n", "2958 \"Mol\"/ \"Mol\"/ 0 1.000000 None\n", "4568 \"Mol\"/ \"Mol\"/ 0 1.000000 None\n", "21906 \"Mol\"/ \"Mol\"/ 0 1.000000 None\n", "11718 \"Mol\"/ \"Mol\"/ 0 1.000000 None\n", "11745 \"Mol\"/ \"Mol\"/ 0 1.000000 None\n", "1581 \"Mol\"/ \"Mol\"/ 0 0.987552 None\n", "9838 \"Mol\"/ \"Mol\"/ 0 0.987552 None\n", "20969 \"Mol\"/ \"Mol\"/ 0 0.987552 None\n", "10615 \"Mol\"/ \"Mol\"/ 0 0.986063 None\n", "16125 \"Mol\"/ \"Mol\"/ 0 0.881890 None\n", "17498 \"Mol\"/ \"Mol\"/ 0 0.875472 None\n", "22214 \"Mol\"/ \"Mol\"/ 0 0.849057 None\n", "17812 \"Mol\"/ \"Mol\"/ 0 0.816901 None\n", "22514 \"Mol\"/ \"Mol\"/ 0 0.798701 None\n", "23507 \"Mol\"/ \"Mol\"/ 0 0.720000 None\n", "19071 \"Mol\"/ \"Mol\"/ 0 0.714744 None\n", "19067 \"Mol\"/ \"Mol\"/ 0 0.713948 None\n", "7808 \"Mol\"/ \"Mol\"/ 0 0.711111 None\n", "12405 \"Mol\"/ \"Mol\"/ 0 0.692000 None\n", "21204 \"Mol\"/ \"Mol\"/ 0 0.687764 None\n", "18018 \"Mol\"/ \"Mol\"/ 0 0.657895 None\n", "4167 \"Mol\"/ \"Mol\"/ 0 0.654867 None\n", "8750 \"Mol\"/ \"Mol\"/ 0 0.650000 None\n", "7008 \"Mol\"/ \"Mol\"/ 0 0.630252 None\n", "12990 \"Mol\"/ \"Mol\"/ 0 0.630000 None\n", "8355 \"Mol\"/ \"Mol\"/ 0 0.623377 None\n", "21753 \"Mol\"/ \"Mol\"/ 0 0.602941 None\n", "9773 \"Mol\"/ \"Mol\"/ 0 0.598361 None\n", "3180 \"Mol\"/ \"Mol\"/ 0 0.589474 None\n", "19333 \"Mol\"/ \"Mol\"/ 0 0.576119 None\n", "8083 \"Mol\"/ \"Mol\"/ 0 0.575758 None\n", "22375 \"Mol\"/ \"Mol\"/ 0 0.559633 None\n", "20954 \"Mol\"/ \"Mol\"/ 0 0.555556 None\n", "5359 \"Mol\"/ \"Mol\"/ 0 0.548173 None\n", "20305 \"Mol\"/ \"Mol\"/ 0 0.542601 None\n", "22193 \"Mol\"/ \"Mol\"/ 0 0.542601 None\n", "15618 \"Mol\"/ \"Mol\"/ 0 0.536585 None\n", "12669 \"Mol\"/ \"Mol\"/ 0 0.531056 None" ] } ], "prompt_number": 60 }, { "cell_type": "markdown", "metadata": {}, "source": [ "At first these seem somewhat surprising, but it's just due to the fact that the molecules don't 
generate any fragments. Here is an example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "subset2.ix[21906]['mol1']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAYAAABNcIgQAAAZt0lEQVR4nO3de1jNeeIH8He6qV2X\nFFFU1qhzKjIpl/QY0iJ3nmXIbWYxWAaTMyS7iB0zboWdMRNmLNZ1HrJoMzYZptANTaObhkSI6kws\n3U7n/P4Y+jkjOqX6nMv79TzzePqe7znn3Xe+9e7zOd+LkUqlUoGIiMhANRMdgIiISCQWIRERGTQW\nIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRmcAdkDsLFg\no+gYWo3biAwJi5D0Tm5FLt7PfR92P9rB7LIZOqV1QtCdIBRUFoiOpjW4jYj+H4uQ9EpWWRa8MryQ\nVpqG8E7hSJAk4GvHr1GsKMYn9z8RHU8rcBsRqTMRHYCoIc3Lmwd7U3tclFyEqZFp9fLBLQejSFFU\n6/MLCgrQrl07GBkZNWZMod50GxHpG44ISW8UKgoR+zgWMluZ2i/456xNrF/53P/9738IDg5G586d\nERcX15gxhbr79G69txGRvmIRkt74ufxnqKCCpLlE4+eoVCrs2bMHEokE69atQ2lpKfbu3duIKcW6\nrbpd521EpO9YhGSwysvL0adPH0yfPh35+fnVyw8fPozS0lKByRqHUqmEUqkUHYNI67AISW90Me8C\nAMgsy9Ro/fv37yMxMfGl5SUlJYiKimrQbNogJSUFlsWWADTfRkSGgEVIesPGxAZ+LfywqWATFCrF\nS4/X5UAQfZwe9fb2hntH9wbbRkT6gkVIeuVzh8+RV5EHnywfHJYfxtWnV3H60Wm8l/seQu+Favw6\np06dQnFxcSMmbVqPHz9GXFwcjI2NG2wbEekLFiHpFWlzKZKlyZA2l2Lh7YXwzvTGjFszYNHMAsHt\ngzV+nYqKCuzfv78RkzatnJwcTJkyBQ8fPmywbUSkL4xUKpVKdAgiEW7dugUnJ6dXPt6rVy8kJCQ0\nXaBGVlpaioqKCrRq1Up0FCKtwhEh0SskJiYiM1P3Dyq5efMmvvjiC5iamrIEiWrAIiR6jX379omO\n8MaqqqoQGRkJT09PlJWViY5DpHU4NUoG686dO+jUqdNr13F0dMTNmzf14pJrR48eRe/evWFvby86\nCpFW4YiQDJZcLoelpeVr17l16xZ++OGHJkrU8JKSkrBkyRKUlJRg3LhxsLGxweeffw5PT09s2rQJ\nT58+FR2RSDgWIRmsbt26Yc+ePbWO9nT5nMJOnTpBLpdDIpEgIiICJiYmmD9/Pk6cOIHLly9DKpVC\nLpeLjkkkFKdGyeCtXr0aK1eufOXjLVu2xP3792FhYdGEqRrW1atXERQUhKKiIoSFhWHQoEEAgKys\nLLi4uAhORyQWR4Rk8P72t79h8uTJr3z80aNHOHnyZBMmahjHjh3DkCFDkJGRgR49eiA2NharVq3C\n7NmzMXr0aDx48IAlSAQWIRGMjIywc+dO9O3b95Xr6OL06PDhwzF8+HAMGDAACxcuhFwux9ixY5Ge\nno4hQ4bwVAqiZzg1SvRMQUEBevXqhby8vJceMzU1RX5+Ptq2bSsgWd2Vl5dDqVTCwsIChYWFWLFi\nBY4ePYqVK1figw8+gLGxseiIRFqDI0KiZ2xtbREdHV3jSKmyshIHDx4UkKp+Tpw4AVdXV3z77bew\nsbHBtm3bcPr0aURGRiI7O1t0PCKtwhEh0W9ER0dj5MiRqKqq
Ulvu7e1d422btFVycjIWLVqEsrIy\nbN68Gb6+vqIjEWkljgiJfiMgIADr1q17aXlSUpJOXHJNLpfjwYMH8PLywvnz5zF79myMHz8e8fHx\noqMRaSUWIVENFi9ejNmzZ7+0/F//+peANHVz9uxZuLm5YcOGDVAoFJg1axaysrJeezAQkSHj1CjR\nK1RWVmLo0KGIjY2tXubo6IgbN26gWTPt/hvy+vXrkMlkSE9Px759+9CrVy/RkYi0FouQ6DWKi4vR\np08fXL9+vXrZuXPn0L9/f4GpXi0rKwvFxcXVo7+YmBhIpVJeX5ToNbT7z1oiwdq0aYMTJ07Aysqq\nepk2n1N4584dvPvuu5g0aRJu3boFf39/liBRLViERLVwcXHBoUOHYGJiAgA4fPgwSktLBaeq2aBB\ng5CZmQmpVKoTB/YQaQNOjRJp6Ouvv8bMmTMBAIcOHcKECRMEJ1L33//+F3l5eXj//fe1/jNMIm3C\nnxYiDc2YMQMLFiwAoJ3To+3atcOuXbsQFhYmOgqRTuGIkKgOqqqqMGbMGJw6dQp37tyBra2t6Ehq\nVCoVKioqYG5uLjoKkc7giJCoDoyNjbF//35IpVIcPnxYdJxqX331FbZu3QojIyOWIFEdcURIVA+5\nubmYP3++1tyeqbi4GMXFxXjrrbdERyHSOSxConqKj4/H06dPUVlZKTRH586dIZVKhWYg0mUmogMQ\n6aq8vDyEhYXh1q1bQnM4Ojpi8eLFmDhxotAcRLqKI0Kienj06BGcnZ2xbNkyLFy4UGiWzZs347PP\nPkN2djZatmwpNAuRLuLBMkT18Nlnn6Ft27aYN2+e6CiYP38+bGxsarxjBhHVjiNCojq6fv063N3d\nERUVBX9/f9FxAPx6TdHhw4fjp59+QteuXUXHIdIpLEKiOho3bhwqKytx4sQJ0VHUjBgxAubm5jhy\n5IjoKEQ6hUVIVAenT5/GqFGjcO3aNXTp0kV0HDU5OTlwd3fH8ePHMXjwYNFxiHQGi5BIQwqFAh4e\nHhg2bBg2bNggOk6NZDIZoqOjkZqaWn2RcCJ6PR4sQ6ShiIgIyOVyrFixQnSUV1q5ciWKi4uxfft2\n0VGIdAZHhEQaKCoqQteuXbF+/frqO1Boqx07diA4OBjZ2dmwtrYWHYdI67EIiTSwYMECXLp0CZcu\nXdL6WxwplUr07t0bPj4+2LJli+g4RFqPRUhUi7S0NHh6euLcuXPw8fERHUcjFy5cwDvvvIMrV67A\n3d1ddBwircYiJKqFv78/2rZtiwMHDoiOUicTJ05EYWEhYmJiREch0mosQqLXOH78OAIDA5GZmYmO\nHTuKjlMnd+7cgUQiwYEDBzBy5EjRcYi0lnZ/2EEkUHl5OYKCgiCTyXSuBAGgY8eOWLx4MT766COU\nl5eLjkOktViERK+wZcsWKBQKLF26VHSUegsODoZCocDWrVtFRyHSWpwaJarBvXv34OLigoiICEya\nNEl0nDeyf/9+zJkzB1lZWejQoYPoOERah0VIVIMZM2YgJycH586dEx3ljalUKrzzzjtwdnbGzp07\nRcch0josQqLfSExMRL9+/ZCYmIi3335bdJwGceXKFfTq1QsXLlyAt7e36DhEWoVFSPQClUqFfv36\nwdXVVe9GTzNmzEBGRgbi4+NhZGQkOg6R1uDBMkQvOHDgANLT07F27VrRURrcp59+ivT0dBw8eFB0\nFCKtwiIkeubJkydYsmQJli9fjnbt2omO0+DatWuHkJAQLFmyBE+ePBEdh0hrsAiJnlm/fj1+97vf\nYeHChaKjNJpFixbB0tJSa28jRSQCPyMkApCbmwtXV1ccOnRI76/Ccvz4cUycOBEZGRlwdHQUHYdI\nOBYhNakB2QMwotUIzLacjcDAQHTr1g3du3dHt27d4OzsDFNTUyG5Jk6ciJKSEkRHRwt5/6YWEBCA\n1q1ba/X1U5/vKzJbmego
pOd4C2tqULkVuQi9G4rvHn2HQkUhbE1tMd5qPJbaLoWtqW31elVVVTh5\n8iROnjyp9vwOHTqgZ8+ecHNzg6urK3r27AmpVNqotz46d+4cjh49itTU1EZ7D20TFhYGDw8PnD9/\nHv3792+091EqlcjIyEBKSgrS09Nx7do1hISEoG/fvhrvK0SNjSNCajBZZVnol9UPTmZO+Lj9x3A2\nd8ZDxUPsL96PlsYtsbXT1uq/8meaz4SVlZVGr2tmZoa33npLrSC9vb3Rvn37N85cVVUFLy8v9O/f\n3+Du3bdgwQLExcUhKSkJxsbGb/x69+/fR1JSUnXhpaSkICcnBxUVFWrrff/992jfu73G+wpHhNTY\nOCKkBjMvbx7sTe1xUXIRpkb/P8U5uOVgFCmK6v26FRUVSE9PR3p6utpyKyur6lHj84L09PSEpaWl\nxq/9zTff4M6dO1i1alW98+mq0NBQODs7Y9euXZg5c6bGz3v69CkuX76sVnjp6emQy+Uav0Zj7StE\n9cEipAZRqChE7ONY7HbarfaL7TlrE+sGf0+5XI74+HjEx8dXLzMxMYGDg8NLBVnT9Oovv/yCv/71\nrwgNDdV4dKpPrKyssGrVKixfvhzjx49Hq1at1B5/Pq3528LLy8uDQqGo9/uWGJU0+b5C9DosQmoQ\nP5f/DBVUkDSXCM2hUChw48YN3LhxQ+3zxxYtWsDZ2VmtIBMTE9G6dWvMmjVLYGKxPvjgA2zZsgXb\ntm2Dt7e3WuFlZ2fj8ePHDf6ed5vd1Yp9heg5nkdIBuH5CMbY2BgKhQIqlQpKpVJwKu2hVCqhVCqh\nUCiqPy98k1EfkS7hiJAaRBfzLgCAzLJMeP9O3EWdLSws4OnpWT0l+vxfOzu7l9b19fXFjh07sGXL\nFshkhnlAxubNm1FRUYGgoCBYWFhg8ODBao/fvXu3emr0+b+XL19GaWlpvd/TTvnr/wvR+wrRcyxC\nahA2Jjbwa+GHTQWbMKnNJJgYqe9aRYqiBv3sp1mzZpBIJGqnWbi5ucHBwQEmJprt1hYWFli7di3m\nzp2LyZMnG9y9+u7du4c1a9bgq6++goWFRY3r2NnZwc7ODv7+/tXLFAoF8vLy1KZRr127hszMTI1G\n2a1UrZp0XyGqDU+foAaTUZaBfpn98FbztyCzlcHZ3BkPFA/e+PSJms4t7Nq1K8zMzN448/N79bm4\nuGDHjh1v/Hq6ZObMmcjOzsa5c+ca5G4UFRUVuH79ulo5pqSk4N69e2rrff/992jXu53G+wpPn6DG\nxhEhNRhpcymSpckIvReKhbcXolBRiPam7TGi1QgEtw+u9fm///3v4eHhoVZ6bm5ujXpEp5GRETZv\n3ozevXtj9uzZ8PLyarT30iZJSUnYvXs3EhMTG+yWTGZmZnBzc4Obm5vacrlcrja12qpVqzfeV4ga\nEkeEJMTjx48RGBgId3d3dO/eHe7u7pBIJMIusfbnP/8ZWVlZiIuL0/t79alUKvj6+sLFxQXffPON\n6DhEwrEIiQAUFBTA2dkZ27dvx7vvvis6TqM6ePAg5syZg6ysLNja8lJmRDx9ggiAra0tli1bBplM\nptf36nvy5Ak+/vhjLFu2jCVI9AyLkOiZ56cQbNy4UXSURrNhwwZYWlrio48+Eh2FSGtwapToBceO\nHUNgYKBe3qvv1q1bkEqlOHDgAEaPHi06DpHWYBES/cbQoUPRpk0b7N+/X3SUBjVp0iTI5XKcOnVK\ndBQircIiJPqN9PR09OjRAzExMY16r76mdP78efj7+yM1NRVSqVR0HCKtwiIkqsGHH36I+Ph4JCcn\nN+pNgZtCVVUVvL294evri61bt4qOQ6R1dPsnnKiRrF69Grdv38auXbtER3lj//znP3H79m2EhoaK\njkKklViERDWwsrLCypUrERISgpKSEtFx6q2kpAQhISFYtWqVQd5zkUgTnBoleoWqqiq8/f
bbCAgI\nwLp160THqZclS5bgu+++w+XLl6tvr0RE6liERK8RGxuLgIAApKWlwdnZWXScOsnOzka3bt0QHR0N\nPz8/0XGItBaLkKgWY8aMgUqlwr///W/RUepk1KhRMDY2RmRkpOgoRFqNRUhUixs3bsDV1RXHjh3D\n0KFDRcfRyKlTpzB27Fhcu3YNf/jDH0THIdJqLEIiDQQHB+P48eNITU0VdocMTVVWVsLDwwOjR4/G\np59+KjoOkdbjUaNEGli+fDl++eUXfPnll6Kj1Grbtm0oKSnB8uXLRUch0gkcERJpaNeuXVi8eDGy\ns7NhY2MjOk6NCgsL4ezsjLCwMLz33nui4xDpBBYhkYaUSiX69u0LLy8vfPHFF6Lj1Ogvf/kLUlJS\ncOnSJb2/wTBRQ2EREtXBxYsX0b9/f6SkpKB79+6i46hJTU2Fl5cXfvjhB/Tp00d0HCKdwSIkqqMp\nU6bg7t27iI2NFR1FjZ+fH+zt7bF3717RUYh0CouQqI7y8/Ph4uKCvXv3YuzYsaLjAACOHj2K6dOn\nIzMzE/b29qLjEOkUHjVKVEf29vZYunQpZDIZysrKRMdBWVkZZDIZli5dyhIkqgcWIVE9fPzxx1Aq\nlQgPDxcdBWFhYVCpVJDJZKKjEOkkTo0S1dOBAwcQHh6O3NxcoTkcHBwgk8kwceJEoTmIdBWLkOgN\nZGRk4ObNm0IzmJmZoaKiAsOGDROag0hXmYgOQKSLcnJy0KZNG0ilUkilUtFxEBAQAAcHB7i7u4uO\nQqRz+BkhUT385z//wbfffis6RrXRo0dj1KhRePDggegoRDqHU6NEdVBeXg4zMzOtu2qLXC5Hhw4d\n4O3tjZiYGJibm4uORKQzOCIkqoOtW7eif//++PHHH0VHUWNlZYVhw4YhLi4Os2fPFh2HSKewCInq\nYPHixZg+fbpWTkFOnToVALB7926sX79ecBoi3cGpUSINfPfdd0hISIBMJoOlpaXoODWqqKiAvb09\nCgsL0axZMxw5cgRjxowRHYtI63FESKQBiUSCjIwMSCQSnDlzRnScGpmZmWH8+PEAfr1TRmBgIJKS\nkgSnItJ+HBESvUZ+fj4yMzMxaNAgAL/efcLa2hrOzs6Ck9Xs4sWL8PHxqf7azs4OiYmJvPQa0Wuw\nCIleIyEhAZMnT4a7uzs2bNiArl27io5UKxcXF2RnZ1d/3bNnT5w/f15rp3SJROPUKFENlEolHj16\nhN69e+PatWvw8fGBj48PIiMjRUerVWBgoNrXKSkpmDZtGvg3L1HNWIRENbhw4QIkEgl27twJU1NT\nLFmyBD/99BMGDBggOlqtpk+f/tJ5jkeOHMGqVavEBCLScixCohr4+vri7NmziIyMhKurK6KiomBr\nawsrKyvR0Wrl5OSk9jnhc2vWrMG+ffsEJCLSbixCohdkZGRgyJAhSEtLg4uLC6KiorB69WrMmzcP\nR44cER1PY8/PKXyRSqXCzJkzcfHiRQGJiLQXD5YhekFVVRW2b9+O0NBQjBs3DmvWrIG1tTVKS0vR\nrFkznbl02fNLrpWXl7/0mK2tLRITE+Hg4CAgGZH24YiQCL9eQ/TLL79EVVUV5s6di4yMDJiamsLV\n1RVbt26FqampzpQg8Osl14YPH17jYwUFBQgICEBJSUkTpyLSTixCIgAlJSWIjo6Gm5sbIiMjYWVl\nhS1btuDs2bOIiopCVFSU6Ih1VtP06HPp6emYNGkSqqqqmjARkXbi1CgZvKysLLi4uAAAYmJiEBQU\nBBsbG4SHh8PDw0Nwuvp78ZJrryKTybBhw4YmTEWkfTgiJINWXFyMP/7xj5gyZQry8/Ph7++PK1eu\nYMKECRg6dChmzZqFgoIC0THrxczMDBMmTHjtOhs3bkREREQTJSLSTixCMmht2rRBRkYGOnfujB49\nemDt2rWorKzEnDlzkJmZidatW+P27duiY9bbb6dHO3
bsCCcnJ7X/Nm7ciAsXLghKSCQep0bJYN27\ndw9nzpzB5MmTYWRkhJs3b0Imk+Hq1avYuHEjxo4dKzpig5BIJMjKygIA5ObmwtHRUXAiIu3CESEZ\nrDZt2uAf//gHfHx8kJCQgM6dO+PIkSPYsWMHVqxYAX9/f+Tk5IiO+cZ+e8k1IlLHIiSDMyB7ADYW\nbIS5uTkuXbqEuXPn4k9/+hOmTp2K/Px8+Pn54cqVKxg7dixMTExEx31j06ZNe+mSa7V5vo2IDAGL\nkPRObkUu3s99H3Y/2sHsshk6pXVC0J0gFFSqH/Qil8tRUlKCadOmITMzE46OjujRowfWrFmDyspK\nzJs3D05OTmK+iQbk5OSEfv36qS3TdBsRGQIWIemVrLIseGV4Ia00DeGdwpEgScDXjl+jWFGMT+5/\norauSqWCVCpFREQEmjdvjr///e9ISkpCWloapFIprl69KuabaAQvHjRTl21EZAh4sAzpFf9sfzxU\nPESyNBmmRqZqjxUpimBtYo0B2QMwotUIyGxlSE1NxaJFiyCXyxEeHo6BAwcCAOLi4uDh4YEWLVqI\n+DYaXFFREezs7JCdnY0Z5TPqtI2I9J3ufwBC9EyhohCxj2Ox22n3S7/gAcDaxFrt6/Lycnh4eODs\n2bOIiYnBhx9+iA4dOiA8PBy+vr5NFbtJWFtbY+TIkShWFtdpGxEZAk6Nkt74ufxnqKCCpLlEo/Xv\n37+POXPm4OHDh/D390dycjIGDhyIgQMHIjk5uZHTNr2pU6ciT5lXp21EZAhYhGSwmjdvjoiICDg7\nOyM8PBzGxsYICQlBWloaPD09RcdrcMOGDavz0aNEhoBFSHqji3kXAEBmWaZG69va2uLMmTNwcHBA\nUFAQunTpgj179qB9+/Zo1kz/fjRMTU0hbSEFoPk2IjIE+vfTTgbLxsQGfi38sKlgExQqxUuPFymK\nXlrm5+eHpKQkbNq0CY8fP8b06dORkJDQFHGFcGjpUOdtRKTvWISkVz53+Bx5FXnwyfLBYflhXH16\nFacfncZ7ue8h9F5ojc8xMzNDUFAQsrOz8cknn8Db27uJUzcdc3Pzem0jIn3GIiS9Im0uRbI0GdLm\nUiy8vRDemd6YcWsGLJpZILh98Guf27ZtW4SEhOjltOiL3mQbEekjnkdIREQGTb//9CUiIqoFi5CI\niAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwa\ni5CIiAza/wFf9ulLiNI27wAAAABJRU5ErkJggg==\n", "prompt_number": 66, "text": [ "" ] } ], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "FraggleSim.generate_fraggle_fragmentation(subset2.ix[21906]['mol1'])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 67, "text": [ "set()" ] } ], "prompt_number": 67 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What about how different the compounds are?\n", "\n", "This is repeating the last bit of analysis from http://rdkit.blogspot.ch/2013/10/comparing-fingerprints-to-each-other.html" ] }, { "cell_type": "code", "collapsed": false, "input": [ "nToDo=200\n", "apl 
= sorted(scoredLists['AP'],reverse=True)[:nToDo]\n", "ttl = sorted(scoredLists['TT'],reverse=True)[:nToDo]\n", "avl = sorted(scoredLists['Avalon-1024'],reverse=True)[:nToDo]\n", "rdkl = sorted(scoredLists['RDKit5'],reverse=True)[:nToDo]\n", "fragl = sorted(scoredLists['Fraggle'],reverse=True)[:nToDo]\n", "\n", "idsToKeep=set()\n", "idsToKeep.update([x[1] for x in apl])\n", "idsToKeep.update([x[1] for x in ttl])\n", "idsToKeep.update([x[1] for x in avl])\n", "idsToKeep.update([x[1] for x in rdkl])\n", "idsToKeep.update([x[1] for x in fragl])\n", "\n", "print 'Overall number:',len(idsToKeep)\n", "ids={}\n", "ids['AP']=set([x[1] for x in apl])\n", "ids['TT']=set([x[1] for x in ttl])\n", "ids['Avalon-1024']=set([x[1] for x in avl])\n", "ids['RDKit5']=set([x[1] for x in rdkl])\n", "ids['Fraggle']=set([x[1] for x in fragl])\n", "\n", "\n", "ks = sorted(ids.keys())\n", "for i,k in enumerate(ks):\n", "    for j in range(i+1,len(ks)):\n", "        overlap=len(ids[k].intersection(ids[ks[j]]))\n", "        print ks[i],ks[j],overlap,'%.2f'%(float(overlap)/len(apl))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overall number: 475\n", "AP Avalon-1024 102 0.51\n", "AP Fraggle 77 0.39\n", "AP RDKit5 112 0.56\n", "AP TT 137 0.69\n", "Avalon-1024 Fraggle 68 0.34\n", "Avalon-1024 RDKit5 125 0.62\n", "Avalon-1024 TT 111 0.56\n", "Fraggle RDKit5 82 0.41\n", "Fraggle TT 70 0.35\n", "RDKit5 TT 117 0.58\n" ] } ], "prompt_number": 78 }, { "cell_type": "code", "collapsed": false, "input": [ "nToDo=100\n", "apl = sorted(scoredLists['AP'],reverse=True)[:nToDo]\n", "ttl = sorted(scoredLists['TT'],reverse=True)[:nToDo]\n", "avl = sorted(scoredLists['Avalon-1024'],reverse=True)[:nToDo]\n", "rdkl = sorted(scoredLists['RDKit5'],reverse=True)[:nToDo]\n", "fragl = sorted(scoredLists['Fraggle'],reverse=True)[:nToDo]\n", "\n", "idsToKeep=set()\n", "idsToKeep.update([x[1] for x in apl])\n", "idsToKeep.update([x[1] for x in ttl])\n", 
"idsToKeep.update([x[1] for x in avl])\n", "idsToKeep.update([x[1] for x in rdkl])\n", "idsToKeep.update([x[1] for x in fragl])\n", "\n", "print 'Overall number:',len(idsToKeep)\n", "ids={}\n", "ids['AP']=set([x[1] for x in apl])\n", "ids['TT']=set([x[1] for x in ttl])\n", "ids['Avalon-1024']=set([x[1] for x in avl])\n", "ids['RDKit5']=set([x[1] for x in rdkl])\n", "ids['Fraggle']=set([x[1] for x in fragl])\n", "\n", "\n", "ks = sorted(ids.keys())\n", "for i,k in enumerate(ks):\n", "    for j in range(i+1,len(ks)):\n", "        overlap=len(ids[k].intersection(ids[ks[j]]))\n", "        print ks[i],ks[j],overlap,'%.2f'%(float(overlap)/len(apl))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overall number: 240\n", "AP Avalon-1024 58 0.58\n", "AP Fraggle 18 0.18\n", "AP RDKit5 69 0.69\n", "AP TT 86 0.86\n", "Avalon-1024 Fraggle 16 0.16\n", "Avalon-1024 RDKit5 56 0.56\n", "Avalon-1024 TT 60 0.60\n", "Fraggle RDKit5 24 0.24\n", "Fraggle TT 21 0.21\n", "RDKit5 TT 70 0.70\n" ] } ], "prompt_number": 79 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fraggle really is pulling back different compounds: in the top 100 pairs, its overlap with each of the other fingerprints is only 16% to 24%, whereas the other fingerprints overlap with each other by 56% to 86%." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }