{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing Fraggle to other fingerprints\n",
"\n",
"The Fraggle similarity algorithm from Jameed Hussain and Gavin Harper is available in the RDKit since the 2013_09 release.\n",
"\n",
"The algorithm, which is described here: https://github.com/rdkit/UGM_2013/blob/master/Presentations/Hussain.Fraggle.pdf?raw=true , uses the similarity between fragments of the query molecule and the database molecule and is an interesting complement to standard fingerprint similiarity.\n",
"\n",
"Here I will take a look at Fraggle using the same tools I applied to the other fingerprinting methods in these two posts:\n",
"\n",
"http://rdkit.blogspot.ch/2013/10/fingerprint-thresholds.html\n",
"\n",
"http://rdkit.blogspot.ch/2013/10/comparing-fingerprints-to-each-other.html\n",
"\n",
"## TL;DR Summary\n",
"\n",
"The baseline similarity values for Fraggle are quite high:\n",
"\n",
"
\n",
"Fingerprint | Metric | 90% level | 95% level | 99% level |
\n",
"\n",
" Fraggle | | \n",
" 0.483 | \n",
" 0.538 | \n",
" 0.650 | \n",
"
\n",
"
\n",
"\n",
"As expected from the definition, Fraggle similarity tends to be higher than RDKit5 similarity:\n",
"\n",
"\n",
"This is a nice example of a case where the RDKit5 fingerprint says the molecules are quite dissimilar, but Fraggle provides the expected high similarity score:\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mol1 | \n",
" mol2 | \n",
" Fraggle | \n",
" RDKit5 | \n",
" Fragment | \n",
" FragMol | \n",
"
\n",
" \n",
" \n",
" \n",
" 15634 | \n",
" | \n",
" | \n",
" 0.927711 | \n",
" 0.191693 | \n",
" [*]c1ncnc2[nH]cnc21 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
\n",
"\n",
"Another interesting point about Fraggle is that it pulls back compounds that are quite complementary to the other methods we've looked at. To demonstrate, here is the percent overlap in the top 100 pairs found by Fraggle and a few other fingerprints:\n",
"\n",
"\n",
"\n",
"Fingerprint 1 | Fingerprint 2 | Fraction in common (top 100) |
\n",
"Fraggle | AP | 0.18 |
\n",
"Fraggle | Avalon-1024 | 0.16 |
\n",
"Fraggle | RDKit5 | 0.24 |
\n",
"Fraggle | TT | 0.21 |
\n",
"AP | Avalon-1024 | 0.58 |
\n",
"AP | RDKit5 | 0.69 |
\n",
"AP | TT | 0.86 |
\n",
"Avalon-1024 | RDKit5 | 0.56 |
\n",
"Avalon-1024 | TT | 0.60 |
\n",
"RDKit5 | TT | 0.70 |
\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Move on to actually do the work"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from rdkit import Chem\n",
"from rdkit.Chem import rdMolDescriptors\n",
"from rdkit.Avalon import pyAvalonTools\n",
"from rdkit.Chem import Draw\n",
"from rdkit.Chem.Fraggle import FraggleSim\n",
"from rdkit.Chem.Draw import IPythonConsole\n",
"from rdkit import rdBase\n",
"from rdkit import DataStructs\n",
"from collections import defaultdict\n",
"import cPickle,random,gzip,time\n",
"import scipy as sp\n",
"import pandas\n",
"from rdkit.Chem import PandasTools\n",
"PandasTools.RenderImagesInAllDataFrames()\n",
"from scipy import stats\n",
"from IPython.core.display import display,HTML,Javascript\n",
"\n",
"print rdBase.rdkitVersion\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2014.03.1pre\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Start with finding the baseline similarity value\n",
"\n",
"## read in the molecule pairs and shuffle them so that we have random pairs"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ind = [x.split() for x in gzip.open('../data/chembl16_25K.pairs.txt.gz')]\n",
"ms1 = []\n",
"ms2 = []\n",
"for i,row in enumerate(ind):\n",
" m1 = Chem.MolFromSmiles(row[1])\n",
" ms1.append((row[0],m1))\n",
" m2 = Chem.MolFromSmiles(row[3])\n",
" ms2.append((row[2],m2))\n",
" "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"random.seed(23)\n",
"random.shuffle(ms2)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"t1=time.time()\n",
"sims=[]\n",
"for i,(m1,m2) in enumerate(zip(ms1,ms2)):\n",
" sim,frag= FraggleSim.GetFraggleSimilarity(m1[-1],m2[-1])\n",
" sims.append((sim,i))\n",
" if not (i%200):\n",
" print 'Done: %d in %.2f seconds'%(i,time.time()-t1)\n",
"t2=time.time()\n",
"print 'Finished in %.2f seconds'%(t2-t1)\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cPickle.dump(sims,gzip.open('../data/chembl16_25K.fraggle_randompairs.sims.pkl.gz','wb+'))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Here's the analysis"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sl = sorted(sims)\n",
"np = len(sl)\n",
"for bin in (.7,.8,.9,.95,.99):\n",
" print bin,sl[int(bin*np)]\n",
"hist([x[0] for x in sims],bins=20)\n",
"xlabel(\"Fraggle\")\n",
" "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"0.7 (0.37727272727272726, 11580)\n",
"0.8 (0.4196078431372549, 17489)\n",
"0.9 (0.4826254826254826, 393)\n",
"0.95 (0.5377358490566038, 3077)\n",
"0.99 (0.65, 17818)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
""
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEMCAYAAADNtWEcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9Mm/eBx/G3W3xaf4Q0uQ3Ts7ORBafUhRD3MifV1skd\nJUmpiphScWW3Qtok25HuSn9Ip/VObeGkBe666pSlh5SryI2y3SDaXSHqEotpV/e2tnNaStqtri5u\nj3S2MahZQkNTMpLw3B8sz0J+gLHBEJ7PS7LkPH6+z/f7fGOez+Pn19dmGIaBiIhYzlVz3QAREZkb\nCgAREYtSAIiIWJQCQETEohQAIiIWpQAQEbGopALg7NmzeL1e7rnnHgDq6+txuVx4vV68Xi8HDhww\n521sbMTtdlNQUEB3d7c5vaenh6KiItxuN3V1dTO8GiIiMl1JBcDOnTvxeDzYbDYAbDYbjz32GL29\nvfT29nLXXXcBEA6H6ejoIBwOEwgE2L59O+duM6itraWlpYVIJEIkEiEQCMzSKomISDKmDIBYLMb+\n/fvZunWruTE3DINL3T/W1dVFVVUVdrudvLw88vPzCYVCJBIJhoeH8fl8AFRXV9PZ2TnDqyIiItMx\nZQA8+uijPPPMM1x11Z9mtdls7Nq1i+LiYrZs2cLQ0BAA/f39uFwucz6Xy0U8Hr9outPpJB6Pz+R6\niIjINGVN9uFLL71ETk4OXq+XYDBoTq+treWpp54C4Mknn+Txxx+npaVlRhp07jCTiIgkL5Wn+kz6\nC+C1115j3759LF++nKqqKv77v/+b6upqcnJysNls2Gw2tm7dysGDB4HxPftoNGqWj8ViuFwunE4n\nsVhswnSn0znpilj99fTTT895G+bLS32hvlBfTP5K1aQBsGPHDqLRKH19fbS3t/O1r32NF154gUQi\nYc7z4osvUlRUBEB5eTnt7e2Mjo7S19dHJBLB5/ORm5tLdnY2oVAIwzBoa2ujoqIi5UaLiEj6Jj0E\ndD7DMMzDM3/3d3/H22+/jc1mY/ny5ezevRsAj8dDZWUlHo+HrKwsmpubzTLNzc1s3ryZkZERysrK\n2Lhx4yysjoiIJMtmpPP7YRbYbLa0ftIsFMFgEL/fP9fNmBfUF3+ivvgT9cWfpLrdVACIiFzhUt1u\n6lEQIiIWpQAQEbEoBYCQnb3UvKx3uq/s7KVz3XwRSZHOAcgfr9RKtc/1/yUy13QOQEREpkUBICJi\nUQoAERGLUgCIiFiUAkBExKIUACIiFqUAEBGxKAWAiIhFKQBERCxKASAiYlEKABERi0oqAM6ePYvX\n6+Wee+4B4NixY5SWlrJy5UrWr1/P0NCQOW9jYyNut5uCggK6u7vN6T09PRQVFeF2u6mrq5vh1RAR\nkelKKgB27tyJx+Mxh3dsamqitLSUw4cPU1JSQlNTEwDhcJiOjg7C4TCBQIDt27ebDyiqra2lpaWF\nSCRCJBIhEAjM0iqJiEgypgyAWCzG/v372bp1q7kx37dvHzU1NQDU1NTQ2dkJQFdXF1VVVdjtdvLy\n8sjPzycUCpFIJBgeHsbn8wFQXV1tlhERkbkx5aDwjz76KM888wwnTpwwpw0ODuJwOABwOBwMDg4C\n0N/fz7p168z5XC4X8Xgcu92Oy+UypzudTuLx+GXrrK+vN9/7/X6N+ykicp5gMEgwGEx7OZMGwEsv\nvUROTg5er/eylZ0bGGQmnR8AIiIy0YU7xg0NDSktZ9IAeO2119i3bx/79+/n1KlTnDhxgvvvvx+H\nw8HAwAC5ubkkEglycnKA8T37aDRqlo/FYrhcLpxOJ7FYbMJ0p9OZUoNFRGRmTHoOYMeOHUSjUfr6\n+mhvb+drX/sabW1tlJeX09raCkBraysVFRUAlJeX097ezujoKH19fUQiEXw+H7m5uWRnZxMKhTAM\ng7a2NrOMiIjMjSnPAZzv3KGe7373u1RWVtLS0kJeXh
579+4FwOPxUFlZicfjISsri+bmZrNMc3Mz\nmzdvZmRkhLKyMjZu3DjDqyIiItOhMYFFYwKLXOE0JrCIiEyLAkBExKIUACIiFqUAEBGxKAWAiIhF\nKQBERCxKASAiYlEKABERi5rWncAyP2VnL2V4+Pgc1Z6V1sMAFy1awokTx2awPSKSLN0JvACkdycv\nQHp3Aqdbt/6/RdKjO4FFRGRaFAAiIhalABARsSgFgIiIRSkAREQsSgEgImJRkwbAqVOnWLt2LatX\nr8bj8fDEE08A44O2u1wuvF4vXq+XAwcOmGUaGxtxu90UFBTQ3d1tTu/p6aGoqAi3201dXd0srY6I\niCRryvsAPv30U6699lrOnDnDV77yFb7//e/zi1/8gkWLFvHYY49NmDccDvONb3yDN954g3g8zp13\n3kkkEsFms+Hz+Xjuuefw+XyUlZXx8MMPX3JYSN0HMH26D0DE2mbtPoBrr70WgNHRUc6ePcuSJUsA\nLllZV1cXVVVV2O128vLyyM/PJxQKkUgkGB4exufzAVBdXU1nZ+e0GysiIjNnykdBjI2Nceutt/LB\nBx9QW1vLLbfcwk9/+lN27drFCy+8wJo1a3j22We54YYb6O/vZ926dWZZl8tFPB7HbrfjcrnM6U6n\nk3g8ftk66+vrzfd+vx+/35/a2omILEDBYJBgMJj2cqYMgKuuuopDhw7x8ccfs2HDBoLBILW1tTz1\n1FMAPPnkkzz++OO0tLSk3Zhzzg8AERGZ6MId44aGhpSWk/RVQIsXL+buu+/mzTffJCcnB5vNhs1m\nY+vWrRw8eBAY37OPRqNmmVgshsvlwul0EovFJkx3Op0pNVhERGbGpAFw9OhRhoaGABgZGeHnP/85\nXq+XgYEBc54XX3yRoqIiAMrLy2lvb2d0dJS+vj4ikQg+n4/c3Fyys7MJhUIYhkFbWxsVFRWzuFoi\nIjKVSQ8BJRIJampqGBsbY2xsjPvvv5+SkhKqq6s5dOgQNpuN5cuXs3v3bgA8Hg+VlZV4PB6ysrJo\nbm42HxXc3NzM5s2bGRkZoays7JJXAImISObocdALgC4DFbE2PQ5aRESmRQEgImJRCgAREYtSAIiI\nWJQCQETEohQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIiFqUA\nEBGxKAWAiIhFTRoAp06dYu3ataxevRqPx8MTTzwBwLFjxygtLWXlypWsX7/eHDYSoLGxEbfbTUFB\nAd3d3eb0np4eioqKcLvd1NXVzdLqiIhIsiYNgM985jO8/PLLHDp0iHfeeYeXX36ZX/3qVzQ1NVFa\nWsrhw4cpKSmhqakJgHA4TEdHB+FwmEAgwPbt281Rampra2lpaSESiRCJRAgEArO/diIicllTHgK6\n9tprARgdHeXs2bMsWbKEffv2UVNTA0BNTQ2dnZ0AdHV1UVVVhd1uJy8vj/z8fEKhEIlEguHhYXw+\nHwDV1dVmGRERmRuTDgoPMDY2xq233soHH3xAbW0tt9xyC4ODgzgcDgAcDgeDg4MA9Pf3s27dOrOs\ny+UiHo9jt9txuVzmdKfTSTwev2yd9fX15nu/34/f75/ueomILFjBYJBgMJj2cqYMgKuuuopDhw7x\n8ccfs2HDBl5++eUJn9tstj8OSj5zzg8AERGZ6MId44aGhpSWk/RVQIsXL+buu++mp6cHh8PBwMAA\nAIlEgpycHGB8zz4ajZplYrEYLpcLp9NJLBabMN3pdKbUYBERmRmTBsDRo0fNK3xGRkb4+c9/jtfr\npby8nNbWVgBaW1upqKgAoLy8nPb2dkZHR+nr6yMSieDz+cjNzSU7O5tQKIRhGLS1tZllRERkbkx6\nCCiRSFBTU8PY2BhjY2Pcf//9lJSU4PV6qayspKWlhby8PPbu3QuAx+OhsrISj8dDVlYWzc3N5uGh\n5uZmNm/ezMjICG
VlZWzcuHH2105ERC7LZpy7TnOesNlszLMmzXvjIZtOn6VTPv269f8tkp5Ut5u6\nE1hExKIUACIiFqUAEBGxKAWAiIhFKQBkjmWZNxNO95WdvXSuGy9yRdNVQAvAlX4VUDp167sioquA\nRERkmhQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKKmDIBoNModd9zB\nLbfcQmFhIT/4wQ+A8XF7XS4XXq8Xr9fLgQMHzDKNjY243W4KCgro7u42p/f09FBUVITb7aaurm4W\nVkdERJI15aMgBgYGGBgYYPXq1XzyySf85V/+JZ2dnezdu5dFixbx2GOPTZg/HA7zjW98gzfeeIN4\nPM6dd95JJBLBZrPh8/l47rnn8Pl8lJWV8fDDD180MpgeBTF9ehSEiLXN2qMgcnNzWb16NQDXX389\nN998M/F4HOCSFXZ1dVFVVYXdbicvL4/8/HxCoRCJRILh4WF8Ph8A1dXVdHZ2TrvBC1V29tKUH4om\nIpKKaZ0DOHLkCL29vaxbtw6AXbt2UVxczJYtW8zB4/v7+3G5XGYZl8tFPB6/aLrT6TSDRGB4+Djj\ne8KpvEREpm/SQeHP98knn3Dvvfeyc+dOrr/+empra3nqqacAePLJJ3n88cdpaWmZkUbV19eb7/1+\nP36/f0aWKyKyEASDQYLBYNrLSSoATp8+zaZNm/jmN79JRUUFADk5OebnW7du5Z577gHG9+yj0aj5\nWSwWw+Vy4XQ6icViE6Y7nc5L1nd+AIiIyEQX7hg3NDSktJwpDwEZhsGWLVvweDw88sgj5vREImG+\nf/HFFykqKgKgvLyc9vZ2RkdH6evrIxKJ4PP5yM3NJTs7m1AohGEYtLW1mWEiIiKZN+UvgFdffZUf\n/ehHrFq1Cq/XC8COHTv4yU9+wqFDh7DZbCxfvpzdu3cD4PF4qKysxOPxkJWVRXNzs3misrm5mc2b\nNzMyMkJZWdlFVwCJiEjmaESweSK9Szl1GaiIlWlEMBERmRYFgIiIRSkAREQsSgEgImJRCgAREYtS\nAIiIWJQCQETEohQAIiIWpQAQEbEoBYCIiEUpAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIi\nFjVlAESjUe644w5uueUWCgsL+cEPfgDAsWPHKC0tZeXKlaxfv56hoSGzTGNjI263m4KCArq7u83p\nPT09FBUV4Xa7qaurm4XVERGRZE0ZAHa7nX/5l3/h3Xff5de//jX/+q//ynvvvUdTUxOlpaUcPnyY\nkpISmpqaAAiHw3R0dBAOhwkEAmzfvt0cqaa2tpaWlhYikQiRSIRAIDC7ayciIpc1ZQDk5uayevVq\nAK6//npuvvlm4vE4+/bto6amBoCamho6OzsB6OrqoqqqCrvdTl5eHvn5+YRCIRKJBMPDw/h8PgCq\nq6vNMiIiknnTOgdw5MgRent7Wbt2LYODgzgcDgAcDgeDg4MA9Pf343K5zDIul4t4PH7RdKfTSTwe\nn4l1EBGRFGQlO+Mnn3zCpk2b2LlzJ4sWLZrwmc1m++Og5jOjvr7efO/3+/H7/TO2bBGRK10wGCQY\nDKa9nKQC4PTp02zatIn777+fiooKYHyvf2BggNzcXBKJBDk5OcD4nn00GjXLxmIxXC4XTqeTWCw2\nYbrT6bxkfecHgMjlZaW147Fo0RJOnDg2g+0RyYwLd4wbGhpSWs6Uh4AMw2DLli14PB4eeeQRc3p5\neTmtra0AtLa2msFQXl5Oe3s7o6Oj9PX1EYlE8Pl85Obmkp2dTSgUwjAM2trazDIiqTkDGCm/hoeP\nz0GbReYPm3HuEp3L+NWvfsVXv/pVVq1aZe5tNTY24vP5qKys5He/+x15eXns3buXG264AYAdO3aw\nZ88esrKy2LlzJxs2bADGLwPdvHkzIyMjlJWVmZeUTmiQzcYUTVqQxvs21fVOp2y6
5a/suq34XZOF\nJ9Xt5pQBkGkKgJRKp1E23fJXdt1W/K7JwpPqdlN3AouIWJQCQETEohQAIiIWpQAQEbEoBYCIiEUp\nAERELEoBICJiUQoAERGLUgCIiFiUAkBExKIUACIiFqUAEBGxKAWAiIhFKQBERCxKASAiYlEKABER\ni5oyAB588EEcDgdFRUXmtPr6elwuF16vF6/Xy4EDB8zPGhsbcbvdFBQU0N3dbU7v6emhqKgIt9tN\nXV3dDK+GiIhM15QB8MADDxAIBCZMs9lsPPbYY/T29tLb28tdd90FQDgcpqOjg3A4TCAQYPv27eYo\nNbW1tbS0tBCJRIhEIhctU0REMmvKALj99ttZsmTJRdMvNfxYV1cXVVVV2O128vLyyM/PJxQKkUgk\nGB4exufzAVBdXU1nZ+cMNF9ERFKVlWrBXbt28cILL7BmzRqeffZZbrjhBvr7+1m3bp05j8vlIh6P\nY7fbcblc5nSn00k8Hr/ssuvr6833fr8fv9+fajNFRBacYDBIMBhMezkpBUBtbS1PPfUUAE8++SSP\nP/44LS0taTfmnPMDQEREJrpwx7ihoSGl5aR0FVBOTg42mw2bzcbWrVs5ePAgML5nH41GzflisRgu\nlwun00ksFpsw3el0ptTg+So7e6nZJ6m8REQyLaUASCQS5vsXX3zRvEKovLyc9vZ2RkdH6evrIxKJ\n4PP5yM3NJTs7m1AohGEYtLW1UVFRMTNrME8MDx8HjDReIiKZNeUhoKqqKl555RWOHj3KsmXLaGho\nIBgMcujQIWw2G8uXL2f37t0AeDweKisr8Xg8ZGVl0dzcbO7dNjc3s3nzZkZGRigrK2Pjxo2zu2Yi\nIjIpm3Gpy3nmkM1mu+QVRvPdeNCl0+50yqvuVMtfid81kQulut3UncAiIhalABARsSgFgIiIRSkA\nxMKyUr5sNzt76Vw3XiRtKd8JLHLlO0OqJ5GHh3Xvhlz59AtARMSiFAAiIhalABARsSgFgIiIRSkA\nREQsSgEgImJRCgAREYtSAIiIWJQCQETEohQAIiIWpQAQEbGoKQPgwQcfxOFwmMM+Ahw7dozS0lJW\nrlzJ+vXrGRoaMj9rbGzE7XZTUFBAd3e3Ob2np4eioiLcbjd1dXUzvBoiIjJdUwbAAw88QCAQmDCt\nqamJ0tJSDh8+TElJCU1NTQCEw2E6OjoIh8MEAgG2b99ujlJTW1tLS0sLkUiESCRy0TJFRCSzpgyA\n22+/nSVLlkyYtm/fPmpqagCoqamhs7MTgK6uLqqqqrDb7eTl5ZGfn08oFCKRSDA8PIzP5wOgurra\nLCMiInMjpcdBDw4O4nA4AHA4HAwODgLQ39/PunXrzPlcLhfxeBy73Y7L5TKnO51O4vH4ZZdfX19v\nvvf7/fj9/lSaKSKyIAWDQYLBYNrLSXs8gHMDZMyk8wNAREQmunDHuKGhIaXlpHQVkMPhYGBgAIBE\nIkFOTg4wvmcfjUbN+WKxGC6XC6fTSSwWmzDd6XSm1GAREZkZKQVAeXk5ra2tALS2tlJRUWFOb29v\nZ3R0lL6+PiKRCD6fj9zcXLKzswmFQhiGQVtbm1lGRETmiDGF++67z7jxxhsNu91uuFwuY8+ePcbv\nf/97o6SkxHC73UZpaalx/Phxc/7vfe97xooVK4ybbrrJCAQC5vQ333zTKCwsNFasWGH87d/+7WXr\nS6JJ8xJggJHGK53yqnsu6haZL1L9Ptr+WHjesNlszLMmJWX8PEg67U6nvOqei7qvxO+pLEypbjd1\nJ7CIiEUpAERELEoBICJiUQoAkZRkmffApPLKzl461ysgkv6NYAtJdvZShoePz3Uz5IpwhnROQA8P\nz+zNkyKp0FVAF9R9pV6RorqvvLrn2Z+eXMF0FZCIiEyLAkBExKIUACIiFqUAEBGxKAWAiIhFKQBE\nRCxKASAiYlEKABERi1IAiIhYlAJARMSi0gqA
vLw8Vq1ahdfrxefzAXDs2DFKS0tZuXIl69evZ2ho\nyJy/sbERt9tNQUEB3d3d6bVcRETSklYA2Gw2gsEgvb29HDx4EICmpiZKS0s5fPgwJSUlNDU1ARAO\nh+no6CAcDhMIBNi+fTtjY2Ppr4GIiKQk7UNAFz6AaN++fdTU1ABQU1NDZ2cnAF1dXVRVVWG328nL\nyyM/P98MDRERyby0Hgdts9m48847ufrqq/n2t7/Ntm3bGBwcxOFwAOBwOBgcHASgv7+fdevWmWVd\nLhfxePySy62vrzff+/1+/H5/Os0UEVlQgsEgwWAw7eWkFQCvvvoqN954Ix999BGlpaUUFBRM+Pzc\n4BeXc7nPzg8AERGZ6MId44aGhpSWk9YhoBtvvBGAz33uc3z961/n4MGDOBwOBgYGAEgkEuTk5ADg\ndDqJRqNm2VgshtPpTKd6kStY6iOKaTQxmSkpB8Cnn37K8PAwACdPnqS7u5uioiLKy8tpbW0FoLW1\nlYqKCgDKy8tpb29ndHSUvr4+IpGIeeWQiPWcG1Fs+i+NWiczJeVDQIODg3z9618H4MyZM/z1X/81\n69evZ82aNVRWVtLS0kJeXh579+4FwOPxUFlZicfjISsri+bm5kkPD4mIyOzSkJAX1G3V4QlV95VV\n9zz7s5U5piEhRURkWhQAIiIWpQAQEbGotO4DmG+ys5fqCgkRkSQtqAAY3/ine2JORMQadAhIRMSi\nFAAiV5zU7yLWncRyvgV1CEjEGs7dRZya4WEd6pRx+gUgImJR8/IXwNat35nrJoiILHjz8lEQsCuF\nkgHgZ1j10QCqW3VPp/w8+7OXNKX6KIh5+QsAUvkFMMR4AIiISDJ0DkDEcjQWgYybp78ARGT2pH4V\nka4gWlj0C0BExKIyHgCBQICCggLcbjf/9E//lOnqryDBuW7APBKc6wbMI8G5bsC8MRODoltdRgPg\n7NmzfOc73yEQCBAOh/nJT37Ce++9l8kmXEGCc92AeSQ41w2YR4JzXP/8uQtZAZC+jAbAwYMHyc/P\nJy8vD7vdzn333UdXV1cmmyAiaUl9LOPx8YyH50V4yLiMngSOx+MsW7bM/LfL5SIUCl003+LF90x7\n2adOHeYPf0ireSIy69I5AW2/aBzxhoaGpMsvWrSEEyeOpVR3uo+aT6fu2ZTRAEh2EPiPP34pnVrS\nKJtu+ZmuO/kv98Ja70u5XF8s9PVW3TNlePh40tughVT3ZDIaAE6nk2g0av47Go3icrkmzKM7FEVE\nMiOj5wDWrFlDJBLhyJEjjI6O0tHRQXl5eSabICIif5TRXwBZWVk899xzbNiwgbNnz7JlyxZuvvnm\nTDZBRET+KOP3Adx1113s3LmTrKws9uzZc9l7AR5++GHcbjfFxcX09vZmuJWZM9V9ET/+8Y8pLi5m\n1apVfPnLX+add96Zg1ZmRrL3iLzxxhtkZWXxX//1XxlsXWYl0xfBYBCv10thYSF+vz+zDcygqfri\n6NGjbNy4kdWrV1NYWMgPf/jDzDcyAx588EEcDgdFRUWXnWfa200jw86cOWOsWLHC6OvrM0ZHR43i\n4mIjHA5PmOdnP/uZcddddxmGYRi//vWvjbVr12a6mRmRTF+89tprxtDQkGEYhnHgwAFL98W5+e64\n4w7j7rvvNn7605/OQUtnXzJ9cfz4ccPj8RjRaNQwDMP46KOP5qKpsy6Zvnj66aeN7373u4ZhjPfD\n0qVLjdOnT89Fc2fV//zP/xhvvfWWUVhYeMnPU9luZvwXQDL3Auzbt4+amhoA1q5dy9DQEIODg5lu\n6qxLpi9uu+02Fi9eDIz3RSwWm4umzrpk7xHZtWsX9957L5/73OfmoJWZkUxf/Md//AebNm0yL6L4\n7Gc/OxdNnXXJ9MWNN97IiRMnADhx4gR//ud/TlbWwnvM2e23386SJUsu+3kq282MB8Cl7gWIx+NT\nzrMQN3zJ
9MX5WlpaKCsry0TTMi7Z70VXVxe1tbVA8pcVX2mS6YtIJMKxY8e44447WLNmDW1tbZlu\nZkYk0xfbtm3j3Xff5S/+4i8oLi5m586dmW7mvJDKdjPjMZnsH61xweWgC/GPfTrr9PLLL7Nnzx5e\nffXVWWzR3EmmLx555BGamprMwS8u/I4sFMn0xenTp3nrrbf4xS9+waeffsptt93GunXrcLvdGWhh\n5iTTFzt27GD16tUEg0E++OADSktLefvtt1m0aFEGWji/THe7mfEASOZegAvnicViOJ3OjLUxU5Lp\nC4B33nmHbdu2EQgEJv0JeCVLpi96enq47777gPETfwcOHMButy+4S4mT6Ytly5bx2c9+lmuuuYZr\nrrmGr371q7z99tsLLgCS6YvXXnuNf/iHfwBgxYoVLF++nP/93/9lzZo1GW3rXEtpuzljZyiSdPr0\naeOLX/yi0dfXZ/zhD3+Y8iTw66+/vmBPfCbTFx9++KGxYsUK4/XXX5+jVmZGMn1xvs2bNxv/+Z//\nmcEWZk4yffHee+8ZJSUlxpkzZ4yTJ08ahYWFxrvvvjtHLZ49yfTFo48+atTX1xuGYRgDAwOG0+k0\nfv/7389Fc2ddX19fUieBk91uZvwXwOXuBdi9ezcA3/72tykrK2P//v3k5+dz3XXX8e///u+ZbmZG\nJNMX//iP/8jx48fN4952u52DBw/OZbNnRTJ9YRXJ9EVBQQEbN25k1apVXHXVVWzbtg2PxzPHLZ95\nyfTF3//93/PAAw9QXFzM2NgY//zP/8zSpQvvwXFVVVW88sorHD16lGXLltHQ0MDp06eB1Leb825Q\neBERyQyNCCYiYlEKABERi1IAiIhYlAJARMSiFt790iIXuPrqq1m1apX5766uLj7/+c/PWn3XX389\nn3zyyawtX2Sm6CogWfAWLVrE8PDwJT879/WfyTvNJ6tPZD7RISCxnCNHjnDTTTdRU1NDUVER0WiU\n7du386UvfYnCwkLq6+vNeffv38/NN9/MmjVrePjhh7nnnvHxqj/66CNKS0spLCxk27Zt5OXlcezY\nxWO+PvPMM/h8PoqLiycsV2Q+UADIgjcyMoLX68Xr9bJp0yZsNhvvv/8+Dz30EL/97W/5/Oc/z/e+\n9z3eeOMN3n77bV555RV+85vfcOrUKf7mb/6GQCDAm2++ydGjR81fCg0NDdx555389re/5d577+V3\nv/vdRfV2d3fz/vvvc/DgQXp7e+np6eGXv/xlpldf5LJ0DkAWvGuuuWbC4BhHjhzhC1/4Aj6fz5zW\n0dHB8892g7HPAAABy0lEQVQ/z5kzZ0gkEoTDYc6ePcsXv/hFvvCFLwDjd2L+27/9GwCvvvoqnZ2d\nAGzYsOGSz2jq7u6mu7sbr9cLwMmTJ3n//fe5/fbbZ21dRaZDASCWdN1115nv+/r6ePbZZ3nzzTdZ\nvHgxDzzwAKdOnbrovMCFp8uSOX32xBNP8K1vfWtmGi0yw3QISCzvxIkTXHfddWRnZzM4OMiBAwew\n2WzcdNNN/N///R8ffvghMP4r4VwofPnLX2bv3r3A+J7+8ePHL1ruhg0b2LNnDydPngTGn9f+0Ucf\nZWitRKamXwCy4F3qCp/zpxUXF+P1eikoKGDZsmV85StfAeAzn/kMzc3NbNy4keuuu44vfelLZrmn\nn36aqqoq2trauO2228jNzTWfP39untLSUt577z1uu+02YPzqoB/96EcLejQzubLoMlCRSZw8edI8\nXPTQQw+xcuVK6urqGB0d5eqrr+bqq6/m9ddf56GHHuKtt96a49aKTI9+AYhM4vnnn6e1tZXR0VFu\nvfVW87HUH374IX/1V3/F2NgYf/Znf8bzzz8/xy0VmT79AhARsSidBBYRsSgFgIiIRSkAREQsSgEg\nImJRCgAREYtSAIiIWNT/A2Y3uW6G2JxcAAAAAElFTkSuQmCC\n",
"text": [
""
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Now do the same thing for the related compound pairs."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scoredLists = cPickle.load(gzip.open('../data/chembl16_25K.pairs.sims.pkl.gz','rb'))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"t1=time.time()\n",
"rl=[]\n",
"for i,(m1,m2) in enumerate(zip(ms1,ms2)):\n",
" sim,frag= FraggleSim.GetFraggleSimilarity(m1[-1],m2[-1])\n",
" rl.append((sim,i))\n",
" if not (i%200):\n",
" print 'Done: %d in %.2f seconds'%(i,time.time()-t1)\n",
"t2=time.time()\n",
"print 'Finished in %.2f seconds'%(t2-t1)\n",
"scoredLists['Fraggle']=rl"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Done: 0 in 0.10 seconds\n",
"Done: 200 in 37.79 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 400 in 83.12 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 600 in 133.13 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 800 in 174.72 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 1000 in 226.38 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 1200 in 274.27 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 1400 in 322.24 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 1600 in 366.22 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 1800 in 408.76 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 2000 in 460.27 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 2200 in 504.41 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 2400 in 543.93 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 2600 in 591.81 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 2800 in 635.61 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 3000 in 681.73 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 3200 in 728.26 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 3400 in 771.40 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 3600 in 813.29 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 3800 in 861.38 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 4000 in 906.49 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 4200 in 954.90 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 4400 in 997.52 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 4600 in 1041.82 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 4800 in 1088.03 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 5000 in 1134.05 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 5200 in 1170.79 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 5400 in 1211.28 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 5600 in 1257.09 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 5800 in 1301.07 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 6000 in 1343.60 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 6200 in 1385.13 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 6400 in 1425.09 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 6600 in 1471.67 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 6800 in 1513.88 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 7000 in 1560.01 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 7200 in 1603.64 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 7400 in 1647.56 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 7600 in 1692.30 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 7800 in 1737.26 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 8000 in 1781.66 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 8200 in 1828.17 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 8400 in 1871.50 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 8600 in 1915.69 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 8800 in 1956.71 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 9000 in 1997.96 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 9200 in 2040.47 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 9400 in 2085.69 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 9600 in 2133.86 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 9800 in 2185.82 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 10000 in 2234.24 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 10200 in 2284.10 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 10400 in 2333.33 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 10600 in 2375.41 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 10800 in 2418.13 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 11000 in 2470.55 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 11200 in 2512.55 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 11400 in 2553.32 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 11600 in 2598.75 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 11800 in 2646.64 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 12000 in 2692.88 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 12200 in 2741.21 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 12400 in 2783.86 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 12600 in 2828.30 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 12800 in 2872.25 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 13000 in 2918.02 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 13200 in 2959.99 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 13400 in 3007.89 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 13600 in 3050.15 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 13800 in 3099.61 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 14000 in 3145.79 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 14200 in 3190.81 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 14400 in 3234.20 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 14600 in 3275.04 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 14800 in 3314.82 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 15000 in 3358.80 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 15200 in 3400.57 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 15400 in 3441.54 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 15600 in 3494.32 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 15800 in 3533.18 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 16000 in 3578.51 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 16200 in 3623.28 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 16400 in 3664.12 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 16600 in 3711.36 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 16800 in 3751.84 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 17000 in 3797.13 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 17200 in 3844.04 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 17400 in 3881.47 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 17600 in 3928.48 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 17800 in 3971.64 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 18000 in 4016.54 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 18200 in 4060.79 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 18400 in 4106.77 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 18600 in 4149.58 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 18800 in 4190.75 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 19000 in 4237.42 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 19200 in 4279.87 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 19400 in 4328.97 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 19600 in 4373.51 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 19800 in 4415.70 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 20000 in 4458.43 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 20200 in 4505.40 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 20400 in 4549.35 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 20600 in 4591.15 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 20800 in 4632.82 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 21000 in 4675.24 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 21200 in 4722.26 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 21400 in 4763.14 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 21600 in 4804.33 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 21800 in 4850.55 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 22000 in 4893.26 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 22200 in 4935.05 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 22400 in 4980.35 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 22600 in 5021.81 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 22800 in 5063.18 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 23000 in 5103.84 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 23200 in 5146.18 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 23400 in 5187.49 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 23600 in 5232.10 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 23800 in 5275.02 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 24000 in 5318.88 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 24200 in 5360.90 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 24400 in 5404.27 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 24600 in 5443.92 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Done: 24800 in 5488.28 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Finished in 5535.28 seconds"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cPickle.dump(scoredLists,gzip.open('../data/chembl16_25K.pairs.sims2.pkl.gz','wb+'))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the lists"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"scoredLists = cPickle.load(gzip.open('../data/chembl16_25K.pairs.sims2.pkl.gz','rb'))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def directCompare(scoredLists,fp1,fp2,plotIt=True,silent=False):\n",
"    # the first element of each entry in a scored list is the similarity value\n",
"    l1 = scoredLists[fp1]\n",
"    l2 = scoredLists[fp2]\n",
"    vl1=[x[0] for x in l1]\n",
"    vl2=[x[0] for x in l2]\n",
"    if plotIt:\n",
"        _=scatter(vl1,vl2,edgecolors='none')\n",
"        maxvl1=max(vl1)\n",
"        minvl1=min(vl1)\n",
"        maxvl2=max(vl2)\n",
"        minvl2=min(vl2)\n",
"        # reference line from the (min,min) corner to the (max,max) corner\n",
"        _=plot((minvl1,maxvl1),(minvl2,maxvl2),color='k',linestyle='-')\n",
"        xlabel(fp1)\n",
"        ylabel(fp2)\n",
"\n",
"    # rank correlations (Kendall, Spearman) and linear correlation (Pearson)\n",
"    tau,tau_p=stats.kendalltau(vl1,vl2)\n",
"    spearman_rho,spearman_p=stats.spearmanr(vl1,vl2)\n",
"    pearson_r,pearson_p = stats.pearsonr(vl1,vl2)\n",
"    if not silent:\n",
"        print fp1,fp2,tau,tau_p,spearman_rho,spearman_p,pearson_r,pearson_p\n",
"    return tau,spearman_rho,pearson_r"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Fraggle algorithm makes use of the RDKit5 fingerprint, so let's look at the comparison to that."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"_=directCompare(scoredLists,'Fraggle','RDKit5')\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fraggle RDKit5 0.510174399518 0.0 0.676266099876 0.0 0.734593163378 0.0\n"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEMCAYAAAA1VZrrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8TPf+P/DXZJHFmqARSQiJiIgklEQojRJBW9fFbaOt\nraiiVHdd0ful3C63i6La0pRW9XaLFvlpEVpKLAm1tLUEEaVI0hBksrx/f3w6yZzZM5mcM2fm/Xw8\nzkPOnDNn3omZ857PriEiAmOMMWaBh9IBMMYYc36cLBhjjFnFyYIxxphVnCwYY4xZxcmCMcaYVZws\nGGOMWaVIsnjwwQcRFBSEbt26mTz+ySefID4+HnFxcejbty8OHTokc4SMMcb0KZIsJk6ciKysLLPH\nO3bsiB07duDQoUN48cUX8dBDD8kYHWOMMUOKJIt+/fohICDA7PHk5GQ0b94cAJCUlIRz587JFRpj\njDETnL7N4sMPP8SwYcOUDoMxxtyal9IBWLJt2zasXLkSO3fuNDqm0WgUiIgxxtTPnlmenLZkcejQ\nIUyZMgXr1683W2VFRKrd5s6dq3gMHL/ycXD86tvUHDuR/VMBOmWyOHv2LEaOHIk1a9YgMjJS6XAY\nY8ztKVINNWbMGGzfvh2XL19GWFgY5s+fj4qKCgDA1KlT8fLLL6O4uBjTpk0DAHh7eyMnJ0eJUBlj\njEGhZLF27VqLxz/44AN88MEHMkWjjJSUFKVDqBeOX1kcv3LUHHt9aKg+lVgK0mg09ap/Y4wxd2Tv\nvdMp2ywYY4w5F04WjDHGrOJkwRhjzCpOFowxxqziZMEYY8wqThaMMcas4mTBGGPMKk4WjDHGrOJk\nwRhjzCpOFowxxqziZMEYY8wqThaMMcas4mTBGGPMKk4WjDHGrOJkwRhjzCpOFowxxqziZMEYY8wq\nThaMMcas4mTBGGPMKk4WjDHGrOJkwRhjzCpOFowxxqySPVk8+OCDCAoKQrdu3cyeM2vWLHTq1Anx\n8fHIzc2VMTrGGGOmyJ4sJk6ciKysLLPHN27ciBMnTuD48eNYsWIFpk2bJmN0DW/DBsDTE9BogJAQ\npaNhjNlKo5Fu7kb2ZNGvXz8EBASYPb5+/XqMHz8eAJCUlISSkhJcvHhRrvAa3F13AdXV4ufz54HE\nRGXjYYxZ9/LL+nvXALhfwvBSOgBDhYWFCAsLq9kPDQ3FuXPnEBQUZHTuvHnzan5OSUlBSkqKDBE6\n1uHDSkfAGDPn//0/4MEHxRc7QAtgAYB1AA7DCW+fJmVnZyM7O7ve13HK35aIJPsaMylcP1mola+v\n0hEwxkxJTwfWrdPt5QKYACAMwFY46a3TJMMv0vPnz7frOk7XGyokJAQFBQU1++fOnUOIC1fuu1tR\nljFnV1UFPP20LlFoAcwFkAbgCQDfAmirYHTKcbpkMXz4cHz88ccAgN27d6NFixYmq6DUyjA59Oyp\nTByMMWN79gD+/sCrrwKiNNELwH4AeQDGAXDfb3eyl6XGjBmD7du34/LlywgLC8P8+fNRUVEBAJg6\ndSqGDRuGjRs3IjIyEo0bN8aqVavkDrFBbdokGrkrK4HISMBCxzDGmEzefBN46inxuaxtm1gG4DUA\nY+HOSUJHQ4YNBCqh0WiM2jYYY6wu8vKA3r2B8nLdI/ptEytgrcpJjbcge++dnCwYY26rSROgrAyw\npzTh7Q1otQ0doePZe+90ujYLxhhrSFVVwLJlQECALlHY1zahGy/lLjhZMMbcQlUV8MILgJ8fMH06\nUFJSv55OVVUNE6ezUk9nYcYYs1NRERAVBVy5ontEv20iD+7aHbYuuGTBGHNp69YBQUG6RMHjJuzF\nyYIx5pKqqoCFC8VIbNEllsdN1AdXQzHGXNKMGcB77wE8bsIxOFkwxlzGoUPAzp3AwYO6RMFtE47C\nyYIxpnpVVcDjjwNLlui6tHJpwtE4WTDGVO3DD0WX2A
sXdI9waaIhcLJgjKlSVRWQkKC/JgyXJhoS\n94aS2YkTQL9+QPv2wHPPKR0NY+r0ww/A0KH6iYJ7OjU0nhtKZvHxohFOZ+1a0bWPMWabhQuB55/X\n7SlbmlDhLYjnhlKLo0el+5mZysTBmNpUVADjxgEvvaR7hEsTcuI2C5l5eekGCAmtWysXC2NqUVEB\ntGwJXL0KKF2a0HG3VS65ZCGz+++v/blRI2DiROViYczZrVkDBAeL1etEonCe0oQaq6Dqg0sWMnvj\nDeDsWaCwEJg0CejeXemIGHM+1dXAZ5+JaidxU3aO0oQ74wZumY0eDXz5pfjZwwPIzha9oxhjAhEw\nYACwfbvukbqtXicXjUada1pwA7dKZGfX/lxdDezYoVgojDmlgwd1icK5Z4j18VE6AnlxspBZjx6W\n9xlzV3/+CTzwgK4dz3naJsxxt8WPuBpKZpcvA088ARQUAPfdB0yerHREjClnxw4xbsLTE9i9Gygq\nUk/bRGCg/mJK6mHvvZMbuGUWEAD06SMaublUwdxZYSEwbJhuHWxAbXM6uVu3dy5ZyOzhh3VTJwO+\nvuLbVHy8sjExJrfSUmDvXmDQIECtPZ28vQGtVuko6k5VDdxZWVmIjo5Gp06dsHjxYqPjly9fxpAh\nQ5CQkIDY2Fh89NFH8gfZQHQ9oQDg5k1g40blYmFMbhcuiC9HzZvrqmCdv23CHDX2hKoP2ZNFVVUV\nHnnkEWRlZeHo0aNYu3Ytjh07JjlnyZIl6N69O/Ly8pCdnY0nnngClfrDnlWsQwfpfseOysTBmNzK\nyoAnn9TNjabF6dPO29PJFv7+SkcgL9mTRU5ODiIjIxEeHg5vb2+kp6cj02CCpODgYJSWlgIASktL\n0bJlS3h5uUbzyi231P6s0QBhYcrFwpgc/voLuO02oEkT4JNPADWXJvSVlysdgbxkvwMXFhYiTO8O\nGRoaij179kjOmTJlCu644w60bdsWV69exeeff27yWvPmzav5OSUlBSkpKQ0RskPt3l37MxGwbZto\n8GbMlVy/DmRkiESRnS2WOlVr24Q5ammvyM7ORrb+AC87yZ4sNDbMvrVw4UIkJCQgOzsbJ0+eRGpq\nKg4ePIimTZtKztNPFmoRHw9s3SrdZ8yVVFQAMTHAmTP6j6qrp5MrMfwiPX/+fLuuI3s1VEhICAoK\nCmr2CwoKEBoaKjln165d+Ne//gUAiIiIQIcOHfDbb7/JGmdD+fRT4N57geRksV7wXXcpHRFjjrNj\nBxAXp58onHsUdn34+SkdgbxkL1n07NkTx48fx+nTp9G2bVusW7cOa9eulZwTHR2NH374AX379sXF\nixfx22+/oaOLtAQHBYkJ0hhzNU8/Dbz6qv4jrl2aCAhQOgJ5yZ4svLy8sGTJEqSlpaGqqgqTJk1C\nly5d8N7fgw+mTp2K5557DhMnTkR8fDyqq6vxn//8B4GBgXKHyhizkf74oYZum/D1Fd3OlXb5stIR\nyIsH5THG6s3bW7eoV8POEOvjA3z3HZCa6tDL2oVnnWWMsTooKQGqq+VpmygvB4YOFW1+TF6cLBhj\ndVJaCtx9txiU1qwZEBiYi+pq+cZNVFYCR44A7ds32EvYxMPN7p5cDcUYs8mNG8D+/aJ94sgRwNXG\nTdSVh4c6pynnWWcZYw1mzx6xet2NG7pHXLunky14ug/W4CoqgKIipaNgzDaLFgF9++oShWuOm7Bn\nzIS7TdXDyUJm2dlAq1ZAy5bA4MHuN78MU5f584Fnn9VVt7jGnE6m1JaYbGcw/6nL4zYLmQUEiN4j\nOgsXig8jY87I3x+4ccO92yYsUeEtiLvOqoV+ogCAFSuUiYMxc8aOBVq00CUK1y1NsLrhZKEwT0+l\nI2Cs1rPPAmvWAH/9pcWNG/Vvm2jTxvgx/blE7Xn/N2lS9+c0BHf77HJvKJm1bQucP1+7//zzysXC\nmM6XX4rJ/8TClY
7r6XThgvFjRCJhpKYCmzfX/ZrXrtkdjkO52ySgnCxkFhMjTRadOysXC2Nnz4p1\nsI8fB+QcN0EEbNnSIJeWjbstL8AN3DLz8pIO5Bk9Gvjf/5SLh7mv/fuBpCT9nk4T0FBzOrmiyEhd\nklUXHpSnEoajPt1tTnymrKwsYNYsMdZHowGqqtTb00mjUbY3kqkqNlfGJQuZ6U/l7OMD5OYCXboo\nGxNzD3/9BYSEAGVluke4NGGNl5duNl3TVHgL4q6zarF8uUgWTzwB7N3LiYLJo7QUKCjQJQrXHIXt\naE8/bTlRuBuuhpLZJ58A06eLqqivvwby8gCDpcUZc6jRo0VvJ0F9czq1aiX/QkOJiWKzxFm68MqF\nq6Fk5usrneJj2jRg6VLl4mGuiwgYMwZYtw5w9xli68LPT5TAiouB6Gjg0iXT57VpA/zxh7yxOQJX\nQ6mE4VxQO3cqEwdzTZcvAxcvip9HjtQlCh6FXRc3bojqp8BA0aYYHW36PHerEeBkITPDUZ/duikT\nB3M9CxcCt9wivvEmJgLffMNtE/bSzdcWEiLW7vjtN+PR6O426yxXQ8ksPV33bU90/fvlF6BrV2Vj\nYuq3bx/Qq5f+I+ro6eTj47wzL//wg1hbPD8fePFF0UFAX0QEcOKEMrHVB4+zUAn96Q2IRCM3JwtW\nH3v2AAMH6vbU1Tah1SodgXmDBlk+7izTjsiFq6FkVlws3f/mG2XiYOpXVATMmwfcf7+uS6z62iYa\nNVI6Avt5eysdgby4ZCEzw1GnwcHKxcLU68wZMVWHaMxWV2lCnzNUQXl5icbq7t2B7dttX1fby83u\nnoqULLKyshAdHY1OnTphsZjm0kh2dja6d++O2NhYpKSkyBtgAxowQLo/c6YycTD1mjoVCA/XJQp5\nSxMadeQgkwICTD9eWSlK/Fu32p4oAODcOcfEpRoks8rKSoqIiKD8/HzSarUUHx9PR48elZxTXFxM\nMTExVFBQQEREly5dMrqOAqE7xKxZRKJsQeTjQ3TihNIRMbVYu5Zo8mTd+6ecgJcIaE1ABgHVNe8r\n3kxvvXo5/ppqZO+9U/aSRU5ODiIjIxEeHg5vb2+kp6cjMzNTcs6nn36KUaNGITQ0FADQqlUrucNs\nMMuX1/5cXg4sWqRcLMy5VVcDTz0FREUBQUFigN0HHwBqbJtQmoeH6AJrjZpLTg1N9lq3wsJChOl1\nUA4NDcWePXsk5xw/fhwVFRUYMGAArl69ikcffRRjx441uta8efNqfk5JSVFFdZXhXDNqHAHK5LFs\nGfDaa/qPqLdtQmnV1cD169bPq0uPUrW0WWRnZyM7O7ve17H51z116hRyc3PRtWtXRJsb0mgDjQ2p\nu6KiAgcOHMCWLVtw/fp1JCcno3fv3ujUqZPkPP1koRaenuKNq2Nq2UnGAGkpVI1zOjkzwxUr7dG/\nv2NiaWiGX6Tnz59v13XMVkONGDGi5ufMzEwMHDgQ3333HYYPH45Vq1bZ9WIAEBISggK90S0FBQU1\n1U06YWFhGDx4MPz8/NCyZUv0798fBw8etPs1nYnh+hUG+Y8xAOIb7tGjgLPPEKvWyfSeeKL+nz1z\nc0a5LHONGQkJCTU/9+7dm06dOkVEorG5W7dudjWQEBFVVFRQx44dKT8/n8rLy002cB87dowGDhxI\nlZWVVFZWRrGxsXTkyBHJORZCd2qentIGspEjlY6IOaP9+4mAAwTEEXAnAYWKNxCb2nx9lY+hrltA\nAFFFBdHq1fW7zi23KP0usY+9906bqqG0Wi06dOgAQDQ2e3jY3y7u5eWFJUuWIC0tDVVVVZg0aRK6\ndOmC9/5eEWjq1KmIjo7GkCFDEBcXBw8PD0yZMgUxMTF2v6Yz0a+CAoCbN5WJgzmfy5eBY8eA//1P\ni2XLnL9tQqNR9v3r4WH8ebJFWZkYOZ6RUb/Xr11Eyj2YnRvK09MT/v7+AICbN2/i
7NmzCA4ORnl5\nOXr16oVDhw7JGqghtc4N1aEDcPp07f6XX4rZQZl7W70aGD8eIHK+OZ1SUgBT7aPWVpGTw4gRlmdB\niInRVedJeXqK8oFhsmnRAigpse21PT2V//3tYe+9s84TCZaUlODYsWNITk6u84s5klqThbe39A2W\nng6sXatcPEwZmzcDf/4JDB0qpsL299fi5k3nLE3cf79YtEsuHTqIyfvqS7d2TEPdJjw86jaIz1k0\n2HoWzzzzjGS/RYsW+IYnNLKb4TcRd1v0nQFPPgmkpQFjxwKtWwNBQbm4edN5x03I/WXmzBnHXOfm\nzfonimbNzB+rR228Kln9dTfrT5P6t40bNzZIMO6otFTpCJiciIB33tHtaUE0F5cuOWdPJx172gXU\n9HqWDB0KDB9u+pi7DeAz28C9bNkyLF26FCdPnkQ3vRV6rl69ir59+8oSnDuwtX6UuYaLF3VVF64z\nbsLbG6ioUDoKMfdTz57Ajh3WJyj09xdjnE6dMj7m5ycSQUQE8MwzopowIMC4QdxwITNXZ7bN4q+/\n/kJxcTHmzJmDxYsX19RxNW3aFC1btpQ1SFPU2mZh+G2kRw9g/35lYmENT6sFJkwA1q8XYxIuXdKi\nuto52ybcTY8eYlT3r79KH4+MFGMo/vpLJBV/f9FTzVC3boDC/Xzs4vDFjzQaDcLDw/Huu+8ajbou\nKipCYGBg3aNkRt/CevdWLhbW8N56q7bOv6zMdUoTruDAAdOlA/3V765fNz9NiC3Th7gSs8lizJgx\n2LBhA2699VaTU3TkO6K7ghuaORN44w3xs78/MGuWsvGwhpWVBfCcTo7j6enYHkj1uZa7JQu71uB2\nhpKFWquhAOCLL4CzZ4G77+bpPlxNfj7w6qtinfWiIkAta2Ez88wlqFat1Dnlh8O7zk6ePNnk4wUF\nBeivlhm0nNC+fWJemqefBubPV2c/bWba4cOivnvZMqCoyLnndGK2M9c7y1Q7hiszmywqKirwwAMP\noFrvL3X06FHcfvvtePLJJ2UJzhVNmCBKFVVVYqDTRx8pHRFzBCLRE0d8XHi9CWdiaayELZo0AXx8\nHBOLmplNFqtWrYK/vz/uvfdeVFVVYdeuXUhLS8Pbb7+NCRMmyBiiazEstqqxGMuM/f47UF7OpQln\ndPWq/c/18xNTsfw9dZ1EVJT911Ujq20WM2fORG5uLs6ePYt169YpPs2HjlrbLB59FHj7bfGzr69Y\nvatjR2VjYvY7exa4cQO46648nDgxAUAouG1CWcHB9i0q1q6d+P/U8fcHHn9c/KzRANeuic9uVRXQ\nuLGY9FFvHTfVcPjcUDNnzqy56KeffooePXrULHqk0Wjwtu6OpxC1Jov27aVvyKVLgWnTlIuH2Wf7\ndrGK3XffcU8nV6KbydbHR/x844b5c5csAWbMkC82R3F4svjoo49quswanqLRaDB+/Hg7wnQctSYL\nDw/pfDVDhgCbNikXD6ub3btFu9NvvwGiPWICuDThnjp3Nh7QpwYOH5TH7RINo1kzMTJU59ZblYuF\n1c25c2IpzYoKLk24grAwQG/RzjpzgoksZGVxIsGPPvoIPXr0gL+/P/z9/dGzZ09k1HfFEDf3zTei\nrQIAkpOBl15SNh5m3eHDYnDdiy8CFRV5ABLBPZ3ULzFRLBEQHW1+UkBz8z9pNKYbvV2auSX0Pvro\nI0pISKCtW7dScXExFRUV0ZYtW6hHjx6UkZFh17J8jmQhdKe2bBmRRiOWZQwJIbpyRemImCW1/1/l\nBLxEQGsCMgioVnx5UN4sb7rPmbnNw0Msr5qRQdSsmelzOnQgiogwfeyOO5R+d9rH3nun2WclJibW\nrLutLz8/nxITE+16MUdSa7IwXLN46lSlI2KWBAcTAbkExJMzr4XNW903jYZoyhTL57z9NlH79qYT\nj7e30u9O+9h77zRbDXX16tWadbf1hYeH42p9
Oi67OcOpk/WXWGXO4Y8/RN/6zZu1uHFjLoDBAB4H\nj5tQlzZtgKlTRTfXRo1EF/WgoNrjRMD771u+xscfi8WYiIyPOcO07HIy28Dtq6tYr+MxZpnhur3t\n2ikXC5P66Sfgl1+AuXOBS5f0ezrxDLHOQqMxfeM25cIFYOVKsVa4Vlu7dkWTJmLMhC3On7cvTldk\nNlkcO3ZMsuiRvpMnTzZYQK7OcFnVgweViYPVKi0VvdLE1NTc08lZRUUBx4/X7TkVFcYlAMPSffv2\nQEgIsGuX9HEvLzEwzxxeKe9vx44dM/skU1OWM/u42zq+zujVV3WJgksTSvrnP4GvvzZ//PffHfM6\nMTHSL2nTp4seUf/4h/S8ykrxvjBXEvEye/d0TWZ/3fDwcJOPExE+//xztG/fvqFicmleXtLSRUyM\ncrEwYdcuLk04A0uJwlGmTAFef110WT9wABgwAHjySTHQ0pwOHUSJ5uZN6eM8zuJv165dw+uvv47p\n06dj6dKlqK6uxtdff42uXbvik08+qdeLZmVlITo6Gp06dcLixYvNnrd37154eXnhq6++qtfrOZMW\nLaT7Zmr6mEz+8588bN3K4ybcwapVwIoVQNOmwH//K6Zsad0aCA0FvvvO/POGDDE9JY+7zURrdrqP\nkSNHolmzZkhOTsbmzZtRUFAAX19fvP3220hISLD7BauqqtC5c2f88MMPCAkJQa9evbB27Vp06dLF\n6LzU1FT4+/tj4sSJGDVqlDRwlU73YViDd9ttwI8/KhOLO8rKAjZvBgoLtdi0aQGuXuXShLvw8QH6\n9gUeeUSsiV5YCPzwg/kG80aNxESCiYnAzp3Am29K15/h6T7+duLECRz6ezXyyZMnIzg4GGfOnIGf\nn5/9UQLIyclBZGRkTTVXeno6MjMzjZLFO++8g9GjR2Pv3r31ej1n56h6WGbdp58C998PcNuEeyov\nB7ZuFZstRo8G4uKAkSNNH3e3laXNJgtPvXHunp6eCAkJqXeiAIDCwkKE6c3rGxoaij179hidk5mZ\nia1bt2Lv3r1mG9TnzZtX83NKSgpSUlLqHZ/c4uKUjsD1abXiG+Hy5dw24Qr8/Rt+/et27cSKhw89\n1LCvI4fs7GxkZ2fX+zpmk8WhQ4fQtGnTmv0bN27U7Gs0GpSWltr1grb0pJo9ezYWLVpUU1wyV2TS\nTxZqpdUqHYHr0mrF/D1PPglotVyacBWOThSmks9nn4m1KywNmq1HbbysDL9Iz58/367rmE0WVQ20\nOHRISAgK9KZ6LCgoQGhoqOSc/fv3Iz09HQBw+fJlbNq0Cd7e3hg+fHiDxKSko0eVjsA1bdggJom7\ndo1LE8wyU8ln1ixg3z7xs7e3GIdx9qx0PW53G7BndaU8R6usrETnzp2xZcsWtG3bFomJiSYbuHUm\nTpyIu+++GyMNKg5dpYG7Y0eAxzg61okTonrvxg1eb4I5xscfA+PGSR9r1Mh4gJ8aOLyBu6F4eXlh\nyZIlSEtLQ1VVFSZNmoQuXbrgvb/n+506darcIclKtxKXTo8eysXiio4dA3r00OLmTS5NMPs1bSpd\nuzsqSkzVo1/hYml0tyuSvWThKGotWXh6SpPFqFHAF18oF48rqK4Wk721aAHcd18esrImgEsTrD7a\ntBFzSwGAnx+QmwsMHixdEvkf/xDr06iNvfdOnmxCZob/R/qr5rG602qBpCSgY0ctAgPnIiuLZ4hl\ntktLM/24LlEAYh3uzz6TJgoA2Lix4eJyRpwsZGY4F1RwsDJxqF1FhShRzJgB7NvHq9cx+0yZYn41\nPH233GL8mOGkoK6Ok4XM9KugAPfrUeEIzz4rqgaaNNEiI4PXm2D227pV2g5hSlCQSCqGnVP0Rha4\nBTebN1F5htVQV64oE4faaLXAjh3A888DOTkAkIcbNyaAx00we7VuLabssKZDBzEBqLe3dFyUuyUL\nLlnIzPAN
9q9/KROHmqxbBzRrBqSmAjk5WgBcmmD28/YGZs4Efv5ZjNC+7TbL5+t6PRl+0eNqKNag\nmjWT7uvNfMJMqKgQ00eL/uzcNuHuDD8/1jRuDMTGSmeIragAzp0DIiJEb6azZ0UCMae6GigpAQwX\nCOUpylmDKiyU7r/5pjJxqMX588DNm1yaYEBgoKgSqovnnxeN04aD577+Gli7FrjvPpEs9FfTi4mR\nJo/sbKBPH+O2DXebqofHWcjMsJGsfXvL88+4s6VLgUceyQPRBPC4CWYrw5XtDBcc0xk4ENiyRfrY\n8uVA//6mFyVr1EiaIEJDAb2Zi1SDx1mohGGy6N5dmTicXUmJFjNmzAURlyZY3RgugWqubSE21vix\nwYOB334zfrxxY6CtwdvP1PNdGfeGkhkPyjMtJwf45RegqAh47bU8/PnnBHBPJ9ZQDMc7AaIhW6sV\ns83qa94c+N//RBvF7beLZNS2LfDhh/LE6iw4WSisqEjpCJS3di3wwANAdTXPEMscz3A+NkDs79gh\nfez6dbHMsWFpf9w40RPv/Hlx/NAhoFcv4yWSXR1XQ8nMsBrK1MhQd1JRATz2GFBdzT2dmGP07i3d\nN0wUOkVFxg3mFRWi2kr3eEIC8MIL4udHHxXdbcvKgMxMYNEix8bt7DhZyMywGspwvhl38+qrWly8\nyD2dmH1MdXlNTbXtuWfOiGrgO++UPt6ihVg2oLhYTCCo+0Jn2JPRcN/VcbJQmGHfbXeSl5eHN97g\n0gQzzdqcTY0bS7u86h5r2lQsVmSLoiJgxAhAfzmdI0eAX381rmb65z+l+/fea9truApOFjLTHxwE\niO577kar1WLOnLlITByMK1e4NMGkfHyAW28VjcrmVmH28RHVQfpuvRWYOxd4+mnz3/oNP38A8O67\n0uRy8aK4hqGFC6X7ps5xZdzALTPDN38DrV7rtPLy8nD33RNw8WIoKiq4pxMzNmIE8MwzwKBBxtW2\nOuXl4rOkf3zAAOPzfH2Bnj2Bn34S+97eQKdOQH4+cPOm+Pzl5Rk/z7D7LSBGces7dsy238dVcMlC\nZjdvSvd//VWZOOSm1Woxd+5cDBgwGOfOPY6KCi5NMNPWrRMrSFrrKWiYSF5/Xbq6HQAMHQocOFC7\nf+0acPiwKJUYflHTdadt1Ei0V0RGAi++WHvcsH2Ee0MxWR09qnQEDWvRIiAyMg+tWiUiO3s/NBpu\nm2ANgwj4979r95s0EWtn2zqfVHW1KK20bw8cPCgauf/v/8TCR4BxtdPy5Y6JWy04WSjMlQflff65\nFs8+OxeqAagAAAAeYklEQVQnTw7G1auPIyfnWxQXc2mCyePaNdEAnpFh2wJHgEg4f/whfez4cfHv\ntm3Sx3mlPCYrV50TPy8vD7NnS3s63bzJpQlmn+bNbTtPv02wTx8gIEBM4XH8ONCvn1iZMiZGrJ9t\nav4nT0/RxqGvdWvxr+Ecbq5eK2CIk4XCIiKUjsCxdG0TgwcPRv/+3NOJOYa5EriPjxhA5+sLtGkD\nPPGEWJ9i/Hgxbce994oR2ffdJ5YDqK4WN/kDB4CoKOm1oqLElOVBQdLHp00Dpk83bvQ2bB9xdTzr\nrMwMe0OFhIi59V1BXl4eJkyYgODgUBw6tALnz3OSYPbx8TGeVtwSPz+RFH7+WVQlaTSiVGGpkfye\ne0SV09694rlr14r2irFjgTVrrL9m48ame005O551VqVKS5WOoP50pYlBgwZj3LjHkZ//LScKZhfd\nl6m6JAoAuHED2LWrtocUkfXeVJ6ewMqVYp2Mn38WS6xOniymnzGcYRYQU53ra9y4bjGqHilg06ZN\n1LlzZ4qMjKRFixYZHV+zZg3FxcVRt27dqE+fPnTw4EGjcxQKvd7E27h2a9VK6YjqJzc3l2Jj48nf\n/04CCo1+P954s7S1bEnk60vk4UEUHEz02GMN+3re3uLfwECivDyisWONz+
nbl6isjOiee2ofS0oi\nmjxZet6aNUp/+uxj773TvmfVQ2VlJUVERFB+fj5ptVqKj4+no0ePSs7ZtWsXlZSUEJFILElJSUbX\ncZVk0b690hHZp7y8nF566SVq3bo19euXQUC14jce3pxz8/ExfywmRrrftm39Xisw0PyxsDCiU6eI\ntm8nunRJJItGjUyfW1oq3ufbthF9+y3RjRtEQUHSc3r3VvQjaDd7752yV0Pl5OQgMjIS4eHh8Pb2\nRnp6OjIzMyXnJCcno/nf3R+SkpJwzlUq9U0wbExTg7y8PCQmJmL//v3Iy8vDtWs8boKZZ6lKybBH\n0fnztl2zc2fg22+N2wDfe8989VBBgZhF9qWXxHTkL71kemnUsDAxRgMAUlKAu+4SDehXrkjPO3HC\ntlhdhezTfRQWFiIsLKxmPzQ0FHv27DF7/ocffohhw4aZPDZv3ryan1NSUpCSkuKoMGVDpHQEttNq\ntViwYAGWLVuGhx9+DSdPjsWCBRqjUemM1ZePj+j8ceqU9HE/P2DMGGDBAtFLyfDz89dfwGuvATNn\nml4hr7QU2L5d9Jjq3Nn4eECAGD9hak6qFi2Ay5dr9yMj6/57KSE7OxvZ2dn1vo7syUJjbmYwE7Zt\n24aVK1di586dJo/rJwu1UsMavpcuAWlpecjLm4DmzUPx2mt5mDq1rdvNa8Xkc+edQHq66LGk78YN\nMRVHmzaAqdvCDz8AP/5ofilVnYICIDpa+lhcnHiuuRHfcXHA1q21+8nJ1n8PZ2D4RXr+/Pl2XUf2\naqiQkBAU6N0hCwoKEBoaanTeoUOHMGXKFKxfvx4BAQFyhigrZ18pT6vV4o475iI3dzCIHkdJybeY\nPJkTBbOPrSOp770XWLbM9LHvvhP/JiUZH4uPN57wz5SgIODCBeljPj6WpwYxnOW2uNj667gS2ZNF\nz549cfz4cZw+fRparRbr1q3D8OHDJeecPXsWI0eOxJo1axCplrKenUzVmToLXdvE6dO83gSrv4ED\nRVdVf38xaZ/h8qX6oqLMT9SnWwNm4ULRxdXLS1QfzZkDPPUU8PjjluPw8hID9gzbUnJzxfxP8+eb\n/hJnGG9CguXXcTkObmi3ycaNGykqKooiIiJo4cKFRES0fPlyWr58ORERTZo0iQIDAykhIYESEhKo\nV69eRtdQKPR6M9Xzwtno93T6978zKDmZezrxZnpr3tz6OY8/TrR6NVFlJdG1a0QrVxK1ayeOaTTG\n52s0RIsXix5LAQHSY15eRF99Jd6nfftKj332We17eNs2ohUrzMd0112iN5O54926EWm10s9FWRnR\nP/5B1KED0dSpRFVVsn0kHcree6d9z3ICrpIsvLyUjkjqyy9zqV27eEpJuZO+/bbQYrdH3txna9LE\n+DEvL6KJE22/hq9v3V7z+++JysuJfviBaM8e8TMR0V9/EQ0ZYnz+8OHG7+d580xf+847jbvCGm6/\n/iq91gcf1B7z9hYJSY04WaiEqQ+cMygvL6f773+JgNYE8LgJ3qxvpkoFjtyWLTP9Xp092/T5ERGm\nzz98WJQ62rQR5wUEEP30k+Xfp2lTor+HetXo00d6/qRJjv0MysXeeyevlKew6mqlIwCWLMnDE09M\nQEVFKETbBE/V4c4aNbKtLY3I/LHgYNGIbGoVOltjGDTI9DFzS6aePy/matKNkdDp2lVsQ4cCv/8O\ndOwoGrK9vKS9piZNAnbvFoscvfaa8Uy3bdpY3nd5Dk5aslFr6Ka+zSilvLycnnmGSxO81W7+/kTP\nPlu77+lp33UCA8VoaVvONWz36NyZ6MABw/dq7c+ffWb+Wrt32/7+/+ST2uk/Rowgqq62fH5BgZj2\nw9dXtHlcu2b7azkTe++dPJGgwgyXapSLrqfT1q3c04nVun5dfLv+/ntgyRLglVekx/XG01pUVARM\nnGh9HQp/f+OpvpOSxCp0+/aJQXlduo
hurf36iW6xOTmmr+XrC7RrZ1t8gJi2vKREjCP6+mvTA/H0\ntW4tYtBt/v62v5ZLcHDSko1aQzf8JnTLLfK+vn5Pp4yMDOrcmUsTvEm3mBjpe+bLL4lmzSL6+GOi\n11+v37UbN5buR0ZaPnfQIOljTz1FNHq09DEPD6IuXYg2bDD/vt+1i+jJJ4neeUf0yrLHzJnS1126\n1L7rKM3eeye3WShMzkF5uvUm/P1DMXhwHo4fb2u0+hdzHr6+UGQqlTZtxBgEHx+xP3Kk2ABRxz9v\nnuWFfzw9IRm02by5mIajZUvp/Eoajfi2XlgoRmYbKisznuGgqEiUCL74ovax6mpRoggLA44cEe0T\n+vbtA26/XSyxCgCHD9u3frZhiSYnR0w54i64GkphcjRwa7VaPPfcXPTpMxg3bz6O3bu/xSeftMX/\n/V/d1w1g8mmoRBEcbPn41q3As8+aPublBQwZYvn5gYHS/QceEMngn/+UPk4k1pHQTyyGVUFxcdJj\nI0aI64wYIT1v3z5xbmwsMGOG9NimTbWJAgAM5i21Wb9+lvddnoNLOLJRa+iGRe2AgIZ9vdzcXIqP\nj6fQUF5vwtm3pk2tn6PRiO7W9lzf11dUJ1maxlu3BQSI9Rv+/FP6frp+nejHH4mio80/9+mniebM\nEY3BDz8sBrMREa1fb7677ZIlRJ9+SvTCC9LH/fyk+7NmiWvt3Wt8TH/7/ffamNeulR7r18++z9Lp\n02IwoZcXUUJC7e+lNvbeO+17lhNwlWTxj380zOsYtk1ERnLbhNxbly5EzZrZPhjNy4to2jRxkzUc\nuRwYWPdBbbZukyebv/Hefnvte+riRdFTCajtRaS/aTREzz1n3Kto82bRg6m4mCgri+jRR8WiX7rn\ntWhB9Mcf4ty33pJe08NDuj9lSu11f/2V6P33iT76yDgWwwF1L75IFBVFlJpKdOaMfZ8pw7aSF1+0\n7zpK42ShEoZv6m7dHP8autLEnXfeSYWFhURElJys/M3TVbf27U1/Y966tfb/5LvvxLftOXPE6GFT\ni+60bVt7/okTYkRyv35En39OlJtb97j8/MSXEUvneHsT/fvf5o/7+tbGZPit39TWtSvR+fO1z5k1\nq/ZYp05ERUXi8VOnxOjvMWOI9u+vPf/4cWnJKTCw9m/r4UE0cqTowmrIsOF85cr6f44MGX6GHnzQ\n8a8hB04WKmH44WrSxHHXNixNVOt9xTP8MPEmNkeMQvbxEQlD/7E2baT/N/n50m/To0ZJX7t5c6Jf\nfjH/f3v1qnR6Co1GLAn6yCPiC8fttxvHdffdIjbDb+e6rV07oh07iLZsMf+7DRwo4hozRiQCW/4e\nzZqJaiKt1vi1V6+2/B7+8Ufr1+/c2bhHk+FYjbfeqssnxzbLltVe38tLTEeiRpwsVMLwje+oNgvD\n0sT69aLYPHMm0fz5yt+UDbdGjYgeesj6eYbVHRoNUY8eYqI5R9zomzcnio+3fI5hlZCpLThYuj9n\njvT/55VXpMeDgsSynv/7nyhF2GLECOk1HnhAevy//xXtHgEB0oF1+lvTpiKxjB9PdPly7XOff158\noQgNFe+XkSOJpk8XJYDWrWufrxuk5+srzgkJMV0lde+9ojqqWTPp499+a/l3/O032/5f9Usvhn/f\n9u2JLlyw7W9qKDNT/A1atyb6e15Tiawsov/8R8xVpVacLFTC8E3fsWP9rmdYmsjOrqb09Iaft6cu\nm6lvts2bi1lFDW+yhlt6OtHCheJbsP51HnhAfJg7dhSNjXfdZfr5uuoeH5/auYEMt+nT6/f7xcZK\nq2gaNyY6dEj6//T++9LnGI5lsMU//ym9xogR5s/99FPbb7SW5OQYP3/16to2BiKRdPr3l54zbpw4\ntn69SFAajWhvsDZKmkg0duuS3j33GL9++/ZEFRXGz/v5ZzEm5MoV238/fSUl0rYbDw+iY8dqj5eV\niZ
Jcv34iodryuzgjThYqYfjGNzf5mS0MSxM7d9o/PYOjt5YtiXr1InrzTaJ9+8TvqX+8Z0/xO5w5\nI74F61fR6DaNprao/9pr0mNNm0r/FqYGi2k04sP/yCPinCeeMD6nRQvRHjBrVm0JwlR7guHm6Ska\noxctqq2Hz8wU3+71bzA6FRXi27aHh2ibsOeb6fbtYjoOQPy7fbv5c69cEVNpG8bdo0fdpta+ckXa\ne6pVK+MJ9ohEg3JoqDgnPJzo5MnaY1VVRDdu2P6ahr76SiSa224TpWX9nk6OZGp6Ev12p2nTpMfe\nfrth4mhonCxUwvDN2Lp13a9hrm3ClmodR2zWpnbWLwEsWKCLWXwj02hEIjG80X3zjfQajRqJKhqd\nzEzp8fbtRRxhYeLYggWWY/r2W6JVq0wf8/ER9fa6pKW76eknBt3PTZqIqj1LN2pLTH0jrotTp4i+\n/lr8a82ffxK9955IYLNni6omXWKriwMHRClm5EhRdWbOzZsiLv15nNSkqkpaQurcWTr/U1KS9H0x\ncaJysdYHJwuVMLxRRUfX7fmmejrphIcb37QffJDo5Zdrv5E6anvsMVHXr+u50qeP6PEzbpz0vPBw\nEVt1NdEdd9Q+3qGD6EqpLytLnDNqlOlv56+8Ij7AvXtLq9n8/ERdt+7312iMq74++EBc46WXxDd7\nwwZ/w4VwWrQQiUPXVVR/s9QQzdTt+nXRkP3mm8bVWXPmSN8HGRnKxFhfnCxUwrAxcNo0255nqadT\ncbH4Vmyqr/xff4lzsrPJoQsZpaVJ9ydPFq/zzjvSx5OSxOOFhcbXsDSXjyWmeu+cOiWqRzZvFolm\n8uTaY8HBxvX0hkktMVG67+cnzjt9WlqyaNTI/sZTpm6VlaJjxX33NUzXXLlwslCJ+++XfvO3ZUpl\nS6UJIuPisf7Wv7+48YWFmT4eH080Y4ZoE7jrLtGW8Mgjom7bUrIYOlS6HxsrYtFqRTdLHx9Rajp8\nWDxeViYdoazREB08aN/f8OpV6Tf+fv2M6+Grq8X4hHffFYnKUEGBaGQGRJfQn36Sdr/UX9gmI0M0\njrdtS7RunX0xM+Ys7L13av5+supoNBqoMfTycuD118UEaffcAwwYYP5crVaLBQsWYNmyZXjttdcw\nduxYaAwmzykpEYvV26NnT2D9etNzBV2/DixYIOIcNUrMrfP112Iuq7vvFvP03Hdf7fkPPQS8957l\n1/v+e2DsWDEltL8/8MYbwJQp9sV+5QqwerWY7G7CBMDPz77rlJaKhXAAsTDOV1+JifTGjQM8eOY0\n5oLsvXdysnBSuhliQ0NDsWLFCrRta3r1uupqoH174Nw5sa/RiO/GgFgH4Nix2nN1s5j6+4ubYlpa\n/WJcsQLYuBGIjhYzkfr6Wj6/qAgICamdIM/DA/j1V6BTp/rFwRizHScLF2FLacLQkSPA7NniW/Jj\njwHx8eLGfeiQmFq6ulrcmFetAqKiRHKxNvNoQzh+XLy+vu3bgf795Y+FMXfFycIF2FqaqIvdu8XW\nqxfQt68DgqyHqiogJQX46Sex36WLmFra7VYcY0xBnCxUzJ7ShFpdvw5kZABarWgXsLe9hTFmH3vv\nnYo04WVlZSE6OhqdOnXC4sWLTZ4za9YsdOrUCfHx8cjNzZU5woZ1+DCQlSVWD9Othb1//37k5eVh\n3LhxLpsoAFGKmDYNePRRThSMqUo9emDZpbKykiIiIig/P5+0Wi3Fx8fT0aNHJeds2LCBhg4dSkRE\nu3fvpiRdZ309CoTuEMuW6QaUlVOLFi9Ry5bG4yYYY6yh2HvvlL1kkZOTg8jISISHh8Pb2xvp6enI\nNFjncP369Rg/fjwAICkpCSUlJbh48aLcoTaIl18GiE4CSERJyX48/LDrlyYYY+rnJfcLFhYWIiws\nrGY/NDQUe/bssXrOuXPnEBQUJDlv3rx5NT+npKQgJSWlQWJ2JDEe
oDWAZwCko00bThKMsYaTnZ2N\n7Ozsel9H9mRh6zdoMmiAMfU8/WShFkuXAqNHN8O1a2OQkgJMmqR0RIwxV2b4RXr+/Pl2XUf2ZBES\nEoKCgoKa/YKCAoSGhlo859y5cwgJCZEtxoaUlgZcvAgUFwNt24pBdIwx5uxkb7Po2bMnjh8/jtOn\nT0Or1WLdunUYPny45Jzhw4fj448/BgDs3r0bLVq0MKqCUjN/fzGSmRMFY0wtZC9ZeHl5YcmSJUhL\nS0NVVRUmTZqELl264L2/JxaaOnUqhg0bho0bNyIyMhKNGzfGqlWr5A6TMcaYHh6UxxhjbkRVg/IY\nY4ypCycLxhhjVnGyYIwxZhUnC8YYY1ZxsmCMMWYVJwvGGGNWcbJgjDFmFScLxhhjVnGyYIwxZhUn\nC8YYY1ZxsmCMMWYVJwvGGGNWcbJgjDFmFScLxhhjVnGyYIwxZhUnC8YYY1ZxsmCMMWYVJwvGGGNW\ncbJgjDFmFScLxhhjVnGyYIwxZhUnC8YYY1bJmiyKioqQmpqKqKgoDB48GCUlJUbnFBQUYMCAAeja\ntStiY2Px9ttvyxmibLKzs5UOoV44fmVx/MpRc+z1IWuyWLRoEVJTU/H7779j4MCBWLRokdE53t7e\n+O9//4sjR45g9+7dePfdd3Hs2DE5w5SF2t9wHL+yOH7lqDn2+pA1Waxfvx7jx48HAIwfPx7ffPON\n0Tlt2rRBQkICAKBJkybo0qULzp8/L2eYjDHGDMiaLC5evIigoCAAQFBQEC5evGjx/NOnTyM3NxdJ\nSUlyhMcYY8wMDRGRIy+YmpqKCxcuGD2+YMECjB8/HsXFxTWPBQYGoqioyOR1rl27hpSUFLzwwgsY\nMWKE0XGNRuO4oBljzI3Yc9v3cnQQ33//vdljQUFBuHDhAtq0aYM//vgDt9xyi8nzKioqMGrUKDzw\nwAMmEwVg3y/LGGPMPrJWQw0fPhwZGRkAgIyMDJOJgIgwadIkxMTEYPbs2XKGxxhjzAyHV0NZUlRU\nhHvuuQdnz55FeHg4Pv/8c7Ro0QLnz5/HlClTsGHDBvz000/o378/4uLiaqqaXnnlFQwZMkSuMBlj\njBkilbhy5QoNGjSIOnXqRKmpqVRcXGx0ztmzZyklJYViYmKoa9eu9NZbbykQqdSmTZuoc+fOFBkZ\nSYsWLTJ5zsyZMykyMpLi4uLowIEDMkdombX416xZQ3FxcdStWzfq06cPHTx4UIEozbPl709ElJOT\nQ56envTll1/KGJ1ltsS+bds2SkhIoK5du9Ltt98ub4BWWIv/0qVLlJaWRvHx8dS1a1datWqV/EGa\nMXHiRLrlllsoNjbW7DnO/Lm1Fr89n1vVJIunnnqKFi9eTEREixYtomeeecbonD/++INyc3OJiOjq\n1asUFRVFR48elTVOfZWVlRQREUH5+fmk1WopPj7eKJ4NGzbQ0KFDiYho9+7dlJSUpESoJtkS/65d\nu6ikpISIxM1BbfHrzhswYADdeeed9MUXXygQqTFbYi8uLqaYmBgqKCggInHzdRa2xD937lyaM2cO\nEYnYAwMDqaKiQolwjezYsYMOHDhg9mbrzJ9bIuvx2/O5Vc10H2oco5GTk4PIyEiEh4fD29sb6enp\nyMzMlJyj/3slJSWhpKTEapdiudgSf3JyMpo3bw5AxH/u3DklQjXJlvgB4J133sHo0aPRunVrBaI0\nzZbYP/30U4waNQqhoaEAgFatWikRqkm2xB8cHIzS0lIAQGlpKVq2bAkvL4f3ubFLv379EBAQYPa4\nM39uAevx2/O5VU2yUOMYjcLCQoSFhdXsh4aGorCw0Oo5znLDtSV+fR9++CGGDRsmR2g2sfXvn5mZ\niWnTpgFwni7ZtsR+/PhxFBUVYcCAAejZsydWr14td5hm2RL/lClTcOTIEbRt2xbx8fF466235A7T\nbs78ua0rWz+3zpHG/2ZpjIY+
jUZj8UN97do1jB49Gm+99RaaNGni8DhtZeuNhwz6GDjLDasucWzb\ntg0rV67Ezp07GzCiurEl/tmzZ2PRokXQaDQgUS0rQ2TW2RJ7RUUFDhw4gC1btuD69etITk5G7969\n0alTJxkitMyW+BcuXIiEhARkZ2fj5MmTSE1NxcGDB9G0aVMZIqw/Z/3c1kVdPrdOlSzkGqMhl5CQ\nEBQUFNTsFxQU1FQZmDvn3LlzCAkJkS1GS2yJHwAOHTqEKVOmICsry2LRV262xL9//36kp6cDAC5f\nvoxNmzbB29sbw4cPlzVWQ7bEHhYWhlatWsHPzw9+fn7o378/Dh486BTJwpb4d+3aheeffx4AEBER\ngQ4dOuC3335Dz549ZY3VHs78ubVVnT+3DmtRaWBPPfVUTY+KV155xWQDd3V1NY0dO5Zmz54td3gm\nVVRUUMeOHSk/P5/Ky8utNnD//PPPTtVQZkv8Z86coYiICPr5558VitI8W+LXN2HCBKfpDWVL7MeO\nHaOBAwdSZWUllZWVUWxsLB05ckShiKVsif+xxx6jefPmERHRhQsXKCQkhK5cuaJEuCbl5+fb1MDt\nbJ9bHUvx2/O5VU2yuHLlCg0cONCo62xhYSENGzaMiIh+/PFH0mg0FB8fTwkJCZSQkECbNm1SMmza\nuHEjRUVFUUREBC1cuJCIiJYvX07Lly+vOWfGjBkUERFBcXFxtH//fqVCNcla/JMmTaLAwMCav3ev\nXr2UDNeILX9/HWdKFkS2xf7qq69STEwMxcbGOkVXcX3W4r906RLdddddFBcXR7GxsfTJJ58oGa5E\neno6BQcHk7e3N4WGhtKHH36oqs+ttfjt+dzKOiiPMcaYOqmmNxRjjDHlcLJgjDFmFScLxhhjVnGy\nYIwxZpVTjbNgTGmenp6Ii4ur2c/MzES7du0a7PWaNGmCa9euNdj1GXMU7g3FmJ6mTZvi6tWrJo/p\nPiqOHKlr6fUYcyZcDcWYBadPn0bnzp0xfvx4dOvWDQUFBZg+fTp69eqF2NhYzJs3r+bcjRs3okuX\nLujZsydmzZqFu+++GwBw6dIlpKamIjY2FlOmTEF4eLjJ5YRfffVVJCYmIj4+XnJdxpwBJwvG9Ny4\ncQPdu3dH9+7dMWrUKGg0Gpw4cQIzZszA4cOH0a5dOyxYsAB79+7FwYMHsX37dvzyyy+4efMmHn74\nYWRlZWHfvn24fPlyTQlk/vz5GDRoEA4fPozRo0fj7NmzRq+7efNmnDhxAjk5OcjNzcX+/fvx448/\nyv3rM2YWt1kwpsfPzw+5ubk1+6dPn0b79u2RmJhY89i6devw/vvvo7KyEn/88QeOHj2KqqoqdOzY\nEe3btwcAjBkzBitWrAAA7Ny5s2ZK/bS0NJPz8GzevBmbN29G9+7dAQBlZWU4ceIE+vXr12C/K2N1\nwcmCMSsaN25c83N+fj5ef/117Nu3D82bN8fEiRNx8+ZNo3YMw6ZAW5oGn332WTz00EOOCZoxB+Nq\nKMbqoLS0FI0bN0azZs1w8eJFbNq0CRqNBp07d8apU6dw5swZAKL0oUsgffv2xeeffw5AlCCKi4uN\nrpuWloaVK1eirKwMgFgv4dKlSzL9VoxZxyULxvSY6umk/1h8fDy6d++O6OhohIWF4bbbbgMA+Pr6\nYunSpRgyZAgaN26MXr161Txv7ty5GDNmDFavXo3k5GS0adOmZs0G3Tmpqak4duwYkpOTAYheUmvW\nrHGq1fuYe+Ous4w5SFlZWU2V1YwZMxAVFYVHH30UWq0Wnp6e8PT0xM8//4wZM2bgwIEDCkfLWN1w\nyYIxB3n//feRkZEBrVaLHj16YOrUqQCAM2fO4N5770V1dTUaNWqE999/X+FIGas7Llkwxhizihu4\nGWOMWcXJgjHGmFWcLBhjjFnFyYIxxphVnCwYY4xZxcmCMcaYVf8fnPuLke8Q55sAAAAASUVORK
5C\nYII=\n",
"text": [
""
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's an interesting shape...\n",
"\n",
"### Let's take a look at the points where the Fraggle similarity is high but the RDKit5 similarity is low.\n",
"\n",
"We'll get ready by loading the data into a pandas DataFrame."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pandas.DataFrame(index=range(len(ms1)),columns=['mol1','mol2','Fraggle','RDKit5'])\n",
"df.mol1 = [x[1] for x in ms1]\n",
"df.mol2 = [x[1] for x in ms2]\n",
"df.Fraggle = [x[0] for x in scoredLists['Fraggle']]\n",
"df.RDKit5 = [x[0] for x in scoredLists['RDKit5']]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 31
},
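{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now select the subset of pairs with RDKit5 similarity below 0.2 and Fraggle similarity above 0.8, sorted by decreasing Fraggle similarity:"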
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subset = df[(df.RDKit5<0.2)&(df.Fraggle>0.8)].copy()\n",
"subset.sort(columns=['Fraggle'],ascending=False,inplace=True)\n",
"len(subset)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 55,
"text": [
"62"
]
}
],
"prompt_number": 55
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add the fragment that Fraggle is using to each row:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"frags = []\n",
"for row in subset.itertuples():\n",
"    # itertuples() yields (index, mol1, mol2, ...)\n",
"    m1 = row[1]\n",
"    m2 = row[2]\n",
"    sim,frag= FraggleSim.GetFraggleSimilarity(m1,m2)\n",
"    frags.append(frag)\n",
"mfrags = [Chem.MolFromSmiles(x) for x in frags]\n",
"subset['Fragment']=frags\n",
"subset['FragMol']=mfrags"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 56
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subset"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>mol1</th><th>mol2</th><th>Fraggle</th><th>RDKit5</th><th>Fragment</th><th>FragMol</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2768</th><td></td><td></td><td>1.000000</td><td>0.198157</td><td>[*]C(F)(F)Cl.[*]C(F)(Cl)C(F)(F)F</td><td></td></tr>\n",
"<tr><th>2937</th><td></td><td></td><td>1.000000</td><td>0.128205</td><td>[*]C[Se](=O)O</td><td></td></tr>\n",
"<tr><th>7696</th><td></td><td></td><td>1.000000</td><td>0.157738</td><td>[*]c1ncnc(N)c1[*]</td><td></td></tr>\n",
"<tr><th>21156</th><td></td><td></td><td>1.000000</td><td>0.184080</td><td>[*]CCC.[*]c1c2ccccc2nc2ccccc12</td><td></td></tr>\n",
"<tr><th>3347</th><td></td><td></td><td>1.000000</td><td>0.104478</td><td>[*]CC(C)(C)CO.[*]C(C)(C)CO</td><td></td></tr>\n",
"<tr><th>6534</th><td></td><td></td><td>1.000000</td><td>0.071942</td><td>[*]CCNC.[*]CNC</td><td></td></tr>\n",
"<tr><th>10494</th><td></td><td></td><td>0.969231</td><td>0.079365</td><td>[*]CCCCCCCCCCCCCCCCC</td><td></td></tr>\n",
"<tr><th>23207</th><td></td><td></td><td>0.964602</td><td>0.164706</td><td>[*]SCCO.[*][N+](=O)[O-]</td><td></td></tr>\n",
"<tr><th>6250</th><td></td><td></td><td>0.952096</td><td>0.172185</td><td>[*]c1ccccc1[*]</td><td></td></tr>\n",
"<tr><th>24245</th><td></td><td></td><td>0.950000</td><td>0.185687</td><td>[*]c1cccc[n+]1[O-].[*]C(C)c1cc(C)ccc1C</td><td></td></tr>\n",
"<tr><th>15887</th><td></td><td></td><td>0.950000</td><td>0.176136</td><td>[*][C@@H]1CCCNC1</td><td></td></tr>\n",
"<tr><th>17667</th><td></td><td></td><td>0.949580</td><td>0.161392</td><td>[*]c1ccccc1.[*]N1C(=O)CNC1=O</td><td></td></tr>\n",
"<tr><th>17500</th><td></td><td></td><td>0.948718</td><td>0.156951</td><td>[*]CCCCCCCCCCC.[*]CP(=O)(OC)OC</td><td></td></tr>\n",
"<tr><th>19213</th><td></td><td></td><td>0.931034</td><td>0.190476</td><td>[*]NC(=N)CN.[*]C(=O)O</td><td></td></tr>\n",
"<tr><th>21961</th><td></td><td></td><td>0.929412</td><td>0.168790</td><td>[*]CSC#N.[*]c1ccccc1[*]</td><td></td></tr>\n",
"<tr><th>15634</th><td></td><td></td><td>0.927711</td><td>0.191693</td><td>[*]c1ncnc2[nH]cnc21</td><td></td></tr>\n",
"<tr><th>17356</th><td></td><td></td><td>0.925926</td><td>0.120805</td><td>[*]CCCCCC.[*]CC(N)=O</td><td></td></tr>\n",
"<tr><th>19129</th><td></td><td></td><td>0.919355</td><td>0.174863</td><td>[*]c1cncc(Cl)c1</td><td></td></tr>\n",
"<tr><th>13401</th><td></td><td></td><td>0.918750</td><td>0.184275</td><td>[*]CC1CC1.[*]c1ccccc1Br</td><td></td></tr>\n",
"<tr><th>22933</th><td></td><td></td><td>0.916667</td><td>0.183784</td><td>[*]CC#C.[*]c1ncccn1</td><td></td></tr>\n",
"<tr><th>4404</th><td></td><td></td><td>0.907216</td><td>0.186667</td><td>[*]CSC.[*]c1ccccc1</td><td></td></tr>\n",
"<tr><th>12294</th><td></td><td></td><td>0.894737</td><td>0.099010</td><td>[*]c1ccccc1.[*]N(C)C</td><td></td></tr>\n",
"<tr><th>16786</th><td></td><td></td><td>0.893617</td><td>0.112426</td><td>[*]c1sc[n+](C)c1C.[*]c1sc[n+](C)c1C</td><td></td></tr>\n",
"<tr><th>13760</th><td></td><td></td><td>0.887218</td><td>0.190283</td><td>[*]COC(N)=O.[*][N+](=O)[O-]</td><td></td></tr>\n",
"<tr><th>4473</th><td></td><td></td><td>0.885417</td><td>0.145985</td><td>[*]c1ccccc1.[*]c1ccccc1</td><td></td></tr>\n",
"<tr><th>19112</th><td></td><td></td><td>0.883721</td><td>0.157598</td><td>[*]c1cn2ccsc2n1.[*]n1nc(C)cc1C</td><td></td></tr>\n",
"<tr><th>17148</th><td></td><td></td><td>0.882353</td><td>0.166667</td><td>[*]CCCCCCCCCCCCC</td><td></td></tr>\n",
"<tr><th>6334</th><td></td><td></td><td>0.882353</td><td>0.190678</td><td>[*]c1ccccc1[*]</td><td></td></tr>\n",
"<tr><th>16077</th><td></td><td></td><td>0.879518</td><td>0.123684</td><td>[*]c1nc(C)nn1[*].[*]c1ccccc1</td><td></td></tr>\n",
"<tr><th>8779</th><td></td><td></td><td>0.878505</td><td>0.152685</td><td>[*]c1nc2nnnc-2c(O)n1[*]</td><td></td></tr>\n",
"<tr><th>2002</th><td></td><td></td><td>0.875000</td><td>0.154667</td><td>[*]c1ccco1.[*]c1ncnn1[*]</td><td></td></tr>\n",
"<tr><th>5529</th><td></td><td></td><td>0.875000</td><td>0.135714</td><td>[*]C(N)=O.[*]C(CC)CCCC</td><td></td></tr>\n",
"<tr><th>15573</th><td></td><td></td><td>0.859813</td><td>0.090196</td><td>[*]C(CSCCCCCCCCCCCCCCCC)OC.[*][n+]1ccsc1</td><td></td></tr>\n",
"<tr><th>17492</th><td></td><td></td><td>0.858824</td><td>0.182573</td><td>[*]c1ccc2c[nH]nc2c1</td><td></td></tr>\n",
"<tr><th>20831</th><td></td><td></td><td>0.858824</td><td>0.182573</td><td>[*]c1ccc2c[nH]nc2c1</td><td></td></tr>\n",
"<tr><th>2570</th><td></td><td></td><td>0.853659</td><td>0.111842</td><td>[*]C(=O)CCCCCCC.[*]C(=O)CCCCCCC</td><td></td></tr>\n",
"<tr><th>23156</th><td></td><td></td><td>0.853659</td><td>0.130081</td><td>[*]CCCCCCCCC.[*]OC(=O)C=C</td><td></td></tr>\n",
"<tr><th>13103</th><td></td><td></td><td>0.853333</td><td>0.091892</td><td>[*]CCCCCCCCCCC.[*]C(N)=O</td><td></td></tr>\n",
"<tr><th>18140</th><td></td><td></td><td>0.851852</td><td>0.197101</td><td>[*]C(=O)OC(C)(C)C.[*]C(=O)OC(C)(C)C</td><td></td></tr>\n",
"<tr><th>24087</th><td></td><td></td><td>0.851351</td><td>0.180851</td><td>[*]c1c(C)ncn1[*]</td><td></td></tr>\n",
"<tr><th>6051</th><td></td><td></td><td>0.850000</td><td>0.182927</td><td>[*]C(=O)OCC.[*]C(=O)OCC</td><td></td></tr>\n",
"<tr><th>7595</th><td></td><td></td><td>0.839161</td><td>0.094955</td><td>[*]OC=O.[*]C(C(=O)O)C(=O)O</td><td></td></tr>\n",
"<tr><th>3185</th><td></td><td></td><td>0.838323</td><td>0.111111</td><td>[*]N1CCOCC1.[*]S(C)(=O)=O</td><td></td></tr>\n",
"<tr><th>15940</th><td></td><td></td><td>0.835821</td><td>0.127490</td><td>[*]/C=C(\\O)C(=O)O.[*]C(C)C</td><td></td></tr>\n",
"<tr><th>20472</th><td></td><td></td><td>0.831858</td><td>0.160458</td><td>[*]C(=O)Cc1ccsc1.[*]C(=O)NC1CCCCCC1</td><td></td></tr>\n",
"<tr><th>16457</th><td></td><td></td><td>0.829787</td><td>0.197581</td><td>[*]C(=O)OCC.[*]C(=O)C(C)N1CCOCC1</td><td></td></tr>\n",
"<tr><th>1283</th><td></td><td></td><td>0.828571</td><td>0.158301</td><td>[*]c1ccccc1.[*]C(C)C</td><td></td></tr>\n",
"<tr><th>8629</th><td></td><td></td><td>0.827273</td><td>0.183333</td><td>[*]CCC(N)=O.[*]NCc1ccccc1</td><td></td></tr>\n",
"<tr><th>10367</th><td></td><td></td><td>0.827273</td><td>0.161812</td><td>[*]CCc1ccccc1.[*]C(CCS)C(=O)O</td><td></td></tr>\n",
"<tr><th>9572</th><td></td><td></td><td>0.826446</td><td>0.164080</td><td>[*]c1ccc(Cl)c(Cl)c1.[*]c1cc(Cl)c(Cl)cc1[*]</td><td></td></tr>\n",
"<tr><th>13684</th><td></td><td></td><td>0.824074</td><td>0.130233</td><td>[*]C1CN2CCC1C2</td><td></td></tr>\n",
"<tr><th>12972</th><td></td><td></td><td>0.823529</td><td>0.154762</td><td>[*]c1c(Br)cnn1[*]</td><td></td></tr>\n",
"<tr><th>22001</th><td></td><td></td><td>0.822785</td><td>0.136364</td><td>[*]C(=O)O.[*]c1ncsc1[*]</td><td></td></tr>\n",
"<tr><th>5087</th><td></td><td></td><td>0.822222</td><td>0.173913</td><td>[*]CC(C)(C)C(=O)O</td><td></td></tr>\n",
"<tr><th>19447</th><td></td><td></td><td>0.816327</td><td>0.184507</td><td>[*]c1cc2ccccc2o1.[*]c1nnnn1C</td><td></td></tr>\n",
"<tr><th>8002</th><td></td><td></td><td>0.810345</td><td>0.115672</td><td>[*]CCCC.[*]C(O)(P(=O)(O)O)P(=O)(O)O</td><td></td></tr>\n",
"<tr><th>18304</th><td></td><td></td><td>0.809091</td><td>0.153226</td><td>[*]c1ccco1.[*]c1ccccn1</td><td></td></tr>\n",
"<tr><th>17612</th><td></td><td></td><td>0.809091</td><td>0.162791</td><td>[*]c1ccco1.[*]c1ccncc1</td><td></td></tr>\n",
"<tr><th>225</th><td></td><td></td><td>0.806122</td><td>0.198330</td><td>[*]c1ccccc1[*]</td><td></td></tr>\n",
"<tr><th>6688</th><td></td><td></td><td>0.804598</td><td>0.138122</td><td>[*]CCO.[*]c1ccccc1[*]</td><td></td></tr>\n",
"<tr><th>16812</th><td></td><td></td><td>0.802083</td><td>0.188732</td><td>[*]CCCCC.[*]c1nnc(N)s1</td><td></td></tr>\n",
"<tr><th>14552</th><td></td><td></td><td>0.801980</td><td>0.157407</td><td>[*]NC(N)=S.[*]Oc1cccc2ccccc21</td><td></td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 54,
"text": [
" mol1 mol2 Fraggle RDKit5 Fragment FragMol\n",
"2768 1.000000 0.198157 [*]C(F)(F)Cl.[*]C(F)(Cl)C(F)(F)F \n",
"2937 1.000000 0.128205 [*]C[Se](=O)O \n",
"7696 1.000000 0.157738 [*]c1ncnc(N)c1[*] \n",
"21156 1.000000 0.184080 [*]CCC.[*]c1c2ccccc2nc2ccccc12 \n",
"3347 1.000000 0.104478 [*]CC(C)(C)CO.[*]C(C)(C)CO \n",
"6534 1.000000 0.071942 [*]CCNC.[*]CNC \n",
"10494 0.969231 0.079365 [*]CCCCCCCCCCCCCCCCC \n",
"23207 0.964602 0.164706 [*]SCCO.[*][N+](=O)[O-] \n",
"6250 0.952096 0.172185 [*]c1ccccc1[*] \n",
"24245 0.950000 0.185687 [*]c1cccc[n+]1[O-].[*]C(C)c1cc(C)ccc1C \n",
"15887 0.950000 0.176136 [*][C@@H]1CCCNC1 \n",
"17667 0.949580 0.161392 [*]c1ccccc1.[*]N1C(=O)CNC1=O \n",
"17500 0.948718 0.156951 [*]CCCCCCCCCCC.[*]CP(=O)(OC)OC \n",
"19213 0.931034 0.190476 [*]NC(=N)CN.[*]C(=O)O \n",
"21961 0.929412 0.168790 [*]CSC#N.[*]c1ccccc1[*] \n",
"15634 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 \n",
"17356 0.925926 0.120805 [*]CCCCCC.[*]CC(N)=O \n",
"19129 0.919355 0.174863 [*]c1cncc(Cl)c1 \n",
"13401 0.918750 0.184275 [*]CC1CC1.[*]c1ccccc1Br \n",
"22933 0.916667 0.183784 [*]CC#C.[*]c1ncccn1 \n",
"4404 0.907216 0.186667 [*]CSC.[*]c1ccccc1 \n",
"12294 0.894737 0.099010 [*]c1ccccc1.[*]N(C)C \n",
"16786 0.893617 0.112426 [*]c1sc[n+](C)c1C.[*]c1sc[n+](C)c1C \n",
"13760 0.887218 0.190283 [*]COC(N)=O.[*][N+](=O)[O-] \n",
"4473 0.885417 0.145985 [*]c1ccccc1.[*]c1ccccc1 \n",
"19112 0.883721 0.157598 [*]c1cn2ccsc2n1.[*]n1nc(C)cc1C \n",
"17148 0.882353 0.166667 [*]CCCCCCCCCCCCC \n",
"6334 0.882353 0.190678 [*]c1ccccc1[*] \n",
"16077 0.879518 0.123684 [*]c1nc(C)nn1[*].[*]c1ccccc1 \n",
"8779 0.878505 0.152685 [*]c1nc2nnnc-2c(O)n1[*] \n",
"2002 0.875000 0.154667 [*]c1ccco1.[*]c1ncnn1[*] \n",
"5529 0.875000 0.135714 [*]C(N)=O.[*]C(CC)CCCC \n",
"15573 0.859813 0.090196 [*]C(CSCCCCCCCCCCCCCCCC)OC.[*][n+]1ccsc1 \n",
"17492 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 \n",
"20831 0.858824 0.182573 [*]c1ccc2c[nH]nc2c1 \n",
"2570 0.853659 0.111842 [*]C(=O)CCCCCCC.[*]C(=O)CCCCCCC \n",
"23156 0.853659 0.130081 [*]CCCCCCCCC.[*]OC(=O)C=C \n",
"13103 0.853333 0.091892 [*]CCCCCCCCCCC.[*]C(N)=O \n",
"18140 0.851852 0.197101 [*]C(=O)OC(C)(C)C.[*]C(=O)OC(C)(C)C \n",
"24087 0.851351 0.180851 [*]c1c(C)ncn1[*] \n",
"6051 0.850000 0.182927 [*]C(=O)OCC.[*]C(=O)OCC \n",
"7595 0.839161 0.094955 [*]OC=O.[*]C(C(=O)O)C(=O)O \n",
"3185 0.838323 0.111111 [*]N1CCOCC1.[*]S(C)(=O)=O \n",
"15940 0.835821 0.127490 [*]/C=C(\\O)C(=O)O.[*]C(C)C \n",
"20472 0.831858 0.160458 [*]C(=O)Cc1ccsc1.[*]C(=O)NC1CCCCCC1 \n",
"16457 0.829787 0.197581 [*]C(=O)OCC.[*]C(=O)C(C)N1CCOCC1 \n",
"1283 0.828571 0.158301 [*]c1ccccc1.[*]C(C)C \n",
"8629 0.827273 0.183333 [*]CCC(N)=O.[*]NCc1ccccc1 \n",
"10367 0.827273 0.161812 [*]CCc1ccccc1.[*]C(CCS)C(=O)O \n",
"9572 0.826446 0.164080 [*]c1ccc(Cl)c(Cl)c1.[*]c1cc(Cl)c(Cl)cc1[*] \n",
"13684 0.824074 0.130233 [*]C1CN2CCC1C2 \n",
"12972 0.823529 0.154762 [*]c1c(Br)cnn1[*] \n",
"22001 0.822785 0.136364 [*]C(=O)O.[*]c1ncsc1[*] \n",
"5087 0.822222 0.173913 [*]CC(C)(C)C(=O)O \n",
"19447 0.816327 0.184507 [*]c1cc2ccccc2o1.[*]c1nnnn1C \n",
"8002 0.810345 0.115672 [*]CCCC.[*]C(O)(P(=O)(O)O)P(=O)(O)O \n",
"18304 0.809091 0.153226 [*]c1ccco1.[*]c1ccccn1 \n",
"17612 0.809091 0.162791 [*]c1ccco1.[*]c1ccncc1 \n",
"225 0.806122 0.198330 [*]c1ccccc1[*] \n",
"6688 0.804598 0.138122 [*]CCO.[*]c1ccccc1[*] \n",
"16812 0.802083 0.188732 [*]CCCCC.[*]c1nnc(N)s1 \n",
"14552 0.801980 0.157407 [*]NC(N)=S.[*]Oc1cccc2ccccc21 "
]
}
],
"prompt_number": 54
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a particularly nice example where a small change in the middle of the molecule (N->S) destroys what would otherwise be a fairly high RDKit similarity, but where Fraggle still produces a high score:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subset[subset.index==15634]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mol1 | \n",
" mol2 | \n",
" Fraggle | \n",
" RDKit5 | \n",
" Fragment | \n",
" FragMol | \n",
"
\n",
" \n",
" \n",
" \n",
" 15634 | \n",
" | \n",
" | \n",
" 0.927711 | \n",
" 0.191693 | \n",
" [*]c1ncnc2[nH]cnc21 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 69,
"text": [
" mol1 mol2 Fraggle RDKit5 Fragment FragMol\n",
"15634 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 "
]
}
],
"prompt_number": 69
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can demonstrate the disproportionate influence of the central S by replacing it with an N and repeating the similarity calculations:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"Chem.MolToSmiles(subset.ix[15634]['mol2'],True)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 70,
"text": [
"'CCCSc1ncnc2[nH]ncc21'"
]
}
],
"prompt_number": 70
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tmol = Chem.MolFromSmiles('CCCNc1ncnc2[nH]ncc21')\n",
"fp1 = Chem.RDKFingerprint(subset.ix[15634]['mol1'],maxPath=5)\n",
"fp2 = Chem.RDKFingerprint(tmol,maxPath=5)\n",
"print 'RDKit5: ',DataStructs.TanimotoSimilarity(fp1,fp2)\n",
"print 'Fraggle: ',FraggleSim.GetFraggleSimilarity(subset.ix[15634]['mol1'],tmol)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"RDKit5: 0.501992031873\n",
"Fraggle: "
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(0.927710843373494, '[*]c1ncnc2[nH]cnc21')\n"
]
}
],
"prompt_number": 73
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the swap the RDKit5 similarity (0.50) is well above the random threshold (0.29 at the 95% level), while the Fraggle score is unchanged at 0.928."
]
},
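{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the Tanimoto similarity used for the RDKit5 comparison is the number of bits two fingerprints share divided by the number of bits set in either. The next cell is a minimal pure-Python sketch of that formula. The name `tanimoto` and the toy bit sets are illustrative assumptions, not part of the RDKit API; `DataStructs.TanimotoSimilarity` remains the function to use on real fingerprints."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def tanimoto(bits1, bits2):\n",
"    # bits1, bits2: sets of on-bit indices (hypothetical toy fingerprints)\n",
"    union = len(bits1 | bits2)\n",
"    if union == 0:\n",
"        return 0.0\n",
"    return float(len(bits1 & bits2)) / union\n",
"\n",
"# toy illustration: in a path-based fingerprint, changing one central atom\n",
"# changes every path that runs through it, so many bits differ at once\n",
"fp_a = set([1, 2, 3, 4, 5, 6])\n",
"fp_b = set([1, 2, 7, 8, 9, 10])\n",
"print(tanimoto(fp_a, fp_b))"
],
"language": "python",
"metadata": {},
"outputs": []
},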
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What about the cases where the Fraggle similarity is zero, but the RDKit5 similarity is high?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subset2 = df[(df.RDKit5>.5)&(df.Fraggle<0.1)]\n",
"subset2.sort(columns=['RDKit5'],ascending=False,inplace=True)\n",
"len(subset2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 59,
"text": [
"38"
]
}
],
"prompt_number": 59
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"frags = []\n",
"for row in subset2.itertuples():\n",
" m1 = row[1]\n",
" m2 = row[2]\n",
" sim,frag= FraggleSim.GetFraggleSimilarity(m1,m2)\n",
" frags.append(frag) \n",
"subset2['Fragment']=frags\n",
"subset2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mol1 | \n",
" mol2 | \n",
" Fraggle | \n",
" RDKit5 | \n",
" Fragment | \n",
"
\n",
" \n",
" \n",
" \n",
" 2958 | \n",
" | \n",
" | \n",
" 0 | \n",
" 1.000000 | \n",
" None | \n",
"
\n",
" \n",
" 4568 | \n",
" | \n",
" | \n",
" 0 | \n",
" 1.000000 | \n",
" None | \n",
"
\n",
" \n",
" 21906 | \n",
" | \n",
" | \n",
" 0 | \n",
" 1.000000 | \n",
" None | \n",
"
\n",
" \n",
" 11718 | \n",
" | \n",
" | \n",
" 0 | \n",
" 1.000000 | \n",
" None | \n",
"
\n",
" \n",
" 11745 | \n",
" | \n",
" | \n",
" 0 | \n",
" 1.000000 | \n",
" None | \n",
"
\n",
" \n",
" 1581 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.987552 | \n",
" None | \n",
"
\n",
" \n",
" 9838 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.987552 | \n",
" None | \n",
"
\n",
" \n",
" 20969 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.987552 | \n",
" None | \n",
"
\n",
" \n",
" 10615 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.986063 | \n",
" None | \n",
"
\n",
" \n",
" 16125 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.881890 | \n",
" None | \n",
"
\n",
" \n",
" 17498 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.875472 | \n",
" None | \n",
"
\n",
" \n",
" 22214 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.849057 | \n",
" None | \n",
"
\n",
" \n",
" 17812 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.816901 | \n",
" None | \n",
"
\n",
" \n",
" 22514 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.798701 | \n",
" None | \n",
"
\n",
" \n",
" 23507 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.720000 | \n",
" None | \n",
"
\n",
" \n",
" 19071 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.714744 | \n",
" None | \n",
"
\n",
" \n",
" 19067 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.713948 | \n",
" None | \n",
"
\n",
" \n",
" 7808 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.711111 | \n",
" None | \n",
"
\n",
" \n",
" 12405 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.692000 | \n",
" None | \n",
"
\n",
" \n",
" 21204 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.687764 | \n",
" None | \n",
"
\n",
" \n",
" 18018 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.657895 | \n",
" None | \n",
"
\n",
" \n",
" 4167 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.654867 | \n",
" None | \n",
"
\n",
" \n",
" 8750 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.650000 | \n",
" None | \n",
"
\n",
" \n",
" 7008 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.630252 | \n",
" None | \n",
"
\n",
" \n",
" 12990 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.630000 | \n",
" None | \n",
"
\n",
" \n",
" 8355 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.623377 | \n",
" None | \n",
"
\n",
" \n",
" 21753 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.602941 | \n",
" None | \n",
"
\n",
" \n",
" 9773 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.598361 | \n",
" None | \n",
"
\n",
" \n",
" 3180 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.589474 | \n",
" None | \n",
"
\n",
" \n",
" 19333 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.576119 | \n",
" None | \n",
"
\n",
" \n",
" 8083 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.575758 | \n",
" None | \n",
"
\n",
" \n",
" 22375 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.559633 | \n",
" None | \n",
"
\n",
" \n",
" 20954 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.555556 | \n",
" None | \n",
"
\n",
" \n",
" 5359 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.548173 | \n",
" None | \n",
"
\n",
" \n",
" 20305 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.542601 | \n",
" None | \n",
"
\n",
" \n",
" 22193 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.542601 | \n",
" None | \n",
"
\n",
" \n",
" 15618 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.536585 | \n",
" None | \n",
"
\n",
" \n",
" 12669 | \n",
" | \n",
" | \n",
" 0 | \n",
" 0.531056 | \n",
" None | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 60,
"text": [
" mol1 mol2 Fraggle RDKit5 Fragment\n",
"2958 0 1.000000 None\n",
"4568 0 1.000000 None\n",
"21906 0 1.000000 None\n",
"11718 0 1.000000 None\n",
"11745 0 1.000000 None\n",
"1581 0 0.987552 None\n",
"9838 0 0.987552 None\n",
"20969 0 0.987552 None\n",
"10615 0 0.986063 None\n",
"16125 0 0.881890 None\n",
"17498 0 0.875472 None\n",
"22214 0 0.849057 None\n",
"17812 0 0.816901 None\n",
"22514 0 0.798701 None\n",
"23507 0 0.720000 None\n",
"19071 0 0.714744 None\n",
"19067 0 0.713948 None\n",
"7808 0 0.711111 None\n",
"12405 0 0.692000 None\n",
"21204 0 0.687764 None\n",
"18018 0 0.657895 None\n",
"4167 0 0.654867 None\n",
"8750 0 0.650000 None\n",
"7008 0 0.630252 None\n",
"12990 0 0.630000 None\n",
"8355 0 0.623377 None\n",
"21753 0 0.602941 None\n",
"9773 0 0.598361 None\n",
"3180 0 0.589474 None\n",
"19333 0 0.576119 None\n",
"8083 0 0.575758 None\n",
"22375 0 0.559633 None\n",
"20954 0 0.555556 None\n",
"5359 0 0.548173 None\n",
"20305 0 0.542601 None\n",
"22193 0 0.542601 None\n",
"15618 0 0.536585 None\n",
"12669 0 0.531056 None"
]
}
],
"prompt_number": 60
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At first these seem somewhat surprising, but the explanation is simple: the molecules don't generate any fragments, so there is nothing for Fraggle to compare. Here is an example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subset2.ix[21906]['mol1']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAYAAABNcIgQAAAZt0lEQVR4nO3de1jNeeIH8He6qV2X\nFFFU1qhzKjIpl/QY0iJ3nmXIbWYxWAaTMyS7iB0zboWdMRNmLNZ1HrJoMzYZptANTaObhkSI6kws\n3U7n/P4Y+jkjOqX6nMv79TzzePqe7znn3Xe+9e7zOd+LkUqlUoGIiMhANRMdgIiISCQWIRERGTQW\nIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRERGTQWIRmcAdkDsLFg\no+gYWo3biAwJi5D0Tm5FLt7PfR92P9rB7LIZOqV1QtCdIBRUFoiOpjW4jYj+H4uQ9EpWWRa8MryQ\nVpqG8E7hSJAk4GvHr1GsKMYn9z8RHU8rcBsRqTMRHYCoIc3Lmwd7U3tclFyEqZFp9fLBLQejSFFU\n6/MLCgrQrl07GBkZNWZMod50GxHpG44ISW8UKgoR+zgWMluZ2i/456xNrF/53P/9738IDg5G586d\nERcX15gxhbr79G69txGRvmIRkt74ufxnqKCCpLlE4+eoVCrs2bMHEokE69atQ2lpKfbu3duIKcW6\nrbpd521EpO9YhGSwysvL0adPH0yfPh35+fnVyw8fPozS0lKByRqHUqmEUqkUHYNI67AISW90Me8C\nAMgsy9Ro/fv37yMxMfGl5SUlJYiKimrQbNogJSUFlsWWADTfRkSGgEVIesPGxAZ+LfywqWATFCrF\nS4/X5UAQfZwe9fb2hntH9wbbRkT6gkVIeuVzh8+RV5EHnywfHJYfxtWnV3H60Wm8l/seQu+Favw6\np06dQnFxcSMmbVqPHz9GXFwcjI2NG2wbEekLFiHpFWlzKZKlyZA2l2Lh7YXwzvTGjFszYNHMAsHt\ngzV+nYqKCuzfv78RkzatnJwcTJkyBQ8fPmywbUSkL4xUKpVKdAgiEW7dugUnJ6dXPt6rVy8kJCQ0\nXaBGVlpaioqKCrRq1Up0FCKtwhEh0SskJiYiM1P3Dyq5efMmvvjiC5iamrIEiWrAIiR6jX379omO\n8MaqqqoQGRkJT09PlJWViY5DpHU4NUoG686dO+jUqdNr13F0dMTNmzf14pJrR48eRe/evWFvby86\nCpFW4YiQDJZcLoelpeVr17l16xZ++OGHJkrU8JKSkrBkyRKUlJRg3LhxsLGxweeffw5PT09s2rQJ\nT58+FR2RSDgWIRmsbt26Yc+ePbWO9nT5nMJOnTpBLpdDIpEgIiICJiYmmD9/Pk6cOIHLly9DKpVC\nLpeLjkkkFKdGyeCtXr0aK1eufOXjLVu2xP3792FhYdGEqRrW1atXERQUhKKiIoSFhWHQoEEAgKys\nLLi4uAhORyQWR4Rk8P72t79h8uTJr3z80aNHOHnyZBMmahjHjh3DkCFDkJGRgR49eiA2NharVq3C\n7NmzMXr0aDx48IAlSAQWIRGMjIywc+dO9O3b95Xr6OL06PDhwzF8+HAMGDAACxcuhFwux9ixY5Ge\nno4hQ4bwVAqiZzg1SvRMQUEBevXqhby8vJceMzU1RX5+Ptq2bSsgWd2Vl5dDqVTCwsIChYWFWLFi\nBY4ePYqVK1figw8+gLGxseiIRFqDI0KiZ2xtbREdHV3jSKmyshIHDx4UkKp+Tpw4AVdXV3z77bew\nsbHBtm3bcPr0aURGRiI7O1t0PCKtwhEh0W9ER0dj5MiRqKqqUlvu7e1d422btFVycjIWLVqEsrIy\nbN68Gb6+vqIjEWkljgiJfiMgIADr1q17aXlSUpJOXHJNLpfjwYMH8PLywvnz5zF79myMHz8e8fHx\noqMRaSUWIVENFi9ejNmzZ7+0/F//+peANHVz9uxZuLm5YcOGDVAoFJg1axaysrJeezAQkSHj1CjR\nK1RWVmLo0KGIjY2tXubo6IgbN26gWTPt/hvy+vXrkM
lkSE9Px759+9CrVy/RkYi0FouQ6DWKi4vR\np08fXL9+vXrZuXPn0L9/f4GpXi0rKwvFxcXVo7+YmBhIpVJeX5ToNbT7z1oiwdq0aYMTJ07Aysqq\nepk2n1N4584dvPvuu5g0aRJu3boFf39/liBRLViERLVwcXHBoUOHYGJiAgA4fPgwSktLBaeq2aBB\ng5CZmQmpVKoTB/YQaQNOjRJp6Ouvv8bMmTMBAIcOHcKECRMEJ1L33//+F3l5eXj//fe1/jNMIm3C\nnxYiDc2YMQMLFiwAoJ3To+3atcOuXbsQFhYmOgqRTuGIkKgOqqqqMGbMGJw6dQp37tyBra2t6Ehq\nVCoVKioqYG5uLjoKkc7giJCoDoyNjbF//35IpVIcPnxYdJxqX331FbZu3QojIyOWIFEdcURIVA+5\nubmYP3++1tyeqbi4GMXFxXjrrbdERyHSOSxConqKj4/H06dPUVlZKTRH586dIZVKhWYg0mUmogMQ\n6aq8vDyEhYXh1q1bQnM4Ojpi8eLFmDhxotAcRLqKI0Kienj06BGcnZ2xbNkyLFy4UGiWzZs347PP\nPkN2djZatmwpNAuRLuLBMkT18Nlnn6Ft27aYN2+e6CiYP38+bGxsarxjBhHVjiNCojq6fv063N3d\nERUVBX9/f9FxAPx6TdHhw4fjp59+QteuXUXHIdIpLEKiOho3bhwqKytx4sQJ0VHUjBgxAubm5jhy\n5IjoKEQ6hUVIVAenT5/GqFGjcO3aNXTp0kV0HDU5OTlwd3fH8ePHMXjwYNFxiHQGi5BIQwqFAh4e\nHhg2bBg2bNggOk6NZDIZoqOjkZqaWn2RcCJ6PR4sQ6ShiIgIyOVyrFixQnSUV1q5ciWKi4uxfft2\n0VGIdAZHhEQaKCoqQteuXbF+/frqO1Boqx07diA4OBjZ2dmwtrYWHYdI67EIiTSwYMECXLp0CZcu\nXdL6WxwplUr07t0bPj4+2LJli+g4RFqPRUhUi7S0NHh6euLcuXPw8fERHUcjFy5cwDvvvIMrV67A\n3d1ddBwircYiJKqFv78/2rZtiwMHDoiOUicTJ05EYWEhYmJiREch0mosQqLXOH78OAIDA5GZmYmO\nHTuKjlMnd+7cgUQiwYEDBzBy5EjRcYi0lnZ/2EEkUHl5OYKCgiCTyXSuBAGgY8eOWLx4MT766COU\nl5eLjkOktViERK+wZcsWKBQKLF26VHSUegsODoZCocDWrVtFRyHSWpwaJarBvXv34OLigoiICEya\nNEl0nDeyf/9+zJkzB1lZWejQoYPoOERah0VIVIMZM2YgJycH586dEx3ljalUKrzzzjtwdnbGzp07\nRcch0josQqLfSExMRL9+/ZCYmIi3335bdJwGceXKFfTq1QsXLlyAt7e36DhEWoVFSPQClUqFfv36\nwdXVVe9GTzNmzEBGRgbi4+NhZGQkOg6R1uDBMkQvOHDgANLT07F27VrRURrcp59+ivT0dBw8eFB0\nFCKtwiIkeubJkydYsmQJli9fjnbt2omO0+DatWuHkJAQLFmyBE+ePBEdh0hrsAiJnlm/fj1+97vf\nYeHChaKjNJpFixbB0tJSa28jRSQCPyMkApCbmwtXV1ccOnRI76/Ccvz4cUycOBEZGRlwdHQUHYdI\nOBYhNakB2QMwotUIzLacjcDAQHTr1g3du3dHt27d4OzsDFNTUyG5Jk6ciJKSEkRHRwt5/6YWEBCA\n1q1ba/X1U5/vKzJbmegopOd4C2tqULkVuQi9G4rvHn2HQkUhbE1tMd5qPJbaLoWtqW31elVVVTh5\n8iROnjyp9vwOHTqgZ8+ecHNzg6urK3r27AmpVNqotz46d+4cjh49itTU1EZ7D20TFhYGDw8PnD9/\nHv3792+091EqlcjIyEBKSgrS09Nx7do1hISEoG/fvhrvK0SNjSNCajBZZVnol9UPTmZO+Lj9x3A2\nd8ZDxUPsL96Pls
YtsbXT1uq/8meaz4SVlZVGr2tmZoa33npLrSC9vb3Rvn37N85cVVUFLy8v9O/f\n3+Du3bdgwQLExcUhKSkJxsbGb/x69+/fR1JSUnXhpaSkICcnBxUVFWrrff/992jfu73G+wpHhNTY\nOCKkBjMvbx7sTe1xUXIRpkb/P8U5uOVgFCmK6v26FRUVSE9PR3p6utpyKyur6lHj84L09PSEpaWl\nxq/9zTff4M6dO1i1alW98+mq0NBQODs7Y9euXZg5c6bGz3v69CkuX76sVnjp6emQy+Uav0Zj7StE\n9cEipAZRqChE7ONY7HbarfaL7TlrE+sGf0+5XI74+HjEx8dXLzMxMYGDg8NLBVnT9Oovv/yCv/71\nrwgNDdV4dKpPrKyssGrVKixfvhzjx49Hq1at1B5/Pq3528LLy8uDQqGo9/uWGJU0+b5C9DosQmoQ\nP5f/DBVUkDSXCM2hUChw48YN3LhxQ+3zxxYtWsDZ2VmtIBMTE9G6dWvMmjVLYGKxPvjgA2zZsgXb\ntm2Dt7e3WuFlZ2fj8ePHDf6ed5vd1Yp9heg5nkdIBuH5CMbY2BgKhQIqlQpKpVJwKu2hVCqhVCqh\nUCiqPy98k1EfkS7hiJAaRBfzLgCAzLJMeP9O3EWdLSws4OnpWT0l+vxfOzu7l9b19fXFjh07sGXL\nFshkhnlAxubNm1FRUYGgoCBYWFhg8ODBao/fvXu3emr0+b+XL19GaWlpvd/TTvnr/wvR+wrRcyxC\nahA2Jjbwa+GHTQWbMKnNJJgYqe9aRYqiBv3sp1mzZpBIJGqnWbi5ucHBwQEmJprt1hYWFli7di3m\nzp2LyZMnG9y9+u7du4c1a9bgq6++goWFRY3r2NnZwc7ODv7+/tXLFAoF8vLy1KZRr127hszMTI1G\n2a1UrZp0XyGqDU+foAaTUZaBfpn98FbztyCzlcHZ3BkPFA/e+PSJms4t7Nq1K8zMzN448/N79bm4\nuGDHjh1v/Hq6ZObMmcjOzsa5c+ca5G4UFRUVuH79ulo5pqSk4N69e2rrff/992jXu53G+wpPn6DG\nxhEhNRhpcymSpckIvReKhbcXolBRiPam7TGi1QgEtw+u9fm///3v4eHhoVZ6bm5ujXpEp5GRETZv\n3ozevXtj9uzZ8PLyarT30iZJSUnYvXs3EhMTG+yWTGZmZnBzc4Obm5vacrlcrja12qpVqzfeV4ga\nEkeEJMTjx48RGBgId3d3dO/eHe7u7pBIJMIusfbnP/8ZWVlZiIuL0/t79alUKvj6+sLFxQXffPON\n6DhEwrEIiQAUFBTA2dkZ27dvx7vvvis6TqM6ePAg5syZg6ysLNja8lJmRDx9ggiAra0tli1bBplM\nptf36nvy5Ak+/vhjLFu2jCVI9AyLkOiZ56cQbNy4UXSURrNhwwZYWlrio48+Eh2FSGtwapToBceO\nHUNgYKBe3qvv1q1bkEqlOHDgAEaPHi06DpHWYBES/cbQoUPRpk0b7N+/X3SUBjVp0iTI5XKcOnVK\ndBQircIiJPqN9PR09OjRAzExMY16r76mdP78efj7+yM1NRVSqVR0HCKtwiIkqsGHH36I+Ph4JCcn\nN+pNgZtCVVUVvL294evri61bt4qOQ6R1dPsnnKiRrF69Grdv38auXbtER3lj//znP3H79m2EhoaK\njkKklViERDWwsrLCypUrERISgpKSEtFx6q2kpAQhISFYtWqVQd5zkUgTnBoleoWqqiq8/fbbCAgI\nwLp160THqZclS5bgu+++w+XLl6tvr0RE6liERK8RGxuLgIAApKWlwdnZWXScOsnOzka3bt0QHR0N\nPz8/0XGItBaLkKgWY8aMgUqlwr///W/RUepk1KhRMDY2RmRkpOgoRFqNRUhUixs3bsDV1RXHjh3D\n0KFDRcfRyKlTpzB27Fhcu3YNf/jDH0THIdJqLEIiDQQHB+P48eNITU0VdocMTVVW
VsLDwwOjR4/G\np59+KjoOkdbjUaNEGli+fDl++eUXfPnll6Kj1Grbtm0oKSnB8uXLRUch0gkcERJpaNeuXVi8eDGy\ns7NhY2MjOk6NCgsL4ezsjLCwMLz33nui4xDpBBYhkYaUSiX69u0LLy8vfPHFF6Lj1Ogvf/kLUlJS\ncOnSJb2/wTBRQ2EREtXBxYsX0b9/f6SkpKB79+6i46hJTU2Fl5cXfvjhB/Tp00d0HCKdwSIkqqMp\nU6bg7t27iI2NFR1FjZ+fH+zt7bF3717RUYh0CouQqI7y8/Ph4uKCvXv3YuzYsaLjAACOHj2K6dOn\nIzMzE/b29qLjEOkUHjVKVEf29vZYunQpZDIZysrKRMdBWVkZZDIZli5dyhIkqgcWIVE9fPzxx1Aq\nlQgPDxcdBWFhYVCpVJDJZKKjEOkkTo0S1dOBAwcQHh6O3NxcoTkcHBwgk8kwceJEoTmIdBWLkOgN\nZGRk4ObNm0IzmJmZoaKiAsOGDROag0hXmYgOQKSLcnJy0KZNG0ilUkilUtFxEBAQAAcHB7i7u4uO\nQqRz+BkhUT385z//wbfffis6RrXRo0dj1KhRePDggegoRDqHU6NEdVBeXg4zMzOtu2qLXC5Hhw4d\n4O3tjZiYGJibm4uORKQzOCIkqoOtW7eif//++PHHH0VHUWNlZYVhw4YhLi4Os2fPFh2HSKewCInq\nYPHixZg+fbpWTkFOnToVALB7926sX79ecBoi3cGpUSINfPfdd0hISIBMJoOlpaXoODWqqKiAvb09\nCgsL0axZMxw5cgRjxowRHYtI63FESKQBiUSCjIwMSCQSnDlzRnScGpmZmWH8+PEAfr1TRmBgIJKS\nkgSnItJ+HBESvUZ+fj4yMzMxaNAgAL/efcLa2hrOzs6Ck9Xs4sWL8PHxqf7azs4OiYmJvPQa0Wuw\nCIleIyEhAZMnT4a7uzs2bNiArl27io5UKxcXF2RnZ1d/3bNnT5w/f15rp3SJROPUKFENlEolHj16\nhN69e+PatWvw8fGBj48PIiMjRUerVWBgoNrXKSkpmDZtGvg3L1HNWIRENbhw4QIkEgl27twJU1NT\nLFmyBD/99BMGDBggOlqtpk+f/tJ5jkeOHMGqVavEBCLScixCohr4+vri7NmziIyMhKurK6KiomBr\nawsrKyvR0Wrl5OSk9jnhc2vWrMG+ffsEJCLSbixCohdkZGRgyJAhSEtLg4uLC6KiorB69WrMmzcP\nR44cER1PY8/PKXyRSqXCzJkzcfHiRQGJiLQXD5YhekFVVRW2b9+O0NBQjBs3DmvWrIG1tTVKS0vR\nrFkznbl02fNLrpWXl7/0mK2tLRITE+Hg4CAgGZH24YiQCL9eQ/TLL79EVVUV5s6di4yMDJiamsLV\n1RVbt26FqampzpQg8Osl14YPH17jYwUFBQgICEBJSUkTpyLSTixCIgAlJSWIjo6Gm5sbIiMjYWVl\nhS1btuDs2bOIiopCVFSU6Ih1VtP06HPp6emYNGkSqqqqmjARkXbi1CgZvKysLLi4uAAAYmJiEBQU\nBBsbG4SHh8PDw0Nwuvp78ZJrryKTybBhw4YmTEWkfTgiJINWXFyMP/7xj5gyZQry8/Ph7++PK1eu\nYMKECRg6dChmzZqFgoIC0THrxczMDBMmTHjtOhs3bkREREQTJSLSTixCMmht2rRBRkYGOnfujB49\nemDt2rWorKzEnDlzkJmZidatW+P27duiY9bbb6dHO3bsCCcnJ7X/Nm7ciAsXLghKSCQep0bJYN27\ndw9nzpzB5MmTYWRkhJs3b0Imk+Hq1avYuHEjxo4dKzpig5BIJMjKygIA5ObmwtHRUXAiIu3CESEZ\nrDZt2uAf//gHfHx8kJCQgM6dO+PIkSPYsWMHVqxYAX9/f+Tk5IiO+cZ+e8k1IlLHIiSDMyB7ADYW\nbIS5uTkuXbqEuXPn4k9/+hOmTp2K/Px8+Pn5
4cqVKxg7dixMTExEx31j06ZNe+mSa7V5vo2IDAGL\nkPRObkUu3s99H3Y/2sHsshk6pXVC0J0gFFSqH/Qil8tRUlKCadOmITMzE46OjujRowfWrFmDyspK\nzJs3D05OTmK+iQbk5OSEfv36qS3TdBsRGQIWIemVrLIseGV4Ia00DeGdwpEgScDXjl+jWFGMT+5/\norauSqWCVCpFREQEmjdvjr///e9ISkpCWloapFIprl69KuabaAQvHjRTl21EZAh4sAzpFf9sfzxU\nPESyNBmmRqZqjxUpimBtYo0B2QMwotUIyGxlSE1NxaJFiyCXyxEeHo6BAwcCAOLi4uDh4YEWLVqI\n+DYaXFFREezs7JCdnY0Z5TPqtI2I9J3ufwBC9EyhohCxj2Ox22n3S7/gAcDaxFrt6/Lycnh4eODs\n2bOIiYnBhx9+iA4dOiA8PBy+vr5NFbtJWFtbY+TIkShWFtdpGxEZAk6Nkt74ufxnqKCCpLlEo/Xv\n37+POXPm4OHDh/D390dycjIGDhyIgQMHIjk5uZHTNr2pU6ciT5lXp21EZAhYhGSwmjdvjoiICDg7\nOyM8PBzGxsYICQlBWloaPD09RcdrcMOGDavz0aNEhoBFSHqji3kXAEBmWaZG69va2uLMmTNwcHBA\nUFAQunTpgj179qB9+/Zo1kz/fjRMTU0hbSEFoPk2IjIE+vfTTgbLxsQGfi38sKlgExQqxUuPFymK\nXlrm5+eHpKQkbNq0CY8fP8b06dORkJDQFHGFcGjpUOdtRKTvWISkVz53+Bx5FXnwyfLBYflhXH16\nFacfncZ7ue8h9F5ojc8xMzNDUFAQsrOz8cknn8Db27uJUzcdc3Pzem0jIn3GIiS9Im0uRbI0GdLm\nUiy8vRDemd6YcWsGLJpZILh98Guf27ZtW4SEhOjltOiL3mQbEekjnkdIREQGTb//9CUiIqoFi5CI\niAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwai5CIiAwa\ni5CIiAza/wFf9ulLiNI27wAAAABJRU5ErkJggg==\n",
"prompt_number": 66,
"text": [
""
]
}
],
"prompt_number": 66
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"FraggleSim.generate_fraggle_fragmentation(subset2.ix[21906]['mol1'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 67,
"text": [
"set()"
]
}
],
"prompt_number": 67
},
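{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since a molecule that yields no fragments ends up with a Fraggle score of zero, it can be useful to flag such molecules before interpreting the scores. The next cell is a sketch of one way to do that. `unfragmentable` and `toy_fragmenter` are hypothetical names introduced here; the fragmentation function is passed in as an argument, and in this notebook one would pass `FraggleSim.generate_fraggle_fragmentation` together with a list of RDKit molecules."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def unfragmentable(mols, fragment_fn):\n",
"    # positions of the molecules for which fragment_fn returns nothing\n",
"    return [i for i, m in enumerate(mols) if not fragment_fn(m)]\n",
"\n",
"# stand-in fragmenter for illustration only: the 'molecules' are plain\n",
"# strings and the 'fragments' are the pieces obtained by splitting on '-'\n",
"def toy_fragmenter(m):\n",
"    parts = m.split('-')\n",
"    if len(parts) > 1:\n",
"        return set(parts)\n",
"    return set()\n",
"\n",
"print(unfragmentable(['CC-O', 'C1CC1', 'N-C-C'], toy_fragmenter))"
],
"language": "python",
"metadata": {},
"outputs": []
},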
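{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What about how different the compounds are?\n",
"\n",
"This repeats the last bit of the analysis from http://rdkit.blogspot.ch/2013/10/comparing-fingerprints-to-each-other.html : take the top-scoring pairs for each fingerprint and count how many of them are shared between each pair of fingerprints."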
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nToDo=200\n",
"apl = sorted(scoredLists['AP'],reverse=True)[:nToDo]\n",
"ttl = sorted(scoredLists['TT'],reverse=True)[:nToDo]\n",
"avl = sorted(scoredLists['Avalon-1024'],reverse=True)[:nToDo]\n",
"rdkl = sorted(scoredLists['RDKit5'],reverse=True)[:nToDo]\n",
"fragl = sorted(scoredLists['Fraggle'],reverse=True)[:nToDo]\n",
"\n",
"idsToKeep=set()\n",
"idsToKeep.update([x[1] for x in apl])\n",
"idsToKeep.update([x[1] for x in ttl])\n",
"idsToKeep.update([x[1] for x in avl])\n",
"idsToKeep.update([x[1] for x in rdkl])\n",
"idsToKeep.update([x[1] for x in fragl])\n",
"\n",
"print 'Overall number:',len(idsToKeep)\n",
"ids={}\n",
"ids['AP']=set([x[1] for x in apl])\n",
"ids['TT']=set([x[1] for x in ttl])\n",
"ids['Avalon-1024']=set([x[1] for x in avl])\n",
"ids['RDKit5']=set([x[1] for x in rdkl])\n",
"ids['Fraggle']=set([x[1] for x in fragl])\n",
"\n",
"\n",
"ks = sorted(ids.keys())\n",
"for i,k in enumerate(ks):\n",
" for j in range(i+1,len(ks)):\n",
" overlap=len(ids[k].intersection(ids[ks[j]]))\n",
" print ks[i],ks[j],overlap,'%.2f'%(float(overlap)/len(apl))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overall number: 475\n",
"AP Avalon-1024 102 0.51\n",
"AP Fraggle 77 0.39\n",
"AP RDKit5 112 0.56\n",
"AP TT 137 0.69\n",
"Avalon-1024 Fraggle 68 0.34\n",
"Avalon-1024 RDKit5 125 0.62\n",
"Avalon-1024 TT 111 0.56\n",
"Fraggle RDKit5 82 0.41\n",
"Fraggle TT 70 0.35\n",
"RDKit5 TT 117 0.58\n"
]
}
],
"prompt_number": 78
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nToDo=100\n",
"apl = sorted(scoredLists['AP'],reverse=True)[:nToDo]\n",
"ttl = sorted(scoredLists['TT'],reverse=True)[:nToDo]\n",
"avl = sorted(scoredLists['Avalon-1024'],reverse=True)[:nToDo]\n",
"rdkl = sorted(scoredLists['RDKit5'],reverse=True)[:nToDo]\n",
"fragl = sorted(scoredLists['Fraggle'],reverse=True)[:nToDo]\n",
"\n",
"idsToKeep=set()\n",
"idsToKeep.update([x[1] for x in apl])\n",
"idsToKeep.update([x[1] for x in ttl])\n",
"idsToKeep.update([x[1] for x in avl])\n",
"idsToKeep.update([x[1] for x in rdkl])\n",
"idsToKeep.update([x[1] for x in fragl])\n",
"\n",
"print 'Overall number:',len(idsToKeep)\n",
"ids={}\n",
"ids['AP']=set([x[1] for x in apl])\n",
"ids['TT']=set([x[1] for x in ttl])\n",
"ids['Avalon-1024']=set([x[1] for x in avl])\n",
"ids['RDKit5']=set([x[1] for x in rdkl])\n",
"ids['Fraggle']=set([x[1] for x in fragl])\n",
"\n",
"\n",
"ks = sorted(ids.keys())\n",
"for i,k in enumerate(ks):\n",
" for j in range(i+1,len(ks)):\n",
" overlap=len(ids[k].intersection(ids[ks[j]]))\n",
" print ks[i],ks[j],overlap,'%.2f'%(float(overlap)/len(apl))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Overall number: 240\n",
"AP Avalon-1024 58 0.58\n",
"AP Fraggle 18 0.18\n",
"AP RDKit5 69 0.69\n",
"AP TT 86 0.86\n",
"Avalon-1024 Fraggle 16 0.16\n",
"Avalon-1024 RDKit5 56 0.56\n",
"Avalon-1024 TT 60 0.60\n",
"Fraggle RDKit5 24 0.24\n",
"Fraggle TT 21 0.21\n",
"RDKit5 TT 70 0.70\n"
]
}
],
"prompt_number": 79
},
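{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fraggle is really pulling back different compounds: in the top 100 it shares only 16-24 pairs with each of the other fingerprints, while the other fingerprints share 56-86 pairs with each other."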
]
},
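{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two overlap cells above differ only in `nToDo`, so the calculation can be wrapped in a function. The next cell is a sketch under the assumption that each entry of `scoredLists` is a list of `(score, pair_id)` tuples, which is what the indexing with `x[1]` above suggests. `top_overlaps` and the toy data are hypothetical. The fraction is computed relative to `n`, which matches `len(apl)` above as long as every list has at least `n` entries."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def top_overlaps(scored, n):\n",
"    # scored: dict mapping fingerprint name -> list of (score, pair_id) tuples\n",
"    top = {}\n",
"    for name in scored:\n",
"        best = sorted(scored[name], reverse=True)[:n]\n",
"        top[name] = set([x[1] for x in best])\n",
"    names = sorted(top.keys())\n",
"    res = []\n",
"    for i, a in enumerate(names):\n",
"        for b in names[i+1:]:\n",
"            overlap = len(top[a] & top[b])\n",
"            res.append((a, b, overlap, float(overlap) / n))\n",
"    return res\n",
"\n",
"# toy data with made-up scores; in this notebook one would pass scoredLists\n",
"toy = {'A': [(0.9, 1), (0.8, 2), (0.1, 3)],\n",
"       'B': [(0.7, 2), (0.6, 3), (0.2, 1)]}\n",
"for row in top_overlaps(toy, 2):\n",
"    print(row)"
],
"language": "python",
"metadata": {},
"outputs": []
},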
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}