{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": true, "input": [ "# Helper functions for depicting molecules, reactions and MMP search results\n", "\n", "from rdkit import Chem\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem import AllChem\n", "\n", "def depict(in_string):\n", "    # Strings containing \">>\" are treated as reaction SMARTS, anything else as SMILES\n", "    if \">>\" in in_string:\n", "        rxn = AllChem.ReactionFromSmarts(in_string)\n", "        return Draw.ReactionToImage(rxn)\n", "    else:\n", "        return Chem.MolFromSmiles(in_string)\n", "\n", "def showMMPs(in_string):\n", "    # Fields are indexed from the end of the line because the leading input fields are optional\n", "    f = in_string.split(\",\")\n", "\n", "    rxn = f[-2].split(\">>\")\n", "\n", "    mols = []\n", "    ids = []\n", "\n", "    mols.append(Chem.MolFromSmiles(f[-6]))\n", "    mols.append(Chem.MolFromSmiles(f[-5]))\n", "    mols.append(Chem.MolFromSmiles(rxn[0]))\n", "    mols.append(Chem.MolFromSmiles(rxn[1]))\n", "    mols.append(Chem.MolFromSmiles(f[-1]))\n", "    # Legends follow the output order: MMP1_ID is f[-4], MMP2_ID is f[-3]\n", "    ids.append(f[-4])\n", "    ids.append(f[-3])\n", "    ids.append(\"LHS\")\n", "    ids.append(\"RHS\")\n", "    ids.append(\"CONTEXT\")\n", "\n", "    return Draw.MolsToGridImage(mols, molsPerRow=6, legends=ids)\n", "\n", "def showLine(in_string):\n", "    # Fields of an mmp search line: query SMILES, MMP SMILES, query ID, retrieved ID, change, context\n", "    f = in_string.split(\",\")\n", "\n", "    mols = []\n", "    ids = []\n", "\n", "    mols.append(Chem.MolFromSmiles(f[0]))\n", "    mols.append(Chem.MolFromSmiles(f[1]))\n", "    mols.append(Chem.MolFromSmiles(f[4]))\n", "    mols.append(Chem.MolFromSmiles(f[5]))\n", "    ids.append(\"Query:%s\" % f[2])\n", "    ids.append(f[3])\n", "    ids.append(\"CHANGE\")\n", "    ids.append(\"CONTEXT\")\n", "\n", "    return Draw.MolsToGridImage(mols, molsPerRow=4, legends=ids)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Generating and searching an MMP database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pair index used in the MMP identification algorithm can be written to a relational database. 
For the indexing.py\n", "program already described, the index is written to memory and the program will identify all the MMPs in the dataset.\n", "**However, if you just want to ask a (series of) specific questions on a dataset, a relational database containing the\n", "pair index (MMP db) can be used to do that.**\n", "\n", "The program **create_mmp_db.py** will build an MMP db for a given dataset and the program **search_mmp_db.py** can be used to\n", "search the MMP db. The types of searching that can be performed on the db are as follows:\n", "\n", "1. Find all MMPs of an input/query compound to the compounds in the db\n", "2. Find all MMPs in the db where the LHS of the transform matches an input substructure\n", "3. Find all MMPs that match the input transform/SMIRKS\n", "4. Find all MMPs in the db where the LHS of the transform matches an input SMARTS\n", "5. Find all MMPs that match the LHS and RHS SMARTS of the input transform\n", "\n", "The SMARTS searching utilises the DbCLI tools (http://code.google.com/p/rdkit/wiki/UsingTheDbCLI) that are part of the RDKit distribution.\n" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Generating the db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate an MMP db use the following command:\n", "\n", " python $RDBASE/Contrib/mmpa/create_mmp_db.py >RHS_SMARTS (e.g. [#0]c1ccccc1>>[#0]c1ccncc1). Note: This search can take a long time to run if a very general SMARTS expression is used." ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "a) To carry out an MMP search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all MMPs of an input/query compound to the compounds in the db. You might use this search to identify analogues with single-point changes.\n", "\n", "Use search type: mmp\n", "\n", "Format of input file (space or comma separated. 
The ID field is optional): SMILES ID\n", "\n", "Format of output: SMILES_QUERY,SMILES_OF_MMP,QUERY_ID,RETRIEVED_ID,CHANGED_SMILES,CONTEXT_SMILES\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head sample_db_input_smi.txt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "depict(\"c1cc2c(ncnc2NCc2cccnc2)s1\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python $RDBASE/Contrib/mmpa/search_mmp_db.py -t mmp >[*:1]c1cccnc1,[*:1]CNc1ncnc2sccc21\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "c) To carry out a transform search:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all MMPs that match the input transform/SMIRKS. Make sure the input SMIRKS has been **canonicalised using the cansmirk.py program**.\n", "\n", "Use search type: trans\n", "\n", "Format of input file (space or comma separated. The ID field is optional): SMIRKS ID\n", "\n", "Format of output: [input_id,]SMILES_MMP1,SMILES_MMP2,MMP1_ID,MMP2_ID,Transform,Context\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head sample_db_input_trans.txt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "depict(\"[*:1]c1ccccc1>>[*:1]c1cccnc1\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python $RDBASE/Contrib/mmpa/search_mmp_db.py -t trans >[*:1]c1cccnc1,[*:1]CNc1ncnc2sccc21\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "d) To carry out an LHS transform substructure SMARTS search:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all MMPs in the db where the LHS of the transform matches an input SMARTS. 
**The attachment points in the SMARTS can be denoted by [#0]** (e.g. [#0]c1ccccc1).\n", "\n", "Use search type: subs_smarts\n", "\n", "Format of input file (space or comma separated. The ID field is optional): SMARTS ID\n", "\n", "Format of output: [input_id,]SMILES_MMP1,SMILES_MMP2,MMP1_ID,MMP2_ID,Transform,Context\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head sample_db_input_subs_smarts.txt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python $RDBASE/Contrib/mmpa/search_mmp_db.py -t subs_smarts >[*:1]c1ccc(C(=O)O)cc1,[*:1]NC(=O)C1COc2ccccc2O1\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "showMMPs(\"a,c1cc2c(ncnc2NCc2ccccc2)s1,c1cc2c(ncnc2NCc2cccnc2)s1,2139597,2531831,[*:1]c1ccccc1>>[*:1]c1cccnc1,[*:1]CNc1ncnc2sccc21\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "e) To carry out a transform SMARTS search:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find all MMPs that match the LHS and RHS SMARTS of the input transform. The transform SMARTS are input as **LHS_SMARTS>>RHS_SMARTS** (e.g. [#0]c1ccccc1>>[#0]c1ccncc1). Note: This search can take a long time to run if a very general SMARTS expression is used.\n", "\n", "Use search type: trans_smarts\n", "\n", "Format of input file (space or comma separated. 
The ID field is optional): SMARTS ID \n", "\n", "Format of output: input_transform_SMARTS,[input_id,]SMILES_MMP1,SMILES_MMP2,MMP1_ID,MMP2_ID,Transform,Context\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head sample_db_input_trans_smarts.txt" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python $RDBASE/Contrib/mmpa/search_mmp_db.py -t trans_smarts >n,ts,c1cc2c(ncnc2NCc2ccccc2)s1,c1cc2c(ncnc2NCc2cccnc2)s1,2139597,2531831,[*:1]NCc1ccccc1>>[*:1]NCc1cccnc1,[*:1]c1ncnc2sccc21\")" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }