{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is an updated version of an [earlier post](http://rdkit.blogspot.com/2019/12/using-r-group-decomposition-code.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The RDKit's code for doing R-group decomposition (RGD) is quite flexible but also rather \"undocumented\". Because of that, you may not be aware of some of the cool stuff that's there. This post is an attempt to at least begin to remedy that by looking at some of the edge cases that come up while doing RGD. \n", "\n", "I have another post coming in the near future that is more of a tutorial, but here we'll look at a number of difficult/interesting problems that arise all the time when doing RGD on real-world datasets:\n", "\n", "- Handling symmetric cores\n", "- Handling stereochemistry\n", "- Handling sidechains that attach to the core at more than one point\n", "- Handling multiple scaffolds or variable scaffolds\n", "\n", "Some of these problems are really tricky to solve perfectly, so please expect that there will be bugs (particularly in the code for handling symmetrization). If you find something that seems wrong, please do file a bug report, ideally with the exact code and structures that you used.\n", "\n", "The code in this blog post behaves correctly with RDKit v2019.09.1 and later. Older versions have bugs that generate different results for some of the examples here. 
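If you would rather have a notebook stop early than silently produce different results on an older release, a small version guard is enough. This is a minimal sketch, not part of the RDKit API. `parse_rdkit_version`, `version_at_least` and `MIN_VERSION` are hypothetical names. The parsing assumes RDKit's `YEAR.MONTH.PATCH` release numbering (e.g. `2019.09.1`) and drops any non-numeric suffix, such as a beta tag.

```python
from itertools import takewhile

# Hypothetical constant: the oldest release the post says behaves correctly.
MIN_VERSION = (2019, 9, 1)


def parse_rdkit_version(version):
    # Turn a string like '2022.09.1' into (2022, 9, 1).
    # Assumption: YEAR.MONTH.PATCH numbering; non-numeric suffixes
    # such as 'b1' are dropped from each component.
    parts = []
    for piece in version.split('.')[:3]:
        digits = ''.join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short strings so tuple comparison always sees three fields.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def version_at_least(version, minimum=MIN_VERSION):
    # Tuples compare element by element: year, then month, then patch.
    return parse_rdkit_version(version) >= minimum
```

In a notebook you would call `version_at_least(rdkit.__version__)` right after the imports. With the 2022.09.1 release used here it returns `True`. With 2019.03.4, one of the releases flagged in a code comment later on, it returns `False`.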
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.249874Z", "start_time": "2023-01-05T12:50:05.028680Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022.09.1\n" ] } ], "source": [ "import pandas as pd\n", "from rdkit import Chem\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem import rdDepictor\n", "from rdkit.Chem import PandasTools\n", "IPythonConsole.ipython_useSVG=True\n", "from rdkit.Chem import rdRGroupDecomposition\n", "from rdkit import RDLogger\n", "RDLogger.DisableLog('rdApp.warning')\n", "import rdkit\n", "print(rdkit.__version__)\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.253793Z", "start_time": "2023-01-05T12:50:05.251495Z" } }, "outputs": [], "source": [ "PandasTools.RenderImagesInAllDataFrames()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basics: a symmetric core\n", "\n", "Let's start with an easy example that has a symmetric core. 
In this case, R1 and R5 are symmetry-equivalent, as are R2 and R4:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.388520Z", "start_time": "2023-01-05T12:50:05.254881Z" } }, "outputs": [ { "data": { "image/png": "\n", "image/svg+xml": [ "\n" ], "text/html": [ "\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('[*:1]c1c([*:2])c([*:3])c([*:4])c([*:5])n1')\n", "scaffold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some molecules that share that scaffold. We've written the SMILES with the atoms in different orders to make sure that's properly handled by the RGD code."
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.479596Z", "start_time": "2023-01-05T12:50:05.391123Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cccn1 c1c(Cl)c(C)ccn1 c1c(O)cccn1 c1c(F)c(C)ccn1 c1cc(Cl)c(F)cn1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do a version where we provide a scaffold without the R labels to start with:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.560231Z", "start_time": "2023-01-05T12:50:05.484558Z" } }, "outputs": [ { "data": { "image/png": "\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('c1ccccn1')\n", "scaffold" ] 
}, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.779397Z", "start_time": "2023-01-05T12:50:05.561421Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Those labels were assigned automatically, and they aren't consistent with the labels in the scaffold shown above. Notice, however, that the symmetry in the scaffold has been handled properly. \n", "\n", "If we care about the R-group labels, we can explicitly label the attachment points in the scaffold:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.838879Z", "start_time": "2023-01-05T12:50:05.780626Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R3 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note: there's a bug in RDKit 2019.03.3 and 2019.03.4 that causes this to generate different\n", "# results with those versions\n", "scaffold = Chem.MolFromSmiles('c1c([*:2])c([*:3])ccn1')\n", "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we've been looking at structure images, since those are easier to read. Here's the raw output of the function with `asSmiles=True`: a dict with one list of SMILES per column (core and each R group):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.888219Z", "start_time": "2023-01-05T12:50:05.839909Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Core': ['c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1'], 'R2': ['F[*:2]', 'Cl[*:2]', 'O[*:2]', 'F[*:2]', 'F[*:2]'], 'R3': ['[H][*:3]', 'C[*:3]', '[H][*:3]', 'C[*:3]', 'Cl[*:3]']}\n" ] } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=False) \n", "print(groups)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Passing `asRows=True` gives the same results in a row-oriented format, with one dict per molecule:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.993161Z", "start_time": "2023-01-05T12:50:05.891802Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 
'R2': 'Cl[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'O[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'Cl[*:3]'}]\n" ] } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=True) \n", "print(groups)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stereochemistry\n", "\n", "Making sure that the sidechains are labelled correctly on chiral centers can be a bit trickier.\n", "\n", "Here's a set of molecules we'll be using. Some have a chiral center, some don't. There are a few that have sidechains with dual attachment points (i.e. rings). We'll look at those in the next section." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.075609Z", "start_time": "2023-01-05T12:50:06.002666Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [x for x in Chem.SDMolSupplier('../data/rgd_chiral.sdf')]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remove the examples with \"ring\" sidechains. 
We'll get to those later" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.163267Z", "start_time": "2023-01-05T12:50:06.076737Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = Chem.MolFromSmarts('[R2]')\n", "mols = [x for x in mols if not x.HasSubstructMatch(q)]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll define two scaffolds, 
the first with a chiral center, the second without. In this case we will add explicit markers for the substituents. This is currently (v2019.09) necessary to properly handle atomic chirality." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.269438Z", "start_time": "2023-01-05T12:50:06.164977Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('[*:1]C1([*:2])OCCC=C1')\n", "chiral_scaffold = Chem.MolFromSmiles('[*:1][C@]1([*:2])OCCC=C1')\n", "Draw.MolsToGridImage([scaffold,chiral_scaffold])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start with doing a decomposition with the non-chiral scaffold. This matches all the molecules, but generates results that are not consistent with the chirality. The compounds in rows 2 and 3 (numbered from zero) demonstrate the problem clearly." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.461255Z", "start_time": "2023-01-05T12:50:06.274717Z" }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try the chiral scaffold. This one will only match the chiral compounds, but it does the right thing with those:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.552713Z", "start_time": "2023-01-05T12:50:06.462250Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[13:50:06] No core matches\n", "[13:50:06] No core matches\n", "[13:50:06] No core matches\n", "[13:50:06] No core matches\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([chiral_scaffold],mols,asSmiles=False,asRows=False)\n", "tmols = [mols[x] for x in range(len(mols)) if x not in unmatched]\n", "PandasTools.RGroupDecompositionToFrame(groups,tmols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in each case the atom is assigned to the correct R group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also combine the two scaffolds so that we can get the chiral and achiral cases. Order is important, so we include the more specific scaffold (the chiral one) first. In this case the stereochemistry determines the R1/R2 assignment for the chiral molecules. For the non-chiral molecules R1 and R2 are assigned using the standard symmetrization code." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.654456Z", "start_time": "2023-01-05T12:50:06.553996Z" }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([chiral_scaffold,scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sidechains that attach in more than one place\n", "\n", "This one is tricky, and there's not really a right answer; this is just a demonstration of what the current code does." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.661797Z", "start_time": "2023-01-05T12:50:06.655405Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [x for x in Chem.SDMolSupplier('../data/rgd_chiral.sdf')]\n", "q = Chem.MolFromSmarts('[R2]')\n", "mols = [x for x in mols if x.HasSubstructMatch(q)]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.782570Z", "start_time": 
"2023-01-05T12:50:06.662717Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('C1OCC=CC1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scaffold Variation\n", "\n", "What happens if there are small variations in the scaffold within the series, something that we see all the time in med chem work?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.880913Z", "start_time": "2023-01-05T12:50:06.783759Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cccn1 c1c(Cl)c(C)ccn1 c1c(F)cncn1 c1c(F)c(C)ccn1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.969261Z", "start_time": "2023-01-05T12:50:06.886152Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "[2]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[13:50:06] No core matches\n" ] }, { "data": { "text/plain": [ "{'Core': ['c1cc([*:2])c([*:1])cn1',\n", " 'c1cc([*:2])c([*:1])cn1',\n", " 'c1cc([*:2])c([*:1])cn1'],\n", " 'R1': ['F[*:1]', 'Cl[*:1]', 'F[*:1]'],\n", " 'R2': ['[H][*:2]', 'C[*:2]', 'C[*:2]']}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('c1c([*:1])c([*:2])ccn1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=False)\n", "# the second return value, unmatched, provides the indices of the molecules that did not match a scaffold:\n", "print(unmatched)\n", "groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that we now get only three results: the third molecule (index 2), which is a pyrimidine rather than a pyridine, didn't end up in the output.\n", "Sometimes this is OK, but in cases like this it would be great if that molecule were also included in the R-group decomposition. \n", "\n", "One solution is to provide two different scaffolds:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.095976Z", "start_time": "2023-01-05T12:50:06.974410Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold2 = Chem.MolFromSmiles('c1c([*:1])c([*:2])ncn1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold,scaffold2],mols,asSmiles=False,asRows=False)\n", "print(unmatched)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that `unmatched` is now empty: every molecule matched one of the two cores." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another solution is to provide the scaffold as SMARTS, using a wildcard (`*`) for the ring position that varies:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.207251Z", "start_time": "2023-01-05T12:50:07.097158Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sma_scaffold = Chem.MolFromSmarts('c:1:c(-[*:1]):c(-[*:2]):*:c:n:1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([sma_scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple scaffolds\n", "\n", "What if we have multiple scaffolds that share a common SAR? Here we just provide them both and label the attachment points manually to show the correspondence between them." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.311711Z", "start_time": "2023-01-05T12:50:07.208364Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'Fc1ccc(O)cc1 Fc1ccc(OC)cc1 Oc1ccc(Cl)cc1 Clc1ccc(OC)cc1 Fc1ccc(O)s1 COc1ccc(F)s1 Clc1ccc(O)s1 Clc1ccc(OC)s1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.554666Z", "start_time": "2023-01-05T12:50:07.314210Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffolds = [Chem.MolFromSmiles('[*:1]c1ccc([*:2])cc1'),Chem.MolFromSmiles('[*:1]c1ccc([*:2])s1')]\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose(scaffolds,mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the symmetrization also worked." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Looking at the options that are available\n", "\n", "We'll use a real dataset pulled from ChEMBL for this:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.565016Z", "start_time": "2023-01-05T12:50:07.555903Z" } }, "outputs": [ { "data": { "image/png": "\n", "image/svg+xml": [ "" ], "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "core = Chem.MolFromSmiles('c1ccccc1-c2nc(c1ccccc1)no2')\n", "core" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.740187Z", "start_time": "2023-01-05T12:50:07.566164Z" } }, "outputs": [], "source": [ "smiles = ['CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccccc5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'CCOc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CC(C)Cc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccc(F)cc5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCC5)cc4)n3)cc2)C1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4noc(-c5ccccc5)n4)cc3)cc2)C(=O)NC(=O)NC1=O', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc([C@@H]5CCC(F)(F)C5)cc4)n3)cc2)C1', 'CC(C)(C)c1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C(F)(F)F)c(C#N)c2)n1', 'Cc1cc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)ccc1OC(C)C', 'CC(C)(C)Cc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(F)c2)n1', 'COc1cc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)ccc1OC(C)C', 'CC(C)(C)c1ccc(-c2noc(-c3ccc(CN4CC(C(=O)O)C4)cc3)n2)cc1', 'Cc1cc(CCCCCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OCC(F)(F)F)c(C#N)c2)n1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5cccc(F)c5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'Cc1ccccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CCCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(C#N)c2)n1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(CCC(F)(F)F)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(-c5ccccc5)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCCCC5)cc4)n3)cc2)C1', 'CC[C@H](C)Oc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C#N', 'Cc1cc(CCCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(C(F)(F)F)c2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(Cl)c2)n1', 'CCCCCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(Br)c2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C(F)(F)F)C(F)(F)F)c(C#N)c2)n1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCCC5)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc([C@H]5CCC(F)(F)C5)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CC5)cc4)n3)cc2)C1', 'CCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccccc5F)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'CCC(C)(C)c1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CC(C)Oc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(C(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CCOc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C(F)(F)F', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'COc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C(F)(F)F']\n", "mols = [Chem.MolFromSmiles(x) for x in smiles]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.897680Z", "start_time": "2023-01-05T12:50:07.742403Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Draw.MolsToGridImage(mols[:16],molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.249206Z", "start_time": "2023-01-05T12:50:07.898786Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R3 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R4 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we'll use just the first 16 molecules to make things a bit smaller for this demo\n", "m16 = mols[:16]\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([core],m16,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,m16,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do a query with labelled R groups:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.261221Z", "start_time": "2023-01-05T12:50:08.251475Z" } }, "outputs": [ { "data": { "image/png": "\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n" ], "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lcore = Chem.MolFromSmiles('c1cc([*:1])ccc1-c2nc(c1ccc([*:2])cc1)no2')\n", "lcore" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.650633Z", "start_time": "2023-01-05T12:50:08.262336Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R3 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R4 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([lcore],m16,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,m16,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can exclude any molecules that have R groups in non-labelled positions:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.787099Z", "start_time": "2023-01-05T12:50:08.651664Z" }, "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = rdRGroupDecomposition.RGroupDecompositionParameters()\n", "# only accept matches where all substituents sit at the labelled R-group positions\n", "params.onlyMatchAtRGroups = True\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([lcore],m16,asSmiles=False,asRows=False,options=params)\n", "# unmatched holds the indices of molecules that failed to match; drop them so the rows line up with groups\n", "tmols = [x for i,x in enumerate(m16) if i not in unmatched]\n", "PandasTools.RGroupDecompositionToFrame(groups,tmols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are other useful parameters to control the calculation in that `RGroupDecompositionParameters` object, but this post is already getting pretty long, so I'm going to wrap up now and leave exploring those as an exercise for the reader. ;-)\n" ] } ], "metadata": { "_draft": { "nbviewer_url": "https://gist.github.com/0afd4f5cc9194432acd85bf22261ed8a" }, "gist": { "data": { "description": "RGroupEdgeCases.ipynb", "public": false }, "id": "0afd4f5cc9194432acd85bf22261ed8a" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }