{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an updated version of an [earlier post](http://rdkit.blogspot.com/2019/12/using-r-group-decomposition-code.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The RDKit's code for doing R-group decomposition (RGD) is quite flexible but also rather \"undocumented\". As a result, you may not be aware of some of the cool stuff that's there. This post is an attempt to at least begin to remedy that by looking at some of the edge cases that come up while doing RGD.\n",
"\n",
"I have another post coming in the near future which is a bit more of a tutorial, but here we'll look at a number of difficult/interesting problems that arise all the time when doing RGD on real-world datasets:\n",
"\n",
"- Handling symmetric cores\n",
"- Handling stereochemistry\n",
"- Handling sidechains that attach to the core at more than one point\n",
"- Handling multiple scaffolds or variable scaffolds\n",
"\n",
"Some of these problems are really tricky to solve perfectly, so please expect that there will be bugs (particularly in the code for handling symmetrization). If you find something that seems wrong, please do file a bug report, ideally with the exact code and structures that you used.\n",
"\n",
"The code in this blog post behaves correctly with v2019.09.1 and later of the RDKit. Older versions have bugs that generate different results for some of the examples here. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.249874Z",
"start_time": "2023-01-05T12:50:05.028680Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2022.09.1\n"
]
}
],
"source": [
"import pandas as pd\n",
"from rdkit import Chem\n",
"from rdkit.Chem.Draw import IPythonConsole\n",
"from rdkit.Chem import Draw\n",
"from rdkit.Chem import rdDepictor\n",
"from rdkit.Chem import PandasTools\n",
"IPythonConsole.ipython_useSVG=True\n",
"from rdkit.Chem import rdRGroupDecomposition\n",
"from rdkit import RDLogger\n",
"RDLogger.DisableLog('rdApp.warning')\n",
"import rdkit\n",
"print(rdkit.__version__)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.253793Z",
"start_time": "2023-01-05T12:50:05.251495Z"
}
},
"outputs": [],
"source": [
"PandasTools.RenderImagesInAllDataFrames()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basics: a symmetric core\n",
"\n",
"Let's start with an easy example that has a symmetric core. In this case R1 and R5 are symmetry equivalent, as are R2 and R4:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.388520Z",
"start_time": "2023-01-05T12:50:05.254881Z"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
],
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaffold = Chem.MolFromSmiles('[*:1]c1c([*:2])c([*:3])c([*:4])c([*:5])n1')\n",
"scaffold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are some molecules that share that scaffold. We've provided the atoms in different orders to make sure that's properly handled by the RGD code."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.479596Z",
"start_time": "2023-01-05T12:50:05.391123Z"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
""
],
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mols = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cccn1 c1c(Cl)c(C)ccn1 c1c(O)cccn1 c1c(F)c(C)ccn1 c1cc(Cl)c(F)cn1'.split()]\n",
"Draw.MolsToGridImage(mols,molsPerRow=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's try a version where the scaffold is provided without R labels:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.560231Z",
"start_time": "2023-01-05T12:50:05.484558Z"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n"
],
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaffold = Chem.MolFromSmiles('c1ccccn1')\n",
"scaffold"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.779397Z",
"start_time": "2023-01-05T12:50:05.561421Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"[Rendered table: columns Mol, Core, R1, R2; rows 0-4; cells hold molecule images]"
],
"text/plain": [
" Mol \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" Core \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" R1 \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" R2 \n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n",
"PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Those labels were automatically assigned and they aren't consistent with what we had above. Notice, however, that the symmetry in the scaffold has been properly handled.\n",
"\n",
"If we care about the R group labels, we can explicitly label the attachment points in the scaffold:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.838879Z",
"start_time": "2023-01-05T12:50:05.780626Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"[Rendered table: columns Mol, Core, R2, R3; rows 0-4; cells hold molecule images]"
],
"text/plain": [
" Mol \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" Core \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" R2 \\\n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 \n",
"\n",
" R3 \n",
"0 \n",
"1 \n",
"2 \n",
"3 \n",
"4 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# note: there's a bug in RDKit 2019.03.3 and 2019.03.4 that causes this to generate different\n",
"# results with those versions\n",
"scaffold = Chem.MolFromSmiles('c1c([*:2])c([*:3])ccn1')\n",
"groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n",
"PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we've been looking at compound images since they're a bit more readable. Here's what the raw output from the function looks like:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.888219Z",
"start_time": "2023-01-05T12:50:05.839909Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Core': ['c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1'], 'R2': ['F[*:2]', 'Cl[*:2]', 'O[*:2]', 'F[*:2]', 'F[*:2]'], 'R3': ['[H][*:3]', 'C[*:3]', '[H][*:3]', 'C[*:3]', 'Cl[*:3]']}\n"
]
}
],
"source": [
"groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=False) \n",
"print(groups)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also get that in a row-oriented format:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:05.993161Z",
"start_time": "2023-01-05T12:50:05.891802Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'Cl[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'O[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'Cl[*:3]'}]\n"
]
}
],
"source": [
"groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=True) \n",
"print(groups)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stereochemistry\n",
"\n",
"Making sure that the sidechains are labelled correctly on chiral centers can be a bit trickier.\n",
"\n",
"Here's a set of molecules we'll be using. Some have a chiral center, some don't. There are a few that have sidechains with dual attachment points (i.e. rings). We'll look at those in the next section."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2023-01-05T12:50:06.075609Z",
"start_time": "2023-01-05T12:50:06.002666Z"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"