{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is an updated version of an [earlier post](http://rdkit.blogspot.com/2019/12/using-r-group-decomposition-code.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The RDKit's code for doing R-group decomposition (RGD) is quite flexible but also rather \"undocumented\". Because of that, you may not be aware of some of the cool stuff that's there. This post is an attempt to at least begin to remedy that by looking at some of the edge cases that come up while doing RGD. \n", "\n", "I have another post coming in the near future which is a bit more of a tutorial, but here we'll look at a number of difficult/interesting problems that arise all the time when doing RGD on real-world datasets:\n", "\n", "- Handling symmetric cores\n", "- Handling stereochemistry\n", "- Handling sidechains that attach to the core at more than one point\n", "- Handling multiple scaffolds or variable scaffolds\n", "\n", "Some of these problems are really tricky to solve perfectly, so please expect that there will be bugs (particularly in the code for handling symmetrization). If you find something that seems wrong, please do file a bug report, ideally with the exact code and structures that you used.\n", "\n", "The code in this blog post behaves correctly with RDKit v2019.09.1 and later. Older versions have bugs that generate different results for some of the examples here. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.249874Z", "start_time": "2023-01-05T12:50:05.028680Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022.09.1\n" ] } ], "source": [ "import pandas as pd\n", "from rdkit import Chem\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem import rdDepictor\n", "from rdkit.Chem import PandasTools\n", "IPythonConsole.ipython_useSVG=True\n", "from rdkit.Chem import rdRGroupDecomposition\n", "from rdkit import RDLogger\n", "RDLogger.DisableLog('rdApp.warning')\n", "import rdkit\n", "print(rdkit.__version__)\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.253793Z", "start_time": "2023-01-05T12:50:05.251495Z" } }, "outputs": [], "source": [ "PandasTools.RenderImagesInAllDataFrames()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Basics: a symmetric core\n", "\n", "Let's start with an easy example that has a symmetric core. 
In this case R1 and R5 are symmetry equivalent as are R2 and R4):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.388520Z", "start_time": "2023-01-05T12:50:05.254881Z" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAY0ElEQVR4nO3deVRTVwIG8JvILqiAii8EAUXBXYSqaN23ilStLWhHsHPaTrS2RDunres02sMoLtU4VY4cj9OJzFjB4gIVGcVBAUdFUXFBUbaRrWWLIKQSJHf+eDZSCIs8yEvg+/3RI3k3yZc5w5f37nvvIqCUEgAAaC8h3wEAAIwbahQAgBPUKAAAJ6hRAABOUKMAAJygRsFw5eTklJeXtzCgsrLy+fPnessDoBNqFAzXxo0br1y5onNTRESEt7f3sGHDBg0aNG3atDNnzug5G4AWahQM0VdffRUdHS0Wi62trUNCQs6dO9dw6/79+zdv3vz++++fOHHiH//4R69evVatWpWamspXWujmTPgOAKDD1KlT09LS4uPj8/LyZs2a5ezs3HCrg4PDrl27AgIC2B+HDx8+fvz4+Pj48ePH8xEWujvUKBgiPz+/4uLipUuXJiUl+fr6NqpRf3//hj/W1tYSQkxNTfUaEeA3OKgHQ5SXl3fo0KG1a9e+9957crmcEHL8+PFx48alpaU1Gpmdnb1+/Xo7O7s//OEPfCQFwN4oGCQXF5ekpCQLC4sPP/xQKBQSQqytrcVisYWFhXZMZGTk+vXra2tr+/Tp88MPPzTaYwXQGwGWJgHDVF5e/uOPP/bt2/fdd9/VOaC4uPjRo0fFxcXff//9kydPjh8/PnLkSD2HBCCoUTBY9+/fnzNnzvDhwxMSEloeWVVV5e3t7ePjo1Ao9JMNoCHMjYLR69Wrl5OTU05ODt9BoJtCjYJxUCqV9+/fV6lUhJDIyMhTp05pNxUVFeXm5rq6uvKXDro1nGIC43D69OmNGzdGR0f7+PhcuXLl+PHj8fHxXl5eT58+PXbsGKV0zZo1fGeEbgo1CsbB29t748aN7On4vXv3Tpky5fjx4+Hh4ebm5hMmTPjss8+GDx/Od0bopnCKCQxU208xAfALc6MAAJygRgEAOEGNAgBwghoFAOAENQoAwAlqFACAE9QoAAAnqFEAAE5QowAAnKBGAQA4QY0CAHCCGgUA4AQ1CgDACWoUDJRAIOjfv3+fPn34DgLQCqw3CgZKrVanp6fznQKgddgbBQDgBDUKAMAJahQAgBPUKAAAJ6hRAABOUKMAAJygRgEAOEGNAgBwghoFAOAENQoAwAlqFACAE9QoAAAnqFEAAE6wwpMR2Ldvn7Oz8+LFi5tuqqqqio6OvnPnjlAo9PDwCAgI6N27t/4TQttpNJrly5fL5XIHB4cWhhUVFX3//feEkFmzZk2cOFFf6aA9sDdq0NLS0qqrqysrKwUCQWlpaU5OTsOtjx8/njFjxrZt28rKygoLC7/55pu5c+eWlpbylRZalZycLBAIcnNz7ezsbt68WVdX19zI9evXHz169MCBA7du3dJnQmgH1KhBi4yMnDlz5o0bN44dO+bn55eQkNBwa1hYmIWFRUpKSkRExLFjxw4ePJifn3/o0CG+0kLLlErl119/vWDBAnNz8+XLl69ZsyYrK0vnyJMnTyYkJGzYsEHPCaF9cFBv0Hbu3FlSUrJ69WqBQJCSkmJqatpw6549eyoqKuzt7dkfFyxY0Lt377t37/KRFFpna2ubmJj4008/7dmzZ968eR999JHOYUqlUiaTLVq0aPbs2XpOCO2DvVGD9vTp0
/fff18ikQwaNGjt2rWEkLy8vLCwMHYvRiAQaDuUJRQKX7x4wU9WaIOUlJStW7f++OOPhw4diouLI4RcvHgxLCys4dH9119/XVtbu2XLFt5SwmvC3qhBs7Gx+eSTT+bOnevp6ZmRkUEIyczMDAkJEYlEbm5ujQbfvXtXqVR2mdMRmZmZhJCysrKqqqpevXrxHadjuLu779ixw87Obvfu3UOGDCGExMbG/vDDDx988AF7qJGYmBgdHR0aGurg4PDzzz/znRfahoJRqaysTE9Pf/r0aaPH6+vr/f39R44cWVlZyUuwDlRWViaVSnv06GFlZUUIsbe3l8vlL1684DsXVxkZGStXrnz27FnDB/Pz89PT0+vr6ymlKpVq4sSJvr6+7I/FxcUMw4SFhfETF9oMNdoVaDSaTZs2icXiM2fO8J2FE7Va/e2337LXbJmamgYGBmp3rr28vJKTk/kO2E4VFRXr168Xi8UMw+zcubO5YX/5y1+cnJzu3bvH/ogaNRaoUaOnVqu/+OILR0fHY8eOaR98+PBhWloaj6na4fz58yNGjGBLc/bs2do2iYmJcXFxYR/38/PLzc3lNebrqauri4iIGDFiBMMwTk5OX375ZVlZmc6RBQUFYrF44MCBE3/j7e3NMMzw4cPfffddPceG14IaNW7FxcV+fn5ubm4//fRTw8eXLFkiEokkEkl+fj5f2douMzPTz8+PLcqhQ4fGxsY2GqBSqUJDQ62trQkhlpaW69ata3RobJiSk5NnzJjBMAzDMP7+/hkZGS0MLi8v3/9727dvZxgmMDDwyJEjessM7YAaNTKpqanvvPMOe3ibkpIyatSo4cOHnzt3Lq8BtVq9fft2FxcXhmEGDx4sl8ufP3/Od3DdlErlunXrzM3NCSF9+vQJDQ1tIWpBQUFQUJBAICCEODo6KhQKjUajz7Rtl5ubK5FI2AL18fGJiYlpbuSBAwfeeeedX3/9tekmHNQbC9SokUlMTBwzZszZs2cppWPHjmV0YUcWFhYGBweLRCKGYcaNGxcVFWVQpVNfX69QKPr3708IEQqFQUFBv/zyS1ueeO3aNe2E6fjx469cudLZUV9LTU3N7t27nZ2d2e+w3bt319bWtjB+69atY8aMUalUTTehRo0FatSIZWRkpOvScMzly5dnzZrF1uuSJUvu37/PV9qG2C8DtgqnT59++/btFkY2bX+NRqNQKAYMGEAIEQgEQUFBxcXFnRy5dfX19VFRUaNHj2YYRiQSBQcHt/GLoTlqtTo9Pb2kpKSjEkInQY12feyv96hRoxiGcXR0DA4O5vE388mTJ0FBQWyBisXilg/MExMTCSHe3t4pKSlNt1ZXV8tkMnZCoGfPnjKZjMe5i5s3b/r5+bFfV76+vjdu3OArCegfarS7qKysDAkJYQ823d3d9+/f3/LBZoerqamRyWQWFhaEECsrK5lMpnNCsKHY2FiRSMTucq5YsaKwsLDpmMePH/v7+7O97ObmFhUV1Tnxm1VcXKydPPH09DS0yRPQA9Ro95Kdnb1ixQp2p2nSpEnnzp3Tw5tqNJqoqKiBAweyhejv75+Xl9fG57axfBMSEkaNGsWW6cyZM+/cudOhn0A3lUq1f/9+Nzc3hmFcXV1DQkKqq6v18L5gaFCj3VFSUtL06dPZMg0ICHj48GHnvdf169cnT56svYRe5+F5qxpOBTg5OSkUiqZj6urqwsPD+/XrRwgxMTGRSCSdN3eh0WiOHj26ePFidhp01apVOveUoZtAjXZTarX64MGD7u7uDMOMGjXqq6++UiqVHfsWhYWFEolEKBQSQhiGCQ8PZ+9xbLe2nJgqLy+XSqUmJiaEEFtbW7lcXldXx+VNm7px44b2i+Hjjz9OTU3t2NcHo4Ma7dZKS0u/+OKL+fPnE0Ls7Ow66tb12tpauVzOridiZmYmlUo76k7/Nl4m9eDBA/ZDEUI8PDzi4uI65N1LS0vZm/270p3+wB1qFOitW7emTZvGls6wY
cPi4+O5vFpMTMygQYO0925mZ2d3VE4t9qJ9MzMz7UX7Ok+XxcTEDB48WJskKyur3e+oVqu1XwympqZSqbTp6jDQbaFG4aWYmBhXV1dt6eTk5LzuK2RkZMybN0+7D8jeI9B5MjMzFyxYoL2FtNHtsKym9deO/eLz588PGzZMe7O/gVx7C4YDNQqvsAfjNjY22oPxqqqqtjyRnZFkj3bZyYEOn5FsTnMLmjRUVFTUaJa2jQfjDx8+9PX1ZV/c3d3d2BfQgk6CGoXGGp4aEolELZ8aYs+P9+3bV3t+vLS0VJ9p6W+7nNrl9Zo74m54aqjVZfcqKiq08wa2trbNzRsAUNQoNCc1NXXSpEls6Xh7e1++fLnpmPPnz48cOZIdM2vWrLt37+o/p5Z2secWzv+wV7A6Ozu3cAVrwwun2LNYuB0TWoYahWaxpePk5KQtnf/973/spkePHmnvHRoyZIj+7x1qzs2bN6dOncoG8/T0vHTpUtMx7CX9lpaW2kv6tSuDXLhwYfTo0ezTZ8yY0WiBAgCdUKPQCvbWdfY+op49e27YsGHTpk3snezW1tb83snenLacLsvPz9cuuycWi3fu3Kn9Yhg4cKDOK/wBdEKNQptkZWUtXryYPc5l//unP/2J4wpGnYpd6Zk9Xcau9KzzdNnFixfHjh3LfkMQQmxsbFpe8xSgKdQovIaTJ08SQiwsLG7evMl3ljZpy0rPL168WL16NSHE1dW1qKiIl5xg1PB36uE1sNOOlpaWnp6efGdpE0dHxyNHjly7ds3Hx6ewsPCDDz6YMGHClStXGo7p0aPH3LlzCSHsWoI8JQUjhhqFru+NN95ISUk5fPjwgAEDrl+//uabbwYHB/MdCroO1ChwdfDgwW3btimVSr6DtEQoFH744YdZWVkymczMzMzBwYHvRNB1mPAdAIzet99+m5WV5e/vb2try3eWVvTs2XPLli2BgYGOjo58Z4GuAzUK3Y6bmxvfEaBLwUE9AAAnqFEAAE5QowAAnKBGAQA4QY0CAHCCGgUA4AQ1CgDACWoUAIAT1CgAACeoUQAATlCjAACcoEYBADhBjQIAcIIaBQDgBDUKAMAJahQAgBPUKAAAJ6hRAABOUKMAAJygRgEAOEGNAgBwghoFAOAENQoAwAlqFACAE9QoAAAnqFEAAE5QowAAnKBGAQA4QY0CAHCCGgUA4AQ1Ct2ORqPhOwJ0KahR4MrT03PixImWlpZ8B2mT1NTUyZMnnzx5ku8g0HWgRoGrqKioK1euiMVivoO04smTJ8uWLZs4ceLVq1f/9re/8R0Hug7UKHR9KpVqx44dI0aMiIyMtLS0XLduXWxsLN+hoOsw4TsAGJNnz54RQl68eKFWq83MzPiO0yaxsbFSqTQvL48Q4ufn991337m4uDQa88svvxBCSkpK9J4OugLsjUKbUEqPHDkyYcIES0tLlUo1YsSI48eP8x2qFTdv3pw6derChQvz8vLGjRuXlJQUGxvbqENVKtWWLVukUmn//v2vXr06Y8aM9PR0nvKC0aIArUlOTvby8mL/DzNmzBg3Nzf233PmzLl//z7f6XQoLS2VSqU9evQghNjb28vl8hcvXjQao9FoFAqFSCQihAgEgsmTJ9vb2xNCevTosWrVqtLSUl6SgzFCjUJLCgoKgoKCBAIBIcTR0VGhUGg0mrq6uvDw8H79+hFCTExMJBJJSUkJ30lfUqvVcrm8V69ehBBTU1OpVPr06dOmw65fvz5p0iT2y8Db2zslJYVSqlQq161bx05W9OnTJzQ0tLa2Vu+fAIwPahR0U6lUoaGh1tbWhBD2tMyzZ88aDqioqJBKpSYmJoQQW1tbuVxeV1fHV1pWXNwd7Z7ywoULHz9+3HRMYWGhRCIRCoWEEJFIFB4eXl9f33BAZmbmggUL2BcZOnRobGysvuKDsUKNgg4xMTHaOUQ/P7/c3NzmRj548GD+/PnsSA8Pj7i4OD3GfOXhQ+rrS/v2re/Vy8nd3f3Mm
TNNx9TW1srlchsbG0KImZmZVCqtrKxs7gXPnz8/YsQI9nPNnj373r17nRkfjBtqFH4nLS1typQpbH14enomJSW15VkxMTGDBw/W1m5WVlZn59QqL6effUZNTCgh1M6O/v3vj3TuFMfExAwaNEibMDs7u9VXZucHevfu3fL8AABqFF4qKytr9bRMC5pOSrawr9ch6upoeDjt148SQoVCGhREdc7QZmRkzJs3jy3QYcOGxcfHv9a7cPyfBboD1Ch05G5XUVGRduaRYZjw8PBOKp0LF+jo0ZQQSgidMYOmp+sYU15erm1AOzs7Lg3IXjvFdvHYsWMvXbrEKT10LajR7u7f//63p6cnWxBvv/32o0ePuL/mtWvXfHx82NdcujTpv//l/pKvZGVRf/+XBerkRBUKHWPUarp3L5058zv2i2HNmjUVFRXc3zoyMtLZ2Zn9XMuWLcvPz+f+mtAFoEa7r6ysrMDAQIZh5s2b19xpmXbTaDQRERFvvSUhhAoEdPlyyr1zqqupTEYtLCghtGdPKpPRX3/VMSwujnp4UEKoSFT39tv+Dx484PrGDbAXMNjY2MyfP9/V1TUkJKTRBQzQDaFGuyOlUrlp0yYnJyeGYTw8PA4fPtxJ1yrV1FCZjFpaUkKolRVdt462r3M0GqpQ0AEDKFvK/v70yRMdwx49erWjOmQIjYriGL9Z+fn5f/7zn0UiEcMw48aNO3HihEaj6aw3A4OHGu1e6uvro6KiRo4cyTCMo6NjcHCwHm7Xyc+nQUFUIKCEULGYKhT0dTtnwYKX5ejjQ1NTdQx49ozKZNTcnBJCra2pTEafP++Q7C25deuWn58fwzAMw8yfP//GjRud/pZgkFCj3UhKSsrMmTPZX/v33nsvIyNDn+9+9SqdMOFlG06YQK9efY3nHjpERSLd/VtfTxUK6uDw6nz9zz93YOpWaDSaqKioMWPGMAwjEokkEklhYaH+3h4MA2q0WygsLAwODmYL1MvLK6rzDndb1O7Kq6+n1dU6Hr94kY4d285q7kA1NTW7d+92cXFhGGbw4MG7d+9+roedYTAYqNEuzgB/wzvkAJz7REGHy83NlUgkvH9Xgf6hRo3A48ePZ8+e3dzWioqKffv2ffTRR42uVdJoNDExMV5eXtrjzYKCgs4P21bNnQ7avJl6edEjR343OCmJenm9nBXtqNNWnSQ5ObnlmZP6+vrJkycXFRU1fe7Tp08lv/f555/rJTVwgvVGDVpZWdmpU6f69u1rYmKiUqmio6Mbbi0oKNiwYYO3t/eOHTvi4uIKCgq0m27fvr1w4cKVK1cWFRWNHTv29OnT4eHhjo6Oev8EzRoyhERFkbg44uFBHj8mAQFk/nySk0Py8khaGlm7ljRcQ7mykqSlkaoqEhVFhg4lW7eS58/J8uUkM5OEhhJra/4+RhNvvvnm+fPn9+3bZ29vf/ny5Tlz5kil0vLyckIIpTQiIkKtVguFQnt7+1OnTqlUqobPLSwsjI2NLS0t1T7CrgAAho7vHoeW3Lt3LyAgYOHChW+88Yavr+/HH39cU1Oj3Xrjxo3Vq1efPn06NjaWYZj//Oc/7ONnz55lr8Xx9PSMjo428Gtx2Hs6+/alZmY0M5MGBlJ7e9q3L12x4tWY2FhKCE1IoDIZJYR6edHkZP4St41Sqdy4cSN7VdmwYcOOHDlSUVGxYsWKuXPnjh8/ftGiRcuXL2+0BtWFCxcYhrl9+zZfmaF9UKOGTqPRbNy4cenSpTExMc2NSUxMbFijtbW106ZN27x5sxFdGV5SQqOjKaU0MJA6OVG5nAoENDHx5VZtjdbU0H/+k/5+ZTuDpr3HISQkhH3kX//6V0BAwI4dO5oOPnr0KMMwP+vzUgPoCPhbTIZuz549BQUFBw4cmDp1qlgs9vT03LVrl5WV1aefftrcU8zMzBISEtiVQI1Fv35kyZJXP376KTl8mKxcSe7cIebmrx63siLLl+s/XfsNHjw4IiLi3Llz7N2x8fHx+
/fvT0xMfOutt5ydnZcuXRoZGXnr1q3t27cLBIKSkhKhUGhiYnLt2jWBQDBy5EgrKyu+PwG0zph+07qnFStW1NTU2NvbR0REjBkzhhASHR1tZ2fXQo0SQoyrQ5syMSF795LZs8muXWTzZr7TcDZ37lz2H9OnTxeLxebm5mFhYezt+ZcuXTp16tS2bdvYGiWEeHp6CgSCuro6a2vrb775ZtmyZXxGh7bge3cYXltVVVWjo/VGB/VGjT2oZy1bRi0taXb2q4P6rqempka7nlZFRcW5c+fYBQazs7MXL17s6OiIm6MMH87UGx8bGxtrgzo53Wn27iVmZuTLL/nO0ZmsrKzYJQoJIba2tnPmzGHXbB00aJBcLtdoNCdOnOA1ILQONQqGa8AAIpOREydIUhLfUfjg6OgoFArZi6XAkKFGwaAFB5PRo8l33/GdQy9ycnLq6uq0P6ampmo0Gnd3dx4jQVsY94mI7mnhwoW2trYKhYIQkpaWplKp7t27Rwi5d++eiYkJwzDav47ZBZiYkLAw8ttfh+qCNm3adPbs2bS0tOrq6kWLFg0cOHDlypWurq4ZGRmhoaEODg6BgYF8Z4RWoEaNj0gk0t7c8vnnn2dlZbH/3r59OyHkj3/847Zt23gLx1nPnuS3qcKXJk8mEgmJiiKmpjxl6kx2dnZOTk6EEBsbm4MHD27fvv2TTz6hlAqFwilTpvz1r3/t168f3xmhFQJKKd8ZoP2ePXum0WgaPmJmZmZpaclXHuBOqVSWl5f379+fPdcEhg81CgDACU4xAQBwghoFAOAENQoAwAlqFACAE9QoAAAn/wegFyNpvDCcLwAAAM56VFh0cmRraXRQS0wgcmRraXQgMjAyMi4wOS4xAAB4nHu/b+09BiDgZYAARiDmhuIGRgYdDbCgBKMOSEKLzQHEZ2aB0XB5JlR5uDgzDnEWHOKsEHF2iDgzN9A9jEwZTEzMDEwsCSysDCxsGUxs7AxsHAkcnAwcXBlMXIwJIoxsjFwcbCxM4n1QD4AB90M3tQO7XjzaD+I8dFu2f9eLJjhbPPaLHYStdsBNfK49TBOC3bB0cbeeA1S9PZJ6B5gakDjCTDUHmF0MDAdUYWwxAB4hOOTe9IpfAAABInpUWHRNT0wgcmRraXQgMjAyMi4wOS4xAAB4nI1SS27EIAzdcwpfYJAxJIRFF/mMqqodIk3TucPse3/VnlFiaKOoBke28/Ji/DAgdp3e79+wGU3GAODBTinBzSOiuYAEMJxf3zKMSz+slXH+yssnOCcbZdXYfpkva8XBFU5km9RhbOHkbIzEC9Diwxhw6z9e3IonGAVFKaHvJEKq8du/fIVEG4TZ7wBD1QJ/QTWQ6g4a5tXu/qA32lZolcTGX0Bf00am/U+3HdMeNBtq1lSyHk3LYcm7o0NTE7OyWV/u4Dfmc54qxZ93YJjzpHdAFqnEgd2rjo49qFqB00bVcOytDj1wGnWojr3T2QVOkw7HiZczCFIoDxnkQeVZys4lX28+x+YHKz+hEFAKj2UAAADEelRYdFNNSUxFUyByZGtpdCAyMDIyLjA5LjEAAHicbY/BDoIwDIZfxSOYrdnKYGOcvKvxbjiAemNCiBxMeHgrYuJST1/792vS3vUlOW99XqcLzcpsJRI1UdebOVFCarAWbSYqDViWyrwThUuCkJdOuT+OArMmX4cmSImo1G+JYD+l5JuSrUp+gmQ3pKJ59OE09oNHuE4hPPdNe+uAPoPQdzsaHprhOIX2NkLuTewY7hhfxE7Bncy72HHcQa9VLFHPLD2/AKkOcPcwgkhRAAAAAElFTkSuQmCC\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('[*:1]c1c([*:2])c([*:3])c([*:4])c([*:5])n1')\n", "scaffold" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some molecules that share that scaffold. We've provided the atoms in different orders to make sure that's properly handled by the RGD code." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.479596Z", "start_time": "2023-01-05T12:50:05.391123Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cccn1 c1c(Cl)c(C)ccn1 c1c(O)cccn1 c1c(F)c(C)ccn1 c1cc(Cl)c(F)cn1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Do a version where we provide a scaffold without the R labels to start with:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.560231Z", "start_time": "2023-01-05T12:50:05.484558Z" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAV8UlEQVR4nO3de3hMd/4H8HdumgQhLok2Ll2ViyCxiFZF2qpU+2xbokWkUbQuRRqZkGkUcavLRuvpxWIt233sKtkspbXdon60arWKWITciEsSSYjIZXKfmd8fY9Nf/eTkqMz5zpzzfv3l4dN4P6Fv35k553MczGYziIjo13IUHYCIyL6xRomIHghrlIjogbBGiYgeCGuUiOiBOIsOQCRAcTGuXQOAVq3Qr1+TY/n5KCwEgIAAtG6tUDayOzyNkhalpGDQIAwahKAgHDzY5Ni6dXfGzp9XMBzZG9Yoad3MmaipER2C7BlrlLQuOxtr1ogOQfaMNUqa9vDDALByJXJyREchu8UaJU1LSoKzM2pqMGuW6Chkt1ijpGkBAZg8GQAOHMD27YLDkJ1ijZLWrViB9u0BQKfD7dui05AdYo2S1nl5YdEiACgqwsKFotOQHWKNEuHtt9G7NwBs2IDjx0WnIXvDu5jo/hw6dOitt966dOmSl5eXi4tLi3/9Tp363ry5t8W/bKP9++Hnd/dPurhg40Y8/TRMJkybhpMn4cz/M0g2/mWh+2AymSIiIsrKygAUFBRY47eore1suf/SSurq7v3zYWGYMAGffYYzZ7BuHeLirJiBVIY1SvdBp9OVlZU5Ojru2rUrMDDQGqdRBwdnqz6QwcenyV/64AN89RVu30ZSEiIj0aWLFWOQmrBGSS6DwbB+/XoA0dHRo0aNEh2n5XXpgiVLEBeHigokJWHTJtGByE7wIyaSa82aNQ0NDV5eXps3bxadxVpmz0b//gCwZQvS0kSnITvBGiVZ8vLy3n//fQCpqanWeC1vI5ydsWkTHB1hMiE+HnzeI8nBGiVZ9Hq9wWAYP358WFiY6CzWFRKCKVMA4PBh7NsnOg3ZA9YoNe/YsWM7duxwc3NbvXq16CxKSE5Gp04AcPq06ChkD1ij1AyTyTRnzhyz2azX6x999FHRcZTQoQNWrBAdguwHa5Sa8emnn/70009du3ZNSEgQnUU5U6diyBDRIchOsEZJSkVFxaJFiwAkJye31tLTiBwdsXEj72UiWfjXhKQsX778+vXrQ4YMiYyMlBi7ceNGjx49Wvx3d3c3VFU5tOzXPHECgYEYPhx//COAe9wY2igoCLt24fp1ANDGmxn0KzmYeU0HNeHixYt9+vSpr6//4YcfQkJCJCaLi4u9vb1bPIC7u7mqqoW/5tmz6Nu3hb8maRxPo9Sk+Pj42traN954Q7pDAXh5eRkMhhYP4ODQ8lduurnd939SVITUVMTEtHASUg2eRuneDh48OGLEiLZt22ZmZj5seWKRJtXVoWdP5OfjwAGMGCE6DdkkfsRE99DQ0KDT6QAsXLhQyx0KoFUrvP02AMTGor5edBqySaxRuocNGzacPXv2sccemzNnjugs4ul08PXFhQt3PpUiugtf1NPdSktLfX19S0pK9uzZ8/LLL4uOYxN270ZEBDw9kZV15wYnokY8jdLdFi1aVFJSMnz4cHZoo9GjMXIkSkuxdKnoKGR7eBqlXzh//nxwcLDZbE5LS+vXr5/oODbk/Hn07w+TCWlp4DeG/i+eRukXdDpdQ0PDrFmz2KF3CQzEjBkwGvl8EbobT6P0s927d0dERHh6e
mZnZ3fs2FF0HJtTWgpfX5SUYPduqHH9P/1KPI3SHXV1dXq9HsDy5cvZoffk6YnFiwEgPh61taLTkM1gjdIda9euzc7ODgwMnD59uugstmvWLPTrh0uX8OGHoqOQzeCLegKAoqIiPz+/8vLyr7/+euTIkaLj2LT/+R88+yzatkVmJrR9awLdwdMoAUBiYmJ5efno0aPZoc0aPhyjRqGiAgsWiI5CtoGnUcKpU6dCQkKcnZ3PnTvn6+srOo4duHQJgYGor8exYxg8WHQaEo2nUa0zm81xcXEmk0mn07FDZerZE3PmwGRCXByfHko8jWretm3boqOjvb29MzMz27VrJzqO3aiogL8/rl/Htm2IihKdhoTiaVTTqqur3333XQCrVq1ih96Xtm3x3nsAoNfDCqtWyZ6wRjVt1apVV69eHTBgwKRJk0RnsT+TJ2PwYOTnIzlZdBQSii/qtevatWsBAQHV1dXffvvtsGHDRMexS8eOYehQuLri/Hk+r0m7eBrVrrlz51ZVVUVFRbFDf7UhQxAZiepqJCaKjkLi8DSqUUePHh02bJirq2tGRkb37t1Fx7FjeXkICIDBgMOH8dRTotOQCDyNapHJZIqLizObzYmJiezQB9S1KxISACAuDkaj6DQkAk+jWrRp06YZM2Z069YtIyPD3d1ddBy7V12NwEBcvoxNmzBtmug0pDjWqOaUl5f7+/sXFhampKSMGzdOdByVSElBZCS8vJCVBV45pjV8Ua85S5cuLSwsHDp06NixY0VnUY/x4xEWhuLiOxeTkqbwNKotOTk5ffv2ra+v//HHHwcNGiQ6jqqcOoWQEPj5nf/iC1df356i45ByeBrVlri4uNra2jfffJMd2uIGDMD8+TtycoLj4/lUam3haVRDvvnmm/Dw8LZt22ZlZXXp0kV0HBUqLi728/MrKyv76quvXnjhBdFxSCE8jWpFQ0NDXFwcgMWLF7NDrcTLy2vhwoUA4uPj6+vrRcchhbBGtWLdunXp6em9evWKiYkRnUXNYmNj/f39MzIy1q9fLzoLKYQv6jXh1q1bfn5+JSUle/fu/d3vfic6jsrt3bv3pZde8vT0zMrK6tSpk+g4ZHU8jWrCggULSkpKRowYwQ5VwIsvvvj888+XlpYmJSWJzkJK4GlU/dLT0/v37w8gLS2tb9++ouNowoULF4KDg00m08mTJ4ODg0XHIeviaVT9dDpdQ0NDTEwMO1QxvXv3njVrltFo1Ol0orOQ1fE0qnI7d+589dVXO3TokJWV1bFjR9FxNKS0tNTPz+/mzZs7d+4cM2aM6DhkRTyNqlltbW1iYiKAFStWsEMV5unpuWzZMgDz5s2rqakRHYesiDWqZu+//35OTk6fPn2mTp0qOosWTZ8+PSgoKDc3d+3ataKzkBXxRb1qFRYW+vv7l5eX79+/Pzw8XHQcjTp06NDw4cPbtGmTmZn5yCOPiI5DVsHTqGrp9fry8vJXXnmFHSrQM888M2bMmMrKyvnz54vOQtbC06g6nTx5cvDgwS4uLufOnevVq5foOJqWm5sbGBhYW1t77Nixxx9/XHQcank8jaqQ2WyePXu2yWSaO3cuO1S43/zmNzqdrvEPRXQcank8jarQ1q1bJ02a5O3tnZWV5eHhIToOobKy0t/fv6CgYOvWrRMnThQdh1oYT6Nq0/g2XHJyMjvURrRp02blypX47xvWouNQC2ONqs2qVasKCgoGDhwYHR0tOgv97PXXX3/88ccLCwuTk5NFZ6EWxhf1qtL4acZ3330XGhoqOg79wg8//PDkk0+2atWKn/upDE+jqmK5YWbixInsUBv0xBNPREdHN95aRqrB06h6WK70dnd3z8jI6Natm+g4dA/5+fkBAQGVlZW8J0JNeBpVibq6Ossdn++++y471Gb5+Pi88847+O/aLdFxqGXwNKoSUVFR27dvb926dXFxsbu7u+g41KSamprAwMDc3NwNGza89dZbouNQC+BpVA1yc3NTUlIAz
Jw5kx1q41xdXS0f1ickJFy6dEl0HGoBPI2qwciRI/fv39+uXbvbt29LT+bk5PAzYmuT80328fEpKCh47rnn9u3bp0wqsh6eRtXAcouhl5eX9NjEiRMDAgLS0tIUCaVRaWlpAQEBzd6q1Lp1a/z3D47sHWtUDWJjYwFkZ2fv2rVLYszLy8toNL799tt8CWI9Op3OaDR6e3tLzHz55ZfZ2dkAZs6cqVQusiK+qFeJoKCgs2fPdu7cubi4uKmZ8vJyf3//wsLClJSUcePGKRlPI1JSUiIjI728vDIzM9u3b9/UmLe3d3Fxcb9+/c6cOaNkPLISnkZV4vPPP3d1db1x48aePXuamvHw8Fi6dCmAefPmVVVVKZhOE6qrqy3X1b/33nsSHbp79+7i4uKHHnroH//4h4LpyJrMpBYff/wxgJ49e9bU1DQ1YzQaBw0aBGDZsmVKZtMCyz9R/fv3b2hoaGqmtrbW19cXwCeffKJkNrIq1qh6NDQ09OvXD8Dvf/97ibHvv//ewcHBzc3typUrimVTvby8PMunRocPH5YYW716NYDevXvX1dUplo2sjTWqKt988w2Atm3bFhQUSIyNHTsWwGuvvaZYMNWLiooCMG7cOImZwsLCdu3aAfj6668VC0YKYI2qzcsvvwzgjTfekJi5evWqu7u7g4PDd999p1gwFfv3v/9tOeDn5uZKjE2ZMgXAqFGjlMpFCmGNqk1OTs5DDz3k6Oj4448/SowtWrQIwIABA4xGo2LZVMloNA4ePBhAUlKSxNjJkycdHR1btWqVmZmpWDZSBmtUhfR6PYAhQ4aYTKamZqqqqrp37w7gz3/+s5LZ1GfLli0AfHx8Kisrm5oxmUzDhg0D8M477yiZjZTBGlWh8vLyhx9+GMBnn30mMfa3v/0NgLe3d1lZmWLZVKbxW71t2zaJsW3btgHw8vK6ffu2YtlIMaxRddq8eTOArl27Sh+RLNudExMTlcymJpatd80e/Hv06AFgy5YtSmYjxbBG1cloNIaEhABYvHixxFjjG3ZZWVlKRVOPixcvynkbOikpCcBvf/tbvg2tVqxR1ZL58fHkyZMBjB49Wqlc6jFq1CgAU6ZMkZjhRRFawBpVswkTJgAYP368xExhYaHlOcy8mPG+HDx4UM4lupbdBVFRUYoFI+WxRtXs2rVrlltrvv32W4mxVatWAQgMDKyvr1csm11rvGFs9erVEmNHjx61vCC4fPmyYtlIeaxRlVu8eLHljTk5N3qvW7dOyWz265NPPpG/vmDJkiVKZiPlsUZVrvFj4j/96U8SY5ZFpZ6enjdv3lQsm526detWp06dAHz++ecSY5s2bWr2YglSB9ao+m3fvl3ORYvPPfccAMtSZ5IQExMDYPjw4RIz5eXlXbp0AbBjxw7FgpEorFFNCAsLAzBv3jyJmfT0dGdnZycnpzNnzigWzO6kp6e7uLg0+12aO3cugCeffFLielJSDdaoJpw6dUrODd2zZ89u9pylcSNHjgQQExMjMdO41uCnn35SLBgJxBrVijfffBPASy+9JDFz69atjh07AtizZ49iwezI7t27Le8g37hxQ2LsxRdfBDB16lTFgpFYrFGtKCoqsiy7/Ne//iUx9tFHHwF47LHHJD6D1qba2lo/Pz8AH3/8scTYgQMH5FxPSmrCGtWQ5OTkZlev19fXy1mhr0FyFtfX19f37dsXwJo1a5TMRmKxRjWk8Tz10UcfSYzJXKGvKTIX13/44Yc8y2sQa1RbvvjiCznv7slZoa8pchbXl5SUWN5Z/vLLLxULRraANao5zz//PIBZs2ZJzDR+1nz8+HHFgtksmdc5zJw5E8Czzz6rWDCyEaxRzTl//rzlysf//Oc/EmMJCQnNbtLUgsbF9Xq9XmLs3Llzzs7Ozs7OZ8+eVSwb2QjWqBbFxsYCeOaZZyRmZK7QVz2Zi+vDw8MBzJkzR7FgZDtYo1rUeFf4rl27JMbkrNBXN5mL63fu3AmgQ4cO3EigT
axRjfrDH/5g2VFUXV3d1IzMFfoqJmdxfeN+rPXr1yuZjWwHa1SjGhoagoKCAKxcuVJirHGFvgY3Zl67dk3O4voVK1ZwW6vGsUa1y7K/vU2bNvn5+RJjkZGRACIjIxULZiPGjx8PYMKECRIzjc8O2Ldvn2LByNawRjUtIiICwKRJkyRmZK7QVxmZi+tff/11AGPGjFEsGNkg1qimXbx40dXV1cHBQfrZlnJW6KuJzMX1J06c4HNVycwapcTERABPPPGEnCetS6/QVw05i+tNJlNoaCiA+fPnK5mNbBBrVOsqKioeeeQRAH/9618lxmSu0FcBmYvrt27dCsDb27usrEyxbGSbWKNk/vTTTwH4+PhUVFRIjFlW6CckJCgWTIh58+Y1u7jeYDB0794dwF/+8hcls5FtYo2S2WQyDR48GMDChQslxmTeWm7XZC4TWLBgAYCBAwdKXE9K2sEaJbPZbD527JiDg4Orq2tubq7EmJwV+nZNzuL6q1evWq4nPXLkiGLByJaxRumO1157DcDYsWMlZmSu0LdTMhfXv/rqqwCio6MVC0Y2jjVKd+Tl5VmuDz106JDEmJwV+vZI5uL6I0eOODg4uLu7X7lyRbFsZONYo/SzZcuWAQgODpa4PlTmCn27I2dxvdFoHDhwIIDly5crmY1sHGuUflZdXf3oo48C2Lhxo8SYzBX6dkTm4voNGzYA6Natm8FgUCwb2T7WKP3C3//+dwCdO3cuLS2VGJOzQt+OyFlcX1ZW5u3tDSA1NVWxYGQXWKN0t6eeegpAfHy8xIzMFfp2Qebiep1OB2Do0KEafxwA/X+sUbpbWlqak5OTs7PzuXPnJMbkrNC3C3IW11+4cMHFxcXR0fHEiROKBSN7wRqle5g+fTqA8PBwiRmZK/RtnMzF9S+88AKAGTNmKBaM7AhrlO6huLi4ffv2AP75z39KjMlZoW/LZC6u37t3LwAPD4/r168rlo3sCGuU7u2DDz4A0KtXr9ra2qZmZK7Qt1lyFtfX1dX5+/sDWLt2rZLZyI6wRuneZNaHzBX6Nkjm4no5/5yQxrFGqUkyX8xaVujb3bOF58yZAyAiIkJiRuabG6RxTkuWLAHRvfj5+R0/fjw9Pb2ystKys+OeQkJCPD09k5KSXFxclIz3gMLCwlxcXBISEjw9PZuaiY+PP3r0aHh4uOX+LqJ7cjCbzaIzkO3KyMgICgoyGo3Hjx+33AepHadPnx40aJCDg8Pp06f79OkjOg7ZLkfRAcimBQQExMTEmEwmy2t20XEUFRcXZzQaY2Nj2aEkjadRakZ5ebmfn19RUVFqaqplR5wWpKamjhs3rnPnzllZWZa3R4mawtMoNcPDw8PyBnp8fHxVVZXoOEqoqanR6/UAli9fzg6lZrFGqXnTp08fOHDgtWvX1q5dKzqLEtasWXP58uXg4OCpU6eKzkJ2gC/qSZbvv/8+LCzMzc3twoULlqe5qVV+fr6/v7/BYDh06NDTTz8tOg7ZAZ5GSZbQ0NBXXnmlqqpK9Qe0KVOmGAyGsWPHskNJJp5GSa4rV6707NnTZDJt3LhxxowZouNYxebNm6dNm+bo6Jidnd2zZ0/Rccg+sEbpPoSGhh49etTd3b2srMzZ2Vl0nBZmMpk8PDwMBkNoaOiRI0dExyG7wRql+1BUVOTj42M0Glu1auXk5ASga9eubm5uonM9kOrq6ry8PABGo7Gurs7JySk/P9+y6J5IDrUdKMiqvL299Xp9cnJyXV2d5Weys7PFRmpZTk5Oer2eHUr3hadRum/p6elnzpyx/LhHjx6WxzLbL4PBcOXKFcuPg4KCeM8S3S/WKBHRA+EFT0RED4Q1SkT0QFijREQPhDVKRPRA/hcio9/AEhwazwAAAIl6VFh0cmRraXRQS0wgcmRraXQgMjAyMi4wOS4xAAB4nHu/b+09BiDgZYAARiBmg+IGRjaHDCDNzEwUg91BA8zgZmDMYGJkSmBizmBiZklgYc1gYmVIEGFkY
2BlYWZiFA+CWgMGQHsO2K9epaUC4TrYP3Rbth/K3o9gH9hfWlKniiRuj6QezBYDAD+/HOIMO2nyAAAAx3pUWHRNT0wgcmRraXQgMjAyMi4wOS4xAAB4nI2R2wrCMAyG7/sU/wtY0sOmvdzWISLrQKfv4L3vj4lSu4mMJQ0k4SOnKohc4vnxxFdsVAqglRdCwN0RkRogDtr+eEropqbNmW68pemKmpVEl2QzjUPOGHQwuiIRkKZfJ3OWOdL7N4id0TYEcoc/oGNwt4n0Qm7pXS1KrlSskbCF61NcHOFzlnZMsZzFs9myvWdzZUcvVhYRrcq4HKAuQ3k2M+897yRx/jr21Qt0wGRJ21xkkwAAAE16VFh0U01JTEVTIHJka2l0IDIwMjIuMDkuMQAAeJxLNkxOzktONlSo0dA10DM31dE11DOytDQw0bEGskx1DIA0WBwujMqDqkHVqlkDAFlQEo2qAIwYAAAAAElFTkSuQmCC\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('c1ccccn1')\n", "scaffold" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.779397Z", "start_time": "2023-01-05T12:50:05.561421Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
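<table><thead><tr><th></th><th>Mol</th><th>Core</th><th>R1</th><th>R2</th></tr></thead><tbody>
<tr><th>0</th><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td></tr>
<tr><th>1</th><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td></tr>
<tr><th>2</th><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td></tr>
<tr><th>3</th><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td></tr>
<tr><th>4</th><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td><td>[molecule image]</td></tr>
</tbody></table>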
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Those labels were automatically assigned and they aren't consistent with what we had above. Notice, however, that the symmetry in the scaffold has been properly handled. \n", "\n", "If we care about the R group labels, we can explicitly label the side chains:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.838879Z", "start_time": "2023-01-05T12:50:05.780626Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
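<table><thead><tr><th></th><th>Mol</th><th>Core</th><th>R2</th><th>R3</th></tr></thead><tbody>
<tr><th>0</th><td>c1c(F)cccn1</td><td>c1cc([*:3])c([*:2])cn1</td><td>F[*:2]</td><td>[H][*:3]</td></tr>
<tr><th>1</th><td>c1c(Cl)c(C)ccn1</td><td>c1cc([*:3])c([*:2])cn1</td><td>Cl[*:2]</td><td>C[*:3]</td></tr>
<tr><th>2</th><td>c1c(O)cccn1</td><td>c1cc([*:3])c([*:2])cn1</td><td>O[*:2]</td><td>[H][*:3]</td></tr>
<tr><th>3</th><td>c1c(F)c(C)ccn1</td><td>c1cc([*:3])c([*:2])cn1</td><td>F[*:2]</td><td>C[*:3]</td></tr>
<tr><th>4</th><td>c1cc(Cl)c(F)cn1</td><td>c1cc([*:3])c([*:2])cn1</td><td>F[*:2]</td><td>Cl[*:3]</td></tr>
</tbody></table>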
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "\n", " R3 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# note: there's a bug in RDKit 2019.03.3 and 2019.03.4 that causes this to generate different\n", "# results with those versions\n", "scaffold = Chem.MolFromSmiles('c1c([*:2])c([*:3])ccn1')\n", "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've just been looking at compound images since that's a bit more readable. Here's what the raw output from the function looks like:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.888219Z", "start_time": "2023-01-05T12:50:05.839909Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Core': ['c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1', 'c1cc([*:3])c([*:2])cn1'], 'R2': ['F[*:2]', 'Cl[*:2]', 'O[*:2]', 'F[*:2]', 'F[*:2]'], 'R3': ['[H][*:3]', 'C[*:3]', '[H][*:3]', 'C[*:3]', 'Cl[*:3]']}\n" ] } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=False) \n", "print(groups)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also get that in a row-oriented format:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:05.993161Z", "start_time": "2023-01-05T12:50:05.891802Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 
'R2': 'Cl[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'O[*:2]', 'R3': '[H][*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'C[*:3]'}, {'Core': 'c1cc([*:3])c([*:2])cn1', 'R2': 'F[*:2]', 'R3': 'Cl[*:3]'}]\n" ] } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=True) \n", "print(groups)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stereochemistry\n", "\n", "Making sure that the sidechains are labelled correctly on chiral centers can be a bit trickier.\n", "\n", "Here's a set of molecules we'll be using. Some have a chiral center, some don't. There are a few that have sidechains with dual attachment points (i.e. rings). We'll look at those in the next section." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.075609Z", "start_time": "2023-01-05T12:50:06.002666Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [x for x in Chem.SDMolSupplier('../data/rgd_chiral.sdf')]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remove the examples with \"ring\" sidechains. 
We'll get to those later." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.163267Z", "start_time": "2023-01-05T12:50:06.076737Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = Chem.MolFromSmarts('[R2]')\n", "mols = [x for x in mols if not x.HasSubstructMatch(q)]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll define two scaffolds, 
the first with a chiral center, the second without. In this case we will add explicit markers for the substituents. This is currently (v2019.09) necessary to properly handle atomic chirality." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.269438Z", "start_time": "2023-01-05T12:50:06.164977Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('[*:1]C1([*:2])OCCC=C1')\n", "chiral_scaffold = Chem.MolFromSmiles('[*:1][C@]1([*:2])OCCC=C1')\n", "Draw.MolsToGridImage([scaffold,chiral_scaffold])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start with doing a decomposition with the non-chiral scaffold. This matches all the molecules, but generates results that are not consistent with the chirality. The compounds in rows 2 and 3 (numbered from zero) demonstrate the problem clearly." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.461255Z", "start_time": "2023-01-05T12:50:06.274717Z" }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
6
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
7
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
8
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
9
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,_ = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False) \n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try the chiral scaffold. This one will only match the chiral compounds, but it does the right thing with those:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.552713Z", "start_time": "2023-01-05T12:50:06.462250Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[13:50:06] No core matches\n", "[13:50:06] No core matches\n", "[13:50:06] No core matches\n", "[13:50:06] No core matches\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([chiral_scaffold],mols,asSmiles=False,asRows=False)\n", "tmols = [mols[x] for x in range(len(mols)) if x not in unmatched]\n", "PandasTools.RGroupDecompositionToFrame(groups,tmols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in each case the atom is assigned to the correct R group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also combine the two scaffolds so that we can get the chiral and achiral cases. Order is important, so we include the more specific scaffold (the chiral one) first. In this case the stereochemistry determines the R1/R2 assignment for the chiral molecules. For the non-chiral molecules R1 and R2 are assigned using the standard symmetrization code." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.654456Z", "start_time": "2023-01-05T12:50:06.553996Z" }, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
6
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
7
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
8
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
9
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([chiral_scaffold,scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sidechains that attach in more than one place\n", "\n", "This one is tricky, and there's not really a right answer, this is just a demonstration of what the current code does" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.661797Z", "start_time": "2023-01-05T12:50:06.655405Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [x for x in Chem.SDMolSupplier('../data/rgd_chiral.sdf')]\n", "q = Chem.MolFromSmarts('[R2]')\n", "mols = [x for x in mols if x.HasSubstructMatch(q)]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.782570Z", "start_time": 
"2023-01-05T12:50:06.662717Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('C1OCC=CC1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scaffold Variation\n", "\n", "What happens if there are small variations in the scaffold within the series, something that we see all the time in med chem work?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.880913Z", "start_time": "2023-01-05T12:50:06.783759Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'c1c(F)cccn1 c1c(Cl)c(C)ccn1 c1c(F)cncn1 c1c(F)c(C)ccn1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:06.969261Z", "start_time": "2023-01-05T12:50:06.886152Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "[2]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[13:50:06] No core matches\n" ] }, { "data": { "text/plain": [ "{'Core': ['c1cc([*:2])c([*:1])cn1',\n", " 'c1cc([*:2])c([*:1])cn1',\n", " 'c1cc([*:2])c([*:1])cn1'],\n", " 'R1': ['F[*:1]', 'Cl[*:1]', 'F[*:1]'],\n", " 'R2': ['[H][*:2]', 'C[*:2]', 'C[*:2]']}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold = Chem.MolFromSmiles('c1c([*:1])c([*:2])ccn1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold],mols,asSmiles=True,asRows=False)\n", "# the second return value, unmatched, provides the indices of the molecules that did not match a scaffold:\n", "print(unmatched)\n", "groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that now we only get three results, the third molecule (index 2) didn't end up in the output.\n", "Sometimes this is ok, but in cases like this it would be great if that molecule were also included in the R-group decomposition. \n", "\n", "One solution to this is to provide two different scaffolds:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.095976Z", "start_time": "2023-01-05T12:50:06.974410Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffold2 = Chem.MolFromSmiles('c1c([*:1])c([*:2])ncn1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([scaffold,scaffold2],mols,asSmiles=False,asRows=False)\n", "print(unmatched)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that `unmatched` is now empty; all molecules matched one of the two cores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another is provide the scaffold as SMARTS:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.207251Z", "start_time": "2023-01-05T12:50:07.097158Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sma_scaffold = Chem.MolFromSmarts('c:1:c(-[*:1]):c(-[*:2]):*:c:n:1')\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([sma_scaffold],mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple scaffolds\n", "\n", "What about if we have multiple scaffolds which share a common SAR? Here we just provide them both and label the attachment points manually to show the correspondance." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.311711Z", "start_time": "2023-01-05T12:50:07.208364Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mols = [Chem.MolFromSmiles(smi) for smi in 'Fc1ccc(O)cc1 Fc1ccc(OC)cc1 Oc1ccc(Cl)cc1 Clc1ccc(OC)cc1 Fc1ccc(O)s1 COc1ccc(F)s1 Clc1ccc(O)s1 Clc1ccc(OC)s1'.split()]\n", "Draw.MolsToGridImage(mols,molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.554666Z", "start_time": "2023-01-05T12:50:07.314210Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
6
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
7
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scaffolds = [Chem.MolFromSmiles('[*:1]c1ccc([*:2])cc1'),Chem.MolFromSmiles('[*:1]c1ccc([*:2])s1')]\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose(scaffolds,mols,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,mols,include_core=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the symmetrization also worked." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Looking at the options that are available\n", "\n", "We'll use a real dataset pulled from ChEMBL for this:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.565016Z", "start_time": "2023-01-05T12:50:07.555903Z" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO2de1RTV9rGnyTcbwICAgVEvFTAC15QOngXrTooVqt2OmV56Syq7TSdVVdLZ33FdLWdWXTaaWM7Vu3U1rTOVPGCorXjpailFcELogWtCioIqCAgCARIcr4/tj2NuUAgIfsk2b+/dJ+Tk4fk5Dl77/fd7xZxHAcGg8Fg9BYxbQEMBoNh2zAbZTAYDLNgNspgMBhmwWyUwWAwzMJRbLSmpoa2BAbDsXCcH5392+jVq1cDAwNDQ0NdXFwSEhJ+/PFHlpzAYPQdJSUl06dPd3NzCw0NDQwMvHr1Km1FfY7Ivj3l4sWL8fHx7e3t2o1RUVGLFy9evHjxhAkTRCIRLW0Mht3AcVxhYeHu3bt3795dXl6ufcjV1fX06dMjR46kpc0K2LONlpSUzJ07t7Ky0s3N7dixY4WFhXl5efn5+VVVVeSEsLCwefPmJScnz50718nJia5aBsPm0Gg0J0+ePHDgwK5du8rKykhjQEDAuHHjJk+enJSUNH369La2tpCQkIMHD8bFxdFV24dwdkp+fn7//v0BTJ069datW3y7Wq0+c+aMTCYbOnQo/yH0798/NTU1Jyeno6ODomYGwyZQqVR5eXlSqTQ0NJT/EYWHh0ul0iNHjnR2dvJn1tTUzJo1C4Cvr++JEycoau5T7NNG9+7d6+7uDuCpp55qa2szdtrPP/8sk8mGDx/O3wp+fn7ET5VKpTUFMxjCh3fPAQMG8D+ZyMhIqVSal5en0WgMvqq9vf2ZZ54B4OrqumPHDitrtg52aKNbtmwhI/SXXnpJrVab8hLip+PGjeNvDg8Pj+TkZIVC0dzc3NeCGQwh09bWlpOTk5qa2q9fP/4HEhUVRdzTlCtoNJq1a9cCkEgkn376aV8Ltj72ZqOZmZkARCKRTCbrxcvLy8vlcnliYiIfenJ3dyd+ev/+fUuLZTCES2trK3FPb29v3j1jYmJkMllJSUkvLiiXy8nPKj093VjX1UaxHxtVqVQvvPACeeJ99tlnZl7txo0bOn7q5uaWnJy8efPmu3fvWkQwgyFAGhoasrKyUlNTPT09ddzzl19+MfPiCoXC2dkZwPLly7WnUG0dO7FRpVL59NNPk8H4gQMHDJ5z/fr1Xly5srJy8+bNycnJfChfIpEkJibK5fKamhqzRDMYguHevXsKhSI5OdnFxYXc52KxODExMTMz89q1a6ZfR6PRkMwnYyccPnyYdG/nz5/f0tJiCe30sQcbra+vnzRpEgB/f3+SXa9PZmamq6vr0aNHe/0udXV1Bu8zuVyunQnAYNgQtbW15K4mnUTtXkJVVZXp1+ETYIYMGQIgMDCwi85mYWFhYGAggIkTJ9bW1lri76CMzdtoVVXVqFGjAAwcOPDSpUv6J6hUqrS0NABOTk4KhcL8d6yvryd3nqurK++n48aNk8lkV65cMf/6DEZfU1FRIZfLk5KS9MdYt2/fNv06KpXq+++/f/HFF0NCQrQzn1555ZWuYwnXrl0jhhsdHX3z5k2z/yDK2LaNlpSUhIeHA4iNja2srNQ/QalULlmyhCRb7Nq1y7Lv3tLSQubgvby8dGaRDBo6g0GX69evG5zxVygUjY2Npl+Hz3wKDg7m7/yBAwd2nfmkQ01NDUnIDwkJOX/+fG//JkFgwzbKJ9hPmzbN4E1QX18/efJkkg1qYmZG7+Bjmj4+Pjp+eubMmb57XwbDFMrKyoh76ufzNTU1mX4dkvmUlpZGhuQ6mU+9CL43NzfPnj0btp+cb6s2yifYL1y4sLW1Vf+Eqqqq0aNHAwgNDb1w4YJ1VPH3WVBQEH+fDRo0qNf3GYPRa0g2dExMDH8r+vr6LlmyRKFQPHjwwPTr9Gkvob29fdmyZbDx5HybtNEvvvii6wT70tLSiIgI8mVXVFRYX2EXo56dO3eauCiAwegpGo3m0KFDMpl
s2LBh/I3n7+9P1ua1t7ebfimrzVnZQXK+7dlotwn2p06dCggIAJCQkFBXV2dddbqoVKrc3NyXXnpJe/WxRCKZNWuWPeXNMajT2dk5a9YsiUTC32ahoaEvvfRSbm6uSqUy/TpdRFCvXr3ad/ptOjnflmzUlAT7ffv2eXh4AEhJSTE42KcFyQiRSqV8bHTFihW0RTHshxUrVpD7ysnJafny5UeOHOmRe/L5fGZmPpkDn5y/YsUK2+pk2IyNaifY79+/3+A5X375JTGpVatWCflrIDVvIiIiaAth2A8kZWX27Nk9elVFRYWgVpfYaHK+bdiodsy9iwR7ch+kp6dbWV5P2bt3L3ke0BbCsB9IxHXfvn2mnCzktc4FBQU2l5xvAzZqSoL9mjVryIN006ZN1lfYUzo7O8kjt0epzgyGMe7cuQPAy8ur60EYn/kk8Mo7165dGzx4MGwnOV/oNtqjBPudO3daX2HvmDFjBgBjsxMMRo/IyckBMHPmTINHbbEOZHV1tQ0l5wvaRvPz80nMferUqQ0NDfonNDQ0TJkyBX2fYG9x3njjDQAZGRm0hTDsgTfffBPAX//6V532uro6suaS0L9//1WrVn377bc9ynyiRVNTk61UzheujXabYF9dXc0n2BcXF1tfoSmoVKrVq1f/7ne/0wmb7tmzB8CTTz5JSxjDniBrgbKzs/UPRUdHBwQE2OgeOUqlkk/Oz8rKoi3HKAK1UeEn2JtOVFQUgIsXL2o3VldXk8eszaXIMYSGRqPx8/MDYDAzqaKiwqaXe2g0mldffZVEPjZu3EhbjmGEaKPdxtwFlWDfLWQjms8//1yn/bHHHgNgfilchoNz+fJlAGFhYbSF9CGZmZl8cj5tLQYQQ0io1eo1a9a88cYbEolk8+bNvJ9qs3///hkzZtTV1aWkpOTm5pLqJEJmwoQJAAoLC3XaJ06caLCdwegR5BYit5O9kp6e/uWXXzo7O7/33nsrV65UqVS0FT2CgGyU7CC4adMmNze3HTt2kCKhOmzdunXRokWtra0rV67ctWsXmTwVOMZs1Fg7g9EjyC1Ebic7Zvny5Xv27PHw8Ni6devixYvb2tpoK9KCdnf4IaYUtRN4x94Yra2tzs7OEolEp6xObm4ugAkTJtASxrAP4uPjARw7doy2EGsgzOR8Qdhot0XttBPsBTvN3AVjx44FoPN4aGpqkkgkLi4uSqWSljCGraNUKl1dXcVisaDy5/uU0tLSgQMHAoiJiRFIcj79QX1paekTTzxRXFwcGxt76tSpkSNH6pzQ3t7+7LPPbty40dXVdfv27atXr6ai0xzIgKugoEC70dvbe/jw4R0dHcXFxZR0MWye8+fPt7e3x8TEaBcDtW+io6Pz8/Pj4uJKS0sTEhKE8POhbKMFBQVTp06tqKhISEg4ceIEWbCkTWNj4+zZs7Oysvz8/I4cOUKqk9gcLMrE6CMcIb6kT0hIyLFjx6ZMmVJTUzN9+vS8vDy6emjaaE5OzvTp0+vq6hYuXGgw5k4+ox9++CE0NPT48eNk8tQWMWaXLMrEMBMHiS/p4+vre/jw4aVLlzY0NMyaNWvnzp001dCaTVi5ciUpMWusqJ12gr1AZkB6jVqtJmMunVok586dAzBs2DBawhi2ztChQwEUFRXRFkIHlUr14osvAhCLxc899xwtGXRsdNq0acTE33zzTYMn8PG4hIQE4cTjzMFgLZLOzk5PT0+RSFRfX09LGMN2aWhoEIlEHh4eQq6uawVISQEA06ZNoyKAwqC+qKjo+PHjAFJTU9955x39E/bv3z99+vTa2toFCxbk5uaSBUu2jsHxu5OTU1xcHMdxp0+fpqSLYcMUFBRwHDd27Fi+6LJj8s4776xatQrA8ePHi4qKrC+Ago3W1NQAkEgkX331lf5RhUKxePHi1tbWFStW7N692yYS7E3BYLAeLMrEMAPHjC8ZZMuWLWSS8Pbt29Z/dwo2+uSTT4pEIrVafeHCBZ1D27ZtW7l
yZWdnp0wm43cEsQ8SEhIAFBYWchyn3c6iTIxe47DxJX0uXryoVqtFIhGprWdlKNioRCIZP348gOvXr+scWrBgwbhx4zZu3PjWW29ZX1ifEhISEhYW1tjYePXqVe12Y71UBqNbyFyQQ9lofX39vHnz3nvvPZ32srIyAPHx8VT6XnQSnkh5RP0umI+PT35+vi0m2JuCQcccNGhQUFDQ3bt3b9y4QUcWwza5fv36nTt3AgMDIyMjaWuxHgUFBd99993Bgwd12omZEGOxPnRslKwCNjiStaeBvA7ERvWjSV18GgyGMRxzYtTYX03ayU/J+tCxUfIpnD59WqPRUBFABWPjdzY9yugFjjkxatAuOY47e/asfrvVoGOjwcHBERER9+/fv3LlChUBVBg/frxEIiGLoLXbWbCe0Qscszd65swZ6P3Vly9fbmxsHDhwYEhICBVV1BaDOmBoxdvbOzo6Wr8WSXx8vEgkOnv2rNCK0TIEi0qlKioqEolE2vt92j3l5eV3794NCgoi6xt5iI1Q7JhTtlFH64IZfHj4+/sPGTKktbX1559/pqSLYWNcvHixpaVl6NChwt/9wYKQHw7JHdSG+vwGs1Grwko9dcvVq9i5E/v2GTh0+TJ27sSjCWMOimOO6I3ZpePa6Pjx452cnIqLi4W1GUAfY8xGWbCe5+BBLF2KhQuRlaV7aO9eLF0KvVwXR4RuYJoWBu1SqVRevHhRIpGQ4uhUoGajnp6eMTExnZ2dQqi6ajVGjhzp6el59erVe/fuabez3qg+r7yCxkbaIoSKA/ZGOzs7yXQwWbzDU1RU1NHRMWLECC8vL1raaNYbdcAok5OT05gxY/j8DJ64uDhXV9eSkpKmpiZa2gRFXBxu30ZGBm0dgqS5ufnSpUsuLi5k6x0H4eLFi21tbY8//rifn592O/X4EoRgo47WBTP48HB1dR01apRGoyEVSBnLlyMmBhs34swZ2lKEx5kzZ9RqNXn00tZiPYzZJfWJUdC1UcccybIokylIJPj4Y6jVWL0aajVtNQLDAUf06C6+RPfToGmjsbGxXl5eZWVlOhOF9g35vvWnMliUSYeZM5GSgrNnsWEDbSkCw5HjSzp2WV9fX15e7unpGR0dTUkXQNdGSXDN0YoWR0ZGDhgwoLa2VqcWCeuN6vPxx/D0xJtvoqqKthQhIYT+l5Vpbm6+fPmyq6urzs7BpHA1SfuhpQ3UdwZ1wCgTjHQ8hw0b5ufnV1lZWV1dTUmX4IiIwF//iuZmrF1LW4pgqKmpuXXrlq+vL9mFyUEoLCzUaDRjxozRmQ4WQnwJArFRR+uCGfyr+UwOR/s0uua11zB8OHbswIkTtKUIg1OnTgGYMGGCSCSircV6CDbxnkDZRvmJQp2a8PYN21DEdFxc8PHHAPDqq3Cke8QoDjiihxG75OcDqX8alG00IiIiJCTk3r17+pXw7ZiJEyeKRKJz587p1CJhUSaDzJqFpUtx7hy2b6ctRQCw+BJPeXl5XV1dcHBweHg4JV0PoWyjcEjvIBNb+rVI+P2aHKoMK+HkSXz4odGj69ejXz9o79314Yc4edIKuoSFRqMhleKoD2Otya1bt6qrq/39/QcPHqzdLpyOOX0bdeTpUZ1xfVBQ0MCBA5ubm3/55RdKuiigVuPddzF1Kl57Dfn5hs8JDsa6db/9Nz8fr72GqVPx7ruOlVV6+fLlpqYmkuxBW4v14ONIOtPBApkYhXBs1NGC9cY2FHG0T+POHcybh4wMqNX485/RRfFMqRT80sf4eGRkgOOQkYGkJNy6ZR2x9BFIYNrKGJsAFdCnwdGmsbFRLBa7ubl1dHTQ1mI9SLx1xIgROu3vv/8+gDVr1lBRZWX27uX69+cALiiI++67Hr88N5d77DEO4Pr143bs6AN9woPs9vjBBx/QFmJVpk2bBuDAgQPajR0dHe7u7iKRqKGhgZYwHvo2ynHc8OHDAZw9e5a2EOuhVCpdXV3FYvH9+/e120+cOAFg3Lh
xtIRZh7Y2TirlRCIO4GbN4mpqenmd2lpu/nwO4AAuNZVrabGoSuFBasH98MMPtIVYD7Va7ePjA+DOnTva7WSOODo6mpYwbegP6uF4I1kArq6uo0eP1q9FQtZjXLhwwY7LsF6+jIQEfPwxXFyQmYn//Q/Bwb28VEAA9u3D5s1wd8fXXyM+/pEwlJ0hhMKa1qe0tLSpqSkqKiooKEi7XUAjeiHMjcL4RKF9Y/Dh4eHhERsb29nZef78eUq6+pavvsL48SguxuOPIz8f6ekQ692DGg02bUJLi0kXFImQlobTpzFiBEpLMXEi1q+3uGpBcO7cuc7OTlKyVudQbW1tSUkJFVV9jTG7JHbBbPQ3HLA3CiAtLe3o0aNr1qzRabfXT+P+ffzhD1i+HC0tSE3FmTMYM8bAaTU1mDMHa9bgL3/pwcVjY1FYCKkUSiX+8hcsWoT6eksJFwrGDKW1tTUlJWXSpEk//PADDV19i7FwvKB6o4KYGyWzxfoThY7Jv//9bwB/+MMfaAuxJKdOcVFRHMD5+HD/+Y/R044c4UJCOIALDOQejSiYyu7dnJ8fB3Dh4dyJE73WK0SeeeYZAJ9//rlOe3t7+7JlywC4urrusLtYW1xcHIAff/xRu7GpqYnEpdvb22kJ00YQNspxHMk8//7772kLoU92drZIJPL29k5NTc3JybH1BAa1mpPLOWdnDuDi47lr1wyf1tHByWScWMwB3MyZXFVV79/x5k1u0iQO4CQSTibjVKreX0oIqFSqvLw8qVTq4+MjEomys7P1z9FoNGvXrgUgkUg+/fRT64vsI1paWpycnJydnVsejR4ePXoUwBNPPEFLmA5CsVGpVArg73//O20hlMnLy9PZI6F///6rVq06cOCAUqmkra7HVFRwU6ZwACcScVIpZ+yJcPkyN2YMB3BOTpxMxqnV5r5vZycnk3ESCQdwCQlcebm5F7Q+SqXywIEDq1at0tlC2c/PLy8vz+BL5HI5SVBPT0/XaDRWFtwX5OXlASDlNLX529/+BuCVV16hokofodjotm3bADz11FO0hdBk37597u7uABYuXHj27FmZTDZOKx/dw8MjOTlZoVA0NzfTVmoS2dkP00IHDOD+9z+jpykUnJcXB3CRkdzJk5YUcOzYb4ml27db8sp9R1tbW05OTmpqar9+/fivPioqSiqV5ubmLl26lAzes7KyDL5coVA4OzsDWLFiRWdnp5XFW5wPPvgAwOrVq3XaFy5cCOA/XUwPWReh2OiVK1cAhISE0BZCjS+//JKUnl21apX2D6C8vFwulycmJvIr4dzd3YmfCnYqubW1dd26SySdc/58rrbW8Gn373N//OPDrM8lS7i+SKPWTixdt+6H1tZWy7+HJWhtbSXu6e3tzbtnTEyMTCYrKSnhT1OpVC+++CIZvG/cuNHgpQ4fPkwuMn/+/BYbz6Qlj40vvvhCpz00NBTA1atXqajSRyg2qtFoyOClsrKSthYKZGZmkl9Oenq6sXNu3Lih46dubm7JycmbN2/WyUymS2lp6ejRo93cfGNjW+RyztjgsrCQGzKEAzhvb+6rr/pWkkLBxcaWeHh4REdHnz9/vm/frCc0NDRkZWWlpqZq5zAR9/zll1+MvSozM5MfvBs8oaCgIDAwEMDEiRNrjT3EbIHIyEgA2g8SjuMqKioA+Pv7C2fiQig2ynHcnDlzAOzevZu2EKuiUqnICj+JRLJp0yZTXlJZWbl58+bk5GR+4wSJRJKYmCiXy2t6vR7IQmzcuJHMS0RHRxcX/2zwHI2Gk8s5FxcO4MaP565csYaw4uKLZLsed3d3Y/04q3Hv3j2FQpGcnOzi4kK+QbFYnJiYmJmZec1YDO5Rtm7dSgbvK1euNDh4v3btGqmHFB0dffPmTUv/Bdbgzp07ALy9vdWPTpbv2rULwNy5c2kJ00dANrpu3bquu2P2h1KpXLJkCZnt2rVrV09fXldXZ/DXKJfLb9261ReCu6CxsZGk3QBITU1
98OCBwdNu3+bmzPkt6GTNfJXW1lYSySSzz3V1ddZ7b47jOK62tpZ8X8QBtZ9/1dXVPb1aTk6Oh4cHgAULFhicrKiuribZQiEhIYLqg5tITk4OgBkzZui0v/766wBkMhkNUYYRkI0eOHAAwPTp02kLsRINDQ2TJ0/uOvZqIvX19eT3ye9UIxaLx40bJ5PJrlils5efnz9o0CAAPj4+//3vf42ddvgwFxz8sBbJt99aQZcB9uzZ4+/vDyAsLOyEVTJLb968KZfLk5KS9EcPt2/fNufKBQUFAQEBABISEgwO3puammbNmgXA19fXOn+sBcnIyADwxhtv6LSTSiXf0rqBDCEgG7179y7pw6tsPdPPBKqqqkaPHg0gNDS0uLjYUpdtaWkhkQovLy+dubZLly5Z6l20UalUMplMIpEAmDBhQllZmcHTlEplenr6tGnnAC4piet538uS3Lx5kzzAxGKxVCrto8zc69evG5zLVigUjY2NlnqX0tLSiIgI8i0bHLwrlUo+Od9YfF+YzJ49G8CePXu0G41VKqGLgGyU4zjSo7l48SJtIX2L9q1fUVHRF2/BR37JPaftp2fOnLHUu1RUVEyZMgWASCTqwo8uXbpERpfe3gGffPJACIEBbfefOHGiMffvBWVlZcQ9+Y+dz1Rramqy1Ltoww/ejT2SNRrNq6++2nV8X2jwMWed6akLFy4AiIqKoiXMIMKyUbLcbcuWLbSF9CGnTp3iB2JWmJ5TKpU5OTlpaWnaBXIGDRoklUrz8vLMiXXyo+MBAwYcOnTI2GkKhYJ0jSMjI/Pz83v9dn3B8ePHw8LCAPTr1++bb74x51I///yzTCaLiYnhP2RfX98lS5YoFApj08QWpKGhgTzP/Pz8jJXR6za+LyjIBhBhYWE67Z9//jmEt1RaWDb64YcfAnjhhRdoC+kr9u3bR8ICKSkpVs5h5NcUBmuVpRs4cGAv/LStrY2P1aSkpBh7GNy/f//ZZ5/lg07CXDVgYmTMGMQ9hw0bxn+k/v7+ZBWvlZd7K5VKPjl/586dBs/h4/vCT87/6quvACxatEinPS0tDcBHH31ERZUxhGWjP/74I4AxY8boH2pubn7nnXd0MshsC2MJ9lZGrVYTP33sscf4H39YWFhaWlpOTk63wkpKSkaNGkVm+uRyuTH/LSgoIAk33t7eX3/9dR/8HZZEoVCQx9vw4cOLioq6Ppl8gOnp6do7rAUEBFCvgaBSqUjBsC4G7/v37+86vi8QXn75ZQCZmZk67WT64qeffqKiyhjCstHW1lZnZ2cnJyf9TsE333xD7leyMM7MAan1MSXB3vqQztTQoUN5O+jfv38XnSnebrrIY9doNHK5nPR6xo8fL5ylJl3T7eOB786HhITwH1dERAS5G9XmFwKwEPaRnH/jxo1t27ZdvnxZu5GvVCK0B4CwbJTjuDFjxgDQTwA6e/bs888/T2YVCYMHD3799dcLCgoE7qfafQQTE+ytD/FTspsLwc/PLzU1NSsriywobGhoIGPGrge/t2/ffvLJJ/Fr0EkgdcxMRHuyYsGCBWSyQqlUHjlyRCqVam/GGRkZKeRnOT/uMZacX1paOnDgQBiP7wsTUlBVv1IJdQRno2RJzz//+U+DR/keAVlUSwgPDzdxQGp9TJmxEhTFxcUymWzEiBH8x+vj4zNnzhwyo+rn59fFMoFDhw4RrwkKCjp48KA1ZVuQXbt2kSJbwcHBc+bM0U51GDFihEwms2CCWt/BJ+cbm4W3xeR8UqlEgBs+Cs5GN23aBGDSpEldn6ZWq8+cOSOTyYYMGSKo+SlttOOnZibYWx/tkihisdjDw2Ps2LHlRkrOKZVKqVRKxpJJSUm9WJMjKCoqKhISEkgpcfyaKFZaWkpbV8/oNifElPi+oCBpZJs3b6YtRBfB2ejevXsBODk5mf4S/QEpHy2lWKOzurq6LxLsrU95eTmp17dhwwaDJ/Bpoc7OzjKZTDizhObwr3/9C8C4ceOMPTl
sgm4zlG1rtERmKnJycmgL0UVwNtrZ2Uk6NTNnzjx+/HiPXkv8VAg1Oq2QYG9NSPZJUlKS/qGDBw+SweOwYcPsaYvsmTNnAhB+jkG3dPs4N6X4HnWOHz9OvhGRSCTAuTvB2SjHcb///e95H9QvuWgK/EoSKjU6rZxgbwUaGhpcXFwkEsndu3d1Dt25cyc4OFiwaaFd09DQ8PTTT2/XK+lcV1dHIsL19fVUhFkWUyaXhJmcr78kbN68ebRFGUCINtrW1paenh4ZGUlKrhFiY2MzMjJ6OhfeRY1OfUewCN1O7dsoc+fOhZEFZmbW16CIsV42WSojqFJsZqJdS6yL5Pyu4/tW4/z58xkZGbGxsfzP393dPTIyMj09va2tjaIwYwjRRnn4uLz5uSYVFRVWqNHZbaKJ7UKcRZh9gV6TkpICQH8kS54Z+ntw2jSmJN7R7QQYy7qjG+QwBUHbKI+xPCepVHrkyJEeGRZfo1O/5mOVOdtRCnVYZCnsbJzLcVxzczOJxevkFTQ2NhqbwbADul0G0m3xPctiMOWGXwMikJSbbrENG+Xh1+GZn+fURY3Oni68MWURnh0wY8YMANu2baMtxDJs374dwJQpU3Tav/76axLhpKLKCnS7KNkKAVKbSwDvGhuzUW3IEODxxx/nv4beVYUws0anbaWMmMOGDRtgR7u3krnC9evX67STXSftabd3fbotkdNH6XoWrI8jKGzYRnksVaPswYMHWVlZy5Yt0/bT559/vouX2FwCsznU1NSIxWJ3d3dbDMrr0Nra6uXlJRKJdJZCtrS0eHh4iMViM2d4hE+3+ST19fWkuHVERISZi3rJrtFpaWlkLT/BRotjGMQebJSnizynHlXM1f7WP/nkE2OndVsu1/6YNGkSANsqom6QPXv2AJg4caJO+44dO2DCIjr7oNvBe1tb27Jly7ooJts11qkdLgTsykZ5LLV/Q3t7u7F4ZbebN9glH330EYBly5bRFmIuf/zjHwH84x5YaioAAAgcSURBVB//0Gkn8zNCK2fZd/TFZjbW38mGOvZpozwVFRV9sZuYlaOZwqGyslIkEnl5edl0SmxHRwcpPqKzm3FbW5u3t7f+SN++sdTWivyu0ebHbG0OO7dRni72tu3pLFi3G9vaN/Hx8QD27t1LW0jv+fbbb2GoOjip5zBhwgQqqihizkbfFvxl2S6OYqM8/DNTZ2/3zMxMU56Z/DYM9pdgbyIk8TA1NZW2kN6zatUqAO+++65Oe2pqKoD33nuPiiq6qFQqUqPSxKq41lnPYis4nI3yNDQ0ZGVlGZzB0am5zWPfCfYmUlZWRnIhbKskM09nZyeJF+sUvuNH+nY/Au2CbpPzrb+62iZwXBvl6SKeyJdEaW9vJ8nn9p1gbyIkKPHdd9/RFtIbjhw5Qr5fnfaDBw8CGD16NBVVwoFPzp88eTK/BJNurR/hw2z0N1pbW7Ozs5977jlfX1/eTz09PSdNmkQW9UskEjvI9TGft99+G8Cf/vQn2kJ6Axm6rlu3Tqf9+eefB/D2229TUSUo9u3bR5w0KCgoMTHR09OT/zn4+vo+99xz2dnZDhgV6AJmowbg11qQfdgJIpHIWN1iR6OkpARAQECAzc0Oq9VqsiGdTnKPSqUyONJ3WDZs2MB3PAH4+PjYRIkQWjAb7Yq2trb3338/NjY2KSnJpmPTFic6OhpAXm4ubSE9g+yJFhUVpdOem3scwPDhw6moEiZ79+5NSkqKjY19//33hVmeTjiIOI4Dg9FDjv3zn0GbNsXOno0NG2hr6QHr1jXk5Z2bMeNmRsYq7faXX8axYw9WrixZu3YiLW0M24XZKKNXnD+PMWMQHIyqKojFtNWYBMchMhIVFSgsRHz8b+0aDcLDUV2NoiLExdHTx7BZbOMHwBAccXEYPBi3byM/n7YUUyksREUFwsMxfvwj7SdPoroagwYxD2X0EmajjN6yaBEA7N5NW4epEKVPPw2t2Mlv7UuWUJDEsA+YjTJ
6y+LFALBrF2xkXmjPHuBX1Twch+xsA+0MhumwuVFGbzE21yhIioowdqyBudzTpzFhAsLCUFGh20tlMEyE9UYZvUUkwsKFgG2M64nGRYt042GkffFi5qGM3sNslGEGZCRsCzZqcEQPYO9ew+0MhumwQT3DDDQahIWhpgbFxRg1irYao5SWIjYWAQGoqcGvBYkAoLgYcXEYMABVVZBI6Olj2DisN8owA7EYKSmA0DukRN3ChY94KLRG+sxDGebAbJRhHrYwrucnQE1sZzB6BBvUM8xDpUJICOrqcOkShg+nrcYA5eUYPBi+vrhzB7+W6gaAK1fw+OPo3x+3b+v2UhmMHsF6owzzcHLC/PmAcDukO3cCwPz5j3go356SwjyUYS7MRhlmI+xxPRvRM/oaNqhnmE1HB4KD0dCAa9cweDBtNY9w6xYiIuDpibt34e7+W/v164iKQr9+uHMHv25kyWD0EtYbZZiNiwvmzQPwcFmlkCBLVX//+0c8lLQDmD+feSjDAjAbZVgCoY7r2YieYQXYoJ5hCdraEBiI1lbcvInwcNpqfmPrVmRn4z//gdb2r6iqQng43N1RWwsPD3riGPYC640yLIG7O+bOBcc9XFwpGFaswL59j3gotEb6zEMZFoHZKMNCLF6MsDCbqITf1AQvr4flUhkM82GDeoaFUKshFttKoaS2NojFLL7EsAws85hhISQSaDQoKEBBAerq4OmJYcMwYwZ8fWkpamzE5s0AsGwZIiMfOdTSgi1bEBmJZctoKGPYF6w3yrAQeXlYvRqlpY80enritdeQkUFlsE+WgQKYPRuHDj1yqKQEI0YgKQlHjlhfF8PesIGZLIYNcOwYZs3CL79g9Wr89BMqK1Faik8+gY8P3noLzz9PV93hw9ixg64Ehj3DeqMMs1EqMWQIqqqwdSuWL3/kUGUlEhJQXY3s7Iel8q0I6Y2OGYPycnh44NIl9Ov38BDrjTIsCOuNMsxm505UVWHmTF0PBRAejsxMAPjoI+vrIgQGIj0dNTV4801aEhh2DrNRhtmQHt3SpYaPPv00XFxw8iRaWqwpSpu1azF0KD79FKdO0ZLAsGdYpJ5hNpcuAcDIkYaPurtj6FCUlODqVYwcicOHe3r5apfI4o7onr4qMfG3f7u44B//wFNP4aWXUFjIat0zLAyzUYbZNDYC6Cqxyc8PABoa0NLysIhJT6id8vK8Hz7u6avOnn1E0cKFSE7GgQPYsAFSaU8vxmB0BbNRhtmQLPbOTqMntLcDgJsbnJwwZ05PLy8aFDOn56s2+WgSj1yOo0eRkWF0+oHB6B3MRhlmM2AASkpw65bRzUFv3QKA4GB4eOC773p6+VFAj18DACgvf+S/gwfjjTfw1lv4v//Dq6/26ooMhiFYiIlhNuPHA8BPPxk+WlaGmhoEBuouJKJBejqGDMHWrSgqoi2FYUcwG2WYDVlQuXUrmpsNHF2/HgCeeUYIy+3d3PCvf0GjQUYGbSkMO4LZKMNsxo7F4sWorsazz+o66Rdf4NNP4euL11+nJE6XJ5/E4sW4cYO2DoYdweZGGZbgs89QWYkDBzBkCBYuRFQUmptx9CgKCuDpie3bERZGW+JvyOU4dAgPHtDWwbAXmI0yLIG/P44fh1yOLVvw2WcPG7298eyzkMkwbBgVUc7OiIpCcLBue1gY/vY3rF+PkBAashh2B1tTz7A0dXWorYWXF0JC2B7wDEeA2SiDwWCYBQsxMRgMhlkwG2UwGAyzYDbKYDAYZsFslMFgMMyC2SiDwWCYxf8DzExke9YFRuYAAAEmelRYdHJka2l0UEtMIHJka2l0IDIwMjIuMDkuMQAAeJx7v2/tPQYg4GWAAEYgFgRiYSBuYGRzyADSzMxEMjRADBYYzQ6hmdHFSTIUZggHhGbiZmDMYGJkSmBizmBiZklgYc1gYmVTYGPPYGLnSODgVODkymDi4k7g5slg4uFN4OXLYOLgz2DiF0hgZUjg40wQYEsQYWZjYGV
hZmJkZWcT4Odg4+Lm4eXjFL8G9TwYCEr4Fxww4+/aD+Ionlt64FKJLJj9+cT6A4zqUfYgtr5sz4E5PW/BbJO3NgfOmW8Csyd+1ziwLnHmXhD7XMjs/Revcu6HGOuxbNn+9XZQcXuYOFC9A0w90F4HJHsdkOx1QLLXAcleB5i9DAwO9ratv6F2OeyHscUAi9hY2+ZyR4YAAAGIelRYdE1PTCByZGtpdCAyMDIyLjA5LjEAAHicjVTbasMwDH3PV+gHanxsyZfHNiljjKawdfuHve//mZSutQthjRMJWxwrks4hA9l6n96+f+i+wjQMRP6ft9ZKX9F7P5zINnQ4vrzONF72h1tkPH/Olw9CJlS9o88jdn85n24R0Ei76LJwCYV2cD4XKXrN+WW1u8GQ4lDACLTzLkF84hVkvCK5xlA07IoESFkBsgHZRf1kigQFSpG1jGLA4Kqg5qLAWIPmXgGmKzCxB7wV6TOXGleQ2ZBwAVGCGFJiYaw1XmheMv2FXeQYZA1YNeWmjFraSJuqhPGziR4YP5voQVyQz+mB8bOFHsjSz3N6kHSWXttZwtpOTbpdTZnpbF1sgB7n6UHSV5EfzvPURM5qoSmZ1WKTK5s1UdojTXp6oNQEBrXcVMRqpUmF1WoTBMx63tkc0PHL5hA6HtkcYscXmwN3xLA5SMcA26dSN2k2h9xNlK38+9mK1XtdedUiuXVnk+3naOfbj0b3wy88muI3thsHEwAAALF6VFh0U01JTEVTIHJka2l0IDIwMjIuMDkuMQAAeJxtTzkOhDAM/MqWIDmWz+Aoz3EPH+Dxm7DVAm4sj+bwJGfmUlL2Yy7NObrusmby51wcrakEEIYLe0B35GBjKISVnapBV9zcQqEw0hbeoAtWI75ItFk0VeiMwuo6MdcwHjQawqlq1aGX/+uFXt5sB9icWwCjNrkQQx1v1IGEe0yzZ43y0qM8i9CQqKn8St2C7jnr+QV8kEWxbgvvQAAAAABJRU5ErkJggg==\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "core = Chem.MolFromSmiles('c1ccccc1-c2nc(c1ccccc1)no2')\n", "core" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.740187Z", "start_time": 
"2023-01-05T12:50:07.566164Z" } }, "outputs": [], "source": [ "smiles = ['CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccccc5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'CCOc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CC(C)Cc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccc(F)cc5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCC5)cc4)n3)cc2)C1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4noc(-c5ccccc5)n4)cc3)cc2)C(=O)NC(=O)NC1=O', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc([C@@H]5CCC(F)(F)C5)cc4)n3)cc2)C1', 'CC(C)(C)c1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C(F)(F)F)c(C#N)c2)n1', 'Cc1cc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)ccc1OC(C)C', 'CC(C)(C)Cc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(F)c2)n1', 'COc1cc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)ccc1OC(C)C', 'CC(C)(C)c1ccc(-c2noc(-c3ccc(CN4CC(C(=O)O)C4)cc3)n2)cc1', 'Cc1cc(CCCCCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OCC(F)(F)F)c(C#N)c2)n1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5cccc(F)c5)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'Cc1ccccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CCCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(C#N)c2)n1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(CCC(F)(F)F)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(-c5ccccc5)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCCCC5)cc4)n3)cc2)C1', 'CC[C@H](C)Oc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C#N', 'Cc1cc(CCCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(C(F)(F)F)c2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(Cl)c2)n1', 'CCCCCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C)C)c(Br)c2)n1', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(OC(C(F)(F)F)C(F)(F)F)c(C#N)c2)n1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CCCC5)cc4)n3)cc2)C1', 'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc([C@H]5CCC(F)(F)C5)cc4)n3)cc2)C1', 
'O=C(O)C1CN(Cc2ccc(-c3noc(-c4ccc(C5CC5)cc4)n3)cc2)C1', 'CCCc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'CCOCCC1(Oc2ccc(Oc3ccc(-c4nc(-c5ccccc5F)no4)cc3)cc2)C(=O)NC(=O)NC1=O', 'CCC(C)(C)c1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(CC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CC(C)Oc1ccc(-c2nc(-c3ccc(CN4CC(C(=O)O)C4)cc3)no2)cc1', 'Cc1cc(C(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'CCOc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C(F)(F)F', 'Cc1cc(CCC(=O)O)ccc1-c1noc(-c2ccc(C3CCCCC3)cc2)n1', 'COc1ccc(-c2nc(-c3ccc(CCC(=O)O)cc3C)no2)cc1C(F)(F)F']\n", "mols = [Chem.MolFromSmiles(x) for x in smiles]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:07.897680Z", "start_time": "2023-01-05T12:50:07.742403Z" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Draw.MolsToGridImage(mols[:16],molsPerRow=4)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.249206Z", "start_time": "2023-01-05T12:50:07.898786Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
[Rendered DataFrame: columns Mol, Core, R1, R2, R3, R4; rows 0-15; every cell is a molecule image.]
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R3 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R4 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we'll use just the first 16 molecules to make things a bit smaller for this demo\n", "m16 = mols[:16]\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([core],m16,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,m16,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do a query with labelled R groups:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.261221Z", "start_time": "2023-01-05T12:50:08.251475Z" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO2de0AUVfvHn11gAQFFAQVEbl5SElEURTHzp2heNvRVMTXRLm94K9TSePW1ViuTEt9AszQrRbsoaiqKGmJqpHK1JCQUuZqAiIDcF3Z3fn8cHaddXBZmd2d3eD5/6Znbd9iZ75xznuecI6AoChAEQZCOIuRaAIIgiHGDNoogCMIKtFEEQRBWoI0iCIKwAm0UQRCEFWijCIIgrEAbRRAEYQXaKIIgCCvQRhEEQViBNoogCMIKtFEEQRBWoI0iCIKwAm0UQRCEFWijCIIgrEAbRRAEYQXaKIIgCCvQRhEEQViBNtpuKioqPv7448WLF+fk5HCtBUHYUl9fv3nz5hUrVpw/f55rLcaKABcRaRd5eXnDhg2rra0l/w0ICAgODp4zZ07v3r25FYYg7UIqlSYkJJw6der777+vr68HAIFAsHnz5nXr1nEtzfhAG20HRUVFEydOzMvLMzEx6d69e1VVlVwuBwChUOjv7z979uxZs2a5u7tzLRNBnkpDQ8Pp06ePHj0aHx9P1wYsLCxEIlFNTY2pqWlMTMyCBQu4FWl8UIhmZGVlubi4AEC/fv2uXbtGUVR9fX1cXFxISIiNjQ399/Ty8pJIJNnZ2VzrRZAn0M+qtbW10rP6119/URQll8sXLlwIAAKB4JNPPuFar5GBNqoRFy9etLW1BYDx48dXV1crbW1oaCDPaLdu3ZjPaHh4eFJSEieCEYSiqMrKypiYGLFYbG5uTh5LoVA4fPhwiURy69Yt1f2jo6OFQiEAhIWFyeVy/Qs2UtBG2+bYsWOWlpYA8K9//auxsVHNnk1NTefOnQsLC+vZsyftpx4eHmFhYUlJSQqFQm+akc5MRUUFcU+RSES7Z0BAQFRU1N9//63+2CNHjlhYWADAwoULm5ub9SPY2EEbbYPPP/+cfJ/ffPNNzb/PMpksKSkpLCzMycmJ9lNXV1fip/idR3TBnTt3du/eLRaLTU1NySNnYmJC3LO0tFTz85w/f75r164AMHHixIcPH+pOMG9AG1VHREQE6S2SSCRKmwoLCzU5g1wuJ37KDOU7ODiEhITExcW1tLRoXzTSySgsLIyKigoICBAIBOQBs7CwEIvFu3fvvnfvnvpjCwoKvvrqK9XyzMxM8sSOGDGizZMgaKOtI5PJQkNDyfdc9Tnbu3evubl5bGxsu86ZlZUlkUj69+9P+6mdnR3xU6lUqj3tSKcgPz9fyT0tLS3FYnFMTEybVUilY2/cuNHqPgMGDAAAT0/PVjtSERq00VZobGycPXs2AFhZWZ06dUpp66ZNm8hT+/HHH3fs/MRPBw0aRPtp9+7dQ0JCYmNj6+rqWMtH+Ax5eIYPH04/PF26dCHuWVtbq/7YzMzMjRs3ent708d27dp1/vz5rdooRVEVFRWjR48GgF69emVkZOjgbniCFmw0JSVly5YtT9va3Nx8+PDhXbt2GUt3dWVl5dixYwGgR48ely9fZm6SyWTLli0jVdRdu3axv1ZWVlZERERAQIDqK1FTU8P+/AhvIO45cOBApU9vXFxcU1OTJsd27LNdV1c3depUALC2tj5z5oz2bki3NDY2hoaGPq1WXlBQsGPHjvXr12/fvp0kL7KElY2WlZWVlZVduXJlyZIlcrn85s2bzK01NTV79uwZPny4k5OTk5OTap6QAXL37l3yrXZ3d8/JyWFuampqmjt3LgCYm5sfPnxYu9dV00Azir8bogvkcnl6evrTOoLarJcQ9yQNczadSC0tLa+99hoAiESiH374gd096YPr169TFOXj46NQKHJycpQyZPbu3evi4jJ06NDg4GAfHx9nZ+cvv/yS5RVZ2ejRo0efffbZJUuWiMXi8ePHBwcHM7dGRkaKxeIdO3a88847RmGjWVlZffr0AYDBgwffuXOHuamqqmrcuHHkM67TVNCioiL
ipyQ9gLh2YGDgW2+9pfSVQvhKc3NzdHT0smXLmGFJFxeX0NDQNsOS6kOaHW4RKhQKiURCwq2ffvppx06iH0pLS4cOHbpgwYIJEya8/PLLvr6+RUVF9NZ79+55eHisX79eJpNRFNXQ0CAWi11dXR88eMDmomwb9SQUM3ny5IKCgqft89lnnxm+jV69etXOzg5aS7AvKSnx8fEBAGdnZ/Kh0wN///33jh07xo8fb2JiQr8PS5Ys0c/VEa747bffzMzM6F/c09Nz7dq1ycnJ6pOO9ZNgFxUVZSzJ+QcPHhw9evTBgwdVN929e5f5Ofnhhx+cnJxY1o3Y2uhHH320YsWKhISE559/vqqqqqWlZf/+/cnJycx9DN9G1STYZ2dnu7q6AoCXl1dxcbH+tVVUVHz22WcODg7k3cA6Kb8hAzdMTU2nTp3a5pAN/Q/3MIrk/KSkJH9//7KysmHDhl29epWiqMTExFYtlaKo2NhYJyenhIQENldka6MJCQkkg3L//v3Nzc11dXVOTk5r1qxh7mPgNvrNN9+QdGXVBPvk5GR7e3sA8Pf3v3//PlcKCWSk6QcffMCtDER3yOVy0jmekpKiZjduBx8zk/MNMxBaUFBAbPH06dMlJSUURc2fP/+ZZ55pdedVq1a5urqyfLu1nPAkl8uvXLmSn5/PLFS1UcNJO1eTYB8XF9elSxcAmDFjRkNDAxfq/sGiRYsA4P333+daCKIrbt26RVJEWt1qOFPhZGZmOjs7g4El59++fTs6OjooKEh1xHZ2dnZaWprqIVlZWW5ubqrvfnvRR96oko3evHlz4MCBb731VkJCAodp5zKZbMmSJfCUBPt9+/aRKuqrr75qIKZ/+PBhAJg2bRrXQhBd8d133wHAzJkzmYVkepHg4GArKysl9+QwK95wkvNzcnI+//zzoKAgp8domJhVWlrq7+///PPPsx/wyoGNfvvtt/QNDxgwYPny5fHx8Xqu7jU1NZEE+y5duqgm2EdERJC2VXh4uD5VKVFfX3/79m36v0VFRSRnBac44SthYWGgMqwjMjKSWKfm04voh4qKCn9/fwBwdHTUf3J+Tk5OZGTkc889R5tJuypnubm5o0ePHj16dFlZGXsxHNgoRVFFRUV79uwJCgpydnYmfwIPD49FixbFxsbqobeFmWD/22+/MTcxE+zZZ5Ox4dq1a6ampkOHDmUWkpYU01sRPkFc6fz588zC/Pz8yZMnazJAXv/U1dVNmTIFAKytrc+ePauHKxL3HDNmDO2eXl5exD01D3mdPHmyf//+L774olY8lNK6jTY1Nb300kt79uyhKEqhUBQWFhYWFm7cuNHJyenPP/8sLCxUGjVx584d4qe9e/cmfxQ3N7e5c+fu2bNHRyGdu3fvDhkyBADc3NzIhLVM8bpLsG8vjY2NZmZmJiYmzL/YjBkzAMAoUqCR9tLc3GxpaSkUCo1rUiWpVEpmy9ddcr5cLk9JSdmwYcOwYcNo9/T19V27dm1CQoImfW5btmx5/fXXKYpqaWnZtGmTs7PzvHnzcnNzCx9TWVnJRqGWbbS+vn7QoEGky7a+vt5JhafZU0VFRWxs7KJFi/r06UP2dHFxCQoK2rNnjxY/whom2P/666/auiIbfH19AYAZe928eTMArFq1ikNViI5ITU0lnZ5cC2k3CoUiPDxc68n5MpmMuKePjw9tIH5+fhs2bEhJSWlX11ZoaKi/vz9FUfHx8aqm5OTkxDIBRoeNerlcfl2FNl3//v37+/fvf+mll1xdXWk/nTNnztdff3337l02etQn2A8dOlTPCfZtsnTpUgCIjIykS86dOwcAY8aM4VAVoiM+//xzEtLkWkgH0VZyfmNj44kTJxYvXjx79mza5saOHbtly5bMzEyWIqurq1VN6fr16yQvqsMY7gxP1dXVsbGxoaGhffv29fX1JTGfDgcojx8/ThLsZ86cqRTOYibYM8eNcc63334LAHPnzqVLqqurhUKhhYUFTqzHP0h
C2xdffMG1kI7z3Xffkfn2Q0JC2pucTyfDkqRU0u02ceLEbdu2Kc1uYYAYro3S1NTUnDx5ctasWSSLkzBixIgtW7Zo6Kd0gv2KFSuUvpMpKSlkgNCoUaM4T7BX4saNGwDg7u7OLCRz/KSnp3OlCtER5Jc19vno6OT8wMBATcLFba61ZxQYgY3SqH6vNBm8oUmCfVBQkCEk2Cshl8vJnTLjiYsXLzb2OguiCt3OMNgRlpqTnp5OBqf6+fk9LbDR3rX2DBxjslEaDYcSt5lgT+aAeOWVVwwkwV6VCRMmAMDJkyfpkp07dxLNHKpCtE5CQgKfer3z8/PJ5H6enp65ubl0OZu19gwZo7RRmlYntnFzcwsLC/vll1/oBHumDREMJMG+Tf7zn/8AwHvvvUeXpKWlAcCgQYM4VIVonY8++ggAVq9ezbUQrVFWVkam6Hd0dDxz5oxW1tozWIzbRmlkMtmFCxfefPNN5jSLAoHAzs6OTPHC3HP58uWGkGCvCT/99BMATJ48mS4h2YUCgaCqqopDYYh2CQoK4l9GcE1NTWBgIKnKkFfSwsIiKCgoJiaGZZ6mocETG6WRy+WXL19+++23bW1tAeDIkSPMrcwE+/YuSMcJJSUlAGBra8vMkiNjXRITEzkUhmgX0pzKy8vjWoiWkUql9vb2FhYWU6ZM+fHHHw1zRij2CIFfCIXCMWPGbNu2jQz4uXfvHnPrhg0bYmNju3fvnpiYGBwczJHGduDk5NS7d+/q6urc3Fy6cNSoUQBAsrURHlBcXFxaWmpnZ+fh4cG1Fi3T2NhYWVkJAHFxcfPmzWPOTcUn+GajNCNHjgQVr9mwYcP06dOTkpLImHqjQNU0W701xHhJSUkBgFGjRtErcRkpq1evDggIuHDhAl2SmpqqUCiGDx/OnNKff/DWRlutsnXr1u3UqVPPPvssR6I6gqppkhLy7iE8gPy45Gc1ai5evHjlyhU6Cg+Pn1Ie3Jp6eGujQ4YMsbS0zMnJqa6u5loLK1RNs2/fvnZ2dqWlpXfu3OFOF6I1+GGjDQ0NWVlZZmZmw4YNowtJYomx31qb8NZGzczMyCxzGRkZXGthxYgRI0xMTP744w+pVEpKBAKBn58fYLueF8jl8mvXrtG/qfFy7do1mUzm7e3NHG2INmr08KPxa2NjM3DgwObm5uvXr9OFGGXiDTdu3Kirq+vbty9Z9ct4UW2/FxUVlZaWOjg4eHp6cqdLH/DfRnngNRhl4jG86T1U7ZrgR2eFJvDZRon7GHttFFozTRLVTU9Pl8vl3OnSCRkZMGIEzJoFSnc2fjx8+CFHmnQJb7yG3Ah56ZglPLi1NuGzjXp6etrb25eVlRl7KEa1d4LkGNbV1WVnZ3OnSyfU1kJGBhw7Brt2/aP8jz+gqIgjTbpE1X2MkfLy8sLCQtIBRRfypqLdJny2Ubrb3tgrpN7e3lZWVrm5uVVVVXQhv7tHBw+G//4XSku51qFjGhoasrOzzczMfHx8uNbCCvKK+fn5kZmbgUehM03gs43C4y8hCRcaL/Tadswb4Xewfv16EAhg9WqudeiY9PR0mUzm4+ND5hQ3XlQj8llZWfX19SQ5jztdeqJT2Kix10ahtbonv2ujDg4gkcChQ3D6NNdSdAlvmr2qN8KPzgoN4bmN0qEYmUzGtRZWqEaZfH19zczM/vzzz/r6eu506ZAVK2DwYAgLg8ZGrqXoDH4EYejs7M4Zpgfe26idnZ2np2d9ff1ff/3FtRZWqFarLSwsvL29SQ8Ud7p0iJkZ7NwJ+fkQEcG1FJ3Bjypbbm7ugwcPXFxcmNNU8qairQk8t1HgS7vew8OjZ8+eJB5KF/K7XQ8A48bBwoWwdSswbpo/lJWVFRcXd+vWbcCAAVxrYYWqY5KKCw9CZxrSWWzU2KNM0FpMid9RJsLWrWBuDuvXc61DB6hGt40U1fY76UYbOnSosYfONMS4fz9N4EdtFJ6ShA88stGjR1vpBu3VCz76CA4ehLq6RyX
Hjz/5t1HDm/HmnXn8EoH/Nurr6ysSiUj6BddaWKFqmgMHDuzatWthYaHS7NRGR20tLFoEc+bAmjWtbF22DPz8Hg1qSkqCOXPA2xuuXtWzRu3Dj95DMtsDWdeTLkQb5Ru8CcX4+fkJBIKMjAw660AoFI4YMQKMvEKang6+vnDgAFhawoABYGkJnp7AbAsKhbBjB3h6gr099OwJQ4ZAYSGMGwcbN4JCwZ1udlAUlZ6eDo97ZowXMveYl5cXc9lzfnwhNIf/Ngp8adf36NGjX79+ZFZHutCo2/UUBdHREBAAt2+Dry/88QesXAmjRkFeHgQE/GPPkSMhLw8iIuCZZ+DqVQgPB4UCNm2CF14w1pFON2/erK6udnV1dXZ25loLK1Qdkwy/5kHoTHM6kY0aqdcwUb0R440ylZfD9OmwahW0tEBYGFy5Ahq+dObmEBEBZ8+CoyMkJoKPj1Gm6POmvqbafqdvzdhDZ5rTKe6TxzZKVglNTU2lKIozWe0nMRGGDoUzZ8DBAU6ehOhoMDd/srWuDuLi2jjDpElw/TpMmQL374NYDCtXQnOzTiVrmVbjSwqFIjMzkyNFHaQzT+z0BE7XJdUTCoWCrLdcUlLCtRZW5OXlHThwID8/n1no4uICADdv3uRKVbtobqYkEkoopACoiROpu3eVd0hPp/r3p4RC6pdf2j6bQkFFRVEiEQVADR9O3bqlC8k6gXRqX7x4kVm4cuVKkUj0448/cqWqvVRVVQkEAktLy+bmZrqQrE1/4sQJDoXpmU5hoxRFTZw4EQDi4uK4FqJ9goKCAOC9996Ty+Vca2mDmzcpX18KgDI1pSQSSkmvQkFFRj7yRB8fKjtb09OmplL9+lEAlI0NtXu31lVrn9u3b4tEIhMTk9raWrpQoVCsWrUKAIRC4fbt2zmUpzk///wzAIwdO5Yu4U2VpV10Fhtdt24dAGzYsIFrIVqmtLTU3d3dzc0NAOzt7UNCQuLi4phVA8MhJoaytqYAKHd36vJl5a3l5dT06RQABUCFhFD19e07+cOH1MKFjw4PDqaqqrSlWpsUFhZGRUUFBAQIBIJevXpZWFgcPnxYaZ+oqCjSpRgWFqZQKDjRqTkffPABALz99tt0CZkA19XVlUNV+qez2OixY8cAYNKkSVwL0SY5OTnu7u4A4OjoSJyU4ODg8MYbb5w9e9ZA/PThQ+rll9V5XGIi5exMAVD29hSbBoN6p+aK7OzsDz/8cOjQofQPZG1tTaY3NjEx2a1Sfz5w4ABZ1X3RokUG8gs+DbFYDAAHDx6kS/bt2wcAwcHBHKrSP53FRktKSgCgW7duht/y1ZDU1FQHBwcAGDly5P379ymKysrKkkgkgwYNol9XW1vbkJCQ2NjYuro6rnQmJyf7+zeSFvf+/cpbW1qedJX+3/9Rf//N9nJ0v4FIpNi58zCHFTryczCT0rt06SIWi2NiYkhbPiIiQiAQAEB4eLjSsYmJiTY2NuTDX1NTw4V8jejVqxcAFBQU0CXLly8HgK1bt3InigM6i41SFNWnTx8AyMnJ4VqIFjh37hx5zV588cV6lQZwXl4eaTyqvsD6fCcVCkVUVJRIJPL2XjpypEI1/lNQQI0e/aSrVCbTznVJFOv553cAwIQJE+6qhrF0CXHPZ555hv7j9+jRg3S2NDU1Ke28b98+U1NTAHj11VdbWlqYm9LS0nr27AkAfn5+5eXlerwDTSkoKCBNH2YhCZ1dunSJK1Wc0IlsdPbs2QCwX7VGZGzExMSQRt/ixYvVN/oKCgrozjjySltYWBA/ra6u1qnIkpISEtYTCARr1qyRSpV1xsZStrYUAOXqSiUlaV9AfPxpYkM9e/Y8c+aM9i/AQC6XJyUlhYeH9+vXj3ZPDbuq4+LiyMLuM2bMaGhoYG7Ky8vr378/APTt2zc3N1ent9ABDh48CABisZguaWxsVA2ddQY6kY1+8sknAPDmm29yLYQVUVFRdEtQ8xZrUVFRVFRUYGAgqfuQXrm
AgICoqKiysjKti0xISHB0dCQWFh8fr7S1poYKDX3UVTp7NlVZqfXrP+LevXtTp04lVh4WFqZaGWSJTCZLSkoKCwtjjkTq06dPWFjYuXPnlGqXakhJSSGL1Pv7+5P+GZrS0lJfX1/S/f37779rVz9L3nnnHQD44IMP6JIrV64AwJAhQzhUxQmdyEYvXLhAmkhcC+kgCoVizZo1xAF37tzZsZPcv38/JiZGLBaT+izTT7WSodLU1BQeHk5izYGBgarnTEtLmz49UiikrKyor79mf8E2oDsWAMDX11cr2bW0e5KeQYK7u3tYWFhSUlLHemOzs7NdXV0BwMvLq7i4mLmptrb2hRdeIIGpn3/+mb1+bTF27FgAOHv2LF0SFRUFAG+88QaHqjihE9lobW2tiYmJSCTSeq1ED0il0nnz5gGAubn5oUOH2J/wwYMHxE+JxQCAUCgMCAiIiIi4fft2x86Zk5MzbNgwADA1NZVIJErRPIVCERkZSS738suX//qL/U1oSlpaGmluW1paRkVFdewkjY2NcXFxISEhJC+S4OnpycY9mZSUlJBJjp2dna9fv87cJJVK58+fDwAikYgZFucQmUxmZWUlEAgqKirowgULFgDAnj17OBTGCZ3IRimK8vb2BoDk5GSuhbSP2trayZMnk8i70rgX9lRVVcXGxoaEhFhZWdHu4OXlJZFI2lV3i4mJsba2JvWyK1euKG0tLy+fNm2a7trXbVJTU7Nw4UJyd3PmzKnSOLO0oaGBuCdzBiPy97lx44Z2RVZVVY0bNw4Aunfv/uuvvzI3KRSKtWvXkj9gZGSkdq/bAX7//XcAGDBgALOwb9++AKD0DegMdC4bff311wEgOjpaqTwjI0MqlXIiqU1KS0tJFc/JyUmnvWO0X5AcAM394uHDhy+//DLZPzg4WNWhEhMTnZycSNTl5MmTuruFNomNjSV1STc3t8tqM0vprwv5NjD/GjpN9mhqapo7dy5pdhhycv6uXbsAYOHChXTJgwcPBAJBly5dNO8U5g2dy0Z3796t9NtTFFVfX29lZWVraxscHEzn9BkIt2/fJq3RgQMHFhYW6ueidOu1W7duqq1XpZ1TU1NJHcTGxubAgQNKW1taWiQSCXnz9Z971CoFBQWjR49+Ws+Dmr4OvcXKZTLZsmXLSLf1l19+qbR1//79hpCcf+nSpddee405/P/06dMAMG7cOK4kcUjnslHSEunfvz+zMCcnR2mEydy5cw8dOsS5nzIT7DnJHFQfS5HL5VFRUeSVHjFihKrLqDcsDlE1dzWRN66sX01yPp01bFDJ+Rs3bgSAtWvXci2EAzqXjbbaL05gjncmLxJJsdy9ezcnFka/KmKxWDXBXs+0tLScO3du6dKlTD/t3r07qay9++67qtUiZvP5t99+40S2es6cOUNux9bWlp4ZUyQSTZs27ZtvvlF9QvTP3r17n5acn5qaSrJiufrEqkL6vlU7IjoDnctGKYoiw0vUDFYrLi7evXu3WCxWTbEsLS3Vj0gDabipQueZe3p69urVSyQSnTp1SmmfmpqakJAQOphTqbu8UNaUl5d7eHg4OjqKRCL9jEpoL+qT80mHT9++fTucXKFFSDfILSOarFB7dDobJYPVXFxcMjMz1e/JVUOPTrDnPIygBoVCQQbYXLhwgVmemZlJukqtra337dvHkTpNUSgUZIiwUljcoEhOTqaT85XqyHoLP6qntrZ26dKlpGnClQZu6XQ2Gh0dTTdLNUyTpMMO5o+naCfrIEokEu2GHegEe4FAsG3bNi2eWResX78eVEaFlZWV9erVS1uJ7tpl69atcXFxzJSMq1evkt5eg/1cEdQn55NkOBsbm4SEBH2qokOR9GL0Y8aM0acAw6HT2ShFUe+//76joyNdxxQIBP7+/lu3blWaVV6V+vp68tzoIgmGmWBvICnW6snIyAAAR0dHpdhRZmamAWaPVVVViUQiMzOzBw8e0IXko8WcLtNgUZ+cr88np7a
29tChQ3PnzmW+BZaWlgYV79IzndFGCbQntjdNklKbkp2ent4BMfSAP/3XKdjg6ekJAOoTMA2EvXv3AsALL7zALCT9D0ahn/pncr5S5plWBgq3eXVOEmmNgs5rozTtSpNs9djQ0FCSmaR0rIZNRQPp4eoAZHIKo6jNkQmGv/rqK7rkabVpQ6apqSk4OBienpzfgWlr1GMIibSGD9roE9hMOUEfS2Y2Iri5ubV5rKHFW9sFmdHH8PsWa2pqLCwsTExM7t27RxeSvt0VK1ZwKKwDyGQyEs8xMTHZtWuX0lY6x6PNSRTVY4CJtIYM2mgr0J5IhjASNJwATSaTJSYmLl++nHnsypUrW93ZALP/2gUd6e5YV4be+P777wFg/PjxzEKyjMcvmixAanhERESQR0s1OT8+Pp5Mj9CBqXWLi4v1OaEib0AbVQedJkk60QgaTscrl8vT09MlEkm/fv1U59ykDHUsSnt56623AGDdunVcC1HHrFmzAGDHjh10SVZWFvkpjXcAOJ2c/9prryndRXJy8ttvv615E4HD6b35AdqopqhZHEJ9YFqhUKj2vhnRsmXquXjxIqiMrzUoyJwJAoHgzp07dOGmTZvA+GfGPHHiBEk2Uk3O1wR6sRnaPS0tLfW/2AwPQBttN8RPvby8aD+lpzXRcOU4w5mnhz1yuZx0B7c5nIErjhw5AiopjUOGDIF/TjlspKhJzn8a7B9gRAm00Y6junJcmx9zQ5s1UissWbIEACQSCddCWodMeMwcznDr1i3iHQaY39oB1CTnM+lwcwppE7RRLaBh15IBzmGuFRISEgBg8ODBXAtphaamJpLbm5eXRxdu2bKFdKdwKEy73L17l07OZzYL2Ky1h2gO2qg2eVqgMzQ09OLFiyS31NBW1GFPS0sLaVf+pc+FQTQjLi4OAEaMGMEsJPMqnDhxgitVuqCysvK5554DADMzs+3bt3/22Wf//ve/Wa61h2gI2qhOKCkp2blz54QJE0xMTLHtS90AAAXsSURBVICBqampcSXYa8grr7wCAJs3b+ZaiDKLFy8GgC1bttAlxcXFAoHA2tq6sbGRQ2G6oKGhYdCgQfBP+vXrFx4enpqaauy98IYM2qhuefDgQUREhIODg0AgsLOza3NYlJFy8uRJ1Uof5zQ3N/fo0QMAmKMV//e//wHAvHnzOBSmO6RSqY+Pj1AoNDMzCwwM1Mpae0ibCCiKAkT3SKVSeoIo/iGVSvcGBc28ft3x6lXw8OBaziMSE6UffnjL1jbuxIn/0oWBgfL6+tLw8HszZw7nUJtO4ffDZoAIuRbQWeD3Y21ubr7U3t7x3j346SeutTwhNtb811+9fX2feGhZGVy4YHL9usukSbz1UOD7w2aAoI0iWmL2bACAo0e51vEIuRxOnAB4rItw9CgoFDBlCjAWk0YQtqCNIlqCmFNyMty5w7UUAICkJCgvhwEDYPDgJ4XE5JnGiiDsQRtFtESXLjBlClAUHD/OtRSAx445Z86TkooKSEoCkQimT+dKFMJP0EYR7WEw7XqKaqVFf/w4yGQwaRLY2nKlC+EnaKOI9hCLwcICkpKgrIxbIaRrwd0dhg17UogtekRHoI0i2sPGBiZNAoUC4uK4FUI75uPRuVBdDb/8Aqam8OKLHOpC+AnaKKJVDKNdf+zYEy2EuDhobobx48HenitRCG9BG0W0yowZIBLBhQtQWcmVhIwMyM+H3r1h1KgnhdiiR3QH2iiiVWxtYfx4aGmBkye5kkAcc9YsED5+uuvq4Nw5EAphxgyuRCF8Bm0U0TZct+tVW/Tx8dDYCGPHAmN9LATRGmijiLaZNQtMTSEhAWpq9H/xrCzIyQF7e2DMpo0tekS3oI0i2sbeHsaOBakU4uP1f/EjRwAeOzmhsRHOnAGBAGbO1L8cpFOANoroAO7a9a+8Ap98Aq+++qTk7Fmoq4ORI8HVVf9ykE4BTpSH6IC7d8HVFSws4MEDsLDgVsvChfD99/Dpp7B2Lbd
CEN6CNorohv37YexY8PTkWgeUlsLx4zBtGri5cS0F4Sloo4jOKC+Hq1ehvBzs7GDkSHBx0cM109KgsBBGj/7H1UpK4PJlGD8eHBz0IAHpdJi2vQuCtBepFFatgj17QKEAW1t4+BAoCl56CXbvhq5ddXrlL7+EvXvh+efhwoUnI0HT0mDuXLh0CW0U0QkYYkJ0wOLF8NVXsHEjVFZCZSU8fAjbtsHRozBjBigUur64SAS//grffafr6yDII9BGEW2TlASHDkF4OGzY8GhOOmtrWL0aPv4YLl7UQ/je3h7mzIE1azgcj4p0LtBGEW1z+DAAwPLlyuXLloFI9GirjvnkE6ipgfXr9XApBMG+UUTrZGWBlVUrASUrK+jTB7KyAAC+/x5u39bkZMXdvPc+nKXhld9//9E/PDxgzRr4+GNYtAjGjNHwaATpIGijiLaprYVevVrf5OQE+fkAAPv3Q0KCJiejxszfeEVTG/3vkzVAYd06+O47WLYMMjI0PBpBOgjaKKJtrKzg1q3WN5WXg40NAMDCheDvr8nJBLZD3g/U9MpCRh9Vly4QHQ0zZsAXX2DGKKJb0EYRbTNwIFy6BPfvK6cXNTdDcTFMmQIAEBKi4clcATZ1VEhQELz4ImzcCNu2dfQUCKIBGGJCtA1ZpuPAAeXygwehqQmCgvSpZft2kEohIkKf10Q6HWijiLaZOhXGjYP33ns08Sfh/HlYtQq8vWHBAn1qcXeHdeue2seAIFoBbRTRNkIhHD0Kfn4waxa4u8PEidC/PwQGQt++EB8P5uZ6lvPuuzBwoJ6viXQucEw9ojMuXYKLF6GiAnr0gLFjYeLEf8SAdENaGty7B2LxPwpzcuDPP3FMPaIr0EYRBEFYgY16BEEQVqCNIgiCsAJtFEEQhBVoowiCIKxAG0UQBGEF2iiCIAgr0EYRBEFYgTaKIAjCCrRRBEEQVqCNIgiCsAJtFEEQhBVoowiCIKxAG0UQBGEF2iiCIAgr0EYRBEFYgTaKIAjCCrRRBEEQVqCNIgiCsOL/Ad5bCnBjNNJfAAABSnpUWHRyZGtpdFBLTCByZGtpdCAyMDIyLjA5LjEAAHice79v7T0GIOBlgABGIBYGYlEgbmBkc8gA0szMyAwNEIOFQUcDrFyCUQekRQu3ShjNDqGZ0cWJsIMJlx0wMzkgNBM3A2MGEyNTAhMzAxNLBhMLawIrWwYTG7sCO0cGEwdnAieXAhd3BhM3TwIPbwYTLx8DL38Cv0AGE6dgBpOgUAIbQ4IAV4IQe4IIMxsDGysLEyMrB7uQICcbNw8vvwCX+DtoAIGBsIR/wYGok7P3gziK55YekHjpAGZ/PrH+wO5NZvYgdkf+3QPa66aB2fqyPQcqL90Gs03e2hxYXbsAzJ74XePAh8aofSD2uZDZ+8W9dPdDrPBYZrwu0A4qbg8TB6p3gKkHusEByQ0OSG5wQHKDA5IbHJDc4ABzAwODg/2G+WwHoOz9MLYYAEXzYth0PEZVAAABtXpUWHRNT0wgcmRraXQgMjAyMi4wOS4xAAB4nI1Uy07EMAy89yv8AxtlYud14LAPhBDQlWDhH7jz/8LOqiQrVWzT2mrcSRp7xp3Ixvvp5fuH/kY4TROR/+eutdIXe++nN7IHOjw+Pc90vOwPS+R4/pwvH4RKAbpGr1vs/nJ+WyKgI+3Y5SglFNrBBWRfAnnn2+hrgyGjQ4Eg0M7rGo4xryD5ipTKoWjYZXBKcQUo9E675Cpq0fdwyBnVdyDoa//6gAUebV9xnEtMrPCMzAkr+yYDBlcjai4KDJELrx0gX4FJPOAtpwDmklaQxZBWHY4hGjLlVLFWp0qzvV/CLnjJaW1L/eKRNm0JY2nTMWEsbaITxtImOiENeZ9OROXzHzrDLZ0wmrbQidzSv08nitbeMmlhJRaepa7mVOl8Tfou9HE+3XTMtYcO5/nUe0jUQm8UUePeDVCTrnnRaeyaFrXUlStquesTaqWLUNRqV5qoYRQUmsM
gHDGHMAhEzIEHIYg5yEA4mosDs9IiaSDPpsrNwJK0r5eBDjGHOpRdLKm/Ulk9dGE/cWgnrj1nK/9YbJsvPzt9nn4BR1j7Hlr4eu4AAAD8elRYdFNNSUxFUyByZGtpdCAyMDIyLjA5LjEAAHicbY+7bsMwDEV/paMdSIQoiqJkT93bonvRwVGzRXEQJEOBfHypoJOVieDhfYAFy2CLP61tUCll+NpN+D2WQuPJj//AK8CX+0AgHBIZi+BRXDKzhxgcorFOCVGKZm43YmooSszozezUZ6wHdMRmtpv1icE+yVWWGXMyqmdKQUkAksRRiaBQ0zCETD4ZrUCKscVHyOpqIhTB/BBhwvAIFyZmUda/5jTEuyAxtjc31V1zX9z19rWjWa5r/bys5wkd/Nxq/X1b9ocj7Npe1+Ornt+X88et7g8XwAllo5Je5e9/bi5su75/UhQAAAAASUVORK5CYII=\n", "image/svg+xml": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lcore = Chem.MolFromSmiles('c1cc([*:1])ccc1-c2nc(c1ccc([*:2])cc1)no2')\n", "lcore" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.650633Z", "start_time": "2023-01-05T12:50:08.262336Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2R3R4
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
6
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
7
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
8
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
9
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
10
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
11
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
12
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
13
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
14
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
15
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R2 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R3 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 \n", "\n", " R4 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "10 \n", "11 \n", "12 \n", "13 \n", "14 \n", "15 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([lcore],m16,asSmiles=False,asRows=False)\n", "PandasTools.RGroupDecompositionToFrame(groups,m16,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can exclude any molecules that have R groups in non-labelled positions:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2023-01-05T12:50:08.787099Z", "start_time": "2023-01-05T12:50:08.651664Z" }, "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n", "[13:50:08] No core matches\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MolCoreR1R2
0
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
1
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
2
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
3
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
4
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
5
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
6
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
7
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
8
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
9
\"Mol\"/
\"Mol\"/
\"Mol\"/
\"Mol\"/
\n", "
" ], "text/plain": [ " Mol \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " Core \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R1 \\\n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 \n", "\n", " R2 \n", "0 \n", "1 \n", "2 \n", "3 \n", "4 \n", "5 \n", "6 \n", "7 \n", "8 \n", "9 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = rdRGroupDecomposition.RGroupDecompositionParameters()\n", "params.onlyMatchAtRGroups = True\n", "groups,unmatched = rdRGroupDecomposition.RGroupDecompose([lcore],m16,asSmiles=False,asRows=False,options=params)\n", "tmols = [x for i,x in enumerate(m16) if i not in unmatched]\n", "PandasTools.RGroupDecompositionToFrame(groups,tmols,include_core=True,redraw_sidechains=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are other useful parameters to control the calculation in that `RGroupDecompositionParameters` object, but this post is already getting pretty long, so I'm going to wrap up now and leave exploring those as an exercise for the reader. 
;-)\n" ] } ], "metadata": { "_draft": { "nbviewer_url": "https://gist.github.com/0afd4f5cc9194432acd85bf22261ed8a" }, "gist": { "data": { "description": "RGroupEdgeCases.ipynb", "public": false }, "id": "0afd4f5cc9194432acd85bf22261ed8a" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }