{ "cells": [ { "cell_type": "markdown", "id": "4477b206-34bf-4bae-a780-47ba8c3e9251", "metadata": {}, "source": [ "The RDKit's [Contrib](https://github.com/rdkit/rdkit/tree/master/Contrib) directory includes implementations of two descriptors, `SA_Score` and `NP_Score`, which are not included in the core RDKit because they both use large data files. Still, the descriptors can be useful and questions about how to use them come up every once in a while, so here's a really short blog post showing how to use them.\n", "\n", "Both descriptors are implemented in Python, so they're currently only accessible from Python. We're working on making them available in KNIME too (in fact this blog post was prompted by a question Alice at KNIME asked me as she was working on the node), but that's going to take a bit longer.\n", "\n", "If you want to learn more about the descriptors themselves, the publications describing them are:\n", "\n", "1. SA_Score: http://www.jcheminf.com/content/1/1/8 (this one is open access)\n", "2. NP_Score: http://pubs.acs.org/doi/abs/10.1021/ci700286x (this one is not open access)\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "3584082d-620d-4d60-9337-ac0b0f574688", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023.09.2\n" ] } ], "source": [ "from rdkit import Chem\n", "from rdkit.Chem.Draw import IPythonConsole\n", "from rdkit.Chem import rdDepictor\n", "rdDepictor.SetPreferCoordGen(True)\n", "import rdkit\n", "print(rdkit.__version__)" ] }, { "cell_type": "markdown", "id": "eb47435f-4f44-45fa-993c-6835c2bfbe07", "metadata": {}, "source": [ "We'll use a compound from one of the papers that was in J. Med. Chem.'s ASAP section while I was writing this post (https://pubs.acs.org/doi/10.1021/acs.jmedchem.3c01626). \n", "\n", "> Mini rant: it was easy for me to get the structure of this compound since J. Med. Chem. 
suggests that authors provide SMILES strings for the compounds in their papers, and compliance is pretty good. Too bad this isn't true in computational/cheminformatics journals.\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "2b35641b-616e-4801-bd7a-95059df2c6d2", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = Chem.MolFromSmiles('FC1=CN=C(NC2=NC=C(C(N3CCN(C(C4CC4)=O)CC3)=O)C=C2)N=C1C5=CC=C6N=C(N(CC)CC)SC6=C5')\n", "m" ] }, { "cell_type": "markdown", "id": "43e3183b-cbac-40af-9400-df60e1f281cd", "metadata": {}, "source": [ "# Using the RDKit installed from conda-forge\n", "\n", "The Contrib directory is installed with the RDKit conda package; you just need to tell Python to look in the appropriate place:" ] }, { "cell_type": "code", "execution_count": 3, "id": "3f93ea6e-9aef-4b78-b2d3-0facb2922cbc", "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "# add the Contrib directory of the active conda environment to the import path\n", "sys.path.append(os.path.join(os.environ['CONDA_PREFIX'],'share','RDKit','Contrib'))\n", "\n", "from SA_Score import sascorer\n", "from NP_Score import npscorer" ] }, { "cell_type": "markdown", "id": "bc45599d-9821-43d9-bf92-ab658ee7bed1", "metadata": {}, "source": [ "# Using the RDKit installed from PyPI\n", "\n", "When Chris (@kuelumbus on GitHub) packages the RDKit for PyPI, he copies the Contrib directory into the rest of the Python code, so you can import the modules directly:\n", "\n", "```\n", "from rdkit.Contrib.SA_Score import sascorer\n", "from rdkit.Contrib.NP_Score import npscorer\n", "```" ] }, { "cell_type": "markdown", "id": "c2ca986c-e94c-4d4d-bd48-e99ca6717899", "metadata": {}, "source": [ "# Calculating the scores\n", "\n", "This is the same regardless of how you have the RDKit installed:" ] }, { "cell_type": "code", "execution_count": 4, "id": "532d7847-7d51-4458-a009-9abecb9e8fb4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.8716389090191434" ] }, 
"execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sascorer.calculateScore(m)" ] }, { "cell_type": "markdown", "id": "cb999625-0504-4999-8215-6c88bdeaecdf", "metadata": {}, "source": [ "The SA_Score ranges from 1 to 10, with 1 being easy to make and 10 being hard to make." ] }, { "cell_type": "code", "execution_count": 5, "id": "3cb1f499-a021-4e0e-9737-be898babc028", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "reading NP model ...\n", "model in\n" ] }, { "data": { "text/plain": [ "-1.960519718438019" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the NP model once; it can be reused for every molecule\n", "fscore = npscorer.readNPModel()\n", "npscorer.scoreMol(m,fscore)" ] }, { "cell_type": "markdown", "id": "b4c1c5f7-fa0c-4e37-a783-cacacdfc7792", "metadata": {}, "source": [ "The NP score ranges from -5 to 5, with higher values being more natural-product-like, so this is pretty low.\n", "\n", "You can also get a confidence value:" ] }, { "cell_type": "code", "execution_count": 6, "id": "7f3ea417-47e4-4eab-a41e-9bbe9f2db3ea", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NPLikeness(nplikeness=-1.960519718438019, confidence=1.0)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "npscorer.scoreMolWConfidence(m,fscore)" ] }, { "cell_type": "markdown", "id": "2d84d483-bbf7-4f28-9ee1-98a5983acc59", "metadata": {}, "source": [ "That's it for this post. As I said in the intro, it's a short one!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 5 }