{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Constraint Docking\n", "\n", "This feature is available since version 2.0.0\n", "\n", "**NOTE**: Since version 2.1.7 it is not needed to specify `protected_ids` to avoid failures of **MolDrug**. However, on this tutorial we will let the workaround for user with older versions. May be that the specified CReM parameters completely remove any similarity between a solution and the reference structure for constraint docking; only in this specific scenarios you should protect some atoms in order to preserve some MCS." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tempfile, os, requests, gzip, shutil, yaml\n", "from multiprocessing import cpu_count\n", "from moldrug.data import ligands, boxes, receptor_pdbqt, receptor_pdb, constraintref\n", "from moldrug import utils\n", "tmp_path = tempfile.TemporaryDirectory()\n", "import numpy as np\n", "from rdkit import Chem" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def atom_ids_list(smiles:str) -> list[int]:\n", " \"\"\"Return a list of atom IDs for the molecule\n", "\n", " Parameters\n", " ----------\n", " smiles : str\n", " The SMILES string of the molecule\n", "\n", " Returns\n", " -------\n", " list[int]\n", " List of atoms IDs\n", " \"\"\"\n", " return list(range(Chem.MolFromSmiles(smiles).GetNumAtoms()))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Setting the working directory (you could change it but it MUST be an absolute path)\n", "wd = tmp_path.name\n", "# os.makedirs('wd_tutorial', exist_ok=True)\n", "# wd = os.path.abspath('wd_tutorial')\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "COC(=O)C=1C=CC(=CC1)S(=O)(=O)N\n", "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]\n", "{'A': {'boxcenter': [12.11, 1.84, 23.56], 'boxsize': [22.5, 22.5, 22.5]}}\n", "/var/folders/fl/txtwcjh94vs_gkm_s21h8w7m0000gn/T/tmphs9lvmia ['x0161.pdbqt', 'crem.db.gz', 'ref.sdf', 'crem.db', 'x0161.pdb']\n" ] } ], "source": [ "# Getting the data\n", "lig = ligands.r_x0161\n", "box = boxes.r_x0161\n", "with open(os.path.join(wd, 'x0161.pdbqt'), 'w') as f:\n", " f.write(receptor_pdbqt.r_x0161)\n", "with open(os.path.join(wd, 'x0161.pdb'), 'w') as f:\n", " f.write(receptor_pdb.r_x0161)\n", "with open(os.path.join(wd, 'ref.sdf'), 'w') as f:\n", " f.write(constraintref.r_x0161)\n", "# Getting the CReM data base\n", "url = \"http://www.qsar4u.com/files/cremdb/replacements02_sc2.db.gz\"\n", "r = requests.get(url, allow_redirects=True)\n", "crem_dbgz_path = os.path.join(wd,'crem.db.gz')\n", "crem_db_path = os.path.join(wd,'crem.db')\n", "open(crem_dbgz_path, 'wb').write(r.content)\n", "with gzip.open(crem_dbgz_path, 'rb') as f_in:\n", " with open(crem_db_path, 'wb') as f_out:\n", " shutil.copyfileobj(f_in, f_out)\n", "\n", "print(lig)\n", "print(atom_ids_list(lig))\n", "print(box)\n", "print(wd, os.listdir(wd))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting the configuration\n", "\n", "**MolDrug**'s version 2.0.4 tried to fix some issues with constraint docking ([see release description](https://moldrug.readthedocs.io/en/latest/source/CHANGELOG.html)). If follow jobs are needed and `mutate_crem_kwargs` allows changes in heavy atoms; it *MUST* be specified the keyword `protected_ids` for `mutate_crem_kwargs`. This is needed because during the generation of constraint conformers `constraint_ref` is used as fix core. Therefore if a generated molecule does not have this core, an error happens. On version 2.0.0 the core was guessed based on MCS but some error happened ([see this RDKit bug](https://github.com/rdkit/rdkit/issues/5518)). You have then two possibles work around:\n", "\n", "1. Use `mutate_crem_kwargs` that only allow grow operations (`min_size = max_size = 0`). In this way the core will be preserved.\n", "2. Specify `protected_ids` inside `mutate_crem_kwargs`. Those IDs are the atom indexes of `seed_mol` that correspond to the atoms of `constraint_ref`.\n", "\n", "On this tutorial we will use the second strategy. In this case `constraint_ref` is the same as `seed_mol` but with some specific conformation state. Therefore we could use the the result of `atom_ids_list` function (specified above) to get the protected_ids list of atoms. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['x0161.pdbqt', 'crem.db.gz', 'ref.sdf', 'config.yml', 'crem.db', 'x0161.pdb']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "config ={\n", " \"01_grow\": {\n", " \"type\": \"GA\",\n", " \"njobs\": 6,\n", " \"seed_mol\": lig,\n", " \"AddHs\": True,\n", " \"costfunc\": \"Cost\",\n", " \"costfunc_kwargs\": {\n", " \"vina_executable\": \"/Users/klimt/GIT/docking/vina_1.2.5_mac_x86_64\", # You should change to the proper vina executable\n", " \"receptor_pdbqt_path\": os.path.join(wd, 'x0161.pdbqt'),\n", " \"boxcenter\": box['A']['boxcenter'],\n", " \"boxsize\": box['A']['boxsize'],\n", " \"exhaustiveness\": 4,\n", " \"ncores\": int(cpu_count() / 6),\n", " \"num_modes\": 1,\n", " \"constraint\": True,\n", " \"constraint_type\": \"score_only\", # local_only\n", " \"constraint_ref\": os.path.join(wd, 'ref.sdf'),\n", " \"constraint_receptor_pdb_path\": os.path.join(wd, 'x0161.pdb'),\n", " \"constraint_num_conf\": 100,\n", " \"constraint_minimum_conf_rms\": 0.01\n", " },\n", " \"crem_db_path\": crem_db_path,\n", " \"maxiter\": 2,\n", " \"popsize\": 10,\n", " \"beta\": 0.001,\n", " \"pc\": 0.5,\n", " \"get_similar\": False,\n", " \"mutate_crem_kwargs\": {\n", " \"radius\": 3,\n", " \"min_size\": 0,\n", " \"max_size\": 0,\n", " \"min_inc\": -5,\n", " \"max_inc\": 6,\n", " \"protected_ids\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],\n", " \"ncores\": 12\n", " },\n", " \"save_pop_every_gen\": 10,\n", " \"deffnm\": \"01_grow\"\n", " },\n", " \"02_allow_grow\": {\n", " \"mutate_crem_kwargs\": {\n", " \"radius\": 3,\n", " \"min_size\": 0,\n", " \"max_size\": 2,\n", " \"min_inc\": -5,\n", " \"max_inc\": 3,\n", " \"protected_ids\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],\n", " \"ncores\": 12\n", " },\n", " \"maxiter\": 2,\n", " \"deffnm\": \"02_allow_grow\"\n", " },\n", " \"03_pure_mutate\": {\n", " \"mutate_crem_kwargs\": {\n", " \"radius\": 3,\n", " \"min_size\": 1,\n", " \"max_size\": 8,\n", " \"min_inc\": -5,\n", " \"max_inc\": 3,\n", " \"protected_ids\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],\n", " \"ncores\": 12\n", " },\n", " \"maxiter\": 2,\n", " \"deffnm\": \"03_pure_mutate\"\n", " },\n", " \"04_local\": {\n", " \"mutate_crem_kwargs\": {\n", " \"radius\": 3,\n", " \"min_size\": 0,\n", " \"max_size\": 1,\n", " \"min_inc\": -1,\n", " \"max_inc\": 1,\n", " \"protected_ids\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],\n", " \"ncores\": 12\n", " },\n", " \"maxiter\": 2,\n", " \"deffnm\": \"04_local\"\n", " }\n", "}\n", "# Save the config as a yaml file\n", "with open(os.path.join(wd, 'config.yml'), 'w') as f:\n", " yaml.dump(config, f)\n", "os.listdir(wd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Move to the directory and run moldrug" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Started at Mon Sep 4 18:24:41 2023\n", "You are using moldrug: 3.4.0.\n", "\n", "CommandLineHelper(yaml_file='config.yml', fitness=None, outdir=None, continuation=False)\n", "\n", "\n", "\n", "\n", "Creating the first population with 10 members:\n", " 0%| | 0/10 [00:00,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# parms = pmd.rdkit.from_sdf(os.path.join(wd, '04_local_pop.sdf')) # require parmed, rdkit\n", "# parms" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6a6ce34e13434d86aaa807885764fc4d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(NGLWidget(), IntSlider(value=0, max=9)))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# view = nv.NGLWidget()\n", "\n", "# slider = IntSlider(max=len(parms)-1)\n", "\n", "# def show_one_ligand(change):\n", "# val = change['new']\n", "# view.show_only([val])\n", " \n", "# slider.observe(show_one_ligand, 'value')\n", "\n", "# VBox([view, slider])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# for p in parms:\n", "# view.add_structure(nv.ParmEdTrajectory(p))\n", "# view.show_only([0])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "avg = -3.5055\n", "min = -4.942\n", "max = -0.549\n" ] } ], "source": [ "result = utils.decompress_pickle(os.path.join(wd, '04_local_result.pbz2'))\n", "valid_individuals_vina_score = [individual.vina_score for individual in result.pop if individual.vina_score != np.inf]\n", "print(f\"avg = {np.average(valid_individuals_vina_score)}\")\n", "print(f\"min = {min(valid_individuals_vina_score)}\")\n", "print(f\"max = {max(valid_individuals_vina_score)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See how it looks with local_only" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['02_allow_grow_pop.sdf',\n", " '04_local_pop.sdf',\n", " 'new_config.yml',\n", " '01_grow_result.pbz2',\n", " 'x0161.pdbqt',\n", " '03_pure_mutate_pop.sdf',\n", " '04_local_result.pbz2',\n", " '01_grow_pop.sdf',\n", " '03_pure_mutate_result.pbz2',\n", " 'crem.db.gz',\n", " '03_pure_mutate_pop.pbz2',\n", " 'ref.sdf',\n", " 'config.yml',\n", " 'error.tar.gz',\n", " '02_allow_grow_result.pbz2',\n", " 'crem.db',\n", " '01_grow_pop.pbz2',\n", " '02_allow_grow_pop.pbz2',\n", " '04_local_pop.pbz2',\n", " 'x0161.pdb']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Save the new config as a yaml file\n", "new_config = config.copy()\n", "new_config['01_grow']['costfunc_kwargs']['constraint_type'] = 'local_only'\n", "\n", "with open(os.path.join(wd, 'new_config.yml'), 'w') as f:\n", " yaml.dump(config, f)\n", "os.listdir(wd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Move back to the directory and run moldrug" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Started at Mon Sep 4 18:32:14 2023\n", "You are using moldrug: 3.4.0.\n", "\n", "CommandLineHelper(yaml_file='new_config.yml', fitness=None, outdir=None, continuation=False)\n", "\n", "\n", "\n", "\n", "Creating the first population with 10 members:\n", "100%|███████████████████████████████████████████| 10/10 [00:52<00:00, 5.20s/it]\n", "Initial Population: Best Individual: Individual(idx = 8, smiles = COC(=O)c1ccc(S(=O)(=O)NN=C(N)N)cc1, cost = 1.0)\n", "Accepted rate: 10 / 10\n", "\n", "File 01_grow_pop.sdf was createad!\n", "Evaluating generation 1 / 2:\n", "100%|█████████████████████████████████████████████| 5/5 [01:19<00:00, 16.00s/it]\n", "Generation 1: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 1 / 5\n", "\n", "Evaluating generation 2 / 2:\n", "100%|█████████████████████████████████████████████| 5/5 [01:39<00:00, 19.89s/it]\n", "File 01_grow_pop.sdf was createad!\n", "Generation 2: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 2 generations with a population of 10 individuals. A total number of 20 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 1.0)\n", "Final Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279)\n", "The cost function dropped in 0.11876570734017211 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 248.75 (s).\n", "Finished at Mon Sep 4 18:36:23 2023.\n", "\n", "File 01_grow_pop.sdf was createad!\n", "The follow job 02_allow_grow started.\n", "File 02_allow_grow_pop.sdf was createad!\n", "Evaluating generation 3 / 4:\n", "100%|█████████████████████████████████████████████| 5/5 [01:33<00:00, 18.65s/it]\n", "Generation 3: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 5\n", "\n", "Evaluating generation 4 / 4:\n", "100%|█████████████████████████████████████████████| 5/5 [01:12<00:00, 14.41s/it]\n", "File 02_allow_grow_pop.sdf was createad!\n", "Generation 4: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 4 generations with a population of 10 individuals. A total number of 30 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 1.0)\n", "Final Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279)\n", "The cost function dropped in 0.11876570734017211 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 180.65 (s).\n", "Finished at Mon Sep 4 18:39:24 2023.\n", "\n", "File 02_allow_grow_pop.sdf was createad!\n", "The job 02_allow_grow finished!\n", "The follow job 03_pure_mutate started.\n", "File 03_pure_mutate_pop.sdf was createad!\n", "Evaluating generation 5 / 6:\n", " 20%|█████████ | 1/5 [00:12<00:51, 12.88s/it]\n", "/Users/klimt/GIT/moldrug/moldrug/utils.py:1349: UserWarning: Parallelization did not work. Trying with serial...\n", " warn(\"Parallelization did not work. Trying with serial...\")\n", "100%|█████████████████████████████████████████████| 5/5 [02:53<00:00, 34.76s/it]\n", "Generation 5: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 1 / 5\n", "\n", "Evaluating generation 6 / 6:\n", "100%|█████████████████████████████████████████████| 5/5 [01:04<00:00, 12.90s/it]\n", "File 03_pure_mutate_pop.sdf was createad!\n", "Generation 6: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 6 generations with a population of 10 individuals. A total number of 40 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 1.0)\n", "Final Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279)\n", "The cost function dropped in 0.11876570734017211 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 267.81 (s).\n", "Finished at Mon Sep 4 18:43:52 2023.\n", "\n", "File 03_pure_mutate_pop.sdf was createad!\n", "The job 03_pure_mutate finished!\n", "The follow job 04_local started.\n", "File 04_local_pop.sdf was createad!\n", "Evaluating generation 7 / 8:\n", "100%|█████████████████████████████████████████████| 5/5 [01:22<00:00, 16.51s/it]\n", "Generation 7: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 5\n", "\n", "Evaluating generation 8 / 8:\n", "100%|█████████████████████████████████████████████| 3/3 [00:17<00:00, 5.76s/it]\n", "File 04_local_pop.sdf was createad!\n", "Generation 8: Best Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279).\n", "Accepted rate: 0 / 3\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 8 generations with a population of 10 individuals. A total number of 48 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 1.0)\n", "Final Individual: Individual(idx = 12, smiles = NC(N)=NNS(=O)(=O)c1ccc(C(=O)OCC(F)(F)F)cc1, cost = 0.8812342926598279)\n", "The cost function dropped in 0.11876570734017211 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 113.96 (s).\n", "Finished at Mon Sep 4 18:45:46 2023.\n", "\n", "File 04_local_pop.sdf was createad!\n", "The job 04_local finished!\n" ] } ], "source": [ "cwd = os.getcwd()\n", "os.chdir(wd)\n", "! moldrug new_config.yml\n", "os.chdir(cwd)\n", "os.listdir(wd)\n", "os.chdir(cwd)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# parms = pmd.rdkit.from_sdf(os.path.join(wd, '04_local_pop.sdf')) # require parmed, rdkit\n", "# parms" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# view = nv.NGLWidget()\n", "\n", "# slider = IntSlider(max=len(parms)-1)\n", "\n", "# def show_one_ligand(change):\n", "# val = change['new']\n", "# view.show_only([val])\n", " \n", "# slider.observe(show_one_ligand, 'value')\n", "\n", "# VBox([view, slider])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# for p in parms:\n", "# view.add_structure(nv.ParmEdTrajectory(p))\n", "# view.show_only([0])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "avg = -5.402699999999999\n", "min = -6.342\n", "max = -4.724\n" ] } ], "source": [ "result = utils.decompress_pickle(os.path.join(wd, '04_local_result.pbz2'))\n", "valid_individuals_vina_score = [individual.vina_score for individual in result.pop if individual.vina_score != np.inf]\n", "print(f\"avg = {np.average(valid_individuals_vina_score)}\")\n", "print(f\"min = {min(valid_individuals_vina_score)}\")\n", "print(f\"max = {max(valid_individuals_vina_score)}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see now the fix core changes a little. On both examples (when you run the tutorial by yourself may be different) **MolDrug** can not optimize the cost function. Even so, the local_only strategy performs better respect to maximum and average cost. One of the main cause of this behavior is the desirability definition which is for vina_score:\n", "\n", "```\n", "'vina_score': {\n", " 'w': 1,\n", " 'SmallerTheBest': {\n", " 'Target': -12,\n", " 'UpperLimit': -6,\n", " 'r': 1\n", " }\n", "}\n", "```\n", "\n", "This means that if the vina score is not lower than -6 the cost function will be always 1. In this scenario all were are not optimizing over the generations the \"best\" individuals. In other words, it is important to see what is the behavior of our system (one small simulation) and then tune your parameters accordantly. On possibility for this system is change UpperLimit to, for example -4. There are a lot of other different options that you could play around (e.g. increase the generations, etc...). It is important to remark that this example is just for the tutorial, in a real project you may need to increase generations, tune the crem keywords, etc. I let it to you as a challenge. Happy simulations!\n", "\n", "Just as a bonus I let you here how to change the desirability definition." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Change the desirability" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['02_allow_grow_pop.sdf',\n", " '04_local_pop.sdf',\n", " 'new_config.yml',\n", " '01_grow_result.pbz2',\n", " 'x0161.pdbqt',\n", " '03_pure_mutate_pop.sdf',\n", " '04_local_result.pbz2',\n", " '01_grow_pop.sdf',\n", " '03_pure_mutate_result.pbz2',\n", " 'crem.db.gz',\n", " '03_pure_mutate_pop.pbz2',\n", " 'tune_config.yml',\n", " 'ref.sdf',\n", " 'config.yml',\n", " 'error.tar.gz',\n", " '02_allow_grow_result.pbz2',\n", " 'crem.db',\n", " '01_grow_pop.pbz2',\n", " '02_allow_grow_pop.pbz2',\n", " '04_local_pop.pbz2',\n", " 'x0161.pdb']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Save the new config as a yaml file\n", "tune_config = new_config.copy()\n", "tune_config['01_grow']['costfunc_kwargs']['desirability'] = {\n", " 'qed': {\n", " 'w': 1,\n", " 'LargerTheBest': {\n", " 'LowerLimit': 0.1,\n", " 'Target': 0.75,\n", " 'r': 1\n", " }\n", " },\n", " 'sa_score': {\n", " 'w': 1,\n", " 'SmallerTheBest': {\n", " 'Target': 3,\n", " 'UpperLimit': 7,\n", " 'r': 1\n", " }\n", " },\n", " 'vina_score': {\n", " 'w': 1,\n", " 'SmallerTheBest': {\n", " 'Target': -8,\n", " 'UpperLimit': -3,\n", " 'r': 1\n", " }\n", " }\n", " }\n", "\n", "with open(os.path.join(wd, 'tune_config.yml'), 'w') as f:\n", " yaml.dump(config, f)\n", "os.listdir(wd)\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Started at Mon Sep 4 18:45:47 2023\n", "You are using moldrug: 3.4.0.\n", "\n", "CommandLineHelper(yaml_file='tune_config.yml', fitness=None, outdir=None, continuation=False)\n", "\n", "\n", "\n", "\n", "Creating the first population with 10 members:\n", "100%|███████████████████████████████████████████| 10/10 [01:12<00:00, 7.27s/it]\n", "Initial Population: Best Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194)\n", "Accepted rate: 10 / 10\n", "\n", "File 01_grow_pop.sdf was createad!\n", "Evaluating generation 1 / 2:\n", "100%|█████████████████████████████████████████████| 5/5 [01:32<00:00, 18.58s/it]\n", "Generation 1: Best Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194).\n", "Accepted rate: 4 / 5\n", "\n", "Evaluating generation 2 / 2:\n", "100%|█████████████████████████████████████████████| 5/5 [01:23<00:00, 16.72s/it]\n", "File 01_grow_pop.sdf was createad!\n", "Generation 2: Best Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194).\n", "Accepted rate: 2 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 2 generations with a population of 10 individuals. A total number of 20 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.6433856814869847)\n", "Final Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194)\n", "The cost function dropped in 0.09136022310016534 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "Note: Check the running warnings and erorrs in error.tar.gz file!\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 266.59 (s).\n", "Finished at Mon Sep 4 18:50:14 2023.\n", "\n", "File 01_grow_pop.sdf was createad!\n", "The follow job 02_allow_grow started.\n", "File 02_allow_grow_pop.sdf was createad!\n", "Evaluating generation 3 / 4:\n", "100%|█████████████████████████████████████████████| 5/5 [01:35<00:00, 19.14s/it]\n", "Generation 3: Best Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194).\n", "Accepted rate: 2 / 5\n", "\n", "Evaluating generation 4 / 4:\n", "100%|█████████████████████████████████████████████| 5/5 [01:00<00:00, 12.00s/it]\n", "File 02_allow_grow_pop.sdf was createad!\n", "Generation 4: Best Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194).\n", "Accepted rate: 2 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 4 generations with a population of 10 individuals. A total number of 30 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.6433856814869847)\n", "Final Individual: Individual(idx = 1, smiles = N#CCOC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.5520254583868194)\n", "The cost function dropped in 0.09136022310016534 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 171.95 (s).\n", "Finished at Mon Sep 4 18:53:06 2023.\n", "\n", "File 02_allow_grow_pop.sdf was createad!\n", "The job 02_allow_grow finished!\n", "The follow job 03_pure_mutate started.\n", "File 03_pure_mutate_pop.sdf was createad!\n", "Evaluating generation 5 / 6:\n", "100%|█████████████████████████████████████████████| 5/5 [00:59<00:00, 11.89s/it]\n", "Generation 5: Best Individual: Individual(idx = 31, smiles = CC(=O)NS(=O)(=O)c1ccc(C(=O)OCC#N)cc1, cost = 0.5443216491859986).\n", "Accepted rate: 2 / 5\n", "\n", "Evaluating generation 6 / 6:\n", "100%|█████████████████████████████████████████████| 4/4 [01:13<00:00, 18.31s/it]\n", "File 03_pure_mutate_pop.sdf was createad!\n", "Generation 6: Best Individual: Individual(idx = 31, smiles = CC(=O)NS(=O)(=O)c1ccc(C(=O)OCC#N)cc1, cost = 0.5443216491859986).\n", "Accepted rate: 1 / 4\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 6 generations with a population of 10 individuals. A total number of 39 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.6433856814869847)\n", "Final Individual: Individual(idx = 31, smiles = CC(=O)NS(=O)(=O)c1ccc(C(=O)OCC#N)cc1, cost = 0.5443216491859986)\n", "The cost function dropped in 0.09906403230098615 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "Note: Check the running warnings and erorrs in error.tar.gz file!\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 148.44 (s).\n", "Finished at Mon Sep 4 18:55:34 2023.\n", "\n", "File 03_pure_mutate_pop.sdf was createad!\n", "The job 03_pure_mutate finished!\n", "The follow job 04_local started.\n", "File 04_local_pop.sdf was createad!\n", "Evaluating generation 7 / 8:\n", "100%|█████████████████████████████████████████████| 5/5 [00:52<00:00, 10.51s/it]\n", "Generation 7: Best Individual: Individual(idx = 31, smiles = CC(=O)NS(=O)(=O)c1ccc(C(=O)OCC#N)cc1, cost = 0.5443216491859986).\n", "Accepted rate: 4 / 5\n", "\n", "Evaluating generation 8 / 8:\n", "100%|█████████████████████████████████████████████| 5/5 [00:39<00:00, 7.93s/it]\n", "File 04_local_pop.sdf was createad!\n", "Generation 8: Best Individual: Individual(idx = 45, smiles = N#CCOC(=O)c1ccc(S(=O)(=O)NC(=O)CF)cc1, cost = 0.5266695572213174).\n", "Accepted rate: 3 / 5\n", "\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "The simulation finished successfully after 8 generations with a population of 10 individuals. A total number of 49 Individuals were seen during the simulation.\n", "Initial Individual: Individual(idx = 0, smiles = COC(=O)c1ccc(S(N)(=O)=O)cc1, cost = 0.6433856814869847)\n", "Final Individual: Individual(idx = 45, smiles = N#CCOC(=O)c1ccc(S(=O)(=O)NC(=O)CF)cc1, cost = 0.5266695572213174)\n", "The cost function dropped in 0.11671612426566735 units.\n", "\n", "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n", "\n", "Total time (2 generations): 108.05 (s).\n", "Finished at Mon Sep 4 18:57:22 2023.\n", "\n", "File 04_local_pop.sdf was createad!\n", "The job 04_local finished!\n" ] } ], "source": [ "\n", "cwd = os.getcwd()\n", "os.chdir(wd)\n", "! moldrug tune_config.yml\n", "os.chdir(cwd)\n", "os.listdir(wd)\n", "os.chdir(cwd)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "avg = -5.1836\n", "min = -5.727\n", "max = -4.8\n" ] } ], "source": [ "result = utils.decompress_pickle(os.path.join(wd, '04_local_result.pbz2'))\n", "valid_individuals_vina_score = [individual.vina_score for individual in result.pop if individual.vina_score != np.inf]\n", "print(f\"avg = {np.average(valid_individuals_vina_score)}\")\n", "print(f\"min = {min(valid_individuals_vina_score)}\")\n", "print(f\"max = {max(valid_individuals_vina_score)}\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you see the maximum is much lower than the rest of the examples. So, we get some improvements changing the desirability." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('moldrug')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "580997a69bc0f3991857025e1d93e87ed090e2c1fa4aff0ca8e9824f56baf8cb" } } }, "nbformat": 4, "nbformat_minor": 2 }