{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How-to\n",
    "This notebook serves as a practical guide to common questions users might have. Some functionalities of MolDrug may not be available on pypi or conda yet. This could be because the new version is not yet release, but in brief it is going to be. You could install directly from the repo if some problem pops up. Just paste the following in a code cell:\n",
    "\n",
    "```bash\n",
    "! pip install git+https://github.com/ale94mleon/MolDrug.git@main\n",
    "```\n",
    "\n",
    "**Table of content**\n",
    "\n",
    "1. [Execute moldrug from the command line](#Execute-moldrug-from-the-command-line).\n",
    "2. [Create your own cost function](#Create-your-own-cost-function)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import section\n",
    "from moldrug.data import ligands, boxes, receptors\n",
    "from rdkit import Chem\n",
    "import tempfile, os, requests,inspect, gzip, shutil, yaml, sys\n",
    "from moldrug import utils\n",
    "from typing import List\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Execute moldrug from the command line\n",
    "\n",
    "MolDrug it is meant to be for using as a python module. However it has capabilities to work from the command line. Due to the complexity of the input for a normal simulation, [yaml](https://yaml.org/) structured file are used for the input specification. Let first see the help of moldrug."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "usage: moldrug [-h] [-v] [-f FITNESS] [-o OUTDIR] yaml_file\n",
      "\n",
      "For information of MolDrug:\n",
      "    Docs: https://moldrug.readthedocs.io/en/latest/\n",
      "    Source Code: https://github.com/ale94mleon/moldrug\n",
      "\n",
      "positional arguments:\n",
      "  yaml_file             The configuration yaml file\n",
      "\n",
      "optional arguments:\n",
      "  -h, --help            show this help message and exit\n",
      "  -v, --version         show program's version number and exit\n",
      "  -f FITNESS, --fitness FITNESS\n",
      "                        The path to the user-custom fitness module; inside of which the given custom cost function must be implemented. See the docs for how to do it properly. E.g. my/awesome/fitness_module.py.By default will look in the moldrug.fitness module.\n",
      "  -o OUTDIR, --outdir OUTDIR\n",
      "                        The path to where all the files should be written. By default the current working directory will be used (where the command line was invoked).\n"
     ]
    }
   ],
   "source": [
    "! moldrug -h"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As it shows, only one positional arguments is needed `yaml_file`. Then you could give:\n",
    "\n",
    "-   `fitness`: A customized fitness python module where your cost function must be implemented.\n",
    "\n",
    "-   `outdir`: The out directory.\n",
    "\n",
    "Lets go steep by steep"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### yaml_file anatomy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "COC(=O)C=1C=CC(=CC1)S(=O)(=O)N\n",
      "{'A': {'boxcenter': [12.11, 1.84, 23.56], 'boxsize': [22.5, 22.5, 22.5]}}\n",
      "/tmp/tmp0n8w1rkf ['crem.db.gz', 'x0161.pdbqt', 'crem.db']\n"
     ]
    }
   ],
   "source": [
    "# Getting the data\n",
    "tmp_path = tempfile.TemporaryDirectory()\n",
    "lig = ligands.r_x0161\n",
    "box = boxes.r_x0161\n",
    "with open(os.path.join(tmp_path.name, 'x0161.pdbqt'), 'w') as f:\n",
    "    f.write(receptors.r_x0161)\n",
    "\n",
    "\n",
    "# Getting the CReM data base\n",
    "url = \"http://www.qsar4u.com/files/cremdb/replacements02_sc2.db.gz\"\n",
    "r = requests.get(url, allow_redirects=True)\n",
    "crem_dbgz_path = os.path.join(tmp_path.name,'crem.db.gz')\n",
    "crem_db_path = os.path.join(tmp_path.name,'crem.db')\n",
    "open(crem_dbgz_path, 'wb').write(r.content)\n",
    "with gzip.open(crem_dbgz_path, 'rb') as f_in:\n",
    "    with open(crem_db_path, 'wb') as f_out:\n",
    "        shutil.copyfileobj(f_in, f_out)\n",
    "\n",
    "print(lig)\n",
    "print(box)\n",
    "print(tmp_path.name, os.listdir(tmp_path.name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have all that we need. The SMILES, the definition of the box, the CReM data base and the receptor. But first let see what are the arguments that the function `utils.Local` needs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['self',\n",
       " 'seed_mol',\n",
       " 'crem_db_path',\n",
       " 'costfunc',\n",
       " 'grow_crem_kwargs',\n",
       " 'costfunc_kwargs',\n",
       " 'AddHs']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inspect.getfullargspec(utils.Local.__init__).args"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will pass all of this parameters using config.yaml file.\n",
    "\n",
    "**Note**: Here we will not work neither modify with the desirability parameter of the cost function. We will talk about it in a separate tutorial inside of Advance Topics.\n",
    "\n",
    "**Warning!**: Check your correct temporal file and change it accordantly. Of course you could also set a normal directory and safe the results of the simulation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Here is an example yaml file to work with the class `utils.Local`.\n",
    "\n",
    "```yaml\n",
    "main:\n",
    "  type: Local\n",
    "  njobs: 1\n",
    "  pick: 1\n",
    "  AddHs: False\n",
    "  seed_mol: COC(=O)C=1C=CC(=CC1)S(=O)(=O)N\n",
    "  costfunc: Cost\n",
    "  costfunc_kwargs:\n",
    "    vina_executable: vina\n",
    "    receptor_path: /tmp/your_tmp_dir/x0161.pdbqt\n",
    "    boxcenter:\n",
    "      - 12.11\n",
    "      - 1.84\n",
    "      - 23.56\n",
    "    boxsize:\n",
    "      - 22.5\n",
    "      - 22.5\n",
    "      - 22.5\n",
    "    exhaustiveness: 4\n",
    "    ncores: 12\n",
    "    num_modes: 1\n",
    "  crem_db_path: /tmp/your_tmp_dir/crem.db\n",
    "  grow_crem_kwargs:\n",
    "    radius: 3\n",
    "    min_atoms: 1\n",
    "    max_atoms: 2\n",
    "    ncores: 12\n",
    "```\n",
    "The structure of a yaml file is like a python dictionary but more human readable (see this [youtube video](https://www.youtube.com/watch?v=1uFVr15xDGg&list=PL6ebkIZFT4xXiVdpOeKR4o_sKLSY0aQf_&index=11) if you are not familiar).\n",
    "\n",
    "First we have the directrix:\n",
    "\n",
    "```yaml\n",
    "main:\n",
    "```\n",
    "\n",
    "This is just the name of your main project and could be any word (we will see that `moldrug.utils.GA` accept also follow projects). Then we have:\n",
    "\n",
    "```yaml\n",
    "  type: Local\n",
    "  njobs: 1\n",
    "  pick: 1\n",
    "  AddHs: False\n",
    "```\n",
    "\n",
    "Look how this section is inside of `main` (one indentation level). `type` is the name of the class inside `moldrug.utils`; for now could be `GA` or `Local` (case insensitive). In this case we want to use the class `Local`. `njobs`, `pick` and `AddHs` are the parameters when the class `Local` (or `GA`) is call. The rest of the parameters are just the parameters that needs `Local` for the initialization. All of them are at the same level of the previous ones. Because `costfunc_kwargs` is a dictionary; we add for its value a new indentation level:\n",
    "\n",
    "```yaml\n",
    "  costfunc_kwargs:\n",
    "    vina_executable: vina\n",
    "    receptor_path: /tmp/your_tmp_dir/x0161.pdbqt\n",
    "    boxcenter:\n",
    "      - 12.11\n",
    "      - 1.84\n",
    "      - 23.56\n",
    "    boxsize:\n",
    "      - 22.5\n",
    "      - 22.5\n",
    "      - 22.5\n",
    "    exhaustiveness: 4\n",
    "    ncores: 12\n",
    "    num_modes: 1\n",
    "```\n",
    "As you could imagine the keyword boxcenter is a list and this is one of the way to represent list in yaml. Cool right?\n",
    "\n",
    "The next dictionary is the same that the yaml file and we will save it in the temporal file that we created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['crem.db.gz', 'x0161.pdbqt', 'config.yml', 'crem.db']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "config = {\n",
    "    \"main\": {\n",
    "        \"type\": \"Local\",\n",
    "        \"njobs\": 1,\n",
    "        \"pick\": 1,\n",
    "        \"seed_mol\": lig,\n",
    "        \"costfunc\": \"Cost\",\n",
    "        \"costfunc_kwargs\": {\n",
    "            \"vina_executable\": \"vina\",\n",
    "            \"receptor_path\": os.path.join(tmp_path.name, 'x0161.pdbqt'),\n",
    "            \"boxcenter\": [\n",
    "                12.11,\n",
    "                1.84,\n",
    "                23.56\n",
    "            ],\n",
    "            \"boxsize\": [\n",
    "                22.5,\n",
    "                22.5,\n",
    "                22.5\n",
    "            ],\n",
    "            \"exhaustiveness\": 4,\n",
    "            \"ncores\": 12,\n",
    "            \"num_modes\": 1\n",
    "        },\n",
    "        \"crem_db_path\": os.path.join(tmp_path.name, 'crem.db'),\n",
    "        \"grow_crem_kwargs\": {\n",
    "            \"radius\": 3,\n",
    "            \"min_atoms\": 1,\n",
    "            \"max_atoms\": 2,\n",
    "            \"ncores\": 12\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\n",
    "# Save the config as a yaml file\n",
    "with open(os.path.join(tmp_path.name, 'config.yml'), 'w') as f:\n",
    "    yaml.dump(config, f)\n",
    "os.listdir(tmp_path.name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Move to the directory and run moldrug"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The main job is being executed.\n",
      "Calculating cost function...\n",
      "100%|█████████████████████████████████████████████| 2/2 [00:06<00:00,  3.20s/it]\n",
      "File local_pop.sdf was createad!\n",
      "The main job finished!.\n"
     ]
    }
   ],
   "source": [
    "cwd = os.getcwd()\n",
    "os.chdir(tmp_path.name)\n",
    "! moldrug config.yml\n",
    "os.chdir(cwd)\n",
    "os.listdir(tmp_path.name)\n",
    "os.chdir(cwd)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you see the simulation run successfully and the following files were generated:\n",
    "\n",
    "1.  `local_result.pbz2`; the compresses version of the `Local` class with all the information of the simulation (binary file).\n",
    "\n",
    "2.  `local_pop.sdf`; the mol structures. This file is useful to use inside Pymol."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Specify a custom cost function\n",
    "\n",
    "The following is a simple cost function based on the vina score and very similar to `moldrug.fitness.Cost` used by **MolDrug**. The details on the parameters and how to correctly implement it will be discussed letter on the [following section](#create-your-own-cost-function).\n",
    "\n",
    "\n",
    "```python\n",
    "def MyAwesomeCost(Individual:utils.Individual, wd:str = '.vina_jobs', vina_executable:str = 'vina', receptor_path:str = None, boxcenter:List[float] = None, boxsize:List[float] =None, exhaustiveness:int = 8, ncores:int = 1,  num_modes:int = 1):\n",
    "\n",
    "    # Getting Vina score\n",
    "    cmd = f\"{vina_executable} --receptor {receptor_path} --ligand {os.path.join(wd, f'{Individual.idx}.pdbqt')} \"\\\n",
    "        f\"--center_x {boxcenter[0]} --center_y {boxcenter[1]} --center_z {boxcenter[2]} \"\\\n",
    "        f\"--size_x {boxsize[0]} --size_y {boxsize[1]} --size_z {boxsize[2]} \"\\\n",
    "        f\"--out {os.path.join(wd, f'{Individual.idx}_out.pdbqt')} --cpu {ncores} --exhaustiveness {exhaustiveness} --num_modes {num_modes}\"\n",
    "\n",
    "    # Creating the ligand pdbqt\n",
    "    with open(os.path.join(wd, f'{Individual.idx}.pdbqt'), 'w') as l:\n",
    "        l.write(Individual.pdbqt)\n",
    "    utils.run(cmd)\n",
    "\n",
    "    # Getting the information\n",
    "    best_energy = utils.VINA_OUT(os.path.join(wd, f'{Individual.idx}_out.pdbqt')).BestEnergy()\n",
    "    # Changing the 3D conformation by the conformation of the binding pose\n",
    "    Individual.pdbqt = ''.join(best_energy.chunk)\n",
    "\n",
    "    # Getting the Scoring function of Vina\n",
    "    Individual.vina_score = best_energy.freeEnergy\n",
    "\n",
    "    # Getting the Scoring function of Vina and assign it to the cost attribute of the Individual\n",
    "    Individual.cost = best_energy.freeEnergy\n",
    "    return Individual\n",
    "```\n",
    "\n",
    "Because this cost function accept the same parameters we can use the same configuration yaml file; but the name of the function is different. So we have to modify that parameter. Let's save the function in a .py file with the corresponded import statements \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['local_result.pbz2',\n",
       " 'crem.db.gz',\n",
       " 'x0161.pdbqt',\n",
       " 'config.yml',\n",
       " 'local_pop.sdf',\n",
       " 'crem.db',\n",
       " 'config_custom_fitness.yml']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a new config file\n",
    "# Save the config as a yaml file\n",
    "config['main']['costfunc'] = 'MyAwesomeCost'\n",
    "with open(os.path.join(tmp_path.name, 'config_custom_fitness.yml'), 'w') as f:\n",
    "    yaml.dump(config, f)\n",
    "os.listdir(tmp_path.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['local_result.pbz2',\n",
       " 'crem.db.gz',\n",
       " 'x0161.pdbqt',\n",
       " 'config.yml',\n",
       " 'local_pop.sdf',\n",
       " 'crem.db',\n",
       " 'MyCustomFitness.py',\n",
       " 'config_custom_fitness.yml']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "str_module = \"\"\"#!/usr/bin/env python3\n",
    "# -*- coding: utf-8 -*-\n",
    "from moldrug import utils\n",
    "from typing import List\n",
    "import os\n",
    "\n",
    "def MyAwesomeCost(Individual:utils.Individual, wd:str = '.vina_jobs', vina_executable:str = 'vina', receptor_path:str = None, boxcenter:List[float] = None, boxsize:List[float] =None, exhaustiveness:int = 8, ncores:int = 1,  num_modes:int = 1):\n",
    "\n",
    "    # Getting Vina score\n",
    "    cmd = f\"{vina_executable} --receptor {receptor_path} --ligand {os.path.join(wd, f'{Individual.idx}.pdbqt')} \"\\\n",
    "        f\"--center_x {boxcenter[0]} --center_y {boxcenter[1]} --center_z {boxcenter[2]} \"\\\n",
    "        f\"--size_x {boxsize[0]} --size_y {boxsize[1]} --size_z {boxsize[2]} \"\\\n",
    "        f\"--out {os.path.join(wd, f'{Individual.idx}_out.pdbqt')} --cpu {ncores} --exhaustiveness {exhaustiveness} --num_modes {num_modes}\"\n",
    "\n",
    "    # Creating the ligand pdbqt\n",
    "    with open(os.path.join(wd, f'{Individual.idx}.pdbqt'), 'w') as l:\n",
    "        l.write(Individual.pdbqt)\n",
    "    utils.run(cmd)\n",
    "\n",
    "    # Getting the information\n",
    "    best_energy = utils.VINA_OUT(os.path.join(wd, f'{Individual.idx}_out.pdbqt')).BestEnergy()\n",
    "    # Changing the 3D conformation by the conformation of the binding pose\n",
    "    Individual.pdbqt = ''.join(best_energy.chunk)\n",
    "\n",
    "    # Getting the Scoring function of Vina\n",
    "    Individual.vina_score = best_energy.freeEnergy\n",
    "\n",
    "    # Getting the Scoring function of Vina and assign it to the cost attribute of the Individual\n",
    "    Individual.cost = best_energy.freeEnergy\n",
    "    return Individual\n",
    "\"\"\"\n",
    "# Saving the new fitness\n",
    "with open(os.path.join(tmp_path.name, 'MyCustomFitness.py'), 'w') as f:\n",
    "    f.write(str_module)\n",
    "os.listdir(tmp_path.name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we only need to call MolDrug and specify the new fitness module to use. It is recommendable to output in on folder the results; therefore also use the option `outdir` of the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The main job is being executed.\n",
      "Calculating cost function...\n",
      "100%|█████████████████████████████████████████████| 2/2 [00:05<00:00,  2.76s/it]\n",
      "File local_pop.sdf was createad!\n",
      "The main job finished!.\n"
     ]
    }
   ],
   "source": [
    "cwd = os.getcwd()\n",
    "os.chdir(tmp_path.name)\n",
    "! moldrug config_custom_fitness.yml --fitness MyCustomFitness.py --outdir custom_fitness\n",
    "os.listdir(tmp_path.name)\n",
    "os.chdir(cwd)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['local_result.pbz2', 'crem.db.gz', 'custom_fitness', 'x0161.pdbqt', 'config.yml', 'local_pop.sdf', 'crem.db', 'MyCustomFitness.py', 'config_custom_fitness.yml']\n",
      "['local_result.pbz2', 'CustomMolDrugFitness.py', 'local_pop.sdf', '__pycache__']\n"
     ]
    }
   ],
   "source": [
    "print(os.listdir(tmp_path.name))\n",
    "print(os.listdir(os.path.join(tmp_path.name, 'custom_fitness')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see the simulation run successfully! But be aware of a possible issue! Because the cost function used is not actually inside of MolDrug. If we would like to use on the future the binary file `local_result.pbz2`. We have first say to Python where to find the used fitness function. For reproducibility and also peace of mind, MolDrug will generate a new file: `CustomMolDrugFitness.py` this is nothing more than a copy of your implemented python fitness module. This module is the one used for the internal calculation of MolDrug. Therefore, we have to say to python where is `CustomMolDrugFitness` and append to `sys.path` in order to open the binary file. If we do not do that we will get an error. Let see the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This was the problem: No module named 'CustomMolDrugFitness'. So we have to add to the system path the directory where CustomMolDrugFitness.py is located.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<moldrug.utils.Local at 0x7f47ac62c940>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "try:\n",
    "    whole_result = utils.decompress_pickle(os.path.join(tmp_path.name, 'custom_fitness', 'local_result.pbz2'))\n",
    "except Exception as e:\n",
    "    print(f\"This was the problem: {e}. So we have to add to the system path the directory where CustomMolDrugFitness.py is located.\")   \n",
    "    sys.path.append(os.path.join(tmp_path.name, 'custom_fitness'))\n",
    "    whole_result = utils.decompress_pickle(os.path.join(tmp_path.name, 'custom_fitness', 'local_result.pbz2'))\n",
    "\n",
    "whole_result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<function CustomMolDrugFitness.MyAwesomeCost(Individual: moldrug.utils.Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', receptor_path: str = None, boxcenter: List[float] = None, boxsize: List[float] = None, exhaustiveness: int = 8, ncores: int = 1, num_modes: int = 1)>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The cost modify cost function\n",
    "whole_result.costfunc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<function moldrug.fitness.Cost(Individual: moldrug.utils.Individual, wd: str = '.vina_jobs', vina_executable: str = 'vina', receptor_path: str = None, boxcenter: List[float] = None, boxsize: List[float] = None, exhaustiveness: int = 8, ncores: int = 1, num_modes: int = 1, desirability: Dict = {'qed': {'w': 1, 'LargerTheBest': {'LowerLimit': 0.1, 'Target': 0.75, 'r': 1}}, 'sa_score': {'w': 1, 'SmallerTheBest': {'Target': 3, 'UpperLimit': 7, 'r': 1}}, 'vina_score': {'w': 1, 'SmallerTheBest': {'Target': -12, 'UpperLimit': -6, 'r': 1}}})>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The original cost function form the first simulation\n",
    "(utils.decompress_pickle(os.path.join(tmp_path.name, 'local_result.pbz2'))).costfunc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You could see the difference between the two cost functions and the error printed in the try-except block. Finally let's take a look to the sys.path; check that at the end it is our path with our new fitness module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['/home/ale/GITLAB/moldrug/docs/notebooks',\n",
       " '/home/ale/anaconda3/envs/moldrug/lib/python39.zip',\n",
       " '/home/ale/anaconda3/envs/moldrug/lib/python3.9',\n",
       " '/home/ale/anaconda3/envs/moldrug/lib/python3.9/lib-dynload',\n",
       " '',\n",
       " '/home/ale/anaconda3/envs/moldrug/lib/python3.9/site-packages',\n",
       " '/home/ale/GITLAB/moldrug',\n",
       " '/tmp/tmp0n8w1rkf/custom_fitness']"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sys.path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create your own cost function\n",
    "\n",
    "Create your own cost function should be straightforward as soon as we understand how MolDrug works with it. So, let's take a look again to the function defined previously: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def MyAwesomeCost(Individual:utils.Individual, wd:str = '.vina_jobs', vina_executable:str = 'vina', receptor_path:str = None, boxcenter:List[float] = None, boxsize:List[float] =None, exhaustiveness:int = 8, ncores:int = 1,  num_modes:int = 1):\n",
    "\n",
    "    # Getting Vina score\n",
    "    cmd = f\"{vina_executable} --receptor {receptor_path} --ligand {os.path.join(wd, f'{Individual.idx}.pdbqt')} \"\\\n",
    "        f\"--center_x {boxcenter[0]} --center_y {boxcenter[1]} --center_z {boxcenter[2]} \"\\\n",
    "        f\"--size_x {boxsize[0]} --size_y {boxsize[1]} --size_z {boxsize[2]} \"\\\n",
    "        f\"--out {os.path.join(wd, f'{Individual.idx}_out.pdbqt')} --cpu {ncores} --exhaustiveness {exhaustiveness} --num_modes {num_modes}\"\n",
    "\n",
    "    # Creating the ligand pdbqt\n",
    "    with open(os.path.join(wd, f'{Individual.idx}.pdbqt'), 'w') as l:\n",
    "        l.write(Individual.pdbqt)\n",
    "    utils.run(cmd)\n",
    "\n",
    "    # Getting the information\n",
    "    best_energy = utils.VINA_OUT(os.path.join(wd, f'{Individual.idx}_out.pdbqt')).BestEnergy()\n",
    "    # Changing the 3D conformation by the conformation of the binding pose\n",
    "    Individual.pdbqt = ''.join(best_energy.chunk)\n",
    "\n",
    "    # Getting the Scoring function of Vina\n",
    "    Individual.vina_score = best_energy.freeEnergy\n",
    "\n",
    "    # Getting the Scoring function of Vina and assign it to the cost attribute of the Individual\n",
    "    Individual.cost = best_energy.freeEnergy\n",
    "    return Individual"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`MyAwesomeCost` function it will only:\n",
    "\n",
    "1. Perform Docking\n",
    "2. Update:\n",
    "    - `pdbqt` attribute of `Individual`\n",
    "    - `cost` attribute of `Individual` (the must important in order to MolDrug works properly)\n",
    "3. Assign `vina_score` attribute to `Individual`\n",
    "\n",
    "Basically the idea is give a number to the attribute `cost` of the passed `Individual` instance. MolDrug works based on the class `moldrug.utils.Individual`; this is just a representation of each chemical structure. Therefore all the custom cost function must work based on this class; in oder words, \"the chemical structure input\" must be coded as an `Individual` and it will be the first positional argument to pass. The rest is completely optional. If the function perform I/O operations you can also provided as keyword `wd`. By default `moldrug.utils.GA` and `moldrug.utils.Local` create a temporal directory (in /tmp) even when you specify some values for it. This could be a problem if your function create big files (bigger than the capacity of your /tmp directory) or if you would like to preserve some of this files. In this case just create an alternative keyword (like `wd_dont_touch`) and work based on it. The other important idea to have in mind is that `moldrug.utils.GA` perform minimization (the best element is the one with lower `cost` attribute). The following is probably the easiest and dummy cost function possible. But I bring it here to show that there is not limitation on how to create your own cost function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "Creating the first population with 20 members:\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 20/20 [00:00<00:00, 1313.31it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initial Population: Best individual: Individual(idx = 6, smiles = CCCCCCC=CCO, cost = 0.04045062715899039)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluating generation 1 / 2:\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 20/20 [00:00<00:00, 606.98it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generation 1: Best Individual: Individual(idx = 22, smiles = C#CC1(O)CCCCCCCC1, cost = 0.03751898267200682).\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluating generation 2 / 2:\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 20/20 [00:00<00:00, 798.06it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generation 2: Best Individual: Individual(idx = 22, smiles = C#CC1(O)CCCCCCCC1, cost = 0.03751898267200682).\n",
      "\n",
      "\n",
      "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n",
      "\n",
      "The simulation finished successfully after 2 generations with a population of 20 individuals. A total number of 60 Individuals were seen during the simulation.\n",
      "Initial Individual: Individual(idx = 0, smiles = CCCC(=O)OCCN, cost = 0.6813211889306405)\n",
      "Final Individual: Individual(idx = 22, smiles = C#CC1(O)CCCCCCCC1, cost = 0.03751898267200682)\n",
      "The cost function droped in 0.6438022062586337 units.\n",
      "\n",
      "=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+\n",
      "\n",
      "'__call__'  92583.43 ms\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "def MyPerfectDummyCost(Individual):\n",
    "    Individual.cost = random.random()\n",
    "    return Individual\n",
    "\n",
    "ga = utils.GA(\n",
    "    Chem.MolFromSmiles('CCCC(=O)OCCN'),\n",
    "    crem_db_path = os.path.join(tmp_path.name, 'crem.db'),\n",
    "    maxiter = 2,\n",
    "    popsize = 20,\n",
    "    costfunc=MyPerfectDummyCost,\n",
    "    costfunc_kwargs={}\n",
    "    )\n",
    "ga(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`moldrug.utils.GA` exports an sdf file that contains the individual of the current generations every `save_pop_every_gen`. For that, it uses the information contained in the `pdbqt` or `pdbqts` attributes (depending if single or multi-receptor is used respectively). Therefore if we use docking we must update the pdbqt attribute with the pose obtained by vina. If we don't do that, we will just get a conformation generated by RDKit that is automatically created when a instance of `utils.Individual` is made it."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.13 ('moldrug')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "580997a69bc0f3991857025e1d93e87ed090e2c1fa4aff0ca8e9824f56baf8cb"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}