--- name: "autodock-vina-docking" description: "Molecular docking with AutoDock Vina (Python API). Receptor/ligand prep (Meeko + RDKit), grid box, docking, pose and binding energy analysis, and batch virtual screening." license: "CC-BY-4.0" --- # AutoDock Vina Molecular Docking ## Overview AutoDock Vina is one of the fastest and most widely used open-source molecular docking engines for predicting protein–ligand binding modes and affinities. This skill covers the full Python-based pipeline: receptor preparation from PDB, ligand preparation from SMILES/SDF via Meeko and RDKit, search box definition, docking execution, pose analysis, and batch virtual screening for hit identification. ## When to Use - Predicting binding poses of small molecules to a protein target - Estimating relative binding affinities (kcal/mol) for ligand ranking - Virtual screening of compound libraries against a target receptor - Validating docking protocols by re-docking co-crystallized ligands - Preparing docking inputs from SMILES strings without intermediate files - Comparing binding modes of analogs in a structure-activity relationship study - Generating starting poses for molecular dynamics simulations - Use **DiffDock** instead for blind docking when the binding site is unknown; use **GNINA** as an alternative with CNN scoring ## Prerequisites - **Python packages**: `vina`, `meeko`, `rdkit`, `prody` (for PDB fetch), `py3Dmol` (for visualization) - **External tools**: `ADFR Suite` (provides `prepare_receptor` for PDBQT conversion) — download from https://ccsb.scripps.edu/adfr/downloads/ - **Data requirements**: Protein structure (PDB file or PDB ID), ligand(s) as SMILES, SDF, or MOL2 - **Environment**: Python 3.8+, Linux or macOS recommended ```bash pip install vina meeko rdkit-pypi prody py3Dmol # ADFR Suite must be installed separately for prepare_receptor ``` ## Workflow ### Step 1: Fetch and Prepare the Receptor Download the protein structure, remove water/heteroatoms, and convert to PDBQT format. ```python import prody import subprocess from pathlib import Path # Download PDB structure (example: HIV-1 protease, PDB 1HPV) pdb_id = "1HPV" pdb_file = f"{pdb_id}.pdb" prody.fetchPDB(pdb_id, compressed=False) # Extract protein chain only (remove water and ligands) structure = prody.parsePDB(pdb_file) protein = structure.select("protein") prody.writePDB(f"{pdb_id}_protein.pdb", protein) print(f"Protein atoms: {protein.numAtoms()}") # Convert to PDBQT using ADFR Suite's prepare_receptor receptor_pdbqt = f"{pdb_id}_receptor.pdbqt" subprocess.run([ "prepare_receptor", "-r", f"{pdb_id}_protein.pdb", "-o", receptor_pdbqt, "-A", "hydrogens", # add hydrogens ], check=True) print(f"Receptor PDBQT: {receptor_pdbqt}") ``` ### Step 2: Identify the Binding Site Define the docking search box centered on the known binding site or co-crystallized ligand. ```python import numpy as np # Option A: Center on co-crystallized ligand coordinates structure = prody.parsePDB(pdb_file) ligand = structure.select("hetero and not water and not ion") if ligand is not None: center = ligand.getCoords().mean(axis=0) # Box size = ligand extent + padding extent = ligand.getCoords().max(axis=0) - ligand.getCoords().min(axis=0) box_size = extent + 10.0 # 10 Å padding on each side print(f"Binding site center: {center}") print(f"Box size: {box_size}") else: # Option B: Manual coordinates (from literature or visual inspection) center = np.array([15.0, 54.0, 17.0]) box_size = np.array([25.0, 25.0, 25.0]) print(f"Using manual box: center={center}, size={box_size}") ``` ### Step 3: Prepare the Ligand from SMILES Use RDKit for 3D coordinate generation and Meeko for PDBQT conversion. ```python from rdkit import Chem from rdkit.Chem import AllChem from meeko import MoleculePreparation, PDBQTWriterLegacy # Define ligand (example: Indinavir, HIV protease inhibitor) smiles = "CC(C)(C)NC(=O)[C@@H]1CN(CCc2ccccc2)C[C@H]1O" mol_name = "indinavir_analog" # Generate 3D coordinates with RDKit mol = Chem.MolFromSmiles(smiles) mol = Chem.AddHs(mol) AllChem.EmbedMolecule(mol, randomSeed=42) AllChem.MMFFOptimizeMolecule(mol) # Convert to PDBQT using Meeko preparator = MoleculePreparation() mol_setups = preparator.prepare(mol) # Write PDBQT string (for Vina API) or file pdbqt_string = PDBQTWriterLegacy.write_string(mol_setups[0])[0] ligand_pdbqt = f"{mol_name}.pdbqt" with open(ligand_pdbqt, "w") as f: f.write(pdbqt_string) print(f"Ligand PDBQT: {ligand_pdbqt} ({mol.GetNumAtoms()} atoms)") ``` ### Step 4: Run Docking Initialize AutoDock Vina, configure the search space, and execute docking. ```python from vina import Vina # Initialize Vina v = Vina(sf_name="vina", cpu=4) # Load receptor and ligand v.set_receptor(receptor_pdbqt) v.set_ligand_from_file(ligand_pdbqt) # Define search space v.compute_vina_maps( center=center.tolist(), box_size=box_size.tolist(), ) # Run docking v.dock( exhaustiveness=32, # Higher = more thorough (default 8) n_poses=10, # Number of output poses ) # Write output poses output_file = f"{mol_name}_docked.pdbqt" v.write_poses(output_file, n_poses=10, overwrite=True) print(f"Docking complete. Poses written to {output_file}") ``` ### Step 5: Analyze Docking Results Extract binding energies and RMSD values from the docked poses using Vina's built-in API. ```python # Method A: Vina built-in energies (most reliable) energies = v.energies(n_poses=10) # Each row: [total, inter, intra, torsions, intra_best_pose] print(f"{'Pose':<6} {'Total (kcal/mol)':<18} {'Inter':<10} {'Intra':<10}") print("-" * 44) for i, e in enumerate(energies): print(f"{i+1:<6} {e[0]:<18.2f} {e[1]:<10.2f} {e[2]:<10.2f}") print(f"\nBest pose: {energies[0][0]:.2f} kcal/mol") # Method B: Parse PDBQT output with Meeko (for RDKit conversion) from meeko import PDBQTMolecule, RDKitMolCreate pdbqt_mol = PDBQTMolecule.from_file(output_file) best_pose_rdkit = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)[0] print(f"Best pose converted to RDKit mol: {best_pose_rdkit.GetNumAtoms()} atoms") ``` ### Step 6: Visualize Docking Results Generate a 3D visualization of the docked complex using py3Dmol. ```python import py3Dmol # Load receptor and best docked pose with open(f"{pdb_id}_protein.pdb") as f: receptor_pdb = f.read() with open(output_file) as f: docked_pdbqt = f.read() # Create 3D viewer view = py3Dmol.view(width=800, height=600) view.addModel(receptor_pdb, "pdb") view.setStyle({"model": 0}, {"cartoon": {"color": "lightgrey"}}) # Add docked ligand (first model only) first_model = docked_pdbqt.split("ENDMDL")[0] + "ENDMDL" view.addModel(first_model, "pdb") view.setStyle({"model": 1}, {"stick": {"colorscheme": "greenCarbon"}}) # Zoom to ligand view.zoomTo({"model": 1}) view.show() # In Jupyter: displays interactive 3D view # To save: view.png() or view.write_html("docking_result.html") print("3D visualization rendered") ``` ### Step 7: Batch Virtual Screening Screen a library of compounds against the same receptor. ```python import pandas as pd # Define compound library compounds = pd.DataFrame({ "name": ["cpd_001", "cpd_002", "cpd_003", "cpd_004", "cpd_005"], "smiles": [ "CC(=O)Oc1ccccc1C(=O)O", # Aspirin "CC(C)Cc1ccc(cc1)C(C)C(=O)O", # Ibuprofen "OC(=O)c1ccccc1O", # Salicylic acid "CC12CCC3C(CCC4CC(=O)CCC34C)C1CCC2O", # Testosterone "c1ccc2c(c1)cc1ccc3cccc4ccc2c1c34", # Pyrene ], }) # Screen all compounds results = [] for _, row in compounds.iterrows(): try: # Prepare ligand mol = Chem.MolFromSmiles(row["smiles"]) mol = Chem.AddHs(mol) AllChem.EmbedMolecule(mol, randomSeed=42) AllChem.MMFFOptimizeMolecule(mol) mol_setups = preparator.prepare(mol) pdbqt_str = PDBQTWriterLegacy.write_string(mol_setups[0])[0] # Dock v_screen = Vina(sf_name="vina", cpu=2) v_screen.set_receptor(receptor_pdbqt) v_screen.set_ligand_from_string(pdbqt_str) v_screen.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist()) v_screen.dock(exhaustiveness=16, n_poses=1) energy = v_screen.energies(n_poses=1)[0][0] results.append({"name": row["name"], "smiles": row["smiles"], "energy_kcal": energy}) print(f" {row['name']}: {energy:.2f} kcal/mol") except Exception as e: results.append({"name": row["name"], "smiles": row["smiles"], "energy_kcal": None}) print(f" {row['name']}: FAILED ({e})") # Rank by binding energy results_df = pd.DataFrame(results).sort_values("energy_kcal") results_df.to_csv("screening_results.csv", index=False) print(f"\nTop hits:\n{results_df.head()}") ``` ### Step 8: Save and Export ```python import os os.makedirs("results", exist_ok=True) # Save summary results_df.to_csv("results/screening_results.csv", index=False) # Save best poses for top hits for _, row in results_df.head(3).iterrows(): print(f"Top hit: {row['name']} → {row['energy_kcal']:.2f} kcal/mol") print("Virtual screening complete. Results in results/screening_results.csv") ``` ## Key Parameters | Parameter | Default | Range / Options | Effect | |-----------|---------|-----------------|--------| | `exhaustiveness` | `8` | `8`-`128` | Search thoroughness; 32+ recommended for publication | | `n_poses` | `9` | `1`-`20` | Number of output binding poses | | `energy_range` | `3.0` | `1.0`-`5.0` | Max energy difference (kcal/mol) from best pose to include | | `sf_name` | `"vina"` | `"vina"`, `"ad4"`, `"vinardo"` | Scoring function choice | | `cpu` | all | `1`-`N` | Number of CPU cores for docking | | `box_size` (xyz) | — | `15`-`30` Å per side | Search space dimensions; must enclose binding site + 5-10Å padding | | `center` (xyz) | — | Binding site coordinates | Center of the search box | | `randomSeed` (RDKit) | random | any int | Reproducible 3D conformer generation | | `padding` (box) | `10.0` Å | `5.0`-`15.0` Å | Extra space around known ligand for box definition | ## Common Recipes ### Recipe: Re-docking Validation (Cognate Docking) When to use: validating your protocol by re-docking the co-crystallized ligand and checking RMSD < 2.0 Å. ```python from rdkit.Chem import AllChem, rdMolAlign # Extract co-crystallized ligand from PDB ref_ligand = Chem.MolFromPDBFile(f"{pdb_id}_ligand.pdb", removeHs=False) # Dock the same ligand # ... (use steps 3-4 above with the extracted ligand) # Calculate RMSD between docked pose and crystal structure rmsd = AllChem.GetBestRMS(ref_ligand, best_pose_rdkit) print(f"Re-docking RMSD: {rmsd:.2f} Å") print(f"Validation: {'PASS' if rmsd < 2.0 else 'FAIL'} (threshold: 2.0 Å)") ``` ### Recipe: Flexible Receptor Docking When to use: key binding-site residues need conformational freedom (e.g., induced fit). ```python # Prepare receptor with flexible sidechains (using ADFR Suite) # prepare_receptor -r protein.pdb -o rigid.pdbqt -A hydrogens # prepare_flexreceptor -r rigid.pdbqt -s "A:ARG8,A:ASP25,A:ILE50" v_flex = Vina(sf_name="vina", cpu=4) v_flex.set_receptor("rigid.pdbqt", "flex.pdbqt") # rigid + flexible parts v_flex.set_ligand_from_file(ligand_pdbqt) v_flex.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist()) v_flex.dock(exhaustiveness=64, n_poses=10) v_flex.write_poses("docked_flex.pdbqt", n_poses=5, overwrite=True) ``` ### Recipe: Scoring Only (No Docking) When to use: evaluating the binding energy of a pre-positioned ligand without running a full search. ```python v_score = Vina(sf_name="vina") v_score.set_receptor(receptor_pdbqt) v_score.set_ligand_from_file("pre_positioned_ligand.pdbqt") v_score.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist()) # Score current pose energy = v_score.score() print(f"Score: {energy[0]:.2f} kcal/mol") # Local minimization energy_min = v_score.optimize() print(f"After local optimization: {energy_min[0]:.2f} kcal/mol") v_score.write_pose("minimized.pdbqt", overwrite=True) ``` ### Recipe: Multiple Scoring Functions Comparison When to use: consensus scoring to increase confidence in docking results. ```python scoring_results = {} for sf in ["vina", "vinardo", "ad4"]: v_sf = Vina(sf_name=sf, cpu=2) v_sf.set_receptor(receptor_pdbqt) v_sf.set_ligand_from_file(ligand_pdbqt) v_sf.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist()) v_sf.dock(exhaustiveness=16, n_poses=1) scoring_results[sf] = v_sf.energies(n_poses=1)[0][0] for sf, energy in scoring_results.items(): print(f" {sf}: {energy:.2f} kcal/mol") ``` ## Expected Outputs - `{name}_docked.pdbqt` — Docked poses in PDBQT format with binding energies in header - `results/screening_results.csv` — Virtual screening results: compound name, SMILES, binding energy (kcal/mol) - `{pdb_id}_receptor.pdbqt` — Prepared receptor in PDBQT format - Figures: 3D docking visualization (interactive py3Dmol or static image) - Console output: ranked binding energies and RMSD values per pose ## Troubleshooting | Problem | Cause | Solution | |---------|-------|----------| | `prepare_receptor` not found | ADFR Suite not in PATH | Add ADFR Suite bin to `$PATH` or use full path | | `RuntimeError: receptor not set` | Forgot to call `set_receptor` | Call `v.set_receptor(pdbqt_file)` before docking | | Very positive docking scores (>0) | Ligand outside box or bad geometry | Check box center/size covers binding site; verify 3D coords | | All poses identical | `exhaustiveness` too low | Increase to 32-64 for reliable sampling | | `Meeko MoleculePreparation` error | Missing hydrogens on input mol | Always call `Chem.AddHs(mol)` before Meeko | | RMSD > 2Å in re-docking | Box too small or wrong center | Expand box by 5Å; verify center on co-crystallized ligand | | `EmbedMolecule` returns -1 | RDKit failed to generate 3D coords | Use `AllChem.EmbedMolecule(mol, maxAttempts=1000)` or try `useRandomCoords=True` | | Slow screening (>1min/compound) | High exhaustiveness + large box | Reduce `exhaustiveness` to 8-16 for screening; narrow box | | `PDBQTWriterLegacy` not found | Old Meeko version | `pip install meeko>=0.5` — API changed from `write_pdbqt_string` | | Inconsistent energies across runs | Non-deterministic search | Set `seed` parameter in `v.dock(seed=42)` for reproducibility | ## Bundled Resources This skill includes reference files for deeper lookup. Read these on demand. ### references/receptor_preparation_guide.md Detailed guide for receptor preparation: handling missing residues, protonation states (pH-dependent), metal ions, cofactors, and multi-chain complexes. Decision tree for when to use PDB2PQR, PROPKA, or manual protonation. ### references/scoring_functions_comparison.md Comparison of Vina, Vinardo, and AD4 scoring functions: accuracy benchmarks, speed trade-offs, and recommendations by target class (kinase, protease, GPCR, nuclear receptor). ## References - [AutoDock Vina documentation](https://autodock-vina.readthedocs.io/) — Official docs and Python API (Apache-2.0) - [Eberhardt et al. (2021)](https://doi.org/10.1021/acs.jcim.1c00203) — "AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings", *J Chem Inf Model* - [Meeko: Preparation of small molecules for AutoDock](https://github.com/forlilab/Meeko) — Ligand PDBQT preparation from RDKit (LGPL) - [ADFR Suite](https://ccsb.scripps.edu/adfr/downloads/) — Receptor preparation tools from Scripps (free academic license) - [Forli et al. (2016)](https://doi.org/10.1038/nprot.2016.051) — "Computational protein-ligand docking and virtual drug screening with the AutoDock suite", *Nat Protoc* - [RDKit documentation](https://www.rdkit.org/docs/) — Cheminformatics toolkit for ligand handling (BSD)