{ "cells": [ { "cell_type": "markdown", "id": "3ee4c336", "metadata": {}, "source": [ "One of the problems with using the results from quantum chemical calculations with the RDKit is that typical QM output formats just include atoms and their positions: since the calculations don't need bond orders, they don't show up in the output.\n", "The problem of assigning correct bond orders to the atoms in a molecule based solely on atomic positions (and the overall charge on the molecule) is a non-trivial one, and we've never had a good answer in the RDKit.\n", "\n", "A few years ago Jan Jensen and his group published `xyz2mol`, an open-source, RDKit-based solution to this problem written in Python: https://github.com/jensengroup/xyz2mol. During this year's Google Summer of Code, Sreya Gogineni did a C++ port of the Python code and integrated it into the RDKit core for the 2022.09 release.\n", "Here's the [project description](https://summerofcode.withgoogle.com/programs/2022/projects/ugO4HoEX) and here's Sreya's [\"final report\"](https://github.com/rdkit/rdkit/pull/5557), which is also the PR where we merged her changes into the RDKit core.\n", "\n", "This post was originally just going to be a quick introduction to how to use that code. However, since I was having fun with it, I went ahead and did some testing on a bunch of 3D structures from QM9." 
] }, { "cell_type": "code", "execution_count": 1, "id": "1327e840", "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.336147Z", "start_time": "2022-12-18T06:40:39.250244Z" } }, "outputs": [ { "data": { "text/plain": [ "'2022.09.1'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from rdkit import Chem\n", "from rdkit.Chem import Draw\n", "from rdkit.Chem.Draw import IPythonConsole\n", "IPythonConsole.ipython_3d = True\n", "import rdkit\n", "rdkit.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using rdDetermineBonds" ] }, { "cell_type": "markdown", "id": "08fc07a8", "metadata": {}, "source": [ "To get some testing files, I downloaded some structures from the [QM9 dataset](https://figshare.com/collections/Quantum_chemistry_structures_and_properties_of_134_kilo_molecules/978904). Here's what those look like: " ] }, { "cell_type": "code", "execution_count": 2, "id": "fe6c18ba", "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.446374Z", "start_time": "2022-12-18T06:40:39.337014Z" }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19\r\n", "gdb 107313\t2.67642\t1.59305\t1.14971\t3.3443\t81.12\t-0.2359\t-0.0506\t0.1853\t1106.1507\t0.159794\t-385.918216\t-385.909962\t-385.909018\t-385.950934\t31.892\t\r\n", "C\t 0.0645055554\t 1.4843171326\t 0.3723315122\t-0.379845\r\n", "C\t-0.001915467\t 0.0516984051\t-0.1729357038\t-0.277772\r\n", "C\t-1.4001624807\t-0.5304376801\t-0.1487075838\t 0.233551\r\n", "C\t-1.9909553844\t-1.0379429662\t 1.1601625215\t-0.307513\r\n", "C\t-1.7182271444\t-1.9962888369\t 0.0152208454\t-0.158294\r\n", "C\t-2.905416575\t-2.2627721735\t-0.8489696604\t-0.065615\r\n", "C\t-3.3347497536\t-1.1525477782\t-1.4661914556\t-0.191532\r\n", "C\t-2.4353480328\t-0.0114763902\t-1.1388331005\t 0.336742\r\n", "O\t-2.4853518918\t 1.1093228549\t-1.595794586\t-0.340877\r\n", "H\t 1.0876615149\t 1.8699729825\t 0.3263064541\t 
0.106173\r\n", "H\t-0.5836533924\t 2.142053447\t-0.2101812431\t 0.152698\r\n", "H\t-0.2626541721\t 1.5210712578\t 1.4170099662\t 0.108877\r\n", "H\t 0.3710335675\t 0.0343653078\t-1.2041622283\t 0.107526\r\n", "H\t 0.662791197\t-0.6008609283\t 0.4071642198\t 0.091428\r\n", "H\t-1.3142481944\t-1.0584396879\t 2.0096409673\t 0.122076\r\n", "H\t-3.0252533013\t-0.8199288984\t 1.4038330438\t 0.125348\r\n", "H\t-0.9412329872\t-2.7436632889\t 0.1278385726\t 0.101319\r\n", "H\t-3.37329736\t-3.2394582332\t-0.9072563749\t 0.118603\r\n", "H\t-4.2004762777\t-1.0463782959\t-2.1061677066\t 0.117109\r\n", "88.1998\t132.6788\t204.3282\t214.9254\t283.2151\t320.8425\t354.2003\t451.4683\t471.7609\t631.475\t661.0085\t735.6532\t750.9738\t778.4916\t839.8756\t849.5692\t876.8931\t897.9162\t967.6602\t980.1789\t992.0568\t1026.4327\t1049.1839\t1072.0692\t1075.099\t1111.0662\t1128.3577\t1138.5104\t1233.4573\t1265.5356\t1338.63\t1352.428\t1365.7919\t1389.2764\t1408.176\t1477.5353\t1487.9659\t1495.4859\t1514.6888\t1630.2685\t1799.8365\t3021.0204\t3038.158\t3057.8354\t3101.9102\t3122.4464\t3138.2864\t3177.3129\t3193.7776\t3214.9261\t3232.806\r\n", "CCC12CC1C=CC2=O\tCC[C@]12C[C@H]1C=CC2=O\t\r\n", "InChI=1S/C8H10O/c1-2-8-5-6(8)3-4-7(8)9/h3-4,6H,2,5H2,1H3\tInChI=1S/C8H10O/c1-2-8-5-6(8)3-4-7(8)9/h3-4,6H,2,5H2,1H3/t6-,8+/m1/s1\r\n" ] } ], "source": [ "!cat ../data/dsgdb9nsd_107313.xyz" ] }, { "cell_type": "markdown", "id": "b49d3be8", "metadata": {}, "source": [ "Sreya also added an XYZ file format parser to the RDKit, but these files include a bunch of additional information that we need to strip out. Here's the code for that:" ] }, { "cell_type": "code", "execution_count": 3, "id": "8b3d7230", "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.465614Z", "start_time": "2022-12-18T06:40:39.452498Z" } }, "outputs": [], "source": [ "# the XYZ files from QM9 aren't really XYZ... 
clean them up:\n", "def cleanup_qm9_xyz(fname):\n", "    # read the whole file; the first line holds the atom count\n", "    with open(fname) as inf:\n", "        ind = inf.readlines()\n", "    nAts = int(ind[0])\n", "    # There are two smiles in the data: the one from GDB and the one assigned from the\n", "    # 3D coordinates in the QM9 paper using OpenBabel (I think).\n", "    gdb_smi,relax_smi = ind[-2].split()[:2]\n", "    # blank out the property line and drop everything after the atom block\n", "    ind[1] = '\\n'\n", "    ind = ind[:nAts+2]\n", "    for i in range(2,nAts+2):\n", "        # drop the trailing partial-charge column from each atom line\n", "        fields = ind[i].split('\\t')\n", "        fields.pop(-1)\n", "        ind[i] = '\\t'.join(fields)+'\\n'\n", "    ind = ''.join(ind)\n", "    return ind,gdb_smi,relax_smi" ] }, { "cell_type": "code", "execution_count": 4, "id": "8ca4107b", "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.547937Z", "start_time": "2022-12-18T06:40:39.470993Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19\n", "\n", "C\t 0.0645055554\t 1.4843171326\t 0.3723315122\n", "C\t-0.001915467\t 0.0516984051\t-0.1729357038\n", "C\t-1.4001624807\t-0.5304376801\t-0.1487075838\n", "C\t-1.9909553844\t-1.0379429662\t 1.1601625215\n", "C\t-1.7182271444\t-1.9962888369\t 0.0152208454\n", "C\t-2.905416575\t-2.2627721735\t-0.8489696604\n", "C\t-3.3347497536\t-1.1525477782\t-1.4661914556\n", "C\t-2.4353480328\t-0.0114763902\t-1.1388331005\n", "O\t-2.4853518918\t 1.1093228549\t-1.595794586\n", "H\t 1.0876615149\t 1.8699729825\t 0.3263064541\n", "H\t-0.5836533924\t 2.142053447\t-0.2101812431\n", "H\t-0.2626541721\t 1.5210712578\t 1.4170099662\n", "H\t 0.3710335675\t 0.0343653078\t-1.2041622283\n", "H\t 0.662791197\t-0.6008609283\t 0.4071642198\n", "H\t-1.3142481944\t-1.0584396879\t 2.0096409673\n", "H\t-3.0252533013\t-0.8199288984\t 1.4038330438\n", "H\t-0.9412329872\t-2.7436632889\t 0.1278385726\n", "H\t-3.37329736\t-3.2394582332\t-0.9072563749\n", "H\t-4.2004762777\t-1.0463782959\t-2.1061677066\n", "\n" ] } ], "source": [ "ind,gdb_smi,relax_smi = cleanup_qm9_xyz('../data/dsgdb9nsd_107313.xyz')\n", "print(ind)\n" ] }, { "cell_type": "markdown", "id": "a315eee3", "metadata": {}, "source": [ "And now 
we can construct a molecule:" ] }, { "cell_type": "code", "execution_count": 5, "id": "b088bbbb", "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.622861Z", "start_time": "2022-12-18T06:40:39.552852Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19 0\n" ] } ], "source": [ "raw_mol = Chem.MolFromXYZBlock(ind)\n", "print(raw_mol.GetNumAtoms(),raw_mol.GetNumBonds())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.692056Z", "start_time": "2022-12-18T06:40:39.625322Z" } }, "outputs": [], "source": [ "import py3Dmol\n", "def draw_with_spheres(mol):\n", "    v = py3Dmol.view(width=300,height=300)\n", "    IPythonConsole.addMolToView(mol,v)\n", "    v.zoomTo()\n", "    v.setStyle({'sphere':{'radius':0.3},'stick':{'radius':0.2}})\n", "    v.show()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2022-12-18T06:40:39.774814Z", "start_time": "2022-12-18T06:40:39.693887Z" } }, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
\n jupyter labextension install jupyterlab_3dmol