{ "metadata": { "name": "", "signature": "sha256:987b43c1bb56cf140f67799a3a4ce77bbaedd25a42f4575362dc8bcf5eb6a844" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Examples of regular expressions" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Using module re" ] }, { "cell_type": "heading", "level": 6, "metadata": {}, "source": [ "Germain SALVATO-VALLVERDU (germain.vallverdu@univ-pau.fr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook is about several practical examples of regular expression.\n", "Online regex tester and debugger : [regex101](https://regex101.com/#python)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import re" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some notes\n", "* quantifier * : Between zero and unlimited times\n", "* quantifier + : Between one and unlimited times,\n", "* quantifier ? : Between zero and one" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Simplest case" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = \" SCF Done: E(RB3LYP) = -599.864175717 A.U. after 16 cycles\"\n", "if re.search(\"^\\sSCF Done:\", a):\n", " print(\"ok\")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ok\n" ] } ], "prompt_number": 59 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Read floating numbers with a regular expression" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = \" CCl 1.70000 1.60000 1.60000 1.70000 1.80000\\n\"\n", "b = \" D1 148.47288 140.38227\\n\"\n", "c = \" D3 -116.60811-112.89609-109.06468-104.97240-100.75211\\n\"\n", "d = \" D1 148.47288\\n\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "pattern = re.compile(\"(\\d+\\.\\d+)\")\n", "res = pattern.search(b)\n", "print(res.group(0))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "148.47288\n" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "pattern = re.compile(\"\\s*([+-]?\\d+\\.\\d+)\")\n", "print(\"a \", pattern.findall(a))\n", "print(\"b \", pattern.findall(b))\n", "print(\"c \", pattern.findall(c))\n", "print(\"d \", pattern.findall(d))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "a ['1.70000', '1.60000', '1.60000', '1.70000', '1.80000']\n", "b ['148.47288', '140.38227']\n", "c ['-116.60811', '-112.89609', '-109.06468', '-104.97240', '-100.75211']\n", "d ['148.47288']\n" ] } ], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "eigentest = \" Eigenvalues -- -547.05077-547.86712-548.29237-548.49474-548.57146 \"\n", "pattern = re.compile(\"\\s*([+-]?\\d+\\.\\d+)\")\n", "pattern.findall(eigentest)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 64, "text": [ "['-547.05077', '-547.86712', '-548.29237', '-548.49474', '-548.57146']" ] } ], "prompt_number": 64 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Looking for a string" ] }, { "cell_type": "code", "collapsed": false, "input": [ "soup = \"\"\" Rotational constants (GHZ): 142.1344479 0.4743210 0.4743149\n", " Standard basis: 6-31+G(d) (6D, 7F)\n", " There are 67 symmetry adapted cartesian basis functions of A symmetry.\n", " There are 67 symmetry adapted basis functions of A symmetry.\n", " 67 basis functions, 132 primitive gaussians, 67 cartesian basis functions\n", " 18 alpha electrons 18 beta electrons\n", " nuclear repulsion energy 45.1237784897 Hartrees.\n", " NAtoms= 6 NActive= 6 NUniq= 6 SFac= 1.00D+00 NAtFMM= 60 NAOKFM=F Big=F\n", " Integral buffers will be 131072 words long. \n", " Raffenetti 2 integral format.\n", " Two-electron integral symmetry is turned on.\n", " One-electron integrals computed using PRISM.\n", " NBasis= 67 RedAO= T EigKep= 7.02D-03 NBF= 67\n", " NBsUse= 67 1.00D-06 EigRej= -1.00D+00 NBFU= 67\n", " Initial guess from the checkpoint file: \"/scratch/183547/Gau-16335.chk\"\n", " B after Tr= 0.000000 0.000000 0.000000\n", " Rot= 1.000000 0.000065 0.000000 0.000022 Ang= 0.01 deg.\n", " ExpMin= 4.38D-02 ExpMax= 2.52D+04 ExpMxC= 3.78D+03 IAcc=2 IRadAn= 4 AccDes= 0.00D+00\n", " Harris functional with IExCor= 402 and IRadAn= 4 diagonalized for initial guess.\n", " HarFok: IExCor= 402 AccDes= 0.00D+00 IRadAn= 4 IDoV= 1 UseB2=F ITyADJ=14\n", " ICtDFT= 3500011 ScaDFX= 1.000000 1.000000 1.000000 1.000000\n", " FoFCou: FMM=F IPFlag= 0 FMFlag= 100000 FMFlg1= 0\n", " NFxFlg= 0 DoJE=T BraDBF=F KetDBF=T FulRan=T\n", " wScrn= 0.000000 ICntrl= 500 IOpCl= 0 I1Cent= 200000004 NGrid= 0\n", " NMat0= 1 NMatS0= 1 NMatT0= 0 NMatD0= 1 NMtDS0= 0 NMtDT0= 0\n", " Petite list used in FoFCou.\n", " Keep R1 ints in memory in canonical form, NReq=3514379.\n", " Requested convergence on RMS density matrix=1.00D-08 within 128 cycles.\n", " Requested convergence on MAX density matrix=1.00D-06.\n", " Requested convergence on energy=1.00D-06.\n", " No special actions if energy rises.\n", " EnCoef did 2 forward-backward iterations\n", " EnCoef did 2 forward-backward iterations\n", " EnCoef did 100 forward-backward iterations\n", " EnCoef did 2 forward-backward iterations\n", " SCF Done: E(RB3LYP) = -599.864175717 A.U. after 16 cycles\n", " NFock= 16 Conv=0.99D-09 -V/T= 2.0046\n", " Calling FoFJK, ICntrl= 2127 FMM=F ISym2X=0 I1Cent= 0 IOpClX= 0 NMat=1 NMatS=1 NMatT=0.\n", " ***** Axes restored to original set *****\n", " -------------------------------------------------------------------\n", " Center Atomic Forces (Hartrees/Bohr)\n", " Number Number X Y Z\n", " -------------------------------------------------------------------\n", " 1 6 -0.000049868 0.001084800 -0.000123630\n", " 2 1 0.000003801 0.000176939 0.000127406\n", " 3 1 0.000110364 0.000154941 -0.000056689\n", " 4 1 -0.000090176 0.000143240 -0.000052491\n", " 5 9 -0.000046616 0.003735651 -0.000031395\n", " 6 17 0.000072495 -0.005295572 0.000136799\n", " -------------------------------------------------------------------\"\"\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "pattern = re.compile(\"^(\\sSCF Done:).*([+-]\\d+.\\d+)\")\n", "for line in soup.split(\"\\n\"):\n", " if pattern.match(line):\n", " m = pattern.match(line)\n", " for i in range(3):\n", " print(i, m.group(i))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0 SCF Done: E(RB3LYP) = -599.864175717\n", "1 SCF Done:\n", "2 -599.864175717\n" ] } ], "prompt_number": 55 }, { "cell_type": "code", "collapsed": false, "input": [ "scan = \"\"\" SCF Done: E(RB3LYP) = -548.021019862 A.U. after 19 cycles\n", " NFock= 19 Conv=0.27D-08 -V/T= 2.0040\n", " Scan completed.\n", "\n", " Summary of the potential surface scan:\n", " N DSO SCF \n", " ---- --------- -----------\n", " 1 1.0000 -547.05077\n", " 2 1.1000 -547.86712\n", " 3 1.2000 -548.29237\n", " 4 1.3000 -548.49474\n", " 5 1.4000 -548.57146\n", " 6 1.5000 -548.57912\n", " 7 1.6000 -548.55048\n", " 8 1.7000 -548.50429\n", " 9 1.8000 -548.45102\n", " 10 1.9000 -548.39653\n", " 11 2.0000 -548.34390\n", " 12 2.1000 -548.29464\n", " 13 2.2000 -548.24935\n", " 14 2.3000 -548.20822\n", " 15 2.4000 -548.17114\n", " 16 2.5000 -548.13792\n", " 17 2.6000 -548.10833\n", " 18 2.7000 -548.08209\n", " 19 2.8000 -548.05895\n", " 20 2.9000 -548.03866\n", " 21 3.0000 -548.02102\n", " ---- --------- -----------\n", "\"\"\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "scan_patt = re.compile(\"^\\sSummary of the potential surface scan:\")\n", "for line in scan.split(\"\\n\"):\n", " if scan_patt.match(line):\n", " print(line.strip())\n", " " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Summary of the potential surface scan:\n" ] } ], "prompt_number": 57 } ], "metadata": {} } ] }