{ "metadata": { "name": "", "signature": "sha256:699f8ce0d2324e3748110ce3627961ec89b6535573bb7f3c9a5b5c766fbc4b99" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "import re" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# Sample message containing three phone numbers\n", "sample=\"This is roger. my contact no is 415-555-4242,& 415-555-4243, 416-555-4244\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "phoneNumRegex = re.compile(r'\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d')\n", "\n", "# search() returns only the first match\n", "mob=phoneNumRegex.search(sample)\n", "print mob.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "415-555-4242\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`findall()` returns every match in the string as a list, not just the first one." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# findall() returns all matches as a list of strings\n", "mob=phoneNumRegex.findall(sample)\n", "print mob" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['415-555-4242', '415-555-4243', '416-555-4244']\n" ] } ], "prompt_number": 15 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Understanding Groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we want to split each match into parts: the phone numbers above consist of an area code and the actual number. Wrapping each part of the pattern in parentheses creates groups, which make the parts easy to extract and use."
] }, { "cell_type": "code", "collapsed": false, "input": [ "phoneNumRegex = re.compile(r'(\\d\\d\\d)-(\\d\\d\\d-\\d\\d\\d\\d)')\n", "mob=phoneNumRegex.findall(sample)\n", "print mob\n", "# When the pattern contains groups, findall() returns a list of tuples; each tuple holds the 2 groups defined in the regex" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[('415', '555-4242'), ('415', '555-4243'), ('416', '555-4244')]\n" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "mo=phoneNumRegex.search(sample)\n", "print 'area code: ',mo.group(1)\n", "print 'phone no :',mo.group(2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "area code: 415\n", "phone no : 555-4242\n" ] } ], "prompt_number": 26 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Matching Multiple Groups with the Pipe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The regular expression r'Batman|Tina Fey'\n", "will match either 'Batman' or 'Tina Fey'.\n", "When both Batman and Tina Fey occur in the searched string, search() returns the first\n", "occurrence of matching text as the Match object."
] }, { "cell_type": "code", "collapsed": false, "input": [ "heroRegex = re.compile(r'Batman|Tina Fey')\n", "mo1 = heroRegex.search('Batman and Tina Fey.')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batman\n" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "mo2 = heroRegex.search('Tina Fey and Batman.')\n", "print mo2.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Tina Fey\n" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "mo3=heroRegex.findall('Tina Fey and Batman.')\n", "print mo3" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Tina Fey', 'Batman']\n" ] } ], "prompt_number": 31 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "More Examples of the Pipe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# A pipe inside a group lets several alternatives share the 'Bat' prefix\n", "batRegex = re.compile(r'Bat(man|mobile|copter|bat)')\n", "mo = batRegex.search('Batmobile lost a wheel')\n", "print mo.group()\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batmobile\n" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "mo = batRegex.search('Batbat lost a wheel')\n", "print mo.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batbat\n" ] } ], "prompt_number": 35 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Optional Matching with the Question Mark" ] }, { "cell_type": "code", "collapsed": false, "input": [ "batRegex = re.compile(r'Bat(wo)?man')\n", "# The ? makes the preceding group (wo) optional: it matches zero or one time\n", "\n", "mo1 = batRegex.search('The Adventures of Batman')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batman\n" ] } ], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "mo1 = batRegex.search('The Adventures of Batwoman')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batwoman\n" ] } ], "prompt_number": 37 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Matching Zero or More with the Star" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The * (called the star or asterisk) means \u201cmatch zero or more\u201d: the group\n", "that precedes the star can occur any number of times in the text, including not at all." ] }, { "cell_type": "code", "collapsed": false, "input": [ "batRegex = re.compile(r'Bat(wo)*man')\n", "\n", "mo1 = batRegex.search('The Adventures of Batman')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batman\n" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "mo1 = batRegex.search('The Adventures of Batwoman')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batwoman\n" ] } ], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "mo1 = batRegex.search('The Adventures of Batwowowoman')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Batwowowoman\n" ] } ], "prompt_number": 40 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Note: While * means \u201cmatch zero or more,\u201d the + (or plus) means \u201cmatch one or\n", "more.\u201d" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Matching Specific Repetitions with Curly Brackets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The regex (Ha){3} will match the string 'HaHaHa'.\n", "The regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'." ] }, { "cell_type": "code", "collapsed": false, "input": [ "haRegex = re.compile(r'(Ha){3}')\n", "mo1 = haRegex.search('HaHaHa')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "HaHaHa\n" ] } ], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "haRegex = re.compile(r'(Ha){3,5}')\n", "mo1 = haRegex.search('HaHaHaHa')\n", "print mo1.group()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "HaHaHaHa\n" ] } ], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }