# Regular Expression By Example

- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)
- **Date:** -
- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)
- **Note:** This snippet is based on: [http://www.tutorialspoint.com/python/python_reg_expressions.htm](http://www.tutorialspoint.com/python/python_reg_expressions.htm)

In [23]:
# Import regex
import re

In [24]:
# Create some data
text = 'A flock of 120 quick brown foxes jumped over 30 lazy brown, bears.'

### ^ Matches beginning of line (beginning of the string unless re.MULTILINE is set).

In [39]:
re.findall('^A', text)

['A']

### $ Matches end of line (end of the string unless re.MULTILINE is set).

In [38]:
# Escape the period so it matches a literal '.' rather than any character
re.findall(r'bears\.$', text)

['bears.']

### . Matches any single character except newline.

In [37]:
re.findall('f..es', text)

['foxes']

### [...] Matches any single character in brackets.

In [51]:
# Find all vowels
re.findall('[aeiou]', text)

['o', 'o', 'u', 'i', 'o', 'o', 'e', 'u', 'e', 'o', 'e', 'a', 'o', 'e', 'a']

### [^...] Matches any single character not in brackets.

In [56]:
# Find all characters that are not lower-case vowels
re.findall('[^aeiou]', text)

['A',
 ' ',
 'f',
 'l',
 'c',
 'k',
 ' ',
 'f',
 ' ',
 '1',
 '2',
 '0',
 ' ',
 'q',
 'c',
 'k',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ' ',
 'f',
 'x',
 's',
 ' ',
 'j',
 'm',
 'p',
 'd',
 ' ',
 'v',
 'r',
 ' ',
 '3',
 '0',
 ' ',
 'l',
 'z',
 'y',
 ' ',
 'b',
 'r',
 'w',
 'n',
 ',',
 ' ',
 'b',
 'r',
 's',
 '.']

### a | b Matches either a or b.

In [74]:
re.findall('a|A', text)

['A', 'a', 'a']

### (re) Groups regular expressions and remembers matched text.

In [79]:
# Find any instance of 'foxes'
re.findall('(foxes)', text)

['foxes']
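
When a pattern contains one capturing group, `re.findall` returns the text remembered by the group rather than the whole match. The sketch below illustrates this with a made-up `sentence` string (an assumption, not the notebook's `text`) and uses `re.search` to read both the whole match and the group.

```python
import re

# Hypothetical sample string, not the notebook's `text`
sentence = '120 quick brown foxes'

# With one capturing group, findall returns only the group's text
group_only = re.findall(r'(\d+) quick', sentence)

# A match object exposes the whole match and each remembered group
match = re.search(r'(\d+) quick', sentence)
whole_match = match.group(0)
first_group = match.group(1)

print(group_only)   # ['120']
print(whole_match)  # 120 quick
print(first_group)  # 120
```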

### \w Matches word characters.

In [116]:
# Find every run of five consecutive word characters
re.findall(r'\w\w\w\w\w', text)

['flock', 'quick', 'brown', 'foxes', 'jumpe', 'brown', 'bears']

### \W Matches nonword characters.

In [121]:
re.findall(r'\W\W', text)

[', ']

### \s Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text.

In [120]:
re.findall(r'\s', text)

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
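
Since `text` only contains spaces, the example above cannot show the other whitespace characters. A small sketch, assuming a made-up string `mixed` that contains a space, a tab and a newline:

```python
import re

# Hypothetical sample with three different whitespace characters
mixed = 'a b\tc\nd'

# \s picks up the space, the tab and the newline
shorthand = re.findall(r'\s', mixed)

# The explicit ASCII character class finds the same characters
explicit = re.findall(r'[ \t\n\r\f\v]', mixed)

# A common use: split on any run of whitespace
pieces = re.split(r'\s+', mixed)

print(shorthand)  # [' ', '\t', '\n']
print(explicit)   # [' ', '\t', '\n']
print(pieces)     # ['a', 'b', 'c', 'd']
```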

### \S Matches nonwhitespace.

In [124]:
re.findall(r'\S\S', text)

['fl',
 'oc',
 'of',
 '12',
 'qu',
 'ic',
 'br',
 'ow',
 'fo',
 'xe',
 'ju',
 'mp',
 'ed',
 'ov',
 'er',
 '30',
 'la',
 'zy',
 'br',
 'ow',
 'n,',
 'be',
 'ar',
 's.']

### \d Matches digits. Equivalent to [0-9].

In [125]:
re.findall(r'\d\d\d', text)

['120']

### \D Matches nondigits.

In [127]:
re.findall(r'\D\D\D\D\D', text)

['A flo',
 'ck of',
 ' quic',
 'k bro',
 'wn fo',
 'xes j',
 'umped',
 ' over',
 ' lazy',
 ' brow',
 'n, be']

### \A Matches beginning of string.

In [131]:
re.findall(r'\AA', text)

['A']

### \Z Matches end of string. In Python it matches only at the very end, even if the string ends with a newline.

In [135]:
re.findall(r'bears\.\Z', text)

['bears.']
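
In Python's `re`, `\Z` and `$` differ when the string ends with a newline: `$` also matches just before that final newline, while `\Z` does not. A minimal sketch, assuming a made-up string `line` with a trailing newline:

```python
import re

# Hypothetical sample that ends with a newline
line = 'bears.\n'

# $ matches at the end of the string or just before a final newline
with_dollar = re.findall(r'bears\.$', line)

# \Z matches only at the very end of the string
with_big_z = re.findall(r'bears\.\Z', line)

print(with_dollar)  # ['bears.']
print(with_big_z)   # []
```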

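### \z Matches end of string in some regex flavors; Python's re spells this \Z.

In [143]:
# \z is not accepted by every Python version, so use the equivalent \Z
re.findall(r'\.\Z', text)

['.']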

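### \b Matches a word boundary.

In [153]:
# Use a raw string: in a normal string literal '\b' is a backspace character
re.findall(r'\bfoxes\b', text)

['foxes']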
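
A pitfall with `\b`: in an ordinary Python string literal it is the backspace character, so the regex engine never sees a word-boundary token. The sketch below contrasts the two spellings on a made-up string `phrase` (an assumption for illustration):

```python
import re

# Hypothetical sample string
phrase = 'quick brown foxes'

# '\b' in a normal literal is one character: backspace (ASCII 8)
backspace_length = len('\b')
backspace_hits = re.findall('\bfoxes', phrase)

# r'\b' keeps the backslash, so the regex engine sees a word boundary
boundary_length = len(r'\b')
boundary_hits = re.findall(r'\bfoxes\b', phrase)

print(backspace_length, backspace_hits)  # 1 []
print(boundary_length, boundary_hits)    # 2 ['foxes']
```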

### \n, \t, etc. Matches newlines, carriage returns, tabs, etc.

In [155]:
re.findall('\n', text)

[]

### [Pp]ython Match "Python" or "python"

In [170]:
re.findall('[Ff]oxes', 'foxes Foxes Doxes')

['foxes', 'Foxes']

### [0-9] Match any digit; same as [0123456789]

In [None]:
re.findall('[0-9]', text)

['1', '2', '0', '3', '0']

### [a-z] Match any lowercase ASCII letter

In [172]:
re.findall('[a-z]', 'foxes Foxes')

['f', 'o', 'x', 'e', 's', 'o', 'x', 'e', 's']

### [A-Z] Match any uppercase ASCII letter

In [173]:
re.findall('[A-Z]', 'foxes Foxes')

['F']

### [a-zA-Z0-9] Match any of the above

In [175]:
re.findall('[a-zA-Z0-9]', 'foxes Foxes')

['f', 'o', 'x', 'e', 's', 'F', 'o', 'x', 'e', 's']

### [^aeiou] Match anything other than a lowercase vowel

In [176]:
re.findall('[^aeiou]', 'foxes Foxes')

['f', 'x', 's', ' ', 'F', 'x', 's']

### [^0-9] Match anything other than a digit

In [177]:
re.findall('[^0-9]', 'foxes Foxes')

['f', 'o', 'x', 'e', 's', ' ', 'F', 'o', 'x', 'e', 's']

### ruby? Match "rub" or "ruby": the y is optional

In [180]:
re.findall('foxes?', 'foxes Foxes')

['foxes']

### ruby* Match "rub" plus 0 or more ys

In [183]:
re.findall('ox*', 'foxes Foxes')

['ox', 'ox']

### ruby+ Match "rub" plus 1 or more ys

In [184]:
re.findall('ox+', 'foxes Foxes')

['ox', 'ox']
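
On `'foxes Foxes'` the `*` and `+` examples return the same list, so the difference between the quantifiers is easy to miss. A short sketch on a made-up string `sample`, chosen so that `?`, `*` and `+` each give a different result:

```python
import re

# Hypothetical sample: 'o' followed by zero, one and two x's
sample = 'o ox oxx'

# ? allows zero or one 'x'
optional = re.findall(r'ox?', sample)

# * allows zero or more 'x'
zero_or_more = re.findall(r'ox*', sample)

# + requires at least one 'x'
one_or_more = re.findall(r'ox+', sample)

print(optional)      # ['o', 'ox', 'ox']
print(zero_or_more)  # ['o', 'ox', 'oxx']
print(one_or_more)   # ['ox', 'oxx']
```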

### \d{3} Match exactly 3 digits

In [188]:
re.findall(r'\d{3}', text)

['120']

### \d{2,} Match 2 or more digits

In [189]:
re.findall(r'\d{2,}', text)

['120', '30']

### \d{2,3} Match 2 or 3 digits

In [190]:
re.findall(r'\d{2,3}', text)

['120', '30']
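
Counted quantifiers can match inside a longer run of digits, which the two numbers in `text` do not show. The sketch below uses a made-up string `ids` (an assumption) to compare the three forms, and adds `\b` to accept only whole numbers of exactly three digits:

```python
import re

# Hypothetical sample with numbers of different lengths
ids = 'ids: 7, 42, 120, 98765'

# Exactly three digits: also matches the first three digits of 98765
exactly_three = re.findall(r'\d{3}', ids)

# Three or more digits: greedy, so it takes the whole run
three_or_more = re.findall(r'\d{3,}', ids)

# Two to three digits: 98765 is split into '987' and '65'
two_to_three = re.findall(r'\d{2,3}', ids)

# Word boundaries restrict the match to whole three-digit numbers
whole_three = re.findall(r'\b\d{3}\b', ids)

print(exactly_three)  # ['120', '987']
print(three_or_more)  # ['120', '98765']
print(two_to_three)   # ['42', '120', '987', '65']
print(whole_three)    # ['120']
```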

### ^Python Match "Python" at the start of a string, or of an internal line with re.MULTILINE

In [191]:
re.findall('^A', text)

['A']

### Python$ Match "Python" at the end of a string, or of a line with re.MULTILINE

In [192]:
re.findall(r'bears\.$', text)

['bears.']
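
Matching at internal lines only happens when the `re.MULTILINE` flag is passed; by default `^` and `$` anchor to the string as a whole. A minimal sketch, assuming a made-up two-line string `two_lines` that is not part of the original data:

```python
import re

# Hypothetical two-line sample
two_lines = 'foxes run\nbears sleep'

# Default mode: ^ only matches at the very start of the string
default_starts = re.findall(r'^\w+', two_lines)

# MULTILINE mode: ^ also matches right after each newline
multiline_starts = re.findall(r'^\w+', two_lines, flags=re.MULTILINE)

# MULTILINE mode: $ also matches right before each newline
multiline_ends = re.findall(r'\w+$', two_lines, flags=re.MULTILINE)

print(default_starts)    # ['foxes']
print(multiline_starts)  # ['foxes', 'bears']
print(multiline_ends)    # ['run', 'sleep']
```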

### \APython Match "Python" at the start of a string

In [196]:
re.findall(r'\AA', text)

['A']

### Python\Z Match "Python" at the end of a string

In [198]:
re.findall(r'bears\.\Z', text)

['bears.']

### Python(?=!) Match "Python", if followed by an exclamation point

In [204]:
# Match 'bears' only if it is followed by a literal period
re.findall(r'bears(?=\.)', text)

['bears']

### Python(?!!) Match "Python", if not followed by an exclamation point

In [209]:
re.findall('foxes(?!!)', 'foxes foxes!')

['foxes']
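
A lookahead checks what follows without consuming it, so the character it tests is never part of the match. The sketch below shows this on a made-up string `shout` (an assumption), using match positions to tell the otherwise identical results apart:

```python
import re

# Hypothetical sample: 'foxes' alone, followed by '!', and followed by '?'
shout = 'foxes foxes! foxes?'

# Positive lookahead: only the occurrence followed by '!'
followed = re.search(r'foxes(?=!)', shout)
followed_span = followed.span()

# Negative lookahead: every occurrence not followed by '!'
not_followed_starts = [m.start() for m in re.finditer(r'foxes(?!!)', shout)]

print(followed.group(0))    # foxes
print(followed_span)        # (6, 11)
print(not_followed_starts)  # [0, 13]
```

The span `(6, 11)` ends before the `!` at index 11, which confirms that the lookahead did not consume it.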

### python|perl Match "python" or "perl"

In [211]:
re.findall('foxes|foxes!', 'foxes foxes!')

['foxes', 'foxes']
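
Alternatives are tried from left to right and the first one that matches wins, which is why `'foxes!'` never appears in the result above. A small sketch of the effect of swapping the order (the variable names are only illustrative):

```python
import re

sample = 'foxes foxes!'

# 'foxes' is tried first and always succeeds, so 'foxes!' is never used
short_first = re.findall(r'foxes|foxes!', sample)

# Putting the longer alternative first lets it match where it can
long_first = re.findall(r'foxes!|foxes', sample)

print(short_first)  # ['foxes', 'foxes']
print(long_first)   # ['foxes', 'foxes!']
```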

### rub(y|le) Match "ruby" or "ruble"

In [212]:
re.findall('fox(es!)', 'foxes foxes!')

['es!']
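
Because the pattern contains a capturing group, `findall` returns only the group's text (`'es!'`), not the whole word. The sketch below applies the heading's `rub(y|le)` pattern to a made-up string `words` (an assumption) and shows a non-capturing group, `(?:...)`, as one way to get the whole match back:

```python
import re

# Hypothetical sample built from the heading's example words
words = 'ruby ruble rubber'

# Capturing group: findall returns what the group matched
captured = re.findall(r'rub(y|le)', words)

# Non-capturing group: findall returns the whole match
whole = re.findall(r'rub(?:y|le)', words)

print(captured)  # ['y', 'le']
print(whole)     # ['ruby', 'ruble']
```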

### Python(!+|\?) "Python" followed by one or more ! or one ?

In [213]:
re.findall('foxes(!)', 'foxes foxes!')

['!']
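
The cell above only exercises a single `!`. A sketch of the full pattern from the heading, adapted to `foxes` and run on a made-up string `marks` (an assumption); a non-capturing group is used so that `findall` returns whole matches:

```python
import re

# Hypothetical sample with no mark, one '!', several '!' and one '?'
marks = 'foxes foxes! foxes!!! foxes?'

# 'foxes' followed by one or more '!' or by a single '?'
punctuated = re.findall(r'foxes(?:!+|\?)', marks)

print(punctuated)  # ['foxes!', 'foxes!!!', 'foxes?']
```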