{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regex Examples\n",
    "This is a small notebook that reviews some examples of regular expressions in Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python Regex Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re  # Python's built-in regular expression module"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Python we can use `re.search(`*`pattern`*`,`*`str`*`)` to search for a given pattern in a string.\n",
    "\n",
    "The call returns a single *Match* object describing the first place the pattern was found in the provided *str*, or `None` if the pattern does not occur."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = 'world'\n",
    "string = 'hello world'\n",
    "\n",
    "match = re.search(pattern, string)\n",
    "print(match)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the printed **match** object shows `span=(6, 11)`. That is because the word 'world' begins at index 6 and ends just before index 11 (the end index is exclusive, so the last character is at index 10).\n",
    "\n",
    "We can extract more granular information with the `start()`, `end()`, and `group()` methods of the **match** object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Match at index {match.start()}-{match.end()}\")\n",
    "print(f\"Full match: {match.group(0)}\")"
   ]
  },
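  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of one case the example above does not hit: `re.search` returns `None` when the pattern is not found, so calling `start()` or `group()` on the result would raise an `AttributeError`. The helper name `describe_match` is hypothetical and only used here for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def describe_match(pattern, text):\n",
    "    # Hypothetical helper: describe the first match, or report that there is none\n",
    "    found = re.search(pattern, text)\n",
    "    if found is None:  # no match, so there is no Match object to query\n",
    "        return 'No match'\n",
    "    return f'{found.group(0)!r} at {found.start()}-{found.end()}'\n",
    "\n",
    "print(describe_match('world', 'hello world'))\n",
    "print(describe_match('planet', 'hello world'))"
   ]
  },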
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regex Special Characters\n",
    "\n",
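    "In many cases we won't know the exact text we are looking for, so we need the special characters of regular expressions to describe our matches. For example, `[0-9]` matches any single digit and `+` means one or more of the preceding item, so `[0-9]+` matches a run of digits.\n",
    "\n",
    "Because `re.search` stops at the first match, we can instead find every instance (as an iterator) using `re.finditer(<pattern>, <string>)`, which can be looped through to see each result."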
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "string_2 = 'There are 3 blind mice, 5 little pigs, 12 angry men, 500 hats of Bartholomew Cubbins'\n",
    "pattern_2 = '[0-9]+'\n",
    "\n",
    "for match in re.finditer(pattern_2, string_2):\n",
    "    print(match.group(0))"
   ]
  },
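  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When only the matched text is needed, and not the positions, `re.findall` is a possible alternative: for a pattern without groups it returns all non-overlapping matches as a list of strings. The sketch below assumes a shortened sample sentence, and the names `numbers_text` and `numbers` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "numbers_text = 'There are 3 blind mice, 5 little pigs, 12 angry men'\n",
    "\n",
    "# findall returns the matched digit runs as strings; convert each one to int\n",
    "numbers = [int(n) for n in re.findall('[0-9]+', numbers_text)]\n",
    "print(numbers)"
   ]
  },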
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Regex Groups\n",
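    "You may have noticed that we read each **match** through a *group*: `match.group(0)`.\n",
    "\n",
    "This is because parts of a regex pattern can be wrapped in parentheses `()` to form groups. `group(0)` is always the full match, while `group(1)`, `group(2)`, and so on return the text captured by each group, numbered left to right. A group that starts with `?:` is non-capturing, so it does not get an index."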
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "string_3 = \"yesterday was January 26th, today is January 27th, next week it will be February 2nd\"\n",
    "pattern_3 = '([a-zA-Z]+) ([0-9]+)(?:st|[nr]d|th)'\n",
    "for match in re.finditer(pattern_3, string_3):\n",
    "    print(f\"Month: {match.group(1)} - Day: {match.group(2)}\")"
   ]
  }
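  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the group numbering concrete, here is a small sketch that reuses the date pattern on a single, shortened sentence. It compares `group(0)` (the full match) with `groups()`, which returns only the captured groups as a tuple. The variable names `date_match`, `month`, and `day` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "date_match = re.search('([a-zA-Z]+) ([0-9]+)(?:st|[nr]d|th)', 'today is January 27th')\n",
    "\n",
    "# group(0) is the whole matched text, including the non-capturing suffix\n",
    "print(date_match.group(0))\n",
    "\n",
    "# groups() holds only the two capturing groups: the month and the day\n",
    "month, day = date_match.groups()\n",
    "print(month, day)"
   ]
  }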
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}