{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Regular Expression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 일반적인 문자는 그 문자 자체로 매칭이 된다. 특수 문자는 다음과 같다.\n", "\n", " \\ 특수 문자 Escape (or start a sequence)\n", " . 줄바꿈을 제외한 모든 문자 (참고 re.DOTALL)\n", " ^ 문자열의 시작 (참고 re.MULTILINE)\n", " $ 문자열의 마지막 (re.MULTILINE)\n", " [] 문자 집합\n", " | 또는\n", " () Capture 그룹 생성 (우선순위 지정)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 집합기호 '``[``' 문자 이후에 사용할 수 있는 특수 문자는 다음과 같다. \n", "\n", " ] 집합의 끝\n", " - 범위 (예 a-c 는 a, b 또는 c를 의미)\n", " ^ Negate를 의미" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quantifiers (append '``?``' for non-greedy):\n", "\n", " {m} Exactly m repetitions\n", " {m,n} From m (default 0) to n (default infinity)\n", " * 0 or more. Same as {,}\n", " + 1 or more. Same as {1,}\n", " ? 0 or 1. Same as {,1}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Special sequences::\n", "\n", " \\A Start of string\n", " \\b Match empty string at word (\\w+) boundary\n", " \\B Match empty string not at word boundary\n", " \\d Digit\n", " \\D Non-digit\n", " \\s Whitespace - [ \\t\\n\\r\\f\\v]과 동일 (참고: LOCALE, UNICODE)\n", " \\S Non-whitespace\n", " \\w Alphanumeric: [0-9a-zA-Z_], see LOCALE\n", " \\W Non-alphanumeric\n", " \\Z End of string\n", " \\g Match prev named or numbered group,\n", " '<' & '>' are literal, e.g. \\g<0>\n", " or \\g (not \\g0 or \\gname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Special character escapes are much like those already escaped in Python string\n", "literals. Hence regex '``\\n``' is same as regex '``\\\\n``'::\n", "\n", " \\a ASCII Bell (BEL)\n", " \\f ASCII Formfeed\n", " \\n ASCII Linefeed\n", " \\r ASCII Carriage return\n", " \\t ASCII Tab\n", " \\v ASCII Vertical tab\n", " \\\\ A single backslash\n", " \\xHH Two digit hexadecimal character goes here\n", " \\OOO Three digit octal char (or just use an\n", " initial zero, e.g. \\0, \\09)\n", " \\DD Decimal number 1 to 99, match\n", " previous numbered group\n", "\n", "Extensions. Do not cause grouping, except '``P``'::\n", "\n", " (?iLmsux) Match empty string, sets re.X flags\n", " (?:...) Non-capturing version of regular parens\n", " (?P...) Create a named capturing group.\n", " (?P=name) Match whatever matched prev named group\n", " (?#...) A comment; ignored.\n", " (?=...) Lookahead assertion, match without consuming\n", " (?!...) Negative lookahead assertion\n", " (?<=...) Lookbehind assertion, match if preceded\n", " (? RegexObject\n", " match(pattern, string[, flags]) -> MatchObject\n", " search(pattner, string[, flags]) -> MatchObject\n", " findall(pattern, string[, flags]) -> list of strings\n", " finditer(pattern, string[, flags]) -> iter of MatchObjects\n", " split(pattern, string[, maxsplit, flags]) -> list of strings\n", " sub(pattern, repl, string[, count, flags]) -> string\n", " subn(pattern, repl, string[, count, flags]) -> (string, int)\n", " escape(string) -> string\n", " purge() # the re cache\n", "\n", "RegexObjects (returned from ``compile()``)::\n", "\n", " .match(string[, pos, endpos]) -> MatchObject\n", " .search(string[, pos, endpos]) -> MatchObject\n", " .findall(string[, pos, endpos]) -> list of strings\n", " .finditer(string[, pos, endpos]) -> iter of MatchObjects\n", " .split(string[, maxsplit]) -> list of strings\n", " .sub(repl, string[, count]) -> string\n", " .subn(repl, string[, count]) -> (string, int)\n", " .flags # int, Passed to compile()\n", " .groups # int, Number of capturing groups\n", " .groupindex # {}, Maps group names to ints\n", " .pattern # string, Passed to compile()\n", "\n", "MatchObjects (returned from ``match()`` and ``search()``)::\n", "\n", " .expand(template) -> string, Backslash & group expansion\n", " .group([group1...]) -> string or tuple of strings, 1 per arg\n", " .groups([default]) -> tuple of all groups, non-matching=default\n", " .groupdict([default]) -> {}, Named groups, non-matching=default\n", " .start([group]) -> int, Start/end of substring match by group\n", " .end([group]) -> int, Group defaults to 0, the whole match\n", " .span([group]) -> tuple (match.start(group), match.end(group))\n", " .pos int, Passed to search() or match()\n", " .endpos int, \"\n", " .lastindex int, Index of last matched capturing group\n", " .lastgroup string, Name of last matched capturing group\n", " .re regex, As passed to search() or match()\n", " .string string, \"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }