{ "cells": [ { "cell_type": "markdown", "id": "831ff0f5", "metadata": {}, "source": [ "# 模块re:正则表达式" ] }, { "cell_type": "markdown", "id": "08450111", "metadata": {}, "source": [ "正则表达式是用来匹配字符串或者子串的一种模式,匹配的字符串可以很具体,也可以很一般化。\n", "\n", "Python 标准库提供了 re 模块:" ] }, { "cell_type": "code", "execution_count": 1, "id": "2d5fa2b0", "metadata": {}, "outputs": [], "source": [ "import re" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b96a8bdb", "metadata": {}, "source": [ "## re.match()函数\n", "\n", "`re.match()`函数对字符串的开头进行匹配,返回第一个匹配对应的Match对象,否则返回None:" ] }, { "cell_type": "code", "execution_count": 2, "id": "8c0b625f", "metadata": {}, "outputs": [], "source": [ "pat = \"\\d+\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "a8cd96f8", "metadata": {}, "outputs": [], "source": [ "s = \"abc123abc123456\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "7167a2af", "metadata": {}, "outputs": [], "source": [ "re.match(pat, s)" ] }, { "cell_type": "markdown", "id": "9bd305a9", "metadata": {}, "source": [ "由于字符串不是字母开头,没有匹配结果。" ] }, { "cell_type": "markdown", "id": "1e81588e", "metadata": {}, "source": [ "## re.search()函数" ] }, { "attachments": {}, "cell_type": "markdown", "id": "659d7c15", "metadata": {}, "source": [ "与`re.match()`函数不同,`re.search()`函数会用正则表达式去匹配字符串中所有的子串,如果找到,返回第一个匹配对应的Match对象,否则返回None:" ] }, { "cell_type": "code", "execution_count": 5, "id": "fa28fa59", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(3, 6), match='123'>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.search(pat, s)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b17b6f30", "metadata": {}, "source": [ "可以调用返回的Match对象的.group()方法查看匹配到的字符串:" ] }, { "cell_type": "code", "execution_count": 6, "id": "364cb723", "metadata": {}, "outputs": [], "source": [ "m = re.search(pat, s)" ] }, { "cell_type": "code", "execution_count": 7, "id": "33a4dc68", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'123'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m.group(0)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4b6759f4", "metadata": {}, "source": [ "## re.split()函数\n", "\n", "`re.split()`使用指定的正则表达式作为分隔符,对字符串进行分割,其用法为:" ] }, { "cell_type": "code", "execution_count": 8, "id": "f979ae49", "metadata": {}, "outputs": [], "source": [ "pat = \" +\"" ] }, { "cell_type": "code", "execution_count": 9, "id": "b1f9ddd8", "metadata": {}, "outputs": [], "source": [ "s = \"a b c d e\"" ] }, { "cell_type": "code", "execution_count": 10, "id": "717cd7f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'c', 'd', 'e']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.split(pat, s)" ] }, { "cell_type": "markdown", "id": "fa005fcf", "metadata": {}, "source": [ "## re.sub()函数" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7e1b4b8e", "metadata": {}, "source": [ "`re.sub()`函数对字符串中正则表达式匹配的部分进行替换:" ] }, { "cell_type": "code", "execution_count": 11, "id": "ee543515", "metadata": {}, "outputs": [], "source": [ "pat = \" +\"" ] }, { "cell_type": "code", "execution_count": 12, "id": "e2210ec5", "metadata": {}, "outputs": [], "source": [ "replace = \";\"" ] }, { "cell_type": "code", "execution_count": 13, "id": "4174ab3d", "metadata": {}, "outputs": [], "source": [ "s = \"a b c d e\"" ] }, { "cell_type": "code", "execution_count": 14, "id": "c60255b9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a;b;c;d;e'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.sub(pat, replace, s)" ] }, { "cell_type": "markdown", "id": "bc36b819", "metadata": {}, "source": [ "## 正则表达式规则\n", "\n", "正则表达式由一些普通字符和一些元字符组成。普通字符包括大小写的字母和数字,而元字符则具有特殊的含义:\n", "\n", "\n", "子表达式 | 匹配内容\n", "-- | -- \n", ".\t|匹配除了换行符之外的内容\n", "\\w\t|匹配所有字母和数字字符\n", "\\d\t|匹配所有数字,相当于 [0-9]\n", "\\s\t|匹配空白,相当于 [\\t\\n\\t\\f\\v]\n", "\\W,\\D,\\S\t|匹配对应小写字母形式的补\n", "[...]\t|表示可以匹配的集合,支持范围表示如 a-z, 0-9 等\n", "(...)\t|表示作为一个整体进行匹配\n", "¦\t|表示逻辑或\n", "^\t|表示匹配后面的子表达式的补\n", "*\t|表示匹配前面的子表达式 0 次或更多次\n", "+\t|表示匹配前面的子表达式 1 次或更多次\n", "?\t|表示匹配前面的子表达式 0 次或 1 次\n", "{m}\t|表示匹配前面的子表达式 m 次\n", "{m,}\t|表示匹配前面的子表达式至少 m 次\n", "{m,n}\t|表示匹配前面的子表达式至少 m 次,至多 n 次\n", "\n", "例如:\n", "\n", "- ca*t 匹配: ct, cat, caaaat, ...\n", "- ab\\d|ac\\d 匹配: ab1, ac9, ...\n", "- ([^a-q]bd) 匹配: rbd, 5bd, ..." ] }, { "cell_type": "code", "execution_count": null, "id": "66c90ed1", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" } }, "nbformat": 4, "nbformat_minor": 5 }