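{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<p style=\"text-align:center\">\n", " <a href=\"https://nbviewer.jupyter.org/github/twMr7/Python-Machine-Learning/blob/master/04-String_Operations.ipynb\">\n", " Open In Jupyter nbviewer\n", " <img style=\"float: center;\" src=\"https://nbviewer.jupyter.org/static/img/nav_logo.svg\" width=\"120\" />\n", " </a>\n", "</p>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Open In Colab](https://colab.research.google.com/github/twMr7/Python-Machine-Learning/blob/master/04-String_Operations.ipynb)\n", "\n", "\n", "# 4. `str` String Operations\n", "\n", "A `str` is a sequence data structure whose elements are characters. String literals are enclosed in single or double quotes, and the opening and closing quotes must match.\n", "\n", "Elements are accessed by their positional index using the syntax **`[ index ]`**. Counting from the front, **the first element has index 0** and indices increase toward the end; counting from the back, **the last element can be addressed with index -1** and indices decrease toward the front.\n", "\n", "| String example | Description |\n", "|------------------------------|-----------------------------------------------|\n", "| `''` | Empty string |\n", "| `\"Python's\"`, `'Python\"s'` | Strings are enclosed in single or double quotes |\n", "| `'Python\\'s\\n'` | Special characters are escaped with a preceding backslash `\\` |\n", "| `r'c:\\Users\\name'` | The `r` prefix keeps the string exactly as written (raw string) |\n", "| `\"This\" \"is\" \"concatenated\"` | Adjacent string literals are concatenated automatically |\n", "\n", "- The built-in function `str()` constructs a new string or converts an object to a string.\n", "- The built-in functions `hex()` and `bin()` convert an integer to its hexadecimal and binary string representation, respectively.\n", "- The built-in function `chr()` converts a code point to the corresponding character; it is the inverse of `ord()`.\n", "- The built-in function `len()` returns the length of a string.\n", "- The built-in function `print()` writes a string to the screen.\n", "\n", "The string class has many methods of its own; see the official documentation, [Text Sequence Type — str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str). Only a few commonly used ones are listed here:\n", "- `format()` formats a string.\n", "- `count()` counts how many times a substring occurs.\n", "- `find()` returns the lowest index at which a substring occurs, or -1 if it is not found.\n", "- `lower()` returns an all-lowercase copy.\n", "- `upper()` returns an all-uppercase copy.\n", "- `replace()` returns a copy with every occurrence of a substring replaced by the given string.\n", "- `split()` splits the string at the given separator and returns a list of the pieces.\n", "- `strip()` returns a copy with leading and trailing whitespace removed.\n", "\n", "Strings are also sequence containers, so the operations common to immutable sequences apply as well; see the official documentation, [Common Sequence Operations](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § 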
Strings are sequence containers that cannot be modified in place.\n", "\n", "Once a string has been created, its elements **cannot** be changed in place (strings are immutable)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'str' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m<ipython-input-1-18ce7d0609eb>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# Assigning a new value to a string element raises an error\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0ms\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'Pithon'\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0ms\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'y'\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: 'str' object does not support item assignment" ] } ], "source": [ "# Assigning a new value to a string element raises an error\n", "s = 'Pithon'\n", "s[1] = 'y'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § New strings can be built by slicing, repetition, and concatenation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python\n" ] } ], "source": [ "# Slice a range with the [from:to:interval] syntax\n", "# Concatenate strings with the plus sign '+'\n", "s = s[0] + 'y' + s[2:]\n", "print(s)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python Python Python rocks!\n" ] } ], "source": [ "# Repeat sequence elements with the multiplication sign '*'\n", "s = 3 * (s + ' ') + 'rocks!'\n", "print(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § An index must stay within range, but a slice is allowed to extend beyond it." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": 
"stdout", "output_type": "stream", "text": [ "length of string is 27\n" ] }, { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m<ipython-input-4-b72f16fd5655>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'length of string is '\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mstr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0ms\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;31m# An out-of-range index raises an error\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0ms\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m60\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "print('length of string is ' + str(len(s)))\n", "# An out-of-range index raises an error\n", "s[60]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'rocks!'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A slice that runs past the end is clamped to the string; a slice entirely out of range returns an empty string\n", "s[21:60]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § The string class has many handy methods for working with substrings" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 3 spaces\n" ] } ], "source": [ "# Count the space characters in the string\n", "print('There are ' + str(s.count(' ')) + ' spaces')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
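[ "New comma separated string is: Python,Python,Python,rocks!\n" ] } ], "source": [ "# Replace the space characters with commas\n", "s = s.replace(' ', ',')\n", "print('New comma separated string is: ' + s)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Python', 'Python', 'Python', 'rocks!']\n" ] } ], "source": [ "# Split the comma-separated pieces into substrings\n", "print(s.split(sep = ','))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 Formatting Strings with `format()`\n", "Curly braces `{` `}` mark the places where formatted text is substituted; a field name and a format spec can be given inside them. (Note: the syntax below is a simplified version; for the full definition see the official documentation, [6.1.3. Format String Syntax](https://docs.python.org/3/library/string.html#format-string-syntax).)\n", "<p>\n", " <center>Format syntax: <b>`{` `field name` `:format spec` `}`</b></center>\n", "</p>\n", "\n", "- `field name`: may be a number or a name. The place where the field appears is replaced by the formatted text of the corresponding argument.\n", "- `format spec`: the rules for converting the original object to a string, such as field width, number of decimal places, and left or right alignment. The options are given in the order below; the square brackets are not part of the syntax and only delimit each optional item:\n", "\n", "<p>\n", " <center>Format spec order: [[`fill`]`align`] [`#`] [`0`] [`width`] [`.precision`] [`type`]</center>\n", "</p>\n", "\n", "- `align`\n", "\n", "| Align option | Description |\n", "|-----------|-----------|\n", "| **`<`** | Left-aligned |\n", "| **`>`** | Right-aligned |\n", "| **`^`** | Centered |\n", "\n", "- `#`: show the base prefix of the number (`0b`, `0o`, `0x`).\n", "- `0`: for numeric formats, pad the empty space with `'0'`.\n", "- `width`: the minimum field width.\n", "- `.precision`: the number of digits after the decimal point.\n", "- `type`\n", "\n", "| Type | Description |\n", "|-----------|-----------|\n", "| **`s`** | String |\n", "| **`c`** | Character (converts an integer to the corresponding character) |\n", "| **`b`** | Binary integer |\n", "| **`o`** | Octal integer |\n", "| **`d`** | Decimal integer |\n", "| **`x`** | Hexadecimal integer, lowercase letters |\n", "| **`X`** | Hexadecimal integer, uppercase letters |\n", "| **`f`** | Floating-point number, 6 decimal places by default |\n", "| **`e`** | Scientific notation, lowercase letter |\n", "| **`E`** | Scientific notation, uppercase letter |\n", "| **`%`** | Percentage: multiplies the number by 100 and appends a percent sign |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § `format`: specifying field names" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n" ] } ], "source": [ "import math\n", "# 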
Use numbers as field names; each number is the position of the argument passed to format()\n", "print('Math constants: π = {0}, e = {1}, τ = {2}'.format(math.pi, math.e, math.tau))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793,\n", " τ = 6.283185307179586, \tτ is simply 2π = 2 * 3.141592653589793\n" ] } ], "source": [ "# A field can be used more than once\n", "print('Math constants: π = {0},\\n τ = {1}, \\tτ is simply 2π = 2 * {0}'.format(math.pi, math.tau))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n" ] } ], "source": [ "# Numbered fields do not have to appear in order\n", "print('Math constants: π = {2}, e = {0}, τ = {1}'.format(math.e, math.tau, math.pi))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# If every field is filled in order, one argument per field, the numbers can be omitted\n", "'Math constants: π = {}, e = {}, τ = {}'.format(math.pi, math.e, math.tau)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n" ] } ], "source": [ "# Use names as field names; because format() receives keyword arguments, the order of the arguments does not matter\n", "print('Math constants: π = {Pi}, e = {e}, τ = {Tau}'.format(e=math.e, Pi=math.pi, Tau=math.tau))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793, τ = 6.283185307179586, τ is simply 2π = 2 * 3.141592653589793\n" ] } ], "source": [ "# A field can be used more than once\n", "print('Math constants: π = {Pi}, τ = {Tau}, τ is simply 2π = 2 * 
{Pi}'.format(Pi=math.pi, Tau=math.tau))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: 3.141592653589793, 2.718281828459045, 6.283185307179586\n" ] } ], "source": [ "m = [math.pi, math.e, math.tau]\n", "# Unpacking a list with * fills the fields in positional order\n", "print('Math constants: {}, {}, {}'.format(*m))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Math constants: π = 3.141592653589793, τ = 6.283185307179586, e = 2.718281828459045\n", "Math constants: pi, e, tau\n" ] } ], "source": [ "n = {'pi': math.pi, 'e': math.e, 'tau': math.tau}\n", "# Unpacking a dict with ** fills each field with the value of the matching key\n", "print('Math constants: π = {pi}, τ = {tau}, e = {e}'.format(**n))\n", "# Unpacking a dict with * passes only the keys\n", "print('Math constants: {}, {}, {}'.format(*n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § `format`: specifying width, alignment, and fill" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "left aligned------------------\n" ] } ], "source": [ "# Left-aligned, padded with '-'\n", "print('{:-<30}'.format('left aligned'))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "                 right aligned\n" ] } ], "source": [ "# Right-aligned\n", "print('{:>30}'.format('right aligned'))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "***********centered***********\n" ] } ], "source": [ "# Centered, padded with '*'\n", "print('{:*^30}'.format('centered'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § `format`: specifying number format and precision" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int: 42; hex: 2a; oct: 52; bin: 101010\n" ] } ], "source": [ "# Without base prefixes\n", 
"print('int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}'.format(42))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010\n" ] } ], "source": [ "# With base prefixes\n", "print('int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: {0:#b}'.format(42))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "constant π = 3.1400000000000000000000000000\n" ] } ], "source": [ "# Change the decimal precision (left-aligned, so the '0' fill pads on the right)\n", "print('constant π = {0:<030.2f}'.format(math.pi))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "constant π = 00000000000000000000003.14e+00\n" ] } ], "source": [ "# Change the number display format (scientific notation, right-aligned with '0' fill)\n", "print('constant π = {0:>030.2e}'.format(math.pi))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "π% = 314.159%\n" ] } ], "source": [ "# Display as a percentage\n", "print('π% = {:.3%}'.format(math.pi))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 2 }