{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/twMr7/Python-Machine-Learning/blob/master/04-String_Operations.ipynb)\n",
    "\n",
    "\n",
    "# 4. `str` 字串操作\n",
    "\n",
    "`str` 字串是以字元為元素的序列資料結構,字串的語法使用單引號或雙引號包起來,一樣的引號要成對使用。\n",
    "\n",
    "元素內容是按照儲存順序的 index 存取,語法為 **`[ index ]`**。 如果按照由前往後的順序,**第一個元素 index 是0**,依次往後遞增; 如果反過來由後往前存取,**最後一個元素 index 可以用-1**,依次向前遞減。\n",
    "\n",
    "| 字串範例                     | 說明                                          |\n",
    "|------------------------------|-----------------------------------------------|\n",
    "| `''`                         | 空字串                                        |\n",
    "| `\"Python's\"`, `'Python\"s'`   | 字串用單引號或雙引號包起來                    |\n",
    "| `'Python\\'s\\n'`              | 特殊字元前面要加反斜線 `\\`                    |\n",
    "| `r'c:\\Users\\name'`           | 引號前置碼`r`保留字串呈現的原貌(raw string) |\n",
    "| `\"This\" \"is\" \"concatenated\"` | 相鄰的字串會自動被串接起來                    |\n",
    "\n",
    "- 內建函式 `str()` 用來建構新字串,或將物件轉成字串。\n",
    "- 內建函式 `hex()`、`bin()` 分別可以用來將數值轉換成十六進位及二進位的數字字串。\n",
    "- 內建函式 `chr()` 可以用來將字碼轉換成對應的字元,是 `ord()` 的逆向操作。\n",
    "- 內建函式 `len()` 可以用來回傳字串的長度。\n",
    "- 內建函式 `print()` 可以用來將字串內容輸出至畫面。\n",
    "\n",
    "字串類別本身有很多專屬的方法,請參閱官方文件 [Text Sequence Type — str](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)。以下僅列出常用的幾個:\n",
    "- `format()` 用來格式化字串。\n",
    "- `count()` 計算某個子字串出現幾次。\n",
    "- `find()` 找出子字串出現的位置。\n",
    "- `lower()` 返回全部是小寫的複本。\n",
    "- `upper()` 返回全部是大寫的複本。\n",
    "- `replace()` 將子字串全部替換成指定的字串。\n",
    "- `split()` 將字串按照指定的分隔符號切割。\n",
    "- `strip()` 移除字串前後的空白字元。\n",
    "\n",
    "字串也是序列容器,所以一般 immutable 序列容器的共同方法也都可以用,參閱官方文件 [Common Sequence Operations](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations)。\n"
   ]
  },
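  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The indexing rules and built-in functions listed above can be tried out in a few lines. This is only a minimal sketch; the names `word`, `first`, `last`, and `code` are made up for illustration.\n",
    "\n",
    "```python\n",
    "# Indexing: 0 is the first character, -1 is the last\n",
    "word = 'Python'\n",
    "first = word[0]\n",
    "last = word[-1]\n",
    "print(first, last, len(word))   # P n 6\n",
    "\n",
    "# ord() and chr() are inverses of each other\n",
    "code = ord(first)\n",
    "print(code, chr(code))          # 80 P\n",
    "\n",
    "# hex() and bin() return strings that carry a base prefix\n",
    "print(hex(255), bin(5))         # 0xff 0b101\n",
    "\n",
    "# A raw string keeps the backslash as an ordinary character\n",
    "print(len('a\\nb'), len(r'a\\nb'))   # 3 4\n",
    "```"
   ]
  },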
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 字串是不能就地變更的序列容器。\n",
    "\n",
    "字串在建立後,元素內容**不可以**就地變更(immutable)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-18ce7d0609eb>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# 對字串元素指定新值會造成錯誤\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[0ms\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'Pithon'\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0ms\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;34m'y'\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m: 'str' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "# 對字串元素指定新值會造成錯誤\n",
    "s = 'Pithon'\n",
    "s[1] = 'y'"
   ]
  },
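  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because a string cannot be changed in place, string methods hand back a new string and leave the original alone. The following is a small sketch of that behavior; the names `original`, `fixed`, and `shouted` are made up for illustration.\n",
    "\n",
    "```python\n",
    "original = 'Pithon'\n",
    "\n",
    "# replace() returns a new string; the third argument limits it to one replacement\n",
    "fixed = original.replace('i', 'y', 1)\n",
    "print(original)   # Pithon\n",
    "print(fixed)      # Python\n",
    "\n",
    "# upper() also returns a copy, so the original is still unchanged\n",
    "shouted = original.upper()\n",
    "print(original, shouted)   # Pithon PITHON\n",
    "```"
   ]
  },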
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 透過切割、重複複製、及串接的操作可以產生新字串。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python\n"
     ]
    }
   ],
   "source": [
    "# 使用 [from:to:interval] 的語法切割片段範圍\n",
    "# 使用加號 '+' 串接字串\n",
    "s = s[0] + 'y' + s[2:]\n",
    "print(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python Python Python rocks!\n"
     ]
    }
   ],
   "source": [
    "# 使用乘號 '*' 重複複製序列元素\n",
    "s = 3 * (s + ' ') + 'rocks!'\n",
    "print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 讀取某位置的索引值不能超過範圍,但片段的指定可以容許超過範圍。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "length of string is 27\n"
     ]
    },
    {
     "ename": "IndexError",
     "evalue": "string index out of range",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mIndexError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-4-b72f16fd5655>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'length of string is '\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mstr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0ms\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[1;31m# 超出索引值範圍會造成錯誤\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[0ms\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m60\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mIndexError\u001b[0m: string index out of range"
     ]
    }
   ],
   "source": [
    "print('length of string is ' + str(len(s)))\n",
    "# 超出索引值範圍會造成錯誤\n",
    "s[60]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'rocks!'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 指定的片段不存在返回空字串\n",
    "s[21:60]"
   ]
  },
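  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few more slices may help make the `[start:stop:step]` rules concrete, including what happens when the bounds run past the end. This is a sketch on a made-up string `text`; all variable names are illustrative.\n",
    "\n",
    "```python\n",
    "text = 'Python rocks!'\n",
    "\n",
    "head = text[:6]          # start defaults to 0\n",
    "tail = text[-6:]         # negative indices count from the end\n",
    "every_other = text[::2]  # a step of 2 takes every second character\n",
    "backwards = text[::-1]   # a step of -1 gives a reversed copy\n",
    "clipped = text[7:100]    # stop is clipped to len(text)\n",
    "empty = text[50:60]      # a slice entirely out of range is just empty\n",
    "\n",
    "print(head, tail)        # Python rocks!\n",
    "print(every_other)       # Pto ok!\n",
    "print(backwards)         # !skcor nohtyP\n",
    "print(clipped, repr(empty))   # rocks! ''\n",
    "```"
   ]
  },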
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 字串類別本身有許多好用的方法,可以用來操作子字串"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 3 spaces\n"
     ]
    }
   ],
   "source": [
    "# 數一數字串裡有幾個空格字元\n",
    "print('There are ' + str(s.count(' ')) + ' spaces')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "New comma separated string is: Python,Python,Python,rocks!\n"
     ]
    }
   ],
   "source": [
    "# 將空格字元用逗號取代\n",
    "s = s.replace(' ', ',')\n",
    "print('New comma separated string is: ' + s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Python', 'Python', 'Python', 'rocks!']\n"
     ]
    }
   ],
   "source": [
    "# 將逗號分隔的片段切割成子串\n",
    "print(s.split(sep = ','))"
   ]
  },
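  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The remaining methods from the list at the top (`strip()`, `lower()`, `upper()`, and `find()`) work the same way. The snippet below is a minimal sketch on a made-up string `line`; the other variable names are illustrative as well.\n",
    "\n",
    "```python\n",
    "line = '  Hello, Python!  '\n",
    "\n",
    "# strip() removes leading and trailing whitespace\n",
    "clean = line.strip()\n",
    "print(repr(clean))     # 'Hello, Python!'\n",
    "\n",
    "# lower() and upper() return case-converted copies\n",
    "lowered = clean.lower()\n",
    "raised = clean.upper()\n",
    "print(lowered)         # hello, python!\n",
    "print(raised)          # HELLO, PYTHON!\n",
    "\n",
    "# find() returns the index of the first match, or -1 when there is none\n",
    "found = clean.find('Python')\n",
    "missing = clean.find('Java')\n",
    "print(found, missing)  # 7 -1\n",
    "```"
   ]
  },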
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 用 `format()` 格式化字串\n",
    "使用大括號 `{` `}` 保留需要格式化字串的位置,裡面可以指定欄位名稱及格式規則。(註: 以下格式是簡化過的版本,完整的定義請參閱官方文件 [6.1.3. Format String Syntax](https://docs.python.org/3/library/string.html#format-string-syntax))\n",
    "<p>\n",
    "    <center>格式化語法: <b>`{` `欄位名稱` `:格式規則` `}`</b></center>\n",
    "</p>\n",
    "\n",
    "- `欄位名稱`:可以用數字或文字命名。欄位名字所在的位置,會被指定用某個格式化後的文字取代。\n",
    "- `格式規則`:指定用來從原物件資料轉成字串的規則,如欄位寬、小數點位數、左右對齊等。格式的規則按照以下語法順序指定,中括號不是語法的一部份,只是用來區隔每一種可有可無的格式選項:\n",
    "\n",
    "<p>\n",
    "    <center>格式規則順序: [[`空白填補字元`]`對齊`] [`#`] [`0`] [`寬度`] [`.小數點精度`] [`格式類型`]</center>\n",
    "</p>\n",
    "\n",
    "- `對齊`\n",
    "\n",
    "| 對齊選項  |  說明     |\n",
    "|-----------|-----------|\n",
    "| **`<`**   | 靠左對齊  |\n",
    "| **`>`**   | 靠右對齊  |\n",
    "| **`^`**   | 置中      |\n",
    "\n",
    "- `#`:指定顯示數值進制的前綴符號。\n",
    "- `0`:針對數值格式,指定在空白處補 `'0'`。\n",
    "- `寬度`:指定欄位寬。\n",
    "- `.小數點精度`:指定小數點後面幾位數\n",
    "- `格式類型`\n",
    "\n",
    "| 格式類型  |  說明     |\n",
    "|-----------|-----------|\n",
    "| **`s`**   | 字串      |\n",
    "| **`c`**   | 字元      |\n",
    "| **`b`**   | 二進位    |\n",
    "| **`d`**   | 十進制整數    |\n",
    "| **`x`**   | 小寫字母十六進制整數  |\n",
    "| **`X`**   | 大寫字母十六進制整數  |\n",
    "| **`f`**   | 浮點數,預設小數點以下6位  |\n",
    "| **`e`**   | 小寫字母科學記號  |\n",
    "| **`E`**   | 大寫字母科學記號  |\n",
    "| **`%`**   | 將浮點數轉換為百分比  |"
   ]
  },
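  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how the pieces of a format specification combine, here is a short sketch that exercises several of the options and types from the tables above. The values and variable names are made up for illustration.\n",
    "\n",
    "```python\n",
    "letter = '{:c}'.format(65)              # 'c' turns a code point into a character\n",
    "prefixed = '{:#X}'.format(255)          # '#' adds the base prefix\n",
    "padded = '{:08b}'.format(5)             # '0' pads a width of 8 with zeros\n",
    "fixed_pt = '{:>10.3f}'.format(3.14159)  # right-aligned, width 10, 3 decimals\n",
    "sci = '{:.2E}'.format(12345.678)        # scientific notation, uppercase E\n",
    "\n",
    "print(letter)          # A\n",
    "print(prefixed)        # 0XFF\n",
    "print(padded)          # 00000101\n",
    "print(repr(fixed_pt))  # '     3.142'\n",
    "print(sci)             # 1.23E+04\n",
    "```"
   ]
  },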
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § `format` 指定欄位名稱"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "# 用數字指定欄位名字,數字對應的是format()裡的參數順序\n",
    "print('Math constants: π = {0}, e = {1}, τ = {2}'.format(math.pi, math.e, math.tau))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793,\n",
      " τ = 6.283185307179586, \tτ 其實就是 2π = 2 * 3.141592653589793\n"
     ]
    }
   ],
   "source": [
    "# 欄位可以重複指定使用\n",
    "print('Math constants: π = {0},\\n τ = {1}, \\tτ 其實就是 2π = 2 * {0}'.format(math.pi, math.tau))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n"
     ]
    }
   ],
   "source": [
    "# 數字欄位的名字可以不用照順序\n",
    "print('Math constants: π = {2}, e = {0}, τ = {1}'.format(math.e, math.tau, math.pi))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 如果全部欄位都按照順序,一個蘿蔔一個坑,那欄位裡的數字可以省略\n",
    "'Math constants: π = {}, e = {}, τ = {}'.format(math.pi, math.e, math.tau)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793, e = 2.718281828459045, τ = 6.283185307179586\n"
     ]
    }
   ],
   "source": [
    "# 用文字指定欄位名字,而因為format()裡也明確指定參數的名字,所以參數列出來的順序就不重要了\n",
    "print('Math constants: π = {Pi}, e = {e}, τ = {Tau}'.format(e=math.e, Pi=math.pi, Tau=math.tau))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793, τ = 6.283185307179586, τ 其實就是 2π = 2 * 3.141592653589793\n"
     ]
    }
   ],
   "source": [
    "# 欄位可以重複指定使用\n",
    "print('Math constants: π = {Pi}, τ = {Tau}, τ 其實就是 2π = 2 * {Pi}'.format(Pi=math.pi, Tau=math.tau))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: 3.141592653589793, 2.718281828459045, 6.283185307179586\n"
     ]
    }
   ],
   "source": [
    "m = [math.pi, math.e, math.tau]\n",
    "# list 卸載會塞到對應順序的欄位\n",
    "print('Math constants: {}, {}, {}'.format(*m))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Math constants: π = 3.141592653589793, τ = 6.283185307179586, e = 2.718281828459045\n",
      "Math constants: pi, e, tau\n"
     ]
    }
   ],
   "source": [
    "n = {'pi': math.pi, 'e': math.e, 'tau': math.tau}\n",
    "# dict 卸載會塞value到對應key的欄位\n",
    "print('Math constants: π = {pi}, τ = {tau}, e = {e}'.format(**n))\n",
    "# 或是只卸載 key\n",
    "print('Math constants: {}, {}, {}'.format(*n))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § `format` 指定寬度、對齊與空白填補"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "left aligned------------------\n"
     ]
    }
   ],
   "source": [
    "# 靠左,空白處補 '-'\n",
    "print('{:-<30}'.format('left aligned'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                 right aligned\n"
     ]
    }
   ],
   "source": [
    "# 靠右\n",
    "print('{:>30}'.format('right aligned'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "***********centered***********\n"
     ]
    }
   ],
   "source": [
    "# 置中,空白處補 '*'\n",
    "print('{:*^30}'.format('centered'))"
   ]
  },
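  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Width and alignment are most useful when several lines have to line up in columns. The following is a minimal sketch that prints a small table; the data in `rows` and the column widths are arbitrary choices made for illustration.\n",
    "\n",
    "```python\n",
    "rows = [('pi', 3.14159), ('e', 2.71828), ('tau', 6.28319)]\n",
    "\n",
    "# Names are left-aligned in 6 characters, values right-aligned in 8\n",
    "header = '{:<6}|{:>8}'.format('name', 'value')\n",
    "lines = [header, '-' * len(header)]\n",
    "for name, value in rows:\n",
    "    lines.append('{:<6}|{:>8.3f}'.format(name, value))\n",
    "\n",
    "print('\\n'.join(lines))\n",
    "```\n",
    "\n",
    "Every line comes out with the same length, so the `|` separators fall in the same column."
   ]
  },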
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § `format` 指定數值格式與小數精度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int: 42;  hex: 2a;  oct: 52;  bin: 101010\n"
     ]
    }
   ],
   "source": [
    "# 沒有進制前綴符號\n",
    "print('int: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}'.format(42))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int: 42;  hex: 0x2a;  oct: 0o52;  bin: 0b101010\n"
     ]
    }
   ],
   "source": [
    "# 加上進制前綴符號\n",
    "print('int: {0:d};  hex: {0:#x};  oct: {0:#o};  bin: {0:#b}'.format(42))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "constant π = 3.1400000000000000000000000000\n"
     ]
    }
   ],
   "source": [
    "# 改變小數精度\n",
    "print('constant π = {0:<030.2f}'.format(math.pi))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "constant π = 00000000000000000000003.14e+00\n"
     ]
    }
   ],
   "source": [
    "# 改變數值顯示格式\n",
    "print('constant π = {0:>030.2e}'.format(math.pi))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "π% = 314.159%\n"
     ]
    }
   ],
   "source": [
    "# 用百分比顯示\n",
    "print('π% = {:.3%}'.format(math.pi))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}