{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<p style=\"text-align:center\">\n", " <a href=\"https://nbviewer.jupyter.org/github/twMr7/Python-Machine-Learning/blob/master/05-List_Operations.ipynb\">\n", " Open In Jupyter nbviewer\n", " <img style=\"float: center;\" src=\"https://nbviewer.jupyter.org/static/img/nav_logo.svg\" width=\"120\" />\n", " </a>\n", "</p>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Open In Colab](https://colab.research.google.com/github/twMr7/Python-Machine-Learning/blob/master/05-List_Operations.ipynb)\n", "\n", "# 5. `list` Sequence Container Operations\n", "\n", "A `list` is a structure for storing sequential data. Its syntax separates the elements with commas `,` and encloses all of them in a pair of square brackets `[` `]`. A `list` can be nested to multiple dimensions and can hold elements of different types, although in typical use it works best with elements of the same type.\n", "\n", "Elements are accessed by an index that follows their storage order, using the syntax **`[ index ]`**. Counting from front to back, **the first element has index 0** and the index increases by one for each following element; counting from back to front, **the last element has index -1** and the index decreases by one for each preceding element. \n", "\n", "| List example | Description |\n", "|------------------------------------|-----------------------------------------------|\n", "| `[]` | An empty list |\n", "| `[5, 6, 7, 8]` | A list of four numeric elements |\n", "| `['code', [42, 3.1415], 1.23, {}]` | A nested, heterogeneous list |\n", "\n", "- The built-in function `list()` constructs a new `list` object.\n", "- The built-in function `len()` returns the number of elements in a container.\n", "- The built-in function `min()` returns the smallest element in a container.\n", "- The built-in function `max()` returns the largest element in a container.\n", "\n", "`list` is a sequence container, so it supports the operations common to all sequences (see [Common Sequence Operations](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations) in the official documentation). In addition, the elements of a `list` **can** be changed in place after it is created (it is mutable), and it supports the in-place operations described under [Mutable Sequence Types](https://docs.python.org/3/library/stdtypes.html#typesseq-mutable) in the official documentation. Some commonly used ones are:\n", "- `append()` appends one element to the end of the container.\n", "- `extend()` appends a series of elements to the end of the container.\n", "- `del L[m:n]` deletes the elements in the range; same as `L[m:n] = []`.\n", "- `copy()` makes a shallow copy; same as `L[:]`.\n", "- `clear()` removes all elements; same as `del L[:]`.\n", "- `insert()` inserts an element at a given position.\n", "- `remove()` removes the first occurrence of the specified value.\n", "- `pop()` returns the element at a given position (the last one by default) and removes it from the container.\n", "- `sort()` sorts the elements in place.\n", "- `reverse()` reverses the order of the elements in place.\n" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "### § `list` Is a Sequence Container That Can Be Changed In Place\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [123, [4, 56], 'One-Two-Three', 7.89].\n" ] } ], "source": [ "L = [123, [4, 56], 'One-Two-Three', 7.89]\n", "print('L = {}.'.format(L))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [135, [4, 56], 'One-Two-Three', 7.89], 1st element is changed.\n" ] } ], "source": [ "# a += b is equivalent to a = a + b\n", "L[0] += 12\n", "print('L = {}, 1st element is changed.'.format(L))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [135, [4, 22], 'One-Two-Three', 7.89], 2nd element is changed.\n" ] } ], "source": [ "# a -= b is equivalent to a = a - b\n", "L[1][1] -= 34\n", "print('L = {}, 2nd element is changed.'.format(L))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 7.89], 3rd element is changed.\n" ] } ], "source": [ "# a *= b is equivalent to a = a * b\n", "L[2] *= 2\n", "print('L = {}, 3rd element is changed.'.format(L))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 14.089285714285712], 4th element is changed.\n" ] } ], "source": [ "# a /= b is equivalent to a = a / b\n", "L[3] /= 0.56\n", "print('L = {}, 4th element is changed.'.format(L))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § Slices Can Also Be Changed In Place" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 7 elements.\n" ] } ], "source": [ "letters = ['a', 'b', 'c', 'd', 'e', 'f', 
'g']\n", "print('letters = {}, {} elements.'.format(letters, len(letters)))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "letters = ['a', 'b', 99, 100, 101, 'f', 'g'], 7 elements.\n" ] } ], "source": [ "# Replace the elements at indices 2 to 4 with new values, one for one\n", "letters[2:5] = [ord('c'), ord('d'), ord('e')]\n", "print('letters = {}, {} elements.'.format(letters, len(letters)))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "letters = ['a', 'b', 'X', 'f', 'g'], 5 elements.\n" ] } ], "source": [ "# Replace the elements at indices 2 to 4 with a single new value (the string 'X' is iterated and yields one element)\n", "letters[2:5] = 'X'\n", "print('letters = {}, {} elements.'.format(letters, len(letters)))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "letters = ['a', 'b'], 2 elements.\n" ] } ], "source": [ "# Delete the elements at indices 2 to 4\n", "letters[2:5] = []\n", "print('letters = {}, {} elements.'.format(letters, len(letters)))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "letters = [], 0 elements.\n" ] } ], "source": [ "# Assigning an empty [] to the whole slice clears the list\n", "letters[:] = []\n", "print('letters = {}, {} elements.'.format(letters, len(letters)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § A Single Index Must Stay Within Range, but Slice Bounds, for Reading or Writing, May Exceed It" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m<ipython-input-11-27568a3683fb>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# Reading an out-of-range index raises 
IndexError\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mL\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [ "# Reading an out-of-range index raises IndexError\n", "print(L[4])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 14.089285714285712].\n", "L reversed = [14.089285714285712, 'One-Two-ThreeOne-Two-Three', [4, 22], 135].\n" ] } ], "source": [ "# Slice bounds beyond the range, however, are silently clamped\n", "print('L = {}.'.format(L[:10]))\n", "print('L reversed = {}.'.format(L[-1:-9:-1]))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L extended = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 14.089285714285712, 0, 1], now length = 6.\n" ] } ], "source": [ "# Assigning a list to the empty slice just past the last index appends its elements\n", "L[len(L):] = list(range(2))\n", "print('L extended = {}, now length = {}.'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L extended = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 14.089285714285712, 0, 1, 3, 4], now length = 8.\n" ] } ], "source": [ "# A slice start beyond the end is clamped to len(L) as well, so the elements are still appended\n", "L[len(L) + 2:] = [3, 4]\n", "print('L extended = {}, now length = {}.'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L extended = [135, [4, 22], 'One-Two-ThreeOne-Two-Three', 14.089285714285712, 0, 1, 2, 3, 4, 5], now length = 10.\n" ] } ], "source": [ "# Assigning more elements than the slice holds: the existing ones are overwritten and the extra ones are added\n", 
"L[-2:] = [2, 3, 4, 5]\n", "print('L extended = {}, now length = {}.'.format(L, len(L)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### § Operating on a `list` Through Its Methods\n", "For in-place changes such as adding, deleting, and inserting elements, prefer the methods that `list` objects provide, because they make the code more readable." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [], length = 0\n" ] } ], "source": [ "# Remove all elements\n", "L.clear()\n", "print('L = {}, length = {}'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [0, 1, 2], length = 3\n" ] } ], "source": [ "# Append a series of elements\n", "L.extend(range(3))\n", "print('L = {}, length = {}'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [0, 1, 2, [0, 1, 2]], length = 4\n" ] } ], "source": [ "# Append a single element; note how this differs from extend()\n", "L.append(list(range(3)))\n", "print('L = {}, length = {}'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L = [0, 1, 2], length = 3\n" ] } ], "source": [ "# Delete the last element\n", "del L[-1:]\n", "print('L = {}, length = {}'.format(L, len(L)))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "L2 = [[0, 1, 2], [0, 1, 2], [0, 1, 2]],\n", "Are (L2[0], L2[1], L2[2]) the same reference as L? (True, True, True)\n" ] } ], "source": [ "# Build a new list that repeats L three times; note that all three entries refer to the same object\n", "L2 = [L] * 3\n", "print('\\nL2 = {},\\nAre (L2[0], L2[1], L2[2]) the same reference as L? 
({}, {}, {})'\n", " .format(L2, L2[0] is L, L2[1] is L, L2[2] is L))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "L2copy = [[0, 1, 2], [0, 1, 2], [0, 1, 2]],\n", "Is L2copy the same reference as L2? (False),\n", "Is L2copy[0] the same reference as L? (True)\n" ] } ], "source": [ "# copy() makes what is known as a shallow copy\n", "L2copy = L2.copy()\n", "print('\\nL2copy = {},\\nIs L2copy the same reference as L2? ({}),\\nIs L2copy[0] the same reference as L? ({})'\n", " .format(L2copy, L2copy is L2, L2copy[0] is L))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "L = [0, 1, 0.5, 2, 1.5]\n", "L2 = [[0, 1, 0.5, 2, 1.5], [0, 1, 0.5, 2, 1.5], [0, 1, 0.5, 2, 1.5]]\n", "L2copy = [[0, 1, 0.5, 2, 1.5], [0, 1, 0.5, 2, 1.5], [0, 1, 0.5, 2, 1.5]]\n", "\n" ] } ], "source": [ "# Because they all refer to the same object, a change made through one shows up in the others\n", "L.insert(2, 0.5)\n", "L.append(1.5)\n", "print('\\nL = {}\\nL2 = {}\\nL2copy = {}\\n'.format(L, L2, L2copy))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list provides the method `sort()` for sorting in place. Python also has the built-in function `sorted()`, which does not sort in place but returns a new list, and which works on any iterable object." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L sorted in ascending order = [0, 0.5, 1, 1.5, 2]\n" ] } ], "source": [ "# Sort the elements\n", "L.sort()\n", "print('L sorted in ascending order = {}'.format(L))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L sorted in descending order = [2, 1.5, 1, 0.5, 0]\n" ] } ], "source": [ "L.sort(reverse=True)\n", "print('L sorted in descending order = {}'.format(L))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Contents of L sorted in ascending order, new list = [0, 0.5, 1, 1.5, 2]; the original L = [2, 1.5, 1, 0.5, 0] is unchanged\n" ] } ], "source": [ "# The built-in function sorted returns a new list object; it is not an in-place 
sort.\n", "print('Contents of L sorted in ascending order, new list = {}; the original L = {} is unchanged'.format(sorted(L), L))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1 List Comprehension\n", "\n", "A list comprehension performs an operation over a sequence container or iterable object S and builds a new `list` object from the results.\n", "\n", "| Operation on members | Description |\n", "|-----------------------------------|-------------------------------------------------------------------|\n", "| `[expression for x in S]` | Evaluates the expression for every member x of S and collects the results into a new list object |\n", "| `[expression for x in S if condition]` | Evaluates the expression only for the members x that ***satisfy the condition*** and collects the results into a new list object |\n", "\n", "The list comprehension syntax can be combined into quite rich conditional iterations:\n", "```\n", "[expression for x1 in S1 if condition1\n", " for x2 in S2 if condition2 ...\n", " for xN in SN if conditionN]\n", "```" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[11, 12, 13, 14, 15]\n", "[21, 22, 23, 24, 25]\n" ] } ], "source": [ "# Use a for loop to modify the members of a list in place\n", "L = [1, 2, 3, 4, 5]\n", "for i in range(len(L)):\n", " L[i] += 10\n", "print(L)\n", "\n", "# Use a list comprehension to build a new list object\n", "L = [x + 10 for x in L]\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 4, 8, 12, 16]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Process only the members selected by a condition\n", "L = list(range(10))\n", "Lnew = [n*2 for n in L if n % 2 == 0]\n", "Lnew" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5)]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Nested loops plus a condition\n", "L1 = list(range(1, 4))\n", "L2 = list(range(3, 6))\n", "\n", "# The ordinary nested-loop version is much longer\n", "#Lnew = []\n", "#for x in L1:\n", "# for y in L2:\n", "# if x != y:\n", "# Lnew.append((x, y))\n", "\n", "# The same nested loops written as a list comprehension\n", "Lnew = [(x, y) for x in L1 for y in L2 if x != y]\n", "Lnew" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "### § Comparison with Other Approaches\n", "\n", "+ A list comprehension and a generator expression have almost the same syntax apart from the brackets. A generator expression, however, only returns an iterator and does not build every member in memory before iterating, so it saves memory on large computations.\n", "+ The built-in `map(function, iterable)` returns a lazy iterator over `iterable` (the second argument); each step yields the result of applying `function` (the first argument) to the next member.\n", "+ The built-in `filter(function, iterable)` likewise returns a lazy iterator over `iterable`; it yields only the members for which `function` returns a true value.\n", "+ `reduce(function, iterable[, initializer])` in the `functools` module cumulatively applies a two-argument function (or binary operator) to the members of the sequence until the sequence is reduced to a single value.\n", "\n", "Because `map`, `filter`, and `reduce` all work by calling a function, they often need an extra `lambda` function to be defined as a helper.\n", "\n", "As a rule of thumb, `map` and `filter` usually run faster than an explicit `for` loop, and a list comprehension is usually faster than `map` or `filter` when those need a `lambda`. With a built-in function such as `float`, `map` can be just as fast or faster, so measure when performance matters." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[31.0, 0.3, 52.0, 46.5, 12.0, 94.7]\n", "[31.0, 0.3, 52.0, 46.5, 12.0, 94.7]\n", "[31.0, 0.3, 52.0, 46.5, 12.0, 94.7]\n" ] } ], "source": [ "L = ['31', '0.3', '52', '46.5', '12', '94.7']\n", "\n", "# Use a list comprehension to convert the list of strings to a list of numbers\n", "print([float(s) for s in L])\n", "\n", "# Use a generator expression to convert the list of strings to a list of numbers\n", "print(list(float(s) for s in L))\n", "\n", "# Use map() to convert the list of strings to a list of numbers\n", "print(list(map(float, L)))\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[31, 52, 12]\n", "[31, 52, 12]\n", "[31, 52, 12]\n" ] } ], "source": [ "# Use a list comprehension to convert only the decimal-integer strings in the list to numbers\n", "print([int(s) for s in L if s.isdecimal()])\n", "\n", "# Use a generator expression to convert the decimal-integer strings in the list to numbers\n", "print(list(int(s) for s in L if s.isdecimal()))\n", "\n", "# Use map() + filter() to convert the decimal-integer strings in the list to numbers\n", "print(list(map(int, filter(str.isdecimal, L))))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "236.5\n", "236.5\n", "236.5\n", "236.5\n" ] } ], "source": [ "# A list comprehension needs a separate function to get the same effect as reduce\n", "print(sum([float(s) for s in L]))\n", "\n", "# A generator expression likewise needs a separate function to get the same effect as reduce\n", 
"print(sum(float(s) for s in L))\n", "\n", "# Use functools.reduce() to sum the numeric strings in the list\n", "import operator, functools\n", "print(functools.reduce(operator.add, [float(s) for s in L]))\n", "\n", "# Replace operator.add with a custom lambda\n", "print(functools.reduce((lambda x, y: x + y), (float(s) for s in L)))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 2 }