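{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<p style=\"text-align:center\">\n", " <a href=\"https://nbviewer.jupyter.org/github/twMr7/Python-Machine-Learning/blob/master/08-File_Operations.ipynb\">\n", " Open In Jupyter nbviewer\n", " <img style=\"float: center;\" src=\"https://nbviewer.jupyter.org/static/img/nav_logo.svg\" width=\"120\" />\n", " </a>\n", "</p>" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "[Open In Colab](https://colab.research.google.com/github/twMr7/Python-Machine-Learning/blob/master/08-File_Operations.ipynb)\n", "\n", "# 8. File Operations\n", "\n", "The Python built-in function `open` creates a file object, which opens the file in **text mode** by default. All file operations go through the methods of this file object; the basic ones are listed in the table below.\n", "\n", "| File operation | Description |\n", "|-------------------------------------|-----------------------------------------------|\n", "| `file = open('app.log', 'w')` | Open a file object for writing, truncating any existing content |\n", "| `file = open('app.log', 'a')` | Open a file object for writing, starting at the end of the file |\n", "| `file.write(aString)` | Write a string to the file |\n", "| `file.writelines(aList)` | Write the strings in a list to the file (no newlines are added) |\n", "| `file = open(r'C:\\data.csv', 'r')` | Open a text file object for reading |\n", "| `aString = file.read()` | Read the whole file into one string |\n", "| `aString = file.read(N)` | Read up to the next N characters into a string (N bytes in binary mode) |\n", "| `aString = file.readline()` | Read the next line (including the '\\n' character) into a string |\n", "| `aList = file.readlines()` | Read the whole file into a list of lines (each including the '\\n' character) |\n", "| `file = open('music.bin', 'w+b')` | Open a binary file object for reading and writing, truncated to 0 bytes |\n", "| `file = open('music.bin', 'r+b')` | Open a binary file object for reading and writing, without truncating |\n", "| `file.close()` | Close the file object manually |\n", "| `file.flush()` | Write the data held in the buffer out to the underlying file |\n", "\n", "+ In **text mode**, file content is decoded, by default with the platform's default encoding, and returned as a `str` object. In **binary mode** no decoding takes place, and the content is returned directly as a `bytes` object.\n", "+ A file object opened for reading can be used as an iterable that yields one line per iteration.\n", "+ Calling `close()` manually is often not strictly necessary, because Python's garbage collection closes file objects that are no longer referenced. Even so, explicitly calling `close()` once you are done with a file is a good habit.\n", "+ When accessing file resources, use a *context manager* or *try-except-finally* so that the file is reliably released." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### § Reading and Writing Text Files\n", "\n", "If no `encoding` argument is given when reading or writing a file, the operating system's default encoding (such as `big5`) is used. Specifying `utf-8` explicitly is recommended so that files can be exchanged with others easily." ] }, 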
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wrote 13 characters\n", "Wrote 6 characters\n" ] } ], "source": [ "# Write a new file\n", "outfile = open('hello.txt', 'w', encoding='utf-8')\n", "# write() returns the number of characters written, not bytes\n", "nchars = outfile.write('Hello Python\\n')\n", "print('Wrote', nchars, 'characters')\n", "nchars = outfile.write('你好,拍神\\n')\n", "print('Wrote', nchars, 'characters')\n", "outfile.close()" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Python\n", "你好,拍神\n", "\n" ] } ], "source": [ "# Open the file just written and read its content; mode defaults to 'rt' and can be omitted\n", "hello_string = open('hello.txt', encoding='utf-8').read()\n", "#hello_string\n", "print(hello_string)" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Python\n", "你好,拍神\n" ] } ], "source": [ "# Looping over a file with for reads one line per iteration\n", "for line in open('hello.txt', encoding='utf-8'):\n", "    print(line, end='')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### § File Operations - Using a Context Manager\n", "\n", "Using a context manager on a file object can replace try/finally exception handling. Once the file is opened and the `with` block is entered, the file is guaranteed to be closed automatically on leaving the `with` block, whether or not an exception occurs inside it.\n", "\n", "```\n", "with expression [as variable]:\n", "    statements\n", "```\n", "\n", "Nested context managers can be written as\n", "```\n", "with exprA [as varA], exprB [as varB]:\n", "    statements\n", "```\n" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Python\n", "你好,拍神\n" ] } ], "source": [ "# The same program written with try-except-finally\n", "# fin = open('hello.txt', encoding='utf-8')\n", "# try:\n", "#     for line in fin:\n", "#         print(line)\n", "# except:\n", "#     print('something is wrong')\n", "# finally:\n", "#     fin.close()\n", "\n", "# Using with is more readable\n", "with open('hello.txt', encoding='utf-8') as fin:\n", "    for line in fin:\n", "        print(line, end='')" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Use nested context managers to transcode: read a utf-8 file, write a utf-16 file\n", "with open('hello.txt', encoding='utf-8') as fin, open('uhello.txt', 'w', encoding='utf-16') as fout:\n", "    for line in fin:\n", "        fout.write(line)" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Use nested context managers for a simple file comparison\n", "with open('hello.txt', encoding='utf-8') as fu8, open('uhello.txt', encoding='utf-16') as fu16:\n", "    for (linenum, (u8line, u16line)) in enumerate(zip(fu8, fu16)):\n", "        if u8line != u16line:\n", "            print('line #{} differs\\tfile1:\"{}\",\\tfile2:\"{}\"'.format(linenum, u8line[:-1], u16line[:-1]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### § File Operations - Using Comprehensions\n", "\n", "When a file is opened inside a list or dict comprehension, no reference to the temporary file object remains once the expression finishes, so in CPython garbage collection closes it automatically. Other Python implementations may close it later, so prefer `with` when the file must be closed promptly." ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Hello Python', '你好,拍神']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rstrip() removes the trailing newline before each line goes into the list\n", "[line.rstrip() for line in open('hello.txt', encoding='utf-8')]" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Hello Python', '你好,拍神']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Skip lines whose first character is the comment symbol '#'\n", "[line.rstrip() for line in open('uhello.txt', encoding='utf-16') if line[0] != '#']" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 'Hello Python', 1: '你好,拍神'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Record each line in a dict keyed by its line number\n", "{key: line.rstrip() for key, line in enumerate(open('hello.txt', encoding='utf-8'))}" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": 
"python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }