{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"text-align:center\">\n",
    "    <a href=\"https://nbviewer.jupyter.org/github/twMr7/Python-Machine-Learning/blob/master/08-File_Operations.ipynb\">\n",
    "        Open In Jupyter nbviewer\n",
    "        <img style=\"float: center;\" src=\"https://nbviewer.jupyter.org/static/img/nav_logo.svg\" width=\"120\" />\n",
    "    </a>\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/twMr7/Python-Machine-Learning/blob/master/08-File_Operations.ipynb)\n",
    "\n",
    "# 8. 檔案操作 File Operations\n",
    "\n",
    "使用 Python 內建函式 `open` 產生 file 物件，預設開啟為**文字模式**的檔案，檔案的相關操作均透過這個 file 物件的方法，基本操作方法如以下表格。\n",
    "\n",
    "| 檔案操作                            | 說明                                          |\n",
    "|-------------------------------------|-----------------------------------------------|\n",
    "| `file = open('app.log', 'w')`       | 開啟並清空一個可供寫入的檔案物件              |\n",
    "| `file = open('app.log', 'a')`       | 開啟一個檔案物件，從結束位置開始寫入          |\n",
    "| `file.write(aString)`               | 寫入字串到檔案                                |\n",
    "| `file.writelines(aList)`            | 寫入清單裡的數行字串到檔案                    |\n",
    "| `file = open(r'C:\\data.csv', 'r')`  | 開啟一個可供讀取的文字檔案物件                |\n",
    "| `aString = file.read()`             | 讀取整個檔案到一個字串                        |\n",
    "| `aString = file.read(N)`            | 讀取下 N 個 bytes 到字串                      |\n",
    "| `aString = file.readline()`         | 讀取下一行（包含 '\\n' 字元）到字串            |\n",
    "| `aList = file.readlines()`          | 讀取整個檔案為數行（包含 '\\n' 字元）的清單    |\n",
    "| `file = open('music.bin', 'w+b')`   | 開啟一個 binary 檔案物件，清空為 0 byte       |\n",
    "| `file = open('music.bin', 'r+b')`   | 開啟一個 binary 檔案物件，不清空              |\n",
    "| `file.close()`                      | 手動關閉檔案物件                              |\n",
    "| `file.flush()`                      | 把緩衝（buffer）中的資料寫入實體磁碟          |\n",
    "\n",
    "+ 存取**文字模式**的檔案內容時，會先經過系統預設編碼解譯後返回 `str` 字串物件。 **binary 模式**則不會進行解碼，檔案內容直接返回 `bytes` 物件。\n",
    "+ 檔案的讀取可以當成可迭代物件操作\n",
    "+ 手動呼叫 `close()` 關閉檔案通常不是必要的，Python 的資源回收機制會自動關閉開啟的檔案物件。 但是養成開啟檔案後，明確地呼叫 `close()` 是個好習慣。\n",
    "+ 存取檔案資源，建議搭配 *context manager* 或 *try-except-finally* 使用，可以提供有保障的資源存取策略。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 文字檔案讀寫\n",
    "\n",
    "檔案讀寫若未指定編碼（encoding）參數，預設會是作業系統預設編碼（如 `big5`），建議明確指定 `utf-8` 以方便與他人檔案交換。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "寫入 13 個字\n",
      "寫入 6 個字\n"
     ]
    }
   ],
   "source": [
    "# 寫一個新檔案\n",
    "outfile = open('hello.txt', 'w', encoding='utf-8')\n",
    "# write() 的結果返回寫入的字元數量，不是 byte\n",
    "nchars = outfile.write('Hello Python\\n')\n",
    "print('寫入', nchars, '個字')\n",
    "nchars = outfile.write('你好，拍神\\n')\n",
    "print('寫入', nchars, '個字')\n",
    "outfile.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Python\n",
      "你好，拍神\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 開啟剛剛寫入的檔案，讀進內容，mode 參數預設為 'rt'，可以省略\n",
    "hello_string = open('hello.txt', encoding='utf-8').read()\n",
    "#hello_string\n",
    "print(hello_string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Python\n",
      "你好，拍神\n"
     ]
    }
   ],
   "source": [
    "# 對 file 使用 for 迴圈，每次迭代就是讀進一行\n",
    "for line in open('hello.txt', encoding='utf-8'):\n",
    "    print(line, end='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 檔案操作 - 使用 Context Manager\n",
    "\n",
    "使用 Context Manager 在檔案物件上，可代替 try/finally 的例外處理功能，在檔案開啟進入 `with` 區塊後，不論區塊內的運算是否發生例外狀況，保證在離開 `with` 區塊前自動關閉檔案。\n",
    "\n",
    "```\n",
    "with expression [as variable]:\n",
    "    statements\n",
    "```\n",
    "\n",
    "巢狀的 Context Manager 可以寫成\n",
    "```\n",
    "with exprA [as varA], exprB [as varB]:\n",
    "    statements\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello Python\n",
      "你好，拍神\n"
     ]
    }
   ],
   "source": [
    "# 同樣的程式使用 try-except-finally 的寫法\n",
    "# fin = open('hello.txt', encoding='utf-8')\n",
    "# try:\n",
    "#     for line in fin:\n",
    "#         print(line)\n",
    "# except:\n",
    "#     print('something is wrong')\n",
    "# finally:\n",
    "#     fin.close()\n",
    "\n",
    "# 使用 with 的可讀性較佳\n",
    "with open('hello.txt', encoding='utf-8') as fin:\n",
    "    for line in fin:\n",
    "        print(line, end='')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用巢狀的 Context Manager 作轉碼，讀取 utf-8 檔案，寫入 utf-16 檔案\n",
    "with open('hello.txt', encoding='utf-8') as fin, open('uhello.txt', 'w', encoding='utf-16') as fout:\n",
    "    for line in fin:\n",
    "        fout.write(line)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用巢狀的 Context Manager 作簡單的檔案比較\n",
    "with open('hello.txt', encoding='utf-8') as fu8, open('uhello.txt', encoding='utf-16') as fu16:\n",
    "    for (linenum, (u8line, u16line)) in enumerate(zip(fu8, fu16)):\n",
    "        if u8line != u16line:\n",
    "            print('line #{} 不同\\tfile1:\"{}\",\\tfile2:\"{}\"'.format(linenum, u8line[:-1], u16line[:-1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### § 檔案操作 - 使用 Comprehension\n",
    "\n",
    "使用 List 或 Dict Comprehension，在該段運算結束後，暫時的檔案物件也會自動被資源回收機制所關閉。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Hello Python', '你好，拍神']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# rstrip() 去除換行字元後放入 List\n",
    "[line.rstrip() for line in open('hello.txt', encoding='utf-8')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Hello Python', '你好，拍神']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 忽略第一個字元是註解 '#' 符號的那一行\n",
    "[line.rstrip() for line in open('uhello.txt', encoding='utf-16') if line[0] != '#']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: 'Hello Python', 1: '你好，拍神'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 用行號把每一行記錄成 Dict\n",
    "{key: line.rstrip() for key, line in enumerate(open('hello.txt', encoding='utf-8'))}"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}