{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CSV 文件和 csv 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "标准库中有自带的 `csv` (逗号分隔值) 模块处理 `csv` 格式的文件:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 读 csv 文件" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们有这样的一个文件:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing data.csv\n" ] } ], "source": [ "%%file data.csv\n", "\"alpha 1\", 100, -1.443\n", "\"beat 3\", 12, -0.0934\n", "\"gamma 3a\", 192, -0.6621\n", "\"delta 2a\", 15, -4.515" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "打开这个文件,并产生一个文件 reader:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "fp = open(\"data.csv\")\n", "r = csv.reader(fp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以按行迭代数据:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['alpha 1', ' 100', ' -1.443']\n", "['beat 3', ' 12', ' -0.0934']\n", "['gamma 3a', ' 192', ' -0.6621']\n", "['delta 2a', ' 15', ' -4.515']\n" ] } ], "source": [ "for row in r:\n", " print row\n", " \n", "fp.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "默认数据内容都被当作字符串处理,不过可以自己进行处理:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[['alpha 1', 100, -1.443],\n", " ['beat 3', 12, -0.0934],\n", " ['gamma 3a', 192, -0.6621],\n", " ['delta 2a', 15, -4.515]]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = []\n", "\n", "with open('data.csv') as fp:\n", " r = csv.reader(fp)\n", " for row in r:\n", " data.append([row[0], int(row[1]), float(row[2])])\n", " \n", "data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os\n", "os.remove('data.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 写 csv 文件" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以使用 `csv.writer` 写入文件,不过相应地,传入的应该是以写方式打开的文件,不过一般要用 `'wb'` 即二进制写入方式,防止出现换行不正确的问题:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = [('one', 1, 1.5), ('two', 2, 8.0)]\n", "with open('out.csv', 'wb') as fp:\n", " w = csv.writer(fp)\n", " w.writerows(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "显示结果:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "one,1,1.5\n", "two,2,8.0\n" ] } ], "source": [ "!cat 'out.csv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 更换分隔符" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "默认情况下,`csv` 模块默认 `csv` 文件都是由 `excel` 产生的,实际中可能会遇到这样的问题:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = [('one, \\\"real\\\" string', 1, 1.5), ('two', 2, 8.0)]\n", "with open('out.csv', 'wb') as fp:\n", " w = csv.writer(fp)\n", " w.writerows(data)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"one, \"\"real\"\" string\",1,1.5\n", "two,2,8.0\n" ] } ], "source": [ "!cat 'out.csv'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以修改分隔符来处理这组数据:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = [('one, \\\"real\\\" string', 1, 1.5), ('two', 2, 8.0)]\n", "with open('out.psv', 'wb') as fp:\n", " w = csv.writer(fp, delimiter=\"|\")\n", " w.writerows(data)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"one, \"\"real\"\" string\"|1|1.5\n", "two|2|8.0\n" ] } ], "source": [ "!cat 'out.psv'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import os\n", "os.remove('out.psv')\n", "os.remove('out.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 其他选项" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy.loadtxt()` 和 `pandas.read_csv()` 可以用来读写包含很多数值数据的 `csv` 文件:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing trades.csv\n" ] } ], "source": [ "%%file trades.csv\n", "Order,Date,Stock,Quantity,Price\n", "A0001,2013-12-01,AAPL,1000,203.4\n", "A0002,2013-12-01,MSFT,1500,167.5\n", "A0003,2013-12-02,GOOG,1500,167.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用 `pandas` 进行处理,生成一个 `DataFrame` 对象:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Date Stock Quantity Price\n", "Order \n", "A0001 2013-12-01 AAPL 1000 203.4\n", "A0002 2013-12-01 MSFT 1500 167.5\n", "A0003 2013-12-02 GOOG 1500 167.5\n" ] } ], "source": [ "import pandas\n", "df = pandas.read_csv('trades.csv', index_col=0)\n", "print df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "通过名字进行索引:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Order\n", "A0001 203400\n", "A0002 251250\n", "A0003 251250\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Quantity'] * df['Price']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import os\n", "os.remove('trades.csv')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }