{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# gzip, zipfile, tarfile 模块:处理压缩文件" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import os, shutil, glob\n", "import zlib, gzip, bz2, zipfile, tarfile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "gzip " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## zilb 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`zlib` 提供了对字符串进行压缩和解压缩的功能:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x�+��,V\u0000�D����\u0012�⒢̼t\u0000S�\u0007�\n", "this is a test string\n" ] } ], "source": [ "orginal = \"this is a test string\"\n", "\n", "compressed = zlib.compress(orginal)\n", "\n", "print compressed\n", "print zlib.decompress(compressed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "同时提供了两种校验和的计算方法:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1407780813\n" ] } ], "source": [ "print zlib.adler32(orginal) & 0xffffffff" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4236695221\n" ] } ], "source": [ "print zlib.crc32(orginal) & 0xffffffff" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## gzip 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`gzip` 模块可以产生 `.gz` 格式的文件,其压缩方式由 `zlib` 模块提供。\n", "\n", "我们可以通过 `gzip.open` 方法来读写 `.gz` 格式的文件: " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "content = \"Lots of content here\"\n", "with gzip.open('file.txt.gz', 'wb') as f:\n", " f.write(content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "读:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lots of content here\n" ] } ], "source": [ "with gzip.open('file.txt.gz', 'rb') as f:\n", " file_content = f.read()\n", "\n", "print file_content" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "将压缩文件内容解压出来:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "with gzip.open('file.txt.gz', 'rb') as f_in, open('file.txt', 'wb') as f_out:\n", " shutil.copyfileobj(f_in, f_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "此时,目录下应有 `file.txt` 文件,内容为:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lots of content here\n" ] } ], "source": [ "with open(\"file.txt\") as f:\n", " print f.read()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "os.remove(\"file.txt.gz\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### bz2 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`bz2` 模块提供了另一种压缩文件的方法:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "BZh91AY&SY*\u001c", "�v\u0000\u0000\t��@\u0000\"�\u001c", "\u0000 \u00001\u00000\"zi\u000f�\u0015\u001b�FLT`�軒)„�P�˰\n", "this is a test string\n" ] } ], "source": [ "orginal = \"this is a test string\"\n", "\n", "compressed = bz2.compress(orginal)\n", "\n", "print compressed\n", "print bz2.decompress(compressed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## zipfile 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "产生一些 `file.txt` 的复制:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for i in range(10):\n", " shutil.copy(\"file.txt\", \"file.txt.\" + str(i))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "将这些复制全部压缩到一个 `.zip` 文件中:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "f = zipfile.ZipFile('files.zip','w')\n", "\n", "for name in glob.glob(\"*.txt.[0-9]\"):\n", " f.write(name)\n", " os.remove(name)\n", " \n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "解压这个 `.zip` 文件,用 `namelist` 方法查看压缩文件中的子文件名:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['file.txt.9', 'file.txt.6', 'file.txt.2', 'file.txt.1', 'file.txt.5', 'file.txt.4', 'file.txt.3', 'file.txt.7', 'file.txt.8', 'file.txt.0']\n" ] } ], "source": [ "f = zipfile.ZipFile('files.zip','r')\n", "print f.namelist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "使用 `f.read(name)` 方法来读取 `name` 文件中的内容:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "file.txt.9 content: Lots of content here\n", "file.txt.6 content: Lots of content here\n", "file.txt.2 content: Lots of content here\n", "file.txt.1 content: Lots of content here\n", "file.txt.5 content: Lots of content here\n", "file.txt.4 content: Lots of content here\n", "file.txt.3 content: Lots of content here\n", "file.txt.7 content: Lots of content here\n", "file.txt.8 content: Lots of content here\n", "file.txt.0 content: Lots of content here\n" ] } ], "source": [ "for name in f.namelist():\n", " print name, \"content:\", f.read(name)\n", "\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "可以用 `extract(name)` 或者 `extractall()` 解压单个或者全部文件。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## tarfile 模块" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "支持 `.tar` 格式文件的读写:\n", "\n", "例如可以这样将 `file.txt` 写入:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f = tarfile.open(\"file.txt.tar\", \"w\")\n", "f.add(\"file.txt\")\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "清理生成的文件:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "os.remove(\"file.txt\")\n", "os.remove(\"file.txt.tar\")\n", "os.remove(\"files.zip\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }