{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3.2. Converting an IPython notebook to other formats with nbconvert" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need pandoc, a LaTeX distribution, and the Notebook dataset on the book's website. On Windows, you also need pywin32 (`conda install pywin32` if you use Anaconda)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's open the test notebook in the `data` folder. A notebook is just a plain text file (JSON), so we open it in text mode (`r` mode)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('data/test.ipynb', 'r') as f:\n", " contents = f.read()\n", "print(len(contents))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "strip_output": [ 10, 10 ] }, "outputs": [], "source": [ "print(contents[:345] + '...' + contents[-33:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Now that we have loaded the notebook as a string, let's parse it with the `json` module." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "nb = json.loads(contents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Let's have a look at the keys in the notebook dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(nb.keys())\n", "print('nbformat ' + str(nb['nbformat']) + \n", " '.'
+ str(nb['nbformat_minor']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The version of the notebook format is indicated in `nbformat` and `nbformat_minor`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. The main field is `worksheets`: there is only one by default. A worksheet contains a list of cells, and some metadata." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nb['worksheets'][0].keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Each cell has a type, optional metadata, some contents (text or code), possibly one or several outputs, and other information. Let's look at a Markdown cell and a code cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nb['worksheets'][0]['cells'][1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nb['worksheets'][0]['cells'][2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Once parsed, the notebook is represented as a Python dictionary. Manipulating it is therefore quite convenient in Python. Here, we count the number of Markdown and code cells." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cells = nb['worksheets'][0]['cells']\n", "nm = len([cell for cell in cells\n", " if cell['cell_type'] == 'markdown'])\n", "nc = len([cell for cell in cells\n", " if cell['cell_type'] == 'code'])\n", "print((\"There are {nm} Markdown cells and \"\n", " \"{nc} code cells.\").format(\n", " nm=nm, nc=nc))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Let's have a closer look at the image output of the cell with the matplotlib figure." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "png = cells[2]['outputs'][0]['png']\n", "cells[2]['outputs'][0]['png'] = png[:20] + '...' + png[-20:]\n", "cells[2]['outputs'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, there can be zero, one, or multiple outputs. Besides, each output can have multiple representations. Here, the matplotlib figure has a PNG representation (the base64-encoded image) and a text representation (the internal representation of the figure)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Now, we are going to use nbconvert to convert our text notebook to other formats. This tool can be used from the command line (if you are using IPython < 4.x, replace the `jupyter` command with `ipython`). Here, we convert the notebook to an HTML document." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "!jupyter nbconvert --to html data/test.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Let's display this document in an `