{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using nbconvert as a Library"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this Notebook, you will be introduced to the programatic API of nbconvert and how it can be used in various contexts. \n",
    "\n",
    "One of [@jakevdp](https://github.com/jakevdp)'s great [blog posts](http://jakevdp.github.io/blog/2013/04/15/code-golf-in-python-sudoku/) will be used to demonstrate.  This notebook will not focus on using the command line tool. The attentive reader will point-out that no data is read from or written to disk during the conversion process. Nbconvert has been designed to work in memory so that it works well in a database or web-based environement too."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Credit, Jonathan Frederic (@jdfreder on github)\n",
    "\n",
    "<center>\n",
    " ![nbca](images/nbconvert_arch.png)\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main principle of nbconvert is to instantiate an `Exporter` that controls\n",
    "the pipeline through which notebooks are converted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, download @jakevdp's notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "response = requests.get('http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb')\n",
    "response.text[0:60]+'...'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you do not have `requests`, install it by running `pip install requests` (or if you don't have pip installed, you can find it on PYPI)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The response is a JSON string which represents an IPython notebook. Next, read the response using nbformat.\n",
    "\n",
    "Doing this will guarantee that the notebook structure is valid. Note that the in-memory format and on disk format are slightly different. In particual, on disk, multiline strings might be splitted into a list of strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython import nbformat\n",
    "jake_notebook = nbformat.reads(response.text, as_version=4)\n",
    "jake_notebook.cells[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nbformat API returns a special dict.  You don't need to worry about the details of the structure."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nbconvert API exposes some basic exporters for common formats and defaults. You will start\n",
    "by using one of them. First you will import it, then instantiate it using most of the defaults, and finally you will process notebook downloaded early."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.config import Config\n",
    "from IPython.nbconvert import HTMLExporter\n",
    "\n",
    "# The `basic` template is used here.\n",
    "# Later you'll learn how to configure the exporter.\n",
    "html_exporter = HTMLExporter(config=Config({'HTMLExporter':{'default_template':'basic'}}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "(body, resources) = html_exporter.from_notebook_node(jake_notebook)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The exporter returns a tuple containing the body of the converted notebook, raw HTML in this case, as well as a resources dict.  The resource dict contains (among many things) the extracted PNG, JPG [...etc] from the notebook when applicable.  The basic HTML exporter leaves the figures as embeded base64, but you can configure it to extract the figures.  So for now, the resource dict **should** be mostly empty, except for a key containing CSS and a few others whose content will be obvious.\n",
    "\n",
    "`Exporter`s are stateless, so you won't be able to extract any usefull information beyond their configuration from them.  You can re-use an exporter instance to convert another notebook. Each exporter exposes, for convenience, a `from_file` and `from_filename` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print([key for key in resources ])\n",
    "print(resources['metadata'])\n",
    "print(resources['output_extension'])\n",
    "# print resources['inlining'] # Too long to be shown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Part of the body, here the first Heading\n",
    "start = body.index('<h1 id', )\n",
    "print(body[:400]+'...')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you understand HTML, you'll notice that some common tags are ommited, like the `body` tag.  Those tags are included in the default `HtmlExporter`, which is what would have been constructed if no Config object was passed into it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Extracting Figures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When exporting you may want to extract the base64 encoded figures as files, this is by default what the `RstExporter` does (as seen below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.nbconvert import RSTExporter\n",
    "\n",
    "rst_exporter = RSTExporter()\n",
    "\n",
    "(body,resources) = rst_exporter.from_notebook_node(jake_notebook)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(body[:970]+'...')\n",
    "print('[.....]')\n",
    "print(body[800:1200]+'...')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that base64 images are not embeded, but instead there are file name like strings.  The strings actually are (configurable) keys that map to the binary data in the resources dict.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note, if you write an RST Plugin, you are responsible for writing all the files to the disk (or uploading, etc...) in the right location.  Of course, the naming scheme is configurable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an exercise, this notebook will show you how to get one of those images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "list(resources['outputs'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are 5 extracted binary figures, all `png`s, but they could have been `svg`s which then wouldn't appear in the binary sub dict.  Keep in mind that objects with multiple reprs will have every repr stored in the notebook avaliable for conversion. \n",
    "\n",
    "Hence if the object provides `_repr_javascript_`, `_repr_latex_`, and `_repr_png_`, you will be able to determine, at conversion time, which representaition is most appropriate. You could even show all of the representaitions of an object in a single export, it's up to you. Doing so would require a little more involvement on your part and a custom Jinja template.\n",
    "\n",
    "Back to the task of extracting an image, the Image display object can be used to display one of the images (as seen below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image(data=resources['outputs']['output_3_0.png'],format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This image is being rendered without reading or writing to the disk."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting figures with HTML Exporter ?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use case:\n",
    "\n",
    "> I write an [awesome blog](http://jakevdp.github.io/) using IPython notebooks converted to HTML, and I want the images to be cached.  Having one html file with all of the images base64 encoded inside it is nice when sharing with a coworker, but for a website, not so much.  I need an HTML exporter, and I want it to extract the figures!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Some theory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The process of converting a notebook to a another format with happens in a few steps:\n",
    "\n",
    "  - Retrieve the notebook and it's accompanying resource (you are responsible for this).\n",
    "  - Feed them into the exporter, which:\n",
    "      - Sequentially feeds them into an array of `Preprocessors`.  Preprocessors only act on the **structure** of the notebook, and have unrestricted access to it. \n",
    "      - Feeds the notebook into the Jinja templating engine.\n",
    "          - The template is configured (you can change which one is used).\n",
    "          - Templates make use of configurable macros called `filters`.\n",
    "  - The exporter returns the converted notebook and other relevant resources as a tuple.\n",
    "  - You write the data to the disk, or elsewhere (you are responsible for this too)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use `Preprocessors` to accomplish the task at hand.  IPython has preprocessors built in which you can use.  One of them, the `ExtractOutputPreprocessor` is responsible for crawling the notebook, finding all of the figures, and putting them into the resources directory, as well as choosing the key (i.e. `filename_xx_y.extension`) that can replace the figure inside the template.\n",
    "\n",
    "The `ExtractOutputPreprocessor` is special because it's available in all of the `Exporter`s, and is just disabled in some by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 3rd one should be <ExtractOutputPreprocessor>\n",
    "html_exporter._preprocessors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the IPython configuration/Traitlets system to enable it. If you have already set IPython configuration options, this system is familiar to you. Configuration options will always of the form:\n",
    "\n",
    "    ClassName.attribute_name = value\n",
    "   \n",
    "You can create a configuration object a couple of different ways.  Everytime you launch IPython, configuration objects are created from reading config files in your profile directory.  Instead of writing a config file, you can also do it programatically using a dictionary.  The following creates a config object, that enables the figure extracter, and passes it to an `HTMLExporter`.  The output is compared to an `HTMLExporter` without the config object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.config import Config\n",
    "\n",
    "c =  Config({\n",
    "            'ExtractOutputPreprocessor':{'enabled':True}\n",
    "            })\n",
    "\n",
    "exportHTML = HTMLExporter()\n",
    "exportHTML_and_figs = HTMLExporter(config=c)\n",
    "\n",
    "(_, resources)          = exportHTML.from_notebook_node(jake_notebook)\n",
    "(_, resources_with_fig) = exportHTML_and_figs.from_notebook_node(jake_notebook)\n",
    "\n",
    "print('resources without the \"figures\" key:')\n",
    "print(list(resources))\n",
    "\n",
    "print('')\n",
    "print('ditto, notice that there\\'s one more field:')\n",
    "print(list(resources_with_fig))\n",
    "list(resources_with_fig['outputs'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Custom Preprocessor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are an endless number of transformations that you may want to apply to a notebook.  This is why we provide a way to register your own preprocessors that will be applied to the notebook after the default ones.\n",
    "\n",
    "To do so, you'll have to pass an ordered list of `Preprocessor`s to the `Exporter`'s constructor. \n",
    "\n",
    "For simple cell-by-cell transformations, `Preprocessor` can be created using a decorator.  For more complex operations, you need to subclass `Preprocessor` and define a `call` method (as seen below).\n",
    "\n",
    "All transforers have a flag that allows you to enable and disable them via a configuration object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.nbconvert.preprocessors import Preprocessor\n",
    "import IPython.config\n",
    "print(\"Four relevant docstring\")\n",
    "print('=============================')\n",
    "print(Preprocessor.__doc__)\n",
    "print('=============================')\n",
    "print(Preprocessor.preprocess.__doc__)\n",
    "print('=============================')\n",
    "print(Preprocessor.preprocess_cell.__doc__)\n",
    "print('=============================')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following demonstration was requested in [an IPython GitHub issue](https://github.com/ipython/nbconvert/pull/137#issuecomment-18658235), the ability to exclude a cell by index. \n",
    "\n",
    "Inject cells is similar, and won't be covered here.  If you want to inject static content at the beginning/end of a notebook, use a custom template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.utils.traitlets import Integer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class PelicanSubCell(Preprocessor):\n",
    "    \"\"\"A Pelican specific preprocessor to remove some of the cells of a notebook\"\"\"\n",
    "    \n",
    "    # I could also read the cells from nbc.metadata.pelican is someone wrote a JS extension\n",
    "    # But I'll stay with configurable value. \n",
    "    start = Integer(0, config=True, help=\"first cell of notebook to be converted\")\n",
    "    end   = Integer(-1, config=True, help=\"last cell of notebook to be converted\")\n",
    "    \n",
    "    def preprocess(self, nb, resources):\n",
    "\n",
    "        #nbc = deepcopy(nb)\n",
    "        nbc = nb\n",
    "        # don't print in real preprocessor !!!\n",
    "        print(\"I'll keep only cells from \", self.start, \"to \", self.end, \"\\n\\n\")\n",
    "        nbc.cells = nb.cells[self.start:self.end]                    \n",
    "        return nbc, resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# I create this on the fly, but this could be loaded from a DB, and config object support merging...\n",
    "c =  Config()\n",
    "c.PelicanSubCell.enabled = True\n",
    "c.PelicanSubCell.start = 4\n",
    "c.PelicanSubCell.end = 6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here a Pelican exporter is created that takes `PelicanSubCell` preprocessors and a `config` object as parameters.  This may seem redundant, but with the configuration system you can register an inactive preprocessor on all of the exporters and activate it from config files or the command line. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pelican = RSTExporter(preprocessors=[PelicanSubCell], config=c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(pelican.from_notebook_node(jake_notebook)[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Programatically make templates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from jinja2 import DictLoader\n",
    "\n",
    "dl = DictLoader({'full.tpl': \n",
    "\"\"\"\n",
    "{%- extends 'basic.tpl' -%} \n",
    "\n",
    "{% block footer %}\n",
    "FOOOOOOOOTEEEEER\n",
    "{% endblock footer %}\n",
    "\"\"\"})\n",
    "\n",
    "\n",
    "exportHTML = HTMLExporter(extra_loaders=[dl])\n",
    "(body,resources) = exportHTML.from_notebook_node(jake_notebook)\n",
    "for l in body.split('\\n')[-4:]:\n",
    "    print(l)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Real World Use"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@jakevdp uses Pelican and IPython Notebook to blog. Pelican [will use](https://github.com/getpelican/pelican-plugins/pull/21) nbconvert programatically to generate blog post. Have a look a [Pythonic Preambulations](http://jakevdp.github.io/) for Jake's blog post."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@damianavila wrote the Nicholas Plugin to [write blog post as Notebooks](http://www.damian.oquanta.info/posts/one-line-deployment-of-your-site-to-gh-pages.html) and is developping a js-extension to publish notebooks via one click from the web app."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<blockquote class=\"twitter-tweet\"><p>As <a href=\"https://twitter.com/Mbussonn\">@Mbussonn</a> requested... easieeeeer! Deploy your Nikola site with just a click in the IPython notebook! <a href=\"http://t.co/860sJunZvj\">http://t.co/860sJunZvj</a> cc <a href=\"https://twitter.com/ralsina\">@ralsina</a></p>&mdash; Damián Avila (@damian_avila) <a href=\"https://twitter.com/damian_avila/statuses/370306057828335616\">August 21, 2013</a></blockquote>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### A few gotchas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Jinja blocks use `{% %}`by default which does not play nicely with $\\LaTeX$, hence thoses are replaced by `((* *))` in latex templates."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}