{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notedown example\n",
    "----------------\n",
    "\n",
    "This is an example of a notebook generated from markdown using\n",
    "[notedown]. The markdown source is on [github] and the generated\n",
    "output can be viewed using [nbviewer].\n",
    "\n",
    "[notedown]: http://github.com/aaren/notedown\n",
    "[github]: https://github.com/aaren/notedown/blob/master/example.md\n",
    "[nbviewer]: http://nbviewer.ipython.org/github/aaren/notedown/blob/master/example.ipynb\n",
    "\n",
    "Try opening `example.ipynb` as a notebook and running the cells to\n",
    "see how notedown works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ipython notebook example.ipynb "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notedown is simple. It converts markdown to the IPython notebook\n",
    "JSON format. It does this by splitting the markdown source into code\n",
    "blocks and not-code blocks and then creates a notebook using these\n",
    "as the input for the new cells (code and markdown).\n",
    "\n",
    "We make use of the IPython api:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import re\n",
    "import sys\n",
    "import argparse\n",
    "\n",
    "from IPython.nbformat.v3.rwbase import NotebookReader\n",
    "from IPython.nbformat.v3.nbjson import JSONWriter\n",
    "\n",
    "import IPython.nbformat.v3.nbbase as nbbase"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We create a new class `MarkdownReader` that inherits from\n",
    "`NotebookReader`. The only requirement on this new class is that it\n",
    "has a `.reads(self, s, **kwargs)` method that returns a notebook\n",
    "JSON string.\n",
    "\n",
    "We search for code blocks using regular expressions, making use of\n",
    "named groups:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fenced_regex = r\"\"\"\n",
    "\\n*                     # any number of newlines followed by\n",
    "^(?P<fence>`{3,}|~{3,}) # a line starting with a fence of 3 or more ` or ~\n",
    "(?P<language>           # followed by the group 'language',\n",
    "[\\w+-]*)                # a word of alphanumerics, _, - or +\n",
    "[ ]*                    # followed by spaces\n",
    "(?P<options>.*)         # followed by any text\n",
    "\\n                      # followed by a newline\n",
    "(?P<content>            # start a group 'content'\n",
    "[\\s\\S]*?)               # that includes anything\n",
    "\\n(?P=fence)$           # up until the same fence that we started with\n",
    "\\n*                     # followed by any number of newlines\n",
    "\"\"\"\n",
    "\n",
    "# indented code\n",
    "indented_regex = r\"\"\"\n",
    "\\n*                        # any number of newlines\n",
    "(?P<icontent>              # start group 'icontent'\n",
    "(?P<indent>^([ ]{4,}|\\t))  # an indent of at least four spaces or one tab\n",
    "[\\s\\S]*?)                  # any code\n",
    "\\n*                        # any number of newlines\n",
    "^(?!(?P=indent))           # stop when there is a line without at least\n",
    "                            # the indent of the first one\n",
    "\"\"\"\n",
    "\n",
    "code_regex = r\"({}|{})\".format(fenced_regex, indented_regex)\n",
    "code_pattern = re.compile(code_regex, re.MULTILINE | re.VERBOSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then say we have some input text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "text = u\"\"\"### Create IPython Notebooks from markdown\n",
    "\n",
    "This is a simple tool to convert markdown with code into an IPython\n",
    "Notebook.\n",
    "\n",
    "Usage:\n",
    "\n",
    "```\n",
    "notedown input.md > output.ipynb\n",
    "```\n",
    "\n",
    "\n",
    "It is really simple and separates your markdown into code and not\n",
    "code. Code goes into code cells, not-code goes into markdown cells.\n",
    "\n",
    "Installation:\n",
    "\n",
    "    pip install notedown\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can parse out the code block matches with "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "code_matches = [m for m in code_pattern.finditer(text)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each of the matches has a `start()` and `end()` method telling us\n",
    "the position in the string where each code block starts and\n",
    "finishes. We use this and do some rearranging to get a list of all\n",
    "of the blocks (text and code) in the order in which they appear in\n",
    "the text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "code = u'code'\n",
    "markdown = u'markdown'\n",
    "python = u'python'\n",
    "\n",
    "def pre_process_code_block(block):\n",
    "    \"\"\"Preprocess the content of a code block, modifying the code\n",
    "    block in place.\n",
    "\n",
    "    Remove indentation and do magic with the cell language\n",
    "    if applicable.\n",
    "    \"\"\"\n",
    "    # homogenise content attribute of fenced and indented blocks\n",
    "    block['content'] = block.get('content') or block['icontent']\n",
    "\n",
    "    # dedent indented code blocks\n",
    "    if 'indent' in block and block['indent']:\n",
    "        indent = r\"^\" + block['indent']\n",
    "        content = block['content'].splitlines()\n",
    "        dedented = [re.sub(indent, '', line) for line in content]\n",
    "        block['content'] = '\\n'.join(dedented)\n",
    "\n",
    "    # alternate descriptions for python code\n",
    "    python_aliases = ['python', 'py', '', None]\n",
    "    # ensure one identifier for python code\n",
    "    if 'language' in block and block['language'] in python_aliases:\n",
    "        block['language'] = python\n",
    "\n",
    "    # add alternate language execution magic\n",
    "    if 'language' in block and block['language'] != python:\n",
    "        code_magic = \"%%{}\\n\".format(block['language'])\n",
    "        block['content'] = code_magic + block['content']\n",
    "\n",
    "# determine where the limits of the non code bits are\n",
    "# based on the code block edges\n",
    "text_starts = [0] + [m.end() for m in code_matches]\n",
    "text_stops = [m.start() for m in code_matches] + [len(text)]\n",
    "text_limits = zip(text_starts, text_stops)\n",
    "\n",
    "# list of the groups from the code blocks\n",
    "code_blocks = [m.groupdict() for m in code_matches]\n",
    "# update with a type field\n",
    "code_blocks = [dict(d.items() + [('type', code)]) for d in\n",
    "                                                            code_blocks]\n",
    "\n",
    "# remove indents, add code magic, etc.\n",
    "map(pre_process_code_block, code_blocks)\n",
    "\n",
    "text_blocks = [{'content': text[i:j], 'type': markdown} for i, j\n",
    "                                                   in text_limits]\n",
    "\n",
    "# create a list of the right length\n",
    "all_blocks = range(len(text_blocks) + len(code_blocks))\n",
    "\n",
    "# cells must alternate in order\n",
    "all_blocks[::2] = text_blocks\n",
    "all_blocks[1::2] = code_blocks\n",
    "\n",
    "# remove possible empty first, last text cells\n",
    "all_blocks = [cell for cell in all_blocks if cell['content']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we've done that it is easy to convert the blocks into IPython\n",
    "notebook cells and create a new notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "cells = []\n",
    "for block in all_blocks:\n",
    "    if block['type'] == code:\n",
    "        kwargs = {'input': block['content'],\n",
    "                  'language': block['language']}\n",
    "\n",
    "        code_cell = nbbase.new_code_cell(**kwargs)\n",
    "        cells.append(code_cell)\n",
    "\n",
    "    elif block['type'] == markdown:\n",
    "        kwargs = {'cell_type': block['type'],\n",
    "                  'source': block['content']}\n",
    "\n",
    "        markdown_cell = nbbase.new_text_cell(**kwargs)\n",
    "        cells.append(markdown_cell)\n",
    "\n",
    "    else:\n",
    "        raise NotImplementedError(\"{} is not supported as a cell\"\n",
    "                                    \"type\".format(block['type']))\n",
    "\n",
    "ws = nbbase.new_worksheet(cells=cells)\n",
    "nb = nbbase.new_notebook(worksheets=[ws])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`JSONWriter` gives us nicely formatted JSON output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "writer = JSONWriter()\n",
    "print writer.writes(nb)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 0
}