{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Notedown example\n", "----------------\n", "\n", "This is an example of a notebook generated from markdown using\n", "[notedown]. The markdown source is on [github] and the generated\n", "output can be viewed using [nbviewer].\n", "\n", "[notedown]: http://github.com/aaren/notedown\n", "[github]: https://github.com/aaren/notedown/blob/master/example.md\n", "[nbviewer]: http://nbviewer.ipython.org/github/aaren/notedown/blob/master/example.ipynb\n", "\n", "Try opening `example.ipynb` as a notebook and running the cells to\n", "see how notedown works." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ipython notebook example.ipynb " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notedown is simple. It converts markdown to the IPython notebook\n", "JSON format. It does this by splitting the markdown source into code\n", "blocks and not-code blocks and then creates a notebook using these\n", "as the input for the new cells (code and markdown).\n", "\n", "We make use of the IPython api:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import re\n", "import sys\n", "import argparse\n", "\n", "from IPython.nbformat.v3.rwbase import NotebookReader\n", "from IPython.nbformat.v3.nbjson import JSONWriter\n", "\n", "import IPython.nbformat.v3.nbbase as nbbase" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a new class `MarkdownReader` that inherits from\n", "`NotebookReader`. The only requirement on this new class is that it\n", "has a `.reads(self, s, **kwargs)` method that returns a notebook\n", "JSON string.\n", "\n", "We search for code blocks using regular expressions, making use of\n", "named groups:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fenced_regex = r\"\"\"\n", "\\n* # any number of newlines followed by\n", "^(?P`{3,}|~{3,}) # a line starting with a fence of 3 or more ` or ~\n", "(?P # followed by the group 'language',\n", "[\\w+-]*) # a word of alphanumerics, _, - or +\n", "[ ]* # followed by spaces\n", "(?P.*) # followed by any text\n", "\\n # followed by a newline\n", "(?P # start a group 'content'\n", "[\\s\\S]*?) # that includes anything\n", "\\n(?P=fence)$ # up until the same fence that we started with\n", "\\n* # followed by any number of newlines\n", "\"\"\"\n", "\n", "# indented code\n", "indented_regex = r\"\"\"\n", "\\n* # any number of newlines\n", "(?P # start group 'icontent'\n", "(?P^([ ]{4,}|\\t)) # an indent of at least four spaces or one tab\n", "[\\s\\S]*?) # any code\n", "\\n* # any number of newlines\n", "^(?!(?P=indent)) # stop when there is a line without at least\n", " # the indent of the first one\n", "\"\"\"\n", "\n", "code_regex = r\"({}|{})\".format(fenced_regex, indented_regex)\n", "code_pattern = re.compile(code_regex, re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then say we have some input text:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "text = u\"\"\"### Create IPython Notebooks from markdown\n", "\n", "This is a simple tool to convert markdown with code into an IPython\n", "Notebook.\n", "\n", "Usage:\n", "\n", "```\n", "notedown input.md > output.ipynb\n", "```\n", "\n", "\n", "It is really simple and separates your markdown into code and not\n", "code. Code goes into code cells, not-code goes into markdown cells.\n", "\n", "Installation:\n", "\n", " pip install notedown\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can parse out the code block matches with " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "code_matches = [m for m in code_pattern.finditer(text)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each of the matches has a `start()` and `end()` method telling us\n", "the position in the string where each code block starts and\n", "finishes. We use this and do some rearranging to get a list of all\n", "of the blocks (text and code) in the order in which they appear in\n", "the text:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "code = u'code'\n", "markdown = u'markdown'\n", "python = u'python'\n", "\n", "def pre_process_code_block(block):\n", " \"\"\"Preprocess the content of a code block, modifying the code\n", " block in place.\n", "\n", " Remove indentation and do magic with the cell language\n", " if applicable.\n", " \"\"\"\n", " # homogenise content attribute of fenced and indented blocks\n", " block['content'] = block.get('content') or block['icontent']\n", "\n", " # dedent indented code blocks\n", " if 'indent' in block and block['indent']:\n", " indent = r\"^\" + block['indent']\n", " content = block['content'].splitlines()\n", " dedented = [re.sub(indent, '', line) for line in content]\n", " block['content'] = '\\n'.join(dedented)\n", "\n", " # alternate descriptions for python code\n", " python_aliases = ['python', 'py', '', None]\n", " # ensure one identifier for python code\n", " if 'language' in block and block['language'] in python_aliases:\n", " block['language'] = python\n", "\n", " # add alternate language execution magic\n", " if 'language' in block and block['language'] != python:\n", " code_magic = \"%%{}\\n\".format(block['language'])\n", " block['content'] = code_magic + block['content']\n", "\n", "# determine where the limits of the non code bits are\n", "# based on the code block edges\n", "text_starts = [0] + [m.end() for m in code_matches]\n", "text_stops = [m.start() for m in code_matches] + [len(text)]\n", "text_limits = zip(text_starts, text_stops)\n", "\n", "# list of the groups from the code blocks\n", "code_blocks = [m.groupdict() for m in code_matches]\n", "# update with a type field\n", "code_blocks = [dict(d.items() + [('type', code)]) for d in\n", " code_blocks]\n", "\n", "# remove indents, add code magic, etc.\n", "map(pre_process_code_block, code_blocks)\n", "\n", "text_blocks = [{'content': text[i:j], 'type': markdown} for i, j\n", " in text_limits]\n", "\n", "# create a list of the right length\n", "all_blocks = range(len(text_blocks) + len(code_blocks))\n", "\n", "# cells must alternate in order\n", "all_blocks[::2] = text_blocks\n", "all_blocks[1::2] = code_blocks\n", "\n", "# remove possible empty first, last text cells\n", "all_blocks = [cell for cell in all_blocks if cell['content']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we've done that it is easy to convert the blocks into IPython\n", "notebook cells and create a new notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cells = []\n", "for block in all_blocks:\n", " if block['type'] == code:\n", " kwargs = {'input': block['content'],\n", " 'language': block['language']}\n", "\n", " code_cell = nbbase.new_code_cell(**kwargs)\n", " cells.append(code_cell)\n", "\n", " elif block['type'] == markdown:\n", " kwargs = {'cell_type': block['type'],\n", " 'source': block['content']}\n", "\n", " markdown_cell = nbbase.new_text_cell(**kwargs)\n", " cells.append(markdown_cell)\n", "\n", " else:\n", " raise NotImplementedError(\"{} is not supported as a cell\"\n", " \"type\".format(block['type']))\n", "\n", "ws = nbbase.new_worksheet(cells=cells)\n", "nb = nbbase.new_notebook(worksheets=[ws])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`JSONWriter` gives us nicely formatted JSON output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "writer = JSONWriter()\n", "print writer.writes(nb)" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 0 }