{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# YAML IPython notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Little experiment base on the fact that apparently YAML is made to be better readable by Humans than JSON.\n", "We've also had some complaint that metadata are not keep in nbconvert when roundtripping through markdown, those two\n", "made me think that I could try to see what ipynb files stored as YAML would look like. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'll also use this post to do some experiment for nbviewer future nbviewer features, if you see anything wrong with the css on some device, please tell me." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### First atempt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apparently Json is a subset of YAML:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " cp foo.ipynb foo.ipyamlnb\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yeah, Mission acomplished !" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Second try" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install PyYaml, and see what we can do. " ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "import yaml" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.nbformat import current as nbf" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "YAML Notebook.ipynb\r\n" ] } ], "source": [ "ls Y*.ipynb" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('YAML Notebook.ipynb') as f:\n", " nbook = nbf.read( f, 'json')" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'cell_type': u'code',\n", " u'collapsed': False,\n", " u'input': u'from IPython.nbformat import current as nbf',\n", " u'language': u'python',\n", " u'metadata': {},\n", " u'outputs': []}" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nbook.worksheets[0].cells[9]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'll skipp the fiddling around with the yaml converter. In short, you have to specify explicitely the part you want to dump in the literal form, otherwise they are exported as list of strings, which is a little painfull to edit afterward. I'm using the `safe_dump` and `safe_load` methods (or pass safeLoader and Dumper). Those should be default or otherwise you could unserialise arbitrary object, and have code exucuted.\n", "\n", "We probably don't want to reproduct the recent file Rail's critical vulnerability that append not so long ago." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# we'll patch a safe Yaml Dumper\n", "sd = yaml.SafeDumper\n", "\n", "# Dummy class, just to mark the part we want with custom dumping\n", "class folded_unicode(unicode): pass\n", "class literal_unicode(unicode): pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I know classes should be wit upper case, but we just want to hide the fact that thoses a class to end user. At the same time I define a folded method to use it with markdown cell. when markdown contain really long lines, those will be wrapped in the yaml document." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def folded_unicode_representer(dumper, data):\n", " return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')\n", "def literal_unicode_representer(dumper, data):\n", " return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')\n", "\n", "sd.add_representer(folded_unicode, folded_unicode_representer)\n", "sd.add_representer(literal_unicode, literal_unicode_representer)\n", "\n", "\n", "with open('YAML Notebook.ipynb') as f:\n", " nbjson = json.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "now we patch the part of the ipynb file we know we want to be literal or folded" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for tcell in nbjson['worksheets'][0]['cells']:\n", " if 'source' in tcell.keys():\n", " tcell['source'] = folded_unicode(\"\".join(tcell['source']))\n", " if 'input' in tcell.keys():\n", " tcell['input'] = literal_unicode(\"\".join(tcell['input']))" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('Yaml.ipymlnb','w') as f:\n", " f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can round trip it to json, and it's still a valid ipynb file that can be loaded. Haven't fiddled with it much more.\n", "There are just a few gotchas with empty lines as well as trailing whitespace at EOL that can respectively diseapear or make the dumper fall back to a string quoted methods to store values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can skip down to the end of this notebook to look at how it looks like. It's probably much compact than the current json we emit, in **some** cases it might be more easy to read, but I don't think it is worth considering using in the format specification.\n", "\n", "ipynb files are ment to be humanely fixable, and I strongly prefere having a consistent format with simple rules than having to explain what are the meaning of the differents shenigan like `: |2+` for literal string.\n", "\n", "Also support across languages are not consistent, and it would probably be too much of a security burden for all code that will support loading ipynb to take care of sanitazing Yaml.\n", "\n", "One area where I woudl use it would be to describe the ipynb format at a talk for example, and/or to have metadata editing more human readable/writable." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "metadata:\r\n", " name: YAML Notebook\r\n", "nbformat: 3\r\n", "nbformat_minor: 0\r\n", "worksheets:\r\n", "- cells:\r\n", " - cell_type: heading\r\n", " level: 1\r\n", " metadata: {}\r\n", " source: >-\r\n", " YAML IPython notebook\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: \"Little experiment base on the fact that apparently YAML is made to be\\\r\n", " \\ better readable by Humans than JSON.\\nWe've also had some complaint that metadata\\\r\n", " \\ are not keep in nbconvert when roundtripping through markdown, those two\\n\\\r\n", " made me think that I could try to see what ipynb files stored as YAML would\\\r\n", " \\ look like. \"\r\n", " - cell_type: heading\r\n", " level: 4\r\n", " metadata: {}\r\n", " source: >-\r\n", " First atempt\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " Apparently Json is a subset of YAML:\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >2+\r\n", " cp foo.ipynb foo.ipyamlnb\r\n", "\r\n", "\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " Yeah, Mission acomplished !\r\n", " - cell_type: heading\r\n", " level: 4\r\n", " metadata: {}\r\n", " source: >-\r\n", " Second try\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: \"Install PyYaml, and see what we can do. \"\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " import json\r\n", " import yaml\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " from IPython.nbformat import current as nbf\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " ls Y*.ipynb\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " with open('YAML Notebook.ipynb') as f:\r\n", " nbook = nbf.read( f, 'json')\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " nbook.worksheets[0].cells[9]\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " I'll skipp the fiddling around with the yaml converter. In short, you have to\r\n", " specify explicitely the part you want to dump in the literal form, otherwise\r\n", " they are exported as list of strings, which is a little painfull to edit afterward.\r\n", " I'm using the `safe_dump` and `safe_load` methods (or pass safeLoader and Dumper).\r\n", " Those should be default or otherwise you could unserialise arbitrary object,\r\n", " and have code exucuted.\r\n", "\r\n", "\r\n", " We probably don't want to reproduct the recent file Rail's critical vulnerability\r\n", " that append not so long ago.\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " # we'll patch a safe Yaml Dumper\r\n", " sd = yaml.SafeDumper\r\n", "\r\n", " # Dummy class, just to mark the part we want with custom dumping\r\n", " class folded_unicode(unicode): pass\r\n", " class literal_unicode(unicode): pass\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " I know classes should be wit upper case, but we just want to hide the fact that\r\n", " thoses a class to end user. At the same time I define a folded method if I want\r\n", " to use it later.\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " def folded_unicode_representer(dumper, data):\r\n", " return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')\r\n", " def literal_unicode_representer(dumper, data):\r\n", " return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')\r\n", "\r\n", " sd.add_representer(folded_unicode, folded_unicode_representer)\r\n", " sd.add_representer(literal_unicode, literal_unicode_representer)\r\n", "\r\n", "\r\n", " with open('YAML Notebook.ipynb') as f:\r\n", " nbjson = json.load(f)\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " now we patch the part of the ipynb file we know we want to be literal or folded\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " for tcell in nbjson['worksheets'][0]['cells']:\r\n", " if 'source' in tcell.keys():\r\n", " tcell['source'] = folded_unicode(\"\".join(tcell['source']))\r\n", " if 'input' in tcell.keys():\r\n", " tcell['input'] = literal_unicode(\"\".join(tcell['input']))\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " with open('Yaml.ipymlnb','w') as f:\r\n", " f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " You can round trip it to json, and it's still a valid ipynb file that can be\r\n", " loaded. Haven't fiddled with it much more.\r\n", "\r\n", " There are just a few gotchas with empty lines as well as trailing whitespace\r\n", " at EOL that can respectively diseapear or make the dumper fall back to a string\r\n", " quoted methods to store values.\r\n", "\r\n", "\r\n", " One could also try to tiker with `folded_unicode` in markdown cell that tipically\r\n", " have long lines to play a little more nicely with VCS.\r\n", " - cell_type: markdown\r\n", " metadata: {}\r\n", " source: >-\r\n", " You can skip down to the end of this notebook to loko at how it looks like.\r\n", " It's probably much compact than the current json we emit, in **some** cases\r\n", " it might be more easy to read, but I don't think it is worth considering using\r\n", " in the format specification.\r\n", "\r\n", "\r\n", " ipynb files are ment to be humanely fixable, and I strongly prefere having a\r\n", " consistent format with simple rules than having to explain what are the meaning\r\n", " of the differents shenigan like `: |2+` for literal string.\r\n", "\r\n", "\r\n", " Also support across languages are not consistent, and it would probably be too\r\n", " much of a security burden for all code that will support loading ipynb to take\r\n", " care of sanitazing Yaml.\r\n", "\r\n", "\r\n", " One area where I woudl use it would be to describe the ipynb format at a talk\r\n", " for example, and/or to have metadata editing more human readable/writable.\r\n", " - cell_type: code\r\n", " collapsed: false\r\n", " input: |-\r\n", " !cat Yaml.ipymlnb\r\n", " language: python\r\n", " metadata: {}\r\n", " outputs: []\r\n", " metadata: {}\r\n" ] } ], "source": [ "!cat Yaml.ipymlnb" ] } ], "metadata": { "_nbviewer": { "css": "css_linalg" }, "kernelspec": { "display_name": "IPython (Python 3)", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 0 }