{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": false }, "outputs": [], "source": [ "# export\n", "from local.core.imports import *\n", "from local.notebook.core import *\n", "import nbformat,inspect\n", "from nbformat.sign import NotebookNotary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# default_exp notebook.export\n", "# default_cls_lvl 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Converting notebooks to modules\n", "\n", "> The functions that transform the dev notebooks in the fastai library\n", "\n", "- author: \"Sylvain Gugger\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading a notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's a notebook?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A jupyter notebook is a json file behind the scenes. We can just read it with the json module, which will return a nested dictionary of dictionaries/lists of dictionaries, but there are some small differences between reading the json and using the tools from `nbformat` so we'll use this one." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def read_nb(fname):\n", " \"Read the notebook in `fname`.\"\n", " with open(Path(fname),'r', encoding='utf8') as f: return nbformat.reads(f.read(), as_version=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`fname` can be a string or a pathlib object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_nb = read_nb('91_notebook_export.ipynb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The root has four keys: `cells` contains the cells of the notebook, `metadata` some stuff around the version of python used to execute the notebook, `nbformat` and `nbformat_minor` the version of nbformat. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['cells', 'metadata', 'nbformat', 'nbformat_minor'])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_nb.keys()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'kernelspec': {'display_name': 'Python 3',\n", " 'language': 'python',\n", " 'name': 'python3'},\n", " 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},\n", " 'file_extension': '.py',\n", " 'mimetype': 'text/x-python',\n", " 'name': 'python',\n", " 'nbconvert_exporter': 'python',\n", " 'pygments_lexer': 'ipython3',\n", " 'version': '3.6.9'}}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_nb['metadata']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'4.4'" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f\"{test_nb['nbformat']}.{test_nb['nbformat_minor']}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cells key then contains a list of cells. Each one is a new dictionary that contains entries like the type (code or markdown), the source (what is written in the cell) and the output (for code cells)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'cell_type': 'code',\n", " 'execution_count': None,\n", " 'metadata': {'hide_input': False},\n", " 'outputs': [],\n", " 'source': '# export\\nfrom local.core.imports import *\\nfrom local.notebook.core import *\\nimport nbformat,inspect\\nfrom nbformat.sign import NotebookNotary'}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_nb['cells'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding patterns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def _test_eq(a,b): assert a==b, f'{a}, {b}'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def check_re(cell, pat, code_only=True):\n", " \"Check if `cell` contains a line with regex `pat`\"\n", " if code_only and cell['cell_type'] != 'code': return\n", " if isinstance(pat, str): pat = re.compile(pat, re.IGNORECASE | re.MULTILINE)\n", " return pat.search(cell['source'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pat` can be a string or a compiled regex, if `code_only=True`, ignores markdown cells." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cell = test_nb['cells'][0].copy()\n", "assert check_re(cell, '# export') is not None\n", "assert check_re(cell, re.compile('# export')) is not None\n", "assert check_re(cell, '# bla') is None\n", "cell['cell_type'] = 'markdown'\n", "assert check_re(cell, '# export') is None\n", "assert check_re(cell, '# export', code_only=False) is not None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "_re_blank_export = re.compile(r\"\"\"\n", "# Matches any line with #export or #exports without any module name:\n", "^ # beginning of line (since re.MULTILINE is passed)\n", "\\s* # any number of whitespace\n", "\\#\\s* # # then any number of whitespace\n", "exports? # export or exports\n", "\\s* # any number of whitespace\n", "$ # end of line (since re.MULTILINE is passed)\n", "\"\"\", re.IGNORECASE | re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "_re_mod_export = re.compile(r\"\"\"\n", "# Matches any line with #export or #exports with a module name and catches it in group 1:\n", "^ # beginning of line (since re.MULTILINE is passed)\n", "\\s* # any number of whitespace\n", "\\#\\s* # # then any number of whitespace\n", "exports? 
# export or exports\n", "\\s* # any number of whitespace\n", "(\\S+) # catch a group with any non-whitespace chars\n", "\\s* # any number of whitespace\n", "$ # end of line (since re.MULTILINE is passed)\n", "\"\"\", re.IGNORECASE | re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def is_export(cell, default):\n", " \"Check if `cell` is to be exported and returns the name of the module.\"\n", " if check_re(cell, _re_blank_export):\n", " if default is None:\n", " print(f\"This cell doesn't have an export destination and was ignored:\\n{cell['source'][1]}\")\n", " return default\n", " tst = check_re(cell, _re_mod_export)\n", " return os.path.sep.join(tst.groups()[0].split('.')) if tst else None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cells to export are marked with an `#export` or `#exports` code, potentially with a module name where we want it exported. The default is given in a cell of the form `#default_exp bla` inside the notebook (usually at the top), though in this function, it needs the be passed (the final script will read the whole notebook to find it)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cell = test_nb['cells'][0].copy()\n", "assert is_export(cell, 'export') == 'export'\n", "cell['source'] = \"# exports\" \n", "assert is_export(cell, 'export') == 'export'\n", "cell['source'] = \"# export mod\" \n", "assert is_export(cell, 'export') == 'mod'\n", "cell['source'] = \"# export mod.file\" \n", "assert is_export(cell, 'export') == 'mod/file'\n", "cell['source'] = \"# expt mod.file\"\n", "assert is_export(cell, 'export') is None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "_re_default_exp = re.compile(r\"\"\"\n", "# Matches any line with #default_exp with a module name and catches it in group 1:\n", "^ # beginning of line (since re.MULTILINE is passed)\n", "\\s* # any number of whitespace\n", "\\#\\s* # # then any number of whitespace\n", "default_exp # export or exports\n", "\\s* # any number of whitespace\n", "(\\S+) # catch a group with any non-whitespace chars\n", "\\s* # any number of whitespace\n", "$ # end of line (since re.MULTILINE is passed)\n", "\"\"\", re.IGNORECASE | re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def find_default_export(cells):\n", " \"Find in `cells` the default export module.\"\n", " for cell in cells:\n", " tst = check_re(cell, _re_default_exp)\n", " if tst: return tst.groups()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Stops at the first cell containing a `#default_exp` code and return the value behind. Returns `None` if there are no cell with that code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_test_eq(find_default_export(test_nb['cells']), 'notebook.export')\n", "assert find_default_export(test_nb['cells'][2:]) is None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exporting notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're now ready to export notebooks!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def _create_mod_file(fname, nb_path):\n", " \"Create a module file for `fname`.\"\n", " fname.parent.mkdir(parents=True, exist_ok=True)\n", " with open(fname, 'w') as f:\n", " f.write(f\"#AUTOGENERATED! DO NOT EDIT! File to edit: dev/{nb_path.name} (unless otherwise specified).\")\n", " f.write('\\n\\n__all__ = []')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_patch_func = re.compile(r\"\"\"\n", "# Catches any function decorated with @patch, its name in group 1 and the patched class in group 2\n", "@patch # At any place in the cell, something that begins with @patch\n", "\\s*def # Any number of whitespace (including a new line probably) followed by def\n", "\\s+ # One whitespace or more\n", "([^\\(\\s]*) # Catch a group composed of anything but whitespace or an opening parenthesis (name of the function)\n", "\\s*\\( # Any number of whitespace followed by an opening parenthesis\n", "[^:]* # Any number of character different of : (the name of the first arg that is type-annotated)\n", ":\\s* # A column followed by any number of whitespace\n", "(?: # Non-catching group with either\n", "([^,\\s\\(\\)]*) # a group composed of anything but a comma, a parenthesis or whitespace (name of the class)\n", "| # or\n", "(\\([^\\)]*\\))) # a group composed of something between parenthesis (tuple of classes)\n", "\\s* # Any number of whitespace\n", "(?:,|\\)) # Non-catching group with either a comma 
or a closing parenthesis\n", "\"\"\", re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = _re_patch_func.search(\"\"\"\n", "@patch\n", "def func(obj:Class):\"\"\")\n", "_test_eq(tst.groups(), (\"func\", \"Class\", None))\n", "tst = _re_patch_func.search(\"\"\"\n", "@patch\n", "def func (obj:Class, a)\"\"\")\n", "_test_eq(tst.groups(), (\"func\", \"Class\", None))\n", "tst = _re_patch_func.search(\"\"\"\n", "@patch\n", "def func (obj:(Class1, Class2), a)\"\"\")\n", "_test_eq(tst.groups(), (\"func\", None, \"(Class1, Class2)\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_typedispatch_func = re.compile(r\"\"\"\n", "# Catches any function decorated with @typedispatch\n", "(@typedispatch # At any place in the cell, catch a group with something that begins with @patch\n", "\\s*def # Any number of whitespace (including a new line probably) followed by def\n", "\\s+ # One whitespace or more\n", "[^\\(]* # Anything but whitespace or an opening parenthesis (name of the function)\n", "\\s*\\( # Any number of whitespace followed by an opening parenthesis\n", "[^\\)]* # Any number of character different of )\n", "\\)\\s*:) # A closing parenthesis followed by whitespace and :\n", "\"\"\", re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "assert _re_typedispatch_func.search(\"@typedispatch\\ndef func(a, b):\").groups() == ('@typedispatch\\ndef func(a, b):',)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_class_func_def = re.compile(r\"\"\"\n", "# Catches any 0-indented function or class definition with its name in group 1\n", "^ # Beginning of a line (since re.MULTILINE is passed)\n", "(?:def|class) # Non-catching group for def or class\n", "\\s+ # One whitespace or more\n", "([^\\(\\s]*) # Catching 
group with any character except an opening parenthesis or a whitespace (name)\n", "\\s* # Any number of whitespace\n", "(?:\\(|:) # Non-catching group with either an opening parenthesis or a : (classes don't need ())\n", "\"\"\", re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "assert _re_class_func_def.search(\"class Class:\").groups() == ('Class',)\n", "assert _re_class_func_def.search(\"def func(a, b):\").groups() == ('func',)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_obj_def = re.compile(r\"\"\"\n", "# Catches any 0-indented object definition (bla = thing) with its name in group 1\n", "^ # Beginning of a line (since re.MULTILINE is passed)\n", "([^=\\s]*) # Catching group with any character except a whitespace or an equal sign\n", "\\s*= # Any number of whitespace followed by an =\n", "\"\"\", re.MULTILINE | re.VERBOSE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "assert _re_obj_def.search(\"a = 1\").groups() == ('a',)\n", "_test_eq(_re_obj_def.search(\"a=1\").groups(), ('a',))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def _not_private(n):\n", " for t in n.split('.'):\n", " if t.startswith('_') or t.startswith('@'): return False\n", " return '\\\\' not in t and '^' not in t and '[' not in t\n", "\n", "def export_names(code, func_only=False):\n", " \"Find the names of the objects, functions or classes defined in `code` that are exported.\"\n", " #Format monkey-patches with @patch\n", " def _f(gps):\n", " nm, cls, t = gps.groups()\n", " if cls is not None: return f\"def {cls}.{nm}():\"\n", " return '\\n'.join([f\"def {c}.{nm}():\" for c in re.split(', *', t[1:-1])])\n", "\n", " code = _re_typedispatch_func.sub('', code)\n", " code = _re_patch_func.sub(_f, code)\n", " names = 
_re_class_func_def.findall(code)\n", " if not func_only: names += _re_obj_def.findall(code)\n", " return [n for n in names if _not_private(n)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function only picks the zero-indented objects, functions or classes (we don't want the class methods for instance) and excludes private names (that begin with `_`). It only returns func and class names when `func_only=True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert export_names(\"def my_func(x):\\n pass\\nclass MyClass():\") == [\"my_func\", \"MyClass\"]\n", "#Indented funcs are ignored (funcs inside a class)\n", "assert export_names(\" def my_func(x):\\n pass\\nclass MyClass():\") == [\"MyClass\"]\n", "#Private funcs are ignored\n", "assert export_names(\"def _my_func():\\n pass\\nclass MyClass():\") == [\"MyClass\"]\n", "#trailing spaces\n", "assert export_names(\"def my_func ():\\n pass\\nclass MyClass():\") == [\"my_func\", \"MyClass\"]\n", "#class without parenthesis\n", "assert export_names(\"def my_func ():\\n pass\\nclass MyClass:\") == [\"my_func\", \"MyClass\"]\n", "#object and funcs\n", "assert export_names(\"def my_func ():\\n pass\\ndefault_bla=[]:\") == [\"my_func\", \"default_bla\"]\n", "assert export_names(\"def my_func ():\\n pass\\ndefault_bla=[]:\", func_only=True) == [\"my_func\"]\n", "#Private objects are ignored\n", "assert export_names(\"def my_func ():\\n pass\\n_default_bla = []:\") == [\"my_func\"]\n", "#Objects with dots are privates if one part is private\n", "assert export_names(\"def my_func ():\\n pass\\ndefault.bla = []:\") == [\"my_func\", \"default.bla\"]\n", "assert export_names(\"def my_func ():\\n pass\\ndefault._bla = []:\") == [\"my_func\"]\n", "#Monkey-path with @patch are properly renamed\n", "assert export_names(\"@patch\\ndef my_func(x:Class):\\n pass\") == [\"Class.my_func\"]\n", "assert export_names(\"@patch\\ndef my_func(x:Class):\\n pass\", 
func_only=True) == [\"Class.my_func\"]\n", "assert export_names(\"some code\\n@patch\\ndef my_func(x:Class, y):\\n pass\") == [\"Class.my_func\"]\n", "assert export_names(\"some code\\n@patch\\ndef my_func(x:(Class1,Class2), y):\\n pass\") == [\"Class1.my_func\", \"Class2.my_func\"]\n", "\n", "#Check delegates\n", "assert export_names(\"@delegates(keep=True)\\nclass someClass:\\n pass\") == [\"someClass\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Typedispatch decorated functions shouldn't be added\n", "assert export_names(\"@patch\\ndef my_func(x:Class):\\n pass\\n@typedispatch\\ndef func(x: TensorImage): pass\") == [\"Class.my_func\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_all_def = re.compile(r\"\"\"\n", "# Catches a cell with defines \\_all\\_ = [\\*\\*] and get that \\*\\* in group 1\n", "^_all_ # Beginning of line (since re.MULTILINE is passed)\n", "\\s*=\\s* # Any number of whitespace, =, any number of whitespace\n", "\\[ # Opening [\n", "([^\\n\\]]*) # Catching group with anything except a ] or newline\n", "\\] # Closing ]\n", "\"\"\", re.MULTILINE | re.VERBOSE)\n", "\n", "#Same with __all__\n", "_re__all__def = re.compile(r'^__all__\\s*=\\s*\\[([^\\]]*)\\]', re.MULTILINE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def extra_add(code):\n", " \"Catch adds to `__all__` required by a cell with `_all_=`\"\n", " if _re_all_def.search(code):\n", " names = _re_all_def.search(code).groups()[0]\n", " names = re.sub('\\s*,\\s*', ',', names)\n", " names = names.replace('\"', \"'\")\n", " code = _re_all_def.sub('', code)\n", " code = re.sub(r'([^\\n]|^)\\n*$', r'\\1', code)\n", " return names.split(','),code\n", " return [],code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert extra_add('_all_ = [\"func\", 
\"func1\", \"func2\"]') == ([\"'func'\", \"'func1'\", \"'func2'\"],'')\n", "assert extra_add('_all_ = [\"func\", \"func1\" , \"func2\"]') == ([\"'func'\", \"'func1'\", \"'func2'\"],'')\n", "assert extra_add(\"_all_ = ['func','func1', 'func2']\\n\") == ([\"'func'\", \"'func1'\", \"'func2'\"],'')\n", "assert extra_add('code\\n\\n_all_ = [\"func\", \"func1\", \"func2\"]') == ([\"'func'\", \"'func1'\", \"'func2'\"],'code')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _add2add(fname, names, line_width=120):\n", " if len(names) == 0: return\n", " with open(fname, 'r', encoding='utf8') as f: text = f.read()\n", " tw = TextWrapper(width=120, initial_indent='', subsequent_indent=' '*11, break_long_words=False)\n", " re_all = _re__all__def.search(text)\n", " start,end = re_all.start(),re_all.end()\n", " text_all = tw.wrap(f\"{text[start:end-1]}{'' if text[end-2]=='[' else ', '}{', '.join(names)}]\")\n", " with open(fname, 'w', encoding='utf8') as f: f.write(text[:start] + '\\n'.join(text_all) + text[end:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fname = 'test_add.txt'\n", "with open(fname, 'w', encoding='utf8') as f: f.write(\"Bla\\n__all__ = [my_file, MyClas]\\nBli\")\n", "_add2add(fname, ['new_function'])\n", "with open(fname, 'r', encoding='utf8') as f: \n", " _test_eq(f.read(), \"Bla\\n__all__ = [my_file, MyClas, new_function]\\nBli\")\n", "_add2add(fname, [f'new_function{i}' for i in range(10)])\n", "with open(fname, 'r', encoding='utf8') as f: \n", " _test_eq(f.read(), \"\"\"Bla\n", "__all__ = [my_file, MyClas, new_function, new_function0, new_function1, new_function2, new_function3, new_function4,\n", " new_function5, new_function6, new_function7, new_function8, new_function9]\n", "Bli\"\"\")\n", "os.remove(fname)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def 
_relative_import(name, fname):\n", " mods = name.split('.')\n", " splits = str(fname).split(os.path.sep)\n", " if mods[0] not in splits: return name\n", " splits = splits[splits.index(mods[0]):]\n", " while len(mods)>0 and splits[0] == mods[0]: splits,mods = splits[1:],mods[1:]\n", " return '.' * (len(splits)) + '.'.join(mods)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert _relative_import('local.core', Path('local')/'data.py') == '.core'\n", "assert _relative_import('local.core', Path('local')/'vision'/'data.py') == '..core'\n", "assert _relative_import('local.vision.transform', Path('local')/'vision'/'data.py') == '.transform'\n", "assert _relative_import('local.notebook.core', Path('local')/'data'/'external.py') == '..notebook.core'\n", "assert _relative_import('local.vision', Path('local')/'vision'/'learner.py') == '.'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "#Catches any from local.bla import something and catches local.bla in group 1, the imported thing(s) in group 2.\n", "_re_import = re.compile(r'^(\\s*)from (local.\\S*) import (.*)$')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def _deal_import(code_lines, fname):\n", " pat = re.compile(r'from (local.\\S*) import (\\S*)$')\n", " lines = []\n", " def _replace(m):\n", " sp,mod,obj = m.groups()\n", " return f\"{sp}from {_relative_import(mod, fname)} import {obj}\"\n", " for line in code_lines:\n", " line = re.sub('_'+'file_', '__'+'file__', line) #Need to break _file_ or that line will be treated\n", " lines.append(_re_import.sub(_replace,line))\n", " return lines" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "lines = [\"from local.core import *\", \"nothing to see\", \" from local.vision import bla1, bla2\", \"from local.vision import models\"]\n", 
"assert _deal_import(lines, Path('local')/'data.py') == [\n", " \"from .core import *\", \"nothing to see\", \" from .vision import bla1, bla2\", \"from .vision import models\"\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "#Tricking jupyter notebook to have a __file__ attribute. All _file_ will be replaced by __file__\n", "_file_ = Path('local').absolute()/'notebook'/'export.py'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _get_index():\n", " if not (Path(_file_).parent/'index.txt').exists(): return {}\n", " return json.load(open(Path(_file_).parent/'index.txt', 'r', encoding='utf8'))\n", "\n", "def _save_index(index):\n", " fname = Path(_file_).parent/'index.txt'\n", " fname.parent.mkdir(parents=True, exist_ok=True)\n", " json.dump(index, open(fname, 'w', encoding='utf8'), indent=2)\n", "\n", "def _reset_index():\n", " if (Path(_file_).parent/'index.txt').exists():\n", " os.remove(Path(_file_).parent/'index.txt')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "ind,ind_bak = Path(_file_).parent/'index.txt',Path(_file_).parent/'index.bak'\n", "if ind.exists(): shutil.move(ind, ind_bak)\n", "_test_eq(_get_index(), {})\n", "_save_index({'foo':'bar'})\n", "_test_eq(_get_index(), {'foo':'bar'})\n", "if ind_bak.exists(): shutil.move(ind_bak, ind)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _notebook2script(fname, silent=False, to_pkl=False):\n", " \"Finds cells starting with `#export` and puts them into a new module\"\n", " if os.environ.get('IN_TEST',0): return # don't export if running tests\n", " fname = Path(fname)\n", " nb = read_nb(fname)\n", " default = find_default_export(nb['cells'])\n", " if default is not None:\n", " default = os.path.sep.join(default.split('.'))\n", " if not to_pkl: 
_create_mod_file(Path.cwd()/'local'/f'{default}.py', fname)\n", " index = _get_index()\n", " exports = [is_export(c, default) for c in nb['cells']]\n", " cells = [(i,c,e) for i,(c,e) in enumerate(zip(nb['cells'],exports)) if e is not None]\n", " for i,c,e in cells:\n", " fname_out = Path.cwd()/'local'/f'{e}.py'\n", " orig = ('#C' if e==default else f'#Comes from {fname.name}, c') + 'ell\\n'\n", " code = '\\n\\n' + orig + '\\n'.join(_deal_import(c['source'].split('\\n')[1:], fname_out))\n", " # remove trailing spaces\n", " names = export_names(code)\n", " extra,code = extra_add(code)\n", " if not to_pkl: _add2add(fname_out, [f\"'{f}'\" for f in names if '.' not in f and len(f) > 0] + extra)\n", " index.update({f: fname.name for f in names})\n", " code = re.sub(r' +$', '', code, flags=re.MULTILINE)\n", " if code != '\\n\\n' + orig[:-1]:\n", " if to_pkl: _update_pkl(fname_out, (i, fname, code))\n", " else:\n", " with open(fname_out, 'a', encoding='utf8') as f: f.write(code)\n", " _save_index(index)\n", " if not silent: print(f\"Converted {fname}.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export \n", "def _get_sorted_files(all_fs: Union[bool,str], up_to=None):\n", " \"Return the list of files corresponding to `g` in the current dir.\"\n", " if (all_fs==True): ret = glob.glob('*.ipynb') # Checks both that is bool type and that is True\n", " else: ret = glob.glob(all_fs) if isinstance(g,str) else []\n", " if len(ret)==0: print('WARNING: No files found')\n", " ret = [f for f in ret if not f.startswith('_')]\n", " if up_to is not None: ret = [f for f in ret if str(f)<=str(up_to)]\n", " return sorted(ret)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converted 03a_layers.ipynb.\n" ] } ], "source": [ "_notebook2script('03a_layers.ipynb')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "#export \n", "def notebook2script(fname=None, all_fs=None, up_to=None, silent=False, to_pkl=False):\n", " \"Convert `fname` or all the notebook satisfying `all_fs`.\"\n", " # initial checks\n", " if os.environ.get('IN_TEST',0): return # don't export if running tests\n", " assert fname or all_fs\n", " if all_fs: _reset_index()\n", " if (all_fs is None) and (up_to is not None): all_fs=True # Enable allFiles if upTo is present\n", " fnames = _get_sorted_files(all_fs, up_to=up_to) if all_fs else [fname]\n", " [_notebook2script(f, silent=silent, to_pkl=to_pkl) for f in fnames]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finds cells starting with `#export` and puts them into the appropriate module.\n", "* `fname`: the filename of one notebook to convert\n", "* `all_fs`: `True` if you want to convert all notebook files in the folder or a glob expression\n", "* `up_to`: converts all notebooks respecting the previous arg up to a certain number\n", "\n", "Examples of use in console:\n", "```\n", "notebook2script # Parse all files\n", "notebook2script --fname 00_export.ipynb # Parse 00_export.ipynb\n", "notebook2script --all_fs=nb* # Parse all files starting with nb*\n", "notebook2script --up_to=10 # Parse all files with (name<='10')\n", "notebook2script --all_fs=*_*.ipynb --up_to=10 # Parse all files with an '_' and (name<='10')\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding the way back to notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to get the name of the object we are looking for, and then we'll try to find it in our index file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export \n", "def _get_property_name(p):\n", " \"Get the name of property `p`\"\n", " if hasattr(p, 'fget'):\n", " return p.fget.func.__qualname__ if hasattr(p.fget, 'func') else p.fget.__qualname__\n", " else: return next(iter(re.findall(r'\\'(.*)\\'', str(p)))).split('.')[-1]\n", "\n", "def get_name(obj):\n", " \"Get the name of `obj`\"\n", " if hasattr(obj, '__name__'): return obj.__name__\n", " elif getattr(obj, '_name', False): return obj._name\n", " elif hasattr(obj,'__origin__'): return str(obj.__origin__).split('.')[-1] #for types\n", " elif type(obj)==property: return _get_property_name(obj)\n", " else: return str(obj).split('.')[-1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def qual_name(obj):\n", " \"Get the qualified name of `obj`\"\n", " if hasattr(obj,'__qualname__'): return obj.__qualname__\n", " if inspect.ismethod(obj): return f\"{get_name(obj.__self__)}.{get_name(fn)}\"\n", " return get_name(obj)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_test_eq(get_name(in_ipython), 'in_ipython')\n", "_test_eq(get_name(DocsTestClass.test), 'test')\n", "# assert get_name(Union[Tensor, float]) == 'Union'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For properties defined using `property` or our own `add_props` helper, we approximate the name by looking at their getter functions, since we don't seem to have access to the property name itself. If everything fails (a getter cannot be found), we return the name of the object that contains the property. This suffices for `source_nb` to work." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "class PropertyClass:\n", " p_lambda = property(lambda x: x)\n", " def some_getter(self): return 7\n", " p_getter = property(some_getter)\n", "\n", "_test_eq(get_name(PropertyClass.p_lambda), 'PropertyClass.')\n", "_test_eq(get_name(PropertyClass.p_getter), 'PropertyClass.some_getter')\n", "_test_eq(get_name(PropertyClass), 'PropertyClass')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def source_nb(func, is_name=None, return_all=False):\n", " \"Return the name of the notebook where `func` was defined\"\n", " is_name = is_name or isinstance(func, str)\n", " index = _get_index()\n", " name = func if is_name else qual_name(func)\n", " while len(name) > 0:\n", " if name in index: return (name,index[name]) if return_all else index[name]\n", " name = '.'.join(name.split('.')[:-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_test_eq(qual_name(DocsTestClass), 'DocsTestClass')\n", "_test_eq(qual_name(DocsTestClass.test), 'DocsTestClass.test')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "_re_default_nb = re.compile(r'File to edit: dev/(\\S+)\\s+')\n", "_re_cell = re.compile(r'^#Cell|^#Comes from\\s+(\\S+), cell')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can either pass an object or its name (by default `is_name` will look if `func` is a string or not, but you can override if there is some inconsistent behavior). \n", "\n", "If passed a method of a class, the function will return the notebook in which the largest part of the function was defined in case there is a monkey-matching that defines `class.method` in a different notebook than `class`. 
If `return_all=True`, the function will return a tuple with the name by which the function was found and the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from local.core.transform import Transform\n", "from local.test import test_fail" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_test_eq(source_nb(test_fail), '00_test.ipynb')\n", "_test_eq(source_nb(Transform), '01c_transform.ipynb')\n", "_test_eq(source_nb(Transform.decode), '01c_transform.ipynb')\n", "#opt_call is in the core module but defined in 02\n", "# from local.core import opt_call\n", "# _test_eq(source_nb(opt_call), '02_data_pipeline.ipynb' # TODO: find something else)\n", "assert source_nb(int) is None\n", "#Added through a monkey-patch\n", "_test_eq(source_nb('Path.ls'), '01a_utils.ipynb')\n", "\n", "#Test with name TODO:Investigate\n", "#_test_eq(source_nb('DocsTestClass'), '90_notebook_core.ipynb')\n", "#_test_eq(source_nb('DocsTestClass.test'), '90_notebook_core.ipynb')\n", "\n", "#Test return_all\n", "#assert source_nb(DocsTestClass, return_all=True) == ('DocsTestClass','90_notebook_core.ipynb')\n", "#assert source_nb(DocsTestClass.test, return_all=True) == ('DocsTestClass','90_notebook_core.ipynb')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "# Commented out to avoid circ ref - uncomment to test manually\n", "# from local.data.core import *\n", "# _test_eq(source_nb(DataBunch.train_dl), '05_data_core.ipynb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading the library" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If someone decides to change a module instead of the notebooks, the following functions help update the notebooks accordingly." 
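,
 "\n",
 "\n",
 "They rely on the markers that `notebook2script` writes at the top of each exported cell. As a quick sanity check (reusing the `_re_cell` pattern defined above, on made-up example marker lines):\n",
 "\n",
 "```python\n",
 "import re\n",
 "_re_cell = re.compile(r'^#Cell|^#Comes from\\s+(\\S+), cell')\n",
 "\n",
 "# a plain cell marker matches, with no notebook captured\n",
 "assert _re_cell.search('#Cell 12') is not None\n",
 "# a marker naming its source notebook captures that name in group 1\n",
 "assert _re_cell.search('#Comes from 01a_utils.ipynb, cell 5').groups()[0] == '01a_utils.ipynb'\n",
 "```"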
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def _split(code):\n", " lines = code.split('\\n')\n", " default_nb = _re_default_nb.search(lines[0])\n", " if not default_nb: raise ValueError(f\"No 'File to edit' header found in: {lines[0]}\")\n", " default_nb = default_nb.groups()[0]\n", " s,res = 1,[]\n", " while _re_cell.search(lines[s]) is None: s += 1\n", " e = s+1\n", " while e < len(lines):\n", " while e < len(lines) and _re_cell.search(lines[e]) is None: e += 1\n", " grps = _re_cell.search(lines[s]).groups()\n", " nb = grps[0] or default_nb\n", " content = lines[s+1:e]\n", " while len(content) > 1 and content[-1] == '': content = content[:-1]\n", " res.append((nb, '\\n'.join(content)))\n", " s,e = e,e+1\n", " return res" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(Path.cwd()/'local'/'core'/'foundation.py') as f: code = f.read()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _relimport2name(name, mod_name):\n", " if mod_name.endswith('.py'): mod_name = mod_name[:-3]\n", " mods = mod_name.split(os.path.sep)\n", " mods = mods[mods.index('local'):]\n", " if name=='.': return '.'.join(mods[:-1])\n", " i = 0\n", " while name[i] == '.': i += 1\n", " return '.'.join(mods[:-i] + [name[i:]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "#Catches any from .bla import something; the leading whitespace is in group 1, the relative module in group 2, the imported thing(s) in group 3.\n", "_re_loc_import = re.compile(r'(^\\s*)from (\\.\\S*) import (.*)$')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert _relimport2name('.core', 'local/data.py') == 'local.core'\n", "assert _relimport2name('.core', 'home/sgugger/fastai_dev/dev/local/data.py') == 'local.core'\n", "assert _relimport2name('..core', 
'local/vision/data.py') == 'local.core'\n", "assert _relimport2name('.transform', 'local/vision/data.py') == 'local.vision.transform'\n", "assert _relimport2name('..notebook.core', 'local/data/external.py') == 'local.notebook.core'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _deal_loc_import(code, fname):\n", " lines = []\n", " def _replace(m):\n", " sp,mod,obj = m.groups()\n", " return f\"{sp}from {_relimport2name(mod, fname)} import {obj}\"\n", " for line in code.split('\\n'):\n", " line = re.sub('__'+'file__', '_'+'file_', line) #Break the dunder name up, or this very line would be rewritten too\n", " lines.append(_re_loc_import.sub(_replace,line))\n", " return '\\n'.join(lines)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# export\n", "def _update_pkl(fname, cell):\n", " dic = pickle.load(open((Path.cwd()/'lib.pkl'), 'rb')) if (Path.cwd()/'lib.pkl').exists() else collections.defaultdict(list)\n", " dic[fname].append(cell)\n", " pickle.dump(dic, open((Path.cwd()/'lib.pkl'), 'wb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "code = \"from .core import *\\nnothing to see\\n from .vision import bla1, bla2\"\n", "assert _deal_loc_import(code, 'local/data.py') == \"from local.core import *\\nnothing to see\\n from local.vision import bla1, bla2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _script2notebook(fname, dic, silent=False):\n", " \"Put the content of `fname` back in the notebooks it came from.\"\n", " if os.environ.get('IN_TEST',0): return # don't export if running tests\n", " if not silent: print(f\"Converting {fname}.\")\n", " fname = Path(fname)\n", " with open(fname, encoding='utf8') as f: code = f.read()\n", " splits = _split(code)\n", " assert len(splits)==len(dic[fname]), f\"Exported file from notebooks should 
have {len(dic[fname])} cells but has {len(splits)}.\"\n", " assert np.all([c1[0]==c2[1] for c1,c2 in zip(splits, dic[fname])])\n", " splits = [(c2[0],c1[0],c1[1]) for c1,c2 in zip(splits, dic[fname])]\n", " nb_fnames = {s[1] for s in splits}\n", " for nb_fname in nb_fnames:\n", " nb = read_nb(nb_fname)\n", " for i,f,c in splits:\n", " c = _deal_loc_import(c, str(fname))\n", " if f == nb_fname:\n", " l = nb['cells'][i]['source'].split('\\n')[0]\n", " nb['cells'][i]['source'] = l + '\\n' + c\n", " NotebookNotary().sign(nb)\n", " nbformat.write(nb, nb_fname, version=4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if (Path.cwd()/'lib.pkl').exists(): os.remove(Path.cwd()/'lib.pkl')\n", "notebook2script(all_fs=True, silent=True, to_pkl=True)\n", "dic = pickle.load(open(Path.cwd()/'lib.pkl', 'rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converting /root/workspace/fastai_dev_fork/dev/local/tabular/core.py.\n" ] } ], "source": [ "_script2notebook(Path().cwd()/'local/tabular/core.py', dic)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_manual_mods = 'version.py __init__.py imports.py torch_imports.py patch_tables.py all.py torch_basics.py fp16_utils.py test_utils.py basics.py launch.py'.split()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def script2notebook(folder='local', silent=False):\n", " if (Path.cwd()/'lib.pkl').exists(): os.remove(Path.cwd()/'lib.pkl')\n", " notebook2script(all_fs=True, silent=True, to_pkl=True)\n", " dic = pickle.load(open(Path.cwd()/'lib.pkl', 'rb'))\n", " os.remove(Path.cwd()/'lib.pkl')\n", " if os.environ.get('IN_TEST',0): return # don't export if running tests\n", " for f in (Path.cwd()/folder).glob('**/*.py'):\n", " if f.name not in _manual_mods: 
_script2notebook(f, dic, silent=silent)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#script2notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diff notebook - library" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "import subprocess" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _print_diff(code1, code2, fname):\n", " diff = difflib.ndiff(code1, code2)\n", " sys.stdout.writelines(diff)\n", " #for l in difflib.context_diff(code1, code2): print(l)\n", " #_print_diff_py(code1, code2, fname) if fname.endswith('.py') else _print_diff_txt(code1, code2, fname)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def diff_nb_script(lib_folder='local'):\n", " \"Print the diff between the notebooks and the library in `lib_folder`\"\n", " tmp_path1,tmp_path2 = Path.cwd()/'tmp_lib',Path.cwd()/'tmp_lib1'\n", " shutil.copytree(Path.cwd()/lib_folder, tmp_path1)\n", " try:\n", " notebook2script(all_fs=True, silent=True)\n", " shutil.copytree(Path.cwd()/lib_folder, tmp_path2)\n", " shutil.rmtree(Path.cwd()/lib_folder)\n", " shutil.copytree(tmp_path1, Path.cwd()/lib_folder)\n", " res = subprocess.run(['diff', '-ru', 'tmp_lib1', lib_folder], stdout=subprocess.PIPE)\n", " print(res.stdout.decode('utf-8'))\n", " finally:\n", " shutil.rmtree(tmp_path1)\n", " shutil.rmtree(tmp_path2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diff_nb_script()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "notebook2script(all_fs=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { 
"kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }