{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": false }, "outputs": [], "source": [ "#hide\n", "#default_exp merge\n", "from nbdev.showdoc import show_doc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nbdev.imports import *\n", "from fastcore.script import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fix merge conflicts\n", "\n", "> Fix merge conflicts in jupyter notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When working with jupyter notebooks (which are json files behind the scenes) and GitHub, it is very common that a merge conflict (that will add new lines in the notebook source file) will break some notebooks you are working on. This module defines the function `fix_conflicts` to fix those notebooks for you, and attempt to automatically merge standard conflicts. The remaining ones will be delimited by markdown cells like this:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Fixed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Walk cells" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst_nb=\"\"\"{\n", " \"cells\": [\n", " {\n", " \"cell_type\": \"code\",\n", "<<<<<<< HEAD\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"3\"\n", " ]\n", " },\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"z=3\\n\",\n", " \"z\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": 7,\n", "=======\n", " \"execution_count\": 5,\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"6\"\n", " ]\n", " },\n", "<<<<<<< HEAD\n", " \"execution_count\": 7,\n", "=======\n", " \"execution_count\": 5,\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"x=3\\n\",\n", " \"y=3\\n\",\n", " \"x+y\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": null,\n", " \"metadata\": {},\n", " \"outputs\": [],\n", " \"source\": []\n", " }\n", " ],\n", " \"metadata\": {\n", " \"kernelspec\": {\n", " \"display_name\": \"Python 3\",\n", " \"language\": \"python\",\n", " \"name\": \"python3\"\n", " }\n", " },\n", " \"nbformat\": 4,\n", " \"nbformat_minor\": 2\n", "}\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an example of broken notebook we defined in `tst_nb`. The json format is broken by the lines automatically added by git. Such a file can't be opened again in jupyter notebook, leaving the user with no other choice than to fix the text file manually." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"cells\": [\n", " {\n", " \"cell_type\": \"code\",\n", "<<<<<<< HEAD\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"3\"\n", " ]\n", " },\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"z=3\n", "\",\n", " \"z\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": 7,\n", "=======\n", " \"execution_count\": 5,\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"6\"\n", " ]\n", " },\n", "<<<<<<< HEAD\n", " \"execution_count\": 7,\n", "=======\n", " \"execution_count\": 5,\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"x=3\n", "\",\n", " \"y=3\n", "\",\n", " \"x+y\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": null,\n", " \"metadata\": {},\n", " \"outputs\": [],\n", " \"source\": []\n", " }\n", " ],\n", " \"metadata\": {\n", " \"kernelspec\": {\n", " \"display_name\": \"Python 3\",\n", " \"language\": \"python\",\n", " \"name\": \"python3\"\n", " }\n", " },\n", " \"nbformat\": 4,\n", " \"nbformat_minor\": 2\n", "}\n" ] } ], "source": [ "print(tst_nb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in this example, the second conflict is easily solved: it just concerns the execution count of the second cell and can be solved by choosing either option without really impacting your notebook. This is the kind of conflicts `fix_conflicts` will (by default) fix automatically. The first conflict is more complicated as it spans across two cells and there is a cell present in one version, not the other. Such a conflict (and generally the ones where the inputs of the cells change form one version to the other) aren't automatically fixed, but `fix_conflicts` will return a proper json file where the annotations introduced by git will be placed in markdown cells.\n", "\n", "The first step to do this is to walk the raw text file to extract the cells. We can't read it as a JSON since it's broken, so we have to parse the text." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def extract_cells(raw_txt):\n", " \"Manually extract cells in potential broken json `raw_txt`\"\n", " lines = raw_txt.split('\\n')\n", " cells = []\n", " i = 0\n", " while not lines[i].startswith(' \"cells\"'): i+=1\n", " i += 1\n", " start = '\\n'.join(lines[:i])\n", " while lines[i] != ' ],':\n", " while lines[i] != ' {': i+=1\n", " j = i\n", " while not lines[j].startswith(' }'): j+=1\n", " c = '\\n'.join(lines[i:j+1])\n", " if not c.endswith(','): c = c + ','\n", " cells.append(c)\n", " i = j+1\n", " end = '\\n'.join(lines[i:])\n", " return start,cells,end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function returns the beginning of the text (before the cells are defined), the list of cells and the end of the text (after the cells are defined)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "start,cells,end = extract_cells(tst_nb)\n", "test_eq(len(cells), 3)\n", "test_eq(cells[0], \"\"\" {\n", " \"cell_type\": \"code\",\n", "<<<<<<< HEAD\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"3\"\n", " ]\n", " },\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"z=3\\n\",\n", " \"z\"\n", " ]\n", " },\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "#Test the whole text is there\n", "#We add a , to the last cell (because we might add some after for merge conflicts at the end, so we need to remove it)\n", "test_eq(tst_nb, '\\n'.join([start] + cells[:-1] + [cells[-1][:-1]] + [end]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When walking the broken cells, we will add conflicts marker before and after the cells with conflicts as markdown cells. To do that we use this function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def get_md_cell(txt):\n", " \"A markdown cell with `txt`\"\n", " return ''' {\n", " \"cell_type\": \"markdown\",\n", " \"metadata\": {},\n", " \"source\": [\n", " \"''' + txt + '''\"\n", " ]\n", " },'''" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tst = ''' {\n", " \"cell_type\": \"markdown\",\n", " \"metadata\": {},\n", " \"source\": [\n", " \"A bit of markdown\"\n", " ]\n", " },'''\n", "assert get_md_cell(\"A bit of markdown\") == tst" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "conflicts = '<<<<<<< ======= >>>>>>>'.split()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def _split_cell(cell, cf, names):\n", " \"Split `cell` between `conflicts` given state in `cf`, save `names` of branches if seen\"\n", " res1,res2 = [],[]\n", " for line in cell.split('\\n'):\n", " if line.startswith(conflicts[cf]):\n", " if names[cf//2] is None: names[cf//2] = line[8:]\n", " cf = (cf+1)%3\n", " continue\n", " if cf<2: res1.append(line)\n", " if cf%2==0: res2.append(line)\n", " return '\\n'.join(res1),'\\n'.join(res2),cf,names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = '\\n'.join(['a', f'{conflicts[0]} HEAD', 'b', conflicts[1], 'c', f'{conflicts[2]} lala', 'd'])\n", "v1,v2,cf,names = _split_cell(tst, 0, [None,None])\n", "assert v1 == 'a\\nb\\nd'\n", "assert v2 == 'a\\nc\\nd'\n", "assert cf == 0\n", "assert names == ['HEAD', 'lala']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = '\\n'.join(['a', f'{conflicts[0]} HEAD', 'b', conflicts[1], 'c', f'{conflicts[2]} lala', 'd', f'{conflicts[0]} HEAD', 'e'])\n", "v1,v2,cf,names = _split_cell(tst, 0, [None,None])\n", "assert v1 == 'a\\nb\\nd\\ne'\n", "assert v2 == 'a\\nc\\nd'\n", "assert cf == 1\n", "assert names == ['HEAD', 'lala']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = '\\n'.join(['a', f'{conflicts[0]} HEAD', 'b', conflicts[1], 'c', f'{conflicts[2]} lala', 'd', f'{conflicts[0]} HEAD', 'e', conflicts[1]])\n", "v1,v2,cf,names = _split_cell(tst, 0, [None,None])\n", "assert v1 == 'a\\nb\\nd\\ne'\n", "assert v2 == 'a\\nc\\nd'\n", "assert cf == 2\n", "assert names == ['HEAD', 'lala']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = '\\n'.join(['b', conflicts[1], 'c', f'{conflicts[2]} lala', 'd'])\n", "v1,v2,cf,names = _split_cell(tst, 1, ['HEAD',None])\n", "assert v1 == 'b\\nd'\n", "assert v2 == 'c\\nd'\n", "assert cf == 0\n", "assert names == ['HEAD', 'lala']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "tst = '\\n'.join(['c', f'{conflicts[2]} lala', 'd'])\n", "v1,v2,cf,names = _split_cell(tst, 2, ['HEAD',None])\n", "assert v1 == 'd'\n", "assert v2 == 'c\\nd'\n", "assert cf == 0\n", "assert names == ['HEAD', 'lala']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_re_conflict = re.compile(r'^<<<<<<<', re.MULTILINE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "assert _re_conflict.search('a\\nb\\nc') is None\n", "assert _re_conflict.search('a\\n<<<<<<<\\nc') is not None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def same_inputs(t1, t2):\n", " \"Test if the cells described in `t1` and `t2` have the same inputs\"\n", " if len(t1)==0 or len(t2)==0: return False\n", " try:\n", " c1,c2 = json.loads(t1[:-1]),json.loads(t2[:-1])\n", " return c1['source']==c2['source']\n", " except Exception as e: return False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts = [''' {\n", " \"cell_type\": \"code\",\n", " \"source\": [\n", " \"'''+code+'''\"\n", " ]\n", " },''' for code in [\"a=1\", \"b=1\", \"a=1\"]]\n", "assert same_inputs(ts[0],ts[2])\n", "assert not same_inputs(ts[0], ts[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def analyze_cell(cell, cf, names, prev=None, added=False, fast=True, trust_us=True):\n", " \"Analyze and solve conflicts in `cell`\"\n", " if cf==0 and _re_conflict.search(cell) is None: return cell,cf,names,prev,added\n", " old_cf = cf\n", " v1,v2,cf,names = _split_cell(cell, cf, names)\n", " if fast and same_inputs(v1,v2):\n", " if old_cf==0 and cf==0: return (v2 if trust_us else v1),cf,names,prev,added\n", " v1,v2 = (v2,v2) if trust_us else (v1,v1)\n", " res = []\n", " if old_cf == 0:\n", " added=True\n", " res.append(get_md_cell(f'`{conflicts[0]} {names[0]}`'))\n", " res.append(v1)\n", " if cf ==0:\n", " res.append(get_md_cell(f'`{conflicts[1]}`'))\n", " if prev is not None: res += prev\n", " res.append(v2)\n", " res.append(get_md_cell(f'`{conflicts[2]} {names[1]}`'))\n", " prev = None\n", " else: prev = [v2] if prev is None else prev + [v2]\n", " return '\\n'.join([r for r in res if len(r) > 0]),cf,names,prev,added" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the main function used to walk through the cells of a notebook. `cell` is the cell we're at, `cf` the conflict state: `0` if we're not in any conflict, `1` if we are inside the first part of a conflict (between `<<<<<<<` and `=======`) and `2` for the second part of a conflict. `names` contains the names of the branches (they start at `[None,None]` and get updated as we pass along conflicts). `prev` contains a copy of what should be included at the start of the second version (if `cf=1` or `cf=2`). `added` starts at `False` and keeps track of whether we added any markdown cells (this flag allows us to know if a fast merge didn't leave any conflicts at the end). `fast` and `trust_us` are passed along by `fix_conflicts`: if `fast` is `True`, we don't point out conflict between cells if the inputs in the two versions are the same. Instead we merge using the local or remote branch, depending on `trust_us`.\n", "\n", "The function then returns the updated text (with one or several cells, depending on the conflicts to solve), the updated `cf`, `names`, `prev` and `added`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tst = '\\n'.join(['a', f'{conflicts[0]} HEAD', 'b', conflicts[1], 'c'])\n", "c,cf,names,prev,added = analyze_cell(tst, 0, [None,None], None, False,fast=False)\n", "test_eq(c, get_md_cell('`<<<<<<< HEAD`')+'\\na\\nb')\n", "test_eq(cf, 2)\n", "test_eq(names, ['HEAD', None])\n", "test_eq(prev, ['a\\nc'])\n", "test_eq(added, True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here in this example, we were entering cell `tst` with no conflict state. At the end of the cells, we are still in the second part of the conflict, hence `cf=2`. The result returns a marker for the branch head, then the whole cell in version 1 (a + b). We save a (prior to the conflict hence common to the two versions) and c (only in version 2) for the next cell in `prev` (that should contain the resolution of this conflict)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Main function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@call_parse\n", "def nbdev_fix_merge(\n", " fname:str, # A notebook filename to fix\n", " fast:bool=True, # Fast fix: automatically fix the merge conflicts in outputs or metadata\n", " trust_us:bool=True # Use local outputs/metadata when fast merging\n", "):\n", " \"Fix merge conflicts in notebook `fname`\"\n", " fname=Path(fname)\n", " shutil.copy(fname, fname.with_suffix('.ipynb.bak'))\n", " with open(fname, 'r') as f: raw_text = f.read()\n", " start,cells,end = extract_cells(raw_text)\n", " res = [start]\n", " cf,names,prev,added = 0,[None,None],None,False\n", " for cell in cells:\n", " c,cf,names,prev,added = analyze_cell(cell, cf, names, prev, added, fast=fast, trust_us=trust_us)\n", " res.append(c)\n", " if res[-1].endswith(','): res[-1] = res[-1][:-1]\n", " with open(f'{fname}', 'w') as f: f.write('\\n'.join([r for r in res+[end] if len(r) > 0]))\n", " if fast and not added: print(\"Successfully merged conflicts!\")\n", " else: print(\"One or more conflict remains in the notebook, please inspect manually.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This begins by backing the notebook `fname` to `fname.bak` in case something goes wrong. Then it parses the broken json, solving conflicts in cells. If `fast=True`, every conflict that only involves metadata or outputs of cells will be solved automatically by using the local (`trust_us=True`) or the remote (`trust_us=False`) branch. Otherwise, or for conflicts involving the inputs of cells, the json will be repaired by including the two version of the conflicted cell(s) with markdown cells indicating the conflicts. You will be able to open the notebook again and search for the conflicts (look for `<<<<<<<`) then fix them as you wish.\n", "\n", "If `fast=True`, the function will print a message indicating whether the notebook was fully merged or if conflicts remain." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export-" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converted 00_export.ipynb.\n", "Converted 01_sync.ipynb.\n", "Converted 02_showdoc.ipynb.\n", "Converted 03_export2html.ipynb.\n", "Converted 04_test.ipynb.\n", "Converted 05_merge.ipynb.\n", "Converted 06_cli.ipynb.\n", "Converted 07_clean.ipynb.\n", "Converted 99_search.ipynb.\n", "Converted index.ipynb.\n", "Converted tutorial.ipynb.\n" ] } ], "source": [ "#hide\n", "from nbdev.export import notebook2script\n", "notebook2script()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "split_at_heading": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }