{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": false }, "outputs": [], "source": [ "#|default_exp merge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# merge\n", "\n", "> Fix merge conflicts in jupyter notebooks" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "from nbdev.imports import *\n", "from nbdev.config import *\n", "from nbdev.export import *\n", "from nbdev.sync import *\n", "\n", "from execnb.nbio import *\n", "from fastcore.script import *\n", "from fastcore import shutil\n", "\n", "import subprocess\n", "from difflib import SequenceMatcher" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|hide\n", "from nbdev.doclinks import nbdev_export\n", "from fastcore.test import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When working with jupyter notebooks (which are json files behind the scenes) and GitHub, it is very common that a merge conflict (that will add new lines in the notebook source file) will break some notebooks you are working on. This module defines the function `nbdev_fix` to fix those notebooks for you, and attempt to automatically merge standard conflicts. The remaining ones will be delimited by markdown cells like this:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`<<<<<< HEAD`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# local code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`======`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# remote code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is an example of broken notebook. The json format is broken by the lines automatically added by git. Such a file can't be opened in jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"cells\": [\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"3\"\n", " ]\n", " },\n", " \"execution_count\": 6,\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", "<<<<<<< HEAD\n", " \"z=3\\n\",\n", "=======\n", " \"z=2\\n\",\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"z\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": 7,\n", " \"execution_count\": 5,\n", " \"metadata\": {},\n", " \"outputs\": [\n", " {\n", " \"data\": {\n", " \"text/plain\": [\n", " \"6\"\n", " ]\n", " },\n", "<<<<<<< HEAD\n", " \"execution_count\": 7,\n", "=======\n", " \"execution_count\": 5,\n", ">>>>>>> a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35\n", " \"metadata\": {},\n", " \"output_type\": \"execute_result\"\n", " }\n", " ],\n", " \"source\": [\n", " \"x=3\\n\",\n", " \"y=3\\n\",\n", " \"x+y\"\n", " ]\n", " },\n", " {\n", " \"cell_type\": \"code\",\n", " \"execution_count\": null,\n", " \"metadata\": {},\n", " \"outputs\": [],\n", " \"source\": []\n", " }\n", " ],\n", " \"metadata\": {\n", " \"kernelspec\": {\n", " \"display_name\": \"Python 3\",\n", " \"language\": \"python\",\n", " \"name\": \"python3\"\n", " }\n", " },\n", " \"nbformat\": 4,\n", " \"nbformat_minor\": 2\n", "}\n", "\n" ] } ], "source": [ "broken = Path('..//tests/example.ipynb.broken')\n", "tst_nb = broken.read_text()\n", "print(tst_nb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in this example, the second conflict is easily solved: it just concerns the execution count of the second cell and can be solved by choosing either option without really impacting your notebook. This is the kind of conflict we will fix automatically. The first conflict is more complicated as it spans across two cells and there is a cell present in one version, not the other. Such a conflict (and generally the ones where the inputs of the cells change form one version to the other) aren't automatically fixed, but we will return a proper json file where the annotations introduced by git will be placed in markdown cells." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a merged notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The approach we use is to first \"unpatch\" the conflicted file, regenerating the two files it was originally created from. Then we redo the diff process, but using cells instead of text lines." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "_BEG,_MID,_END = '<'*7,'='*7,'>'*7\n", "conf_re = re.compile(rf'^{_BEG}\\s+(\\S+)\\n(.*?)^{_MID}\\n(.*?)^{_END}\\s+([\\S ]+)\\n', re.MULTILINE|re.DOTALL)\n", "\n", "def _unpatch_f(before, cb1, cb2, c, r):\n", " if cb1 is not None and cb1 != cb2: raise Exception(f'Branch mismatch: {cb1}/{cb2}')\n", " r.append(before)\n", " r.append(c)\n", " return cb2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "def unpatch(s:str):\n", " \"Takes a string with conflict markers and returns the two original files, and their branch names\"\n", " *main,last = conf_re.split(s)\n", " r1,r2,c1b,c2b = [],[],None,None\n", " for before,c1_branch,c1,c2,c2_branch in chunked(main, 5):\n", " c1b = _unpatch_f(before, c1b, c1_branch, c1, r1)\n", " c2b = _unpatch_f(before, c2b, c2_branch, c2, r2)\n", " return ''.join(r1+[last]), ''.join(r2+[last]), c1b, c2b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result of \"unpatching\" our conflicted test notebook is the two original notebooks it would have been created from. Each of these original notebooks will contain valid JSON:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "```json\n", "{ 'cells': [ { 'cell_type': 'code',\n", " 'execution_count': 6,\n", " 'idx_': 0,\n", " 'metadata': {},\n", " 'outputs': [ { 'data': {'text/plain': ['3']},\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'z=3\\nz'},\n", " { 'cell_type': 'code',\n", " 'execution_count': 5,\n", " 'idx_': 1,\n", " 'metadata': {},\n", " 'outputs': [ { 'data': {'text/plain': ['6']},\n", " 'execution_count': 7,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'x=3\\ny=3\\nx+y'},\n", " { 'cell_type': 'code',\n", " 'execution_count': None,\n", " 'idx_': 2,\n", " 'metadata': {},\n", " 'outputs': [],\n", " 'source': ''}],\n", " 'metadata': { 'kernelspec': { 'display_name': 'Python 3',\n", " 'language': 'python',\n", " 'name': 'python3'}},\n", " 'nbformat': 4,\n", " 'nbformat_minor': 2}\n", "```" ], "text/plain": [ "{'cells': [{'cell_type': 'code',\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'outputs': [{'data': {'text/plain': ['3']},\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'z=3\\nz',\n", " 'idx_': 0},\n", " {'cell_type': 'code',\n", " 'execution_count': 5,\n", " 'metadata': {},\n", " 'outputs': [{'data': {'text/plain': ['6']},\n", " 'execution_count': 7,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'x=3\\ny=3\\nx+y',\n", " 'idx_': 1},\n", " {'cell_type': 'code',\n", " 'execution_count': None,\n", " 'metadata': {},\n", " 'outputs': [],\n", " 'source': '',\n", " 'idx_': 2}],\n", " 'metadata': {'kernelspec': {'display_name': 'Python 3',\n", " 'language': 'python',\n", " 'name': 'python3'}},\n", " 'nbformat': 4,\n", " 'nbformat_minor': 2}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a,b,branch1,branch2 = unpatch(tst_nb)\n", "dict2nb(loads(a))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "```json\n", "{ 'cells': [ { 'cell_type': 'code',\n", " 'execution_count': 6,\n", " 'idx_': 0,\n", " 'metadata': {},\n", " 'outputs': [ { 'data': {'text/plain': ['3']},\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'z=2\\nz'},\n", " { 'cell_type': 'code',\n", " 'execution_count': 5,\n", " 'idx_': 1,\n", " 'metadata': {},\n", " 'outputs': [ { 'data': {'text/plain': ['6']},\n", " 'execution_count': 5,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'x=3\\ny=3\\nx+y'},\n", " { 'cell_type': 'code',\n", " 'execution_count': None,\n", " 'idx_': 2,\n", " 'metadata': {},\n", " 'outputs': [],\n", " 'source': ''}],\n", " 'metadata': { 'kernelspec': { 'display_name': 'Python 3',\n", " 'language': 'python',\n", " 'name': 'python3'}},\n", " 'nbformat': 4,\n", " 'nbformat_minor': 2}\n", "```" ], "text/plain": [ "{'cells': [{'cell_type': 'code',\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'outputs': [{'data': {'text/plain': ['3']},\n", " 'execution_count': 6,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'z=2\\nz',\n", " 'idx_': 0},\n", " {'cell_type': 'code',\n", " 'execution_count': 5,\n", " 'metadata': {},\n", " 'outputs': [{'data': {'text/plain': ['6']},\n", " 'execution_count': 5,\n", " 'metadata': {},\n", " 'output_type': 'execute_result'}],\n", " 'source': 'x=3\\ny=3\\nx+y',\n", " 'idx_': 1},\n", " {'cell_type': 'code',\n", " 'execution_count': None,\n", " 'metadata': {},\n", " 'outputs': [],\n", " 'source': '',\n", " 'idx_': 2}],\n", " 'metadata': {'kernelspec': {'display_name': 'Python 3',\n", " 'language': 'python',\n", " 'name': 'python3'}},\n", " 'nbformat': 4,\n", " 'nbformat_minor': 2}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict2nb(loads(b))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('HEAD', 'a7ec1b0bfb8e23b05fd0a2e6cafcb41cd0fb1c35')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "branch1,branch2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "def _make_md(code): return [dict(source=f'`{code}`', cell_type=\"markdown\", metadata={})]\n", "def _make_conflict(a,b, branch1, branch2):\n", " return _make_md(f'{_BEG} {branch1}') + a+_make_md(_MID)+b + _make_md(f'{_END} {branch2}')\n", "\n", "def _merge_cells(a, b, brancha, branchb, theirs):\n", " matches = SequenceMatcher(None, a, b).get_matching_blocks()\n", " res,prev_sa,prev_sb,conflict = [],0,0,False\n", " for sa,sb,sz in matches:\n", " ca,cb = a[prev_sa:sa],b[prev_sb:sb]\n", " if ca or cb:\n", " res += _make_conflict(ca, cb, brancha, branchb)\n", " conflict = True\n", " if sz: res += b[sb:sb+sz] if theirs else a[sa:sa+sz]\n", " prev_sa,prev_sb = sa+sz,sb+sz\n", " return res,conflict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "@call_parse\n", "def nbdev_fix(nbname:str, # Notebook filename to fix\n", " outname:str=None, # Filename of output notebook (defaults to `nbname`)\n", " nobackup:bool_arg=True, # Do not backup `nbname` to `nbname`.bak if `outname` not provided\n", " theirs:bool=False, # Use their outputs and metadata instead of ours\n", " noprint:bool=False): # Do not print info about whether conflicts are found\n", " \"Create working notebook from conflicted notebook `nbname`\"\n", " nbname = Path(nbname)\n", " if not nobackup and not outname: shutil.copy(nbname, nbname.with_suffix('.ipynb.bak'))\n", " nbtxt = nbname.read_text()\n", " a,b,branch1,branch2 = unpatch(nbtxt)\n", " ac,bc = dict2nb(loads(a)),dict2nb(loads(b))\n", " dest = bc if theirs else ac\n", " cells,conflict = _merge_cells(ac.cells, bc.cells, branch1, branch2, theirs=theirs)\n", " dest.cells = cells\n", " write_nb(dest, ifnone(outname, nbname))\n", " if not noprint:\n", " if conflict: print(\"One or more conflict remains in the notebook, please inspect manually.\")\n", " else: print(\"Successfully merged conflicts!\")\n", " return conflict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This begins by optionally backing the notebook `fname` to `fname.bak` in case something goes wrong. Then it parses the broken json, solving conflicts in cells. Every conflict that only involves metadata or outputs of cells will be solved automatically by using the local (`theirs==False`) or the remote (`theirs==True`) branch. Otherwise, or for conflicts involving the inputs of cells, the json will be repaired by including the two version of the conflicted cell(s) with markdown cells indicating the conflicts. You will be able to open the notebook again and search for the conflicts (look for `<<<<<<<`) then fix them as you wish.\n", "\n", "A message will be printed indicating whether the notebook was fully merged or if conflicts remain." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "One or more conflict remains in the notebook, please inspect manually.\n" ] } ], "source": [ "nbdev_fix(broken, outname='tmp.ipynb')\n", "chk = read_nb('tmp.ipynb')\n", "test_eq(len(chk.cells), 7)\n", "os.unlink('tmp.ipynb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Git merge driver" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "def _git_branch_merge():\n", " try: return only(v for k,v in os.environ.items() if k.startswith('GITHEAD'))\n", " except ValueError: return" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "def _git_rebase_head():\n", " for d in ('apply','merge'):\n", " d = Path(f'.git/rebase-{d}')\n", " if d.is_dir():\n", " cmt = (d/'orig-head').read_text()\n", " msg = run(f'git show-branch --no-name {cmt}')\n", " return f'{cmt[:7]} ({msg})'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "def _git_merge_file(base, ours, theirs):\n", " \"`git merge-file` with expected labels depending on if a `merge` or `rebase` is in-progress\"\n", " l_theirs = _git_rebase_head() or _git_branch_merge() or 'THEIRS'\n", " cmd = f\"git merge-file -L HEAD -L BASE -L '{l_theirs}' {ours} {base} {theirs}\"\n", " return subprocess.run(cmd, shell=True, capture_output=True, text=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|export\n", "@call_parse\n", "def nbdev_merge(base:str, ours:str, theirs:str, path:str):\n", " \"Git merge driver for notebooks\"\n", " if not _git_merge_file(base, ours, theirs).returncode: return\n", " theirs = str2bool(os.environ.get('THEIRS', False))\n", " return nbdev_fix.__wrapped__(ours, theirs=theirs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This implements a [git merge driver](https://git-scm.com/docs/gitattributes#_defining_a_custom_merge_driver) for notebooks that automatically resolves conflicting metadata and outputs, and splits remaining conflicts as separate cells so that the notebook can be viewed and fixed in Jupyter. The easiest way to install it is by running `nbdev_install_hooks`.\n", "\n", "This works by first running Git's default merge driver, and then `nbdev_fix` if there are still conflicts. You can set `nbdev_fix`'s `theirs` argument using the `THEIRS` environment variable, for example:\n", "\n", " THEIRS=True git merge branch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export -" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#|hide\n", "import nbdev.doclinks; nbdev.nbdev_export()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "split_at_heading": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }