{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BHSA version mappings\n",
"\n",
"In this notebook we map the nodes between all the extant versions of the BHSA dataset.\n",
"\n",
"The resulting mappings can be used for writing version independent programs that process\n",
"the BHSA data.\n",
"Those programs can only be version independent to a certain extent, because\n",
"in general, node mappings between versions cannot be made perfect.\n",
"\n",
"If one imagines what may change between versions, it seems intractable to make a device that overcomes\n",
"differences in the encoding of the text and its syntax.\n",
"However, we are dealing with versions of a very stable text, that is linguistically annotated by means\n",
"of a consistent method, so there is reason to be optimistic.\n",
"This notebook shows that this optimism is well-founded.\n",
"\n",
"In another notebook,\n",
"[version Phrases](versionPhrases.ipynb)\n",
"we show how one can use the mappings to analyse phrase encodings across versions of the data.\n",
"\n",
"# Overview\n",
"We create the mappings in two distinct stages, each being based on a particular insight, and dealing with\n",
"a set of difficult cases.\n",
"\n",
"* [Slot nodes](#Slot-nodes): we restrict ourselves to the *slot* nodes,\n",
" the nodes that correspond to the individual words;\n",
"* [Nodes in general](#Nodes-in-general): we extend the slot mapping in a generic way to\n",
" a mapping between all nodes.\n",
" Those other nodes are the ones that correspond to higher level textual objects, such as phrases, clauses,\n",
" sentences.\n",
"\n",
"This is a big notebook; here are links to some key locations in the computation.\n",
"\n",
"* [start of the computation](#Computing)\n",
"* [start of making slot mappings](#Making-slot-mappings)\n",
"* [start of expanding them to node mappings](#Extending-to-node-mappings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nodes, edges, mappings\n",
"\n",
"In the\n",
"[text-fabric data model](https://github.com/Dans-labs/text-fabric/wiki/Data-model),\n",
"nodes correspond to the objects in the text and its syntax, and edges correspond to relationships between\n",
"those objects.\n",
"Normally, these edges are **intra**-dataset: they connect nodes within the same dataset.\n",
"\n",
"Now, each version of the BHSA in text-fabric is its own dataset.\n",
"The mappings between nodes of one version and corresponding nodes in another version are\n",
"**inter**-dataset edges.\n",
"\n",
"Nodes in text-fabric are abstract: they are just numbers,\n",
"starting with 1 for the first slot (word),\n",
"increasing by one for each slot up to the last slot,\n",
"and then just continuing beyond that for the non-slot nodes.\n",
"\n",
"So an edge is just a mapping between numbers, and it is perfectly possible to store an arbitrary mapping\n",
"between numbers in a dataset.\n",
"\n",
"We store mappings as ordinary TF edge features, so you can use the mapping in both ways, by\n",
"\n",
"```\n",
"nodesInVersion4 = Es('omap@3-4').f(nodeInVersion3)\n",
"nodesInVersion3 = Es('omap@3-4').t(nodeInVersion4)\n",
"```\n",
"\n",
"respectively.\n",
"When one version supersedes another, we store the mapping between the older and newer version\n",
"as an edge in the new version, leaving the older version untouched.\n",
"\n",
"We store the node mapping with a bit more information than the mere correspondence between nodes.\n",
"We also add an integer to each correspondence which indicates how problematic that correspondence is.\n",
"\n",
"If the correspondence is perfect, we do not add any value.\n",
"If it is a simple discrepancy, confined to an equal number of slots in both versions, we add the value `0`.\n",
"If the discrepancy is more complicated, we add a higher number.\n",
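"\n",
"In terms of plain data, such a valued edge can be modeled as a dict of dicts (an illustrative sketch with hypothetical names `omap` and `targets`, not the actual TF serialization):\n",
"\n",
"```python\n",
"# omap: for each source node, its target nodes with an optional dissimilarity value\n",
"# (None models the absence of a value, i.e. a perfect correspondence)\n",
"omap = {\n",
"    1: {1: None},     # perfect correspondence\n",
"    2: {2: 0},        # simple discrepancy, equal number of slots\n",
"    3: {3: 2, 4: 2},  # node 3 split over two targets: higher value\n",
"}\n",
"\n",
"\n",
"def targets(n):\n",
"    # all target nodes of n, in ascending order\n",
"    return sorted(omap.get(n, {}))\n",
"```\n",
"\n",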
"The details of this will follow."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Slot nodes\n",
"\n",
"The basic idea in creating a slot mapping is to walk through the slots of both versions in parallel,\n",
"and upon encountering a difference, to take one of a few prescribed actions, which may involve catching up\n",
"slots in one of the two versions.\n",
"\n",
"The standard behaviour is to stop at each difference encountered, unless the difference conforms\n",
"to a predefined case. When there is no match, the user may add a case to the list of cases.\n",
"Sometimes a new systematic kind of case has to be added, and that requires programming.\n",
"\n",
"This notebook shows the patterns and the very small lists of cases that were needed to do the job for 4\n",
"version transitions, each corresponding to 1 year or more of encoding activity.\n",
"\n",
"## Differences\n",
"\n",
"When we compare versions, our aim is not to record all differences in general, but to record\n",
"the correspondence between the slots of the versions, and exactly where and how this\n",
"correspondence is disturbed.\n",
"\n",
"We use the lexeme concept as an anchor point for the correspondence.\n",
"If we compare the two versions, slot by slot, and as long as we encounter the same lexemes,\n",
"we have an undisturbed correspondence.\n",
"In fact, we relax this a little bit, because the very concept of lexeme might change between versions.\n",
"So we reduce the information in the lexemes considerably before we compare them, so that we\n",
"are not disturbed by minor changes.\n",
"\n",
"While the correspondence is undisturbed, we simply create an edge from the current slot in the one version\n",
"to the current slot in the other version,\n",
"and we assign no value to such an edge.\n",
"\n",
"But eventually, we encounter real disturbances.\n",
"They manifest themselves in just a few situations:\n",
"\n",
"1. ![1](diffs/diffs.001.png)\n",
"2. ![2](diffs/diffs.002.png)\n",
"3. ![3](diffs/diffs.003.png)\n",
"\n",
"In full generality, we can say:\n",
"$i$ slots in the source version $V$ correspond\n",
"to $j$ slots in the target version $W$,\n",
"where $i$ and $j$ may be 0, but not both at the same time:\n",
"\n",
"1. ![4](diffs/diffs.004.png)\n",
"\n",
"If $i$ slots in version $V$, starting at $n$\n",
"get replaced by $j$ slots in version $W$, starting at $m$,\n",
"we create edges between all $n, ..., n+i-1$ on the one hand\n",
"and all $m, ..., m+j-1$ on the other hand,\n",
"and associate them all with the same number $j-i$.\n",
"\n",
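"A sketch of this general rule as a small function (illustrative only; `replaceEdges` is a hypothetical helper, not part of the actual code):\n",
"\n",
"```python\n",
"def replaceEdges(n, i, m, j):\n",
"    # edges between slots n, ..., n+i-1 in V and m, ..., m+j-1 in W,\n",
"    # all carrying the same value j - i\n",
"    return {(s, t): j - i for s in range(n, n + i) for t in range(m, m + j)}\n",
"```\n",
"\n",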
"But so far, it turns out that the only things we have to deal with,\n",
"are specific instances of 1, 2, and 3 above.\n",
"\n",
"We have a closer look at those cases.\n",
"\n",
"### Lexeme change\n",
"When a lexeme changes at a particular spot $n, m$,\n",
"we have $i=j=1$, leading to exactly one edge $(n, m)$ with value $0$.\n",
"\n",
"### Slot splitting\n",
"When slot $n\\in V$ splits into $m, ..., m+j \\in W$, we create edges from $n$ to each of the $m, ..., m+j$,\n",
"each carrying the number $j$. The larger $j$ is,\n",
"the greater the dissimilarity between node $n\\in V$\n",
"and each of the $m, ..., m+j \\in W$.\n",
"\n",
"### Slot collapse\n",
"When slots $n, ..., n+i \\in V$ collapse into $m\\in W$, we create edges from each of the $n, ..., n+i$ to $m$,\n",
"each carrying the number $-i$. The larger $i$ is,\n",
"the greater the dissimilarity between the nodes $n, ..., n+i\\in V$\n",
"and $m \\in W$.\n",
"\n",
"### Slot deletion\n",
"When slot $n$ is deleted from $V$, we have $i=1, j=0$, leading to zero edges from $n$.\n",
"But so far, we have not encountered this case.\n",
"\n",
"### Slot addition\n",
"When slot $m$ is added to $W$, we have $i=0, j=1$, again leading to zero edges to $m$.\n",
"But so far, we have not encountered this case."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nodes in general\n",
"The basic idea we use for the general case is that nodes are linked to slots.\n",
"In text-fabric, the standard `oslots` edge feature lists for each non-slot node the slots it is linked to.\n",
"\n",
"Combining the just created slot mappings between versions and the `oslots` feature,\n",
"we can extend the slot mapping into a general node mapping.\n",
"\n",
"In order to map a node $n$ in version $V$, we look at its slots $s$,\n",
"use the already established *slot mapping* to map these to slots $t$ in version $W$,\n",
"and collect the nodes $m$ in version $W$ that are linked to those $t$.\n",
"They are good candidates for the mapping.\n",
"\n",
"![5](diffs/diffs.005.png)\n",
"\n",
"# Refinements\n",
"\n",
"When we try to match nodes across versions, based on slot containment, we also respect\n",
"their `otype`s. So we will not try to match a `clause` to a `phrase`.\n",
"We make implicit use of the fact that for most `otype`s, the members contain disjoint slots.\n",
"\n",
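"The collection of candidates described above can be sketched with plain dictionaries standing in for the TF `oslots` and `otype` features (an illustrative sketch; `candidates` is a hypothetical helper):\n",
"\n",
"```python\n",
"def candidates(n, oslotsV, oslotsW, slotMap, otypeV, otypeW):\n",
"    # map the slots of n in version V to slots in version W\n",
"    tW = {slotMap[s] for s in oslotsV[n] if s in slotMap}\n",
"    # candidate nodes in W: same otype, linked to at least one mapped slot\n",
"    return {\n",
"        m\n",
"        for (m, slots) in oslotsW.items()\n",
"        if otypeW[m] == otypeV[n] and tW & set(slots)\n",
"    }\n",
"```\n",
"\n",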
"# Multiple candidates\n",
"Of course, things are not always as neat as in the diagram. Textual objects may have split, or shifted,\n",
"or collapsed.\n",
"In general we find 0 or more candidates.\n",
"Even if we find exactly one candidate, it does not have to be a perfect match.\n",
"A typical situation is this:\n",
"\n",
"![6](diffs/diffs.006.png)\n",
"\n",
"We do not find a node $m\\in W$ that occupies the mapped slots exactly.\n",
"Instead, we find that the target area is split between two candidates who\n",
"also reach outside the target area.\n",
"\n",
"In such cases, we make edges to all such candidates, but we add a dissimilarity measure.\n",
"If $m$ is the collection of slots mapped from $n$, and $m_1$ is a candidate for $n$, meaning $m_1$ has\n",
"overlap with $m$, then the *dissimilarity* of $m_1$ is defined as:\n",
"\n",
"$$|m_1\\cup m| - |m_1\\cap m|$$\n",
"\n",
"In words: the number of slots in the union of $m_1$ and $m$ minus the number of slots in their intersection.\n",
"\n",
"In other words: $m_1$ gets a penalty for\n",
"\n",
"* each slot $s\\in m_1$ that is not in the mapped slots $m$;\n",
"* each mapped slot $t\\in m$ that is not in $m_1$.\n",
"\n",
"If a candidate occupies exactly the mapped slots, the dissimilarity is 0.\n",
"If there is only one such candidate of the right type, the case is completely clear, and we\n",
"do not add a dissimilarity value to the edge.\n",
"\n",
"If there are more candidates, all of them will get an edge, and those edges will contain the dissimilarity\n",
"value, even if that value is $0$.\n",
"\n",
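"The dissimilarity measure itself is a simple set computation; a minimal sketch:\n",
"\n",
"```python\n",
"def dissimilarity(m1, m):\n",
"    # size of the union minus size of the intersection of the slot sets\n",
"    m1 = set(m1)\n",
"    m = set(m)\n",
"    return len(m1 | m) - len(m1 & m)\n",
"```\n",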
"\n",
"# Subphrases\n",
"The most difficult type to handle in our dataset is the `subphrase`,\n",
"because subphrases nest and overlap.\n",
"But it turns out that the dissimilarity measure almost always helps out: when looking for candidates\n",
"for a mapped subphrase, usually exactly one of them has a dissimilarity of 0.\n",
"That's the real counterpart.\n",
"\n",
"# Reporting\n",
"We report the success in establishing the match between non-slot nodes.\n",
"We do so per node type, and for each node type we list a few statistics,\n",
"both in absolute numbers and as a percentage of the total number of nodes of that\n",
"type in the source version.\n",
"\n",
"We count the nodes that fall in each of the following cases.\n",
"The list of cases is ordered by decreasing success of the mapping.\n",
"\n",
"1. **unique, perfect**: there is only one match for the mapping and it is a perfect one in terms\n",
" of slots linked to it;\n",
"2. **multiple, one perfect**: there are multiple matches, but at least one is perfect; this occurs\n",
" typically if nodes of a type are linked to nested and overlapping sequences of slots, such as `subphrase`s;\n",
"3. **unique, imperfect**: there is only one match, but it is not perfect; this indicates that some\n",
" boundary reorganization has happened between the two versions, and that some slots of the source node\n",
" have been cut off in the target node; yet the fact that the source node and the\n",
" target node correspond is clear;\n",
"4. **multiple, cleanly composed**: in this case the source node corresponds to a bunch of matches, that\n",
" together cleanly cover the mapped slots of the source node; in other words: the original node\n",
" has been split in several parts;\n",
"5. **multiple, non-perfect**: all remaining cases where there are matches; these situations can be the\n",
" result of more intrusive changes; if only a small set remains, they require closer inspection;\n",
"6. **not mapped**: these are nodes for which no match could be found."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computing"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import collections\n",
"from functools import reduce\n",
"from tf.dataset.nodemaps import caption\n",
"from tf.fabric import Fabric"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We specify our versions and the subtle differences between them as far as they are relevant."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"REPO = os.path.expanduser(\"~/github/etcbc/bhsa\")\n",
"baseDir = \"{}/tf\".format(REPO)\n",
"tempDir = \"{}/_temp\".format(REPO)\n",
"SILENT = \"auto\"\n",
"\n",
"versions = \"\"\"\n",
" 3\n",
" 4\n",
" 4b\n",
" 2016\n",
" 2017\n",
" c\n",
"\"\"\".strip().split()\n",
"\n",
"# work only with selected versions\n",
"# remove this if you want to work with all versions\n",
"versions = \"\"\"\n",
" 2017\n",
" 2021\n",
"\"\"\".strip().split()\n",
"\n",
"versionInfo = {\n",
" \"\": dict(\n",
" OCC=\"g_word\",\n",
" LEX=\"lex\",\n",
" ),\n",
" \"3\": dict(\n",
" OCC=\"text_plain\",\n",
" LEX=\"lexeme\",\n",
" ),\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load all versions in one go!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"lines_to_next_cell": 2
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"..............................................................................................\n",
". 0.00s Version -> 2017 <- loading ... .\n",
"..............................................................................................\n",
"This is Text-Fabric 10.2.0\n",
"Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n",
"\n",
"115 features found and 0 ignored\n",
"..............................................................................................\n",
". 1.85s Version -> 2021 <- loading ... .\n",
"..............................................................................................\n",
"This is Text-Fabric 10.2.0\n",
"Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n",
"\n",
"116 features found and 0 ignored\n",
"..............................................................................................\n",
". 3.69s All versions loaded .\n",
"..............................................................................................\n"
]
}
],
"source": [
"TF = {}\n",
"api = {}\n",
"for v in versions:\n",
" for (param, value) in versionInfo.get(v, versionInfo[\"\"]).items():\n",
" globals()[param] = value\n",
" caption(4, \"Version -> {} <- loading ...\".format(v), silent=SILENT)\n",
" TF[v] = Fabric(locations=\"{}/{}\".format(baseDir, v), modules=[\"\"], silent=SILENT)\n",
" api[v] = TF[v].load(\"{} {}\".format(OCC, LEX)) # noqa F821\n",
"caption(4, \"All versions loaded\", silent=SILENT)"
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"We want to switch easily between the APIs for the versions."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def activate(v):\n",
" for (param, value) in versionInfo.get(v, versionInfo[\"\"]).items():\n",
" globals()[param] = value\n",
" api[v].makeAvailableIn(globals())\n",
" caption(4, \"Active version is now -> {} <-\".format(v), silent=SILENT)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inspect the amount of slots in all versions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"..............................................................................................\n",
". 7.49s Active version is now -> 2017 <- .\n",
"..............................................................................................\n",
"..............................................................................................\n",
". 7.49s Active version is now -> 2021 <- .\n",
"..............................................................................................\n"
]
}
],
"source": [
"nSlots = {}\n",
"for v in versions:\n",
" activate(v)\n",
" nSlots[v] = F.otype.maxSlot\n",
" caption(0, \"\\t {} slots\".format(nSlots[v]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Method\n",
"\n",
"When we compare two versions, we inspect the lexemes found at corresponding positions in the versions.\n",
"We start at the beginning, and when the lexemes do not match, we have a closer look.\n",
"\n",
"However, in order not to be disturbed by minor discrepancies in the lexemes, we mask the lexemes: we\n",
"apply a few transformations to them, such as removing alephs and waws, and finally even turning them into\n",
"ordered sets of letters, thereby losing the order and multiplicity of letters.\n",
"We also strip the disambiguation marks.\n",
"\n",
"We maintain a current mapping between the slots of the two versions, and we update it if we encounter\n",
"disturbances.\n",
"Initially, this map is the identity map.\n",
"\n",
"What we encounter as remaining differences boils down to the following:\n",
"\n",
"* a lexeme is split into two lexemes with the same total material, typically involving `H`, `MN`, or `B`\n",
"* the lexeme is part of a special case, listed in the `cases` table (which has been built up by repeatedly\n",
" chasing the first remaining difference);\n",
"* both lexemes differ, but that's all: no map updates have to be done.\n",
"\n",
"The first two types of cases can be solved by splitting a lexeme into `k` parts or combining `k` lexemes into one.\n",
"After that the mapping has to be shifted to the right or to the left from a certain point onward.\n",
"\n",
"The loop then is as follows:\n",
"\n",
"* find the first slot with a lexeme in the first version that is different from the lexeme at the mapped slot\n",
" in the second version\n",
"* analyse what is the case:\n",
" * if the disturbance is recognized on the basis of existing patterns and cases, update the map and\n",
" consider this case solved\n",
" * if the disturbance is not recognized, the case is unsolved, and we break out of the loop.\n",
" More analysis is needed, and the outcome of that has to be coded as an extra pattern or case.\n",
"* if the status is solved, go back to the first step\n",
"\n",
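"Schematically, the loop can be sketched as follows (a simplified sketch with a hypothetical `solve` callback; the actual implementation is in `doDiffs` below):\n",
"\n",
"```python\n",
"def makeSlotMapping(lexV, lexW, solve):\n",
"    # start with the identity map; slot numbers are 1-based\n",
"    mapping = {n: n for n in range(1, len(lexV) + 1)}\n",
"    for n in range(1, len(lexV) + 1):\n",
"        if lexV[n - 1] == lexW[mapping[n] - 1]:\n",
"            continue                 # undisturbed correspondence\n",
"        shift = solve(n)             # try the patterns and special cases\n",
"        if shift is None:\n",
"            return None              # unsolved: a new case is needed\n",
"        for k in range(n + 1, len(lexV) + 1):\n",
"            mapping[k] += shift      # shift the rest of the map\n",
"    return mapping\n",
"```\n",
"\n",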
"We end up with a mapping from the slots of the first version to those of the other version that links\n",
"slots with approximately equal lexemes together."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Making slot mappings\n",
"## Lexeme masking\n",
"We start by defining our masking function, and compile lists of all lexemes and masked lexemes for all versions."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"masks = [\n",
" (lambda lex: lex.rstrip(\"[/=\"), \"strip disambiguation\"),\n",
" (lambda lex: lex[0:-2] if lex.endswith(\"JM\") else lex, \"remove JM\"),\n",
" (lambda lex: lex[0:-2] if lex.endswith(\"WT\") else lex, \"remove WT\"),\n",
" (lambda lex: lex.replace(\"J\", \"\"), \"remove J\"),\n",
" (lambda lex: lex.replace(\">\", \"\"), \"remove Alef\"),\n",
" (lambda lex: lex.replace(\"W\", \"\"), \"remove W\"),\n",
" (lambda lex: lex.replace(\"Z\", \"N\"), \"identify Z and N\"),\n",
" (lambda lex: lex.rstrip(\"HT\"), \"strip HT\"),\n",
" (\n",
"        lambda lex: (\"\".join(sorted(set(lex)))) + \"_\" * lex.count(\"_\"),\n",
" \"ignore order and multiplicity\",\n",
" ),\n",
"]\n",
"\n",
"\n",
"def mask(lex, trans=None):\n",
" \"\"\"Apply a single masking operation or apply them all.\n",
" \n",
" Parameters\n",
" ----------\n",
" lex: string\n",
"        The text of the lexeme\n",
" trans: integer, optional `None`\n",
" If given, it is an index in the `masks` list, and the corresponding\n",
" mask transformation will be applied to `lex`.\n",
" If `None`, all transformations in the `masks` list will be applied in that order.\n",
" \n",
" Returns\n",
" -------\n",
" string\n",
" The result of transforming `lex`\n",
" \"\"\"\n",
" if trans is not None:\n",
" return masks[trans][0](lex)\n",
" for (fun, desc) in masks:\n",
" lex = fun(lex)\n",
" return lex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Carry out the lexeme masking for all versions."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"..............................................................................................\n",
". 12s Masking lexemes .\n",
"..............................................................................................\n",
"..............................................................................................\n",
". 12s Active version is now -> 2017 <- .\n",
"..............................................................................................\n",
"..............................................................................................\n",
". 13s Active version is now -> 2021 <- .\n",
"..............................................................................................\n"
]
}
],
"source": [
"lexemes = {}\n",
"\n",
"caption(4, \"Masking lexemes\", silent=SILENT)\n",
"for v in versions:\n",
" activate(v)\n",
" lexemes[v] = collections.OrderedDict()\n",
" for n in F.otype.s(\"word\"):\n",
" lex = Fs(LEX).v(n) # noqa F821\n",
" lexemes[v][n] = (lex, mask(lex, trans=0), mask(lex))\n",
"caption(0, \"Done\", silent=SILENT)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now for each version `v`, `lexemes[v]` is a mapping from word nodes `n` \n",
"to lexeme information of the word at node `n`.\n",
"The lexeme information is a tuple with members\n",
"\n",
"* **full lexeme** the full disambiguated lexeme\n",
"* **lexeme** the lexeme without the disambiguation marks\n",
"* **masked lexeme** the fully transformed lexeme"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# Cases and mappings\n",
"In `cases` we store special cases that we stumbled upon.\n",
"Every time we encountered a disturbance which did not follow a recognized pattern,\n",
"we turned it into a case.\n",
"The number is the slot number in the first version where the case will be applied.\n",
"Cases will only be applied at these exact slot numbers and nowhere else.\n",
"\n",
"In `mappings` we build a mapping between corresponding nodes across a pair of versions.\n",
"At some of those correspondences there are disturbances, there we add a measure of the\n",
"dissimilarity to the mapped pair.\n",
"\n",
"Later, we extend those slot mappings to *node* mappings, which are maps between versions where\n",
"*all* nodes get mapped, not just slot nodes.\n",
"We deliver those node mappings as formal edges in TF.\n",
"Then these edges will be added in the second version, so that each newer version knows\n",
"how to link to the previous version.\n",
"We build the node maps in `edges`.\n",
"\n",
"We store the dissimilarities in a separate dictionary, `dissimilarity`.\n",
"\n",
"All these dictionaries are keyed by a 2-tuple of versions."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"cases = {}\n",
"mappings = {}\n",
"dissimilarity = {}\n",
"edges = {}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Algorithm\n",
"\n",
"Here is the code that directly implements the method.\n",
"Every pair of distinct versions can be mapped.\n",
"We store the mappings in a dictionary, keyed by tuples like `(4, 4b)`,\n",
"for the mapping from version `4` to `4b`, for instance.\n",
"\n",
"The loop is in `doDiffs` below."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def inspect(v, w, start, end):\n",
" \"\"\"Helper function for inspecting the situation in a given range of slots.\n",
" \n",
" Parameters\n",
" ----------\n",
" v: string\n",
" First version\n",
" w: string\n",
" Second version\n",
" start: integer\n",
" Slot number (in first version) where we start the inspection.\n",
" end: integer\n",
" Slot number (in first version) where we end the inspection.\n",
" \n",
" Returns\n",
" -------\n",
" None\n",
" The situation will be printed as a table with a row for each slot\n",
" and columns:\n",
" slot number in version 1,\n",
" lexeme of that slot in version 1,\n",
" lexeme of the corresponding slot in version 2\n",
" \"\"\"\n",
" mapKey = (v, w)\n",
" mapping = mappings[mapKey]\n",
" version1Info = versionInfo.get(v, versionInfo[\"\"])\n",
" version2Info = versionInfo.get(w, versionInfo[\"\"])\n",
" \n",
" for slot in range(start, end):\n",
" print(\n",
" \"{:>6}: {:<8} {:<8}\".format(\n",
" slot,\n",
" api[v].Fs(version1Info[\"LEX\"]).v(slot),\n",
" api[w].Fs(version2Info[\"LEX\"]).v(mapping[slot]),\n",
" )\n",
" )\n",
"\n",
"\n",
"def inspect2(v, w, slot, k):\n",
" \"\"\"Helper function for inspecting the edges in a given range of slots.\n",
" \n",
" Not used, currently.\n",
" \n",
" Parameters\n",
" ----------\n",
" v: string\n",
" First version\n",
" w: string\n",
" Second version\n",
" slot: integer\n",
" Slot number (in first version) in the center of the inspection\n",
" k: integer\n",
"        Number of slots left and right of the center where we inspect.\n",
" \n",
" Returns\n",
" -------\n",
" None\n",
" The situation will be printed as a table with a row for each slot\n",
" and columns:\n",
" slot number in version 1,\n",
" the edge at that slot number, or X if there is no edge\n",
" \"\"\"\n",
" mapKey = (v, w)\n",
" edge = edges[mapKey]\n",
" for i in range(slot - k, slot + k + 1):\n",
" print(f\"EDGE {i} =>\", edge.get(i, \"X\"))\n",
"\n",
"\n",
"def firstDiff(v, w, start):\n",
" \"\"\"Find the first discrepancy after a given position.\n",
" \n",
"    First we walk quickly through the slots of the first version,\n",
" until we reach the starting position.\n",
" \n",
" Then we continue walking until the current slot is either\n",
" \n",
" * a special case\n",
" * a discrepancy\n",
" \n",
" Parameters\n",
" ----------\n",
" v: string\n",
" First version\n",
" w: string\n",
" Second version\n",
" start: integer\n",
" start position\n",
" \n",
" Returns\n",
" -------\n",
" int or None\n",
" If there is no discrepancy, None is returned,\n",
" otherwise the position of the first discrepancy.\n",
" \"\"\"\n",
" mapKey = (v, w)\n",
" mapping = mappings[mapKey]\n",
" theseCases = cases[mapKey]\n",
"\n",
" fDiff = None\n",
" for (slot, (lex1, bareLex1, maskedLex1)) in lexemes[v].items():\n",
" if slot < start:\n",
" continue\n",
" maskedLex2 = lexemes[w][mapping[slot]][2]\n",
" if slot in theseCases or maskedLex1 != maskedLex2:\n",
" fDiff = slot\n",
" break\n",
" return fDiff\n",
"\n",
"\n",
"def printDiff(v, w, slot, k):\n",
" \"\"\"Prints the situation around a discrepancy.\n",
" \n",
" We also show phrase atom boundaries.\n",
"    We show the bare lexemes in the display, not the masked lexemes.\n",
" \n",
" Parameters\n",
" ----------\n",
" v: string\n",
" First version\n",
" w: string\n",
" Second version\n",
" slot: integer\n",
" position of the discrepancy\n",
" k: integer\n",
"        number of slots around the discrepancy to include in the display\n",
" \n",
"    Returns\n",
"    -------\n",
"    None\n",
"        The situation around the discrepancy is printed as plain text.\n",
" \"\"\"\n",
" \n",
" mapKey = (v, w)\n",
" mapping = mappings[mapKey]\n",
" comps = {}\n",
" prevChunkV = None\n",
" prevChunkW = None\n",
" \n",
" # gather the comparison material in comps\n",
" # which has as keys the versions and as value a list of display items\n",
" \n",
" for i in range(slot - k, slot + k + 1):\n",
" # determine if we are at a phrase atom boundary in version 1\n",
" chunkV = None if i not in mapping else api[v].L.u(i, otype=\"phrase_atom\")\n",
" boundaryV = prevChunkV is not None and prevChunkV != chunkV\n",
" prevChunkV = chunkV\n",
" # determine if we are at the actual discrepancy in version 1\n",
" currentV = i == slot\n",
"\n",
" # determine if we are at a phrase atom boundary in version 2\n",
" j = mapping.get(i, None)\n",
" chunkW = None if j is None else api[w].L.u(j, otype=\"phrase_atom\")\n",
" boundaryW = prevChunkW is not None and prevChunkW != chunkW\n",
" prevChunkW = chunkW\n",
" # determine if we are at the actual discrepancy in version 2\n",
" currentW = j == mapping[slot]\n",
"\n",
" lvTuple = lexemes[v].get(i, None)\n",
" lwTuple = None if j is None else lexemes[w].get(j, None)\n",
" lv = \"□\" if lvTuple is None else lvTuple[1] # bare lexeme\n",
" lw = \"□\" if lwTuple is None else lwTuple[1] # bare lexeme\n",
"\n",
" comps.setdefault(v, []).append((lv, currentV, boundaryV))\n",
" comps.setdefault(w, []).append((lw, currentW, boundaryW))\n",
" \n",
" # turn the display items into strings and store them in rep\n",
" # which is also keyed by the versions\n",
" \n",
" rep = {}\n",
" for version in comps:\n",
" rep[version] = printVersion(version, comps[version])\n",
"\n",
" # compose the display out of the strings per version\n",
" # and make a header of sectional information and slot positions\n",
" \n",
" print(\n",
" \"\"\"{} {}:{} ==> slot {} ==> {}\n",
" {}\n",
" {}\n",
"\"\"\".format(\n",
" *api[v].T.sectionFromNode(slot),\n",
" slot,\n",
" mapping[slot],\n",
" rep[v],\n",
" rep[w],\n",
" )\n",
" )\n",
"\n",
"\n",
"def printVersion(v, comps):\n",
" \"\"\"Generate a string displaying a stretch of lexemes around a position.\n",
" \n",
" Parameters\n",
" ----------\n",
"    v: string\n",
"        The version the lexemes belong to (currently unused).\n",
"    comps: list of tuple\n",
" For each slot there is a comp tuple consisting of\n",
" \n",
" * the bare lexeme\n",
" * whether the slot is in the discrepancy position\n",
" * whether the slot is at a phrase atom boundary\n",
" \n",
" Returns\n",
" -------\n",
" string\n",
" A sequence of lexemes with boundary characters in between.\n",
" \"\"\"\n",
" \n",
" rep = \"\"\n",
" for (lex, isCurrent, boundary) in comps:\n",
" rep += \"┫┣\" if boundary else \"╋\"\n",
" rep += f\"▶{lex}◀\" if isCurrent else lex\n",
" rep += \"╋\"\n",
" return rep"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# `doDiffs`\n",
"\n",
"This function contains the loop to walk through all differences.\n",
"\n",
"We walk from discrepancy to discrepancy, and stop when there are no more discrepancies or when we\n",
"have reached an artificial upper bound on the number of discrepancies.\n",
"\n",
"We try to solve the discrepancies.\n",
"If we hit a discrepancy that we cannot solve, we break out of the loop as well.\n",
"\n",
"## `MAX_ITER`\n",
"\n",
"The artificial limit is `MAX_ITER`.\n",
"You determine it experimentally.\n",
"Keep it low at first, while you are meeting the initial discrepancies.\n",
"When you have dealt with them and see that this number of discrepancies can be handled,\n",
"increase the limit.\n",
"\n",
"## Cases\n",
"\n",
"We will encounter discrepancies, and we will learn how to solve them.\n",
"There are some generic ways of solving them, and these we collect in a dictionary of cases.\n",
"\n",
"The keys of the cases are either slot positions or lexemes.\n",
"\n",
"When the algorithm walks through the corpus, it will consider slots\n",
"whose number or whose lexeme is in the cases as solved.\n",
"\n",
"The value of a case is a tuple consisting of\n",
"\n",
"* the name of an *action*\n",
"* a parameter\n",
"\n",
"Here are the actions\n",
"\n",
"key | action | parameters | description\n",
"--- | --- | --- | ---\n",
"slot | `ok` | `None` | the discrepancy is OK, nothing to worry about; we set the dissimilarity to 0, which is worse than `None`\n",
"slot | `split` | `n` integer | split the lexeme in version 1 into `n` lexemes in version 2; set the dissimilarity to `n`\n",
"slot | `collapse` | `n` integer | collapse `n` lexemes in version 1 into one lexeme in version 2; dissimilarity `-n`\n",
"lex | `ok` | `alt` string | the discrepancy is OK if version 2 has *alt* instead of *lex*; dissimilarity set to 0\n",
"lex | `split` | `n` integer | split *lex* in version 1 into `n` extra slots in version 2; set the dissimilarity to `n`\n",
"\n",
"If a discrepancy falls through all these special cases, we have a few generic rules that will also be applied:\n",
"\n",
"* if a lexeme in version 1 contains `_`, we split on it and treat it as separate lexemes.\n",
" In fact, we perform the action `split` with parameter the number of parts separated by `_`.\n",
"* if the lex in version 1 equals the lex in version 2 plus the next lex in version 2, and if the lex in version 2 is `H`,\n",
" we split the lex in version 1 into that `H` and the rest.\n",
"* if the set of letters in the masked lexeme in version 1 is the union of the sets of the corresponding masked lexeme\n",
" in version 2 plus that of the next lexeme in version 2, and if the corresponding lexeme in version 2 is either `B` or `MN`,\n",
" we split the lex in version 1 into that `B` or `MN` and the rest.\n",
" \n",
"Note that these rules are very corpus dependent, and have been distilled from experience with the BHSA versions involved.\n",
"If you are in the process of applying this algorithm to other corpora, you can leave out these rules, and add your\n",
"own depending on what you encounter."
]
},
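{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what a `split` case does to the mapping, here is a minimal sketch with hypothetical slot numbers (not taken from the BHSA data; `applySplit` is a name invented for this illustration): slot `n` in version 1 corresponds to `param` extra slots in version 2, so the images of all later slots shift up by `param`.\n",
"\n",
"```python\n",
"def applySplit(mapping, maxSlot, n, param):\n",
"    # every slot after n now points param positions further in version 2\n",
"    for m in range(n + 1, maxSlot + 1):\n",
"        mapping[m] += param\n",
"    return mapping\n",
"\n",
"mapping = {m: m for m in range(1, 6)}  # identity mapping of 5 slots\n",
"applySplit(mapping, 5, 2, 1)           # slot 2 splits into 2 slots\n",
"# mapping == {1: 1, 2: 2, 3: 4, 4: 5, 5: 6}\n",
"```"
]
},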
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"lines_to_end_of_cell_marker": 2
},
"outputs": [],
"source": [
"MAX_ITER = 250\n",
"\n",
"\n",
"def doDiffs(v, w):\n",
" mapKey = (v, w)\n",
" \n",
" thisDissimilarity = {}\n",
" dissimilarity[mapKey] = thisDissimilarity\n",
" \n",
" thisMapping = dict(((n, n) for n in api[v].F.otype.s(\"word\")))\n",
" mappings[mapKey] = thisMapping\n",
" \n",
" theseCases = cases.get(mapKey, {})\n",
"\n",
" iteration = 0\n",
" start = 1\n",
"\n",
" solved = True\n",
"\n",
" while True:\n",
" # try to find the next difference from where you are now\n",
" n = firstDiff(v, w, start)\n",
"\n",
" if n is None:\n",
" print(f\"No more differences.\\nFound {iteration} points of disturbance\")\n",
" break\n",
"\n",
" if iteration > MAX_ITER:\n",
" print(\"There might be more disturbances: increase MAX_ITER\")\n",
" break\n",
"\n",
" iteration += 1\n",
" \n",
" # there is a difference: we have to do work\n",
" # we print it as a kind of logging\n",
" \n",
" printDiff(v, w, n, 5)\n",
"\n",
" # we try to solve the discrepancy\n",
" # first we gather the information about the lexemes at this position in both versions\n",
" \n",
" (lex1, bareLex1, maskedLex1) = lexemes[v][n]\n",
" (lex2, bareLex2, maskedLex2) = lexemes[w][thisMapping[n]]\n",
" \n",
" # and at the next position\n",
" \n",
" (lex1next, bareLex1next, maskedLex1next) = lexemes[v][n + 1]\n",
" (lex2next, bareLex2next, maskedLex2next) = lexemes[w][thisMapping[n + 1]]\n",
"\n",
" # the discrepancy is not solved unless we find it in a case or in a rule\n",
" solved = None\n",
" skip = 0\n",
" \n",
" # first check the explicit cases\n",
" \n",
" if n in theseCases:\n",
" (action, param) = theseCases[n]\n",
" if action == \"collapse\":\n",
" plural = \"\" if param == 1 else \"s\"\n",
" solved = f\"{action} {param} fewer slot{plural}\"\n",
" thisDissimilarity[n] = -param\n",
" skip = param\n",
" for m in range(api[v].F.otype.maxSlot, n + param, -1):\n",
" thisMapping[m] = thisMapping[m - param]\n",
" for m in range(n + 1, n + param + 1):\n",
" thisMapping[m] = thisMapping[n]\n",
" elif action == \"split\":\n",
" plural = \"\" if param == 1 else \"s\"\n",
" solved = f\"{action} into {param} extra slot{plural}\"\n",
" thisDissimilarity[n] = param\n",
" for m in range(n + 1, api[v].F.otype.maxSlot + 1):\n",
" thisMapping[m] = thisMapping[m] + param\n",
" elif action == \"ok\":\n",
" solved = \"incidental variation in lexeme\"\n",
" thisDissimilarity[n] = 0\n",
" elif lex1 in theseCases:\n",
" (action, param) = theseCases[lex1]\n",
" if action == \"ok\":\n",
" if lex2 == param:\n",
" solved = \"systematic variation in lexeme\"\n",
" thisDissimilarity[n] = 0\n",
" elif action == \"split\":\n",
" plural = \"\" if param == 1 else \"s\"\n",
" solved = f\"systematic {action} into {param} extra slot{plural}\"\n",
" thisDissimilarity[n] = param\n",
" for m in range(n + 1, api[v].F.otype.maxSlot + 1):\n",
" thisMapping[m] = thisMapping[m] + param\n",
" \n",
" # then try some more general rules\n",
" \n",
" elif \"_\" in lex1:\n",
" action = \"split\"\n",
" param = lex1.count(\"_\")\n",
" plural = \"\" if param == 1 else \"s\"\n",
" solved = f\"{action} on _ into {param} extra slot{plural}\"\n",
" thisDissimilarity[n] = param\n",
" for m in range(n + 1, api[v].F.otype.maxSlot + 1):\n",
" thisMapping[m] = thisMapping[m] + param\n",
" elif lex1 == lex2 + lex2next:\n",
" if lex2 == \"H\":\n",
" solved = \"split article off\"\n",
" thisDissimilarity[n] = 1\n",
" for m in range(n + 1, api[v].F.otype.maxSlot + 1):\n",
" thisMapping[m] = thisMapping[m] + 1\n",
" elif set(maskedLex1) == set(maskedLex2) | set(maskedLex2next):\n",
" if lex2 == \"B\" or lex2 == \"MN\":\n",
" solved = \"split preposition off\"\n",
" thisDissimilarity[n] = 1\n",
" for m in range(n + 1, api[v].F.otype.maxSlot + 1):\n",
" thisMapping[m] = thisMapping[m] + 1\n",
" print(f\"Action: {solved if solved else 'BLOCKED'}\\n\")\n",
"\n",
" # stop the loop if the discrepancy is not solved\n",
" # The discrepancy has already been printed to the output,\n",
" # so you can see immediately what is happening there\n",
" \n",
" if not solved:\n",
" break\n",
"\n",
" # if the discrepancy was solved, \n",
" # advance to the first position after the discrepancy\n",
" # and try to find a new discrepancy in the next iteration\n",
" start = n + 1 + skip\n",
"\n",
" if not solved:\n",
" print(f\"Blocking difference in {iteration} iterations\")"
]
},
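{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `collapse` branch above is the mirror image of `split`. Here is a minimal sketch with hypothetical slot numbers (not BHSA data; `applyCollapse` is a name invented for this illustration): the `param` slots after slot `n` in version 1 all receive the image of slot `n`, and the images of the remaining slots shift back accordingly.\n",
"\n",
"```python\n",
"def applyCollapse(mapping, maxSlot, n, param):\n",
"    # later slots take over the image of the slot param positions back\n",
"    for m in range(maxSlot, n + param, -1):\n",
"        mapping[m] = mapping[m - param]\n",
"    # the collapsed slots all map to the image of slot n\n",
"    for m in range(n + 1, n + param + 1):\n",
"        mapping[m] = mapping[n]\n",
"    return mapping\n",
"\n",
"mapping = {m: m for m in range(1, 7)}  # identity mapping of 6 slots\n",
"applyCollapse(mapping, 6, 2, 1)        # slots 2 and 3 collapse onto one slot\n",
"# mapping == {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 5}\n",
"```"
]
},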
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"The mappings itself are needed elsewhere in Text-Fabric, let us write them to file.\n",
"We write them into the dataset corresponding to the target version.\n",
"So the map `3-4` ends up in version `4`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def edgesFromMaps():\n",
" edges.clear()\n",
" for ((v, w), mp) in sorted(mappings.items()):\n",
" caption(4, \"Make edge from slot mapping {} => {}\".format(v, w), silent=SILENT)\n",
"\n",
" edge = {}\n",
" dm = dissimilarity[(v, w)]\n",
"\n",
" for n in range(1, api[v].F.otype.maxSlot + 1):\n",
" m = mp[n]\n",
" k = dm.get(n, None)\n",
" if k is None:\n",
" if n in edge:\n",
" if m not in edge[n]:\n",
" edge[n][m] = None\n",
" else:\n",
" edge.setdefault(n, {})[m] = None\n",
" else:\n",
" if k > 0:\n",
" for j in range(m, m + k + 1):\n",
" edge.setdefault(n, {})[j] = k\n",
" elif k < 0:\n",
" for i in range(n, n - k + 1):\n",
" edge.setdefault(i, {})[m] = k\n",
" else:\n",
" edge.setdefault(n, {})[m] = 0\n",
" edges[(v, w)] = edge"
]
},
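{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a toy illustration (hypothetical data, not BHSA) of the edge construction above: a slot without recorded dissimilarity gets a single edge valued `None`, while a slot with positive dissimilarity `k` gets edges to `k + 1` consecutive target slots, each labelled `k`.\n",
"\n",
"```python\n",
"mp = {1: 1, 2: 2, 3: 4}  # slot 2 was split, so slot 3 shifted by 1\n",
"dm = {2: 1}              # dissimilarity 1 recorded at slot 2\n",
"\n",
"edge = {}\n",
"for n in range(1, 4):\n",
"    m = mp[n]\n",
"    k = dm.get(n)\n",
"    if k is None:\n",
"        edge.setdefault(n, {})[m] = None\n",
"    elif k > 0:\n",
"        # edges to the slot and its k extra counterparts\n",
"        for j in range(m, m + k + 1):\n",
"            edge.setdefault(n, {})[j] = k\n",
"\n",
"# edge == {1: {1: None}, 2: {2: 1, 3: 1}, 3: {4: None}}\n",
"```"
]
},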
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running\n",
"\n",
"Here we run the mapping between `3` and `4`.\n",
"\n",
"## 3 => 4\n",
"\n",
"Here are the special cases for this conversion."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"cases.update(\n",
" {\n",
" (\"3\", \"4\"): {\n",
" \"CXH[\": (\"ok\", \"XWH[\"),\n",
" \"MQYT/\": (\"split\", 1),\n",
" 28730: (\"ok\", None),\n",
" 121812: (\"ok\", None),\n",
" 174515: (\"ok\", None),\n",
" 201089: (\"ok\", None),\n",
" 218383: (\"split\", 2),\n",
" 221436: (\"ok\", None),\n",
" 247730: (\"ok\", None),\n",
" 272883: (\"collapse\", 1),\n",
" 353611: (\"ok\", None),\n",
" },\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Genesis 18:2 ==> slot 7840 ==> 7840\n",
" ╋MN╋PTX╋H╋>HL┫┣W┫┣▶CXH◀┫┣>RY┫┣W┫┣>MR┫┣>DNJ┫┣>M╋\n",
" ╋MN╋PTX╋H╋>HL┫┣W┫┣▶XWH◀┫┣>RY┫┣W┫┣>MR┫┣>DNJ┫┣>M╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 19:1 ==> slot 8447 ==> 8447\n",
" ╋W┫┣QWM┫┣L╋QR>┫┣W┫┣▶CXH◀┫┣>P┫┣>RY┫┣W┫┣>MR┫┣HNH╋\n",
" ╋W┫┣QWM┫┣L╋QR>┫┣W┫┣▶XWH◀┫┣>P┫┣>RY┫┣W┫┣>MR┫┣HNH╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 21:14 ==> slot 9856 ==> 9856\n",
" ╋HLK┫┣W┫┣TR_CB<◀┫┣W┫┣KLH┫┣H╋MJM┫┣MN╋\n",
" ╋HLK┫┣W┫┣TR◀╋CB<┫┣W┫┣KLH┫┣H╋MJM╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 21:31 ==> slot 10174 ==> 10175\n",
" ╋L╋H╋MQWM╋H╋HW>┫┣▶B>R_CB<◀┫┣KJ┫┣CM┫┣CB<┫┣CNJM┫┣W╋\n",
" ╋L╋H╋MQWM╋H╋HW>┫┣▶B>R◀╋CB<┫┣KJ┫┣CM┫┣CB<┫┣CNJM╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 21:32 ==> slot 10183 ==> 10185\n",
" ╋CNJM┫┣W┫┣KRT┫┣BRJT┫┣B╋▶B>R_CB<◀┫┣W┫┣QWM┫┣>BJMLK╋W╋PJKL╋\n",
" ╋CNJM┫┣W┫┣KRT┫┣BRJT┫┣B╋▶B>R◀╋CB<┫┣W┫┣QWM┫┣>BJMLK╋W╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 21:33 ==> slot 10200 ==> 10203\n",
" ╋PLCTJ┫┣W┫┣NV<┫┣>CL┫┣B╋▶B>R_CB<◀┫┣W┫┣QR>┫┣CM┫┣B╋CM╋\n",
" ╋PLCTJ┫┣W┫┣NV<┫┣>CL┫┣B╋▶B>R◀╋CB<┫┣W┫┣QR>┫┣CM┫┣B╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 22:5 ==> slot 10341 ==> 10345\n",
" ╋NL┫┣W┫┣LQX╋\n",
" ╋NL┫┣W┫┣LQX╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 22:19 ==> slot 10641 ==> 10645\n",
" ╋QWM┫┣W┫┣HLK┫┣JXDW┫┣>L╋▶B>R_CB<◀┫┣W┫┣JCB┫┣>BRHM┫┣B╋B>R_CB<╋\n",
" ╋QWM┫┣W┫┣HLK┫┣JXDW┫┣>L╋▶B>R◀╋CB<┫┣W┫┣JCB┫┣>BRHM┫┣B╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 22:19 ==> slot 10646 ==> 10651\n",
" ╋B>R_CB<┫┣W┫┣JCB┫┣>BRHM┫┣B╋▶B>R_CB<◀┫┣W┫┣HJH┫┣>XR╋H╋DBR╋\n",
" ╋B>R┫┣W┫┣JCB┫┣>BRHM┫┣B╋▶B>R◀╋CB<┫┣W┫┣HJH┫┣>XR╋H╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 23:7 ==> slot 10830 ==> 10836\n",
" ╋MWT┫┣W┫┣QWM┫┣>BRHM┫┣W┫┣▶CXH◀┫┣L╋RY┫┣L╋\n",
" ╋MWT┫┣W┫┣QWM┫┣>BRHM┫┣W┫┣▶XWH◀┫┣L╋RY┫┣L╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 23:12 ==> slot 10933 ==> 10939\n",
" ╋NTN┫┣L┫┣QBR┫┣MWT┫┣W┫┣▶CXH◀┫┣>BRHM┫┣L╋PNH╋\n",
" BRHM┫┣L╋PNH╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"==> slot 11604 ==> 11610\n",
" ╋W┫┣QDD┫┣H╋>JC┫┣W┫┣▶CXH◀┫┣L╋JHWH┫┣W┫┣>MR┫┣BRK╋\n",
" ╋W┫┣QDD┫┣H╋>JC┫┣W┫┣▶XWH◀┫┣L╋JHWH┫┣W┫┣>MR┫┣BRK╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 24:48 ==> slot 12051 ==> 12057\n",
" ╋T╋\n",
" ╋T╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 24:52 ==> slot 12144 ==> 12150\n",
" ╋BRHM┫┣>T╋DBR┫┣W┫┣▶CXH◀┫┣>RY┫┣L╋JHWH┫┣W┫┣JY>╋\n",
" ╋BRHM┫┣>T╋DBR┫┣W┫┣▶XWH◀┫┣>RY┫┣L╋JHWH┫┣W┫┣JY>╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 25:20 ==> slot 12724 ==> 12730\n",
" ╋BT╋BTW>L┫┣H╋>RMJ┫┣MN╋▶PDN_>RM◀┫┣>XWT╋LBN┫┣H╋>RMJ┫┣L╋\n",
" ╋BT╋BTW>L┫┣H╋>RMJ┫┣MN╋▶PDN◀╋>RM┫┣>XWT╋LBN┫┣H╋>RMJ╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 26:23 ==> slot 13405 ==> 13412\n",
" ╋>RY┫┣W┫┣R_CB<◀┫┣W┫┣R>H┫┣>L┫┣JHWH┫┣B╋\n",
" ╋>RY┫┣W┫┣R◀╋CB<┫┣W┫┣R>H┫┣>L┫┣JHWH╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 26:33 ==> slot 13588 ==> 13596\n",
" ╋R_CB<◀┫┣\n",
" R◀╋CB<┫┣\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"==> slot 14101 ==> 14110\n",
" ╋W╋TJRWC┫┣M┫┣HWH┫┣GBJR┫┣L╋\n",
" ╋W╋TJRWC┫┣M┫┣HWH┫┣GBJR┫┣L╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 27:29 ==> slot 14109 ==> 14118\n",
" ╋HWH┫┣GBJR┫┣L╋>X┫┣W┫┣▶CXH◀┫┣L┫┣BN╋>M┫┣>RR┫┣>RR╋\n",
" ╋HWH┫┣GBJR┫┣L╋>X┫┣W┫┣▶XWH◀┫┣L┫┣BN╋>M┫┣>RR┫┣>RR╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 28:2 ==> slot 14510 ==> 14519\n",
" ╋MN╋BT╋KNRM◀┫┣BJT╋BTW>L┫┣>B╋>M┫┣W╋\n",
" ╋MN╋BT╋KNRM┫┣BJT╋BTW>L┫┣>B╋>M╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 28:5 ==> slot 14568 ==> 14578\n",
" ╋JYXQ┫┣>T╋JRM◀┫┣>L╋LBN┫┣BN╋BTW>L┫┣H╋\n",
" ╋JYXQ┫┣>T╋JRM┫┣>L╋LBN┫┣BN╋BTW>L╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 28:6 ==> slot 14592 ==> 14603\n",
" ╋>T╋JT┫┣▶PDN_>RM◀┫┣L╋LQX┫┣L┫┣MN╋CM╋\n",
" ╋>T╋JT┫┣▶PDN◀╋>RM┫┣L╋LQX┫┣L┫┣MN╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 28:7 ==> slot 14623 ==> 14635\n",
" ╋W╋>L╋>M┫┣W┫┣HLK┫┣▶PDN_>RM◀┫┣W┫┣R>H┫┣\n",
" L╋>M┫┣W┫┣HLK┫┣▶PDN◀╋>RM┫┣W┫┣R>H┫┣\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"==> slot 14659 ==> 14672\n",
" ╋>CH┫┣W┫┣JY>┫┣JR_CB<◀┫┣W┫┣HLK┫┣XRN┫┣W┫┣PG<╋\n",
" ╋>CH┫┣W┫┣JY>┫┣JR◀╋CB<┫┣W┫┣HLK┫┣XRN┫┣W╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 31:18 ==> slot 16687 ==> 16701\n",
" ╋MQNH╋QNJN┫┣>CR┫┣RKC┫┣B╋▶PDN_>RM◀┫┣L╋BW>┫┣>L╋JYXQ┫┣>B╋\n",
" ╋MQNH╋QNJN┫┣>CR┫┣RKC┫┣B╋▶PDN◀╋>RM┫┣L╋BW>┫┣>L╋JYXQ╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 33:3 ==> slot 18117 ==> 18132\n",
" ╋HW>┫┣RY┫┣CB<╋P┫┣\n",
" RY┫┣CB<╋P\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"==> slot 18175 ==> 18190\n",
" ╋CPXH┫┣HNH╋W╋JLD┫┣W┫┣▶CXH◀┫┣W┫┣NGC┫┣GM┫┣L>H╋W╋\n",
" ╋CPXH┫┣HNH╋W╋JLD┫┣W┫┣▶XWH◀┫┣W┫┣NGC┫┣GM╋L>H╋W╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 33:7 ==> slot 18183 ==> 18198\n",
" ╋GM┫┣L>H╋W╋JLD┫┣W┫┣▶CXH◀┫┣W┫┣>XR┫┣NGC┫┣JWSP╋W╋\n",
" ╋GM╋L>H╋W╋JLD┫┣W┫┣▶XWH◀┫┣W┫┣>XR┫┣NGC┫┣JWSP╋W╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 33:7 ==> slot 18191 ==> 18206\n",
" ╋NGC┫┣JWSP╋W╋RXL┫┣W┫┣▶CXH◀┫┣W┫┣>MR┫┣MJ┫┣L┫┣KL╋\n",
" ╋NGC┫┣JWSP╋W╋RXL┫┣W┫┣▶XWH◀┫┣W┫┣>MR┫┣MJ┫┣L┫┣KL╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 33:18 ==> slot 18397 ==> 18412\n",
" ╋>RY╋KN┫┣MN╋▶PDN_>RM◀┫┣W┫┣XNH┫┣>T╋PNH╋H╋\n",
" ╋>RY╋KN┫┣MN╋▶PDN◀╋>RM┫┣W┫┣XNH┫┣>T╋PNH╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 35:9 ==> slot 19216 ==> 19232\n",
" ╋J┫┣MN╋▶PDN_>RM◀┫┣W┫┣BRK┫┣>T┫┣W┫┣>MR╋\n",
" ╋J┫┣MN╋▶PDN◀╋>RM┫┣W┫┣BRK┫┣>T┫┣W╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 35:26 ==> slot 19485 ==> 19502\n",
" ╋JCR┫┣JLD┫┣L┫┣B╋▶PDN_>RM◀┫┣W┫┣BW>┫┣JL╋JYXQ╋\n",
" ╋JCR┫┣JLD┫┣L┫┣B╋▶PDN◀╋>RM┫┣W┫┣BW>┫┣JL╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 37:7 ==> slot 20271 ==> 20289\n",
" ╋W┫┣HNH┫┣SBB┫┣>LMH┫┣W┫┣▶CXH◀┫┣L╋>LMH┫┣W┫┣>MR┫┣L╋\n",
" ╋W┫┣HNH┫┣SBB┫┣>LMH┫┣W┫┣▶XWH◀┫┣L╋>LMH┫┣W┫┣>MR┫┣L╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 37:9 ==> slot 20323 ==> 20341\n",
" ╋JRX╋W╋>XD╋L╋>B╋\n",
" ╋JRX╋W╋>XD╋L╋>B╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 37:10 ==> slot 20355 ==> 20373\n",
" ╋W╋>M╋W╋>X┫┣L╋▶CXH◀┫┣L┫┣>RY┫┣W┫┣QN>┫┣B╋\n",
" ╋W╋>M╋W╋>X┫┣L╋▶XWH◀┫┣L┫┣>RY┫┣W┫┣QN>┫┣B╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 42:6 ==> slot 23509 ==> 23527\n",
" ╋W┫┣BW>┫┣>X╋JWSP┫┣W┫┣▶CXH◀┫┣L┫┣>P┫┣>RY┫┣W┫┣R>H╋\n",
" ╋W┫┣BW>┫┣>X╋JWSP┫┣W┫┣▶XWH◀┫┣L┫┣>P┫┣>RY┫┣W┫┣R>H╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 43:26 ==> slot 24650 ==> 24668\n",
" ╋B╋JD┫┣H╋BJT┫┣W┫┣▶CXH◀┫┣L┫┣>RY┫┣W┫┣C>L┫┣L╋\n",
" ╋B╋JD┫┣H╋BJT┫┣W┫┣▶XWH◀┫┣L┫┣>RY┫┣W┫┣C>L┫┣L╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 43:28 ==> slot 24682 ==> 24700\n",
" ╋┫┣H╋\n",
" ╋┫┣H╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 46:1 ==> slot 25981 ==> 25999\n",
" ╋KL┫┣>CR┫┣L┫┣W┫┣BW>┫┣▶B>R_CB<◀┫┣W┫┣ZBX┫┣ZBX┫┣L╋>LHJM╋\n",
" ╋KL┫┣>CR┫┣L┫┣W┫┣BW>┫┣▶B>R◀╋CB<┫┣W┫┣ZBX┫┣ZBX┫┣L╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 46:5 ==> slot 26042 ==> 26061\n",
" ╋R_CB<◀┫┣W┫┣NF>┫┣BN╋JFR>L┫┣>T╋\n",
" ╋R◀╋CB<┫┣W┫┣NF>┫┣BN╋JFR>L╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 46:15 ==> slot 26201 ==> 26221\n",
" ╋>CR┫┣JLD┫┣L╋JRM◀┫┣W┫┣>T╋DJNH┫┣BT┫┣KL╋\n",
" ╋>CR┫┣JLD┫┣L╋JRM┫┣W┫┣>T╋DJNH┫┣BT╋\n",
"\n",
"Action: split on _ into 1 extra slot\n",
"\n",
"Genesis 47:31 ==> slot 27267 ==> 27288\n",
" ╋L┫┣W┫┣CB<┫┣L┫┣W┫┣▶CXH◀┫┣JFR>L┫┣C╋H╋MVH╋\n",
" ╋L┫┣W┫┣CB<┫┣L┫┣W┫┣▶XWH◀┫┣JFR>L┫┣C╋H╋MVH╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 48:12 ==> slot 27501 ==> 27522\n",
" ╋>T┫┣MN╋P┫┣>RY┫┣W┫┣LQX╋\n",
" ╋>T┫┣MN╋P┫┣>RY┫┣W┫┣LQX╋\n",
"\n",
"Action: systematic variation in lexeme\n",
"\n",
"Genesis 49:8 ==> slot 27858 ==> 27879\n",
" ╋>X┫┣JD┫┣B╋