# Alignment of the Lakhnawi and Afifi editions

We align the two editions with our own algorithm.
See the [docs](https://among.github.io/fusus/fusus/align.html) for an extensive discussion of it.

We also tried an existing text-alignment tool, CollateX.
That attempt was unsuccessful; read more in
[this notebook](collatexAfLk.ipynb).

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from fusus.align import Alignment

In [3]:
ALIGN = Alignment()

# Load both editions

We use the Text-Fabric versions of the separate LK and AF editions.
These datasets are the results generated by the pipelines,
further cleaned and enriched by Cornelis.

The process leading to these results is never completely finished, so we maintain versions.

In [4]:
VERSION = "0.7"

Finding a good alignment took some trial and error.
Along the way we specified a few cases in which we tell the algorithm explicitly what to do.

In [5]:
CASES = {
    3072: (3032, 5, 1),
    4597: (4554, 1, 1),
    4598: (4555, 1, 1),
    4600: (4557, 0, 4),
    8273: (8286, 4, 0),
    13539: (13529, 2, 1),
    14878: (14829, 1, 1),
    14879: (14830, 1, 0),
    14880: (14830, 12, 0),
    16198: (16134, 1, 1),
    16199: (16135, 1, 1),
    16200: (16136, 1, 1),
    16201: (16137, 1, 1),
    16212: (16148, 1, 1),
    18029: (17970, 6, 1),
    22762: (22660, 1, 0),
    22763: (22660, 1, 1),
    22764: (22661, 1, 1),
}
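
The debug output of the test cases at the end of this notebook logs, for example, `[4600~4557] special case (0, 4)` for the entry `4600: (4557, 0, 4)`.
That suggests each key is an LK slot and each value holds the AF slot followed by the number of LK words and the number of AF words that are aligned as one combination.
Below is a minimal sketch under that assumption; `caseSlots` is a hypothetical helper, not part of `fusus`.

```python
def caseSlots(lkSlot, case):
    """Return the LK and AF slots covered by one special case.

    Assumes case = (afSlot, nLK, nAF); this reading is inferred from the
    debug output, not taken from the fusus documentation.
    """
    afSlot, nLK, nAF = case
    lkSlots = list(range(lkSlot, lkSlot + nLK))
    afSlots = list(range(afSlot, afSlot + nAF))
    return lkSlots, afSlots


# Under this reading, LK contributes no words and AF contributes four
print(caseSlots(4600, (4557, 0, 4)))
# ([], [4557, 4558, 4559, 4560])
```

This agrees with the log of cell `In [16]`, where the comparison resumes at `[4600~4561]` right after the special case.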

We call a function that reads both editions and gives us a handle to all relevant
information in the alignment process.

In [6]:
ALIGN.readEditions(VERSION, CASES)

This is Text-Fabric 9.1.4
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

27 features found and 0 ignored


This is Text-Fabric 9.1.4
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

17 features found and 0 ignored


Let's find the maximum slot number of each edition.

In [7]:
print(f"{ALIGN.maxLK=} {ALIGN.maxAF=}")

ALIGN.maxLK=40379 ALIGN.maxAF=40271


# Run the comparison

Here we go!

In [8]:
ALIGN.doDiffs()

Alignment complete, 40983 entries.
Special cases: all relevant 18 cases defined, encountered, and applied
 0 taken:     1 x
 1 taken:    18 x
 2 taken: 36340 x
 3 taken:   538 x
 4 taken:   312 x
 6 taken:    32 x
 7 taken:    12 x
 8 taken:     1 x
 9 taken:     9 x
12 taken:     1 x


We show the first few lines of the alignment table.

On the left you see the LK slot numbers and the words in those slots.

On the right you see the AF words and their AF slot numbers.

The middle column shows the edit distance and the similarity ratio between the words on both sides.

`@88` means: both sides have been matched by a special case; no edit distance and ratio have been computed.

`@99` means: the word is missing on one of the sides.

All other values after the `@` are the number of edits needed to change one word into the other.
The number after the `~` is the similarity ratio between the two words.
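
To make the two numbers concrete, here is a minimal sketch that computes a Levenshtein edit distance and a similarity ratio for a pair of words.
It uses only the standard library. `editDistance` and `similarity` are illustrative names, and taking the ratio from `difflib.SequenceMatcher` is an assumption, so values may differ from what `fusus.align` computes internally.

```python
from difflib import SequenceMatcher


def editDistance(a, b):
    """Levenshtein distance: minimal number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Ratio between 0.0 and 1.0; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


for (lk, af) in [("mn", "mn"), ("fy", "y")]:
    print(f"{lk} / {af}: @{editDistance(lk, af)} ~{similarity(lk, af):.1f}")
# mn / mn: @0 ~1.0
# fy / y: @1 ~0.7
```

The second pair reproduces the `@1 ~0.7` that one of the test cases further down reports for `fy` and `y`.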

Sometimes words are combined: a number of words on the left corresponds to a possibly different number of words on the right.
This is indicated in the `cc` columns.
If the numbers are not equal, empty words have been inserted where necessary.

If a single word on the left is aligned with a single word on the right, the `cc` column is empty.
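
If you want to post-process the printed table, each row can be split into its fields.
The sketch below assumes the layout seen in the output of this notebook: fields separated by `|`, with the AF slot and its page:line sharing the last field.
`parseRow` is a hypothetical helper, not a function of `fusus`.

```python
import re

# matches the middle column, e.g. "@0 ~1.0" or "@99~0.0"
MEASURE_RE = re.compile(r"@(\d+)\s*~([\d.]+)")


def parseRow(row):
    """Split one printed alignment row into a dict.

    Empty cells become None; slots, counts and distance become int.
    """
    fields = [f.strip() for f in row.split("|")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    pagLK, slotLK, ccLK, textLK, measure, textAF, ccAF, rest = fields
    match = MEASURE_RE.fullmatch(measure)
    if match is None:
        raise ValueError(f"unrecognized measure: {measure!r}")
    # the last field holds "slot pag:ln" for AF, or nothing at all
    parts = rest.split()
    return dict(
        pagLK=pagLK or None,
        slotLK=int(slotLK) if slotLK else None,
        ccLK=int(ccLK) if ccLK else None,
        textLK=textLK or None,
        distance=int(match.group(1)),
        ratio=float(match.group(2)),
        textAF=textAF or None,
        ccAF=int(ccAF) if ccAF else None,
        slotAF=int(parts[0]) if parts else None,
        pagAF=parts[1] if len(parts) > 1 else None,
    )


row = "153:02|13351|  |                  mn|@0 ~1.0|mn                  |  |13342 108:11"
info = parseRow(row)
print(info["slotLK"], info["distance"], info["slotAF"])
# 13351 0 13342
```

Rows where one side is missing keep `None` for the slot and text on that side, with a distance of 99.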

In [9]:
print(ALIGN.printLines(start=0, end=20))

pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
      |     |0 |                    |@99~0.0|bnzlylālʿ           |  |    1 047:01
      |     |0 |                    |@99~0.0|ylrʿā               |  |    2 047:01
008:02|    1|  |               ālḥmd|@0 ~1.0|ālḥmd               |  |    3 047:02
008:02|    2|  |                 llh|@0 ~1.0|lh                  |  |    4 047:02
008:02|    3|  |                mnzl|@0 ~1.0|mnzl                |  |    5 047:02
008:02|    4|  |               ālḥkm|@0 ~1.0|ālḥk                |  |    6 047:02
008:02|    5|  |                 ʿlá|@0 ~1.0|ʿlá                 |  |    7 047:02
008:02|    6|  |                ḳlwb|@0 ~1.0|ḳlwb                |  |    8 047:02
008:02|    7|  |               ālklm|@0 ~1.0|ālklm               |  |    9 047:02
008:02|    8|  |              bāḥdyŧ|@0 ~1.0|bāḥdyŧ              |  |   10 047:02
008:02|    9|  |

In [10]:
ALIGN.check()


SANITY

All OK

AGREEMENT

Where are the words?

	LK-only:   712 slots
	AF-only:   604 slots
	both:    39667 slots

How well is the agreement?

edit distance   0 : 39507 words
edit distance   1 :   908 words
edit distance   2 :    51 words
edit distance   3 :    11 words
edit distance  88 :    45 words
edit distance  99 :   461 words
NB: 88 are special cases that have been declared explicitly
NB: 99 are words that have no counterpart in the other edition

COMBINATIONS

What combination alignments are there and how many?
	( 0,  4) :    1 x :
EXAMPLE 1:
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
051:01| 4598|1 |               nwḥyŧ|@88~0.0|wḥh                 | 1| 4555 068:03
051:02| 4599|  |                āʿlm|@0 ~1.0|āʿlm                |  | 4556 068:04
      |     |0 |                    |@88~0.0|āydlk               | 4| 4557 068:04
      |     |0 |               

# Test cases

In [11]:
ALIGN.doDiffs(startLK=178, startAF=182, steps=4, show=True, debug=True)

[178~182] single comparison with distance <= 0 and ratio >= 1.0
[180~183] single comparison failed with distance <= 0 and ratio >= 1.0
[180~183] single comparison recovered with distance <= 0 and ratio >= 1.0
[179~183] left 1-jump to 180
[182~185] single comparison with distance <= 1 and ratio >= 0.8
[183~186] single comparison failed with distance <= 0 and ratio >= 1.0
[183~186] single comparison recovered with distance <= 0 and ratio >= 1.0
No special cases defined for this stretch
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
011:04|  178|  |                ālḥḳ|@0 ~1.0|ālḥḳ                |  |  182 048:03
011:04|  179|  |               tʿālá|@99~0.0|                    | 0|            
011:04|  180|  |                 lmā|@0 ~1.0|mā                  |  |  183 048:03
011:04|  181|  |                 smʿ|@0 ~1.0|smʿ                 |  |  184 048:03
011:04|  182|  |   

In [12]:
ALIGN.doDiffs(startLK=13347, startAF=13338, steps=4, show=True, debug=True)

[13347~13338] single comparison with distance <= 0 and ratio >= 1.0
[13348~13339] single comparison with distance <= 0 and ratio >= 1.0
[13349~13340] (2, 2) comparison with distance <= 0 and ratio >= 1.0
[13351~13342] single comparison with distance <= 0 and ratio >= 1.0
No special cases defined for this stretch
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
153:01|13347|  |              ālālhy|@0 ~1.0|ālālhy              |  |13338 108:10
153:01|13348|  |                 flā|@0 ~1.0|flā                 |  |13339 108:11
153:01|13349|2 |                 ḳrb|@0 ~1.0|ḳrbā                | 2|13340 108:11
153:02|13350|2 |                āḳrb|@0 ~1.0|ḳrb                 | 2|13341 108:11
153:02|13351|  |                  mn|@0 ~1.0|mn                  |  |13342 108:11
 2 taken:     4 x


In [13]:
ALIGN.doDiffs(startLK=5635, startAF=5607, steps=4, show=True, debug=True)

[5635~5616] single comparison with distance <= 0 and ratio >= 1.0
[5635~5607] right 9-jump to 5616
[5636~5617] single comparison with distance <= 0 and ratio >= 1.0
[5637~5618] single comparison with distance <= 0 and ratio >= 1.0
[5638~5619] single comparison with distance <= 0 and ratio >= 1.0
No special cases defined for this stretch
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
      |     |0 |                    |@99~0.0|smwhm               |  | 5607 072:14
      |     |0 |                    |@99~0.0|flw                 |  | 5608 072:14
      |     |0 |                    |@99~0.0|smwhm               |  | 5609 072:14
      |     |0 |                    |@99~0.0|ā                   |  | 5610 072:15
      |     |0 |                    |@99~0.0|lsmwhm              |  | 5611 072:15
      |     |0 |                    |@99~0.0|ḥǧārŧ               |  | 5612 072:15
     

In [14]:
ALIGN.doDiffs(startLK=11794, startAF=11799, steps=4, show=True, debug=True)

[11794~11799] single comparison failed with distance <= 0 and ratio >= 1.0
[11794~11799] single comparison recovered with distance <= 0 and ratio >= 1.0
[11796~11801] single comparison with distance <= 0 and ratio >= 1.0
[11797~11802] single comparison with distance <= 0 and ratio >= 1.0
[11798~11803] single comparison with distance <= 0 and ratio >= 1.0
No special cases defined for this stretch
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
137:04|11794|  |               ānḳṣt|@0 ~1.0|tḳf                 |  |11799 101:14
137:04|11795|  |                ʿlyh|@0 ~1.0|ʿlyh                |  |11800 101:14
137:04|11796|  |                  ān|@0 ~1.0|ān                  |  |11801 101:14
137:04|11797|  |                 šāʾ|@0 ~1.0|šāʾ                 |  |11802 101:14
137:04|11798|  |                āllh|@0 ~1.0|āllh                |  |11803 101:14
 2 taken:     4 x


In [15]:
ALIGN.doDiffs(startLK=27434, startAF=27337, steps=2, show=True, debug=True)

[27434~27337] single comparison with distance <= 1 and ratio >= 0.8
[27435~27339] single comparison failed with distance <= 2 and ratio >= 0.7
[27435~27339] single comparison recovered with distance <= 2 and ratio >= 0.7
[27435~27338] right 1-jump to 27339
No special cases defined for this stretch
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
289:03|27434|  |                 kmā|@1 ~0.8|kā                  |  |27337 171:05
      |     |0 |                    |@99~0.0|ān                  |  |27338 171:05
289:03|27435|  |               nsbth|@2 ~0.8|nsbŧ                |  |27339 171:05
289:03|27436|  |             ālfwḳyŧ|@2 ~0.8|ālfwḳ               |  |27340 171:05
 3 taken:     1 x
12 taken:     1 x


In [16]:
ALIGN.doDiffs(startLK=4595, startAF=4552, steps=10, show=True, debug=True)

[4595~4552] single comparison with distance <= 1 and ratio >= 0.8
[4596~4553] single comparison with distance <= 3 and ratio >= 0.6
[4597~4554] special case (1, 1)
[4598~4555] special case (1, 1)
[4599~4556] single comparison with distance <= 0 and ratio >= 1.0
[4600~4557] special case (0, 4)
[4600~4561] single comparison with distance <= 0 and ratio >= 1.0
[4601~4562] single comparison with distance <= 0 and ratio >= 1.0
[4602~4563] single comparison with distance <= 0 and ratio >= 1.0
[4603~4564] single comparison with distance <= 0 and ratio >= 1.0
Special cases: all relevant 3 cases defined, encountered, and applied
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
051:01| 4595|  |              sbwḥyŧ|@1 ~0.8|sbwḥyh              |  | 4552 068:03
051:01| 4596|  |                  fy|@1 ~0.7|y                   |  | 4553 068:03
051:01| 4597|1 |                klmŧ|@88~0.0

In [17]:
ALIGN.doDiffs(startLK=16197, startAF=16133, steps=10, show=True, debug=True)

[16197~16133] single comparison with distance <= 0 and ratio >= 1.0
[16198~16134] special case (1, 1)
[16199~16135] special case (1, 1)
[16200~16136] special case (1, 1)
[16201~16137] special case (1, 1)
[16202~16138] single comparison with distance <= 0 and ratio >= 1.0
[16203~16139] single comparison with distance <= 0 and ratio >= 1.0
[16204~16140] single comparison with distance <= 0 and ratio >= 1.0
[16205~16141] single comparison with distance <= 0 and ratio >= 1.0
[16206~16142] single comparison with distance <= 0 and ratio >= 1.0
Special cases: all relevant 4 cases defined, encountered, and applied
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
178:05|16197|  |                  fy|@0 ~1.0|fy                  |  |16133 121:10
178:05|16198|1 |            ālʿārfyn|@88~0.0|ālʿārf              | 1|16134 121:11
178:05|16199|1 |                 tḳf|@88~0.0|yḳf          

In [18]:
ALIGN.doDiffs(startLK=16210, startAF=16146, steps=15, show=True, debug=True)

[16210~16146] single comparison with distance <= 0 and ratio >= 1.0
[16211~16147] single comparison with distance <= 0 and ratio >= 1.0
[16212~16148] special case (1, 1)
[16213~16149] single comparison failed with distance <= 0 and ratio >= 1.0
[16213~16149] single comparison recovered with distance <= 0 and ratio >= 1.0
[16215~16151] single comparison with distance <= 0 and ratio >= 1.0
[16216~16152] (2, 3) comparison with distance <= 1 and ratio >= 0.8
[16218~16155] (2, 3) comparison with distance <= 2 and ratio >= 0.7
[16220~16158] single comparison with distance <= 0 and ratio >= 1.0
[16221~16159] single comparison with distance <= 0 and ratio >= 1.0
[16222~16160] (2, 1) comparison with distance <= 0 and ratio >= 1.0
[16224~16161] single comparison with distance <= 0 and ratio >= 1.0
[16225~16162] single comparison with distance <= 0 and ratio >= 1.0
[16226~16163] single comparison with distance <= 0 and ratio >= 1.0
[16227~16164] (2, 1) comparison with distance <= 0 and ratio >= 1