<img align="right" src="images/ninologo.png" width="150"/>
<img align="right" src="images/tf-small.png" width="125"/>
<img align="right" src="images/dans.png" width="150"/>

# Jumps

Things do not only lie embedded in each other, they can also *point* to each other.
The mechanism for that is the *edge*. Edges are links between *nodes*.
Like nodes, edges may carry feature values.

We learn how to deal with structure in a quantitative way.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import collections
from IPython.display import Markdown, display
from tf.app import use

In [3]:
A = use("Nino-cunei/uruk",hoist=globals())

**Locating corpus resources ...**

Name | # of nodes | # slots / node | % coverage
------ | ---- | ---- | ----
tablet | 6364 | 22.01 | 100
face | 9456 | 14.1 | 95
column | 14023 | 9.34 | 93
line | 35842 | 3.61 | 92
case | 9651 | 3.46 | 24
cluster | 32753 | 1.03 | 24
quad | 3794 | 2.05 | 6
comment | 11090 | 1.0 | 8
sign | 140094 | 1.0 | 100


## Measuring depth

Numbered lines in the transliterations indicate a hierarchy of cases within lines.
How deep can cases go?
We explore the distribution of cases with respect to their depth.

We need a function that computes the depth of a case.
We write it in such a way that it also works for *quads* (seen before)
and *clusters* (which we will see later).

The idea of this function is:
* if a structure does not have sub-structures, its depth is 1 or 0;
  * it is 1 if the lowest level parts of the structure have a different name
    such as quads versus signs;
  * it is 0 if the lowest level parts of the structure have the same name,
    such as cases in lines;
* the depth of a structure is 1 more than the maximum of the depths of its sub-structures.

How do we find the sub-structures of a structure?
By following *edges* with a `sub` feature, as we have seen in
[quads](quads.ipynb).

In [4]:
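def depthStructure(node, nodeType, ground):
    """Depth of node, counting only sub-structures of type nodeType.

    ground is the depth given to a structure without such sub-structures.
    """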
    subDepths = [
        depthStructure(subNode, nodeType, ground)
        for subNode in E.sub.f(node)
        if F.otype.v(subNode) == nodeType
    ]
    if len(subDepths) == 0:
        return ground
    else:
        return max(subDepths) + 1
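
To see the recursion at work without loading the corpus, here is a minimal sketch on a hand-made structure.
The dictionaries `SUB` and `OTYPE` and the function `toyDepth` are hypothetical stand-ins for `E.sub.f`, `F.otype.v` and `depthStructure`; the node names are invented for illustration.

```python
# Toy stand-ins for the Text-Fabric API (invented data, not from the corpus).
# SUB maps a node to the nodes reached by its outgoing `sub` edges.
SUB = {
    "line1": ["case1a", "case1b"],
    "case1b": ["case1b1", "case1b2"],
}
# OTYPE maps a node to its node type.
OTYPE = {
    "line1": "line",
    "line2": "line",
    "case1a": "case",
    "case1b": "case",
    "case1b1": "case",
    "case1b2": "case",
}


def toyDepth(node, nodeType, ground):
    # Same recursion as depthStructure(), but on the dictionaries above.
    subDepths = [
        toyDepth(subNode, nodeType, ground)
        for subNode in SUB.get(node, ())
        if OTYPE[subNode] == nodeType
    ]
    return max(subDepths) + 1 if subDepths else ground


print(toyDepth("line1", "case", 0))  # 2: two levels of sub-cases below line1
print(toyDepth("case1b", "case", 0))  # 1: one level of sub-cases
print(toyDepth("line2", "case", 0))  # 0: no sub-cases, so we get the ground value
print(toyDepth("line2", "case", 1))  # 1: same node, but with ground 1
```

The last two calls show the role of `ground`: it only decides what a structure without sub-structures counts for.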

## Example: cases

We call up our example tablet and do a few basic checks on cases.

Note that there is also a feature **depth** that gives the depth at which a case is found (its nesting level),
which is different from the depth a case has (the number of levels of sub-cases below it).

In [5]:
pNum = "P005381"
query = """
tablet catalogId=P005381
"""
results = A.search(query)
A.show(results, withNodes=True, lineNumbers=True, showGraphics=False)

  0.00s 1 result


In [6]:
line1 = T.nodeFromSection((pNum, "obverse:1", "1"))
A.pretty(line1, showGraphics=False)
depthStructure(line1, "case", 0)

1

That makes sense, since line 1 is divided into one level of sub-cases: 1a and 1b.

In [7]:
L.d(line1, otype="case")

(167736, 167737)

In [8]:
line2 = T.nodeFromSection((pNum, "obverse:1", "2"))
A.pretty(line2, showGraphics=False)
depthStructure(line2, "case", 0)

0

Indeed, line 2 is not divided into sub-cases.

In [9]:
L.d(line2, otype="case")

()

## Counting by depth

For a variety of structures we'll find out how deep they go,
and how depth is distributed in the corpus.

### Cases

We are going to collect all cases in buckets according to their depths.

In [10]:
caseDepths = collections.defaultdict(list)

for n in F.otype.s("line"):
    caseDepths[depthStructure(n, "case", 0)].append(n)
for n in F.otype.s("case"):
    caseDepths[depthStructure(n, "case", 0)].append(n)

caseDepthsSorted = sorted(
    caseDepths.items(),
    key=lambda x: (-x[0], -len(x[1])),
)

for (depth, casesOrLines) in caseDepthsSorted:
    print(f"{len(casesOrLines):>5} cases or lines with depth {depth}")

   24 cases or lines with depth 4
   66 cases or lines with depth 3
 1024 cases or lines with depth 2
 3247 cases or lines with depth 1
41132 cases or lines with depth 0
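
The bucketing pattern used here does not depend on Text-Fabric.
A minimal sketch on invented data, where the dictionary `toyDepths` stands in for the results of `depthStructure()`:

```python
import collections

# Invented node numbers with their depths; in the notebook the depth
# is computed by depthStructure().
toyDepths = {101: 0, 102: 0, 103: 1, 104: 2, 105: 0, 106: 1}

# Collect the nodes in buckets, one bucket per depth.
buckets = collections.defaultdict(list)
for (node, depth) in toyDepths.items():
    buckets[depth].append(node)

# Deepest bucket first. The depths are dictionary keys and hence unique,
# so the second component of the sort key never has to break a tie.
bucketsSorted = sorted(buckets.items(), key=lambda x: (-x[0], -len(x[1])))

for (depth, nodes) in bucketsSorted:
    print(f"{len(nodes):>5} nodes with depth {depth}")
```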


We'll have some fun with this. We find two of the deepest cases, one on
a face that is as small as possible, one on a face that is as big as possible.

So we restrict ourselves to `caseDepths[4]`.

For all of these cases we find the face they are on, and the number of quads on that face.

In [11]:
deepCases = caseDepths[4]
candidates = []

for case in deepCases:
    face = L.u(case, otype="face")[0]
    size = len(A.getOuterQuads(face))
    candidates.append((case, size))

sortedCandidates = sorted(candidates, key=lambda x: (x[1], x[0]))
sortedCandidates

[(253501, 16),
 (232985, 18),
 (248868, 23),
 (255246, 32),
 (241089, 37),
 (247955, 38),
 (250963, 38),
 (231788, 41),
 (231789, 41),
 (245488, 45),
 (242207, 48),
 (253727, 48),
 (241171, 52),
 (255664, 53),
 (249501, 59),
 (251109, 63),
 (255650, 94),
 (242646, 112),
 (242647, 112),
 (248316, 112),
 (256051, 295),
 (256058, 295),
 (256061, 295),
 (256062, 295)]
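
The sort key `(x[1], x[0])` orders by size first and by node number second, so that ties in size come out in a fixed order.
A small sketch with invented pairs shows how the first and last entries give the extremes we are after:

```python
# Invented (node, size) pairs standing in for (case, size of its face).
toyCandidates = [(30, 112), (10, 295), (20, 16), (25, 16), (5, 295)]

# Sort by size first, then by node number to make ties deterministic.
toySorted = sorted(toyCandidates, key=lambda x: (x[1], x[0]))

toySmall = toySorted[0][0]  # node on the smallest face
toyBig = toySorted[-1][0]  # node on the biggest face

print(toySorted)  # [(20, 16), (25, 16), (30, 112), (5, 295), (10, 295)]
print(toySmall, toyBig)  # 20 10
```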

We can do better than this!

In [12]:
A.table(sortedCandidates)

n,p,line,sign
1,P006428 obverse:2:2,1a4(N14) 3(N01) [...] [...] 1b1b11b1A2(N14) 3(N01) BA 1b1B1b1B1AN 3(N57) 1b1B2EN~a PA~a ERIN 1b2[...] 2(N01) GI [...],X
2,P006428 obverse:3:1,1a1a11a1A1a1A1[...] 5(N01) [...] UDU~a 1a1A2[...] 7(N01) MASZ2 1a1B4(N14) 1(N01) DUR~b 1a21a2A[...] [...] 1a2B2(N14) 3(N01) UDU~a GI 1b|LAL2~axNIM~b2| [...],X
3,P006428 obverse:3:3,2a2a15(N01) SU~a PAP~a 2a2UNUG~a RAD~a 2b2b12b1A2b1A11(N01) SZUR2~a KU3~a E2~a 2b1A2[1(N01)] [...] 2b1BUR~a 2b23(N01) TUR BAR,SUHUR
4,P006428 obverse:3:7,3a5(N01) [...] SZA3~a1 TUR 3b3b13b1A[...] [SAL] 3b1B3b1B1[...] 3b23b2A5(N01) KUR~a 3b2B3b2B1X [...] 3b2B2X [...],X
5,P006428 obverse:5:1,1a[...] 5(N14) 6(N01) GAR 1b1b1[...] X [...] 1b21(N34) 4(N14) 8(N01) SZE~a GAR 1c1c11c1a[...] 1(N14) [...] 1c1b[...] 1(N34) [...] 1(N14) GUG2~a 1c21c2a4(N14) 4(N01) [...] TUR |U4.2(N08)| 1c2b1c2b11(N01) [...] 1c2b21(N14) 7(N01) TUR 1c2b32(N14) 6(N01) SUR,...
6,P448701,2a6(N01) EN~a 2b2b12b1A2(N01) NUN~a ZATU687 EN~a EN~a TUR 2b1B2b1B1(EN~a# PAP~a#)a 2b1B2(3(N57) GAN2)a 2b22b2A4(N01) EN~a X KI ZATU687 AN 2b2B2b2B1(EN~a |SZU2.E2~b|)a 2b2B2(BU~a SZU)a 2b2B3(SAL BU~a)a 2b2B4(EN~a HI KASZ~c)a,
7,P448701,1a1a11a1A4(N01) SZE~a 1a1B1a1B11(N01) UD5~a 1a1B23(N01) MASZ2 1a24(N34) 4(N14) 2(N01) DUB~a BA UDU~a 1b1b12(N34) 2(N14) 4(N01) DARA4~c2 1b21(N34) 5(N14) 6(N01) MASZ2 UD5~a 1b32(N14) 2(N01) SZE3 UDU~a 1c1(N34) 2(N14) 1(N01) |U4x1(N57)| BAR,
8,P448701 obverse:1:1,1a5(N01) SAL 1b1b11b1A4(N01) SAL 1b1B1b1B1(NAB DI |BU~a+DU6~a|)a 1b1B2(ZI~a#? AN)a 1b1B3(ANSZE~e 7(N57) DUR2 DU)a 1b1B4(LAL3~a#? GAR IG~b)a 1b21b2A1(N01) SZA3~a1 TUR 1b2B(TU~b)a,4(N41)
9,P448701 obverse:1:1,2a3(N01) KUR~a 2b2b12b1A1(N01) KUR~a 2b1B(NA~a NIR~a)a 2b22b2A2(N01) SZA3~a1 TUR 2b2B2b2B1(GI6 KISZIK~a# URI3~a)a 2b2B2([...])a,4(N41)
10,P448701 obverse:1:2,3a2(N34) 4(N14) [...] X [...] 3b3b13b1A[...] ZAG~a X SUHUR [...] 3b1B3b1B12(N34) 2(N14) 4(N01) SUHUR [...] 3b1B22(N14) 4(N01) SUHUR [...] 3b1B34(N01) SUHUR [...] 3b1B4[...] 3b21(N14) |HI.SUHUR| [...],X


We can also assemble relevant information for this table by hand
and put it in a markdown table.

In [13]:
markdown = """
case type | case number | tablet | face | size
------ | ---- | ---- | ---- | ----
""".strip()
markdown += "\n"

bigCase = sortedCandidates[-1][0]
smallCase = sortedCandidates[0][0]

for (case, size) in sortedCandidates:
    caseType = F.otype.v(case)
    caseNum = F.number.v(case)
    face = L.u(case, otype="face")[0]
    tablet = L.u(case, otype="tablet")[0]
    markdown += f"""
{caseType} | {caseNum} | {A.cdli(tablet, asString=True)} | {F.type.v(face)} | {size}
""".strip()
    markdown += "\n"

Markdown(markdown)

case type | case number | tablet | face | size
------ | ---- | ---- | ---- | ----
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P005294" title="to CDLI main page for this item">P005294</a> | obverse | 16
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P218054" title="to CDLI main page for this item">P218054</a> | reverse | 18
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006092" title="to CDLI main page for this item">P006092</a> | obverse | 23
line | 3 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P002694" title="to CDLI main page for this item">P002694</a> | reverse | 32
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P325754" title="to CDLI main page for this item">P325754</a> | reverse | 37
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006036" title="to CDLI main page for this item">P006036</a> | obverse | 38
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006295" title="to CDLI main page for this item">P006295</a> | reverse | 38
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P004735" title="to CDLI main page for this item">P004735</a> | obverse | 41
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P004735" title="to CDLI main page for this item">P004735</a> | obverse | 41
line | 3 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P002856" title="to CDLI main page for this item">P002856</a> | obverse | 45
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P411608" title="to CDLI main page for this item">P411608</a> | obverse | 48
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P005322" title="to CDLI main page for this item">P005322</a> | reverse | 48
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P325234" title="to CDLI main page for this item">P325234</a> | reverse | 52
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003531" title="to CDLI main page for this item">P003531</a> | obverse | 53
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006160" title="to CDLI main page for this item">P006160</a> | obverse | 59
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006307" title="to CDLI main page for this item">P006307</a> | reverse | 63
line | 3 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003529" title="to CDLI main page for this item">P003529</a> | obverse | 94
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P387752" title="to CDLI main page for this item">P387752</a> | obverse | 112
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P387752" title="to CDLI main page for this item">P387752</a> | obverse | 112
line | 3 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P006056" title="to CDLI main page for this item">P006056</a> | reverse | 112
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003808" title="to CDLI main page for this item">P003808</a> | obverse | 295
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003808" title="to CDLI main page for this item">P003808</a> | obverse | 295
line | 1 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003808" title="to CDLI main page for this item">P003808</a> | obverse | 295
line | 2 | <a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&amp;ObjectID=P003808" title="to CDLI main page for this item">P003808</a> | obverse | 295
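
The table above is built by string concatenation in a loop.
As an alternative sketch, the same kind of table can be assembled with `join`; the helper `toyMarkdownTable` is hypothetical and not part of the Uruk API.
The two sample rows repeat the first and last rows above, without the tablet link.

```python
def toyMarkdownTable(headers, rows):
    # Header line, divider line, then one pipe-separated line per row.
    lines = [
        " | ".join(headers),
        " | ".join("----" for _ in headers),
    ]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines) + "\n"


toyRows = [("line", 1, "obverse", 16), ("line", 2, "obverse", 295)]
toyTable = toyMarkdownTable(("case type", "case number", "face", "size"), toyRows)
print(toyTable)
```

In a notebook, the resulting string can be passed to `Markdown()` just like the `markdown` variable above.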


Not surprisingly, the deepest cases are all lines:
every case is enclosed by a line, which is at least one level deeper than that case.

You can click on the P-numbers to view these tablets on CDLI.

We finally show the source lines that contain these deep cases.

In [14]:
A.pretty(smallCase)
A.pretty(bigCase)

With a bit of coding we can get another display:

In [15]:
(smallPnum, smallColumn, smallCaseNum) = A.caseFromNode(smallCase)
(bigPnum, bigColumn, bigCaseNum) = A.caseFromNode(bigCase)

smallLineStr = "\n".join(A.getSource(smallCase))
bigLineStr = "\n".join(A.getSource(bigCase))

display(
    Markdown(
        f"""
**{smallPnum} {smallColumn} line {smallCaseNum}**

```
{smallLineStr}
```
"""
    )
)
A.lineart(smallPnum, width=200)

display(
    Markdown(
        f"""

---

**{bigPnum} {bigColumn} line {bigCaseNum}**

```
{bigLineStr}
```
"""
    )
)
A.photo(bigPnum, width=400)


**P005294 obverse:1 line 1**

```
@obverse 
@column 1 
1.a. 4(N14)# 3(N01) [...] , [...] 
1.b1A. 2(N14) 3(N01) , BA 
1.b1B1. , AN 3(N57) 
1.b1B2. , EN~a PA~a ERIN 
1.b2. [...] 2(N01)# , GI# [...] 
```




---

**P003808 obverse:6 line 2**

```
2.a. 1(N01) , KU6~a 
2.b1A. 6(N01) , |SILA3~axGARA2~a| 
2.b1B1. 1(N57) , EN~a# SAG# 
2.b1B2. 1(N57) , HI E2~a DILMUN NUN~a 
2.b1B3. 1(N57) , NAMESZDA 
2.b1B4. 1(N57) , GESZTU~a? DIM~a 
2.b1B5. 1(N57) , SZA SZU 
2.b1B6. 1(N57) , GI BAD 
2.b2. 1(N01) , |SILA3~axGA~a| |SIxSZE3| EN~a# NUN~a# 
2.b3. 5(N14) , BA SILA3~a KASZ~b 
```


### Quads

We just want to see how deep quads can get.

In [16]:
quadDepths = collections.defaultdict(list)

for quad in F.otype.s("quad"):
    quadDepths[depthStructure(quad, "quad", 1)].append(quad)

quadDepthsSorted = sorted(
    quadDepths.items(),
    key=lambda x: (-x[0], -len(x[1])),
)

for (depth, quads) in quadDepthsSorted:
    print(f"{len(quads):>5} quads with depth {depth}")

    1 quads with depth 3
  167 quads with depth 2
 3626 quads with depth 1


Lo and behold! There is just one quad of depth 3 and it is on our leading
example tablet.

We have studied it already in [quads](quads.ipynb).

In [17]:
bigQuad = quadDepths[3][0]
tablet = L.u(bigQuad, otype="tablet")[0]
A.lineart(bigQuad)
A.cdli(tablet)

### Clusters

Clusters are groups of consecutive quads between brackets.

Clusters can be nested.
As with quads, we find the members of a cluster by following `sub` edges.

#### Depths in clusters

We use familiar logic to get the hang of cluster depths.

In [18]:
clusterDepths = collections.defaultdict(list)

for cl in F.otype.s("cluster"):
    clusterDepths[depthStructure(cl, "cluster", 1)].append(cl)

clusterDepthsSorted = sorted(
    clusterDepths.items(),
    key=lambda x: (-x[0], -len(x[1])),
)

for (depth, cls) in clusterDepthsSorted:
    print(f"{len(cls):>5} clusters with depth {depth}")

  106 clusters with depth 2
32647 clusters with depth 1


Not much going on here.
Let's pick a nested cluster.

In [19]:
nestedCluster = clusterDepths[2][0]
tablet = L.u(nestedCluster, otype="tablet")[0]
quads = A.getOuterQuads(nestedCluster)
print(A.atfFromCluster(nestedCluster))
A.pretty(nestedCluster, withNodes=True)
A.lineart(quads[0], height=150)
A.cdli(tablet)

(IDIGNA [...] ...)a


#### Kinds of clusters

In our corpus we encounter several types of brackets:

* `( )a` for proper names
* `[ ]` for uncertainty
* `< >` for supplied material.

The next thing is to get an overview of the distribution of these kinds.

In [20]:
clusterTypeDistribution = collections.Counter()

for cluster in F.otype.s("cluster"):
    typ = F.type.v(cluster)
    clusterTypeDistribution[typ] += 1

for (typ, amount) in sorted(
    clusterTypeDistribution.items(),
    key=lambda x: (-x[1], x[0]),
):
    print(f"{amount:>5} x a {typ:>8}-cluster")

32116 x a uncertain-cluster
  636 x a properName-cluster
    1 x a supplied-cluster


The conversion to TF has transformed `[...]` to a cluster of one sign with grapheme `…`.
These are trivial clusters and we want to exclude them from further analysis, so we redo the counting.

First we make a sequence of all non-trivial clusters:

In [21]:
realClusters = [
    c
    for c in F.otype.s("cluster")
    if (
        F.type.v(c) != "uncertain"
        or len(E.oslots.s(c)) > 1
        or F.grapheme.v(E.oslots.s(c)[0]) != "…"
    )
]
len(realClusters)

3384
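
The filter keeps a cluster when at least one of three conditions holds, which is the negation of "uncertain, exactly one sign, and that sign is `…`".
A minimal sketch on invented clusters checks that the two formulations agree; `isTrivial` and `isReal` are hypothetical helpers, and we assume that every cluster occupies at least one sign.

```python
# Invented clusters: (type, graphemes of the signs the cluster occupies).
toyClusters = [
    ("uncertain", ["…"]),  # trivial: comes from a bare [...]
    ("uncertain", ["…", "GI"]),  # uncertain, but more than one sign
    ("uncertain", ["BA"]),  # uncertain, one sign, ordinary grapheme
    ("properName", ["…"]),  # not an uncertain cluster
]


def isTrivial(typ, graphemes):
    # Trivial means that all three conditions hold at once.
    return typ == "uncertain" and len(graphemes) == 1 and graphemes[0] == "…"


def isReal(typ, graphemes):
    # The condition of the filter above: at least one of the three fails.
    return typ != "uncertain" or len(graphemes) > 1 or graphemes[0] != "…"


for (typ, graphemes) in toyClusters:
    # By De Morgan's law, isReal is the negation of isTrivial
    # (given that no cluster is empty).
    assert isReal(typ, graphemes) == (not isTrivial(typ, graphemes))

toyReal = [isReal(typ, graphemes) for (typ, graphemes) in toyClusters]
print(toyReal)  # [False, True, True, True]
```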

Now we redo the same analysis, but we start with the filtered cluster sequence.

In [22]:
clusterTypeDistribution = collections.Counter()

for cluster in realClusters:
    typ = F.type.v(cluster)
    clusterTypeDistribution[typ] += 1

for (typ, amount) in sorted(
    clusterTypeDistribution.items(),
    key=lambda x: (-x[1], x[0]),
):
    print(f"{amount:>5} x a {typ:>8}-cluster")

 2747 x a uncertain-cluster
  636 x a properName-cluster
    1 x a supplied-cluster


#### Lengths of clusters

How long are clusters in general?
There are two possible ways to measure the length of a cluster:

* the number of signs it occupies;
* the number of top-level members it has (quads or signs).

By now, the pattern to answer questions like this is becoming familiar.

We express the logic in a function that takes the way of measuring
as a parameter.
In that way, we can easily provide a cluster-length distribution based
on measurements in signs and in quads.

In [23]:
def computeDistribution(nodes, measure):
    distribution = collections.Counter()

    for node in nodes:
        m = measure(node)
        distribution[m] += 1

    for (m, amount) in sorted(
        distribution.items(),
        key=lambda x: (-x[1], x[0]),
    ):
        print(f"{amount:>5} x a measure of {m:>8}")
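
`computeDistribution()` prints its results. If we want to work with the numbers afterwards, a variant that returns them may be handier.
The function `toyDistribution` below is a hypothetical sketch of such a variant, tried out on a few sign names with the built-in `len` as the measure.

```python
import collections


def toyDistribution(items, measure):
    # Count how often each measured value occurs.
    distribution = collections.Counter(measure(item) for item in items)
    # Most frequent value first; ties are broken by the value itself.
    return sorted(distribution.items(), key=lambda x: (-x[1], x[0]))


toyNames = ["GI", "BA", "EN", "TUR", "KUR", "SUHUR"]
print(toyDistribution(toyNames, len))  # [(2, 3), (3, 2), (5, 1)]
```

Because the measure is passed in as a function, the same code serves any node type and any way of measuring.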

In [24]:
def lengthInSigns(node):
    return len(L.d(node, otype="sign"))


def lengthInMembers(node):
    return len(E.sub.f(node))

Now we can show the length distributions of clusters by just calling `computeDistribution()`:

In [25]:
computeDistribution(realClusters, lengthInSigns)

 2691 x a measure of        1
  433 x a measure of        2
  205 x a measure of        3
   41 x a measure of        4
    9 x a measure of        5
    3 x a measure of        6
    2 x a measure of        7


In [26]:
computeDistribution(realClusters, lengthInMembers)

 2678 x a measure of        1
  452 x a measure of        2
  194 x a measure of        3
   44 x a measure of        4
   11 x a measure of        5
    4 x a measure of        6
    1 x a measure of        7
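
The two measures need not agree. One way in which they can diverge: a quad counts as a single member but occupies several signs.
The sketch below uses an invented cluster in a simplified representation, which is not how the corpus encodes clusters and may not cover every way in which the real numbers differ.

```python
# Invented cluster: each member is a sign (a string) or a quad (a tuple of signs).
toyCluster = ["EN~a", ("SILA3~a", "GARA2~a"), "NUN~a"]


def toyLengthInMembers(cluster):
    # Every top-level member counts once, whether it is a sign or a quad.
    return len(cluster)


def toyLengthInSigns(cluster):
    # A quad contributes all of its signs, a bare sign contributes one.
    return sum(len(m) if isinstance(m, tuple) else 1 for m in cluster)


print(toyLengthInMembers(toyCluster), toyLengthInSigns(toyCluster))  # 3 4
```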


Of course, we want to see the longest cluster.

In [27]:
longestCluster = [c for c in F.otype.s("cluster") if lengthInMembers(c) == 7][0]
A.pretty(longestCluster)

#### Lengths of quads

If you look closely at the code for these functions, there is nothing in it that
is specific to clusters.

The measures are in terms of signs, which are the slots of the corpus and are looked up
with the generic `L.d()` function, and the fairly generic
`sub` edges, which are also defined for quads.

So, in one go, we can obtain a length distribution of quads.

Note that quads can also be sub-quads.

In [28]:
computeDistribution(F.otype.s("quad"), lengthInSigns)

 3611 x a measure of        2
  175 x a measure of        3
    7 x a measure of        4
    1 x a measure of        5


In [29]:
computeDistribution(F.otype.s("quad"), lengthInMembers)

 3778 x a measure of        2
   16 x a measure of        3


In [30]:
longestQuad = [q for q in F.otype.s("quad") if lengthInSigns(q) == 5][0]
A.pretty(longestQuad)

# Next

[cases](cases.ipynb)

*In* case *you are serious ...*

Try the
[primers](http://nbviewer.jupyter.org/github/Nino-cunei/primers/tree/master/)
for introductions to digital cuneiform research.

All chapters:
[start](start.ipynb)
[imagery](imagery.ipynb)
[steps](steps.ipynb)
[search](search.ipynb)
[calc](calc.ipynb)
[signs](signs.ipynb)
[quads](quads.ipynb)
**jumps**
[cases](cases.ipynb)

---

CC-BY Dirk Roorda