{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "# Cases\n", "\n", "Cases are the building blocks on the faces of tablets.\n", "\n", "What about the distribution of signs in deeply nested cases versus outer cases?\n", "We show here how you can begin to investigate that." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-11T10:16:51.900758Z", "start_time": "2018-05-11T10:16:51.835847Z" } }, "outputs": [], "source": [ "import collections\n", "from IPython.display import display, Markdown\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-05-11T10:16:54.454775Z", "start_time": "2018-05-11T10:16:53.137842Z" } }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/Nino-cunei/uruk/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/Nino-cunei/uruk/tf/1.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 11.3.0, Nino-cunei/uruk/app v3, Search Reference
\n", " Data: Nino-cunei - uruk 1.0, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
tablet636422.01100
face945614.1095
column140239.3493
line358423.6192
case96513.4624
cluster327531.0324
quad37942.056
comment110901.008
sign1400941.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Uruk IV/III: Proto-cuneiform tablets \n", "
\n", "\n", "
\n", "
\n", "catalogId\n", "
\n", "
str
\n", "\n", " identifier of tablet in catalog (http://www.flutopedia.com/tablets.htm)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "damage\n", "
\n", "
int
\n", "\n", " indicates damage of signs or quads,corresponds to #-flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "depth\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "excavation\n", "
\n", "
str
\n", "\n", " excavation number of tablet\n", "\n", "
\n", "\n", "
\n", "
\n", "fragment\n", "
\n", "
str
\n", "\n", " level between tablet and face\n", "\n", "
\n", "\n", "
\n", "
\n", "fullNumber\n", "
\n", "
str
\n", "\n", " the combination of face type and column number on columns\n", "\n", "
\n", "\n", "
\n", "
\n", "grapheme\n", "
\n", "
str
\n", "\n", " name of a grapheme (glyph)\n", "\n", "
\n", "\n", "
\n", "
\n", "identifier\n", "
\n", "
str
\n", "\n", " additional information pertaining to the name of a face\n", "\n", "
\n", "\n", "
\n", "
\n", "modifier\n", "
\n", "
str
\n", "\n", " indicates modifcation of a sign; corresponds to sign@letter in transcription. if the grapheme is a repeat, the modification applies to the whole repeat.\n", "\n", "
\n", "\n", "
\n", "
\n", "modifierFirst\n", "
\n", "
str
\n", "\n", " indicates the order between modifiers and variants on the same object; if 1, modifiers come before variants\n", "\n", "
\n", "\n", "
\n", "
\n", "modifierInner\n", "
\n", "
str
\n", "\n", " indicates modifcation of a sign within a repeatcorresponds to sign@letter in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "name\n", "
\n", "
str
\n", "\n", " name of tablet\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " number of a column or line or case\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "period\n", "
\n", "
str
\n", "\n", " period that characterises the tablet corpus\n", "\n", "
\n", "\n", "
\n", "
\n", "prime\n", "
\n", "
int
\n", "\n", " indicates the presence/multiplicity of a prime (single quote)\n", "\n", "
\n", "\n", "
\n", "
\n", "remarkable\n", "
\n", "
int
\n", "\n", " corresponds to ! flag in transcription \n", "\n", "
\n", "\n", "
\n", "
\n", "repeat\n", "
\n", "
int
\n", "\n", " number indicating the number of repeats of a grapheme,especially in numerals; -1 comes from repeat N in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "srcLn\n", "
\n", "
str
\n", "\n", " transcribed line\n", "\n", "
\n", "\n", "
\n", "
\n", "srcLnNum\n", "
\n", "
int
\n", "\n", " line number in transcription file\n", "\n", "
\n", "\n", "
\n", "
\n", "terminal\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " text of comment nodes\n", "\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "\n", " type of a face; type of a comment; type of a cluster;type of a sign\n", "\n", "
\n", "\n", "
\n", "
\n", "uncertain\n", "
\n", "
int
\n", "\n", " corresponds to ?-flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " allograph for a sign, corresponds to ~x in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "variantOuter\n", "
\n", "
str
\n", "\n", " allograph for a quad, corresponds to ~x in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "written\n", "
\n", "
str
\n", "\n", " corresponds to !(xxx) flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "comments\n", "
\n", "
none
\n", "\n", " links comment nodes to their targets\n", "\n", "
\n", "\n", "
\n", "
\n", "op\n", "
\n", "
str
\n", "\n", " operator connecting left to right operand in a quad\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "sub\n", "
\n", "
none
\n", "\n", " connects line or case with sub-cases, quad with sub-quads; clusters with sub-clusters\n", "\n", "
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/Nino-cunei/uruk/sources/cdli/images" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 2095 ideograph linearts
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 2724 tablet linearts
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 5495 tablet photos
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"Nino-cunei/uruk\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `search`\n", "\n", "You might want to read the\n", "[docs](https://annotation.github.io/text-fabric/tf/about/searchusage.html)\n", "or the tutorial chapter on\n", "[search](search.ipynb)\n", "first.\n", "\n", "Here is a quick recap.\n", "\n", "### Explanation\n", "\n", "The search template is basically\n", "\n", "```\n", "line\n", "  case\n", "    case\n", "      sign\n", "```\n", "\n", "This bare template looks for a sign within a case within a case within a line.\n", "Indentation acts as shorthand for embedding.\n", "\n", "But this is not enough, because embedding is not restricted to direct children:\n", "a sub-subcase of a case is also embedded in that case, so the bare template also matches deeper nestings.\n", "We look for a situation where the first case is *directly* embedded in the line,\n", "and the second case is *directly* embedded in the first case.\n", "\n", "In our data we have an *edge* (relationship), called `sub`, that connects lines/cases with\n", "the cases that are directly embedded in them.\n", "\n", "So\n", "\n", "```\n", "c0 -sub> c1\n", "```\n", "\n", "means that `c0` is `sub`-related to `c1`, i.e. `c1` is directly embedded in `c0`.\n", "\n", "Hence, if we link the line to the first case and the first case to the second case with `-sub>`,\n", "every result contains a sign that occurs in a subcase of a case of a line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Cunei API provides a function, `casesByLevel`, to collect (sub)cases at a given level of nesting.\n", "\n", "We show how to use it, and for each task we show **how you can get things done more easily with search**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Level 0\n", "\n", "If we call `casesByLevel(0, terminal=False)`, we get all lines.\n", "\n", "If we call `casesByLevel(0)`, we get precisely the undivided lines." 
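, "\n", "\n",
"To make the semantics of `casesByLevel` concrete, here is a minimal pure-Python sketch on a toy hierarchy loosely modelled on tablet `P471695`.\n",
"It assumes that level 0 means the lines themselves, that each further level follows one `sub` edge, and that `terminal=True` keeps only nodes without subcases.\n",
"The dict `sub`, the list `lines` and the function `cases_by_level` are hypothetical illustrations, not part of the Cunei API.\n",
"\n",
"```python\n",
"# Hypothetical toy data: each line or case maps to the cases directly embedded in it,\n",
"# playing the role of the `sub` edge (an assumption, not real corpus data).\n",
"sub = {\n",
"    'line1': ['1a', '1b'],\n",
"    '1b': ['1b1', '1b2', '1b3'],\n",
"    'line2': ['2a', '2b'],\n",
"    '2b': ['2b1', '2b2'],\n",
"}\n",
"lines = ['line1', 'line2', 'line3']  # 'line3' is an undivided line\n",
"\n",
"\n",
"def cases_by_level(level, terminal=True):\n",
"    # Level 0 is the lines themselves; every further level follows one `sub` hop.\n",
"    current = list(lines)\n",
"    for _ in range(level):\n",
"        current = [child for node in current for child in sub.get(node, [])]\n",
"    if terminal:\n",
"        # Keep only the nodes that are not subdivided any further.\n",
"        current = [node for node in current if not sub.get(node)]\n",
"    return current\n",
"\n",
"\n",
"print(cases_by_level(0, terminal=False))  # ['line1', 'line2', 'line3']\n",
"print(cases_by_level(0))  # ['line3']\n",
"print(cases_by_level(1, terminal=False))  # ['1a', '1b', '2a', '2b']\n",
"print(cases_by_level(1))  # ['1a', '2a']\n",
"```"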
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:02.543628Z", "start_time": "2018-05-09T17:50:02.448403Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test0Cases: 35842\n", "allLines : 35842\n", "test0Cases equal to allLines: True\n", "types of test0Cases: {'line'}\n", "test0CasesT: 32732\n", "Divided lines: 3110\n" ] } ], "source": [ "test0Cases = set(A.casesByLevel(0, terminal=False))\n", "allLines = set(F.otype.s(\"line\"))\n", "types0 = {F.otype.v(n) for n in test0Cases}\n", "print(f\"test0Cases: {len(test0Cases):>5}\")\n", "print(f\"allLines : {len(allLines):>5}\")\n", "print(f\"test0Cases equal to allLines: {test0Cases == allLines}\")\n", "print(f\"types of test0Cases: {types0}\")\n", "\n", "test0CasesT = set(A.casesByLevel(0))\n", "print(f\"test0CasesT: {len(test0CasesT):>5}\")\n", "print(f\"Divided lines: {len(test0Cases) - len(test0CasesT):>5}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us compare this with doing the same by means of search.\n", "\n", "* All lines" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:05.087375Z", "start_time": "2018-05-09T17:50:05.040872Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.02s 35842 results\n" ] } ], "source": [ "query = \"\"\"\n", "line\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Undivided lines" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:06.483120Z", "start_time": "2018-05-09T17:50:06.388710Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.03s 32732 results\n" ] } ], "source": [ "query = \"\"\"\n", "line terminal\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Divided lines" ] }, { 
"cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:11.046637Z", "start_time": "2018-05-09T17:50:10.967173Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 3110 results\n" ] } ], "source": [ "query = \"\"\"\n", "line terminal#\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Level 1\n", "\n", "If we do `casesByLevel(1, terminal=False)` we get all cases (not lines) that are the first subdivision of a line.\n", "\n", "If we do `casesByLevel(1)`, we get a subset of these cases, namely the ones that are not themselves subdivided." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:14.536431Z", "start_time": "2018-05-09T17:50:14.489811Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test1Cases: 6559\n", "types of test1Cases: {'case'}\n", "test1CasesT: 5468\n", "Divided cases: 1091\n" ] } ], "source": [ "test1Cases = set(A.casesByLevel(1, terminal=False))\n", "types1 = {F.otype.v(n) for n in test1Cases}\n", "\n", "print(f\"test1Cases: {len(test1Cases):>5}\")\n", "print(f\"types of test1Cases: {types1}\")\n", "\n", "test1CasesT = set(A.casesByLevel(1))\n", "print(f\"test1CasesT: {len(test1CasesT):>5}\")\n", "print(f\"Divided cases: {len(test1Cases) - len(test1CasesT):>5}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, by query:\n", "\n", "* Top-level cases" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:18.634620Z", "start_time": "2018-05-09T17:50:18.592599Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 6559 results\n" ] } ], "source": [ "query = \"\"\"\n", "case depth=1\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Undivided top-level cases" ] }, { "cell_type": 
"code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:20.516445Z", "start_time": "2018-05-09T17:50:20.471585Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 5468 results\n" ] } ], "source": [ "query = \"\"\"\n", "case depth=1 terminal\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Divided top-level cases" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:22.744904Z", "start_time": "2018-05-09T17:50:22.694624Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 1091 results\n" ] } ], "source": [ "query = \"\"\"\n", "case depth=1 terminal#\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example tablet\n", "\n", "Here we use an example tablet to show the difference between `terminal=False` and\n", "`terminal=True` when calling `A.casesByLevel`.\n", "\n", "Our example tablet is `P471695`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:25.101466Z", "start_time": "2018-05-09T17:50:25.082362Z" } }, "outputs": [ { "data": { "text/html": [ "
tablet P471695
Anonymous 471695uruk-iii
comment
atf: lang qpc
face obverse
column 1
line 1
case 1a
depth=1terminal=1
3(N01)
APIN~a
3(N57)
UR4~a
case 1b
depth=1
case 1b1
depth=2terminal=1
cluster =
EN~a
DU
ZATU759
cluster =
case 1b2
depth=2terminal=1
cluster =
BAN~b
KASZ~c
cluster =
case 1b3
depth=2terminal=1
cluster =
KI@n
SAG
cluster =
line 2
case 2a
depth=1terminal=1
1(N14)
2(N01)
cluster ?
...
cluster ?
case 2b
depth=1
case 2b1
depth=2terminal=1
cluster =
3(N57)
PAP~a
cluster =
case 2b2
depth=2terminal=1
comment
n lines broken
cluster =
SZU
KI
X
cluster =
case 2b3'
depth=2terminal=1
cluster =
EN~a
AN
EZINU~d
cluster =
case 2b4'
depth=2terminal=1
comment
rest broken
comment
(for a total of 12 sub-cases with PNN)
cluster =
IDIGNA
cluster ?
...
cluster ?
cluster =
column 2
line 1
case 1a
depth=1terminal=1
1(N01)
ISZ~a#?
case 1b
depth=1
case 1b1
depth=2terminal=1
comment
blank space
comment
rest broken
cluster =
PAP~a
GIR3~c
cluster =
face reverse
comment
beginning broken
column 0
line 1
terminal=1
cluster ?
1(N14)
cluster ?
6(N01)#
cluster ?
...
cluster ?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "examplePnum = \"P471695\"\n", "exampleTablet = T.nodeFromSection((examplePnum,))\n", "A.getSource(exampleTablet)\n", "A.pretty(exampleTablet)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above we have selected all cases of level 1 from the whole corpus, and constructed two sets:\n", "\n", "* `test1Cases`: all cases of level 1;\n", "* `test1CasesT`: the terminal cases of level 1.\n", "\n", "Now we intersect both sets with the lines and cases of the example tablet." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:28.564887Z", "start_time": "2018-05-09T17:50:28.557253Z" } }, "outputs": [], "source": [ "exampleCases = set(L.d(exampleTablet, otype=\"case\")) | set(\n", "    L.d(exampleTablet, otype=\"line\")\n", ")\n", "example2 = test1Cases & exampleCases\n", "example2T = test1CasesT & exampleCases" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:29.733836Z", "start_time": "2018-05-09T17:50:29.723812Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.a. 3(N01) , APIN~a 3(N57) UR4~a \n", "------------------------------------------------\n", "1.b1. , (EN~a DU ZATU759)a \n", "1.b2. , (BAN~b KASZ~c)a \n", "1.b3. , (KI@n SAG)a \n", "------------------------------------------------\n", "2.a. 1(N14) 2(N01) , [...] \n", "------------------------------------------------\n", "2.b1. , (3(N57) PAP~a)a \n", "2.b2. , (SZU KI X)a \n", "$ n lines broken \n", "2.b3'. , (EN~a AN EZINU~d)a \n", "2.b4'. , (IDIGNA [...])a \n", "$ rest broken \n", "$ (for a total of 12 sub-cases with PNN) \n", "------------------------------------------------\n", "1.a. 1(N01) , ISZ~a#? \n", "------------------------------------------------\n", "1.b1. 
, (PAP~a GIR3~c)a \n", "$ blank space \n", "$ rest broken \n" ] } ], "source": [ "print(f'\\n{\"-\" * 48}\\n'.join(\"\\n\".join(A.getSource(c)) for c in sorted(example2)))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:31.335542Z", "start_time": "2018-05-09T17:50:31.328295Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.a. 3(N01) , APIN~a 3(N57) UR4~a \n", "------------------------------------------------\n", "2.a. 1(N14) 2(N01) , [...] \n", "------------------------------------------------\n", "1.a. 1(N01) , ISZ~a#? \n" ] } ], "source": [ "print(f'\\n{\"-\" * 48}\\n'.join(\"\\n\".join(A.getSource(c)) for c in sorted(example2T)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also show it with `plain()`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:33.171554Z", "start_time": "2018-05-09T17:50:33.146893Z" } }, "outputs": [ { "data": { "text/html": [ "
P471695 obverse:1:1  1a3(N01) APIN~a 3(N57) UR4~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:1:1  1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:1:2  2a1(N14) 2(N01) [...]
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:1:2  2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:2:1  1a1(N01) ISZ~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:2:1  1b (PAP~a GIR3~c)a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for c in sorted(example2):\n", " A.plain(c)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:35.495622Z", "start_time": "2018-05-09T17:50:35.479279Z" } }, "outputs": [ { "data": { "text/html": [ "
P471695 obverse:1:1  1a3(N01) APIN~a 3(N57) UR4~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:1:2  2a1(N14) 2(N01) [...]
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
P471695 obverse:2:1  1a1(N01) ISZ~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for c in sorted(example2T):\n", " A.plain(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also show it with `pretty()`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:37.451755Z", "start_time": "2018-05-09T17:50:37.427299Z" } }, "outputs": [ { "data": { "text/html": [ "
case 1a
depth=1terminal=1
3(N01)
APIN~a
3(N57)
UR4~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 1b
depth=1
case 1b1
depth=2terminal=1
cluster =
EN~a
DU
ZATU759
cluster =
case 1b2
depth=2terminal=1
cluster =
BAN~b
KASZ~c
cluster =
case 1b3
depth=2terminal=1
cluster =
KI@n
SAG
cluster =
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 2a
depth=1terminal=1
1(N14)
2(N01)
cluster ?
...
cluster ?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 2b
depth=1
case 2b1
depth=2terminal=1
cluster =
3(N57)
PAP~a
cluster =
case 2b2
depth=2terminal=1
comment
n lines broken
cluster =
SZU
KI
X
cluster =
case 2b3'
depth=2terminal=1
cluster =
EN~a
AN
EZINU~d
cluster =
case 2b4'
depth=2terminal=1
comment
rest broken
comment
(for a total of 12 sub-cases with PNN)
cluster =
IDIGNA
cluster ?
...
cluster ?
cluster =
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 1a
depth=1terminal=1
1(N01)
ISZ~a#?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 1b
depth=1
comment
blank space
comment
rest broken
cluster =
PAP~a
GIR3~c
cluster =
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for c in sorted(example2):\n", " A.pretty(c, showGraphics=False)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:41.067990Z", "start_time": "2018-05-09T17:50:41.050502Z" } }, "outputs": [ { "data": { "text/html": [ "
case 1a
depth=1terminal=1
3(N01)
APIN~a
3(N57)
UR4~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 2a
depth=1terminal=1
1(N14)
2(N01)
cluster ?
...
cluster ?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
case 1a
depth=1terminal=1
1(N01)
ISZ~a#?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for c in sorted(example2T):\n", "    A.pretty(c, showGraphics=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about case `1.b`?\n", "It is a case at level 1.\n", "Why is it not in `example2T`?\n", "\n", "Because it is not a terminal case: it has subcases.\n", "That is why `1.b` is left out.\n", "With the default `terminal=True`, only cases without subcases end up in the result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Level 2\n", "\n", "What if we want all signs that occur in a subcase, i.e. a case at level 2?\n", "\n", "We can call `casesByLevel(2, terminal=False)`, iterate through the resulting cases, and\n", "collect all signs per case.\n", "However, we may encounter signs multiple times,\n", "because a sign in a subcase is also in its containing case and in its containing line.\n", "We can solve this by collecting the signs in a set.\n", "Then we lose the corpus order of the signs, but we can easily sort the set back into a list.\n", "\n", "There is an alternative method: a search template.\n", "Search delivers unordered results, so we will reorder the search results as well.\n", "\n", "Text-Fabric has an API function for sorting nodes into corpus order: `N.sortNodes`.\n", "\n", "Let us try out both methods and compare the outcomes." 
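, "\n", "\n",
"The set-then-sort idea can be sketched in plain Python on made-up data.\n",
"Signs are represented by hypothetical slot numbers, and the dict `signsIn` is invented for illustration.\n",
"The sketch relies on Text-Fabric numbering its slot nodes (here: signs) in corpus order, so for signs alone plain `sorted()` restores corpus order;\n",
"`N.sortNodes` is the general tool, which also handles non-slot nodes.\n",
"\n",
"```python\n",
"# Hypothetical toy data: the signs (as slot numbers) contained in some cases.\n",
"# Case '1b' contains its subcases '1b1' and '1b2', so their signs are listed twice.\n",
"signsIn = {\n",
"    '1b': [5, 6, 7, 8, 9],\n",
"    '1b1': [5, 6, 7],\n",
"    '1b2': [8, 9],\n",
"    '2b1': [14, 15],\n",
"}\n",
"\n",
"seen = set()\n",
"for case in ['2b1', '1b', '1b1', '1b2']:  # the visiting order need not be corpus order\n",
"    seen |= set(signsIn[case])  # the set absorbs duplicate occurrences\n",
"\n",
"# Slot numbers follow corpus order, so sorting them restores that order.\n",
"ordered = sorted(seen)\n",
"print(ordered)  # [5, 6, 7, 8, 9, 14, 15]\n",
"```"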
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `casesByLevel`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:45.090757Z", "start_time": "2018-05-09T17:50:45.052254Z" } }, "outputs": [ { "data": { "text/plain": [ "7738" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cases = A.casesByLevel(2, terminal=False)\n", "signSet = set()\n", "for case in cases:\n", "    signSet |= set(L.d(case, otype=\"sign\"))\n", "signsA = N.sortNodes(signSet)\n", "len(signsA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or, by query:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:47.418888Z", "start_time": "2018-05-09T17:50:47.152637Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.05s 7738 results\n" ] } ], "source": [ "query = \"\"\"\n", "case depth=2\n", "  sign\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or by a query that does not use the `depth` feature:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:49.778344Z", "start_time": "2018-05-09T17:50:49.575908Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.06s 7738 results\n" ] } ], "source": [ "query = \"\"\"\n", "line\n", "  -sub> case\n", "    -sub> case\n", "      sign\n", "\"\"\"\n", "results = A.search(query)\n", "signsB = N.sortNodes(r[3] for r in results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A bit about results.\n", "The query mentions four quantities: `line`, `case`, `case`, `sign`.\n", "Every result of the query is an instantiation of those four quantities, hence a tuple of nodes:\n", "\n", "```\n", "(resultLine, resultCase1, resultCase2, resultSign)\n", "```\n", "\n", "See the table view:" ] }, { "cell_type": "code", 
"execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:51.742028Z", "start_time": "2018-05-09T17:50:51.728828Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nplinecasecasesign
1P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1(EN~a DU ZATU759)a EN~a
2P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1(EN~a DU ZATU759)a DU
3P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1(EN~a DU ZATU759)a ZATU759
4P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b2(BAN~b KASZ~c)a BAN~b
5P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b2(BAN~b KASZ~c)a KASZ~c
6P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b3(KI@n SAG)a KI@n
7P471695 obverse:1:11a3(N01) APIN~a 3(N57) UR4~a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b1b1(EN~a DU ZATU759)a 1b2(BAN~b KASZ~c)a 1b3(KI@n SAG)a 1b3(KI@n SAG)a SAG
8P471695 obverse:1:22a1(N14) 2(N01) [...] 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b1(3(N57) PAP~a)a 3(N57)
9P471695 obverse:1:22a1(N14) 2(N01) [...] 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b1(3(N57) PAP~a)a PAP~a
10P471695 obverse:1:22a1(N14) 2(N01) [...] 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b2b1(3(N57) PAP~a)a 2b2 (SZU KI X)a 2b3'(EN~a AN EZINU~d)a 2b4' (IDIGNA [...])a 2b2 (SZU KI X)a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our purposes we are only interested in the `resultSign` part, so we select it with\n", "`r[3]` when we walk through all results `r`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check\n", "\n", "Both methods yield the same number of results, but are they exactly the same results?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:54.751424Z", "start_time": "2018-05-09T17:50:54.742379Z" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "signsA == signsB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Twist\n", "\n", "Now we want to restrict ourselves to non-numerical signs.\n", "If you look at the feature docs (see the link at the start of the notebook),\n", "and read about the `type` feature for signs, you see that it can have the values\n", "`empty`, `unknown`, `numeral`, and `ideograph`." 
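, "\n", "\n",
"Conceptually, a frequency list of `type` values counts how often each value occurs and sorts the values by descending count;\n",
"restricting it to one node type means counting only the nodes of that type.\n",
"Here is a minimal sketch with `collections.Counter` on made-up data, where the dicts `nodeType` and `typeOf` are hypothetical stand-ins for the features `otype` and `type`:\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Hypothetical toy data standing in for the features `otype` and `type`.\n",
"nodeType = {1: 'sign', 2: 'sign', 3: 'sign', 4: 'sign', 5: 'face', 6: 'sign'}\n",
"typeOf = {1: 'numeral', 2: 'ideograph', 3: 'ideograph', 4: 'ellipsis', 5: 'obverse', 6: 'ideograph'}\n",
"\n",
"# Count `type` values of sign nodes only; most_common() sorts by descending count.\n",
"freq = Counter(t for n, t in typeOf.items() if nodeType[n] == 'sign')\n",
"print(freq.most_common())  # [('ideograph', 3), ('numeral', 1), ('ellipsis', 1)]\n",
"```"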
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:57.313253Z", "start_time": "2018-05-09T17:50:57.161093Z" } }, "outputs": [ { "data": { "text/plain": [ "(('ideograph', 53249),\n", " ('numeral', 38122),\n", " ('uncertain', 32116),\n", " ('ellipsis', 29413),\n", " ('empty', 12440),\n", " ('unknown', 6870),\n", " ('meta', 6630),\n", " ('obverse', 5959),\n", " ('ruling', 4057),\n", " ('reverse', 3044),\n", " ('properName', 636),\n", " ('surface', 408),\n", " ('object', 403),\n", " ('seal', 21),\n", " ('bottom', 17),\n", " ('top', 3),\n", " ('left', 2),\n", " ('noface', 2),\n", " ('supplied', 1))" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ah, the feature `type` is also used for things other than signs.\n", "We just want a frequency list of `type` values for signs:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:50:59.706021Z", "start_time": "2018-05-09T17:50:59.485736Z" } }, "outputs": [ { "data": { "text/plain": [ "(('ideograph', 53249),\n", " ('numeral', 38122),\n", " ('ellipsis', 29413),\n", " ('empty', 12440),\n", " ('unknown', 6870))" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList({\"sign\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So besides the four values mentioned above, signs can also have the type `ellipsis`.\n", "\n", "We just want the ideographs.\n", "\n", "We'll adapt both methods to get only those, ignoring the numerals and the less well-defined graphemes.\n", "\n", "Of course, we could just filter the result list that we have already got,\n", "but this is a tutorial, and it may come in handy to have a well-stocked repertoire\n", "of direct ways to drill into your data." 
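, "\n", "\n",
"In the `casesByLevel` method, the extra condition amounts to filtering while collecting.\n",
"A pure-Python sketch on made-up data, where `typeOf` stands in for the feature `type` and `signsIn` for `L.d(case, otype='sign')` per level-2 case (both hypothetical):\n",
"\n",
"```python\n",
"# Hypothetical toy data: sign types and the signs of two level-2 cases.\n",
"typeOf = {1: 'numeral', 2: 'ideograph', 3: 'ideograph', 4: 'ellipsis', 6: 'ideograph'}\n",
"signsIn = {'1b1': [1, 2, 3], '1b2': [4, 6]}\n",
"\n",
"found = set()\n",
"for case, signs in signsIn.items():\n",
"    # Keep a sign only if its `type` value is 'ideograph'.\n",
"    found |= {s for s in signs if typeOf[s] == 'ideograph'}\n",
"print(sorted(found))  # [2, 3, 6]\n",
"```"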
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `casesByLevel`" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:02.890023Z", "start_time": "2018-05-09T17:51:02.843207Z" } }, "outputs": [ { "data": { "text/plain": [ "3813" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cases = A.casesByLevel(2, terminal=False)\n", "signSet = set()\n", "for case in cases:\n", " signSet |= set(s for s in L.d(case, otype=\"sign\") if F.type.v(s) == \"ideograph\")\n", "signsA = N.sortNodes(signSet)\n", "len(signsA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `search`\n", "\n", "Note that it is very easy to add the desired condition to the template.\n", "\n", "This method is much easier to adapt than the first method!" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:07.242458Z", "start_time": "2018-05-09T17:51:06.900769Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.07s 3813 results\n" ] } ], "source": [ "query = \"\"\"\n", "case depth=2\n", " sign type=ideograph\n", "\"\"\"\n", "results = A.search(query)\n", "signsB = N.sortNodes(r[1] for r in results)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:08.743091Z", "start_time": "2018-05-09T17:51:08.736191Z" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "signsA == signsB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supercase versus subcase\n", "\n", "We finish of with a comparison of the frequencies of signs that occur on lines and level-1 cases, and the frequencies of signs that occur on level-2 and deeper cases.\n", "\n", "From both groups we pick the top-20.\n", "We make a nice markdown table showing the 
frequencies of those top-20 signs in both groups.\n", "\n", "We do this for non-numeric ideographs only.\n", "\n", "Note that we have already collected the group of signs in level-2 cases and deeper: `signsB`.\n", "\n", "We give this sequence another name: `subSigns`." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:11.031878Z", "start_time": "2018-05-09T17:51:11.023641Z" } }, "outputs": [ { "data": { "text/plain": [ "3813" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "subSigns = signsB\n", "len(subSigns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to collect the group of signs in lines and level-1 cases.\n", "So we have to exclude lines and cases that are further subdivided.\n", "\n", "For that, we use the feature `terminal`, which exists and is equal to `1` for undivided\n", "cases and lines, and which does not exist for divided cases and lines.\n", "\n", "We get this group with two queries: one for the signs in undivided lines, and one for the signs in undivided level-1 cases."
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:14.123200Z", "start_time": "2018-05-09T17:51:13.489034Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.13s 41788 results\n" ] } ], "source": [ "query0 = \"\"\"\n", "line terminal=1\n", "  sign type=ideograph\n", "\"\"\"\n", "signs0 = [r[1] for r in A.search(query0)]" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:25.872050Z", "start_time": "2018-05-09T17:51:25.452155Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.09s 7648 results\n" ] } ], "source": [ "query1 = \"\"\"\n", "line\n", "  -sub> case terminal=1\n", "    sign type=ideograph\n", "\"\"\"\n", "signs1 = [r[2] for r in A.search(query1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us collect both results into `superSigns`.\n", "Note that `signs0` and `signs1` have no occurrences in common:\n", "a sign in `signs1` is part of a case, so the line that contains that case is divided;\n", "such a line has no value for the `terminal` feature, so it is not in the results of `query0`." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:30.106012Z", "start_time": "2018-05-09T17:51:30.100850Z" } }, "outputs": [], "source": [ "superSigns = signs0 + signs1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also note that `superSigns` and `subSigns` have nothing in common, for a similar reason: a sign in `subSigns` lies in a level-2 case, so both the level-1 case and the line that contain it are divided, and neither of them has a value for `terminal`.\n", "\n", "That said, reasoning is one thing, and verifying assertions against the data is another.\n", "Let us just check!"
] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:34.127191Z", "start_time": "2018-05-09T17:51:34.111018Z" } }, "outputs": [ { "data": { "text/plain": [ "set()" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(signs0) & set(signs1)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:35.225601Z", "start_time": "2018-05-09T17:51:35.212911Z" } }, "outputs": [ { "data": { "text/plain": [ "set()" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(subSigns) & set(superSigns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last but not least, we want to compare the frequencies of the super and sub groups with the\n", "overall frequencies.\n", "Every ideograph on a line should end up in exactly one of the super and sub groups, so we expect 41788 + 7648 + 3813 = 53249 results." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:41.822173Z", "start_time": "2018-05-09T17:51:41.301271Z" }, "lines_to_next_cell": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.12s 53249 results\n" ] } ], "source": [ "queryA = \"\"\"\n", "line\n", "  sign type=ideograph\n", "\"\"\"\n", "allSigns = [r[1] for r in A.search(queryA)]" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "### Frequency and rank\n", "\n", "We are going to make a frequency distribution for each group: the super group, the sub group, and all ideographs together.\n", "We do not want to repeat ourselves\n", "([DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)),\n", "so we write a function that, given a list of items,\n", "produces a frequency list.\n", "\n", "While we're at it, we also produce a ranking list: the most frequent item has rank 1,\n", "the second most frequent item has rank 2, and so on.\n", "\n", "When we compute the frequencies, we count the number of times a sign, identified by
its\n", "ATF transcription (without flags), occurs." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:51.655351Z", "start_time": "2018-05-09T17:51:51.645870Z" } }, "outputs": [], "source": [ "def getFreqs(items):\n", "    # count the occurrences of each sign, keyed by its ATF transcription\n", "    freqs = collections.Counter()\n", "    for item in items:\n", "        freqs[A.atfFromSign(item)] += 1\n", "    # rank the transcriptions: the most frequent one gets rank 1\n", "    # (ties keep the order in which the signs were first encountered)\n", "    ranks = {}\n", "    for item in sorted(freqs, key=lambda i: -freqs[i]):\n", "        ranks[item] = len(ranks) + 1\n", "    return (freqs, ranks)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:53.020313Z", "start_time": "2018-05-09T17:51:52.435073Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "(allFreqs, allRanks) = getFreqs(allSigns)\n", "(superFreqs, superRanks) = getFreqs(superSigns)\n", "(subFreqs, subRanks) = getFreqs(subSigns)" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "Now we want the top scorers in the super and sub teams.\n", "We make the size of the top customisable: top-20, top-100, or whatever you like." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:54.201650Z", "start_time": "2018-05-09T17:51:54.195913Z" } }, "outputs": [], "source": [ "def getTop(ranks, amount):\n", "    # the `amount` items with the best (lowest) rank\n", "    return sorted(ranks, key=lambda i: ranks[i])[0:amount]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:55.220801Z", "start_time": "2018-05-09T17:51:55.214275Z" } }, "outputs": [], "source": [ "AMOUNT = 20\n", "superTop = getTop(superRanks, AMOUNT)\n", "subTop = getTop(subRanks, AMOUNT)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We combine the two tops without duplication ..."
] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:57.541506Z", "start_time": "2018-05-09T17:51:57.535190Z" } }, "outputs": [], "source": [ "combiTopSet = set(superTop) | set(subTop)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... and sort them by overall rank:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:51:58.623765Z", "start_time": "2018-05-09T17:51:58.618624Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "combiTop = sorted(combiTopSet, key=lambda i: allRanks[i])" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "Now that we have our top signs, let us show them.\n", "We display them in rows of four." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:52:00.641095Z", "start_time": "2018-05-09T17:52:00.634478Z" } }, "outputs": [], "source": [ "def chunk(items, chunkSize):\n", "    # split items into consecutive lists of at most chunkSize items\n", "    chunks = [[]]\n", "    j = 0\n", "    for item in items:\n", "        if j == chunkSize:\n", "            chunks.append([])\n", "            j = 0\n", "        chunks[-1].append(item)\n", "        j += 1\n", "    return chunks" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:52:02.116647Z", "start_time": "2018-05-09T17:52:02.020123Z" } }, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": 
{}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "---\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for batch in chunk(combiTop, 4):\n", "    display(Markdown(\"\\n\\n---\\n\\n\"))\n", "    A.lineart(batch, height=80, width=60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now compose our table.\n", "\n", "For each sign we make a row in which we report the frequency (F) and rank (R) of that sign in all\n", "groups: all ideographs (all), lines and level-1 cases (super), and level-2 and deeper cases (sub).\n", "Since the super and sub groups together make up all ideographs, each *all F* value is the sum of the corresponding *super F* and *sub F* values."
] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:52:06.119441Z", "start_time": "2018-05-09T17:52:06.107390Z" } }, "outputs": [ { "data": { "text/markdown": [ "\n", "### Frequencies and ranks of non-numeral signs\n", "sign | all F | all R | super F | super R | sub F | sub R\n", "--- | --- | --- | --- | --- | --- | ---\n", "**EN~a** | **1830** | **1** | 1670 | *1* | 160 | *1*\n", "**SZE~a** | **1294** | **2** | 1178 | *2* | 116 | *2*\n", "**GAL~a** | **1164** | **3** | 1136 | *3* | 28 | *30*\n", "**U4** | **1022** | **4** | 936 | *5* | 86 | *7*\n", "**AN** | **1020** | **5** | 946 | *4* | 74 | *9*\n", "**SAL** | **876** | **6** | 795 | *6* | 81 | *8*\n", "**PAP~a** | **851** | **7** | 765 | *7* | 86 | *6*\n", "**GI** | **849** | **8** | 754 | *8* | 95 | *4*\n", "**BA** | **781** | **9** | 679 | *10* | 102 | *3*\n", "**NUN~a** | **719** | **10** | 677 | *11* | 42 | *20*\n", "**SANGA~a** | **714** | **11** | 698 | *9* | 16 | *60*\n", "**SZU** | **680** | **12** | 611 | *13* | 69 | *10*\n", "**BU~a** | **653** | **13** | 561 | *16* | 92 | *5*\n", "**NAM2** | **649** | **14** | 625 | *12* | 24 | *39*\n", "**E2~a** | **646** | **15** | 583 | *14* | 63 | *11*\n", "**UDU~a** | **616** | **16** | 574 | *15* | 42 | *19*\n", "**A** | **600** | **17** | 557 | *17* | 43 | *18*\n", "**KI** | **546** | **18** | 503 | *19* | 43 | *17*\n", "**DUG~b** | **509** | **19** | 506 | *18* | 3 | *226*\n", "**DU** | **480** | **20** | 435 | *22* | 45 | *14*\n", "**GISZ** | **478** | **21** | 461 | *20* | 17 | *57*\n", "**HI** | **408** | **29** | 363 | *31* | 45 | *15*\n", "**TUR** | **382** | **32** | 330 | *36* | 52 | *12*\n", "**KU3~a** | **264** | **52** | 220 | *58* | 44 | *16*\n", "**HI@g~a** | **235** | **58** | 186 | *67* | 49 | *13*\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "table = \"\"\"\n", "### Frequencies and ranks of non-numeral signs\n", "sign | all F | all R | super 
F | super R | sub F | sub R\n", "--- | --- | --- | --- | --- | --- | ---\n", "\"\"\"\n", "for sign in combiTop:\n", "    allF = allFreqs[sign]\n", "    allR = allRanks[sign]\n", "    # a sign that does not occur in a group gets an empty cell there\n", "    superF = superFreqs.get(sign, \" \")\n", "    superR = superRanks.get(sign, \" \")\n", "    subF = subFreqs.get(sign, \" \")\n", "    subR = subRanks.get(sign, \" \")\n", "    row = f\"**{sign}** | **{allF}** | **{allR}** | {superF} | *{superR}* | {subF} | *{subR}*\"\n", "    table += f\"{row}\\n\"\n", "display(Markdown(table))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next\n", "\n", "*Ready for advanced ...*\n", "\n", "Try the\n", "[primers](http://nbviewer.jupyter.org/github/Nino-cunei/primers/tree/master/)\n", "for introductions to digital cuneiform research.\n", "\n", "All chapters:\n", "[start](start.ipynb)\n", "[imagery](imagery.ipynb)\n", "[steps](steps.ipynb)\n", "[search](search.ipynb)\n", "[calc](calc.ipynb)\n", "[signs](signs.ipynb)\n", "[quads](quads.ipynb)\n", "[jumps](jumps.ipynb)\n", "**cases**\n", "\n", "---\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "651px", "left": "0px", "right": "1054px", "top": "66px", "width": "226px" }, "toc_section_display": "block", "toc_window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }