{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "# Signs\n", "\n", "Signs are the building blocks in the transcriptions.\n", "They correspond to the individual \"glyphs\" on the tablet.\n", "\n", "However, we have inserted a few *empty* signs, which we can leave out subsequently ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need a few extra modules." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-11T09:57:17.153509Z", "start_time": "2018-05-11T09:57:17.132378Z" } }, "outputs": [], "source": [ "import os\n", "import collections\n", "from textwrap import dedent\n", "from IPython.display import Markdown\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-05-11T09:57:19.233364Z", "start_time": "2018-05-11T09:57:17.858041Z" } }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/Nino-cunei/uruk/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/Nino-cunei/uruk/tf/1.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 11.3.0, Nino-cunei/uruk/app v3, Search Reference
\n", " Data: Nino-cunei - uruk 1.0, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
tablet636422.01100
face945614.1095
column140239.3493
line358423.6192
case96513.4624
cluster327531.0324
quad37942.056
comment110901.008
sign1400941.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Uruk IV/III: Proto-cuneiform tablets \n", "
\n", "\n", "
\n", "
\n", "catalogId\n", "
\n", "
str
\n", "\n", " identifier of tablet in catalog (http://www.flutopedia.com/tablets.htm)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "damage\n", "
\n", "
int
\n", "\n", " indicates damage of signs or quads,corresponds to #-flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "depth\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "excavation\n", "
\n", "
str
\n", "\n", " excavation number of tablet\n", "\n", "
\n", "\n", "
\n", "
\n", "fragment\n", "
\n", "
str
\n", "\n", " level between tablet and face\n", "\n", "
\n", "\n", "
\n", "
\n", "fullNumber\n", "
\n", "
str
\n", "\n", " the combination of face type and column number on columns\n", "\n", "
\n", "\n", "
\n", "
\n", "grapheme\n", "
\n", "
str
\n", "\n", " name of a grapheme (glyph)\n", "\n", "
\n", "\n", "
\n", "
\n", "identifier\n", "
\n", "
str
\n", "\n", " additional information pertaining to the name of a face\n", "\n", "
\n", "\n", "
\n", "
\n", "modifier\n", "
\n", "
str
\n", "\n", " indicates modifcation of a sign; corresponds to sign@letter in transcription. if the grapheme is a repeat, the modification applies to the whole repeat.\n", "\n", "
\n", "\n", "
\n", "
\n", "modifierFirst\n", "
\n", "
str
\n", "\n", " indicates the order between modifiers and variants on the same object; if 1, modifiers come before variants\n", "\n", "
\n", "\n", "
\n", "
\n", "modifierInner\n", "
\n", "
str
\n", "\n", " indicates modifcation of a sign within a repeatcorresponds to sign@letter in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "name\n", "
\n", "
str
\n", "\n", " name of tablet\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " number of a column or line or case\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "period\n", "
\n", "
str
\n", "\n", " period that characterises the tablet corpus\n", "\n", "
\n", "\n", "
\n", "
\n", "prime\n", "
\n", "
int
\n", "\n", " indicates the presence/multiplicity of a prime (single quote)\n", "\n", "
\n", "\n", "
\n", "
\n", "remarkable\n", "
\n", "
int
\n", "\n", " corresponds to ! flag in transcription \n", "\n", "
\n", "\n", "
\n", "
\n", "repeat\n", "
\n", "
int
\n", "\n", " number indicating the number of repeats of a grapheme,especially in numerals; -1 comes from repeat N in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "srcLn\n", "
\n", "
str
\n", "\n", " transcribed line\n", "\n", "
\n", "\n", "
\n", "
\n", "srcLnNum\n", "
\n", "
int
\n", "\n", " line number in transcription file\n", "\n", "
\n", "\n", "
\n", "
\n", "terminal\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " text of comment nodes\n", "\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "\n", " type of a face; type of a comment; type of a cluster;type of a sign\n", "\n", "
\n", "\n", "
\n", "
\n", "uncertain\n", "
\n", "
int
\n", "\n", " corresponds to ?-flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " allograph for a sign, corresponds to ~x in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "variantOuter\n", "
\n", "
str
\n", "\n", " allograph for a quad, corresponds to ~x in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "written\n", "
\n", "
str
\n", "\n", " corresponds to !(xxx) flag in transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "comments\n", "
\n", "
none
\n", "\n", " links comment nodes to their targets\n", "\n", "
\n", "\n", "
\n", "
\n", "op\n", "
\n", "
str
\n", "\n", " operator connecting left to right operand in a quad\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "sub\n", "
\n", "
none
\n", "\n", " connects line or case with sub-cases, quad with sub-quads; clusters with sub-clusters\n", "\n", "
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/Nino-cunei/uruk/sources/cdli/images" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 2095 ideograph linearts
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 2724 tablet linearts
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 5495 tablet photos
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"Nino-cunei/uruk\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Showing signs\n", "\n", "The main characteristic of a sign is its **grapheme**.\n", "Everything we do with signs is complicated by the fact that signs can be *repeated* and *augmented* with\n", "*primes*, *variants*, *modifiers* and *flags*.\n", "\n", "Before we go on, we call up our example tablet.\n", "\n", "If you want to output multiple text items in an output cell, you have to `print()` them." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:04.428390Z", "start_time": "2018-05-09T16:55:04.384703Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s 1 result\n" ] }, { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

tablet:148166 P005381
MSVO 3, 70uruk-iiicatalogId=P005381
comment:178162
atf: lang qpc
face:156932 obverse
column:190362 1
line:254173 1
case:167736 1a
106585 2(N14)
106586 SZE~a
106587 SAL
106588 TUR3~a
106589 NUN~a
case:167737 1b
106590 3(N19)
quad:143013
106591 GISZ
.
106592 TE
line:254174 2
106593 1(N14)
106594 NAR
106595 NUN~a
106596 SIG7
line:254175 3
106597 2(N04)#
106598 PIRIG~b1
106599 SIG7
106600 URI3~a
106601 NUN~a
column:190363 2
line:254176 1
106602 3(N04)
quad:143014
106603 GISZ
.
106604 TE
106605 GAR
quad:143015
106606 SZU2
.
quad:143016
quad:143017
106607 HI
+
106608 1(N57)
+
quad:143018
106609 HI
+
106610 1(N57)
106611 GI4~a
line:254177 2
106612 GU7
106613 AZ
106614 SI4~f
face:156933 reverse
column:190364 1
line:254178 1
106615 3(N14)
106616 SZE~a
line:254179 2
106617 3(N19)
106618 5(N04)
line:254180 3
106619 GU7
column:190365 2
line:254181 1
106620 AZ
106621 SI4~f
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pNum = \"P005381\"\n", "query = \"\"\"\n", "tablet catalogId=P005381\n", "\"\"\"\n", "results = A.search(query)\n", "A.show(results, withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We navigate to the last sign in line 1 in column 2 on the obverse face:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:12.287810Z", "start_time": "2018-05-09T16:55:12.278115Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "106611\n" ] } ], "source": [ "case = A.nodeFromCase((pNum, \"obverse:2\", \"1\"))\n", "sign1 = L.d(case, otype=\"sign\")[-1]\n", "print(sign1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That must be the right bar code.\n", "\n", "We can retrieve the ATF transliteration:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:13.799410Z", "start_time": "2018-05-09T16:55:13.792773Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GI4~a\n" ] } ], "source": [ "print(A.atfFromSign(sign1))" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-03-06T11:59:02.784967Z", "start_time": "2018-03-06T11:59:02.774624Z" } }, "source": [ "Note that we get the ATF for a sign by means of `A.atfFromSign(node)`.\n", "It also gives us the augments, such as primes, modifiers and variants.\n", "We get the flags only if we ask for them with `flags=True`.\n", "\n", "Take for example the first sign on line 3 in column 1 on the obverse face:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:16.005048Z", "start_time": "2018-05-09T16:55:15.996777Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "106597\n", "2(N04)\n", "2(N04)#\n" ] } ], "source": [ "case = 
A.nodeFromCase((pNum, 
\"obverse:1\", \"3\"))\n", "sign2 = L.d(case, otype=\"sign\")[0]\n", "print(sign2)\n", "print(A.atfFromSign(sign2))\n", "print(A.atfFromSign(sign2, flags=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Secondly, we want to get pointers to the locations of these signs in the corpus." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:18.166711Z", "start_time": "2018-05-09T16:55:18.154138Z" } }, "outputs": [ { "data": { "text/html": [ "
106611 GI4~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
106597 2(N04)#
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.pretty(sign1, withNodes=True)\n", "A.pretty(sign2, withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click the links below `sign` and you are taken to the CDLI page for this tablet.\n", "\n", "If we want to enlarge the sign, we can call it up with the lineart function." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:20.405031Z", "start_time": "2018-05-09T16:55:20.396663Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.lineart([sign1, sign2], width=200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**N.B.**\n", "\n", "For concepts that span one or more transliteration lines,\n", "such as *tablet*, *face*, *column*, *line*, *case*, *comment*, you can get the source\n", "material by requesting the feature *`srcLn`*, as we have seen before.\n", "\n", "For *inline* concepts, such as *clusters*, *quads*, and *signs*,\n", "there are functions in `A.`.\n", "\n", "For signs we have:\n", "\n", "* `atfFromSign(sign, flags=False)`\n", "\n", " Returns the ATF representation for a sign, including primes, repeats, variants, modifiers,\n", " and, optionally, flags." 
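, "\n", "\n", "The following is a minimal sketch in plain Python that makes the composition concrete. It does not call Text-Fabric. The function `compose_atf` and its parameters are hypothetical, and it only mimics the conventions seen in this notebook: a repeat written as `2(...)`, the variant after `~`, the modifier after `@`, and the flags `#`, `?` and `!`. It ignores primes, modifiers that apply to a whole repeat, and the ordering governed by the `modifierFirst` feature. The order in which it appends the flags is an assumption.

```python
def compose_atf(grapheme, repeat=None, variant=None, modifier=None,
                damage=False, uncertain=False, remarkable=False, flags=False):
    # Hypothetical helper that loosely mimics A.atfFromSign(); not the Text-Fabric code.
    core = grapheme
    if variant:
        core += '~' + variant  # allograph, e.g. NUN~a
    if modifier:
        core += '@' + modifier  # modifier, e.g. N34@f
    if repeat is not None:
        # a repeat of -1 stands for the literal N in N(...), see the repeat feature docs
        count = 'N' if repeat == -1 else str(repeat)
        core = f'{count}({core})'
    if flags:
        # damage (#), uncertain (?) and remarkable (!) flags, only on request
        core += ('#' if damage else '') + ('?' if uncertain else '') + ('!' if remarkable else '')
    return core


print(compose_atf('N04', repeat=2, damage=True))              # 2(N04)
print(compose_atf('N04', repeat=2, damage=True, flags=True))  # 2(N04)#
print(compose_atf('N28', repeat=1, variant='c', damage=True, uncertain=True, flags=True))  # 1(N28~c)#?
```

This matches what `A.atfFromSign()` does for `sign2` above: the damage flag only appears in the output when flags are requested."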
] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-03-06T11:59:02.784967Z", "start_time": "2018-03-06T11:59:02.774624Z" } }, "source": [ "The unaugmented transliteration of a single sign can be obtained from the feature **grapheme**:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:55:23.280018Z", "start_time": "2018-05-09T16:55:23.272714Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GI4\n", "N04\n" ] } ], "source": [ "print(F.grapheme.v(sign1))\n", "print(F.grapheme.v(sign2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's pretty-print the line in which `sign2` occurs:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:57:57.706780Z", "start_time": "2018-05-09T16:57:57.698101Z" } }, "outputs": [ { "data": { "text/html": [ "
line 3
2(N04)#
PIRIG~b1
SIG7
URI3~a
NUN~a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.pretty(L.u(sign2, otype=\"line\")[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Occurrences of a sign\n", "\n", "Now we use something we learned before: we want all signs with exactly\n", "the grapheme `GU7`, regardless of augments or flags:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:00.060975Z", "start_time": "2018-05-09T16:58:00.022447Z" } }, "outputs": [ { "data": { "text/plain": [ "314" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gu7s = F.grapheme.s(\"GU7\")\n", "len(gu7s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, with a search template:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:02.669578Z", "start_time": "2018-05-09T16:58:02.370790Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.05s 314 results\n" ] } ], "source": [ "results = A.search(\n", " \"\"\"\n", "sign grapheme=GU7\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### With `table()` and `show()`\n", "The simplest way to show the results is with `A.table()` for a compact tabular view, or with\n", "`A.show()` for a full context view.\n", "\n", "We show a tabular view of 3 occurrences, including node numbers.\n", "The show view can be quite unwieldy, so we show only 3 tablets." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:05.029738Z", "start_time": "2018-05-09T16:58:05.019744Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
npsign
1P001705 obverse:3:12456 GU7
2P001719 obverse:1:12660 GU7
3P001951 obverse:3:23883 GU7
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, withNodes=True, end=3)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:25.893454Z", "start_time": "2018-05-09T16:58:25.857582Z" } }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

tablet P001705
ATU 6, pl. 003, W 10594+uruk-iiiW 10594 + W 10599 + W 10600
comment
atf: lang qpc
face obverse
column 1
line 1
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
column 2
line 1
2(N47)#
grapheme=N47
6(N20)#
grapheme=N20
4(N05)#
grapheme=N05
cluster ?
...
grapheme=
cluster ?
X
grapheme=X
line 2
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
line 3
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
column 3
line 1
2(N47)#
grapheme=N47
6(N20)#
grapheme=N20
5(N05)#?
grapheme=N05
1(N42~a)#
grapheme=N42
1(N25)#
grapheme=N25
1(N28~c)#?
grapheme=N28
GU7#?
grapheme=GU7
cluster ?
...
grapheme=
cluster ?
line 2
3(N20)#?
grapheme=N20
3(N42~a)#
grapheme=N42
1(N28~c)#?
grapheme=N28
BA
grapheme=BA
line 3
grapheme=
face reverse
column 1
line 1
case 1a
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
SZENNUR~a#?
grapheme=SZENNUR
case 1b
2(N47)
grapheme=N47
line 2
case 2a
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
U4
grapheme=U4
KI
grapheme=KI
case 2b
6(N20)
grapheme=N20
4(N05)
grapheme=N05
line 3
case 3a
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
U4
grapheme=U4
KI
grapheme=KI
case 3b
1(N01)#
grapheme=N01
1(N39~a)#
grapheme=N39
1(N24)#
grapheme=N24
1(N28)#
grapheme=N28
line 4
cluster ?
...
grapheme=
cluster ?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

tablet P001719
ATU 6, pl. 004, W 10608uruk-iiiW 10608
comment
atf: lang qpc
face obverse
column 1
line 1
cluster ?
...
grapheme=
cluster ?
1(N14)#
grapheme=N14
2(N01)#
grapheme=N01
GU7#?
grapheme=GU7
cluster ?
...
grapheme=
cluster ?
line 2
cluster ?
...
grapheme=
cluster ?
GUM~b#
grapheme=GUM
cluster ?
...
grapheme=
cluster ?
column 2
line 1
cluster ?
...
grapheme=
cluster ?
cluster ?
...
grapheme=
cluster ?
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

tablet P001951
ATU 6, pl. 032, W 14109uruk-iiiW 14109
comment
atf: lang qpc
face obverse
column 1
line 1
1(N01)
grapheme=N01
2(N58)
grapheme=N58
ZATU675~d
grapheme=ZATU675
NAGA~a
grapheme=NAGA
column 2
line 1
1(N01)
grapheme=N01
ZATU659
grapheme=ZATU659
BU~a?
grapheme=BU
NAM2
grapheme=NAM2
PAP~a
grapheme=PAP
line 2
SZITA~a3
grapheme=SZITA
quad
U4
grapheme=U4
x
1(N01)
grapheme=N01
E2~b
grapheme=E2
line 3
1(N57)
grapheme=N57
DU
grapheme=DU
E2~b
grapheme=E2
NUNUZ~a1#
grapheme=NUNUZ
line 4
case 4a
1(N57)
grapheme=N57
NAGA~a#
grapheme=NAGA
3(N57)#
grapheme=N57
case 4b
1(N57)
grapheme=N57
E2~b#
grapheme=E2
X
grapheme=X
line 5
case 5a
1(N01)
grapheme=N01
ZATU659
grapheme=ZATU659
NIR~a#
grapheme=NIR
case 5b
SU~a
grapheme=SU
PAP~a
grapheme=PAP
3(N57)
grapheme=N57
line 6
1(N57)
grapheme=N57
EN~a
grapheme=EN
SAG
grapheme=SAG
SZITA~a1
grapheme=SZITA
line 7
1(N57)
grapheme=N57
PAP~a
grapheme=PAP
GA2~a1
grapheme=GA2
GISZ
grapheme=GISZ
ERIM~a#?
grapheme=ERIM
column 3
line 1
1(N57)
grapheme=N57
BU~a#?
grapheme=BU
SZU
grapheme=SZU
line 2
GU7
grapheme=GU7
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, end=3, showGraphics=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### As a plain list\n", "\n", "There are a few hundred occurrences; for the first 10 we show a bit more context, like we did before.\n", "We show the full grapheme, with all its augments, **and flags**.\n", "We also show the full source line." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:39.470656Z", "start_time": "2018-05-09T16:58:39.458807Z" }, "lines_to_next_cell": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GU7#? P001705 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n", "GU7#? P001719 1. [...] 1(N14)# 2(N01)# , GU7#? [...] \n", "GU7 P001951 2. , GU7 \n", "GU7# P002002 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7# \n", "GU7# P002035 2. , GU7# \n", "GU7# P002062 1. [...] , [...] GU7# \n", "GU7 P002100 1. , [...] GU7 \n", "GU7# P002370 1. 1(N14) 1(N01) , GU7# [...] \n", "GU7 P002510 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7 \n", "GU7 P002524 2. 
, GU7 \n" ] } ], "source": [ "for g in gu7s[0:10]:\n", " t = L.u(g, otype=\"tablet\")[0]\n", " cl = A.lineFromNode(g)\n", " pNum = T.sectionFromNode(t)[0]\n", " gRep = A.atfFromSign(g, flags=True)\n", "\n", " line = F.srcLn.v(cl)\n", "\n", " print(f\"{gRep:<7} {pNum} {line}\")" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "### As a linked table\n", "\n", "We can make it more user-friendly: we can link each occurrence to its page on CDLI and\n", "put everything in a Markdown table.\n", "\n", "We have a function to generate the link: `A.cdli()`.\n", "\n", "We write a function that builds the Markdown table, because we want to do this more than once.\n", "\n", "First we use the function to show the first few occurrences on screen,\n", "and then to write the whole set to a directory on your file system." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:42.767028Z", "start_time": "2018-05-09T16:58:42.756785Z" } }, "outputs": [], "source": [ "def showSigns(signs, amount=None):\n", "\n", " markdown = dedent(\"\"\"\\\n", " sign | tablet | line\n", " ---- | ------ | ----\\\n", " \"\"\").strip()\n", "\n", " for g in signs if amount is None else signs[0:amount]:\n", " t = L.u(g, otype=\"tablet\")[0]\n", " cl = A.lineFromNode(g)\n", " gRep = A.atfFromSign(g, flags=True)\n", " line = F.srcLn.v(cl).replace(\"|\", \"&#124;\")  # escape | so it does not split table cells\n", "\n", " markdown += f\"\\n{gRep} | {A.cdli(t, asString=True)} | {line}\"\n", "\n", " markdown += \"\\n\"\n", " return markdown" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:48.574655Z", "start_time": "2018-05-09T16:58:48.567500Z" } }, "outputs": [ { "data": { "text/markdown": [ "sign | tablet | line\n", "---- | ------ | ----\n", "GU7#? | P001705 | 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n", "GU7#? | P001719 | 1. [...] 1(N14)# 2(N01)# , GU7#? [...] 
\n", "GU7 | P001951 | 2. , GU7 \n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Markdown(showSigns(gu7s, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A bit more please ..." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:53.264707Z", "start_time": "2018-05-09T16:58:53.253792Z" } }, "outputs": [ { "data": { "text/markdown": [ "sign | tablet | line\n", "---- | ------ | ----\n", "GU7#? | P001705 | 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n", "GU7#? | P001719 | 1. [...] 1(N14)# 2(N01)# , GU7#? [...] \n", "GU7 | P001951 | 2. , GU7 \n", "GU7# | P002002 | 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7# \n", "GU7# | P002035 | 2. , GU7# \n", "GU7# | P002062 | 1. [...] , [...] GU7# \n", "GU7 | P002100 | 1. , [...] GU7 \n", "GU7# | P002370 | 1. 1(N14) 1(N01) , GU7# [...] \n", "GU7 | P002510 | 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7 \n", "GU7 | P002524 | 2. , GU7 \n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Markdown(showSigns(gu7s, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Everything to file\n", "\n", "We write the whole list to a Markdown file\n", "on your local system."
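, "\n", "\n", "In `showSigns`, the `|` characters in a source line need escaping. A literal `|` inside a cell would be read as a column separator and break the Markdown table. The following minimal sketch shows the idea in plain Python by writing a small table to a temporary directory. The helper `write_sign_table` is hypothetical, the two sample rows are copied from the output above, and the temporary directory stands in for `A.tempDir`.

```python
import os
import tempfile


def write_sign_table(rows, path):
    # Hypothetical helper: rows are (sign, tablet, line) triples.
    with open(path, 'w', encoding='utf-8') as fh:
        print('sign | tablet | line', file=fh)
        print('---- | ------ | ----', file=fh)
        for sign, tablet, line in rows:
            # escape | as an HTML entity so it is not taken as a column separator
            safe = line.replace('|', '&#124;')
            print(f'{sign} | {tablet} | {safe}', file=fh)


rows = [
    ('GU7', 'P002510', '1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7'),
    ('GU7#', 'P002002', '1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7#'),
]
out_dir = tempfile.mkdtemp()  # stands in for A.tempDir
path = os.path.join(out_dir, 'gu7-sample.md')
write_sign_table(rows, path)
with open(path, encoding='utf-8') as fh:
    print(fh.read())
```

After escaping, every row has exactly two column separators, even when the transcription contains a quad between bars such as `|SILA3~axSZE~a@t|`."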
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:58:58.614728Z", "start_time": "2018-05-09T16:58:58.592031Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data written to file /Users/me/text-fabric-data/github/Nino-cunei/uruk/_temp/gu7.md\n" ] } ], "source": [ "os.makedirs(A.tempDir, exist_ok=True)\n", "with open(f\"{A.tempDir}/gu7.md\", \"w\", encoding=\"utf8\") as fh:\n", " fh.write(showSigns(gu7s))\n", "\n", "print(f\"data written to file {A.tempDir}/gu7.md\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Have a look!\n", "\n", "**Tip:** Open the file in [Atom](https://github.com/atom/atom).\n", "Switch to the preview with Ctrl+Shift+M (in Atom).\n", "\n", "Again, the tablet links are clickable and bring you straight to CDLI.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Frequency lists\n", "\n", "We now use a bit more of the power of Text-Fabric by generating frequency lists.\n", "\n", "### Graphemes\n", "\n", "We just studied the `GU7` grapheme a bit.\n", "Suppose we want to get an overview of all graphemes?\n", "\n", "There is a generic Text-Fabric method for that:\n", "for each feature you can call up a frequency list of its values with `freqList()`."
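, "\n", "\n", "Conceptually, a frequency list counts how often each distinct value occurs and orders the values from most to least frequent. The sketch below is a plain-Python stand-in, not the Text-Fabric implementation. The name `freq_list` is hypothetical, the input values are made up, and breaking ties alphabetically is a choice of this sketch.

```python
from collections import Counter


def freq_list(values):
    # Hypothetical stand-in for F.feature.freqList(): (value, count) pairs,
    # most frequent first, ties broken by value.
    counts = Counter(values)
    return tuple(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


graphemes = ['N01', 'GU7', 'N01', 'EN', 'N01', 'GU7']  # made-up grapheme values
print(freq_list(graphemes))  # (('N01', 3), ('GU7', 2), ('EN', 1))
```

The result has the same shape as the output of `F.grapheme.freqList()` below: a tuple of `(value, frequency)` pairs, so `len()` gives the number of distinct values."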
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:01.697516Z", "start_time": "2018-05-09T16:59:01.586807Z" } }, "outputs": [ { "data": { "text/plain": [ "632" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graphemes = F.grapheme.freqList()\n", "len(graphemes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We show the top 20:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:02.844691Z", "start_time": "2018-05-09T16:59:02.835063Z" } }, "outputs": [ { "data": { "text/plain": [ "(('…', 29413),\n", " ('N01', 21645),\n", " ('', 12440),\n", " ('X', 6870),\n", " ('N14', 5898),\n", " ('EN', 1950),\n", " ('N34', 1831),\n", " ('N57', 1826),\n", " ('SZE', 1334),\n", " ('GAL', 1180),\n", " ('DUG', 1084),\n", " ('U4', 1023),\n", " ('AN', 1020),\n", " ('PAP', 876),\n", " ('SAL', 876),\n", " ('NUN', 870),\n", " ('E2', 854),\n", " ('GI', 850),\n", " ('BA', 781),\n", " ('SANGA', 733))" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graphemes[0:20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**N.B.:**\n", "\n", "* empty graphemes `('', 12440)`: these have been inserted by the conversion\n", " to Text-Fabric inside comments, in order to link the comments to the tablets.\n", "* ellipsis graphemes `('…', 29413)`: these correspond to the `...` in ATF, usually within an uncertainty\n", " cluster `[...]`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Augments\n", "\n", "We can quickly get an overview of all kinds of augments: primes, variants, modifiers and flags."
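, "\n", "\n", "To show how the augments relate to the ATF of a sign, here is a rough sketch that splits a sign string back into its parts and counts the variants. The patterns are a simplification written for this illustration: a single grapheme, optionally wrapped in a repeat, with an optional `~variant`, an optional `@modifier` and trailing flags. They are not the grammar used by the Text-Fabric conversion, and `parse_sign` is a hypothetical name. The sample signs come from the example tablet P005381 shown earlier.

```python
import re
from collections import Counter

# Simplified, hypothetical patterns; [(] and [)] match literal parentheses.
REPEAT = re.compile(r'^([0-9]+|N)[(](.*)[)]$')
BODY = re.compile(r'^([A-Z][A-Z0-9]*)(~[a-wyz0-9]+)?(@[a-wyz0-9]+)?$')


def parse_sign(atf):
    # Hypothetical parser: split one ATF sign into grapheme, repeat, variant, modifier, flags.
    core = atf.rstrip('#?!')  # trailing flags: damage, uncertain, remarkable
    flags = atf[len(core):]
    repeat = None
    m = REPEAT.match(core)
    if m:
        repeat, core = m.group(1), m.group(2)
    b = BODY.match(core)
    if b is None:
        raise ValueError(f'cannot parse {atf!r}')
    grapheme, variant, modifier = b.groups()
    return {
        'grapheme': grapheme,
        'repeat': repeat,
        'variant': variant[1:] if variant else None,  # drop the leading ~
        'modifier': modifier[1:] if modifier else None,  # drop the leading @
        'flags': flags,
    }


# signs taken from the example tablet P005381 shown earlier
sample = ['2(N04)#', 'NUN~a', 'PIRIG~b1', 'SZE~a', '3(N19)', 'URI3~a', 'SI4~f']
variants = Counter(p['variant'] for p in map(parse_sign, sample) if p['variant'])
print(variants.most_common())  # [('a', 3), ('b1', 1), ('f', 1)]
```

In the corpus, a modifier inside a repeat, as in `7(N34@f)`, is stored in the feature `modifierInner` rather than `modifier`. This sketch does not make that distinction, and it does not handle quads or the `!(xxx)` notation."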
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Prime\n", "\n", "The prime is a feature with values 2, 1 or 0,\n", "indicating the number of primes.\n", "Below you see how often the non-zero values occur.\n", "Note that we count all primes here: on signs, case numbers and column numbers." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:13.953362Z", "start_time": "2018-05-09T16:59:13.943358Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 5164 x 1\n", " 1 x 2\n" ] } ], "source": [ "for (value, frequency) in F.prime.freqList():\n", " print(f\"{frequency:>5} x {value}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Variant\n", "\n", "The variant or *allograph* is what occurs after the grapheme and after the `~` symbol; it consists of digits and/or\n", "lowercase letters other than `x`.\n", "\n", "Here is the frequency list of variant values." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:16.127330Z", "start_time": "2018-05-09T16:59:16.087597Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "23162 x a\n", " 3994 x b\n", " 1505 x c\n", " 1308 x a1\n", " 689 x b1\n", " 188 x a2\n", " 183 x d\n", " 125 x b2\n", " 85 x f\n", " 72 x a3\n", " 42 x e\n", " 29 x c2\n", " 22 x c1\n", " 22 x c3\n", " 14 x c5\n", " 12 x a0\n", " 12 x b3\n", " 12 x d1\n", " 12 x v\n", " 11 x c4\n", " 6 x a4\n", " 6 x g\n", " 5 x d2\n", " 4 x d4\n", " 4 x h\n", " 2 x 3a\n", " 2 x d3\n", " 1 x h2\n" ] } ], "source": [ "for (value, frequency) in F.variant.freqList():\n", " print(f\"{frequency:>5} x {value}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Modifier\n", "\n", "The modifier is what occurs after the grapheme and after the `@` symbol.\n", "It consists of digits and/or\n", "lowercase letters other than `x`.\n", "\n", "Sometimes modifiers occur inside a 
repeat; in that case we store the modifier in the feature\n", "`modifierInner`, as in\n", "\n", "```\n", "7(N34@f)\n", "```\n", "\n", "Here are the frequency lists of `modifier` and `modifierInner` values." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:18.102258Z", "start_time": "2018-05-09T16:59:18.094434Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 634 x g\n", " 262 x t\n", " 35 x n\n", " 6 x r\n", " 4 x s\n", " 1 x c\n", " 1 x h\n" ] } ], "source": [ "for (value, frequency) in F.modifier.freqList():\n", " print(f\"{frequency:>5} x {value}\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T16:59:18.887962Z", "start_time": "2018-05-09T16:59:18.880936Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 25 x f\n", " 1 x r\n", " 1 x v\n" ] } ], "source": [ "for (value, frequency) in F.modifierInner.freqList():\n", " print(f\"{frequency:>5} x {value}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Full signs\n", "\n", "We make a frequency list of all full signs, i.e. the grapheme including repeat, variant, modifier, and prime.\n", "We show them as they appear in transcriptions.\n", "\n", "We count individual signs, including those that are part of a quad; quads themselves are not counted as units.\n", "We skip the empty signs, the ellipsis signs and the `X` signs.\n", "\n", "This is no longer the frequency distribution of the values of a single feature,\n", "so we have to do the coding ourselves."
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:03.407005Z", "start_time": "2018-05-09T17:07:02.774036Z" } }, "outputs": [ { "data": { "text/plain": [ "1476" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fullGraphemes = collections.Counter()\n", "\n", "for n in F.otype.s(\"sign\"):\n", " grapheme = F.grapheme.v(n)\n", " if grapheme == \"\" or grapheme == \"…\" or grapheme == \"X\":\n", " continue\n", " fullGrapheme = A.atfFromSign(n)\n", " fullGraphemes[fullGrapheme] += 1\n", "\n", "len(fullGraphemes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or with a query:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:12.306930Z", "start_time": "2018-05-09T17:07:11.540470Z" } }, "outputs": [ { "data": { "text/plain": [ "1476" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query = \"\"\"\n", "sign type=ideograph|numeral\n", "\"\"\"\n", "fullGraphemesQ = {A.atfFromSign(r[0]) for r in A.search(query, silent=True)}\n", "len(fullGraphemesQ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There! We have counted all incarnations of full graphemes, and there are 1476 distinct ones.\n", "\n", "We show the top 20, sorted by frequency.\n", "\n", "We specify a `key` function that, given a `(value, amount)` pair, returns\n", "`(-amount, value)`.\n", "This determines the order after sorting: signs with a high amount come\n", "before signs with a low amount, and signs with equal amounts are ordered alphabetically."
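, "\n", "\n", "A small worked example of this key, with made-up frequencies, shows its effect on ties:

```python
counts = {'EN~a': 5, '1(N01)': 12, 'AN': 5, 'SZE~a': 7}  # made-up frequencies

# Negating the count puts high counts first; equal counts fall back to the value, ascending.
ordered = sorted(counts.items(), key=lambda x: (-x[1], x[0]))
for value, frequency in ordered:
    print(f'{frequency:>5} x {value}')
```

`AN` comes before `EN~a` because both occur 5 times. Sorting on `(x[1], x[0])` with `reverse=True` would also put high counts first, but it would order ties in descending alphabetical order. That is why the count is negated instead."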
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:45.933397Z", "start_time": "2018-05-09T17:07:45.918977Z" }, "lines_to_next_cell": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12983 x 1(N01)\n", " 3080 x 2(N01)\n", " 2584 x 1(N14)\n", " 1830 x EN~a\n", " 1598 x 3(N01)\n", " 1357 x 2(N14)\n", " 1294 x 5(N01)\n", " 1294 x SZE~a\n", " 1164 x GAL~a\n", " 1117 x 4(N01)\n", " 1022 x U4\n", " 1020 x AN\n", " 999 x 1(N34)\n", " 876 x SAL\n", " 851 x PAP~a\n", " 849 x GI\n", " 791 x 3(N14)\n", " 789 x 1(N57)\n", " 781 x BA\n", " 719 x NUN~a\n" ] } ], "source": [ "for (value, frequency) in sorted(\n", " fullGraphemes.items(),\n", " key=lambda x: (-x[1], x[0]),\n", ")[0:20]:\n", " print(f\"{frequency:>5} x {value}\")" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "### Writing results to file\n", "\n", "We also want to write the results to files in your `_temp` directory, within this repo.\n", "\n", "`writeFreqs(fileName, data, dataName)` takes `data`, a sequence of (item, frequency) pairs,\n", "reports how many `dataName`s there are, and writes the distribution to two files:\n", "\n", "* `fileName-alpha.txt`, ordered by data item\n", "* `fileName-freq.txt`, ordered by frequency."
] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:56.681487Z", "start_time": "2018-05-09T17:07:56.674385Z" } }, "outputs": [], "source": [ "def writeFreqs(fileName, data, dataName):\n", " print(f\"There are {len(data)} {dataName}s\")\n", "\n", " for (sortName, sortKey) in (\n", " (\"alpha\", lambda x: (x[0], -x[1])),\n", " (\"freq\", lambda x: (-x[1], x[0])),\n", " ):\n", " with open(f\"{A.tempDir}/{fileName}-{sortName}.txt\", \"w\") as fh:\n", " for (item, freq) in sorted(data, key=sortKey):\n", " if item != \"\":\n", " fh.write(f\"{freq:>5} x {item}\\n\")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:57.856903Z", "start_time": "2018-05-09T17:07:57.738368Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 632 bare graphemes\n" ] } ], "source": [ "writeFreqs(\"grapheme-plain\", F.grapheme.freqList(), \"bare grapheme\")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-05-09T17:07:59.508374Z", "start_time": "2018-05-09T17:07:59.486970Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 1476 full graphemes\n" ] } ], "source": [ "writeFreqs(\"grapheme-full\", fullGraphemes.items(), \"full grapheme\")" ] }, { "cell_type": "markdown", "metadata": { "variables": { "A.tempDir": "


\n" } }, "source": [ "Now have a look in your {{A.tempDir}} directory: you will find four generated files:\n", "\n", "* `grapheme-plain-alpha.txt` (sorted by grapheme)\n", "* `grapheme-plain-freq.txt` (sorted by frequency)\n", "* `grapheme-full-alpha.txt` (sorted by grapheme)\n", "* `grapheme-full-freq.txt` (sorted by frequency)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next\n", "\n", "[quads](quads.ipynb)\n", "\n", "*Things never stay simple ...*\n", "\n", "All chapters:\n", "[start](start.ipynb)\n", "[imagery](imagery.ipynb)\n", "[steps](steps.ipynb)\n", "[search](search.ipynb)\n", "[calc](calc.ipynb)\n", "**signs**\n", "[quads](quads.ipynb)\n", "[jumps](jumps.ipynb)\n", "[cases](cases.ipynb)\n", "\n", "---\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "607px", "left": "0px", "right": "983px", "top": "110px", "width": "297px" }, "toc_section_display": "block", "toc_window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }