"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A = use(\"Nino-cunei/uruk\", hoist=globals())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Showing signs\n",
"\n",
"The main characteristic of a sign is its **grapheme**.\n",
"Everything we do with signs is complicated by the fact that signs can be *repeated* and *augmented* with\n",
"*primes*, *variants*, *modifiers* and *flags*.\n",
"\n",
"Before we go on, we call up our example tablet.\n",
"\n",
"If you want to output multiple text items in an output cell, you have to `print()` them."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:04.428390Z",
"start_time": "2018-05-09T16:55:04.384703Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.00s 1 result\n"
]
},
{
"data": {
"text/html": [
"result 1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"tablet:148166 P005381\n",
"MSVO 3, 70 uruk-iii catalogId=P005381\n",
"comment:178162\n",
"atf: lang qpc\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pNum = \"P005381\"\n",
"query = \"\"\"\n",
"tablet catalogId=P005381\n",
"\"\"\"\n",
"results = A.search(query)\n",
"A.show(results, withNodes=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We navigate to the last sign in line 1 in column 2 on the obverse face:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:12.287810Z",
"start_time": "2018-05-09T16:55:12.278115Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"106611\n"
]
}
],
"source": [
"case = A.nodeFromCase((pNum, \"obverse:2\", \"1\"))\n",
"sign1 = L.d(case, otype=\"sign\")[-1]\n",
"print(sign1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That must be the right bar code.\n",
"\n",
"We can retrieve the ATF transliteration:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:13.799410Z",
"start_time": "2018-05-09T16:55:13.792773Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GI4~a\n"
]
}
],
"source": [
"print(A.atfFromSign(sign1))"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T11:59:02.784967Z",
"start_time": "2018-03-06T11:59:02.774624Z"
}
},
"source": [
"Note that we get the ATF for a sign by means of `A.atfFromSign(node)`.\n",
"We also get the augments, such as primes, modifiers and variants.\n",
"We get the flags as well if we pass `flags=True`.\n",
"\n",
"Take for example the first sign on line 3 in column 1 on the obverse face:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:16.005048Z",
"start_time": "2018-05-09T16:55:15.996777Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"106597\n",
"2(N04)\n",
"2(N04)#\n"
]
}
],
"source": [
"case = A.nodeFromCase((pNum, \"obverse:1\", \"3\"))\n",
"sign2 = L.d(case, otype=\"sign\")[0]\n",
"print(sign2)\n",
"print(A.atfFromSign(sign2))\n",
"print(A.atfFromSign(sign2, flags=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we want to get pointers to the locations of these signs in the corpus."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:18.166711Z",
"start_time": "2018-05-09T16:55:18.154138Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.pretty(sign1, withNodes=True)\n",
"A.pretty(sign2, withNodes=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Click the links below `sign` and you are taken to the CDLI page for this tablet.\n",
"\n",
"If we want to enlarge the sign, we can call it up with the lineart function."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:20.405031Z",
"start_time": "2018-05-09T16:55:20.396663Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.lineart([sign1, sign2], width=200)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**N.B.**\n",
"\n",
"For concepts that span one or more transliteration lines,\n",
"such as *tablet*, *face*, *column*, *line*, *case*, *comment*, you can get the source\n",
"material by requesting the feature *`srcLn`*, as we have seen before.\n",
"\n",
"For *inline* concepts, such as *clusters*, *quads*, and *signs*,\n",
"there are dedicated functions on the `A` object.\n",
"\n",
"For signs we have:\n",
"\n",
"* `atfFromSign(sign, flags=False)`\n",
"\n",
"  Returns the ATF representation for a sign, including primes, repeats, variants, modifiers,\n",
"  and, optionally, flags."
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2018-03-06T11:59:02.784967Z",
"start_time": "2018-03-06T11:59:02.774624Z"
}
},
"source": [
"The unaugmented transliteration of a single sign can be obtained from the feature **grapheme**:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:55:23.280018Z",
"start_time": "2018-05-09T16:55:23.272714Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GI4\n",
"N04\n"
]
}
],
"source": [
"print(F.grapheme.v(sign1))\n",
"print(F.grapheme.v(sign2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's pretty-print the line in which `sign2` occurs:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:57:57.706780Z",
"start_time": "2018-05-09T16:57:57.698101Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.pretty(L.u(sign2, otype=\"line\")[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Occurrences of a sign\n",
"\n",
"Now we are using something we learned before: we want all signs with exactly\n",
"the grapheme `GU7`, regardless of augments or flags:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:00.060975Z",
"start_time": "2018-05-09T16:58:00.022447Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"314"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gu7s = F.grapheme.s(\"GU7\")\n",
"len(gu7s)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or, with a search template:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:02.669578Z",
"start_time": "2018-05-09T16:58:02.370790Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.05s 314 results\n"
]
}
],
"source": [
"results = A.search(\n",
"    \"\"\"\n",
"sign grapheme=GU7\n",
"\"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### With `table()` and `show()`\n",
"The simplest way to show the results is with `A.table()` for a compact tabular view, or with\n",
"`A.show()` for a full context view.\n",
"\n",
"We show a tabular view of 3 occurrences, including node numbers.\n",
"The show view can be quite unwieldy, so we show only 3 tablets."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:05.029738Z",
"start_time": "2018-05-09T16:58:05.019744Z"
}
},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.table(results, withNodes=True, end=3)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:25.893454Z",
"start_time": "2018-05-09T16:58:25.857582Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"result 1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"tablet P001705\n",
"ATU 6, pl. 003, W 10594+ uruk-iii W 10594 + W 10599 + W 10600\n",
"face reverse\n",
"column 1\n",
"line 1\n",
"case 1a\n",
"SZENNUR~a#?\n",
"grapheme=SZENNUR\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"result 2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"tablet P001719\n",
"ATU 6, pl. 004, W 10608 uruk-iii W 10608\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"result 3"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"tablet P001951\n",
"ATU 6, pl. 032, W 14109 uruk-iii W 14109\n",
"face obverse\n",
"column 1\n",
"line 1\n",
"ZATU675~d\n",
"grapheme=ZATU675\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.show(results, end=3, showGraphics=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### As a plain list\n",
"\n",
"There are a few hundred occurrences. We show a bit more context for the first 10 of them, as we did before.\n",
"We show the full grapheme, with all its augments, **and flags**.\n",
"We also show the full source line."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:39.470656Z",
"start_time": "2018-05-09T16:58:39.458807Z"
},
"lines_to_next_cell": 2
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GU7#? P001705 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n",
"GU7#? P001719 1. [...] 1(N14)# 2(N01)# , GU7#? [...] \n",
"GU7 P001951 2. , GU7 \n",
"GU7# P002002 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7# \n",
"GU7# P002035 2. , GU7# \n",
"GU7# P002062 1. [...] , [...] GU7# \n",
"GU7 P002100 1. , [...] GU7 \n",
"GU7# P002370 1. 1(N14) 1(N01) , GU7# [...] \n",
"GU7 P002510 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7 \n",
"GU7 P002524 2. , GU7 \n"
]
}
],
"source": [
"for g in gu7s[0:10]:\n",
"    t = L.u(g, otype=\"tablet\")[0]\n",
"    cl = A.lineFromNode(g)\n",
"    pNum = T.sectionFromNode(t)[0]\n",
"    gRep = A.atfFromSign(g, flags=True)\n",
"\n",
"    line = F.srcLn.v(cl)\n",
"\n",
"    print(f\"{gRep:<7} {pNum} {line}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"### As a linked table\n",
"\n",
"We can make it more user friendly: we can link each occurrence to its page on CDLI, and\n",
"put everything in a Markdown table.\n",
"\n",
"We have a function to generate the link: `A.cdli()`.\n",
"\n",
"We write a function to build the table, because we want to do it again.\n",
"\n",
"First we use the function to write the first few rows to the screen,\n",
"and then to write the whole set to a file on your file system."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:42.767028Z",
"start_time": "2018-05-09T16:58:42.756785Z"
}
},
"outputs": [],
"source": [
"def showSigns(signs, amount=None):\n",
"\n",
"    markdown = dedent(\"\"\"\\\n",
"    sign | tablet | line\n",
"    ---- | ------ | ----\\\n",
"    \"\"\").strip()\n",
"\n",
"    for g in signs if amount is None else signs[0:amount]:\n",
"        t = L.u(g, otype=\"tablet\")[0]\n",
"        cl = A.lineFromNode(g)\n",
"        gRep = A.atfFromSign(g, flags=True)\n",
"        # escape pipes so that they do not break the table columns\n",
"        line = F.srcLn.v(cl).replace(\"|\", \"&#124;\")\n",
"\n",
"        markdown += f\"\\n{gRep} | {A.cdli(t, asString=True)} | {line}\"\n",
"\n",
"    markdown += \"\\n\"\n",
"    return markdown"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:48.574655Z",
"start_time": "2018-05-09T16:58:48.567500Z"
}
},
"outputs": [
{
"data": {
"text/markdown": [
"sign | tablet | line\n",
"---- | ------ | ----\n",
"GU7#? | P001705 | 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n",
"GU7#? | P001719 | 1. [...] 1(N14)# 2(N01)# , GU7#? [...] \n",
"GU7 | P001951 | 2. , GU7 \n"
],
"text/plain": [
""
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Markdown(showSigns(gu7s, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A bit more please ..."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:53.264707Z",
"start_time": "2018-05-09T16:58:53.253792Z"
}
},
"outputs": [
{
"data": {
"text/markdown": [
"sign | tablet | line\n",
"---- | ------ | ----\n",
"GU7#? | P001705 | 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] \n",
"GU7#? | P001719 | 1. [...] 1(N14)# 2(N01)# , GU7#? [...] \n",
"GU7 | P001951 | 2. , GU7 \n",
"GU7# | P002002 | 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7# \n",
"GU7# | P002035 | 2. , GU7# \n",
"GU7# | P002062 | 1. [...] , [...] GU7# \n",
"GU7 | P002100 | 1. , [...] GU7 \n",
"GU7# | P002370 | 1. 1(N14) 1(N01) , GU7# [...] \n",
"GU7 | P002510 | 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7 \n",
"GU7 | P002524 | 2. , GU7 \n"
],
"text/plain": [
""
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Markdown(showSigns(gu7s, 10))"
]
},
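{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the lines above contains `|` characters (in `|SILA3~axSZE~a@t|`), and `|` is also the column separator of a Markdown table.\n",
"Here is a minimal, self-contained sketch of building such a table while writing every `|` inside a cell as the HTML entity `&#124;`.\n",
"The helper name `to_markdown_table` is hypothetical and not part of Text-Fabric; the sketch assumes a renderer that decodes HTML entities.\n",
"\n",
"```python\n",
"def to_markdown_table(header, rows):\n",
"    \"\"\"Build a Markdown table; pipes inside cells are written as &#124;.\"\"\"\n",
"\n",
"    def escape(cell):\n",
"        # A literal | inside a cell would be read as a column separator.\n",
"        return str(cell).replace(\"|\", \"&#124;\")\n",
"\n",
"    lines = [\" | \".join(header), \" | \".join(\"----\" for _ in header)]\n",
"    for row in rows:\n",
"        lines.append(\" | \".join(escape(cell) for cell in row))\n",
"    return \"\\n\".join(lines) + \"\\n\"\n",
"\n",
"\n",
"# One row taken from the table above.\n",
"rows = [(\"GU7#\", \"P002002\", \"1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7#\")]\n",
"print(to_markdown_table((\"sign\", \"tablet\", \"line\"), rows))\n",
"```\n",
"\n",
"Escaping per cell, rather than on the finished table, leaves the separators between the columns untouched."
]
},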
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Everything to file\n",
"\n",
"We give you the whole list, in a Markdown file,\n",
"on your local system."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:58:58.614728Z",
"start_time": "2018-05-09T16:58:58.592031Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data written to file /Users/me/text-fabric-data/github/Nino-cunei/uruk/_temp/gu7.md\n"
]
}
],
"source": [
"if not os.path.exists(A.tempDir):\n",
"    os.makedirs(A.tempDir, exist_ok=True)\n",
"with open(f\"{A.tempDir}/gu7.md\", \"w\") as fh:\n",
"    fh.write(showSigns(gu7s))\n",
"\n",
"print(f\"data written to file {A.tempDir}/gu7.md\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Have a look!\n",
"\n",
"**Tip:** Open the file in [Atom](https://github.com/atom/atom).\n",
"Switch to preview with Ctrl+Shift+M (in Atom).\n",
"\n",
"Again, the tablet links are clickable, and bring you straight to CDLI."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Frequency lists\n",
"\n",
"We use a bit more power of Text-Fabric by generating frequency lists.\n",
"\n",
"### Graphemes\n",
"\n",
"We just studied the `GU7` grapheme a bit.\n",
"Suppose we want to get an overview of all graphemes?\n",
"\n",
"There is a generic Text-Fabric function to give us that.\n",
"For each feature you can call up a frequency list of its values."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:01.697516Z",
"start_time": "2018-05-09T16:59:01.586807Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"632"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graphemes = F.grapheme.freqList()\n",
"len(graphemes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We show the top-20:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:02.844691Z",
"start_time": "2018-05-09T16:59:02.835063Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(('…', 29413),\n",
" ('N01', 21645),\n",
" ('', 12440),\n",
" ('X', 6870),\n",
" ('N14', 5898),\n",
" ('EN', 1950),\n",
" ('N34', 1831),\n",
" ('N57', 1826),\n",
" ('SZE', 1334),\n",
" ('GAL', 1180),\n",
" ('DUG', 1084),\n",
" ('U4', 1023),\n",
" ('AN', 1020),\n",
" ('PAP', 876),\n",
" ('SAL', 876),\n",
" ('NUN', 870),\n",
" ('E2', 854),\n",
" ('GI', 850),\n",
" ('BA', 781),\n",
" ('SANGA', 733))"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graphemes[0:20]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**N.B.:**\n",
"\n",
"* empty graphemes: `('', 12440),` These have been inserted by the conversion\n",
" to Text-Fabric inside comments, in order to link them to the tablets.\n",
"* ellipsis graphemes: correspond to the `...` in ATF, usually within an uncertainty\n",
" cluster `[...]`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Augments\n",
"\n",
"We can quickly get an overview of all kinds of augments: primes, variants, modifiers, flags."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Prime\n",
"\n",
"The prime is a feature with values: 2, 1 or 0.\n",
"The number indicates the number of primes.\n",
"Below you see how often each value occurs.\n",
"Note that we count all primes here: on signs, case numbers and column numbers."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:13.953362Z",
"start_time": "2018-05-09T16:59:13.943358Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 5164 x 1\n",
"    1 x 2\n"
]
}
],
"source": [
"for (value, frequency) in F.prime.freqList():\n",
"    print(f\"{frequency:>5} x {value}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Variant\n",
"\n",
"The variant or *allograph* is what occurs after the grapheme and after the `~` symbol, which should be digits and/or\n",
"lowercase letters except the `x`.\n",
"\n",
"Here is the frequency list of variant values."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:16.127330Z",
"start_time": "2018-05-09T16:59:16.087597Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"23162 x a\n",
" 3994 x b\n",
" 1505 x c\n",
" 1308 x a1\n",
"  689 x b1\n",
"  188 x a2\n",
"  183 x d\n",
"  125 x b2\n",
"   85 x f\n",
"   72 x a3\n",
"   42 x e\n",
"   29 x c2\n",
"   22 x c1\n",
"   22 x c3\n",
"   14 x c5\n",
"   12 x a0\n",
"   12 x b3\n",
"   12 x d1\n",
"   12 x v\n",
"   11 x c4\n",
"    6 x a4\n",
"    6 x g\n",
"    5 x d2\n",
"    4 x d4\n",
"    4 x h\n",
"    2 x 3a\n",
"    2 x d3\n",
"    1 x h2\n"
]
}
],
"source": [
"for (value, frequency) in F.variant.freqList():\n",
"    print(f\"{frequency:>5} x {value}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Modifier\n",
"\n",
"The modifier is what occurs after the grapheme and after the `@` symbol.\n",
"It consists of digits and/or\n",
"lowercase letters except the `x`.\n",
"\n",
"Sometimes modifiers occur inside a repeat; in that case we have stored the modifier in the feature\n",
"`modifierInner`, as in\n",
"\n",
"```\n",
"7(N34@f)\n",
"```\n",
"\n",
"Here is the frequency list of *modifier* and `modifierInner` values."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:18.102258Z",
"start_time": "2018-05-09T16:59:18.094434Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"  634 x g\n",
"  262 x t\n",
"   35 x n\n",
"    6 x r\n",
"    4 x s\n",
"    1 x c\n",
"    1 x h\n"
]
}
],
"source": [
"for (value, frequency) in F.modifier.freqList():\n",
"    print(f\"{frequency:>5} x {value}\")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T16:59:18.887962Z",
"start_time": "2018-05-09T16:59:18.880936Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"   25 x f\n",
"    1 x r\n",
"    1 x v\n"
]
}
],
"source": [
"for (value, frequency) in F.modifierInner.freqList():\n",
"    print(f\"{frequency:>5} x {value}\")"
]
},
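{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the roles of repeat, variant, modifier and flags concrete, here is a minimal sketch that splits a simple ATF sign string into its parts.\n",
"It is an illustration only: the pattern `SIGN_RE` and the helper `split_sign` are hypothetical and not part of Text-Fabric.\n",
"The sketch assumes that the only flags are `#` and `?`, and it does not handle primes, quads, or modifiers outside a repeat.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# Hypothetical pattern, for illustration only.\n",
"# Assumed shape: optional repeat \"n(\", grapheme, optional ~variant,\n",
"# optional @modifier, optional \")\", then flags (# and ? only).\n",
"SIGN_RE = re.compile(\n",
"    r\"^(?:(?P<repeat>\\d+)\\()?\"\n",
"    r\"(?P<grapheme>[A-Z][A-Z0-9]*)\"\n",
"    r\"(?:~(?P<variant>[a-wyz0-9]+))?\"\n",
"    r\"(?:@(?P<modifier>[a-wyz0-9]+))?\"\n",
"    r\"\\)?\"\n",
"    r\"(?P<flags>[#?]*)$\"\n",
")\n",
"\n",
"\n",
"def split_sign(atf):\n",
"    \"\"\"Return the non-empty parts of a simple sign, or None if it does not fit.\"\"\"\n",
"    match = SIGN_RE.match(atf)\n",
"    if match is None:\n",
"        return None\n",
"    return {name: value for name, value in match.groupdict().items() if value}\n",
"\n",
"\n",
"# Sign strings that occur in the outputs of this notebook.\n",
"for atf in (\"GI4~a\", \"2(N04)#\", \"7(N34@f)\", \"SZENNUR~a#?\"):\n",
"    print(atf, split_sign(atf))\n",
"```\n",
"\n",
"In the corpus itself you do not need this: the features `grapheme`, `variant`, `modifier` and `modifierInner` already hold these parts."
]
},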
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Full signs\n",
"\n",
"We make a frequency list of all full signs, i.e. the grapheme including variant, modifier, and prime.\n",
"We show them as they appear in transcriptions.\n",
"\n",
"We only deal with instances which are not contained in a quad.\n",
"\n",
"This is no longer the frequency distribution of the values of a single feature,\n",
"so we have to do the coding ourselves."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:03.407005Z",
"start_time": "2018-05-09T17:07:02.774036Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1476"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fullGraphemes = collections.Counter()\n",
"\n",
"for n in F.otype.s(\"sign\"):\n",
"    grapheme = F.grapheme.v(n)\n",
"    if grapheme == \"\" or grapheme == \"…\" or grapheme == \"X\":\n",
"        continue\n",
"    fullGrapheme = A.atfFromSign(n)\n",
"    fullGraphemes[fullGrapheme] += 1\n",
"\n",
"len(fullGraphemes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or with a query:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:12.306930Z",
"start_time": "2018-05-09T17:07:11.540470Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1476"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"\"\"\n",
"sign type=ideograph|numeral\n",
"\"\"\"\n",
"fullGraphemesQ = {A.atfFromSign(r[0]) for r in A.search(query, silent=True)}\n",
"len(fullGraphemesQ)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There! We have counted all incarnations of full graphemes, and there are 1476 distinct ones.\n",
"\n",
"We show the top-20, sorted by frequency.\n",
"\n",
"We specify a `key` function that, given a (value, amount) pair, returns\n",
"(-amount, value).\n",
"This determines the order after sorting. Signs with a high value of amount come\n",
"before signs with a low value."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:45.933397Z",
"start_time": "2018-05-09T17:07:45.918977Z"
},
"lines_to_next_cell": 2
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12983 x 1(N01)\n",
" 3080 x 2(N01)\n",
" 2584 x 1(N14)\n",
" 1830 x EN~a\n",
" 1598 x 3(N01)\n",
" 1357 x 2(N14)\n",
" 1294 x 5(N01)\n",
" 1294 x SZE~a\n",
" 1164 x GAL~a\n",
" 1117 x 4(N01)\n",
" 1022 x U4\n",
" 1020 x AN\n",
"  999 x 1(N34)\n",
"  876 x SAL\n",
"  851 x PAP~a\n",
"  849 x GI\n",
"  791 x 3(N14)\n",
"  789 x 1(N57)\n",
"  781 x BA\n",
"  719 x NUN~a\n"
]
}
],
"source": [
"for (value, frequency) in sorted(\n",
"    fullGraphemes.items(),\n",
"    key=lambda x: (-x[1], x[0]),\n",
")[0:20]:\n",
"    print(f\"{frequency:>5} x {value}\")"
]
},
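{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tie-breaking behaviour of a `(-amount, value)` sort key can be checked on a small, self-contained example.\n",
"This is only a sketch: it uses three counts copied from the list above instead of the corpus, and the names `counts` and `ranked` are made up for the illustration.\n",
"\n",
"```python\n",
"import collections\n",
"\n",
"# Three counts copied from the list above; two of them are tied.\n",
"counts = collections.Counter({\"SZE~a\": 1294, \"5(N01)\": 1294, \"EN~a\": 1830})\n",
"\n",
"# Highest frequency first; equal frequencies in ascending string order.\n",
"ranked = sorted(counts.items(), key=lambda x: (-x[1], x[0]))\n",
"\n",
"for (value, frequency) in ranked:\n",
"    print(f\"{frequency:>5} x {value}\")\n",
"```\n",
"\n",
"Negating the frequency, rather than passing `reverse=True`, keeps the tie-break on the sign in ascending order."
]
},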
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 2
},
"source": [
"### Writing results to file\n",
"\n",
"We also want to write the results to files in your `_temp` directory, within this repo.\n",
"\n",
"`writeFreqs` writes distribution data of data items called `dataName`\n",
"to a file `fileName.txt`.\n",
"In fact, it writes two files:\n",
"* `fileName-alpha.txt`, ordered by data items\n",
"* `fileName-freq.txt`, ordered by frequency."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:56.681487Z",
"start_time": "2018-05-09T17:07:56.674385Z"
}
},
"outputs": [],
"source": [
"def writeFreqs(fileName, data, dataName):\n",
"    print(f\"There are {len(data)} {dataName}s\")\n",
"\n",
"    for (sortName, sortKey) in (\n",
"        (\"alpha\", lambda x: (x[0], -x[1])),\n",
"        (\"freq\", lambda x: (-x[1], x[0])),\n",
"    ):\n",
"        with open(f\"{A.tempDir}/{fileName}-{sortName}.txt\", \"w\") as fh:\n",
"            for (item, freq) in sorted(data, key=sortKey):\n",
"                if item != \"\":\n",
"                    fh.write(f\"{freq:>5} x {item}\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:57.856903Z",
"start_time": "2018-05-09T17:07:57.738368Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 632 bare graphemes\n"
]
}
],
"source": [
"writeFreqs(\"grapheme-plain\", F.grapheme.freqList(), \"bare grapheme\")"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"ExecuteTime": {
"end_time": "2018-05-09T17:07:59.508374Z",
"start_time": "2018-05-09T17:07:59.486970Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 1476 full graphemes\n"
]
}
],
"source": [
"writeFreqs(\"grapheme-full\", fullGraphemes.items(), \"full grapheme\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now have a look at your `_temp` directory (`A.tempDir`) and you see four generated files:\n",
"\n",
"* `grapheme-plain-alpha.txt` (sorted by grapheme)\n",
"* `grapheme-plain-freq.txt` (sorted by frequency)\n",
"* `grapheme-full-alpha.txt` (sorted by grapheme)\n",
"* `grapheme-full-freq.txt` (sorted by frequency)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Next\n",
"\n",
"[quads](quads.ipynb)\n",
"\n",
"*Things never stay simple ...*\n",
"\n",
"All chapters:\n",
"[start](start.ipynb)\n",
"[imagery](imagery.ipynb)\n",
"[steps](steps.ipynb)\n",
"[search](search.ipynb)\n",
"[calc](calc.ipynb)\n",
"**signs**\n",
"[quads](quads.ipynb)\n",
"[jumps](jumps.ipynb)\n",
"[cases](cases.ipynb)\n",
"\n",
"---\n",
"\n",
"CC-BY Dirk Roorda"
]
}
],
"metadata": {
"jupytext": {
"encoding": "# -*- coding: utf-8 -*-"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {
"height": "607px",
"left": "0px",
"right": "983px",
"top": "110px",
"width": "297px"
},
"toc_section_display": "block",
"toc_window_display": false
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}