{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting started\n", "\n", "It is assumed that you have read\n", "[start](start.ipynb)\n", "and followed the installation instructions there." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Corpus\n", "\n", "This is\n", "\n", "* `oldbabylonian` Old Babylonian Letters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# First acquaintance\n", "\n", "We just want to grasp what the corpus is about and how we can find our way in the data.\n", "\n", "Open a terminal or command prompt and say one of the following\n", "\n", "```text-fabric oldbabylonian```\n", "\n", "Wait and see a lot happening before your browser starts up and shows you an interface on the corpus:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Text-Fabric needs an app to deal with the corpus-specific things.\n", "It downloads/finds/caches the latest version of the **app**:\n", "\n", "```\n", "Using TF-app in /Users/dirk/text-fabric-data/annotation/app-oldbabylonian/code:\n", "\trv0.2=#4bb2530bfb94dc93601f8b3df7722cb0e5df7a43 (latest release)\n", "```\n", "\n", "It downloads/finds/caches the latest version of the **data**:\n", "\n", "```\n", "Using data in /Users/dirk/text-fabric-data/Nino-cunei/oldbabylonian/tf/1.0.4:\n", "\trv1.4=#43c36d148794e3feeb3dd39e105ce6a4df79c467 (latest release)\n", "```\n", "\n", "The data is preprocessed in order to speed up typical Text-Fabric operations.\n", "The result is cached on your computer.\n", "Preprocessing costs time. 
Next time you use this corpus on this machine, startup is much quicker.\n", "\n", "```\n", "TF setup done.\n", "```\n", "\n", "Then the app acts as a local webserver that serves the corpus that has just been downloaded,\n", "and it opens your browser and loads the corpus page:\n", "\n", "```\n", " * Running on http://localhost:8106/ (Press CTRL+C to quit)\n", "Opening oldbabylonian in browser\n", "Listening at port 18986\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Help!\n", "\n", "Indeed, that is what you need. Click the vertical `Help` tab.\n", "\n", "From there, click around a little bit. Don't read closely, just note the kinds of information that are presented to you.\n", "\n", "Later on, it will make more sense!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Browsing\n", "\n", "First we browse our data. Click the browse button\n", "\n", "\n", "\n", "and then, in the table of *documents* (tablets), click on `obverse`.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you're looking at one side of a tablet: the marks in an ASCII transcription.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now click the *Options* tab and select the `layout-orig-unicode` format to see the same tablet in cuneiform signs.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can click a triangle to see how a line is broken down:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching\n", "\n", "See that line, which starts with the word `um-ma` and whose last word ends in the sign `ma`?\n", "\n", "That is a pattern. 
Let's search for it.\n", "\n", "Enter this query in the search pad and press the search icon above it.\n", "\n", "```\n", "line\n", "  =: word\n", "    =: sign reading=um\n", "    <: sign reading=ma\n", "    :=\n", "  < sign reading=ma\n", "  :=\n", "```\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In English:\n", "\n", "find all `line`s that contain a `word` and a third `sign`, where:\n", "\n", "* `=:` the `word` starts where the `line` starts\n", "* the `word` contains a first `sign` and a second `sign` where:\n", "    * `=:` the first `sign` starts where the `word` starts\n", "    * `<:` the second `sign` follows the first `sign` immediately\n", "    * `:=` the second `sign` ends where the `word` ends\n", "* `<` the third `sign` comes after the `word`\n", "* `:=` the third `sign` ends where the `line` ends\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can expand results by clicking the triangle.\n", "\n", "You can see the result in context by clicking the browse icon.\n", "\n", "You can go back to the result list by clicking the results icon.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Computing\n", "\n", "We see that this line comes at the start of a tablet.\n", "\n", "In fact, this pattern corresponds to a heading of a letter.\n", "\n", "Question: of all 1274 results, how many are on the first line, the second line, the third line, etc.?\n", "\n", "*This is a typical question where you want to leave the search mode and enter computing mode*.\n", "\n", "Let's do that!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have followed the installation instructions, you are set.\n", "Go to the browser window that opened when you gave the command `jupyter notebook` in your terminal.\n", "\n", "Then continue reading, and, ... executing.\n", "\n", "You can execute a cell by putting your cursor inside it and pressing `Shift Enter`."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we load the Text-Fabric module, as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import collections\n", "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we load the TF-app for the corpus `oldbabylonian` and that app loads the corpus data.\n", "\n", "We give a name to the result of all that loading: `A`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-oldbabylonian/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/Nino-cunei/oldbabylonian/tf/1.0.5" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.0, app-oldbabylonian, Search Reference
Data: OLDBABYLONIAN, Character table, Feature docs
Features:
Old Babylonian Letters 1900-1600: Cuneiform tablets ARK
after
afterr
afteru
atf
atfpost
atfpre
author
col
collated
collection
comment
damage
det
docnote
docnumber
excavation
excised
face
flags
fraction
genre
grapheme
graphemer
graphemeu
lang
langalt
ln
lnc
lnno
material
missing
museumcode
museumname
object
operator
operatorr
operatoru
otype
period
pnumber
primecol
primeln
pubdate
question
reading
readingr
readingu
remarkable
remarks
repeat
srcLn
srcLnNum
srcfile
subgenre
supplied
sym
symr
symu
trans
transcriber
translation@ll
type
uncertain
volume
oslots
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"oldbabylonian:clone\", checkout=\"clone\", hoist=globals())\n", "# A = use('oldbabylonian', hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some bits are familiar from above, when you ran the `text-fabric` command in the terminal.\n", "\n", "Other bits are links to the documentation, they point to the same places as the links on the Text-Fabric browser.\n", "\n", "You see a list of all the data features that have been loaded.\n", "\n", "And a list of references to the API documentation, which tells you how you can use this data in your program statements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Searching (revisited)\n", "\n", "We do the same search again, but now inside our program.\n", "\n", "That means that we can capture the results in a list for further processing." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.70s 1274 results\n" ] } ], "source": [ "results = A.search(\n", " \"\"\"\n", "line\n", " =: word\n", " =: sign reading=um\n", " <: sign reading=ma\n", " :=\n", " < sign reading=ma\n", " :=\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In less than a second, we have all the results!\n", "\n", "Let's look at the first one:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(230790, 258166, 11, 12, 20)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each result is a list of numbers: for a\n", "\n", "1. line\n", "1. word\n", "1. sign\n", "1. sign\n", "1. 
sign\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the second one:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(230826, 258317, 359, 360, 366)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here the last one:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(258128, 334552, 202886, 202887, 202894)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to find out something for each result line: which line number does it have among the lines on the same tablet face?\n", "\n", "Click the link `Feature docs` above, and read a bit under **Node type line**.\n", "\n", "There you see that the feature `ln` is of particular interest to us.\n", "\n", "First we get the line number of result 1000:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "252681\n", "3\n" ] } ], "source": [ "node = results[999][0]\n", "print(node)\n", "lineNumber = F.ln.v(node)\n", "print(lineNumber)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we collect the set of all line numbers that our result lines have:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 31}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{F.ln.v(result[0]) for result in results}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we really want to know is how the result lines are distributed over the line numbers." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({3: 834, 2: 110, 4: 102, 6: 42, 7: 37, 5: 33, 8: 31, 9: 16, 10: 13, 1: 11, 12: 9, 11: 8, 13: 8, 16: 5, 15: 4, 14: 4, 20: 2, 17: 2, 31: 1, 19: 1, 21: 1})\n" ] } ], "source": [ "distribution = collections.Counter()\n", "\n", "for result in results:\n", "    lineNumber = F.ln.v(result[0])\n", "    distribution[lineNumber] += 1\n", "\n", "print(distribution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A clear majority, 834 of the 1274 results, is on line 3.\n", "\n", "Let's make the output a bit more friendly:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "line  1 is home to  11 results\n", "line  2 is home to 110 results\n", "line  3 is home to 834 results\n", "line  4 is home to 102 results\n", "line  5 is home to  33 results\n", "line  6 is home to  42 results\n", "line  7 is home to  37 results\n", "line  8 is home to  31 results\n", "line  9 is home to  16 results\n", "line 10 is home to  13 results\n", "line 11 is home to   8 results\n", "line 12 is home to   9 results\n", "line 13 is home to   8 results\n", "line 14 is home to   4 results\n", "line 15 is home to   4 results\n", "line 16 is home to   5 results\n", "line 17 is home to   2 results\n", "line 19 is home to   1 results\n", "line 20 is home to   2 results\n", "line 21 is home to   1 results\n", "line 31 is home to   1 results\n" ] } ], "source": [ "for (lineNumber, amount) in sorted(distribution.items()):\n", "    print(f\"line {lineNumber:>2} is home to {amount:>3} results\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now inspect more closely what is going on, for example where results appear late in the tablet, after line 16:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.67s 7 results\n" ] } ], 
"source": [ "results16 = A.search(\n", " \"\"\"\n", "line ln>16\n", " =: word\n", " =: sign reading=um\n", " <: sign reading=ma\n", " :=\n", " < sign reading=ma\n", " :=\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can show them here too:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
| n | p | line | word | sign | sign | sign |
|---|---|---|---|---|---|---|
| 1 | P365130 obverse:20 | um-ma a-ma-na-nu-um-ma | um-ma | um- | ma | ma |
| 2 | P479269 obverse:20 | um-ma szu-ma | um-ma | um- | ma | ma |
| 3 | P479269 obverse:31 | um-ma szu-ma | um-ma | um- | ma | ma |
| 4 | P387306 obverse:19 | um-ma at-ta-a-ma | um-ma | um- | ma | ma |
| 5 | P387324 obverse:17 | um-ma at!-ta-ma# | um-ma | um- | ma | ma# |
| 6 | P372422 obverse:17 | um-ma _sag-geme2_-ma | um-ma | um- | ma | ma |
| 7 | P372422 obverse:21 | um-ma szu-u2-ma | um-ma | um- | ma | ma |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But at this point it might be easier to take the new query back to the Text-Fabric browser and query it there:\n", "\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }