{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial\n", "\n", "This notebook gets you started with using\n", "[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the Old-Assyrian Letter corpus (cuneiform).\n", "\n", "Familiarity with the underlying\n", "[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)\n", "is recommended." ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## Installing Text-Fabric\n", "\n", "See the [installation instructions](https://annotation.github.io/text-fabric/tf/about/install.html)." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Tip\n", "If you start computing with this tutorial, first copy its parent directory to somewhere else,\n", "outside your repository.\n", "If you pull changes from the repository later, your work will not be overwritten.\n", "Where you put your tutorial directory is up to you.\n", "It will work from any directory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Old Assyrian data\n", "\n", "Text-Fabric will fetch the data set for you from the binaries of the newest GitHub release.\n", "\n", "The data will be stored in the `text-fabric-data` directory in your home directory."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Features\n", "The data of the corpus is organized in features.\n", "They are *columns* of data.\n", "Think of the corpus as a gigantic spreadsheet, where row 1 corresponds to the\n", "first sign, row 2 to the second sign, and so on, for all 766,000 signs.\n", "\n", "The reading of each sign constitutes one column in that spreadsheet.\n", "The Old Assyrian corpus contains over 60 columns, not only for the signs but also for thousands of other\n", "textual objects, such as clusters, lines, columns, faces, and documents.\n", "\n", "Instead of putting that information in one big table, the data is organized in separate columns.\n", "We call those columns **features**." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:16.202764Z", "start_time": "2018-05-18T09:17:16.197546Z" } }, "outputs": [], "source": [ "import os\n", "import collections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Incantation\n", "\n", "The simplest way to get going is this *incantation*:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:17.537171Z", "start_time": "2018-05-18T09:17:17.517809Z" } }, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the very latest version, use `hot`.\n", "\n", "For the latest release, use `latest`.\n", "\n", "If you have cloned the repos (TF app and data), use `clone`.\n", "\n", "If you do not want or need to upgrade, leave out the checkout specifiers."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/Nino-cunei/oldassyrian/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/Nino-cunei/oldassyrian/tf/0.1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 9.2.2\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "67 features found and 0 ignored\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 9.2.2, Nino-cunei/oldassyrian/app v3, Search Reference
Data: OLDASSYRIAN, Character table, Feature docs
Features:
\n", "
Old Assyrian Documents 2000-1600: Cuneiform tablets\n", "
\n", "\n", "
\n", "
\n", "ARK\n", "
\n", "
str
\n", "
\n", " persistent identifier of type ARK from metadata field \"UCLA Library ARK\"\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:37Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "
\n", " what comes after a sign or word (- or space)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "afterr\n", "
\n", "
str
\n", "
\n", " what comes after a sign or word (- or space); between adjacent signs a ␣ is inserted\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "afteru\n", "
\n", "
str
\n", "
\n", " what comes after a sign when represented as unicode (space)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "atf\n", "
\n", "
str
\n", "
\n", " full atf of a sign (without cluster chars) or word (including cluster chars)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "atfpost\n", "
\n", "
str
\n", "
\n", " atf of cluster closings at sign\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "atfpre\n", "
\n", "
str
\n", "
\n", " atf of cluster openings at sign\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "
\n", " author from metadata field \"Author(s)\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "col\n", "
\n", "
int
\n", "
\n", " ATF column number\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "collection\n", "
\n", "
str
\n", "
\n", " collection of a document\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "comment\n", "
\n", "
str
\n", "
\n", " $ comment to line or inline comment to slot ($ and $)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "damage\n", "
\n", "
int
\n", "
\n", " whether a sign is damaged\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
int
\n", "
\n", " whether a sign is a determinative gloss - between braces { }\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "docnote\n", "
\n", "
str
\n", "
\n", " additional remarks in the document identification\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "docnumber\n", "
\n", "
str
\n", "
\n", " number of a document within a collection-volume\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "excavation\n", "
\n", "
str
\n", "
\n", " excavation number from metadata field \"Excavation no.\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "excised\n", "
\n", "
int
\n", "
\n", " whether a sign is excised - between double angle brackets << >>\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "face\n", "
\n", "
str
\n", "
\n", " full name of a face including the enclosing object\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "flags\n", "
\n", "
str
\n", "
\n", " sequence of flags after a sign\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fraction\n", "
\n", "
str
\n", "
\n", " fraction of a numeral\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "genre\n", "
\n", "
str
\n", "
\n", " genre from metadata field \"Genre\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "grapheme\n", "
\n", "
str
\n", "
\n", " grapheme of a sign\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "graphemer\n", "
\n", "
str
\n", "
\n", " grapheme of a sign using non-ascii characters\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "graphemeu\n", "
\n", "
str
\n", "
\n", " grapheme of a sign using cuneiform unicode characters\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "
\n", " language of a document\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "langalt\n", "
\n", "
int
\n", "
\n", " 1 if a sign is in the alternate language (i.e. Sumerian) - between underscores _ _\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
int
\n", "
\n", " ATF line number of a numbered line, without prime\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lnc\n", "
\n", "
str
\n", "
\n", " ATF line identification of a comment line ($)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lnno\n", "
\n", "
str
\n", "
\n", " ATF line number, may be $ or #, with prime; column number prepended\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "material\n", "
\n", "
str
\n", "
\n", " material indication from metadata field \"Material\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "missing\n", "
\n", "
int
\n", "
\n", " whether a sign is missing - between square brackets [ ]\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "museumcode\n", "
\n", "
str
\n", "
\n", " museum code from metadata field \"Museum no.\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "museumname\n", "
\n", "
str
\n", "
\n", " museum name from metadata field \"Collection\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "object\n", "
\n", "
str
\n", "
\n", " name of an object of a document\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "operator\n", "
\n", "
str
\n", "
\n", " the ! or x in a !() or x() construction\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "operatorr\n", "
\n", "
str
\n", "
\n", " the ! or x in a !() or x() construction, represented as =, ␣\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "operatoru\n", "
\n", "
str
\n", "
\n", " the ! or x in a !() or x() construction, represented as =, ␣\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "
\n", " \n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "period\n", "
\n", "
str
\n", "
\n", " period indication from metadata field \"Period\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "pnumber\n", "
\n", "
str
\n", "
\n", " P number of a document\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "primecol\n", "
\n", "
int
\n", "
\n", " whether a prime is present on a column number\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "primeln\n", "
\n", "
int
\n", "
\n", " whether a prime is present on a line number\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "pubdate\n", "
\n", "
str
\n", "
\n", " publication date from metadata field \"Publication date\"\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "question\n", "
\n", "
int
\n", "
\n", " whether a sign has the question flag (?)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "reading\n", "
\n", "
str
\n", "
\n", " reading of a sign\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "readingr\n", "
\n", "
str
\n", "
\n", " reading of a sign using non-ascii characters\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "readingu\n", "
\n", "
str
\n", "
\n", " reading of a sign using cuneiform unicode characters\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "remarkable\n", "
\n", "
int
\n", "
\n", " whether a sign is remarkable (!)\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "remarks\n", "
\n", "
str
\n", "
\n", " # comment to line\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "repeat\n", "
\n", "
int
\n", "
\n", " repeat of a numeral; the value n (unknown) is represented as -1\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "srcLn\n", "
\n", "
str
\n", "
\n", " full line in source file\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "srcLnNum\n", "
\n", "
int
\n", "
\n", " line number in source file\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "srcfile\n", "
\n", "
str
\n", "
\n", " source file name of a document\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "subgenre\n", "
\n", "
str
\n", "
\n", " genre from metadata field \"Sub-genre\"\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:45Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "supplied\n", "
\n", "
int
\n", "
\n", " whether a sign is supplied - between angle brackets < >\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:45Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sym\n", "
\n", "
str
\n", "
\n", " essential part of a sign or of a word\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:45Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "symr\n", "
\n", "
str
\n", "
\n", " essential part of a sign or of a word using non-ascii characters\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:46Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "symu\n", "
\n", "
str
\n", "
\n", " essential part of a sign or of a word using cuneiform unicode characters\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:48Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
int
\n", "
\n", " whether a line has a translation\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:49Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "transcriber\n", "
\n", "
str
\n", "
\n", " person who did the encoding into ATF from metadata field \"ATF source\"\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:49Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "translation@ll\n", "
\n", "
str
\n", "
\n", " translation of line in language en = English\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:49Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "
\n", " name of a type of cluster or kind of sign\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:49Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "uncertain\n", "
\n", "
int
\n", "
\n", " whether a sign is uncertain - between brackets ( )\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:50Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "version\n", "
\n", "
str
\n", "
\n", " version from meta data line\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:50Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "volume\n", "
\n", "
int
\n", "
\n", " volume of a document within a collection\n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:50Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "
\n", " \n", "
\n", "\n", "
\n", "
converters:
\n", "
Alba de Ridder, Martijn Kokken, Cale Johnson, Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-26T08:32:50Z
\n", "
\n", "\n", "
\n", "
editor:
\n", "
various
\n", "
\n", "\n", "
\n", "
institute:
\n", "
CDL
\n", "
\n", "\n", "
\n", "
name:
\n", "
Old Assyrian Documents
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"Nino-cunei/oldassyrian\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see which features have been loaded, and if you click on a feature name, you find its documentation.\n", "If you hover over a name, you see where the feature is located on your system." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API\n", "\n", "The result of the incantation is that we have a bunch of special variables at our disposal\n", "that give us access to the text and data of the corpus.\n", "\n", "At this point it is helpful to throw a quick glance at the Text-Fabric API documentation\n", "(see the links under **API Members** above).\n", "\n", "The most essential thing for now is that we can use `F` to access the data in the features\n", "we've loaded.\n", "But there is more, such as `N`, which helps us to walk over the text, as we will see in a minute.\n", "\n", "The **API members** above show you exactly which new names have been inserted in your namespace.\n", "If you click on these names, you go to the API documentation for them."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search\n", "Text-Fabric contains a flexible search engine that works not only for the data\n", "of this corpus, but also for other corpora and for data that you add to corpora.\n", "\n", "**Search is the quickest way to come up to speed with your data, without too much programming.**\n", "\n", "Jump to the dedicated [search](search.ipynb) tutorial first, to whet your appetite.\n", "\n", "The real power of search lies in the fact that it is integrated in a programming environment.\n", "You can use programming to:\n", "\n", "* compose dynamic queries\n", "* process query results\n", "\n", "Therefore, the rest of this tutorial is still important when you want to tap that power.\n", "If you continue here, you learn all the basics of data navigation with Text-Fabric." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Counting\n", "\n", "In order to get acquainted with the data, we start with the simple task of counting.\n", "\n", "## Count all nodes\n", "We use the\n", "[`N.walk()` generator](https://annotation.github.io/text-fabric/tf/core/nodes.html#tf.core.nodes.Nodes.walk)\n", "to walk through the nodes.\n", "\n", "We compared the TF data to a gigantic spreadsheet, where the rows correspond to the signs.\n", "In Text-Fabric, we call the rows `slots`, because they are the textual positions that can be filled with signs.\n", "\n", "We also mentioned that there are other textual objects.\n", "They are the clusters, words, lines, faces and documents.\n", "They also correspond to rows in the big spreadsheet.\n", "\n", "In Text-Fabric we call all these rows *nodes*, and the `N.walk()` generator\n", "carries us through those nodes in the textual order.\n", "\n", "Just one extra thing: the `info` statements generate timed messages.\n", "If you use them instead of `print`, you'll get a sense of the amount of time that\n", "the various processing steps typically need."
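Here is a minimal sketch of the spreadsheet picture in plain Python, using a made-up mini-corpus. The dictionaries `otype` and `reading` and the helper `walk()` are hypothetical stand-ins, not Text-Fabric objects. Slots get the lowest node numbers, and the other objects get the numbers after them. A feature is a column that does not need a value for every node.

```python
# Hypothetical mini-corpus: slots 1-5 are signs; nodes 6-8 are other textual objects.
# Each dict plays the role of one feature column, keyed by node number.
otype = {
    1: "sign", 2: "sign", 3: "sign", 4: "sign", 5: "sign",
    6: "word", 7: "word", 8: "line",
}
reading = {1: "a", 2: "na", 3: "ku3", 4: "babbar", 5: "x"}  # only signs have a value


def walk(max_node):
    """Yield every node number once (plain numeric order, not TF's textual order)."""
    yield from range(1, max_node + 1)


nodes = list(walk(max(otype)))
slots = [n for n in nodes if otype[n] == "sign"]
print(f"{len(nodes)} nodes, of which {len(slots)} are slots")
print(f"{len(reading)} nodes have a value for the reading column")
```

Walking and counting gives the total number of nodes. In the real corpus, `N.walk()` does the same job, but it visits the nodes in textual order.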
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:43.894153Z", "start_time": "2018-05-18T09:17:43.597128Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Counting nodes ...\n", " 0.16s 1289143 nodes\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Counting nodes ...\")\n", "\n", "i = 0\n", "for n in N.walk():\n", " i += 1\n", "\n", "A.info(\"{} nodes\".format(i))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here you see it: well over a million nodes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are those nodes?\n", "Every node has a type, like sign, line, or face.\n", "But what exactly are they?\n", "\n", "Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.\n", "`otype` tells you the type of each node, and you can ask it for the number of `slot`s in the text.\n", "\n", "Here we go!" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:47.820323Z", "start_time": "2018-05-18T09:17:47.812328Z" } }, "outputs": [ { "data": { "text/plain": [ "'sign'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.slotType" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:48.549430Z", "start_time": "2018-05-18T09:17:48.543371Z" } }, "outputs": [ { "data": { "text/plain": [ "766501" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxSlot" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:49.251302Z", "start_time": "2018-05-18T09:17:49.244467Z" } }, "outputs": [ { "data": { "text/plain": [ "1289143" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxNode" ] }, 
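The slots are numbered 1 up to `maxSlot`, and all other nodes come after that, up to `maxNode`. That means you can tell slots from other nodes by their number alone. The small sketch below hard-codes the two numbers from the outputs above instead of reading them from `F.otype`. `is_slot()` is a hypothetical helper.

```python
# Values copied from the outputs of F.otype.maxSlot and F.otype.maxNode above.
MAX_SLOT = 766501
MAX_NODE = 1289143


def is_slot(n):
    """A node is a slot (in this corpus: a sign) if its number lies in 1..MAX_SLOT."""
    return 1 <= n <= MAX_SLOT


non_slot_nodes = MAX_NODE - MAX_SLOT  # documents, faces, lines, words, clusters
print(non_slot_nodes)
print(is_slot(3), is_slot(848587))
```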
{ "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:49.922863Z", "start_time": "2018-05-18T09:17:49.916078Z" } }, "outputs": [ { "data": { "text/plain": [ "('document', 'face', 'line', 'word', 'cluster', 'sign')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.all" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:51.782779Z", "start_time": "2018-05-18T09:17:51.774167Z" } }, "outputs": [ { "data": { "text/plain": [ "(('document', 160.52376963350784, 848587, 853361),\n", " ('face', 64.35768261964735, 853362, 865271),\n", " ('line', 6.977061714909885, 865272, 975131),\n", " ('word', 2.374915608320701, 975132, 1289143),\n", " ('cluster', 1.7090576841079368, 766502, 848586),\n", " ('sign', 1, 1, 766501))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.levels.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is interesting: above you see all the node types, with the average size of their objects (in slots),\n", "the node where they start, and the node where they end." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Count individual object types\n", "This is an intuitive but inefficient way to count the number of nodes of each type.\n", "Note in passing how we use `indent` in conjunction with `info` to produce neat timed\n", "and indented progress messages."
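For each node type, `C.levels.data` above lists the average size in slots and the first and last node. When the nodes of each type occupy one contiguous range of numbers, as they do here, the number of nodes of a type is `last - first + 1`. The type of any node can then be found by checking which range contains it. The sketch below applies both ideas to the tuples copied from that output. `count_per_type()` and `type_of()` are hypothetical helpers, not part of the `C` API.

```python
# (type, average size in slots, first node, last node), copied from C.levels.data above
LEVELS = (
    ("document", 160.52376963350784, 848587, 853361),
    ("face", 64.35768261964735, 853362, 865271),
    ("line", 6.977061714909885, 865272, 975131),
    ("word", 2.374915608320701, 975132, 1289143),
    ("cluster", 1.7090576841079368, 766502, 848586),
    ("sign", 1, 1, 766501),
)


def count_per_type(levels):
    """Number of nodes per type, assuming each type occupies a contiguous node range."""
    return {otype: last - first + 1 for (otype, _, first, last) in levels}


def type_of(node, levels):
    """Return the type whose node range contains node, or None if there is none."""
    for (otype, _, first, last) in levels:
        if first <= node <= last:
            return otype
    return None


counts = count_per_type(LEVELS)
print(counts["document"], counts["sign"])
print(type_of(865272, LEVELS))
```

The counts computed this way agree with the counts printed by the cells below.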
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:57.806821Z", "start_time": "2018-05-18T09:17:57.558523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting objects ...\n", " | 0.00s 4775 documents\n", " | 0.00s 11910 faces\n", " | 0.01s 109860 lines\n", " | 0.03s 314012 words\n", " | 0.01s 82085 clusters\n", " | 0.07s 766501 signs\n", " 0.13s Done\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"counting objects ...\")\n", "\n", "for otype in F.otype.all:\n", " i = 0\n", "\n", " A.indent(level=1, reset=True)\n", "\n", " for n in F.otype.s(otype):\n", " i += 1\n", "\n", " A.info(\"{:>7} {}s\".format(i, otype))\n", "\n", "A.indent(level=0)\n", "A.info(\"Done\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Much more efficient is:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:57.806821Z", "start_time": "2018-05-18T09:17:57.558523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting objects ...\n", " | 0.00s 4775 documents\n", " | 0.00s 11910 faces\n", " | 0.00s 109860 lines\n", " | 0.00s 314012 words\n", " | 0.00s 82085 clusters\n", " | 0.00s 766501 signs\n", " 0.00s Done\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"counting objects ...\")\n", "\n", "for otype in F.otype.all:\n", " i = 0\n", "\n", " A.indent(level=1, reset=True)\n", " amount = len(F.otype.s(otype))\n", " A.info(f\"{amount:>7} {otype}s\")\n", "\n", "A.indent(level=0)\n", "A.info(\"Done\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But nothing beats this:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4775 documents\n", " 11910 faces\n", "109860 lines\n", "314012 words\n", " 82085 clusters\n", "766501 signs\n" ] } ], "source": [ "for lv in 
C.levels.data:\n", " print(f\"{lv[3] - lv[2] + 1:>6} {lv[0]}s\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Viewing textual objects\n", "\n", "You can use the A API (the extra power) to display cuneiform text.\n", "\n", "See the [display](display.ipynb) tutorial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature statistics\n", "\n", "`F`\n", "gives access to all features.\n", "Every feature has a method\n", "`freqList()`\n", "to generate a frequency list of its values, higher frequencies first.\n", "Here are the repeats of numerals (the `-1` comes from an `n(rrr)`):" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "((1, 9533),\n", " (2, 4139),\n", " (5, 2975),\n", " (3, 2448),\n", " (4, 1787),\n", " (6, 1236),\n", " (7, 987),\n", " (8, 756),\n", " (9, 395),\n", " (-1, 2))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.repeat.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Signs have types and clusters have types. 
We can count them separately:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('langalt', 47511),\n", " ('missing', 28852),\n", " ('det', 3823),\n", " ('supplied', 1281),\n", " ('excised', 349),\n", " ('uncertain', 269))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList(\"cluster\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('reading', 691349),\n", " ('numeral', 30256),\n", " ('wdiv', 14395),\n", " ('ellipsis', 12937),\n", " ('unknown', 10808),\n", " ('commentline', 6345),\n", " ('grapheme', 305),\n", " ('complex', 43),\n", " ('empty', 30),\n", " ('comment', 19),\n", " ('other', 5))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList(\"sign\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note in passing that we have nearly 15,000 word dividers!" 
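Conceptually, `freqList()` tallies the values of a feature, optionally only for nodes of one type, and sorts them by descending frequency. Here is a minimal sketch of that idea using `collections.Counter`. The feature data below is invented. Breaking ties alphabetically by value is an assumption of this sketch, and `freqList()` may not do the same.

```python
from collections import Counter

# Hypothetical data: a "type" feature (node -> value) and the node types (node -> type).
type_values = {
    1: "reading", 2: "reading", 3: "numeral", 4: "wdiv", 5: "reading",
    10: "missing", 11: "det", 12: "missing",
}
node_types = {
    1: "sign", 2: "sign", 3: "sign", 4: "sign", 5: "sign",
    10: "cluster", 11: "cluster", 12: "cluster",
}


def freq_list(feature, types, node_type=None):
    """(value, count) pairs, highest count first; ties broken by value (an assumption)."""
    counts = Counter(
        value
        for (node, value) in feature.items()
        if node_type is None or types[node] == node_type
    )
    return tuple(sorted(counts.items(), key=lambda vc: (-vc[1], vc[0])))


print(freq_list(type_values, node_types, "sign"))
print(freq_list(type_values, node_types, "cluster"))
```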
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the flags:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('#', 12950),\n", " ('?', 2875),\n", " ('!', 2141),\n", " ('#?', 684),\n", " ('#!', 127),\n", " ('!?', 99),\n", " ('?!', 37),\n", " ('#!?', 21),\n", " ('?#!', 6),\n", " ('?#', 3),\n", " ('#?!', 1))" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.flags.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Word matters\n", "\n", "## Top 20 most frequent words\n", "\n", "We represent words by their essential symbols, collected in the feature *sym* (which also exists for signs)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12421 a-na\n", "11623 sza\n", "10191 ...\n", " 9683 ku3-babbar\n", " 8560 ma-na\n", " 7809 x\n", " 5849 dumu\n", " 5788 u3\n", " 5717 gin2\n", " 4899 1(disz)\n", " 4360 i-na\n", " 4042 um-ma\n", " 3922 la2\n", " 3839 1(u)\n", " 3726 igi\n", " 2903 u2\n", " 2885 tug2\n", " 2864 2(disz)\n", " 2814 1/2(disz)\n", " 2592 5(disz)\n" ] } ], "source": [ "for (w, amount) in F.sym.freqList(\"word\")[0:20]:\n", " print(f\"{amount:>5} {w}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Word distribution\n", "\n", "Let's do a bit more fancy word stuff.\n", "\n", "### Hapaxes\n", "\n", "A hapax is a word that occurs only once, so we find the hapaxes by picking the words with frequency 1.\n", "\n", "We print 20 hapaxes."
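In plain Python, the same recipe is a count of word forms followed by a filter on frequency 1. The word list below is a made-up sample and is not drawn from the corpus.

```python
from collections import Counter

# Hypothetical sample of word forms (think: values of sym on word nodes).
words = ["a-na", "sza", "a-na", "ku3-babbar", "um-ma", "sza", "i-na", "a-na"]

freqs = Counter(words)
hapaxes = sorted(w for (w, n) in freqs.items() if n == 1)
print(hapaxes)
```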
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"&i2-li-esz18-dar\"\n", "\"...+3(disz)\"\n", "\"...-a-...\"\n", "\"...-a-ar\"\n", "\"...-a-at\"\n", "\"...-a-hi\"\n", "\"...-a-ku-nu-ti2\"\n", "\"...-a-kum\"\n", "\"...-a-li\"\n", "\"...-a-na\"\n", "\"...-a-s,a\"\n", "\"...-a-szur-ma\"\n", "\"...-a-ta\"\n", "\"...-a-tam2\"\n", "\"...-a-ti2\"\n", "\"...-a-ti2-szu\"\n", "\"...-a-wa-tim\"\n", "\"...-ab\"\n", "\"...-ab2-ti2-ka3\"\n", "\"...-ad-ma\"\n" ] } ], "source": [ "for w in [w for (w, amount) in F.sym.freqList(\"word\") if amount == 1][0:20]:\n", " print(f'\"{w}\"')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Small occurrence base\n", "\n", "The occurrence base of a word is the set of documents in which it occurs.\n", "\n", "We compute the occurrence base of each word." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "occurrenceBase = collections.defaultdict(set)\n", "\n", "for w in F.otype.s(\"word\"):\n", " pNum = T.sectionFromNode(w)[0]\n", " occurrenceBase[F.sym.v(w)].add(pNum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An overview of how many words have an occurrence base of a given size:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "base size 1 : 19660 words\n", "base size 2 : 3912 words\n", "base size 3 : 1765 words\n", "base size 4 : 1042 words\n", "base size 5 : 650 words\n", "base size 6 : 481 words\n", "base size 7 : 328 words\n", "base size 8 : 279 words\n", "base size 9 : 219 words\n", "base size 10 : 185 words\n", "...\n", "base size 1995 : 1 words\n", "base size 2023 : 1 words\n", "base size 2055 : 1 words\n", "base size 2159 : 1 words\n", "base size 2168 : 1 words\n", "base size 2337 : 1 words\n", "base size 2824 : 1 words\n", "base size 3148 : 1 words\n", "base size 3586 : 1 words\n", "base size 3737 : 1 
words\n" ] } ], "source": [ "occurrenceSize = collections.Counter()\n", "\n", "for (w, pNums) in occurrenceBase.items():\n", " occurrenceSize[len(pNums)] += 1\n", "\n", "occurrenceSize = sorted(\n", " occurrenceSize.items(),\n", " key=lambda x: (-x[1], x[0]),\n", ")\n", "\n", "for (size, amount) in occurrenceSize[0:10]:\n", " print(f\"base size {size:>4} : {amount:>5} words\")\n", "print(\"...\")\n", "for (size, amount) in occurrenceSize[-10:]:\n", " print(f\"base size {size:>4} : {amount:>5} words\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give the predicate *private* to those words whose occurrence base is a single document." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "19660" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "privates = {w for (w, base) in occurrenceBase.items() if len(base) == 1}\n", "len(privates)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Peculiarity of documents\n", "\n", "As a final exercise with words, let's make a list of all documents, and show their\n", "\n", "* number of distinct words\n", "* number of distinct private words\n", "* the percentage of private words: a measure of the peculiarity of the document" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:52.143337Z", "start_time": "2018-05-18T09:18:52.130385Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 17 empty documents\n", "Found 813 ordinary documents (i.e. 
without private words)\n" ] } ], "source": [ "docList = []\n", "\n", "empty = set()\n", "ordinary = set()\n", "\n", "for d in F.otype.s(\"document\"):\n", " pNum = T.documentName(d)\n", " words = {F.sym.v(w) for w in L.d(d, otype=\"word\")}\n", " a = len(words)\n", " if not a:\n", " empty.add(pNum)\n", " continue\n", " o = len({w for w in words if w in privates})\n", " if not o:\n", " ordinary.add(pNum)\n", " continue\n", " p = 100 * o / a\n", " docList.append((pNum, a, o, p))\n", "\n", "docList = sorted(docList, key=lambda e: (-e[3], -e[1], e[0]))\n", "\n", "print(f\"Found {len(empty):>4} empty documents\")\n", "print(f\"Found {len(ordinary):>4} ordinary documents (i.e. without private words)\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:52.143337Z", "start_time": "2018-05-18T09:18:52.130385Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "document #all #own %own\n", "-----------------------------------\n", "P360785 5 5 100.0%\n", "P358235 1 1 100.0%\n", "P360454 1 1 100.0%\n", "P360852 12 11 91.7%\n", "P465862 15 11 73.3%\n", "P360830 19 13 68.4%\n", "P361545 78 52 66.7%\n", "P360142 15 10 66.7%\n", "P357757 78 50 64.1%\n", "P293870 36 21 58.3%\n", "P361474 14 8 57.1%\n", "P360285 7 4 57.1%\n", "P216609 96 52 54.2%\n", "P359305 13 7 53.8%\n", "P361421 62 33 53.2%\n", "P359433 30 15 50.0%\n", "P368348 16 8 50.0%\n", "P290330 12 6 50.0%\n", "P360453 4 2 50.0%\n", "P359347 2 1 50.0%\n", "...\n", "P358110 67 1 1.5%\n", "P358779 67 1 1.5%\n", "P359125 67 1 1.5%\n", "P361502 68 1 1.5%\n", "P359063 70 1 1.4%\n", "P359717 70 1 1.4%\n", "P297242 71 1 1.4%\n", "P358833 72 1 1.4%\n", "P359251 72 1 1.4%\n", "P358588 76 1 1.3%\n", "P390603 76 1 1.3%\n", "P390605 76 1 1.3%\n", "P360654 81 1 1.2%\n", "P358879 85 1 1.2%\n", "P361701 91 1 1.1%\n", "P360405 97 1 1.0%\n", "P358479 100 1 1.0%\n", "P360842 105 1 1.0%\n", "P390595 107 1 0.9%\n", "P359051 117 1 0.9%\n" ] } ], "source": [ 
"print(\n", " \"{:<20}{:>5}{:>5}{:>5}\\n{}\".format(\n", " \"document\",\n", " \"#all\",\n", " \"#own\",\n", " \"%own\",\n", " \"-\" * 35,\n", " )\n", ")\n", "\n", "for x in docList[0:20]:\n", " print(\"{:<20} {:>4} {:>4} {:>4.1f}%\".format(*x))\n", "print(\"...\")\n", "for x in docList[-20:]:\n", " print(\"{:<20} {:>4} {:>4} {:>4.1f}%\".format(*x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Locality API\n", "We travel upwards and downwards, forwards and backwards through the nodes.\n", "The Locality API (`L`) provides functions: `u()` for going up, and `d()` for going down,\n", "`n()` for going to next nodes and `p()` for going to previous nodes.\n", "\n", "These directions are indirect notions: nodes are just numbers, but by means of the\n", "`oslots` feature they are linked to slots. One node *contains* another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.\n", "And one node is next or previous to another if its slots follow or precede the slots of the other one.\n", "\n", "`L.u(node)` **Up** is going to nodes that embed `node`.\n", "\n", "`L.d(node)` **Down** is the opposite direction, to those that are contained in `node`.\n", "\n", "`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.\n", "\n", "`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.\n", "\n", "All these functions yield nodes of all possible node types.\n", "By passing the optional parameter `otype`, you can restrict the results to nodes of that type.\n", "\n", "The results are ordered according to the order of things in the text.\n", "\n", "The functions always return a tuple, even if there is just one node in the result.\n", "\n", "## Going up\n", "We go from the first sign to the document that contains it.\n", "Note the `[0]` at the end. 
You expect one document, yet `L` returns a tuple.\n", "To get the only element of that tuple, you need to do that `[0]`.\n", "\n", "If you are like me, you keep forgetting it, and that will lead to weird error messages later on." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:55.410034Z", "start_time": "2018-05-18T09:18:55.404051Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "848587\n" ] } ], "source": [ "firstDoc = L.u(1, otype=\"document\")[0]\n", "print(firstDoc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's see all the containing objects of sign 3:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:56.772513Z", "start_time": "2018-05-18T09:18:56.766324Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sign 3 is contained in document 848587\n", "sign 3 is contained in face 853362\n", "sign 3 is contained in line 865272\n", "sign 3 is contained in word 975133\n", "sign 3 is contained in cluster x\n" ] } ], "source": [ "s = 3\n", "for otype in F.otype.all:\n", " if otype == F.otype.slotType:\n", " continue\n", " up = L.u(s, otype=otype)\n", " upNode = \"x\" if len(up) == 0 else up[0]\n", " print(\"sign {} is contained in {} {}\".format(s, otype, upNode))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going next\n", "Let's go to the next nodes of the first document." 
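The adjacency rule behind `L.n()` and `L.p()` only needs the first and last slot of each node, as the sketch below shows. The spans are copied from the outputs of the *Going next* and *Going previous* cells in this section. `next_nodes()` and `prev_nodes()` are hypothetical helpers. They only look at these eleven nodes and sort the results by node number, whereas Text-Fabric considers all nodes and returns them in text order.

```python
# node -> (first slot, last slot), copied from the L.n() and L.p() outputs in this section
SPANS = {
    848587: (1, 692),    # first document
    853364: (579, 692),  # face
    865337: (680, 692),  # line
    975373: (692, 692),  # word
    766521: (692, 692),  # cluster
    692: (692, 692),     # sign
    693: (693, 693),     # sign
    975374: (693, 694),  # word
    865338: (693, 696),  # line
    853365: (693, 720),  # face
    848588: (693, 739),  # second document
}


def next_nodes(node, spans):
    """Nodes whose first slot comes immediately after the last slot of node."""
    last = spans[node][1]
    return sorted(n for (n, (start, _)) in spans.items() if start == last + 1)


def prev_nodes(node, spans):
    """Nodes whose last slot comes immediately before the first slot of node."""
    first = spans[node][0]
    return sorted(n for (n, (_, end)) in spans.items() if end == first - 1)


print(next_nodes(848587, SPANS))
print(prev_nodes(848588, SPANS))
```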
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:58.821681Z", "start_time": "2018-05-18T09:18:58.814893Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 693: sign first slot=693 , last slot=693 \n", " 975374: word first slot=693 , last slot=694 \n", " 865338: line first slot=693 , last slot=696 \n", " 853365: face first slot=693 , last slot=720 \n", " 848588: document first slot=693 , last slot=739 \n" ] } ], "source": [ "afterFirstDoc = L.n(firstDoc)\n", "for n in afterFirstDoc:\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )\n", "secondDoc = L.n(firstDoc, otype=\"document\")[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going previous\n", "\n", "And let's see what is right before the second document." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:00.163973Z", "start_time": "2018-05-18T09:19:00.154857Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 848587: document first slot=1 , last slot=692 \n", " 853364: face first slot=579 , last slot=692 \n", " 865337: line first slot=680 , last slot=692 \n", " 975373: word first slot=692 , last slot=692 \n", " 766521: cluster first slot=692 , last slot=692 \n", " 692: sign first slot=692 , last slot=692 \n" ] } ], "source": [ "for n in L.p(secondDoc):\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going down" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We go to the faces of the first document, and just count them." 
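Going down (and up) is about containment of slot sets rather than adjacency. Here is a minimal sketch on a hypothetical mini-document. The node numbers and slot sets are invented, whereas Text-Fabric derives the real ones from `oslots`. `down()` and `up()` are hypothetical helpers. Two nodes with identical slot sets would each show up as both up and down from the other here, and this sketch makes no attempt to resolve that.

```python
# Hypothetical oslots: node -> set of slots (slots 1..9 are signs, not listed as nodes here).
OSLOTS = {
    100: set(range(1, 10)),  # document
    200: set(range(1, 5)),   # face 1
    201: set(range(5, 8)),   # face 2
    202: set(range(8, 10)),  # face 3
    300: {1, 2},             # a line on face 1
}
OTYPE = {100: "document", 200: "face", 201: "face", 202: "face", 300: "line"}


def down(node, otype=None):
    """Other nodes whose slot set is a subset of the slot set of node."""
    slots = OSLOTS[node]
    return sorted(
        n for n in OSLOTS
        if n != node and OSLOTS[n] <= slots and (otype is None or OTYPE[n] == otype)
    )


def up(node, otype=None):
    """Other nodes whose slot set is a superset of the slot set of node."""
    slots = OSLOTS[node]
    return sorted(
        n for n in OSLOTS
        if n != node and slots <= OSLOTS[n] and (otype is None or OTYPE[n] == otype)
    )


faces = down(100, otype="face")
print(len(faces))
print(up(300))
```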
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:02.530705Z", "start_time": "2018-05-18T09:19:02.475279Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "faces = L.d(firstDoc, otype=\"face\")\n", "print(len(faces))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The first line\n", "We pick two nodes and explore what is above and below them:\n", "the first line and the first word." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:04.024679Z", "start_time": "2018-05-18T09:19:03.995207Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Node 975132\n", " | UP\n", " | | 766502 cluster\n", " | | 865272 line\n", " | | 853362 face\n", " | | 848587 document\n", " | DOWN\n", " | | 766502 cluster\n", " | | 1 sign\n", "Node 865272\n", " | UP\n", " | | 853362 face\n", " | | 848587 document\n", " | DOWN\n", " | | 975132 word\n", " | | 766502 cluster\n", " | | 1 sign\n", " | | 975133 word\n", " | | 2 sign\n", " | | 3 sign\n", " | | 4 sign\n", " | | 975134 word\n", " | | 766503 cluster\n", " | | 5 sign\n", "Done\n" ] } ], "source": [ "for n in [\n", " F.otype.s(\"word\")[0],\n", " F.otype.s(\"line\")[0],\n", "]:\n", " A.indent(level=0)\n", " A.info(\"Node {}\".format(n), tm=False)\n", " A.indent(level=1)\n", " A.info(\"UP\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)\n", " A.indent(level=1)\n", " A.info(\"DOWN\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)\n", "A.indent(level=0)\n", "A.info(\"Done\", tm=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Text API\n", "\n", "So far, we have mainly seen nodes and their numbers, and the names of node types.\n", "You would almost 
forget that we are dealing with text.\n", "So let's try to see some text.\n", "\n", "In the same way as `F` gives access to feature data,\n", "`T` gives access to the text.\n", "That is also feature data, but you can tell Text-Fabric which features are specifically\n", "carrying the text, and in return Text-Fabric offers you\n", "a Text API: `T`.\n", "\n", "## Formats\n", "Cuneiform text can be represented in a number of ways:\n", "\n", "* original ATF, with bracketings and flags\n", "* essential symbols: readings and graphemes, repeats and fractions (of numerals), no flags, no clusterings\n", "* Unicode cuneiform symbols\n", "\n", "If you wonder where the information about text formats is stored:\n", "not in the Text-Fabric program, but in the data set.\n", "It has a feature `otext`, which specifies the formats and which features\n", "must be used to produce them. `otext` is the third special feature in a TF data set,\n", "next to `otype` and `oslots`.\n", "It is an optional feature.\n", "If it is absent, there will be no `T` API.\n", "\n", "Here is a list of all available formats in this data set." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:05.606582Z", "start_time": "2018-05-18T09:19:05.593486Z" } }, "outputs": [ { "data": { "text/plain": [ "['layout-orig-rich',\n", " 'layout-orig-unicode',\n", " 'text-orig-full',\n", " 'text-orig-plain',\n", " 'text-orig-rich',\n", " 'text-orig-unicode']" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted(T.formats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the formats\n", "\n", "The `T.text()` function is central to getting text representations of nodes.
Its most basic usage is\n", "\n", "```python\n", "T.text(nodes, fmt=fmt)\n", "```\n", "where `nodes` is a list or iterable of nodes, usually slot nodes (signs, in this corpus), and `fmt` is the name of a format.\n", "If you leave out `fmt`, the default `text-orig-full` is chosen.\n", "\n", "The result is the text in that format for all nodes specified:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'lugal lugal-ke-en6 lugala-ki-di2-e re-be-'" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], fmt=\"text-orig-plain\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is also a second way to use this function:\n", "\n", "```python\n", "T.text(node, fmt=fmt)\n", "```\n", "\n", "where `node` is a single node.\n", "In this case, the default format is `ntype-orig-full`, where `ntype` is the type of `node`.\n", "\n", "If the format is defined in the corpus, it will be used.
Otherwise, the word nodes contained in `node` will be looked up\n", "and represented with the default format `text-orig-full`.\n", "\n", "In this way we can sensibly represent a lot of different nodes, such as documents, faces, lines, clusters, words and signs.\n", "\n", "We compose a set of example nodes and run `T.text` on them:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 975132, 766502, 865272, 853362, 848587]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exampleNodes = [\n", " F.otype.s(\"sign\")[0],\n", " F.otype.s(\"word\")[0],\n", " F.otype.s(\"cluster\")[0],\n", " F.otype.s(\"line\")[0],\n", " F.otype.s(\"face\")[0],\n", " F.otype.s(\"document\")[0],\n", "]\n", "exampleNodes" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is sign 1:\n", "_lugal_ \n", "\n", "This is word 975132:\n", "_lugal_ \n", "\n", "This is cluster 766502:\n", "_lugal_ \n", "\n", "This is line 865272:\n", "_lugal_ lugal-ke-en6 _lugal_\n", "\n", "This is face 853362:\n", "_lugal_ lugal-ke-en6 _lugal_a-ki-di2-e re-be-tim _lugal_da-num sza isz-ti2 i-le-ee-ta-wu-ni {d}iszkur da-nu-tam2i-di2-szu-ma isz-tu3 s,i2-itsza-am-szi2-im a-di2 e-ra-ab2sza-am-szi2-im ma-tam2 as,-ba-at-mai-na u4-mi3-im isz-te2-en6a-na 7(u) a-la2-ni ka3-ka3-am a-di2-inru-ba-e-szu-nu u2-s,a-bi4-it u3 a-li-szu-nuu2-ha-li-iq {d}iszkur be-el e-mu-qi2-imu3 esz18-dar be-la2-at ta-ha-zi-imat-ma s,a-bi4-tam2 a-mu-ur-ma li-bi4-tam2a-na na-ri-im a-di2-ma i-nala2-sa3-mi3-a mu-sa3-ri i-bi4-ti2-iq-maza-ar-a-am asz2-ta-ka3-an-ma al-su2-mas,a-bi4-tam2 as,-ba-at li-bi4-tam2i-ma-e u2-sze2-li {d}iszkur u3 esz18-darat#-[ma] _1(disz) li#-im gu4 hi-a_ 6(disz) li-me-e_udu-hi-a_ u2-mi3-sza-ma lu u2-t,a-ba-ah7(disz) li-me-e qa2-ra#-du-a sza i-ra-timu2-mi3-sza-ma ma-ah-ri-a e-ku-lu-ni3(disz) li-me-e la2-si2-mu-u2-asza ar-ka3-tim 
e-ku-lu-ni_1(disz) li-im_ sza-qi2-u2-au4-mi3-sza-ma mu-ha-amsza kur-ur-si2-na-tim a-di2-isza-ba-im e-ku-lu-ni_tir_-tu3 i-ig-re-e-ma7(disz) li-me-e qa2-ra-du-a\n", "\n", "This is document 848587:\n", "_lugal_ lugal-ke-en6 _lugal_a-ki-di2-e re-be-tim _lugal_da-num sza isz-ti2 i-le-ee-ta-wu-ni {d}iszkur da-nu-tam2i-di2-szu-ma isz-tu3 s,i2-itsza-am-szi2-im a-di2 e-ra-ab2sza-am-szi2-im ma-tam2 as,-ba-at-mai-na u4-mi3-im isz-te2-en6a-na 7(u) a-la2-ni ka3-ka3-am a-di2-inru-ba-e-szu-nu u2-s,a-bi4-it u3 a-li-szu-nuu2-ha-li-iq {d}iszkur be-el e-mu-qi2-imu3 esz18-dar be-la2-at ta-ha-zi-imat-ma s,a-bi4-tam2 a-mu-ur-ma li-bi4-tam2a-na na-ri-im a-di2-ma i-nala2-sa3-mi3-a mu-sa3-ri i-bi4-ti2-iq-maza-ar-a-am asz2-ta-ka3-an-ma al-su2-mas,a-bi4-tam2 as,-ba-at li-bi4-tam2i-ma-e u2-sze2-li {d}iszkur u3 esz18-darat#-[ma] _1(disz) li#-im gu4 hi-a_ 6(disz) li-me-e_udu-hi-a_ u2-mi3-sza-ma lu u2-t,a-ba-ah7(disz) li-me-e qa2-ra#-du-a sza i-ra-timu2-mi3-sza-ma ma-ah-ri-a e-ku-lu-ni3(disz) li-me-e la2-si2-mu-u2-asza ar-ka3-tim e-ku-lu-ni_1(disz) li-im_ sza-qi2-u2-au4-mi3-sza-ma mu-ha-amsza kur-ur-si2-na-tim a-di2-isza-ba-im e-ku-lu-ni_tir_-tu3 i-ig-re-e-ma7(disz) li-me-e qa2-ra-du-ai-ra-tim e-ku-lu a-nawa-ar-ki-im i-ir-tumla2 ik-szu-ud-ma a-la2-ap2-szuku-sza-ma-ni-a-am sza ku-si2-i-szuit,-bu-uh3-ma a-na wa-ar-ki-imi-ir-tam2 i-di2-in nu-hi-ti2-mi3ku-ur--na-am u2-ri-ir-maa-na ar-ni-szu _1(disz) me-et gu4 hi-a__2(disz) me-et udu-hi-a_ it,-bu-uh3-maur-di2-a u2-sza-ki-il5 {d}iszkuru3 esz18-dar at-ma _mu 7(disz)-sze3 iti-kam_u3 sza-pa2-tam2 i-na i-ki-il5-timqa2-du um-me-ni-a lu u2-szi2-ibi-na wa-s,a-i-a sza _na4 gug_u3 _na4 za-gin3_ qa2-nu-a-amlu ar-ku-us2-ma a-na ma-timlu u2-za-iz sza-du-a-am hu-ma-nama-szi2-ni-szu am-ha-su2-ma ki-masi2-ki-tim i-ba-ri-szu-nu s,a-al-mi3u2-sza-zi-iz ru-ba-amsza tu3-uk-ri-isz masz-kam u2-la2-bi4-iszhu-du-ra bi4-be-na-tim qa2-qa2-da-ti2-szu-nuasz2-ku-un a-la2-szi2-am ki-masi2-ni-isz-tim qa2-qa2-da-ti2-szu-nuak-tu3-um sza a-mu-ri-eki-ma a-pi3-szu-nu sza 
ma-t,imi-sza-ar-szu-nu aq-t,i2-i sza ki-la2-ri-ei-mar-szi2-im qa2-qa2-da-ti2-szu-nuar-ku-us2 sza-ni-um ka3-ni-szi2su2-tu3-hi-szu-nu u2-sze2-ersza ha-tim qa2-ba-al-ti2 qa2-qa2-da-ti2-szu-nu u2-sza-ag-li-ib lu-uh3-me-etu3-di2-tam2 u2-di2-id gu5-ti2-tam2 lu-lu-am u3 ha-ha-am su2-tu3?-hi?-szu-nu u2-sza-ri4(disz) zi-qi2 sza-ma-e i-qa2-ti2-a al-pu-ut mi3-na-am i-t,up-pi3-imlu-sza-am-i-id a-nu-um la2 i-de8-a-ni ki-ma lugal a-na-ku-nima-tam2 e-li-tam2 u3 sza-ap2-li-tam2 as,-bu-tu3-ni-isza-tu3-uk-ki li-sza-ar-bi4-u2 {d}iszkur / _lugal_\n", "\n" ] } ], "source": [ "for n in exampleNodes:\n", " print(f\"This is {F.otype.v(n)} {n}:\")\n", " print(T.text(n))\n", " print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the formats\n", "Now let's use those formats to print out the first line in this corpus.\n", "\n", "Note that only the formats starting with `text-` are usable for this.\n", "\n", "For the `layout-` formats, see [display](display.ipynb)." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:10.077589Z", "start_time": "2018-05-18T09:19:10.070503Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "text-orig-full:\n", "\t_lugal_ lugal-ke-en6 _lugal_a-ki-di2-e re-be-\n", "text-orig-plain:\n", "\tlugal lugal-ke-en6 lugala-ki-di2-e re-be-\n", "text-orig-rich:\n", "\tlugal lugal-ke-en₆ lugala-ki-di₂-e re-be-\n", "text-orig-unicode:\n", "\t𒈗 𒈗𒆠𒅔 𒈗𒀀𒆠𒊹𒂊 𒊑𒁁\n" ] } ], "source": [ "for fmt in sorted(T.formats):\n", " if fmt.startswith(\"text-\"):\n", " print(\"{}:\\n\\t{}\".format(fmt, T.text(range(1, 12), fmt=fmt)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we do not specify a format, the **default** format is used (`text-orig-full`)." 
] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:13.490426Z", "start_time": "2018-05-18T09:19:13.486053Z" } }, "outputs": [ { "data": { "text/plain": [ "'_lugal_ lugal-ke-en6 _lugal_a-ki-di2-e re-be-'" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(range(1, 12))" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'_lugal_ lugal-ke-en6 _lugal_'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "firstLine = F.otype.s(\"line\")[0]\n", "T.text(firstLine)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'𒈗 𒈗𒆠𒅔 𒈗'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(firstLine, fmt=\"text-orig-unicode\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Word dividers\n", "\n", "First we grab all word dividers in a list." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "14395" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = F.type.s(\"wdiv\")\n", "len(ds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we take the first word divider and look up the line in which it occurs." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "P390626 left:6" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "d = ds[0]\n", "ln = L.u(d, otype=\"line\")[0]\n", "A.webLink(ln)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ATF source of this line is:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['6.
sza-tu3-uk-ki li-sza-ar-bi4-u2 {d}iszkur / _lugal_']" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.getSource(ln)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the text formats to display this line in various forms:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'sza-tu3-uk-ki li-sza-ar-bi4-u2 {d}iszkur / _lugal_'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(ln)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'sza-tu3-uk-ki li-sza-ar-bi4-u2 d⁼iszkur / lugal'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(ln, fmt=\"text-orig-plain\")" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ša-tu₃-uk-ki li-ša-ar-bi₄-u₂ d⁼iškur / lugal'" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(ln, fmt=\"text-orig-rich\")" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'𒊭𒁺𒊌𒆠 𒇷𒊭𒅈𒁁𒌑 𒀭𒅎 𒁹 𒈗'" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(ln, fmt=\"text-orig-unicode\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If these characters do not look right, that is because of the font. With the more advanced display functions of Text-Fabric we can show the text in a proper cuneiform font (see also [display](display.ipynb)):" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
P390626 left:6  𒊭𒁺𒊌𒆠 𒇷𒊭𒅈𒁁𒌑 𒀭𒅎 𒁹 𒈗
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.plain(ln, fmt=\"text-orig-unicode\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now with the word divider highlighted:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
P390626 left:6  𒊭𒁺𒊌𒆠 𒇷𒊭𒅈𒁁𒌑 𒀭𒅎 𒁹 𒈗
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.plain(ln, fmt=\"text-orig-unicode\", highlights=set(ds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The important things to remember are:\n", "\n", "* you can supply a list of slot nodes and get them represented in all formats\n", "* you can get non-slot nodes `n` in the default format by `T.text(n)`\n", "* you can get non-slot nodes `n` in other formats by `T.text(n, fmt=fmt, descend=True)`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Whole text in all formats in just a few seconds\n", "Part of the pleasure of working with computers is that they can crunch massive amounts of data.\n", "The text of the Old Assyrian Letters is a piece of cake.\n", "\n", "It takes just a few seconds to have that cake and eat it,\n", "in all four text formats." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:27.839331Z", "start_time": "2018-05-18T09:19:18.526400Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s writing plain text of all letters in all text formats\n", " 4.75s done 4 formats\n", "text-orig-full\n", "_lugal_ lugal-ke-en6 _lugal_\n", "a-ki-di2-e re-be-tim _lugal_\n", "da-num sza isz-ti2 i-le-e\n", "e-ta-wu-ni {d}iszkur da-nu-tam2\n", "i-di2-szu-ma isz-tu3 s,i2-it\n", "\n", "text-orig-plain\n", "lugal lugal-ke-en6 lugal\n", "a-ki-di2-e re-be-tim lugal\n", "da-num sza isz-ti2 i-le-e\n", "e-ta-wu-ni d⁼iszkur da-nu-tam2\n", "i-di2-szu-ma isz-tu3 s,i2-it\n", "\n", "text-orig-rich\n", "lugal lugal-ke-en₆ lugal\n", "a-ki-di₂-e re-be-tim lugal\n", "da-num ša iš-ti₂ i-le-e\n", "e-ta-wu-ni d⁼iškur da-nu-tam₂\n", "i-di₂-šu-ma iš-tu₃ ṣi₂-it\n", "\n", "text-orig-unicode\n", "𒈗 𒈗𒆠𒅔 𒈗\n", "𒀀𒆠𒊹𒂊 𒊑𒁁𒁴 𒈗\n", "𒁕𒉏 𒊭 𒅖𒊹 𒄿𒇷𒂊\n", "𒂊𒋫𒉿𒉌 𒀭𒅎 𒁕𒉡𒁮\n", "𒄿𒊹𒋗𒈠 𒅖𒁺 𒍣𒀉\n", "\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"writing plain text of all letters in all text
formats\")\n", "\n", "text = collections.defaultdict(list)\n", "\n", "for ln in F.otype.s(\"line\"):\n", " for fmt in sorted(T.formats):\n", " if fmt.startswith(\"text-\"):\n", " text[fmt].append(T.text(ln, fmt=fmt, descend=True))\n", "\n", "A.info(\"done {} formats\".format(len(text)))\n", "\n", "for fmt in sorted(text):\n", " print(\"{}\\n{}\\n\".format(fmt, \"\\n\".join(text[fmt][0:5])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The full plain text\n", "We write each text format to a separate file in your `Downloads` folder." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:34.250294Z", "start_time": "2018-05-18T09:19:34.156658Z" } }, "outputs": [], "source": [ "for fmt in T.formats:\n", " if fmt.startswith(\"text-\"):\n", " with open(os.path.expanduser(f\"~/Downloads/{fmt}.txt\"), \"w\", encoding=\"utf-8\") as f:\n", " f.write(\"\\n\".join(text[fmt]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sections\n", "\n", "A section in the letter corpus is a document, a face or a line.\n", "Knowledge of sections is not baked into Text-Fabric.\n", "The config feature `otext.tf` may specify three section levels, and tell\n", "what the corresponding node types and features are.\n", "\n", "From that knowledge Text-Fabric can construct mappings from nodes to sections, e.g.
from line\n", "nodes to tuples of the form:\n", "\n", " (p-number, face specifier, line number)\n", "\n", "You can get the section of a node as a tuple of relevant document, face, and line nodes.\n", "Or you can get it as a passage label, a string.\n", "\n", "You can ask for the passage corresponding to the first slot of a node, or the one corresponding to the last slot.\n", "\n", "If you are dealing with document and face nodes, you can ask to fill out the line and face parts as well.\n", "\n", "Here are examples of getting the section that corresponds to a node.\n", "\n", "**NB:** the section of a node is determined by the first slot belonging to that node, or, if `lastSlot=True`,\n", "by the last slot belonging to that node. For document and face nodes the section stops at that level,\n", "unless you ask for `fillup=True`." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "someNodes = (\n", " F.otype.s(\"sign\")[100000],\n", " F.otype.s(\"word\")[10000],\n", " F.otype.s(\"cluster\")[5000],\n", " F.otype.s(\"line\")[15000],\n", " F.otype.s(\"face\")[1000],\n", " F.otype.s(\"document\")[500],\n", ")" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:43.056511Z", "start_time": "2018-05-18T09:19:43.043552Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100001 sign - P361585 reverse:5 P361585 reverse:5 ((849102, 854752, 879010), (849102, 854752, 879010))\n", " 985132 word - P360578 reverse:10 P360578 reverse:10 ((848733, 853846, 869271), (848733, 853846, 869271))\n", " 771502 cluster - P390597 reverse:14 P390597 reverse:14 ((848876, 854199, 873114), (848876, 854199, 873114))\n", " 880272 line - P358365 left:4 P358365 left:4 ((849149, 854885, 880272), (849149, 854885, 880272))\n", " 854362 face - P390640 reverse P390640 reverse:6 ((848941, 854362), (848941, 854362, 874900))\n", " 849087 document - P390603 P390603 obverse:29 ((849087,), (849087, 854715, 878463))\n" ] } ], "source": [
"for n in someNodes:\n", " nType = F.otype.v(n)\n", " d = f\"{n:>7} {nType}\"\n", " first = A.sectionStrFromNode(n)\n", " last = A.sectionStrFromNode(n, lastSlot=True, fillup=True)\n", " tup = (\n", " T.sectionTuple(n),\n", " T.sectionTuple(n, lastSlot=True, fillup=True),\n", " )\n", " print(f\"{d:<16} - {first:<18} {last:<18} {tup}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Clean caches\n", "\n", "Text-Fabric pre-computes data for you, so that it can be loaded faster.\n", "If the original data is updated, Text-Fabric detects it and recomputes that data.\n", "\n", "But there are cases when the algorithms of Text-Fabric have changed without any changes in the data.\n", "Then you might want to clear the cache of precomputed results.\n", "\n", "There are two ways to do that:\n", "\n", "* Locate the `.tf` directory of your dataset, and remove all `.tfx` files in it.\n", " This might be a bit awkward to do, because the `.tf` directory is hidden on Unix-like systems.\n", "* Call `TF.clearCache()`, which does exactly the same.\n", "\n", "It is not handy to execute the following cell all the time, so I have commented it out.\n", "So if you really want to clear the cache, remove the comment sign below."
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# TF.clearCache()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next steps\n", "\n", "By now you have an impression of how to compute around in the corpus.\n", "While this is still the beginning, I hope you already sense the power of unlimited programmatic access\n", "to all the bits and bytes in the data set.\n", "\n", "Here are a few directions for unleashing that power.\n", "\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[similarLines](similarLines.ipynb)** spot the similarities between lines\n", "\n", "---\n", "\n", "See the [cookbook](cookbook) for recipes for small, concrete tasks.\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false }, "toc-autonumbering": false, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }