{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "You might want to consider the [start](search.ipynb) of this tutorial.\n", "\n", "Short introductions to other TF datasets:\n", "\n", "* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),\n", "* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb), or the\n", "* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Upgrade features along a node mapping\n", "\n", "Consider the semantic actor features in \n", "[ch-jensen/participants/actor/tf](https://github.com/ch-jensen/participants/tree/master/actor/tf).\n", "\n", "We only see features for version `c` of the BHSA, but we prefer to work with version `2021`.\n", "\n", "When we try to load the features by simply saying\n", "\n", "```\n", "A = use(\"ETCBC/bhsa\", mod=\"ch-jensen/participants/actor/tf\")\n", "```\n", "\n", "we have no luck, because there is no `ch-jensen/participants/actor/tf/2021` on GitHub.\n", "\n", "However, one of the BHSA feature files is `omap@c-2021.tf`, and it contains the information to map\n", "all nodes in version `c` to the nodes of version `2021`, as faithfully as is reasonably possible.\n", "\n", "My homework as Text-Fabric developer is to make the statement above work by steering Text-Fabric\n", "to download version `c` and use the mapping feature to produce upgraded data in the right place.\n", "But I have not got round to that yet.\n", "\n", "So, here is what *you* can do about it 😎.\n", "\n", "1. File an [issue](https://github.com/ch-jensen/participants/issues) and ask Christian whether he is inclined to\n", " use his software to build the features against BHSA version 2021.\n", " *But he might be too busy to do that right now.*\n", "2. Fork [ch-jensen/participants](https://github.com/ch-jensen/participants) and try to run his software yourself.\n", " *That might not be easy. It seems that the code to run is in another repository.\n", " Is all the input data publicly available? Are special settings needed for version 2021?\n", " Is the software still executable?*\n", "3. Do fork the repo by all means, and then use a Text-Fabric tool to *upgrade* the features of the older version\n", " to the newer version.\n", " \n", "We take you through the last option and evaluate how well the upgrade process fares." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Incantation\n", "\n", "The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are\n", "explained in the [start tutorial](start.ipynb)."
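] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
The idea of upgrading features along a node mapping can be illustrated in plain Python, without Text-Fabric. In this sketch, `node_map` stands in for the information in `omap@c-2021`. It maps each node of version `c` to zero or more nodes of version `2021`, and each mapping carries a dissimilarity score. We assume that a lower score means a better match, with 0 or `None` for a perfect one.

This is a minimal illustration under those assumptions. It is not the algorithm that `tf.dataset.nodemaps.Versions` actually uses. The helper names and the `max_dissimilarity` parameter are hypothetical.

```python
# Hypothetical helpers illustrating the idea of upgrading features
# along a node map; they are not part of the Text-Fabric API.


def upgrade_node_feature(values, node_map, max_dissimilarity=None):
    # values: {old_node: value}
    # node_map: {old_node: {new_node: dissimilarity}}
    upgraded = {}
    for old_node, value in values.items():
        for new_node, dissimilarity in node_map.get(old_node, {}).items():
            # optionally skip mappings that are too dissimilar (None counts as 0)
            if max_dissimilarity is not None and (dissimilarity or 0) > max_dissimilarity:
                continue
            # if several old nodes map to the same new node, keep the first value
            upgraded.setdefault(new_node, value)
    return upgraded


def upgrade_edge_feature(edges, node_map):
    # edges: {old_from: set of old_to}; both ends of each edge are mapped
    upgraded = {}
    for old_from, old_tos in edges.items():
        for new_from in node_map.get(old_from, {}):
            new_tos = upgraded.setdefault(new_from, set())
            for old_to in old_tos:
                # set.update over a dict adds its keys, i.e. the new nodes
                new_tos.update(node_map.get(old_to, {}))
    return upgraded


# Toy data: node 2 moved to 3, node 3 maps imperfectly to 4,
# and node 4 has no counterpart in the new version.
node_map = {1: {1: 0}, 2: {3: 0}, 3: {4: 1}, 4: {}}
actor = {1: 'god', 2: 'adam', 4: 'snake'}
coref = {1: {2}}

print(upgrade_node_feature(actor, node_map))  # {1: 'god', 3: 'adam'}
print(upgrade_edge_feature(coref, node_map))  # {1: {3}}
```

The value on node 4 is lost because that node has no counterpart in the new version. Counting such losses is one way to judge how well an upgrade fares.
"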
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" } }, "outputs": [], "source": [ "import collections\n", "\n", "from tf.app import use\n", "from tf.fabric import Fabric\n", "from tf.dataset.nodemaps import Versions" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Load the current version of the BHSA\n", "\n", "We need the current version (`2021`) of the BHSA anyway, so we are going to load it.\n", "\n", "## Convention\n", "\n", "We will have two versions of the corpus in our notebook and in our variables.\n", "It is handy to have a consistent naming scheme:\n", "\n", "* `N` (the *now* version): `2021`\n", "* `P` (the *previous* version): `c`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/github/etcbc/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.0, etcbc/bhsa/app v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N = use(\"ETCBC/bhsa\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Load the available version of the participant features\n", "\n", "We have forked Christian's repo to `etcbc/participants`. Clone it into `~/github/etcbc` on your computer, because the `LOCATION` below points there:\n", "\n", "```\n", "mkdir -p ~/github/etcbc\n", "cd ~/github/etcbc\n", "git clone https://github.com/ETCBC/participants\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "LOCATION = \"data:~/github/etcbc/participants/actor/tf\"" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Now we can load the actor features for version `c`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.0, no app configured
Data: ~/github/etcbc/participants/actor/tf/c
Features:
\n", "
TF dataset (unspecified)\n", "
\n", "\n", "
\n", "
\n", "actor\n", "
\n", "
str
\n", "\n", " Participant references for words, subphrases and phrases. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_actor\n", "
\n", "
str
\n", "\n", " Participant references for pronominal suffixes. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n", "coref\n", "
\n", "
none
\n", "\n", " Edges to co-referring actors on chapter-level. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "P = use(LOCATION, version=\"c\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click the triangles to find more information about these features." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Upgrade the participant features\n", "\n", "We are going to upgrade the participant features from version `c` to version `2021`.\n", "\n", "For that, we use [tf.dataset.nodemaps.Versions](https://annotation.github.io/text-fabric/tf/dataset/nodemaps.html#tf.dataset.nodemaps.Versions).\n", "\n", "We initialize the `Versions` object with the Text-Fabric API objects of both versions, keyed by version, and tell it to map from `c` to `2021`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "apis = {\"2021\": N.api, \"c\": P.api}\n", "\n", "V = Versions(apis, \"c\", \"2021\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we migrate the features from `c` to `2021` and save them in the correct location: the `2021` subdirectory of `LOCATION`.\n", "\n", "We skip the `otext` feature, since it is a special config feature, not a data feature made by Christian." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 49s start migrating\n", " 0.03s Done\n" ] } ], "source": [ "V.migrateFeatures((\"actor\", \"coref\", \"prs_actor\"), location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first run prints very little, so we make the migration a bit more verbose with `silent=\"auto\"` and run it again:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 57s start migrating\n", " 0.32s All additional features loaded - for details use TF.isLoaded()\n", " 0.32s Mapping actor (node)\n", " 0.33s Mapping coref (edge)\n", " 0.40s Mapping prs_actor (node)\n", " 0.00s Exporting 2 node and 1 edge and 0 config features to data:~/github/etcbc/participants/actor/tf/2021:\n", " | 0.00s T actor to data:~/github/etcbc/participants/actor/tf/2021\n", " | 0.00s T prs_actor to data:~/github/etcbc/participants/actor/tf/2021\n", " | 0.03s T coref to data:~/github/etcbc/participants/actor/tf/2021\n", " 0.03s Exported 2 node features and 1 edge features and 0 config features to data:~/github/etcbc/participants/actor/tf/2021\n", " 0.03s Done\n" ] } ], "source": [ "V.migrateFeatures((\"actor\", \"coref\", \"prs_actor\"), location=LOCATION, silent=\"auto\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Load the upgraded module\n", "\n", "Now we can load version 2021 of the BHSA together with the migrated module of participant features.\n", "Note that we point Text-Fabric to the forked repo (`etcbc` instead of `ch-jensen`) and then to\n", "our local clone (`:clone`).\n", "\n", "We increase the verbosity in order to display more metadata of the features."
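] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Once the upgraded module is loaded, one rough way to evaluate the upgrade is to compare how often each feature value occurs before and after the migration.

The sketch below assumes that you have turned the old and new values of a feature such as `actor` into plain dicts from node to value. One way to do this might be `dict(P.api.F.actor.items())` for version `c` and the corresponding call on the newly loaded API, provided your Text-Fabric version offers `items()` on feature objects. The helpers `compare_value_counts` and `lost_values` are hypothetical and are not part of Text-Fabric.

```python
from collections import Counter


def compare_value_counts(old_values, new_values):
    # Count how often each value occurs before and after the upgrade.
    old_counts = Counter(old_values.values())
    new_counts = Counter(new_values.values())
    report = {}
    # key=str keeps sorting stable even if values have mixed types
    for value in sorted(set(old_counts) | set(new_counts), key=str):
        report[value] = (old_counts[value], new_counts[value])
    return report


def lost_values(report):
    # Values whose frequency dropped during the upgrade, with the size of the drop.
    return {value: old - new for value, (old, new) in report.items() if new < old}


# Toy data standing in for the actor feature in versions c and 2021.
old_actor = {1: 'god', 2: 'adam', 3: 'god', 4: 'snake'}
new_actor = {1: 'god', 3: 'adam', 5: 'god'}

report = compare_value_counts(old_actor, new_actor)
print(report)               # {'adam': (1, 1), 'god': (2, 2), 'snake': (1, 0)}
print(lost_values(report))  # {'snake': 1}
```

A drop in frequency does not necessarily point to an error, because it may reflect genuine differences between the two versions. Such values are good candidates for manual inspection.
"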
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/github/etcbc/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/participants/actor/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 10.2.0\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "125 features found and 0 ignored\n", " 0.67s Dataset without structure sections in otext:no structure functions in the T-API\n", " 2.18s All features loaded/computed - for details use TF.isLoaded()\n", " 1.48s All additional features loaded - for details use TF.isLoaded()\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.0, etcbc/bhsa/app v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", "
\n", " 🆗 links between similar passages\n", "
\n", "\n", "
\n", "
author:
\n", "
BHSA Data: Constantijn Sikkel; Parallels Notebook: Dirk Roorda, Martijn Naaijer
\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:40:46Z
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
Parallels notebook, see https://github.com/ETCBC/parallels
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", "
\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:55Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", "
\n", " ✅ book name in amharic (ኣማርኛ)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:20:27Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
language:
\n", "
ኣማርኛ
\n", "
\n", "\n", "
\n", "
languageCode:
\n", "
am
\n", "
\n", "\n", "
\n", "
languageEnglish:
\n", "
amharic
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
book names from wikipedia and other sources
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", "
\n", " ✅ chapter number (1; 2; 3; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:55Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", "
\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:56Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", "
\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:56Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", "
\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:57Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", "
\n", " ✅ frequency of lexemes\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:24:45Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
computed on the basis of the ETCBC core set of features
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", "
\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:57Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:57Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:58Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", "
\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:58Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:17:59Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:04Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:04Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", "
\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:13Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", "
\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:05Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", "
\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:06Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", "
\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:13Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", "
\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:14Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:15Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", "
\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:15Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", "
\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:15Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", "
\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:08Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", "
\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:08Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", "
\n", " ✅ sequence number of an object within its context\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:09Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:15Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", "
\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:22:50Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional paragraph file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", "
\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:10Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", "
\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:11Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", "
\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:11Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", "
\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:11Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", "
\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:12Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", "
\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:12Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", "
\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:12Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word pointed-transliterated masoretic reading correction\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:23:29Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional ketiv/qere file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", "
\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:23:29Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional ketiv/qere file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:23:29Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional ketiv/qere file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:23:29Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional ketiv/qere file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", "
\n", " ✅ ranking of lexemes based on freqnuecy\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:24:46Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
computed on the basis of the ETCBC core set of features
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", "
\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:13Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", "
\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:16Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", "
\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:14Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", "
\n", " ✅ clause atom: its level in the linguistic embedding\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:16Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", "
\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:01Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:01Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", "
\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:16Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", "
\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:16Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", "
\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:17Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", "
\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:17Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", "
\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:17Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", "
\n", " ✅ verse number\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:18Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", "
\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:16Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", "
\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:17Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
from additional lexicon file provided by the ETCBC
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", "
\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:18Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", "
\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:18Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", "
\n", " ✅ linguistic dependency between textual objects\n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:18:22Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", "
author:
\n", "
Eep Talstra Centre for Bible and Computer
\n", "
\n", "\n", "
\n", "
dataset:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
datasetName:
\n", "
Biblia Hebraica Stuttgartensia Amstelodamensis
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:21:17Z
\n", "
\n", "\n", "
\n", "
email:
\n", "
shebanq@ancient-data.org
\n", "
\n", "\n", "
\n", "
encoders:
\n", "
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
website:
\n", "
https://shebanq.ancient-data.org
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
etcbc/participants/actor/tf\n", "
\n", "\n", "
\n", "
\n", "actor\n", "
\n", "
str
\n", "\n", "
\n", " Participant references for words, subphrases and phrases. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
coreVersion:
\n", "
c
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-16T11:10:24Z
\n", "
\n", "\n", "
\n", "
upgraded:
\n", "
‼️ from version c to 2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_actor\n", "
\n", "
str
\n", "\n", "
\n", " Participant references for pronominal suffixes. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
coreVersion:
\n", "
c
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-16T11:10:24Z
\n", "
\n", "\n", "
\n", "
upgraded:
\n", "
‼️ from version c to 2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "coref\n", "
\n", "
none
\n", "\n", "
\n", " Edges to co-referring actors on chapter-level. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
coreVersion:
\n", "
c
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-16T11:10:24Z
\n", "
\n", "\n", "
\n", "
upgraded:
\n", "
‼️ from version c to 2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", "
\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "
\n", "\n", "
\n", "
author:
\n", "
BHSA Data: Constantijn Sikkel; Phono Notebook: Dirk Roorda
\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:25:55Z
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
computed by the phono notebook, see https://github.com/ETCBC/phono
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", "
\n", " 🆗 interword material in phonological transcription\n", "
\n", "\n", "
\n", "
author:
\n", "
BHSA Data: Constantijn Sikkel; Phono Notebook: Dirk Roorda
\n", "
\n", "\n", "
\n", "
coreData:
\n", "
BHSA
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2021-12-09T14:25:55Z
\n", "
\n", "\n", "
\n", "
provenance:
\n", "
computed by the phono notebook, see https://github.com/ETCBC/phono
\n", "
\n", "\n", "
\n", "
version:
\n", "
2021
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N = use(\"etcbc/bhsa\", mod=\"etcbc/participants/actor/tf:clone\", silent=\"verbose\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you click the triangles and navigate to the full metadata of the participants features,\n", "you see a line\n", "\n", "```\n", "upgraded: ‼️ from version c to 2021\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checks\n", "\n", "Let's do a few checks to see how well the upgrade process has worked.\n", "\n", "First we load the `c` version of the BHSA and Christian's original features." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/github/etcbc/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/bhsa/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ch-jensen/participants/actor/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/phono/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/parallels/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.0, etcbc/bhsa/app v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
ch-jensen/participants/actor/tf\n", "
\n", "\n", "
\n", "
\n", "actor\n", "
\n", "
str
\n", "\n", " Participant references for words, subphrases and phrases. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_actor\n", "
\n", "
str
\n", "\n", " Participant references for pronominal suffixes. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n", "coref\n", "
\n", "
none
\n", "\n", " Edges to co-referring actors on chapter-level. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "P = use(\"etcbc/bhsa\", mod=\"ch-jensen/participants/actor/tf\", version=\"c\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we are going to peek into the corpus by means of pretty displays.\n", "Here we tweak what is displayed and in what style.\n", "\n", "* we load the node mapping feature since it is not loaded by default\n", "* we hide a few container types that are not relevant for our investigation\n", "* we display material in sentence containers\n", "* we use the phonological transcription, instead of fully pointed Hebrew,\n", " so that non-Hebraists can see what is happening here." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "omap@c-2021 edge (int) ⚠️ Maps the nodes of version c to 2021\n" ] } ], "source": [ "N.load(\"omap@c-2021\", silent=\"deep\")\n", "N.isLoaded(\"omap@c-2021\")\n", "\n", "hiddenTypes=\"half_verse,sentence_atom,clause,clause_atom\"\n", "\n", "N.displaySetup(hiddenTypes=hiddenTypes, condenseType=\"sentence\", withNodes=True, fmt=\"text-phono-full\")\n", "P.displaySetup(hiddenTypes=hiddenTypes, condenseType=\"sentence\", withNodes=True, fmt=\"text-phono-full\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Node feature \"actor\"\n", "\n", "What are the node types that have an *actor* value?" 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'phrase_atom', 'subphrase'}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{P.api.F.otype.v(n) for n in P.api.N.walk() if P.api.F.actor.v(n) is not None}" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'phrase_atom', 'subphrase'}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{N.api.F.otype.v(n) for n in N.api.N.walk() if N.api.F.actor.v(n) is not None}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the frequency lists of *actor*, per node type." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Comparing frequencies on phrase_atoms: 361 OK; 2 discrepancies\n", " 91 94 >JC\n", " 7 9 CNH\n", "\n", "Comparing frequencies on subphrases: 135 OK; 0 discrepancies\n" ] } ], "source": [ "for otype in (\"phrase_atom\", \"subphrase\"):\n", " frequenciesN = N.api.F.actor.freqList(nodeTypes={otype})\n", " frequenciesP = P.api.F.actor.freqList(nodeTypes={otype})\n", " freqDictN = {v: f for (v, f) in frequenciesN}\n", " freqDictP = {v: f for (v, f) in frequenciesP}\n", " goodOnes = []\n", " badOnes = []\n", " for v in sorted(set(freqDictN) | set(freqDictP)):\n", " fN = freqDictN.get(v, 0)\n", " fP = freqDictP.get(v, 0)\n", " if fN == fP:\n", " goodOnes.append(v)\n", " else:\n", " badOnes.append((v, fN, fP))\n", " \n", " print(f\"\\nComparing frequencies on {otype}s: {len(goodOnes)} OK; {len(badOnes)} discrepancies\")\n", " for (v, fN, fP) in badOnes[0:100]:\n", " print(f\"{fN:>3} {fP:>3} {v}\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Closer inspection\n", "\n", "Most actors on phrase atoms carry over well. But e.g. 
`CNH` has discrepancies: it occurs 9 times on phrase atoms in version `c`, but only 7 times in version `2021`.\n", "Let's find out where these discrepancies come from." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.09s 7 results\n", " 0.09s 9 results\n" ] } ], "source": [ "actorCNH = \"\"\"\n", "phrase_atom\n", " actor=CNH\n", "\"\"\"\n", "\n", "resultsN = N.search(actorCNH)\n", "resultsP = P.search(actorCNH)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npphrase_atom
1Leviticus 25:11945873tihyˈeh
2Leviticus 25:12945886yôvˈēl
3Leviticus 25:12945887hˈiw
4Leviticus 25:12945888qˌōḏeš
5Leviticus 25:12945889tihyˈeh
6Leviticus 25:51946353baššānˈîm
7Leviticus 25:52946362baššānˈîm
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npphrase_atom
1Leviticus 25:10945830šānˈā
2Leviticus 25:11945851šānˌā
3Leviticus 25:11945852tihyˈeh
4Leviticus 25:12945865yôvˈēl
5Leviticus 25:12945866hˈiw
6Leviticus 25:12945867qˌōḏeš
7Leviticus 25:12945868tihyˈeh
8Leviticus 25:51946332baššānˈîm
9Leviticus 25:52946341baššānˈîm
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N.table(resultsN)\n", "P.table(resultsP)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, the difference lies in Leviticus 25 verses 10 and 11: in version `c`, phrase atoms 945830 (verse 10) and 945851 (verse 11) carry `actor=CNH`, but the `2021` results contain nothing that corresponds to them.\n", "\n", "We compare verse 10 in both versions.\n", "Here are the original actors in version `c`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

sentence 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
sentence:1181939
phrase:690866
phrase_atom:945827
67332 wᵊ
phrase:690867
phrase_atom:945828
actor=BN JFR>L
phrase:690868
phrase_atom:945829
actor=CNH XMC
subphrase:1318488
subphrase:1318489
actor=XMC
67336 ha
phrase:690869
phrase_atom:945830
actor=CNH
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "P.show(resultsP, start=1, end=1, condensed=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's find the same sentence in version `2021` by following the node mapping `omap@c-2021` from sentence 1181939 of version `c`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((1181957, None),)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sP = 1181939\n", "mappedSb = N.api.Es(\"omap@c-2021\").f(sP)\n", "mappedSb" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
sentence:1181957
phrase:690899
phrase_atom:945850
67333 wᵊ
phrase:690900
phrase_atom:945851
actor=BN JFR>L
phrase:690901
phrase_atom:945852
actor=CNH XMC
subphrase:1318495
subphrase:1318498
subphrase:1318496
actor=XMC
67337 ha
subphrase:1318497
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N.pretty(mappedSb[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aha: in version 2021 there is no phrase atom that corresponds to phrase atom 945830, the one that carried `actor=CNH`.\n", "\n", "This phrase atom has morphed into a subphrase, and hence we lose the connection and, with it, this particular annotation." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Edge feature `coref`\n", "\n", "We also have an edge feature in the module. Let's test that as well.\n", "\n", "First we explore the edge feature a little bit.\n", "From which node types to which node types do its edges go?\n", "\n", "We constrain our displays to phrases from now on." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "N.displaySetup(condenseType=\"phrase\")\n", "P.displaySetup(condenseType=\"phrase\")" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "import collections\n", "\n", "nodeTypes = collections.Counter()\n", "\n", "for (f, ts) in P.api.E.coref.items():\n", "    fromType = P.api.F.otype.v(f)\n", "    for t in ts:\n", "        toType = P.api.F.otype.v(t)\n", "        nodeTypes[(fromType, toType)] += 1" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({('word', 'subphrase'): 471,\n", " ('word', 'phrase_atom'): 20254,\n", " ('word', 'word'): 19884,\n", " ('phrase_atom', 'phrase_atom'): 34404,\n", " ('phrase_atom', 'subphrase'): 1621,\n", " ('phrase_atom', 'word'): 20254,\n", " ('subphrase', 'word'): 471,\n", " ('subphrase', 'subphrase'): 1086,\n", " ('subphrase', 'phrase_atom'): 1621})" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nodeTypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `coref` relation seems to be symmetrical: for every pair of node types, the count in one direction equals the count in the other direction.\n", "So when we check cases, we can skip the reverse pairs."
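, "\n",
 "\n",
 "That impression can also be checked directly, by looking for edges whose reverse edge is missing.\n",
 "What follows is only a sketch in plain Python, not part of the Text-Fabric API:\n",
 "`asymmetricEdges` is a hypothetical helper. It assumes that the edges come as pairs `(f, ts)`\n",
 "of a source node and a collection of target nodes, which is how `P.api.E.coref.items()` is used in the counting cell above.\n",
 "\n",
 "```\n",
 "def asymmetricEdges(edges):\n",
 "    # hypothetical helper: collect every edge f -> t for which there is no edge t -> f\n",
 "    targets = {f: set(ts) for (f, ts) in edges}\n",
 "    return [\n",
 "        (f, t)\n",
 "        for (f, ts) in targets.items()\n",
 "        for t in ts\n",
 "        if f not in targets.get(t, ())\n",
 "    ]\n",
 "\n",
 "\n",
 "# toy data: 1 and 2 point to each other, 3 points to 4 but 4 does not point back\n",
 "toy = {1: (2,), 2: (1,), 3: (4,)}\n",
 "print(asymmetricEdges(toy.items()))  # [(3, 4)]\n",
 "\n",
 "# on the real data (not run here), an empty list would confirm the symmetry:\n",
 "# asymmetricEdges(P.api.E.coref.items())\n",
 "```"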
] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "word - subphrase \n", " 0.09s 471 results\n", " 0.08s 471 results\n", "good: 471\n", "bad : 0\n", "Good:\n", "\tbānˈāʸw => ʔˈel-ʔahᵃrˈōn \n", "\tzivḥêhem => bᵊnˈê yiśrāʔˈēl \n", "\tzivḥêhem => mibbᵊnˈê yiśrāʔˈēl \n", "----------------------------------------\n", "\n", "word - phrase_atom \n", " 0.17s 20188 results\n", " 0.16s 20254 results\n", "good: 3785\n", "bad : 16403\n", "Good:\n", "\tʔᵃlêhˈem => ʔˈel-ʔahᵃrˈōn wᵊʔel-bānˈāʸw wᵊʔˌel kol-bᵊnˈê yiśrāʔˈēl \n", "\thᵉvîʔˌô => šˌôr ʔô-ḵˈeśev ʔô-ʕˌēz \n", "\tʕammˈô . => ʔˌîš ʔîš \n", "Bad:\n", "\tzzarʕˈô => ʔˈîš ʔîš != ʔˈîš 64423 64422 => 944121 944096\n", "\tzzarʕˈô => yittˈēn != ʔîš 64423 64422 => 944127 944097\n", "\tzzarʕˈô => yûmˈāṯ != yittˈēn 64423 64422 => 944131 944103\n", "----------------------------------------\n", "\n", "word - word \n", " 0.22s 19884 results\n", " 0.22s 19884 results\n", "good: 19884\n", "bad : 0\n", "Good:\n", "\tzivḥêhem => zivḥêhˈem \n", "\tzivḥêhem => lāhˌem \n", "\tzivḥêhem => ḏōrōṯˈām . \n", "----------------------------------------\n", "\n", "phrase_atom - phrase_atom \n", " 0.16s 34215 results\n", " 0.16s 34404 results\n", "good: 745\n", "bad : 33470\n", "Good:\n", "\tyᵊḏabbˌēr => [yᵊhwˌāh] \n", "\tyᵊḏabbˌēr => llēʔmˈōr . 
\n", "\tyᵊḏabbˌēr => ṣiwwˌā \n", "Bad:\n", "\tʔˌîš ʔˈîš != ʔˌîš => ʔˌîš ʔˈîš != ʔˈîš 943311 943285 => 943311 943286\n", "\tmibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr != ʔˈîš => mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr != ʔˌîš 943312 943286 => 943292 943285\n", "\tggˈār != mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr => yāḡˈûr != mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr 943314 943287 => 943294 943266\n", "----------------------------------------\n", "\n", "phrase_atom - subphrase \n", " 0.06s 1599 results\n", " 0.07s 1621 results\n", "good: 220\n", "bad : 1379\n", "Good:\n", "\tyᵊḏabbˌēr => [yᵊhwˈāh] \n", "\tyᵊḏabbˌēr => [yᵊhwˈāh] \n", "\t[yᵊhwˌāh] => [yᵊhwˈāh] \n", "Bad:\n", "\tʔˌîš ʔˈîš != ʔˌîš => ʔîš 943311 943285 => 1317262 1317261\n", "\tʔˌîš ʔˈîš != ʔˌîš => ʔˈîš 943311 943285 => 1317334 1317331\n", "\tggˈār != ʔˈîš => min-haggˌēr != ʔîš 943314 943286 => 1317308 1317261\n", "----------------------------------------\n", "\n", "subphrase - subphrase \n", " 0.05s 1086 results\n", " 0.04s 1086 results\n", "good: 1086\n", "bad : 0\n", "Good:\n", "\tbᵊnˈê yiśrāʔˈēl => mibbᵊnˈê yiśrāʔˈēl \n", "\tyiśrāʔˈēl => yiśrāʔˈēl \n", "\tyiśrāʔˈēl => yiśrāʔˈēl \n", "----------------------------------------\n", "\n" ] } ], "source": [ "done = set()\n", "\n", "for (fromType, toType) in nodeTypes:\n", " if (fromType, toType) in done:\n", " continue\n", " done.add((fromType, toType))\n", " done.add((toType, fromType))\n", " print(f\"{fromType:<15} - {toType:<15}\")\n", " template = f\"\"\"\n", "{fromType}\n", "-coref> {toType}\n", "\"\"\"\n", " resultsN = N.search(template)\n", " resultsP = P.search(template)\n", " \n", " goodOnes = []\n", " badOnes = []\n", "\n", " phonoN = lambda n: N.api.T.text(n, fmt=\"text-phono-full\")\n", " phonoP = lambda n: P.api.T.text(n, fmt=\"text-phono-full\")\n", "\n", " for ((fN, tN), (fP, tP)) in zip(resultsN, resultsP):\n", " fNp = phonoN(fN)\n", " fPp = phonoP(fP)\n", " tNp = phonoN(tN)\n", " tPp = phonoP(tP)\n", " if fNp == fPp and tNp == tPp:\n", " goodOnes.append(f\"{fNp} => {tNp}\")\n", " 
else:\n", " fDif = fNp if fNp == fPp else f\"{fNp} != {fPp}\"\n", " tDif = tNp if tNp == tPp else f\"{tNp} != {tPp}\"\n", " badOnes.append((f\"{fDif} => {tDif}\", fN, fP, tN, tP))\n", " print(f\"good: {len(goodOnes):>5}\\nbad : {len(badOnes):>5}\")\n", " if len(goodOnes):\n", " print(\"Good:\")\n", " for rep in goodOnes[0:3]:\n", " print(f\"\\t{rep}\")\n", " if len(badOnes):\n", " print(\"Bad:\")\n", " for (rep, fN, fP, tN, tP) in badOnes[0:3]:\n", " print(f\"\\t{rep} {fN} {fP} => {tN} {tP}\")\n", " print(\"-\" * 40)\n", " print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Observations:\n", "\n", "All `coref` links among words and subphrases (word to word, word to subphrase, subphrase to subphrase) match perfectly.\n", "\n", "But wherever phrase atoms are involved, we get bad ones, and in each of those cases they outnumber the good ones.\n", "Keep in mind that the check pairs the results of both versions by position (with `zip`): once one version has a link that the other lacks, the pairs after that point are shifted, and they tend to be reported as bad even when the links themselves did carry over.\n", "So the bad counts overstate the number of links that were really damaged by the upgrade.\n", "\n", "We inspect a few bad cases." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "##### between words and phrase atoms:\n", "\n", "```\n", "zzarʕˈô => ʔˈîš ʔîš != ʔˈîš 64423 64422 => 944121 944096\n", "```" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "fP = 64422\n", "tP = 944096\n", "pfP = P.api.L.u(fP, otype=\"phrase\")[0]\n", "ptP = P.api.L.u(tP, otype=\"phrase\")[0]\n", "highlightsP = {fP: \"orange\", tP: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "fN = 64423\n", "tN = 944121\n", "pfN = N.api.L.u(fN, otype=\"phrase\")[0]\n", "ptN = N.api.L.u(tN, otype=\"phrase\")[0]\n", "highlightsN = {fN: \"orange\", tN: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:689238
phrase_atom:944104
64421 mi
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase:689232
phrase_atom:944096
64406 ʔˈîš
phrase_atom:944097
64407 ʔîš
phrase_atom:944098
subphrase:1317606
64408 mi
subphrase:1317604
subphrase:1317605
64411 û
subphrase:1317607
64412 min-
64413 ha
64414 ggˈēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# original coref link\n", "P.pretty(pfP, highlights=highlightsP)\n", "if pfP != ptP:\n", " P.pretty(ptP, highlights=highlightsP)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:689271
phrase_atom:944128
64422 mi
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase:689265
phrase_atom:944121
subphrase:1317607
64407 ʔˈîš
subphrase:1317608
64408 ʔîš
phrase_atom:944122
subphrase:1317611
64409 mi
subphrase:1317609
subphrase:1317610
64412 û
subphrase:1317612
64413 min-
64414 ha
64415 ggˈēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mapped `coref` link\n", "N.pretty(pfN, highlights=highlightsN)\n", "if pfN != ptN:\n", "    N.pretty(ptN, highlights=highlightsN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Force majeure! The target phrase atom of the original link has changed. In the new version it is combined with its neighbour,\n", "and the two original phrase atoms are now subphrases of a single phrase atom." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### between phrase atoms:\n", "\n", "```\n", "ʔˌîš ʔˈîš != ʔˌîš => ʔˌîš ʔˈîš != ʔˈîš 943311 943285 => 943311 943286\n", "```" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "fP = 943285\n", "tP = 943286\n", "pfP = P.api.L.u(fP, otype=\"phrase\")[0]\n", "ptP = P.api.L.u(tP, otype=\"phrase\")[0]\n", "highlightsP = {fP: \"orange\", tP: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "fN = 943311\n", "tN = 943311\n", "pfN = N.api.L.u(fN, otype=\"phrase\")[0]\n", "ptN = N.api.L.u(tN, otype=\"phrase\")[0]\n", "highlightsN = {fN: \"orange\", tN: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:688450
phrase_atom:943285
63210 ʔˌîš
phrase_atom:943286
63211 ʔˈîš
phrase_atom:943287
subphrase:1317318
63212 mi
subphrase:1317316
subphrase:1317317
63215 û
subphrase:1317319
63216 min-
63217 ha
63218 ggˌēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "P.pretty(pfP, highlights=highlightsP)\n", "if pfP != ptP:\n", " P.pretty(ptP, highlights=highlightsP)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:688483
phrase_atom:943311
subphrase:1317317
63211 ʔˌîš
subphrase:1317318
63212 ʔˈîš
phrase_atom:943312
subphrase:1317321
63213 mi
subphrase:1317319
subphrase:1317320
63216 û
subphrase:1317322
63217 min-
63218 ha
63219 ggˌēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N.pretty(pfN, highlights=highlightsN)\n", "if pfN != ptN:\n", "    N.pretty(ptN, highlights=highlightsN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same kind of force majeure.\n", "In this case the link connected the two original phrase atoms 943285 and 943286.\n", "In the new version these have merged into the single phrase atom 943311, so the link now runs from that phrase atom to itself:\n", "a `coref` self-link!" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "##### between phrase atoms and subphrases:\n", "\n", "```\n", "ʔˌîš ʔˈîš != ʔˌîš => ʔîš 943311 943285 => 1317262 1317261\n", "```" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "fP = 943285\n", "tP = 1317261\n", "pfP = P.api.L.u(fP, otype=\"phrase\")[0]\n", "ptP = P.api.L.u(tP, otype=\"phrase\")[0]\n", "highlightsP = {fP: \"orange\", tP: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "fN = 943311\n", "tN = 1317262\n", "pfN = N.api.L.u(fN, otype=\"phrase\")[0]\n", "ptN = N.api.L.u(tN, otype=\"phrase\")[0]\n", "highlightsN = {fN: \"orange\", tN: \"cyan\"}" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:688450
phrase_atom:943285
63210 ʔˌîš
phrase_atom:943286
63211 ʔˈîš
phrase_atom:943287
subphrase:1317318
63212 mi
subphrase:1317316
subphrase:1317317
63215 û
subphrase:1317319
63216 min-
63217 ha
63218 ggˌēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase:688362
phrase_atom:943190
subphrase:1317260
63037 ʔˌîš
subphrase:1317261
63038 ʔîš
phrase_atom:943191
63039 mi
subphrase:1317262
subphrase:1317263
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# original `coref` link\n", "P.pretty(pfP, highlights=highlightsP)\n", "if pfP != ptP:\n", " P.pretty(ptP, highlights=highlightsP)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
phrase:688483
phrase_atom:943311
subphrase:1317317
63211 ʔˌîš
subphrase:1317318
63212 ʔˈîš
phrase_atom:943312
subphrase:1317321
63213 mi
subphrase:1317319
subphrase:1317320
63216 û
subphrase:1317322
63217 min-
63218 ha
63219 ggˌēr
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase:688395
phrase_atom:943216
subphrase:1317261
63038 ʔˌîš
subphrase:1317262
63039 ʔîš
phrase_atom:943217
63040 mi
subphrase:1317263
subphrase:1317264
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mapped `coref` link\n", "N.pretty(pfN, highlights=highlightsN)\n", "if pfN != ptN:\n", " N.pretty(ptN, highlights=highlightsN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same kind of force majeure. \n", "\n", "Clearly, there is a massive reorganization of phrase atoms in version `2021` as compared to version `c`." ] }, { "cell_type": "markdown", "metadata": { "incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true", "tags": [] }, "source": [ "## Conclusion\n", "\n", "It is great to be able to upgrade features from the version against which they were created to a newer\n", "version.\n", "But the corpus may have changed in unforeseen ways, and not every node in the old corpus can necessarily\n", "be matched with a unique node in the new corpus.\n", "If there are annotations on such nodes, they either do not carry over to the new version, or they may carry\n", "over to unintended extra nodes in the new version.\n", "\n", "We saw a lot of \"bad\" cases. Yet none of these discrepancies is really that bad:\n", "in each case the mapping picked the node in the new version that corresponds most closely to the original node in the old version.\n", "\n", "There are ways to detect such discrepancies, and the node mapping already contains relevant information about the quality of the mapping.\n", "In fact, Text-Fabric's `migrateFeatures` uses this quality information when it assigns feature values to nodes.\n", "\n", "But nothing beats generating the features against the new version with the same code that generated them against\n", "the old version.\n", "If there are issues due to important version differences, the author of the generated features knows best\n", "how to handle them."
] }, { "cell_type": "markdown", "metadata": { "incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true", "tags": [] }, "source": [ "# All steps\n", "\n", "* **[start](start.ipynb)** your first step in mastering the Bible computationally\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n", "* **[export Excel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[export](export.ipynb)** export your dataset as an Emdros database\n", "* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features\n", "* **map** map somebody else's annotations to a new version of the corpus\n", "* **[volumes](volumes.ipynb)** work with selected books only\n", "* **[trees](trees.ipynb)** work with the BHSA data as syntax trees\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }