{ "cells": [ { "cell_type": "markdown", "id": "exterior-circulation", "metadata": {}, "source": [ "You might want to begin at the [start](search.ipynb) of this tutorial.\n", "\n", "Short introductions to other TF datasets:\n", "\n", "* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)\n", "* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)\n", "* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)\n" ] }, { "cell_type": "markdown", "id": "7e35d0a2-6d43-464c-a214-36b7908cb89d", "metadata": { "tags": [] }, "source": [ "# Volume support\n", "\n", "Text-Fabric 9.0.0 introduced volume support.\n", "Read the\n", "[documentation on volumes](https://annotation.github.io/text-fabric/tf/about/volumes.html)\n", "to learn what volumes are and why you might want them.\n", "\n", "In this tutorial we show the practical side:\n", "how to *extract volumes* from works and *collect* several *volumes* into *collections*." 
] }, { "cell_type": "code", "execution_count": 1, "id": "3f0a5ad5-9ce3-48d5-8fe0-4437d7318bcb", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "bf56aa8a-3898-415e-8552-a07f426a305d", "metadata": {}, "outputs": [], "source": [ "import os\n", "from tf.app import use\n", "from tf.fabric import Fabric\n", "from tf.volumes import extract, collect\n", "from tf.core.files import unexpanduser as ux" ] }, { "cell_type": "code", "execution_count": 3, "id": "3f653f0b-f22c-4db2-8d5a-999382327ebc", "metadata": {}, "outputs": [], "source": [ "GH = os.path.expanduser(\"~/github\")\n", "BH = f\"{GH}/ETCBC/bhsa\"\n", "VERSION = \"2021\"\n", "SOURCE = f\"{BH}/tf/{VERSION}\"\n", "TARGET = f\"{BH}/tf/{VERSION}/_local\"" ] }, { "cell_type": "markdown", "id": "d723f88c-a459-42d2-9dc7-32a047ccabac", "metadata": {}, "source": [ "# Work and volumes\n", "\n", "We use the Hebrew Bible as the *work*.\n", "A *volume* of a work is a list of its top-level sections.\n", "Volumes may have a name.\n", "\n", "We define three volumes out of the smallest books of the Bible:" ] }, { "cell_type": "code", "execution_count": 4, "id": "bb1a50a9-8f98-4dcc-9221-45a75e89d380", "metadata": {}, "outputs": [], "source": [ "VOLUMES = dict(\n", "    tiny=(\"Obadiah\", \"Nahum\", \"Haggai\", \"Habakkuk\", \"Jonah\", \"Micah\"),\n", "    small=(\"Malachi\", \"Joel\"),\n", "    medium=(\"Ezra\",),\n", ")\n", "COLLECTION = \"prophets\"" ] }, { "cell_type": "markdown", "id": "b812a3f4-312d-42d6-937f-5145ea5a84f6", "metadata": {}, "source": [ "# Volume support\n", "\n", "We can extract and collect volumes through several TF APIs:\n", "\n", "* the usual, high-level API using `A = use(work)`;\n", "* the basic, low-level API using `TF = Fabric(locations, modules)`;\n", "* the plain functions `extract()` and `collect()`.\n", "\n", "We show each of these ways." 
] }, { "cell_type": "markdown", "id": "5acc9632-aad0-40d8-bab3-77d27acf3526", "metadata": {}, "source": [ "# High-level API `A = use()`\n", "\n", "If we load the BHSA with the advanced API, like `A = use(\"ETCBC/bhsa\", ...)`, we also get some standard modules,\n", "such as `phono` for phonological transcription and `parallels` for cross-references between similar passages.\n", "\n", "We will see that when we split the BHSA into volumes, the features of these modules end up in the volumes as well." ] }, { "cell_type": "markdown", "id": "d1bf46a9-97e3-47fe-8531-727cb7df3581", "metadata": { "tags": [] }, "source": [ "## Load the work\n", "\n", "We load the BHSA in the advanced way:" ] }, { "cell_type": "code", "execution_count": 5, "id": "8b524c1a-33f8-476b-bbf2-2336b6cabac1", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book3910938.21100
chapter929459.19100
lex923046.22100
verse2321318.38100
half_verse451799.44100
sentence637176.70100
sentence_atom645146.61100
clause881314.84100
clause_atom907044.70100
phrase2532031.68100
phrase_atom2675321.59100
subphrase1138501.4238
word4265901.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/github/ETCBC/bhsa/app
  4. commit: no value
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: clone
  11. localDir: /Users/me/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: no value
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Aw = use(\"ETCBC/bhsa:clone\", checkout=\"clone\")" ] }, { "cell_type": "markdown", "id": "d53d2925-7d20-4b38-9bd7-b9960d16258f", "metadata": {}, "source": [ "We check that the features of interest are loaded:" ] }, { "cell_type": "code", "execution_count": 6, "id": "f45fdc47-4304-4b4d-a49f-a2c8f2cf5eeb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "crossref edge (int) 🆗 links between similar passages\n", "lex node (str) ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "phono node (str) 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n" ] } ], "source": [ "Aw.isLoaded(features=\"lex phono crossref\")" ] }, { "cell_type": "markdown", "id": "d9491683-5ea6-40b7-b290-6180ecc0a7b9", "metadata": {}, "source": [ "We can now extract volumes by using the `extract()` method on the `app` object,\n", "which is held in the variable `Aw`.\n", "\n", "Note: we are going to load several volumes and collections too, so instead of storing the\n", "handle to the API in a variable with the name `A`, we choose one with the name `Aw`." 
] }, { "cell_type": "markdown", "id": "e962a2b4-7e27-466e-9c85-8a6d6638f9ed", "metadata": {}, "source": [ "## Extract volumes" ] }, { "cell_type": "code", "execution_count": 7, "id": "5a50a27c-4101-4475-bc52-38e5fb29a006", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Check volumes ...\n", " | Volume tiny exists and will be recreated\n", " | Volume small exists and will be recreated\n", " | Volume medium exists and will be recreated\n", " | Work consists of 39 books:\n", " | book Genesis : with 28764 slots\n", " | book Exodus : with 23748 slots\n", " | book Leviticus : with 17099 slots\n", " | book Numbers : with 23188 slots\n", " | book Deuteronomy : with 20128 slots\n", " | book Joshua : with 14526 slots\n", " | book Judges : with 14086 slots\n", " | book 1_Samuel : with 18929 slots\n", " | book 2_Samuel : with 15612 slots\n", " | book 1_Kings : with 18685 slots\n", " | book 2_Kings : with 17307 slots\n", " | book Isaiah : with 22931 slots\n", " | book Jeremiah : with 29736 slots\n", " | book Ezekiel : with 26182 slots\n", " | book Hosea : with 3146 slots\n", " | book Joel : with 1318 slots\n", " | book Amos : with 2780 slots\n", " | book Obadiah : with 392 slots\n", " | book Jonah : with 985 slots\n", " | book Micah : with 1895 slots\n", " | book Nahum : with 746 slots\n", " | book Habakkuk : with 897 slots\n", " | book Zephaniah : with 1037 slots\n", " | book Haggai : with 877 slots\n", " | book Zechariah : with 4471 slots\n", " | book Malachi : with 1187 slots\n", " | book Psalms : with 25372 slots\n", " | book Job : with 10912 slots\n", " | book Proverbs : with 8859 slots\n", " | book Ruth : with 1802 slots\n", " | book Song_of_songs : with 1682 slots\n", " | book Ecclesiastes : with 4233 slots\n", " | book Lamentations : with 1945 slots\n", " | book Esther : with 4621 slots\n", " | book Daniel : with 8072 slots\n", " | book Ezra : with 5268 slots\n", " | book Nehemiah : with 7842 slots\n", " | book 1_Chronicles : 
with 15566 slots\n", " | book 2_Chronicles : with 19764 slots\n", " 0.09s volumes ok\n", " 0.09s Distribute nodes over volumes ...\n", " | 0.00s volume tiny ...\n", " | | 0.00s book Obadiah with 392 slots\n", " | | 0.00s book Nahum with 746 slots\n", " | | 0.00s book Haggai with 877 slots\n", " | | 0.00s book Habakkuk with 897 slots\n", " | | 0.00s book Jonah with 985 slots\n", " | | 0.00s book Micah with 1895 slots\n", " | 0.01s volume tiny with 5792 slots and 21779 nodes ...\n", " | 0.01s volume small ...\n", " | | 0.00s book Malachi with 1187 slots\n", " | | 0.00s book Joel with 1318 slots\n", " | 0.01s volume small with 2505 slots and 9495 nodes ...\n", " | 0.01s volume medium ...\n", " | | 0.00s book Ezra with 5268 slots\n", " | 0.02s volume medium with 5268 slots and 17286 nodes ...\n", " 0.11s distribution done\n", " 0.11s Remap features ...\n", " | 0.00s volume tiny with 21779 nodes ...\n", " | 0.17s volume small with 9495 nodes ...\n", " | 0.24s volume medium with 17286 nodes ...\n", " 0.45s remapping done\n", " 0.45s Write volumes as TF datasets\n", " | 0.00s Writing volume tiny\n", " | 0.14s Writing volume small\n", " | 0.20s Writing volume medium\n", " 0.77s writing done\n", " 0.77s All done\n" ] } ], "source": [ "volumes = Aw.extract(VOLUMES, overwrite=True)" ] }, { "cell_type": "markdown", "id": "12d6f4f2-91ca-48c5-91cb-b448503948e0", "metadata": {}, "source": [ "## Inspect the volumes\n", "\n", "The `extract()` method returns basic information about the volumes:\n", "their location on disk and whether they were newly created." 
] }, { "cell_type": "code", "execution_count": 8, "id": "22fb91db-80c4-4e59-b7a8-d57d8b88f48d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "volume medium : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/medium\n", "volume small : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/small\n", "volume tiny : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/tiny\n" ] } ], "source": [ "if volumes:\n", "    for (name, info) in volumes.items():\n", "        loc = info[\"location\"]\n", "        new = \"(new) \" if info[\"new\"] else \"(existing)\"\n", "        print(f\"volume {name:<7}: {new} at {ux(loc)}\")\n", "else:\n", "    print(volumes)" ] }, { "cell_type": "markdown", "id": "beb412a5-a72f-47d2-88ed-3b8d5b1c3395", "metadata": { "tags": [] }, "source": [ "## Load single volumes\n", "\n", "We load the volumes separately.\n", "For each volume we get a handle, which we store in a dictionary `As`, keyed by its name." ] }, { "cell_type": "code", "execution_count": 10, "id": "1c2dc49f-db40-485d-90e2-e15410b50e29", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021 volume medium:Ezra, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book15268.00100
chapter10526.80100
verse28018.81100
sentence49110.73100
half_verse49210.71100
sentence_atom50610.41100
clause8246.39100
clause_atom8706.06100
lex9915.32100
phrase23852.21100
phrase_atom27301.93100
subphrase24381.4065
word52681.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerfrom\n", "
\n", "
str
\n", "\n", " all outgoing inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerto\n", "
\n", "
str
\n", "\n", " all incoming inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "owork\n", "
\n", "
int
\n", "\n", " mapping from nodes in the volume to nodes in the work\n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/github/ETCBC/bhsa/app
  4. commit: no value
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: clone
  11. localDir: /Users/me/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: no value
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021 volume small:Malachi-Joel, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book21252.50100
chapter7357.86100
verse12819.57100
half_verse2539.90100
sentence4505.57100
sentence_atom4615.43100
clause5824.30100
lex5874.27100
clause_atom6004.17100
phrase16411.53100
phrase_atom16811.49100
subphrase5981.3632
word25051.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerfrom\n", "
\n", "
str
\n", "\n", " all outgoing inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerto\n", "
\n", "
str
\n", "\n", " all incoming inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "owork\n", "
\n", "
int
\n", "\n", " mapping from nodes in the volume to nodes in the work\n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/github/ETCBC/bhsa/app
  4. commit: no value
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: clone
  11. localDir: /Users/me/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: no value
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021 volume tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book6965.33100
chapter20289.60100
verse31518.39100
half_verse6239.30100
sentence10325.61100
sentence_atom10465.54100
lex11734.94100
clause13994.14100
clause_atom14264.06100
phrase37741.53100
phrase_atom39111.48100
subphrase12621.3028
word57921.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerfrom\n", "
\n", "
str
\n", "\n", " all outgoing inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "ointerto\n", "
\n", "
str
\n", "\n", " all incoming inter-volume edges\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "owork\n", "
\n", "
int
\n", "\n", " mapping from nodes in the volume to nodes in the work\n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/github/ETCBC/bhsa/app
  4. commit: no value
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: clone
  11. localDir: /Users/me/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: no value
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "As = {}\n", "\n", "for name in volumes:\n", " As[name] = use(\"ETCBC/bhsa:clone\", checkout=\"clone\", version=\"2021\", volume=name)" ] }, { "cell_type": "markdown", "id": "b252e906-2e03-4fef-8a84-b6bf4c60636f", "metadata": {}, "source": [ "We see it reported that single volumes have been loaded instead of the whole work.\n", "\n", "The volume info can be obtained separately by reading the attribute `volumeInfo`,\n", "either on the `A` or on the `TF` object:" ] }, { "cell_type": "code", "execution_count": 11, "id": "59d693ed-deaf-43ac-9c2a-38dfb476e88c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "medium:Ezra\n", "small:Malachi-Joel\n", "tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah\n" ] } ], "source": [ "for name in volumes:\n", " print(As[name].volumeInfo)" ] }, { "cell_type": "markdown", "id": "6bd5ab65-fabc-4f7f-a1cb-1e515c0d2c72", "metadata": {}, "source": [ "## Generated features\n", "\n", "When volumes are created, some extra features are generated, which have to do with the relation\n", "between the original work and the volume, and what happens at the boundaries of volumes." 
] }, { "cell_type": "code", "execution_count": 12, "id": "2af3be55-4bd3-4216-819e-ecc8f12650cc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "medium\n", "\towork: mapping from nodes in the volume to nodes in the work\n", "\tointerfrom: all outgoing inter-volume edges\n", "\tointerto: all incoming inter-volume edges\n", "small\n", "\towork: mapping from nodes in the volume to nodes in the work\n", "\tointerfrom: all outgoing inter-volume edges\n", "\tointerto: all incoming inter-volume edges\n", "tiny\n", "\towork: mapping from nodes in the volume to nodes in the work\n", "\tointerfrom: all outgoing inter-volume edges\n", "\tointerto: all incoming inter-volume edges\n" ] } ], "source": [ "for name in volumes:\n", "    print(name)\n", "    for (feat, info) in As[name].isLoaded(\"owork ointerfrom ointerto\", pretty=False).items():\n", "        print(f\"\\t{feat}: {info['meta']['description']}\")" ] }, { "cell_type": "markdown", "id": "7d2587f5-8cfa-4036-b96d-4d5d149acf44", "metadata": { "incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true", "tags": [] }, "source": [ "### `owork`\n", "\n", "Note that each volume has an extra feature: `owork`. Its value for each node in a volume dataset\n", "is the corresponding node in the *original work* from which the volume is taken.\n", "\n", "If you use the volume to compute annotations,\n", "and you want to publish these annotations against the original work,\n", "the feature `owork` provides the necessary information to do so.\n", "\n", "Suppose `annotvx` is a dict, mapping some nodes in the volume `x` to interesting values,\n", "then you can map them onto the original work as follows:\n", "\n", "``` python\n", "\n", "{F.owork.v(n): value for (n, value) in annotvx.items()}\n", "```\n", "\n", "There is another important function of `owork`: when collecting volumes, we may encounter nodes in the volumes\n", "that come from a single node in the work. 
We want to *merge* these nodes in the collected work.\n", "The information in `owork` provides the necessary information for that." ] }, { "cell_type": "markdown", "id": "3bb04dbb-7bee-4510-bd42-ded39c2783d7", "metadata": {}, "source": [ "### `ointerto`, `ointerfrom`\n", "\n", "Note that we do have features `ointerto` and `ointerfrom`.\n", "\n", "We'll come back to them later." ] }, { "cell_type": "markdown", "id": "a169bb41-fb51-487b-a080-0ecbb5dacdc6", "metadata": {}, "source": [ "## Make collections of volumes\n", "\n", "We can collect volumes into new works by means of the `collect()` method on `Aw`.\n", "Let's collect all volumes just created." ] }, { "cell_type": "code", "execution_count": 13, "id": "aaaba04e-3ab1-4ff1-a69d-f6ea9ac52597", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection prophets exists and will be recreated\n", " 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ...\n", " 0.03s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed\n", " 0.05s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ...\n", " 0.02s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed\n", " 0.08s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ...\n", " 0.04s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed\n", " 0.14s inspect metadata ...\n", " 0.14s metadata sorted out\n", " 0.14s check nodetypes ...\n", " | volume medium\n", " | volume small\n", " | volume tiny\n", " 0.14s node types ok\n", " 0.14s Collect nodes from volumes ...\n", " | 0.00s Check against overlapping slots ...\n", " | | medium : 5268 slots\n", " | | small : 2505 slots\n", " | | tiny : 5792 slots\n", " | 0.01s no overlap\n", " | 0.01s Group non-slot nodes by type\n", " | | medium : 5269- 17286\n", " | | small : 2506- 9495\n", " | | tiny : 5793- 21779\n", " | 0.01s Mapping nodes from volume to/from work ...\n", " | | book : 13566 - 13574\n", " | | 
chapter : 13575 - 13611\n", " | | clause : 13612 - 16416\n", " | | clause_atom : 16417 - 19312\n", " | | half_verse : 19313 - 20680\n", " | | phrase : 20681 - 28480\n", " | | phrase_atom : 28481 - 36802\n", " | | sentence : 36803 - 38775\n", " | | sentence_atom : 38776 - 40788\n", " | | subphrase : 40789 - 45086\n", " | | verse : 45087 - 45809\n", " | | lex : 45810 - 47884\n", " | 0.02s The new work has 47884 nodes of which 13565 slots\n", " 0.17s collection done\n", " 0.17s remap features ...\n", " 0.42s remapping done\n", " 0.42s write work as TF data set\n", " 0.72s writing done\n", " 0.72s done\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Aw.collect(\n", " tuple(volumes),\n", " COLLECTION,\n", " overwrite=True,\n", ")" ] }, { "cell_type": "markdown", "id": "1d277f74-b798-4f07-8b79-1a87c7011d27", "metadata": {}, "source": [ "## Load collection\n", "\n", "We can load the collection in the same way as a volume, but now using `collection=`:" ] }, { "cell_type": "code", "execution_count": 14, "id": "918191c1-2563-43ad-9a4b-b40937602d4f", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " | 0.03s T otype from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", 
" | 0.30s T oslots from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@ar from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@he from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T qere_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T qere from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T chapter from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T phono from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T g_word from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@ur from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@yo from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@pt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T verse from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@en from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@am from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@tr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T g_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@da from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T phono_trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@el from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T voc_lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T qere_trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T qere_trailer_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@bn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@hi from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@ru from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T 
lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@de from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@fa from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T g_word_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@ja from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T g_lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@nl from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@id from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@syc from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T g_cons_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@fr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@pa from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@es from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T g_cons from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@sw from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T trailer_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@zh from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@ko from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T book@la from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | | 0.01s C __levels__ from otype, oslots, otext\n", " | | 0.23s C __order__ from otype, oslots, __levels__\n", " | | 0.01s C __rank__ from otype, __order__\n", " | | 0.50s C __levUp__ from otype, oslots, __rank__\n", " | | 0.32s C __levDown__ from otype, __levUp__, __rank__\n", " | | 0.04s C __characters__ from otext\n", " | | 0.10s C __boundary__ from otype, oslots, __rank__\n", " | | 0.00s C __sections__ from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse\n", " | 0.01s T code from 
~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T crossref from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T det from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.01s T domain from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T freq_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.02s T function from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T gloss from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T gn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.01s T label from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T language from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T ls from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T mother from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.00s T nametype from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T nme from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T nu from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.07s T number from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.13s T ovolume from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.09s T owork from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.01s T pargr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T pdp from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T pfm from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T prs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T prs_gn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T prs_nu from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T prs_ps from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T ps from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T rank_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.05s T rela from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T sp from 
~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T st from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.01s T tab from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.01s T txt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.05s T typ from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T uvf from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T vbe from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T vbs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.04s T voc_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T vs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n", " | 0.03s T vt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets\n" ] }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021 collection prophets:medium,small,tiny, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book91507.22100
chapter37366.62100
verse72318.76100
half_verse13689.92100
sentence19736.88100
sentence_atom20136.74100
clause28054.84100
clause_atom28964.68100
lex20753.3852
phrase78001.74100
phrase_atom83221.63100
subphrase42981.3643
word135651.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "ovolume\n", "
\n", "
str
\n", "\n", " mapping from a node in the work to the volume it comes from and its corresponding node there\n", "\n", "
\n", "\n", "
\n", "
\n", "owork\n", "
\n", "
int
\n", "\n", " mapping from nodes in the volume to nodes in the work\n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/github/ETCBC/bhsa/app
  4. commit: no value
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: clone
  11. localDir: /Users/me/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: no value
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Ac = use(\"ETCBC/bhsa:clone\", checkout=\"clone\", version=\"2021\", collection=COLLECTION)" ] }, { "cell_type": "markdown", "id": "c020f6d5-4041-4627-a7bb-d17ed42019c7", "metadata": {}, "source": [ "Which books have we got?" ] }, { "cell_type": "code", "execution_count": 15, "id": "abca1543-4af7-4c5b-bbbd-66169be6ff18", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ezra\n", "Malachi\n", "Joel\n", "Obadiah\n", "Nahum\n", "Haggai\n", "Habakkuk\n", "Jonah\n", "Micah\n" ] } ], "source": [ "Fc = Ac.api.F\n", "Tc = Ac.api.T\n", "\n", "for b in Fc.otype.s(\"book\"):\n", " print(Tc.sectionFromNode(b)[0])" ] }, { "cell_type": "code", "execution_count": 16, "id": "9ab45623-5739-43b5-81d1-57c3f4d5ed58", "metadata": {}, "outputs": [ { "data": { "text/plain} lexeme nodes\")\n", "print(f\"{'Total':<10} {total:>5} lexeme nodes\")" ] }, { "cell_type": "markdown", "id": "76b37442-4a4c-419f-b103-6517dfd91230", "metadata": {}, "source": [ "Now let's count the lexemes in the new collection." 
] }, { "cell_type": "code", "execution_count": 20, "id": "19050ee0-78eb-43fd-8002-1b1a25fe544e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2075" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lexNodesCollection = Fc.otype.s(\"lex\")\n", "len(lexNodesCollection)" ] }, { "cell_type": "markdown", "id": "d37c95d8-571a-439f-94ca-99c499cb57f6", "metadata": {}, "source": [ "Exactly the same number as in the original work.\n", "\n", "Let's make absolutely sure that we have the same lexeme set:" ] }, { "cell_type": "code", "execution_count": 21, "id": "b1211b23-8c6a-4fad-b947-900bb59902ec", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lexNodesWork == lexNodesCollection" ] }, { "cell_type": "markdown", "id": "a065160b-fc08-4f6f-877c-570f38706ff5", "metadata": {}, "source": [ "The comparison yields `False`, which is expected: the node numbers in the original work are almost guaranteed to differ from the node numbers in the collection.\n", "\n", "But the information attached to the nodes in the collection should be identical to the information attached to the\n", "corresponding nodes in the work." 
] }, { "cell_type": "code", "execution_count": 22, "id": "25c3e8d5-98a3-43e3-9809-5ae91c120403", "metadata": {}, "outputs": [], "source": [ "lexemesWork = {Fw.lex.v(n) for n in lexNodesWork}\n", "lexemesCollection = {Fc.lex.v(n) for n in lexNodesCollection}" ] }, { "cell_type": "code", "execution_count": 23, "id": "5f618657-05b9-4577-a839-2d2cd355936b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lexemesWork == lexemesCollection" ] }, { "cell_type": "markdown", "id": "ef3424a9-e4ae-449a-910a-49e320557292", "metadata": {}, "source": [ "Another way of verifying this is to map the lexeme nodes of the collection back to those of the work\n", "and see whether they are equal sets." ] }, { "cell_type": "code", "execution_count": 24, "id": "3ac6d52b-7228-4d4a-bdfd-043415986b70", "metadata": {}, "outputs": [], "source": [ "lexNodesCollectionToWork = {Fc.owork.v(n) for n in lexNodesCollection}" ] }, { "cell_type": "code", "execution_count": 25, "id": "0fedeace-212c-45f4-81ca-c2cfed0a3fee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lexNodesWork == lexNodesCollectionToWork" ] }, { "cell_type": "markdown", "id": "7d288344-e135-4803-b3f9-fa620d9f6b2c", "metadata": {}, "source": [ "## Check: `crossrefs`\n", "\n", "The edge feature `crossref` has inter-volume edges.\n", "\n", "We explore the situation in the original work, inside the volumes, and in the new collection.\n", "\n", "We count the incoming and outgoing edges w.r.t. the nodes in the relevant material.\n", "\n", "`crossref` edges are between verses, so we first collect all relevant verses in the original work.\n", "\n", "We want the verses in all the books of all the volumes, and we want those verses per volume." 
] }, { "cell_type": "code", "execution_count": 26, "id": "795869b2-ba13-4b11-a2b4-3cfc6431af4c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'all': {'Ezra',\n", " 'Habakkuk',\n", " 'Haggai',\n", " 'Joel',\n", " 'Jonah',\n", " 'Malachi',\n", " 'Micah',\n", " 'Nahum',\n", " 'Obadiah'},\n", " 'tiny': {'Habakkuk', 'Haggai', 'Jonah', 'Micah', 'Nahum', 'Obadiah'},\n", " 'small': {'Joel', 'Malachi'},\n", " 'medium': {'Ezra'}}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "books = dict(all=set())\n", "for (name, parts) in VOLUMES.items():\n", "    partsSet = set(parts)\n", "    books[name] = partsSet\n", "    books[\"all\"] |= partsSet\n", "books" ] }, { "cell_type": "code", "execution_count": 27, "id": "5a53a146-8c43-4d2d-b97b-0a10e436c0bb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "all 723 verses\n", "tiny 315 verses\n", "small 128 verses\n", "medium 280 verses\n" ] } ], "source": [ "verseNodesWork = {}\n", "\n", "for (name, heads) in books.items():\n", "    for b in Fw.otype.s(\"book\"):\n", "        if Tw.sectionFromNode(b)[0] not in heads:\n", "            continue\n", "        for vs in Lw.d(b, otype=\"verse\"):\n", "            verseNodesWork.setdefault(name, set()).add(vs)\n", "\n", "for (name, verses) in verseNodesWork.items():\n", "    print(f\"{name:<10} {len(verses):>3} verses\")" ] }, { "cell_type": "markdown", "id": "9d90c888-f1ad-4313-a781-5c1212a75eb3", "metadata": {}, "source": [ "### Compute edges from the work data\n", "\n", "Now we determine the number of incoming and outgoing edges w.r.t. these portions,\n", "and we split them into *inter*-portion and *intra*-portion edges." 
] }, { "cell_type": "code", "execution_count": 28, "id": "4b41a591-9358-45dc-9c64-8c1c50048485", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "all : total: incoming: 400 outgoing: 400\n", "all : intra: incoming: 64 outgoing: 64\n", "all : inter: incoming: 336 outgoing: 336\n", "tiny : total: incoming: 245 outgoing: 245\n", "tiny : intra: incoming: 8 outgoing: 8\n", "tiny : inter: incoming: 237 outgoing: 237\n", "small : total: incoming: 3 outgoing: 3\n", "small : intra: incoming: 0 outgoing: 0\n", "small : inter: incoming: 3 outgoing: 3\n", "medium : total: incoming: 152 outgoing: 152\n", "medium : intra: incoming: 56 outgoing: 56\n", "medium : inter: incoming: 96 outgoing: 96\n" ] } ], "source": [ "Ew = Aw.api.E\n", "\n", "incomingWorkTotal = {}\n", "incomingWorkIntra = {}\n", "incomingWorkInter = {}\n", "outgoingWorkTotal = {}\n", "outgoingWorkIntra = {}\n", "outgoingWorkInter = {}\n", "\n", "for (name, verses) in verseNodesWork.items():\n", " inct = set()\n", " inca = set()\n", " incr = set()\n", " ougt = set()\n", " ouga = set()\n", " ougr = set()\n", "\n", " for vs in verses:\n", " wvs = Ew.crossref.t(vs)\n", " if wvs:\n", " for wv in wvs:\n", " ws = wv[0]\n", " inct.add((ws, vs))\n", " if ws in verses:\n", " inca.add((ws, vs))\n", " else:\n", " incr.add((ws, vs))\n", "\n", " wvs = Ew.crossref.f(vs)\n", " if wvs:\n", " for wv in wvs:\n", " ws = wv[0]\n", " ougt.add((vs, ws))\n", " if ws in verses:\n", " ouga.add((vs, ws))\n", " else:\n", " ougr.add((vs, ws))\n", " incomingWorkTotal[name] = inct\n", " incomingWorkIntra[name] = inca\n", " incomingWorkInter[name] = incr\n", " outgoingWorkTotal[name] = ougt\n", " outgoingWorkIntra[name] = ouga\n", " outgoingWorkInter[name] = ougr\n", "\n", "for name in verseNodesWork:\n", " print(f\"{name:<10}: total: incoming: {len(incomingWorkTotal[name]):>3} outgoing: {len(outgoingWorkTotal[name]):>3}\")\n", " print(f\"{name:<10}: intra: incoming: {len(incomingWorkIntra[name]):>3} 
outgoing: {len(outgoingWorkIntra[name]):>3}\")\n", " print(f\"{name:<10}: inter: incoming: {len(incomingWorkInter[name]):>3} outgoing: {len(outgoingWorkInter[name]):>3}\")" ] }, { "cell_type": "markdown", "id": "61c2378a-d528-4360-9af8-834ff7708c46", "metadata": {}, "source": [ "Ah, the `crossref` edges are symmetric, so there are as many incoming as outgoing edges." ] }, { "cell_type": "markdown", "id": "3d1181f4-7363-486b-9950-05e00c3d7fd8", "metadata": {}, "source": [ "### Compute edges from the volume data\n", "\n", "Inside a volume we only see the intra edges; they should coincide with the `incomingWorkIntra[volume]` edges of the work.\n", "\n", "First we gather the edges of each volume:" ] }, { "cell_type": "code", "execution_count": 29, "id": "47d005ee-0fbd-4685-92f3-ac3693302b35", "metadata": {}, "outputs": [], "source": [ "incomingVolumeTotal = {}\n", "outgoingVolumeTotal = {}\n", "\n", "for name in volumes:\n", "    Av = As[name]\n", "    Fv = Av.api.F\n", "    Ev = Av.api.E\n", "\n", "    verses = Fv.otype.s(\"verse\")\n", "    inct = set()\n", "    ougt = set()\n", "\n", "    for vs in verses:\n", "        wvs = Ev.crossref.t(vs)\n", "        if wvs:\n", "            for wv in wvs:\n", "                ws = wv[0]\n", "                inct.add((ws, vs))\n", "\n", "        wvs = Ev.crossref.f(vs)\n", "        if wvs:\n", "            for wv in wvs:\n", "                ws = wv[0]\n", "                ougt.add((vs, ws))\n", "    incomingVolumeTotal[name] = inct\n", "    outgoingVolumeTotal[name] = ougt" ] }, { "cell_type": "markdown", "id": "f05b3ec0-0d4e-4b2e-a6f1-853c19ae8c89", "metadata": {}, "source": [ "We have gathered the data.\n", "\n", "Now we make the comparisons: first we compare the number of edges, and then the identity of the edges, modulo the mapping from volume nodes to work nodes." ] }, { "cell_type": "code", "execution_count": 30, "id": "13bf5377-c622-47a9-afec-9a434ef8d8b3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "medium : total: incoming: 56 outgoing: 56\n", "equal amount of incoming inter-edges as in work? True\n", "equal amount of outgoing inter-edges as in work? True\n", "same incoming inter-edges as in work? 
True\n", "same outgoing inter-edges as in work? True\n", "small : total: incoming: 0 outgoing: 0\n", "equal amount of incoming inter-edges as in work? True\n", "equal amount of outgoing inter-edges as in work? True\n", "same incoming inter-edges as in work? True\n", "same outgoing inter-edges as in work? True\n", "tiny : total: incoming: 8 outgoing: 8\n", "equal amount of incoming inter-edges as in work? True\n", "equal amount of outgoing inter-edges as in work? True\n", "same incoming inter-edges as in work? True\n", "same outgoing inter-edges as in work? True\n" ] } ], "source": [ "for name in volumes:\n", " Av = As[name]\n", " Fv = Av.api.F\n", "\n", " inVolTotal = incomingVolumeTotal[name]\n", " outVolTotal = outgoingVolumeTotal[name]\n", " inWorkIntra = incomingWorkIntra[name]\n", " outWorkIntra = outgoingWorkIntra[name]\n", "\n", " print(f\"{name:<10}: total: incoming: {len(inVolTotal):>3} outgoing: {len(outVolTotal):>3}\")\n", " eqamountIncoming = len(inWorkIntra) == len(inVolTotal)\n", " eqamountOutgoing = len(outWorkIntra) == len(outVolTotal)\n", " print(f\"equal amount of incoming inter-edges as in work? {eqamountIncoming}\")\n", " print(f\"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}\")\n", " inVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in inVolTotal}\n", " outVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in outVolTotal}\n", " sameIncoming = inWorkIntra == inVolToWork\n", " sameOutgoing = outWorkIntra == outVolToWork\n", " print(f\"same incoming inter-edges as in work? {sameIncoming}\")\n", " print(f\"same outgoing inter-edges as in work? 
{sameOutgoing}\")" ] }, { "cell_type": "markdown", "id": "2239bb09-4413-49ee-a5e6-125e90fcaa4d", "metadata": {}, "source": [ "### Compute edges from collection data\n", "\n", "The final test is whether the collection has the right edges.\n", "When the collection was created, inter-volume edges were added on the basis of the `ointerto` and `ointerfrom` features\n", "in the individual volumes.\n", "\n", "Now we check whether that went well." ] }, { "cell_type": "code", "execution_count": 31, "id": "13a1d420-0fc0-4d8b-bfa5-9710ad8492e8", "metadata": {}, "outputs": [], "source": [ "Ec = Ac.api.E\n", "\n", "verses = Fc.otype.s(\"verse\")\n", "inct = set()\n", "ougt = set()\n", "\n", "for vs in verses:\n", "    wvs = Ec.crossref.t(vs)\n", "    if wvs:\n", "        for wv in wvs:\n", "            ws = wv[0]\n", "            inct.add((ws, vs))\n", "\n", "    wvs = Ec.crossref.f(vs)\n", "    if wvs:\n", "        for wv in wvs:\n", "            ws = wv[0]\n", "            ougt.add((vs, ws))\n", "\n", "incomingCollectionTotal = inct\n", "outgoingCollectionTotal = ougt" ] }, { "cell_type": "markdown", "id": "5c53601e-cc94-4ed8-928e-93df67c0375f", "metadata": {}, "source": [ "We have gathered the data.\n", "\n", "Now we make the comparisons: first we compare the number of edges, and then the identity of the edges, modulo the mapping from collection nodes to work nodes." ] }, { "cell_type": "code", "execution_count": 32, "id": "0a0facc4-238c-410a-bf4a-a8f3122b771e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "collection: total: incoming: 64 outgoing: 64\n", "equal amount of incoming inter-edges as in work? True\n", "equal amount of outgoing inter-edges as in work? True\n", "same incoming inter-edges as in work? True\n", "same outgoing inter-edges as in work? 
True\n" ] } ], "source": [ "inColTotal = incomingCollectionTotal\n", "outColTotal = outgoingCollectionTotal\n", "inWorkIntra = incomingWorkIntra[\"all\"]\n", "outWorkIntra = outgoingWorkIntra[\"all\"]\n", "\n", "print(f\"collection: total: incoming: {len(inColTotal):>3} outgoing: {len(outColTotal):>3}\")\n", "eqamountIncoming = len(inWorkIntra) == len(inColTotal)\n", "eqamountOutgoing = len(outWorkIntra) == len(outColTotal)\n", "print(f\"equal amount of incoming inter-edges as in work? {eqamountIncoming}\")\n", "print(f\"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}\")\n", "inColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in inColTotal}\n", "outColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in outColTotal}\n", "sameIncoming = inWorkIntra == inColToWork\n", "sameOutgoing = outWorkIntra == outColToWork\n", "print(f\"same incoming inter-edges as in work? {sameIncoming}\")\n", "print(f\"same outgoing inter-edges as in work? {sameOutgoing}\")" ] }, { "cell_type": "markdown", "id": "e13e3f9b-b6a0-4f3a-96d5-6062ca518a98", "metadata": {}, "source": [ "# Success!\n", "\n", "We have seen that when we take a collection of volumes\n", "the identification of lexeme nodes of the same lexeme across volumes\n", "works out perfectly.\n", "\n", "The collection of inter-volume edges works!" ] }, { "cell_type": "markdown", "id": "02995699-5cd1-4ecc-aabd-53df64bf9503", "metadata": {}, "source": [ "# Low-level API `TF=Fabric()`\n", "\n", "We now load the data through `Fabric()`.\n", "\n", "You do not have to load the work before extracting volumes, but you may do so.\n", "The advantage of pre-loading is that after the extraction of volumes you still have\n", "a handle to the work." 
] }, { "cell_type": "code", "execution_count": 33, "id": "7d02bd4c-06ad-4b75-acc5-9c3b1347adc7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2.20s Feature overview: 109 for nodes; 6 for edges; 1 configs; 9 computed\n" ] }, { "data": { "text/plain": [ "[('Computed',\n", " 'computed-data',\n", " ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),\n", " ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),\n", " ('Fabric', 'loading', ('TF',)),\n", " ('Locality', 'locality', ('L Locality',)),\n", " ('Nodes', 'navigating-nodes', ('N Nodes',)),\n", " ('Features',\n", " 'node-features',\n", " ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),\n", " ('Search', 'search', ('S Search',)),\n", " ('Text', 'text', ('T Text',))]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TFw = Fabric(locations=SOURCE)\n", "apiw = TFw.loadAll()\n", "apiw.makeAvailableIn(globals())" ] }, { "cell_type": "markdown", "id": "3583e620-d35e-4cca-8e42-01e905910a2e", "metadata": {}, "source": [ "## Extract\n", "\n", "We use the same specification as before." 
] }, { "cell_type": "code", "execution_count": 34, "id": "fbed036e-2ba8-4348-9618-d103498caedc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Check volumes ...\n", " | Volume tiny exists and will be recreated\n", " | Volume small exists and will be recreated\n", " | Volume medium exists and will be recreated\n", " | Work consists of 39 books:\n", " | book Genesis : with 28764 slots\n", " | book Exodus : with 23748 slots\n", " | book Leviticus : with 17099 slots\n", " | book Numbers : with 23188 slots\n", " | book Deuteronomy : with 20128 slots\n", " | book Joshua : with 14526 slots\n", " | book Judges : with 14086 slots\n", " | book 1_Samuel : with 18929 slots\n", " | book 2_Samuel : with 15612 slots\n", " | book 1_Kings : with 18685 slots\n", " | book 2_Kings : with 17307 slots\n", " | book Isaiah : with 22931 slots\n", " | book Jeremiah : with 29736 slots\n", " | book Ezekiel : with 26182 slots\n", " | book Hosea : with 3146 slots\n", " | book Joel : with 1318 slots\n", " | book Amos : with 2780 slots\n", " | book Obadiah : with 392 slots\n", " | book Jonah : with 985 slots\n", " | book Micah : with 1895 slots\n", " | book Nahum : with 746 slots\n", " | book Habakkuk : with 897 slots\n", " | book Zephaniah : with 1037 slots\n", " | book Haggai : with 877 slots\n", " | book Zechariah : with 4471 slots\n", " | book Malachi : with 1187 slots\n", " | book Psalms : with 25372 slots\n", " | book Job : with 10912 slots\n", " | book Proverbs : with 8859 slots\n", " | book Ruth : with 1802 slots\n", " | book Song_of_songs : with 1682 slots\n", " | book Ecclesiastes : with 4233 slots\n", " | book Lamentations : with 1945 slots\n", " | book Esther : with 4621 slots\n", " | book Daniel : with 8072 slots\n", " | book Ezra : with 5268 slots\n", " | book Nehemiah : with 7842 slots\n", " | book 1_Chronicles : with 15566 slots\n", " | book 2_Chronicles : with 19764 slots\n", " 0.09s volumes ok\n", " 0.09s Distribute nodes over 
volumes ...\n", " | 0.00s volume tiny ...\n", " | | 0.00s book Obadiah with 392 slots\n", " | | 0.00s book Nahum with 746 slots\n", " | | 0.00s book Haggai with 877 slots\n", " | | 0.00s book Habakkuk with 897 slots\n", " | | 0.00s book Jonah with 985 slots\n", " | | 0.00s book Micah with 1895 slots\n", " | 0.01s volume tiny with 5792 slots and 21779 nodes ...\n", " | 0.01s volume small ...\n", " | | 0.00s book Malachi with 1187 slots\n", " | | 0.00s book Joel with 1318 slots\n", " | 0.01s volume small with 2505 slots and 9495 nodes ...\n", " | 0.01s volume medium ...\n", " | | 0.00s book Ezra with 5268 slots\n", " | 0.02s volume medium with 5268 slots and 17286 nodes ...\n", " 0.11s distribution done\n", " 0.11s Remap features ...\n", " | 0.00s volume tiny with 21779 nodes ...\n", " | 0.25s volume small with 9495 nodes ...\n", " | 0.35s volume medium with 17286 nodes ...\n", " 0.60s remapping done\n", " 0.60s Write volumes as TF datasets\n", " | 0.00s Writing volume tiny\n", " | 0.20s Writing volume small\n", " | 0.30s Writing volume medium\n", " 1.07s writing done\n", " 1.07s All done\n" ] } ], "source": [ "volumes = TFw.extract(VOLUMES, overwrite=True)" ] }, { "cell_type": "markdown", "id": "ca8761a2-4836-4ff2-949c-f74d4b32e679", "metadata": {}, "source": [ "## Inspect" ] }, { "cell_type": "code", "execution_count": 35, "id": "75805bbf-7fcb-41a0-bc6f-589c027279d6", "metadata": {}, "outputs": [], "source": [ "TFs = {}\n", "\n", "for name in volumes:\n", " TFs[name] = Fabric(locations=SOURCE, volume=name)\n", " TFs[name].loadAll(silent=\"deep\")" ] }, { "cell_type": "code", "execution_count": 36, "id": "765da125-cd8c-4d57-85d4-e9940ff1544b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "medium:Ezra\n", " owork : 17286 values\n", " mapping from nodes in the volume to nodes in the work\n", " ointerfrom: 0 values\n", " all outgoing inter-volume edges\n", " ointerto : 0 values\n", " all incoming inter-volume edges\n", 
"small:Malachi-Joel\n", " owork : 9495 values\n", " mapping from nodes in the volume to nodes in the work\n", " ointerfrom: 0 values\n", " all outgoing inter-volume edges\n", " ointerto : 0 values\n", " all incoming inter-volume edges\n", "tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah\n", " owork : 21779 values\n", " mapping from nodes in the volume to nodes in the work\n", " ointerfrom: 0 values\n", " all outgoing inter-volume edges\n", " ointerto : 0 values\n", " all incoming inter-volume edges\n" ] } ], "source": [ "for name in volumes:\n", " TFv = TFs[name]\n", " Fsv = TFv.api.Fs\n", " print(TFv.volumeInfo)\n", " for (feat, info) in TFv.isLoaded(\"owork ointerfrom ointerto\", pretty=False).items():\n", " n = 0\n", " for x in Fsv(feat).items():\n", " n += 1\n", " print(f\" {feat:<10}: {n:>7} values\\n {info['meta']['description']}\")" ] }, { "cell_type": "markdown", "id": "29349f1b-e7ba-4a51-9d3f-03d94b55f351", "metadata": {}, "source": [ "### `ointerto`, `ointerfrom`\n", "\n", "Note that in our volumes the features `ointerfrom`, `ointerto` are empty.\n", "\n", "These are features that collect edge data for edges between a node inside the volume and an edge outside the volume.\n", "\n", "In our work, we do not have such edges, because we did not load the parallels module explicitly,\n", "and the `Fabric(locations, modules)` function only looks in directories specified in its `locations` and `modules` parameters." ] }, { "cell_type": "markdown", "id": "e6cb58ef-5345-452f-b46a-d4fe7bdc1d92", "metadata": {}, "source": [ "## Collect\n", "\n", "We used the same collection specification as before." 
] }, { "cell_type": "code", "execution_count": 37, "id": "e755cbca-9e67-4f51-ae1c-fa8cbe0dd827", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection prophets exists and will be recreated\n", " 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ...\n", " 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.08s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ...\n", " 0.02s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.13s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ...\n", " 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.22s inspect metadata ...\n", " 0.22s metadata sorted out\n", " 0.22s check nodetypes ...\n", " | volume medium\n", " | volume small\n", " | volume tiny\n", " 0.22s node types ok\n", " 0.22s Collect nodes from volumes ...\n", " | 0.00s Check against overlapping slots ...\n", " | | medium : 5268 slots\n", " | | small : 2505 slots\n", " | | tiny : 5792 slots\n", " | 0.00s no overlap\n", " | 0.01s Group non-slot nodes by type\n", " | | medium : 5269- 17286\n", " | | small : 2506- 9495\n", " | | tiny : 5793- 21779\n", " | 0.01s Mapping nodes from volume to/from work ...\n", " | | book : 13566 - 13574\n", " | | chapter : 13575 - 13611\n", " | | clause : 13612 - 16416\n", " | | clause_atom : 16417 - 19312\n", " | | half_verse : 19313 - 20680\n", " | | phrase : 20681 - 28480\n", " | | phrase_atom : 28481 - 36802\n", " | | sentence : 36803 - 38775\n", " | | sentence_atom : 38776 - 40788\n", " | | subphrase : 40789 - 45086\n", " | | verse : 45087 - 45809\n", " | | lex : 45810 - 47884\n", " | 0.02s The new work has 47884 nodes of which 13565 slots\n", " 0.24s collection done\n", " 0.24s remap features ...\n", " 0.61s remapping done\n", " 0.61s write work as TF data set\n", " 1.06s writing done\n", " 1.06s done\n" ] }, { "data": { "text/plain": [ "True" ] }, 
"execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TFw.collect(\n", " tuple(volumes),\n", " COLLECTION,\n", " overwrite=True,\n", ")" ] }, { "cell_type": "markdown", "id": "8e21c5ac-93de-4c4f-92af-a8b11a8001c3", "metadata": {}, "source": [ "### Load collection" ] }, { "cell_type": "code", "execution_count": 38, "id": "8329f8a7-beb7-4527-be97-5ab852d4c90c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TFc = Fabric(locations=SOURCE, collection=COLLECTION)\n", "TFc.loadAll(silent=\"deep\")" ] }, { "cell_type": "markdown", "id": "fa455b4e-d2be-4ae6-81fd-c2b9cada46d9", "metadata": {}, "source": [ "## Lowest level: plain functions\n", "\n", "We can pass the location of a work, or the API to the loaded features of a work.\n", "We do the latter here." ] }, { "cell_type": "code", "execution_count": 39, "id": "ce6ec4f5-52b7-453e-a9af-62d232f48d1f", "metadata": {}, "outputs": [], "source": [ "VOLUMES_WRONG = dict(\n", " tiny=(\"Obadiah\", \"Nahum\", \"Haggai\", \"Habakkuk\", \"Jonah\", \"Micah\"),\n", " small=(\"Obadiah\", \"Malachi\", \"Joel\"),\n", " medium=(\"Ezra\",),\n", ")" ] }, { "cell_type": "markdown", "id": "391ac00b-1c9f-49e8-8c2d-3540c3437085", "metadata": {}, "source": [ "This will turn out to be wrong because there is a book that occurs in several volumes." 
] }, { "cell_type": "code", "execution_count": 40, "id": "61534c9f-4b78-4e07-9720-87b569e7ce72", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Check volumes ...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " | 17s Section Obadiah of volume tiny reoccurs in volume small\n" ] } ], "source": [ "volumes = extract(SOURCE, TARGET, VOLUMES_WRONG, api=apiw, overwrite=True)" ] }, { "cell_type": "markdown", "id": "a9d53782-500d-486d-96d6-7802fc66462a", "metadata": {}, "source": [ "It is not allowed to extract volumes that have material in common!" ] }, { "cell_type": "code", "execution_count": 41, "id": "4fcfda2f-200f-480f-be72-6712142b3348", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Check volumes ...\n", " | Volume tiny exists and will be recreated\n", " | Volume small exists and will be recreated\n", " | Volume medium exists and will be recreated\n", " | Work consists of 39 books:\n", " | book Genesis : with 28764 slots\n", " | book Exodus : with 23748 slots\n", " | book Leviticus : with 17099 slots\n", " | book Numbers : with 23188 slots\n", " | book Deuteronomy : with 20128 slots\n", " | book Joshua : with 14526 slots\n", " | book Judges : with 14086 slots\n", " | book 1_Samuel : with 18929 slots\n", " | book 2_Samuel : with 15612 slots\n", " | book 1_Kings : with 18685 slots\n", " | book 2_Kings : with 17307 slots\n", " | book Isaiah : with 22931 slots\n", " | book Jeremiah : with 29736 slots\n", " | book Ezekiel : with 26182 slots\n", " | book Hosea : with 3146 slots\n", " | book Joel : with 1318 slots\n", " | book Amos : with 2780 slots\n", " | book Obadiah : with 392 slots\n", " | book Jonah : with 985 slots\n", " | book Micah : with 1895 slots\n", " | book Nahum : with 746 slots\n", " | book Habakkuk : with 897 slots\n", " | book Zephaniah : with 1037 slots\n", " | book Haggai : with 877 slots\n", " | book Zechariah : with 4471 slots\n", " | book Malachi 
: with 1187 slots\n", " | book Psalms : with 25372 slots\n", " | book Job : with 10912 slots\n", " | book Proverbs : with 8859 slots\n", " | book Ruth : with 1802 slots\n", " | book Song_of_songs : with 1682 slots\n", " | book Ecclesiastes : with 4233 slots\n", " | book Lamentations : with 1945 slots\n", " | book Esther : with 4621 slots\n", " | book Daniel : with 8072 slots\n", " | book Ezra : with 5268 slots\n", " | book Nehemiah : with 7842 slots\n", " | book 1_Chronicles : with 15566 slots\n", " | book 2_Chronicles : with 19764 slots\n", " 0.10s volumes ok\n", " 0.10s Distribute nodes over volumes ...\n", " | 0.00s volume tiny ...\n", " | | 0.00s book Obadiah with 392 slots\n", " | | 0.00s book Nahum with 746 slots\n", " | | 0.00s book Haggai with 877 slots\n", " | | 0.00s book Habakkuk with 897 slots\n", " | | 0.00s book Jonah with 985 slots\n", " | | 0.00s book Micah with 1895 slots\n", " | 0.01s volume tiny with 5792 slots and 21779 nodes ...\n", " | 0.01s volume small ...\n", " | | 0.00s book Malachi with 1187 slots\n", " | | 0.00s book Joel with 1318 slots\n", " | 0.01s volume small with 2505 slots and 9495 nodes ...\n", " | 0.01s volume medium ...\n", " | | 0.00s book Ezra with 5268 slots\n", " | 0.02s volume medium with 5268 slots and 17286 nodes ...\n", " 0.12s distribution done\n", " 0.12s Remap features ...\n", " | 0.00s volume tiny with 21779 nodes ...\n", " | 0.23s volume small with 9495 nodes ...\n", " | 0.33s volume medium with 17286 nodes ...\n", " 0.60s remapping done\n", " 0.60s Write volumes as TF datasets\n", " | 0.00s Writing volume tiny\n", " | 0.20s Writing volume small\n", " | 0.30s Writing volume medium\n", " 1.06s writing done\n", " 1.06s All done\n" ] } ], "source": [ "volumes = extract(SOURCE, TARGET, VOLUMES, api=apiw, overwrite=True)" ] }, { "cell_type": "markdown", "id": "10b3837f-9aaf-4b58-bb69-3547eb3574f6", "metadata": {}, "source": [ "Now we make the same collection as before, but first we make a few deliberate mistakes." 
] }, { "cell_type": "code", "execution_count": 42, "id": "f33ef3d9-0e01-46bc-8bff-80fdfae619ba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection bible exists and will be recreated\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 25s Volume tiny is already part of the collection\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collect(\n", " ((\"tiny\", f\"{TARGET}/tiny\"), (\"tiny\", f\"{TARGET}/small\")),\n", " f\"{TARGET}/bible\",\n", " overwrite=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 43, "id": "1038afc0-6cdb-4917-b825-73a85d688c20", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 28s Volume tiny at location ~/github/ETCBC/bhsa/tf/2021/_local/tiny reoccurs as volume small\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collect(\n", " ((\"tiny\", f\"{TARGET}/tiny\"), (\"small\", f\"{TARGET}/tiny\")),\n", " f\"{TARGET}/bible\",\n", " overwrite=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 45, "id": "3df3273c-1b0b-4b88-9d9b-1f6b89b7aeba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection bible exists and will be recreated\n", " 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ...\n", " 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.08s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ...\n", " 0.03s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.13s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ...\n", " 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed\n", " 0.22s inspect metadata ...\n", " 0.22s metadata sorted out\n", " 0.22s check nodetypes ...\n", " | 
volume medium\n", " | volume small\n", " | volume tiny\n", " 0.22s node types ok\n", " 0.22s Collect nodes from volumes ...\n", " | 0.00s Check against overlapping slots ...\n", " | | medium : 5268 slots\n", " | | small : 2505 slots\n", " | | tiny : 5792 slots\n", " | 0.01s no overlap\n", " | 0.01s Group non-slot nodes by type\n", " | | medium : 5269- 17286\n", " | | small : 2506- 9495\n", " | | tiny : 5793- 21779\n", " | 0.01s Mapping nodes from volume to/from work ...\n", " | | book : 13566 - 13574\n", " | | chapter : 13575 - 13611\n", " | | clause : 13612 - 16416\n", " | | clause_atom : 16417 - 19312\n", " | | half_verse : 19313 - 20680\n", " | | phrase : 20681 - 28480\n", " | | phrase_atom : 28481 - 36802\n", " | | sentence : 36803 - 38775\n", " | | sentence_atom : 38776 - 40788\n", " | | subphrase : 40789 - 45086\n", " | | verse : 45087 - 45809\n", " | | lex : 45810 - 47884\n", " | 0.02s The new work has 47884 nodes of which 13565 slots\n", " 0.24s collection done\n", " 0.24s remap features ...\n", " 0.62s remapping done\n", " 0.62s write work as TF data set\n", " 1.07s writing done\n", " 1.07s done\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collect(\n", " {name: info[\"location\"] for (name, info) in volumes.items()},\n", " f\"{TARGET}/bible\",\n", " overwrite=True,\n", ")" ] }, { "cell_type": "markdown", "id": "jewish-stroke", "metadata": {}, "source": [ "# All steps\n", "\n", "* **[start](start.ipynb)** your first step in mastering the bible computationally\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", "* **[export Excel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[export](export.ipynb)** export your 
dataset as an Emdros database\n", "* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features\n", "* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus\n", "* **volumes** work with selected books only\n", "* **[trees](trees.ipynb)** work with the BHSA data as syntax trees\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }