{ "cells": [ { "cell_type": "markdown", "id": "1cf27c95-0b45-4d97-a62d-9950654eb386", "metadata": {}, "source": [ "# Various text formats (Nestle1904LFT)" ] }, { "cell_type": "markdown", "id": "1495a021-daa1-4c2e-80d5-ab7d2d75bc3f", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## Table of content \n", "* 1 - Introduction\n", "* 2 - Load Text-Fabric app and data\n", "* 3 - Performing the queries\n", " * 3.1 - Display the formatting options available for this corpus\n", " * 3.2 - Showcasing the various formats\n", " * 3.3 - Normalized text\n", " * 3.4 - Unaccented text\n", " * 3.5 - Transliterated text\n", " * 3.6 - Text with text critical markers\n", " * 3.7 - Nestle version 1904 and version 1913 (Mark 1:1)\n" ] }, { "cell_type": "markdown", "id": "e6830070-1e97-4bdf-aa0c-5eda4e624a84", "metadata": {}, "source": [ "# 1 - Introduction \n", "##### [Back to TOC](#TOC)\n", "\n", "This Jupyter Notebook is designed to demonstrate the predefined text formats available in this Text-Fabric dataset, specifically focusing on displaying the Greek surface text of the New Testament." 
] }, { "cell_type": "markdown", "id": "a1b900e2-995f-4f36-ad74-d821092ca02c", "metadata": {}, "source": [ "# 2 - Load Text-Fabric app and data \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 1, "id": "6bd6c621-361d-487f-a8df-c27fb1ec9de2", "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "0071a0db-916c-4357-88bd-6b3255af0764", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "id": "ed76db5d-5463-4bf1-99ca-7f14b3a0f277", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "The requested app is not available offline\n", "\t~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app not found\n", "rate limit is 5000 requests per hour, with 4996 left for this hour\n", "\tconnecting to online GitHub repo tonyjurg/Nestle1904LFT ... connected\n", "\tapp/README.md...downloaded\n", "\tapp/config.yaml...downloaded\n", "\tapp/static...directory\n", "\t\tapp/static/css...directory\n", "\t\t\tapp/static/css/display_custom.css...downloaded\n", "\t\tapp/static/logo.png...downloaded\n", "\tOK\n" ] }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.7" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.2.2, tonyjurg/Nestle1904LFT/app v3, Search Reference
\n", " Data: tonyjurg - Nestle1904LFT 0.7, Character table, Feature docs
\n", "
Node types\n", "\n", "Name | # of nodes | # slots / node | % coverage\n", "book | 27 | 5102.93 | 100\n", "chapter | 260 | 529.92 | 100\n", "verse | 7943 | 17.35 | 100\n", "sentence | 8011 | 17.20 | 100\n", "wg | 105430 | 6.85 | 524\n", "word | 137779 | 1.00 | 100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 (Low Fat Tree)\n",
"\n",
"after (str): ✅ Characters (e.g. punctuations) following the word\n",
"book (str): ✅ Book name (in English language)\n",
"booknumber (int): ✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)\n",
"bookshort (str): ✅ Book name (abbreviated)\n",
"case (str): ✅ Grammatical case (Nominative, Genitive, Dative, Accusative, Vocative)\n",
"chapter (int): ✅ Chapter number inside book\n",
"clausetype (str): ✅ Clause type details (e.g. Verbless, Minor)\n",
"containedclause (str): 🆗 Contained clause (WG number)\n",
"degree (str): ✅ Degree (e.g. Comparative, Superlative)\n",
"gloss (str): ✅ English gloss\n",
"gn (str): ✅ Grammatical gender (Masculine, Feminine, Neuter)\n",
"headverse (str): ✅ Start verse number of a sentence\n",
"junction (str): ✅ Junction data related to a wordgroup\n",
"lemma (str): ✅ Lexeme (lemma)\n",
"lex_dom (str): ✅ Lexical domain according to Semantic Dictionary of Biblical Greek, SDBG (not present everywhere?)\n",
"ln (str): ✅ Louw-Nida lexical classification (not present everywhere?)\n",
"markafter (str): 🆗 Text critical marker after word\n",
"markbefore (str): 🆗 Text critical marker before word\n",
"markorder (str): Order of punctuation and text critical marker\n",
"monad (int): ✅ Monad (smallest token matching word order in the corpus)\n",
"mood (str): ✅ Grammatical mood of the verb (passive, etc)\n",
"morph (str): ✅ Morphological tag (Sandborg-Petersen morphology)\n",
"nodeID (str): ✅ Node ID (as in the XML source data)\n",
"normalized (str): ✅ Surface word with accents normalized and trailing punctuations removed\n",
"nu (str): ✅ Grammatical number (Singular, Plural)\n",
"number (str): ✅ Grammatical number of the verb (e.g. singular, plural)\n",
"otype (str)\n",
"person (str): ✅ Grammatical person of the verb (first, second, third)\n",
"punctuation (str): ✅ Punctuation after word\n",
"ref (str): ✅ Value of the ref ID (taken from XML sourcedata)\n",
"reference (str): ✅ Reference (to nodeID in XML source data, not yet post-processed)\n",
"roleclausedistance (str): ⚠️ Distance to the wordgroup defining the syntactical role of this word\n",
"sentence (int): ✅ Sentence number (counted per chapter)\n",
"sp (str): ✅ Part of Speech (abbreviated)\n",
"sp_full (str): ✅ Part of Speech (long description)\n",
"strongs (str): ✅ Strongs number\n",
"subj_ref (str): 🆗 Subject reference (to nodeID in XML source data, not yet post-processed)\n",
"tense (str): ✅ Grammatical tense of the verb (e.g. Present, Aorist)\n",
"type (str): ✅ Grammatical type of noun or pronoun (e.g. Common, Personal)\n",
"unicode (str): ✅ Word as it appears in the text in Unicode (incl. punctuations)\n",
"verse (int): ✅ Verse number inside chapter\n",
"voice (str): ✅ Grammatical voice of the verb (e.g. active, passive)\n",
"wgclass (str): ✅ Class of the wordgroup (e.g. cl, np, vp)\n",
"wglevel (int): 🆗 Number of the parent wordgroups for a wordgroup\n",
"wgnum (int): ✅ Wordgroup number (counted per book)\n",
"wgrole (str): ✅ Syntactical role of the wordgroup (abbreviated)\n",
"wgrolelong (str): ✅ Syntactical role of the wordgroup (full)\n",
"wgrule (str): ✅ Wordgroup rule information (e.g. Np-Appos, ClCl2, PrepNp)\n",
"wgtype (str): ✅ Wordgroup type details (e.g. group, apposition)\n",
"word (str): ✅ Word as it appears in the text (excl. punctuations)\n",
"wordlevel (str): 🆗 Number of the parent wordgroups for a word\n",
"wordrole (str): ✅ Syntactical role of the word (abbreviated)\n",
"wordrolelong (str): ✅ Syntactical role of the word (full)\n",
"wordtranslit (str): 🆗 Transliteration of the text (in Latin letters, excl. punctuations)\n",
"wordunacc (str): ✅ Word without accents (excl. punctuations)\n",
"oslots (none)
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: tonyjurg/Nestle1904LFT
  3. appPath: C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/app
  4. commit: e68bd68c7c4c862c1464d995d51e27db7691254f
  5. css: ''
  6. dataDisplay:
    • excludedFeatures:
      • orig_order
      • verse
      • book
      • chapter
    • noneValues:
      • none
      • unknown
      • no value
      • NA
      • ''
    • showVerseInTuple: 0
    • textFormat: text-orig-full
  7. docs:
    • docBase: https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/
    • docPage: about
    • docRoot: https://github.com/tonyjurg/Nestle1904LFT
    • featureBase: https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/features/<feature>.md
  8. interfaceDefaults: {fmt: layout-orig-full}
  9. isCompatible: True
  10. local: no value
  11. localDir: C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/_temp
  12. provenanceSpec:
    • corpus: Nestle 1904 (Low Fat Tree)
    • doi: 10.5281/zenodo.10182594
    • org: tonyjurg
    • relative: /tf
    • repo: Nestle1904LFT
    • repro: Nestle1904LFT
    • version: 0.7
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the Bible Online Learner website
    • webLang: en
    • webUrl: https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v0.6
  14. typeDisplay:
    • book:
      • condense: True
      • hidden: True
      • label: {book}
      • style: ''
    • chapter:
      • condense: True
      • hidden: True
      • label: {chapter}
      • style: ''
    • sentence:
      • hidden: 0
      • label: #{sentence} (start: {book} {chapter}:{headverse})
      • style: ''
    • verse:
      • condense: True
      • excludedFeatures: chapter verse
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • hidden: 0
      • label: #{wgnum}: {wgtype} {wgclass} {clausetype} {wgrole} {wgrule} {junction}
      • style: ''
    • word:
      • base: True
      • features: lemma
      • featuresBare: gloss
      • surpress: chapter verse
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the N1904 app and data\n", "N1904 = use (\"tonyjurg/Nestle1904LFT\", version=\"0.7\", hoist=globals())" ] }, { "cell_type": "code", "execution_count": 4, "id": "d5da5d1a-6827-49b3-ad37-7ca29ba59b45", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n", "N1904.dh(N1904.getCss())" ] }, { "cell_type": "code", "execution_count": 5, "id": "d0d63399-6955-4fa0-bbd8-a6273c617188", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set default view in a way to limit noise as much as possible.\n", "N1904.displaySetup(condensed=True, multiFeatures=False, queryFeatures=False)" ] }, { "cell_type": "markdown", "id": "58ef1678-a19d-4c0c-80f3-84f8471a90e2", "metadata": { "tags": [] }, "source": [ "# 3 - Performing the queries \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "b59c83bd-329d-4820-8bcc-ca92e1c55f6d", "metadata": {}, "source": [ "## 3.1 - Display the formatting options available for this corpus\n", "##### [Back to TOC](#TOC)\n", "\n", "The output of the following command provides details on available formats to present the text of the corpus. \n", "\n", "See also [module tf.advanced.options\n", "Display Settings](https://annotation.github.io/text-fabric/tf/advanced/options.html)." 
] }, { "cell_type": "code", "execution_count": 7, "id": "1d4b1b93-08e5-41f4-a587-66e444a3e271", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "format | level | template\n", "--- | --- | ---\n", "`text-critical` | **word** | `{unicode} `\n", "`text-normalized` | **word** | `{normalized}{after}`\n", "`text-orig-full` | **word** | `{word}{after}`\n", "`text-transliterated` | **word** | `{wordtranslit}{after}`\n", "`text-unaccented` | **word** | `{wordunacc}{after}`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "N1904.showFormats()" ] }, { "cell_type": "markdown", "id": "e4174cea-13db-411b-8bb3-5b17a20941c0", "metadata": {}, "source": [ "Note 1: This data originates from the file `otext.tf`:\n", "\n", "> \n", "```\n", "@config\n", "...\n", "@fmt:text-orig-full={word}{after}\n", "...\n", "```\n" ] }, { "cell_type": "markdown", "id": "5c5c346a-826e-4fcd-a23d-c331cf888d29", "metadata": {}, "source": [ "Note 2: The names of the available formats can also be obtaind by using the following call. However, this will not display the features that are included into the format. The function will return a list of ordered tuples that can easily be postprocessed:" ] }, { "cell_type": "code", "execution_count": 8, "id": "acaaf356-eeae-4101-b5ef-090607dca5fc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'text-critical': 'word',\n", " 'text-normalized': 'word',\n", " 'text-orig-full': 'word',\n", " 'text-transliterated': 'word',\n", " 'text-unaccented': 'word'}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.formats" ] }, { "cell_type": "markdown", "id": "08c67b53-bd6c-42e6-a0cf-b7f609cd9879", "metadata": {}, "source": [ "## 3.2 - Showcasing the various formats\n", "##### [Back to TOC](#TOC)\n", "\n", "The following will show the differences between the displayed text for the various formats. The verse to be printed is from Mark 1:1. 
The associated verse node is 139200." ] }, { "cell_type": "code", "execution_count": 9, "id": "ea12ca08-505a-4497-bc18-3b9247502350", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fmt=text-critical\t: Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ (Υἱοῦ Θεοῦ). \n", "fmt=text-normalized\t: Ἀρχή τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ Υἱοῦ Θεοῦ. \n", "fmt=text-orig-full\t: Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ Υἱοῦ Θεοῦ. \n", "fmt=text-transliterated\t: Arkhe tou euaggeliou Iesou Khristou Uiou Theou. \n", "fmt=text-unaccented\t: Αρχη του ευαγγελιου Ιησου Χριστου Υιου Θεου. \n" ] } ], "source": [ "for fmt in T.formats:\n", " print(f'fmt={fmt}\\t: {T.text(139200, fmt)}')" ] }, { "cell_type": "markdown", "id": "211b2bde-002b-4243-87c9-4bd850868354", "metadata": { "jupyter": { "outputs_hidden": true }, "tags": [] }, "source": [ "## 3.3 - Normalized text\n", "##### [Back to TOC](#TOC)\n", "\n", "The normalized Greek text refers to a standardized and consistent representation of Greek characters and linguistic elements. Using normalized text ensures a consistent presentation, which, in turn, allows for easier postprocessing. The relevance of normalized text becomes evident through the following demonstration.\n", "\n", "In the upcoming code segment, a list will be created to display the top 10 differences in values between the \"word\" feature and the \"normalized\" feature on the same word node."
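, "\n",
"\n",
"As a side note, what `normalized` does can be illustrated with a simplified sketch in plain Python (this is not the actual normalization procedure of the dataset): one ingredient is replacing a grave accent by an acute one, e.g. καὶ becomes καί.\n",
"\n",
"```python\n",
"import unicodedata\n",
"\n",
"def normalize_grave(s):\n",
"    # Decompose, replace combining grave (U+0300) by acute (U+0301), recompose\n",
"    decomposed = unicodedata.normalize('NFD', s)\n",
"    return unicodedata.normalize('NFC', decomposed.replace('\\u0300', '\\u0301'))\n",
"\n",
"print(normalize_grave('καὶ'))  # prints: καί\n",
"```\n"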
] }, { "cell_type": "code", "execution_count": 10, "id": "b5ce40f1-9a22-444f-955a-c5545797a056", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "37182 differences found between feature word and feature normalized.\n", "╒══════════════════════╤═════════════╕\n", "│ word -> normalized │ frequency │\n", "╞══════════════════════╪═════════════╡\n", "│ καὶ -> καί │ 8545 │\n", "├──────────────────────┼─────────────┤\n", "│ δὲ -> δέ │ 2620 │\n", "├──────────────────────┼─────────────┤\n", "│ τὸ -> τό │ 1658 │\n", "├──────────────────────┼─────────────┤\n", "│ τὸν -> τόν │ 1556 │\n", "├──────────────────────┼─────────────┤\n", "│ τὴν -> τήν │ 1518 │\n", "├──────────────────────┼─────────────┤\n", "│ γὰρ -> γάρ │ 921 │\n", "├──────────────────────┼─────────────┤\n", "│ μὴ -> μή │ 902 │\n", "├──────────────────────┼─────────────┤\n", "│ τὰ -> τά │ 817 │\n", "├──────────────────────┼─────────────┤\n", "│ τοὺς -> τούς │ 722 │\n", "├──────────────────────┼─────────────┤\n", "│ πρὸς -> πρός │ 670 │\n", "╘══════════════════════╧═════════════╛\n" ] }, { "data": { "text/markdown": [ "**Warning: table truncated!**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Library to format table\n", "from tabulate import tabulate\n", "\n", "# get a node list for all word nodes\n", "WordQuery = '''\n", "word \n", "'''\n", "# The option 'silent=True' has been added in the next line to prevent printing the number of nodes found\n", "WordResult = N1904.search(WordQuery,silent=True) \n", "\n", "# Gather the results where feature normalized is different from feature word \n", "ResultDict = {}\n", "NumberOfChanges=0\n", "for tuple in WordResult:\n", " word=F.word.v(tuple[0])\n", " normalized=F.normalized.v(tuple[0])\n", " if word!=normalized:\n", " Change=f\"{word} -> {normalized}\"\n", " NumberOfChanges+=1\n", " if Change in ResultDict:\n", " # If it exists, add the count to the existing value\n", " 
ResultDict[Change]+=1\n", " else:\n", " # If it doesn't exist, initialize the count as the value\n", " ResultDict[Change]=1\n", " \n", "print(f\"{NumberOfChanges} differences found between feature word and feature normalized.\")\n", "# Convert the dictionary into a list of key-value pairs and sort it according to frequency\n", "UnsortedTableData = [[key, value] for key, value in ResultDict.items()]\n", "TableData= sorted(UnsortedTableData, key=lambda row: row[1], reverse=True)\n", "\n", "# In this example the table will be truncated \n", "max_rows = 10 # Set your desired number of rows here\n", "TruncatedTable = TableData[:max_rows]\n", "\n", "# Produce the table\n", "headers = [\"word -> normalized\",\"frequency\"]\n", "print(tabulate(TruncatedTable, headers=headers, tablefmt='fancy_grid'))\n", "\n", "# Add a warning using markdown (API call A.dm) allowing it to be printed in bold type\n", "N1904.dm(\"**Warning: table truncated!**\")" ] }, { "cell_type": "markdown", "id": "ba91511c-f13c-42ea-95b8-c55be4179092", "metadata": {}, "source": [ "Now, it would be interesting to check whether καί and δέ already exist (with these accents) in the feature \"word.\"" ] }, { "cell_type": "code", "execution_count": 11, "id": "6f2dc96f-44c2-4910-b7c7-fc67d4424fa6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.09s 31 results\n", " 0.08s 144 results\n" ] } ], "source": [ "# get a node list for all word nodes with feature word=καί \n", "KaiQuery = '''\n", "word word=καί\n", "'''\n", "KaiResult = N1904.search(KaiQuery) \n", "\n", "# get a node list for all word nodes with feature word=δέ\n", "DeQuery = '''\n", "word word=δέ\n", "'''\n", "DeResult = N1904.search(DeQuery) " ] }, { "cell_type": "markdown", "id": "b898c383-9394-4022-999c-233f636c686a", "metadata": {}, "source": [ "This demonstrates the presence of variant accents for καί and δέ in the feature word. 
Consequently, constructing queries based on a single accent variant would result in the omission of certain results." ] }, { "cell_type": "markdown", "id": "9b7ba7c5-26fe-4d54-a447-1963141050b6", "metadata": {}, "source": [ "## 3.4 - Unaccented text\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "d30aded3-afc7-4f5e-80c6-7305aa2932de", "metadata": {}, "source": [ "A similar case can be made regarding postprocessing with respect to the unaccented text; however, accents do play a significant role in understanding some Greek words (homographs). It is important to realize that the accents were not part of the original text, which was in unaccented capital letters (uncials) without spaces between words." ] }, { "cell_type": "code", "execution_count": 12, "id": "b3f19cd6-ea72-450a-9b9e-b04334a597d5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.10s 30 results\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n", "κύριος\n" ] } ], "source": [ "# get a node list for all word nodes containing some variants in accents\n", "KosmosQuery = '''\n", "word word~λ[όο]γ[όο]ς\n", "'''\n", "\n", "PneumaQuery = '''\n", "word word~πν[εέ][ῦυ]μα\n", "'''\n", "\n", "KuriosQuery = '''\n", "word word~κ[ύυ]ρ[ίι]ος\n", "'''\n", "\n", "HemeraQuery = '''\n", "word word~ἡμ[έε]ρα\n", "'''\n", "\n", "\n", "Result = N1904.search(KuriosQuery) \n", "for tuple in Result:\n", " word=F.word.v(tuple[0])\n", " print(word)" ] }, { "cell_type": "markdown", "id": "17670341-3d4c-4c15-9e75-605ce8b4b162", "metadata": {}, "source": [ "## 3.5 - Transliterated text\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": 
"92e8a14a-6cd3-482a-95ae-49b1597d2795", "metadata": {}, "source": [ "Using transliterated text can be convenient when creating queries, as it allows you to use your normal keyboard without the need to include Greek characters. See the following example:" ] }, { "cell_type": "code", "execution_count": 13, "id": "bddbf2e8-11a6-4d3b-8372-ba673b54854b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.09s 63 results\n", "λόγος\n" ] } ], "source": [ "LatinQuery = '''\n", "word wordtranslit=logos\n", "'''\n", "Result = N1904.search(LatinQuery) \n", "for tuple in Result:\n", " word=F.word.v(tuple[0])\n", " print(word)\n", " break" ] }, { "cell_type": "markdown", "id": "d9e4b397-c992-4e30-99e6-94bb53a742d6", "metadata": {}, "source": [ "## 3.6 - Text with text critical markers\n", "##### [Back to TOC](#TOC)\n", "\n", "A limited number of critical markers are included in the dataset, stored in the features \"markbefore\" and \"markafter.\" To get an impression of their quantity:" ] }, { "cell_type": "code", "execution_count": 14, "id": "86e767d0-1460-40ae-b80c-36b3a7ac17ca", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(('', 137728), ('—', 31), (')', 11), (']]', 7), ('(', 1), (']', 1))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.markafter.freqList()" ] }, { "cell_type": "code", "execution_count": 15, "id": "4d91d601-9098-4e4e-b3ce-b98f141d788c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(('', 137745), ('—', 16), ('(', 10), ('[[', 7), ('[', 1))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.markbefore.freqList()" ] }, { "cell_type": "markdown", "id": "33ef8f51-9fc9-4426-90f1-46bf6311ec96", "metadata": {}, "source": [ "A quick investigation was conducted to check the dataset's consistency. 
Note that an automated check for '—' is not implemented below, as it is difficult to determine whether this marker indicates a start or an end." ] }, { "cell_type": "code", "execution_count": 16, "id": "358d78ac-e26f-4150-a81d-5c87c84f76d9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mark 1:1: set single round\n", "Mark 1:1: unset single round\n", "\n", "Mark 16:9: set double square\n", "Mark 16:20: unset double square\n", "\n", "Mark 16:99: set double square\n", "Mark 16:99: unset double square\n", "\n", "Luke 24:12: set double square\n", "Luke 24:12: unset double square\n", "\n", "Luke 24:36: set double square\n", "Luke 24:36: unset double square\n", "\n", "Luke 24:40: set double square\n", "Luke 24:40: unset double square\n", "\n", "Luke 24:51: set double square\n", "Luke 24:51: unset double square\n", "\n", "Luke 24:52: set double square\n", "Luke 24:52: unset double square\n", "\n", "John 1:38: set single round\n", "John 1:38: unset single round\n", "\n", "John 1:41: set single round\n", "John 1:41: unset single round\n", "\n", "John 1:42: set single round\n", "John 1:42: unset single round\n", "\n", "John 5:3: set single round\n", "John 5:4: unset single round\n", "\n", "John 7:53: set single round\n", "John 8:11: unset single round\n", "\n", "John 9:7: set single round\n", "John 9:7: unset single round\n", "\n", "John 20:16: set single round\n", "John 20:16: unset single round\n", "\n", "Romans 4:16: set single round\n", "Romans 4:17: unset single round\n", "\n", "Ephesians 1:1: set single square\n", "Ephesians 1:1: unset single square\n", "\n", "Colossians 4:10: set single round\n", "Colossians 4:10: unset single round\n", "\n", "I_Timothy 3:5: set single round\n", "I_Timothy 3:5: unset single round\n", "\n" ] } ], "source": [ "# get a node list for all word nodes\n", "WordQuery = '''\n", "word \n", "'''\n", "BracketList=(\"(\" , \")\" , \"[\" , \"]\" , \"[[\" , \"]]\")\n", "\n", "# The option 'silent=True' has been 
added in the next line to prevent printing the number of nodes found\n", "WordResult = N1904.search(WordQuery,silent=True) \n", "\n", "SingleRound=SingleSquare=DoubleSquare=False\n", "\n", "for tuple in WordResult:\n", " node=tuple[0]\n", " MarkAfter=F.markafter.v(node)\n", " MarkBefore=F.markbefore.v(node)\n", " Mark=MarkAfter+MarkBefore\n", " location=\"{} {}:{}\".format(F.book.v(node),F.chapter.v(node),F.verse.v(node))\n", " if (Mark in BracketList):\n", " if Mark==\"(\":\n", " if SingleRound==True: print (\"Sequence problem?\")\n", " SingleRound=True\n", " print (f\"{location}: set single round\")\n", " if Mark==\")\":\n", " if SingleRound==False: print (\"Sequence problem?\")\n", " SingleRound=False\n", " print (f\"{location}: unset single round\\n\")\n", " \n", " if Mark==\"[\":\n", " if SingleSquare==True: print (\"Sequence problem?\")\n", " SingleSquare=True\n", " print (f\"{location}: set single square\")\n", " if Mark==\"]\":\n", " if SingleSquare==False: print (\"Sequence problem?\")\n", " SingleSquare=False\n", " print (f\"{location}: unset single square\\n\")\n", " \n", " if Mark==\"[[\":\n", " if DoubleSquare==True: print (\"Sequence problem?\")\n", " DoubleSquare=True\n", " print (f\"{location}: set double square\")\n", " if Mark==\"]]\":\n", " if DoubleSquare==False: print (\"Sequence problem?\")\n", " DoubleSquare=False\n", " print (f\"{location}: unset double square\\n\")\n" ] }, { "cell_type": "markdown", "id": "9f37dadc-c4b0-4c88-979d-7c6fcb369e6f", "metadata": {}, "source": [ "## 3.7 - Nestle version 1904 and version 1913 (Mark 1:1)\n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "845d605f-cd7e-4abd-bd0c-b18f572f09cd", "metadata": {}, "source": [ "The dataset seems to be (also) compiled based upon the Nestle edition of 1913, as explained in [the Nestle1904 FAQ](https://sites.google.com/site/nestle1904/faq):\n", "\n", "> *What are your sources?*\n", "> For the text, I used the scanned books available at the Internet Archive (The first edition 
of 1904, and a reprinting from 1913 – the latter one has a better quality).\n", "\n", "Print Mark 1:1 from Text-Fabric data:" ] }, { "cell_type": "code", "execution_count": 17, "id": "553d07df-6b07-4884-940f-f4ce2c698c24", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ (Υἱοῦ Θεοῦ). '" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(139200,fmt='text-critical')" ] }, { "cell_type": "markdown", "id": "bd71bad9-f370-4e18-963f-4f4749c38db3", "metadata": {}, "source": [ "The result can be verified by examining the scans of the following printed versions:\n", "* Nestle version 1904: [@ archive.org](https://archive.org/details/the-greek-new-testament-nestle-1904-us-edition/page/84/mode/2up)\n", "* Nestle version 1913: [@ archive.org](https://archive.org/details/hkainediathekete00lond/page/88/mode/1up)\n", "\n", "Or, in an image, placed side by side:\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }