{
"cells": [
{
"cell_type": "markdown",
"id": "1cf27c95-0b45-4d97-a62d-9950654eb386",
"metadata": {},
"source": [
"# Various text formats (N1904-TF)"
]
},
{
"cell_type": "markdown",
"id": "1495a021-daa1-4c2e-80d5-ab7d2d75bc3f",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Table of contents (TOC) \n",
"* 1 - Introduction\n",
" * 1.1 - Naming schema for text formatting\n",
"* 2 - Load Text-Fabric app and data\n",
"* 3 - Examining the text formats\n",
" * 3.1 - Display the text formatting options available for this corpus\n",
" * 3.2 - Showcasing the various formats\n",
" * 3.3 - Transliterated text\n",
" * 3.4 - Text with text-critical markers\n",
" * 3.5 - Nestle version 1904 and version 1913 (Mark 1:1)\n",
"* 4 - Notebook version"
]
},
{
"cell_type": "markdown",
"id": "e6830070-1e97-4bdf-aa0c-5eda4e624a84",
"metadata": {},
"source": [
"# 1 - Introduction \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"This Jupyter Notebook is designed to demonstrate the predefined text formats available in this Text-Fabric dataset, specifically focusing on displaying the Greek surface text of the New Testament.\n",
"\n",
"Text-Fabric's data design allows for flexible representation of the corpus text, but requires at least one text format to be specified as the default (in this dataset: `text-orig-full`). During the creation of the dataset, additional formats relevant to this corpus were defined. Each format is built from a subset of the following surface-text features:\n",
"\n",
" * [after](https://centerblc.github.io/N1904/features/after.html#start): All material found after a word (including text-critical signs).\n",
" * [before](https://centerblc.github.io/N1904/features/before.html#start): All material found before a word.\n",
" * [criticalsign](https://centerblc.github.io/N1904/features/criticalsign.html#start): Text-critical signs.\n",
" * [normalized](https://centerblc.github.io/N1904/features/normalized.html#start): Normalized Greek text.\n",
" * [punctuation](https://centerblc.github.io/N1904/features/punctuation.html#start): Punctuation found after a word.\n",
" * [text](https://centerblc.github.io/N1904/features/text.html#start): Word without punctuation and text-critical signs.\n",
" * [trailer](https://centerblc.github.io/N1904/features/trailer.html#start): All material found after a word (excluding text-critical signs).\n",
" * [translit](https://centerblc.github.io/N1904/features/translit.html#start): Transliteration of the word surface texts.\n",
" * [unaccent](https://centerblc.github.io/N1904/features/unaccent.html#start): Word without accents and diacritical markers.\n",
" * [unicode](https://centerblc.github.io/N1904/features/unicode.html#start): Unicode presentation including all material before and after the word.\n",
"\n",
"The following image shows how these features relate to the surface text.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "575903e1-59ca-46ca-9656-34ff80e433d5",
"metadata": {
"tags": []
},
"source": [
"## 1.1 - Naming schema for text formatting\n",
"\n",
"The text formats in this Text-Fabric database are identified by unique names that describe what each format contains. These names follow a structured naming schema, consisting of a string of keywords separated by hyphens (-).\n",
"\n",
"```\n",
" `what`-`how`-`fullness`\n",
"```\n",
"\n",
"In our database the following keywords are used:\n",
"\n",
"| Keyword | Value | Meaning |\n",
"| --- | --- | --- |\n",
"| what | text | words as they belong to the text |\n",
"| what | lex | lexemes of the words |\n",
"| how | orig | the original Greek script (all Unicode) |\n",
"| how | unaccent | the original Greek script without accents |\n",
"| how | translit | transliteration into Latin alphabet |\n",
"| fullness | full | complete text with text-critical markers |\n",
"| fullness | plain | complete text without text-critical markers |\n"
]
},
{
"cell_type": "markdown",
"id": "1dcaad00-1386-49ba-a775-04e48cc29139",
"metadata": {
"tags": []
},
"source": [
"Not all possible combinations are defined or relevant. The following text-formatting options are defined:\n",
"\n",
"| Format | Usage | Template |\n",
"| --- | --- | --- |\n",
"| lex-orig-plain | Lexemes of the Greek surface text | `{lemma}{trailer}` |\n",
"| lex-translit-plain | Transliteration of the lexemes of the Greek surface text | `{lemmatranslit}{trailer}` |\n",
"| text-orig-full (default) | The Greek surface text in unicode including text-critical markers | `{before}{text}{after}` |\n",
"| text-orig-plain | The Greek surface text in unicode | `{text}{trailer}` |\n",
"| text-translit-plain | Transliteration of the Greek surface text | `{translit}{trailer}` |\n",
"| text-unaccent-plain | The Greek surface text in unicode without accents | `{unaccent}{trailer}` |\n"
]
},
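{
"cell_type": "markdown",
"id": "7d3a9c52-4e1b-4f6a-9b0d-2c8e5a1f6b34",
"metadata": {},
"source": [
"Because every format name follows the `what`-`how`-`fullness` schema, a name can be taken apart programmatically. The sketch below is not part of Text-Fabric: `KEYWORDS` and `parse_format_name` are hypothetical names introduced here for illustration, and the allowed values are simply those of the keyword table in section 1.1.\n",
"\n",
"```python\n",
"# Keyword values taken from the naming-schema table in section 1.1\n",
"KEYWORDS = {\n",
"    'what': {'text', 'lex'},\n",
"    'how': {'orig', 'unaccent', 'translit'},\n",
"    'fullness': {'full', 'plain'},\n",
"}\n",
"\n",
"def parse_format_name(name):\n",
"    # Split a format name into its what/how/fullness keywords\n",
"    parts = name.split('-')\n",
"    if len(parts) != 3:\n",
"        raise ValueError(f'expected three hyphen-separated keywords: {name!r}')\n",
"    parsed = dict(zip(('what', 'how', 'fullness'), parts))\n",
"    # Reject any keyword that the schema does not define\n",
"    for slot, value in parsed.items():\n",
"        if value not in KEYWORDS[slot]:\n",
"            raise ValueError(f'unknown {slot} keyword {value!r} in {name!r}')\n",
"    return parsed\n",
"\n",
"print(parse_format_name('text-orig-full'))\n",
"```\n",
"\n",
"A helper of this kind could be used, for example, to select only the formats whose `how` keyword is `translit`."
]
},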
{
"cell_type": "markdown",
"id": "a1b900e2-995f-4f36-ad74-d821092ca02c",
"metadata": {},
"source": [
"# 2 - Load Text-Fabric app and data \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6bd6c621-361d-487f-a8df-c27fb1ec9de2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0071a0db-916c-4357-88bd-6b3255af0764",
"metadata": {},
"outputs": [],
"source": [
"# Loading the Text-Fabric code\n",
"# Note: it is assumed Text-Fabric is installed in your environment\n",
"from tf.fabric import Fabric\n",
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ed76db5d-5463-4bf1-99ca-7f14b3a0f277",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"**Locating corpus resources ...**"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"app: ~/text-fabric-data/github/CenterBLC/N1904/app"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/github/CenterBLC/N1904/tf/1.0.0"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<b>TF:</b> TF API 12.5.3, CenterBLC/N1904/app v3, Search Reference<br>\n",
"<b>Data:</b> CenterBLC - N1904 1.0.0, Character table, Feature docs<br>\n",
"<b>Node types</b>\n",
"<table>\n",
"<tr><th>Name</th><th># of nodes</th><th># slots / node</th><th>% coverage</th></tr>\n",
"<tr><td>book</td><td>27</td><td>5102.93</td><td>100</td></tr>\n",
"<tr><td>chapter</td><td>260</td><td>529.92</td><td>100</td></tr>\n",
"<tr><td>verse</td><td>7944</td><td>17.34</td><td>100</td></tr>\n",
"<tr><td>sentence</td><td>8011</td><td>17.20</td><td>100</td></tr>\n",
"<tr><td>group</td><td>8945</td><td>7.01</td><td>46</td></tr>\n",
"<tr><td>clause</td><td>42506</td><td>8.36</td><td>258</td></tr>\n",
"<tr><td>wg</td><td>106868</td><td>6.88</td><td>533</td></tr>\n",
"<tr><td>phrase</td><td>69007</td><td>1.90</td><td>95</td></tr>\n",
"<tr><td>subphrase</td><td>116178</td><td>1.60</td><td>135</td></tr>\n",
"<tr><td>word</td><td>137779</td><td>1.00</td><td>100</td></tr>\n",
"</table>\n",
"<b>Sets:</b> no custom sets<br>\n",
"<b>Features:</b><br>\n",
"<b>Nestle 1904 Greek New Testament</b>\n",
"<ul>\n",
"<li>str: material after the end of the word</li>\n",
"<li>int: 1 if it is an apposition container</li>\n",
"<li>int: 1 if the sentence, group, clause, phrase or wg has an article</li>\n",
"<li>str: this is XML attribute before</li>\n",
"<li>str: book name (full name)</li>\n",
"<li>str: book name (abbreviated) from ref attribute in xml</li>\n",
"<li>str: grammatical case</li>\n",
"<li>int: chapter number, from ref attribute in xml</li>\n",
"<li>str: clause type</li>\n",
"<li>str: this is XML attribute cls</li>\n",
"<li>str: clause type</li>\n",
"<li>str: this is XML attribute criticalsign</li>\n",
"<li>str: clause rule (from xml attribute Rule)</li>\n",
"<li>str: grammatical degree</li>\n",
"<li>int: 1 if the word is out of sequence in the xml</li>\n",
"<li>str: domain</li>\n",
"<li>str: this is XML attribute framespec</li>\n",
"<li>str: this is XML attribute function</li>\n",
"<li>str: grammatical gender</li>\n",
"<li>str: English gloss (BGVB)</li>\n",
"<li>str: xml id</li>\n",
"<li>str: type of junction</li>\n",
"<li>str: language the text is in</li>\n",
"<li>str: lexical lemma</li>\n",
"<li>str: transliteration of the word lemma</li>\n",
"<li>str: ln</li>\n",
"<li>str: verbal mood</li>\n",
"<li>str: morphological code</li>\n",
"<li>str: node id (as in the XML source data)</li>\n",
"<li>str: lemma normalized</li>\n",
"<li>str: annotation of linguistic nature</li>\n",
"<li>int: generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.</li>\n",
"<li>str: grammatical number</li>\n",
"<li>str</li>\n",
"<li>str: grammatical person</li>\n",
"<li>str: punctuation found after a word</li>\n",
"<li>str: biblical reference with word counting</li>\n",
"<li>str: number of referent</li>\n",
"<li>str: this is XML attribute rela</li>\n",
"<li>str: role</li>\n",
"<li>str: syntactical rule</li>\n",
"<li>str: part-of-speach</li>\n",
"<li>int: strong number</li>\n",
"<li>str: this is XML attribute subjrefspec</li>\n",
"<li>str: verbal tense</li>\n",
"<li>str: the text of a word</li>\n",
"<li>str: material after the end of the word (excluding critical signs)</li>\n",
"<li>str: translation of the word surface text according to the Berean Interlinear Bible</li>\n",
"<li>str: transliteration of the word surface text</li>\n",
"<li>str: syntactical type (on sentence, group, clause or phrase)</li>\n",
"<li>str: morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)</li>\n",
"<li>str: word in unicode characters without accents and diacritical markers</li>\n",
"<li>str: word in unicode characters plus material after it</li>\n",
"<li>str: this is XML attribute variant</li>\n",
"<li>int: verse number, from ref attribute in xml</li>\n",
"<li>str: verbal voice</li>\n",
"<li>str: frame</li>\n",
"<li>none</li>\n",
"<li>none: parent relationship between words</li>\n",
"<li>int: this is XML attribute sibling</li>\n",
"<li>none: number of subject referent</li>\n",
"</ul>\n",
"<b>Settings:</b>\n",
"<pre>\n",
"specified\n",
"  apiVersion: 3\n",
"  appName: CenterBLC/N1904\n",
"  appPath: C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/app\n",
"  commit: gdb630837ae89b9468c9e50d13bda05cfd3de4f18\n",
"  css: ''\n",
"  dataDisplay:\n",
"    excludedFeatures: []\n",
"    noneValues:\n",
"    sectionSep1:\n",
"    sectionSep2: :\n",
"    textFormat: text-orig-full\n",
"  docs:\n",
"    docBase: https://github.com/CenterBLC/N1904/tree/main/docs\n",
"    docPage: about\n",
"    docRoot: https://github.com/CenterBLC/N1904\n",
"    featureBase: https://github.com/CenterBLC/N1904/blob/main/docs/features/&lt;feature&gt;.md\n",
"    featurePage: README\n",
"  interfaceDefaults: {fmt: text-orig-full}\n",
"  isCompatible: True\n",
"  local: local\n",
"  localDir: C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/_temp\n",
"  provenanceSpec:\n",
"    branch: main\n",
"    corpus: Nestle 1904 Greek New Testament\n",
"    doi: 10.5281/zenodo.13117910\n",
"    moduleSpecs: []\n",
"    org: CenterBLC\n",
"    relative: /tf\n",
"    repo: N1904\n",
"    repro: N1904\n",
"    version: 1.0.0\n",
"    webBase: https://learner.bible/text/show_text/nestle1904/\n",
"    webHint: Show this on the website\n",
"    webLang: en\n",
"    webUrl: https://learner.bible/text/show_text/nestle1904/&lt;1&gt;/&lt;2&gt;/&lt;3&gt;\n",
"    webUrlLex: {webBase}/word?version={version}&amp;id=&lt;lid&gt;\n",
"  release: 1.0.0\n",
"  typeDisplay:\n",
"    clause:\n",
"      condense: True\n",
"      label: {typ} {function} {rela} \\\\ {cls} {role} {junction}\n",
"      style: ''\n",
"    group:\n",
"      label: {typ} {function} {rela} \\\\ {typems} {role} {rule}\n",
"      style: ''\n",
"    phrase:\n",
"      condense: True\n",
"      label: {typ} {function} {rela} \\\\ {typems} {role} {rule}\n",
"      style: ''\n",
"    sentence:\n",
"      label: {typ} {function} {rela} \\\\ {role} {rule}\n",
"      style: ''\n",
"    subphrase:\n",
"      label: {typ} {function} {rela} \\\\ {typems} {role} {rule}\n",
"      style: ''\n",
"    verse:\n",
"      condense: True\n",
"      label: {book} {chapter}:{verse}\n",
"      style: ''\n",
"    wg:\n",
"      condense: True\n",
"      label: {typems} {role} {rule} {junction}\n",
"      style: ''\n",
"    word:\n",
"      features:\n",
"      featuresBare: [gloss]\n",
"  writing: grc\n",
"</pre>\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# load the N1904 app and data\n",
"N1904 = use (\"CenterBLC/N1904\", version=\"1.0.0\", hoist=globals())"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d5da5d1a-6827-49b3-ad37-7ca29ba59b45",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n",
"N1904.dh(N1904.getCss())"
]
},
{
"cell_type": "markdown",
"id": "58ef1678-a19d-4c0c-80f3-84f8471a90e2",
"metadata": {
"tags": []
},
"source": [
"# 3 - Examining the text formats\n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "b59c83bd-329d-4820-8bcc-ca92e1c55f6d",
"metadata": {},
"source": [
"## 3.1 - Display the text formatting options available for this corpus\n",
"\n",
"The following command lists the formats available for presenting the text of the corpus, together with the node level each format applies to and its template. \n",
"\n",
"See also the Display Settings documentation of the module `tf.advanced.options`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1d4b1b93-08e5-41f4-a587-66e444a3e271",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"format | level | template\n",
"--- | --- | ---\n",
"`lex-orig-plain` | **word** | `{lemma}{trailer}`\n",
"`lex-translit-plain` | **word** | `{lemmatranslit}{trailer}`\n",
"`text-orig-full` | **word** | `{before}{text}{after}`\n",
"`text-orig-plain` | **word** | `{text}{trailer}`\n",
"`text-translit-plain` | **word** | `{translit}{trailer}`\n",
"`text-unaccent-plain` | **word** | `{unaccent}{trailer}`\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"N1904.showFormats()"
]
},
{
"cell_type": "markdown",
"id": "e4174cea-13db-411b-8bb3-5b17a20941c0",
"metadata": {},
"source": [
"Note 1: This data originates from the file [`otext.tf`](https://github.com/CenterBLC/N1904/blob/main/tf/1.0.0/otext.tf):\n",
"\n",
"> \n",
"```\n",
"@config\n",
"...\n",
"@fmt:text-orig-full={before}{text}{after}\n",
"...\n",
"```\n"
]
},
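{
"cell_type": "markdown",
"id": "c4e1a7b2-5d3f-4a86-9e20-7b1f3d6a8c51",
"metadata": {},
"source": [
"To make the role of such a template concrete, the sketch below fills two of the templates with feature values using Python's `str.format`. It illustrates the idea only and is not how Text-Fabric implements `T.text`. The helper `render` is hypothetical, and the feature values for the last two words of Mark 1:1 are assumptions inferred from the printed output in section 3.2, not values read from the dataset.\n",
"\n",
"```python\n",
"# Templates as listed by N1904.showFormats()\n",
"TEMPLATES = {\n",
"    'text-orig-full': '{before}{text}{after}',\n",
"    'text-orig-plain': '{text}{trailer}',\n",
"}\n",
"\n",
"# Assumed feature values for the last two words of Mark 1:1 (illustration only)\n",
"WORDS = [\n",
"    {'before': '(', 'text': 'Υἱοῦ', 'after': ' ', 'trailer': ' '},\n",
"    {'before': '', 'text': 'Θεοῦ', 'after': '). ', 'trailer': '. '},\n",
"]\n",
"\n",
"def render(words, fmt):\n",
"    # Fill the template once per word and concatenate the results\n",
"    template = TEMPLATES[fmt]\n",
"    return ''.join(template.format(**word) for word in words)\n",
"\n",
"print(repr(render(WORDS, 'text-orig-full')))\n",
"print(repr(render(WORDS, 'text-orig-plain')))\n",
"```\n",
"\n",
"Under these assumptions the only difference between the two formats is which features carry the material around each word: `before` and `after` include the text-critical parentheses, while `trailer` does not."
]
},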
{
"cell_type": "markdown",
"id": "5c5c346a-826e-4fcd-a23d-c331cf888d29",
"metadata": {},
"source": [
"Note 2: The names of the available formats can also be obtained from the attribute `T.formats`. This does not show the features that make up each format. It returns a dictionary that maps each format name to the node type it applies to, which can easily be post-processed:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "acaaf356-eeae-4101-b5ef-090607dca5fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'lex-orig-plain': 'word',\n",
" 'lex-translit-plain': 'word',\n",
" 'text-orig-full': 'word',\n",
" 'text-orig-plain': 'word',\n",
" 'text-translit-plain': 'word',\n",
" 'text-unaccent-plain': 'word'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"T.formats"
]
},
{
"cell_type": "markdown",
"id": "08c67b53-bd6c-42e6-a0cf-b7f609cd9879",
"metadata": {},
"source": [
"## 3.2 - Showcasing the various formats\n",
"\n",
"This section will demonstrate the differences in how various text formats are displayed, using the verse Mark 1:1 as an example. To locate the corresponding verse node for Mark 1:1 in this dataset, the following command can be executed."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "73f60106-c700-41d6-88e5-e0dc0d5d49e9",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"383782"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"T.nodeFromSection(['Mark', 1, 1])"
]
},
{
"cell_type": "markdown",
"id": "b3d38167-4076-4afe-a05e-a2cb85c3749b",
"metadata": {
"tags": []
},
"source": [
"The returned integer represents the numeric value of the verse node for Mark 1:1. This value can now be used in the following Python snippet to iterate through the defined text formats."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "ea12ca08-505a-4497-bc18-3b9247502350",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fmt=lex-orig-plain\t: ἀρχή ὁ εὐαγγέλιον Ἰησοῦς Χριστός υἱός θεός. \n",
"fmt=lex-translit-plain\t: arkhe o euaggelion Iesous Khristos uios theos. \n",
"fmt=text-orig-full\t: Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ (Υἱοῦ Θεοῦ). \n",
"fmt=text-orig-plain\t: Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ Υἱοῦ Θεοῦ. \n",
"fmt=text-translit-plain\t: Arkhe tou euaggeliou Iesou Khristou Uiou Theou. \n",
"fmt=text-unaccent-plain\t: Αρχη του ευαγγελιου Ιησου Χριστου Υιου Θεου. \n"
]
}
],
"source": [
"# Render the same verse node (Mark 1:1) in every defined text format\n",
"for fmt in T.formats:\n",
"    print(f'fmt={fmt}\\t: {T.text(383782, fmt=fmt)}')"
]
},
{
"cell_type": "markdown",
"id": "17670341-3d4c-4c15-9e75-605ce8b4b162",
"metadata": {},
"source": [
"## 3.3 - Transliterated text"
]
},
{
"cell_type": "markdown",
"id": "92e8a14a-6cd3-482a-95ae-49b1597d2795",
"metadata": {},
"source": [
"Using transliterated text can be convenient for crafting queries, as it allows you to use your regular keyboard without needing to input Greek characters. The following example query retrieves all words whose transliteration is 'de', with the aim of finding every occurrence of the Greek conjunction 'δὲ':"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "bddbf2e8-11a6-4d3b-8372-ba673b54854b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.09s 2769 results\n",
"Word Frequency\n",
"------------------------------\n",
"δὲ 2620\n",
"δέ 144\n",
"δὴ 4\n",
"δή 1\n"
]
}
],
"source": [
"from collections import Counter\n",
"\n",
"# Search on the transliterated surface text, so no Greek keyboard input is needed\n",
"LatinQuery = '''\n",
"word translit=de\n",
"'''\n",
"Result = N1904.search(LatinQuery)\n",
"\n",
"# Count how often each Greek surface form occurs in the results\n",
"# (each result is a tuple of nodes; the word node is its first member)\n",
"word_counts = Counter(F.text.v(result[0]) for result in Result)\n",
"\n",
"# Print the word frequency table, most frequent form first\n",
"print(f\"{'Word':<20}{'Frequency'}\")\n",
"print(\"-\" * 30)\n",
"for word, freq in word_counts.most_common():\n",
"    print(f\"{word:<20}{freq}\")"
]
},
{
"cell_type": "markdown",
"id": "f599fe46-fdd5-47ab-8093-f77b2374eba0",
"metadata": {},
"source": [
"This example shows that transliteration must be used with care. The vast majority of the results match the expected word, but 5 of the 2769 results (approximately 0.18%) belong to a different word that shares the same transliteration, the emphatic particle δὴ."
]
},
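{
"cell_type": "markdown",
"id": "e9b2f640-1a7c-4d58-b3e1-6f0a2c9d7e15",
"metadata": {},
"source": [
"One way to see where the extra hits come from is to group the surface forms by their letters alone. The sketch below uses only the Python standard library and the frequencies printed above. The helper `strip_accents` is a hypothetical stand-in introduced for illustration and is not the dataset's own `unaccent` feature.\n",
"\n",
"```python\n",
"import unicodedata\n",
"from collections import Counter\n",
"\n",
"def strip_accents(word):\n",
"    # Decompose each letter, then drop the combining marks (accents, breathings)\n",
"    decomposed = unicodedata.normalize('NFD', word)\n",
"    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))\n",
"\n",
"# Frequencies copied from the output of the query above\n",
"counts = Counter({'δὲ': 2620, 'δέ': 144, 'δὴ': 4, 'δή': 1})\n",
"\n",
"# Merge accent variants so that only the underlying letters distinguish the groups\n",
"by_base = Counter()\n",
"for form, freq in counts.items():\n",
"    by_base[strip_accents(form)] += freq\n",
"\n",
"# Share of results that belong to the particle rather than the conjunction\n",
"share = 100 * by_base['δη'] / sum(by_base.values())\n",
"print(dict(by_base))\n",
"print(f'{share:.2f}%')\n",
"```\n",
"\n",
"The two accent variants of each word collapse into one group, which separates the conjunction (epsilon) from the particle (eta). Within the dataset, querying on the `unaccent` feature instead of `translit` would presumably avoid the collision in the same way."
]
},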
{
"cell_type": "markdown",
"id": "9f37dadc-c4b0-4c88-979d-7c6fcb369e6f",
"metadata": {},
"source": [
"## 3.4 - Text with text-critical markers"
]
},
{
"cell_type": "markdown",
"id": "845d605f-cd7e-4abd-bd0c-b18f572f09cd",
"metadata": {},
"source": [
"The base text of this Text-Fabric dataset is based upon the Nestle version of 1913, as explained on sites.google.com/site/nestle1904/faq:\n",
"\n",
"> *What are your sources?*\n",
"> For the text, I used the scanned books available at the Internet Archive (The first edition of 1904, and a reprinting from 1913 – the latter one has a better quality).\n",
"\n",
"This version has a limited number of text-critical markers embedded in the base text. They are preserved in the text format `text-orig-full`, which can be printed using the following command. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "553d07df-6b07-4884-940f-f4ce2c698c24",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ (Υἱοῦ Θεοῦ). '"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"T.text(383782,fmt='text-orig-full')"
]
},
{
"cell_type": "markdown",
"id": "bd71bad9-f370-4e18-963f-4f4749c38db3",
"metadata": {},
"source": [
"## 3.5 - Nestle version 1904 and version 1913 (Mark 1:1)\n",
"\n",
"The previous result can be verified by examining the scans of the following printed versions:\n",
"\n",
"* Nestle version 1904: @ archive.org\n",
"* Nestle version 1913: @ archive.org\n",
"\n",
"Or, in an image, placed side by side:\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "ceaee907-0017-46cf-acd8-7e803f5286f1",
"metadata": {},
"source": [
"# 4 - Notebook version\n",
"##### [Back to TOC](#TOC)\n",
"\n",
"<table>\n",
"  <tr><td>Author</td><td>Tony Jurg</td></tr>\n",
"  <tr><td>Version</td><td>1.0</td></tr>\n",
"  <tr><td>Date</td><td>9 October 2024</td></tr>\n",
"</table>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}