{ "cells": [ { "cell_type": "markdown", "id": "bec25f1a", "metadata": {}, "source": [ "# Load the Text-Fabric dataset (N1904-TF)" ] }, { "cell_type": "markdown", "id": "fa7f85d5", "metadata": { "tags": [] }, "source": [ "## Table of contents (TOC)\n", "* 1 - Introduction\n", " * 1.1 - Text-Fabric data versions\n", " * 1.2 - Prerequisites / Installation\n", "* 2 - Load Text-Fabric into memory\n", " * 2.1 - Load the Text-Fabric code\n", " * 2.2 - Load the Text-Fabric app and data\n", " * 2.3 - Push CSS code to the Notebook\n", "* 3 - Notebook version" ] }, { "cell_type": "markdown", "id": "c2c48614-5571-47f8-b70c-28f1ea58f97b", "metadata": { "tags": [] }, "source": [ "# 1 - Introduction \n", "##### [Back to TOC](#TOC)\n", "\n", "This Jupyter Notebook provides detailed instructions on how to load the [CenterBLC/N1904 Text-Fabric dataset](https://centerblc.github.io/N1904/) into your Python environment. This will enable you to perform linguistic analysis on the Greek New Testament ([Nestle 1904, 7th edition](https://centerblc.github.io/N1904/about.html#the-nestle-text))." ] }, { "cell_type": "markdown", "id": "76d1044b-e3b6-4a51-b9fa-5f3234a6d08b", "metadata": {}, "source": [ "## 1.1 - Text-Fabric data versions \n", "\n", "The CenterBLC/N1904 Text-Fabric dataset is available as a collection of files hosted on [GitHub](https://github.com/CenterBLC/N1904). The files in this dataset fall into two main types:\n", "\n", "* The [feature data files](https://centerblc.github.io/N1904/features/index.html#start) are stored in the [tf](https://github.com/CenterBLC/N1904/tree/main/tf) directory, where each subdirectory corresponds to a specific version. 
Each version is accompanied by release information that can be [viewed here](https://github.com/CenterBLC/N1904/releases).\n", "\n", "* The [application-related files](https://github.com/CenterBLC/N1904/tree/main/app) are an integral part of the Text-Fabric dataset and provide dataset-specific functionality such as [viewtypes](https://centerblc.github.io/N1904/viewtypes.html#start).\n", "\n", "When invoking the latest version of the Text-Fabric dataset, the code downloads a single zip file instead of individual files. This file, `complete.zip`, contains all the necessary files (and some bookkeeping files) for a [specific release](https://github.com/CenterBLC/N1904/releases). \n", "\n", "If you want to load a specific version other than the latest one, you may need to increase GitHub's rate limit. Instructions can be found in this [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Increase_GitHub_rate_limit.ipynb)." ] }, { "cell_type": "markdown", "id": "2e01787a-7480-43df-8ae4-6b73e0805f72", "metadata": {}, "source": [ "## 1.2 - Prerequisites / Installation\n", "\n", "Before you can start using Text-Fabric, you need to set up a suitable Python environment (at least [Python version 3.7.0](https://annotation.github.io/text-fabric/tf/about/install.html)). An example of installing a Python environment using Anaconda is shown in this [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Install_Python.ipynb). You also need to install the Text-Fabric package in this environment; instructions are provided in this [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Install_Text-Fabric.ipynb). This setup process only needs to be done once. 
Afterward, the Text-Fabric code will be available for loading into your system's memory.\n", "\n", "Besides keeping your Python environment up to date, it is also advisable to periodically update your installed version of Text-Fabric to a more recent release. How to do this from within a Jupyter Notebook is demonstrated in [this Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Update_Text-Fabric.ipynb).\n", "\n", "In certain situations (particularly when loading Text-Fabric datasets other than the latest version), it may also be necessary to increase the rate limit for GitHub. [See this Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Increase_GitHub_rate_limit.ipynb) for more information. " ] }, { "cell_type": "markdown", "id": "edadeba4", "metadata": {}, "source": [ "# 2 - Load Text-Fabric into memory \n", "##### [Back to TOC](#TOC)\n", "\n", "The instructions in this section need to be executed each time you want to use Text-Fabric. They first load the Text-Fabric code and then load the data into memory." ] }, { "cell_type": "markdown", "id": "2152b562-5135-4b27-bd56-b3dc7abaa031", "metadata": {}, "source": [ "## 2.1 - Load the Text-Fabric code " ] }, { "cell_type": "code", "execution_count": 2, "id": "a5bc2a5d", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 4, "id": "31f3bbde", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "markdown", "id": "f8a57edd-2c89-406a-873f-e7f71a5539c3", "metadata": {}, "source": [ "## 2.2 - Load the Text-Fabric app and data \n", "\n", "The following call to the [`use()`](https://annotation.github.io/text-fabric/tf/about/usefunc.html) function loads all features of the corpus. 
It creates a data structure (in this example `N1904`) with associated methods and functions. Collectively, these are referred to as the 'Advanced API'; the ['cheat sheet'](https://annotation.github.io/text-fabric/tf/cheatsheet.html) refers to them as `A.*something*`. The exact name, however, is determined when `use()` is invoked, by the variable to which its result is assigned. Hence, in this notebook, calls to the 'Advanced API' take the form `N1904.*something*`. " ] }, { "cell_type": "code", "execution_count": 8, "id": "b8574f48", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/CenterBLC/N1904/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CenterBLC/N1904/tf/1.0.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.5.3, CenterBLC/N1904/app v3, Search Reference
\n", " Data: CenterBLC - N1904 1.0.0, Character table, Feature docs
\n", "
Node types\n", "<table>\n", "<tr><th>Name</th><th># of nodes</th><th># slots / node</th><th>% coverage</th></tr>\n", "<tr><td>book</td><td>27</td><td>5102.93</td><td>100</td></tr>\n", "<tr><td>chapter</td><td>260</td><td>529.92</td><td>100</td></tr>\n", "<tr><td>verse</td><td>7944</td><td>17.34</td><td>100</td></tr>\n", "<tr><td>sentence</td><td>8011</td><td>17.20</td><td>100</td></tr>\n", "<tr><td>group</td><td>8945</td><td>7.01</td><td>46</td></tr>\n", "<tr><td>clause</td><td>42506</td><td>8.36</td><td>258</td></tr>\n", "<tr><td>wg</td><td>106868</td><td>6.88</td><td>533</td></tr>\n", "<tr><td>phrase</td><td>69007</td><td>1.90</td><td>95</td></tr>\n", "<tr><td>subphrase</td><td>116178</td><td>1.60</td><td>135</td></tr>\n", "<tr><td>word</td><td>137779</td><td>1.00</td><td>100</td></tr>\n", "</table>
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 Greek New Testament\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " material after the end of the word\n", "\n", "
\n", "\n", "
\n", " \n", "
int
\n", "\n", " 1 if it is an apposition container\n", "\n", "
\n", "\n", "
\n", "
\n", "articular\n", "
\n", "
int
\n", "\n", " 1 if the sentence, group, clause, phrase or wg has an article\n", "\n", "
\n", "\n", "
\n", "
\n", "before\n", "
\n", "
str
\n", "\n", " this is XML attribute before\n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " book name (full name)\n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " book name (abbreviated) from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " grammatical case\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " chapter number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "cls\n", "
\n", "
str
\n", "\n", " this is XML attribute cls\n", "\n", "
\n", "\n", "
\n", "
\n", "cltype\n", "
\n", "
str
\n", "\n", " clause type\n", "\n", "
\n", "\n", "
\n", "
\n", "criticalsign\n", "
\n", "
str
\n", "\n", " this is XML attribute criticalsign\n", "\n", "
\n", "\n", "
\n", "
\n", "crule\n", "
\n", "
str
\n", "\n", " clause rule (from xml attribute Rule)\n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " grammatical degree\n", "\n", "
\n", "\n", "
\n", "
\n", "discontinuous\n", "
\n", "
int
\n", "\n", " 1 if the word is out of sequence in the xml\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " domain\n", "\n", "
\n", "\n", "
\n", "
\n", "framespec\n", "
\n", "
str
\n", "\n", " this is XML attribute framespec\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " this is XML attribute function\n", "\n", "
\n", "\n", "
\n", "
\n", "gender\n", "
\n", "
str
\n", "\n", " grammatical gender\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " English gloss (BGVB)\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " xml id\n", "\n", "
\n", "\n", "
\n", "
\n", "junction\n", "
\n", "
str
\n", "\n", " type of junction\n", "\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "\n", " language the text is in\n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " lexical lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "lemmatranslit\n", "
\n", "
str
\n", "\n", " transliteration of the word lemma\n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " ln\n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " verbal mood\n", "\n", "
\n", "\n", "
\n", "
\n", "morph\n", "
\n", "
str
\n", "\n", " morphological code\n", "\n", "
\n", "\n", "
\n", "
\n", "nodeid\n", "
\n", "
str
\n", "\n", " node id (as in the XML source data)\n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " lemma normalized\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
str
\n", "\n", " annotation of linguistic nature\n", "\n", "
\n", "\n", "
\n", "
\n", "num\n", "
\n", "
int
\n", "\n", " generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " grammatical number\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " grammatical person\n", "\n", "
\n", "\n", "
\n", "
\n", "punctuation\n", "
\n", "
str
\n", "\n", " punctuation found after a word\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " biblical reference with word counting\n", "\n", "
\n", "\n", "
\n", "
\n", "referent\n", "
\n", "
str
\n", "\n", " number of referent\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " this is XML attribute rela\n", "\n", "
\n", "\n", "
\n", "
\n", "role\n", "
\n", "
str
\n", "\n", " role\n", "\n", "
\n", "\n", "
\n", "
\n", "rule\n", "
\n", "
str
\n", "\n", " syntactical rule\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " part-of-speach\n", "\n", "
\n", "\n", "
\n", "
\n", "strong\n", "
\n", "
int
\n", "\n", " strong number\n", "\n", "
\n", "\n", "
\n", "
\n", "subjrefspec\n", "
\n", "
str
\n", "\n", " this is XML attribute subjrefspec\n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " verbal tense\n", "\n", "
\n", "\n", "
\n", "
\n", "text\n", "
\n", "
str
\n", "\n", " the text of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " material after the end of the word (excluding critical signs)\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " translation of the word surface text according to the Berean Interlinear Bible\n", "\n", "
\n", "\n", "
\n", "
\n", "translit\n", "
\n", "
str
\n", "\n", " transliteration of the word surface text\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " syntactical type (on sentence, group, clause or phrase)\n", "\n", "
\n", "\n", "
\n", "
\n", "typems\n", "
\n", "
str
\n", "\n", " morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)\n", "\n", "
\n", "\n", "
\n", "
\n", "unaccent\n", "
\n", "
str
\n", "\n", " word in unicode characters without accents and diacritical markers\n", "\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "\n", " word in unicode characters plus material after it\n", "\n", "
\n", "\n", "
\n", "
\n", "variant\n", "
\n", "
str
\n", "\n", " this is XML attribute variant\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " verse number, from ref attribute in xml\n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " verbal voice\n", "\n", "
\n", "\n", "
\n", "
\n", "frame\n", "
\n", "
str
\n", "\n", " frame\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "parent\n", "
\n", "
none
\n", "\n", " parent relationship between words\n", "\n", "
\n", "\n", "
\n", "
\n", "sibling\n", "
\n", "
int
\n", "\n", " this is XML attribute sibling\n", "\n", "
\n", "\n", "
\n", "
\n", "subjref\n", "
\n", "
none
\n", "\n", " number of subject referent\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: CenterBLC/N1904
  3. appPath: C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/app
  4. commit: gdb630837ae89b9468c9e50d13bda05cfd3de4f18
  5. css: ''
  6. dataDisplay:
    • excludedFeatures: []
    • noneValues:
      • none
      • unknown
      • no value
      • NA
    • sectionSep1:
    • sectionSep2: :
    • textFormat: text-orig-full
  7. docs:
    • docBase: https://github.com/CenterBLC/N1904/tree/main/docs
    • docPage: about
    • docRoot: https://github.com/CenterBLC/N1904
    • featureBase:https://github.com/CenterBLC/N1904/blob/main/docs/features/<feature>.md
    • featurePage: README
  8. interfaceDefaults: {fmt: text-orig-full}
  9. isCompatible: True
  10. local: local
  11. localDir:C:/Users/tonyj/text-fabric-data/github/CenterBLC/N1904/_temp
  12. provenanceSpec:
    • branch: main
    • corpus: Nestle 1904 Greek New Testament
    • doi: 10.5281/zenodo.13117910
    • moduleSpecs: []
    • org: CenterBLC
    • relative: /tf
    • repo: N1904
    • repro: N1904
    • version: 1.0.0
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the website
    • webLang: en
    • webUrl:https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: 1.0.0
  14. typeDisplay:
    • clause:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {cls} {role} {junction}
      • style: ''
    • group:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • phrase:
      • condense: True
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • sentence:
      • label: {typ} {function} {rela} \\\\ {role} {rule}
      • style: ''
    • subphrase:
      • label: {typ} {function} {rela} \\\\ {typems} {role} {rule}
      • style: ''
    • verse:
      • condense: True
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • condense: True
      • label: {typems} {role} {rule} {junction}
      • style: ''
    • word:
      • features:
        • lemma
        • sp
      • featuresBare: [gloss]
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load the N1904-TF app and data\n", "N1904 = use(\"CenterBLC/N1904\", version=\"1.0.0\", hoist=globals())" ] }, { "cell_type": "markdown", "id": "9bd5d827-7233-4ec9-99d8-f1ae7a2f1ab7", "metadata": {}, "source": [ "## 2.3 - Push CSS code to the Notebook\n", "\n", "The following code is optional. Its main purpose is to ensure that Text-Fabric objects, such as tables and syntax trees, are rendered in the online Notebook Viewer the same way they appear in the Jupyter Notebook itself. It uses the [`getCss(app)`](https://annotation.github.io/text-fabric/tf/advanced/display.html#tf.advanced.display.getCss) function to collect the complete CSS code from Text-Fabric and the app." ] }, { "cell_type": "code", "execution_count": 10, "id": "932992c9-3fd9-4b5a-aa22-48eb376c8622", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Push the Text-Fabric stylesheet to this notebook (to facilitate proper display in Notebook Viewer)\n", "N1904.dh(N1904.getCss())" ] }, { "cell_type": "markdown", "id": "c20d3218-76b4-4bde-a82a-18e0528f1bed", "metadata": {}, "source": [ "Note: this is achieved by embedding the CSS code inside the notebook file. The CSS code can be inspected in this cell's output (truncated):\n", "
\n",
    "{\n",
    "   \"cell_type\": \"code\",\n",
    "   \"execution_count\": 7,\n",
    "   \"id\": \"932992c9-3fd9-4b5a-aa22-48eb376c8622\",\n",
    "   \"metadata\": {},\n",
    "   \"outputs\": [\n",
    "    {\n",
    "     \"data\": {\n",
    "      \"text/html\": [\n",
    "       \"<style>tr.tf.ltr, td.tf.ltr, th.tf.ltr { text-align: left ! important;}\\n\",\n",
    "       \"tr.tf.rtl, td.tf.rtl, th.tf.rtl { text-align: right ! important;}\\n\",\n",
    "       \"@font-face {\\n\",\n",
    "       \"  font-family: \\\"Gentium Plus\\\";\\n\",\n",
    "       \n",
    "       ... etc ...\n",
    "
" ] }, { "cell_type": "markdown", "id": "a5705e69-0488-4dc4-a5fc-d1a3d2c44641", "metadata": {}, "source": [ "# 3 - Notebook version\n", "##### [Back to TOC](#TOC)\n", "\n", "
\n", "<table>\n", "  <tr><td>Author</td><td>Tony Jurg</td></tr>\n", "  <tr><td>Version</td><td>1.1</td></tr>\n", "  <tr><td>Date</td><td>9 October 2024</td></tr>\n", "</table>\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }