# Load the Text-Fabric dataset (N1904-TF)

## Table of contents (ToC)<a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
    * <a href="#bullet1x1">1.1 - Text-Fabric data versions</a>
    * <a href="#bullet1x2">1.2 - Prerequisites / Installation</a>
    * <a href="#bullet1x3">1.3 - Updates</a>
* <a href="#bullet2">2 - Load Text-Fabric into memory</a>
    * <a href="#bullet2x1">2.1 - Load the Text-Fabric code</a>
    * <a href="#bullet2x2">2.2 - Load the Text-Fabric app and data</a>
    * <a href="#bullet2x3">2.3 - Push CSS code to the Notebook</a>
* <a href="#bullet3">3 - Notebook version details</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to ToC](#TOC)

This Jupyter Notebook provides detailed instructions on how to load the [CenterBLC/N1904 Text-Fabric dataset](https://centerblc.github.io/N1904/) into your Python environment. This will enable you to perform linguistic analysis on the Greek New Testament ([Nestle 1904, 7th edition](https://centerblc.github.io/N1904/about.html#the-nestle-text)).

## 1.1 - Text-Fabric data versions <a class="anchor" id="bullet1x1"></a>

The CenterBLC/N1904 Text-Fabric dataset is available as a collection of files hosted on [GitHub](https://github.com/CenterBLC/N1904). The files in this dataset fall into two main types:

* The [feature data files](https://centerblc.github.io/N1904/features/index.html#start) are stored in the directory [tf](https://github.com/CenterBLC/N1904/tree/main/tf), where each subdirectory maps to a specific version. Each version is accompanied by release information that can be [viewed here](https://github.com/CenterBLC/N1904/releases).

* The [application-related files](https://github.com/CenterBLC/N1904/tree/main/app) are an integral part of the Text-Fabric dataset and provide dataset-specific functionality such as [viewtypes](https://centerblc.github.io/N1904/viewtypes.html#start).

When invoking the latest version of the Text-Fabric dataset, the code downloads a single zip file instead of individual files. This file, `complete.zip`, contains all the necessary files (and some bookkeeping files) for a [specific release](https://github.com/CenterBLC/N1904/releases).

If you want to load a specific version (other than the latest one), you may need to increase GitHub's rate limit. Instructions can be found in this [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Increase_GitHub_rate_limit.ipynb).

## 1.2 - Prerequisites / Installation<a class="anchor" id="bullet1x2"></a>

Before you can start using Text-Fabric, you need to set up a suitable Python environment (at least [Python version 3.7.0](https://annotation.github.io/text-fabric/tf/about/install.html)). This [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Install_Python.ipynb) demonstrates installing a Python environment using Anaconda. You also need to install the Text-Fabric package in this environment; instructions are provided in this [Jupyter Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Install_Text-Fabric.ipynb). This setup only needs to be done once. Afterward, the Text-Fabric code is available for loading into your system's memory.
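The two prerequisites above can be checked from within Python before continuing. The following is a minimal sketch using only the standard library. It assumes that Text-Fabric is importable under the top-level package name `tf` (as in the import statements used later in this notebook); the helper names are illustrative and not part of Text-Fabric.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7, 0)  # minimum version stated in the text above


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:3]) >= tuple(minimum)


def text_fabric_is_installed():
    """Return True when a top-level package named `tf` can be located.

    Assumption: the package found under this name is Text-Fabric.
    """
    return importlib.util.find_spec("tf") is not None


print("Python OK:", python_is_recent_enough())
print("Text-Fabric installed:", text_fabric_is_installed())
```

If either check reports `False`, the installation notebooks linked above describe the remaining steps.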

Besides keeping your Python environment updated, it is also advisable to periodically update your installed version of Text-Fabric to the latest or a more recent release. How to do this from within a Jupyter Notebook is demonstrated in [this Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Update_Text-Fabric.ipynb).

In certain situations (particularly when loading Text-Fabric datasets other than the latest version), it may also be necessary to increase the rate limit for GitHub. [See this Notebook](https://nbviewer.org/github/CenterBLC/N1904/tree/main/docs/tutorial/Increase_GitHub_rate_limit.ipynb) for more information. 
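Whether a GitHub token is already configured can be checked with a few lines of Python. This sketch assumes that Text-Fabric picks up a personal access token from an environment variable named `GHPERS`; treat that name as an assumption and consult the linked notebook for the authoritative procedure. The helper name is illustrative only.

```python
import os

# Assumption: Text-Fabric reads a GitHub personal access token from this variable.
TOKEN_VARIABLE = "GHPERS"


def github_token_is_set(environ=os.environ):
    """Return True when a non-empty token is present in the given environment."""
    return bool(environ.get(TOKEN_VARIABLE, "").strip())


if github_token_is_set():
    print("A GitHub token is set; the higher rate limit should apply.")
else:
    print("No GitHub token found; the anonymous rate limit applies.")
```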

## 1.3 - Updates <a class="anchor" id="bullet1x3"></a>

The following [Jupyter Notebook](Update_Text-Fabric.ipynb) discusses the various aspects of updating your Text-Fabric version to other releases.

# 2 - Load Text-Fabric into memory <a class="anchor" id="bullet2"></a>
##### [Back to ToC](#TOC)

The instructions in this section need to be executed each time you want to use Text-Fabric. They will first load the Text-Fabric code and then load the data into memory.

## 2.1 - Load the Text-Fabric code <a class="anchor" id="bullet2x1"></a>

In [2]:
%load_ext autoreload
%autoreload 2

In [4]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

## 2.2 - Load the Text-Fabric app and data <a class="anchor" id="bullet2x2"></a>

The following call to the function [`use()`](https://annotation.github.io/text-fabric/tf/about/usefunc.html) loads all features of the corpus. It creates a data structure (in this example `N1904`) with associated methods and functions. Collectively this is referred to as the 'Advanced API'; the ['cheat sheet'](https://annotation.github.io/text-fabric/tf/cheatsheet.html) refers to it as `A.*something*`. The exact name, however, is determined by the variable to which the result of `use()` is assigned. Hence, in this notebook, the 'Advanced API' is addressed as `N1904`.

In [8]:
# load the N1904-TF app and data
N1904 = use("CenterBLC/N1904", version="1.0.0", hoist=globals())

**Locating corpus resources ...**

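| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| group | 8945 | 7.01 | 46 |
| clause | 42506 | 8.36 | 258 |
| wg | 106868 | 6.88 | 533 |
| phrase | 69007 | 1.9 | 95 |
| subphrase | 116178 | 1.6 | 135 |
| word | 137779 | 1.0 | 100 |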


Display is setup for viewtype [syntax-view](https://github.com/CenterBLC/N1904/blob/main/docs/syntax-view.md#start)

See [here](https://github.com/CenterBLC/N1904/blob/main/docs/viewtypes.md#start) for more information on viewtypes
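The `% coverage` values above 100 in the node table may look surprising. One plausible reading, offered here as an assumption rather than a statement about Text-Fabric's internals, is that coverage is roughly the number of nodes times the average number of slots per node, divided by the total number of slots. Node types whose nodes nest or overlap (such as `clause` and `wg`) then count the same words more than once. The sketch below redoes that arithmetic with the figures from the table; small deviations are expected because the averages are rounded.

```python
# Rows copied from the table above:
# (name, number of nodes, average slots per node, reported % coverage)
ROWS = [
    ("book", 27, 5102.93, 100),
    ("chapter", 260, 529.92, 100),
    ("verse", 7944, 17.34, 100),
    ("sentence", 8011, 17.2, 100),
    ("group", 8945, 7.01, 46),
    ("clause", 42506, 8.36, 258),
    ("wg", 106868, 6.88, 533),
    ("phrase", 69007, 1.9, 95),
    ("subphrase", 116178, 1.6, 135),
    ("word", 137779, 1.0, 100),
]

# Assumption read from the table: words are the slots (1.0 slot per word node).
TOTAL_SLOTS = 137779


def estimated_coverage(nodes, slots_per_node, total_slots=TOTAL_SLOTS):
    """Estimate % coverage as nodes * average slots per node / total slots."""
    return 100 * nodes * slots_per_node / total_slots


for name, nodes, slots_per_node, reported in ROWS:
    estimate = estimated_coverage(nodes, slots_per_node)
    print(f"{name:<10} estimated {estimate:6.1f}%  reported {reported}%")
```

Under this reading, every estimate lands within one percentage point of the reported value.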

## 2.3 - Push CSS code to the Notebook<a class="anchor" id="bullet2x3"></a>

The following code is optional. Its main purpose is to ensure that Text-Fabric objects, such as tables and syntax trees, are displayed in the online Notebook Viewer the same way as in the Jupyter Notebook itself. It uses the [`getCss(app)`](https://annotation.github.io/text-fabric/tf/advanced/display.html#tf.advanced.display.getCss) function to collect the complete CSS code from Text-Fabric and the app.

In [10]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

Note: this is achieved by embedding the CSS code inside the notebook file. The content of the CSS code can be examined in this cell's output (truncated):
<pre>
{
   "cell_type": "code",
   "execution_count": 7,
   "id": "932992c9-3fd9-4b5a-aa22-48eb376c8622",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "&lt;style&gt;tr.tf.ltr, td.tf.ltr, th.tf.ltr { text-align: left ! important;}\n",
       "tr.tf.rtl, td.tf.rtl, th.tf.rtl { text-align: right ! important;}\n",
       "@font-face {\n",
       "  font-family: \"Gentium Plus\";\n",
       
       ... etc ...
</pre>
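Because a notebook file is plain JSON, the embedded stylesheet can also be located programmatically. The following is a minimal sketch using only the standard library. The function name and the sample notebook are hypothetical; the sample merely mimics the shape of the excerpt above.

```python
import json


def cells_with_embedded_css(notebook):
    """Return the indices of code cells whose HTML output contains a <style> tag."""
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            html = output.get("data", {}).get("text/html", [])
            # Notebook files store multi-line text as a list of lines or as one string.
            text = "".join(html) if isinstance(html, list) else html
            if "<style" in text:
                found.append(index)
                break
    return found


# A tiny stand-in notebook (hypothetical content, shaped like the excerpt above).
sample = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# title"]},
    {"cell_type": "code", "outputs": [
      {"data": {"text/html": ["<style>tr.tf.ltr { text-align: left; }\\n", "</style>"]}}
    ]}
  ]
}
""")

print(cells_with_embedded_css(sample))  # prints [1]
```

To inspect a real notebook, load the file with `json.load()` and pass the resulting dictionary to the function.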

# 3 - Notebook version details<a class="anchor" id="bullet3"></a>
##### [Back to ToC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.1</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>9 October 2024</td>
    </tr>
  </table>
</div>