{ "cells": [ { "cell_type": "markdown", "id": "bec25f1a", "metadata": {}, "source": [ "# Loading Text-Fabric (Nestle1904GBI)\n", "\n", "**Work in progress**" ] }, { "cell_type": "markdown", "id": "fa7f85d5", "metadata": { "tags": [] }, "source": [ "## Table of content \n", "* 1 - Introduction\n", " * 1.1 - Text-Fabric data versions\n", "* 2 - Preparation / installation\n", " * 2.1 - Install Python\n", " * 2.2 - Install Text-Fabric\n", " * 2.3 - Raise rate limit on Github\n", "* 3 - Load Text-Fabric into memory\n", " * 3.1 - Load the code\n", " * 3.2 - Load the app and data\n", " * 3.3 - Load the style sheets\n", "* 4 - Add additional features\n", " * 4.1 - The official method\n", " * 4.2 - The unofficial method\n", " * 4.3 - Additional dataset\n", " * 4.4 - Further reference\n", "* 5 - Using multiple Text-Fabric corpora" ] }, { "cell_type": "markdown", "id": "c2c48614-5571-47f8-b70c-28f1ea58f97b", "metadata": { "tags": [] }, "source": [ "# 1 - Introduction \n", "##### [back to TOC](#TOC)\n", "\n", "Basic instructions on loading the Text-Fabric and start using it on your system. It will provide examples of the various ways you can invoke Text-Fabric." ] }, { "cell_type": "markdown", "id": "76d1044b-e3b6-4a51-b9fa-5f3234a6d08b", "metadata": {}, "source": [ "### 1.1 - Text-Fabric data versions \n", "\n", "Some discussion related to versions" ] }, { "cell_type": "markdown", "id": "2e01787a-7480-43df-8ae4-6b73e0805f72", "metadata": {}, "source": [ "## 2 - Preparation / installation\n", "##### [back to TOC](#TOC)\n", "\n", "The instructions in this section are only required once to be executed. This will result in the Text-Fabric code being available for loading into memory of your system." ] }, { "cell_type": "markdown", "id": "bf3f556a-a84d-423d-833d-5a1f39dfd733", "metadata": {}, "source": [ "### 2.1 - Install Python \n", "\n", "You need to have Python on your system. 
Most systems have it out of the box, but alas, that is Python 2 and we need at least Python **3.6**.\n", "\n", "Install it from [python.org](https://www.python.org) or from\n", "[Anaconda](https://www.anaconda.com/products/distribution)." ] }, { "cell_type": "markdown", "id": "d5a6e05a", "metadata": {}, "source": [ "### 2.2 - Install Text-Fabric \n", "\n", "(if not yet installed) \n", " \n", "**TF itself**\n", "\n", "    pip3 install text-fabric\n", " \n", "**When using Jupyter notebook**\n", "\n", "You need [Jupyter](http://jupyter.org) or a platform like [Anaconda](https://www.anaconda.com/products/distribution) which includes Jupyter.\n", "\n", "If it is not already installed:\n", "\n", "    pip3 install jupyter\n" ] }, { "cell_type": "markdown", "id": "8684bf30", "metadata": {}, "source": [ "### 2.3 - Raise rate limit on GitHub \n", "##### [back to TOC](#TOC)\n", "\n", "It may be necessary to raise the rate limit for GitHub. [See the instructions](https://annotation.github.io/text-fabric/tf/advanced/repo.html#increase-the-rate-limit) on acquiring and setting the GHPERS variable. \n", "See [here](https://www.howtogeek.com/789660/how-to-use-windows-cmd-environment-variables/#autotoc_anchor_2) if you want to set the variable on Windows using the command prompt." ] }, { "cell_type": "markdown", "id": "edadeba4", "metadata": {}, "source": [ "## 3 - Load Text-Fabric into memory \n", "##### [back to TOC](#TOC)\n", "\n", "The instructions in this section need to be executed once each time you want to use Text-Fabric. They load the Text-Fabric code and data into memory." 
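, "\n", "\n", "
Before loading, the preparation steps of section 2 can be verified. The following is a minimal sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical, and the sketch assumes that the installed text-fabric package is importable under the name `tf` (as in the import statements of section 3.1) and that the GitHub token is stored in the `GHPERS` environment variable.

```python
import importlib.util
import os
import sys


def check_prerequisites(min_python=(3, 6)):
    '''Return a dict of booleans describing the state of the section 2 preparation steps.'''
    return {
        # Section 2.1 asks for at least Python 3.6.
        'python_ok': sys.version_info >= min_python,
        # Assumption: the text-fabric package is importable under the name tf.
        'tf_installed': importlib.util.find_spec('tf') is not None,
        # Section 2.3: GHPERS holds the optional token that raises the GitHub rate limit.
        'ghpers_set': bool(os.environ.get('GHPERS')),
    }


# Print one line per check.
for name, ok in check_prerequisites().items():
    print(name, 'yes' if ok else 'no')
```

A missing `GHPERS` is not necessarily a problem: the variable only matters when the GitHub rate limit is actually reached.
"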
] }, { "cell_type": "markdown", "id": "2152b562-5135-4b27-bd56-b3dc7abaa031", "metadata": {}, "source": [ "### 3.1 - Load the code \n", "##### [back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 1, "id": "a5bc2a5d", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 1, "id": "31f3bbde", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "markdown", "id": "f8a57edd-2c89-406a-873f-e7f71a5539c3", "metadata": {}, "source": [ "### 3.2 - Load the app and data \n", "##### [back to TOC](#TOC)\n", "\n", "The following invocation of function [`use`](https://annotation.github.io/text-fabric/tf/about/usefunc.html) loads all features of the corpus (and extra modules, see section 4). It creates a variable (in this example `N1904GBI`) with its associated methods and functions, the 'Advanced API'. In the 'cheat sheet' there are many references to `A.*something*`. In this notebook they should be read as `N1904GBI.*something*`. " ] }, { "cell_type": "code", "execution_count": 2, "id": "b8574f48", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904GBI/app v3, Search Reference
\n", " Data: tonyjurg - Nestle1904GBI 0.4, Character table, Feature docs
\n", "
Node types\n",
"<table>\n",
"<tr><th>Name</th><th># of nodes</th><th># slots/node</th><th>% coverage</th></tr>\n",
"<tr><td>book</td><td>27</td><td>5102.93</td><td>100</td></tr>\n",
"<tr><td>chapter</td><td>260</td><td>529.92</td><td>100</td></tr>\n",
"<tr><td>sentence</td><td>5720</td><td>24.09</td><td>100</td></tr>\n",
"<tr><td>verse</td><td>7943</td><td>17.35</td><td>100</td></tr>\n",
"<tr><td>clause</td><td>16124</td><td>8.54</td><td>100</td></tr>\n",
"<tr><td>phrase</td><td>72674</td><td>1.90</td><td>100</td></tr>\n",
"<tr><td>word</td><td>137779</td><td>1.00</td><td>100</td></tr>\n",
"</table>
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 (GBI nodes)\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "booknum\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "clause\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "clauserule\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "formaltag\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "functionaltag\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "lex_dom\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "monad\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "nodeID\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "phrase\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "phrasefunction\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "phrasefunctionlong\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "phrasetype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "sentence\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "splong\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "strongs\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "subj_ref\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "word\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the app and data\n", "N1904GBI = use (\"tonyjurg/Nestle1904GBI\", version=\"0.4\", hoist=globals())" ] }, { "cell_type": "markdown", "id": "45df826a-3ac1-478a-a2e8-cfd8663c0d1b", "metadata": {}, "source": [ "### 3.3 - Load the style sheets\n", "##### [back to TOC](#TOC)\n", "\n", "This step is stricly speaking not required when using Text-Fabric only localy. However, when making it available for tools like nbviewer, including this statement will show very handy since it ensures proper formatting. It is using function [`getCss`](https://annotation.github.io/text-fabric/tf/advanced/display.html#tf.advanced.display.getCss) to obtain all style information and uses function [`dh`](https://annotation.github.io/text-fabric/tf/advanced/helpers.html#tf.advanced.helpers.dh) to push it as HTML towards the Jupyter NoteBook." ] }, { "cell_type": "code", "execution_count": 4, "id": "55e8339d-c209-4be7-bec3-126d40f08565", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n", "N1904GBI.dh(N1904GBI.getCss())" ] }, { "cell_type": "markdown", "id": "9f8b75f6-3bf8-40ee-bb1e-a05714794a5b", "metadata": {}, "source": [ "# 4 - Add additional features\n", "##### [back to TOC](#TOC)\n", "\n", "\n", "**The following is optional.**\n" ] }, { "cell_type": "markdown", "id": "d0a09e00-481a-4c33-a7f6-9a53a84d8a31", "metadata": {}, "source": [ "## 4.1 - The official method\n", "##### [back to TOC](#TOC)\n", "\n", "Still to be done: find good example" ] }, { "cell_type": "code", "execution_count": 6, "id": "266af257-e70a-4ae9-87af-405ddb57fb3a", "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (1781119015.py, line 2)", "output_type": "error", 
"traceback": [ "\u001b[1;36m Input \u001b[1;32mIn [6]\u001b[1;36m\u001b[0m\n\u001b[1;33m N1904GBIMOD = use (\"tonyjurg/Nestle1904GBI\", version=\"0.4\", mod=f\"annotation/banks/sim/tf\" hoist=globals())\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "# load the app and data with ad\n", "N1904GBIMOD = use (\"tonyjurg/Nestle1904GBI\", version=\"0.4\", mod=f\"annotation/banks/sim/tf\" hoist=globals())" ] }, { "cell_type": "markdown", "id": "b144959b-2e67-414a-9d26-535e04cfeddf", "metadata": {}, "source": [ "## 4.2 - The unofficial method\n", "##### [back to TOC](#TOC)\n", "\n", "Warning: to use this method it is critical to verify that **ALL** the following match:\n", "* most importantly, the Text-Fabric dataset should be based upon the same corpus (in the most literal sense of the word!)\n", "* the node range(s) (check output of command `F.otype.all` or values found in file `otype.tf`).\n", "* the slot order (i.e. the order of the wordsin the Text-Fabric corpus; usualy refered to as monad).\n", "\n", "If these conditions are met, it is possible to copy the .tf files from the donor dataset to your local Text-Fabric directory.." 
] }, { "cell_type": "markdown", "id": "44fd949d-c638-42d1-a235-b700c0a3454a", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## 4.3 - Additional dataset\n", "##### [back to TOC](#TOC)\n", "\n", "Additional datasets that should work with this Text-Fabric implementation:\n", "\n", "Dataset location | Additions\n", "--- | ---\n", "[CenterBLC](https://github.com/CenterBLC/NA/tree/main/tf/202201) | *additional grammatical features, Bible Online Learner details*\n", " " ] }, { "cell_type": "markdown", "id": "0f50a69e-794d-4fa1-87c3-3cfdf7bd8b97", "metadata": {}, "source": [ "## 4.4 - Further reference\n", "##### [back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "17574d99-d4d9-4b00-a1f8-f5f7b721e391", "metadata": {}, "source": [ "For further reference, see [module tf.about.datasharing](https://annotation.github.io/text-fabric/tf/about/datasharing.html)." ] }, { "cell_type": "markdown", "id": "392e108c-fea8-44bd-9e20-213b9d68e499", "metadata": {}, "source": [ "# 5 - Using multiple Text-Fabric corpora\n", "##### [back to TOC](#TOC)\n", "\n", "When using multiple Text-Fabric corpora there are a few things to take care of.\n", "The most important is to invoke function [`use`](https://annotation.github.io/text-fabric/tf/about/usefunc.html) twice, each time assigning the result to a different variable, to create two Advanced APIs. In the following example two `A` (Advanced API) objects are created, named `CORPUS1` and `CORPUS2`:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b8ff5040-8127-459b-ba4c-7d36ab948768", "metadata": {}, "outputs": [], "source": [ "# placeholders: supply the location of each corpus in the call to use()\n", "CORPUS1 = use ()\n", "CORPUS2 = use ()" ] }, { "cell_type": "markdown", "id": "ea086155-a0dc-485f-aa6a-8c5e909b3f3c", "metadata": {}, "source": [ "**IMPORTANT:** When working with multiple corpora, do not add `hoist=globals()` to the invocation! Hoisting places the API names (`N F E L T S C TF`) in the global namespace, so a second hoisting call would overwrite the names of the first corpus. See the comments in the [section on hoisting of function `use`](https://annotation.github.io/text-fabric/tf/about/usefunc.html#hoisting)." 
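, "\n", "\n", "
Without hoisting, each corpus is addressed through its own app object. The following is a minimal sketch, assuming that both corpora have already been loaded with `use`. The helper name `slot_counts` is hypothetical; it reads `F.otype.maxSlot` through the `api` attribute of each app object.

```python
def slot_counts(**apps):
    '''Map each corpus label to its number of slots, read through the API of that corpus.'''
    # app.api.F is the feature API of one specific corpus; nothing is taken from globals().
    return {label: app.api.F.otype.maxSlot for label, app in apps.items()}
```

With the two objects from the example above, `slot_counts(first=CORPUS1, second=CORPUS2)` would return one slot count per corpus.
"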
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }