{
"cells": [
{
"cell_type": "markdown",
"id": "1cf27c95-0b45-4d97-a62d-9950654eb386",
"metadata": {},
"source": [
"# Some system statistics (Nestle1904GBI)"
]
},
{
"cell_type": "markdown",
"id": "1495a021-daa1-4c2e-80d5-ab7d2d75bc3f",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Table of content \n",
"* 1 - Introduction\n",
"* 2 - Load Text-Fabric app and data\n",
"* 3 - Performing the queries\n",
" * 3.1 - Print the Text-Fabric version\n",
" * 3.2 - Dump selection of header \n",
" * 3.3 - Memory footprint \n",
" * 3.4 - List loaded features \n",
" * 3.5 - Statistics on node types\n",
" * 3.6 - Node number ranges\n",
"* 4 - Required libraries"
]
},
{
"cell_type": "markdown",
"id": "e6830070-1e97-4bdf-aa0c-5eda4e624a84",
"metadata": {},
"source": [
"# 1 - Introduction \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"This Jupyter Notebook showcases several examples of statistical analysis performed on a Text-Fabric corpus."
]
},
{
"cell_type": "markdown",
"id": "a1b900e2-995f-4f36-ad74-d821092ca02c",
"metadata": {},
"source": [
"# 2 - Load Text-Fabric app and data \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6bd6c621-361d-487f-a8df-c27fb1ec9de2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0071a0db-916c-4357-88bd-6b3255af0764",
"metadata": {},
"outputs": [],
"source": [
"# Loading the Text-Fabric code\n",
"# Note: it is assumed Text-Fabric is installed in your environment\n",
"from tf.fabric import Fabric\n",
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ed76db5d-5463-4bf1-99ca-7f14b3a0f277",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"**Locating corpus resources ...**"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"app: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/app"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
" Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904GBI/app v3, Search Reference
\n",
" Data: tonyjurg - Nestle1904GBI 0.4, Character table, Feature docs
\n",
" Node types
\n",
"\n",
" \n",
" Name | \n",
" # of nodes | \n",
" # slots/node | \n",
" % coverage | \n",
"
\n",
"\n",
"\n",
" book | \n",
" 27 | \n",
" 5102.93 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" chapter | \n",
" 260 | \n",
" 529.92 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" sentence | \n",
" 5720 | \n",
" 24.09 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" verse | \n",
" 7943 | \n",
" 17.35 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" clause | \n",
" 16124 | \n",
" 8.54 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" phrase | \n",
" 72674 | \n",
" 1.90 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" word | \n",
" 137779 | \n",
" 1.00 | \n",
" 100 | \n",
"
\n",
"
\n",
" Sets: no custom sets
\n",
" Features:
\n",
"Nestle 1904 (GBI nodes)
\n",
" \n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
none
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
" \n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# load the N1904 app and data\n",
"N1904 = use (\"tonyjurg/Nestle1904GBI\", version=\"0.4\", hoist=globals())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8da62ff0-0aa7-4830-8637-2eeee8bb67d8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n",
"N1904.dh(N1904.getCss())"
]
},
{
"cell_type": "markdown",
"id": "58ef1678-a19d-4c0c-80f3-84f8471a90e2",
"metadata": {
"tags": []
},
"source": [
"# 3 - Performing the queries \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "c0d05d38-a888-4397-92ff-a26451f80eda",
"metadata": {},
"source": [
"## 3.1 - Print the Text-Fabric version \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"Although this is somewhat trivial, this example does serve a purpose. We will print te version by means of calling the Text-Fabric parameter [VERSION](https://annotation.github.io/text-fabric/tf/parameters.html#tf.parameters.VERSION) which is fixed for the whole programm. To access any of these parameters in our notebook, it first needs to be imported from `tf.parameters`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5076b0c8-728d-4488-a75c-ca4758e58ecf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TextFabric version: 11.4.10\n"
]
}
],
"source": [
"from tf.parameters import VERSION\n",
"print ('TextFabric version: {}'.format(VERSION))"
]
},
{
"cell_type": "markdown",
"id": "bdefff93-b65d-4cbf-bf2f-6b3cca6b2d3a",
"metadata": {},
"source": [
"Note that any other parameters can be dumped in similar manner."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b8e8ce2d-43db-48dd-ace9-2156c7046692",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"tonyjurg/Nestle1904GBI app context
\n",
"\n",
" \n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"N1904.showContext(...)"
]
},
{
"cell_type": "markdown",
"id": "75627859-1d9c-4d99-9020-d2302f6de408",
"metadata": {},
"source": [
"## 3.2 - Dump selection of header\n",
"##### [Back to TOC](#TOC)\n",
"\n",
"In this example the header of the loaded Text-Fabric dataset is dumped. This is done by means of an API call to [`A.header()`](https://annotation.github.io/text-fabric/tf/advanced/links.html#tf.advanced.links.header). \n",
"\n",
"Please note that in the example below `A` is replaced by `N1904`. This is result of the method of incantation:\n",
"> N1904 = use (... *etc* ... )\n",
"\n",
"The [`use`](https://annotation.github.io/text-fabric/tf/app.html#tf.app.use) function returns an oject whose attributes and methods constitute the advanced API. In the \n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b5ce40f1-9a22-444f-955a-c5545797a056",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904GBI/app v3, Search Reference
\n",
" Data: tonyjurg - Nestle1904GBI 0.4, Character table, Feature docs
\n",
" Node types
\n",
"\n",
" \n",
" Name | \n",
" # of nodes | \n",
" # slots/node | \n",
" % coverage | \n",
"
\n",
"\n",
"\n",
" book | \n",
" 27 | \n",
" 5102.93 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" chapter | \n",
" 260 | \n",
" 529.92 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" sentence | \n",
" 5720 | \n",
" 24.09 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" verse | \n",
" 7943 | \n",
" 17.35 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" clause | \n",
" 16124 | \n",
" 8.54 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" phrase | \n",
" 72674 | \n",
" 1.90 | \n",
" 100 | \n",
"
\n",
"\n",
"\n",
" word | \n",
" 137779 | \n",
" 1.00 | \n",
" 100 | \n",
"
\n",
"
\n",
" Sets: no custom sets
\n",
" Features:
\n",
"Nestle 1904 (GBI nodes)
\n",
" \n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
int
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
str
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
"
\n",
"
none
\n",
"\n",
"
\n",
"\n",
"
\n",
"\n",
"
\n",
" \n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"N1904.header(allMeta=False)"
]
},
{
"cell_type": "markdown",
"id": "270f57c1-35a7-4b90-aee9-09033fb15390",
"metadata": {},
"source": [
"## 3.3 - Memory footprint \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"The following API call [`footprint`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.footprint) provides a nicely formatted overview of memory footprint for each of the features in the Text_fabric corpus."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f924907d-ec13-431d-bd6d-9e512d809285",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" "
]
},
{
"data": {
"text/markdown": [
"\n",
"# 49 features\n",
"\n",
"feature | members | size in bytes\n",
"--- | --- | ---\n",
"__levUp__ | 240,527 | 27,849,840\n",
"phrase | 210,453 | 21,708,200\n",
"nodeID | 137,779 | 17,505,299\n",
"__boundary__ | 2 | 15,456,576\n",
"monad | 137,779 | 12,951,424\n",
"clause | 153,930 | 12,487,268\n",
"oslots | 3 | 12,055,800\n",
"sentence | 143,553 | 10,903,980\n",
"word | 137,779 | 10,862,812\n",
"normalized | 137,779 | 10,773,392\n",
"gloss | 137,779 | 10,312,734\n",
"book | 162,187 | 9,785,747\n",
"bookshort | 162,187 | 9,785,636\n",
"booknum | 162,187 | 9,784,204\n",
"chapter | 162,160 | 9,783,448\n",
"verse | 161,900 | 9,776,168\n",
"__levDown__ | 102,748 | 9,727,776\n",
"lemma | 137,779 | 9,581,098\n",
"ln | 137,779 | 9,532,549\n",
"subj_ref | 137,779 | 9,454,588\n",
"strongs | 137,779 | 9,382,667\n",
"lex_dom | 137,779 | 9,198,787\n",
"functionaltag | 137,779 | 9,160,464\n",
"formaltag | 137,779 | 9,159,903\n",
"sp | 137,779 | 9,101,359\n",
"type | 137,779 | 9,101,355\n",
"splong | 137,779 | 9,101,353\n",
"mood | 137,779 | 9,101,184\n",
"tense | 137,779 | 9,101,170\n",
"after | 137,779 | 9,101,136\n",
"case | 137,779 | 9,101,118\n",
"voice | 137,779 | 9,101,059\n",
"gn | 137,779 | 9,101,001\n",
"person | 137,779 | 9,100,994\n",
"degree | 137,779 | 9,100,951\n",
"nu | 137,779 | 9,100,943\n",
"number | 137,779 | 9,100,943\n",
"__order__ | 240,527 | 8,659,012\n",
"phrasetype | 72,674 | 4,658,567\n",
"phrasefunction | 72,674 | 4,657,128\n",
"phrasefunctionlong | 72,674 | 4,657,002\n",
"__structure__ | 6 | 4,023,786\n",
"clauserule | 16,124 | 1,075,551\n",
"__rank__ | 240,527 | 1,022,312\n",
"otype | 4 | 822,535\n",
"__sections__ | 2 | 573,560\n",
"clausetype | 3,846 | 255,410\n",
"__characters__ | 1 | 30,405\n",
"__levels__ | 7 | 1,519\n",
"TOTAL | 5,825,378 | 435,731,713"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"TF.footprint()"
]
},
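{
"cell_type": "markdown",
"id": "gr-footprint-units",
"metadata": {},
"source": [
"The sizes in the table above are given in bytes. If a more readable unit is preferred, a small helper can convert them. The function `human_size` below is a hypothetical plain-Python sketch, not part of the Text-Fabric API. It uses binary (1024-based) units, and the sample values are copied by hand from the table above.\n",
"\n",
"```python\n",
"UNITS = ('B', 'KiB', 'MiB', 'GiB', 'TiB')\n",
"\n",
"\n",
"def human_size(num_bytes):\n",
"    '''Format a byte count using binary (1024-based) units.'''\n",
"    size = float(num_bytes)\n",
"    for unit in UNITS[:-1]:\n",
"        if size < 1024:\n",
"            break\n",
"        size /= 1024\n",
"    else:\n",
"        # The loop never stopped early: the value is 1 TiB or more\n",
"        unit = UNITS[-1]\n",
"    if unit == 'B':\n",
"        return f'{int(size)} B'\n",
"    return f'{size:.1f} {unit}'\n",
"\n",
"\n",
"# A few (feature, size in bytes) pairs copied from the footprint table above\n",
"for feature, size in [('__levUp__', 27_849_840), ('__levels__', 1_519), ('TOTAL', 435_731_713)]:\n",
"    print(f'{feature:<12} {human_size(size):>10}')\n",
"```\n",
"\n",
"Applied to the TOTAL row, this gives about 415.5 MiB."
]
},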
{
"cell_type": "markdown",
"id": "d23c6817",
"metadata": {},
"source": [
"## 3.4 - List loaded features \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"The API call [`A.isLoaded()`](https://annotation.github.io/text-fabric/tf/core/api.html#tf.core.api.Api.isLoaded) will show information about loaded features."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "64e9b90d-49ef-4675-970f-5b893c97bc87",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"__boundary__ computed \n",
"__characters__ computed \n",
"__levDown__ computed \n",
"__levUp__ computed \n",
"__levels__ computed \n",
"__order__ computed \n",
"__rank__ computed \n",
"__sections__ computed \n",
"__structure__ computed \n",
"after node (str)\n",
"book node (str)\n",
"booknum node (int)\n",
"bookshort node (str)\n",
"case node (str)\n",
"chapter node (int)\n",
"clause node (int)\n",
"clauserule node (str)\n",
"clausetype node (str)\n",
"degree node (str)\n",
"formaltag node (str)\n",
"functionaltag node (str)\n",
"gloss node (str)\n",
"gn node (str)\n",
"lemma node (str)\n",
"lex_dom node (str)\n",
"ln node (str)\n",
"monad node (int)\n",
"mood node (str)\n",
"nodeID node (str)\n",
"normalized node (str)\n",
"nu node (str)\n",
"number node (str)\n",
"oslots edge \n",
"otext config \n",
"otype node (str)\n",
"person node (str)\n",
"phrase node (int)\n",
"phrasefunction node (str)\n",
"phrasefunctionlong node (str)\n",
"phrasetype node (str)\n",
"reference NOT LOADED\n",
"sentence node (int)\n",
"sp node (str)\n",
"splong node (str)\n",
"strongs node (str)\n",
"subj_ref node (str)\n",
"tense node (str)\n",
"type node (str)\n",
"verse node (int)\n",
"voice node (str)\n",
"word node (str)\n"
]
}
],
"source": [
"N1904.isLoaded()"
]
},
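{
"cell_type": "markdown",
"id": "gr-isloaded-summary",
"metadata": {},
"source": [
"Each line of this listing gives a feature name, followed by its kind (`node`, `edge`, `config` or `computed`) and, for node features, the value type. The feature `reference` is listed but marked `NOT LOADED`. A long listing can be easier to read when summarised. The sketch below works on a few lines copied by hand from the output above. The helper `classify` is hypothetical and only illustrates how the lines are structured.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# A few lines copied from the output of N1904.isLoaded() above\n",
"LISTING = [\n",
"    '__levUp__ computed',\n",
"    'after node (str)',\n",
"    'booknum node (int)',\n",
"    'oslots edge',\n",
"    'otext config',\n",
"    'reference NOT LOADED',\n",
"]\n",
"\n",
"\n",
"def classify(line):\n",
"    '''Split one listing line into (feature name, kind).'''\n",
"    name, _, rest = line.strip().partition(' ')\n",
"    rest = rest.strip()\n",
"    if rest == 'NOT LOADED':\n",
"        return name, 'not loaded'\n",
"    # 'node (str)' -> 'node', 'computed' -> 'computed'\n",
"    return name, rest.split()[0]\n",
"\n",
"\n",
"classified = [classify(line) for line in LISTING]\n",
"kinds = Counter(kind for _, kind in classified)\n",
"not_loaded = [name for name, kind in classified if kind == 'not loaded']\n",
"print(kinds)\n",
"print('Not loaded:', not_loaded)\n",
"```"
]
},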
{
"cell_type": "markdown",
"id": "b3aabb3a-63a9-4432-bdc9-82a7ec20bc0d",
"metadata": {
"tags": []
},
"source": [
"## 3.5 - Statistics on node types\n",
"##### [Back to TOC](#TOC)\n",
"\n",
"This example will show various statistics on node types. The call to [`C.levels.data`](https://annotation.github.io/text-fabric/tf/core/prepare.html#tf.core.prepare.levels) results in list of ordered tuples which will be nicely displayed using the tabulate function."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c797fa57-d536-4471-b44d-d3a45653f34a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"╒══════════╤══════════════════════╤═════════╤════════╕\n",
"│ Node │ Avarage # of slots │ first │ last │\n",
"╞══════════╪══════════════════════╪═════════╪════════╡\n",
"│ book │ 5102.93 │ 137780 │ 137806 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ chapter │ 529.919 │ 137807 │ 138066 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ sentence │ 24.0872 │ 226865 │ 232584 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ verse │ 17.346 │ 232585 │ 240527 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ clause │ 8.54496 │ 138067 │ 154190 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ phrase │ 1.89585 │ 154191 │ 226864 │\n",
"├──────────┼──────────────────────┼─────────┼────────┤\n",
"│ word │ 1 │ 1 │ 137779 │\n",
"╘══════════╧══════════════════════╧═════════╧════════╛\n"
]
}
],
"source": [
"# Library to format table\n",
"from tabulate import tabulate\n",
"headers = [\"Node\", \"Avarage # of slots\",\"first\",\"last\"]\n",
"ResultList= C.levels.data\n",
"print(tabulate(ResultList, headers=headers, tablefmt='fancy_grid'))"
]
},
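{
"cell_type": "markdown",
"id": "gr-levels-check",
"metadata": {},
"source": [
"The figures in this table are related to each other. The sketch below uses the values printed above (copied by hand, so it runs without the corpus loaded). Because the nodes of one type have consecutive node numbers, `last - first + 1` gives the number of nodes of each type, which matches the `# of nodes` column of the dataset header. Multiplying that count by the average number of slots gives roughly 137,779, the number of words, for every node type. This is consistent with the 100% coverage reported in the header.\n",
"\n",
"```python\n",
"# (node type, average number of slots, first node, last node), copied from the table above\n",
"LEVELS = [\n",
"    ('book', 5102.93, 137780, 137806),\n",
"    ('chapter', 529.919, 137807, 138066),\n",
"    ('sentence', 24.0872, 226865, 232584),\n",
"    ('verse', 17.346, 232585, 240527),\n",
"    ('clause', 8.54496, 138067, 154190),\n",
"    ('phrase', 1.89585, 154191, 226864),\n",
"    ('word', 1, 1, 137779),\n",
"]\n",
"\n",
"TOTAL_SLOTS = 137_779  # the slots of this corpus are the word nodes\n",
"\n",
"counts = {}\n",
"for otype, avg_slots, first, last in LEVELS:\n",
"    count = last - first + 1     # node numbers of one type are consecutive\n",
"    counts[otype] = count\n",
"    covered = avg_slots * count  # approximately the number of slots covered by this type\n",
"    print(f'{otype:<9} {count:>7} nodes {covered:>10.0f} slots')\n",
"```"
]
},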
{
"cell_type": "markdown",
"id": "b3cbf04f",
"metadata": {},
"source": [
"## 3.6 - Node number ranges \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "20dd1920",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"book (137780, 137806)\n",
"chapter (137807, 138066)\n",
"sentence (226865, 232584)\n",
"verse (232585, 240527)\n",
"clause (138067, 154190)\n",
"phrase (154191, 226864)\n",
"word (1, 137779)\n"
]
}
],
"source": [
"for NodeType in F.otype.all:\n",
" print (NodeType, F.otype.sInterval(NodeType))"
]
},
{
"cell_type": "markdown",
"id": "3a835ffa-a45b-462d-a5b4-0f04c2c5155a",
"metadata": {},
"source": [
"Note that the ranges shown as output of this command are (except, possibly with repect to order) the same as found in file `otype.tf`:\n",
">\n",
"```@node\n",
"@TextFabric version=11.4.10\n",
"...\n",
"@valueType=str\n",
"@writtenBy=Text-Fabric\n",
"@dateWritten=2023-06-19T16:21:20Z\n",
"\n",
"1-137779\tword\n",
"137780-137806\tbook\n",
"137807-138066\tchapter\n",
"138067-154190\tclause\n",
"154191-226864\tphrase\n",
"226865-232584\tsentence\n",
"232585-240527\tverse\n",
"```"
]
},
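{
"cell_type": "markdown",
"id": "gr-otype-lookup",
"metadata": {},
"source": [
"Because every node type occupies one contiguous block of node numbers, the type of any node can be found by locating the block it falls in. Text-Fabric already provides this through `F.otype.v(node)`. The sketch below only illustrates the idea. It uses the ranges copied from `otype.tf` above, a hypothetical helper `node_type`, and a binary search with Python's `bisect` module.\n",
"\n",
"```python\n",
"from bisect import bisect_right\n",
"\n",
"# (first node, last node, node type), copied from otype.tf above\n",
"OTYPE_RANGES = sorted([\n",
"    (1, 137779, 'word'),\n",
"    (137780, 137806, 'book'),\n",
"    (137807, 138066, 'chapter'),\n",
"    (138067, 154190, 'clause'),\n",
"    (154191, 226864, 'phrase'),\n",
"    (226865, 232584, 'sentence'),\n",
"    (232585, 240527, 'verse'),\n",
"])\n",
"STARTS = [first for first, _, _ in OTYPE_RANGES]\n",
"\n",
"# The ranges leave no gaps: each one starts right after the previous one ends\n",
"assert all(prev[1] + 1 == cur[0] for prev, cur in zip(OTYPE_RANGES, OTYPE_RANGES[1:]))\n",
"\n",
"\n",
"def node_type(node):\n",
"    '''Return the node type of a node number, using binary search over the range starts.'''\n",
"    i = bisect_right(STARTS, node) - 1\n",
"    if i < 0 or node > OTYPE_RANGES[i][1]:\n",
"        raise ValueError(f'node {node} is outside all ranges')\n",
"    return OTYPE_RANGES[i][2]\n",
"\n",
"\n",
"for n in (1, 137780, 150000, 240527):\n",
"    print(n, node_type(n))\n",
"```\n",
"\n",
"With the corpus loaded, `F.otype.v(150000)` should give the same answer as `node_type(150000)`."
]
},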
{
"cell_type": "markdown",
"id": "240a715c-342b-4fe5-af21-9e7bd3f7685e",
"metadata": {},
"source": [
"# 4 - Required libraries \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"The scripts in this notebook require (beside `text-fabric`) the following Python libraries to be installed in the environment:\n",
"\n",
" tabulate\n",
"\n",
"You can install any missing library from within Jupyter Notebook using either`pip` or `pip3`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}