{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook explains the basics of working with Text-Fabric and Python. \n",
"\n",
"First, we import the Text-Fabric module. Make sure to have Text-Fabric installed first! If you have not installed Text-Fabric on your computer, you can uncomment (remove the `#`) first line to install it temporarily here. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"rate limit is 5000 requests per hour, with 5000 left for this hour\n",
"\tconnecting to online GitHub repo annotation/app-bhsa ... connected\n"
]
},
{
"data": {
"text/html": [
"TF-app: C:\\Users\\Mark/text-fabric-data/annotation/app-bhsa/code"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: C:\\Users\\Mark/text-fabric-data/etcbc/bhsa/tf/c"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: C:\\Users\\Mark/text-fabric-data/etcbc/phono/tf/c"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: C:\\Users\\Mark/text-fabric-data/etcbc/parallels/tf/c"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Text-Fabric: Text-Fabric API 8.4.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
Parallel Passages
crossref
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
book
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
oslots
Phonetic Transcriptions
phono
phono_trailer
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#pip install text-fabric\n",
"from tf.app import use\n",
"A = use('bhsa:hot', hoist=globals())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The complete API (Application Programming Interface) of Text-Fabric can be found here: [Cheatsheet](https://annotation.github.io/text-fabric/cheatsheet.html). As this might be too complicated for you to understand right now, the most useful and important functions will be explained below. Once you get the hang of it, the others are pretty much self-explanatory. \n",
"\n",
"The most used function is probably `F.otype.s(...)`. It collects all nodes of a specific node type. To find out what node types there are, we can check the [feature documentation](https://etcbc.github.io/bhsa/features/otype/). For example, if we want to collect all books or words, we can simply do the following:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"range(426585, 426624)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"F.otype.s('book')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"range(1, 426585)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"F.otype.s('word')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apparently, every [node type](https://etcbc.github.io/bhsa/features/otype/) is represented by a number. The words are represented by the numbers 1 through 426584 (ranges are exclusive in Python) and the books by the numbers 426585 through 246623.\n",
"\n",
"So, for example, node 100000 represents a word. We can use the node of this word to find out more about its features. To do this, we look in the feature documentation for all features that are applicable to 'words'. If we want to know this word's part of speech, we simply look for that feature in the documentation and find out that it is called `sp` and that it is applicable to 'objects of type word' ([sp](https://etcbc.github.io/bhsa/features/sp/)).\n",
"\n",
"To find out the feature of node 100000, we add the following elements together:\n",
"- `F.`, indicating 'Feature' \n",
"- The name of the feature, here `sp` \n",
"- `.v`, because we know the node and we are looking for the value\n",
"- The number of the node, `(100000)` \n",
"\n",
"Together, this forms `F.sp.v(100000)`. Similarly, if we want to know the word's lexeme, we construct `F.lex.v(100000)`. "
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'prep'"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"F.sp.v(100000)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'B'"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"F.lex.v(100000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To find out more about the position and the surroundings of node 100000, we have to use different functions than `F.` because `F.` only applies to the feature of the word itself. \n",
"\n",
"\n",
"Consulting the [cheatsheet](https://annotation.github.io/text-fabric/cheatsheet.html#gsc.tab=0) again, we find out that there also exist functions that start with the letter A, T, or L. \n",
"\n",
"\n",
"The L and T functions will probably be used the most, but there is at least one handy function that starts with the letter A: `A.sectionStrFromNode()`. It returns the heading of a section to which the node belongs. Let's apply it to node 100000 again: "
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Deuteronomy 11:19'"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A.sectionStrFromNode(100000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function shows to which section a node belongs by returning a *string* (a data type that is a sequence of characters). There is also a similar function which starts with T, `T.sectionFromNode()`, but which returns a *tuple* (an ordered and immutable collection of objects). Let's check out the difference between both functions:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Function A: Deuteronomy 11:19\n",
"Function T: ('Deuteronomy', 11, 19)\n",
"Function T: Deuteronomy\n",
"Function T: Deuteronomy 11\n"
]
}
],
"source": [
"A_section = A.sectionStrFromNode(100000)\n",
"T_section = T.sectionFromNode(100000)\n",
"\n",
"print(\"Function A:\", A_section)\n",
"print(\"Function T:\", T_section)\n",
"print(\"Function T:\", T_section[0])\n",
"print(\"Function T:\", T_section[0], T_section[1])"
]
},
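{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `T.sectionFromNode()` returns a tuple, you can unpack it and recombine its parts however you like. A small sketch in plain Python, using the values from the output above:\n",
"\n",
"```python\n",
"a_section = 'Deuteronomy 11:19'      # string from A.sectionStrFromNode\n",
"t_section = ('Deuteronomy', 11, 19)  # tuple from T.sectionFromNode\n",
"\n",
"# unpack the tuple and rebuild any variant you need\n",
"book, chapter, verse = t_section\n",
"print(f'{book} {chapter}')  # Deuteronomy 11\n",
"assert f'{book} {chapter}:{verse}' == a_section\n",
"```"
]
},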
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While `A.sectionStrFromNode()` is useful if you only want to know a node's section, `T.sectionFromNode` allows you to easily adapt the data to your wishes. \n",
"\n",
"Another useful `T.` function is `T.text(node, fmt=...)`. It simply prints the text that is represented by the node. It requires a node as input with the option to specify the format, `fmt`. Below you can see some examples of different formats. When the format is not specified, the default format is `text-orig-full`, signifying the text in Hebrew including all diacritical marks. \n"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"B\n",
"B.:-\n",
"ב\n",
"בְּ\n",
"בְּ\n"
]
}
],
"source": [
"text_trans_plain = T.text(100000, fmt='text-trans-plain')\n",
"text_trans_full = T.text(100000, fmt='text-trans-full')\n",
"text_orig_plain = T.text(100000, fmt='text-orig-plain')\n",
"text_orig_full = T.text(100000, fmt='text-orig-full')\n",
"print(text_trans_plain)\n",
"print(text_trans_full)\n",
"print(text_orig_plain)\n",
"print(text_orig_full)\n",
"print(T.text(100000))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, let's introduce two functions starting with `L.`. We have been focusing on node 100000, which is a word. What if we want to analyse its direct surroundings? For example, to which phrase does it belong, what is its function in the overarching clause?\n",
"\n",
"To move up or down from one node type to another can be done with *Locality functions*. `L.u(node, otype=node type)` moves up from the *node* to the specified node type. For example, if we want to print the text of the clause to which our node 100000 belongs, we could do the following:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"בְּשִׁבְתְּךָ֤ בְּבֵיתֶ֨ךָ֙ \n"
]
}
],
"source": [
"# move up from node 100000 to its clause\n",
"clause_node = L.u(100000, 'clause')\n",
"\n",
"# get the text for this clause\n",
"clause_text = T.text(clause_node)\n",
"\n",
"# print the clause text\n",
"print(clause_text)"
]
},
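{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `L.u()` returns a *tuple* of ancestor nodes rather than a single node; `T.text()` accepts such an iterable, which is why the cell above works. If you need the bare clause node itself, for example to pass it to an `F.` function, index into the tuple first. A sketch, assuming the BHSA data loaded above:\n",
"\n",
"```python\n",
"# L.u returns a tuple; take the first (and only) clause node\n",
"clause_node = L.u(100000, otype='clause')[0]\n",
"\n",
"# now clause-level features can be queried directly, e.g. the clause type\n",
"print(F.typ.v(clause_node))\n",
"```"
]
},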
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or, if we want to find the first clause in the BHSA, print the part of speech of each word within that clause, and determine the subject of the clause, we must do the following:\n",
"- Determine the node of the first clause by getting the first element of `F.otype.s('clause')`, the collection of all clauses.\n",
"- Get its section using `A.sectionStrFromNode(node)`\n",
"- Use `L.d(node, 'word)` to move down from clause level to word level and collect the word nodes\n",
"- Use `F.sp.v(node)` to get the part of speech for each word\n",
"- Use `L.d(node, 'phrase')` to get the phrase nodes of the clause\n",
"- Use `F.function.v(node)` to get the phrase function to check whether it is the subject of the clause (function is a feature on phrase level, see [here](https://etcbc.github.io/bhsa/features/function/)).\n",
"\n",
"Between each step, the program will print the results to provide insight in the intermediate results."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"all_clauses: range(427553, 515674)\n",
"first_clause_node: 427553\n",
"Section: Genesis 1:1\n",
"word_nodes_first_clause: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)\n",
"1 prep ב\n",
"2 subs ראשׁית \n",
"3 verb ברא \n",
"4 subs אלהים \n",
"5 prep את \n",
"6 art ה\n",
"7 subs שׁמים \n",
"8 conj ו\n",
"9 prep את \n",
"10 art ה\n",
"11 subs ארץ׃ \n",
"phrase_nodes_first_clause: (651542, 651543, 651544, 651545)\n",
"651544 Subj אלהים \n"
]
}
],
"source": [
"# collecting all clauses\n",
"all_clauses = F.otype.s('clause')\n",
"print(\"all_clauses:\", all_clauses)\n",
"\n",
"# getting the first clause\n",
"first_clause_node = all_clauses[0]\n",
"print(\"first_clause_node:\", first_clause_node)\n",
"\n",
"# getting the section of the first clause\n",
"section_first_clause = A.sectionStrFromNode(first_clause_node)\n",
"print(\"Section:\", section_first_clause)\n",
"\n",
"# collecting all words from the first clause\n",
"word_nodes_first_clause = L.d(first_clause_node, 'word')\n",
"print(\"word_nodes_first_clause:\", word_nodes_first_clause)\n",
"\n",
"# iterating through all word nodes in the first clause and \n",
"# printing for each word: node, part of speech, unvocalised text\n",
"for word in word_nodes_first_clause:\n",
" print(word, F.sp.v(word), T.text(word, fmt='text-orig-plain'))\n",
" \n",
"# collecting all phrases from the first clause\n",
"phrase_nodes_first_clause = L.d(first_clause_node, 'phrase')\n",
"print(\"phrase_nodes_first_clause:\", phrase_nodes_first_clause)\n",
"\n",
"# iterating through all phrase node and checking whether \n",
"# their function matches 'Subj'. If so, it prints:\n",
"# phrase node, function, unvocalised text\n",
"for phrase in phrase_nodes_first_clause:\n",
" if F.function.v(phrase) == 'Subj':\n",
" print(phrase, F.function.v(phrase), T.text(phrase, fmt='text-orig-plain'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}