{
"cells": [
{
"cell_type": "markdown",
"id": "cb10925c-cccb-420e-be3b-b5ca49ad5cf5",
"metadata": {
"tags": []
},
"source": [
"# Identifying 'odd' characters for feature 'after' (N1904LFT)"
]
},
{
"cell_type": "markdown",
"id": "b9e178c9-7abb-46cf-b4d4-8d38a5985bf2",
"metadata": {
"tags": []
},
"source": [
"## Table of content \n",
"* 1 - Introduction]\n",
"* 2 - Load Text-Fabric app and data\n",
"* 3 - Performing the queries\n",
" * 3.1 - Showing the issue\n",
" * 3.2 - Setting up a query to find them\n",
" * 3.3 - Explanation of the regular expression\n",
" * 3.4 - Bug"
]
},
{
"cell_type": "markdown",
"id": "549033e8-b844-4504-8017-bd4a389e1164",
"metadata": {},
"source": [
"# 1 - Introduction \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"This Jupyter Notebook investigates the pressense of 'odd' values for feature 'after'. "
]
},
{
"cell_type": "markdown",
"id": "01f65e07-00ae-4099-892e-6dcfeecd6663",
"metadata": {},
"source": [
"# 2 - Load Text-Fabric app and data \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "51782023-07ce-4923-b46a-3ed0fd2b8a12",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a1afe711-fc3a-49c7-a3a8-6d889c0adc0e",
"metadata": {},
"outputs": [],
"source": [
"# Loading the New Testament TextFabric code\n",
"# Note: it is assumed Text-Fabric is installed in your environment.\n",
"\n",
"from tf.fabric import Fabric\n",
"from tf.app import use"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "29ff4a94-c84d-4011-9dd8-4acfd3f4a845",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"**Locating corpus resources ...**"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Status: latest release online v03 versus None locally"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"downloading app, main data and requested additions ..."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"app: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"The requested data is not available offline\n",
"\t~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3 not found\n"
]
},
{
"data": {
"text/html": [
"Status: latest release online v03 versus None locally"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"downloading app, main data and requested additions ..."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"data: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" | 0.30s T otype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 3.07s T oslots from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.01s T book from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.58s T chapter from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.70s T word from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T after from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T verse from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | | 0.08s C __levels__ from otype, oslots, otext\n",
" | | 1.79s C __order__ from otype, oslots, __levels__\n",
" | | 0.08s C __rank__ from otype, __order__\n",
" | | 4.63s C __levUp__ from otype, oslots, __rank__\n",
" | | 2.70s C __levDown__ from otype, __levUp__, __rank__\n",
" | | 0.06s C __characters__ from otext\n",
" | | 1.19s C __boundary__ from otype, oslots, __rank__\n",
" | | 0.05s C __sections__ from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse\n",
" | | 0.26s C __structure__ from otype, oslots, otext, __rank__, __levUp__, book, chapter, verse\n",
" | 0.54s T appos from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.58s T book_long from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.50s T booknumber from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T bookshort from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.55s T case from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.47s T clausetype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.66s T containedclause from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.50s T degree from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.65s T gloss from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.54s T gn from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.72s T id from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.48s T junction from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.63s T lemma from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.58s T lex_dom from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.60s T ln from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.52s T monad from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T mood from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.59s T morph from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.61s T nodeID from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.68s T normalized from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.56s T nu from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.56s T number from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.50s T orig_order from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T person from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.75s T ref from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T roleclausedistance from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.55s T rule from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T sentence from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.58s T sp from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T sp_full from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.61s T strongs from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T subj_ref from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T tense from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.53s T type from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.71s T unicode from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.52s T voice from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.55s T wgclass from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.48s T wglevel from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.51s T wgnum from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.50s T wgrole from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.50s T wgrolelong from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.53s T wgtype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.07s T wordgroup from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.57s T wordlevel from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.56s T wordrole from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n",
" | 0.58s T wordrolelong from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.3\n"
]
},
{
"data": {
"text/plain": [
"Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904LFT/app v3\n",
"Data: tonyjurg - Nestle1904LFT 0.3\n",
"Node types (name, # of nodes, # slots/node, % coverage):\n",
"  book      27      5102.93  100\n",
"  chapter   260     529.92   100\n",
"  verse     7943    17.35    100\n",
"  sentence  12160   11.33    100\n",
"  wg        132460  6.59     633\n",
"  word      137779  1.00     100\n",
"Sets: no custom sets\n",
"Features: Nestle 1904 (LowFat Tree)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# load the app and data\n",
"N1904 = use (\"tonyjurg/Nestle1904LFT:latest\", hoist=globals())"
]
},
{
"cell_type": "markdown",
"id": "4b1bf471-6511-4fd9-8bb8-116379da307f",
"metadata": {
"tags": []
},
"source": [
"# 3 - Performing the queries \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "62bc2c34-bb57-46bf-96f3-3ff6f43c9eee",
"metadata": {},
"source": [
"## 3.1 - Showing the issue \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "markdown",
"id": "754e51a0-0f07-4262-ac18-e96f81160d48",
"metadata": {},
"source": [
"The following shows the pressence of a few 'odd' cases for feature 'after':"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "748067ec-15ac-4080-9fbc-65c97b8cce2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"frequency: ((' ', 119271), (',', 9443), ('.', 5717), ('·', 2355), (';', 970), ('—', 7), ('ε', 3), ('ς', 3), ('ὶ', 2), ('ί', 1), ('α', 1), ('ι', 1), ('χ', 1), ('ἱ', 1), ('ὁ', 1), ('ὰ', 1), ('ὸ', 1))\n"
]
}
],
"source": [
"result = F.after.freqList()\n",
"print ('frequency: {0}'.format(result))"
]
},
{
"cell_type": "markdown",
"id": "86138709-f6c0-433a-aaab-db6aa079e33a",
"metadata": {},
"source": [
"## 3.2 - Setting up a query to find them \n",
"##### [Back to TOC](#TOC)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "15d731b8-ac0e-4427-a998-29a2997d72b6",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.11s 16 results\n",
"╒═════════════════════╤══════════════╤═════════╕\n",
"│ location │ word │ after │\n",
"╞═════════════════════╪══════════════╪═════════╡\n",
"│ Luke 23:51 │ —οὗτο │ ς │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Luke 2:35 │ —κα │ ὶ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ John 4:2 │ —καίτοιγ │ ε │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ John 7:22 │ —οὐ │ χ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Acts 22:2 │ —ἀκούσαντε │ ς │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Romans 15:25 │ —νυν │ ὶ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ I_Corinthians 9:15 │ —τ │ ὸ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ II_Corinthians 12:2 │ —ἁρπαγέντ │ α │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ II_Corinthians 12:2 │ —εἴτ │ ε │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ II_Corinthians 12:3 │ —εἴτ │ ε │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ II_Corinthians 6:2 │ —λέγε │ ι │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Galatians 2:6 │ —ὁποῖο │ ί │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Ephesians 5:10 │ —δοκιμάζοντε │ ς │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Ephesians 5:9 │ — │ ὁ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Hebrews 7:20 │ —ο │ ἱ │\n",
"├─────────────────────┼──────────────┼─────────┤\n",
"│ Hebrews 7:22 │ —κατ │ ὰ │\n",
"╘═════════════════════╧══════════════╧═════════╛\n"
]
}
],
"source": [
"# Library to format table\n",
"from tabulate import tabulate\n",
"\n",
"# The actual query\n",
"SearchOddAfters = '''\n",
"word after~^(?!([\\s\\.·—,;]))\n",
" '''\n",
"OddAfterList = N1904.search(SearchOddAfters)\n",
"\n",
"# Postprocess the query results\n",
"Results=[]\n",
"for tuple in OddAfterList:\n",
" node=tuple[0]\n",
" location=\"{} {}:{}\".format(F.book.v(node),F.chapter.v(node),F.verse.v(node))\n",
" result=(location,F.word.v(node),F.after.v(node))\n",
" Results.append(result)\n",
" \n",
"# Produce the table\n",
"headers = [\"location\",\"word\",\"after\"]\n",
"print(tabulate(Results, headers=headers, tablefmt='fancy_grid'))"
]
},
{
"cell_type": "markdown",
"id": "d1a89bf7-1aa6-4762-99ab-cf2e677946ec",
"metadata": {},
"source": [
"## 3.3 - Explanation of the regular expression \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"The regular expression broken down in its components:\n",
"\n",
"`^`: This symbol is called a caret and represents the start of a string. It ensures that the following pattern is applied at the beginning of the string.\n",
"\n",
"`(?!...)`: This is a negative lookahead assertion. It checks if the pattern inside the parentheses does not match at the current position.\n",
"\n",
"`[…]`: This denotes a character class, which matches any single character that is within the brackets.\n",
"\n",
"`[\\s\\.·,—,;]`: This character class contains multiple characters enclosed in the brackets. Let's break down the characters within it:\n",
"\n",
"* `\\s`: This is a shorthand character class that matches any whitespace character, including spaces, tabs, and newlines.\n",
"* `\\.`: This matches a literal period (dot).\n",
"* `·`: This matches a specific Unicode character, which is a middle dot.\n",
"* `—`: This matches an em dash character.\n",
"* `,`: This matches a comma.\n",
"* `;`: This matches a semicolon.\n",
"In summary, the character class `[\\s\\.·,—,;]` matches any single character that is either a whitespace character, a period, a middle dot, an em dash, a comma, or a semicolon.\n",
"\n",
"The regular expression selects any string which does not starts with a whitespace character, period, middle dot, em dash, comma, or semicolon."
]
},
{
"cell_type": "markdown",
"id": "a5e4bdf3-b108-4c6a-b99b-a4bf4723afee",
"metadata": {},
"source": [
"The following site can be used to build and verify a regular expression: [regex101.com](https://regex101.com/) (choose the 'Pyton flavor') "
]
},
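{
"cell_type": "markdown",
"id": "regex-demo-leadin",
"metadata": {},
"source": [
"As a quick sanity check, the same pattern can also be applied outside of Text-Fabric with Python's `re` module. The cell below is a minimal sketch using a few illustrative sample values for 'after'; it only demonstrates the behaviour of the negative lookahead, not the Text-Fabric search itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "regex-demo-code",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: apply the same negative-lookahead pattern to a few sample values\n",
"import re\n",
"\n",
"# The pattern used in the Text-Fabric query above\n",
"pattern = re.compile(r'^(?!([\\s\\.·—,;]))')\n",
"\n",
"# Illustrative samples: a space, a comma, a middle dot, and an 'odd' trailing letter\n",
"samples = [' ', ',', '·', 'ς']\n",
"for sample in samples:\n",
"    # pattern.search only matches when the string does NOT start with one of the expected characters\n",
"    verdict = 'odd' if pattern.search(sample) else 'expected'\n",
"    print(repr(sample), '->', verdict)"
]
},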
{
"cell_type": "markdown",
"id": "3854f61c",
"metadata": {},
"source": [
"## 3.4 - Bug \n",
"##### [Back to TOC](#TOC)\n",
"\n",
"The observed behaviour was due to a bug. [Issue tracker #76](https://github.com/Clear-Bible/macula-greek/issues/76) was opened. When the text of a node starts with punctuation, the @after attribute contains the last character of the word. This is a bug in the transformation to XML LowFat Tree data."
]
},
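{
"cell_type": "markdown",
"id": "bug-illustration-leadin",
"metadata": {},
"source": [
"To make the effect of the bug visible for a single case, the minimal sketch below reuses `OddAfterList` from section 3.2 and prints the `unicode`, `word` and `after` features of the first hit."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bug-illustration-code",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the first node found in section 3.2 with an 'odd' value for feature 'after'\n",
"if OddAfterList:\n",
"    node = OddAfterList[0][0]\n",
"    # 'unicode' includes the punctuation, 'word' is the surface word, 'after' should hold trailing characters\n",
"    print('unicode:', F.unicode.v(node))\n",
"    print('word:   ', F.word.v(node))\n",
"    print('after:  ', repr(F.after.v(node)))"
]
},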
{
"cell_type": "code",
"execution_count": null,
"id": "118862e2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}