# Identifying use of critical signs in the text (N1904LFT)

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>
    * <a href="#bullet3x1">3.1 - Getting an overview of leading critical signs</a>
    * <a href="#bullet3x2">3.2 - Query for all words that contain some critical marks</a>
    * <a href="#bullet3x3">3.3 - Collect critical marks before and after word</a>
    * <a href="#bullet3x4">3.4 - Comparing with the print edition</a>
    * <a href="#bullet3x5">3.5 - Frequency of markorder</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

This Jupyter Notebook investigates the presence of 'odd' values for the feature 'after'.

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [1]:
# Loading the New Testament Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.

from tf.fabric import Fabric
from tf.app import use

In [2]:
# load the app and data
N1904 = use ("tonyjurg/Nestle1904LFT", version="0.5", hoist=globals())

**Locating corpus resources ...**

The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.5 not found


Name,# of nodes,# slots/node,% coverage
book,27,5102.93,100
chapter,260,529.92,100
verse,7943,17.35,100
sentence,8011,17.2,100
wg,113447,7.58,624
word,137779,1.0,100


# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Getting an overview of leading critical signs<a class="anchor" id="bullet3x1"></a>
##### [Back to TOC](#TOC)

First, get a list of all unique word forms in Unicode (including punctuation, critical signs and trailing spaces):

In [3]:
unicodeList = F.unicode.freqList()
print ('Number of results:',len(unicodeList))

Number of results: 25213


Now just look at the first character:

In [4]:
# Initialize an empty dictionary to store the frequencies
frequencyList = {}
criticalSignsList ={}
criticalSigns={"—","[","("}
              
# Iterate through the list (which is a list of ordered tuples)
for item in unicodeList:
    # Get the first character of the item
    firstChar = item[0][0]

    # Update the frequency in the dictionary for the full list
    frequencyList[firstChar] = frequencyList.get(firstChar, 0) + 1
    # add to other list if critical sign
    if firstChar in criticalSigns:
        criticalSignsList[firstChar]=criticalSignsList.get(firstChar, 0) + 1
    

print("Frequency list of all first character:")
print(frequencyList)
print("\nFrequency list of critical character:")
print(criticalSignsList)

Frequency list of all first character:
{'κ': 1791, 'ὁ': 113, 'ἐ': 2831, 'δ': 1285, 'τ': 717, 'ε': 636, 'ὅ': 64, 'ο': 289, 'ἡ': 146, 'γ': 577, 'μ': 927, 'α': 304, 'π': 2923, 'ἵ': 8, 'ὡ': 19, 'ἀ': 2523, 'Ἰ': 187, 'Θ': 71, 'Κ': 183, 'ὑ': 411, 'ἢ': 1, 'λ': 559, 'σ': 1345, 'ἦ': 31, 'ὃ': 3, 'ἂ': 1, 'ἕ': 65, 'ἰ': 114, 'Χ': 52, 'ν': 289, 'ἔ': 555, 'Ὁ': 11, 'ᾧ': 1, 'Τ': 114, 'ἃ': 2, 'ἄ': 293, 'ἣ': 2, 'Π': 253, 'Υ': 16, 'ὧ': 6, 'ὄ': 103, 'Μ': 149, 'ὥ': 21, 'ἑ': 206, 'ὀ': 226, 'Ο': 42, 'Ε': 57, 'β': 462, 'ἤ': 93, 'χ': 367, 'ζ': 189, 'Ἐ': 160, 'Φ': 73, 'ἧ': 2, 'Σ': 200, 'Ἀ': 258, 'ἓ': 3, 'θ': 431, 'ἥ': 38, 'ὢ': 1, 'ᾗ': 1, 'φ': 505, 'υ': 26, 'Δ': 107, 'ἁ': 174, 'ᾖ': 4, 'Ἱ': 26, 'ψ': 97, 'Ἠ': 45, 'Ἁ': 15, 'Ἅ': 16, 'ἱ': 94, 'Ὡ': 9, 'Γ': 80, 'ῥ': 84, 'Ἄ': 45, 'ὕ': 39, 'Β': 102, 'ἠ': 195, 'Ὑ': 19, 'ἴ': 50, 'ὤ': 13, 'Ὅ': 15, 'ὦ': 10, 'Ἴ': 4, 'ἅ': 61, 'ᾔ': 10, 'Ἦ': 6, 'Ἔ': 50, 'Ῥ': 40, 'Λ': 67, 'Ὃ': 3, 'Α': 33, 'Ζ': 29, 'Ν': 65, 'Ὦ': 2, 'Ὕ': 4, 'ἆ': 6, 'Ἢ': 2, 'ᾐ': 8, 'ξ': 40, 'Ὀ': 15, 'ὠ': 44, 'ᾠ': 6, '
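Note that the loop above adds 1 for every entry of `unicodeList`, so the totals count distinct word forms per first character, not token occurrences. A minimal sketch of the difference, using a small made-up list in the same `(value, frequency)` shape that `freqList()` returns (the sample words and counts are invented for illustration and are not taken from the corpus):

```python
from collections import Counter

# Hypothetical sample in the (value, frequency) shape of F.unicode.freqList();
# the words and counts are invented for illustration only.
sample_freq_list = (
    ("καὶ ", 5),
    ("ὁ ", 3),
    ("κύριος, ", 2),
    ("(Υἱοῦ ", 1),
    ("[[Ἀναστὰς ", 1),
)

critical_signs = {"—", "[", "("}

distinct_forms = Counter()  # one count per distinct word form
token_counts = Counter()    # weighted by how often each form occurs

for value, freq in sample_freq_list:
    first_char = value[0]
    distinct_forms[first_char] += 1
    token_counts[first_char] += freq

# Keep only the entries whose first character is a critical sign
critical_only = {ch: n for ch, n in distinct_forms.items() if ch in critical_signs}

print(dict(distinct_forms))
print(dict(token_counts))
print(critical_only)
```

Which of the two counts is appropriate depends on the question: distinct forms show how varied the marked words are, while token counts show how often a reader meets them.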

## 3.2 - Query for all words that contain some critical marks<a class="anchor" id="bullet3x2"></a>
##### [Back to TOC](#TOC)

In [53]:
# Library to format table
from tabulate import tabulate

# The actual query (raw string, so the regex backslashes are passed through unchanged)
SearchCriticalMarks = r'''
word word~[(\(\[—\)\])]
    '''
MarksList = N1904.search(SearchCriticalMarks)

# Postprocess the query results
Results=[]
for resultTuple in MarksList:  # avoid shadowing the built-in name 'tuple'
    node=resultTuple[0]
    location="{} {}:{}".format(F.book.v(node),F.chapter.v(node),F.verse.v(node))
    result=(location,F.unicode.v(node),F.word.v(node),F.after.v(node))
    Results.append(result)
      
# Produce the table
headers = ["location","unicode","word","after"]
print(tabulate(Results, headers=headers, tablefmt='fancy_grid'))

  0.11s 55 results
╒═════════════════════╤══════════════════╤══════════════════╤═════════╕
│ location            │ unicode          │ word             │ after   │
╞═════════════════════╪══════════════════╪══════════════════╪═════════╡
│ Mark 1:1            │ (Υἱοῦ            │ (Υἱοῦ            │         │
├─────────────────────┼──────────────────┼──────────────────┼─────────┤
│ Mark 1:1            │ Θεοῦ).           │ Θεοῦ)            │ .       │
├─────────────────────┼──────────────────┼──────────────────┼─────────┤
│ Mark 16:9           │ [[Ἀναστὰς        │ [[Ἀναστὰς        │         │
├─────────────────────┼──────────────────┼──────────────────┼─────────┤
│ Mark 16:20          │ σημείων.]]       │ σημείων.]]       │         │
├─────────────────────┼──────────────────┼──────────────────┼─────────┤
│ Mark 16:99          │ [[Πάντα          │ [[Πάντα          │         │
├─────────────────────┼──────────────────┼──────────────────┼─────────┤
│ Mark 16:99          │ σωτηρίας.]]      │ σω

Note: The following site can be used to build and verify a regular expression: [regex101.com](https://regex101.com/) (choose the 'Python' flavor).
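The character class can also be checked locally with Python's `re` module before running the query. The sketch below assumes that Text-Fabric applies the pattern as a search anywhere in the feature value (as `re.search` does). The marked sample words come from this notebook's result tables, and `λόγος` is added as an unmarked control.

```python
import re

# Same character class as in the query, written as a raw string so the
# backslashes reach the regex engine unchanged.
critical_mark_pattern = re.compile(r"[(\(\[—\)\])]")

samples = ["(Υἱοῦ", "Θεοῦ)", "[[Ἀναστὰς", "ἁμαρτίας—", "λόγος"]

# True when the word contains at least one critical mark
matches = {word: bool(critical_mark_pattern.search(word)) for word in samples}

for word, hit in matches.items():
    print(f"{word!r}: {hit}")
```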

## 3.3 - Collect critical marks before and after word<a class="anchor" id="bullet3x3"></a>
##### [Back to TOC](#TOC)

In [5]:
# Library to format table
from tabulate import tabulate

# Creating a translation table to remove unwanted characters
criticalMarkCharacters = "[]()—"
punctuationCharacters = ",.;·"
translationTableMarkers = str.maketrans("", "", criticalMarkCharacters)
translationTablePunctuations = str.maketrans("", "", punctuationCharacters)
punctuations=('.',',',';','·')

# Query for words containing critical markers (raw string keeps the regex backslashes intact)
SearchCriticalMarkers = r'''
word unicode~[(\(\[—\)\])]
    '''
MarksList = N1904.search(SearchCriticalMarkers)

# Postprocess the query results
Results=[]
for resultTuple in MarksList:  # avoid shadowing the built-in name 'tuple'
    node=resultTuple[0]
    location="{} {}:{}".format(F.book.v(node),F.chapter.v(node),F.verse.v(node))
    rawWord=F.unicode.v(node)
    cleanWord= rawWord.translate(translationTableMarkers)
    rawWithoutPunctuations=rawWord.translate(translationTablePunctuations)
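    PunctuationMarkOrder="No mark"
    markBefore=markAfter=''  # reset for every word so values never carry over from a previous iteration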
    if cleanWord[-1] in punctuations:
        punctuation=cleanWord[-1]
        after=punctuation+' '
        word=cleanWord[:-1]
    else:
        after=' '
        word=cleanWord
        punctuation=''
    if rawWithoutPunctuations!=word:
        markAfter=markBefore=''
        if rawWord.find(word)==0:
            markAfter=rawWithoutPunctuations.replace(word,"")
            if punctuation!='':
                if rawWord.find(markAfter)-rawWord.find(punctuation)>0:
                    PunctuationMarkOrder="(-1) punct. before mark."
                else:
                    PunctuationMarkOrder="(1) punct. after mark."
            else:
                PunctuationMarkOrder="(0) no punctuation, mark after word"
        else:
            markBefore=rawWithoutPunctuations.replace(word,"")
            PunctuationMarkOrder="(na) mark is before word"
    
    
    # repr() makes leading and trailing whitespace (space, tab, newline) visible in the table
    result=(location,repr(rawWord),repr(markBefore),repr(word),repr(markAfter),repr(after),PunctuationMarkOrder)
    Results.append(result)
      
# Produce the table
headers = ["location","rawWord","markBefore","word","markAfter","after","punct. mark. order"]
print(tabulate(Results, headers=headers, tablefmt='fancy_grid'))

  0.12s 85 results
╒══════════════════════╤════════════════════╤══════════════╤══════════════════╤═════════════╤═════════╤═════════════════════════════════════╕
│ location             │ rawWord            │ markBefore   │ word             │ markAfter   │ after   │ punct. mark. order                  │
╞══════════════════════╪════════════════════╪══════════════╪══════════════════╪═════════════╪═════════╪═════════════════════════════════════╡
│ Matthew 9:6          │ 'ἁμαρτίας—'        │ ''           │ 'ἁμαρτίας'       │ '—'         │ ' '     │ (0) no punctuation, mark after word │
├──────────────────────┼────────────────────┼──────────────┼──────────────────┼─────────────┼─────────┼─────────────────────────────────────┤
│ Mark 1:1             │ '(Υἱοῦ'            │ '('          │ 'Υἱοῦ'           │ ''          │ ' '     │ (na) mark is before word            │
├──────────────────────┼────────────────────┼──────────────┼──────────────────┼─────────────┼─────────┼──────────────────────────
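The same decomposition can be sketched more compactly with a single regular expression. This is an alternative formulation, not the code that produced the table above. The function name `split_marks` is hypothetical, and the sketch assumes that critical marks and punctuation only occur at the edges of a word.

```python
import re

CRITICAL_MARKS = "[]()—"
PUNCTUATION = ",.;·"

# Character class covering both sets; re.escape guards the brackets.
EDGE = "[" + re.escape(CRITICAL_MARKS + PUNCTUATION) + "]"

# leading edge characters, then the bare word (lazy), then trailing edge characters
PATTERN = re.compile(f"^({EDGE}*)(.*?)({EDGE}*)$")


def split_marks(raw_word):
    """Return (mark_before, word, mark_after, punctuation) for one word form."""
    lead, word, trail = PATTERN.match(raw_word.strip()).groups()
    mark_before = "".join(c for c in lead if c in CRITICAL_MARKS)
    mark_after = "".join(c for c in trail if c in CRITICAL_MARKS)
    punctuation = "".join(c for c in trail if c in PUNCTUATION)
    return mark_before, word, mark_after, punctuation


for sample in ["(Υἱοῦ", "Θεοῦ).", "σημείων.]]", "ἁμαρτίας—"]:
    print(repr(sample), split_marks(sample))
```

Unlike the loop above, this version does not record whether punctuation precedes or follows the mark; that ordering would have to be read from the trailing group separately.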

## 3.4 - Comparing with the print edition<a class="anchor" id="bullet3x4"></a>
##### [Back to TOC](#TOC)

Some selections from the Nestle print edition at [archive.org](https://archive.org/details/the-greek-new-testament-nestle-1904-us-edition):



**Mark 7:2-4:**

<img src="images/mark7v2-4Nestle.jpg">

**Luke 2:35-36:**

<img src="images/luke2v35-36Nestle.jpg">

**Luke 24:12-13:**

<img src="images/luke24v12-13Nestle.jpg">

**John 10:12-13:**

<img src="images/john10v12-13Nestle.jpg">

**2 Cor 12:2:**

<img src="images/2Cor12v2-3Nestle.jpg">





## 3.5 - Frequency of markorder <a class="anchor" id="bullet3x5"></a>
##### [Back to TOC](#TOC)

In [6]:
F.markorder.freqList()

(('', 137694), ('0', 34), ('3', 32), ('2', 10), ('1', 9))

The same frequencies in table form:

markorder | Description | Frequency
---  | --- | ---
` ` (empty) | No critical marks | 137694
`0` | Mark is before word | 34
`1` | Mark is after word, no punctuation after word | 9
`2` | Mark is after word, punctuation is after mark | 10
`3` | Mark is after word, punctuation is before mark | 32
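The codes in the table can be illustrated with a small classifier. This is a sketch based solely on the descriptions above, not the code that generated the `markorder` feature. The function name `classify_markorder` is hypothetical, and a word with marks on both sides is (by assumption) treated as code `0`.

```python
CRITICAL_MARKS = set("[]()—")
PUNCTUATION = set(",.;·")


def classify_markorder(raw_word):
    """Map a raw word form to a markorder code following the table's descriptions."""
    text = raw_word.strip()
    mark_positions = [i for i, c in enumerate(text) if c in CRITICAL_MARKS]
    if not mark_positions:
        return ""   # no critical marks
    if text[0] in CRITICAL_MARKS:
        return "0"  # mark is before word
    punct_positions = [i for i, c in enumerate(text) if c in PUNCTUATION]
    if not punct_positions:
        return "1"  # mark after word, no punctuation
    if punct_positions[0] > mark_positions[0]:
        return "2"  # mark after word, punctuation after mark
    return "3"      # mark after word, punctuation before mark


for sample in ["λόγος", "(Υἱοῦ", "ἁμαρτίας—", "Θεοῦ).", "σημείων.]]"]:
    print(repr(sample), repr(classify_markorder(sample)))
```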