# Differences between 'word' and 'normalized'

## Table of contents <a class="anchor" id="TOC"></a>
* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>

# 1 - Introduction <a class="anchor" id="bullet1"></a>

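This Jupyter Notebook investigates where the value of the feature 'word' differs from the value of the feature 'normalized' for the same word node in the Nestle1904GBI Text-Fabric dataset.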
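The frequency table in section 3 shows that the most common differences are accent changes such as `καὶ -> καί`. In running Greek text, an acute accent on a word's final syllable is written as a grave when another word follows, and a normalized form restores the acute. The sketch below reproduces only that one change with Python's standard `unicodedata` module. It is an assumption-based illustration, not the procedure used to build the `normalized` feature, and the helper name `grave_to_acute` is hypothetical. Because NFC maps the oxia code points to their tonos equivalents (for example U+1F77 to U+03AF), its output may not match the dataset byte for byte.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def grave_to_acute(word: str) -> str:
    """Hypothetical helper: turn every grave accent in a Greek word into an acute.

    NFD splits precomposed letters (e.g. 'ὶ') into a base letter plus
    combining marks, the grave mark is swapped for the acute, and NFC
    recomposes the result.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


# Apply the helper to a few words taken from the frequency table
for word in ["καὶ", "δὲ", "τὸν", "γὰρ"]:
    print(word, "->", grave_to_acute(word))
```

Words that carry no grave accent pass through unchanged, apart from being put in NFC form.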

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use

In [3]:
# Load the app and data; hoist=globals() makes the API members (such as F) available as global variables
N1904 = use("tonyjurg/Nestle1904GBI:latest", hoist=globals())

**Locating corpus resources ...**

The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.3 not found


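| Name | # of nodes | # slots/node | % coverage |
|---|---:|---:|---:|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| sentence | 5720 | 24.09 | 100 |
| verse | 7944 | 17.34 | 100 |
| clause | 16124 | 8.54 | 100 |
| phrase | 73547 | 1.87 | 100 |
| word | 137779 | 1.0 | 100 |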


# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

In [4]:
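# Search template: find every word node 'a' whose value for feature 'word'
# differs (#) from its own value for feature 'normalized'
Differences = '''
a:word
a .word#normalized. a
'''
DifferencesList = N1904.search(Differences)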

  0.15s 37244 results
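The query above keeps each word node `a` for which the relation `.word#normalized.` holds, that is, where feature `word` on the left node differs from feature `normalized` on the right node (here the same node). As a rough plain-Python illustration of that logic, the sketch below assumes a hypothetical dictionary `toy_words` mapping invented node numbers to `(word, normalized)` pairs instead of the real Text-Fabric features; `find_differences` is likewise a hypothetical name. Note that a plain `!=` compares code points, so two canonically equivalent spellings (for example alpha with oxia U+1F71 and alpha with tonos U+03AC) would also count as a difference.

```python
from collections import Counter

# Hypothetical toy data standing in for the corpus: node number -> (word, normalized).
# The node numbers are invented; the word pairs are chosen for illustration only.
toy_words = {
    1: ("Βίβλος", "Βίβλος"),
    2: ("καὶ", "καί"),
    3: ("λόγος", "λόγος"),
    4: ("δὲ", "δέ"),
}


def find_differences(words):
    """Return 1-tuples of the nodes whose 'word' and 'normalized' values differ.

    This mirrors the shape of the search results, which hold one node per
    atom of the template (here the single atom 'a').
    """
    return [
        (node,)
        for node, (word, normalized) in sorted(words.items())
        if word != normalized  # plain code-point comparison
    ]


results = find_differences(toy_words)

# Tally each 'word -> normalized' pair and list the most frequent first
counts = Counter(f"{toy_words[node][0]} -> {toy_words[node][1]}" for (node,) in results)
for change, frequency in counts.most_common():
    print(change, frequency)
```

Using `Counter.most_common()` gives the same descending-frequency ordering that the notebook obtains with `sorted(..., reverse=True)`.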


In [5]:
# Library to format the table
from tabulate import tabulate

# Count how often each 'word -> normalized' pair occurs
ResultDict = {}
for Difference in DifferencesList:
    # Each result is a tuple holding one word node
    node = Difference[0]
    Change = F.word.v(node) + " -> " + F.normalized.v(node)
    # Check if this Change already exists in ResultDict
    if Change in ResultDict:
        # If it exists, increment its count
        ResultDict[Change] += 1
    else:
        # If it doesn't exist, start its count at 1
        ResultDict[Change] = 1

# Convert the dictionary into a list of key-value pairs sorted by descending frequency
TableData = sorted(ResultDict.items(), key=lambda row: row[1], reverse=True)

# Produce the table
headers = ["word -> normalized", "frequency"]
print(tabulate(TableData, headers=headers, tablefmt='fancy_grid'))


╒══════════════════════════════════════╤═════════════╕
│ word -> normalized                   │   frequency │
╞══════════════════════════════════════╪═════════════╡
│ καὶ -> καί                           │        8541 │
├──────────────────────────────────────┼─────────────┤
│ δὲ -> δέ                             │        2620 │
├──────────────────────────────────────┼─────────────┤
│ τὸ -> τό                             │        1657 │
├──────────────────────────────────────┼─────────────┤
│ τὸν -> τόν                           │        1556 │
├──────────────────────────────────────┼─────────────┤
│ τὴν -> τήν                           │        1518 │
├──────────────────────────────────────┼─────────────┤
│ γὰρ -> γάρ                           │         921 │
├──────────────────────────────────────┼─────────────┤
│ μὴ -> μή                             │         902 │
├──────────────────────────────────────┼─────────────┤
│ τὰ -> τά                             │         817 │
├─────────