# Vocalisation of the Tetragrammaton (BHSA)

## Table of contents (TOC) <a class="anchor" id="TOC"></a>

* <a href="#bullet1">1 - Introduction</a>
* <a href="#bullet2">2 - Load Text-Fabric app and data</a>
* <a href="#bullet3">3 - Performing the queries</a>
    * <a href="#bullet3x1">3.1 - Get overview of all pointed versions</a>
    * <a href="#bullet3x2">3.2 - Plotting the pointings of the Tetragrammaton</a>
    * <a href="#bullet3x3">3.3 - Some other playing around</a>
* <a href="#bullet4">4 - Required libraries</a>
* <a href="#bullet5">5 - Notebook details</a>


# 1 - Introduction <a class="anchor" id="bullet1"></a>
##### [Back to TOC](#TOC)

In the Old Testament, the Tetragrammaton יהוה is written with different vowel pointings, for example with the vowels of אֲדֹנַי (Adonai, ETCBC transliteration: >:ADON@J).

# 2 - Load Text-Fabric app and data <a class="anchor" id="bullet2"></a>
##### [Back to TOC](#TOC)

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use

In [3]:
# Load the BHSA app and data
BHS = use("etcbc/BHSA", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


Note: The feature documentation can be found at [ETCBC GitHub](https://github.com/ETCBC/bhsa/blob/master/docs/features/0_home.md).

In [4]:
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
BHS.dh(BHS.getCss())

# 3 - Performing the queries <a class="anchor" id="bullet3"></a>
##### [Back to TOC](#TOC)

## 3.1 - Get overview of all pointed versions <a class="anchor" id="bullet3x1"></a>

First get all occurrences of the Tetragrammaton יהוה by matching on its consonants only (that is, without vowel pointing and other diacritical marks). See also the notes on [feature g_word](https://github.com/ETCBC/bhsa/blob/master/docs/features/g_word.md).

In [5]:
JHWHQuery = '''
book
  chapter
     verse
       word g_cons=JHWH 
'''

JHWHResults = BHS.search(JHWHQuery)

  0.51s 6828 results


Now post-process the results into a table that lists each pointed form with its frequency and first occurrence.

In [6]:
# Libraries for table formatting and regular expressions
import re
import pandas as pd
from IPython.display import display

# Initialize dictionary for storing results
resultDict = {}

# Process each item in the JHWHResults
for item in JHWHResults:
    node = item[3]
    # Get the pointed form in ETCBC transliteration and in Hebrew script
    pointedWord = F.g_word.v(node)
    hebrewWord = F.g_word_utf8.v(node)

    # Remove cantillation marks, which the BHSA transliteration represents by digits
    vocalizedWord = re.sub(r'\d', '', pointedWord)
    
    if vocalizedWord in resultDict:
        # If exists, increment the frequency count
        resultDict[vocalizedWord][0] += 1
    else:
        # Initialize count and store the first occurrence
        firstOccurrence = T.sectionFromNode(node)
        resultDict[vocalizedWord] = [1, firstOccurrence, hebrewWord]  

# Convert the dictionary into a DataFrame and sort by frequency
tableData = pd.DataFrame(
    [[key, value[0], value[1], value[2]] for key, value in resultDict.items()],
    columns=["Pointed Word", "Frequency", "First Occurrence", "Hebrew Word"]
)
tableData = tableData.sort_values(by="Frequency", ascending=False)

# Display the table
display(tableData)


| | Pointed Word | Frequency | First Occurrence | Hebrew Word |
|---|---|---|---|---|
| 0 | J:HW@H | 5682 | (Genesis, 2, 4) | יְהוָ֥ה |
| 2 | JHW@H | 788 | (Genesis, 4, 3) | יהוָֽה |
| 5 | J:HWIH | 270 | (Deuteronomy, 3, 24) | יְהוִ֗ה |
| 1 | J:HOW@H | 45 | (Genesis, 3, 14) | יְהֹוָ֨ה |
| 7 | J:HOWIH | 32 | (1_Kings, 2, 26) | יְהֹוִה֙ |
| 4 | JHOW@H | 6 | (Genesis, 18, 17) | יהֹוָ֖ה |
| 3 | J:EHWIH | 2 | (Genesis, 15, 2) | יֱהוִה֙ |
| 6 | J:EHOWIH | 1 | (Judges, 16, 28) | יֱהֹוִ֡ה |
| 8 | JHWIH | 1 | (Psalms, 68, 21) | יהוִ֥ה |
| 9 | J:AHW@H | 1 | (Psalms, 144, 15) | יֲהוָ֥ה |
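
The digit-stripping and counting step can be tried in isolation, without loading the corpus. The following is a minimal sketch, not the notebook's own method: the value `J:HOW@63H` is taken from the `g_word` feature shown later in this notebook, while the other two sample values and the helper name `strip_cantillation` are made up for illustration.

```python
import re
from collections import Counter

# Sample g_word values in ETCBC transliteration; digits encode cantillation marks.
# "J:HOW@63H" occurs in this notebook; the other two values are hypothetical.
sample_words = ["J:HOW@63H", "J:HOW@73H", "J:HW@70H"]


def strip_cantillation(g_word):
    """Remove all digits (cantillation codes) from a transliterated word."""
    return re.sub(r"\d", "", g_word)


# Count how often each vocalised form occurs once the accents are removed
counts = Counter(strip_cantillation(word) for word in sample_words)
print(counts.most_common())
```

Forms that differ only in their accent collapse into one entry, which is why the table above lists ten pointed forms rather than one row per accent combination.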


## 3.2 - Plotting the pointings of the Tetragrammaton <a class="anchor" id="bullet3x2"></a>

In [7]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool

# Enable Bokeh output in the notebook
output_notebook()

# Ensure tableData has the exact column names you need
tableData.columns = ["Pointed Word", "Frequency", "First Occurrence", "Hebrew Word"]

# Create a ColumnDataSource for the Bokeh plot
source = ColumnDataSource(tableData)

# Create a Bokeh figure for the bar chart
p = figure(
    x_range=tableData['Hebrew Word'].tolist(),  # convert x_range to list explicitly
    height=800, 
    width=1000,
    title="Frequency of Tetragrammaton vocalisation in biblical text",
    toolbar_location="right"
)

# Create bar chart
p.vbar(x='Hebrew Word', top='Frequency', width=0.5, source=source)

# Add labels and customizations
p.xaxis.axis_label = "Hebrew Word"
p.yaxis.axis_label = "Frequency"
p.xaxis.major_label_orientation = "horizontal"
p.xaxis.major_label_text_font_size = "26pt"  # Increase font size of x-axis labels

# Add hover tool
hover = HoverTool()
hover.tooltips = [
    ("Pointed Word", "@{Pointed Word}"),
    ("Frequency", "@Frequency"),
    ("First Occurrence", "@{First Occurrence}"),
    ("Hebrew Word", "@{Hebrew Word}")
]
p.add_tools(hover)

# Show the interactive plot
show(p)
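
Because the most common pointing dominates the bar chart, the rare forms are hard to read from it. As a complement, the relative share of each form can be computed directly. This sketch copies the frequencies from the table in section 3.1 into a plain dictionary (the variable names are illustrative) and also checks that they add up to the 6828 search results.

```python
# Frequencies copied from the table in section 3.1
frequencies = {
    "J:HW@H": 5682,
    "JHW@H": 788,
    "J:HWIH": 270,
    "J:HOW@H": 45,
    "J:HOWIH": 32,
    "JHOW@H": 6,
    "J:EHWIH": 2,
    "J:EHOWIH": 1,
    "JHWIH": 1,
    "J:AHW@H": 1,
}

# The total should equal the number of results of the search query
total = sum(frequencies.values())

# Percentage of all occurrences per pointed form, rounded to two decimals
shares = {form: round(100 * count / total, 2) for form, count in frequencies.items()}

print("Total occurrences:", total)
for form, pct in shares.items():
    print(f"{form:10s} {frequencies[form]:5d} {pct:6.2f}%")
```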

## 3.3 - Some other playing around <a class="anchor" id="bullet3x3"></a>

Add another condition to the query to select the forms that carry the vowels of Adonai (adOnAi): the holam and qamats, transliterated as O and @, placed around the Waw. The regular expression includes `.*` to allow for intervening cantillation marks.

In [8]:
adonaiQuery = '''
word g_cons=JHWH g_word~O.*W.*@
'''

adonaiResults = BHS.search(adonaiQuery)

  0.29s 51 results
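
The effect of the pattern can be checked outside Text-Fabric with Python's `re` module. This sketch assumes that the `~` operator matches unanchored, like `re.search`. It applies the pattern to the digit-stripped forms from the table in section 3.1; the dictionary and variable names are illustrative.

```python
import re

# Same pattern as in the query: an O, later a W, later an @
pattern = re.compile(r"O.*W.*@")

# Digit-stripped forms and their frequencies, copied from the table in section 3.1
frequencies = {
    "J:HW@H": 5682,
    "JHW@H": 788,
    "J:HWIH": 270,
    "J:HOW@H": 45,
    "J:HOWIH": 32,
    "JHOW@H": 6,
    "J:EHWIH": 2,
    "J:EHOWIH": 1,
    "JHWIH": 1,
    "J:AHW@H": 1,
}

# Keep only the forms in which the pattern is found
matching = {form: n for form, n in frequencies.items() if pattern.search(form)}
print(matching)
print(sum(matching.values()))

# Digits (cantillation codes) between the letters do not prevent a match
print(bool(pattern.search("J:HOW@63H")))
```

Only `J:HOW@H` and `JHOW@H` match, and their frequencies (45 and 6) add up to the 51 results reported by the query.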


In [9]:
BHS.table(adonaiResults, condensed=False, extraFeatures={'voc_lex'})

| n | p | word |
|---|---|---|
| 1 | Genesis 3:14 | יְהֹוָ֨ה |
| 2 | Genesis 9:26 | יְהֹוָ֖ה |
| 3 | Genesis 18:17 | יהֹוָ֖ה |
| 4 | Exodus 3:2 | יְהֹוָ֥ה |
| 5 | Exodus 13:3 | יְהֹוָ֛ה |
| 6 | Exodus 13:9 | יְהֹוָ֖ה |
| 7 | Exodus 13:12 | יהֹוָ֑ה |
| 8 | Exodus 13:15 | יְהֹוָ֤ה |
| 9 | Exodus 14:1 | יְהֹוָ֖ה |
| 10 | Exodus 14:8 | יְהֹוָ֗ה |


In [10]:
adonaiQuery2 = '''
word lex=JHWH/ g_word~O.*W.*@
'''

adonaiResults2 = BHS.search(adonaiQuery2)

  0.32s 51 results


Print all features that hold a value for the first word node in the results.

In [11]:
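# List every node feature that has a value for the first matching word
featureList = Fall()
for item in adonaiResults2:
    node = item[0]
    for feature in featureList:
        featureValue = Fs(feature).v(node)
        if featureValue is not None:
            print(feature, '=', featureValue)
    break  # only inspect the first result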

freq_lex = 6828
g_cons = JHWH
g_cons_utf8 = יהוה
g_lex = J:HOW@H
g_lex_utf8 = יְהֹוָה
g_word = J:HOW@63H
g_word_utf8 = יְהֹוָ֨ה
gloss = YHWH
gn = m
language = Hebrew
lex = JHWH/
lex_utf8 = יהוה
ls = none
nametype = pers
nme = 
nu = sg
number = 1427
otype = word
pdp = nmpr
pfm = n/a
phono = [yᵊhôˌāh]
phono_trailer =  
prs = n/a
prs_gn = NA
prs_nu = NA
prs_ps = NA
ps = NA
rank_lex = 6
sp = nmpr
st = a
trailer =  
trailer_utf8 =  
uvf = absent
vbe = n/a
vbs = n/a
voc_lex = J:HW@H
voc_lex_utf8 = יְהוָה
vs = NA
vt = NA
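
The listing pairs each transliterated value with its Hebrew counterpart, for example `g_lex` with `g_lex_utf8`. To make that correspondence explicit, the sketch below converts digit-free transliterations to Unicode. The mapping table is hand-written for this illustration and covers only the characters that occur in the Tetragrammaton forms of this notebook; it is not a complete ETCBC transcription scheme, and `to_hebrew` is a hypothetical helper, not a Text-Fabric function.

```python
# Partial ETCBC-to-Unicode table (assumption: limited to the codes needed here)
ETCBC_TO_UNICODE = {
    "J": "\u05D9",   # yod
    "H": "\u05D4",   # he
    "W": "\u05D5",   # waw
    ":E": "\u05B1",  # hataf segol
    ":A": "\u05B2",  # hataf patah
    ":": "\u05B0",   # sheva
    "I": "\u05B4",   # hiriq
    "@": "\u05B8",   # qamats
    "O": "\u05B9",   # holam
}


def to_hebrew(translit):
    """Convert a digit-free transliteration, trying two-character codes first."""
    out = []
    i = 0
    while i < len(translit):
        pair = translit[i:i + 2]
        if pair in ETCBC_TO_UNICODE:
            out.append(ETCBC_TO_UNICODE[pair])
            i += 2
        else:
            # Raises KeyError for codes that are not in the partial table
            out.append(ETCBC_TO_UNICODE[translit[i]])
            i += 1
    return "".join(out)


for form in ["J:HW@H", "J:HOW@H", "J:EHWIH", "J:AHW@H"]:
    print(form, to_hebrew(form))
```

For `J:HOW@H` and `J:HW@H` the result can be compared with the `g_lex_utf8` and `voc_lex_utf8` values in the listing above.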


In [12]:
import re
import pandas as pd
from IPython.display import display

# Initialize dictionary for storing results
resultDict = {}

# Process each item in the JHWHResults
for item in JHWHResults:
    node = item[3]
    # Get the pointed form in ETCBC transliteration and in Hebrew script
    pointedWord = F.g_word.v(node)
    hebrewWord = F.g_word_utf8.v(node)
        
    # Remove cantillations in the BHSA (presented by digits)
    vocalizedWord = re.sub(r'\d', '', pointedWord)
    
    if vocalizedWord in resultDict:
        # If it exists, increment the frequency count
        resultDict[vocalizedWord][0] += 1
    else:
        # If it doesn't exist, initialize the count and store firstOccurrence
        firstOccurrence = T.sectionFromNode(node)
        resultDict[vocalizedWord] = [1, firstOccurrence, hebrewWord]  

# Convert the dictionary into a DataFrame and sort by frequency
tableData = pd.DataFrame(
    [[key, value[0], value[1], value[2]] for key, value in resultDict.items()],
    columns=["Pointing", "Frequency", "First Occurrence", "Hebrew Word"]
)
tableData = tableData.sort_values(by="Frequency", ascending=False)

# Display the table
display(tableData)

| | Pointing | Frequency | First Occurrence | Hebrew Word |
|---|---|---|---|---|
| 0 | J:HW@H | 5682 | (Genesis, 2, 4) | יְהוָ֥ה |
| 2 | JHW@H | 788 | (Genesis, 4, 3) | יהוָֽה |
| 5 | J:HWIH | 270 | (Deuteronomy, 3, 24) | יְהוִ֗ה |
| 1 | J:HOW@H | 45 | (Genesis, 3, 14) | יְהֹוָ֨ה |
| 7 | J:HOWIH | 32 | (1_Kings, 2, 26) | יְהֹוִה֙ |
| 4 | JHOW@H | 6 | (Genesis, 18, 17) | יהֹוָ֖ה |
| 3 | J:EHWIH | 2 | (Genesis, 15, 2) | יֱהוִה֙ |
| 6 | J:EHOWIH | 1 | (Judges, 16, 28) | יֱהֹוִ֡ה |
| 8 | JHWIH | 1 | (Psalms, 68, 21) | יהוִ֥ה |
| 9 | J:AHW@H | 1 | (Psalms, 144, 15) | יֲהוָ֥ה |


In [13]:
qereQuery = '''
word qere_utf8 g_cons=JHWH
'''

qereResults = BHS.search(qereQuery)

  0.28s 0 results


In [14]:
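# Show the pointed form next to its qere reading
# (prints nothing here, because the query above returned 0 results)
for item in qereResults:
    node = item[0]
    pointedWord = F.g_word.v(node)
    qereWord = F.qere.v(node)
    # Remove cantillation marks (digits) from the qere
    uncantQereWord = re.sub(r'\d', '', qereWord)
    print(pointedWord, qereWord, uncantQereWord)
    break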

# 4 - Required libraries <a class="anchor" id="bullet4"></a>
##### [Back to TOC](#TOC)

The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be available in the environment:

    bokeh
    IPython
    pandas
    re

Note that `re` is part of the Python standard library and needs no separate installation. You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
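
To see which libraries still need to be installed, their availability can be checked before running the notebook. This is a minimal sketch using only the standard library; the list `required` is an assumption based on the imports used in this notebook, where `tf` is the import name of `text-fabric`.

```python
import importlib.util

# Third-party packages imported in this notebook ("tf" is provided by text-fabric)
required = ["tf", "bokeh", "IPython", "pandas"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```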

# 5 - Notebook details <a class="anchor" id="bullet5"></a>
##### [Back to TOC](#TOC)

<div style="float: left;">
  <table>
    <tr>
      <td><strong>Author</strong></td>
      <td>Tony Jurg</td>
    </tr>
    <tr>
      <td><strong>Version</strong></td>
      <td>1.0</td>
    </tr>
    <tr>
      <td><strong>Date</strong></td>
      <td>4 November 2024</td>
    </tr>
  </table>
</div>