{ "cells": [ { "cell_type": "markdown", "id": "ef1d222e-9996-4c9f-bddd-173624840ebc", "metadata": {}, "source": [ "# Differences in MT and SP in parasha #20: Tetzaveh (Exodus 27:20-30:10)" ] }, { "cell_type": "markdown", "id": "00169c3b-05ee-417b-8bc4-950a9914cda7", "metadata": {}, "source": [ "## Table of Contents (ToC)\n", "\n", "* 1 - Introduction\n", "* 2 - Load Text-Fabric app and data\n", "* 3 - Compare surface texts of SP and MT\n", "* 4 - Compare texts using minimum Levenshtein distance\n", "* 5 - Comparison of spelling of proper nouns between SP and MT\n", "* 6 - References and acknowledgement\n", "* 7 - Required libraries\n", "* 8 - Notebook version details" ] }, { "cell_type": "markdown", "id": "b9d778e2-a5f4-4f48-acfc-ab657714729c", "metadata": { "tags": [] }, "source": [ "# 1 - Introduction \n", "##### [Back to ToC](#TOC)\n", "\n", "The Samaritan Pentateuch (SP) is a version of the Torah preserved by the Samaritan community, differing from the Masoretic Text (MT) in several respects, including language, orthography, and occasionally theological emphasis. This notebook compares the text of the Masoretic Text, based on the BHSA dataset in Text-Fabric, with the Samaritan Pentateuch, which is also available as a Text-Fabric dataset [1].\n", "\n", "In this analysis, we compare the text of the verses of a specific parasha, highlighting differences in wording and orthography. Special attention is given to spelling variations of proper nouns between the two traditions. This notebook draws inspiration from a notebook by Martijn Naaijer [2] and aims to explore the textual nuances between these two important versions of the Torah."
] }, { "cell_type": "markdown", "id": "b1c357f7-dcb7-4292-b8f3-8349b5844057", "metadata": {}, "source": [ "# 2 - Load Text-Fabric app and data \n", "##### [Back to ToC](#TOC)\n", "\n", "The following code will load the Text-Fabric version of the [Samaritan Pentateuch](https://github.com/DT-UCPH/sp) and the [Biblia Hebraica Stuttgartensia (Amstelodamensis)](https://etcbc.github.io/bhsa/), together with the additional parasha-related features from [tonyjurg/BHSaddons](https://github.com/tonyjurg/BHSaddons)." ] }, { "cell_type": "code", "execution_count": 1, "id": "b8699903-30cf-4a0b-902d-70622ddeead0", "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/DT-UCPH/sp/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/DT-UCPH/sp/tf/3.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.6.1, DT-UCPH/sp/app v3, Search Reference
\n", " Data: DT-UCPH - sp 3.4, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book579878.40100
chapter1872135.79100
verse584168.38100
word1148903.48100
sign3993921.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
The Samaritan Pentateuch\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " book title\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " chapter number\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " word consonantal-transliterated\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_raw\n", "
\n", "
str
\n", "\n", " word consonantal-transliterated (without disambiguation of Shin (C) and Sin (F))\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " word in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " realized lexeme\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " realized lexeme in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_nme\n", "
\n", "
str
\n", "\n", " realized nominal ending consonantal\n", "\n", "
\n", "\n", "
\n", "
\n", "g_nme_utf8\n", "
\n", "
str
\n", "\n", " realized nominal ending consonantal in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_pfm\n", "
\n", "
str
\n", "\n", " realized verbal preformative consonantal\n", "\n", "
\n", "\n", "
\n", "
\n", "g_pfm_utf8\n", "
\n", "
str
\n", "\n", " realized verbal preformative consonantal in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_prs\n", "
\n", "
str
\n", "\n", " realized pronominal suffix consonantal\n", "\n", "
\n", "\n", "
\n", "
\n", "g_prs_utf8\n", "
\n", "
str
\n", "\n", " realized pronominal suffix consonantal in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_uvf\n", "
\n", "
str
\n", "\n", " realized univalent final\n", "\n", "
\n", "\n", "
\n", "
\n", "g_uvf_utf8\n", "
\n", "
str
\n", "\n", " realized univalent final in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_vbe\n", "
\n", "
str
\n", "\n", " realized verbal ending consonantal\n", "\n", "
\n", "\n", "
\n", "
\n", "g_vbe_utf8\n", "
\n", "
str
\n", "\n", " realized verbal ending consonantal in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "g_vbs\n", "
\n", "
str
\n", "\n", " realized verbal stem consonantal\n", "\n", "
\n", "\n", "
\n", "
\n", "g_vbs_utf8\n", "
\n", "
str
\n", "\n", " realized verbal stem consonantal in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " gender\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " language\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " lexeme consonantal-transliterated\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " lexeme in Hebrew script\n", "\n", "
\n", "\n", "
\n", "
\n", "mt_feat\n", "
\n", "
str
\n", "\n", " features imposed from MT\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " grammatical number\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "prediction\n", "
\n", "
str
\n", "\n", " neural network prediction\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " pronominal suffix gender\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " pronominal suffix number\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " pronominal suffix person\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " grammatical person\n", "\n", "
\n", "\n", "
\n", "
\n", "sign\n", "
\n", "
str
\n", "\n", " consonantal letter\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " part of speech\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " interword material\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " verbal tense\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: DT-UCPH/sp
  3. appPath: C:/Users/tonyj/text-fabric-data/github/DT-UCPH/sp/app
  4. commit: g0c9b2fff6448228af93ed6c466ba95e6c0bb3547
  5. css: ''
  6. dataDisplay:
  7. \n", " textFormats:\n", "
  8. \n", " layout-orig-full:\n", "
    • method: layoutRich
    • style: orig
    \n", "
  9. \n", "
  10. docs:
    • docBase: {docRoot}/bhsa
    • docExt: ''
    • docPage: ''
    • docRoot: https://etcbc.github.io
    • featurePage: 0_home
  11. interfaceDefaults: {}
  12. isCompatible: True
  13. local: local
  14. localDir: C:/Users/tonyj/text-fabric-data/github/DT-UCPH/sp/_temp
  15. provenanceSpec:
    • corpus: The Samaritan Pentateuch
    • org: DT-UCPH
    • relative: /tf
    • repo: sp
    • version: 3.4
  16. release: v3.4
  17. typeDisplay:
    • verse:
      • label: {verse}
      • template: {verse}
      • verselike: True
    • word: {features: lex}
  18. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/etcbc/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/tonyjurg/BHSaddons/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/etcbc/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.6.1, etcbc/bhsa/app v3, Search Reference
\n", " Data: etcbc - bhsa 2021, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book3910938.21100
chapter929459.19100
lex923046.22100
verse2321318.38100
half_verse451799.44100
sentence637176.70100
sentence_atom645146.61100
clause881314.84100
clause_atom907044.70100
phrase2532031.68100
phrase_atom2675321.59100
subphrase1138501.4238
word4265901.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on frequency\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetition of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
tonyjurg/BHSaddons/tf\n", "
\n", "\n", "
\n", "
\n", "aliyotnum\n", "
\n", "
str
\n", "\n", " The sequence number of the aliyot within the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "maftir\n", "
\n", "
str
\n", "\n", " Set to 1 if this verse is part of a maftir\n", "\n", "
\n", "\n", "
\n", "
\n", "parashahebr\n", "
\n", "
str
\n", "\n", " The name of the parasha in Hebrew\n", "\n", "
\n", "\n", "
\n", "
\n", "parashanum\n", "
\n", "
int
\n", "\n", " The sequence number of the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "parashatrans\n", "
\n", "
str
\n", "\n", " Transliteration of the Hebrew parasha name\n", "\n", "
\n", "\n", "
\n", "
\n", "parashaverse\n", "
\n", "
str
\n", "\n", " The sequence number of the verse within the parasha\n", "\n", "
\n", "\n", "
\n", "
\n", "wordboundary\n", "
\n", "
str
\n", "\n", " indicates word boundaries (spaces OR maqaf)\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: etcbc/bhsa
  3. appPath: C:/Users/tonyj/text-fabric-data/github/etcbc/bhsa/app
  4. commit: gd905e3fb6e80d0fa537600337614adc2af157309
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: local
  11. localDir: C:/Users/tonyj/text-fabric-data/github/etcbc/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: etcbc
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: etcbc
        • relative: /tf
        • repo: parallels
    • org: etcbc
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v1.8
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from tf.app import use\n", "\n", "# Load the SP data and rename the node-features class F,\n", "# the locality class L, and the text class T, so that\n", "# they are not overwritten when loading the MT.\n", "SP = use('DT-UCPH/sp', version='3.4')\n", "Fsp, Lsp, Tsp = SP.api.F, SP.api.L, SP.api.T\n", "\n", "# Do the same for the MT dataset (BHSA) together with BHSaddons\n", "MT = use('etcbc/bhsa', version='2021', mod=\"tonyjurg/BHSaddons/tf/\")\n", "Fmt, Lmt, Tmt = MT.api.F, MT.api.L, MT.api.T" ] }, { "cell_type": "markdown", "id": "f0086270", "metadata": {}, "source": [ "# 3 - Compare surface texts of SP and MT \n", "##### [Back to ToC](#TOC)\n", "\n", "In this section, we compare the surface texts of the Samaritan Pentateuch (SP) and the Masoretic Text (MT) at the verse level. By analyzing the wording and structure of these texts, we aim to identify variations."
] }, { "cell_type": "code", "execution_count": 2, "id": "3b399f27-1f11-4efc-94e6-db55fe87a684", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 101 results\n" ] } ], "source": [ "# Find all verse nodes for this parasha (we can use either the transliterated name or the sequence number)\n", "parashaQuery = '''\n", "verse parashanum=20\n", "'''\n", "parashaResults = MT.search(parashaQuery)" ] }, { "cell_type": "code", "execution_count": 3, "id": "e064e73f-8ca2-4d78-ac65-2b518e1dc4aa", "metadata": {}, "outputs": [], "source": [ "# Extract book, chapter, and verse information\n", "bookChapterVerseList = [\n", " Tmt.sectionFromNode(verse[0]) for verse in parashaResults\n", "]\n", "\n", "# Store the parasha name and its start and end verse for later use\n", "startNode=parashaResults[0][0]\n", "endNode=parashaResults[-1][0]\n", "parashaNameHebrew=Fmt.parashahebr.v(startNode)\n", "parashaNameEnglish=Fmt.parashatrans.v(startNode)\n", "bookStart,chapterStart,startVerse=Tmt.sectionFromNode(startNode)\n", "parashaStart=f'{bookStart} {chapterStart}:{startVerse}'\n", "bookEnd,chapterEnd,endVerse=Tmt.sectionFromNode(endNode)\n", "parashaEnd=f'{chapterEnd}:{endVerse}'\n", "htmlStart=''\n", "htmlFooter=f'

Data generated by `delta_mt_and_sp.ipynb` at `github.com/tonyjurg/Parashot`

`'" ] }, { "cell_type": "code", "execution_count": 4, "id": "06e216e5-a5b1-47d6-994c-f4e8c6a66e90", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'<=' not supported between instances of 'NoneType' and 'int'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[1;32mIn[4], line 17\u001b[0m\n\u001b[0;32m 14\u001b[0m verseTexts[verseName] \u001b[38;5;241m=\u001b[39m verseText\u001b[38;5;241m.\u001b[39mstrip()\n\u001b[0;32m 15\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m verseTexts\n\u001b[1;32m---> 17\u001b[0m SPverses \u001b[38;5;241m=\u001b[39m \u001b[43mreconstructVerses\u001b[49m\u001b[43m(\u001b[49m\u001b[43mFsp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mLsp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mTsp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mg_cons\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbookChapterVerseList\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 18\u001b[0m MTverses \u001b[38;5;241m=\u001b[39m reconstructVerses(Fmt, Lmt, Tmt, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mg_cons\u001b[39m\u001b[38;5;124m'\u001b[39m, bookChapterVerseList)\n", "Cell \u001b[1;32mIn[4], line 8\u001b[0m, in \u001b[0;36mreconstructVerses\u001b[1;34m(F, L, T, textFeature, inputList)\u001b[0m\n\u001b[0;32m 6\u001b[0m verseText \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[0;32m 7\u001b[0m verseNode \u001b[38;5;241m=\u001b[39m T\u001b[38;5;241m.\u001b[39mnodeFromSection(verseName)\n\u001b[1;32m----> 8\u001b[0m wordNodes \u001b[38;5;241m=\u001b[39m \u001b[43mL\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43md\u001b[49m\u001b[43m(\u001b[49m\u001b[43mverseNode\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mword\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[0;32m 9\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m wordNode \u001b[38;5;129;01min\u001b[39;00m wordNodes:\n\u001b[0;32m 10\u001b[0m wordText \u001b[38;5;241m=\u001b[39m \u001b[38;5;28meval\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mF.\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtextFeature\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.v(wordNode)\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", "File \u001b[1;32m~\\anaconda3\\envs\\Text-Fabric\\Lib\\site-packages\\tf\\core\\locality.py:182\u001b[0m, in \u001b[0;36mLocality.d\u001b[1;34m(self, n, otype)\u001b[0m\n\u001b[0;32m 180\u001b[0m fOtype \u001b[38;5;241m=\u001b[39m Fotype\u001b[38;5;241m.\u001b[39mv\n\u001b[0;32m 181\u001b[0m maxSlot \u001b[38;5;241m=\u001b[39m Fotype\u001b[38;5;241m.\u001b[39mmaxSlot\n\u001b[1;32m--> 182\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[43mn\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m<\u001b[39;49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mmaxSlot\u001b[49m:\n\u001b[0;32m 183\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mtuple\u001b[39m()\n\u001b[0;32m 184\u001b[0m maxNode \u001b[38;5;241m=\u001b[39m Fotype\u001b[38;5;241m.\u001b[39mmaxNode\n", "\u001b[1;31mTypeError\u001b[0m: '<=' not supported between instances of 'NoneType' and 'int'" ] } ], "source": [ "# Function to reconstruct verses\n", "def reconstructVerses(F, L, T, textFeature, inputList):\n", " \"\"\"Reconstruct text for each verse.\"\"\"\n", " verseTexts = {}\n", " for verseName in inputList:\n", " verseText = ''\n", " verseNode = T.nodeFromSection(verseName)\n", " wordNodes = L.d(verseNode, 'word')\n", " for wordNode in wordNodes:\n", " wordText = eval(f'F.{textFeature}.v(wordNode)')\n", " trailer = F.trailer.v(wordNode)\n", " if wordText:\n", " verseText += wordText + (trailer if trailer else ' 
')\n", " verseTexts[verseName] = verseText.strip()\n", " return verseTexts\n", " \n", "SPverses = reconstructVerses(Fsp, Lsp, Tsp, 'g_cons', bookChapterVerseList)\n", "MTverses = reconstructVerses(Fmt, Lmt, Tmt, 'g_cons', bookChapterVerseList)" ] }, { "cell_type": "code", "execution_count": null, "id": "492e7cbe-2b84-48a9-a44c-6b6d051b0c5d", "metadata": {}, "outputs": [], "source": [ "from difflib import SequenceMatcher\n", "from IPython.display import HTML, display\n", "\n", "def highlightMatches(baseText, comparisonText):\n", " matcher = SequenceMatcher(None, baseText, comparisonText)\n", " highlightedComparisonText = \"\" \n", " for tag, i1, i2, j1, j2 in matcher.get_opcodes():\n", " if tag == \"equal\": # Identical parts\n", " highlightedComparisonText += comparisonText[j1:j2]\n", " else: # Non-matching parts\n", " highlightedComparisonText += f'{comparisonText[j1:j2]}' \n", " return highlightedComparisonText\n", "\n", "def cleanText(text):\n", " replacements = [\n", " # for the transcoded strings\n", " ('00_P', ''), # Remove '00_P'\n", " ('00_S', ''), # Remove '00_S'\n", " ('00', ''), # Remove '00'\n", " ('&', ' '), # Replace '&' with a space\n", " # for the Hebrew strings\n", " ('ס ', ''), # Final Samekh\n", " ('פ ', ''), # Final Pe\n", " ('׃', ''), # End of verse\n", " ('־',' ') # maqaf\n", " ]\n", " # Apply each replacement\n", " for old, new in replacements:\n", " text = text.replace(old, new)\n", " return text\n", "\n", "# Function to format and highlight verse differences between MT and SP\n", "def formatAndHighlight(label, MTverseText, SPverseText):\n", " book, chapter, verse = label\n", " MTverseNode = Tmt.nodeFromSection(label)\n", " MTtext = cleanText(Tmt.text(MTverseNode, \"text-orig-plain\"))\n", " SPverseNode = Tsp.nodeFromSection(label)\n", " SPtext = Tsp.text(SPverseNode)\n", " SPmarkedText = highlightMatches(MTtext, SPtext)\n", " MTmarkedText = highlightMatches(SPtext, MTtext)\n", " formattedDiff = (\n", " f'

{book} {chapter}:{verse}

'\n", " f'

SP: {SPmarkedText}
MT: {MTmarkedText}

'\n", " )\n", " return formattedDiff\n", "\n", "# Gather differences into an HTML string\n", "htmlContent = f'

Differences between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})

'\n", "for label, MTverseText in MTverses.items():\n", " SPverseText = SPverses.get(label, '')\n", " MTverseText = cleanText(MTverseText)\n", " if MTverseText != SPverseText: # Check for differences\n", " difference = formatAndHighlight(label, MTverseText, SPverseText)\n", " htmlContent += difference\n", "\n", "# Save the content to an HTML file\n", "fileName = f\"differences_MT_SP({parashaNameEnglish.replace(' ','%20')}).html\"\n", "with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(htmlContent)\n", "\n", "# Display the content in the notebook\n", "display(HTML(htmlContent))\n", "\n", "# Wrap with the HTML header and footer and display a download button\n", "htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'\n", "downloadButton = f\"\"\"\n", "', '>').replace('\"', '"').replace(\"'\", ''')}\" target=\"_blank\">\n", " \n", "\n", "\"\"\"\n", "display(HTML(downloadButton))" ] }, { "cell_type": "markdown", "id": "da916301", "metadata": {}, "source": [ "# 4 - Compare texts using minimum Levenshtein distance\n", "##### [Back to ToC](#TOC)" ] }, { "cell_type": "markdown", "id": "f516dc36-754b-4b37-b2d7-7f07b18310ac", "metadata": {}, "source": [ "The Levenshtein distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one text into another, providing a quantitative way to compare textual differences. When comparing the Masoretic Text and the Samaritan Pentateuch, it captures variations in spelling, word order, and other minor textual changes. \n", "In the script below, only verse pairs whose distance exceeds the variable `threshold` are reported: a higher distance indicates greater dissimilarity between two texts, meaning more edits are needed to transform one text into the other."
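, "\n", "As a minimal pure-Python sketch of this metric (the function name `lev` is illustrative only; the cell below uses the `Levenshtein` package instead):\n", "\n", "```python\n", "def lev(a, b):\n", "    # Classic dynamic-programming edit distance.\n", "    prev = list(range(len(b) + 1))\n", "    for i, ca in enumerate(a, 1):\n", "        cur = [i]\n", "        for j, cb in enumerate(b, 1):\n", "            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))\n", "        prev = cur\n", "    return prev[-1]\n", "\n", "print(lev('אהרן', 'אהרון'))  # 1: the plene spelling differs by a single inserted waw\n", "```\n"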
] }, { "cell_type": "code", "execution_count": null, "id": "e71c202d-a845-4e35-89a4-02d6a45fff31", "metadata": {}, "outputs": [], "source": [ "from Levenshtein import distance\n", "from IPython.display import HTML, display\n", "\n", "threshold = 15\n", "\n", "# Create an HTML string to store the output\n", "htmlContent = f'

Levenshtein distance >{threshold} between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})

'\n", "\n", "# Create header\n", "MT.dm(f'### Levenshtein distance >{threshold} between MT and SP for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})')\n", "\n", "# Generate the HTML content\n", "for label, MTverseText in MTverses.items():\n", " SPverseText = SPverses.get(label, '')\n", " levDistance = distance(MTverseText, SPverseText) # Calculate the distance\n", " if levDistance > threshold:\n", " formattedDiff = formatAndHighlight(label, MTverseText, SPverseText)\n", " formattedDiff += f'

Levenshtein Distance: {levDistance}

' # Add the distance\n", " MT.dm(formattedDiff)\n", " htmlContent += formattedDiff # Append to the HTML content\n", "\n", "# Save the content to an HTML file\n", "fileName = f\"levenshtein_differences_MT_SP({parashaNameEnglish.replace(' ','%20')}).html\"\n", "with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(htmlContent)\n", "\n", "# wrap html header and footer and display a download button\n", "htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'\n", "downloadButton = f\"\"\"\n", "', '>').replace('\"', '"').replace(\"'\", ''')}\" target=\"_blank\">\n", " \n", "\n", "\"\"\"\n", "display(HTML(downloadButton))" ] }, { "cell_type": "markdown", "id": "63886933", "metadata": {}, "source": [ "# 5 - Comparison of spelling of proper nouns between SP and MT\n", "##### [Back to ToC](#TOC)\n", "\n", "This section focuses on comparing the spelling of proper nouns between the Samaritan Pentateuch (SP) and the Masoretic Text (MT). Proper nouns, including names of people, places, and unique terms, often exhibit variations in spelling between these two traditions." ] }, { "cell_type": "code", "execution_count": null, "id": "a4011f7c", "metadata": {}, "outputs": [], "source": [ "import collections\n", "\n", "def collectProperNounSpellings(F, L, T, inputList):\n", " \"\"\"\n", " Collect proper noun spellings and their associated word node numbers.\n", " Ensures only one tuple is stored for each lexeme-to-spelling mapping.\n", " \"\"\"\n", " properNounsSpellings = {}\n", " for bookChapterVerse in inputList:\n", " verseNode = T.nodeFromSection(bookChapterVerse)\n", " wordNodes = L.d(verseNode, 'word')\n", " for wordNode in wordNodes:\n", " if F.sp.v(wordNode) == 'nmpr': # Check if the word is a proper noun\n", " lex = F.lex.v(wordNode) # Lexical form\n", " spelling = F.g_cons.v(wordNode) # Spelling\n", " # Store only the first occurrence for each lex-to-cons mapping\n", " if lex not in properNounsSpellings or spelling not in {item[0] for item in properNounsSpellings[lex]}:\n", " 
properNounsSpellings.setdefault(lex, []).append((spelling, wordNode))\n", " return properNounsSpellings\n", " \n", "SPspellingDict = collectProperNounSpellings(Fsp, Lsp, Tsp, bookChapterVerseList) \n", "MTspellingDict = collectProperNounSpellings(Fmt, Lmt, Tmt, bookChapterVerseList)" ] }, { "cell_type": "code", "execution_count": null, "id": "4f596918-087e-4658-bc57-0ffc85779cb7", "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML, display\n", "\n", "# Initialize HTML content\n", "htmlContent = f'

Spelling differences in proper nouns between SP and MT for parasha {parashaNameEnglish} ({parashaStart}-{parashaEnd})

'\n", "\n", "# Generate the HTML output\n", "for lex, MTspellings in MTspellingDict.items():\n", " # Retrieve SP spellings, defaulting to an empty set if lex is not found\n", " SPspellings = SPspellingDict.get(lex, set())\n", "\n", " # Extract only the spellings (ignoring node numbers) for comparison\n", " MTspellingSet = {spelling for spelling, _ in MTspellings}\n", " SPspellingSet = {spelling for spelling, _ in SPspellings}\n", "\n", " # Compare the sets of spellings\n", " if MTspellingSet != SPspellingSet:\n", " # Print MT spelling with reference\n", " MTnode = list(MTspellings)[0][1] # Get first tuple's node number\n", " book, chapter, verse = Tmt.sectionFromNode(MTnode)\n", " MTgloss = Fmt.gloss.v(MTnode)\n", " MTspelling = Fmt.g_cons_utf8.v(MTnode)\n", "\n", " # Build HTML output\n", " output = (\n", " f'

Word: {MTgloss} '\n", " f''\n", " f'{book} {chapter}:{verse}

'\n", " f'
  • MT Spelling: {MTspelling}
  • '\n", " )\n", "\n", " # Print SP spellings with reference\n", " if SPspellings:\n", " SPnode = list(SPspellings)[0][1] # Get first tuple's node number\n", " SPspelling = Fsp.g_cons_utf8.v(SPnode)\n", " output += f'
  • SP Spelling: {SPspelling}
'\n", " else:\n", " output += '
  • SP Spelling: None
  • '\n", "\n", " # Append the output to the HTML content\n", " htmlContent += output\n", "\n", "# Save the HTML content to a file\n", "fileName = f\"spelling_differences_SP_MT({parashaNameEnglish.replace(' ','%20')}).html\"\n", "with open(fileName, \"w\", encoding=\"utf-8\") as file:\n", " file.write(htmlContent)\n", "\n", "# Display the HTML content in the notebook\n", "display(HTML(htmlContent))\n", "\n", "# wrap html header and footer and display a download button\n", "htmlContentFull = f'{htmlStart}{htmlContent}{htmlFooter}'\n", "downloadButton = f\"\"\"\n", "', '>').replace('\"', '"').replace(\"'\", ''')}\" target=\"_blank\">\n", " \n", "\n", "\"\"\"\n", "display(HTML(downloadButton))" ] }, { "cell_type": "markdown", "id": "0fbcc828-2880-4d36-bd66-5a9f085b616d", "metadata": {}, "source": [ "# 6 - References and acknowledgement \n", "##### [Back to ToC](#TOC)\n", "\n", "1 Christian Canu Højgaard, Martijn Naaijer, & Stefan Schorch. (2023). Text-Fabric Dataset of the Samaritan Pentateuch. Zenodo. https://doi.org/10.5281/zenodo.7734632\n", "\n", "2 [Notebook created by Martijn Naaijer](https://github.com/DT-UCPH/sp/blob/main/notebooks/combine_sp_with_mt_data.ipynb)" ] }, { "cell_type": "markdown", "id": "4e319feb-814e-4903-922a-58bae953224c", "metadata": { "tags": [] }, "source": [ "# 7 - Required libraries \n", "##### [Back to ToC](#TOC)\n", "\n", "The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be available in the environment:\n", "\n", " collections\n", " difflib\n", " Levenshtein\n", "\n", "You can install any missing library from within Jupyter Notebook using either `pip` or `pip3` (note that `collections` and `difflib` are part of the Python standard library and do not require installation)." ] }, { "cell_type": "markdown", "id": "68f75bc4-7bc2-42f1-af91-fe6e5b83a277", "metadata": {}, "source": [ "# 8 - Notebook version details\n", "##### [Back to ToC](#TOC)\n", "\n", "
    \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    AuthorTony Jurg
    Version1.1
    Date5 March 2025
    \n", "
    " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }