{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "---\n", "\n", "To get started: consult [start](start.ipynb)\n", "\n", "---\n", "\n", "# Similar lines\n", "\n", "We spot the many similarities between lines in the corpus.\n", "\n", "There are ca. 50,000 lines in the corpus, of which ca. 35,000 have real content.\n", "Comparing all pairs of these requires more than half a billion comparisons (35,000 × 34,999 / 2 ≈ 612 million).\n", "That is a costly operation.\n", "[On this laptop it took 21 whole minutes](https://nbviewer.jupyter.org/github/etcbc/dss/blob/master/programs/parallels.ipynb).\n", "\n", "The good news is that we have stored the outcome in an extra feature, `sim`.\n", "\n", "This feature is packaged in a TF data module\n", "that is loaded automatically together with the DSS." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" } }, "outputs": [], "source": [ "import collections\n", "\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/etcbc/dss/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/dss/tf/0.9" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/dss/parallels/tf/0.9" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 9.2.2\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "67 features found and 1 ignored\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 9.2.2, etcbc/dss/app 
v3, Search Reference
Data: DSS, Character table, Feature docs
Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "
\n", " similarity between lines, as a percentage of the common material wrt the combined material\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2019-05-09
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-06-11T14:51:21Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
sourceCreatedBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
sourceCreatedDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
sourceDescription:
\n", "
Dead Sea Scrolls: biblical and non-biblical scrolls
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "\n", "
Dead Sea Scrolls\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "
\n", " space behind the word, if any\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:55Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
(space)
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "alt\n", "
\n", "
int
\n", "
\n", " alternative reading\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "biblical\n", "
\n", "
int
\n", "
\n", " whether we are in biblical material or not\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
applies:
\n", "
scroll fragment line cluster word
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
remark:
\n", "
for lines it means that the material is taken from the bib source while there is also material for this line in the nonbib source. But the nonbib material is either identical or virtually absent, in which case the bib material is a reconstruction and marked as such.
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1=biblical, 2=biblical but also with nonbiblical material
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "
\n", " acronym of the book in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
str
\n", "
\n", " label of the chapter in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cl\n", "
\n", "
str
\n", "
\n", " class (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
advb, art, artp, card, cmn, conj, gent, indp, intj, intr, mult, nega, objm, ord, prep, prp, rela, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cl2\n", "
\n", "
str
\n", "
\n", " class (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
d, h, n, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cor\n", "
\n", "
int
\n", "
\n", " correction made by an ancient or modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 = modern, 2 = ancient, 3 = ancient supralinear
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fragment\n", "
\n", "
str
\n", "
\n", " label of a fragment of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "full\n", "
\n", "
str
\n", "
\n", " full transcription (Unicode) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fulle\n", "
\n", "
str
\n", "
\n", " full transcription (ETCBC transliteration) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fullo\n", "
\n", "
str
\n", "
\n", " full transcription (original source) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glex\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glexe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:59Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glexo\n", "
\n", "
str
\n", "
\n", " representation (original source) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:59Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glyph\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:00Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glyphe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:02Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glypho\n", "
\n", "
str
\n", "
\n", " representation (original source) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:04Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "
\n", " gender (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
b, c, f, m, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn2\n", "
\n", "
str
\n", "
\n", " gender (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
c, f, m, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn3\n", "
\n", "
str
\n", "
\n", " gender (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
c, f, m
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "halfverse\n", "
\n", "
str
\n", "
\n", " label of the half-verse in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "intl\n", "
\n", "
int
\n", "
\n", " interlinear material, the value indicates the sequence number of the interlinear line\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "
\n", " language of a word or sign, only if it is not Hebrew\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
g=greek, a=aramaic
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lex_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:07Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lexe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:07Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lexo\n", "
\n", "
str
\n", "
\n", " representation (original source) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "line\n", "
\n", "
str
\n", "
\n", " label of a line of a fragment of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "md\n", "
\n", "
str
\n", "
\n", " mood (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
coho, cons, juss, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "merr\n", "
\n", "
str
\n", "
\n", " errors in parsing the morphology tag\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "morpho\n", "
\n", "
str
\n", "
\n", " morphological tag (by Abegg)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nr\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "
\n", " number (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
d, p, s, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu2\n", "
\n", "
str
\n", "
\n", " number (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
p, s, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu3\n", "
\n", "
str
\n", "
\n", " number (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
s
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: biblical and non-biblical scrolls\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "
\n", " person (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps2\n", "
\n", "
str
\n", "
\n", " person (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps3\n", "
\n", "
str
\n", "
\n", " person (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "
\n", " trailing punctuation (Unicode) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punce\n", "
\n", "
str
\n", "
\n", " trailing punctuation (ETCBC transliteration) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "
\n", " trailing punctuation (original source) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "rec\n", "
\n", "
int
\n", "
\n", " reconstructed by a modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "rem\n", "
\n", "
int
\n", "
\n", " removed by an ancient or modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 = modern, 2 = ancient
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "script\n", "
\n", "
str
\n", "
\n", " script in which the word or sign is written if it is not Hebrew\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
paleohebrew greekcapital
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "scroll\n", "
\n", "
str
\n", "
\n", " acronym of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "
\n", " part of speech (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
adjv, numr, pron, ptcl, subs, suff, unknown, verb
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sp_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "srcLn\n", "
\n", "
int
\n", "
\n", " the line number of the word in the source data file\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "
\n", " state (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:13Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
a, c, d, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "
\n", " type of sign or cluster\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:13Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "unc\n", "
\n", "
int
\n", "
\n", " uncertain material in various degrees: higher degree is less certain\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 2 3 4
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vac\n", "
\n", "
int
\n", "
\n", " empty, unwritten space\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
str
\n", "
\n", " label of the verse in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "
\n", " verbal stem (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
aphel, apoel, haphel, hifil, hishtafel, hishtaphel, hithaphel, hithpaal, hithpeel, hithpolel, hitopel, hitpael, hitpalpel, hitpoel, hofal, hophal, hotpaal, hpealal, ishtaphel, ithpaal, ithpeel, ithpoel, nifal, nitpael, pael, palel, passive, peal, peil, piel, pilpel, poal, poel, polal, polel, pual, pulal, qal, shaphel, tifil, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vs_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "
\n", " verbal tense/aspect (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:16Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
impf, impv, infa, infc, perf, ptca, ptcp, unknown, wayy
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vt_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:16Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "occ\n", "
\n", "
none
\n", "
\n", " edge feature from a lexeme to its occurrences\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:17Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "
\n", " Dead Sea Scrolls: biblical and non-biblical scrolls\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:17Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"etcbc/dss\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new feature is **sim**, and it is an edge feature.\n", "It annotates pairs of lines $(l, m)$ where $l$ and $m$ have similar content.\n", "The degree of similarity is a percentage (between 60 and 100), and this value\n", "is stored as the value of the edge.\n", "\n", "Here is an example:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 similar lines\n", "1563769 with similarity 69\n" ] }, { "data": { "text/html": [ "\n", "\n", "
npline
1CD 1:1  ועתה שמעו כל יודעי צדק ובינו במעשי
24Q268 f1:9ועתה שמעו ל׳י כול יודעי צדק ובינו במעשי אל ׃ כי ריב
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "allLines = F.otype.s(\"line\")\n", "nLines = len(allLines)\n", "exampleLine = allLines[0]\n", "sisters = E.sim.b(exampleLine)\n", "print(f\"{len(sisters)} similar lines\")\n", "print(\"\\n\".join(f\"{s[0]} with similarity {s[1]}\" for s in sisters[0:10]))\n", "A.table(tuple((s[0],) for s in ((exampleLine,), *sisters)), end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# All similarities\n", "\n", "Let's first find out the range of similarities:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum similarity is  60\n", "maximum similarity is 100\n" ] } ], "source": [ "minSim = None\n", "maxSim = None\n", "similarity = dict()\n", "\n", "for ln in F.otype.s(\"line\"):\n", "    sisters = E.sim.f(ln)\n", "    if not sisters:\n", "        continue\n", "    for (m, s) in sisters:\n", "        similarity[(ln, m)] = s\n", "    thisMin = min(s[1] for s in sisters)\n", "    thisMax = max(s[1] for s in sisters)\n", "    if minSim is None or thisMin < minSim:\n", "        minSim = thisMin\n", "    if maxSim is None or thisMax > maxSim:\n", "        maxSim = thisMax\n", "\n", "print(f\"minimum similarity is {minSim:>3}\")\n", "print(f\"maximum similarity is {maxSim:>3}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The bottom lines\n", "\n", "We give a few examples of the least similar lines.\n", "\n", "We can use a search template to get the pairs of lines with the minimum similarity of 60." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "line\n", "-sim=60> line\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In words: find a line that is connected to another line via a `sim` edge with value 60." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.14s 24546 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nlineline
1CD 3:9  בעדת׳ם ׃ ובני׳הם ב׳ו אבדו ומלכי׳הם ב׳ו נכרתו וגיבורי׳הם ב׳ו 4Q269 f2:4  אבדו ומלכי׳הם ב׳ו נכרתו וגבורי׳הם ב׳ו אבדו וארצ׳ם ב׳ו שממה ׃
2CD 7:11  אשר אמר יבוא עלי׳ך ועל עמ׳ך ועל בית אבי׳ך ימים אשר לא 4Q59 f2_3:1  יביא יהוה עלי׳ך ועל עמ׳ך ועל בית אבי׳ך ימים אשר לא באו למיום סור אפרים
3CD 10:7  ועשרים שנה עד בני ששים שנה ׃ ואל יתיצב עוד מבן 4Q266 f8iii:6  ובישודי הברית מבני חמש ועשרים שנה ועד בן ששים שנה ׃ ואל יתיצב
4CD 10:16  רחוק מן השער מלוא׳ו ׃ כי הוא אשר אמר שמור את 4Q270 f6v:2  מן העת אשר יהיה גלגל השמש רחוק מן השער מלוא׳ו כי הוא אשר אמר
5CD 11:6  אם אלפים באמה ׃   אל ירם את יד׳ו להכות׳ה באגרוף   אם 4Q271 f5i:3  אל ירם איש את יד׳ו להכות׳ה באגרוף ׃ אם סוררת היא אל יוציא׳ה
6CD 11:16    וכל נפש אדם אשר תפול אל מים מקום מים ואל מקום 4Q270 f6v:19  הון ובצע בשבת ׃ וכל נפש אדם אשר תפול אל מקום מים ואל בור אל
7CD 12:16  והעפר אשר יגואלו בטמאת האדם לגאולי שמן ב׳הם כפי 4Q266 f9ii:3  יגואלו בטמאת האדם לגאולי שמן ב׳הם כפי טמאת׳ם יטמא
8CD 12:17  טמאת׳ם יטמא הנוגע ב׳ם ׃   וכל כלי מסמר מסמר או יתד בכותל 4Q266 f9ii:4  הנוגע ב׳ם ׃ וכול כלי מסמר ויתד בכותל אשר יהיו עם
9CD 13:13  מבני המחנה להביא איש אל העדה זולת פי המבקר אשר למחנה ׃ 4Q267 f9iv:10  ימשול איש מכול ? בני המחנה להביא איש אל העדה
10CD 14:6  שלושת׳ם והגר רביע ׃ וכן ישבו וכן ישאלו לכל ׃ והכהן אשר יפקד 4Q269 f10ii:11  ישראל שלשיים והגר רביע ׃ וכן ישבו וכן ישאלו לכול ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, start=1, end=10, withPassage=\"1 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or in full layout:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nlineline
1CD 3:9  בעדת׳ם  ׃ ובני׳הם ב׳ו אבדו ומלכי׳הם ב׳ו נכרתו וגיבורי׳הם ב׳ו 4Q269 f2:4  אבדו ומלכי׳הם ב׳ו נכרתו וגבורי׳הם ב׳ו אבדו וארצ׳ם ב׳ו שממה  ׃
2CD 7:11  אשר אמר יבוא עלי׳ך ועל עמ׳ך ועל בית אבי׳ך ימים אשר לא 4Q59 f2_3:1  יביא יהוה עלי׳ך ועל עמ׳ך ועל בית אבי׳ך ימים אשר לא באו למיום סור אפרים
3CD 10:7  ועשרים שנה עד בני ששים שנה  ׃ ואל יתיצב עוד מבן 4Q266 f8iii:6  ובישודי הברית מבני חמש ועשרים שנה ועד בן ששים שנה  ׃ ואל יתיצב
4CD 10:16  רחוק מן השער מלוא׳ו  ׃ כי הוא אשר אמר שמור את 4Q270 f6v:2  מן העת אשר יהיה גלגל השמש רחוק מן השער מלוא׳ו כי הוא אשר אמר
5CD 11:6  אם אלפים באמה  ׃   אל ירם את יד׳ו להכות׳ה באגרוף   אם 4Q271 f5i:3  אל ירם איש את יד׳ו להכות׳ה באגרוף  ׃ אם סוררת היא אל יוציא׳ה
6CD 11:16    וכל נפש אדם אשר תפול אל מים מקום מים ואל מקום 4Q270 f6v:19  הון ובצע בשבת  ׃ וכל נפש אדם אשר תפול אל מקום מים ואל בור אל
7CD 12:16  והעפר אשר יגואלו בטמאת האדם לגאולי שמן ב׳הם כפי 4Q266 f9ii:3  יגואלו בטמאת האדם לגאולי שמן ב׳הם כפי טמאת׳ם יטמא
8CD 12:17  טמאת׳ם יטמא הנוגע ב׳ם  ׃   וכל כלי מסמר מסמר או יתד בכותל 4Q266 f9ii:4  הנוגע ב׳ם  ׃ וכול כלי מסמר ויתד בכותל אשר יהיו עם
9CD 13:13  מבני המחנה להביא איש אל העדה זולת פי המבקר אשר למחנה  ׃ 4Q267 f9iv:10  ימשול איש מכול ? בני המחנה להביא איש אל העדה
10CD 14:6  שלושת׳ם והגר רביע  ׃ וכן ישבו וכן ישאלו לכל  ׃ והכהן אשר יפקד 4Q269 f10ii:11  ישראל שלשיים והגר רביע  ׃ וכן ישבו וכן ישאלו לכול  ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, start=1, end=10, fmt=\"layout-orig-full\", withPassage=\"1 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# More research\n", "\n", "Let's find out which lines have the most correspondences." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "16114 out of 52895 lines have at least one similar line\n" ] } ], "source": [ "parallels = {}\n", "\n", "for (ln, m) in similarity:\n", " parallels.setdefault(ln, set()).add(m)\n", " parallels.setdefault(m, set()).add(ln)\n", "\n", "print(f\"{len(parallels)} out of {nLines} lines have at least one similar line\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "rankedParallels = sorted(\n", " parallels.items(),\n", " key=lambda x: (-len(x[1]), x[0]),\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 317 siblings of 1554667 = ε # ם והב # # ל # # # # # # # # # ε = -- \\M whb\\\\l\\\\\\ \\\\\\\\\\\\ -- \n", " 291 siblings of 1565610 = ε ותי׳כם ε = -- wty/kM -- \n", " 291 siblings of 1569619 = ε # ותי׳הם # ε ׃ = -- \\wty/hM \\ -- . \n", " 291 siblings of 1578909 = ε # # # ותי׳כה ε ׃ = -- \\ \\ \\wty/kh -- . 
\n", " 291 siblings of 1579081 = ε # ותי׳נו ε = -- \\wty/nw -- \n", " 190 siblings of 1555321 = ε ירים למ # ε = -- yryM lm\\ -- \n", " 190 siblings of 1577062 = ε ות׳ם לה # ε = -- wt/M lh\\ -- \n", " 190 siblings of 1582371 = ε # ין ל׳הון ε = -- \\yN l/hwN -- \n", " 181 siblings of 1554556 = ε # # # # ם וכול # #   # # = -- \\\\\\\\M wkwl \\\\ □\\\\ \n", " 181 siblings of 1559975 = ε ין וכל ε = -- yN wkl -- \n" ] } ], "source": [ "for (ln, paras) in rankedParallels[0:10]:\n", " print(\n", " f'{len(paras):>4} siblings of {ln} = {T.text(ln)} = {T.text(ln, fmt=\"text-source-full\", descend=True)}'\n", " )" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 102 siblings of ε ם והפריח ε = -- M whpryj -- \n", " 102 siblings of וצואהוא # ε = wxwahwa \\ -- \n", " 102 siblings of ε # כרם וה   ׃ = -- \\ krM wh □ . \n", " 102 siblings of יחדו וית # ε = yjdw wyt\\ -- \n", " 102 siblings of וכ # ל ε ׳כה = wk\\l -- /kh \n", " 102 siblings of ε ים ואיכה = -- yM waykh \n", " 102 siblings of ε # ותוצאת ε = -- \\ wtwxat -- \n", " 102 siblings of וי # # חוץ ו # ε = wy\\ \\jwX w\\ -- \n", " 102 siblings of ε י׳הם ודב ε = -- y/hM wdb -- \n", " 102 siblings of ε ת ומנינ # ε ׃ = -- t wmnyn\\ -- . 
\n" ] } ], "source": [ "for (ln, paras) in rankedParallels[100:110]:\n", " print(\n", " f'{len(paras):>4} siblings of {T.text(ln)} = {T.text(ln, fmt=\"text-source-full\", descend=True)}'\n", " )" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 45 siblings of ε ב׳כה ובתורה ε = -- b/kh wbtwrh -- \n", " 45 siblings of ε ים אשר ε = -- yM aCr -- \n", " 45 siblings of אלוהים לכול ε = alwhyM lkwl -- \n", " 45 siblings of ובבינת ε = wbbynt -- \n", " 45 siblings of ε ית׳כה אשר ε = -- yt/kh aCr -- \n", " 45 siblings of ε # י׳כה אשר ε = -- \\y/kh aCr -- \n", " 45 siblings of ובעשרין ε = wboCryN -- \n", " 45 siblings of ε ים אשר ε ׃ ╱ = -- yM aCr -- . ╱ \n", " 44 siblings of ε ובעדת׳נו   ε ׃ = -- wbodt/nw □ -- . \n", " 44 siblings of ε לכול עולמים ε ׃ = -- lkwl owlmyM -- . \n" ] } ], "source": [ "for (ln, paras) in rankedParallels[500:510]:\n", " print(\n", " f'{len(paras):>4} siblings of {T.text(ln)} = {T.text(ln, fmt=\"text-source-full\", descend=True)}'\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And how many lines have just one correspondence?\n", "\n", "We look at the tail of `rankedParallels`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 7426 exclusively parallel pairs of lines\n" ] } ], "source": [ "pairs = [(x, list(paras)[0]) for (x, paras) in rankedParallels if len(paras) == 1]\n", "print(f\"There are {len(pairs)} exclusively parallel pairs of lines\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 69\n" ] }, { "data": { "text/html": [ "
CD 1:1    ועתה שמעו כל יודעי צדק ובינו במעשי
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q268 f1:9  ועתה שמעו ל׳י כול יודעי צדק ובינו במעשי אל  ׃ כי ריב
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 85\n" ] }, { "data": { "text/html": [ "
CD 1:3  כי במועל׳ם אשר עזבו׳הו הסתיר פני׳ו מישראל וממקדש׳ו
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2i:8  בכול מנאצ׳ו  ׃ כי במעל׳ם אשר עזבו׳הו הסתיר פני׳ו מישראל וממקדש׳ו
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 83\n" ] }, { "data": { "text/html": [ "
CD 1:4  ויתנ׳ם לחרב  ׃ ובזכר׳ו ברית ראשנים השאיר שאירית
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2i:9  ויתנ׳ם לחרב  ׃ ובזכר׳ו ברית רישונים השאיר שארית לישראל ולא
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 67\n" ] }, { "data": { "text/html": [ "
CD 1:5  לישראל ולא נתנ׳ם לכלה  ׃ ובקץ חרון שנים שלוש מאות
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2i:10  נתנ׳ם לכלה  ׃ ובקץ חרון שנים שלוש מאות ותשעים לתת׳ו אות׳ם ביד
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 83\n" ] }, { "data": { "text/html": [ "
CD 1:7  פקד׳ם  ׃ ויצמח מישראל ומאהרן שורש מטעת לירוש
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q268 f1:14  מלך בבל פקד׳ם  ׃ ויצמח מישראל ומאהרון שורש מטעת לירוש
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 73\n" ] }, { "data": { "text/html": [ "
CD 1:9  אנשים אשימים הם  ׃ ויהיו כעורים וכימגששים דרך
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q268 f1:16  אשמים המה  ׃ ויהיו כעוורים וכמגששים דרך שנים עשרים  ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 62\n" ] }, { "data": { "text/html": [ "
CD 1:10  שנים עשרים  ׃ ויבן אל אל מעשי׳הם כי בלב שלם דרשו׳הו
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2i:14  ויבן אל אל מעשי׳הם כי בלב שלם דרשו׳הו ויקם ל׳הם מורה צדק
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 64\n" ] }, { "data": { "text/html": [ "
CD 1:19  לפרצות ויבחרו בטוב הצואר ויצדיקו רשע וירשיעו צדיק  ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2i:22  ויבחרו במהתלות ויצפו לפרצות ויבחרו בטוב הצור ויצדיקו
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 79\n" ] }, { "data": { "text/html": [ "
CD 2:1  אל בעדת׳ם להשם את כל המונ׳ם ומעשי׳הם לנדה לפני׳ו  ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2ii:1  ויחר אף אל בעדת׳ם להשם את כול המונ׳ם ומעשי׳הם לנדה
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "---\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "similarity 79\n" ] }, { "data": { "text/html": [ "
CD 2:2    ועתה שמעו אל׳י כל באי ברית ואגלה אזנ׳כם בדרכי
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
4Q266 f2ii:2  לפנ׳ו  ׃   ועתה שמעו אל׳י כול באי ברית ואגלה אזנ׳כם בדרכי רשעים
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for (x, y) in pairs[0:10]:\n", "    A.dm(\"---\\n\")\n", "    # the similarity dict holds each pair in one order only, so try both\n", "    sim = similarity.get((x, y), similarity.get((y, x)))\n", "    print(f\"similarity {sim}\")\n", "    A.plain(x, fmt=\"layout-orig-full\")\n", "    A.plain(y, fmt=\"layout-orig-full\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why not make an overview of how widespread parallel lines are?\n", "\n", "We count how many lines have how many parallels. Here n is the size of the group formed by a line and its parallels, so n counts the line itself as well." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 445 lines have n > 100 sisters\n", " 720 lines have 20 < n <= 50 sisters\n", "1047 lines have 10 < n <= 20 sisters\n", "6476 lines have 2 < n <= 10 sisters\n", "7426 lines have n <= 2 sisters\n" ] } ], "source": [ "parallelCount = collections.Counter()\n", "\n", "buckets = (2, 10, 20, 50, 100)\n", "# separate key for sizes above the last bucket, so that they are not lumped in with it\n", "overflow = buckets[-1] + 1\n", "\n", "bucketRep = {}\n", "prevBucket = None\n", "for bucket in buckets:\n", "    if prevBucket is None:\n", "        bucketRep[bucket] = f\" n <= {bucket:>3}\"\n", "    else:\n", "        bucketRep[bucket] = f\"{prevBucket:>3} < n <= {bucket:>3}\"\n", "    prevBucket = bucket\n", "bucketRep[overflow] = f\" n > {buckets[-1]:>3}\"\n", "\n", "for (ln, paras) in rankedParallels:\n", "    clusterSize = len(paras) + 1\n", "    theBucket = overflow\n", "    for bucket in buckets:\n", "        if clusterSize <= bucket:\n", "            theBucket = bucket\n", "            break\n", "    parallelCount[theBucket] += 1\n", "\n", "for (bucket, amount) in sorted(\n", "    parallelCount.items(),\n", "    key=lambda x: (-x[0], x[1]),\n", "):\n", "    print(f\"{amount:>4} lines have {bucketRep[bucket]} sisters\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cluster the lines\n", "\n", "Let's see whether we can group the similar lines into clusters of lines that are similar to each other." 
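, "\n", "\n", "One simple way to do this is greedy first-fit clustering, which is also the idea behind the `makeClusters` function further down. The sketch below is a plain-Python illustration, not the notebook's own implementation. It assumes a toy dictionary `edges` with hypothetical line nodes and made-up percentages, and the helper names `strongSisters` and `greedyClusters` are invented here. A line joins the first cluster in which more than `clusterThreshold` of the members are strongly similar to it; otherwise it starts a new cluster.

```python
# Hypothetical toy data: (line node, line node) -> similarity percentage.
edges = {
    (1, 2): 90,
    (1, 3): 85,
    (2, 3): 88,  # lines 1, 2, 3 are mutually similar
    (4, 5): 95,  # lines 4 and 5 form a pair
    (3, 6): 65,  # too weak to count as similar
}


def strongSisters(edges, line, simThreshold):
    # lines similar to line in either direction, above the threshold
    result = set()
    for ((a, b), s) in edges.items():
        if s > simThreshold:
            if a == line:
                result.add(b)
            elif b == line:
                result.add(a)
    return result


def greedyClusters(edges, simThreshold=80, clusterThreshold=0.4):
    # domain: every line with at least one strongly similar line
    domain = set()
    for ((a, b), s) in edges.items():
        if s > simThreshold:
            domain.update((a, b))

    clusters = []
    for line in sorted(domain):
        sisters = strongSisters(edges, line, simThreshold)
        for cl in clusters:
            # join the first cluster that overlaps enough with the sisters
            if len(cl & sisters) > clusterThreshold * len(cl):
                cl.add(line)
                break
        else:
            # no suitable cluster found: start a new one
            clusters.append({line})
    return clusters


print(greedyClusters(edges))  # [{1, 2, 3}, {4, 5}]
```

The outcome depends on the order in which lines are visited, which is why the sketch processes them in sorted order; a different order can produce different clusters."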
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From now on we ignore the exact level of similarity and only ask whether two lines are \"similar\", meaning that their\n", "similarity exceeds `SIMILARITY_THRESHOLD`, which is a percentage just like the `sim` values." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "SIMILARITY_THRESHOLD = 80  # sim values are percentages (60-100)\n", "CLUSTER_THRESHOLD = 0.4  # fraction of a cluster that a line must be similar to\n", "\n", "\n", "def makeClusters():\n", "    # determine the domain: all lines with at least one sufficiently similar line\n", "    domain = set()\n", "    for ln in allLines:\n", "        for (m, s) in E.sim.f(ln):\n", "            if s > SIMILARITY_THRESHOLD:\n", "                domain.add(ln)\n", "                domain.add(m)\n", "\n", "    A.indent(reset=True)\n", "    chunkSize = 1000\n", "    b = 0\n", "    j = 0\n", "    clusters = []\n", "    for ln in sorted(domain):\n", "        j += 1\n", "        b += 1\n", "        if b == chunkSize:\n", "            b = 0\n", "            A.info(f\"{j:>5} lines and {len(clusters):>5} clusters\")\n", "        lSisters = {x[0] for x in E.sim.b(ln) if x[1] > SIMILARITY_THRESHOLD}\n", "        lAdded = False\n", "        # join the first cluster that overlaps enough with the sisters of this line\n", "        for cl in clusters:\n", "            if len(cl & lSisters) > CLUSTER_THRESHOLD * len(cl):\n", "                cl.add(ln)\n", "                lAdded = True\n", "                break\n", "        if not lAdded:\n", "            clusters.append({ln})\n", "    A.info(f\"{j:>5} lines and {len(clusters)} clusters\")\n", "    return clusters" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.08s 1000 lines and 811 clusters\n", " 0.27s 2000 lines and 1540 clusters\n", " 0.61s 3000 lines and 2432 clusters\n", " 1.09s 4000 lines and 3298 clusters\n", " 1.72s 5000 lines and 4114 clusters\n", " 2.23s 5736 lines and 4688 clusters\n" ] } ], "source": [ "clusters = makeClusters()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How are the clusters distributed by size, i.e. how many similar lines does each cluster contain?\n", "We count them." 
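, "\n", "\n", "Besides listing every exact size, it can help to group the clusters into size bands. This is a minimal sketch that assumes band boundaries `BOUNDS`, chosen here to echo the size ranges of the exercise further down; they are illustrative, not derived from the data. `sizeBands` and `bandLabel` are hypothetical helper names. The open-ended last band gets its own index, so sizes above the last bound are never lumped in with the band below.

```python
import bisect
import collections

# Inclusive upper bounds of the size bands (illustrative choice);
# sizes above the last bound fall into an extra, open-ended band.
BOUNDS = (1, 3, 13)


def bandLabel(i):
    # human-readable label for band index i
    if i == 0:
        return f'n <= {BOUNDS[0]}'
    if i == len(BOUNDS):
        return f'n > {BOUNDS[-1]}'
    return f'{BOUNDS[i - 1]} < n <= {BOUNDS[i]}'


def sizeBands(clusters):
    # bisect_left gives the index of the first bound >= the cluster size,
    # or len(BOUNDS) when the size exceeds every bound
    counts = collections.Counter(
        bisect.bisect_left(BOUNDS, len(cl)) for cl in clusters
    )
    return {bandLabel(i): counts[i] for i in range(len(BOUNDS) + 1)}


toy = [{1}, {2}, {3, 4}, {5, 6, 7}, set(range(10, 15)), set(range(20, 40))]
print(sizeBands(toy))
# {'n <= 1': 2, '1 < n <= 3': 2, '3 < n <= 13': 1, 'n > 13': 1}
```

Applied to the real data, `sizeBands(clusters)` would summarise the output of `makeClusters` in four lines."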
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "clusters of size 30: 1\n", "clusters of size 21: 1\n", "clusters of size 15: 1\n", "clusters of size 14: 1\n", "clusters of size 10: 2\n", "clusters of size 8: 1\n", "clusters of size 7: 3\n", "clusters of size 6: 6\n", "clusters of size 5: 12\n", "clusters of size 4: 26\n", "clusters of size 3: 183\n", "clusters of size 2: 407\n", "clusters of size 1: 4044\n" ] } ], "source": [ "clusterSizes = collections.Counter()\n", "\n", "for cl in clusters:\n", "    clusterSizes[len(cl)] += 1\n", "\n", "for (size, amount) in sorted(\n", "    clusterSizes.items(),\n", "    key=lambda x: (-x[0], x[1]),\n", "):\n", "    print(f\"clusters of size {size:>4}: {amount:>5}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interesting groups\n", "\n", "Exercise: investigate some interesting groups of clusters in these sweet spots:\n", "\n", "* the biggest clusters: more than 13 members\n", "* the medium clusters: 4 to 13 members\n", "* the small clusters: 2 or 3 members" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "All chapters:\n", "\n", "* **[start](start.ipynb)** get started with this corpus\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **similar Lines** spot the similarities between lines\n", "\n", "---\n", "\n", "See the [cookbook](cookbook) for recipes for small, concrete tasks.\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }