{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial\n", "\n", "This notebook gets you started with using\n", "[Text-Fabric](https://annotation.github.io/text-fabric/) to work with the Dead Sea Scrolls.\n", "\n", "Familiarity with the underlying\n", "[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)\n", "is recommended." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cookbook\n", "\n", "This tutorial and its sister tutorials are meant to showcase most of the things TF can do.\n", "\n", "But we also have a [cookbook](cookbook) with a set of focused recipes on tricky things." ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## Installing Text-Fabric\n", "\n", "See the [installation instructions](https://annotation.github.io/text-fabric/tf/about/install.html)." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Tip\n", "If you start computing with this tutorial, first copy its parent directory to somewhere else,\n", "outside your repository.\n", "If you pull changes from the repository later, your work will not be overwritten.\n", "Where you put your tutorial directory is up to you.\n", "It will work from any directory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "Text-Fabric will fetch the data set for you from GitHub and check for updates.\n", "\n", "The data will be stored in the `text-fabric-data` directory in your home directory." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Features\n", "The data of the corpus is organized in features.\n", "They are *columns* of data.\n", "Think of the corpus as a gigantic spreadsheet, where row 1 corresponds to the\n", "first sign, row 2 to the second sign, and so on, for all ~1.5 M signs,\n", "followed by ~500 K word nodes and another ~200 K nodes of other types.\n", "\n", "The reading of each sign constitutes a column in that spreadsheet.\n", "The DSS corpus contains more than 50 such columns.\n", "\n", "Instead of putting all that information in one big table, the data is organized in separate columns.\n", "We call those columns **features**." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:16.202764Z", "start_time": "2018-05-18T09:17:16.197546Z" } }, "outputs": [], "source": [ "import os\n", "import collections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Incantation\n", "\n", "The simplest way to get going is by this *incantation*:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:17.537171Z", "start_time": "2018-05-18T09:17:17.517809Z" } }, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/etcbc/dss/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/dss/tf/0.9" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/dss/parallels/tf/0.9" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": 
"stream", "text": [ "This is Text-Fabric 9.3.1\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "67 features found and 1 ignored\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 9.3.1, etcbc/dss/app v3, Search Reference
Data: DSS, Character table, Feature docs
Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "
\n", " similarity between lines, as a percentage of the common material wrt the combined material\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2019-05-09
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2019-06-11T14:51:21Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
sourceCreatedBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
sourceCreatedDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
sourceDescription:
\n", "
Dead Sea Scrolls: biblical and non-biblical scrolls
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "\n", "
Dead Sea Scrolls\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "
\n", " space behind the word, if any\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:55Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
(space)
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "alt\n", "
\n", "
int
\n", "
\n", " alternative reading\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "biblical\n", "
\n", "
int
\n", "
\n", " whether we are in biblical material or not\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
applies:
\n", "
scroll fragment line cluster word
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
remark:
\n", "
for lines it means that the material is taken from the bib source while there is also material for this line in the nonbib source. But the nonbib material is either identical or virtually absent, in which case the bib material is a reconstruction and marked as such.
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1=biblical, 2=biblical but also with nonbiblical material
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "
\n", " acronym of the book in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
str
\n", "
\n", " label of the chapter in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cl\n", "
\n", "
str
\n", "
\n", " class (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:56Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
advb, art, artp, card, cmn, conj, gent, indp, intj, intr, mult, nega, objm, ord, prep, prp, rela, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cl2\n", "
\n", "
str
\n", "
\n", " class (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
d, h, n, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "cor\n", "
\n", "
int
\n", "
\n", " correction made by an ancient or modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 = modern, 2 = ancient, 3 = ancient supralinear
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fragment\n", "
\n", "
str
\n", "
\n", " label of a fragment of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "full\n", "
\n", "
str
\n", "
\n", " full transcription (Unicode) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fulle\n", "
\n", "
str
\n", "
\n", " full transcription (ETCBC transliteration) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:57Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "fullo\n", "
\n", "
str
\n", "
\n", " full transcription (original source) of a word including flags and brackets\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glex\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:58Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glexe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:59Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glexo\n", "
\n", "
str
\n", "
\n", " representation (original source) of a lexeme leaving out non-letters\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:01:59Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glyph\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:00Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glyphe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:02Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "glypho\n", "
\n", "
str
\n", "
\n", " representation (original source) of a word or sign\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:04Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "
\n", " gender (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
b, c, f, m, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn2\n", "
\n", "
str
\n", "
\n", " gender (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
c, f, m, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn3\n", "
\n", "
str
\n", "
\n", " gender (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
c, f, m
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gn_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "halfverse\n", "
\n", "
str
\n", "
\n", " label of the half-verse in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "intl\n", "
\n", "
int
\n", "
\n", " interlinear material, the value indicates the sequence number of the interlinear line\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lang\n", "
\n", "
str
\n", "
\n", " language of a word or sign, only if it is not Hebrew\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
g=greek, a=aramaic
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "
\n", " representation (Unicode) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:06Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lex_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:07Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lexe\n", "
\n", "
str
\n", "
\n", " representation (ETCBC transliteration) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:07Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "lexo\n", "
\n", "
str
\n", "
\n", " representation (original source) of a lexeme\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "line\n", "
\n", "
str
\n", "
\n", " label of a line of a fragment of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "md\n", "
\n", "
str
\n", "
\n", " mood (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
coho, cons, juss, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "merr\n", "
\n", "
str
\n", "
\n", " errors in parsing the morphology tag\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "morpho\n", "
\n", "
str
\n", "
\n", " morphological tag (by Abegg)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:08Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nr\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "
\n", " number (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
d, p, s, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu2\n", "
\n", "
str
\n", "
\n", " number (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
p, s, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu3\n", "
\n", "
str
\n", "
\n", " number (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
s
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "nu_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:09Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: biblical and non-biblical scrolls\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "
\n", " person (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps2\n", "
\n", "
str
\n", "
\n", " person (for part 2) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps3\n", "
\n", "
str
\n", "
\n", " person (for part 3) (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1, 2, 3
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "ps_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:10Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "
\n", " trailing punctuation (Unicode) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punce\n", "
\n", "
str
\n", "
\n", " trailing punctuation (ETCBC transliteration) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "
\n", " trailing punctuation (original source) of a word\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "rec\n", "
\n", "
int
\n", "
\n", " reconstructed by a modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:11Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "rem\n", "
\n", "
int
\n", "
\n", " removed by an ancient or modern editor\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 = modern, 2 = ancient
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "script\n", "
\n", "
str
\n", "
\n", " script in which the word or sign is written if it is not Hebrew\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
paleohebrew greekcapital
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "scroll\n", "
\n", "
str
\n", "
\n", " acronym of a scroll\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "
\n", " part of speech (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
adjv, numr, pron, ptcl, subs, suff, unknown, verb
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "sp_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "srcLn\n", "
\n", "
int
\n", "
\n", " the line number of the word in the source data file\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:12Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "
\n", " state (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:13Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
a, c, d, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "
\n", " type of sign or cluster\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:13Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "unc\n", "
\n", "
int
\n", "
\n", " uncertain material in various degrees: higher degree is less certain\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1 2 3 4
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vac\n", "
\n", "
int
\n", "
\n", " empty, unwritten space\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
1
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
str
\n", "
\n", " label of the verse in which the word occurs\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "
\n", " verbal stem (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
aphel, apoel, haphel, hifil, hishtafel, hishtaphel, hithaphel, hithpaal, hithpeel, hithpolel, hitopel, hitpael, hitpalpel, hitpoel, hofal, hophal, hotpaal, hpealal, ishtaphel, ithpaal, ithpeel, ithpoel, nifal, nitpael, pael, palel, passive, peal, peil, piel, pilpel, poal, poel, polal, polel, pual, pulal, qal, shaphel, tifil, unknown
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vs_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:15Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "
\n", " verbal tense/aspect (morphology tag)\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:16Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
values:
\n", "
impf, impv, infa, infc, perf, ptca, ptcp, unknown, wayy
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "vt_etcbc\n", "
\n", "
str
\n", "
\n", " Dead Sea Scrolls: additions based on BHSA and machine learning\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss-additions
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Martijn Naaijer, ETCBC
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2020
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:16Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martijn Naaijer's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "occ\n", "
\n", "
none
\n", "
\n", " edge feature from a lexeme to its occurrences\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:17Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "
\n", " Dead Sea Scrolls: biblical and non-biblical scrolls\n", "
\n", "\n", "
\n", "
acronym:
\n", "
dss
\n", "
\n", "\n", "
\n", "
convertedBy:
\n", "
Jarod Jacobs, Martijn Naaijer and Dirk Roorda
\n", "
\n", "\n", "
\n", "
createdBy:
\n", "
Martin G. Abegg, Jr., James E. Bowley, and Edward M. Cook
\n", "
\n", "\n", "
\n", "
createdDate:
\n", "
2015
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-12-29T15:02:17Z
\n", "
\n", "\n", "
\n", "
license:
\n", "
Creative Commons Attribution-NonCommercial 4.0 International License
\n", "
\n", "\n", "
\n", "
licenseUrl:
\n", "
http://creativecommons.org/licenses/by-nc/4.0/
\n", "
\n", "\n", "
\n", "
source:
\n", "
Martin Abegg's data files, personal communication
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"etcbc/dss\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see which features have been loaded, and if you click on a feature name, you find its documentation.\n", "If you hover over a name, you see where the feature is located on your system." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API\n", "\n", "The result of the incantation is that we have a bunch of special variables at our disposal\n", "that give us access to the text and data of the corpus.\n", "\n", "At this point it is helpful to take a quick glance at the Text-Fabric API documentation\n", "(see the links under **API Members** above).\n", "\n", "The most essential thing for now is that we can use `F` to access the data in the features\n", "we've loaded.\n", "But there is more, such as `N`, which helps us to walk over the text, as we will see in a minute.\n", "\n", "The **API members** above show you exactly which new names have been inserted in your namespace.\n", "If you click on these names, you go to the API documentation for them."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Search\n", "Text-Fabric contains a flexible search engine that works not only for the data\n", "of this corpus, but also for other corpora and for data that you add to corpora.\n", "\n", "**Search is the quickest way to come up to speed with your data, without too much programming.**\n", "\n", "Jump to the dedicated [search](search.ipynb) tutorial first, to whet your appetite.\n", "\n", "The real power of search lies in the fact that it is integrated in a programming environment.\n", "You can use programming to:\n", "\n", "* compose dynamic queries\n", "* process query results\n", "\n", "Therefore, the rest of this tutorial is still important when you want to tap that power.\n", "If you continue here, you learn all the basics of data-navigation with Text-Fabric." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Counting\n", "\n", "In order to get acquainted with the data, we start with the simple task of counting.\n", "\n", "## Count all nodes\n", "We use the\n", "[`N.walk()` generator](https://annotation.github.io/text-fabric/tf/core/nodes.html#tf.core.nodes.Nodes.walk)\n", "to walk through the nodes.\n", "\n", "We compared the TF data to a gigantic spreadsheet, where the rows correspond to the signs.\n", "In Text-Fabric, we call the rows `slots`, because they are the textual positions that can be filled with signs.\n", "\n", "We also mentioned that there are other textual objects.\n", "They are the words, clusters, lines, fragments and scrolls, and also lexemes, clauses and phrases.\n", "They also correspond to rows in the big spreadsheet.\n", "\n", "In Text-Fabric we call all these rows *nodes*, and the `N.walk()` generator\n", "carries us through those nodes in the textual order.\n", "\n", "Just one extra thing: the `info` statements generate timed messages.\n", "If you use them instead of `print`, you'll get a sense of the amount of time that\n", "the various processing steps typically need."
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:43.894153Z", "start_time": "2018-05-18T09:17:43.597128Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Counting nodes ...\n", " 0.26s 2108303 nodes\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"Counting nodes ...\")\n", "\n", "i = 0\n", "for n in N.walk():\n", "    i += 1\n", "\n", "A.info(\"{} nodes\".format(i))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here you see it: over 2M nodes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What are those nodes?\n", "Every node has a type, such as sign, word, line or scroll.\n", "But what exactly are they?\n", "\n", "Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.\n", "`otype` tells you the type of each node, and you can ask it for the number of `slot`s in the text.\n", "\n", "Here we go!" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:47.820323Z", "start_time": "2018-05-18T09:17:47.812328Z" } }, "outputs": [ { "data": { "text/plain": [ "'sign'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.slotType" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:48.549430Z", "start_time": "2018-05-18T09:17:48.543371Z" } }, "outputs": [ { "data": { "text/plain": [ "1430241" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxSlot" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:49.251302Z", "start_time": "2018-05-18T09:17:49.244467Z" } }, "outputs": [ { "data": { "text/plain": [ "2108303" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.maxNode" ] }, {
"cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:49.922863Z", "start_time": "2018-05-18T09:17:49.916078Z" } }, "outputs": [ { "data": { "text/plain": [ "('scroll',\n", " 'lex',\n", " 'fragment',\n", " 'line',\n", " 'clause',\n", " 'cluster',\n", " 'phrase',\n", " 'word',\n", " 'sign')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.otype.all" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:51.782779Z", "start_time": "2018-05-18T09:17:51.774167Z" } }, "outputs": [ { "data": { "text/plain": [ "(('scroll', 1428.8121878121879, 1605868, 1606868),\n", " ('lex', 129.1396172248804, 1542523, 1552972),\n", " ('fragment', 127.90565194061885, 1531341, 1542522),\n", " ('line', 27.03924756593251, 1552973, 1605867),\n", " ('clause', 12.848, 2107864, 2107988),\n", " ('cluster', 6.678582379647672, 1430242, 1531340),\n", " ('phrase', 5.098412698412698, 2107989, 2108303),\n", " ('word', 2.814359424744758, 1606869, 2107863),\n", " ('sign', 1, 1, 1430241))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.levels.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is interesting: above you see all the types of textual objects, with the average size of their objects (in slots),\n", "the node where they start, and the node where they end." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Count individual object types\n", "This is an intuitive way to count the number of nodes of each type.\n", "Note in passing how we use `indent` in conjunction with `info` to produce neat timed\n", "and indented progress messages."
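, "\n", "\n", "As an aside: in this data set each node type occupies one contiguous block of node numbers (compare the start and end nodes in the `C.levels.data` output above), so the same counts can also be computed arithmetically, without visiting any node.\n", "The next cell is a minimal plain-Python sketch of that idea. Its `levels` list is copied by hand from that output rather than read from Text-Fabric, so the cell does not depend on the loaded corpus.\n", "The cell after it does the counting the intuitive way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# (node type, first node, last node), copied by hand from the C.levels.data output above\n",
"levels = [\n",
"    (\"scroll\", 1605868, 1606868),\n",
"    (\"lex\", 1542523, 1552972),\n",
"    (\"fragment\", 1531341, 1542522),\n",
"    (\"line\", 1552973, 1605867),\n",
"    (\"clause\", 2107864, 2107988),\n",
"    (\"cluster\", 1430242, 1531340),\n",
"    (\"phrase\", 2107989, 2108303),\n",
"    (\"word\", 1606869, 2107863),\n",
"    (\"sign\", 1, 1430241),\n",
"]\n",
"\n",
"# a contiguous block from start to end (both inclusive) holds end - start + 1 nodes\n",
"counts = {otype: end - start + 1 for (otype, start, end) in levels}\n",
"\n",
"for (otype, amount) in counts.items():\n",
"    print(f\"{amount:>7} {otype}s\")\n",
"\n",
"# together the blocks cover every node, so this should equal F.otype.maxNode\n",
"print(sum(counts.values()))"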
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:17:57.806821Z", "start_time": "2018-05-18T09:17:57.558523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting objects ...\n", " | 0.00s 1001 scrolls\n", " | 0.00s 10450 lexs\n", " | 0.00s 11182 fragments\n", " | 0.01s 52895 lines\n", " | 0.00s 125 clauses\n", " | 0.01s 101099 clusters\n", " | 0.00s 315 phrases\n", " | 0.05s 500995 words\n", " | 0.13s 1430241 signs\n", " 0.20s Done\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"counting objects ...\")\n", "\n", "for otype in F.otype.all:\n", "    i = 0\n", "\n", "    A.indent(level=1, reset=True)\n", "\n", "    for n in F.otype.s(otype):\n", "        i += 1\n", "\n", "    A.info(\"{:>7} {}s\".format(i, otype))\n", "\n", "A.indent(level=0)\n", "A.info(\"Done\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Viewing textual objects\n", "\n", "You can use the `A` API (the extra power) to display the text of the scrolls.\n", "\n", "See the [display](display.ipynb) tutorial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature statistics\n", "\n", "`F`\n", "gives access to all features.\n", "Every feature has a method\n", "`freqList()`\n", "to generate a frequency list of its values, higher frequencies first.\n", "Here are the parts of speech:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('ptcl', 154464),\n", " ('subs', 108562),\n", " ('unknown', 80256),\n", " ('verb', 58873),\n", " ('suff', 45747),\n", " ('adjv', 10633),\n", " ('numr', 6526),\n", " ('pron', 5784))" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.sp.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Signs, words and clusters have types. 
We can count them separately:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('rec', 93733),\n", " ('vac', 3522),\n", " ('cor3', 1582),\n", " ('unc2', 906),\n", " ('rem2', 706),\n", " ('alt', 333),\n", " ('cor2', 147),\n", " ('cor', 95),\n", " ('rem', 75))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList(\"cluster\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('glyph', 470605), ('punct', 29927), ('numr', 463))" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList(\"word\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:18.039544Z", "start_time": "2018-05-18T09:18:17.784073Z" } }, "outputs": [ { "data": { "text/plain": [ "(('cons', 1156780),\n", " ('empty', 98407),\n", " ('missing', 53864),\n", " ('sep', 46453),\n", " ('punct', 29927),\n", " ('unc', 27168),\n", " ('term', 15532),\n", " ('numr', 2029),\n", " ('add', 65),\n", " ('foreign', 16))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.type.freqList(\"sign\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Word matters\n", "\n", "## Top 20 most frequent words\n", "\n", "We represent words by their essential symbols, collected in the feature *glyph* (which also exists for signs)."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "45393 ו\n", "20491 ה\n", "19378 ל\n", "18225 ב\n", " 6389 את\n", " 5863 מ\n", " 4894 אשר\n", " 4789 יהוה\n", " 4355 א\n", " 4236 כול\n", " 4185 על\n", " 4172 אל\n", " 3262 כי\n", " 3091 כ\n", " 3005 לא\n", " 2841 כל\n", " 2424 לוא\n", " 1938 ארץ\n", " 1829 ישראל\n", " 1653 יום\n" ] } ], "source": [ "for (w, amount) in F.glyph.freqList(\"word\")[0:20]:\n", "    print(f\"{amount:>5} {w}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Word distribution\n", "\n", "Let's do a bit more fancy word stuff.\n", "\n", "### Hapaxes\n", "\n", "A hapax is a word that occurs only once, so hapaxes can be found by picking the words with frequency 1.\n", "We do have lexeme information in this corpus, so let's use it for determining hapaxes.\n", "\n", "We print the first 20 hapaxes." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3813" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hapaxes1 = sorted(lx for (lx, amount) in F.lex.freqList(\"word\") if amount == 1)\n", "len(hapaxes1)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " # # # # # \n", " # # # # # # # # # \n", " # # # # # ות\n", " # # # # # ל # # # \n", " # # # # # ם\n", " # # # # ב\n", " # # # # ה\n", " # # # # ו # \n", " # # # # ך\n", " # # # # ל # # \n", " # # # # תא\n", " # # # ד\n", " # # # דב\n", " # # # דה\n", " # # # ה # # \n", " # # # הו\n", " # # # הם\n", " # # # ות\n", " # # # ט\n", " # # # כת\n" ] } ], "source": [ "for lx in hapaxes1[0:20]:\n", "    print(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to find lexemes with only one occurrence is to use the `occ` edge feature from lexeme nodes to the word nodes of\n", "its occurrences."
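, "\n", "\n", "Both approaches should find exactly the same lexemes. The next cell illustrates this in principle on a hypothetical toy corpus in plain Python: `toyWordLex` (word to lexeme) stands in for the `lex` feature, and `toyOcc` (lexeme to words) plays the role of the `occ` edge. These names are made up for the sketch and are not part of Text-Fabric." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import collections\n",
"\n",
"# hypothetical toy corpus: word node -> lexeme\n",
"toyWordLex = {1: \"a\", 2: \"b\", 3: \"a\", 4: \"c\", 5: \"b\", 6: \"d\"}\n",
"\n",
"# way 1: frequency list of the lexemes of all words, keep the lexemes that occur once\n",
"toyFreq = collections.Counter(toyWordLex.values())\n",
"toyHapaxes1 = sorted(lx for (lx, amount) in toyFreq.items() if amount == 1)\n",
"\n",
"# way 2: map each lexeme to its word occurrences (like the occ edge), keep lexemes with one occurrence\n",
"toyOcc = collections.defaultdict(list)\n",
"for (w, lx) in toyWordLex.items():\n",
"    toyOcc[lx].append(w)\n",
"toyHapaxes2 = sorted(lx for (lx, ws) in toyOcc.items() if len(ws) == 1)\n",
"\n",
"print(toyHapaxes1, toyHapaxes2)  # ['c', 'd'] ['c', 'd']"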
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3813" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hapaxes2 = sorted(F.lex.v(lx) for lx in F.otype.s(\"lex\") if len(E.occ.f(lx)) == 1)\n", "len(hapaxes2)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " # # # # # \n", " # # # # # # # # # \n", " # # # # # ות\n", " # # # # # ל # # # \n", " # # # # # ם\n", " # # # # ב\n", " # # # # ה\n", " # # # # ו # \n", " # # # # ך\n", " # # # # ל # # \n", " # # # # תא\n", " # # # ד\n", " # # # דב\n", " # # # דה\n", " # # # ה # # \n", " # # # הו\n", " # # # הם\n", " # # # ות\n", " # # # ט\n", " # # # כת\n" ] } ], "source": [ "for lx in hapaxes2[0:20]:\n", "    print(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The feature `lex` contains lexemes that may have uncertain characters in them.\n", "\n", "The feature `glex` has all those characters stripped.\n", "Let's use `glex` instead."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hapaxes1g = sorted(lx for (lx, amount) in F.glex.freqList(\"word\") if amount == 1)\n", "len(hapaxes1g)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100\n", "115\n", "126\n", "150\n", "300\n", "32\n", "350\n", "50\n", "52\n", "536\n", "54\n", "61\n", "65\n", "66\n", "67\n", "71\n", "83\n", "92\n", "99\n", " ידה\n" ] } ], "source": [ "for lx in hapaxes1g[0:20]:\n", "    print(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we are not interested in the numerals:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ידה\n", " לוט\n", " נַחַל\n", " שֵׂעָר\n", "ֶ\n", "אֱגֹוז\n", "אֱלִידָד\n", "אֱלִיעָם\n", "אֱלִישֶׁבַע\n", "אֲבִיאֵל\n", "אֲבִיטַל\n", "אֲבִיעֶזְרִי\n", "אֲבִיעֶזֶר\n", "אֲבִישׁוּעַ\n", "אֲבַטִּיחַ\n", "אֲגֹורָה\n", "אֲדַמְדַּם\n", "אֲדָר\n", "אֲדֹנִי\n", "אֲדֹנִיָּה\n" ] } ], "source": [ "for lx in [x for x in hapaxes1g if not x.isdigit()][0:20]:\n", "    print(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Small occurrence base\n", "\n", "The occurrence base of a word is the set of scrolls in which it occurs.\n", "\n", "We compute the occurrence base of each word, identifying words by their lexeme according to the `glex` feature."
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s compiling occurrence base ...\n", " 6.19s 8265 entries\n" ] } ], "source": [ "occurrenceBase1 = collections.defaultdict(set)\n", "\n", "A.indent(reset=True)\n", "A.info(\"compiling occurrence base ...\")\n", "for w in F.otype.s(\"word\"):\n", "    scroll = T.sectionFromNode(w)[0]\n", "    occurrenceBase1[F.glex.v(w)].add(scroll)\n", "A.info(f\"{len(occurrenceBase1)} entries\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow, that took a while!\n", "\n", "We looked up the scroll for each word.\n", "\n", "But there is another way:\n", "\n", "Start with scrolls, and iterate through their words." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s compiling occurrence base ...\n", " 0.42s done\n", " 0.42s 8265 entries\n" ] } ], "source": [ "occurrenceBase2 = collections.defaultdict(set)\n", "\n", "A.indent(reset=True)\n", "A.info(\"compiling occurrence base ...\")\n", "for s in F.otype.s(\"scroll\"):\n", "    scroll = F.scroll.v(s)\n", "    for w in L.d(s, otype=\"word\"):\n", "        occurrenceBase2[F.glex.v(w)].add(scroll)\n", "A.info(\"done\")\n", "A.info(f\"{len(occurrenceBase2)} entries\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Much better. Are the results equal?" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "occurrenceBase1 == occurrenceBase2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes."
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "occurrenceBase = occurrenceBase2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An overview of how many words have an occurrence base of a given size, most frequent sizes first:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "base size 1 : 2789 words\n", "base size 2 : 1109 words\n", "base size 3 : 692 words\n", "base size 4 : 462 words\n", "base size 5 : 335 words\n", "base size 6 : 256 words\n", "base size 7 : 219 words\n", "base size 8 : 182 words\n", "base size 9 : 177 words\n", "base size 10 : 122 words\n", "...\n", "base size 457 : 1 words\n", "base size 459 : 1 words\n", "base size 538 : 1 words\n", "base size 600 : 1 words\n", "base size 605 : 1 words\n", "base size 629 : 1 words\n", "base size 745 : 1 words\n", "base size 761 : 1 words\n", "base size 844 : 1 words\n", "base size 997 : 1 words\n" ] } ], "source": [ "occurrenceSize = collections.Counter()\n", "\n", "for (w, scrolls) in occurrenceBase.items():\n", "    occurrenceSize[len(scrolls)] += 1\n", "\n", "occurrenceSize = sorted(\n", "    occurrenceSize.items(),\n", "    key=lambda x: (-x[1], x[0]),\n", ")\n", "\n", "for (size, amount) in occurrenceSize[0:10]:\n", "    print(f\"base size {size:>4} : {amount:>5} words\")\n", "print(\"...\")\n", "for (size, amount) in occurrenceSize[-10:]:\n", "    print(f\"base size {size:>4} : {amount:>5} words\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give the predicate *private* to those words whose occurrence base is a single scroll."
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2789" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "privates = {w for (w, base) in occurrenceBase.items() if len(base) == 1}\n", "len(privates)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Peculiarity of scrolls\n", "\n", "As a final exercise with scrolls, let's make a list of all scrolls, and show for each of them\n", "\n", "* the number of distinct words (lexemes according to `glex`)\n", "* the number of private words among them\n", "* the percentage of private words: a measure of the peculiarity of the scroll" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:52.143337Z", "start_time": "2018-05-18T09:18:52.130385Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 0 empty scrolls\n", "Found 507 ordinary scrolls (i.e. without private words)\n" ] } ], "source": [ "scrollList = []\n", "\n", "empty = set()\n", "ordinary = set()\n", "\n", "for d in F.otype.s(\"scroll\"):\n", "    scroll = T.scrollName(d)\n", "    words = {F.glex.v(w) for w in L.d(d, otype=\"word\")}\n", "    a = len(words)\n", "    if not a:\n", "        empty.add(scroll)\n", "        continue\n", "    o = len({w for w in words if w in privates})\n", "    if not o:\n", "        ordinary.add(scroll)\n", "        continue\n", "    p = 100 * o / a\n", "    scrollList.append((scroll, a, o, p))\n", "\n", "scrollList = sorted(scrollList, key=lambda e: (-e[3], -e[1], e[0]))\n", "\n", "print(f\"Found {len(empty):>4} empty scrolls\")\n", "print(f\"Found {len(ordinary):>4} ordinary scrolls (i.e. 
without private words)\")" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:52.143337Z", "start_time": "2018-05-18T09:18:52.130385Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scroll #all #own %own\n", "-----------------------------------\n", "4Q341 32 21 65.6%\n", "4Q340 15 5 33.3%\n", "11Q26 6 2 33.3%\n", "4Q313a 3 1 33.3%\n", "4Q358 3 1 33.3%\n", "4Q347 10 3 30.0%\n", "4Q124 86 25 29.1%\n", "4Q282d 7 2 28.6%\n", "1Q70bis 11 3 27.3%\n", "1Q70 24 6 25.0%\n", "4Q346a 4 1 25.0%\n", "4Q357 4 1 25.0%\n", "1Q41 9 2 22.2%\n", "3Q15 269 58 21.6%\n", "4Q561 73 15 20.5%\n", "4Q559 129 26 20.2%\n", "4Q360a 20 4 20.0%\n", "1Q58 5 1 20.0%\n", "4Q250b 5 1 20.0%\n", "4Q468bb 5 1 20.0%\n", "...\n", "4Q427 343 2 0.6%\n", "4Q2 174 1 0.6%\n", "4Q366 185 1 0.5%\n", "4Q98 192 1 0.5%\n", "4Q56 963 5 0.5%\n", "4Q394 194 1 0.5%\n", "4Q59 404 2 0.5%\n", "4Q88 208 1 0.5%\n", "11Q20 429 2 0.5%\n", "4Q57 875 4 0.5%\n", "11Q11 222 1 0.5%\n", "4Q58 450 2 0.4%\n", "4Q174 241 1 0.4%\n", "4Q13 257 1 0.4%\n", "4Q524 280 1 0.4%\n", "4Q271 293 1 0.3%\n", "4Q84 350 1 0.3%\n", "4Q33 365 1 0.3%\n", "4Q428 385 1 0.3%\n", "1QpHab 463 1 0.2%\n" ] } ], "source": [ "print(\n", " \"{:<20}{:>5}{:>5}{:>5}\\n{}\".format(\n", " \"scroll\",\n", " \"#all\",\n", " \"#own\",\n", " \"%own\",\n", " \"-\" * 35,\n", " )\n", ")\n", "\n", "for x in scrollList[0:20]:\n", " print(\"{:<20} {:>4} {:>4} {:>4.1f}%\".format(*x))\n", "print(\"...\")\n", "for x in scrollList[-20:]:\n", " print(\"{:<20} {:>4} {:>4} {:>4.1f}%\".format(*x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tip\n", "\n", "See the [lexeme recipe](cookbook/lexeme.ipynb) in the cookbook for how you get from a lexeme node to\n", "its word occurrence nodes." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Locality API\n", "We travel upwards and downwards, forwards and backwards through the nodes.\n", "The Locality-API (`L`) provides functions: `u()` for going up, and `d()` for going down,\n", "`n()` for going to next nodes and `p()` for going to previous nodes.\n", "\n", "These directions are indirect notions: nodes are just numbers, but by means of the\n", "`oslots` feature they are linked to slots. One node *contains* another node if the first is linked to a set of slots that contains the set of slots that the other is linked to.\n", "And one node is next to, or previous to, another node if its slots immediately follow or precede the slots of the other one.\n", "\n", "`L.u(node)` **Up** is going to nodes that embed `node`.\n", "\n", "`L.d(node)` **Down** is the opposite direction, to those that are contained in `node`.\n", "\n", "`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.\n", "\n", "`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.\n", "\n", "All these functions yield nodes of all possible node types.\n", "By passing the optional `otype` parameter, you can restrict the results to nodes of that type.\n", "\n", "The results are ordered according to the order of things in the text.\n", "\n", "The functions always return a tuple, even if there is just one node in the result.\n", "\n", "## Going up\n", "We go from the first sign (node 1) to the scroll that contains it.\n", "Note the `[0]` at the end. You expect one scroll, yet `L` returns a tuple.\n", "To get the only element of that tuple, you need to do that `[0]`.\n", "\n", "If you are like me, you keep forgetting it, and that will lead to weird error messages later on."
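, "\n", "\n", "The four directions are defined purely in terms of slot sets. The next cell is a toy sketch of those definitions in plain Python, on a handful of made-up nodes; `toySlots` and the `toy*` functions are hypothetical names, and the sketch is not meant to show how `L` is implemented.\n", "It follows the definitions above literally, so containment is not strict (here `line2` and `word3` share the same slots and therefore contain each other), and it returns nodes in dictionary order rather than in the textual order that `L` uses." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# hypothetical toy nodes, each linked to a set of slots (the role oslots plays in Text-Fabric)\n",
"toySlots = {\n",
"    \"line1\": {1, 2, 3, 4},\n",
"    \"word1\": {1, 2},\n",
"    \"word2\": {3, 4},\n",
"    \"line2\": {5, 6},\n",
"    \"word3\": {5, 6},\n",
"}\n",
"\n",
"\n",
"def toyUp(node):\n",
"    # other nodes whose slot set contains the slot set of node\n",
"    return [m for m in toySlots if m != node and toySlots[m] >= toySlots[node]]\n",
"\n",
"\n",
"def toyDown(node):\n",
"    # other nodes whose slot set is contained in the slot set of node\n",
"    return [m for m in toySlots if m != node and toySlots[m] <= toySlots[node]]\n",
"\n",
"\n",
"def toyNext(node):\n",
"    # nodes whose first slot comes immediately after the last slot of node\n",
"    return [m for m in toySlots if min(toySlots[m]) == max(toySlots[node]) + 1]\n",
"\n",
"\n",
"def toyPrev(node):\n",
"    # nodes whose last slot comes immediately before the first slot of node\n",
"    return [m for m in toySlots if max(toySlots[m]) == min(toySlots[node]) - 1]\n",
"\n",
"\n",
"print(toyUp(\"word1\"))  # ['line1']\n",
"print(toyDown(\"line1\"))  # ['word1', 'word2']\n",
"print(toyNext(\"word2\"))  # ['line2', 'word3']\n",
"print(toyPrev(\"line2\"))  # ['line1', 'word2']"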
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:55.410034Z", "start_time": "2018-05-18T09:18:55.404051Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1605868\n" ] } ], "source": [ "firstScroll = L.u(1, otype=\"scroll\")[0]\n", "print(firstScroll)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's see all the containing objects of sign 3 (an `x` means that sign 3 is not contained in an object of that type):" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:56.772513Z", "start_time": "2018-05-18T09:18:56.766324Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sign 3 is contained in scroll 1605868\n", "sign 3 is contained in lex 1542524\n", "sign 3 is contained in fragment 1531341\n", "sign 3 is contained in line 1552973\n", "sign 3 is contained in clause x\n", "sign 3 is contained in cluster x\n", "sign 3 is contained in phrase x\n", "sign 3 is contained in word 1606870\n" ] } ], "source": [ "s = 3\n", "for otype in F.otype.all:\n", "    if otype == F.otype.slotType:\n", "        continue\n", "    up = L.u(s, otype=otype)\n", "    upNode = \"x\" if len(up) == 0 else up[0]\n", "    print(\"sign {} is contained in {} {}\".format(s, otype, upNode))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going next\n", "Let's go to the next nodes of the first scroll."
] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:18:58.821681Z", "start_time": "2018-05-18T09:18:58.814893Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 17149: sign first slot=17149 , last slot=17149 \n", "1612982: word first slot=17149 , last slot=17149 \n", "1553387: line first slot=17149 , last slot=17176 \n", "1531359: fragment first slot=17149 , last slot=18207 \n", "1605869: scroll first slot=17149 , last slot=33885 \n" ] } ], "source": [ "afterFirstScroll = L.n(firstScroll)\n", "for n in afterFirstScroll:\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )\n", "secondScroll = L.n(firstScroll, otype=\"scroll\")[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going previous\n", "\n", "And let's see what is right before the second scroll." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:00.163973Z", "start_time": "2018-05-18T09:19:00.154857Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1605868: scroll first slot=1 , last slot=17148 \n", "1531358: fragment first slot=15658 , last slot=17148 \n", "1553386: line first slot=17099 , last slot=17148 \n", "1612981: word first slot=17147 , last slot=17148 \n", " 17148: sign first slot=17148 , last slot=17148 \n" ] } ], "source": [ "for n in L.p(secondScroll):\n", " print(\n", " \"{:>7}: {:<13} first slot={:<6}, last slot={:<6}\".format(\n", " n,\n", " F.otype.v(n),\n", " E.oslots.s(n)[0],\n", " E.oslots.s(n)[-1],\n", " )\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Going down" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We go to the fragments of the first scroll, and just count them." 
] }, { "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:02.530705Z", "start_time": "2018-05-18T09:19:02.475279Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18\n" ] } ], "source": [ "fragments = L.d(firstScroll, otype=\"fragment\")\n", "print(len(fragments))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The first line\n", "We pick two nodes and explore what is above and below them:\n", "the first line and the first word." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:04.024679Z", "start_time": "2018-05-18T09:19:03.995207Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Node 1606869\n", " | UP\n", " | | 1542523 lex\n", " | | 1552973 line\n", " | | 1531341 fragment\n", " | | 1605868 scroll\n", " | DOWN\n", " | | 2 sign\n", "Node 1552973\n", " | UP\n", " | | 1531341 fragment\n", " | | 1605868 scroll\n", " | DOWN\n", " | | 1430242 cluster\n", " | | 1 sign\n", " | | 1606869 word\n", " | | 2 sign\n", " | | 1606870 word\n", " | | 3 sign\n", " | | 4 sign\n", " | | 5 sign\n", " | | 1606871 word\n", " | | 6 sign\n", " | | 7 sign\n", " | | 8 sign\n", " | | 9 sign\n", " | | 1606872 word\n", " | | 10 sign\n", " | | 11 sign\n", " | | 1606873 word\n", " | | 12 sign\n", " | | 13 sign\n", " | | 14 sign\n", " | | 15 sign\n", " | | 16 sign\n", " | | 1606874 word\n", " | | 17 sign\n", " | | 18 sign\n", " | | 19 sign\n", " | | 1606875 word\n", " | | 20 sign\n", " | | 1606876 word\n", " | | 21 sign\n", " | | 22 sign\n", " | | 23 sign\n", " | | 24 sign\n", " | | 1606877 word\n", " | | 25 sign\n", " | | 1606878 word\n", " | | 26 sign\n", " | | 27 sign\n", " | | 28 sign\n", " | | 29 sign\n", "Done\n" ] } ], "source": [ "for n in [\n", " F.otype.s(\"word\")[0],\n", " F.otype.s(\"line\")[0],\n", "]:\n", " A.indent(level=0)\n", " A.info(\"Node {}\".format(n), tm=False)\n", " A.indent(level=1)\n", " 
A.info(\"UP\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)\n", " A.indent(level=1)\n", " A.info(\"DOWN\", tm=False)\n", " A.indent(level=2)\n", " A.info(\"\\n\".join([\"{:<15} {}\".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)\n", "A.indent(level=0)\n", "A.info(\"Done\", tm=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Text API\n", "\n", "So far, we have mainly seen nodes and their numbers, and the names of node types.\n", "You would almost forget that we are dealing with text.\n", "So let's try to see some text.\n", "\n", "In the same way as `F` gives access to feature data,\n", "`T` gives access to the text.\n", "That is also feature data, but you can tell Text-Fabric which features are specifically\n", "carrying the text, and in return Text-Fabric offers you\n", "a Text API: `T`.\n", "\n", "## Formats\n", "DSS text can be represented in a number of ways:\n", "\n", "* `orig`: unicode\n", "* `trans`: ETCBC transcription\n", "* `source`: as in Abegg's data files\n", "\n", "All three can be represented in two flavours:\n", "\n", "* `full`: all glyphs, but no bracketings and flags\n", "* `extra`: everything\n", "\n", "If you wonder where the information about text formats is stored:\n", "not in the program text-fabric, but in the data set.\n", "It has a feature `otext`, which specifies the formats and which features\n", "must be used to produce them. `otext` is the third special feature in a TF data set,\n", "next to `otype` and `oslots`.\n", "It is an optional feature.\n", "If it is absent, there will be no `T` API.\n", "\n", "Here is a list of all available formats in this data set." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'lex-default': 'word',\n", " 'lex-orig-full': 'word',\n", " 'lex-source-full': 'word',\n", " 'lex-trans-full': 'word',\n", " 'morph-source-full': 'word',\n", " 'text-orig-extra': 'word',\n", " 'text-orig-full': 'sign',\n", " 'text-source-extra': 'word',\n", " 'text-source-full': 'sign',\n", " 'text-trans-extra': 'word',\n", " 'text-trans-full': 'sign',\n", " 'layout-orig-full': 'sign',\n", " 'layout-source-full': 'sign',\n", " 'layout-trans-full': 'sign'}" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.formats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the formats\n", "\n", "The `T.text()` function is the central way to get text representations of nodes. Its most basic usage is\n", "\n", "```python\n", "T.text(nodes, fmt=fmt)\n", "```\n", "where `nodes` is a list or iterable of nodes, usually word nodes, and `fmt` is the name of a format.\n", "If you leave out `fmt`, the default `text-orig-full` is chosen.\n", "\n", "The result is the text in that format for all nodes specified." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You see for each format in the list above its intended level of operation: `sign` or `word`.\n", "\n", "When TF formats a node according to a defined text format, it descends to the constituent nodes of that level and represents those\n", "constituent nodes.\n", "\n", "In this data set, the formats ending in `-extra` specify the `word` level as the descend type.\n", "That is because the features that contain the text-critical brackets are only defined at the word level.\n", "At the sign level, those brackets are no longer visible, but they have left their traces in other features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we do not specify a format, the **default** format is used (`text-orig-full`)."
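, "\n", "\n", "To see at a glance which formats descend to which node type, the mapping shown by `T.formats` can be inverted. The next cell sketches this in plain Python on a hand-made copy of that output (`formatLevels` and `byLevel` are names made up for this sketch); in a live session you could presumably start from `T.formats` itself, assuming it keeps the dictionary shape shown above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import collections\n",
"\n",
"# copy of the T.formats output above: format name -> node type it descends to\n",
"formatLevels = {\n",
"    \"lex-default\": \"word\",\n",
"    \"lex-orig-full\": \"word\",\n",
"    \"lex-source-full\": \"word\",\n",
"    \"lex-trans-full\": \"word\",\n",
"    \"morph-source-full\": \"word\",\n",
"    \"text-orig-extra\": \"word\",\n",
"    \"text-orig-full\": \"sign\",\n",
"    \"text-source-extra\": \"word\",\n",
"    \"text-source-full\": \"sign\",\n",
"    \"text-trans-extra\": \"word\",\n",
"    \"text-trans-full\": \"sign\",\n",
"    \"layout-orig-full\": \"sign\",\n",
"    \"layout-source-full\": \"sign\",\n",
"    \"layout-trans-full\": \"sign\",\n",
"}\n",
"\n",
"# invert the mapping: descend type -> formats\n",
"byLevel = collections.defaultdict(list)\n",
"for (fmt, level) in formatLevels.items():\n",
"    byLevel[level].append(fmt)\n",
"\n",
"for (level, fmts) in sorted(byLevel.items()):\n",
"    print(f\"{level}: {', '.join(sorted(fmts))}\")\n",
"\n",
"# all -extra formats descend to words, as explained above\n",
"print({level for (fmt, level) in formatLevels.items() if fmt.endswith(\"-extra\")})  # {'word'}"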
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We examine a portion of biblical material at the start of 1Q1." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1540222" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fragmentNode = T.nodeFromSection((\"1Q1\", \"f1\"))\n", "fragmentNode" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Fragment ('1Q1', 'f1') with\n", " 157 signs\n", " 57 words\n", " 3 lines\n", "\n" ] } ], "source": [ "signs = L.d(fragmentNode, otype=\"sign\")\n", "words = L.d(fragmentNode, otype=\"word\")\n", "lines = L.d(fragmentNode, otype=\"line\")\n", "print(\n", " f\"\"\"\n", "Fragment {T.sectionFromNode(fragmentNode)} with\n", " {len(signs):>3} signs\n", " {len(words):>3} words\n", " {len(lines):>3} lines\n", "\"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:13.490426Z", "start_time": "2018-05-18T09:19:13.486053Z" } }, "outputs": [ { "data": { "text/plain": [ "'וירא אלהים כי טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ╱ אלהים ישרוצו המים שרץ נפש חיה ועוף יעופף על הארץ על פני רקיע השמים '" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(signs[0:100])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:13.490426Z", "start_time": "2018-05-18T09:19:13.486053Z" } }, "outputs": [ { "data": { "text/plain": [ "'וירא אלהים כי טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר אלהים ישרוצו ה'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(words[0:20])" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:13.490426Z", "start_time": 
"2018-05-18T09:19:13.486053Z" } }, "outputs": [ { "data": { "text/plain": [ "'וירא אלהים כי טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ╱ אלהים ישרוצו המים שרץ נפש חיה ועוף יעופף על הארץ על פני רקיע השמים ׃ ╱ '" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(lines[0:2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `-extra` formats\n", "\n", "In order to use non-default formats, we have to specify them in the `fmt` parameter." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(signs[0:100], fmt=\"text-orig-extra\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We do not get much, let's ask why." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "EXPLANATION: T.text() called with parameters:\n", "\tnodes : iterable of 2 nodes\n", "\tfmt : text-orig-extra targeted at word\n", "\tdescend: implicit\n", "\tfunc : no custom format implementation\n", "\n", "\tNODE: sign 770999\n", "\t\tTARGET LEVEL: word (descend=None) (format target type)\n", "\t\tEXPANSION: 0 words \n", "\t\tFORMATTING: explicit text-orig-extra does .g at 0x133d4a680>\n", "\t\tMATERIAL:\n", "\tNODE: sign 771000\n", "\t\tTARGET LEVEL: word (descend=None) (format target type)\n", "\t\tEXPANSION: 0 words \n", "\t\tFORMATTING: explicit text-orig-extra does .g at 0x133d4a680>\n", "\t\tMATERIAL:\n" ] }, { "data": { "text/plain": [ "''" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(signs[0:2], fmt=\"text-orig-extra\", explain=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason can be found in `TARGET LEVEL: word` and `EXPANSION 0 words`.\n", "We are applying the word targeted format 
`text-orig-extra` to a sign, which does not contain words." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[ וירא אל ]הים כי [ טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ] [ אלהים יש ]רוצו ה'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(words[0:20], fmt=\"text-orig-extra\")" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[ וירא אל ]הים כי [ טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ] [ אלהים יש ]רוצו המים שר#[ ץ נפש חיה ועוף יעופף על הארץ על פני רקיע השמים ׃ '" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(lines[0:2], fmt=\"text-orig-extra\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the brackets appear to point in the wrong direction, because they have not been adapted to the right-to-left writing direction.\n", "\n", "We can view them in ETCBC transcription as well:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[ WJR> >L ]HJm KJ [ VWB 00 WJHJ MR ] [ >LHJm J# ]RWYW H'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(words[0:20], fmt=\"text-trans-extra\")" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[ WJR> >L ]HJm KJ [ VWB 00 WJHJ MR ] [ >LHJm J# ]RWYW HMJm #R#[ y NP# XJH WRy 200:\n", " text = text[0:200] + f\"\\nand {len(text) - 200} characters more\"\n", " print(text)\n", " print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the last case, the lexeme node: evidently, the text-format that has been invoked provides\n", "the *language* (`h`) of the lexeme, plus its representations in UNICODE, ETCBC, and Abegg transcription.\n", "\n", "But what format exactly has been invoked?\n", "Let's ask."
] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "EXPLANATION: T.text() called with parameters:\n", "\tnodes : single node\n", "\tfmt : implicit\n", "\tdescend: implicit\n", "\tfunc : no custom format implementation\n", "\n", "\tNODE: lex 1542524\n", "\t\tTARGET LEVEL: lex (no expansion needed) (descend=None) (format target type)\n", "\t\tEXPANSION: 1 lex 1542524\n", "\t\tFORMATTING: implicit lex-default does .g at 0x133d49a20>\n", "\t\tMATERIAL:\n", "\t\t\tlex 1542524 ADDS \"h-עַתָּה-H h->:ELOHIJm h-K.IJ h-VOWB h-00 h-W:h-HJH h-MR \n", "\n", "morph-source-full:\n", "\tPcvqw3msj ncmp Pc ams . Pcvqw3msj ncms Pcvqw3msj ncms ncms uomsa . Pcvqw3ms \n", "\n", "text-orig-extra:\n", "\t[ וירא אל ]הים כי [ טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ] \n", "\n", "text-orig-full:\n", "\tוירא אלהים כי טוב ׃ ויהי ערב ויהי בקר יום רביעי ׃ ויאמר ╱ \n", "\n", "text-source-extra:\n", "\t]wyra al[hyM ky ]fwb . wyhy orb wyhy bqr ywM rbyoy . wyamr[ \n", "\n", "text-source-full:\n", "\twyra alhyM ky fwb . wyhy orb wyhy bqr ywM rbyoy . wyamr ╱ \n", "\n", "text-trans-extra:\n", "\t[ WJR> >L ]HJm KJ [ VWB 00 WJHJ MR ] \n", "\n", "text-trans-full:\n", "\tWJR> >LHJm KJ VWB 00 WJHJ MR ╱ \n", "\n" ] } ], "source": [ "firstLine = T.nodeFromSection((\"1Q1\", \"f1\", \"1\"))\n", "for fmt in usefulFormats:\n", "    if not fmt.startswith(\"layout-\"):\n", "        print(\n", "            \"{}:\\n\\t{}\\n\".format(\n", "                fmt,\n", "                T.text(firstLine, fmt=fmt),\n", "            )\n", "        )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Whole text in all formats in a few seconds\n", "Part of the pleasure of working with computers is that they can crunch massive amounts of data.\n", "The text of the Dead Sea Scrolls is a piece of cake.\n", "\n", "It takes just a dozen seconds or so to have that cake and eat it,\n", "in all useful formats."
] }, { "cell_type": "code", "execution_count": 58, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:27.839331Z", "start_time": "2018-05-18T09:19:18.526400Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s writing plain text of all scrolls in all text formats\n", " 8.74s done 6 formats\n", "text-orig-extra\n", "ועתה שמעו כל יודעי צדק ובינו במעשי \n", "אל ׃ כי ריב ל׳ו עם כל בשר ומשפט יעשה בכל מנאצי׳ו ׃ \n", "כי במועל׳ם אשר עזבו׳הו הסתיר פני׳ו מישראל וממקדש׳ו \n", "ו?יתנ׳ם לחרב ׃ ובזכר׳ו ברית ראשנים השאיר שאירית \n", "לישראל ולא נתנ׳ם לכלה ׃ ובקץ חרון שנים שלוש מאות \n", "\n", "text-orig-full\n", "  ועתה שמעו כל יודעי צדק ובינו במעשי \n", "אל ׃ כי ריב ל׳ו עם כל בשר ומשפט יעשה בכל מנאצי׳ו ׃ \n", "כי במועל׳ם אשר עזבו׳הו הסתיר פני׳ו מישראל וממקדש׳ו \n", "ויתנ׳ם לחרב ׃ ובזכר׳ו ברית ראשנים השאיר שאירית \n", "לישראל ולא נתנ׳ם לכלה ׃ ובקץ חרון שנים שלוש מאות \n", "\n", "text-source-extra\n", "woth Cmow kl ywdoy xdq wbynw bmoCy \n", "al . ky ryb l/w oM kl bCr wmCpf yoCh bkl mnaxy/w . \n", "ky bmwol/M aCr ozbw/hw hstyr pny/w myCral wmmqdC/w \n", "wØytn/M ljrb . wbzkr/w bryt raCnyM hCayr Cayryt \n", "lyCral wla ntn/M lklh . wbqX jrwN CnyM ClwC mawt \n", "\n", "text-source-full\n", "□ woth Cmow kl ywdoy xdq wbynw bmoCy \n", "al . ky ryb l/w oM kl bCr wmCpf yoCh bkl mnaxy/w . \n", "ky bmwol/M aCr ozbw/hw hstyr pny/w myCral wmmqdC/w \n", "wytn/M ljrb . wbzkr/w bryt raCnyM hCayr Cayryt \n", "lyCral wla ntn/M lklh . 
wbqX jrwN CnyM ClwC mawt \n", "\n", "text-trans-extra\n", "WL 00 KJ RJB L'W YJ'W 00 \n", "KJ BMW#R L WMMQD#'W \n", "W?JTN'm LXRB 00 WBZKR'W BRJT R>#NJm H#>JR #>JRJT \n", "LJ#R>L WL> NTN'm LKLH 00 WBQy XRWn #NJm #LW# M>WT \n", "\n", "text-trans-full\n", "  WL 00 KJ RJB L'W YJ'W 00 \n", "KJ BMW#R L WMMQD#'W \n", "WJTN'm LXRB 00 WBZKR'W BRJT R>#NJm H#>JR #>JRJT \n", "LJ#R>L WL> NTN'm LKLH 00 WBQy XRWn #NJm #LW# M>WT \n", "\n" ] } ], "source": [ "A.indent(reset=True)\n", "A.info(\"writing plain text of all scrolls in all text formats\")\n", "\n", "text = collections.defaultdict(list)\n", "\n", "for ln in F.otype.s(\"line\"):\n", "    for fmt in usefulFormats:\n", "        if fmt.startswith(\"text-\"):\n", "            text[fmt].append(T.text(ln, fmt=fmt, descend=True))\n", "\n", "A.info(\"done {} formats\".format(len(text)))\n", "\n", "for fmt in sorted(text):\n", "    print(\"{}\\n{}\\n\".format(fmt, \"\\n\".join(text[fmt][0:5])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The full plain text\n", "We write each text format to a separate file in your `Downloads` folder."
] }, { "cell_type": "code", "execution_count": 59, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:34.250294Z", "start_time": "2018-05-18T09:19:34.156658Z" } }, "outputs": [], "source": [ "for fmt in T.formats:\n", "    if fmt.startswith(\"text-\"):\n", "        with open(\n", "            os.path.expanduser(f\"~/Downloads/{fmt}.txt\"),\n", "            \"w\",\n", "            encoding=\"utf8\",\n", "        ) as f:\n", "            f.write(\"\\n\".join(text[fmt]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(We pass `encoding=\"utf8\"` explicitly: on systems whose default encoding cannot represent Hebrew characters, such as Windows, writing the files would fail otherwise.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sections\n", "\n", "A section in the DSS is a scroll, a fragment or a line.\n", "Knowledge of sections is not baked into Text-Fabric.\n", "The config feature `otext.tf` may specify three section levels, and tell\n", "what the corresponding node types and features are.\n", "\n", "From that knowledge it can construct mappings from nodes to sections, e.g. from line\n", "nodes to tuples of the form:\n", "\n", "    (scroll acronym, fragment label, line number)\n", "\n", "You can get the section of a node as a tuple of relevant scroll, fragment, and line nodes.\n", "Or you can get it as a passage label, a string.\n", "\n", "You can ask for the passage corresponding to the first slot of a node, or the one corresponding to the last slot.\n", "\n", "If you are dealing with scroll and fragment nodes, you can ask to fill out the line and fragment parts as well.\n", "\n", "Here are examples of getting the section that corresponds to a node and vice versa.\n", "\n", "**NB:** `sectionFromNode` always delivers a line specification, either from the\n", "first slot belonging to that node, or, if `lastSlot`, from the last slot\n", "belonging to that node."
] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "someNodes = (\n", "    F.otype.s(\"sign\")[100000],\n", "    F.otype.s(\"word\")[10000],\n", "    F.otype.s(\"cluster\")[5000],\n", "    F.otype.s(\"line\")[15000],\n", "    F.otype.s(\"fragment\")[1000],\n", "    F.otype.s(\"scroll\")[500],\n", ")" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:19:43.056511Z", "start_time": "2018-05-18T09:19:43.043552Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100001 sign - 1QHa 25:31 1QHa 25:31 ((1605874, 1531445, 1555227), (1605874, 1531445, 1555227))\n", "1616869 word - 1QS 8:10 1QS 8:10 ((1605869, 1531366, 1553578), (1605869, 1531366, 1553578))\n", "1435242 cluster - 1Q29 f2:3 1Q29 f2:3 ((1605890, 1531685, 1556400), (1605890, 1531685, 1556400))\n", "1567973 line - 4Q368 f3:4 4Q368 f3:4 ((1606221, 1534207, 1567973), (1606221, 1534207, 1567973))\n", "1532341 fragment - 4Q186 f2ii 4Q186 f2ii:3 ((1605991, 1532341), (1605991, 1532341, 1559220))\n", "1606368 scroll - 4Q471b 4Q471b f1a_d:10 ((1606368,), (1606368, 1536089, 1575660))\n" ] } ], "source": [ "for n in someNodes:\n", "    nType = F.otype.v(n)\n", "    d = f\"{n:>7} {nType}\"\n", "    first = A.sectionStrFromNode(n)\n", "    last = A.sectionStrFromNode(n, lastSlot=True, fillup=True)\n", "    tup = (\n", "        T.sectionTuple(n),\n", "        T.sectionTuple(n, lastSlot=True, fillup=True),\n", "    )\n", "    print(f\"{d:<16} - {first:<18} {last:<18} {tup}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Clean caches\n", "\n", "Text-Fabric pre-computes data for you, so that it can be loaded faster.\n", "If the original data is updated, Text-Fabric detects this and recomputes that data.\n", "\n", "But there are cases where the algorithms of Text-Fabric have changed without any change in the data; then you might\n", "want to clear the cache of precomputed results.\n", "\n", "There are two ways to do that:\n", "\n", "* 
Locate the `.tf` directory of your dataset and remove all `.tfx` files in it.\n", "  This might be a bit awkward to do, because the `.tf` directory is hidden on Unix-like systems.\n", "* Call `TF.clearCache()`, which does exactly the same.\n", "\n", "It is not handy to execute the following cell every time, which is why I have commented it out.\n", "So if you really want to clear the cache, remove the comment sign below." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "# TF.clearCache()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next steps\n", "\n", "By now you have an impression of how to compute around in the corpus.\n", "While this is still the beginning, I hope you already sense the power of unlimited programmatic access\n", "to all the bits and bytes in the data set.\n", "\n", "Here are a few directions for unleashing that power.\n", "\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[similarLines](similarLines.ipynb)** spot the similarities between lines\n", "\n", "---\n", "\n", "See the [cookbook](cookbook) for recipes for small, concrete tasks.\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents",
"title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }