{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exporting actors as TF-features\n", "\n", "This notebook exports a corrected version of the participant reference dataset created by Eep Talstra. That dataset was inspected and a list of mismatches was identified for manual correction (see [notebook](https://github.com/ch-jensen/Semantic-mapping-of-participants/blob/master/2_Exploring%20the%20dataset.ipynb)). The dataset went through two rounds of corrections in Excel. This corrected version will now be exported as three [text-fabric](https://dans-labs.github.io/text-fabric/) features:\n", "\n", "1. *actor* (actors for words, phrase atoms and subphrases)\n", "2. *prs_actor* (actors for pronominal suffixes)\n", "3. *coref* (an edge feature linking each reference to its respective co-references)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Importing\n", "\n", "### 1.1. Importing modules" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import sys, os\n", "import csv, collections\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using etcbc/bhsa/tf - c r1.4 in C:\\Users\\Ejer/text-fabric-data\n", "Using etcbc/phono/tf - c r1.1 in C:\\Users\\Ejer/text-fabric-data\n", "Using etcbc/parallels/tf - c r1.1 in C:\\Users\\Ejer/text-fabric-data\n" ] }, { "data": { "text/markdown": [ "**Documentation:** BHSA Character table Feature docs bhsa API Text-Fabric API 7.0.3 Search Reference" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Loaded features:\n", "

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det dist dist_unit domain freq_lex freq_occ function g_word g_word_utf8 gloss gn instruction is_root kind label language lex lex_utf8 ls nametype nme nu number otype pargr pdp pfm prs prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rank_occ rela root sp st tab trailer trailer_utf8 txt typ uvf vbe vbs verse voc_lex voc_lex_utf8 vs vt distributional_parent functional_parent mother oslots

Parallel Passages: crossref

Phonetic Transcriptions: phono phono_trailer

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
API members:\n", "C Computed, Call AllComputeds, Cs ComputedString
\n", "E Edge, Eall AllEdges, Es EdgeString
\n", "TF, ensureLoaded, ignored, loadLog
\n", "L Locality
\n", "cache, error, indent, info, reset
\n", "N Nodes, sortKey, otypeRank, sortNodes
\n", "F Feature, Fall AllFeatures, Fs FeatureString
\n", "S Search
\n", "T Text
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from tf.app import use\n", "B = use('bhsa', hoist=globals(), locations='actor/tf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2. Importing corrected dataset" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "file = 'Datasets/Lev17toLev26_mapped_updated_corrected.csv'\n", "\n", "new_dict = {}\n", "\n", "n = 0\n", "\n", "with open(file) as f:\n", "    next(f)  # skip the header row\n", "    reader = csv.reader(f, delimiter=';')\n", "    for r in reader:\n", "        # r[0] and r[13] are not used\n", "        surface_text = r[1]\n", "        book = r[2]\n", "        chapter = r[3]\n", "        verse = r[4]\n", "        clause_atom = r[5]\n", "        pred = r[6]\n", "        ref = r[7]\n", "        ptc_set = r[8]\n", "        ptc_actor = r[9]\n", "        slots = r[10]\n", "        func = r[11]\n", "        compound = r[12]\n", "        correction_1 = r[14]\n", "        correction_2 = r[15]\n", "        n += 1\n", "\n", "        new_dict[n] = [surface_text, book, chapter, verse, clause_atom, pred, ref,\n", "                       ptc_set, ptc_actor, slots, func, compound, correction_1, correction_2]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "data = pd.DataFrame.from_dict(new_dict).T\n", "data.columns = ['surface_text','book','chapter','verse','clause_atom','predicate','reference','participant',\n", "                'actor','slots','func','compound','1_correction','2_correction']" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
surface_textbookchapterverseclause_atompredicatereferenceparticipantactorslotsfunccompound1_correction2_correction
1JDBRLeviticus171528163DBRDBR3sm=JHWHJHWH63009VbPred0
2JHWHLeviticus171528163DBRJHWH3sm=JHWHJHWH63010Subj1
3>L MCHLeviticus171528163DBR>L MCH0sm=MCHMCH63011 63012Compl12
4L->MRLeviticus171528164>MRL >MR3sm=JHWHJHWH63013 63014VbPred3
5DBRLeviticus172528165DBRDBR2sm=MCH63015VbPred4
6>L >HRN W->L BNJW W->L KL BNJ JFR>LLeviticus172528165DBR>L >HRN W >L BN+S W >L KL BN JFR>L3pm=>HRN BN+>HRN FR>L>HRN BN >HRN63016 63017 63018 63019 63020 63021 63022 6302...Compl15 6 7 8 9 10 11>HRN BN >HRN BN JFR>L
7>L >HRN W->L BNJWLeviticus172528165DBR>L >HRN W >L BN+S......63016 63017 63018 63019 63020-paral6 7 8 9
8>L >HRNLeviticus172528165DBR>L >HRN3sm=>HRN>HRN63016 63017-paral7
9>L BNJWLeviticus172528165DBR>L BN+312......63019 63020-paral8 9
10sfx:WLeviticus172528165DBRsfx3sm=>HRN>HRN63020-gentf9
11BNJ JFR>LLeviticus172528165DBRBN JFR>L0pm=BN JFR>LBN JFR>L63024 63025-gentf10 11
12JFR>LLeviticus172528165DBRJFR>LJFR>LJFR>L63025-gentf11
13>MRTLeviticus172528166>MR>MR2sm=MCH63027VbPred12
14sfx:HMLeviticus172528166>MRsfx3pm=>HRN BN+>HRN FR>L>HRN BN >HRN63028Compl113>HRN BN >HRN BN JFR>L
15ZHLeviticus172528167....ZH0sm=DBRDBR63029Subj14
16H-DBRLeviticus172528167....H DBR0sm=DBRDBR63030 63031PrCompl15
17YWHLeviticus172528168YWHYWH3sm=JHWHJHWH63033VbPred16
18JHWHLeviticus172528168YWHJHWH3sm=JHWHJHWH63034Subj17
19L->MRLeviticus172528169>MRL >MR3sm=JHWHJHWH63035 63036VbPred18
20>JC >JCLeviticus173528170....>JC >JC3sm=>JC >JC>JC >JC63037 6303857219 20 21
21>JCLeviticus173528170....>JC......63037-paral20
22>JCLeviticus173528170....>JC0sm=>JC>JC63038-paral21
23M-BJT JFR>LLeviticus173528170....MN BJT JFR>L0sm=BJT JFR>LBJT JFR>L63039 63040 63041-specf22 23
24JFR>LLeviticus173528170....JFR>LJFR>LJFR>L63041-gentf23
25JCXVLeviticus173528171CXVCXV3sm=>JC >JC>JC >JC63043VbPred24
26CWR >W KFB >W <ZLeviticus173528171CXVCWR >W KFB >W <Z3sm=CWR KFB <ZCWR KFB <Z63044 63045 63046 63047 63048Obj125 26 27 28 29
27CWR >W KFBLeviticus173528171CXVCWR >W KFB......63044 63045 63046-paral26 27 28
28CWRLeviticus173528171CXVCWR......63044-paral27
29KFBLeviticus173528171CXVKFB......63046-paral28
30<ZLeviticus173528171CXV<Z......63048-paral29
.............................................
4063BRJT R>CNJMLeviticus2645529377ZKRBRJT R>CWN......68924 68925Obj14062 4063
4064R>CNJMLeviticus2645529377ZKRR>CWN0pm=R>CWNR>CWN68925-gentf4063
4065HWY>TJLeviticus2645529378JY>JY>1sc=>NJ>NJ68927VbPred4064JHWH
4066sfx:MLeviticus2645529378JY>sfx3pm=GWJGWJ68928Obj14065C>R
4067M->RY MYRJMLeviticus2645529378JY>MN >RY MYRJM>RY MYRJM>RY MYRJM68929 68930 68931Compl14066 4067
4068MYRJMLeviticus2645529378JY>MYRJM3pm=MYRJMMYRJM68931-gentf4067
4069L-<JNJ H-GWJMLeviticus2645529378JY>L <JN H GWJ......68932 68933 68934 68935Adjunc4068 4069
4070H-GWJMLeviticus2645529378JY>H GWJ3pm=GWJGWJ68934 68935-gentf4069
4071L-HJTLeviticus2645529379HJHL HJH1sc=>NJ>NJ68936 68937VbPred4070JHWH
4072sfx:HMLeviticus2645529379HJHsfx3pm=GWJGWJ68938Compl14071C>R
4073L->LHJMLeviticus2645529379HJHL >LHJM0pm=>LHJM>LHJM68939 68940PrCompl4072JHWH
4074>NJLeviticus2645529380....>NJ1sc=>NJ>NJ68941Subj4073JHWH
4075JHWHLeviticus2645529380....JHWH0sm=JHWHJHWH68942PrCompl4074
4076>LHLeviticus2646529381....>LH0pm=XQ MCPV TWRHXQ MCPV TWRH68943Subj4075
4077H-XQJM W-H-MCPVJM W-H-TWRTLeviticus2646529381....H XQ W H MCPV W H TWRH0pm=XQ MCPV TWRHXQ MCPV TWRH68944 68945 68946 68947 68948 68949 68950 68951PrCompl4076 4077 4078 4079 4080
4078H-XQJM W-H-MCPVJMLeviticus2646529381....H XQ W H MCPV......68944 68945 68946 68947 68948-paral4077 4078 4079
4079H-XQJMLeviticus2646529381....H XQ......68944 68945-paral4078
4080H-MCPVJMLeviticus2646529381....H MCPV......68947 68948-paral4079
4081H-TWRTLeviticus2646529381....H TWRH......68950 68951-paral4080
4082NTNLeviticus2646529382NTNNTN3sm=JHWHJHWH68953VbPred4081
4083JHWHLeviticus2646529382NTNJHWH3sm=JHWHJHWH68954Subj4082
4084BJNW W-BJN BNJ JFR>LLeviticus2646529382NTNBJN W BJN BN JFR>L......68955 68956 68957 68958 68959Compl14083 4084 4085 4086 4087
4085BJNWLeviticus2646529382NTNBJN......68955-paral4084 4085
4086sfx:WLeviticus2646529382NTNsfx3sm=JHWHJHWH68955-gentf4085
4087BJN BNJ JFR>LLeviticus2646529382NTNBJN BN JFR>L......68957 68958 68959-paral4086 4087
4088JFR>LLeviticus2646529382NTNJFR>L......68959-gentf4087
4089B-HR SJNJLeviticus2646529382NTNB HR SJNJ......68960 68961 68962Locat4088 4089
4090SJNJLeviticus2646529382NTNSJNJ......68962-gentf4089
4091B-JD MCHLeviticus2646529382NTNB JD MCH......68963 68964 68965Adjunc4090 4091
4092MCHLeviticus2646529382NTNMCH0sm=MCHMCH68965-gentf4091
\n", "

4092 rows × 14 columns

\n", "
" ], "text/plain": [ " surface_text book chapter verse \\\n", "1 JDBR Leviticus 17 1 \n", "2 JHWH Leviticus 17 1 \n", "3 >L MCH Leviticus 17 1 \n", "4 L->MR Leviticus 17 1 \n", "5 DBR Leviticus 17 2 \n", "6 >L >HRN W->L BNJW W->L KL BNJ JFR>L Leviticus 17 2 \n", "7 >L >HRN W->L BNJW Leviticus 17 2 \n", "8 >L >HRN Leviticus 17 2 \n", "9 >L BNJW Leviticus 17 2 \n", "10 sfx:W Leviticus 17 2 \n", "11 BNJ JFR>L Leviticus 17 2 \n", "12 JFR>L Leviticus 17 2 \n", "13 >MRT Leviticus 17 2 \n", "14 sfx:HM Leviticus 17 2 \n", "15 ZH Leviticus 17 2 \n", "16 H-DBR Leviticus 17 2 \n", "17 YWH Leviticus 17 2 \n", "18 JHWH Leviticus 17 2 \n", "19 L->MR Leviticus 17 2 \n", "20 >JC >JC Leviticus 17 3 \n", "21 >JC Leviticus 17 3 \n", "22 >JC Leviticus 17 3 \n", "23 M-BJT JFR>L Leviticus 17 3 \n", "24 JFR>L Leviticus 17 3 \n", "25 JCXV Leviticus 17 3 \n", "26 CWR >W KFB >W W KFB Leviticus 17 3 \n", "28 CWR Leviticus 17 3 \n", "29 KFB Leviticus 17 3 \n", "30 CNJM Leviticus 26 45 \n", "4064 R>CNJM Leviticus 26 45 \n", "4065 HWY>TJ Leviticus 26 45 \n", "4066 sfx:M Leviticus 26 45 \n", "4067 M->RY MYRJM Leviticus 26 45 \n", "4068 MYRJM Leviticus 26 45 \n", "4069 L-LHJM Leviticus 26 45 \n", "4074 >NJ Leviticus 26 45 \n", "4075 JHWH Leviticus 26 45 \n", "4076 >LH Leviticus 26 46 \n", "4077 H-XQJM W-H-MCPVJM W-H-TWRT Leviticus 26 46 \n", "4078 H-XQJM W-H-MCPVJM Leviticus 26 46 \n", "4079 H-XQJM Leviticus 26 46 \n", "4080 H-MCPVJM Leviticus 26 46 \n", "4081 H-TWRT Leviticus 26 46 \n", "4082 NTN Leviticus 26 46 \n", "4083 JHWH Leviticus 26 46 \n", "4084 BJNW W-BJN BNJ JFR>L Leviticus 26 46 \n", "4085 BJNW Leviticus 26 46 \n", "4086 sfx:W Leviticus 26 46 \n", "4087 BJN BNJ JFR>L Leviticus 26 46 \n", "4088 JFR>L Leviticus 26 46 \n", "4089 B-HR SJNJ Leviticus 26 46 \n", "4090 SJNJ Leviticus 26 46 \n", "4091 B-JD MCH Leviticus 26 46 \n", "4092 MCH Leviticus 26 46 \n", "\n", " clause_atom predicate reference \\\n", "1 528163 DBR DBR \n", "2 528163 DBR JHWH \n", "3 528163 DBR >L MCH \n", "4 528164 
>MR L >MR \n", "5 528165 DBR DBR \n", "6 528165 DBR >L >HRN W >L BN+S W >L KL BN JFR>L \n", "7 528165 DBR >L >HRN W >L BN+S \n", "8 528165 DBR >L >HRN \n", "9 528165 DBR >L BN+312 \n", "10 528165 DBR sfx \n", "11 528165 DBR BN JFR>L \n", "12 528165 DBR JFR>L \n", "13 528166 >MR >MR \n", "14 528166 >MR sfx \n", "15 528167 .... ZH \n", "16 528167 .... H DBR \n", "17 528168 YWH YWH \n", "18 528168 YWH JHWH \n", "19 528169 >MR L >MR \n", "20 528170 .... >JC >JC \n", "21 528170 .... >JC \n", "22 528170 .... >JC \n", "23 528170 .... MN BJT JFR>L \n", "24 528170 .... JFR>L \n", "25 528171 CXV CXV \n", "26 528171 CXV CWR >W KFB >W W KFB \n", "28 528171 CXV CWR \n", "29 528171 CXV KFB \n", "30 528171 CXV CWN \n", "4064 529377 ZKR R>CWN \n", "4065 529378 JY> JY> \n", "4066 529378 JY> sfx \n", "4067 529378 JY> MN >RY MYRJM \n", "4068 529378 JY> MYRJM \n", "4069 529378 JY> L H GWJ \n", "4071 529379 HJH L HJH \n", "4072 529379 HJH sfx \n", "4073 529379 HJH L >LHJM \n", "4074 529380 .... >NJ \n", "4075 529380 .... JHWH \n", "4076 529381 .... >LH \n", "4077 529381 .... H XQ W H MCPV W H TWRH \n", "4078 529381 .... H XQ W H MCPV \n", "4079 529381 .... H XQ \n", "4080 529381 .... H MCPV \n", "4081 529381 .... H TWRH \n", "4082 529382 NTN NTN \n", "4083 529382 NTN JHWH \n", "4084 529382 NTN BJN W BJN BN JFR>L \n", "4085 529382 NTN BJN \n", "4086 529382 NTN sfx \n", "4087 529382 NTN BJN BN JFR>L \n", "4088 529382 NTN JFR>L \n", "4089 529382 NTN B HR SJNJ \n", "4090 529382 NTN SJNJ \n", "4091 529382 NTN B JD MCH \n", "4092 529382 NTN MCH \n", "\n", " participant actor \\\n", "1 3sm=JHWH JHWH \n", "2 3sm=JHWH JHWH \n", "3 0sm=MCH MCH \n", "4 3sm=JHWH JHWH \n", "5 2sm= MCH \n", "6 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN \n", "7 ... ... \n", "8 3sm=>HRN >HRN \n", "9 ... ... 
\n", "10 3sm=>HRN >HRN \n", "11 0pm=BN JFR>L BN JFR>L \n", "12 JFR>L JFR>L \n", "13 2sm= MCH \n", "14 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN \n", "15 0sm=DBR DBR \n", "16 0sm=DBR DBR \n", "17 3sm=JHWH JHWH \n", "18 3sm=JHWH JHWH \n", "19 3sm=JHWH JHWH \n", "20 3sm=>JC >JC >JC >JC \n", "21 ... ... \n", "22 0sm=>JC >JC \n", "23 0sm=BJT JFR>L BJT JFR>L \n", "24 JFR>L JFR>L \n", "25 3sm=>JC >JC >JC >JC \n", "26 3sm=CWR KFB CWN R>CWN \n", "4065 1sc=>NJ >NJ \n", "4066 3pm=GWJ GWJ \n", "4067 >RY MYRJM >RY MYRJM \n", "4068 3pm=MYRJM MYRJM \n", "4069 ... ... \n", "4070 3pm=GWJ GWJ \n", "4071 1sc=>NJ >NJ \n", "4072 3pm=GWJ GWJ \n", "4073 0pm=>LHJM >LHJM \n", "4074 1sc=>NJ >NJ \n", "4075 0sm=JHWH JHWH \n", "4076 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4077 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4078 ... ... \n", "4079 ... ... \n", "4080 ... ... \n", "4081 ... ... \n", "4082 3sm=JHWH JHWH \n", "4083 3sm=JHWH JHWH \n", "4084 ... ... \n", "4085 ... ... \n", "4086 3sm=JHWH JHWH \n", "4087 ... ... \n", "4088 ... ... \n", "4089 ... ... \n", "4090 ... ... \n", "4091 ... ... \n", "4092 0sm=MCH MCH \n", "\n", " slots func \\\n", "1 63009 VbPred \n", "2 63010 Subj \n", "3 63011 63012 Compl1 \n", "4 63013 63014 VbPred \n", "5 63015 VbPred \n", "6 63016 63017 63018 63019 63020 63021 63022 6302... Compl1 \n", "7 63016 63017 63018 63019 63020 -paral \n", "8 63016 63017 -paral \n", "9 63019 63020 -paral \n", "10 63020 -gentf \n", "11 63024 63025 -gentf \n", "12 63025 -gentf \n", "13 63027 VbPred \n", "14 63028 Compl1 \n", "15 63029 Subj \n", "16 63030 63031 PrCompl \n", "17 63033 VbPred \n", "18 63034 Subj \n", "19 63035 63036 VbPred \n", "20 63037 63038 572 \n", "21 63037 -paral \n", "22 63038 -paral \n", "23 63039 63040 63041 -specf \n", "24 63041 -gentf \n", "25 63043 VbPred \n", "26 63044 63045 63046 63047 63048 Obj1 \n", "27 63044 63045 63046 -paral \n", "28 63044 -paral \n", "29 63046 -paral \n", "30 63048 -paral \n", "... ... ... 
\n", "4063 68924 68925 Obj1 \n", "4064 68925 -gentf \n", "4065 68927 VbPred \n", "4066 68928 Obj1 \n", "4067 68929 68930 68931 Compl1 \n", "4068 68931 -gentf \n", "4069 68932 68933 68934 68935 Adjunc \n", "4070 68934 68935 -gentf \n", "4071 68936 68937 VbPred \n", "4072 68938 Compl1 \n", "4073 68939 68940 PrCompl \n", "4074 68941 Subj \n", "4075 68942 PrCompl \n", "4076 68943 Subj \n", "4077 68944 68945 68946 68947 68948 68949 68950 68951 PrCompl \n", "4078 68944 68945 68946 68947 68948 -paral \n", "4079 68944 68945 -paral \n", "4080 68947 68948 -paral \n", "4081 68950 68951 -paral \n", "4082 68953 VbPred \n", "4083 68954 Subj \n", "4084 68955 68956 68957 68958 68959 Compl1 \n", "4085 68955 -paral \n", "4086 68955 -gentf \n", "4087 68957 68958 68959 -paral \n", "4088 68959 -gentf \n", "4089 68960 68961 68962 Locat \n", "4090 68962 -gentf \n", "4091 68963 68964 68965 Adjunc \n", "4092 68965 -gentf \n", "\n", " compound 1_correction 2_correction \n", "1 0 \n", "2 1 \n", "3 2 \n", "4 3 \n", "5 4 \n", "6 5 6 7 8 9 10 11 >HRN BN >HRN BN JFR>L \n", "7 6 7 8 9 \n", "8 7 \n", "9 8 9 \n", "10 9 \n", "11 10 11 \n", "12 11 \n", "13 12 \n", "14 13 >HRN BN >HRN BN JFR>L \n", "15 14 \n", "16 15 \n", "17 16 \n", "18 17 \n", "19 18 \n", "20 19 20 21 \n", "21 20 \n", "22 21 \n", "23 22 23 \n", "24 23 \n", "25 24 \n", "26 25 26 27 28 29 \n", "27 26 27 28 \n", "28 27 \n", "29 28 \n", "30 29 \n", "... ... ... ... 
\n", "4063 4062 4063 \n", "4064 4063 \n", "4065 4064 JHWH \n", "4066 4065 C>R \n", "4067 4066 4067 \n", "4068 4067 \n", "4069 4068 4069 \n", "4070 4069 \n", "4071 4070 JHWH \n", "4072 4071 C>R \n", "4073 4072 JHWH \n", "4074 4073 JHWH \n", "4075 4074 \n", "4076 4075 \n", "4077 4076 4077 4078 4079 4080 \n", "4078 4077 4078 4079 \n", "4079 4078 \n", "4080 4079 \n", "4081 4080 \n", "4082 4081 \n", "4083 4082 \n", "4084 4083 4084 4085 4086 4087 \n", "4085 4084 4085 \n", "4086 4085 \n", "4087 4086 4087 \n", "4088 4087 \n", "4089 4088 4089 \n", "4090 4089 \n", "4091 4090 4091 \n", "4092 4091 \n", "\n", "[4092 rows x 14 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Implement corrections:\n", "\n", "How many corrections have been made?" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First round: 235\n", "Second round: 687\n" ] } ], "source": [ "rows_total = len(data)\n", "\n", "print(f\"First round: {rows_total-len(data[data['1_correction'] == ''])}\")\n", "print(f\"Second round: {rows_total-len(data[data['2_correction'] == ''])}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The manual corrections are incorporated in the following order: each row is first checked for a correction from the second round and, if there is none, for a correction from the first round. If either exists (the second round has priority), it overwrites the value in the 'actor' column."
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "scrolled": true }, "outputs": [], "source": [ "for row in data.index:\n", "    actor = data.loc[row, 'actor']\n", "    correction_1 = data.loc[row, '1_correction']\n", "    correction_2 = data.loc[row, '2_correction']\n", "\n", "    # The second round takes priority over the first\n", "    if correction_2 != '':\n", "        actor = correction_2\n", "    elif correction_1 != '':\n", "        actor = correction_1\n", "\n", "    # Write back with .loc to avoid chained assignment\n", "    data.loc[row, 'actor'] = actor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The columns '1_correction' and '2_correction' can now be dropped:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "data = data.drop(columns=['1_correction', '2_correction']) #drop columns\n", "data['otype'] = '...'" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
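| | surface_text | book | chapter | verse | clause_atom | predicate | reference | participant | actor | slots | func | compound | otype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | JDBR | Leviticus | 17 | 1 | 528163 | DBR | DBR | 3sm=JHWH | JHWH | 63009 | VbPred | 0 | ... |
| 2 | JHWH | Leviticus | 17 | 1 | 528163 | DBR | JHWH | 3sm=JHWH | JHWH | 63010 | Subj | 1 | ... |
| 3 | >L MCH | Leviticus | 17 | 1 | 528163 | DBR | >L MCH | 0sm=MCH | MCH | 63011 63012 | Compl1 | 2 | ... |
| 4 | L->MR | Leviticus | 17 | 1 | 528164 | >MR | L >MR | 3sm=JHWH | JHWH | 63013 63014 | VbPred | 3 | ... |
| 5 | DBR | Leviticus | 17 | 2 | 528165 | DBR | DBR | 2sm= | MCH | 63015 | VbPred | 4 | ... |
| 6 | >L >HRN W->L BNJW W->L KL BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S W >L KL BN JFR>L | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63016 63017 63018 63019 63020 63021 63022 6302... | Compl1 | 5 6 7 8 9 10 11 | ... |
| 7 | >L >HRN W->L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S | ... | ... | 63016 63017 63018 63019 63020 | -paral | 6 7 8 9 | ... |
| 8 | >L >HRN | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN | 3sm=>HRN | >HRN | 63016 63017 | -paral | 7 | ... |
| 9 | >L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L BN+312 | ... | ... | 63019 63020 | -paral | 8 9 | ... |
| 10 | sfx:W | Leviticus | 17 | 2 | 528165 | DBR | sfx | 3sm=>HRN | >HRN | 63020 | -gentf | 9 | ... |
| 11 | BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | BN JFR>L | 0pm=BN JFR>L | BN JFR>L | 63024 63025 | -gentf | 10 11 | ... |
| 12 | JFR>L | Leviticus | 17 | 2 | 528165 | DBR | JFR>L | JFR>L | JFR>L | 63025 | -gentf | 11 | ... |
| 13 | >MRT | Leviticus | 17 | 2 | 528166 | >MR | >MR | 2sm= | MCH | 63027 | VbPred | 12 | ... |
| 14 | sfx:HM | Leviticus | 17 | 2 | 528166 | >MR | sfx | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63028 | Compl1 | 13 | ... |
| 15 | ZH | Leviticus | 17 | 2 | 528167 | .... | ZH | 0sm=DBR | DBR | 63029 | Subj | 14 | ... |
| 16 | H-DBR | Leviticus | 17 | 2 | 528167 | .... | H DBR | 0sm=DBR | DBR | 63030 63031 | PrCompl | 15 | ... |
| 17 | YWH | Leviticus | 17 | 2 | 528168 | YWH | YWH | 3sm=JHWH | JHWH | 63033 | VbPred | 16 | ... |
| 18 | JHWH | Leviticus | 17 | 2 | 528168 | YWH | JHWH | 3sm=JHWH | JHWH | 63034 | Subj | 17 | ... |
| 19 | L->MR | Leviticus | 17 | 2 | 528169 | >MR | L >MR | 3sm=JHWH | JHWH | 63035 63036 | VbPred | 18 | ... |
| 20 | >JC >JC | Leviticus | 17 | 3 | 528170 | .... | >JC >JC | 3sm=>JC >JC | >JC >JC | 63037 63038 | 572 | 19 20 21 | ... |
| 21 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | ... | ... | 63037 | -paral | 20 | ... |
| 22 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | 0sm=>JC | >JC | 63038 | -paral | 21 | ... |
| 23 | M-BJT JFR>L | Leviticus | 17 | 3 | 528170 | .... | MN BJT JFR>L | 0sm=BJT JFR>L | BJT JFR>L | 63039 63040 63041 | -specf | 22 23 | ... |
| 24 | JFR>L | Leviticus | 17 | 3 | 528170 | .... | JFR>L | JFR>L | JFR>L | 63041 | -gentf | 23 | ... |
| 25 | JCXV | Leviticus | 17 | 3 | 528171 | CXV | CXV | 3sm=>JC >JC | >JC >JC | 63043 | VbPred | 24 | ... |
| 26 | CWR >W KFB >W <Z | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB >W <Z | 3sm=CWR KFB <Z | CWR KFB <Z | 63044 63045 63046 63047 63048 | Obj1 | 25 26 27 28 29 | ... |
| 27 | CWR >W KFB | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB | ... | ... | 63044 63045 63046 | -paral | 26 27 28 | ... |
| 28 | CWR | Leviticus | 17 | 3 | 528171 | CXV | CWR | ... | ... | 63044 | -paral | 27 | ... |
| 29 | KFB | Leviticus | 17 | 3 | 528171 | CXV | KFB | ... | ... | 63046 | -paral | 28 | ... |
| 30 | <Z | Leviticus | 17 | 3 | 528171 | CXV | <Z | ... | ... | 63048 | -paral | 29 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4063 | BRJT R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | BRJT R>CWN | ... | ... | 68924 68925 | Obj1 | 4062 4063 | ... |
| 4064 | R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | R>CWN | 0pm=R>CWN | R>CWN | 68925 | -gentf | 4063 | ... |
| 4065 | HWY>TJ | Leviticus | 26 | 45 | 529378 | JY> | JY> | 1sc=>NJ | JHWH | 68927 | VbPred | 4064 | ... |
| 4066 | sfx:M | Leviticus | 26 | 45 | 529378 | JY> | sfx | 3pm=GWJ | C>R | 68928 | Obj1 | 4065 | ... |
| 4067 | M->RY MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MN >RY MYRJM | >RY MYRJM | >RY MYRJM | 68929 68930 68931 | Compl1 | 4066 4067 | ... |
| 4068 | MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MYRJM | 3pm=MYRJM | MYRJM | 68931 | -gentf | 4067 | ... |
| 4069 | L-<JNJ H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | L <JN H GWJ | ... | ... | 68932 68933 68934 68935 | Adjunc | 4068 4069 | ... |
| 4070 | H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | H GWJ | 3pm=GWJ | GWJ | 68934 68935 | -gentf | 4069 | ... |
| 4071 | L-HJT | Leviticus | 26 | 45 | 529379 | HJH | L HJH | 1sc=>NJ | JHWH | 68936 68937 | VbPred | 4070 | ... |
| 4072 | sfx:HM | Leviticus | 26 | 45 | 529379 | HJH | sfx | 3pm=GWJ | C>R | 68938 | Compl1 | 4071 | ... |
| 4073 | L->LHJM | Leviticus | 26 | 45 | 529379 | HJH | L >LHJM | 0pm=>LHJM | JHWH | 68939 68940 | PrCompl | 4072 | ... |
| 4074 | >NJ | Leviticus | 26 | 45 | 529380 | .... | >NJ | 1sc=>NJ | JHWH | 68941 | Subj | 4073 | ... |
| 4075 | JHWH | Leviticus | 26 | 45 | 529380 | .... | JHWH | 0sm=JHWH | JHWH | 68942 | PrCompl | 4074 | ... |
| 4076 | >LH | Leviticus | 26 | 46 | 529381 | .... | >LH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68943 | Subj | 4075 | ... |
| 4077 | H-XQJM W-H-MCPVJM W-H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV W H TWRH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68944 68945 68946 68947 68948 68949 68950 68951 | PrCompl | 4076 4077 4078 4079 4080 | ... |
| 4078 | H-XQJM W-H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV | ... | ... | 68944 68945 68946 68947 68948 | -paral | 4077 4078 4079 | ... |
| 4079 | H-XQJM | Leviticus | 26 | 46 | 529381 | .... | H XQ | ... | ... | 68944 68945 | -paral | 4078 | ... |
| 4080 | H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H MCPV | ... | ... | 68947 68948 | -paral | 4079 | ... |
| 4081 | H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H TWRH | ... | ... | 68950 68951 | -paral | 4080 | ... |
| 4082 | NTN | Leviticus | 26 | 46 | 529382 | NTN | NTN | 3sm=JHWH | JHWH | 68953 | VbPred | 4081 | ... |
| 4083 | JHWH | Leviticus | 26 | 46 | 529382 | NTN | JHWH | 3sm=JHWH | JHWH | 68954 | Subj | 4082 | ... |
| 4084 | BJNW W-BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN W BJN BN JFR>L | ... | ... | 68955 68956 68957 68958 68959 | Compl1 | 4083 4084 4085 4086 4087 | ... |
| 4085 | BJNW | Leviticus | 26 | 46 | 529382 | NTN | BJN | ... | ... | 68955 | -paral | 4084 4085 | ... |
| 4086 | sfx:W | Leviticus | 26 | 46 | 529382 | NTN | sfx | 3sm=JHWH | JHWH | 68955 | -gentf | 4085 | ... |
| 4087 | BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN BN JFR>L | ... | ... | 68957 68958 68959 | -paral | 4086 4087 | ... |
| 4088 | JFR>L | Leviticus | 26 | 46 | 529382 | NTN | JFR>L | ... | ... | 68959 | -gentf | 4087 | ... |
| 4089 | B-HR SJNJ | Leviticus | 26 | 46 | 529382 | NTN | B HR SJNJ | ... | ... | 68960 68961 68962 | Locat | 4088 4089 | ... |
| 4090 | SJNJ | Leviticus | 26 | 46 | 529382 | NTN | SJNJ | ... | ... | 68962 | -gentf | 4089 | ... |
| 4091 | B-JD MCH | Leviticus | 26 | 46 | 529382 | NTN | B JD MCH | ... | ... | 68963 68964 68965 | Adjunc | 4090 4091 | ... |
| 4092 | MCH | Leviticus | 26 | 46 | 529382 | NTN | MCH | 0sm=MCH | MCH | 68965 | -gentf | 4091 | ... |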
\n", "

4092 rows × 13 columns

\n", "
" ], "text/plain": [ " surface_text book chapter verse \\\n", "1 JDBR Leviticus 17 1 \n", "2 JHWH Leviticus 17 1 \n", "3 >L MCH Leviticus 17 1 \n", "4 L->MR Leviticus 17 1 \n", "5 DBR Leviticus 17 2 \n", "6 >L >HRN W->L BNJW W->L KL BNJ JFR>L Leviticus 17 2 \n", "7 >L >HRN W->L BNJW Leviticus 17 2 \n", "8 >L >HRN Leviticus 17 2 \n", "9 >L BNJW Leviticus 17 2 \n", "10 sfx:W Leviticus 17 2 \n", "11 BNJ JFR>L Leviticus 17 2 \n", "12 JFR>L Leviticus 17 2 \n", "13 >MRT Leviticus 17 2 \n", "14 sfx:HM Leviticus 17 2 \n", "15 ZH Leviticus 17 2 \n", "16 H-DBR Leviticus 17 2 \n", "17 YWH Leviticus 17 2 \n", "18 JHWH Leviticus 17 2 \n", "19 L->MR Leviticus 17 2 \n", "20 >JC >JC Leviticus 17 3 \n", "21 >JC Leviticus 17 3 \n", "22 >JC Leviticus 17 3 \n", "23 M-BJT JFR>L Leviticus 17 3 \n", "24 JFR>L Leviticus 17 3 \n", "25 JCXV Leviticus 17 3 \n", "26 CWR >W KFB >W W KFB Leviticus 17 3 \n", "28 CWR Leviticus 17 3 \n", "29 KFB Leviticus 17 3 \n", "30 CNJM Leviticus 26 45 \n", "4064 R>CNJM Leviticus 26 45 \n", "4065 HWY>TJ Leviticus 26 45 \n", "4066 sfx:M Leviticus 26 45 \n", "4067 M->RY MYRJM Leviticus 26 45 \n", "4068 MYRJM Leviticus 26 45 \n", "4069 L-LHJM Leviticus 26 45 \n", "4074 >NJ Leviticus 26 45 \n", "4075 JHWH Leviticus 26 45 \n", "4076 >LH Leviticus 26 46 \n", "4077 H-XQJM W-H-MCPVJM W-H-TWRT Leviticus 26 46 \n", "4078 H-XQJM W-H-MCPVJM Leviticus 26 46 \n", "4079 H-XQJM Leviticus 26 46 \n", "4080 H-MCPVJM Leviticus 26 46 \n", "4081 H-TWRT Leviticus 26 46 \n", "4082 NTN Leviticus 26 46 \n", "4083 JHWH Leviticus 26 46 \n", "4084 BJNW W-BJN BNJ JFR>L Leviticus 26 46 \n", "4085 BJNW Leviticus 26 46 \n", "4086 sfx:W Leviticus 26 46 \n", "4087 BJN BNJ JFR>L Leviticus 26 46 \n", "4088 JFR>L Leviticus 26 46 \n", "4089 B-HR SJNJ Leviticus 26 46 \n", "4090 SJNJ Leviticus 26 46 \n", "4091 B-JD MCH Leviticus 26 46 \n", "4092 MCH Leviticus 26 46 \n", "\n", " clause_atom predicate reference \\\n", "1 528163 DBR DBR \n", "2 528163 DBR JHWH \n", "3 528163 DBR >L MCH \n", "4 528164 
>MR L >MR \n", "5 528165 DBR DBR \n", "6 528165 DBR >L >HRN W >L BN+S W >L KL BN JFR>L \n", "7 528165 DBR >L >HRN W >L BN+S \n", "8 528165 DBR >L >HRN \n", "9 528165 DBR >L BN+312 \n", "10 528165 DBR sfx \n", "11 528165 DBR BN JFR>L \n", "12 528165 DBR JFR>L \n", "13 528166 >MR >MR \n", "14 528166 >MR sfx \n", "15 528167 .... ZH \n", "16 528167 .... H DBR \n", "17 528168 YWH YWH \n", "18 528168 YWH JHWH \n", "19 528169 >MR L >MR \n", "20 528170 .... >JC >JC \n", "21 528170 .... >JC \n", "22 528170 .... >JC \n", "23 528170 .... MN BJT JFR>L \n", "24 528170 .... JFR>L \n", "25 528171 CXV CXV \n", "26 528171 CXV CWR >W KFB >W W KFB \n", "28 528171 CXV CWR \n", "29 528171 CXV KFB \n", "30 528171 CXV CWN \n", "4064 529377 ZKR R>CWN \n", "4065 529378 JY> JY> \n", "4066 529378 JY> sfx \n", "4067 529378 JY> MN >RY MYRJM \n", "4068 529378 JY> MYRJM \n", "4069 529378 JY> L H GWJ \n", "4071 529379 HJH L HJH \n", "4072 529379 HJH sfx \n", "4073 529379 HJH L >LHJM \n", "4074 529380 .... >NJ \n", "4075 529380 .... JHWH \n", "4076 529381 .... >LH \n", "4077 529381 .... H XQ W H MCPV W H TWRH \n", "4078 529381 .... H XQ W H MCPV \n", "4079 529381 .... H XQ \n", "4080 529381 .... H MCPV \n", "4081 529381 .... H TWRH \n", "4082 529382 NTN NTN \n", "4083 529382 NTN JHWH \n", "4084 529382 NTN BJN W BJN BN JFR>L \n", "4085 529382 NTN BJN \n", "4086 529382 NTN sfx \n", "4087 529382 NTN BJN BN JFR>L \n", "4088 529382 NTN JFR>L \n", "4089 529382 NTN B HR SJNJ \n", "4090 529382 NTN SJNJ \n", "4091 529382 NTN B JD MCH \n", "4092 529382 NTN MCH \n", "\n", " participant actor \\\n", "1 3sm=JHWH JHWH \n", "2 3sm=JHWH JHWH \n", "3 0sm=MCH MCH \n", "4 3sm=JHWH JHWH \n", "5 2sm= MCH \n", "6 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN BN JFR>L \n", "7 ... ... \n", "8 3sm=>HRN >HRN \n", "9 ... ... 
\n", "10 3sm=>HRN >HRN \n", "11 0pm=BN JFR>L BN JFR>L \n", "12 JFR>L JFR>L \n", "13 2sm= MCH \n", "14 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN BN JFR>L \n", "15 0sm=DBR DBR \n", "16 0sm=DBR DBR \n", "17 3sm=JHWH JHWH \n", "18 3sm=JHWH JHWH \n", "19 3sm=JHWH JHWH \n", "20 3sm=>JC >JC >JC >JC \n", "21 ... ... \n", "22 0sm=>JC >JC \n", "23 0sm=BJT JFR>L BJT JFR>L \n", "24 JFR>L JFR>L \n", "25 3sm=>JC >JC >JC >JC \n", "26 3sm=CWR KFB CWN R>CWN \n", "4065 1sc=>NJ JHWH \n", "4066 3pm=GWJ C>R \n", "4067 >RY MYRJM >RY MYRJM \n", "4068 3pm=MYRJM MYRJM \n", "4069 ... ... \n", "4070 3pm=GWJ GWJ \n", "4071 1sc=>NJ JHWH \n", "4072 3pm=GWJ C>R \n", "4073 0pm=>LHJM JHWH \n", "4074 1sc=>NJ JHWH \n", "4075 0sm=JHWH JHWH \n", "4076 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4077 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4078 ... ... \n", "4079 ... ... \n", "4080 ... ... \n", "4081 ... ... \n", "4082 3sm=JHWH JHWH \n", "4083 3sm=JHWH JHWH \n", "4084 ... ... \n", "4085 ... ... \n", "4086 3sm=JHWH JHWH \n", "4087 ... ... \n", "4088 ... ... \n", "4089 ... ... \n", "4090 ... ... \n", "4091 ... ... \n", "4092 0sm=MCH MCH \n", "\n", " slots func \\\n", "1 63009 VbPred \n", "2 63010 Subj \n", "3 63011 63012 Compl1 \n", "4 63013 63014 VbPred \n", "5 63015 VbPred \n", "6 63016 63017 63018 63019 63020 63021 63022 6302... Compl1 \n", "7 63016 63017 63018 63019 63020 -paral \n", "8 63016 63017 -paral \n", "9 63019 63020 -paral \n", "10 63020 -gentf \n", "11 63024 63025 -gentf \n", "12 63025 -gentf \n", "13 63027 VbPred \n", "14 63028 Compl1 \n", "15 63029 Subj \n", "16 63030 63031 PrCompl \n", "17 63033 VbPred \n", "18 63034 Subj \n", "19 63035 63036 VbPred \n", "20 63037 63038 572 \n", "21 63037 -paral \n", "22 63038 -paral \n", "23 63039 63040 63041 -specf \n", "24 63041 -gentf \n", "25 63043 VbPred \n", "26 63044 63045 63046 63047 63048 Obj1 \n", "27 63044 63045 63046 -paral \n", "28 63044 -paral \n", "29 63046 -paral \n", "30 63048 -paral \n", "... ... ... 
\n", "4063 68924 68925 Obj1 \n", "4064 68925 -gentf \n", "4065 68927 VbPred \n", "4066 68928 Obj1 \n", "4067 68929 68930 68931 Compl1 \n", "4068 68931 -gentf \n", "4069 68932 68933 68934 68935 Adjunc \n", "4070 68934 68935 -gentf \n", "4071 68936 68937 VbPred \n", "4072 68938 Compl1 \n", "4073 68939 68940 PrCompl \n", "4074 68941 Subj \n", "4075 68942 PrCompl \n", "4076 68943 Subj \n", "4077 68944 68945 68946 68947 68948 68949 68950 68951 PrCompl \n", "4078 68944 68945 68946 68947 68948 -paral \n", "4079 68944 68945 -paral \n", "4080 68947 68948 -paral \n", "4081 68950 68951 -paral \n", "4082 68953 VbPred \n", "4083 68954 Subj \n", "4084 68955 68956 68957 68958 68959 Compl1 \n", "4085 68955 -paral \n", "4086 68955 -gentf \n", "4087 68957 68958 68959 -paral \n", "4088 68959 -gentf \n", "4089 68960 68961 68962 Locat \n", "4090 68962 -gentf \n", "4091 68963 68964 68965 Adjunc \n", "4092 68965 -gentf \n", "\n", " compound otype \n", "1 0 ... \n", "2 1 ... \n", "3 2 ... \n", "4 3 ... \n", "5 4 ... \n", "6 5 6 7 8 9 10 11 ... \n", "7 6 7 8 9 ... \n", "8 7 ... \n", "9 8 9 ... \n", "10 9 ... \n", "11 10 11 ... \n", "12 11 ... \n", "13 12 ... \n", "14 13 ... \n", "15 14 ... \n", "16 15 ... \n", "17 16 ... \n", "18 17 ... \n", "19 18 ... \n", "20 19 20 21 ... \n", "21 20 ... \n", "22 21 ... \n", "23 22 23 ... \n", "24 23 ... \n", "25 24 ... \n", "26 25 26 27 28 29 ... \n", "27 26 27 28 ... \n", "28 27 ... \n", "29 28 ... \n", "30 29 ... \n", "... ... ... \n", "4063 4062 4063 ... \n", "4064 4063 ... \n", "4065 4064 ... \n", "4066 4065 ... \n", "4067 4066 4067 ... \n", "4068 4067 ... \n", "4069 4068 4069 ... \n", "4070 4069 ... \n", "4071 4070 ... \n", "4072 4071 ... \n", "4073 4072 ... \n", "4074 4073 ... \n", "4075 4074 ... \n", "4076 4075 ... \n", "4077 4076 4077 4078 4079 4080 ... \n", "4078 4077 4078 4079 ... \n", "4079 4078 ... \n", "4080 4079 ... \n", "4081 4080 ... \n", "4082 4081 ... \n", "4083 4082 ... \n", "4084 4083 4084 4085 4086 4087 ... \n", "4085 4084 4085 ... 
\n", "4086 4085 ... \n", "4087 4086 4087 ... \n", "4088 4087 ... \n", "4089 4088 4089 ... \n", "4090 4089 ... \n", "4091 4090 4091 ... \n", "4092 4091 ... \n", "\n", "[4092 rows x 13 columns]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Finding nearest object type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When exporting the actor references as TF-features, we need to know which object (word, subphrase or phrase atom) each actor reference applies to. The dataset does not record this directly, because the actors are only related to an interval of slots. The following function finds the object that matches the slots occupied by the reference. It works from the smallest type (a single word slot, for suffixes) to the larger types (subphrases, then phrase atoms). A subphrase matches when its first and last words coincide with the first and last words of the actor reference; if no subphrase matches, the phrase atom containing the first word is accepted when its last word coincides with the last word of the reference:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [
"def nearestObject(row):\n",
"    '''\n",
"    Input: row number (index label) in the dataframe `data`\n",
"    Output: the node of the TF object (word slot, subphrase or phrase atom) that matches\n",
"            the interval of words occupied by the actor reference, or '' if none matches.\n",
"    '''\n",
"    slots = data['slots'][row].split()\n",
"    first_word = int(slots[0]) #First word of the actor reference\n",
"    last_word = int(slots[-1]) #Last word of the actor reference\n",
"\n",
"    subphrase = L.u(first_word, 'subphrase') #The subphrases to which the first word belongs (may be empty)\n",
"    phrase_atom = L.u(first_word, 'phrase_atom')[0] #The phrase atom to which the first word belongs\n",
"\n",
"    nearest_object = ''\n",
"\n",
"    if data['reference'][row] == 'sfx': #A suffix occupies only one word slot, so the slot itself is used.\n",
"        nearest_object = first_word\n",
"\n",
"    elif subphrase: #The first word does not always belong to a subphrase.\n",
"        for ph in subphrase:\n",
"            subphrase_words = L.d(ph, 'word')\n",
"            #Checking whether the first and last words of the actor match those of the subphrase.\n",
"            if subphrase_words[0] == first_word and subphrase_words[-1] == last_word:\n",
"                nearest_object = ph\n",
"\n",
"    if nearest_object == '': #If no object has been found yet, we check the phrase atom level.\n",
"        phrase_words = L.d(phrase_atom, 'word')\n",
"        if phrase_words[-1] == last_word: #Only the last word is compared at this level.\n",
"            nearest_object = phrase_atom\n",
"\n",
"    return nearest_object\n",
"\n",
"#nearestObject(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `otype` column of the dataframe is now filled with the node of the object that matches the slots occupied by each actor reference. If no matching object is found, the row is added to `error_list`, so that possible mismatches can be identified:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of errors: 0\n" ] } ], "source": [
"error_list = []\n",
"\n",
"for row in data.index:\n",
"    if data.loc[row, 'actor'] != '...': #Rows without an actor are skipped because they are not relevant.\n",
"        nearest_object = nearestObject(row) #The function nearestObject() is run for every remaining row.\n",
"        if nearest_object == '':\n",
"            error_list.append(row) #If no object matches the actor reference, the row is added to error_list.\n",
"        else:\n",
"            data.loc[row, 'otype'] = nearest_object #Label-based assignment avoids pandas' chained-assignment pitfall.\n",
"\n",
"print(f'Number of errors: {len(error_list)}')" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
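| | surface_text | book | chapter | verse | clause_atom | predicate | reference | participant | actor | slots | func | compound | otype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | JDBR | Leviticus | 17 | 1 | 528163 | DBR | DBR | 3sm=JHWH | JHWH | 63009 | VbPred | 0 | 943175 |
| 2 | JHWH | Leviticus | 17 | 1 | 528163 | DBR | JHWH | 3sm=JHWH | JHWH | 63010 | Subj | 1 | 943176 |
| 3 | >L MCH | Leviticus | 17 | 1 | 528163 | DBR | >L MCH | 0sm=MCH | MCH | 63011 63012 | Compl1 | 2 | 943177 |
| 4 | L->MR | Leviticus | 17 | 1 | 528164 | >MR | L >MR | 3sm=JHWH | JHWH | 63013 63014 | VbPred | 3 | 943178 |
| 5 | DBR | Leviticus | 17 | 2 | 528165 | DBR | DBR | 2sm= | MCH | 63015 | VbPred | 4 | 943179 |
| 6 | >L >HRN W->L BNJW W->L KL BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S W >L KL BN JFR>L | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63016 63017 63018 63019 63020 63021 63022 6302... | Compl1 | 5 6 7 8 9 10 11 | 943180 |
| 7 | >L >HRN W->L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S | ... | ... | 63016 63017 63018 63019 63020 | -paral | 6 7 8 9 | ... |
| 8 | >L >HRN | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN | 3sm=>HRN | >HRN | 63016 63017 | -paral | 7 | 1317252 |
| 9 | >L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L BN+312 | ... | ... | 63019 63020 | -paral | 8 9 | ... |
| 10 | sfx:W | Leviticus | 17 | 2 | 528165 | DBR | sfx | 3sm=>HRN | >HRN | 63020 | -gentf | 9 | 63020 |
| 11 | BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | BN JFR>L | 0pm=BN JFR>L | BN JFR>L | 63024 63025 | -gentf | 10 11 | 1317258 |
| 12 | JFR>L | Leviticus | 17 | 2 | 528165 | DBR | JFR>L | JFR>L | JFR>L | 63025 | -gentf | 11 | 1317257 |
| 13 | >MRT | Leviticus | 17 | 2 | 528166 | >MR | >MR | 2sm= | MCH | 63027 | VbPred | 12 | 943182 |
| 14 | sfx:HM | Leviticus | 17 | 2 | 528166 | >MR | sfx | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63028 | Compl1 | 13 | 63028 |
| 15 | ZH | Leviticus | 17 | 2 | 528167 | .... | ZH | 0sm=DBR | DBR | 63029 | Subj | 14 | 943184 |
| 16 | H-DBR | Leviticus | 17 | 2 | 528167 | .... | H DBR | 0sm=DBR | DBR | 63030 63031 | PrCompl | 15 | 943185 |
| 17 | YWH | Leviticus | 17 | 2 | 528168 | YWH | YWH | 3sm=JHWH | JHWH | 63033 | VbPred | 16 | 943187 |
| 18 | JHWH | Leviticus | 17 | 2 | 528168 | YWH | JHWH | 3sm=JHWH | JHWH | 63034 | Subj | 17 | 943188 |
| 19 | L->MR | Leviticus | 17 | 2 | 528169 | >MR | L >MR | 3sm=JHWH | JHWH | 63035 63036 | VbPred | 18 | 943189 |
| 20 | >JC >JC | Leviticus | 17 | 3 | 528170 | .... | >JC >JC | 3sm=>JC >JC | >JC >JC | 63037 63038 | 572 | 19 20 21 | 943190 |
| 21 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | ... | ... | 63037 | -paral | 20 | ... |
| 22 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | 0sm=>JC | >JC | 63038 | -paral | 21 | 1317261 |
| 23 | M-BJT JFR>L | Leviticus | 17 | 3 | 528170 | .... | MN BJT JFR>L | 0sm=BJT JFR>L | BJT JFR>L | 63039 63040 63041 | -specf | 22 23 | 943191 |
| 24 | JFR>L | Leviticus | 17 | 3 | 528170 | .... | JFR>L | JFR>L | JFR>L | 63041 | -gentf | 23 | 1317263 |
| 25 | JCXV | Leviticus | 17 | 3 | 528171 | CXV | CXV | 3sm=>JC >JC | >JC >JC | 63043 | VbPred | 24 | 943193 |
| 26 | CWR >W KFB >W <Z | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB >W <Z | 3sm=CWR KFB <Z | CWR KFB <Z | 63044 63045 63046 63047 63048 | Obj1 | 25 26 27 28 29 | 943194 |
| 27 | CWR >W KFB | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB | ... | ... | 63044 63045 63046 | -paral | 26 27 28 | ... |
| 28 | CWR | Leviticus | 17 | 3 | 528171 | CXV | CWR | ... | ... | 63044 | -paral | 27 | ... |
| 29 | KFB | Leviticus | 17 | 3 | 528171 | CXV | KFB | ... | ... | 63046 | -paral | 28 | ... |
| 30 | <Z | Leviticus | 17 | 3 | 528171 | CXV | <Z | ... | ... | 63048 | -paral | 29 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4063 | BRJT R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | BRJT R>CWN | ... | ... | 68924 68925 | Obj1 | 4062 4063 | ... |
| 4064 | R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | R>CWN | 0pm=R>CWN | R>CWN | 68925 | -gentf | 4063 | 1318757 |
| 4065 | HWY>TJ | Leviticus | 26 | 45 | 529378 | JY> | JY> | 1sc=>NJ | JHWH | 68927 | VbPred | 4064 | 946968 |
| 4066 | sfx:M | Leviticus | 26 | 45 | 529378 | JY> | sfx | 3pm=GWJ | C>R | 68928 | Obj1 | 4065 | 68928 |
| 4067 | M->RY MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MN >RY MYRJM | >RY MYRJM | >RY MYRJM | 68929 68930 68931 | Compl1 | 4066 4067 | 946970 |
| 4068 | MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MYRJM | 3pm=MYRJM | MYRJM | 68931 | -gentf | 4067 | 1318759 |
| 4069 | L-<JNJ H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | L <JN H GWJ | ... | ... | 68932 68933 68934 68935 | Adjunc | 4068 4069 | ... |
| 4070 | H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | H GWJ | 3pm=GWJ | GWJ | 68934 68935 | -gentf | 4069 | 1318761 |
| 4071 | L-HJT | Leviticus | 26 | 45 | 529379 | HJH | L HJH | 1sc=>NJ | JHWH | 68936 68937 | VbPred | 4070 | 946972 |
| 4072 | sfx:HM | Leviticus | 26 | 45 | 529379 | HJH | sfx | 3pm=GWJ | C>R | 68938 | Compl1 | 4071 | 68938 |
| 4073 | L->LHJM | Leviticus | 26 | 45 | 529379 | HJH | L >LHJM | 0pm=>LHJM | JHWH | 68939 68940 | PrCompl | 4072 | 946974 |
| 4074 | >NJ | Leviticus | 26 | 45 | 529380 | .... | >NJ | 1sc=>NJ | JHWH | 68941 | Subj | 4073 | 946975 |
| 4075 | JHWH | Leviticus | 26 | 45 | 529380 | .... | JHWH | 0sm=JHWH | JHWH | 68942 | PrCompl | 4074 | 946976 |
| 4076 | >LH | Leviticus | 26 | 46 | 529381 | .... | >LH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68943 | Subj | 4075 | 946977 |
| 4077 | H-XQJM W-H-MCPVJM W-H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV W H TWRH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68944 68945 68946 68947 68948 68949 68950 68951 | PrCompl | 4076 4077 4078 4079 4080 | 946978 |
| 4078 | H-XQJM W-H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV | ... | ... | 68944 68945 68946 68947 68948 | -paral | 4077 4078 4079 | ... |
| 4079 | H-XQJM | Leviticus | 26 | 46 | 529381 | .... | H XQ | ... | ... | 68944 68945 | -paral | 4078 | ... |
| 4080 | H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H MCPV | ... | ... | 68947 68948 | -paral | 4079 | ... |
| 4081 | H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H TWRH | ... | ... | 68950 68951 | -paral | 4080 | ... |
| 4082 | NTN | Leviticus | 26 | 46 | 529382 | NTN | NTN | 3sm=JHWH | JHWH | 68953 | VbPred | 4081 | 946980 |
| 4083 | JHWH | Leviticus | 26 | 46 | 529382 | NTN | JHWH | 3sm=JHWH | JHWH | 68954 | Subj | 4082 | 946981 |
| 4084 | BJNW W-BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN W BJN BN JFR>L | ... | ... | 68955 68956 68957 68958 68959 | Compl1 | 4083 4084 4085 4086 4087 | ... |
| 4085 | BJNW | Leviticus | 26 | 46 | 529382 | NTN | BJN | ... | ... | 68955 | -paral | 4084 4085 | ... |
| 4086 | sfx:W | Leviticus | 26 | 46 | 529382 | NTN | sfx | 3sm=JHWH | JHWH | 68955 | -gentf | 4085 | 68955 |
| 4087 | BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN BN JFR>L | ... | ... | 68957 68958 68959 | -paral | 4086 4087 | ... |
| 4088 | JFR>L | Leviticus | 26 | 46 | 529382 | NTN | JFR>L | ... | ... | 68959 | -gentf | 4087 | ... |
| 4089 | B-HR SJNJ | Leviticus | 26 | 46 | 529382 | NTN | B HR SJNJ | ... | ... | 68960 68961 68962 | Locat | 4088 4089 | ... |
| 4090 | SJNJ | Leviticus | 26 | 46 | 529382 | NTN | SJNJ | ... | ... | 68962 | -gentf | 4089 | ... |
| 4091 | B-JD MCH | Leviticus | 26 | 46 | 529382 | NTN | B JD MCH | ... | ... | 68963 68964 68965 | Adjunc | 4090 4091 | ... |
| 4092 | MCH | Leviticus | 26 | 46 | 529382 | NTN | MCH | 0sm=MCH | MCH | 68965 | -gentf | 4091 | 1318773 |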
\n", "

4092 rows × 13 columns

\n", "
" ], "text/plain": [ " surface_text book chapter verse \\\n", "1 JDBR Leviticus 17 1 \n", "2 JHWH Leviticus 17 1 \n", "3 >L MCH Leviticus 17 1 \n", "4 L->MR Leviticus 17 1 \n", "5 DBR Leviticus 17 2 \n", "6 >L >HRN W->L BNJW W->L KL BNJ JFR>L Leviticus 17 2 \n", "7 >L >HRN W->L BNJW Leviticus 17 2 \n", "8 >L >HRN Leviticus 17 2 \n", "9 >L BNJW Leviticus 17 2 \n", "10 sfx:W Leviticus 17 2 \n", "11 BNJ JFR>L Leviticus 17 2 \n", "12 JFR>L Leviticus 17 2 \n", "13 >MRT Leviticus 17 2 \n", "14 sfx:HM Leviticus 17 2 \n", "15 ZH Leviticus 17 2 \n", "16 H-DBR Leviticus 17 2 \n", "17 YWH Leviticus 17 2 \n", "18 JHWH Leviticus 17 2 \n", "19 L->MR Leviticus 17 2 \n", "20 >JC >JC Leviticus 17 3 \n", "21 >JC Leviticus 17 3 \n", "22 >JC Leviticus 17 3 \n", "23 M-BJT JFR>L Leviticus 17 3 \n", "24 JFR>L Leviticus 17 3 \n", "25 JCXV Leviticus 17 3 \n", "26 CWR >W KFB >W W KFB Leviticus 17 3 \n", "28 CWR Leviticus 17 3 \n", "29 KFB Leviticus 17 3 \n", "30 CNJM Leviticus 26 45 \n", "4064 R>CNJM Leviticus 26 45 \n", "4065 HWY>TJ Leviticus 26 45 \n", "4066 sfx:M Leviticus 26 45 \n", "4067 M->RY MYRJM Leviticus 26 45 \n", "4068 MYRJM Leviticus 26 45 \n", "4069 L-LHJM Leviticus 26 45 \n", "4074 >NJ Leviticus 26 45 \n", "4075 JHWH Leviticus 26 45 \n", "4076 >LH Leviticus 26 46 \n", "4077 H-XQJM W-H-MCPVJM W-H-TWRT Leviticus 26 46 \n", "4078 H-XQJM W-H-MCPVJM Leviticus 26 46 \n", "4079 H-XQJM Leviticus 26 46 \n", "4080 H-MCPVJM Leviticus 26 46 \n", "4081 H-TWRT Leviticus 26 46 \n", "4082 NTN Leviticus 26 46 \n", "4083 JHWH Leviticus 26 46 \n", "4084 BJNW W-BJN BNJ JFR>L Leviticus 26 46 \n", "4085 BJNW Leviticus 26 46 \n", "4086 sfx:W Leviticus 26 46 \n", "4087 BJN BNJ JFR>L Leviticus 26 46 \n", "4088 JFR>L Leviticus 26 46 \n", "4089 B-HR SJNJ Leviticus 26 46 \n", "4090 SJNJ Leviticus 26 46 \n", "4091 B-JD MCH Leviticus 26 46 \n", "4092 MCH Leviticus 26 46 \n", "\n", " clause_atom predicate reference \\\n", "1 528163 DBR DBR \n", "2 528163 DBR JHWH \n", "3 528163 DBR >L MCH \n", "4 528164 
>MR L >MR \n", "5 528165 DBR DBR \n", "6 528165 DBR >L >HRN W >L BN+S W >L KL BN JFR>L \n", "7 528165 DBR >L >HRN W >L BN+S \n", "8 528165 DBR >L >HRN \n", "9 528165 DBR >L BN+312 \n", "10 528165 DBR sfx \n", "11 528165 DBR BN JFR>L \n", "12 528165 DBR JFR>L \n", "13 528166 >MR >MR \n", "14 528166 >MR sfx \n", "15 528167 .... ZH \n", "16 528167 .... H DBR \n", "17 528168 YWH YWH \n", "18 528168 YWH JHWH \n", "19 528169 >MR L >MR \n", "20 528170 .... >JC >JC \n", "21 528170 .... >JC \n", "22 528170 .... >JC \n", "23 528170 .... MN BJT JFR>L \n", "24 528170 .... JFR>L \n", "25 528171 CXV CXV \n", "26 528171 CXV CWR >W KFB >W W KFB \n", "28 528171 CXV CWR \n", "29 528171 CXV KFB \n", "30 528171 CXV CWN \n", "4064 529377 ZKR R>CWN \n", "4065 529378 JY> JY> \n", "4066 529378 JY> sfx \n", "4067 529378 JY> MN >RY MYRJM \n", "4068 529378 JY> MYRJM \n", "4069 529378 JY> L H GWJ \n", "4071 529379 HJH L HJH \n", "4072 529379 HJH sfx \n", "4073 529379 HJH L >LHJM \n", "4074 529380 .... >NJ \n", "4075 529380 .... JHWH \n", "4076 529381 .... >LH \n", "4077 529381 .... H XQ W H MCPV W H TWRH \n", "4078 529381 .... H XQ W H MCPV \n", "4079 529381 .... H XQ \n", "4080 529381 .... H MCPV \n", "4081 529381 .... H TWRH \n", "4082 529382 NTN NTN \n", "4083 529382 NTN JHWH \n", "4084 529382 NTN BJN W BJN BN JFR>L \n", "4085 529382 NTN BJN \n", "4086 529382 NTN sfx \n", "4087 529382 NTN BJN BN JFR>L \n", "4088 529382 NTN JFR>L \n", "4089 529382 NTN B HR SJNJ \n", "4090 529382 NTN SJNJ \n", "4091 529382 NTN B JD MCH \n", "4092 529382 NTN MCH \n", "\n", " participant actor \\\n", "1 3sm=JHWH JHWH \n", "2 3sm=JHWH JHWH \n", "3 0sm=MCH MCH \n", "4 3sm=JHWH JHWH \n", "5 2sm= MCH \n", "6 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN BN JFR>L \n", "7 ... ... \n", "8 3sm=>HRN >HRN \n", "9 ... ... 
\n", "10 3sm=>HRN >HRN \n", "11 0pm=BN JFR>L BN JFR>L \n", "12 JFR>L JFR>L \n", "13 2sm= MCH \n", "14 3pm=>HRN BN+>HRN FR>L >HRN BN >HRN BN JFR>L \n", "15 0sm=DBR DBR \n", "16 0sm=DBR DBR \n", "17 3sm=JHWH JHWH \n", "18 3sm=JHWH JHWH \n", "19 3sm=JHWH JHWH \n", "20 3sm=>JC >JC >JC >JC \n", "21 ... ... \n", "22 0sm=>JC >JC \n", "23 0sm=BJT JFR>L BJT JFR>L \n", "24 JFR>L JFR>L \n", "25 3sm=>JC >JC >JC >JC \n", "26 3sm=CWR KFB CWN R>CWN \n", "4065 1sc=>NJ JHWH \n", "4066 3pm=GWJ C>R \n", "4067 >RY MYRJM >RY MYRJM \n", "4068 3pm=MYRJM MYRJM \n", "4069 ... ... \n", "4070 3pm=GWJ GWJ \n", "4071 1sc=>NJ JHWH \n", "4072 3pm=GWJ C>R \n", "4073 0pm=>LHJM JHWH \n", "4074 1sc=>NJ JHWH \n", "4075 0sm=JHWH JHWH \n", "4076 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4077 0pm=XQ MCPV TWRH XQ MCPV TWRH \n", "4078 ... ... \n", "4079 ... ... \n", "4080 ... ... \n", "4081 ... ... \n", "4082 3sm=JHWH JHWH \n", "4083 3sm=JHWH JHWH \n", "4084 ... ... \n", "4085 ... ... \n", "4086 3sm=JHWH JHWH \n", "4087 ... ... \n", "4088 ... ... \n", "4089 ... ... \n", "4090 ... ... \n", "4091 ... ... \n", "4092 0sm=MCH MCH \n", "\n", " slots func \\\n", "1 63009 VbPred \n", "2 63010 Subj \n", "3 63011 63012 Compl1 \n", "4 63013 63014 VbPred \n", "5 63015 VbPred \n", "6 63016 63017 63018 63019 63020 63021 63022 6302... Compl1 \n", "7 63016 63017 63018 63019 63020 -paral \n", "8 63016 63017 -paral \n", "9 63019 63020 -paral \n", "10 63020 -gentf \n", "11 63024 63025 -gentf \n", "12 63025 -gentf \n", "13 63027 VbPred \n", "14 63028 Compl1 \n", "15 63029 Subj \n", "16 63030 63031 PrCompl \n", "17 63033 VbPred \n", "18 63034 Subj \n", "19 63035 63036 VbPred \n", "20 63037 63038 572 \n", "21 63037 -paral \n", "22 63038 -paral \n", "23 63039 63040 63041 -specf \n", "24 63041 -gentf \n", "25 63043 VbPred \n", "26 63044 63045 63046 63047 63048 Obj1 \n", "27 63044 63045 63046 -paral \n", "28 63044 -paral \n", "29 63046 -paral \n", "30 63048 -paral \n", "... ... ... 
\n", "4063 68924 68925 Obj1 \n", "4064 68925 -gentf \n", "4065 68927 VbPred \n", "4066 68928 Obj1 \n", "4067 68929 68930 68931 Compl1 \n", "4068 68931 -gentf \n", "4069 68932 68933 68934 68935 Adjunc \n", "4070 68934 68935 -gentf \n", "4071 68936 68937 VbPred \n", "4072 68938 Compl1 \n", "4073 68939 68940 PrCompl \n", "4074 68941 Subj \n", "4075 68942 PrCompl \n", "4076 68943 Subj \n", "4077 68944 68945 68946 68947 68948 68949 68950 68951 PrCompl \n", "4078 68944 68945 68946 68947 68948 -paral \n", "4079 68944 68945 -paral \n", "4080 68947 68948 -paral \n", "4081 68950 68951 -paral \n", "4082 68953 VbPred \n", "4083 68954 Subj \n", "4084 68955 68956 68957 68958 68959 Compl1 \n", "4085 68955 -paral \n", "4086 68955 -gentf \n", "4087 68957 68958 68959 -paral \n", "4088 68959 -gentf \n", "4089 68960 68961 68962 Locat \n", "4090 68962 -gentf \n", "4091 68963 68964 68965 Adjunc \n", "4092 68965 -gentf \n", "\n", " compound otype \n", "1 0 943175 \n", "2 1 943176 \n", "3 2 943177 \n", "4 3 943178 \n", "5 4 943179 \n", "6 5 6 7 8 9 10 11 943180 \n", "7 6 7 8 9 ... \n", "8 7 1317252 \n", "9 8 9 ... \n", "10 9 63020 \n", "11 10 11 1317258 \n", "12 11 1317257 \n", "13 12 943182 \n", "14 13 63028 \n", "15 14 943184 \n", "16 15 943185 \n", "17 16 943187 \n", "18 17 943188 \n", "19 18 943189 \n", "20 19 20 21 943190 \n", "21 20 ... \n", "22 21 1317261 \n", "23 22 23 943191 \n", "24 23 1317263 \n", "25 24 943193 \n", "26 25 26 27 28 29 943194 \n", "27 26 27 28 ... \n", "28 27 ... \n", "29 28 ... \n", "30 29 ... \n", "... ... ... \n", "4063 4062 4063 ... \n", "4064 4063 1318757 \n", "4065 4064 946968 \n", "4066 4065 68928 \n", "4067 4066 4067 946970 \n", "4068 4067 1318759 \n", "4069 4068 4069 ... \n", "4070 4069 1318761 \n", "4071 4070 946972 \n", "4072 4071 68938 \n", "4073 4072 946974 \n", "4074 4073 946975 \n", "4075 4074 946976 \n", "4076 4075 946977 \n", "4077 4076 4077 4078 4079 4080 946978 \n", "4078 4077 4078 4079 ... \n", "4079 4078 ... \n", "4080 4079 ... 
\n", "4081 4080 ... \n", "4082 4081 946980 \n", "4083 4082 946981 \n", "4084 4083 4084 4085 4086 4087 ... \n", "4085 4084 4085 ... \n", "4086 4085 68955 \n", "4087 4086 4087 ... \n", "4088 4087 ... \n", "4089 4088 4089 ... \n", "4090 4089 ... \n", "4091 4090 4091 ... \n", "4092 4091 1318773 \n", "\n", "[4092 rows x 13 columns]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2. Computing edges\n", "\n", "Next, we need to find the edges between co-referring actor references. The idea is to collect, for each actor, the nodes in the 'otype' column that refer to that actor within the same chapter.\n", "\n", "First, we define a function that takes an actor reference and a chapter as input and returns a list of all nodes in the dataframe that refer to the same actor in that chapter:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "scrolled": true }, "outputs": [], "source": [ "def mapActors(actor, chapter):\n", "    '''\n", "    Input: Actor reference (string) and chapter (int)\n", "    Output: List of nodes with the same actor reference in that particular chapter\n", "    '''\n", "    subset_data = data[(data.chapter == str(chapter)) & (data.actor == actor)]\n", "    otype = subset_data.otype.values.tolist()\n", "\n", "    return otype\n", "\n", "#mapActors('>X BN JFR>L',25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function above is used to create a dictionary of edges for all actor references in the dataset:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "edges_dict = {}\n", "\n", "for row in data.index: #Looping over the row labels of the dataframe\n", "    actor = data['actor'][row]\n", "    if actor != '...': #Excluding rows with empty actors\n", "        edges = mapActors(actor, data['chapter'][row]) #A list of edges is created with the function mapActors() for each row\n", "        edges.remove(data['otype'][row]) #The node of the present row is removed from 
the edges list so that a node is not linked to itself.\n", "\n", "        #If any co-referring nodes remain, they are added to the dictionary as a set, with the otype node of the row as key:\n", "        if set(edges):\n", "            edges_dict[data['otype'][row]] = set(edges)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "#edges_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Exporting as TF features\n", "\n", "Now we can export the actor references and the edge dictionary as TF-features. First, we set the TF version name and the paths so that the features are stored in the right place:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "if 'SCRIPT' not in locals():\n", "    SCRIPT = False\n", "    FORCE = True\n", "    CORE_NAME = 'bhsa'\n", "    NAME = 'actor'\n", "    VERSION = 'c'\n", "    CORE_MODULE = 'core'" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "repoBase = os.path.expanduser('~/text-fabric-data/etcbc')\n", "coreTf = '{}/{}/tf/{}'.format(repoBase, CORE_NAME, VERSION) #Path of the core TF datasets\n", "thisTf = '~Feature_sets/{}/tf/{}'.format(NAME, VERSION) #Path of actor datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1. Export lexical and suffix references\n", "\n", "To export the lexical and suffix references as TF-features, they first need to be stored in dictionaries that map TF nodes to actors. 
We loop through the dataset, distinguish suffix references from non-suffix references, and store each TF node with its actor in the relevant dictionary:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "suffix_dict = {}\n", "lexical_dict = {}\n", "\n", "for row in data.index: #Looping over the row labels of the dataframe\n", "    if data['actor'][row] != '...': #Excluding empty actors\n", "        if data['reference'][row] == 'sfx': #If suffix, the word node (always exactly one word node) is stored in the dict.\n", "            node = int(data['slots'][row])\n", "            suffix_dict[node] = data['actor'][row]\n", "        else: #If not suffix, the nearest object is found using nearestObject() and the resulting node is stored.\n", "            node = nearestObject(row)\n", "            lexical_dict[node] = data['actor'][row]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.1.1. Export lexical references" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | 0.10s T actor to actor/tf/c\n" ] } ], "source": [ "nodeFeatures = dict(actor=lexical_dict)\n", "metaData = dict(\n", "    actor=dict(\n", "        valueType='str',\n", "        description=\"Participant references for words, subphrases and phrases. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\",\n", "        coreData='BHSA',\n", "        coreVersion=VERSION\n", "    )\n", ")\n", "TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | 0.03s T prs_actor to actor/tf/c\n" ] } ], "source": [ "nodeFeatures = dict(prs_actor=suffix_dict)\n", "metaData = dict(\n", "    prs_actor=dict(\n", "        valueType='str',\n", "        description=\"Participant references for pronominal suffixes. The references are adapted from Eep Talstra's work on participant tracking. 
http://doi.org/10.5281/zenodo.1479491\",\n", "        coreData='BHSA',\n", "        coreVersion=VERSION\n", "    )\n", ")\n", "TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2. Export edge features" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | 0.13s T coref to actor/tf/c\n" ] } ], "source": [ "edgeFeatures = dict(coref=edges_dict)\n", "metaData = dict(\n", "    coref=dict(\n", "        valueType='str',\n", "        description=\"Edges to co-referring actors at chapter level. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491\",\n", "        coreData='BHSA',\n", "        coreVersion=VERSION\n", "    )\n", ")\n", "TF.save(edgeFeatures=edgeFeatures, metaData=metaData, module='c')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }