{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Social Network Analysis of Leviticus 17-26\n", "\n", "This notebook combines the participant references and semantic roles computed in other phases of this research project. The two data types are combined to create a social network model of the data and to explore this model with social network analysis tools. The first SNA measures are given in this notebook, while more detailed studies of participant roles are reserved for other notebooks in this repo.\n", "\n", "**Content**\n", "1. Import of data\n", "2. Cross-tabulating participant and semantic roles\n", "3. Creation of network model\n", "4. Validation of the model\n", "5. First social network analyses" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: uncompiled fa2util module. Compile with cython for a 10-100x speed boost.\n" ] } ], "source": [ "#Dataset path\n", "PATH = 'datasets/'\n", "\n", "import csv, collections, html\n", "from operator import itemgetter\n", "import pandas as pd\n", "import numpy as np\n", "import scipy\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from adjustText import adjust_text\n", "import networkx as nx\n", "import forceatlas2\n", "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Import data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: C:\\Users\\Ejer/text-fabric-data/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: C:\\Users\\Ejer/text-fabric-data/etcbc/bhsa/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: C:\\Users\\Ejer/text-fabric-data/etcbc/phono/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: C:\\Users\\Ejer/text-fabric-data/etcbc/parallels/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: C:\\Users\\Ejer/text-fabric-data/etcbc/heads/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.3, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
etcbc/heads/tf: sem_set, head, nhead, obj_prep
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book, book@ll, chapter, code, det, domain, freq_lex, function, g_cons, g_cons_utf8, g_lex, g_lex_utf8, g_word, g_word_utf8, gloss, gn, label, language, lex, lex_utf8, ls, nametype, nme, nu, number, otype, pargr, pdp, pfm, prs, prs_gn, prs_nu, prs_ps, ps, qere, qere_trailer, qere_trailer_utf8, qere_utf8, rank_lex, rela, sp, st, tab, trailer, trailer_utf8, txt, typ, uvf, vbe, vbs, verse, voc_lex, voc_lex_utf8, vs, vt, mother, oslots
Parallel Passages: crossref
Phonetic Transcriptions: phono, phono_trailer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Importing the Hebrew data and Text-Fabric\n", "from tf.app import use\n", "A = use('bhsa', hoist=globals(), mod='etcbc/heads/tf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.a Import of participant reference data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
participantrefs
0JHWH944128 946176 946179 946182 946184 944142 9441...
1MCH=945152 945537 945155 945540 945547 945555 9449...
2>HRN944640 944641 65555 944662 65561 944666 944667...
3BN JFR>L67584 944132 944133 944139 946216 946217 94417...
4>JC >JC945664 64514 945666 945668 944135 944136 94567...
\n", "
" ], "text/plain": [ " participant refs\n", "0 JHWH 944128 946176 946179 946182 946184 944142 9441...\n", "1 MCH= 945152 945537 945155 945540 945547 945555 9449...\n", "2 >HRN 944640 944641 65555 944662 65561 944666 944667...\n", "3 BN JFR>L 67584 944132 944133 944139 946216 946217 94417...\n", "4 >JC >JC 945664 64514 945666 945668 944135 944136 94567..." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(f'{PATH}participants_FINAL.csv')\n", "df.columns = ['participant','refs']\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The references are transformed to lists and their respective frequencies in the corpus are counted " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ref_list = []\n", "participant_freq = []\n", "\n", "for row in df.iterrows():\n", " refs = [int(r) for r in row[1].refs.split()]\n", " ref_list.append(refs)\n", " participant_freq.append(len(refs))\n", " \n", "df.insert(2, 'ref_list', ref_list)\n", "df.insert(3, 'freq', participant_freq)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
participantrefsref_listfreq
0JHWH944128 946176 946179 946182 946184 944142 9441...[944128, 946176, 946179, 946182, 946184, 94414...476
1MCH=945152 945537 945155 945540 945547 945555 9449...[945152, 945537, 945155, 945540, 945547, 94555...60
2>HRN944640 944641 65555 944662 65561 944666 944667...[944640, 944641, 65555, 944662, 65561, 944666,...164
3BN JFR>L67584 944132 944133 944139 946216 946217 94417...[67584, 944132, 944133, 944139, 946216, 946217...579
4>JC >JC945664 64514 945666 945668 944135 944136 94567...[945664, 64514, 945666, 945668, 944135, 944136...277
\n", "
" ], "text/plain": [ " participant refs \\\n", "0 JHWH 944128 946176 946179 946182 946184 944142 9441... \n", "1 MCH= 945152 945537 945155 945540 945547 945555 9449... \n", "2 >HRN 944640 944641 65555 944662 65561 944666 944667... \n", "3 BN JFR>L 67584 944132 944133 944139 946216 946217 94417... \n", "4 >JC >JC 945664 64514 945666 945668 944135 944136 94567... \n", "\n", " ref_list freq \n", "0 [944128, 946176, 946179, 946182, 946184, 94414... 476 \n", "1 [945152, 945537, 945155, 945540, 945547, 94555... 60 \n", "2 [944640, 944641, 65555, 944662, 65561, 944666,... 164 \n", "3 [67584, 944132, 944133, 944139, 946216, 946217... 579 \n", "4 [945664, 64514, 945666, 945668, 944135, 944136... 277 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of participants: 75\n" ] } ], "source": [ "print(f'Number of participants: {len(df)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two functions fetch the participant label from any given word or phrase in the text." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def getLabel(ref, df=df):\n", " '''\n", " This function fetches the actor/participant reference from the participant dataframe.\n", " '''\n", " \n", " actor_list = []\n", " \n", " for row in df.iterrows():\n", " if ref in row[1].ref_list:\n", " actor_list.append(row[1].participant)\n", " \n", " return actor_list\n", "\n", "def Actor(ref, df=df):\n", " '''\n", " This function takes a reference as input and returns the participant label. 
Phrases are treated differently, because \n", " non-verbal phrases require additional measures to find the nominal head of the phrase and return the label for that \n", " particular constituent.\n", " '''\n", " \n", " nom_head = E.nhead.t(ref) #Finding the nominal head(s) of the phrase\n", " \n", " if F.otype.v(ref) == 'word': #Identifying object suffixes\n", " return getLabel(ref, df=df)\n", " \n", " elif F.typ.v(ref) == 'VP':\n", " return getLabel(L.d(ref, 'phrase_atom')[0], df=df)\n", " \n", " elif F.typ.v(ref) == 'PP':\n", " if len(nom_head) > 1:\n", " return getLabel(L.d(ref, 'phrase_atom')[0], df=df)\n", " if nom_head != E.head.t(ref): #If equal, the reference is a simple preposition with a suffix\n", " return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)\n", " else:\n", " if getLabel(E.head.t(ref)[0], df=df):\n", " return getLabel(E.head.t(ref)[0], df=df)\n", " else:\n", " return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)\n", " \n", " elif F.typ.v(ref) in {'NP','PrNP','PPrP','DPrP','CP'}:\n", " return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)\n", " \n", " else:\n", " return \"error\"\n", "\n", "#Actor(65418)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.b Import agency ranks of participants" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
VolInstAffnegrolenew_rolenew_rankrank
688348yynNaNAgentAgent55
688349ynyNaNVolitional UndergoerVolitional Undergoer-1-1
688350yynNaNAgentAgent55
688351yynNaNAgentAgent55
688352ynyNaNVolitional UndergoerVolitional Undergoer-1-1
\n", "
" ], "text/plain": [ " Vol Inst Aff neg role new_role \\\n", "688348 y y n NaN Agent Agent \n", "688349 y n y NaN Volitional Undergoer Volitional Undergoer \n", "688350 y y n NaN Agent Agent \n", "688351 y y n NaN Agent Agent \n", "688352 y n y NaN Volitional Undergoer Volitional Undergoer \n", "\n", " new_rank rank \n", "688348 5 5 \n", "688349 -1 -1 \n", "688350 5 5 \n", "688351 5 5 \n", "688352 -1 -1 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ranks_df = pd.read_csv(f'{PATH}role_ranks.csv', index_col=0)\n", "ranks_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function is defined to return the agency of any given reference" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def Agency(ref, colname, df=ranks_df):\n", " \n", " if ref in list(df.index):\n", " return df[df.index == ref][colname].item()\n", "\n", "#Agency(68032, 'new_rank')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Cross-tabulating participants and roles\n", "\n", "This section cross-tabulates the participant and role data to calculate the mean agency of each participant." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "actor_list = [Actor(ph) for ph in list(ranks_df.index)]\n", "ranks_df.insert(8, 'Actor', actor_list) #The actor is inserted as a new column" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
VolInstAffnegrolenew_rolenew_rankrankActor
688348yynNaNAgentAgent55[JHWH]
688349ynyNaNVolitional UndergoerVolitional Undergoer-1-1[MCH=]
688350yynNaNAgentAgent55[JHWH]
688351yynNaNAgentAgent55[MCH=]
688352ynyNaNVolitional UndergoerVolitional Undergoer-1-1[>HRN, BN JFR>L, BN >HRN]
\n", "
" ], "text/plain": [ " Vol Inst Aff neg role new_role \\\n", "688348 y y n NaN Agent Agent \n", "688349 y n y NaN Volitional Undergoer Volitional Undergoer \n", "688350 y y n NaN Agent Agent \n", "688351 y y n NaN Agent Agent \n", "688352 y n y NaN Volitional Undergoer Volitional Undergoer \n", "\n", " new_rank rank Actor \n", "688348 5 5 [JHWH] \n", "688349 -1 -1 [MCH=] \n", "688350 5 5 [JHWH] \n", "688351 5 5 [MCH=] \n", "688352 -1 -1 [>HRN, BN JFR>L, BN >HRN] " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ranks_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cross-tabulation of the data to count how often each participant obtains a certain agency level:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
54310-1-2
JHWH118018293017
MCH=360101190
>HRN160113111910
BN JFR>L9904472288331
BN >HRN1606225175
\n", "
" ], "text/plain": [ " 5 4 3 1 0 -1 -2\n", "JHWH 118 0 1 8 29 30 17\n", "MCH= 36 0 1 0 1 19 0\n", ">HRN 16 0 11 31 1 19 10\n", "BN JFR>L 99 0 44 72 28 83 31\n", "BN >HRN 16 0 6 22 5 17 5" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dic = collections.defaultdict(lambda: collections.defaultdict(int))\n", "\n", "for row in ranks_df.iterrows():\n", " for n in row[1].Actor:\n", " dic[n][row[1].new_rank] += 1\n", " \n", "agency_df = pd.DataFrame(dic).fillna(0).astype('Int64').T\n", "agency_df = agency_df[[5,4,3,1,0,-1,-2]]\n", "agency_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mean agency is calculated" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "agency_mean = []\n", "\n", "for row in agency_df.iterrows():\n", " n=0\n", " total = 0\n", " for v in row[1]:\n", " total += (v * agency_df.columns[n])\n", " n+=1\n", " agency_mean.append(round(total/row[1].sum(), 3))\n", " \n", "agency_df.insert(7, 'mean', agency_mean)\n", "\n", "#Inserting labels\n", "labels = [label_gloss[l] if l in label_gloss else l for l in list(agency_df.index)]\n", "agency_df.insert(0, 'label', labels)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
label54310-1-2mean
MCH=Moses3601011902.877
JHWHYHWH1180182930172.645
>JC >JCan_Israelite60022746382.124
2ms2msg21010578821.698
BN JFR>LIsraelites99044722883311.552
GRsojourner450165139381.532
BN >HRNAaron's_sons16062251751.310
>HRNAaron1601131119101.193
>X -2msbrother110311610130.537
HMremnants324025130.138
<Mforeign_nations30105310-0.227
\n", "
" ], "text/plain": [ " label 5 4 3 1 0 -1 -2 mean\n", "MCH= Moses 36 0 1 0 1 19 0 2.877\n", "JHWH YHWH 118 0 1 8 29 30 17 2.645\n", ">JC >JC an_Israelite 60 0 22 7 4 6 38 2.124\n", "2ms 2msg 21 0 10 57 8 8 2 1.698\n", "BN JFR>L Israelites 99 0 44 72 28 83 31 1.552\n", "GR sojourner 45 0 16 5 13 9 38 1.532\n", "BN >HRN Aaron's_sons 16 0 6 22 5 17 5 1.310\n", ">HRN Aaron 16 0 11 31 1 19 10 1.193\n", ">X -2ms brother 11 0 3 1 16 10 13 0.537\n", "HM remnants 3 2 4 0 2 5 13 0.138\n", " 20]\n", "agency_df.sort_values(by='mean', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Creating nodes and edges\n", "\n", "The network model combines participant data and semantic roles. The primary principle is to isolate those clauses where at least two participants occur (they can be identical) which means that isolated participants are ignored. Secondly, the edges are made from the participant with the highest agency level toward the participant with the lowest agency level within the same clause. We can assume that the participant with the highest agency level is also most active in the event and therefore the source of the event." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [], "source": [ "def createEdges(colname, df=df, ranks_df=ranks_df, verb_list = [], relation='function', label_text='gloss', mode=str()):\n", " '''\n", " Input: dictionary of actors + nodes (references), plus preferred text type, that is, English gloss (default)\n", " or transcription of the Hebrew lexeme (= trans)\n", " colname is name of the rank column (usually \"rank\" or \"new_rank\")\n", " Output: dictionary of edges and labels\n", " '''\n", " \n", " error_list = []\n", " \n", " #Finding intersection between nodes\n", " clause_node_list = []\n", " for i, row in df.iterrows():\n", " refs = [int(r) for r in row.refs.split()]\n", " clause_node_list += list(set([L.u(n, 'clause')[0] for n in refs]))\n", " \n", " #Intersections are calculated by counting the frequency of unique clauses. If a clause appears more than once, there is\n", " #an intersection\n", " counter = collections.Counter(clause_node_list)\n", " intersection = [n for n in counter if counter[n] > 1]\n", " \n", " edges = []\n", " \n", " if intersection:\n", " \n", " for cl in intersection: #Looping over clauses with intersecting actors\n", " \n", " clause_inventory = []\n", " pred = False\n", " \n", " for ph in L.d(cl, 'phrase'):\n", " ph_info = {}\n", " sfx_info = {} #Directory for object suffixes\n", " \n", " rank = Agency(ph, colname, ranks_df)\n", " \n", " #Get verb gloss if Predicate\n", " if F.function.v(ph) in {'Pred','PreS','PreO','PtcO','PreC'}:\n", " pred = True\n", " \n", " #Finding verb gloss:\n", " for w in L.d(ph, 'word'):\n", " if F.sp.v(w) == 'verb':\n", " pred_gloss, pred_lex = F.gloss.v(L.u(w, 'lex')[0]), F.lex.v(w)\n", " \n", " #If the phrase is annotated with a rank (agency), it is fetched.\n", " if rank or rank == 0:\n", " \n", " ph_info['ref'] = ph\n", " ph_info['function'] = F.function.v(ph)\n", " ph_info['rank'] = rank\n", " \n", " clause_inventory.append(ph_info)\n", " \n", " #If 
object suffix, the suffix info is stored separately and added to the clause inventory\n", " if F.function.v(ph) in {'PreO','PtcO'}:\n", " for w in L.d(ph, 'word'):\n", " if F.sp.v(w) == 'verb' and (Agency(w, colname, ranks_df) or Agency(w, colname, ranks_df) == 0):\n", " sfx_info['ref'] = w\n", " sfx_info['function'] = F.function.v(ph)\n", " sfx_info['rank'] = Agency(w, colname, ranks_df)\n", " \n", " clause_inventory.append(sfx_info)\n", " \n", " if pred == True and pred_lex!= 'HJH[' and len(clause_inventory) > 1:\n", " ranked = sorted(clause_inventory, key=itemgetter('rank'), reverse = True) \n", " \n", " #Getting Actor and labels\n", " Actor_ref = ranked[0]['ref']\n", " Actor_rank = ranked[0]['rank']\n", " Actors = Actor(Actor_ref, df=df) #A list of Actors\n", " \n", " if Actors == 'error':\n", " error_list.append((cl, Actor_ref))\n", " \n", " #Creating edges from Actor to Undergoer(s)\n", " for Undergoer in ranked[1:]:\n", " Undergoer_ref = Undergoer['ref']\n", " Undergoer_rank = Undergoer['rank']\n", " Undergoers = Actor(Undergoer_ref, df=df)\n", " \n", " if Undergoers == 'error':\n", " error_list.append((cl, Undergoer_ref))\n", " \n", " if (Actors and Undergoers) and (Undergoers != 'error') and (Actors != 'error'):\n", " for A in Actors:\n", " for U in Undergoers:\n", " \n", " if mode == 'one-mode':\n", " edge = (A, Actor_ref, Actor_rank, U, Undergoer_ref, Undergoer_rank, pred_gloss, cl)\n", " edges.append(edge)\n", " elif mode == 'two-mode':\n", " Actor_edge = (A, Actor_ref, Actor_rank, pred_gloss, cl)\n", " Undergoer_edge = (pred_gloss, U, Undergoer_ref, Undergoer_rank, cl)\n", " edges.append(Actor_edge), edges.append(Undergoer_edge)\n", " else:\n", " print(\"You need to specify mode\")\n", " \n", " return edges, error_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two models are created to account for two versions of the agency data. 
The 'old' data does not account for negations in the clause, while the 'new' data involves a recalculation of the agency (NB: the recalculation is done in another notebook)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "482\n" ] } ], "source": [ "old_edges = createEdges(colname='rank',df=df, mode='one-mode')\n", "print(len(old_edges[0]))\n", "\n", "#With new ranks because of negatives (e.g. Agent -> Frustrative)\n", "new_edges = createEdges(colname='new_rank',df=df,mode='one-mode')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explore errors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "errors = old_edges[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for e in errors:\n", " A.pretty(e[0], highlights={e[1]:'gold'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both errors concern adverbial phrases, both referring to a location, so they are not important." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will remove edges for which both the Actor and the Undergoer are 0 (Neutral) in Agency. In these cases, there is no interaction, so those relations are not important:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def removeNeutral(edge_list):\n", " upd_edge_list = []\n", " \n", " for e in edge_list[0]:\n", " Actor_rank = e[2]\n", " Undergoer_rank = e[5]\n", " \n", " if Actor_rank == 0 and Undergoer_rank == 0:\n", " continue\n", " else:\n", " upd_edge_list.append(e)\n", " \n", " return upd_edge_list\n", " \n", "old_edges = removeNeutral(old_edges)\n", "new_edges = removeNeutral(new_edges)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Validation and export of the network model\n", "\n", "#### 4a. Validation\n", "\n", "Before the final export, the edges need review. 
Several issues need validation:\n", "\n", "* Are all relevant clauses included?\n", "* Are the participants annotated correctly?\n", "* Are the roles annotated correctly?\n", "\n", "The review is carried out manually but assisted by an interface and color coding. 'Green' signals that the clause is included in the network, 'salmon' signals absence." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "first_verse = T.nodeFromSection(('Leviticus',17,1))\n", "last_verse = T.nodeFromSection(('Leviticus',26,46))\n", "\n", "clauses = range(L.d(first_verse, 'clause')[0], L.d(last_verse, 'clause')[0]+2)\n", "verbal_clauses = []\n", "for cl in clauses:\n", " pred = False\n", " for ph in L.d(cl, 'phrase'):\n", " if F.function.v(ph) in {'Pred','PreS','PreO','PtcO','PreC'}:\n", " pred = True\n", " for w in L.d(ph, 'word'):\n", " if F.sp.v(w) == 'verb' and F.lex.v(w) != 'HJH[':\n", " verbal_clauses.append(cl)\n", "\n", "print(f'Number of clauses to review: {len(verbal_clauses)}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def validate(clauses, edges, n):\n", " print(f'Nr {n}: {clauses[n]}')\n", " \n", " df = pd.DataFrame(edges)\n", " edge_clauses = list(df[7])\n", " \n", " if clauses[n] in edge_clauses:\n", " subset = df[df[7] == clauses[n]]\n", " \n", " for i, row in subset.iterrows():\n", " print(f'Actor: {row[0]} - Agency: {row[2]}')\n", " print(f'Undergoer: {row[3]} - Agency: {row[5]}\\n')\n", " \n", " A.pretty(clauses[n], highlights={clauses[n]:'lightgreen'})\n", " \n", " else:\n", " A.pretty(clauses[n], highlights={clauses[n]:'salmon'})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n=0" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "validate(verbal_clauses, old_edges, n)\n", "n+=1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 
-----Update: All corrections made -----\n", "\n", "\n", "\n", "**Lev 17**\n", "* BN >HRN added to participants: Need to be listed as part-whole relations across the entire text\n", "* 'MN QRB/ JC >JC' added to Nodes\n", "* KHN added to roles\n", "* 'FL' and 'B TWK/ -BJT JFR>L#2' added to Nodes\n", "* L KM added to Roles\n", "\n", "**Lev 18**\n", "* 'CH >B -2ms' added to Nodes\n", "* 'CH >X -2ms' added to Nodes\n", "* 'CH#2' corrected in Nodes\n", "* 'MN QRB/ L' added in Nodes\n", " \n", "**Lev 19**\n", "* '>B >JC' and >M >JC added to participants: Need to be listed as part-whole relations across the entire text\n", "* 'T CM/ >LHJM/ -2ms' added to Nodes\n", "* 'XRC=/' added to Nodes\n", "* 'PNH/ DL/' and 'PNH/ GDWL/' added to Nodes\n", "* '>T BN/ T CM/ QDC/ -JHWH' added to Nodes\n", "* '>JC' changed role\n", "* '>T >CH/ >JC/' changed role\n", "* 'MN QRB/ JC', 'MN QRB/ CH_2' added in participants\n", "* BN JFR>L added in participants: Need to be listed as part-whole relations across the entire text\n", "\n", "**Lev 22**\n", "* NPC_2 added in participants (Aaron's offspring)\n", "* NPC_3 added in participants (A chattel-slave)\n", "* Make sure that the compound reference \">HRN BN >HRN\" is only deleted if the references have been successfully distributed to either >HRN or BN >HRN\n", "* BN JFR>L changed in participants\n", "* Hypernyms across the text: >JC (top-level) refers both to GR and a native\n", "* '>JC#2' added to Nodes: Refers to \"any man\" within the household of the priest\n", "\n", "**Lev 23**\n", "* Hypernyms across the text: 'L H CH changed in Affectedness\n", "* 3mp removed from Participants (one instance)\n", "* JHWH changed in Affectedness\n", "\n", "**Lev 25**\n", "* When 'MH -2ms' is removed (because it is a hypernym) the participants are missing. Hypernyms need to be constructed on top-level before removal.\n", "* Skip clauses with HJH? 
They are not interactions\n", "* 'JD ->JC', 'JD ->X -2ms' and 'JD GR TWCB' added to Nodes (as synonyms)\n", "* '>X -2ms' changed in Affectedness\n", "* GR TWCB#2 has been changed in Participants to distinguish \"your brother\" from \"foreigners\" although they are sometimes given the same label.\n", "\n", "**Lev 26**\n", "* NPL changed in Akstionsart to stative\n", "* XMC changed in Nodes from XMC/\n", "* PNH changed in Affectedness\n", "* 'L PNH/ >JB[ -\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
012new_rank_Actor345new_rank_Undergoer67
0BN >HRN69034355JHWH69034700swing440323
1JHWH69038355MCH=690384-1-1speak440335
2BN JFR>L69039755JHWH690399-1-1approach440341
3JHWH69040255MCH=690403-1-1speak440342
4BN JFR>L69041555JHWH690417-1-1approach440347
\n", "" ], "text/plain": [ " 0 1 2 new_rank_Actor 3 4 5 new_rank_Undergoer \\\n", "0 BN >HRN 690343 5 5 JHWH 690347 0 0 \n", "1 JHWH 690383 5 5 MCH= 690384 -1 -1 \n", "2 BN JFR>L 690397 5 5 JHWH 690399 -1 -1 \n", "3 JHWH 690402 5 5 MCH= 690403 -1 -1 \n", "4 BN JFR>L 690415 5 5 JHWH 690417 -1 -1 \n", "\n", " 6 7 \n", "0 swing 440323 \n", "1 speak 440335 \n", "2 approach 440341 \n", "3 speak 440342 \n", "4 approach 440347 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "old_df = pd.DataFrame(old_edges)\n", "old_df.insert(3, 'new_rank_Actor', new_df.iloc[:,2])\n", "old_df.insert(7, 'new_rank_Undergoer', new_df.iloc[:,5])\n", "old_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The labels (generated from the ETCBC-transliteration) will be replaced more readable ones:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "label_gloss = {'>CH BN -2ms': 'daughter-in-law',\n", " '>DM': 'human_being',\n", " 'GR': 'sojourner',\n", " '>CH#2': 'woman_in_menstruation',\n", " '>X -2ms': 'brother',\n", " 'BFR/ BN/ -T BT/ BN/ ->CH W >T BT/ BT/ ->CH': 'granddaughter_of_woman',\n", " 'MLK=': 'idols',\n", " 'NPC#3': 'slave',\n", " 'T >CH/ >JC/': \"fellow's_wife\",\n", " 'ZR': 'lay-person',\n", " '>B -2ms': 'father',\n", " 'JHWH': 'YHWH',\n", " '2mp_sfx': '2mpl',\n", " '>HRN': 'Aaron',\n", " 'MN >JC/ ->CH': 'husband',\n", " 'DWDH -2ms': 'aunt-in-law',\n", " '>CH >M ->CH': 'woman_and_her_mother',\n", " 'RDP': 'no-one',\n", " 'BN >CH': 'blasphemer',\n", " 'XRC=/': 'deaf',\n", " 'BN JFR>L': 'Israelites',\n", " 'C>R >B -2ms': 'aunt',\n", " 'KL': 'group_of_people',\n", " 'JC >JC': 'an_Israelite',\n", " 'BN ->X -2ms': 'son_of_brother',\n", " 'QNH': 'purchaser',\n", " '>JC >CH': 'man/woman',\n", " 'CH/ W BT/ ->CH': 'woman_and_her_daughter',\n", " '3mp': 'witnesses',\n", " '>L MCPXT/ ->JC': 'clan',\n", " 'BT >B -2msBT >M -2ms': 'sister',\n", " 'PNH/ GDWL/': 'rich',\n", " '>XD': 
\"brother's_brother\",\n", " '>T== ZKR=/': 'male',\n", " '2ms': '2msg',\n", " '>XWT ->CH': 'sister_of_woman',\n", " 'BN TWCB': 'sons_of_sojourners',\n", " '>M -2ms': 'mother',\n", " 'L >JC/': 'man',\n", " 'ZR< ->JC': 'offspring',\n", " 'PNH/ DL/': 'poor',\n", " 'L PNH/ CH': 'woman',\n", " '>CH >B -2ms': \"father's_wife\",\n", " 'MCH=': 'Moses',\n", " 'BN >HRN': \"Aaron's_sons\",\n", " 'BT -2ms': 'daughter',\n", " 'CPXH': 'handmaid',\n", " 'C>R -HW>': 'relative',\n", " '>LMNH GRC XLL': 'widowed/expelled/defiled_woman',\n", " 'HM': 'remnants',\n", " '>T PGR/ -X -2ms': \"brother's_uncle\",\n", " 'B \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0Source12new_rank_ActorTarget345new_rank_Undergoer67
0BN >HRNAaron's_sons69034355YHWHJHWH69034700swing440323
1JHWHYHWH69038355MosesMCH=690384-1-1speak440335
2BN JFR>LIsraelites69039755YHWHJHWH690399-1-1approach440341
3JHWHYHWH69040255MosesMCH=690403-1-1speak440342
4BN JFR>LIsraelites69041555YHWHJHWH690417-1-1approach440347
.......................................
472DWD ->X -2msbrother's_uncle69132655brother>X -2ms68032-1-1redeem440637
473L >JC/man68904100handmaidCPXH689040-2-2spend autumn439885
474MN >JC/ ->CHhusband68965255widowed/expelled/defiled_woman>LMNH GRC XLL689651-2-2drive out440088
4753mpwitnesses69066055blasphemerBN >CH66980-2-2settle440424
4763mpwitnesses69067555blasphemerBN >CH69067700support440429
\n", "

477 rows × 12 columns

\n", "" ], "text/plain": [ " 0 Source 1 2 new_rank_Actor \\\n", "0 BN >HRN Aaron's_sons 690343 5 5 \n", "1 JHWH YHWH 690383 5 5 \n", "2 BN JFR>L Israelites 690397 5 5 \n", "3 JHWH YHWH 690402 5 5 \n", "4 BN JFR>L Israelites 690415 5 5 \n", ".. ... ... ... .. ... \n", "472 DWD ->X -2ms brother's_uncle 691326 5 5 \n", "473 L >JC/ man 689041 0 0 \n", "474 MN >JC/ ->CH husband 689652 5 5 \n", "475 3mp witnesses 690660 5 5 \n", "476 3mp witnesses 690675 5 5 \n", "\n", " Target 3 4 5 \\\n", "0 YHWH JHWH 690347 0 \n", "1 Moses MCH= 690384 -1 \n", "2 YHWH JHWH 690399 -1 \n", "3 Moses MCH= 690403 -1 \n", "4 YHWH JHWH 690417 -1 \n", ".. ... ... ... .. \n", "472 brother >X -2ms 68032 -1 \n", "473 handmaid CPXH 689040 -2 \n", "474 widowed/expelled/defiled_woman >LMNH GRC XLL 689651 -2 \n", "475 blasphemer BN >CH 66980 -2 \n", "476 blasphemer BN >CH 690677 0 \n", "\n", " new_rank_Undergoer 6 7 \n", "0 0 swing 440323 \n", "1 -1 speak 440335 \n", "2 -1 approach 440341 \n", "3 -1 speak 440342 \n", "4 -1 approach 440347 \n", ".. ... ... ... \n", "472 -1 redeem 440637 \n", "473 -2 spend autumn 439885 \n", "474 -2 drive out 440088 \n", "475 -2 settle 440424 \n", "476 0 support 440429 \n", "\n", "[477 rows x 12 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "edges_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weight of the ties between the participants is defined as the difference between Actor and Undergoer Rank. 
We compute both an old and a new weight, based on the original ranks and the new ranks respectively (the new rank takes negations into account): " ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": true }, "outputs": [], "source": [ "old_weight = (edges_df[2]-edges_df[5])**2\n", "new_weight = (edges_df['new_rank_Actor']-edges_df['new_rank_Undergoer'])**2\n", "\n", "#Insert Weight: calculated as the squared difference between the Actor rank and the Undergoer rank\n", "edges_df.insert(12, 'old_weight', old_weight)\n", "edges_df.insert(13, 'new_weight', new_weight)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
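As a quick illustration of this weighting scheme (toy values; the column names here are hypothetical, not those of `edges_df`), the squared rank difference makes ties heavier the larger the agency gap between the two participants:

```python
import pandas as pd

# Toy edge list with actor/undergoer semantic-role ranks (illustrative values only)
toy = pd.DataFrame({
    'actor_rank': [5, 5, 0],
    'undergoer_rank': [0, -1, -2],
})

# Weight = squared difference between the two ranks,
# so larger power asymmetries yield heavier ties
toy['weight'] = (toy['actor_rank'] - toy['undergoer_rank']) ** 2
print(toy['weight'].tolist())  # [25, 36, 4]
```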
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0Source12new_rank_ActorTarget345new_rank_Undergoer67old_weightnew_weight
0BN >HRNAaron's_sons69034355YHWHJHWH69034700swing4403232525
1JHWHYHWH69038355MosesMCH=690384-1-1speak4403353636
2BN JFR>LIsraelites69039755YHWHJHWH690399-1-1approach4403413636
3JHWHYHWH69040255MosesMCH=690403-1-1speak4403423636
4BN JFR>LIsraelites69041555YHWHJHWH690417-1-1approach4403473636
\n", "
" ], "text/plain": [ " 0 Source 1 2 new_rank_Actor Target 3 4 5 \\\n", "0 BN >HRN Aaron's_sons 690343 5 5 YHWH JHWH 690347 0 \n", "1 JHWH YHWH 690383 5 5 Moses MCH= 690384 -1 \n", "2 BN JFR>L Israelites 690397 5 5 YHWH JHWH 690399 -1 \n", "3 JHWH YHWH 690402 5 5 Moses MCH= 690403 -1 \n", "4 BN JFR>L Israelites 690415 5 5 YHWH JHWH 690417 -1 \n", "\n", " new_rank_Undergoer 6 7 old_weight new_weight \n", "0 0 swing 440323 25 25 \n", "1 -1 speak 440335 36 36 \n", "2 -1 approach 440341 36 36 \n", "3 -1 speak 440342 36 36 \n", "4 -1 approach 440347 36 36 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "edges_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We produce two files, one for dynamic networks and one for static networks:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "static = edges_df[['Source','new_rank_Actor','Target','new_rank_Undergoer',6,'new_weight',7]]\n", "static.columns = ['Source','Source_agency','Target','Target_agency','Label','Weight','Clause']\n", "\n", "#Export\n", "static.to_excel('Lev17-26.edges.Static.xlsx', index=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.c Compare with older datasets\n", "\n", "For the sake of consistency, it is possible to easily compare the changes that are made in new models in comparison to old ones. This helps to update the data without going through a manual validation." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_old = pd.read_excel('Lev17-26.edges.Static_Old.xlsx')\n", "data_new = pd.read_excel('Lev17-26.edges.Static.xlsx')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_new.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(data_new)-len(data_old)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### i. Check if edges have been removed or added" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "review_edges1 = []\n", "review_edges2 = []\n", "\n", "for n, row in data_new.iterrows():\n", " if row.Clause in list(data_old.Clause):\n", " subset_old = data_old[data_old.Clause == row.Clause]\n", " match = False\n", " for n1, row1 in subset_old.iterrows():\n", " if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label:\n", " match = True\n", " if not match:\n", " review_edges1.append(row.Clause) \n", " else:\n", " review_edges1.append(row.Clause) #Clause is added in new dataset\n", " \n", "for n, row in data_old.iterrows():\n", " if row.Clause in list(data_old.Clause):\n", " subset_new = data_new[data_new.Clause == row.Clause]\n", " match = False\n", " for n1, row1 in subset_new.iterrows():\n", " if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label:\n", " match = True\n", " if not match:\n", " review_edges2.append(row.Clause) \n", " else:\n", " review_edges2.append(row.Clause) #Clause is added in new dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#review_edges1\n", "#review_edges2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### ii. 
Check if identical edges have same weight" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "review_edges3 = []\n", "\n", "for n, row in data_new.iterrows():\n", " if row.Clause in list(data_old.Clause):\n", " subset_old = data_old[data_old.Clause == row.Clause]\n", " match = False\n", " for n1, row1 in subset_old.iterrows():\n", " if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label and row1.Weight == row.Weight:\n", " match = True\n", " if not match:\n", " review_edges3.append(row.Clause) \n", " else:\n", " review_edges3.append(row.Clause) #Clause is added in new dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "review_edges3 = [e for e in review_edges3 if e not in review_edges1 and e not in review_edges2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "review_edges3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Social Network Analysis\n", "\n", "The network model can now be explored with SNA tools, in this case NetworkX.\n", "\n", "### 5.a Visualization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pd.read_excel('Lev17-26.edges.Static.xlsx')\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.MultiGraph()\n", "\n", "for n, row in data.iterrows(): \n", " G.add_edge(row.Source_label, row.Target_label)\n", " \n", "pos = { i : (random.random(), random.random()) for i in G.nodes()}\n", "l = forceatlas2.forceatlas2_networkx_layout(G, pos, niter=2000, gravity=30, scalingRatio=2.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weight = collections.Counter(G.edges())\n", "\n", "for u, v, d in G.edges(data=True):\n", " d['weight'] = weight[u, v]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize = (15,15))\n", "\n", "nx.draw_networkx(G, l, node_color='violet', node_size=[n[1]*10 for n in G.degree()], \n", " edge_color='grey', width=[d['weight']/3 for _, _, d in G.edges(data=True)])\n", "\n", "plt.axis('off')\n", "plt.margins(x=0.1, y=0.1)\n", "\n", "plt.savefig('screenshots/Leviticus_SNA.png', dpi=500)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Number of nodes and edges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'Nodes: {len(G.nodes())}\\nEdges: {len(G.edges())}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having created the edges and built a multigraph (MultiGraph), we can now explore the resulting network. 
We will begin with a general inspection:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.b Cohesion and network density\n", "\n", "One of the simplest measures of cohesion (\"knittedness\") is density. Density is simply the number of ties in the network as a proportion of the possible number of ties." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.density(G)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Density is sensitive to the size of the network, and large networks tend to have lower density than small networks, simply because it is more realistic for a member of a small network to be connected with most of the remaining participants than in a large network.\n", "\n", "Therefore, another approach is average degree:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "degree = G.degree()\n", "sum_degree = sum(dict(degree).values())\n", "print(f'Average degree: {sum_degree/len(G.nodes())}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G = nx.MultiDiGraph()\n", "\n", "for n, row in data.iterrows(): \n", " G.add_edge(row.Source_label, row.Target_label)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "outdegree_sequence = collections.Counter(sorted([d for n, d in G.out_degree()], reverse=True))\n", "indegree_sequence = collections.Counter(sorted([d for n, d in G.in_degree()], reverse=True))\n", "\n", "outdegree_df = pd.DataFrame(outdegree_sequence, index=[0]).T\n", "indegree_df = pd.DataFrame([indegree_sequence]).T" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "degree_df = pd.concat([indegree_df, outdegree_df], axis=1, sort=False)\n", "degree_df.columns = ['indegree','outdegree']\n", "degree_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": 
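The relation between density and average degree can be sketched on a small toy graph (the node names are hypothetical):

```python
import networkx as nx

# Small undirected sketch graph: 4 nodes, 4 ties
H = nx.Graph([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D')])

# Density: actual ties / possible ties = 4 / (4*3/2)
density = nx.density(H)

# Average degree: sum of degrees / number of nodes = 2*edges / nodes
avg_degree = sum(dict(H.degree()).values()) / H.number_of_nodes()

print(round(density, 3), avg_degree)  # 0.667 2.0
```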
[], "source": [ "fig, ax = plt.subplots(figsize=(15,7))\n", "\n", "plt.bar(degree_df.index, degree_df.indegree, width=0.33)\n", "plt.bar(degree_df.index+0.33, degree_df.outdegree, color='tomato', width=0.33)\n", "\n", "ax.legend(labels=['indegree', 'outdegree'], fontsize=14)\n", "plt.ylabel(\"Count\", size=14)\n", "plt.xlabel(\"Degree\", size=14)\n", "plt.xticks(size=12)\n", "plt.yticks(size=12)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cumulative:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(G.nodes())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "indegree_cum = [n/len(G.nodes())*100 for n in np.cumsum(degree_df.fillna(0).indegree)]\n", "outdegree_cum = [n/len(G.nodes())*100 for n in np.cumsum(degree_df.fillna(0).outdegree)]\n", "degree_df.insert(2, \"indegree_cum (%)\", indegree_cum)\n", "degree_df.insert(3, \"outdegree_cum (%)\", outdegree_cum)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "degree_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most connected participants:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "top_degree = sorted(dict(degree).items(), key=itemgetter(1), reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A cummulative view:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cum_degree = pd.DataFrame(top_degree)\n", "cum_degree.columns = ['participant','degree']\n", "\n", "degree_cum = [n/(len((G.edges()))*2)*100 for n in np.cumsum(cum_degree.degree)]\n", "cum_degree.insert(2, \"degree_cum (%)\", degree_cum)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cum_degree.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Updated graph:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax1 = plt.subplots(figsize=(15,7))\n", "ax2 = ax1.twinx()\n", "\n", "ax1.bar(degree_df.index, degree_df.indegree, width=0.33)\n", "ax1.bar(degree_df.index+0.33, degree_df.outdegree, color='tomato', width=0.33)\n", "\n", "ax2.plot(degree_df.index, degree_df['indegree_cum (%)'], linestyle='--', alpha=0.5)\n", "ax2.plot(degree_df.index, degree_df['outdegree_cum (%)'], linestyle='--', alpha=0.5)\n", "\n", "ax1.legend(frameon=1, labels=['indegree', 'outdegree'], fontsize=14, facecolor='white', framealpha=1)\n", "ax1.set_ylabel(\"Count\", size=14)\n", "ax2.set_ylabel(\"Cumulative %\", size=14)\n", "ax1.set_xlabel(\"Degree\", size=14)\n", "plt.xticks(size=12)\n", "plt.yticks(size=12)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.degree()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "G.out_degree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Degree proportion of selected participants:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sel_part = sum(dict(G.degree(['YHWH', 'Moses','Israelites','sojourner','2ms','an_Israelite'])).values())\n", "\n", "print(f'{round(sel_part/sum(dict(G.degree()).values())*100, 2)}%')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.c Reciprocity\n", "\n", "Reciprocity concerns whether an interaction from one actor to another is returned, or whether the relation is one-sided. A simple measure of reciprocity is to count the number of reciprocal ties and divide these by the total number of ties. For this analysis, we are not interested in the weights of the edges but simply the binary value (connected or not)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "digraph = nx.DiGraph()\n", "\n", "for n, row in data.iterrows():\n", " digraph.add_edge(row.Source_label, row.Target_label)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx.reciprocity(digraph)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "reci_df = pd.DataFrame([nx.reciprocity(digraph, digraph.nodes())]).T.sort_values(by=0, ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(15,5))\n", "\n", "plt.bar(reci_df.index, reci_df[0], width=0.33)\n", "plt.ylabel(\"fraction\", size=14)\n", "plt.xticks(size=11, rotation=45, ha='right')\n", "plt.yticks(size=12)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.d Centrality\n", "\n", "We use 4 measures for measuring the centrality of individual nodes. That will give an image of core and periphery of the network. The four measures are Degree, Closeness, Betweenness, and Eigenvector." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "indegree = nx.in_degree_centrality(digraph)\n", "outdegree = nx.out_degree_centrality(digraph)\n", "betweenness = nx.betweenness_centrality(digraph)\n", "pagerank = nx.pagerank(digraph)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "centrality = pd.DataFrame([indegree, outdegree, betweenness, pagerank]).T\n", "centrality.columns = ['indegree','outdegree','betweeness','pagerank']\n", "centrality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top five scores for centrality measures:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def top(measure, df=centrality):\n", " return df.sort_values(by=measure, ascending=False)[measure][:10]\n", "\n", "fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(15,5), sharey=True)\n", "\n", "ax1.bar(top('outdegree').index, top('outdegree'))\n", "ax1.set_title(\"Outdegree\", size=16)\n", "ax2.bar(top('indegree').index, top('indegree'))\n", "ax2.set_title(\"Indegree\", size=16)\n", "ax3.bar(top('betweeness').index, top('betweeness'))\n", "ax3.set_title(\"Betweenness\", size=16)\n", "ax4.bar(top('pagerank').index, top('pagerank'))\n", "ax4.set_title(\"PageRank\", size=16)\n", "\n", "for ax in fig.axes:\n", " plt.sca(ax)\n", " plt.xticks(rotation=45, ha='right', size=12)\n", "\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }