{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mapping participants. The first experiments\n", "\n", "This notebook explores the participant references added to [text-fabric](https://dans-labs.github.io/text-fabric/) as extra features (see [notebook](https://github.com/ch-jensen/Semantic-mapping-of-participants/blob/master/3_Exporting%20actors%20as%20TF-features.ipynb)).\n", "\n", "In particular, the Python package NetworkX is applied in order to visualize the networks resulting from participant references and their interactions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Importing modules" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import csv, collections\n", "import pandas as pd\n", "import numpy as np\n", "import networkx as nx\n", "import matplotlib.pyplot as plt\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using etcbc/bhsa/tf - c r1.4 in C:\\Users\\Ejer/text-fabric-data\n", "Using etcbc/phono/tf - c r1.1 in C:\\Users\\Ejer/text-fabric-data\n", "Using etcbc/parallels/tf - c r1.1 in C:\\Users\\Ejer/text-fabric-data\n", "Using etcbc/actor/tf - c r1.0 in C:\\Users\\Ejer/text-fabric-data\n", "Cannot determine the name of this notebook\n", "Work around: call me with a self-chosen name: name='xxx'\n" ] }, { "data": { "text/markdown": [ "**Documentation:** BHSA Character table Feature docs bhsa API Text-Fabric API 7.0.1 Search Reference" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Loaded features:\n", "

etcbc/actor/tf: actor prs_actor coref

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det dist dist_unit domain freq_lex freq_occ function g_word g_word_utf8 gloss gn instruction is_root kind label language lex lex_utf8 ls nametype nme nu number otype pargr pdp pfm prs prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rank_occ rela root sp st tab trailer trailer_utf8 txt typ uvf vbe vbs verse voc_lex voc_lex_utf8 vs vt distributional_parent functional_parent mother oslots

Phonetic Transcriptions: phono phono_trailer

Parallel Passages: crossref

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
API members:\n", "C Computed, Call AllComputeds, Cs ComputedString
\n", "E Edge, Eall AllEdges, Es EdgeString
\n", "TF, ensureLoaded, ignored, loadLog
\n", "L Locality
\n", "cache, error, indent, info, reset
\n", "N Nodes, sortKey, otypeRank, sortNodes
\n", "F Feature, Fall AllFeatures, Fs FeatureString
\n", "S Search
\n", "T Text
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use('bhsa', hoist=globals(), mod='etcbc/actor/tf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Mapping participant co-references\n", "\n", "Actor references have been exported as an edge feature in text-fabric, called *coref*. By calling this edge feature a list of co-referring nodes are returned. Thereby, co-referring participant references can be mapped." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Setting up the system\n", "\n", "Before network mapping, a set of functions need to defined to automate the procedure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def getRefNodes(ref, level):\n", " '''\n", " Input: Tuple with book (string) and chapter (integer) plus level (phrase_atom, subphrase, or word)\n", " Output: A list of all nodes with edge coref features in particular chapter and at given level.\n", " '''\n", " loc = T.nodeFromSection(ref)\n", " \n", " ref_list = []\n", " for n in L.d(loc, level):\n", " if E.coref.f(n): #Using the edge feature to find co-references of the particular node.\n", " ref_list.append(n)\n", " \n", " return ref_list\n", "\n", "#getRefNodes(('Leviticus',25), 'phrase_atom')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def writeText(node, text='gloss'):\n", " '''\n", " Input: node (word node, phrase node etc.) 
plus text type, that is, English gloss (default)\n", " or transcription of the Hebrew lexeme (= trans)\n", " Output: Transcription of the input node in the preferred format (trans or gloss)\n", " '''\n", " \n", " if F.otype.v(node) != 'word': #If not a word node, the function levels down to word level and finds the lexeme or gloss.\n", " node_text = ''\n", " \n", " for w in L.d(node, 'word'): #Leveling down to word level\n", " if text == 'gloss':\n", " node_text += F.gloss.v(L.u(w, 'lex')[0]) + ' '\n", " else:\n", " node_text += F.lex.v(w) + ' '\n", " else: #Word nodes - no need to level down.\n", " if text == 'gloss':\n", " node_text = F.gloss.v(L.u(node, 'lex')[0])\n", " else:\n", " node_text = F.lex.v(node)\n", " return node_text\n", "\n", "#writeText(945735, 'trans')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def createTuples(node, text='gloss'):\n", " '''\n", " Input: node (word node, phrase node etc.) plus text type, that is, English gloss (default)\n", " or transcription of the Hebrew lexeme (= trans)\n", " Output: A list of tuples. 
Each tuple contains two nodes (transcribed) plus weight.\n", " '''\n", " edges = E.coref.f(node) #Finding co-references of the node\n", " tuples_list = []\n", " weighted_tuples_list = []\n", " for e in edges: #Looping over the co-references\n", " tup = (f' {writeText(node, text)}', writeText(e, text).rstrip(' ')) #Create tuples with node and co-references\n", " tuples_list.append(tup)\n", " \n", " c = collections.Counter(tuples_list) #Identical tuples are counted and condensed as weights.\n", " for item in c.items(): #Avoiding reuse of the name c for the loop variable\n", " tup = item[0][0], item[0][1], str(item[1])\n", " weighted_tuples_list.append(tup)\n", " \n", " return weighted_tuples_list\n", "\n", "#createTuples(945735, 'trans')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def actorNetwork(actor, ref):\n", " '''\n", " This function finds the first node matching a given actor name. Using this function, the user\n", " can use actor names instead of (word/phrase atom) nodes.\n", " Input: Actor name (string) and Biblical reference (tuple with book and chapter)\n", " Output: Node\n", " '''\n", " ph_atom_list = L.d(T.nodeFromSection(ref), 'phrase_atom') #Create a phrase atom list from Biblical reference\n", "\n", " actor_node = 0 #Initialized once, before the loop, so an early match is not overwritten\n", "\n", " for ph in ph_atom_list: #Looping through the phrase atoms\n", "\n", " #check phrase_atom level\n", " if F.actor.v(ph) == actor: #If there is a match of actor name at phrase atom level, this phrase node is used\n", " actor_node = ph\n", " break\n", "\n", " #check subphrase level\n", " for subph in L.d(ph, 'subphrase'): #Going down to subphrase level to check for match\n", " if F.actor.v(subph) == actor:\n", " actor_node = subph\n", " break\n", "\n", " #check word level\n", " for w in L.d(ph, 'word'): #Going down to word level to check for match\n", " if F.actor.v(w) == actor or F.prs_actor.v(w) == actor:\n", " actor_node = w\n", " break\n", "\n", " if actor_node: break #Stop at the first phrase atom with a match at any level\n", " \n", " if not actor_node: return 'Error: Actor not found' #Error if no 
match\n", " \n", " else: return actor_node\n", " \n", "#actorNetwork('BN JFR>L', ('Leviticus', 25)) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2. Plotting\n", "\n", "Having set up the ancillary functions, we can now start plotting co-references in network graphs. The following function creates nodes, weighted edges, node labels (lexeme or gloss), and edge labels (weights). The graph is inspired by [https://qxf2.com/blog/drawing-weighted-graphs-with-networkx/](https://qxf2.com/blog/drawing-weighted-graphs-with-networkx/)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "def plot_weighted_graph(node, label_text='gloss'):\n", " '''\n", " Input: node (word node, phrase node etc.) plus preferred text type, that is, English gloss (default)\n", " or transcription of the Hebrew lexeme (= trans)\n", " Output: weighted network graph\n", " '''\n", " bo, ch, ve = T.sectionFromNode(node) #Biblical reference deduced from node\n", " \n", " #1. Add nodes\n", " G = nx.Graph() #Create a graph object called G\n", " node_list = [writeText(e, label_text).rstrip(' ') for e in E.coref.f(node)] #Creating node list from edges\n", " node_list.append(f' {writeText(node, label_text)}') #Appending input node to node list\n", " node_list = set(node_list)\n", " for n in node_list: #Add nodes to graph\n", " G.add_node(n)\n", " \n", " #2. Selecting graph layout\n", " pos=nx.spring_layout(G)\n", " nx.draw_networkx_nodes(G,pos,node_color='gold',node_size=2500) #Drawing nodes\n", " \n", " #3. Adding labels. The node names are used as labels.\n", " labels = {}\n", " for node_name in node_list:\n", " labels[str(node_name)] = node_name\n", " nx.draw_networkx_labels(G,pos,labels,font_size=12)\n", " \n", " #4. Adding weighted edges using the function createTuples()\n", " for e in createTuples(node, label_text):\n", " G.add_edge(e[0],e[1],weight=e[2])\n", " \n", " #4 a. 
Iterate through the graph edges to gather all the weights\n", " all_weights = []\n", " for (node1,node2,data) in G.edges(data=True):\n", " all_weights.append(int(data['weight'])) #we'll use this when determining edge thickness\n", "\n", " #4 b. Get unique weights\n", " unique_weights = list(set(all_weights))\n", "\n", " #4 c. Plot the edges - one by one!\n", " for weight in unique_weights:\n", " \n", " #4 d. Form a filtered list with just the weight you want to draw\n", " weighted_edges = [(node1,node2) for (node1,node2,edge_attr) in G.edges(data=True) if int(edge_attr['weight'])==weight]\n", " \n", " #4 e. I think multiplying by [num_nodes/sum(all_weights)] makes the graph's edges look cleaner\n", " width = weight*len(node_list)*3.0/sum(all_weights)\n", " \n", " #4 f. Creating a dictionary with weights as edge labels. A tuple of nodes is used as key.\n", " edge_labels = {(node1,node2): edge_attr['weight'] for (node1,node2,edge_attr) in G.edges(data=True) if int(edge_attr['weight'])==weight}\n", " \n", " #Drawing edges and edge labels:\n", " nx.draw_networkx_edges(G,pos,edgelist=weighted_edges,width=width)\n", " nx.draw_networkx_edge_labels(G,pos,edge_labels=edge_labels) \n", "\n", " plt.axis('off')\n", " plt.rcParams[\"figure.figsize\"] = (10,10)\n", " plt.title(f'Coreferences of {writeText(node, label_text).rstrip(\" \")} in {bo} {ch}', fontsize=20)\n", " plt.savefig(f'Images/coref_{writeText(node, label_text).rstrip(\" \")}_{bo}_{ch}.png', dpi=500)\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_weighted_graph(945735, 'gloss')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one does not know the node of the actor to map with its co-references, the function actorNetwork() can be used to find the relevant node based on actor name and chapter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ 
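"# Note: actorNetwork() (defined above) resolves an ETCBC actor label to a node; if the label\n", "# is not found in the chapter, it returns the string 'Error: Actor not found' instead of a node,\n", "# so checking its result before plotting may be advisable. '>X BN JFR>L' is a transliterated\n", "# actor label (presumably the 'brother' of the sons of Israel in this chapter).\n",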
"plot_weighted_graph(actorNetwork('>X BN JFR>L', ('Leviticus', 25)), 'gloss')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Participants in interactions\n", "\n", "Network analysis is usually based on interactions or relations between a set of participants. When using the ETCBC database of the Hebrew text as the corpus, we adopt a well-defined concept of interactions, namely the verbs that describe relations and interactions between participants.\n", "\n", "The following function takes a list of nodes, each referring to distinct actors in the same chapter, and identify those clauses where the actors intersect (NB: all actors need not intersecting in the same clause. It is enough that only two of them intersect). The interactions are mapped as verbs that connect participants." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "def plot_participant_relations(node_list, label_text='gloss'):\n", " '''\n", " Input: node list (word node, phrase node etc.) plus preferred text type, that is, English gloss (default)\n", " or transcription of the Hebrew lexeme (= trans)\n", " Output: network graph\n", " ''' \n", "\n", " G = nx.MultiGraph() #Create a graph object called G\n", " bo, ch, ve = T.sectionFromNode(node_list[0]) #Biblical reference deduced from node\n", " \n", " #Finding intersection between nodes\n", " actor_names = ''\n", " clause_atom_node_list = []\n", " for n in node_list:\n", " actor_names += f'{writeText(n, label_text).rstrip(\" \")}_' #Writing actor names to use for image name\n", " actor_clause_nodes = [L.u(n, 'clause_atom')[0] for n in E.coref.f(n)] #Finding the clause node of each node\n", " actor_clause_nodes.append(L.u(n, 'clause_atom')[0]) #Appending clause node to list\n", " clause_atom_node_list += list(set(actor_clause_nodes)) #Using set() to avoid redundant clause nodes\n", " \n", " #Intersections are calculated by counting the frequency of unique clauses. 
If a clause appears more than once, there is\n", " #an intersection\n", " counter = collections.Counter(clause_atom_node_list)\n", " intersection = [n for n in counter if counter[n] > 1]\n", " \n", " pred_actor_list = []\n", " edge_labels = {}\n", " if intersection:\n", " for cl in intersection: #Looping over clauses with intersecting actors\n", " non_pred_dict = {}\n", " pred_actor = ''\n", " subj = False\n", " for ph in L.d(cl, 'phrase'): #Looping over phrases of clause\n", " \n", " #If predicate phrase:\n", " if F.function.v(ph) in ['Pred','PreS','PreO']:\n", " pred_actor = ph\n", " pred_actor_list.append(ph)\n", " \n", " #The predicate phrase node is also added to non_pred_dict because the predicate contains an impl. subject\n", " non_pred_dict[L.u(ph, 'phrase_atom')[0]] = F.actor.v(L.u(ph, 'phrase_atom')[0])\n", " for w in L.d(ph, 'word'): #Checking whether a possible pron. suffix has an actor reference.\n", " if F.prs_actor.v(w) != None:\n", " non_pred_dict[L.u(ph, 'phrase_atom')[0]] = F.prs_actor.v(w) #Adding to non_pred_dict\n", " \n", " #If complement or object phrase\n", " elif len(L.u(ph, 'phrase_atom')) > 0 and F.function.v(ph) in ['Cmpl', 'Objc']:\n", " if F.actor.v(L.u(ph, 'phrase_atom')[0]) != None: #If actor reference, adding to dict.\n", " non_pred_dict[L.u(ph, 'phrase_atom')[0]] = F.actor.v(L.u(ph, 'phrase_atom')[0])\n", " else: #If no reference at phrase_atom level, checking whether there is a reference on suffix level\n", " for w in L.d(ph, 'word'):\n", " if F.sp.v(w) == 'prep' and F.prs_actor.v(w) != None: #Needs to be a preposition.\n", " non_pred_dict[L.u(ph, 'phrase_atom')[0]] = F.prs_actor.v(w)\n", " \n", " #If explicit subject in clause, the variable subj is set to True\n", " if F.function.v(ph) == 'Subj':\n", " subj = True\n", " \n", " #If there is a predicate clause, edges can be created:\n", " if pred_actor != '':\n", " for n in non_pred_dict: #Creating edges from verb (pred_actor) to each relevant non-predicate of the clause\n", " 
G.add_edge(writeText(pred_actor, label_text).rstrip(), non_pred_dict[n])\n", " \n", " func = F.function.v(L.d(n, 'phrase')[0])\n", " if func in ['Pred','PreO','PreS']:\n", " if subj == False: #If no explicit subject, set func to 'impl subj'\n", " func = 'Impl. subj' \n", " else:\n", " func = 'Subj'\n", " \n", " #Writing edge labels as a dictionary. The phrase function is the edge value\n", " edge_labels[writeText(pred_actor, label_text).rstrip(), non_pred_dict[n]] = func\n", " \n", " #3. Adding labels, using the node names\n", " labels = {}\n", " for node_name in G.nodes():\n", " labels[str(node_name)] = node_name\n", "\n", " #Using the pred_actor_list to create a list with the right names suitable for coloring the verb nodes.\n", " pred_actor_color = [writeText(n, label_text).rstrip(' ') for n in pred_actor_list]\n", " \n", " pos=nx.spring_layout(G)\n", " nx.draw_networkx_nodes(G,pos, node_color='gold',node_size=1000)\n", " nx.draw_networkx_nodes(G,pos,nodelist=pred_actor_color, node_color='salmon',node_size=1000)\n", " nx.draw_networkx_labels(G,pos,labels,font_size=8)\n", " nx.draw_networkx_edges(G,pos)\n", " nx.draw_networkx_edge_labels(G,pos,edge_labels=edge_labels, font_size=8)\n", " \n", " plt.axis('off')\n", " plt.legend(('Participant','Predicate'), loc = 'best', numpoints = 1, markerscale = 0.4)\n", " plt.savefig(f'Images/network_{actor_names}{bo}_{ch}.png', dpi=500)\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1. 
Examples of mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mapping YHWH and Moses in Lev 25:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_participant_relations([actorNetwork('MCH', ('Leviticus', 25)), \n", " actorNetwork('JHWH', ('Leviticus', 25))])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mapping Moses and the sons of Israel in Lev 25:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_participant_relations([actorNetwork('MCH', ('Leviticus', 25)), \n", " actorNetwork('BN JFR>L', ('Leviticus', 25))])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mapping YHWH, Moses and the sons of Israel in Lev 25:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_participant_relations([actorNetwork('MCH', ('Leviticus', 25)), \n", " actorNetwork('JHWH', ('Leviticus', 25)),\n", " actorNetwork('BN JFR>L', ('Leviticus', 25))])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }