## Install the following package
#install.packages ("relevent")
renv::restore()
## * The library is already synchronized with the lockfile.
library (relevent)
library(sna)

Welcome to the practical for the social network analysis. In this practical we will get hands-on experience with the materials in the lectures about applying Relational Event Models (REMs) on real relational event history (REH) data: Apollo 13 voice loop data, Twitter data, Class data and World Trade Center (WTC) by doing a bit of programming in R. These data sets are stored in UUsummerschool.rdata. Later in this practical we explain these data sets briefly.

1. Relational event model (REM)

As it was mentioned in the lecture, dyadic REMs are intended to capture the behavior of systems in which individual social units (persons, organizations, animals, companies, countries, etc.) direct discrete actions towards other individuals in their environment, which form a social network.

First, we need to load UUsummerschool.rdata and see what are inside of it.

# Set your own working directory
# setwd()

# Load data using load():
load("./data/UUsummerschool.rdata") 
# This R-data contains 11 objects:
ls()
## [1] "as.sociomatrix.eventlist" "Class"                   
## [3] "ClassIntercept"           "ClassIsFemale"           
## [5] "ClassIsTeacher"           "PartOfApollo_13"         
## [7] "Twitter_data_rem3"        "WTCPoliceCalls"          
## [9] "WTCPoliceIsICR"

What are stored in UUsummerschool.rdata?
A function as.sociomatrix.eventlist that outputs a sociomatrix, and the rest are data sets such as PartOfApollo_13, Class, Twitter_data_rem3 and WTCPoliceCalls.

Probably you would ask: “What is a sociomatrix?” Before you let the computer do the social network analysis, it must have some basic data to calculate, like the connection between whom and whom. We should be aware that any one of those connections involves only two individuals. That is, it involves two nodes. This property might already inspire you to think the relationship as a mathematical expression, for instance a matrix of two dimensions. If we put the elements in the same sequence both at the row and the column, and we use 1 and 0 to represent whether there is a connection between the two elements or not, then we will have a matrix as we expect. The matrix is for the social network and we can give it an exclusive name, sociomatrix.

Secondly, let’s check all the objects.

as.sociomatrix.eventlist
## function (eventlist, n) 
## {
##     g <- matrix(0, n, n)
##     if (NROW(eventlist) > 0) {
##         tabmat <- table(eventlist[, -1, drop = FALSE])
##         g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])] <- tabmat
##     }
##     g
## }
## attr(,"source")
## [1] "function(eventlist,n){"                                                             
## [2] "  g<-matrix(0,n,n)"                                                                 
## [3] "  if(NROW(eventlist)>0){"                                                           
## [4] "    tabmat<-table(eventlist[,-1,drop=FALSE])"                                       
## [5] "    g[as.numeric(dimnames(tabmat)[[1]]), as.numeric(dimnames(tabmat)[[2]])]<-tabmat"
## [6] "  }"                                                                                
## [7] "  g"                                                                                
## [8] "}"
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32
head(WTCPoliceCalls)
##   number source recipient
## 1      1     16        32
## 2      2     32        16
## 3      3     16        32
## 4      4     16        32
## 5      5     11        32
## 6      6     11        32
head(PartOfApollo_13)
##      time sender receiver
## 1 11849.2     18        2
## 2 11854.2      2       18
## 3 11885.2     18        2
## 4 11890.2      2       18
## 5 12232.2      2       17
## 6 12342.2     17        2
head(Class)
##   StartTime FromId ToId
## 1     0.135     14   12
## 2     0.270     12   14
## 3     0.405     18   12
## 4     0.540     12   18
## 5     0.675      1   12
## 6     0.810     12    1
head(Twitter_data_rem3)
##   time_day source target
## 1     0.00      8      1
## 2   121.03     28      2
## 3   199.08     28      2
## 4   266.95      4      3
## 5   573.59     22      5
## 6   574.49     25      5

2. The relevent package and rem.dyad function

The relevent package (Butts, 2015) in R provides implementation of relational event framework, with specialized functionality for easily fitting dyadic relational event models. Within the relevent package, the rem.dyad() function is the primary workhorse for modeling dyadic data. As you can see the related documentation, there are a bunch of arguments that can be used for different purposes. Here, in this practical, we only use a few of them (please see the pdf file rem.dyad.args). For more explanation please check the help file of rem.dyad as follows.

## You can access the help file in three different ways:
# 1)
??rem.dyad
# 2)
help(rem.dyad)
# 3)
?relevent::rem.dyad

There is also rem() for modeling the dyadic data. The main difference is that rem() needs the computed statistics in advance but rem.dyad() computes the statistics internally. However, if we have the value of statistics in advance rem() tends to be faster.

As applying the regression model, we should first decide which variables/statistics should be in our model. This can be done in several ways: experts knowledge, prior information and statistics/variables selection e.g., using BIC, etc. Note that not all effects may lead to identified models in all cases. It is up to the user to ensure that the postulated model makes sense.

Below you will see a short description of the data sets that we use in this practical.

3. Apollo 13 data

During this practical we will analyze part of the Apollo 13 mission. The mission launched as scheduled at 2:13:00EST (19:13:00 UTC) on April 11, 1970. On board were James Lovell Jr. (Commander, CDR), John ”Jack” Swigert Jr. (Command Module Pilot, CMP), and Fred Haise Jr. (Lunar Module Pilot, LMP). The mission was quite routine and everything went to plan, except when at 56:54:53, the astronauts heard a ”pretty large bang” and experienced fluctuations in electrical power and control thrusters. This sets a series of events in motion. With oxygen levels depleting fast, the astronauts not only faced a risk of running out of oxygen to breathe. Therefore, they decided to abort the mission and come back to earth. This indeed changed their communications and interactions during the mission, as both the astronauts and mission control had to solve unexpected and urgent problems in order to bring the crew home alive.

The data come from the Apollo 13’s voice loops transcripts, obtained from http://apollo13realtime.org/ and https://history.nasa.gov/afj/ap13fj/07day3-before-the-storm.html; the data include the Flight directors’ voice loop and the air-ground’s voice loop. Flight directors (Houston’s Mission Control Center) were located in Houston and the crew (astronauts) were connected to this control center via Capsule Communicator (CAPCOM). The Apollo 13 data is an ideal benchmark data to study communication/interaction pattern. In this practical we use a part of the data, one hour before incident till one hour after that. The eventlist is stored in an object called PartOfApollo_13 and contain four columns: time, sender, receiver and message. Note that we know precisely when these calls were made. Therefore, we apply REM considering that the the exact event times are known. That should be notified that the number of actors is 19.

4. Twitter data

The data was extracted from the Academic Twitter API in two steps. In the first step, all users being mentioned using the hashtags ‘#ic2s2’ or ‘#’netsciXXXX’ (XXXX = 2010, 2011, … 2022) were extracted. In the second step, all mentions of those users were extracted, up to 800 tweets. Finally, the core of the network was extracted by keeping users with an in-degree and out-degree over K=150 mentions. As there were some overlaps in the time, we modified the data in order to be usable for the REM model. The source or sender is a person tweeting and the target or receiver is the person mentioned. Note that the time was date of tweet, which was converted to day. In this REH data the number of actors is 39.

5. Classroom data

The classroom dataset includes timed interactions between students and instructors within a high school classroom (McFarland, 2001; Bender-deMoll and McFarland, 2006). Sender and receiver events (n=691) were recorded between 20 actors (18 students and 2 teachers) along with the time of the events in increments of minutes. Note that the data employed here were modified slightly to increase the amount of time occurring between very closely recorded events, ensuring no simultaneity of events as assumed by the relational event framework. Two actor-level covariates are also at hand in the dataset used here: whether the actor was a teacher and whether the actor was female.

6. WTC data & visualization

This data contains information about the 9/11 terrorist attacks at the WTC in New York City in 2001. That should be notified that the radio communication was essential in coordinating activities during the crisis. Butts et al.(2007) coded radio communication events between officers responding to 9/11 from transcripts of communications recorded during the event. It is important to note that the WTC radio data was coded from transcripts that lacked detailed timing information; we do not therefore know precisely when these calls were made. But we know the order of events. Since the case of ordinal timing is somewhat simpler than that of exact timing, we consider the WTC data first in this practical. We will illustrate ordinal time REMs using the 481 communication events from 37 named communicants in this data set.

The eventlist is stored in an object called WTCPoliceCalls.
Question 1: Check the data using head and tailfunction. How many events does the data contain? How many actors? Without testing anything, can you see any trend? If so, which statistics can capture that trend?

# Check the data.

You can see the reciprocation and recency pattern in the first few events: responding officer 16 called officer 32 in the first event, officer 32 then called 16 back in the second, which might be characterized as a local reciprocity effect or \(AB \rightarrow BA\) participation shift. This was followed by 32 being the target of the next four calls, which is probably due to either a role of 32 such as a coordinator or perhaps the presence of a recency mechanism.

Question 2: Use the included sna function as.sociomatrix.eventlist() to convert the eventlist into a valued sociomatrix. For the definition of sociomatrix see previous section.
Hint: as.sociomatrix.eventlist(data, number of actors).

# Write your code here.

Question 3: Once you have the sociomatrix object, plot the aggregated network using gplot().
Hint: gplot(created sociomatrix, the arguments such as the size of nodes, width of edges, colors of nodes, etc.)
gplot creates a node-and-edge visualization of a network. It can accept an adjacency matrix (N × N), an array of adjacency matrices (m × N × N), network objects, and other forms of network data (check sna library to see what else is possible).
gplot has many arguments that you can tweak your network model visualization, for example:

  • vertex.col: set the node color.
  • edge.col: color for edge border and fill.
  • vertex.sides: number of sides for node polygon. ex) triangle = 3, square = 4 (default = 8).
  • label.bg: color for the background of node.
    Check the help file of gplot to find other features.
# Write your code here.

The plot above is the time-aggregated WTC Police communication network.

  • The three black square nodes represent actors who fill institutional coordinator roles and gray circle nodes represent all other communicants (one of the square nodes is in the center).
  • A directed edge is drawn between two actors, i and j, if actor i ever called actor j on the radio. The edges and arrowheads are scaled in proportion to the number of calls over time. This is clearly a hub-dominated network with two actors sitting on the majority of paths between all other actors.
  • The actor with the plurality of communication ties is an institutional coordinator (the square node at the center of the figure), heterogeneity in sending and receiving communication ties is evident, with several high-degree non-coordinators and two low-degree institutional coordinators, in the network. This source of heterogeneity is a good starting place from which to build our model.

7. Fitting relational event models

7-1. A simple covariate model

The question is whether the propensity of individuals to send and receive calls depends on whether they occupy institutionalized coordinator roles (ICR) or not. WTCPoliceIsICR is a vector that shows the ICR.

Question 4: Use rem.dyad() by passing appropriate arguments.
Hint: rem.dyad(data..., n = number of actors, effects = c("CovInt"), covar = list(CovInt = WTCPoliceIsICR), hessian = ...).
Please Check the description of arguments (?rem.dyad).

# Write your code here.

Once you have fit the model, you can summarize the model fit using summary().
**Question 5: Check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)).predicted.matchprovides a two-column matrix of true and false corresponding to the sender and receiver columns in the main data frame. It shows whether the model could predict the sender and receiver truly.** Hint:summary(your model name)`.

# Write your code here.

The output gives us the covariate effect, some uncertainty, and goodness-of-fit information. The format is like the output for a regression model summary, but note that the coefficients should be interpreted considering the relational event framework. In particular, the ICR role coefficient is the logged multiplier for the hazard of an event involving an ICR versus a non-ICR event ( \(e^{\lambda_1}\) ). This effect is cumulative: an event in which one actor in an ICR calls another actor in an ICR gets twice the log increment (\(e^{2\lambda_1}\) ).

Question 6: Use exp() to check the value of \(e^{\lambda_1}\) and \(e^{2\lambda_1}\).
Hint: You can extract the coeffcient value by using $ sign (e.g., yourmodel$coef).

# Relative hazard for a non-ICR/ICR 

# Relative hazard for an ICR/ICR 

Next, we evaluate whether it is worth treating these effects separately with respect to ICR status.
To do so, we enter the ICR covariate as a sender and receiver covariate, respectively.

**Question 7: Fit another model adding the covariates and check the model summary as you did above. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction ** Hint: It is the same as what we did for Q4. But here we should consider the effect for the sender and for the receiver separately. The structure is the same but we need to select "CovSnd",and " CovRec". Check the description of arguments to see what you need to specify foreffects`.

# Write your code here.

Question 8: Now, evaluate which model is preferred by BIC (lower is better) and explain what the BIC tells us.
Hint: You can extract BIC value from each model by using $ sign (e.g., yourmodel$BIC).

# Write your code here.

7-2. Adding endogenous variables

Add the reciprocity term: \(AB \rightarrow BA\). For the definition of reciprocity, see the slides. In particular, the AB-BA shift, which captures the tendency for B to call A, given that A has just called B, is likely at play in radio communication.

**Question 9: Fit another model adding the reciprocity term and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction** Hint: Add "PSAB-BA" as another statistic to the best model. Check the description of arguments to see what you need to specify foreffects`.

# Write your code here.

Question 10: Again, evaluate which model is preferred by BIC.

# Compare the BIC of this model with the first model.

Note that the lower BIC the better the model. It appears that there is a very strong reciprocity effect. Doesn’t it?

We may expect that the current participants in a communication are likely to initiate the next call and that one’s most recent communications may be the most likely to be returned.
These processes can be captured with the participation shifts for dyadic turn receiving/continuing and recency effects, respectively.

**Question 11: Fit another model adding the p-shift effects to the previous model and check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all)) and compare it with previous model. prediction ** Hint: Add "PSAB-BY" and "PSAB-AY" to the best model. For more information check the description of arguments to see what you need to specify foreffects`.

# Write your code here.

Question 12: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (third model).

Question 13: Lastly, fit a model adding the recency effects to the previous model and check the model summary. See how much were not very well predicted by your model using `mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction.
Hint: Read the definition of “RRecSnd”, and “RSndSnd”. Are they the statistics that we need for this purpose?

# Write your code here.

Question 14: Evaluate which model is preferred.

# Compare the BIC of this model with the previous model (fourth model).

The results indicate that turn-receiving, turn-continuing, and recency effects are all at play in the relational event process.

Question 15: Can you think of adding more statistics?
Please add a few more and see how the BIC changes. To this end, check the definition of statistics precisely.

# Try it out yourself.

Question 16: As there are a lot of statistics that we can use, can you think of some ways for statistics selection?

# Try it out yourself.

8. Assessing model adequacy

Model adequacy is a crucial consideration: even though the model fit is sufficiently good, you need to think of whether this model is good enough for our purposes. For example, the ability of the relational event model to predict the next event in the sequence, given those that have come before.
Note that we do not discuss this further in this practical but interested students can check Butts 2008.

9. Exact Time Event Histories

We are going to use REMs for event histories with exact timing information.
Let’s start by examining the raw temporal data and the time-aggregated network.
In this case, event time is given in increments of minutes from the onset of observation.

The edgelist is stored in an object called Class.
Question 17: Check the data again using head(). Can you see any rend?

# Check the data.

9-1. Time-aggregated network visualization

Question 18: First, convert the eventlist into a valued sociomatrix using as.sociomatrix.edgelist() as you did above.
Hint: as.sociomatrix.edgelist(data, number of actors).

# Write your code here.

Question 19: Plot the network using gplot().
Here we color the female nodes black, the male nodes gray, represent teachers as square-shaped nodes and students as triangle-shaped nodes. Edges between nodes are likewise scaled proportional to the number of communication events transpiring between actors.

Hint: Please check the hint of Q3 and consult the help file of gplot.

# Write your code here.

9-2. Modeling with covariates

**Question 20: Fit a simple intercept model, containing only a vector of 1s as an actor-level sending effect. And check the model summary. See how much were not very well predicted by your model using mean(apply(your model$predicted.match,1,all))** Hint: This vector is saved asClassIntercept, which we can pass to the respective covariate arguments inrem.dyad(). Also note that we do not want to discard timing information (ordinal=FALSE`).

# Write your code here.

The model does not fit any better than the null, because it is equivalent to the null model (as indicated by the absence of difference between the null and residual deviance).

Question 21: Now fit a more interesting covariate model that considers gender(male/female) and role(teacher/student) for senders and receivers. And evaluate whether there is any improvement over the intercept-only model by BIC. See how much were not very well predicted by your model using `mean(apply(your model$predicted.match,1,all)) and compare it with previous model prediction

# Write your code here.

# Compare the BIC of this model with the previous model.

A good improvement over the null model. But note that gender does not appear to be predictive of sending communication. A better model may be one without that specific term included.

Question 22: Fit another model excluding the non-significant term and again compare to the previous model by BIC and prediction.
Hint: Based on the above outputs, which statistics should be included/excluded?

# Write your code here.

Question 23: Is there any improvement?

9-3. Modeling endogenous social dynamics

For the classroom conversations, we can for example think of recency effects, turn-taking, sequential address, and turn-usurping. We can enter these terms into the model using their appropriate effect names (please look at the list and check the definition of statistics).

We keep the covariates from the best covariate model (i.e., model 3 from the previous section).
Question 24: Fit a model adding just a recency effect to the previous model and again check the improvement by BIC and prediction.

# Add recency effects + model 3:

# Compare the BIC of this model with the previous model (model3).

Question 25: Fit another model adding the conversational norms to the previous model (model 4) and check the improvement by BIC and prediction.

# Add conversational norms + model 4

# Compare the BIC of this model with the previous model (model4).

10. Project 1: Application of REM on Apollo 13 data

We are again going to use REMs for event histories with exact timing information.
Let’s consider Apollo 13 mission data. Read the description of the data at the beginning of this practical and also check the aforementioned websites for more information. In this case, event time is given in increments of seconds from the onset of observation.

In this data the actors are as follows:

  • AFD: Assistant Flight Director from Flight directors (1)

  • CAPCOM: Capsule Communicator from Flight directors (2)

  • CONTROL: Control Officer from Flight directors (3)

  • EECOM: Electrical, Environmental and Consumables Manager from Flight directors (4)

  • All : Ground control team (without flight directores) (5)

  • FDO : Flight dynamics officer (FDO or FIDO) (6)

  • FLIGHT: Flight Director from Flight directors (7)

  • GNC: The Guidance, Navigation, and Controls Systems Engineer from Flight directors (8)

  • GUIDO: Guidance Officer from Flight directors (9)

  • INCO: Integrated Communications Officer from Flight directors (10)

  • NETWORK: Network of ground stations from Flight directors (11)

  • TELMU: Telemetry, Electrical, and EVA Mobility Unit Officer from Flight directors (12)

  • RECOVERY: Recovery Supervisor from Flight directorsc (13)

  • PROCEDURES: Organization & Procedures Officer from Flight directors (14)

  • FAO: Flight activities officer from Flight directors (15)

  • RETRO: Retrofire Officer from Flight directors (16)

  • CDR: Commander James A. Lovell Jr. crew (astronauts) (17)

  • CMP: Command Module Pilot John (Jack) L. Swigert Jr. crew (astronauts) (18)

  • LMP: Lunar module pilot Fred W. Haise Jr. crew (astronauts) (19)

Question 26: First look at the data, plot the network and start with fitting a simple model. Next, add the statistics of interest to the model and see the performance of the new model.

# Write your code here.

11. Project 2: Application of REM on Twitter data

Question 27: First, look at the data, then plot the network. Can you see any trends in the network? Start with fitting a REM model and add the statistics of interest sequentially. Can you see any improvement based on the BIC and prediction?

Hint: Consider some of these statistics:“PSAB-BA”, “ISPSnd”, “PSAB-BA”, “PSAB-XB”, “NIDSnd”, “NIDRec”, and “NODSnd”.

# Write your code here.