--- title: "004 - Adding and Removing Nodes and Edges" output: html_document --- In a previous example, it was demonstrated that creating node data frames and edge data frames could be a viable strategy for building a graph. That pattern is useful for bulk additions of connected nodes as first step. Another means to compose a graph is to initialize an empty or partially-populated graph and add nodes individually, incorporating edges either during node addition or as separate operations. This is useful when data is collected slowly over time (e.g., through periodic data collections) and you'd like to update the graph with recent data. Here, we'll learn how to add nodes and edges to a graph, and, how to remove them. ## Setup Ensure that the development version of **DiagrammeR** is installed. Load in the package with `library()`. Additionally, load in the **tidyverse** packages. ```{r load_packages, message=FALSE, warning=FALSE, include=FALSE, results=FALSE} #devtools::install_github("rich-iannone/DiagrammeR") library(DiagrammeR) library(tidyverse) ``` ## Part 1. Adding Nodes and Edges First, create an empty graph. Sometimes it's good to start with an empty slate: ```{r create_empty_graph} graph <- create_graph() ``` You can add individual nodes to a graph by using the `add_node()` function. Let's add two nodes in the most minimal fashion: ```{r add_2_nodes_to_graph} graph <- graph %>% add_node() %>% add_node() ``` This creates 2 nodes with ID values `1` and `2` (ID values are set for you as auto-incrementing integers). We can use both the `get_node_ids()` and `get_node_df()` functions to verify that the nodes had been added. ```{r get_node_ids} graph %>% get_node_ids() ``` ```{r get_node_df} graph %>% get_node_df() ``` Likewise we can make sure that no edges are in the graph by using the `get_edges()` function. A graph with no edges will return `NA`. (We can also use `count_edges()` and expect `0`.) ```{r get_edges_empty_graph} graph %>% get_edges() ``` Note that using the default values for `type` or `label` in each `add_node()` call, we don't get values for the `type` attribute and the `label` attribute is assigned the node ID value. In the ideal case, values for `type` and `label` are supplied. Something to keep in mind is that including `label` values that are unique or distinct across all nodes in the graph will make it possible to specify node selections and perform useful actions on specific nodes. Let's create the `graph` object once more with `type` and `label` node attributes included. ```{r create_graph_add_nodes_labels_types} graph <- create_graph() %>% add_node( type = "number", label = "one") %>% add_node( type = "number", label = "two") ``` View the graph's internal node data frame with the `get_node_df()` function. This allows us to see that these attributes have been included alongside the graph's nodes. ```{r get_node_df_2} graph %>% get_node_df() ``` Adding a single edge is possible with the `add_edge()` function. Let's add a single, directed edge between nodes `1` and `2`. This edge with also be given a value for its `rel` attribute. To do this, we can specify the node ID values for the `from` and `to` arguments. Immediately after adding the edge to the graph, use the `get_edges()` function to show that the edge has been produced. ```{r create_graph_w_ids} # Add an edge between nodes `1` and `2` and # set the `rel` attribute as `to_number` graph_edge_w_ids <- graph %>% add_edge( from = 1, to = 2, rel = "to_number") # Display the graph's edges (in the default # string vector format with node IDs separated # by arrows in this directed graph case) graph_edge_w_ids %>% get_edges() ``` Perhaps you don't work directly with the node IDs and instead with unique node labels. This is a common practice as node ID values can be considered as arbitrary but node labels and other attributes give each node an identity and make them distinguishable. In such a workflow, it's easier to create edges based on the node `label` values. Supply the node `label` values to the `from` and `to` arguments (the package assumes that text represents node labels, and numbers are node ID values). To view the graph's edges use `get_edges()` as before but, this time, use `return_values = "label"` to observe the graph's edges in terms of node `label` values. ```{r create_graph_w_labels} # Add an edge between the nodes with labels # `one` (node `1`) and `two` (node `2`) and # set the `rel` attribute as `to_number` graph_edge_w_labels <- graph %>% add_edge( from = "one", to = "two", rel = "to_number") # Display the graph's edges (as a string-based # vector with pairs of node `label` values) graph_edge_w_labels %>% get_edges(return_values = "label") ``` The `get_edges()` function can output the pairs of nodes in edges either as a character vector (as above, which is the default), as a data frame (with 2 columns: `from` and `to`), or as a list (first component is the `from` vector and the second represents the `to` nodes). Here are examples of the latter two output types: ```{r get_edges_return_type_df} # Get the graph's edges as a data frame graph_edge_w_labels %>% get_edges(return_type = "df") ``` ```{r get_edges_return_type_list} # Get the graph's edges as a list graph_edge_w_labels %>% get_edges(return_type = "list") ``` The addition of a node and the creation of edges can also be performed in a single `add_node()` step. You can use either (or both) of the optional `from` and `to` arguments in the `add_node()` function. This is best demonstrated by means of a few examples. ```{r add_node_edge_a} # Add node with ID `3` and the # edge `2->3` graph_a <- graph_edge_w_labels %>% add_node( type = "number", label = "three", from = 2) graph_a %>% get_edges() ``` ```{r add_node_edge_b} # Add node with ID `4` and the # edge `4->3` graph_b <- graph_a %>% add_node( type = "number", label = "four", to = 3) graph_b %>% get_edges() ``` ```{r add_node_edge_c} # Add node with ID `5` and the # edges `1->5` and `5->2` graph_c <- graph_b %>% add_node( type = "number", label = "five", from = 1, to = 2) graph_c %>% get_edges() ``` There may be even multiple edges set as `to` and/or `from` values. ```{r add_node_edge_d} # Add node with ID `6` and the # edges `1->6`, `2->6`, and `3->6` graph_d <- graph_c %>% add_node( type = "number", label = "six", from = 1:3) graph_d %>% get_edges() ``` ```{r add_node_edge_e} # Add node with ID `7` and the # edges `7->4`, `7->5`, and `7->6` graph_e <- graph_d %>% add_node( type = "number", label = "seven", to = 4:6) graph_e %>% get_edges() ``` Have a look at the final graph in the RStudio Viewer by using the `render_graph()` function. ```{r view_graph_one_rel_set} graph_e %>% render_graph(output = "visNetwork") ``` Notice that the edge relationship value has only been added to the `1->2` edge as `to_number`. If you'd like to specify all the `rel` values for all edges as `to_number`. This can be done with the `set_edge_attrs()` function. To do this unconditionally to all edges in the graph: ```{r set_edge_attrs_f} graph_f <- graph_e %>% set_edge_attrs( edge_attr = "rel", values = "to_number") ``` To verify that the change was applied, use the `get_edge_df()` function to output the graph's internal edge data frame. ```{r get_edge_df_all_rel_set} graph_f %>% get_edge_df() ``` Alternatively, use the `get_edge_attrs()` to verify. By supplying the graph object and the name of the edge attribute (`rel`), we get a named vector of edge attribute values (where the names are the edges in the format `[id]->[id]`). ```{r get_edge_attrs_edge_attr_rel} graph_f %>% get_edge_attrs(edge_attr = "rel") ``` View the graph again to see that all edges are labeled with the `to_number` `rel` edge attribute. ```{r view_graph_all_rel_set} graph_f %>% render_graph(output = "visNetwork") ``` It should be noted that there are also analogous `set_node_attrs()` and `get_node_attrs()` functions that allow for setting and getting attributes for nodes in a graph (or in a node data frame). ## Part 2. Removing Nodes and Edges Nodes and edges can just as easily be removed. The key functions here are `delete_node()` and `delete_edge()`. Removing a node also removes all edges to and from that node. For the sake of example, let's remove the node with the ID `6` which is one of the more highly connected nodes in the graph. ```{r remove_node_and_also_edges} # Get the number of edges before node removal edges_before_change <- graph_f %>% count_edges() # Remove node `6` from the graph graph_g <- graph_f %>% delete_node(node = 6) # Get the number of edges after the removal edges_after_change <- graph_g %>% count_edges() # Show, as a vector, the number of edges # before and after the removal of a node c(edges_before_change, edges_after_change) ``` View the revised graph to see the change. ```{r view_revised_graph} graph_g %>% render_graph(output = "visNetwork") ``` To remove a single edge we use `delete_edge()` By removing the edge `5->2` we are left with a circular graph. Here we will get an edge count before the edge removal, remove the edge, then get another count of edges. ```{r delete_edge_5_2} # Get the number of edges before edge removal graph_g %>% count_edges() # Remove edge `5->2` from the graph graph_h <- graph_g %>% delete_edge( from = 5, to = 2) # Get the number of edges after edge removal graph_h %>% count_edges() ``` View the graph again and note that no nodes were removed. The `delete_edge()` function never removes nodes. We do get that cycle graph, as promised. ```{r view_revised_graph_2} graph_h %>% render_graph(output = "visNetwork") ```