#Cleaning Network Data
library(igraph)
This script is intended to help you to clean up network data that you have colected or got access to. One very common issue with cleaning networkd data is knowing what to do with isolates. Isolates are those who are a part of your network, but who have no connections to others in the group. Isolates are stored in network data differently depending on how your data are stored.
If your data are stored in an adjanceny matrix, then isolates are those with no 1s in the matrix. Ensuring that R recognises them as isolated is very simple. Bring in the data, and then convert it into a matrix. Any that are isolated will show as isolates.
hog_crush_matrix <- read.csv(file.choose(), row.names = 1, header = TRUE)
crush_matrix <- as.matrix(hog_crush_matrix)
hog_crush_net_mat <- graph_from_adjacency_matrix(crush_matrix, mode = "directed", diag = FALSE)
plot(hog_crush_net_mat)
However, things are not as straightforward when you are working with edgelists.With this structure, you have only two columns, one for senders and the other for receivers. If there is an individual in the group who neither sends nor receives, what do you do with them? One way of recording such isolates in an edgelist is list them as connected to themselves (known as a self loop). Take a look at this edgelist and you will see that these individuals are connected to themselves
hog_crush_correct <- read.csv(file.choose(), header=TRUE) # select Hogwarts Crushes Edgelist_CORRECT.csv
#Take a look at the data
hog_crush_correct
## Crusher Crush
## 1 Harry Potter Ginny Weasley
## 2 Harry Potter Cho Chang
## 3 Ron Weasley Hermione Granger
## 4 Hermione Granger Ron Weasley
## 5 Ron Weasley Lavender Brown
## 6 Ginny Weasley Harry Potter
## 7 Lily Potter James Potter
## 8 James Potter Lily Potter
## 9 Severus Snape Lily Potter
## 10 Nymphadora Tonks Remus Lupin
## 11 Remus Lupin Nymphadora Tonks
## 12 Lavender Brown Ron Weasley
## 13 Cho Chang Cedric Diggory
## 14 Cho Chang Harry Potter
## 15 Cedric Diggory Cho Chang
## 16 McGonagal McGonagal
## 17 Madeye Madeye
## 18 Voldemort Voldemort
## 19 Flitwick Flitwick
Now when you make this a graph object R does something different.
Crush_correct_net <- graph_from_data_frame(hog_crush_correct, directed = TRUE)
plot(Crush_correct_net)
They have self looped edges!!! These do not look great. To Remove them, you can use the delete_edges() command and select the edges that are looped by using the E() command coupled with the is.loop() option.This is also something you will need to remember to do every time you bring in an edgelist with isolates.
Crush_correct_net <- delete_edges(Crush_correct_net , E(Crush_correct_net )[which_loop(Crush_correct_net )])
plot(Crush_correct_net)
Another way to deal with isolates from an edgelist is to list noone in the “to” column. In other words, you list the name of the person in your network but leave the cell next to them blank. However, this approach also has additional steps to take before it is clean and ready to go.
hog_crush_wrong <- read.csv(file.choose(), header=TRUE) # select Crushes Edgelist_INCORRECT.csv
Take a look at the edgeist now it is in and you will see I added a few more characters to this group: Madeye, Flitwick, McGonagal, and Voldemort. They are all listed in the “Crusher” (from) column but have no connection to anyone in the “crush” column. This makes sense, since we know little about their romances from the Harry Potter Saga.
hog_crush_wrong
## Crusher Crush
## 1 Harry Potter Ginny Weasley
## 2 Harry Potter Cho Chang
## 3 Ron Weasley Hermione Granger
## 4 Hermione Granger Ron Weasley
## 5 Ron Weasley Lavender Brown
## 6 Ginny Weasley Harry Potter
## 7 Lily Potter James Potter
## 8 James Potter Lily Potter
## 9 Severus Snape Lily Potter
## 10 Nymphadora Tonks Remus Lupin
## 11 Remus Lupin Nymphadora Tonks
## 12 Lavender Brown Ron Weasley
## 13 Cho Chang Cedric Diggory
## 14 Cho Chang Harry Potter
## 15 Cedric Diggory Cho Chang
## 16 McGonagal
## 17 Madeye
## 18 Voldemort
## 19 Flitwick
When we make this a graph object, R does something funky.
The new characters are all connected to a nameless node and it looks, on visual inspection, that they all have a crush on the smae person.
I have higlighted that node in the visualization below. The red node is nameless because the edgelist has empty (nameless) cells.
crush_wrong_net <- graph_from_data_frame(hog_crush_wrong, directed = TRUE)
plot(crush_wrong_net)
V(crush_wrong_net)$wrong <- ifelse(V(crush_wrong_net)$name %in% c(""), "red", "white")
plot(crush_wrong_net, vertex.color = V(crush_wrong_net)$wrong)
One way to deal with this is to delete the superfluous node. You do this using the delete_vertex() function. ##This fixes the issue once you have the data in Rstudio, but the issue still exists in your dataset. If you choose to structure your network data this way, you will have to remember to remvoe this node every time. This may be harder to do/realise when dealing with large dense networks.
crush_wrong_net <- delete_vertices(crush_wrong_net, "")
plot(crush_wrong_net)
Other things to do to clean a network object once in Rstudio.
You may want to add or remove nodes and vertices (nodes) from your network. Only do this if you have legitimate reason to.
Deleting Nodes. You might decide to remove one or more nodes from your network. For example, in this hogwarts dataset, we may want to remove those who are not students at Hogwarts (i.e. remove teachers or adults). To do this, you would use the delete_vertices() option
Basic - You can delete them one-by-one.
hog_crush_students <- delete_vertices(Crush_correct_net, "Voldemort")
plot(hog_crush_students)
Pro tip - if you are deleting multiple, it is worth making a vector with all the names of those you want to remove, then use the delete_vertices() command
hog_adults <- c("Severus Snape", "Lily Potter", "James Potter", "Nymphadora Tonks", "Remus Lupin", "Voldemort", "Flitwick", "McGonagal", "Madeye")
hog_crush_students <- delete_vertices(Crush_correct_net, hog_adults)
plot(hog_crush_students)
This new version removed all unwanted nodes at once.
Sometimes, you want to remove the isolated nodes from your network because you only care about those who have connections to others. To do this, you identify those with no connections (degree = 0) and them remove them from your network. I suggest making a new object with this sub network.
hog_crush_isol <- which(degree(Crush_correct_net)==0)
Now you use the delete_vertices() command and remove those in the vector you just created (those with degree = 0)
Crush_no_isol <-delete_vertices(Crush_correct_net, hog_crush_isol)
plot(Crush_no_isol)
Now this new object has only those nodes with ties to others in the network.
Use add.vertices(graph name, numberof additional vertices, attribute = )
crush_added <- add.vertices(Crush_correct_net, 1, name = "Michael Corner")
plot(crush_added)
You may want to delete edges between two nodes.
edges_to_delete <- E(Crush_correct_net)[(.from("Remus Lupin") & .to("Nymphadora Tonks"))]
Crush_edge_delete <- delete_edges(Crush_correct_net, edges_to_delete)
plot(Crush_edge_delete)
To delete all edges between two nodes
edges_to_delete2 <- E(Crush_correct_net)[(.from("Remus Lupin") & .to("Nymphadora Tonks")) | .from("Nymphadora Tonks") & .to("Remus Lupin")]
Crush_edge_delete <- delete_edges(Crush_correct_net, edges_to_delete2)
plot(Crush_edge_delete)
Use add.edges().
crush_added <- add.edges(crush_added, edges = c("Michael Corner", "Ginny Weasley"))
plot(crush_added)
Now to add the reciprocated tie
crush_added <- add.edges(crush_added, edges = c("Ginny Weasley", "Michael Corner"))
plot(crush_added)