2 Data
2.1 Delta Science Tracker
This workshop uses publicly available data from the Delta Stewardship Council’s (DSC) Delta Science Tracker. This database is a library of Sacramento and San Joaquin Delta-related research over the last 50 years. As the DSC’s website explains:
“The Delta Science Tracker is a tool to improve coordination and collaboration of science activities in a way that is valuable to scientists, decision-makers, and managers in the Delta. The Delta Science Tracker allows users to explore and visualize recent research and monitoring efforts through the lenses of science activities, organizations, and people. It is intended to promote communication, create opportunities for collaboration, and enhance transparency for science funding opportunities and decisions.”
The data we’re using was downloaded from the DSC’s visualizations page, with the ‘collaborations by’ filter set to organizations. Additional data were obtained through scraping their webpage, which we did with permission from the DSC. Code for scraping their webpage can be found here.
2.2 Data descriptions
Before getting started with visualizations, we want to take a look at the data we have available. Typical questions you should be able to answer before getting started are as follows: What types of nodes do we have and how are they related? What node and/or edge attributes are available? Are the network data one or two mode? Directed or undirected? Let’s take a look at what we have.
Two-mode edge list: The DSC data was originally compiled as two-mode network data, describing how organizations/researchers are connected to research projects. In these data, our first mode, org_id
, is the organization(s) (agencies, universities, etc) that are working on projects in the Delta. And our second mode, project_id
, are the Delta science research projects themselves. This two-mode edge list has 680 rows, whereby 138 organizations are connected with one or more of 297 projects. In these data, organizations often take part in more than one project, and projects often have more than one organizations associated with it. We also have edge attributes, which represent the kind of role an organization has in the project: contributor or leader.
<- read.csv('https://raw.githubusercontent.com/ucdavisdatalab/workshop_network_viz/main/data/edgelist_twomode.csv')
edges_2mode dim(edges_2mode)
## [1] 680 3
head(edges_2mode)
## org_id project_id collaborative_role
## 1 49590 50122 Contributor
## 2 49592 49780 Contributor
## 3 49592 49865 Contributor
## 4 49594 49860 Contributor
## 5 49598 50091 Contributor
## 6 49600 50170 Contributor
Node list for two-mode network: We also have a list of all the nodes (organizations and projects), each of which has different attributes. We have more node attributes for project nodes than we do for organization nodes, so many of the attributes for organizations are empty. For projects, we have funding data, the dates of the project, as well as a series of categorical tags related to what kinds of scientific and management themes the project addresses. A sample of those are displayed below:
<- read.csv('https://raw.githubusercontent.com/ucdavisdatalab/workshop_network_viz/main/data/nodelist_twomode.csv')
nodes_2mode head(nodes_2mode[,c(1,4,6,16,31,39)])
## id funds startdate sci_Channelizedhabitats sci_Amphibiansandreptiles
## 1 50122 $0 2008 FALSE FALSE
## 2 49780 $0 2020 FALSE FALSE
## 3 49865 $445,921 2019 FALSE FALSE
## 4 49860 $931,104 2019 FALSE FALSE
## 5 50091 $0 NA FALSE FALSE
## 6 50170 $0 NA FALSE FALSE
## mgmt_Governance
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
Note that the scientific and management themes are formatted as logical values for each theme. This is because projects can have more than one theme, and so listing them out would cause our data to be ‘long’ with repeated project observations. With node lists, we want to only have unique observations. If your data are ‘long’ (with repeating observations), widen them out as we did with the scientific and management themes.
One-mode edge list: We projected our two-mode network to a one-mode network to take a deeper look at research collaborations. Projection of two-mode data into one-mode is a common practice, though it has its limitations (Jasny 2012)2. By converting two-mode into one-mode data, we can take a look at what organizations are working together on projects. Looking at the dimensions of our data, we have 475 collaborative ties in the network. We’ve also added in an edge attribute to our collaboration tie based on the years that different actors collaborated together. Reveal below for more on how to project one-mode data from two-mode data.
<- read.csv('https://raw.githubusercontent.com/ucdavisdatalab/workshop_network_viz/main/data/edgelist_onemode_projected.csv')
edges_1mode dim(edges_1mode)
## [1] 475 6
head(edges_1mode)
## from_org_id to_org_id Y1995_2009 Y2010_2024 before_1980 Y1980_1994
## 1 49590 49709 TRUE FALSE FALSE FALSE
## 2 49592 49708 FALSE TRUE FALSE FALSE
## 3 49592 49741 FALSE TRUE FALSE FALSE
## 4 49592 49609 FALSE TRUE FALSE FALSE
## 5 49708 49741 FALSE TRUE FALSE FALSE
## 6 49609 49708 FALSE TRUE FALSE FALSE
Reveal: How to project one-mode networks from two-mode edge list
# We got our projected data directly from the DST webpage, but generally,
# you can project any bipartite network into a one-mode network using the
# following steps:
# 1. Create count table where organizations are columns (make your mode of
# interest the 2nd/y argument in the table)
<- table(edges_2mode[c(2,1)])
tbl # 2. Extract column names -- these are mode 1 names (in our case, orgs)
<- colnames(tbl)
orgids # 3. Take the cross-product of the table to get a co-occurence matrix
<- crossprod(tbl)
comat # 4. Assign self-co-occurrences 0
diag(comat) <- 0
# Check: This should be symmetrical
isSymmetric(comat)
# 5. Make a data frame from the matrix and add ids as the column names
<- data.frame(comat)
comat colnames(comat) <- orgids
#. Check: This should be square
dim(comat)
# 6. Remove those with no co-occurrence at all
<- comat[rowSums(comat, na.rm = T) != 0,
comat colSums(comat, na.rm = T) != 0]
# 7. Make co-occur data frame into matrix object
<- as.matrix(comat)
comat # 8. Create a graph so that it can be converted into a weighted edge list
<- igraph::graph_from_adjacency_matrix(comat, weighted = T, mode = 'undirected')
g <- igraph::get.data.frame(g) el_proj
Node list for one-mode network: We reduced our two-mode node list down to only a list of our organization nodes, which is what we project in our one-mode network. For organizations, we have very few attributes, only their id, name, and link to their DST page.
<- read.csv('https://raw.githubusercontent.com/ucdavisdatalab/workshop_network_viz/main/data/nodelist_onemode.csv')
nodes_1mode head(nodes_1mode[c(1:2,5)])
## id name
## 1 49590 Agricultural Coalitions: Landowners membership fees
## 2 49592 Anchor QEA
## 3 49593 Audubon Canyon Ranch
## 4 49594 Bachand and Associates
## 5 49598 BTS
## 6 49600 CalFish
## org_type
## 1 NGO
## 2 Consultant
## 3 NGO
## 4 Consultant
## 5 Federal Government
## 6 State & Local Government
Jasny, Lorien. “Baseline Models for Two-Mode Social Network Data.” Policy Studies Journal 40, no. 3 (2012): 458–91. https://doi.org/10.1111/j.1541-0072.2012.00461.x.↩︎