2 Hands-On R Tutorial

Now we’re going to work in R to get some hands-on practice with the concepts we just discussed.

We’ll be working with the sf and terra libraries today, but these are not the only ways to represent spatial data in R. Roger Bivand’s review of spatial data for R is helpful for understanding the options for working with spatial data.

2.1 Download the Data

The data for this workshop is available in this Box folder. Note: you do not need to log in to Box to download the data. Use the download button in the upper right corner to download all the data as a .zip file. Remember to unzip it on your computer where you can easily access it.

The data we’ll be working with today is data related to the San Francisco Bay watershed. We have county boundaries (polygons), flowlines (lines representing rivers and streams), watershed boundaries (polygons), and watershed centroids (points). This data is an extract from the USGS’s National Hydrography Dataset.

2.2 Start R

Pick the R flavor of your choice - regular R, R Studio, etc. - and start it up. You can either add the commands we’ll be using to a script file to run, or you can just run the lines individually in your console to see how they work. It’s up to you.

2.3 Load Libraries

Let’s load the libraries we’ll need for this code. We only need to install a package once, but we do need to load the library the first time in a session that we run this code.

TIP: Make a section of your code at the top of your file for loading libraries and add to this section as you write your code and discover you need new libraries. Keep them in one place for easy reference.

install.packages("sf")

library("sf")

2.4 Working Directory

Set your working directory to the folder where you saved your data TIP: Windows users, use \ instead of  or switch the direction of the slashes to /… it has to do with escape characters.

setwd("C:/Workshops/Data")

Replace C:/Workshops/Data with the path to the folder in which you saved your data.

2.5 Vector Data

2.5.1 Load Data

Next, we’ll read the data into our R session Because “points” by itself is a command and we don’t want confusion, I’ve added “ws.” (WS for watershed) to the front of my variable names.

Most of the vector data formats can be read into r with the st_read function. To see the help files for this function, run ?st_read in your console.

ws.points<-st_read("WBDHU8_Points_SF.geojson")
ws.polygons<-st_read("WBDHU8_SF.geojson")
ws.streams<-st_read("flowlines.shp")

Let’s look at the contents of one of the files:

ws.polygons

Note that the output not only shows the data but gives you some metadata as well, such as the geometry type, bounding box (bbox), and the projected CRS, which is “NAD83 / California Albers”.

2.5.2 Identify the Assigned CRS

To see just the coordinate reference system, we can use the crs() command.

Let’s check the CRS for each of our files:

st_crs(ws.points)
st_crs(ws.polygons)
st_crs(ws.streams)

It looks like the streams dataset does not have its CRS defined.

2.5.3 Defining a CRS

Let’s be clear that the streams data has a CRS, but R doesn’t know what it should be. (Someone may have forgotten to include the .prj file in this shapefile.) You don’t get to decide what you want it to be, but rather figure out what it IS and tell R which coordinate system to use. How do you know which CRS the data has if its not properly defined? Typically, you first ask the person who sent it. If that fails, you can search for another version of the data online to get a file with the correct projection information. Finally, outright guessing can work, but isn’t recommended. We know that the CRS for this data should be EPSG 3309 (because that’s the projection the instructor tranformed it to before she deleted the .prj file for this exercise… shapefiles are not a great exchange format, FYI).

Set the CRS:

ws.streams.3309<-st_set_crs(ws.streams, value=3309) 

EPSG 3309 = California Albers, NAD 27

Let’s imagine we loaded up our data and find that it shows up on a map in the wrong location. What happened? Someone defined the CRS incorrectly. How do you fix it? First you figure out what the CRS should be, then you run one of the lines above with the correct CRS definition to fix the file.

2.5.4 Transforming / Reprojecting Vector Data

We need to get all our data into the same projection so it will plot together on one map before we can do any kind of spatial process on the data.

# transform/reproject vector data
ws.streams.3310<-st_transform(ws.streams.3309, crs=3310)

#   another option: match the CRS of the polygons data
ws.streams.3310<-st_transform(ws.streams.3309, crs=st_crs(ws.polygons))

Let’s check the CRS for each of our files again:

st_crs(ws.points)
st_crs(ws.polygons)
st_crs(ws.streams)

Now they all should match.

2.5.5 Plot Vector Data

Lets make a map now that all our data is in the same projection. What if we had tried to do this earlier before we fixed out projection definitions and transformed the files into the same projections?

Load up the CA Counties layer to use as reference in a map:

ca.counties<-st_read("data/CA_Counties.geojson")

Let’s plot all the data together:

plot(
  ca.counties$geometry, 
  col="#FFFDEA", 
  border="gray", 
  xlim=st_bbox(ws.polygons)[1:2], 
  ylim=st_bbox(ws.polygons)[3:4], 
  bg="#dff9fd",  
  main = "Perennial Streams",
  sub = "in the San Francisco Bay Watersheds"
  )
plot(ws.streams$geometry, col="#3182bd", lwd=1.75, add=TRUE)
plot(ws.points$geometry, col="black", pch=20, cex=3, add=TRUE)
plot(ws.polygons$geometry, lwd=2, border="grey35", add=TRUE)

Some explanation of the code above for plotting the spatial data: * xlim/ylim sets the extent. Here I used the numbers from the bounding box of the polygon dataset, but you could put in numbers - remember that this is projected data so lat/long won’t work * add=TRUE makes the 2nd, 3rd, 4th, etc. datasets plot on the same map as the first dataset you plot - order matters * col sets the fill color for the geometry * border sets the outline color (or stroke for users of vector graphics programs) * bg sets the background color for the plot * colors can be specified with words like “gray” or html hex codes like #dff9fd

2.6 Raster Data

We’ve just looked at how to work with the CRS of vector data. Now let’s look at raster data.

2.6.1 Load Libraries

First, we need to install the package that works with raster data. Today we’ll use terra but there are other packages that also work in similar ways.

install.packages("terra")
library(terra)

Now we’ll load our data. This is a digital elevation model (DEM) of the City of San Francisco.

dem<-rast(x="data/DEM_SF.tif")

2.6.2 Identify the CRS

What is the CRS of this dataset? Note that the command to read the CRS in the terra package is different from sf.

crs(dem)

2.6.3 Define a CRS

Our data came with a CRS, but in the event that we needed to define it, we would do it like this:

crs(dem)<-"epsg:4269"

2.6.4 Transform / Reprojecting Raster Data

So we know that our data is in the CRS with EPSG code 4269. That’s not the same CRS as our other data, so we’ll need to transform it. Again, terra uses different names than sf does. Is this confusing? Yes it is. This is why we read the documentation.

dem.3310<-project(dem, "epsg:3310")

2.6.5 Plot All The Data

Now that all of our datasets are in the same projection, we can use them together. Let’s make a map with both our raster and vector data.

plot(dem.3310, col=terrain.colors(50), axes = FALSE, legend = FALSE)
plot(ws.streams.3310$geometry, col="#3182bd", lwd=3, add=TRUE)
plot(ws.polygons$geometry, lwd=1, border="grey35", add=TRUE)