2 Hands-On R Tutorial
Now we’re going to work in R to get some hands-on practice with the concepts we just discussed.
We’ll be working with the sf and terra libraries today, but these are not the only ways to represent spatial data in R. Roger Bivand’s review of spatial data for R is helpful for understanding the options for working with spatial data.
2.1 Download the Data
The data for this workshop is available in this Box folder. Note: you do not need to log in to Box to download the data. Use the download button in the upper right corner to download all the data as a .zip file. Remember to unzip it on your computer where you can easily access it.
The data we’ll be working with today is data related to the San Francisco Bay watershed. We have county boundaries (polygons), flowlines (lines representing rivers and streams), watershed boundaries (polygons), and watershed centroids (points). This data is an extract from the USGS’s National Hydrography Dataset.
2.2 Start R
Pick the R flavor of your choice - regular R, R Studio, etc. - and start it up. You can either add the commands we’ll be using to a script file to run, or you can just run the lines individually in your console to see how they work. It’s up to you.
2.3 Load Libraries
Let’s load the libraries we’ll need for this code. We only need to install a package once, but we do need to load the library the first time in a session that we run this code.
TIP: Make a section of your code at the top of your file for loading libraries and add to this section as you write your code and discover you need new libraries. Keep them in one place for easy reference.
install.packages("sf")
library("sf")
2.4 Working Directory
Set your working directory to the folder where you saved your data TIP: Windows users, use \ instead of or switch the direction of the slashes to /… it has to do with escape characters.
setwd("C:/Workshops/Data")
Replace C:/Workshops/Data
with the path to the folder in which you saved your data.
2.5 Vector Data
2.5.1 Load Data
Next, we’ll read the data into our R session Because “points” by itself is a command and we don’t want confusion, I’ve added “ws.” (WS for watershed) to the front of my variable names.
Most of the vector data formats can be read into r with the st_read
function. To see the help files for this function, run ?st_read
in your console.
ws.points<-st_read("WBDHU8_Points_SF.geojson")
ws.polygons<-st_read("WBDHU8_SF.geojson")
ws.streams<-st_read("flowlines.shp")
Let’s look at the contents of one of the files:
ws.polygons
Note that the output not only shows the data but gives you some metadata as well, such as the geometry type, bounding box (bbox), and the projected CRS, which is “NAD83 / California Albers”.
2.5.2 Identify the Assigned CRS
To see just the coordinate reference system, we can use the crs() command.
Let’s check the CRS for each of our files:
st_crs(ws.points)
st_crs(ws.polygons)
st_crs(ws.streams)
It looks like the streams dataset does not have its CRS defined.
2.5.3 Defining a CRS
Let’s be clear that the streams data has a CRS, but R doesn’t know what it should be. (Someone may have forgotten to include the .prj file in this shapefile.) You don’t get to decide what you want it to be, but rather figure out what it IS and tell R which coordinate system to use. How do you know which CRS the data has if its not properly defined? Typically, you first ask the person who sent it. If that fails, you can search for another version of the data online to get a file with the correct projection information. Finally, outright guessing can work, but isn’t recommended. We know that the CRS for this data should be EPSG 3309 (because that’s the projection the instructor tranformed it to before she deleted the .prj file for this exercise… shapefiles are not a great exchange format, FYI).
Set the CRS:
ws.streams.3309<-st_set_crs(ws.streams, value=3309)
EPSG 3309 = California Albers, NAD 27
Let’s imagine we loaded up our data and find that it shows up on a map in the wrong location. What happened? Someone defined the CRS incorrectly. How do you fix it? First you figure out what the CRS should be, then you run one of the lines above with the correct CRS definition to fix the file.
2.5.4 Transforming / Reprojecting Vector Data
We need to get all our data into the same projection so it will plot together on one map before we can do any kind of spatial process on the data.
# transform/reproject vector data
ws.streams.3310<-st_transform(ws.streams.3309, crs=3310)
# another option: match the CRS of the polygons data
ws.streams.3310<-st_transform(ws.streams.3309, crs=st_crs(ws.polygons))
Let’s check the CRS for each of our files again:
st_crs(ws.points)
st_crs(ws.polygons)
st_crs(ws.streams)
Now they all should match.
2.5.5 Plot Vector Data
Lets make a map now that all our data is in the same projection. What if we had tried to do this earlier before we fixed out projection definitions and transformed the files into the same projections?
Load up the CA Counties layer to use as reference in a map:
ca.counties<-st_read("data/CA_Counties.geojson")
Let’s plot all the data together:
plot(
ca.counties$geometry,
col="#FFFDEA",
border="gray",
xlim=st_bbox(ws.polygons)[1:2],
ylim=st_bbox(ws.polygons)[3:4],
bg="#dff9fd",
main = "Perennial Streams",
sub = "in the San Francisco Bay Watersheds"
)
plot(ws.streams$geometry, col="#3182bd", lwd=1.75, add=TRUE)
plot(ws.points$geometry, col="black", pch=20, cex=3, add=TRUE)
plot(ws.polygons$geometry, lwd=2, border="grey35", add=TRUE)
Some explanation of the code above for plotting the spatial data:
* xlim/ylim
sets the extent. Here I used the numbers from the bounding box of the polygon dataset, but you could put in numbers - remember that this is projected data so lat/long won’t work
* add=TRUE
makes the 2nd, 3rd, 4th, etc. datasets plot on the same map as the first dataset you plot - order matters
* col
sets the fill color for the geometry
* border
sets the outline color (or stroke for users of vector graphics programs)
* bg
sets the background color for the plot
* colors can be specified with words like “gray” or html hex codes like #dff9fd
2.6 Raster Data
We’ve just looked at how to work with the CRS of vector data. Now let’s look at raster data.
2.6.1 Load Libraries
First, we need to install the package that works with raster data. Today we’ll use terra but there are other packages that also work in similar ways.
install.packages("terra")
library(terra)
Now we’ll load our data. This is a digital elevation model (DEM) of the City of San Francisco.
dem<-rast(x="data/DEM_SF.tif")
2.6.2 Identify the CRS
What is the CRS of this dataset? Note that the command to read the CRS in the terra package is different from sf.
crs(dem)
2.6.3 Define a CRS
Our data came with a CRS, but in the event that we needed to define it, we would do it like this:
crs(dem)<-"epsg:4269"
2.6.4 Transform / Reprojecting Raster Data
So we know that our data is in the CRS with EPSG code 4269. That’s not the same CRS as our other data, so we’ll need to transform it. Again, terra uses different names than sf does. Is this confusing? Yes it is. This is why we read the documentation.
dem.3310<-project(dem, "epsg:3310")
2.6.5 Plot All The Data
Now that all of our datasets are in the same projection, we can use them together. Let’s make a map with both our raster and vector data.
plot(dem.3310, col=terrain.colors(50), axes = FALSE, legend = FALSE)
plot(ws.streams.3310$geometry, col="#3182bd", lwd=3, add=TRUE)
plot(ws.polygons$geometry, lwd=1, border="grey35", add=TRUE)