1  Introduction

R is probably best used for cleaning and analyzing spatial data. Making good quality maps, however, takes a different set of skills.

1.1 Data Set

If you haven’t already, download the workshop data from the online repository. If you downloaded a .zip file, unzip the data to a folder you can easily find.

The data we have to work with today is:

  • Lake Monsters - LakeMonsters.gpkg - locations of lake monsters; global distribution
  • Lakes - Lakes_GreatLakes-Area.gpkg - a clip of the Natural Earth Data lakes dataset for the Great Lakes and areas adjacent
  • States - US_CAN_Admin1.gpkg - a clip of the Natural Earth Data states and provinces data for the US and Canada (I’m going to refer to these as “states” for simplicity, but I want to acknowledge that this includes Canadian provinces as well.)

Geopackage (.gpkg) is a single file, open vector format. We’re using it today because it’s one file per dataset (unlike Shapefile), which makes data management so much easier. See the README.txt file that comes with the data download for more details and sources of the data.

1.1.1 Data Processing

We’ll be working with an international dataset of locations of lake monsters, the most famous of which is arguably Nessie who supposedly lives in Loch Ness in Scotland. This dataset was assembled from Wikipedia’s List of Lake Monsters. The lake names were geocoded (you can find the R script that I wrote to process the data in the r_scripts folder of this repo, corrected, then exported to a geopackage file. Why did I process this data for you? It took a few hours to do and requires skills we are not focusing on in this workshop.

1.2 Map Details

Let’s pretend we want to a map for a journal article about the underpinnings of the distribution of cryptozoology sightings in the northeastern US and southeastern Canada, focusing on lake monsters, creatures reported to live in lakes that are mainly known from folklore and typically take on the the shape of extinct or extraordinarlily large living reptiles. You want your readers to understand the relationship between the monster locations and also their location on the planet. In our imaginary scenario, we plan to submit our paper to one of the Nature journals.

What’s the story? What should readers learn from this map? What data do you need to tell that story effectively?

The story I plan to tell is: where are these monsters reported to live in the US Great Lakes area? What are their names? For reference, what lakes and states or provinces are they in? The story you want to tell might be different, so feel free to make adjustments as needed.

1.3 Load Spatial Data

Open RStudio and create a new script file. Save it in your folder for this workshop.

Set your working directory to your workshop folder

    # set your working directory
    
    setwd("path/to/your/workshop/folder/workshop_maps_in_r")

Load the libraries that we will need. If you have not installed these packages yet, you can do so with the command install.packages("PackageName")

    # load libraries
    
    library(sf)     # for working with vector spatial data
    library(terra)  # for working with raster spatial data

Let’s load the data we’ll need.

    # read spatial data
    
    monsters <- st_read("data/LakeMonsters.gpkg")
Reading layer `LakeMonsters' from data source 
  `C:\Users\mmtobias\Documents\GitHub\workshop_maps_in_R\data\LakeMonsters.gpkg' 
  using driver `GPKG'
Simple feature collection with 55 features and 9 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -123.4696 ymin: -41.00612 xmax: 144.3105 ymax: 65.35767
Geodetic CRS:  WGS 84
    lakes <- st_read("data/Lakes_GreatLakes-Area.gpkg")
Reading layer `Lakes_GreatLakes-Area' from data source 
  `C:\Users\mmtobias\Documents\GitHub\workshop_maps_in_R\data\Lakes_GreatLakes-Area.gpkg' 
  using driver `GPKG'
Simple feature collection with 110 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -101.2215 ymin: 35.93167 xmax: -64.94982 ymax: 54.41538
Geodetic CRS:  WGS 84
    states <- st_read("data/US_CAN_Admin1.gpkg")
Reading layer `US_CAN_Admin1' from data source 
  `C:\Users\mmtobias\Documents\GitHub\workshop_maps_in_R\data\US_CAN_Admin1.gpkg' 
  using driver `GPKG'
Simple feature collection with 64 features and 83 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -178.1945 ymin: 18.96391 xmax: -52.65365 ymax: 83.11611
Geodetic CRS:  WGS 84
    dem <- rast("data/dem.tif")