NCEAS Scientific Computing: Solutions Center Use Case

Sampling Vector Polygon Data using a Point Grid Overlay

Using elements of the R sp package to extract attribute values at regular intervals from a vector polygon map using a regularly-spaced point grid


Bailey's Ecoregions for the continental United States delineate areas of common climactic and vegetation characteristics across the continent. They are available as a digital spatial dataset (various formats) from from the United States National Atlas data portal. An NCEAS scientist wished to generate a regularly-spaced grid covering the entire map, and then assign the underlying Ecoregion code to each grid location. Further, they wished to vary the spatial resolution of this grid order to create sample the polygon data at different spatial frequencies. A straightforward geospatial processing task, that until the advent of the R geospatial processing tools, required a full-scale Geographic Information System (GIS) to perform. Now, we can produce the required results using functions in the sp and PBSMapping packages.

In this Use Case we will work through the process of reading, computing, and displaying the input Bailey's Ecoregions file and the spatial datasets derived from this map.

Click here to download the source code and sample data used in this example.

Read and Display Vector Polygons: Bailey's Ecoregions Map

On the left is a display of the Bailey's Ecoregions polygons. On the right is the R code used to set up the R graphics environment and display the regions in map form. Note: The PBSMapping package methods are tailored to the display of multi-layer spatial data sets in that it is easy to 'stack' calls to the routines to generate a single multi-dataset (e.g., points-over-polygons) diagrams. In addition, they provide axis labels and map 'tic-marks' by default. There is a tradeoff: The map data need to be converted into PBSMapping-specific objects (e.g., the PolySet) before creating the diagram. PBSMapping provides methods for making this conversion, of course.

Baileys EcoRegion map
 SampleBaileysEcoRegions <- function()
{
   library(sp)
   library(PBSMapping)
# 
# Read ESRI Shape File into SpatialPolygonsDataFrame
# This produces a SpatialPolygonsDataFrame.
#
   EcoRegion = readShapePoly("BaileysEcoRegionsConusGP")
#
# Use PBSMapping routines to generate nice spatial object plots
# Note: This requires data conversion to a PBSMapping PolySet. 
#                      
   EcoRegionPS = SpatialPolygons2PolySet(EcoRegion)
#
   plotPolys(EcoRegionPS,xlab="W Longitude",ylab="N Latitude")
#

Extract and Use Subset of Baileys Ecoregion Attributes for Use Case

The Baileys Ecoregion polygons have nine descriptive attributes. We will extract three - ECO_US, ECO_US_ID, ECO_CODE - for use in this example. The ECO_CODE attribute will be sampled using the point grid that we will create in the next step. On the right is the R script, on the left is a portion of the reduced Data component

'data.frame':   1690 obs. of  3 variables:
 $ ECO_US_  : num  1246 1247 1248 1249 1250 ...
 $ ECO_US_ID: num   135  136 1402   15 1403 ...
 $ ECOCODE  : Factor w/ 164 levels "-212A","-212B",..: 1 116 51...

   ECO_US_ ECO_US_ID ECOCODE
0     1246       135   -212A
1     1247       136   M212A
2     1248      1402   -242A
3     1249        15   -242A
4     1250      1403   -242A
5     1251        14   M242B
6     1252      1404   -242A
7     1253      1405   -242A
8     1254         1   M242A
9     1255      1406   -242A
10    1256      1407   -242A
	
#
# Make a copy of the Spatial Polygon file's Data component, 
# including ONLY columns 3 through 5. Then, create a copy
# of the Spatial Polygon file, using the new data subset.
#
   NewData = EcoRegion@data[,3:5]
   EcoRegionTrunc = EcoRegion
   EcoRegionTrunc@data = NewData
   str(NewData)
   NewData[1:10,]
#

Create the Sampling Point Grid

Next, use the sp package tools to generate two sampling grids with two degree and one degree spatial resolution.On the right is the R script, on the left are (top to bottom): the two-degree grid, the two-degree grid overlaid on the Bailey's Ecoregions polygons, the one-degree grid overlaid on the Baley's Ecoregions, and a display (using the ESRI ArcMap GIS Identify feature) of a single grid point value.

Baileys EcoRegion map
#
   vals = EcoRegionTrunc@bbox
   deltaLong = as.integer((vals[1,2] - vals[1,1]) + 1.5)
   deltaLat  = as.integer((vals[2,2] - vals[2,1]) + 1.5)
#
# grid resolution in decimal degress.
#
   gridRes = 1.0
#   
   gridSizeX = deltaLong / gridRes
   gridSizeY = deltaLat / gridRes
#
# GridTopology object forms the basis for the sampling grid        
#
   GridTopoOneDeg = GridTopology(vals[,1],c(gridRes,gridRes),
                                 c(gridSizeX,gridSizeY))
#
# Create the SpatialGrid object....
#
   SamplingGridOneDeg = SpatialGrid(GridTopoOneDeg)
#
# NOTE: To create the one-half Deg Spatial Grid, 
# change the parameter 'gridRes', and use the  
# same code subset displayed here...
#  
  gridRes = 2.0 
# 
# repeat above code.....
#
# ......
#
  SamplingGridTwoDeg = SpatialGrid(GridTopoTwoDeg)  
#                                                                      
# We get a more suitable output from the overlay() method    
# if we use a SpatialPoint object as an input, so promote
# the SpatialGrid object to one. 
#
   SamplingPointsTwoDeg = as(SamplingGridTwoDeg,"SpatialPoints")
#
# plot the two kilometer grid on top of the base map,
# then create a new plot to display the two-and one-degree grids
# Note: we convert the points to PBSMapping-compatible EventData 
# to allow use of addPoints() method.
#
   NumPoints = length(SamplingPointsTwoDeg@coords[,1])
   TwoDegEvents <- as.EventData(data.frame(EID=1:NumPoints,X=SamplingPointsTwoDeg@coords[,1],
                               Y=SamplingPointsTwoDeg@coords[,2]),projection = "NA")   
   addPoints(TwoDegEvents,col="green",pch = 20)       
# 
# plot the two-degree grid by itself   
#
   dev.set(1)
   plotPoints(TwoDegEvents, col = "green", pch = 20)
   
#   
   NumPoints = length(SamplingPointsTwoDeg@coords[,1])
   OneDegEvents <- as.EventData(data.frame(EID=1:NumPoints,X=SamplingPointsOneDeg@coords[,1],
                                Y=SamplingPointsOneDeg@coords[,2]),projection = "NA")           
#
# create a second plot in new graphics window to display the One Degree grid
#   
   dev.set(1)   
   plotPolys(EcoRegionPS, xlab = "W Longitude", ylab = "N Latitude")
   addPoints(OneDegEvents,col="red",pch=20)     
#
# Use the overlay() method to extract a polygon attribute value 
# at each point location within the polygon data set boundary
#
   SamplingResultsDFTwoDeg = overlay(EcoRegionTrunc,SamplingPointsTwoDeg)
#   
#
# Create a point shape file to contain the 
# sampling results. The attribute table will include
# all three Bailey's measures along with the corresponding
# point location. 
# Make some modifications to the overlay results data frame 
# so that the shape file has usable attributes. 
#
# First, add the sampling point coordinates
# 
   Temp2 = cbind(SamplingPointsTwoDeg@coords,SamplingResultsDFTwoDeg)
#
# Next, convert the ECOCODE attribute from a factor to a character string
# 
   ecoCode = as.character(Temp1$ECOCODE)             
   flags = is.na(ecoCode)
   ecoCode[flags] = "NoMatch"
   OutputDataFrame = cbind(Temp2[1:4],ecoCode)
#
# Finally, generate a point shape file that includes the EcoRegion attribute.
#   
   write.pointShape(SamplingPointsTwoDeg@coords,OutputDataFrame,"BaileyEcoRegionGridTwoDeg")
#
Baileys EcoRegion / Two Degree grid
Baileys EcoRegion map / One Degree grid
Baileys EcoRegion Point ID map / One Degree grid

Create new point Shape File, then read and display the file

Lets create an output file consisting ONLY of grid points falling within the United States boundary. First, perform a Point-In-Polygon operation using the PBSMapping findPolys() method to assign the bounding polygon to each grid point.

  1. 1. Perform a Point-In-Polygon operation.
  2. 2. Write the point subset into a Shape File.
  3. 3. Read in and display the points in the Shape File.
  4. 4. Take a look at the R data object structure for both Point types.

Here is the R script, and the resulting plot:

Baileys Grid Points Clipped map
#
# Bind the Bailey's EcoCodes to the points in the
# entire two-degree grid.
#
   TwoDegEventsWithCodes = cbind(TwoDegEvents,ecoCode) 	
#
#  Extract ONLY grid points within the USA boundaries. 
#
   PointsInsideRegions = findPolys(TwoDegEventsWithCodes,EcoRegionPS)
   PointIndexesInsideRegions = sort(unique(PointsInsideRegions$EID))
   PointsInsideRegions = TwoDegEvents[PointIndexesInsideRegions,]
#
# write a new point shape file
#   
   write.pointShape(cbind(PointsInsideRegions$X,PointsInsideRegions$Y),
	                 PointsInsideRegions,"BaileyEcoRegionGridClipped")
#
# look at the structure of the point data object 
# before and after conversion to Events
#
   TwoDegInUSA = readShapePoints("BaileyEcoRegionGridClipped")
#   
   NumPoints = length(TwoDegInUSA@data[,1])
   PointEvents <- as.EventData(data.frame(EID=1:NumPoints ,X=TwoDegInUSA@coords[,1],
                                        Y=TwoDegInUSA@coords[,2]))# ,projection = "NA")
#   
# Make a new plot
#   
   addPoints(TwoDegEvents,col="orange",pch = 20)
   plotPolys(EcoRegionPS, xlab = "W Longitude", ylab = "N Latitude")   
   addPoints(PointEvents,col="red",pch = 20)   
#				
# Look at the structure of the point data object 
# before and after conversion to Events
#						
   str(PointEvents)
#
   str(TwoDegInUSA)   
#
# End of example!
#       
#
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 206 obs. of  4 variables:
  .. ..$ EID    : num [1:206] 3 4 5 6 7 8 9 10 11 12 ...
  .. ..$ X      : num [1:206] -121 -119 -117 -115 -113 ...
  .. ..$ Y      : num [1:206] 48.5 48.5 48.5 48.5 48.5 ...
  .. ..$ ecoCode: Factor w/ 114 levels "-212F","-212H",..: 88 111 ...
  .. ..- attr(*, "data_types")= chr [1:4] "N" "N" "N" "C"
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:206, 1:2] -121 -119 -117 -115 -113 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] -122.8   26.5  -68.8   48.5
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots
  .. .. ..@ projargs: chr NA
#
# Note that EventData class does not retain the point attributes
#
Classes ‘EventData’ and 'data.frame':   206 obs. of  3 variables:
 $ EID: int  1 2 3 4 5 6 7 8 9 10 ...
 $ X  : num  -121 -119 -117 -115 -113 ...
 $ Y  : num  48.5 48.5 48.5 48.5 48.5 ...
#

Learning More:

The USFS Ecoregions Center: US Forest Service Natural Resource Assessment and Analysis Ecoregions Center web site

The CRAN (R) Task View summarizes the R Analytical Environment's spatial data analysis and management tools..

This ESRI White Paper describes the Shape File format.

This Web Site describes the standard .dbf format which is a component of the ESRI Shape File.