Polygon Overlay Operations using R
This Use Case demonstrates how to combine polygon data layers to create (and analyze) new geospatial layers. This case also provides a brief overview of the R spatial data object hierarchy provided by the sp class, as well as the implementation of the PROJ.4 map projection library within R.
Getting started:
Polygon Overlay (sometimes called Topological Overlay) operations determine the spatial coincidence (if any) of two polygon data layers, or between polygon and point usually creating a new polygon layer in the process. Polygon overlay techniques are often used by field scientists to explore the relationships between spatial attributes, stored as layers in a geophysical data model. Examples of polygon data are: species geographic range ranges, biomes, and watersheds. Examples of point data are: species sightings, breeding pair nest locations, or measurement stations.
Three widely used polygon overlay operations are:
- Intersection (logical AND): The common or shared area between two overlapping polygons.
- Point-in-Polygon (logical AND): Between a point and polygon layer, the subset of points located within the polygon boundary.
- Union (logical OR): The sum of the areas of two overlapping polygons.
Until recently, polygon overlay operations required specialized Geographic Information Systems software. However, the advent of more powerful desktop computers and open-source, scientific software that include geospatial analysis tools (such as R) has changed this.
This use case demonstrates polygon overlay analysis using R applications packages which employ the core R geospatial data classes.
This case uses the following R packages
- maptools: Shapefile input
and conversion to R spatial data types; simple map display
- PBSmapping: Geospatial layer creation and analysis, including polygon overlay
- rgdal: map (polygon and point layer) projection services
- sp: core geospatial data object classes
Review: Core R geospatial data types
Field scientists often measure phenomena that are distributed throughout the two (and sometimes three) dimensional space of the Earth surface (and atmosphere). From a spatial analysis perspective, these phenomena can be placed into four categories: Points (e.g., species sightings). lines (e.g., road, stream. or migration routes), polygons (e.g., geographic range areas, biomes, or conservation districts), and grids (e.g., climate measurements interpolated into a regularly-spaced matrix across a region). R geospatial packages include data types that are based on the core Data Frame, and augmented to store shape, locations, and map projection attributes:
The sp package provides a hierarchy of data objects to contain point, line, polygon, and grid data types. sp and other R classes that incorporate sp provide core geospatial operations for point-in-polygon and polygon overlay. However, as of June 2007, there are no similar operations for line data types. sp contains the following spatial data classes:
Spatial: Foundation class for sp spatial objects: includes objects to contractor projection and region 'bounding box' attributes
Polygon / Point / Lines / Grid Topology: Foundation classes store the basic graphical coordinates of the respective objects
Consider the complete class hierarchy using the polygon data type:
- Polygon Contains the basic graphical attributes for a single polygon object
- Polygons: A collection (list) of Polygon objects
- SpatialPolygons: The Polygons class with added Spatial class attributes
- SpatialPolygonsDataFrame: The SpatialPolygons class with added Data Frame class to store descriptive attributes for each polygon.
The PBSmapping package also includes a set of spatial data classes, based upon the R data frame, that support this package's point and polygon analysis methods.
- EventData: Contains point data objects.
- LocationSet:Specialized data frame contains point locations occurring within a specific polygon .
- PolyData Contain polygon locations
- PolySet: A collection of PolyData objects.
The PBSMapping package contains a comprehensive set of routines for point and polygon analysis and includes methods that convert the Spatial Polygon objects into PolySet objects..
The Fundamentals: R Polygon Overlay functions
A typical polygon overlay analysis has four steps:
- 1) Import the polygon data files into R data structures.
- 2) As necessary, transform all of the polygon and point data sets into a single, common map projection.
- 3) Conduct the overlay analysis, probably creating a NEW set of polygons and/or point attributes indicating each point's bounding polygon.
- 4) Compute summary statistics (e.g., polygon areas or point-in-polygon counts) for the new polygons.
Following are the R functions, included in the sp, maptools, rgdal and PBSmapping packages, used to perform overlay analysis This gives those seeking a quick tutorial a summary of the overlay process. For an in-depth discussion, see the end-to-end example which answers five typical polygon overlay questions (below):
#
# First, load the required R packages.
#
library(maptools) # for geospatial services; automatically loads foreign and sp
library(rgdal) # for map projection support; automatically loads sp
library(PBSmapping) # for GIS_like geospatial object manipulation / analysis including poly
#
# Step 1: Read in vector polygon and point files in ESRI Shape File format:
#
InputPolygons = readShapePoly("PolyShapefileName",proj4string=CRS("Map Projection Attributes"))
InputPoints = readShapePoints("PointShapefileName",proj4string=CRS("Map Projection Attributes"))
#
# Step 2 (if needed): Transform a polygon file into a new map projection
#
ProjectedPolygons = spTransform(InputPolygons,proj4string=CRS("Map Projection Attributes"))
#
# Step 3: Example Polygon Overlay operations
#
# Plot the projected polygon set with a border and labeled axes
#
plotPolys(ProjectedPolygons,proj = TRUE, col = "green",xlab = "longitude", ylab = "latitude")
#
# Extract and plot (no border) a single polygon from a SpatialPolygonsDataFrame object containing South America country outlines
#
BrazilProj = SouthAmericaPolygons[SouthAmBaseProj$COUNTRY == "BRASIL",]
plot(BrazilProj)
#
# Convert the SpatialPolygonDataFrame into a PolySet object for compatibility with the so that PBSMapping package's geospatial functions
#
ProjPolygonsForPBS = SpatialPolygons2PolySet(ProjectedPolygons)
#
# Compute Union and Intersection of two polygon collections in PolySet format
PolygonUnion = joinPolys(FirstPolygonsInPBS,SecondPolygonsInPBS,"UNION")
PolygonIntersect = joinPolys(FirstPolygonsInPBS,SecondPolygonsInPBS,"INT")
BrazilCB = combinePolys(BrazilProjPS) # union only
#
# Point-in-Polygon: which polygons contain one or more members of a second point data layer?
#
# PBSMapping treats point data differently; so we convert the point data frame to an EventData object.
#
nSightings = length(HabitatL2@data[,1])
SpeciesSightingEventSet <- as.EventData(data.frame(EID=1:nSightings ,X=HabitatL2PointProj@coords[,1],
Y=HabitatL2PointProj@coords[,2]),projection = "LL")
#
# find which 'events' (points) in the point set are bounded by the combined Brazil polygon
#
PointsAroundHab = findPolys(SpeciesSightingEventSet,BrazilCB)
#
# Create a map showing points and polygons
#
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
addPolys(HabitatL2ProjPS,col="green4")
addPoints(SpeciesSightingEventSet,col="orange",pch=19)
#
# Step 4: Compute Statistics
#
print(sprintf("There are %d recorded Greater Armadillo sightings.",nSightings )) # point counts are easy.
#
# Calculate Polygon Areas
#
AreasOfResult = calcArea(OverlayResult) # Area of a single polygon
#
AllHabitatArea = sum(AreaHabInt[1:length(AreaHabInt[,1]),2]) # Area of all polygons in a set
#
To download a .zip file containing the R script and input files for this example, click here.
To demonstrate polygon R polygon overlay techniques we perform basic measurement and analysis using geographic range maps for South American mammals in ESRI Shapefile format, acquired from the NatureServe conservation organization. We will use as our base map an ESRI Shapefile containing polygons for the South American continent, obtained from the Peruvian International Potato Center (CIP) DIVA GIS Server.
First, we will import and display the
sample data sets. Then, answer five spatial analysis questions that illustrate polygon overlay analyses techniques.
Step 1: import, project. and display base map / range map polygons /species sighting location points
Base map of South America, geographic range ranges for three mammal species, and specimen sighting locations for one of the species. Read the shape files into R spatial data objects. The shapefile coordinates are in simple unprojected latitude/longitudes. Since we are interested in estimating geographic range areas, we will apply the Albers Equal-Area Projection to the files. The R spatial classes employ the PROJ.4 map projection library, which uses standard character strings to specify map projection parameters. see code comments. Here is the first section of code:
Here are the four geographic range layers, followed by the R code that reads and display them:
 |
 |
 |
| South America Polygons |
Base Map (green = Brazil) |
Southern Armadillo Geographic Range |
 |
 |
 |
| Vesper Mouse Range |
Greater Armadillo Range |
Greater Armadillo Sightings |
DemoDriver <- function()
{
# debug(PolyOverlayDemo)
PolyOverlayDemo()
}
PolyOverlayDemo <- function()
{
#
# the latest version of the poly overlay demo
#
# attr(BrazilArmHabCB, "projection") <- "LL"
#
# first, load the R packages that we will need
#
library(maptools) # for geospatial services; also loads foreign and sp
library(rgdal) # for map projection work; also loads sp
library(PBSmapping) # for GIS_like geospatial object manipulation / analysis including poly
#
# Use Maptools routine to ingest the polygon shape files: South America, the mammal geographic range range
# polygon files, and the mammal sighting location point file
#
SouthAmBase = readShapePoly("SouthAmericaStates",proj4string=CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
#
# class method reveals that readShapePoly returns a SpatialPolygonsDataFrame object
#
# class(SouthAmBase)
#
# Plot the South America base map with the Use the core R graphics plot() routine. Note lack of
# a bounding box, coordinate axes, etc. Just drawing a collection of polygons here.
#
plot(SouthAmBase)
print("here is the South America base map.....Hit key to continue")
browser()
print("Projecting base map.....")
#
# Project the base map into the Albers Equal Area projection, and convert the resulting file
# into a PolySet data type to make it compatible with the PBSmapping package plot routines.
# The Coordinate Reference System (CRS) string contains commands to PROJ.4 projection library.
#
SouthAmBaseProj = spTransform(SouthAmBase,CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
SouthAmBaseProjPS = SpatialPolygons2PolySet(SouthAmBaseProj)
#
# Redraw the base map with a more 'map-like' context. using the PBSMapping routine plotPolys()
# Note: the R function colors() prints a list of 600+ R color shades to use....
#
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
#
# NEXT, Select only the polygon for the 'Brazil' polygon, then display its outline.
# names() routine call lists the polygon names.
#
# names(SouthAmBaseProj)
#
BrazilProj = SouthAmBaseProj[SouthAmBaseProj$COUNTRY == "BRASIL",]
BrazilProjPS = SpatialPolygons2PolySet(BrazilProj)
par(lwd = 2) # line width of 2, thicker than default of 1
addPolys(BrazilProjPS ,border="green")
#
# PBSmapping method calcArea calculates area of the polygon(s) that comprise 'Brasil'.
# Note that when there are > 1 polygons, we need to sum the areas of the components.
#
# specify coordinate projection units for area calculations
#
attr(BrazilProjPS, "projection") <- "LL"
AreaBrazil = calcArea(BrazilProjPS,rollup=1)
BrazilArea = sum(AreaBrazil[1:length(AreaBrazil[,1]),2])
print(sprintf("Brazil area (green outline): %g sq km. Hit key to continue",BrazilArea))
browser()
#
# Now, we work with the geographic range zones for four three species:
#
# 1) Southern Armadillo
# 2) Greater Armadillo
# 3) Vesper Mouse
#
# We also have a set of species sighting locations for the Greater Armadillo.
#
# Read in the species geographic range (polygons) and species sighting (points) spatial layers
# for use in demonstrating polygon overlay. Set their projection attributes to the same
# as the base map.
#
# NOTE: We will display the geographic range areas as overlays on the base map as they are created.
#
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
#
HabitatL1A = readShapePoly("SouthernArmadilloPoly",proj4string=CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
HabitatL1Proj = spTransform(HabitatL1A,CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
#
HabitatL1ProjPS = SpatialPolygons2PolySet(HabitatL1Proj)
#
print("Press to view Southern Armadillo geographic range ..")
browser()
addPolys(HabitatL1ProjPS,col="red")
#
HabitatL2 = readShapePoly("GreaterArmadilloPoly",proj4string=CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
HabitatL2Proj = spTransform(HabitatL2,CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
#
HabitatL2ProjPS = SpatialPolygons2PolySet(HabitatL2Proj)
print("Press to view Greater Armadillo geographic range ..")
browser()
addPolys(HabitatL2ProjPS,col="green4")
#
HabitatL3 = readShapePoly("VesperMousePoly",proj4string=CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
HabitatL3Proj = spTransform(HabitatL3,CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
#
HabitatL3ProjPS= SpatialPolygons2PolySet(HabitatL3Proj)
print("Press to view Vesper Mouse geographic range ..")
browser()
addPolys(HabitatL3ProjPS,col="blue1")
#
# Create and plot the point file last.
#
HabitatL2B = readShapePoints("GreaterArmadilloPts",proj4string=CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
HabitatL2PointProj = spTransform(HabitatL2B,CRS("+proj=aea +ellps=GRS80 +datum=WGS84"))
#
# PBSMapping treats point data differently; we convert the point data frame to an EventData object.
#
# We create one using the @coords slot from the FiveStateCtyCentr SpatialPointsDataFrame object.
#
nSightings = length(HabitatL2PointProj@data[,1])
SpeciesSightingEventSet <- as.EventData(data.frame(EID=1:nSightings ,X=HabitatL2PointProj@coords[,1],
# Y=HabitatL2PointProj@coords[,2]))
Y=HabitatL2PointProj@coords[,2]),projection = "LL")
print(sprintf("There are %d recorded Greater Armadillo sightings.",nSightings ))
print("Press to view Species sighting points..")
browser()
addPoints(SpeciesSightingEventSet,col="orange",pch=19)
#
# OK, there is all of the data, move on to the polygon overly operations
#
Let's answer five sample questions about our South American mammal geographic range . Click the questions to move directly to the question's discussion.
The answers, followed by the relevant R code, follow....
 |
|
|
| All Layers |
|
|
back to questions
Southern Naked-Tailed Armadillo geographic range area (Red): 9.81511e+06 sq km
Vesper Mouse geographic range area (Blue): 981119 sq km
Greater Naked-Tailed Armadillo geographic range area (Green): 3.87778e+06 sq km
There are 24 recorded Greater Armadillo sightings (Orange).
#
# Question 1: Area of each geographic range -
#
attr(HabitatL1ProjPS, "projection") <- "LL"
AreaHab1 = calcArea(HabitatL1ProjPS,rollup=1)
#
# to get the total area: sum the multiple polygons that make up the geographic range
#
SouthernArmadilloHabArea= sum(AreaHab1[1:length(AreaHab1[,1]),2])
print(sprintf("Southern Naked-Tailed Armadillo geographic range area (Red): %g sq km",SouthernArmadilloHabArea))
attr(HabitatL2ProjPS, "projection") <- "LL"
AreaHab2 = calcArea(HabitatL2ProjPS,rollup=1)
GreaterArmadilloHabArea = sum(AreaHab2[1:length(AreaHab2[,1]),2])
print(sprintf("Greater Naked-Tailed Armadillo geographic range area (blue): %g sq km",GreaterArmadilloHabArea))
#
attr(HabitatL3ProjPS, "projection") <- "LL"
AreaHab3 = calcArea(HabitatL3ProjPS,rollup=1)
VesperMouseHabArea = sum(AreaHab3[1:length(AreaHab3[,1]),2])
print(sprintf("Vesper Mouse geographic range area (Green): %g sq km",VesperMouseHabArea))
#
print(sprintf("There are %d recorded Greater Armadillo sightings (Orange).",nSightings))
#
print("Press to demonstrate polygon overlay and analysis..")
browser()
back to questions
 |
 |
|
| All Armadillo |
All Mammal |
|
Calculating the UNION of ALL THREE Ranges..
Here is UNION of ALL THREE Ranges (RED area)..
Area covered by one or more geographic range: 1.22252e+08 sq km
#
# The remaining four questions require polygon overlay. the PBSmapping package
# provides the joinPoly() method. We use the Intersection ("INT") and Union ("UNION")
# options to the 'operation' argument to answer the questions.
#
# 2) What is the total area of South America covered by (one or more) geographic range?
#
# Calculate the UNION of all polygons
#
# joinPolys() accepts two polygon arguments so we will progressively construct it....
#
# look at first results on the map: replot the base map in a new graphics window.
#
print("Redrawing base map in new window..")
dev.set(1)
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
#
print("Calculating the UNION of Armadillo Ranges (BLUE area)..")
Join1 = joinPolys(HabitatL1ProjPS,HabitatL2ProjPS,"UNION",maxVert = 1.0e+06)
#
print("...Here is the UNION of Armadillo Ranges..")
addPolys(Join1,col="blue")
print("press to continue...")
browser()
#
print("Calculating the UNION of ALL THREE Ranges..")
AllHabitatUnion = joinPolys(Join1,HabitatL3ProjPS,"UNION",maxVert = 1.0e+06)
AllHabUnionCB = combinePolys(AllHabitatUnion)
print("Here is UNION of ALL THREE Ranges (RED area)..")
addPolys(AllHabUnionCB,col="red")
#
attr(AllHabUnionCB, "projection") <- "LL"
AreaHabitatUnion = calcArea(AllHabUnionCB,rollup=1)
AllHabitatArea = sum(AreaHabitatUnion[1:length(AreaHabitatUnion[,1]),2])
print(sprintf("Area covered by one or more geographic range: %g sq km",AllHabitatArea))
#
print("Press to move to question 3..")
browser()
back to questions
 |
 |
|
| UNION of all Ranges |
INTERSECTION of all Ranges |
|
Calculating the INTERSERCTION of FIRST Two Ranges.
Calculating the INTERSERCTION of LAST Two Ranges..
Area (RED) covered by first two geographic range: 2.65559e+06 sq km
Area (GREEN) covered by all three geographic range: 0 sq km (NO Intersection in this case)
#
# 3) What is the total area of South America (if any) covered by all three geographic range?
#
# Calculate the INTERSECTION (if any) of the three geographic range
#
print("Redrawing base map in new window..")
dev.set(1)
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
#
AllTwoHabitatArea = 0.0
AllThreeHabitatArea = 0.0
#
print("Calculating the INTERSERCTION of FIRST Two Ranges..")
Join1 = joinPolys(HabitatL1ProjPS,HabitatL2ProjPS,"INT",maxVert = 1.0e+06)
#
# if the intersection generated polygons, report the area and continue
#
if (length(Join1) > 0)
{
HabL1CB = combinePolys(Join1)
attr(HabL1CB, "projection") <- "LL"
AreaHabInt = calcArea(HabL1CB,rollup=1)
AllTwoHabitatArea = sum(AreaHabInt[1:length(AreaHabInt[,1]),2])
addPolys(HabL1CB ,col="red")
#
# move to third geographic range
#
print("Calculating the INTERSERCTION of LAST Two Ranges..")
Join3 = joinPolys(HabL1CB,HabitatL3ProjPS,"INT",maxVert = 1.0e+06)
if (length(Join3) > 0)
{
HabL3CB = combinePolys(Join3)
attr(HabL3CB, "projection") <- "LL"
AreaHabInt = calcArea(HabL3CB,rollup=1)
AllFouroHabitatArea = sum(AreaHabInt[1:length(AreaHabInt[,1]),2])
addPolys(HabL3CB,col="blue")
}
}
print(sprintf("Area (RED) covered by the first two geographic range: %g sq km",AllTwoHabitatArea))
print(sprintf("Area (GREEN) covered by the all three geographic range: %g sq km",AllThreeHabitatArea))
print("Press to move to question 4..")
browser()
back to questions
 |
|
|
| Brazil Armadillo Range |
|
|
Combining Range polygons..
...Calculating the UNION of Armadillo Ranges..
Area of Brazil: 8.65762e+06 sq km.
Area covered by Armadillo geographic range (BLUE outline): 8.4921e+06 sq km.
#
# Question 4: What portion of Brazil is covered by Armadillo geographic range ?
#
# Compute the union of the Greater and Southern Armadillo geographic range,
# and then the intersection of the union with the Brazil polygon.
#
dev.set(1)
plotPolys(SouthAmBaseProjPS,proj = TRUE,col="wheat1",xlab="longitude",ylab="latitude")
#
print("Combining Habitat polygons..")
#
# combinePolys appears to eliminate redundant interior polygons
#
HabL1CB = combinePolys(HabitatL1ProjPS)
HabL2CB = combinePolys(HabitatL2ProjPS)
BrazilCB = combinePolys(BrazilProjPS)
print("...Calculating the UNION of Armadillo Range..")
Armadillo = joinPolys(HabL1CB,HabL2CB,"UNION",maxVert = 1.0e+06)
BrazilArmHab = joinPolys(BrazilCB,Armadillo,"INT",maxVert = 1.0e+06)
BrazilArmHabCB = combinePolys(BrazilArmHab)
attr(BrazilArmHabCB, "projection") <- "LL"
AreaHabInt = calcArea(BrazilArmHabCB,rollup=1)
AllHabitatArea = sum(AreaHabInt[1:length(AreaHabInt[,1]),2])
print(sprintf("Area of Brazil: %g sq km.",BrazilArea))
print(sprintf("Area covered by Armadillo geographic range (BLUE outline): %g sq km",AllHabitatArea))
#
# show the outline of Armadillo geographic range within Brazil
#
addPolys(BrazilProjPS,col="orange")
par(lwd = 2) # line width of 2, thicker than default of 1
addPolys(BrazilArmHabCB,border="blue")
print("Press to move to question 5..")
browser()
back to questions
Number of sightings INSIDE geographic range : 12. Number OUTSIDE geographic range : 12
 |
|
|
| Armadillo Sightings In/Out of Habitat |
|
|
#
# Question 5: Are all of the Armadillo species sightings inside Armadillo geographic range ?
# Compute Point-in-Polygon using the armadillo species sightings and Armadillo layer.
# Then count the number of points inside the layer.
#
PointsAroundHab = findPolys(SpeciesSightingEventSet,Armadillo)
PointIndexesInsideHab = sort(unique(PointsAroundHab$EID))
PointsInsideHab = SpeciesSightingEventSet[PointIndexesInsideHab,]
#
# getting EIDs not in the Sighting event set takes an extra step...
#
allEIDs = seq(length(SpeciesSightingEventSet[,1]))
InsideFlags = !(is.element(allEIDs,PointIndexesInsideHab))
PointIndexesOutsideHab = allEIDs[InsideFlags]
PointsOutsideHab = SpeciesSightingEventSet[PointIndexesOutsideHab,]
print(sprintf("Number of sightings INSIDE geographic range : %d. Number OUTSIDE geographic range : %d",
length(PointIndexesInsideHab),length(PointIndexesOutsideHab)))
#
# finally, plot both sets in different colors and symbols on the base map.
#
addPoints(PointsInsideHab,col="yellow",pch=19)
addPoints(PointsOutsideHab,col="red",pch=19)
#
# close all the graphics devices
#
print("Press to end the demonstration..")
browser()
while(dev.cur() > 1)
dev.off()
Learning more:
UNC/Chapel Hill School of Information and Library Science course (Fall, 2003): Summary of GIS Analysis Functions
Point of Contact for this Use Case: Rick Reeves, NCEAS Scientific Programmer reeves@nceas.ucsb.edu
This use case compiled June, 2007