Generate Region-of-Interest polygons
for Points in an ESRI Shape File Using R
This
case presents an R script that: 1) Reads ESRI point and polygon
Shape Files covering a three-county area in California; 2) Determines the
polygon that contains each point, and generates a new Shape File
containing Regions of Interest (based on Voronoi
Polygons) for each point. Finally, the index number of each point's bounding polygon is appended to each record in the point shape file attribute table. Here we also demonstrate computation of polygon areas using the PBSMapping package.
Getting Started:
As with many R
programming tasks, there is more than one way (and associated set of R
packages) to approach and solve the problem. In this use case, we will use routines from the sp and maptools packages to
read and convert the incoming shape files into plot-friendly Polygon
data objects. Routines from the splancs
package determine the polygons containing each point (the well-defined
'point-in-polygon' problem). We also demonstrate two alternative ways to
calculate the Region Of Interest for each point: the calcVoronoi
method from the PBSMapping
package, and the voronoi.mosiac
method from the tripack
package.
Here are brief
synopsis of the R packages used:
- sp is a critical base package
for spatial operations in R. It
provides definitions for basic spatial classes (points, lines,
polygons,
pixels, and grids) in an attempt to unify the way R packages represent
and manage these sorts of data. It also includes several core functions
for creating and manipulating these data structures. The sp package is automatically
loaded when the maptools
package is used.
- maptools provides functions
for reading, writing, and manipulating spatial
data, in particular, ESRI-standard shape files. It use the core spatial data handling
methods provided by the sp
package.
- splancs provides functions for
spatial and space-time point pattern analysis of points and polygons.
- PBSMapping
provides two-dimensional spatial analysis,
transformation, and plotting functions typically found in
Geographic Information Systems. This package was developed by
scientists conducting statistical fisheries research at the Pacific Biological
Station in Nanaimo, British Columbia. The PBSMapping packages is supported
by the deldir
package.
If you have not already added these packages to your
local R installation, you can do so using the Packages/Install Packages option in the R GUI toolbar (Ms Windows XP users), or using the
following statement at the R command line:
install.packages(c("sp","maptools","splancs","PBSmapping"))
For tips on installing R on the Ubuntu Linux platform
widely used at NCEAS visit the link:
http://help.nceas.ucsb.edu/index.php/R
To run the
sample code discussed here, download and expand this .zip archive:
(PointPolyROIFiles.zip-22
kb). This file includes the R script and input files. After starting an R session from the same directory in which you have expanded this
archive,
you can run the example R code directly over the internet:
Simply enter the following statement at the R command line; the
imported data will then be stored as objects in your R environment:
source("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/PointPolyRegionsOfInterest/PointInPolyProc.r")
You may also enter the statements one at a time by
typing or pasting them into the R command line. To view the online help
page for any R
method, be sure the appropriate library has been loaded and then enter
the command name proceeded by a question mark:
?coordinates
?inout
Working through the example program:
First, load the appropriate R libraries and read the
spatial data files into R data objects:
#
# Load the required libraries:
#
library(maptools)
library(splancs)
library(deldir)
library(PBSmapping)
TriCounty <-read.shape("TriCountyPolyDD.shp")
sites <- read.csv("ThreeCountyPoints.csv")
Next, load the
appropriate R libraries
and read the spatial data files into R data objects:
#
# FIPS code is the unique key for counties
#
indexes = as.matrix(TriCounty$att.data["COUNTYFIPS"])
#
# get number of points
#
len = length(sites$LatDD)
#
# promote the point data frame to an SpatialPointsDataFrame object
# so that we can add new attributes to the data frame.
#
sites2 = sites
coordinates(sites2) = c("LonDD","LatDD")
#
# get the lat/lon coords for the points
xy=coordinates(sites2)
#
# append an attribute column to the point SPDF to hold the covering
polygon
#
ar = array(0,len)
sites2$inpoly = as.vector(ar)
TcPolys = Map2poly(TriCounty)
#
# Lets have a look at the data: plot all the polygons and then all the
points
#
plot(TcPolys)
points(xy,pch="*")
Here are the baseline counties and their point
locations..

Next, proceed to point-in-polygon calculations:
#
# Loop through all of the individual polygons in the shape file.
# Use the inout() method to identify points in the 'xy' coordinate
vector
# (extracted from the incoming point shapefile) that fall within the
# boundaries of the current polygon.
#
for (iCtr in 1:length(TcPolys))
{
ThisPoly = TcPolys[[iCtr]]
#
# which points fall within the current polygon? inout() returns
'inside', a logical vector
# the same length as 'xy'.
#
inside = inout(xy,ThisPoly)
#
# assign the FIPS code for current polygon to the attribute column of
all points falling within.
#
sites2$inpoly[inside] = indexes[iCtr]
#
# create a two-column matrix containing coordinates of sites within
current polygon
#
subsites =
as.matrix(cbind(xy[inside==TRUE,1],xy[inside==TRUE,2]))
#
# For diagnostic purposes, plot the current polygon and corresponding
point subset
#
plot(ThisPoly,type="l")
points(subsites,pch="^")
print("hit key to proceed to next polygon")
browser() # browser waits for keystroke
}
Here is the result of point-in-polygon calculations for San Luis Obispo County, in which the point data exhibit the greatest spatial dispersion.

Lets generate the 'convex hull' around the set of points for San Luis Obispo County.
This is the polygon bounding all of the points in the subset. We'll use the chull()
function (part of the base grDevices package) to generate the convex hull:
#
# calculate and then draw the convex hull around the point subset
#
hullpts = chull(subsites)
#
# chull() returns a vector of integer coordinates. close the 'polygon' by appending the first to the last.
#
hullpts = c(hullpts,hullpts[1])
#
# plot the hull by referencing the 'subset of the subset' points and connecting the dots.
#
lines(subsites[hullpts, ],col="green")
#
# Note that chull() does not generate a polyon object, merely a vector of integers
# that we then transform into a set of plottable point coordinates:
#
# Plot the convex hull outline. First for San Luis Obispo County (wide dispersal)
#
#

Note that the convex hull is bounded by four of the data points. Sets save the geographic coordinates
of these points, and use them below to compute the area of the convex hull polygon.
#
# Lets save the convex hull for the first county (San Luis Obispo) for polygon area calculation demo....
# subsites vector contains the actual coordinates; hullpts contains indexes of the points we need...
#
if (iCtr == 1)
matChull = as.matrix(subsites[hullpts, ])
#
#
# now look at one other convex hull polygons for comparison
# ..and then for Santa Barbara County (narrow dispersal)
#

Next, perform another type of region-of-interest calculations: Voronoi (or Thiessen) polygons.
The points inside each Voronoi polygon are closer to the corresponding 'set point' than they
are to any other points in the set (in this case, the points within each county. Voronoi polygons
are often used as first-order estimates of 'regions of interest' (ROI) surrounding point features.
#
# Next, generate 'regions of interest' around each point using Voronoi
polygons.
# Definition of a Voronoi (or Thiessen) polygon: The area around a site
that contains
# all points witch are closer to that point than to any other point in
the set.
#
# Demonstrate two ways to compute these: The PBSMapping package
CalcVoronoi routine,
# which ecpects the point in a data frame, and the tripack package
'voronoi' routine,
# which expects a matrix or x/y coordinates.
# # First, use PBSMapping package calcVoronoi:
#
events <-
as.event.data(data.frame(EID=1:len,X=xy[,1],Y=xy[,2],projection=1)
Vpolys <-calcVoronoi(events)
#
# Plot the region-of-interest polygons, first in black and white, then
shaded.
#
# simple line drawing
#
plot.new()
plotPolys(Vpolys)
#
Here is a simple line plot of Voronoi polygon
ROIs.....
Here is the result of point-in-polygon calculations for San Luis Obispo County, with the greatest
point dispersion....
#
# shaded diagram requires a little preparation
#
ploydata <- calcArea(Vpolys)
names(polydata) <-[is.element(names(polydata),"area"}
<- "Z"
colSeq <- seq(0.4,0.95,length=4)
polydata <-
makeProps(polydata,breaks=quantile(polydata$Z,c(0,.25,.5,.75,1)),propName="col",
propVals=rgb(colSeq,colSeq,colSeq))
plotMap(Vpolys,polyProps=polydata)
plotMap(Vpolys,polyProps=polydata)
#
# add the point locations
#
addPoints(events,pch=19)
#
# Finally, add the outlines of the county polygons in red.
#
plot(TcPolys,border="red",add=TRUE)
#
...And here are the regions, shaded, with the
County outlines in red. Shading
is proportional to polygon areas; smaller polygons=darker shading.

Finally, demonstrate polygon area calculations: first, on the convex hull for the San Luis Obispo County point set,
and then for the county polygons. Use the PolySet data object and methods from the PBSMapping package
Here is the R code:
# Compute the area of the convex hull
# this requires some preprocessing as we need to construct
# a polygon whose vertices are geographic coordinates and
# which is projected into UTM Coordinates. We use methods
# and data objects from the PBSMapping package.
# First,create a matrix containing the geographic coordinates
# of the point coverage elements that make up the convex hull.
#
# consult the PBSmapping package for descriptions of the data
# and methods.
#
# Create PBSMapping data types: Polygon and Polygons (list of Polygon)
#
poly = Polygon(matChull)
ID = c("P1") # character vector label required by Polygons data type
#
# we need to create a PolySet object in order to use the calcArea method..
# Remember that 'mat'is an R 2-D matrix with the closed coordinates
# (last point = first point) for a single polygon
#
ps = appendPolys(NULL,matChull,1,1,FALSE)
#
# to compute area in units of kilometers ** 2, set the projection
# attribute of the PolySet to UTM and Zone 9.
#
attr(ps,"projection") = "LL"
attr(ps,"zone") = 9
psUTM = convUL(ps)
#
polygonArea = calcArea(psUTM,rollup=1)
#
# polygonArea is a compound data type, see them with the command attributes(polygonArea)
# we will print just the $area attribute.
#
print(sprintf("SLO County convex hull area is %f km**2.",polygonArea$area))
browser()
Here is the computed area:
SLO County convex hull area is 6596.811490 km**2.
#
#
# Finally, calculate the area of each polygon in the TcPolys county
# polygon data set. Note that we have to construct a PolySet object
# containing one element for each county. How to do this? Simple:
#
# 1) loop over the elements in the polylist object TcPolys:
# a) Extract the X and Y coordinate vectors from each object
# b) Bind them into a matrix
# c) Generate the PolySet object
# 2) Transform the PolyArea object to UTM coordinates
# 3) Calculate the polygon areas, one polygon at at a time.
#
print("Hit key to compute convex hull polygon area for county polygons...")
browser()
for (iCtr in 1: length(TcPolys))
{
ps2 = NULL
col1 = TcPolys[[iCtr]][,1]
col2 = TcPolys[[iCtr]][,2]
mat2 = as.matrix(cbind(col1,col2))
ps2 = appendPolys(ps2,mat2,NULL,NULL,FALSE) # NULL values = system
generated SID,PID.
attr(ps2,"projection") = "LL"
attr(ps2,"zone") = 9
psUTM = convUL(ps2)
polygonArea = calcArea(psUTM,rollup=1)
print(sprintf("County polygon %d area is %f km**2.",iCtr,polygonArea$area))
}
Here are the County polygon areas computed within the above loop, with US Census land areas for the three main counties:
County polygon 1 area is 8728.277722 km**2. (San Luis Obispo county. Area from US Census: 8588 km ** 2)
County polygon 2 area is 6720.636107 km**2. (Santa Barbara county. Area from US Census: 7089 km **2)
County polygon 3 area is 4843.153788 km**2. (Ventura county. Area from Census: 4778 km ** 2)
County polygon 4 area is 252.615570 km**2.
County polygon 5 area is 41.958022 km**2.
County polygon 6 area is 217.139878 km**2.
County polygon 7 area is 3.720030 km**2.
County polygon 8 area is 60.586033 km**2.
End of demonstration.