NCEAS Scientific Computing: Solutions Center Use Case

Generate Region-of-Interest polygons for Points in an ESRI Shape File Using R

This case presents an R script that: 1) Reads ESRI point and polygon Shape Files covering a three-county area in California; 2) Determines the polygon that contains each point, and generates a new Shape File containing Regions of Interest  (based on Voronoi Polygons) for each point. Finally, the index number of each point's bounding polygon is appended to each record in the point shape file attribute table. Here we also demonstrate computation of polygon areas using the PBSMapping package.


Getting Started:

As with many R programming tasks, there is more than one way (and associated set of R packages) to approach and solve the problem. In this use case, we will use routines from the sp and maptools packages to read and convert the incoming shape files into plot-friendly Polygon data objects. Routines from the splancs package determine the polygons containing each point (the well-defined 'point-in-polygon' problem). We also demonstrate two alternative ways to calculate the Region Of Interest for each point: the calcVoronoi method from the PBSMapping package, and the voronoi.mosiac method from the tripack package.

Here are brief synopsis of the R packages used:

  • sp is a critical base package for spatial operations in R. It provides definitions for basic spatial classes (points, lines, polygons, pixels, and grids) in an attempt to unify the way R packages represent and manage these sorts of data. It also includes several core functions for creating and manipulating these data structures. The sp package is automatically loaded when the maptools package is used. 

  • maptools provides functions for reading, writing, and manipulating spatial data, in particular, ESRI-standard shape files. It use the core spatial data handling methods provided by the sp package.

  • splancs provides functions for spatial and space-time point pattern analysis of points and polygons.

  • PBSMapping provides two-dimensional spatial analysis, transformation, and plotting functions typically found in Geographic Information Systems. This package was developed by scientists conducting statistical fisheries research at the Pacific Biological Station in Nanaimo, British Columbia. The PBSMapping packages is supported by the deldir package.

If you have not already added these packages to your local R installation, you can do so using the Packages/Install Packages option in the R GUI toolbar (Ms Windows XP users), or using the following statement at the R command line:

install.packages(c("sp","maptools","splancs","PBSmapping"))

For tips on installing R on the Ubuntu Linux platform widely used at NCEAS visit the link:

http://help.nceas.ucsb.edu/index.php/R

To run the sample code discussed here, download and expand this .zip archive: (PointPolyROIFiles.zip-22 kb). This file includes the R script and input files. After starting an R session from the same directory in which you have expanded this archive, you can run the example R code directly over the internet: Simply enter the following statement at the R command line; the imported data will then be stored as objects in your R environment:

source("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/PointPolyRegionsOfInterest/PointInPolyProc.r")

You may also enter the statements one at a time by typing or pasting them into the R command line. To view the online help page for any R method, be sure the appropriate library has been loaded and then enter the command name proceeded by a question mark:

?coordinates
?inout

Working through the example program:

First, load the appropriate R libraries and read the spatial data files into R data objects:

#
# Load the required libraries:
#
library(maptools)
library(splancs)
library(deldir)
library(PBSmapping)
TriCounty <-read.shape("TriCountyPolyDD.shp")
sites <- read.csv("ThreeCountyPoints.csv")

Next, load the appropriate R libraries and read the spatial data files into R data objects:

#
# FIPS code is the unique key for counties
#
indexes = as.matrix(TriCounty$att.data["COUNTYFIPS"])
#
# get number of points
#
len = length(sites$LatDD)
#
# promote the point data frame to an SpatialPointsDataFrame object
# so that we can add new attributes to the data frame.
#
sites2 = sites 
coordinates(sites2) = c("LonDD","LatDD")
#
# get the lat/lon coords for the points
xy=coordinates(sites2)
#
# append an attribute column to the point SPDF to hold the covering polygon
#
ar = array(0,len)
sites2$inpoly = as.vector(ar)
TcPolys = Map2poly(TriCounty)
#
# Lets have a look at the data: plot all the polygons and then all the points
#
plot(TcPolys)
points(xy,pch="*")

Here are the baseline counties and their point locations..

Three-County Polygon View

Next, proceed to point-in-polygon calculations:

#
# Loop through all of the individual polygons in the shape file.
# Use the inout() method to identify points in the 'xy' coordinate vector
# (extracted from the incoming point shapefile) that fall within the
# boundaries of the current polygon.
#
for (iCtr in 1:length(TcPolys))
{
    ThisPoly = TcPolys[[iCtr]]
#
# which points fall within the current polygon? inout() returns 'inside', a logical vector
# the same length as 'xy'.
#
    inside = inout(xy,ThisPoly)
#
# assign the FIPS code for current polygon to the attribute column of all points falling within.
#
    sites2$inpoly[inside] = indexes[iCtr]
#
# create a two-column matrix containing coordinates of sites within current polygon
#
   subsites = as.matrix(cbind(xy[inside==TRUE,1],xy[inside==TRUE,2]))
#
# For diagnostic purposes, plot the current polygon and corresponding point subset
#
   plot(ThisPoly,type="l")
   points(subsites,pch="^")
   print("hit key to proceed to next polygon")
   browser() # browser waits for  keystroke
}
 Here is the result of point-in-polygon calculations for San Luis Obispo County, in which the point data exhibit the greatest spatial dispersion.
San Luis Obispo County Polygon

Lets generate the 'convex hull' around the set of points for San Luis Obispo County.
This is the polygon bounding all of the points in the subset. We'll use the chull()
function (part of the base grDevices package) to generate the convex hull:

#
# calculate and then draw the convex hull around the point subset
#
    hullpts = chull(subsites)
#
# chull() returns a vector of integer coordinates. close the 'polygon' by appending the first to the last.
#
    hullpts = c(hullpts,hullpts[1])
#
# plot the hull by referencing the 'subset of the subset' points and connecting the dots.
#
    lines(subsites[hullpts, ],col="green")
#
# Note that chull() does not generate a polyon object, merely a vector of integers
# that we then transform into a set of plottable point coordinates:
#
# Plot the convex hull outline. First for San Luis Obispo County (wide dispersal)
#
#
Convex Hull for SLO County points

Note that the convex hull is bounded by four of the data points. Sets save the geographic coordinates
of these points, and use them below to compute the area of the convex hull polygon.
#
# Lets save the convex hull for the first county (San Luis Obispo) for polygon area calculation demo....
# subsites vector contains the actual coordinates; hullpts contains indexes of the points we need...
#
    if (iCtr == 1)
         matChull = as.matrix(subsites[hullpts, ])
#

# now look at one other convex hull polygons for comparison
# ..and then for Santa Barbara County (narrow dispersal)
#
Convex Hull for SB County

Next, perform another type of region-of-interest calculations: Voronoi (or Thiessen) polygons.
The points inside each Voronoi polygon are closer to the corresponding 'set point' than they
are to any other points in the set (in this case, the points within each county. Voronoi polygons
are often used as first-order estimates of 'regions of interest' (ROI)  surrounding point features.

#
# Next, generate 'regions of interest' around each point using Voronoi polygons.
# Definition of a Voronoi (or Thiessen) polygon: The area around a site that contains
# all points witch are closer to that point than to any other point in the set.
#
# Demonstrate two ways to compute these: The PBSMapping package CalcVoronoi routine,
# which ecpects the point in a data frame, and the tripack package 'voronoi' routine,
# which expects a matrix or x/y coordinates.
#  # First, use PBSMapping package calcVoronoi:
#
  events <- as.event.data(data.frame(EID=1:len,X=xy[,1],Y=xy[,2],projection=1)
  Vpolys <-calcVoronoi(events)
#
# Plot the region-of-interest polygons, first in black and white, then shaded.
#
# simple line drawing
#
  plot.new()
  plotPolys(Vpolys)
#
Here is a simple line plot of Voronoi polygon ROIs.....
Simple Vorioni Regions
Here is the result of point-in-polygon calculations for San Luis Obispo County, with the greatest point dispersion....
#
# shaded diagram requires a little preparation
#
  ploydata <- calcArea(Vpolys)
  names(polydata) <-[is.element(names(polydata),"area"} <- "Z"
 
  colSeq <- seq(0.4,0.95,length=4)
  polydata <- makeProps(polydata,breaks=quantile(polydata$Z,c(0,.25,.5,.75,1)),propName="col",                               propVals=rgb(colSeq,colSeq,colSeq)) plotMap(Vpolys,polyProps=polydata)
  plotMap(Vpolys,polyProps=polydata)
#
# add the point locations
#
  addPoints(events,pch=19)
#
# Finally, add the outlines of the county polygons in red.
#
  plot(TcPolys,border="red",add=TRUE)
#

...And here are the regions, shaded, with the County outlines in red. Shading
is proportional to polygon areas; smaller polygons=darker shading.
Final Shaded Vorioni Diagram
Finally, demonstrate polygon area calculations: first, on the convex hull for the San Luis Obispo County point set,
and then for the county polygons. Use the PolySet data object and methods from the PBSMapping package

Here is the R code:

# Compute the area of the convex hull
# this requires some preprocessing as we need to construct
# a polygon whose vertices are geographic coordinates and
# which is projected into UTM Coordinates. We use methods
# and data objects from the PBSMapping package.
# First,create a matrix containing the geographic coordinates
# of the point coverage elements that make up the convex hull.
#
# consult the PBSmapping package for descriptions of the data
# and methods.
#
# Create PBSMapping data types: Polygon and Polygons (list of Polygon)
#
   poly = Polygon(matChull)
   ID = c("P1") # character vector label required by Polygons data type
#
# we need to create a PolySet object in order to use the calcArea method..
# Remember that 'mat'is an R 2-D matrix with the closed coordinates
# (last point = first point) for a single polygon
#
   ps = appendPolys(NULL,matChull,1,1,FALSE)
#
# to compute area in units of kilometers ** 2, set the projection
# attribute of the PolySet to UTM and Zone 9.   
#
   attr(ps,"projection") = "LL" 
   attr(ps,"zone") = 9
   psUTM = convUL(ps)
#
   polygonArea = calcArea(psUTM,rollup=1)
#
# polygonArea is a compound data type, see them with the command attributes(polygonArea)
# we will print just the $area attribute.
#
   print(sprintf("SLO County convex hull area is %f km**2.",polygonArea$area))
   browser()

Here is the computed area:

   SLO County convex hull area is 6596.811490 km**2.

#
#
# Finally, calculate the area of each polygon in the TcPolys county
# polygon data set. Note that we have to construct a PolySet object
# containing one element for each county. How to do this? Simple:
#
# 1) loop over the elements in the polylist object TcPolys:
#    a) Extract the X and Y coordinate vectors from each object
#    b) Bind them into a matrix
#    c) Generate the PolySet object
# 2) Transform the PolyArea object to UTM coordinates
# 3) Calculate the polygon areas, one polygon at at a time.
#
   print("Hit key to compute convex hull polygon area for county polygons...")
   browser()

   for (iCtr in 1: length(TcPolys))
   {
      ps2 = NULL
      col1 = TcPolys[[iCtr]][,1]
      col2 = TcPolys[[iCtr]][,2]
      mat2 = as.matrix(cbind(col1,col2))
      ps2 = appendPolys(ps2,mat2,NULL,NULL,FALSE) # NULL values = system generated SID,PID.  
 
      attr(ps2,"projection") = "LL"
      attr(ps2,"zone") = 9
      psUTM = convUL(ps2)
      polygonArea = calcArea(psUTM,rollup=1)
      print(sprintf("County polygon %d area is %f km**2.",iCtr,polygonArea$area))
   }

Here are the County polygon areas computed within the above loop, with US Census land areas for the three main counties:

 
    County polygon 1 area is 8728.277722 km**2. (San Luis Obispo county. Area from US Census: 8588 km ** 2)
    County polygon 2 area is 6720.636107 km**2. (Santa Barbara county. Area from US Census: 7089 km **2)
    County polygon 3 area is 4843.153788 km**2. (Ventura county. Area from Census: 4778 km ** 2)
    County polygon 4 area is 252.615570 km**2.
    County polygon 5 area is 41.958022 km**2.
    County polygon 6 area is 217.139878 km**2.
    County polygon 7 area is 3.720030 km**2.
    County polygon 8 area is 60.586033 km**2.

End of demonstration.



 

Learning More:

Search for information (including PROJ4 string) about your desired Map Projection: EPSG spatialreference.org Web Site

Alternate PROJ4 information: remotesensing.org Geotiff projection Web Site

UNC/Chapel Hill School of Information and Library Science course (Fall, 2003): Summary of GIS Analysis Functions

Point of Contact for this Use Case: Rick Reeves, NCEAS Scientific Programmer reeves@nceas.ucsb.edu reeves@nceas.ucsb.edu
This Use Case was compiled July, 2007.

Site Home | NCEAS Home | KNB Home