Point-in-polygon subsetting of data, using R
Given a set of spatially referenced points and a polygon feature, a
'point-in-polygon' operation simply determines which of the points are
located within the boundaries of the polygon. Especially with small(ish)
point data sets, this can be done very easily and flexibly using
R.
Getting Started:
Below is a simple example that you can run yourself if you have R
installed on your computer. In this example, we use an ESRI shape file
that contains polygons of all counties in Washington state. We first
subset King County, which we will use as the polygon of interest. We
then create a set of fake field sites that are referenced by (randomly
generated) latitude and longitude coordinates in and around King County;
at each site we have the local count of some species (use your
imagination). Our objective is to determine which sites are located
within King County, plot those sites, and produce a subset ted data set
that includes only data from those sites.
This example uses the inout() function (package: splancs), which
evaluates whether or not each member of a set of points is inside
a polygon of interest, and returns a vector of TRUE (inside) and
FALSE (not inside) values accordingly. This vector can then be easily
used to subset data from the original table of points. Related
function inpip() returns the indices (row numbers) of points within
the polygon, and pip() returns the actual coordinates of these points.
For more details within R, load the splancs package and then enter the
following at the command line:
?inout
?inpip
?pip
Preliminaries:
To run this on your own, you will first have to install the
maptools package for importing and manipulating shapefiles, and
the splancs package for performing the point-in-polygon
operation. In most cases, it is straightforward to install these
packages by issuing the following statements at the R command line:
install.packages("splancs", dependencies=TRUE)
install.packages("maptools")
In addition, you will have to download a small shape file bundle (WA.shp, WA.shx, WA.dbf). After starting up R in whatever directory
you've just saved this shape file, you can run the example below by
sourcing the R code directly over the internet. Simply enter the
following statement at the R command line:
source("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/PointInPolygonAnalysis/point_in_poly.r")
Example solution:
# Load required packages (this pulls in 'sp' as a dependency)
library(maptools)
library(splancs)
# Import polygon shape file of all counties in Washington. Make sure
# the WA.shp, WA.shx, and WA.dbf files are in your working directory!
# Note that this creates a Map object (using Maptools package)
WA <- read.shape("WA.shp")
# Convert Map object into a polylist (for processing by the splancs
# package), then subset King County for demonstration purposes.
king <- subset(Map2poly(WA), WA$att.data$COUNTY=="King County")
plot(king, main="King County", xlab="Longitude", ylab="Latitude")
# Read in the field site data file over the web, then plot all sites
# onto the county polygon using the site names (letters) as labels.
sites <- read.csv("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/sites.csv")
points(sites$lat~sites$lon, pch=c(as.character(sites$sitename)))
# Now subset the original data based on the point-in-polygon test. The
# inout() function returns TRUE for each site located within the King
# County polygon based on its longitude and latitude. This is fed
# directly into the subset() function. Note that when referring to the
# 'king' polygon, we have to use '[[1]]' for technical reasons we won't
# go into here...
king.sites <- subset(sites, inout(sites[c("lon","lat")], king[[1]]))
# Now plot sites in King County, colored red
with(king.sites, points(lat~lon, pch=c(as.character(sitename)), col="red"))
# ...and just for fun, draw circle around each site with radius
# proportional to the number of individuals counted there!
with(king.sites, symbols(lat~lon, circles=count, inches=0.5, add=TRUE))
# Uncomment to output the subset ted data to a file, if you wish:
# write.csv(king.sites, "sites_king.csv")
Learning More:
UNC/Chapel Hill School of Information and Library Science course (Fall, 2003): Summary of GIS Analysis Functions
Point of Contact for this Use Case: James Regetz, NCEAS Scientific Programmer regetz@nceas.ucsb.edu regetz@nceas.ucsb.edu
This Use Case was compiled June, 2007.