NCEAS Scientific Computing: Solutions Center Use Case

Point-in-polygon subsetting of data, using R

Given a set of spatially referenced points and a polygon feature, a 'point-in-polygon' operation simply determines which of the points are located within the boundaries of the polygon. Especially with small(ish) point data sets, this can be done very easily and flexibly using R.


Getting Started:

Below is a simple example that you can run yourself if you have R installed on your computer. In this example, we use an ESRI shape file that contains polygons of all counties in Washington state. We first subset King County, which we will use as the polygon of interest. We then create a set of fake field sites that are referenced by (randomly generated) latitude and longitude coordinates in and around King County; at each site we have the local count of some species (use your imagination). Our objective is to determine which sites are located within King County, plot those sites, and produce a subset ted data set that includes only data from those sites.

This example uses the inout() function (package: splancs), which evaluates whether or not each member of a set of points is inside a polygon of interest, and returns a vector of TRUE (inside) and FALSE (not inside) values accordingly. This vector can then be easily used to subset data from the original table of points. Related function inpip() returns the indices (row numbers) of points within the polygon, and pip() returns the actual coordinates of these points. For more details within R, load the splancs package and then enter the following at the command line:

?inout
?inpip
?pip

Preliminaries:

To run this on your own, you will first have to install the maptools package for importing and manipulating shapefiles, and the splancs package for performing the point-in-polygon operation. In most cases, it is straightforward to install these packages by issuing the following statements at the R command line:

install.packages("splancs", dependencies=TRUE)
install.packages("maptools")

In addition, you will have to download a small shape file bundle (WA.shp, WA.shx, WA.dbf). After starting up R in whatever directory you've just saved this shape file, you can run the example below by sourcing the R code directly over the internet. Simply enter the following statement at the R command line:

source("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/PointInPolygonAnalysis/point_in_poly.r")

Example solution:

# Load required packages (this pulls in 'sp' as a dependency)
library(maptools)
library(splancs)

# Import polygon shape file of all counties in Washington. Make sure
# the WA.shp, WA.shx, and WA.dbf files are in your working directory!
# Note that this creates a Map object (using Maptools package)
WA <- read.shape("WA.shp")

# Convert Map object into a polylist (for processing by the splancs
# package), then subset King County for demonstration purposes.
king <- subset(Map2poly(WA), WA$att.data$COUNTY=="King County")
plot(king, main="King County", xlab="Longitude", ylab="Latitude")
# Read in the field site data file over the web, then plot all sites
# onto the county polygon using the site names (letters) as labels.
sites <- read.csv("http://www.nceas.ucsb.edu/scicomp/GISSeminar/UseCases/sites.csv")
points(sites$lat~sites$lon, pch=c(as.character(sites$sitename)))
# Now subset the original data based on the point-in-polygon test. The
# inout() function returns TRUE for each site located within the King
# County polygon based on its longitude and latitude. This is fed
# directly into the subset() function. Note that when referring to the
# 'king' polygon, we have to use '[[1]]' for technical reasons we won't
# go into here...
king.sites <- subset(sites, inout(sites[c("lon","lat")], king[[1]]))

# Now plot sites in King County, colored red
with(king.sites, points(lat~lon, pch=c(as.character(sitename)), col="red"))

# ...and just for fun, draw circle around each site with radius
# proportional to the number of individuals counted there!
with(king.sites, symbols(lat~lon, circles=count, inches=0.5, add=TRUE))
# Uncomment to output the subset ted data to a file, if you wish:
# write.csv(king.sites, "sites_king.csv")

Learning More:

UNC/Chapel Hill School of Information and Library Science course (Fall, 2003): Summary of GIS Analysis Functions

Point of Contact for this Use Case: James Regetz, NCEAS Scientific Programmer regetz@nceas.ucsb.edu regetz@nceas.ucsb.edu
This Use Case was compiled June, 2007.

Site Home | NCEAS Home | KNB Home