A common spatial analysis task is to assign the nearest point in one group to each of the elements of a second set of points. For example, consider the case of a scientist monitoring a series of study sites within a county, who needs to assign the closest meteorological recording station within the region to each study site. Both point sets are represented by geocoded records in two separate ASCII text files.
Click here to download a .zip file containing the source code and test data sets.
# R Script to assign closest temperature points to transect location points.
# Uses the splancs R library (which loads sp as a foundation)
#
# August 27, 2007
# Author: Rick Reeves, Scientific Programmer,
# National Center for Ecological Analysis and Synthesis (NCEAS)
# University of California, Santa Barbara www.nceas.ucsb.edu
#
# This script also demonstrates simple R functions and debugging mechanisms.
#
# To run the function:
# 1) Start R from command line
# 2) > setwd("(directory containing this file and the data sets 'table1.csv' and 'table2.csv')")
# 3) > source("AssignClosestPoints.r")
# 4) > TopDriver()
#
# To be able to step through the script one line at a time within the R environment:
# 1) Uncomment the lines 'debug(AssignClosestPoints)' and 'browser()'
# 2) Repeat steps 2 - 4, above
# 3) Execution of script will pause at the line after 'broswser()'.
# Enter 'n' to execute the next line, or 'c' to resume full-speed executuion.
# For more information, enter "help("browser")" at the R command prompt
#
TopDriver <- function()
{
#
# program requires sp library version 0.9-13, which includes the spDistsN1() function
#
library(sp)
#
# uncomment next line, and all lines with 'n()' if you wish to step through the script
#
debug(AssignClosestPoints)
#
# Call to function that performs actual calculations
#
AssignClosestPoints()
}
AssignClosestPoints <- function()
{
#
# Read in the points: Table 1 is list of temperature values at geocoded data points
# Table 2 is list of geocoded transect position points.
#
# NOTE:This example expects that the input X/Y point pairs are in geographic coordinates
# (Latitude/Longitude). Note that distances in the X (longitude) dimension are a function
# of the latitude of each point:
#
# (Miles per degree Latitude constant at 69.04 Statute Miles)
#
# Example: Latitude Statute Miles per Degree Longitude
# 0 69.04
# 45 48.8
# 60 34.67
# 80 13.89
# 90 0
#
# In some cases, especially at latitudes > 40 degrees, or for study areas with large (e.g., > 20 km)
# this difference may affect the accuracy of nearest neighbor point calculation.
#
# This example uses the spDistsN1() function (included in the April,2007 version of the sp package)
# which resolves this issue by computing spherical Great Circle distances in kilometers between
# the point pairs. The spDistsN1 argument flag 'longlat' determines the units of the distances
# calculated:
#
# longlat: TRUE: Distances are Great Circle distancens, in units of kilometers
# FALSE: Distances are in the same units (e.g., degrees or meters) as the input geographic coordinates
#
# ...see the spDistsN1() entry in the R sp package documentation (pdf) for details.
#
# We want to assign the closest temperature value from Table 1 to each of the transect points in Table 2.
#
#
# Read in the points: Table 1 is list of temperature values at lat/long points
# Table 2 is list of transect positions (lat/long) to which
# we which to assign the geographically closest temperature measurement.
#
# In other words, this program assigns the closest temperature value from
# one of the stations in Table 1 to each of the transect points in Table 2.
#
table1 <- read.csv("table1Sub.csv")
table2 <- read.csv("table2Sub.csv")
#
# NOTE: Some of the 'table1' temperature records have temperature values <- -99 or -199.
# I assume that we should exclude these from the candidates for assignment. To do so:
#
# Create a logical (values are TRUE or FALSE) vector that flags the temperature rows
# with valid temperature values. Lets compare two ways of making this vector:
#
# 1) In three steps:
#
tempVec <- table1$temperature
lRowsWithValidTemps <- (tempVec > -90) # logical vector
TableWithValidTempRecs <- table1[lRowsWithValidTemps == TRUE,1:3]
#
# 2) In one step, updating table1 and using the logical NOT (!) and %in% operators
#
TableWithValidTempRecs <- table1[!table1$temperature %in% c(-99,-199),]
#
# create subsets using only the lat/long coordinates.
#
pttbl1 <- matrix(c(TableWithValidTempRecs$longitude,TableWithValidTempRecs$latitude),nrow=length(TableWithValidTempRecs$latitude),ncol=2,byrow=TRUE)
pttbl2 <- matrix(c(table2$longitude,table2$latitude),nrow=length(table2$latitude),ncol=2,byrow=TRUE)
#
# Define these vectrors, used in the loop.
#
# Also note two ways of finding the number of rows in a table:
# 'length()' (of one column), and the more compact 'nrow()'
#
closestSiteVec <- vector(mode = "numeric",length = length(pttbl2[,1]))
minDistVec <- vector(mode = "numeric",length = nrow(pttbl2))
#
# Get the index of the temperature station closest to each field station.
# Use the spDistsN1 function to compute the distance vector between each
# field station site and all of the temperature stations. Then, find and
# retain the actual temperature, and the index of the closest temperature
# to each transect station.
#
# spDistsN1 usage: spDistsN1(pointList, pointToMatch, longlat)
#
# where:
# pointList : List of candidate points.
# pointToMatch: Single point for which we seek the closest point in pointList.
# longlat : TRUE computes Great Circle distance in km,
# FALSE computes Euclidean distance in units of input geographic coordinates
#
# We use Great Circle distance to increase distance calculation accuracy at high latitudes
# See the discussion of distance units in the header portion of this file
#
# minDistVec stores the distance from to the closest temperature station to transit station 'i'
# closestSiteVec stores the index of the closest temperature station to transit station 'i'
#
for (i in 1 : nrow(pttbl2))
{
distVec <- spDistsN1(pttbl1,pttbl2[i,],longlat = TRUE)
minDistVec[i] <- min(distVec)
closestSiteVec[i] <- which.min(distVec)
}
#
# Create the output file: merge the temperature point list with the transect point list
# into a five-column table by merging the temperature point and transect point lists.
#
PointAssignTemps <- TableWithValidTempRecs[closestSiteVec,3]
FinalTable <-data.frame(table2$longitude,table2$latitude,table2$density,closestSiteVec,minDistVec,PointAssignTemps)
#
# Here is how to replace the FinalTable column names (right side of '$' in the previous statement) with new names:
#
names(FinalTable) <- c("Long","Lat","Density","TempTableRowMatchIndex","Distance","Temperature")
#
# finally, write FinalTable to disk as a CSV file.
#
print("FinalTable is complete, write to disk as .csv file")
write.csv(FinalTable,file="FinalTempAssignmentsSUB.csv")
#
}
Learning More......
Measuring Geographic Distance:
The Basics: Geographic Coordinate Systems
Map Projections:
Introduction to Map Projections
Why Don't My Layers Line Up? Working With Coordinate Systems
R Programming Environment:
The R Project (R Programming Environment Info Center) home page
Point of Contact for this Use Case: Rick Reeves, NCEAS Scientific Programmer reeves@nceas.ucsb.edu
This Use Case was compiled August, 2007.