Fitting Spatial Surfaces to Noisy Observations

Home

GIS Research in Longwood Medical Area

GIS at Harvard University

FAQs

Scripts

Spatial Statistics

Links

Contributed by Chris Paciorek (paciorek AT hsph.harvard.edu), Ph.D, Department of Biostatistics, Harvard School of Public Health

Three options for fitting spatial surfaces to noisy observations are described. These three options are often appropriate for continuous data. The methods are not appropriate for data that are counts at each location (although they may be reasonable when all the counts are fairly large and none near zero) or 0/1 binary indicators. The first option is the SemiPar package for the statistical software S-plus. The second option is kriging within the ArcGIS software. A third potential option is the gam function in the mgcv library for the statistical software R, but I have not yet tested this. My goal here is to describe how to use these two primary options and outline situations in which one or the other may be more appropriate. For applications for which these options do not seem appropriate, one will probably want to consult with a statistician familiar with spatial modelling techniques.

SemiPar with S-plus: SemiPar is open source code written in the S statistical language and executed in the S-plus software package. S-plus is a commercial statistical computing environment that is available for both Windows and Unix systems. While the SemiPar code is free, S-plus is not. However, the authors of SemiPar are working on a version for the statistical language R, which is an open-source version of the S language that underlies S-plus. If necessary, it should be possible to convert the S-plus code to R fairly easily (see Chris Paciorek, paciorek AT hsph.harvard.edu, for assistance). SemiPar fits a spatial surface within the framework of linear mixed models, using a thin-plate spline basis to represent the surface. The model can include additional covariates, such as age or sex, as either linear or smooth (nonparametric) components. For data with a strong spatial directionality, the SemiPar software may not be ideal, because it assumes that the variability of the spatial surface is similar in all directions. Also for data for which the variability of the spatial surface is very different in different parts of the space (called nonstationarity), such as the topography of Colorado, with flat plains in the east and mountains in the west, the fit will not be ideal, because the model assumes similar variability throughout the space. Maps can be created within S-plus using SemiPar or using other S functions. S-plus graphics are customizable, but the extensive layers available in GIS software are not readily available. The SemiPar models are run and results plotted using a command line interface.

Detailed guide to using SemiPar

Kriging within ArcGIS: ArcGIS is commercial GIS software, available primarily (exclusively?) for Windows systems. Spatial surface fitting can be performed by the statistical method of kriging using either the Spatial Analyst or Geostatistical Analyst extensions. The underlying code by which the kriging is done is not accessible, and it can take some effort to understand exactly what is being done based on the documentation. Kriging is a standard statistical method, but the implementation in ArcGIS is done in a somewhat ad hoc manner, in part for computational efficiency. ArcGIS cannot include additional covariates; it only fits the spatial surface without accounting for other predictors. ArcGIS can account for spatial directionality. However, like SemiPar, it is not ideal for nonstationary data for which the variability of the spatial surface is very different in different parts of the space. Sophisticated maps can be created within the GIS software, although one is limited to the mapping options provided by the creators of the software. Models are fit and maps created using a graphical user interface.

Detailed guide to using ArcGIS for Kriging with ArcGIS

mgcv in R: Finally, I recently realized that the mgcv library in R (written by Simon Wood) has a gam() (generalized additive models) function that appears to allow for thin plate spline-based surface fits. Be careful: the gam() function in S-plus does not implement the same method and cannot be used for flexible surface fitting. I intend to explore the R gam() function further, but at this time leave the intrepid reader with example code (untested) for fitting such a model.

Note that the R gam() function allows for both linear covariates (e.g., covar1 in the example code) and smoothing terms (e.g., covar2, fit with a cubic regression spline), in addition to the spatial effect (x, y, fit with a thin-plate regression spline). The degree of smoothing is estimated within the model using the generalized cross-validation (GCV) criteria.

Recommendations: In general, I would recommend using the SemiPar code rather than kriging within ArcGIS. The R mgcv library may be a reasonable alternative, particularly since it appears to provide accessible estimates of uncertainty for both the spatial surface and the other covariates, but I haven't yet explored it much. The SemiPar statistical model does not rely heavily on ad hoc fitting methods used in the ArcGIS kriging routines, and SemiPar can account for important additional factors that may affect the response. The degree of smoothness in the surface estimated by SemiPar is controlled by a single parameter, rather than a number of parameters in the kriging approach. While having multiple parameters might seem to allow for a more flexible fit to the data, with the kind of smoothing done in either SemiPar or kriging, there is really only one parameter that controls the degree of smoothing, so the kriging approach introduces additional complexity with little change in the ability to fit a surface. The exception to this comment is the ability of the kriging methodology to account for directionality. Also note that if one wants to use the GIS software to create maps, the results from SemiPar/S-plus can be imported into the GIS software.

Other Methods: Note that there are many other methods for fitting spatial surfaces within a statistical framework, including other software available for doing kriging. These include the Fields software package for the R statistical language, which implements thin plate spline fitting and kriging (http://www.cgd.ucar.edu/stats/Software/Fields), both classical and Bayesian implementations of kriging using the geoR and geoS packages for R and S-plus, respectively (http://www.est.ufpr.br/geoR), and Matlab (commercial software available for both Windows and Unix) code for a Bayesian approach to free-knot spline models that can fit two-dimensional surfaces (http://stats.ma.ic.ac.uk/~ccholmes/Book_code/book_code.html). Also, the built-in 'spatial' library for R and S-plus can do a basic version of kriging similar to that done in ArcGIS. There are also other implementations for fitting spatial surfaces within the mixed model framework, but I won't describe those here.

Sources:
For more information on the statistical methods underlying the SemiPar software, a starting point is
Kammann, E.E., and M.P. Wand. 2003. Geoadditive models. Applied Statistics 52:1-18.

Information on the methods of Simon Wood implemented in the R library mgcv can be found in
Wood, S.N. 2000. Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society, Series B 62:413-428.
Wood, S.N. 2003. Thin plate regression splines. Journal of the Royal Statistical Society, Series B 65:95-114.

Information on exactly how kriging is done in ArcGIS is a bit difficult to come by. The online documentation has limited technical detail. More extensive information is available in:
Johnston, K. et al. 2001. Using ArcGIS geostatistical analyst. Redlands, CA: Environmental Systems Research Institute.
McCoy, J. et al. 2001. Using ArcGIS spatial analyst. Redlands, CA: Environmental Systems Research Institute.
Both are available in Harvard’s Pusey Library Map Collection, for purchase from ESRI and can be downloaded as pdf files from the Harvard site license downloads page (Books and Maps).
The ESRI website has some technical reports, most of which are somewhat tangential to the methods implemented in ArcGIS, but a couple of which provide further details on the fitting methods.