GIS at Harvard

Kriging With ArcGIS

Home GIS Research in Longwood Medical Area GIS at Harvard University FAQs Scripts Spatial Statistics Links	Adding Layers of Data in ArcGIS : 1. Download all the layers you need. 2. After downloading and unzipping the files, add the layers to your new map by opening ArcMap clicking File, Add Data, and selecting the layers you want to add from the appropriate folders. 3. If you want labels displayed on your map, you can right-click on the layer and select Label Features. Preparing your data and importing into ArcGIS 1. You need to create a file with three columns: one for longitude, one for latitude, and one for the variable that you wish to map. The first row of the file should contain the variable names (e.g., “x,y,z”). 2. Note that ArcGIS assumes that input x and y coordinates correspond to the x and y coordinates of the coordinate system of the data frame (longitude and latitude if you are using a geographic coordinate system). If this is not the case for your data, you may need to further examine the documentation to see how this might be handled. 3. For response variables that are constrained to be positive, such as concentrations, a common approach is to take the natural logarithm of the values. This tends to produce data that more closely match the underlying statistical assumptions involved in kriging. This and other transformations could also be done within ArcGIS before doing kriging. 4. Your input file should be comma delimited, which means that there should be commas, and only commas, between the values in the file. Each line should pertain to a single location, with the x and y coordinates of the location and the response value, all separated by commas. One way to create this format is in Microsoft Excel. Open the file that you want to convert. Go to File->Save as, and under 'Save as type', select 'CSV'. Hit Save, and Yes to the window that pops up. When you quit Excel, you do not need to save any changes. In S-plus or R, if you use the write.table() function to create the data file, the argument ' sep="," ' will create a comma delimited file. You can also use dbf files, which can also be created with MS Excel. 5. Open your map in ArcMap, and go to Tools->Add XY data, click the folder icon on the top right, and browse through the folders to find the .csv file that you just created. Then hit OK. Your new layer should appear in the Layers column on the left. It will also display the layer as individual points on your map. Overview of Kriging Methodology: Kriging is a statistical technique that posits a certain statistical model for the data, namely that the response at a given location is the sum of two components: an unknown underlying surface, which we are trying to estimate, plus some additional noise. Kriging produces an estimate of the underlying (usually assumed to be smooth) surface by a weighted average of the data, with weights declining with distance between the point at which the surface is being estimated and the locations of the data points. The exact nature of the decline is based on modelling the covariation between data at various spatial locations. Data points, and the associated surface, at nearby locations are assumed to be more similar to each other than data points at locations that are distant from each other. There are many ways to estimate the covariance structure in spatial data and to use this information to create a kriged surface. ArcGIS uses a somewhat ad hoc method, estimating something called the variogram using weighted least squares, and then relies on some further ad hoc approaches to reduce the computational burden of producing an estimate of the underlying surface and estimates of uncertainty in the surface estimate. For a surface estimate based on a default approach to estimating the variogram, one can use the Spatial Analyst. The Geostatistical Analyst is more extensive and customizable: one can either use the default approach or customize the variogram fit and computational details. I would characterize both the default and customizable approaches as being ad hoc and not particularly principled statistically, but they may still produce reasonable surface estimates, since the main factor is that nearby data points be weighted in a reasonable fashion when estimating the surface. One should note that the uncertainty estimates for the kriged surface do NOT account for the uncertainty involved in estimating the covariance structure. Next I describe how to perform kriging in both Spatial Analyst and Geostatistical Analyst. Among the advantages of the implementation of kriging in Geostatistical Analyst relative to that in Spatial Analyst are the ability to use the Matern covariance model favored by many statisticians, the ability to handle directionality in the data, and the ability to make plots of prediction errors as a way of assessing uncertainty. Kriging using Geostatistical Analyst: Using the Geostatistical Analyst, one can create a surface estimate based on default modelling of the covariance structure, or one can modify the covariance structure in a number of ways. 1. Make sure the Geostatistical Analyst is enabled (Tools->Extensions) and the toolbar is visible (View->Toolbars). 2. Select Geostatistical Wizard on the Geostatistical Analyst toolbar. 3. Select the input data (layer) to krige and the attribute (response, i.e., z) variable, as well as the ‘Kriging’ method in the lower left. Then choose ‘Next’. 4. (Step 1) Select Ordinary Kriging->Prediction Map. You also presumably have the option of transforming the data, such as using the natural logarithm, if you haven't done so already, though in my setup, ArcGIS gives me no options under ‘Transformation’. Click ‘Next’. 5. (Step 2) The next step involves estimating the covariance structure using the empirical variogram or empiral covariogram. The default is the variogram, and there is some evidence in the literature for preferring this when using the weighted least squares approach, as in ArcGIS. I suggest using the K-Bessel (known more commonly as the Matern) model; this has become popular in the spatial statistics literature. In particular, the Matern model tends to produce surfaces that are more smooth locally (on a very fine scale) than some other models (such as the exponential or spherical). This is often desirable, since the unknown underlying surface can often plausibly be considered to be locally smooth on a very fine scale. I suggest using the values for the Matern model estimated by ArcGIS. If you wish to change the values, you will want to read the documentation for ArcGIS and probably some basic material on spatial statistics. Note that if the estimated value for the ‘Parameter’ of the Matern is less than one, then the estimated surface is not differentiable (0.5 is the exponential model and values approaching infinity tend toward the Gaussian model). I also suggest clicking on 'Measurement Error' and sliding the bar to '100% Measurement Error', which indicates that your data are measured with uncertainty. The documentation claims that by doing so, you avoid interpolating the data (having the estimated surface go exactly through your data points), although I have not seen any differences in the estimated surface whether the Measurement Error box is clicked or not clicked. Still, to be safe, I suggest clicking the box. Then click ‘Next’. The one exception to using the default values is that you may want to investigate the possibility that your data have directionality to them, for example if they are influenced by prevailing winds or groundwater flow directions. An easy way to do this is to click on “anisotropy” toward the upper right of the Step 2 box and allow ArcGIS to estimate the necessary additional parameters automatically. To assess whether including directionality in your model is warranted you could compare the root mean square prediction error given in Step 4 for the models with and without anisotropy. 6. (Step 3) The next step determines the details of how ArcGIS approximates the surface estimate so that the computations can be done quickly. Theoretically, the estimated surface at any point should be based on all of the data points, but ArcGIS uses only some of the points, indicated in the ‘Neighbors to Include’ field. From a statistical viewpoint, there is more danger in using too few than too many points (since the kriging method essentially ignores data points far from the location at which predictions are being made anyway). I suggest using a sizable fraction of the data, to the extent that it does not take too long to estimate the surface. For example for 100 data points, I would try to use at least 25 neighbors, and more if possible. For 1000, I would use at least 25-50 and ideally a few hundred, but the computations may be too slow with this many. I would unclick the ‘Include at Least’ box, which preferentially includes data points based on direction, unless you have modeled directionality on the previous screen. Note that the colored map of points indicates the relative weights assigned to each data point for use in predicting the surface at the focal location at the center of the circle/ellipse. Click ‘Next’. 7. (Step 4) The next screen assesses how good your predictions are based on cross-validation (leaving points out of the fitting and then estimating the value and comparing to the actual value). The root-mean-square prediction error can be compared between different models as a way of choosing a model (such as whether to include directionality). The ArcGIS documentation gives more detail about this. Click ‘Finish’ and then ‘OK’ to produce the surface map. 8. Your new layer should now appear on the map. If you would like to change the colors, you can right-click on the layer from the Layers column and go to 'Properties'. On the top menu, select 'Symbology'. You can then change the colors in 'Color Ramp'. You can also change the number of colors that appear on the map by selecting a different number from the drop-down menu. 9. Estimating the uncertainty in your predicted surface is an important aspect of spatial analysis. If the uncertainty is so great that peaks and valleys in the surface may not truly exist, then it would be ill-advised to try to interpret those peaks and valleys as representing real features of your data. To estimate the standard errors at each location, as presented in an additional map, use the exact same steps as above, but in Step 1, select Ordinary Kriging->Prediction Standard Error Map. The interpretation of the confidence level indicated by the standard error at each location is that if you repeated the experiment many times, collecting data over and over again, in approximately 67% of the experiments, the true surface at a point would lie within one standard error of the estimated surface produce. Note that this uncertainty calculation does NOT account for the uncertainty in fitting the covariance structure, which might be substantial. Quick and dirty default kriging using Spatial Analyst: Using the Spatial Analyst, one can create a kriged spatial surface using default parameter estimates in ArcGIS. Note that this approach does not handle directionality, for which one would need to use the more extensive options in Geostatistical Analyst. 1. Make sure the Spatial Analyst is enabled (Tools->Extensions) and the toolbar is visible (View->Toolbars). 2. On the 'Layers' column, select the layer you want to krige. This will usually be the .csv layer you just created. 3. Go to the Spatial Analyst toolbar, Interpolate to raster->Kriging. Make sure the correct layer appears in the 'Input Points'. 4. On the 'Z Value Field', select the Z variable from the drop-down menu. 5. Select the ‘Ordinary’ kriging method. 6. I would ordinarily suggest using the Matern model, but this is only avaible in Geostatistical Analyst, so my next choice is the exponential model. A theoretical drawback to this model is that the result surface is not particularly smooth (continuous but not differentiable), but the plots I have seen seem reasonably smooth in practice. The Gaussian model (which has nothing to do with whether the z values have a Gaussian distribution) is a possibility, but I have found that the resulting surface can be very unsmooth, probably because of some numerical instability inherent in the kriging calculations based on the Gaussian model. 7. In the 'Search Radius Settings', I suggest using at least 25 (more if possible), unless the surface estimate takes too long to produce. This determines how many data points are used in predicting surface values at each location. See my comments about this issue in the discussion of Geostatistical Analyst. Then hit OK. 8. Your new layer should now appear on the map. If you would like to change the colors, you can right-click on the layer from the 'Layers' column and go to 'Properties'. On the top menu, select 'Symbology'. You can then change the colors in 'Color Ramp'. You can also change the number of colors that appear on the map by selecting a different number from the drop-down menu. Some comments on mapping in ArcGIS: ArcGIS provides various colors for mapping. In creating a map, one wants to make sure that a legend is included in any final output, so that viewers can interpret the levels and understand the range of the surface and differences in the response between different levels. This can be done using Insert->Legend. The color scheme should vary in a smooth fashion so that the user does not interpret sharp changes in color to be sharp changes in the surface unless this is truly the nature of the surface estimate. I believe there are principled ways of choosing a color scheme, but I haven't yet looked for further information on this. For surfaces that do not have negative values, one possibility is to use a color scheme such as used on topographic maps. For surfaces with both positive and negative values, I suggest using a blue-red color scheme, with white for values of zero and deepening shades of blue and red for increasingly negative and positive values, respectively. Interpolating methods in ArcGIS: Note that ArcGIS also provides spline and radial basis function (RBF) interpolators that can create spatial surfaces. These methods have the distinct drawback of producing surfaces that go exactly through all the data points, which is generally a bad idea with noisy data, from which we would like to create a smoothed surface. Note that spline and RBF methods can defined in a statistical way to produce smoothing methods that are not forced to interpolate the data (e.g., the methods in the R mgcv library and in SemiPar), but this is not done in ArcGIS.