Kriging With ArcGIS |
|
Adding
Layers of Data in ArcGIS :
2. After downloading and unzipping the files, add the layers to your new
map by opening ArcMap clicking File, Add Data, and selecting the layers
you want to add from the appropriate folders.
3. If you want labels displayed on your map, you can right-click on the
layer and select Label Features. Preparing your data and importing into ArcGIS
2. Note that ArcGIS assumes that input x and y coordinates correspond to
the x and y coordinates of the coordinate system
of the data frame (longitude and latitude if you are using a geographic
coordinate system). If this is not the case for your data, you may need
to further examine the documentation to see how this might be handled.
3. For response variables that are constrained to be positive, such as
concentrations, a common approach is to take the natural logarithm of
the values. This tends to produce data that more closely match the underlying
statistical assumptions involved in kriging. This and other transformations
could also be done within ArcGIS before doing kriging.
4. Your input file should be comma delimited, which means that there should
be commas, and only commas, between the values in the file. Each line
should pertain to a single location, with the x and y coordinates of the
location and the response value, all separated by commas. One way to create
this format is in Microsoft Excel. Open the file that you want to convert.
Go to File->Save as, and under 'Save as type', select 'CSV'. Hit Save, and Yes to the window
that pops up. When you quit Excel, you do not need to save any changes.
In S-plus or R, if you use the write.table() function to create the data
file, the argument ' sep="," ' will create a comma
delimited file. You can also use dbf files, which can also be created
with MS Excel.
5. Open your map in ArcMap, and go to Tools->Add XY data, click the folder
icon on the top right, and browse through the folders to find the .csv
file that you just created. Then hit OK. Your new layer should appear
in the Layers column on the left. It will also display the layer as individual
points on your map. Overview of Kriging Methodology: Kriging is a statistical technique that posits a certain statistical model for the data, namely that the response at a given location is the sum of two components: an unknown underlying surface, which we are trying to estimate, plus some additional noise. Kriging produces an estimate of the underlying (usually assumed to be smooth) surface by a weighted average of the data, with weights declining with distance between the point at which the surface is being estimated and the locations of the data points. The exact nature of the decline is based on modelling the covariation between data at various spatial locations. Data points, and the associated surface, at nearby locations are assumed to be more similar to each other than data points at locations that are distant from each other. There are many ways to estimate the covariance structure in spatial data and to use this information to create a kriged surface. ArcGIS uses a somewhat ad hoc method, estimating something called the variogram using weighted least squares, and then relies on some further ad hoc approaches to reduce the computational burden of producing an estimate of the underlying surface and estimates of uncertainty in the surface estimate. For a surface estimate based on a default approach to estimating the variogram, one can use the Spatial Analyst. The Geostatistical Analyst is more extensive and customizable: one can either use the default approach or customize the variogram fit and computational details. I would characterize both the default and customizable approaches as being ad hoc and not particularly principled statistically, but they may still produce reasonable surface estimates, since the main factor is that nearby data points be weighted in a reasonable fashion when estimating the surface. One should note that the uncertainty estimates for the kriged surface do NOT account for the uncertainty involved in estimating the covariance structure. Next I describe how to perform kriging in both Spatial Analyst and Geostatistical Analyst. Among the advantages of the implementation of kriging in Geostatistical Analyst relative to that in Spatial Analyst are the ability to use the Matern covariance model favored by many statisticians, the ability to handle directionality in the data, and the ability to make plots of prediction errors as a way of assessing uncertainty. Kriging using Geostatistical Analyst:
Using the Geostatistical Analyst, one can create a surface estimate based
on default modelling of the covariance structure, or one can modify the
covariance structure in a number of ways.
2. Select Geostatistical Wizard on the Geostatistical Analyst toolbar.
3. Select the input data (layer) to krige and the attribute (response,
i.e., z) variable, as well as the ‘Kriging’ method in the
lower left. Then choose ‘Next’.
4. (Step 1) Select Ordinary Kriging->Prediction Map. You also presumably
have the option of transforming the data, such as using the natural logarithm,
if you haven't done so already, though in my setup, ArcGIS gives me no
options under ‘Transformation’. Click ‘Next’.
5. (Step 2) The next step involves estimating the covariance structure
using the empirical variogram or empiral covariogram. The default is the
variogram, and there is some evidence in the literature for preferring
this when using the weighted least squares approach, as in ArcGIS. I suggest
using the K-Bessel (known more commonly as the Matern) model; this has
become popular in the spatial statistics literature. In particular, the
Matern model tends to produce surfaces that are more smooth locally (on
a very fine scale) than some other models (such as the exponential or
spherical). This is often desirable, since the unknown underlying surface
can often plausibly be considered to be locally smooth on a very fine
scale. I suggest using the values for the Matern model estimated by ArcGIS.
If you wish to change the values, you will want to read the documentation
for ArcGIS and probably some basic material on spatial statistics. Note
that if the estimated value for the ‘Parameter’ of the Matern
is less than one, then the estimated surface is not differentiable (0.5
is the exponential model and values approaching infinity tend toward the
Gaussian model). I also suggest clicking on 'Measurement Error' and sliding
the bar to '100% Measurement Error', which indicates that your data are
measured with uncertainty. The documentation claims that by doing so,
you avoid interpolating the data (having the estimated surface go exactly
through your data points), although I have not seen any differences in
the estimated surface whether the Measurement Error box is clicked or not clicked. Still,
to be safe, I suggest clicking the box. Then click ‘Next’. The one exception to using the default values is that you may want to investigate the possibility that your data have directionality to them, for example if they are influenced by prevailing winds or groundwater flow directions. An easy way to do this is to click on “anisotropy” toward the upper right of the Step 2 box and allow ArcGIS to estimate the necessary additional parameters automatically. To assess whether including directionality in your model is warranted you could compare the root mean square prediction error given in Step 4 for the models with and without anisotropy.
6. (Step 3) The next step determines the details of how ArcGIS approximates
the surface estimate so that the computations can be done quickly. Theoretically,
the estimated surface at any point should be based on all of the data
points, but ArcGIS uses only some of the points, indicated in the ‘Neighbors
to Include’ field. From a statistical viewpoint, there is more danger
in using too few than too many points (since the kriging method essentially
ignores data points far from the location at which predictions are being
made anyway). I suggest using a sizable fraction of the data, to the extent
that it does not take too long to estimate the surface. For example for
100 data points, I would try to use at least 25 neighbors, and more if
possible. For 1000, I would use at least 25-50 and ideally a few hundred,
but the computations may be too slow with this many. I would unclick the
‘Include at Least’ box, which preferentially includes data
points based on direction, unless you have modeled directionality on the
previous screen. Note that the colored map of points indicates the relative weights
assigned to each data point for use in predicting the surface at the focal
location at the center of the circle/ellipse. Click ‘Next’.
7. (Step 4) The next screen assesses how good your predictions are based
on cross-validation (leaving points out of the fitting and then estimating
the value and comparing to the actual value). The root-mean-square prediction
error can be compared between different models as a way of choosing a
model (such as whether to include directionality). The ArcGIS documentation
gives more detail about this. Click ‘Finish’ and then ‘OK’
to produce the surface map.
8. Your new layer should now appear on the map. If you would like to
change the colors, you can right-click on the layer from the Layers column
and go to 'Properties'. On the top menu, select 'Symbology'. You can then
change the colors in 'Color Ramp'. You can also change the number of colors
that appear on the map by selecting a different number from the drop-down
menu.
9. Estimating the uncertainty in your predicted surface is an important
aspect of spatial analysis. If the uncertainty is so great that peaks
and valleys in the surface may not truly exist, then it would be ill-advised
to try to interpret those peaks and valleys as representing real features
of your data. To estimate the standard errors at each location, as presented
in an additional map, use the exact same steps as above, but in Step 1,
select Ordinary Kriging->Prediction Standard Error Map. The interpretation
of the confidence level indicated by the standard error at each location
is that if you repeated the experiment many times, collecting data over
and over again, in approximately 67% of the experiments, the true surface
at a point would lie within one standard error of the estimated surface
produce. Note that this uncertainty calculation does NOT account for the
uncertainty in fitting the covariance structure, which might be substantial. Quick and dirty default kriging using Spatial Analyst:
Using the Spatial Analyst, one can create a kriged spatial surface using
default parameter estimates in ArcGIS. Note that this approach does not
handle directionality, for which one would need to use the more extensive
options in Geostatistical Analyst.
1. Make sure the Spatial Analyst is enabled (Tools->Extensions) and
the toolbar is visible (View->Toolbars).
2. On the 'Layers' column, select the layer you want to krige. This will
usually be the .csv layer you just created.
3. Go to the Spatial Analyst toolbar, Interpolate to raster->Kriging.
Make sure the correct layer appears in the 'Input Points'.
4. On the 'Z Value Field', select the Z variable from the drop-down menu.
5. Select the ‘Ordinary’ kriging method.
6. I would ordinarily suggest using the Matern model, but this is only
avaible in Geostatistical Analyst, so my next choice is the exponential
model. A theoretical drawback to this model is that the result surface
is not particularly smooth (continuous but not differentiable), but the
plots I have seen seem reasonably smooth in practice. The Gaussian model
(which has nothing to do with whether the z values have a Gaussian distribution)
is a possibility, but I have found that the resulting surface can be very
unsmooth, probably because of some numerical instability inherent in the
kriging calculations based on the Gaussian model.
7. In the 'Search Radius Settings', I suggest using at least 25 (more if
possible), unless the surface estimate takes too long to produce. This
determines how many data points are used in predicting surface values
at each location. See my comments about this issue in the discussion of Geostatistical Analyst.
Then hit OK.
8. Your new layer should now appear on the map. If you would like to change
the colors, you can right-click on the layer from the 'Layers' column and
go to 'Properties'. On the top menu, select 'Symbology'. You can then change
the colors in 'Color Ramp'. You can also change the number of colors that
appear on the map by selecting a different number from the drop-down menu. Some comments on mapping in ArcGIS: ArcGIS provides various colors for mapping. In creating a map, one wants to make sure that a legend is included in any final output, so that viewers can interpret the levels and understand the range of the surface and differences in the response between different levels. This can be done using Insert->Legend. The color scheme should vary in a smooth fashion so that the user does not interpret sharp changes in color to be sharp changes in the surface unless this is truly the nature of the surface estimate. I believe there are principled ways of choosing a color scheme, but I haven't yet looked for further information on this. For surfaces that do not have negative values, one possibility is to use a color scheme such as used on topographic maps. For surfaces with both positive and negative values, I suggest using a blue-red color scheme, with white for values of zero and deepening shades of blue and red for increasingly negative and positive values, respectively. Interpolating methods in ArcGIS: Note that ArcGIS also provides
spline and radial basis function (RBF) interpolators that can create spatial
surfaces. These methods have the distinct drawback of producing surfaces
that go exactly through all the data points, which is generally a bad
idea with noisy data, from which we would like to create a smoothed surface.
Note that spline and RBF methods can defined in a statistical way to
produce smoothing methods that are not forced to interpolate the data
(e.g., the methods in the R mgcv library and in SemiPar), but this is
not done in ArcGIS. |
|