It is practically impossible, and unnecessary, to obtain spatio-temporal information about a continuous phenomenon at every point within a given geographic area. The most practical approach has always been to observe the phenomenon at as many sample points as possible within the area and to estimate the values at unobserved points from the observed values through spatial interpolation. However, users should understand that different interpolation methods have their strengths and weaknesses on different datasets. It is not correct to generalize that a given interpolation method (e.g. Kriging, Inverse Distance Weighting (IDW), Spline) performs better than another without taking into account the type and nature of the dataset and the phenomenon involved. In this paper, we theoretically, mathematically and experimentally evaluate the performance of the Kriging, IDW and Spline interpolation methods in estimating unobserved elevation values and modeling landform. The comparative analysis is based on the prediction mean error, the prediction root mean square error and the cross-validation outputs of these methods. Experimental results for each method on both biased and normalized data show that Spline provided a more accurate interpolation within the sample space than the IDW and Kriging methods. The choice of an interpolation method should depend on the phenomenon and on the structure of the data set.
Interpolation aims at finding the values of a function
can be obtained from a mathematical function or from an empirical function modelled from observations or experiments [
Spatial interpolation has continued to be an important tool for estimating continuous spatial environmental variables for effective decision making. Many modeling tools, including Geographic Information Systems (GIS), offer earth and environmental scientists the ability to carry out spatial interpolation routinely to generate useful spatially continuous data for all kinds of analysis [
For a sample of a target variable
According to Mitas and Mitasova [
Currently, it is difficult to find a method that fulfils all the above-mentioned requirements for a wide range of georeferenced data. Therefore, choosing the most adequate method, with appropriate parameters for the application, is paramount. Different methods produce different spatial representations on different datasets; in-depth knowledge of the phenomenon in question is also necessary in evaluating which interpolation method produces results closest to reality. The use of an unsuitable method or inappropriate parameters can result in a distorted model of spatial distribution, leading to potentially wrong decisions based on misleading spatial information. Wrong interpolation results become critical when the estimates are inputs for simulations, as a small error or distortion can cause models to produce false spatial patterns [
While external factors e.g. data density, spatial distribution of sample data, surface type, sample size and sampling design, etc. [
This paper examines the accuracy of spatial interpolation methods in modeling landform (topography) in relation to their mathematical formulation. The experimental study uses an area comprising a slope and a plain as a landform-adaptability test area and focuses on the comparative analysis of three commonly used interpolation methods: Kriging, Spline, and Inverse Distance Weighting (IDW). The following section summarizes the theoretical and mathematical basis of different known interpolation methods, including the three methods in question. Section 3 introduces the accuracy analysis methods used in this paper, while Section 4 presents the experimental analysis. Section 5 discusses the results and Section 6 concludes.
Different spatial interpolation methods have been developed in different domains for different applications. According to [
a) Mechanical/deterministic/non-geostatistical methods; these include among other methods, Inverse Distance Weighting (IDW) and Splines.
b) Linear statistical/stochastic/geostatistical methods; which include Kriging among others [
This method assumes that the value at an unknown location can be approximated as a weighted average of values at points within a certain cut-off distance, or from a given number of the closest points (typically 10 to 30). Weights are usually inversely proportional to a power of distance [
where p is a parameter (typically = 2) [
where
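As an illustration of the weighted-average idea described for IDW, the following is a minimal Python sketch rather than the implementation used in this study. The function name `idw_predict`, the neighbour count `n_neighbors` and the tuple layout of `samples` are assumptions introduced for the example. The power `p = 2` follows the typical value quoted above.

```python
import math

def idw_predict(samples, x, y, power=2.0, n_neighbors=12):
    """Estimate z at (x, y) as an inverse-distance-weighted average.

    samples: list of (xi, yi, zi) tuples (hypothetical layout).
    """
    # Distance from the target location to every sample point.
    dists = [(math.hypot(xi - x, yi - y), zi) for xi, yi, zi in samples]
    # At a sampled location the weight would be infinite, so return
    # the observed value directly (IDW is an exact interpolator).
    for d, z in dists:
        if d == 0.0:
            return z
    # Keep only the closest points, then weight each by 1 / d**power.
    nearest = sorted(dists)[:n_neighbors]
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * z for w, (_, z) in zip(weights, nearest)) / sum(weights)
```

Because the estimate is a weighted average with positive weights, it always lies between the minimum and maximum of the neighbouring sample values. IDW therefore cannot reproduce peaks or pits that were not sampled.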
Splines belong to a group of interpolators called Radial Basis Functions (RBF). Methods in this group include Thin-Plate Spline (TPS), Regularized Spline with Tension, and Inverse Multi-Quadratic Spline [
For regularized spline with tension and smoothing, the prediction is given by:
where a1 is a constant and R(vi) is the radial basis function given by:
and
where
where
Wahba and Wendelberger [
where r is the distance between sample points and un-sampled locations [
The relation below approximates the surface with minimum bend
where the terms
where
and V is a vector of point heights. K is a matrix of the distance between sampled points and P is a matrix of the sampled points coordinates.
The relation below gives the inverse multi-quadratic spline function
where
where the weights
and is computed by the relation
where z is replaced by a vector of sampled data values, F is a square function matrix given by,
The estimation function generated with these weights is smooth and exact at sampled data points [
Splines have been widely seen as highly suitable for estimation of densely sampled heights and climatic variables [
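The thin-plate spline construction outlined above (a vector of heights V, a matrix K built from distances between sampled points, and a matrix P of point coordinates) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the ArcGIS implementation: it assumes the standard basis function r² log r, no smoothing or tension term, and a linear trend 1, x, y. The helper names `solve`, `tps_kernel`, `fit_tps` and `tps_predict` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def tps_kernel(r):
    # Thin-plate basis r^2 log r, taking its limit value 0 at r = 0.
    return 0.0 if r == 0.0 else r * r * math.log(r)

def fit_tps(points, heights):
    """Solve the block system [[K, P], [P^T, 0]] [w; a] = [V; 0]."""
    n = len(points)
    K = [[tps_kernel(math.dist(p, q)) for q in points] for p in points]
    P = [[1.0, px, py] for px, py in points]
    A = [K[i] + P[i] for i in range(n)]
    for j in range(3):
        A.append([P[i][j] for i in range(n)] + [0.0, 0.0, 0.0])
    sol = solve(A, list(heights) + [0.0, 0.0, 0.0])
    return sol[:n], sol[n:]  # radial weights w, trend coefficients a

def tps_predict(points, w, a, x, y):
    """Evaluate the fitted surface at (x, y)."""
    trend = a[0] + a[1] * x + a[2] * y
    radial = sum(wi * tps_kernel(math.dist(p, (x, y))) for wi, p in zip(w, points))
    return trend + radial
```

With no smoothing term the fitted surface passes through every sample point, and a data set that lies on a plane is reproduced with all radial weights equal to zero. The sketch requires distinct, non-collinear points. Otherwise the linear system is singular.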
Kriging, synonymous with geostatistical interpolation, began in the mining industry as a means of improving ore reserve estimation in the early 1950s [
Regionalized variable theory assumes that the spatial variation of any variable can be expressed as the sum of the following three components:
a) A structural component having a constant mean or trend.
b) A regionalized variable, which is the random but spatially correlated component.
c) A random but spatially uncorrelated noise or residual component.
Mathematically, for a random variable z at x, the expression is
where
Ordinary Kriging (OK) is a standard version of Kriging where predictions are based on the model,
where
where
Assuming stationarity, one can estimate a semi-variogram,
where h is the distance between point
where
where C is the covariance matrix derived for an n x n samples matrix with one additional row and column added to ensure the sum of weights is equal to one, and
1) Low values of h have small variance with variance increasing in direct proportion to h, leveling off at a certain point to form the sill.
2) At distances less than the range (the distance at which the variance levels off), points closer together are more likely to have similar values than points further apart, while at distances greater than the range, points have no influence on one another. The range therefore gives an idea of how large the search radius needs to be for a distance-weighted interpolation.
3) The semivariance when h is zero has a positive value referred to as the nugget and indicates the amount of non-spatially autocorrelated noise [
The semivariance displayed in an experimental variogram is modeled by a mathematical function chosen according to the shape of the experimental variogram. A spherical model is used when the variogram has a classic shape, and an exponential model when the approach to the sill is more gradual. A Gaussian model is used when the nugget is small and the variation is very smooth, and a linear model when there is no sill. A variogram containing a trend that has to be modeled separately becomes increasingly steep with larger values of h. If the nugget variance is large and the variogram shows no tendency to gradually vanish with smaller values of h, or the distance between observations is larger than the range (i.e. sample points are too far apart to influence one another), then interpolation is not reasonable and the best estimate is the overall mean of the observations. A noise-filled variogram showing no particular pattern may mean that the observations are too few. A variogram that dips at distances greater than the range to create a hole effect shows that the sample space may be too small to reflect some long-wavelength variation in the data [
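To make the notions of nugget, sill and range concrete, the sketch below computes an experimental semivariogram by binning point pairs into distance lags and defines a spherical model. It is an illustrative Python sketch only. It assumes the usual estimator, half the mean squared difference of the pairs in each lag, and one common parameterization of the spherical model in which `sill` denotes the total sill including the nugget. The function names and argument layout are hypothetical.

```python
import math

def empirical_semivariogram(samples, lag_size, n_lags):
    """Return [(mean distance, semivariance, pair count)] for each non-empty lag.

    samples: list of (x, y, z) tuples (hypothetical layout).
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_size)  # index of the lag bin containing h
            if k < n_lags:
                sq_sums[k] += (zi - zj) ** 2
                dist_sums[k] += h
                counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

def spherical(h, nugget, sill, rng):
    """Spherical model: rises from the nugget and levels off at the sill when h >= rng."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    s = h / rng
    return nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)
```

In practice the model parameters are fitted to the binned values, for example by weighted least squares. That fitting step is omitted here.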
(i) The trend function is fixed
(ii) The variogram is invariant in the entire area of interest
(iii) The target variable is (approximately) normally distributed.
These requirements are often not met and constitute a serious disadvantage of Ordinary Kriging [
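The Ordinary Kriging weights discussed above, obtained from a matrix with one additional row and column that forces the weights to sum to one, can be sketched in Python as follows. This sketch is written in terms of the semivariogram rather than the covariance, which is an equivalent formulation under stationarity. It is not the ArcGIS Geostatistical Analyst implementation. The names `solve` and `ordinary_kriging`, and the `variogram` callable passed by the user, are assumptions introduced for the example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(samples, x, y, variogram):
    """Return (estimate, kriging variance, weights) at location (x, y).

    samples: list of (xi, yi, zi) tuples; variogram: function of distance h.
    """
    n = len(samples)
    # Semivariances between all sample pairs, bordered by a column of ones.
    A = [[variogram(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in samples] + [1.0]
         for xi, yi, _ in samples]
    # Extra row: the unbiasedness constraint sum(weights) = 1.
    A.append([1.0] * n + [0.0])
    # Semivariances between each sample and the target location.
    b = [variogram(math.hypot(xi - x, yi - y)) for xi, yi, _ in samples] + [1.0]
    sol = solve(A, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights
```

For example, `ordinary_kriging(samples, x, y, lambda h: 1 - math.exp(-h / 2))` uses an exponential variogram with no nugget. In that case the estimate at a sampled location equals the observed value and the kriging variance there is zero.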
The accuracy evaluation indices commonly used include the Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). For n observations, with p the predicted value and o the observed value, these indices are evaluated using the expressions listed below:
ME is used for determining the degree of bias in the estimates often referred to as the bias [
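The four indices can be computed as in the short Python sketch below. It uses the standard definitions and assumes the residual is taken as predicted minus observed, so the sign of ME depends on that convention. The function name `prediction_errors` is hypothetical.

```python
import math

def prediction_errors(predicted, observed):
    """Return ME, MAE, MSE and RMSE for paired predictions and observations."""
    n = len(observed)
    residuals = [p - o for p, o in zip(predicted, observed)]
    me = sum(residuals) / n                   # signed: reveals systematic bias
    mae = sum(abs(r) for r in residuals) / n  # average error magnitude
    mse = sum(r * r for r in residuals) / n   # penalizes large errors more
    return {"ME": me, "MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}
```

Positive and negative residuals cancel in ME, so a value near zero indicates little bias but says nothing about the size of individual errors. RMSE is the usual complement for that purpose.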
The study area as described in
The data obtained from the field samples were in Microsoft Excel .xls format. The lowest elevation of the area is 27.018 m, while the highest elevation is 98.719 m above mean sea level. Preliminary data exploration using ESRI’s ArcGIS (version 9.3) shows that the data are not normally distributed, as shown by the distribution parameters in
Parameter | Value |
---|---|
Count | 462 |
Minimum value | 27.018 |
Maximum value | 98.719 |
Mean | 71.754 |
Standard deviation | 21.633 |
Skewness | −0.69162 |
Kurtosis | 2.0252 |
Median | 81.497 |
1st quartile | 52.994 |
3rd quartile | 88.886 |
The following contour fill surface shown in
Regularized Spline interpolation, implemented as a Radial Basis Function (RBF), with order 2 gives the contour fill map in
Kriging works on the assumption that the data set is normally distributed; therefore, we carried out a Box-Cox normalization on the data before implementing Kriging interpolation.
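For reference, the one-parameter Box-Cox transform can be sketched in Python as below, together with a moment-based skewness measure for checking its effect. This is a generic illustration, not the ArcGIS procedure. The function names `box_cox` and `skewness` and the parameter name `lam` are assumptions, and the value of the transformation parameter used in this study is not stated in the text.

```python
import math

def box_cox(values, lam):
    """One-parameter Box-Cox transform; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Limiting case of (v**lam - 1) / lam as lam approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Moment-based (population) skewness; 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

A parameter of 1 only shifts every value down by one unit and leaves the standard deviation, skewness and kurtosis unchanged. Reducing a negative skew requires a parameter greater than 1, and reducing a positive skew requires a parameter smaller than 1.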
The data were divided into training and test subsets in a ratio of 80:20 using the Geostatistical Analyst tool in ArcGIS, and optimal parameter values (generated by ArcGIS, which sets the best possible value for each parameter) were used for predictions on the training subset. The test subset was then used with these optimal parameters for validation. The data distribution parameters after Box-Cox normalization are as shown in
Parameter | Value |
---|---|
Count | 462 |
Minimum value | 26.018 |
Maximum value | 97.719 |
Mean | 70.754 |
Standard deviation | 21.633 |
Skewness | −0.69162 |
Kurtosis | 2.0252 |
Median | 80.497 |
1st quartile | 51.994 |
3rd quartile | 87.886 |

The contour fill maps generated using Kriging with the Gaussian model and the Spherical model respectively, for auto-calculated values of nugget, sill and mid-range, a lag size of 54.401, and lag number = 12, are as shown
The maps for IDW, Spline and Kriging after the optimal validation of the data are shown in Figures 6(a)-(c).
From the prediction errors tabulated in
METHOD | IDW | SPLINE | KRIGING |
---|---|---|---|
ME | 0.1589 | 0.0608 | 0.01989 |
RMSE | 3.488 | 2.101 | 4.374 |
ME is the mean prediction error and RMSE is the root mean square standardized error of prediction.
METHOD | IDW | SPLINE | KRIGING |
---|---|---|---|
ME | 0.396 | 0.06838 | 0.4912 |
RMSE | 3.176 | 1.892 | 4.012 |
ME is the mean prediction error and RMSE is the root mean square standardized error of prediction.
is not normally distributed or is difficult to normalize. Splines, on the other hand, use a physical model that varies with the elastic properties of the estimation function, and they tend to do well in modeling physical phenomena such as terrain. IDW uses a linear combination of values at sampled locations, with weights assigned by an inverse function of the distance between the location to be estimated and the sampled points. Though the weights are specified arbitrarily, ArcGIS provides an optimization function that assigns the weight most suitable for the points within the sampled data set. Predictions are influenced by this weight assignment but, in terms of error, are more reliable than those obtained using Kriging. It is acknowledged that Kriging does very well with covariate data such as temperature data, but the data have to be captured as randomly as possible, which is often not achieved. A good knowledge of the data used, as well as of the strengths and weaknesses of the available interpolation methods, is necessary in deciding which method to use for a given purpose.
In this study, Spline provides a more accurate model and result for elevation data obtained directly from a field survey that was neither homogeneously randomized nor normalized. The Spline interpolation result outside the data area also reaffirms that predictions by RBFs are not constrained to the range of measured values, i.e., predicted values can be above the maximum or below the minimum measured value. Tan and Xu [
Ikechukwu, M.N., Ebinne, E., Idorenyin, U. and Raphael, N.I. (2017) Accuracy Assessment and Comparative Analysis of IDW, Spline and Kriging in Spatial Interpolation of Landform (Topography): An Experimental Study. Journal of Geographic Information System, 9, 354-371. https://doi.org/10.4236/jgis.2017.93022