The Thin Plate Regression Spline (TPRS) was introduced as a means of smoothing out the differences between the satellite and in situ observations during the two-dimensional (2D) blending process in an attempt to calibrate ocean chlorophyll. The result was a remarkable improvement in the predictive capabilities of the penalized model making use of the satellite observations. In addition, the blending process has been extended to three dimensions (3D), since it is believed that most physical systems exist in three dimensions. In this article, an attempt to obtain more reliable and accurate predictions of ocean chlorophyll by extending the penalization process to three-dimensional blending is presented. Penalty matrices were computed using the integrated least squares (ILS) and the integrated squared derivative (ISD). Results obtained using the integrated least squares were not encouraging, but those obtained using the integrated squared derivative showed a reasonable improvement in predicting ocean chlorophyll, especially where the validation datum was surrounded by available data from the satellite data set. However, the process appeared computationally expensive, and on a general scale the results matched those of the other methods. In both cases, the procedure for implementing the penalization process in three-dimensional blending, with penalty matrices calculated using the two techniques, has been well established and can be used in any similar three-dimensional problem when necessary.
As stated clearly by [
The blending technique used by [
Under normal circumstances, one would expect in situ data to have a smooth relationship with satellite data in areas where observations are available from both data sets. This expectation is not met, indicating that some unknown factors may be affecting this relationship. Apart from the drawbacks of the algorithm for converting reflectance data to ocean chlorophyll concentration, which does not take into account weather conditions and water characteristics that in turn vary with ocean depth, other factors such as the season of the year and the day and location of sample collection could also influence this relationship. In addition, [
The lack of a smooth relationship between the satellite and in situ fields could be caused by the noisiness of both data sets as stated by [
This result motivated the main objective of the research, which is to extend the penalization of the blending technique from two dimensions to three dimensions (3D), since it is believed that expressing most physical systems in three dimensions brings them closer to reality. The process of penalization requires smoothing; thus the efficiency of the technique will depend on the choice of the smoothing parameters. The procedure for selecting the smoothing parameter for the three-dimensional penalization will follow some of the techniques used by [
The extension of smoothing the blending process from two to three dimensions is very similar to the extension of the blending process itself from two to three dimensions. Hence, the three-dimensional blending process can be expressed in basis function form by extending the two-dimensional basis function equation of [
$$f_{blend}(x, y) = f_{sat}(x, y) + \sum_{k=1}^{n} g_k(x, y) \tag{1}$$
where $g_k$ is the solution to $\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} = 0$, subject to the boundary conditions $\{0, x_i, y_i : i = 1, \cdots, k, k+1, \cdots, n, \Delta_k, x_k, y_k\}$. Extension to three dimensions was done with the addition of time, averaged over weeks, as the third variable. When this is done, Equation (1) can then be written in three dimensions as follows:
$$f_{blend}(x, y, z) = f_{sat}(x, y, z) + \sum_{k=1}^{n} g_k(x, y, z) \tag{2}$$
where $g_k$ is the actual solution to the three-dimensional partial differential equation
$$\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} + \frac{\partial^2 g}{\partial z^2} = 0 \tag{3}$$
subject to the boundary conditions $\{0, x_i, y_i, z_i : i = 1, \cdots, k, k+1, \cdots, n, \Delta_k, x_k, y_k, z_k\}$.
Equation (2) can be re-written with each of the $g_k$ separately as
$$f_{blend}(x, y, z) = f_{sat}(x, y, z) + \sum_{k=1}^{n} \beta_k \Delta_k(x, y, z) \tag{4}$$
where $\beta_k$ is set to the difference between the in situ and satellite values at boundary point $k$, and $\Delta_k(x, y, z)$, representing the basis, is the solution to
$$\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} + \frac{\partial^2 g}{\partial z^2} = 0$$
with external boundary points set to zero and the internal boundary points set to zero everywhere except at the $k$th position where it is set to 1.0, that is the knot of the basis. The blending process is performed using this knot as the sole boundary point, and the solution obtained is the basis for that knot. The sum of the product of each knot and its basis is added to the satellite field to give the final blended field of the basis function method.
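The construction of one basis per knot described above can be illustrated with a minimal sketch. It assumes a regular 3D grid, knots that fall on interior grid cells, and a plain Jacobi iteration as the Laplace solver; none of these choices is stated in the text, and the function names `laplace_basis` and `blend` are hypothetical.

```python
import numpy as np

def laplace_basis(shape, knots, k, n_iter=500):
    """Approximate the basis Delta_k on a regular 3D grid (illustrative only).

    shape : (nx, ny, nz) grid dimensions
    knots : grid indices (i, j, l) of the in situ (internal boundary) points
    k     : position in `knots` of the knot whose basis is wanted
    """
    g = np.zeros(shape)
    fixed = np.zeros(shape, dtype=bool)
    # External boundary points are held at zero.
    fixed[0, :, :] = fixed[-1, :, :] = True
    fixed[:, 0, :] = fixed[:, -1, :] = True
    fixed[:, :, 0] = fixed[:, :, -1] = True
    # Internal boundary points: zero everywhere except 1.0 at knot k.
    for idx, pt in enumerate(knots):
        pt = tuple(pt)
        fixed[pt] = True
        g[pt] = 1.0 if idx == k else 0.0
    for _ in range(n_iter):
        # Jacobi update: each free cell becomes the mean of its six neighbours,
        # which is the discrete form of the 3D Laplace equation.
        avg = np.zeros(shape)
        avg[1:-1, 1:-1, 1:-1] = (
            g[:-2, 1:-1, 1:-1] + g[2:, 1:-1, 1:-1]
            + g[1:-1, :-2, 1:-1] + g[1:-1, 2:, 1:-1]
            + g[1:-1, 1:-1, :-2] + g[1:-1, 1:-1, 2:]
        ) / 6.0
        g = np.where(fixed, g, avg)
    return g

def blend(f_sat, knots, beta, n_iter=500):
    """Blended field of Equation (4): f_sat plus the sum of beta_k * Delta_k."""
    bases = [laplace_basis(f_sat.shape, knots, k, n_iter)
             for k in range(len(knots))]
    return f_sat + sum(b * basis for b, basis in zip(beta, bases))
```

Because each basis equals 1.0 at its own knot and 0.0 at every other knot, the blended field in this sketch reproduces the satellite value plus $\beta_k$ exactly at knot $k$. A production solver would likely use a faster method than Jacobi iteration.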
In order to penalize the three dimensional blending model, it was necessary to represent it in a regression equation form. This can be done using Equation (4) where the term of interest is
$$\sum_{k=1}^{n} \beta_k \Delta_k(x, y, z)$$
From the two dimensional regression equations in [
$$Z_k = \sum_{j=1}^{n} \beta_j \Delta_j(x_j, y_j, z_j) + \varepsilon_k \tag{5}$$
where $\beta_j$ are unknown parameters to be estimated, $\Delta_j(x_j, y_j, z_j)$ is the basis corresponding to the point $j$, and $\varepsilon_k$ is the error term. This expression has been shown by [
$$Z_k = \beta_j \Delta_j(x_j, y_j, z_j) + \varepsilon_k$$
The estimation of the parameters can be done using the nonparametric technique of penalized regression splines. This can be done by solving the penalized least squares objective given as
$$\sum_{k=1}^{n} \left[ Z_k - \beta_k \Delta_k(x_k, y_k, z_k) \right]^2 - \lambda \sum \beta_k \Delta_k(x_k, y_k, z_k) \tag{6}$$
Fitting the penalized model requires the estimation of the parameters $\beta_k$ and the smoothing parameter $\lambda$. These can be obtained from generalized cross validation (gcv), which in turn requires the calculation of the collective penalty matrix $S$ that penalizes the difference between the satellite and in situ fields. This is done using the procedures of integrated least squares and integrated squared derivatives. Ridge regression ended up in complete decoupling in the two-dimensional case and is therefore not tried here.
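As a rough illustration of how $\beta$ and $\lambda$ might be estimated once a penalty matrix $S$ is available, the sketch below assumes the quadratic penalty form $\lVert z - X\beta \rVert^2 + \lambda \beta^T S \beta$ commonly used for penalized regression splines, together with the usual gcv score $n \cdot RSS / (n - \mathrm{tr}(A))^2$. These forms, and the names `X`, `z`, `fit_penalized` and `select_lambda`, are assumptions for illustration and are not taken from the text.

```python
import numpy as np

def fit_penalized(X, z, S, lam):
    """Fit one penalized model; return (beta, gcv score, trace of hat matrix).

    X   : (n_obs, n_basis) matrix of basis values at the observation points
    z   : (n_obs,) differences between in situ and satellite values
    S   : (n_basis, n_basis) penalty matrix
    lam : smoothing parameter lambda
    """
    n = len(z)
    # (X'X + lam * S)^{-1} X', computed with a linear solve instead of an inverse.
    P = np.linalg.solve(X.T @ X + lam * S, X.T)
    beta = P @ z
    edf = np.trace(X @ P)                 # trace of the influence (hat) matrix
    rss = np.sum((z - X @ beta) ** 2)     # residual sum of squares
    gcv = n * rss / (n - edf) ** 2
    return beta, gcv, edf

def select_lambda(X, z, S, lambdas):
    """Return the lambda in `lambdas` with the smallest gcv score, and that score."""
    scores = [fit_penalized(X, z, S, lam)[1] for lam in lambdas]
    best = int(np.argmin(scores))
    return lambdas[best], scores[best]
```

The gcv score is undefined when the trace equals the number of observations, which can happen for very small $\lambda$ when there are as many basis functions as data points, so the candidate sequence of $\lambda$ values should avoid that regime.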
Although this technique was not successful in the two-dimensional case because of the sparseness of data, it is described here to find out how it performs when more information is made available, and to provide a means of implementing it in three dimensions when necessary. Using the integrated least squares, a collective penalty in three dimensions was imposed on the term
$$\sum_{k=1}^{n} \beta_k \Delta_k(x, y, z)$$
By so doing, the penalty term could be written as
$$\iiint \left[ \delta(x, y, z) \right]^2 \, dx \, dy \, dz$$
By evaluating the three-dimensional integral, the $n \times n$ penalty matrix $S$ was obtained. That is,
$$S = \iiint \left[ \delta(x, y, z) \right]^2 \, dx \, dy \, dz$$
This was approximated by performing pairwise sums in each spatial direction as
$$S_{i,j} \approx \sum \Delta_i(x_q, y_q, z_q) \, \Delta_j(x_q, y_q, z_q)$$
by summing over all the grid points $(x_q, y_q, z_q)$. Once this matrix has been calculated, the model fitting process becomes straightforward. That is, given any sequence of smoothing parameters ($\lambda$'s) and the selected knots ($\beta$'s), models were fitted to choose the optimum gcv score and optimum trace value.
These were then used to fit the penalized model. The result obtained here was not encouraging, as predictions from these models were worse than those from the normal three-dimensional blending. This led to the examination of penalization using the integrated squared derivative in three dimensions.
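The pairwise-sum approximation of the integrated least squares penalty described above can be sketched as a single matrix product. The sketch assumes the $n$ bases are already available as gridded arrays stacked along the first axis; the function name `ils_penalty` and the optional `cell_volume` scaling are hypothetical additions (with `cell_volume=1.0` the result is the plain sum over grid points).

```python
import numpy as np

def ils_penalty(bases, cell_volume=1.0):
    """Approximate S[i, j] as the sum over grid points q of Delta_i(q) * Delta_j(q).

    bases       : array of shape (n, nx, ny, nz), one gridded basis per knot
    cell_volume : optional grid-cell volume that scales the sum towards the
                  integral (an assumption; 1.0 gives the unscaled sum)
    """
    n = bases.shape[0]
    B = bases.reshape(n, -1)        # each row is one basis flattened over the grid
    return cell_volume * (B @ B.T)  # all pairwise sums of products at once
```

Storing all bases at once needs memory proportional to $n$ times the number of grid points, which is in line with the memory demands of evaluating the penalty matrix noted for the validation exercise.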
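In the case of the integrated squared derivative, the penalty is calculated based on the derivative of the basis function. This is represented by
$$S = \int \left[ \left( \frac{du}{dx} \right)^2 + \left( \frac{du}{dy} \right)^2 + \left( \frac{du}{dz} \right)^2 \right] dx \, dy \, dz,$$
where $x$, $y$ and $z$ represent the spatial coordinates and
$$u = \sum_{k=1}^{n} \beta_k \Delta_k(x, y, z) = \bar{\partial}(x, y, z)$$
The penalty $S$ is obtained by evaluating the integral. Therefore, the $n \times n$ penalty matrix $S$ was obtained by evaluating
$$\begin{aligned}
S &= \int \left( \frac{du}{dx} \right)^2 dx \, dy \, dz + \int \left( \frac{du}{dy} \right)^2 dx \, dy \, dz + \int \left( \frac{du}{dz} \right)^2 dx \, dy \, dz \\
&= \int \beta^T \frac{d\bar{\partial}}{dx} \frac{d\bar{\partial}^T}{dx} \beta \, dx \, dy \, dz + \int \beta^T \frac{d\bar{\partial}}{dy} \frac{d\bar{\partial}^T}{dy} \beta \, dx \, dy \, dz + \int \beta^T \frac{d\bar{\partial}}{dz} \frac{d\bar{\partial}^T}{dz} \beta \, dx \, dy \, dz \\
&= \beta^T \int \frac{d\bar{\partial}}{dx} \frac{d\bar{\partial}^T}{dx} \, dx \, dy \, dz \, \beta + \beta^T \int \frac{d\bar{\partial}}{dy} \frac{d\bar{\partial}^T}{dy} \, dx \, dy \, dz \, \beta + \beta^T \int \frac{d\bar{\partial}}{dz} \frac{d\bar{\partial}^T}{dz} \, dx \, dy \, dz \, \beta
\end{aligned}$$
The penalty matrix $S$ is obtained by evaluating
$$\int \frac{d\bar{\partial}}{dx} \frac{d\bar{\partial}^T}{dx} \, dx \, dy \, dz + \int \frac{d\bar{\partial}}{dy} \frac{d\bar{\partial}^T}{dy} \, dx \, dy \, dz + \int \frac{d\bar{\partial}}{dz} \frac{d\bar{\partial}^T}{dz} \, dx \, dy \, dz$$
These integrals can be approximated by the sums of products of the differences over all pairs of basis functions to obtain the $n \times n$ matrix $S$. This is then used to calculate the gcv score. The $\lambda$ corresponding to the minimum gcv score is selected as the optimum smoothing parameter, which is then used to fit the penalized regression model.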
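The difference-based approximation of the integrated squared derivative penalty could look roughly as follows. The sketch assumes gridded bases on a regular grid and uses `numpy.gradient` (central differences in the interior, one-sided at the edges) as the differencing scheme; the text does not specify which differences were used, and the name `isd_penalty` and the `spacing` argument are hypothetical.

```python
import numpy as np

def isd_penalty(bases, spacing=(1.0, 1.0, 1.0)):
    """Approximate the ISD penalty matrix from gridded basis functions.

    bases   : array of shape (n, nx, ny, nz), one gridded basis per knot
    spacing : grid spacing along x, y and z (assumed regular)
    """
    n = bases.shape[0]
    S = np.zeros((n, n))
    for axis, h in enumerate(spacing, start=1):
        # Finite-difference derivative of every basis along this direction,
        # flattened so that each row holds one basis derivative.
        D = np.gradient(bases, h, axis=axis).reshape(n, -1)
        # Sum over grid points of products of derivatives, for all basis pairs.
        S += D @ D.T
    # Scale the sums by the cell volume to approximate the integrals.
    return S * np.prod(spacing)
```

A basis that is constant over the grid contributes a zero row and column to this matrix, which reflects the fact that the penalty acts only on variation in the fitted correction and not on its level.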
The data set used for validation is an extract obtained from the archive maintained by the National Oceanic and Atmospheric Administration (NOAA) National Oceanographic Data Center (NODC), comprising observations from 1997 to 2002 averaged over a grid size of 0.75° longitude by 0.75° latitude and over successive 8-day intervals throughout the year. This choice is justified by the fact that the same data set has been used in previous attempts to calibrate ocean chlorophyll, and by the need to compare the performance of previous models with that of the model described here.
Considering the large number of in situ observations available when looking at the data field in three dimensions and the large amount of computer memory required to evaluate the penalty matrix, only data for the month of May was used in the validation exercise. This month was selected because it had the highest number of in situ observations. Validation data sets of size 150 were randomly selected from the in situ data field. Predictions of these validation data sets from the fitted models were compared to those obtained from the other blending methods.
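The random selection of a validation set can be sketched as below, assuming the in situ observations are indexed from 0 to `n_obs - 1`. The function names, the fixed seed and the use of absolute differences as the comparison measure are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def split_validation(n_obs, n_val=150, seed=0):
    """Randomly hold out n_val in situ observations for validation.

    Returns (fitting indices, validation indices), both sorted.
    """
    rng = np.random.default_rng(seed)
    val_idx = rng.choice(n_obs, size=n_val, replace=False)  # no repeats
    fit_mask = np.ones(n_obs, dtype=bool)
    fit_mask[val_idx] = False
    return np.flatnonzero(fit_mask), np.sort(val_idx)

def abs_differences(observed, predicted):
    """Absolute difference between held-out observations and model predictions."""
    return np.abs(np.asarray(observed) - np.asarray(predicted))
```

Each blending method would be fitted on the remaining observations only, and its predictions at the held-out locations compared through `abs_differences`.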
The plots of
various correction methods. Again, the penalized model failed to improve on the results from the corrector factor method, though it occasionally produced differences smaller than those from either the corrector factor or the basis function methods, especially where the validation datum was surrounded by observations from the satellite field.
In this article the procedure for implementing smoothing on the blending process in three dimensions (3D) has been successfully established. This was achieved by expressing the interpolation formula used by the corrector factor of [
Making use of the cross-validation score calculated from the integrated least squares did not improve on the results; this is similar to what was obtained by [
The integrated squared derivative penalty is not expected to suffer from the same problems faced by the previous methods. This is because the action of the penalty is simply to try to flatten the smooth function in the vicinity of the omitted datum. If the smoothing parameter is large, it will increase the flattening and consequently pull the estimate far away from the omitted datum. The penalty obtained by this technique had very little or no effect on the smoothing function, hence the equality in results from the penalized and the other correction models.
Though the validation process did not show a remarkable improvement in the prediction potential of the penalized method, in several instances the penalized model had better results than the other tested models.
The main objective of this research was to extend the implementation of the thin plate regression spline of [
Though the penalized blended field obtained from the penalty matrix computed using the integrated least squares did not improve results, the process of implementing this technique in three dimensions was well established and could thus be used in other fields of application. The results obtained when the integrated squared derivative was used to obtain the penalty matrix were in general not different from those of the other corrective models, but gave better predictions in areas where the validation datum was surrounded by many satellite observations. This makes the result interesting, since the satellite field can now be used for prediction with more certainty.
With this result in three dimensions, it is hoped that the ocean life cycle could be modeled more accurately, since satellite-borne sensors which can provide sampling as required are available. In addition, it is believed that most physical and environmental problems exist in three dimensions and that modeling them as such brings the models closer to reality.
Onabid, M.A. and Wood, S. (2018) Estimating Ocean Chlorophyll Using the Penalized Three Dimensional (3D) Blending Technique. Open Journal of Marine Science, 8, 386-394. https://doi.org/10.4236/ojms.2018.83021