
The effects of centering response and explanatory variables as a way of simplifying fitted linear models in the presence of correlation are reviewed and extended to include nonlinear models, common in many biological and economic applications. In a nonlinear model, the use of a local approximation can modify the effect of centering. Even in the presence of uncorrelated explanatory variables, centering may affect linear approximations and related test statistics. An approach to assessing this effect in relation to intrinsic curvature is developed and applied. Mis-specification bias of linear versus nonlinear models also reflects this centering effect.

Applied probability models are mathematical constructs that have roots in both theory and observed data. They often reflect specific theoretical properties, but may simply be the application of an all-purpose linear model. Fitting a probability model to observed data requires careful consideration of potential difficulties and model sensitivities. These may include aspects of the model itself or anomalies in the structure of the database. As large-scale observational databases have become more common, unplanned and non-standard data patterns have become more frequent.

The stability of linear models can be affected by various properties of the model-data combination. Model sensitivity to rescaling and transformations of the response [

The simple centering of data in linear models is often applied as a component of standardizing the variables in a regression, re-centering the means of the variables at zero. It can also be seen as a way to lower correlation among explanatory variables in some cases, but it has limited, if any, effect on ANOVA-related test statistics and goodness-of-fit measures in models that include interaction terms. This is due to the geometry of the test statistics involved, which typically reflect standardized lengths of orthogonal projections that are invariant to centering. See for example [

Nonlinear regression models are also available for modeling data-based patterns. The use of centering in such models can be challenging to interpret. Such models are common in many biological, ecological and economic applications, and there is often less flexibility in the set of potential modifications available, as theory often informs and restricts model choice. Examples can be found in [

In this paper, centering effects are examined in relation to the use of linear approximation in nonlinear regression models. To begin, the effects of centering in linear models with interaction effects are reviewed. Centering effects in nonlinear models where linear approximation is employed to obtain tests of significance are then discussed. Even in the presence of uncorrelated explanatory variables and simple main effects, centering may significantly affect locally defined linear approximations and related test statistics. Local measures of nonlinearity are defined and used to assess these effects. We then investigate the mis-specification of linear versus nonlinear models and show that centering effects arise as a measure of bias. This is particularly relevant in high-dimensional data modeling, where centering is common as a first step in data analysis.

We can write a standard linear model in the form

typically assuming the

The use of centering in linear regression settings is typically suggested to lower correlation among the explanatory variables. For example, if

To see this, consider the simple linear regression model

Centering by definition will not affect the shape of the initial ^{2}
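The invariances described above can be checked numerically. The sketch below uses simulated data (the sample size, ranges and coefficients are hypothetical, not from the paper) and fits a simple linear regression to raw and to centered variables. It also confirms that the correlation between two distinct explanatory variables is unchanged by centering, since correlation is location invariant; any reduction in correlation therefore comes from product or power terms built from the centered variables.

```python
import numpy as np

# Simulated data; sizes, ranges and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(10.0, 20.0, size=n)
x2 = rng.uniform(5.0, 15.0, size=n)
y = 3.0 + 2.0 * x1 + rng.normal(size=n)


def center(v):
    """Return v with its sample mean subtracted."""
    return v - v.mean()


def simple_fit(x, y):
    """Least squares intercept/slope, residuals and R^2 for y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, resid, r2


coef_raw, res_raw, r2_raw = simple_fit(x1, y)
coef_c, res_c, r2_c = simple_fit(center(x1), center(y))

# Correlation between two distinct variables is location invariant.
corr_raw = np.corrcoef(x1, x2)[0, 1]
corr_c = np.corrcoef(center(x1), center(x2))[0, 1]

print("slope (raw, centered):", coef_raw[1], coef_c[1])
print("intercept after centering:", coef_c[0])
print("R^2 (raw, centered):", r2_raw, r2_c)
print("corr(x1, x2) (raw, centered):", corr_raw, corr_c)
print("max residual difference:", np.max(np.abs(res_raw - res_c)))
```

Only the intercept moves (to zero); slope, residuals and R^2 are unchanged.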

For the multivariate linear model

the same basic argument related to residuals holds and the results are similar. The centering of all variables has no effect on the measures of association between the x and y variables, including the least squares estimators
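The residual-based argument can be made explicit with a standard projection result (the Frisch-Waugh-Lovell theorem). In generic notation assumed here, which may differ from the paper's own, write the model with an explicit intercept and let $C$ be the centering matrix, which is symmetric and idempotent:

$$
y = \beta_0\mathbf{1} + X\beta + \varepsilon, \qquad C = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}, \qquad C\mathbf{1} = 0, \qquad C = C^{\top} = C^{2}.
$$

$$
\hat\beta = (X^{\top}CX)^{-1}X^{\top}Cy = \bigl((CX)^{\top}(CX)\bigr)^{-1}(CX)^{\top}(Cy), \qquad
e = y - \hat\beta_0\mathbf{1} - X\hat\beta = Cy - CX\hat\beta .
$$

The right-hand expression for $\hat\beta$ is the least squares slope vector obtained from the centered response $Cy$ and centered design $CX$, and the residual vector $e$ is the same in both fits, so SSE, $R^2$ and the ANOVA decomposition are unchanged.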

The addition of interaction terms

This implies that the main effect of

The centering of the data to limit potentially high levels of correlation between the interaction term

then the least squares estimate of the interaction term will not alter if
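A small simulation illustrates the interaction case. The data, ranges and coefficients below are hypothetical. Because the centered design spans the same column space as the raw design, the interaction coefficient is unchanged, while each main effect becomes the slope evaluated at the mean of the other variable. This mirrors the two bodyfat tables, where the Ab * Wt and Ab * Wrist rows agree while the main-effect rows differ.

```python
import numpy as np

# Simulated data with an interaction; all values are hypothetical.
rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(20.0, 40.0, size=n)
x2 = rng.uniform(50.0, 60.0, size=n)
y = 1.0 + 0.5 * x1 - 0.2 * x2 + 0.03 * x1 * x2 + rng.normal(size=n)


def fit_interaction(a, b, resp):
    """Least squares coefficients for resp ~ 1 + a + b + a*b."""
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, resp, rcond=None)
    return coef


m1, m2 = x1.mean(), x2.mean()
raw = fit_interaction(x1, x2, y)
cen = fit_interaction(x1 - m1, x2 - m2, y - y.mean())

# The interaction coefficient is identical in both fits.
print("interaction (raw, centered):", raw[3], cen[3])
# Each centered main effect equals the raw main effect plus the interaction
# coefficient times the mean of the other variable.
print("x1 main effect:", raw[1], cen[1], raw[1] + raw[3] * m2)
print("x2 main effect:", raw[2], cen[2], raw[2] + raw[3] * m1)

# Centering sharply lowers the correlation between x1 and the product term.
corr_prod_raw = np.corrcoef(x1, x1 * x2)[0, 1]
corr_prod_c = np.corrcoef(x1 - m1, (x1 - m1) * (x2 - m2))[0, 1]
print("corr(x1, x1*x2) (raw, centered):", corr_prod_raw, corr_prod_c)
```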

Consider the Penrose bodyfat ( [

Nonlinear regression models are typically developed and applied in areas such as toxicology, economics and ecology. See [

Variables | Wt | Ab | Wrist | Ab * Wt | Ab * Wrist |
---|---|---|---|---|---|
Coeff (Std Error) | −0.0005 (0.00041) | −0.0026 (0.00041) | −0.0017 (0.0056) | 0.000002 (0.000002) | 0.00003 (0.00003) |
p-value | 0 | 0.001 | 0.77 | 0.29 | 0.35 |

Variables | Wt | Ab | Wrist | Ab * Wt | Ab * Wrist |
---|---|---|---|---|---|
Coeff (Std Error) | 0.0002 (0.00005) | −0.0022 (0.00013) | 0.0037 (0.001) | 0.000002 (0.000002) | 0.00003 (0.00003) |
p-value | 0 | 0.001 | 0.001 | 0.29 | 0.35 |

Nonlinear regression models are subject to the effects of centering when using local linear approximation. The relative position of the response y vis-a-vis the solution locus

Local Geometry

Some geometry is briefly reviewed. Let

be expressed as

where

where again
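For orientation, the standard second-order expansion of the expectation surface used in curvature analysis is shown below, in the notation of Bates and Watts (1988), which may differ from the paper's own. Here $\dot V$ is the $n \times p$ gradient matrix, $\ddot V$ the $p \times p \times n$ array of second derivatives, and the quadratic term denotes the $n$-vector whose $i$-th entry is $(\theta-\hat\theta)^{\top}\ddot V_i(\theta-\hat\theta)$:

$$
\eta(\theta) \approx \eta(\hat\theta) + \dot V(\theta-\hat\theta) + \tfrac{1}{2}(\theta-\hat\theta)^{\top}\ddot V(\theta-\hat\theta).
$$

Each second-derivative vector $\ddot v_{jk}$ splits into a tangential (parameter-effects) part and a normal (intrinsic) part:

$$
\ddot v_{jk} = \underbrace{\dot V(\dot V^{\top}\dot V)^{-1}\dot V^{\top}\ddot v_{jk}}_{\text{tangential}} + \underbrace{\bigl(I_n - \dot V(\dot V^{\top}\dot V)^{-1}\dot V^{\top}\bigr)\ddot v_{jk}}_{\text{normal (intrinsic)}} .
$$

Only the normal part depends on the shape of the solution locus itself rather than on how it is parameterized.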

An intrinsic-curvature-based adjustment to standard ANOVA can be developed. See [

To investigate this curvature effect in relation to the hypothesis

with large values of the test statistic leading to rejection of

A further orthogonal decomposition gives a test of significance for curvature in the direction

where

See [

As in linear models, centering the response and some, if not all, of the explanatory variables would initially seem to have little or no effect on the underlying geometry of the model-data combination. A graph of the

In regard to standard m.l.e.-based analysis, the effects of centering will depend on the actual model itself. For example, consider the asymptotic growth model

where centering the data yields

If the differences
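The algebra behind this can be sketched, assuming the model takes the form $\eta = \theta_1(1 - e^{-\theta_2 t})$ used for the BOD data by Bates and Watts, and that centering replaces $y$ and $t$ by $y - \bar y$ and $t - \bar t$ while keeping the same functional form with new parameters $\phi_1, \phi_2$ (notation introduced here for illustration):

$$
\text{original: } E(y) = \theta_1 - \theta_1 e^{-\theta_2 t}, \qquad
\text{centered: } E(y - \bar y) = \phi_1\{1 - e^{-\phi_2(t - \bar t)}\}
\;\Longleftrightarrow\;
E(y) = (\bar y + \phi_1) - \phi_1 e^{\phi_2 \bar t}\, e^{-\phi_2 t}.
$$

Both are two-parameter submodels of the family $a - b\,e^{-ct}$. The original imposes $a - b = 0$, while the centered fit imposes $a - b = \bar y + \phi_1(1 - e^{\phi_2 \bar t})$. These constraints agree only along the curve $\phi_1 = \bar y / (e^{\phi_2 \bar t} - 1)$, so the centered fit is generally a different submodel rather than a reparameterization of the original, and its estimates and local linear approximation can change accordingly.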

The fundamental nature of a nonlinear regression model may be reflected in its possible forms under reparameterisation, especially in regard to re-expression as a linear model. If this is possible, then intrinsic curvature corrections tend to be of little value, and centering can be seen to have the same non-effect as in standard linear models with regard to the rescaled parameters. For example, the Michaelis-Menten model is given by

where

Letting
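A minimal numerical sketch of such a re-expression, assuming the common form y = θ1 x/(θ2 + x) and its reciprocal (Lineweaver-Burk) linearization 1/y = 1/θ1 + (θ2/θ1)(1/x); the paper's own substitution is not preserved here. The parameter values and design points are hypothetical, and the data are noise-free so that the identity holds exactly.

```python
import numpy as np

# Hypothetical Michaelis-Menten parameters and design points.
theta1, theta2 = 200.0, 0.06           # maximum rate, half-saturation constant
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = theta1 * x / (theta2 + x)          # noise-free expectation

# Reciprocal re-expression: 1/y = beta0 + beta1 * (1/x),
# with beta0 = 1/theta1 and beta1 = theta2/theta1.
z, w = 1.0 / x, 1.0 / y


def line_fit(z, w):
    """Least squares intercept and slope for w on z."""
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return coef


b0, b1 = line_fit(z, w)
print("recovered theta1, theta2:", 1.0 / b0, b1 / b0)

# On the linear scale, centering z and w leaves the slope unchanged and only
# moves the intercept, as in an ordinary linear model.
b0c, b1c = line_fit(z - z.mean(), w - w.mean())
print("slope (raw, centered):", b1, b1c)
```

With additive error on y, the reciprocal transformation also changes the error structure, so this is an identity for the expectation function rather than an equivalent fitting method.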

For models which may not be re-expressed as linear models, we can assess the change in curvature effect at a given

The SSE values may also differ and together these alter the relevant F-statistics for the local ANOVA analysis discussed above. Note that while the raw data plot is simply re-centered, the local approximation and analysis reflecting the model-data combination is more strongly affected by centering.

We examine these concepts further in the context of the asymptotic growth model applied to the BOD dataset found in Bates and Watts (1988). This is given by

The original and centered dataset is given in

The non-standard behavior of this model yields log-likelihood based confidence regions that are open at confidence levels above 95% in the

with related 2 by 2 by n second order Hessian matrix

where each
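To make the gradient matrix and the 2 × 2 × n array of second derivatives concrete, here is a sketch for the BOD model. It assumes the form η = θ1(1 − exp(−θ2 t)) and evaluates everything at the original-data estimates (19.14, 0.53) from the MLE table. Each n-vector of second derivatives is projected onto the tangent plane and its orthogonal complement; only the normal part reflects intrinsic curvature. The sketch stops short of the standardized (relative) curvature measures, which in Bates and Watts' treatment additionally rescale by s√p.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])   # BOD Time values
theta = np.array([19.14, 0.53])               # original-data MLE (from the table)


def derivatives(theta, t):
    """Gradient matrix (n x 2) and second-derivative array (2 x 2 x n)
    for eta = theta1 * (1 - exp(-theta2 * t))."""
    th1, th2 = theta
    e = np.exp(-th2 * t)
    V = np.column_stack([1.0 - e, th1 * t * e])
    A = np.empty((2, 2, t.size))
    A[0, 0] = 0.0                     # eta is linear in theta1
    A[0, 1] = A[1, 0] = t * e         # mixed partial derivative
    A[1, 1] = -th1 * t**2 * e
    return V, A


V, A = derivatives(theta, t)
# Orthogonal projector onto the tangent plane spanned by the columns of V.
P = V @ np.linalg.solve(V.T @ V, V.T)
A_tan = np.einsum("ij,klj->kli", P, A)   # tangential (parameter-effects) part
A_norm = A - A_tan                        # normal (intrinsic) part
print("lengths of normal components:\n", np.linalg.norm(A_norm, axis=2))
```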

Note that the m.l.e. here is not available in closed form; rather, it is defined by differentiating the log-likelihood with respect to each parameter and setting the resulting equations equal to zero. Here the log-likelihood is given by

Time (centered value) | Demand (centered value) |
---|---|
1 (−2.667) | 8.3 (−6.53) |
2 (−1.667) | 10.3 (−4.53) |
3 (−0.667) | 19 (4.17) |
4 (0.333) | 16 (1.17) |
5 (1.333) | 15.6 (0.77) |
7 (3.333) | 19.8 (4.97) |

Data | MLE | Std Error | t-statistic | p-value | SSE |
---|---|---|---|---|---|
Original Data | 19.14, 0.53 | 2.50, 0.20 | 7.67, 2.62 | 0.0015, 0.06 | 2.549 |
Centered Data | 8.10, 0.21 | 11.94, 0.27 | 0.68, 0.78 | 0.53, 0.48 | 2.878 |

Note that the effects of centering on the m.l.e. occur in this set of equations. Standard errors can be determined from the inverse of the Fisher information matrix.
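A minimal sketch of this computation for the original BOD data, assuming normal errors (so the m.l.e. coincides with least squares) and the model form η = θ1(1 − exp(−θ2 t)). It uses a plain Gauss-Newton iteration with step halving, a generic algorithm chosen here for illustration rather than the paper's stated procedure, and forms standard errors from the inverse Fisher information with σ² replaced by s² = SSE/(n − p). The printed values can be compared with the Original Data row of the MLE table.

```python
import numpy as np

# BOD data (Bates and Watts, 1988): Time and Demand columns of the data table.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([8.3, 10.3, 19.0, 16.0, 15.6, 19.8])


def model(theta, t):
    return theta[0] * (1.0 - np.exp(-theta[1] * t))


def jacobian(theta, t):
    e = np.exp(-theta[1] * t)
    return np.column_stack([1.0 - e, theta[0] * t * e])


def gauss_newton(theta, t, y, max_iter=200, tol=1e-12):
    """Least squares via Gauss-Newton with step halving."""
    theta = np.asarray(theta, dtype=float)
    sse = np.sum((y - model(theta, t)) ** 2)
    for _ in range(max_iter):
        r = y - model(theta, t)
        step, *_ = np.linalg.lstsq(jacobian(theta, t), r, rcond=None)
        lam = 1.0
        # Halve the step until the sum of squares decreases.
        while lam > 1e-10:
            cand = theta + lam * step
            cand_sse = np.sum((y - model(cand, t)) ** 2)
            if cand_sse < sse:
                break
            lam /= 2.0
        else:
            return theta, sse          # no further decrease is possible
        improvement = sse - cand_sse
        theta, sse = cand, cand_sse
        if improvement < tol * sse:
            break
    return theta, sse


theta_hat, sse = gauss_newton([20.0, 0.5], t, y)
n, p = y.size, theta_hat.size
s2 = sse / (n - p)                          # residual variance estimate
V = jacobian(theta_hat, t)
cov = s2 * np.linalg.inv(V.T @ V)           # inverse Fisher information, sigma^2 -> s^2
se = np.sqrt(np.diag(cov))
print("theta_hat:", theta_hat)
print("std errors:", se, " t-statistics:", theta_hat / se)
print("residual standard error:", np.sqrt(s2))
```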

For the original data, the resulting maximum likelihood or least squares value for

The curvature adjusted approach using ANOVA is given in

The measure

is examined here by comparing the SSCurv elements pre and post centering. This has a value pre-centering (0.40) that is approximately only 10% of its value post-centering (3.90). Whether this incurs statistically significant effects will depend on the local curvature of the surface, the manner in which the parameters enter into the model and the relative position of y in relation to

The use of linear models when the underlying model-data combination is nonlinear can lead to mis-specification error. It is interesting to consider this in relation to the centering effect, which can yield bias even where second-order intrinsic curvature is not significant. In many high-dimensional data-analytic techniques, centering the data is a standard first step. See for example [

To examine mis-specification generally in this setting, we begin by expressing a linear model as a function of two sets of variables

Assume that the variables of interest form the

Assume now that a true nonlinear model underlies the set of

Source | df | SS | MS | F-statistic | p-value |
---|---|---|---|---|---|
Regression | 2 | 31.93 | 15.97 | 2.44 | 0.21 |
Residual | 4 | 26.11 | 6.53 | | |
Curvature | 1 | 0.4 | 0.4 | 0.047 | 0.85 |
Modified Residual | 3 | 25.71 | 8.57 | | |
Regression + Curvature | 3 | 32.33 | 10.78 | 1.26 | 0.43 |
Total | 6 | 58.03 | | | |

Source | df | SS | MS | F-statistic | p-value |
---|---|---|---|---|---|
Regression | 2 | 1103.05 | 551.53 | 65.66 | 0.001 |
Residual | 4 | 33.58 | 8.4 | | |
Curvature | 1 | 3.91 | 3.91 | 0.4 | 0.572 |
Modified Residual | 3 | 29.67 | 9.89 | | |
Regression + Curvature | 3 | 1106.96 | 368.99 | 37.31 | 0.007 |
Total | 6 | 1136.63 | | | |
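The F-statistics in the two curvature-adjusted ANOVA tables can be re-derived from their SS and df entries. The layout below is read off the tables' own arithmetic (an inference, not a formula stated in the surviving text): the curvature SS is split off the residual SS, the regression F uses the residual mean square, and the curvature-related F ratios use the modified residual mean square. The pre- and post-centering labels follow the SSCurv values of 0.40 and 3.90 quoted in the text. For the post-centering table, the regression SS is taken as 1103.05, the value implied by its Regression + Curvature and Total rows (1106.96 − 3.91 and 1136.63 − 33.58).

```python
def curvature_anova(ss_reg, ss_res, ss_curv, df_reg=2, df_res=4, df_curv=1):
    """Derived rows of a curvature-adjusted ANOVA table (layout inferred
    from the tables' arithmetic)."""
    df_mod = df_res - df_curv
    ss_mod = ss_res - ss_curv                 # modified residual SS
    ms_res, ms_mod = ss_res / df_res, ss_mod / df_mod
    return {
        "F_regression": (ss_reg / df_reg) / ms_res,
        "modified_residual_SS": ss_mod,
        "F_curvature": (ss_curv / df_curv) / ms_mod,
        "F_regression_plus_curvature":
            ((ss_reg + ss_curv) / (df_reg + df_curv)) / ms_mod,
    }


pre = curvature_anova(31.93, 26.11, 0.40)      # pre-centering table
post = curvature_anova(1103.05, 33.58, 3.91)   # post-centering table
for label, rows in (("pre-centering", pre), ("post-centering", post)):
    print(label, {k: round(v, 3) for k, v in rows.items()})
```

The p-values would additionally need an F-distribution upper-tail function, for example `scipy.stats.f.sf(F, dfn, dfd)`.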

where

where

where

If we fit the original linear model, mis-specification effects arise as we will use (i)

If the actual data are also centered, it follows that a data-based centering effect will further occur. Letting

The effect of centering the data here may be to worsen the mis-specification related biasing effect. This will depend on how the linear and nonlinear elements in the W vector and
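The general mechanism behind mis-specification bias can be illustrated with the familiar omitted-term formula: if the true mean is Xβ + g and only X is fitted, the least squares estimate has expectation β + (X'X)^{-1}X'g. The sketch below is a generic illustration with a hypothetical design, coefficients and quadratic term g, not the paper's W-vector construction. It uses a noise-free response so that the fitted coefficients equal β plus the bias exactly.

```python
import numpy as np

# Hypothetical design: the true mean has a nonlinear term g(x) that a
# straight-line fit omits.
x = np.linspace(1.0, 7.0, 13)
beta = np.array([2.0, 1.5])                   # true intercept and slope
g = 0.4 * x**2                                # omitted nonlinear component
mu = beta[0] + beta[1] * x + g                # noise-free mean response

X = np.column_stack([np.ones_like(x), x])
fitted, *_ = np.linalg.lstsq(X, mu, rcond=None)

# Mis-specification bias: E(beta_hat) - beta = (X'X)^{-1} X' g.
bias = np.linalg.solve(X.T @ X, X.T @ g)
print("fitted:", fitted, " beta + bias:", beta + bias)
print("bias (intercept, slope):", bias)
```

Here x is symmetric about its mean, so the slope bias is exactly 0.4 × 2 × mean(x) = 3.2.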

Model sensitivity and stability are essential components of applied research using probability models. They are functions of the model structure, the data structure and the inferential or estimation method used to fit the model. This is most pronounced when nonlinear models are employed and linear approximation is a component of the inferential process. Wald statistics are the most interpretable in this setting, and in the case of nonlinear regression with normal error, the curvature of the regression surface is a key component affecting the accuracy of the inferential process. The underlying nature of the model is also relevant, with linearity on some scale being reflected in the intrinsic curvature calculations. These issues often arise in the analysis of high-dimensional datasets, where centering is a standard first step.

If we examine centering in the context of the original point cloud, its effects seem non-existent. But the information in the data is assessed in relation to the assumed linear or nonlinear model, so the properties of the assumed model are relevant to the estimation and testing of parameters defined within the fitted local model. The position of the response vector y in n-space relative to the p-dimensional nonlinear regression surface, together with the intrinsic curvature, defines a local frame of reference for inference, and even simple centering has effects in nonlinear models, both generally and when linear approximation is employed. Nonlinear models often reflect theoretical results for carefully chosen parameter and data scaling. In conclusion, centering data for nonlinear regression models should be applied and interpreted carefully.

We thank the Editor and the referee for their comments.

Michael Brimacombe (2016) Local Curvature and Centering Effects in Nonlinear Regression Models. Open Journal of Statistics, 6, 76-84. doi: 10.4236/ojs.2016.61010