The impact of long-memory on the Before-After-Control-Impact (BACI) design and on a commonly used nonparametric alternative, Randomized Intervention Analysis (RIA), is examined. It is shown that the corrections based on short-memory processes are not adequate. Long-memory series are also known to exhibit spurious structural breaks that can be mistakenly attributed to an intervention. Two examples from the literature are used as illustrations.
Ecological studies often involve data collected over time. Examples are observations of population densities such as the relative abundance of the white sea urchin (Lytechinus anamesus) in an area offshore the San Onofre Nuclear Generating Station (Schroeter et al. [
Researchers have explored the relationships among long-memory, aggregation, and structural breaks in time series [
Tests which seek to detect breaks due to an intervention in a series whose true data generating process is long memory are in danger of detecting spurious breaks. Later in this paper we examine two examples taken from the literature. They were chosen because they present instances in the literature where a significant intervention effect was detected when in fact no intervention occurred. A possible explanation for this is the presence of strong correlation in the data, perhaps long-memory, which could have produced the spurious detection of a significant intervention effect. This possibility provided the impetus for this manuscript.
The Before-After-Control-Impact (BACI) design [
In lieu of ignoring the autocorrelation, strategies have been proposed to adjust the BACI and RIA analyses for autocorrelated data. One approach is parametric: estimate the correlation structure using an assumed (short memory) model and use the estimated correlation to adjust the 2-sample t-test and confidence interval in the BACI analysis (see Bence [
This paper is structured as follows: Section 2 contains definitions and some simple derivations. In Section 3 we conduct numerical studies to illustrate the inadequacy of short-memory corrections for long-memory series. Section 4 contains two examples from the literature, and Section 5 concludes the paper with a brief discussion.
The following facts from time series theory and methodology may be found in standard texts such as [
$$\rho_X(h) := \mathrm{Corr}(X_t, X_{t+h}), \quad h = \pm 1, \pm 2, \pm 3, \cdots.$$
A short-memory time series has an ACF that decays at an exponential rate, i.e., $\rho_X(h)/r^h$ approaches a positive constant as $h \to \infty$ for some $0 < r < 1$. A long-memory time series has an ACF that decays at a hyperbolic rate: $\rho_X(h)\,h^{r}$ approaches a positive constant as $h \to \infty$ for some $0 < r < 1$.
Suppose $\{W_t\}$ is a white noise process. We consider the following short-memory process, the autoregressive model of order 1 (AR(1)):
$$X_t = \phi X_{t-1} + W_t, \quad |\phi| < 1.$$
The ACF of the AR(1) is given by
$$\rho_X(h) = \phi^h, \quad h = 0, 1, 2, \cdots$$
Long-memory processes, as described by fractionally differenced white noise (FD(d)), define the time series $\{X_t\}$ by
$$(1 - B)^d X_t = W_t, \quad -0.5 < d < 0.5,$$
where $B$, defined by $B X_t = X_{t-1}$, is the backshift operator and the fractional differencing operator $(1 - B)^d$ has the polynomial expansion
$$(1 - B)^d = \sum_{j \ge 0} \pi_j B^j, \quad \pi_j = \frac{\Gamma(j - d)}{\Gamma(j + 1)\,\Gamma(-d)}.$$
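The coefficients $\pi_j$ can be evaluated without calling the gamma function, because the ratio of successive terms simplifies to $\pi_j / \pi_{j-1} = (j - 1 - d)/j$ with $\pi_0 = 1$. The following is a minimal Python sketch of that recursion (Python is used purely for illustration; the function name `fd_pi_coefficients` is hypothetical and not from the paper).

```python
def fd_pi_coefficients(d, m):
    """Return [pi_0, ..., pi_m] in the expansion (1 - B)^d = sum_j pi_j B^j."""
    pi = [1.0]  # pi_0 = Gamma(-d) / (Gamma(1) * Gamma(-d)) = 1
    for j in range(1, m + 1):
        # Gamma(j - d) / Gamma(j + 1) = ((j - 1 - d) / j) * Gamma(j - 1 - d) / Gamma(j)
        pi.append(pi[-1] * (j - 1 - d) / j)
    return pi


coeffs = fd_pi_coefficients(0.3, 5)
print([round(c, 4) for c in coeffs])
```

For $d = 0.3$ the first coefficients are $\pi_1 = -d = -0.3$ and $\pi_2 = -d(1-d)/2 = -0.105$, which agrees with the binomial expansion of $(1 - B)^d$.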
The exact form of the ACF of the FD(d) process is known (see [
$$\rho_X(h) = \frac{\Gamma(h + d)\,\Gamma(1 - d)}{\Gamma(h - d + 1)\,\Gamma(d)} = \prod_{0 < i \le h} \frac{i - 1 + d}{i - d}, \quad h = 1, 2, \cdots \tag{1}$$
The AR(1) and FD(d) are both stationary processes. Fractionally differenced white noise is a classic long-memory time series, having an ACF that decays at a hyperbolic rate: $\rho(h)\,h^{1-2d}$ approaches a positive constant as $h \to \infty$. The AR(1) model is short-memory since its ACF converges to zero at an exponential rate as $h \to \infty$.
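The contrast between exponential and hyperbolic decay is easy to see numerically. The sketch below, written in Python for illustration only, evaluates the AR(1) ACF and both forms of the FD(d) ACF in (1); the function names are hypothetical, and the gamma form is coded for $0 < d < 0.5$ so that every gamma argument is positive.

```python
import math


def ar1_acf(phi, h):
    """ACF of the AR(1) process at lag h."""
    return phi ** h


def fd_acf_product(d, h):
    """ACF of FD(d) at lag h via the product form in (1)."""
    rho = 1.0
    for i in range(1, h + 1):
        rho *= (i - 1 + d) / (i - d)
    return rho


def fd_acf_gamma(d, h):
    """ACF of FD(d) at lag h via the gamma form in (1); assumes 0 < d < 0.5."""
    log_rho = (math.lgamma(h + d) + math.lgamma(1 - d)
               - math.lgamma(h - d + 1) - math.lgamma(d))
    return math.exp(log_rho)


for h in (1, 10, 100):
    print(h, ar1_acf(0.9, h), fd_acf_product(0.3, h), fd_acf_gamma(0.3, h))
```

From the gamma form, the constant that $\rho(h)\,h^{1-2d}$ approaches is $\Gamma(1-d)/\Gamma(d)$, so the FD(0.3) autocorrelation at lag 100 is still several percent while the AR(1) autocorrelation with $\phi = 0.9$ is already negligible.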
Consider the BACI design. Suppose $Y_1, \cdots, Y_n$ are observations on the impact site, $X_1, \cdots, X_n$ are observations on the control site and $D_1, \cdots, D_n$ are the differences between the two:
$$D_t := Y_t - X_t, \quad t = 1, \cdots, n.$$
Assume $\{Y_t\}$ and $\{X_t\}$ are jointly stationary, yielding a stationary $\{D_t\}$, and that $\mathrm{Var}(D_i) = \sigma^2$. As is well-known, if $D_1, \cdots, D_n$ form a random sample then $\mathrm{Var}(\bar{D}) = \sigma^2/n$. However, when $D_1, \cdots, D_n$ are realizations from a stationary time series with autocorrelation function $\rho_D(h)$, then
$$\mathrm{Var}(\bar{D}) = \frac{\sigma^2}{n}\left[1 + 2\sum_{h=1}^{n-1}\left(1 - \frac{h}{n}\right)\rho_D(h)\right]. \tag{2}$$
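For completeness, the expression for the variance of the mean follows from expanding the variance of a sum and grouping the $n - |h|$ pairs of observations that are $h$ time steps apart. The short derivation below uses only stationarity, $\rho_D(0) = 1$ and $\rho_D(-h) = \rho_D(h)$.

```latex
\begin{aligned}
\mathrm{Var}(\bar{D})
  &= \frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\mathrm{Cov}(D_s, D_t)
   = \frac{\sigma^2}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\rho_D(t - s) \\
  &= \frac{\sigma^2}{n^2}\sum_{h=-(n-1)}^{n-1}\bigl(n - |h|\bigr)\,\rho_D(h) \\
  &= \frac{\sigma^2}{n}\left[1 + 2\sum_{h=1}^{n-1}\left(1 - \frac{h}{n}\right)\rho_D(h)\right].
\end{aligned}
```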
The quantity
$$1 + 2\sum_{h=1}^{n-1}\left(1 - \frac{h}{n}\right)\rho_D(h) \tag{3}$$
is sometimes called the variance correction factor [
The estimated correction factor
$$1 + 2\sum_{h=1}^{n-1}\left(1 - \frac{h}{n}\right)\hat{\rho}(h) \tag{4}$$
is used to adjust the usual estimate of $\mathrm{Var}(\bar{D})$, $s^2/n$, when $D_1, \cdots, D_n$ are realizations of a stationary time series. Bence [
The BACI design uses a 2-sample t-test to compare the pre-intervention and post-intervention control-impact mean differences. Let $\bar{D}_{Pre}$ denote the mean of the $n_1$ pre-intervention differences, $\bar{D}_{Post}$ the mean of the $n_2$ post-intervention differences, with $n_1 + n_2 = n$, and let $\widehat{SE}(\bar{D}_{Post} - \bar{D}_{Pre})$ be the estimated standard error of $\bar{D}_{Post} - \bar{D}_{Pre}$.
The estimated standard error $\widehat{SE}(\bar{D}_{Post} - \bar{D}_{Pre})$ is calculated using (2) and the estimated variance correction (4). The estimates $\hat{\sigma}_D$ and $\hat{\rho}_D$ are obtained from pooling the two sets of differences. Note that $\widehat{SE}(\bar{D}_{Post} - \bar{D}_{Pre})$ ignores the correlation between the two samples, whereas the true standard error is
$$SE(\bar{D}_{Post} - \bar{D}_{Pre}) = \sqrt{\mathrm{Var}(\bar{D}_{Post}) + \mathrm{Var}(\bar{D}_{Pre}) - 2\,\mathrm{Cov}(\bar{D}_{Post}, \bar{D}_{Pre})}.$$
This is another reason for the inaccuracy of the method in the presence of long-memory; for a short-memory process the problem will not be as severe.
The assumption that $\{X_t\}$ and $\{Y_t\}$ are jointly stationary allows the use of the 2-sample t-test with equal variances. Combined with the null hypothesis of no intervention effect, this suggests the following approximate test statistic for the 2-sample t-test:
$$\frac{\bar{D}_{Post} - \bar{D}_{Pre}}{\widehat{SE}(\bar{D}_{Post} - \bar{D}_{Pre})} \sim t_{n_1 + n_2 - 2}.$$
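A minimal Python sketch of the AR(1)-corrected statistic may help make the construction concrete. It is not the paper's code, and it rests on two loudly stated assumptions: "pooling" is taken to mean centring each period at its own mean and combining the residuals, and $\hat{\phi}$ is taken to be the lag-1 sample autocorrelation of those pooled residuals. The names `ar1_correction` and `baci_t_statistic` are hypothetical.

```python
import math


def ar1_correction(phi, n):
    """Correction factor (3) evaluated with the AR(1) ACF rho(h) = phi**h."""
    return 1.0 + 2.0 * sum((1.0 - h / n) * phi ** h for h in range(1, n))


def baci_t_statistic(pre, post):
    """Return (t, se, phi_hat) for the AR(1)-corrected 2-sample t statistic."""
    n1, n2 = len(pre), len(post)
    mean_pre = sum(pre) / n1
    mean_post = sum(post) / n2
    # ASSUMPTION: pool by centring each period at its own mean.
    resid = [x - mean_pre for x in pre] + [x - mean_post for x in post]
    ss = sum(e * e for e in resid)
    s2 = ss / (n1 + n2 - 2)  # pooled variance estimate
    # ASSUMPTION: phi_hat is the lag-1 sample autocorrelation of the residuals.
    phi_hat = sum(a * b for a, b in zip(resid, resid[1:])) / ss
    var_pre = s2 / n1 * ar1_correction(phi_hat, n1)
    var_post = s2 / n2 * ar1_correction(phi_hat, n2)
    # As in the text, the covariance between the two means is ignored.
    se = math.sqrt(var_pre + var_post)
    return (mean_post - mean_pre) / se, se, phi_hat


t, se, phi_hat = baci_t_statistic([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, se, phi_hat)
```

With positive $\hat{\phi}$ the corrected standard error exceeds the uncorrected one, so the corrected statistic is smaller in absolute value.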
The use of the t-distribution depends on asymptotic theory which requires very large samples when the process is long-memory. For smaller samples it is not exactly correct, but it is difficult to work out the exact distribution ( [
An alternative to using a correction factor for the standard error of the mean is the use of nonparametric methods. The procedure is to resample blocks (the block bootstrap), the blocks being chosen large enough to properly capture the autocorrelation. The permutation test used in RIA is essentially block resampling from blocks of size one. This can be effective where the correlation structure is that of a short-memory process, since only small blocks are required. However, when long-memory is present the blocks must be large, requiring very large samples.
The problems that correlated data pose for RIA have been studied previously. One examination is in Carpenter et al. [
All simulations were run using the R environment [
The correction factors for FD(d) and AR(1) processes can be computed from (1) and (3) when the values of $d$ and $\phi$ are known.
n | AR(0.7) | AR(0.9) | AR(0.99) | FD(0.3) | FD(0.49) |
---|---|---|---|---|---|
5 | 3.08 | 7.19 | 8.80 | 2.40 | 8.57 |
10 | 4.16 | 12.03 | 18.12 | 3.61 | 17.82 |
25 | 5.04 | 17.56 | 43.44 | 6.24 | 45.10 |
50 | 5.36 | 18.90 | 78.00 | 9.46 | 89.84 |
100 | 5.51 | 19.00 | 125.79 | 14.33 | 178.08 |
The table above lists correction factors for the FD(d) process with $d = 0.3, 0.49$ and for the AR(1) process with $\phi = 0.7, 0.9, 0.99$, at several sample sizes.
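As an illustration of how such correction factors can be evaluated, the following Python sketch plugs the AR(1) ACF and the FD(d) ACF from (1) into formula (3). It is a sketch under stated assumptions rather than the code used for the paper (whose simulations were run in R); the function names are hypothetical, and the parameter values $\phi = 0.7$ and $d = 0.3$ are chosen only as examples.

```python
def ar1_acf(phi, max_lag):
    """AR(1) autocorrelations rho(0), ..., rho(max_lag)."""
    return [phi ** h for h in range(max_lag + 1)]


def fd_acf(d, max_lag):
    """FD(d) autocorrelations rho(0), ..., rho(max_lag) via the product in (1)."""
    rho = [1.0]
    for i in range(1, max_lag + 1):
        rho.append(rho[-1] * (i - 1 + d) / (i - d))
    return rho


def correction_factor(rho, n):
    """Formula (3): 1 + 2 * sum_{h=1}^{n-1} (1 - h/n) * rho(h)."""
    return 1.0 + 2.0 * sum((1.0 - h / n) * rho[h] for h in range(1, n))


for n in (5, 10, 25, 50, 100):
    cf_ar = correction_factor(ar1_acf(0.7, n), n)
    cf_fd = correction_factor(fd_acf(0.3, n), n)
    print(n, round(cf_ar, 2), round(cf_fd, 2))
```

Other values of $\phi$ and $d$ can be substituted in the loop. The AR(1) factor levels off near $(1 + \phi)/(1 - \phi)$ as $n$ grows, while the FD(d) factor keeps increasing.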
The most striking difference between the AR(1) and the FD(d) correction factors is the rate at which they increase as the sample size increases. The FD(d) correction factors grow faster because the autocorrelations persist longer. For small sample sizes the AR(1) corrections tend to be equal to or slightly larger than the FD(d) corrections, while for large sample sizes they are too small.
Bence [
To investigate the size $\alpha$ of 2-sample t-tests when the data are from a long-memory process, series of various lengths were simulated for several values of $d$. Each simulated series was split into two equal halves, which served as the two samples. The case $d = 0$ corresponds to white noise. The t-test statistic was calculated both with and without the AR(1) variance correction, and the null hypothesis was rejected if the test statistic exceeded the appropriate critical value. The proportion of rejections was the estimated size of the test. The AR(1) correction used the value of $\phi$ estimated from the simulated series. Results are in
For white noise ($d = 0$) processes the uncorrected and AR(1)-corrected tests have size approximately equal to the nominal size. As $d$ increases, the sizes of the tests increase. Though the AR(1) correction performs better than no correction, its performance is very poor for strong long-memory. Also, in the presence of long-memory the size of the test moves further above the nominal size as the sample size increases, the performance being worse the stronger the long-memory.
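The simulation design described above can be sketched in a few dozen lines of Python. This is an illustrative reconstruction, not the paper's R code, and several choices are assumptions: Gaussian FD(d) series are generated exactly with the Durbin-Levinson recursion, the test is two-sided, $\hat{\phi}$ is the lag-1 autocorrelation of the pooled within-sample residuals, and the critical value 2.024 is the 0.975 quantile of the t-distribution with 38 degrees of freedom (for $n = 40$, $\alpha = 0.05$). All function names are hypothetical.

```python
import math
import random


def fd_acf(d, max_lag):
    rho = [1.0]
    for i in range(1, max_lag + 1):
        rho.append(rho[-1] * (i - 1 + d) / (i - d))
    return rho


def durbin_levinson(rho):
    """One-step prediction coefficients and error variances (unit variance)."""
    coeffs, variances = [[]], [1.0]
    for t in range(1, len(rho)):
        prev = coeffs[t - 1]
        kappa = (rho[t] - sum(prev[j] * rho[t - 1 - j]
                              for j in range(t - 1))) / variances[t - 1]
        cur = [prev[j] - kappa * prev[t - 2 - j] for j in range(t - 1)] + [kappa]
        coeffs.append(cur)
        variances.append(variances[t - 1] * (1.0 - kappa * kappa))
    return coeffs, variances


def simulate(coeffs, variances, rng):
    """Draw one Gaussian series by adding innovations to one-step predictions."""
    x = []
    for t in range(len(variances)):
        mean = sum(coeffs[t][k] * x[t - 1 - k] for k in range(t))
        x.append(mean + math.sqrt(variances[t]) * rng.gauss(0.0, 1.0))
    return x


def t_statistic(series, corrected):
    half = len(series) // 2
    pre, post = series[:half], series[half:]
    n1, n2 = len(pre), len(post)
    m1, m2 = sum(pre) / n1, sum(post) / n2
    resid = [x - m1 for x in pre] + [x - m2 for x in post]
    ss = sum(e * e for e in resid)
    s2 = ss / (n1 + n2 - 2)
    cf1 = cf2 = 1.0
    if corrected:  # ASSUMPTION: AR(1) correction with lag-1 estimate of phi
        phi = sum(a * b for a, b in zip(resid, resid[1:])) / ss
        cf1 = 1.0 + 2.0 * sum((1.0 - h / n1) * phi ** h for h in range(1, n1))
        cf2 = 1.0 + 2.0 * sum((1.0 - h / n2) * phi ** h for h in range(1, n2))
    return (m2 - m1) / math.sqrt(s2 * (cf1 / n1 + cf2 / n2))


def estimated_size(d, n, reps, crit, corrected, seed=1):
    """Proportion of two-sided rejections among reps null FD(d) series."""
    rng = random.Random(seed)
    coeffs, variances = durbin_levinson(fd_acf(d, n - 1))
    rejections = 0
    for _ in range(reps):
        t = t_statistic(simulate(coeffs, variances, rng), corrected)
        rejections += abs(t) > crit
    return rejections / reps


print(estimated_size(0.4, 40, 200, 2.024, corrected=False))
print(estimated_size(0.4, 40, 200, 2.024, corrected=True))
```

Because the same seed is used for both calls, the corrected and uncorrected tests are applied to identical simulated series, which makes the comparison between them less noisy.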
Carpenter et al. appear to have introduced RIA in [
d | n | α = 0.10, None | α = 0.10, AR | α = 0.05, None | α = 0.05, AR | α = 0.01, None | α = 0.01, AR |
---|---|---|---|---|---|---|---|
0.0 | 20 | 0.107 | 0.105 | 0.052 | 0.044 | 0.009 | 0.003 |
0.0 | 40 | 0.099 | 0.101 | 0.050 | 0.047 | 0.010 | 0.006 |
0.0 | 60 | 0.100 | 0.100 | 0.051 | 0.049 | 0.010 | 0.008 |
0.0 | 80 | 0.099 | 0.100 | 0.050 | 0.048 | 0.010 | 0.008 |
0.0 | 100 | 0.102 | 0.101 | 0.049 | 0.047 | 0.010 | 0.009 |
0.2 | 20 | 0.221 | 0.157 | 0.138 | 0.075 | 0.048 | 0.009 |
0.2 | 40 | 0.275 | 0.193 | 0.189 | 0.110 | 0.083 | 0.026 |
0.2 | 60 | 0.309 | 0.218 | 0.225 | 0.137 | 0.107 | 0.041 |
0.2 | 80 | 0.332 | 0.240 | 0.247 | 0.154 | 0.127 | 0.052 |
0.2 | 100 | 0.352 | 0.254 | 0.268 | 0.169 | 0.145 | 0.063 |
0.4 | 20 | 0.361 | 0.211 | 0.271 | 0.114 | 0.135 | 0.017 |
0.4 | 40 | 0.472 | 0.287 | 0.382 | 0.181 | 0.242 | 0.053 |
0.4 | 60 | 0.529 | 0.327 | 0.444 | 0.228 | 0.311 | 0.090 |
0.4 | 80 | 0.571 | 0.364 | 0.491 | 0.269 | 0.358 | 0.119 |
0.4 | 100 | 0.603 | 0.397 | 0.520 | 0.293 | 0.398 | 0.149 |
test. The permutation test assumes the differences are independent, an assumption violated by data possessing long-memory.
Computing the exact p-value for a permutation test can be computationally taxing even for moderate sample sizes. The p-value can be approximated via Monte Carlo methods, using random assignments of the data to each of the two samples. The estimate of the p-value is taken to be the ratio of the number of random assignments resulting in an absolute mean difference that meets or exceeds the observed difference to the total number of random assignments. Since the aim of the simulation is to approximate the distribution of the p-value returned by RIA when it is applied to an FD(d) time series, Monte Carlo methods are again applied to simulate many realizations from an FD(d) process, and an approximate p-value is calculated for each.
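The Monte Carlo approximation of the permutation p-value can be sketched as follows. This is a minimal Python illustration of the procedure just described, not the paper's implementation; the function name `ria_p_value`, its default of 10,000 permutations and the fixed seed are assumptions made for the example.

```python
import random


def ria_p_value(pre, post, n_perm=10000, seed=0):
    """Approximate permutation p-value for the absolute mean difference."""
    rng = random.Random(seed)
    n1 = len(pre)
    pooled = list(pre) + list(post)
    n2 = len(pooled) - n1
    observed = abs(sum(post) / n2 - sum(pre) / n1)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the observations to the two samples.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[n1:]) / n2 - sum(pooled[:n1]) / n1)
        # Small tolerance so ties are counted despite rounding error.
        if diff >= observed - 1e-12:
            hits += 1
    return hits / n_perm


p_shift = ria_p_value([0.0] * 10, [10.0] * 10, n_perm=2000)
p_equal = ria_p_value([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], n_perm=100)
print(p_shift, p_equal)
```

The shuffle treats every ordering of the observations as equally likely, which is exactly the independence assumption that long-memory data violate.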
Carpenter et al. recognized that RIA is affected by autocorrelation. They simulated data from short-memory AR(1) and MA(1) processes and ran these through RIA, also checking for true rejections when an intervention of size $ms$ occurred, that is, a size that is a multiple $m$ of the standard deviation $s$. As a result they recommend a correction to the p-value when dealing with positive autocorrelations, i.e., using a declared p-value of 0.01 to get a true p-value of 0.05.
n | d = 0.1, Q1 | d = 0.1, Median | d = 0.1, Q3 | d = 0.3, Q1 | d = 0.3, Median | d = 0.3, Q3 | d = 0.49, Q1 | d = 0.49, Median | d = 0.49, Q3 |
---|---|---|---|---|---|---|---|---|---|
10 | 0.2100 | 0.4469 | 0.7062 | 0.13 | 0.3648 | 0.6584 | 0.09463 | 0.29745 | 0.59205 |
20 | 0.1779 | 0.4214 | 0.6966 | 0.08017 | 0.28870 | 0.61352 | 0.02045 | 0.14620 | 0.46227 |
50 | 0.1344 | 0.4110 | 0.6842 | 0.0237 | 0.1892 | 0.5373 | 0.00160 | 0.04795 | 0.34013 |
100 | 0.1515 | 0.3740 | 0.6693 | 0.0087 | 0.1196 | 0.4537 | 0.0000 | 0.0121 | 0.2185 |
In the simulation a permutation test was applied to each of 1000 simulated long-memory FD(d) series, for the values of $d$ and $n$ indicated. The estimated permutation test p-values were based on 10,000 random permutations of each simulated data set. Note that the simulated long-memory series contain no intervention but, as mentioned, do strongly violate the assumption of independent observations behind the permutation test. Estimated quartiles of the p-value distributions are summarized below in
For fixed d, as the sample size n increases, the distribution becomes increasingly right-skewed, with the p-values increasingly concentrated near zero. This is also true for fixed n, as the long-memory parameter d increases. The simulation results indicate long-memory data analyzed with a permutation test will result in many false detections of trend or intervention.
As mentioned, the R package fracdiff [
The following two examples were taken from the literature. They were chosen because they present instances where the BACI analysis with the short-memory AR(1) variance correction and the RIA analysis utilizing a permutation test return a significant intervention effect when in fact no intervention occurred. A possible explanation for this is the presence of strong correlation in the data, perhaps long-memory, which could have produced the spurious detection of a significant intervention effect. The observations in the following examples are only approximately equally spaced in time; they were treated as equally spaced in order to simplify the analyses.
The first example involves data read from figure 4a in Bence [
The analysis by Bence estimated the mean difference with a t confidence interval. He assumed an AR(1) correlation structure after the Durbin-Watson test detected significant autocorrelation. However, estimation returned a non-stationary model, which rules out the AR(1) and is another indication that the data may possess long-memory.
Fitting a long-memory model to the sea urchin data yielded the estimate $\hat{d} = 0.3$. The value of the approximate chi-square test statistic equaled 34.66, with a Monte Carlo approximate p-value of 0.00161. The test for significance of the long-memory parameter yielded a Monte Carlo approximate p-value of 0.01222. The long-memory corrected 95% confidence interval (using (3) with the FD(d) ACF evaluated at $\hat{d}$) for the mean difference is $-1.84 \pm 5.13$, which contains 0 and so indicates no significant mean difference. This compares with the (from Bence [
Carpenter et al. ( [
The Durbin-Watson test does not detect a statistically significant autocorrelation at lag 1 (p-value = 0.346), ruling out the AR(1). The fitted FD(d) model yielded the estimate $\hat{d} = 0.15$. The value of the approximate chi-square test statistic equaled 81.21, with a Monte Carlo approximate p-value of 0.09075. The test for the significance of the long-memory parameter yielded a Monte Carlo approximate p-value of 0.05785. Weak to moderate long-memory in the data is a possible explanation of the significance of RIA, creating a false trend which was detected by the permutation test as a spurious break due to the intervention.
Murtaugh [
However, the BACI design and analysis will work better than RIA in these situations because it is amenable to a simple long-memory variance correction which will improve its performance. It is also known [
Researchers have examined the relationships among long-memory, aggregation and structural breaks in time series [
One solution is to detect and account for the breaks in a series, correct for them and then analyze the corrected time series. However, aggregation tests [
Boucher, T.R. (2017) Long-Memory and Spurious Breaks in Ecological Experiments. Open Journal of Statistics, 7, 768-779. https://doi.org/10.4236/ojs.2017.75054