Periodicity is common in natural processes; however, tools for extracting it are typically difficult and cumbersome to use. Here we report a computational method, implemented in MATLAB as a function called Periods, for finding the main harmonic components of time series data. The function estimates the period, amplitude and lag phase of the main harmonic components of a time series (the periods and lag phases can then be related to climatic, social or economic events). It is based on periodic regression with cyclic descent and includes statistical significance testing. The proposed method is easy to use: it requires neither a full understanding of time series theory nor many inputs from the user. It is nevertheless flexible enough to undertake more complex tasks such as forecasting, and specific periods can easily be included or excluded on the basis of previous knowledge. The output is organized into two groups. One contains the parameters of the adjusted model and their F statistics. The other consists of the harmonic parameters that best fit the original series, ordered by importance, and the summarized statistics of the comparisons between successive models in the cyclic descent process. Periods is tested with simulated data and with actual sunspot and Multivariate ENSO Index data to show its performance and accuracy.
Periodic or cyclic phenomena are characteristic of many different kinds of data [
The task of finding periodicities in data is typically carried out using signal processing techniques or spectral analysis, which require a solid mathematical background. In recent years, there has been great interest in developing alternatives like the matching pursuit algorithm [
In contrast, in this paper we develop a computational method, implemented as a MATLAB function called Periods, that is based on classical Fourier analysis; its underlying mathematics is easy to understand and does not require deep knowledge of Fourier theory. The function is able to identify all of the significant cyclic components and the linear trend (if present) in a time series and, if desired, to make predictions outside the range of the observed values.
We tested the performance of the Periods function with a time series simulated with known periods and also with the well-known yearly sunspot series and the MEI (Multivariate El Niño Southern Oscillation Index) series. The results were compared with values reported in the literature to evaluate accuracy.
For the purpose of this paper we define a time series as a sequence of data points
In Equation (1),
The cyclic component can be represented by a cosine wave [
This function is defined by the length of the cycle (period p), one-half the range from the minimal to the maximal response (semi-amplitude A, hereafter referred to as amplitude for simplicity) and the angular point in time during the cycle when the response is a maximum (phase angle
When the period is known, the amplitude and phase angle can be estimated by regression, which in this context is referred to as periodic regression [
here
when
From the following equation
the graphical lag from the origin (L) can be calculated as
When there are several periodicities in the data, the model will consist of a sum of sinusoidal functions [
The proposed procedure for finding the period and corresponding amplitude and phase angle for each harmonic that best fits
For the set of periods to test, a reasonable initial value (ip) is three (in the same units of time as the original data) while the final value (fp) will depend on the length of the time series (n).
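Each trial period in this set is fitted by the periodic regression described above. As an illustration only, the following Python sketch (not the authors' MATLAB code) fits a single cosine by ordinary least squares. It assumes the model x = M + A cos(2πt/p − φ), linearised as M + a cos(ωt) + b sin(ωt), with A = sqrt(a² + b²), φ = atan2(b, a) and lag L = φp/(2π); these sign conventions and the name `fit_harmonic` are assumptions, chosen because they agree with the phase and lag values reported later for the simulated series.

```python
import numpy as np

def fit_harmonic(t, x, period):
    """Fit x ~ M + A*cos(2*pi*t/period - phase) by ordinary least squares."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    w = 2.0 * np.pi / period
    # Linearised model: M + a*cos(w t) + b*sin(w t),
    # where a = A*cos(phase) and b = A*sin(phase).
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    mean, a, b = coef
    phase = np.arctan2(b, a)
    return {
        "mean": float(mean),
        "amplitude": float(np.hypot(a, b)),
        "phase": float(phase),
        "lag": float(phase * period / (2.0 * np.pi)),  # assumed lag convention
        "rss": float(np.sum((x - X @ coef) ** 2)),
    }

# Noise-free check: the parameters used to build x should be recovered.
t = np.arange(1, 221)
x = 3.0 + 40.0 * np.cos(2.0 * np.pi * t / 25.0 - 2.0)
fit = fit_harmonic(t, x, 25.0)
print(fit)
```

Working with the cosine and sine coefficients keeps the problem linear in the unknowns, so no iterative optimisation is needed once the period is fixed.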
In order to select the optimum period (op) in the tested sequence, an analytical and graphical criterion is introduced. This criterion is called the maximal reciprocal of residual sum of squares (MRRSS). Analytically, the objective consists of adjusting the parameters of a model function to best fit the data set based on the residual sum of squares (RSS); graphically, RSS values close to zero are amplified and represent peaks denoting the optimum period. The mathematical representation is as follows:
where
and where
Once the first op has been found, the fitted series
The periodic regression will then be applied sequentially to the
where
To determine if the addition of the new harmonic component is statistically significant (continuing the search of periods), we use a likelihood ratio test [
where
Once the last significant harmonic is found, in order to determine the best estimates of the amplitude and phase for each one, a final model including all the harmonics is fitted by multiple regression
In this equation,
This section describes the computational method developed through the Periods function, which incorporates the above-described methodology and some additional steps.
The complete set of arguments and default values of the Periods function in MATLAB is:
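The core of the methodology, a scan over trial periods with the optimum chosen by the largest reciprocal of the residual sum of squares, followed by subtraction of the fitted harmonic, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Periods implementation: the helper names `harmonic_rss` and `cyclic_descent` are hypothetical, the number of harmonics is fixed in advance instead of being decided by the significance test, and the test series (two harmonics plus Gaussian noise) is invented for the example.

```python
import numpy as np

def harmonic_rss(t, x, period):
    """RSS and fitted values of a one-harmonic periodic regression."""
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    fitted = X @ coef
    return float(np.sum((x - fitted) ** 2)), fitted

def cyclic_descent(t, x, periods, n_harmonics):
    """Repeatedly pick the period with maximal 1/RSS and subtract its fit."""
    t = np.asarray(t, dtype=float)
    resid = np.asarray(x, dtype=float).copy()
    found = []
    for _ in range(n_harmonics):
        # Reciprocal of RSS for every trial period; the peak marks the optimum.
        rrss = np.array([1.0 / harmonic_rss(t, resid, p)[0] for p in periods])
        op = periods[int(np.argmax(rrss))]
        rss, fitted = harmonic_rss(t, resid, op)
        found.append((float(op), rss))
        resid = resid - fitted  # the next search runs on the residuals
    return found

# Invented test series: periods 25 and 10 plus noise (all values assumed).
rng = np.random.default_rng(0)
t = np.arange(1, 221)
x = (40.0 * np.cos(2.0 * np.pi * t / 25.0 - 2.0)
     + 20.0 * np.cos(2.0 * np.pi * t / 10.0 - 5.0)
     + rng.normal(0.0, 5.0, t.size))
found = cyclic_descent(t, x, np.arange(3, 111), n_harmonics=2)
print(found)
```

Taking the reciprocal does not change which period is selected (the maximum of 1/RSS is the minimum of RSS); it only turns good fits into visible peaks when plotted against the trial periods.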
Periods (x, varargin)
where varargin represents the rest of the optional arguments, detailed below:
・ x: Time series
A vector containing the time series to be analyzed. Any variable name can be used, not restricted to x.
The next arguments are optional and take default values if not provided. They allow the user to control and fine-tune the performance of the function and the degree of detail of the results.
・ t: Time vector
Must be a monotonically increasing vector of the same length as x. Data are usually collected at regular time intervals; however, in long series this is not always the case. For example, the MEI series used in the results section is arranged on a “monthly” basis, which is not a regular time unit because a “month” in this series can have 28, 29, 30 or 31 days. For the sunspot time series, the time unit should be 365.25, the average length of a year in days. For this reason, it is recommended to supply the dates as serial date numbers, in the format of the MATLAB datenum function, to obtain a precise time unit. This can improve the fit; however, it is up to the user to experiment and evaluate the results.
・ predict: t values for predictions
An integer used when a prediction is desired: a positive value produces a forecast and a negative value a hindcast. Both values can be supplied together to predict in both directions. A time vector can also be provided, in which case the prediction is made over that vector. The predict values or vector should be in the same units as the time vector (t). When used, a new element (Xef) is added to the Final_Model array in the output.
・ trend: Linear trend (string)
A logical. If TRUE, the linear trend is estimated and subtracted from the data series; the default is FALSE. This option is recommended when the data show a clear trend.
・ ip: Initial period to test
・ lp: Last period to test
To save computational time two criteria are applied to determine ip and lp. To avoid randomness, ip starts at 3 and lp is restricted to half the length of
・ step: Period step
The increment between successive tested periods, in the time unit derived from t above. The default value is 1, which assumes that two successive data values are separated by one unit of time; however, step can also take values between 0 and 1. In the previous example the periods to be tested are 3, 4, 5, …, 501. Changing the default to step = 0.1 changes the sequence of tested periods to 3, 3.1, 3.2, …, 501. Note that a small step requires more computational time.
・ hn: Harmonics number
Number of harmonics to estimate, either determined by the MRRSS criterion or given as an integer supplied by the user.
・ neig: Neighbors
By default, a period identified (op) in a periodic regression will not be considered in a subsequent search unless neig = −1. Other positive integers will cause the function to also exclude neighboring periods:
・ known: Known periods
This option completely overrides the search for periods in cyclic descent. A model with the given known periods is fitted by multiple regression to the x series.
・ include: Include periods
This option allows the user to specify the period(s) to be included in the final fitted model after the search for periods in the cyclic descent has ended.
・ exclude: Exclude periods
The periods given here will be excluded in the periods search.
・ alpha: Significance level, α
The significance level of the statistical test in the cyclic descent, that is, the accepted probability of a type I error when a harmonic is declared significant. Defaults to alpha = 0.05.
・ rrss: Reciprocal of residual sum of squares (string)
Logical, defaults to FALSE. If TRUE, the periods tested and their corresponding RRSS in every step of the cyclic descent will be provided.
・ plots: Produce plots
A string indicating the desired plots. By default plots = “last”, in which case only the plot of the final fitted model is generated. With plots = “all”, the RRSS versus tested periods and the cumulative harmonic fit at every step of the cyclic descent are plotted in addition to the final fit. With plots = “none”, no plot is produced. The value can be abbreviated.
Besides the graphical output, the Periods function also returns an object with two MATLAB structure arrays, called Cyclic_Descent and Final_model, which are composed as follows.
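To make the interplay of the ip, lp, step and exclude arguments concrete, the sequence of trial periods can be sketched in Python as below. This is only an illustration: the helper name `period_grid` is hypothetical, and the default upper limit of half the series length is our reading of the lp description rather than a documented formula.

```python
import numpy as np

def period_grid(n, ip=3.0, lp=None, step=1.0, exclude=()):
    """Trial periods from ip to lp in increments of step, minus excluded ones."""
    if lp is None:
        lp = n // 2  # assumption: half the length of the series
    # Count the grid points explicitly to avoid floating-point drift in arange.
    count = int(np.floor((lp - ip) / step + 1e-9)) + 1
    grid = np.round(ip + step * np.arange(count), 10)
    for p in exclude:
        grid = grid[~np.isclose(grid, p)]
    return grid

grid_1 = period_grid(1002)              # 3, 4, 5, ..., 501
grid_01 = period_grid(1002, step=0.1)   # 3, 3.1, 3.2, ..., 501
print(len(grid_1), len(grid_01))
```

With step = 0.1 the grid has ten times as many points, so the search runs roughly ten times as many regressions per harmonic, which is where the extra computational time comes from.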
Cyclic_Descent: Contains the results of the cyclic descent method and the statistical test in each step:
・ Harmonics: period, amplitude, phase, lag phase, RSS and R2 for every harmonic found;
・ Stats: F statistic, the corresponding degrees of freedom and p-values for the comparison between successive models;
・ RRSS: the periods tested and the corresponding RRSS in every step of the cyclic descent. Only returned if rrss = TRUE.
Final_Model: Contains the final fitted model (Equation (16)) and the corresponding statistical results:
・ Xe: the fitted series;
・ Xef: the predicted time series, only given if the predict argument is used. In this case both the new t values and the corresponding predictions will be provided;
・ Params: parameters for the final model. When trend = TRUE linear trend coefficients (
・ F_stats:
As a first example of the use of the Periods function, a time series (sim, n = 220) composed of four harmonic components with: periods = 25, 10, 16, 73; amplitudes = 40, 20, 10, 5; phases = 2, 5, 1, 0;
>> plots = 'a'
>> [Final_model, Cyclic_Descent] = Periods (sim, plots)
When running the function, a succession of graphic windows with two panels each appears as the harmonics are found. In the upper panel the RRSS values are plotted against tested periods. op refers to the optimal period found in a given step of the cyclic descent, which corresponds to MRRSS values (Equation (10)) for a given harmonic (the user has the option to output the RRSS used to produce this plot, specifying rrss = TRUE). In the bottom panel, the x series and the fitted model (one harmonic in the first plot, and thereafter the sum of the current with preceding harmonics) are shown, along with the corresponding coefficient of determination
In the left panel the effect on the value of RRSS becomes apparent as harmonics are found and subtracted in preceding steps of the cyclic descent process. In the upper plot, for instance, the peak at 25 is the most prominent and the peak at 10 is hardly visible. Once the influence of this harmonic is subtracted, the peak at 10 becomes the most prominent in the second plot (below the top panel plot) and it is possible to distinguish another peak at 16. As the harmonics continue to be subtracted, the peak close to 80, invisible in the first step of the cyclic descent, becomes more and more evident from the second to the fourth plot. A plot of the periodogram computed by the fast Fourier transform will be similar to the first plot in the cyclic descent (note that the periodogram uses frequency rather than period on the x-axis), but the selection of significant harmonics is left to the user. In the right panel, the correlation improves as the detected harmonics are incorporated into the model. The process stops when the addition of another harmonic, in this case with a period of 5, is statistically non-significant (see Equation (15) and the p-values below), but the plot is still generated. With plots = “a”, the last plot produced by the Periods function shows the original time series with the final fitted series (not shown here).
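For comparison with the periodogram mentioned above, the following Python sketch builds a series with the periods, amplitudes and phases of the sim example and computes its FFT periodogram. The Gaussian noise level, the random seed and the cos(2πt/p − φ) convention are assumptions made for the illustration, so the series is comparable to, but not identical with, the one analyzed in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 220
t = np.arange(1, n + 1)
periods = [25, 10, 16, 73]
amplitudes = [40, 20, 10, 5]
phases = [2, 5, 1, 0]

# Sum of four cosine components plus noise (noise level is an assumption).
sim = sum(a * np.cos(2.0 * np.pi * t / p - ph)
          for p, a, ph in zip(periods, amplitudes, phases))
sim = sim + rng.normal(0.0, 10.0, n)

# Periodogram on the Fourier frequencies k/n, k = 0, 1, ..., n/2.
spec = np.abs(np.fft.rfft(sim - sim.mean())) ** 2 / n
freq = np.fft.rfftfreq(n, d=1.0)
k = int(np.argmax(spec[1:])) + 1   # skip the zero frequency
peak_period = 1.0 / freq[k]
print(k, peak_period)
```

The periodogram is evaluated only at the Fourier frequencies k/n, so for n = 220 the candidate periods nearest to 25 are 220/9 ≈ 24.4 and 220/8 = 27.5. A scan over an explicit grid of trial periods is not restricted in this way, which is one practical difference between the two displays.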
Besides graphical output, the resulting object Cyclic_Descent is inspected below. The Harmonics data frame includes the periods found and their corresponding amplitude, phase (Equation (3)) and lag. The residual sum of squares and
>> Cyclic_Descent.Harmonics
Period Amplitude Phase Lag RSS R.sq
Model 1: 25 40.29 2.02 8.03 76155.61 0.6998
Model 2: 10 20.04 −1.21 −1.93 31938.24 0.8741
Model 3: 16 9.85 1.00 2.55 21124.79 0.9167
Model 4: 75 3.98 −0.11 −1.35 19415.50 0.9234
Model 5: 5 2.17 1.99 1.58 18897.26 0.9255
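The columns of this listing can be cross-checked by hand. Assuming that the lag is obtained from the phase as L = φp/(2π) and that R.sq is 1 − RSS/TSS, with TSS the total sum of squares of the series (both forms are assumptions on our part), the printed values are reproduced up to rounding of the phases:

```latex
\begin{aligned}
L_1 &= \frac{2.02 \times 25}{2\pi} \approx 8.04, &
L_2 &= \frac{-1.21 \times 10}{2\pi} \approx -1.93, &
L_3 &= \frac{1.00 \times 16}{2\pi} \approx 2.55, \\[4pt]
\mathrm{TSS} &= \frac{76155.61}{1 - 0.6998} \approx 2.537 \times 10^{5}, &
R^2_2 &= 1 - \frac{31938.24}{2.537 \times 10^{5}} \approx 0.8741, &
R^2_3 &= 1 - \frac{21124.79}{2.537 \times 10^{5}} \approx 0.9167 .
\end{aligned}
```

Phases are defined modulo 2π, so the simulated phase of 5 for the second harmonic is equivalent to 5 − 2π ≈ −1.28, which is close to the estimated value of −1.21.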
The Stats data frame contains the comparisons carried out between successive models, i.e. the F statistic, the corresponding degrees of freedom and the associated p-value. In this case, the addition of a harmonic with a period of five (Model 5) is not significant at the default alpha = 0.05.
Note that for the purpose of this example, the RRSS values were not requested. In addition, the results from Final_Model are not shown.
>> Cyclic_Descent.Stats
Models 1&2: 148.829 2 215 < 2.2e−16
Models 2&3: 54.515 2 213 < 2.2e−16
Models 3&4: 9.287 2 211 0.0001362
Models 4&5: 2.865 2 209 0.0591766
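The statistics in this listing can be reproduced from the RSS column of the Harmonics output. The Python sketch below assumes that each model has one intercept plus two regression coefficients per harmonic, so that adding a harmonic costs two degrees of freedom; this form of the test is our assumption, and it matches the degrees of freedom printed above. The function name `f_test_added_harmonic` is hypothetical. The closed-form p-value is valid only because the numerator has exactly two degrees of freedom; with SciPy available, `scipy.stats.f.sf` would be the general-purpose choice.

```python
def f_test_added_harmonic(rss_prev, rss_new, n, k_new):
    """F test for adding the k_new-th harmonic to a model with k_new - 1."""
    df1 = 2                      # one cosine and one sine coefficient
    df2 = n - (2 * k_new + 1)    # residual df of the larger model
    f_stat = ((rss_prev - rss_new) / df1) / (rss_new / df2)
    # Upper-tail probability of F(2, df2); closed form valid for df1 == 2 only.
    p_value = (1.0 + df1 * f_stat / df2) ** (-df2 / 2.0)
    return f_stat, df1, df2, p_value

# RSS values of Models 1 to 5 as printed in Cyclic_Descent.Harmonics.
rss = [76155.61, 31938.24, 21124.79, 19415.50, 18897.26]
n = 220
results = [f_test_added_harmonic(rss[k - 2], rss[k - 1], n, k)
           for k in range(2, 6)]
for k, (f_stat, df1, df2, p) in zip(range(2, 6), results):
    print(f"Models {k - 1}&{k}: F = {f_stat:.3f}, df = ({df1}, {df2}), p = {p:.3g}")
```

Only the last comparison exceeds the default alpha = 0.05, which is why the cyclic descent stops after the fourth harmonic.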
The comparison of the parameters used to simulate the sim time series with those obtained from the Periods function is presented in
i | Original period | Original amplitude | Original phase | Estimated period | Estimated amplitude | Estimated phase |
---|---|---|---|---|---|---|
1 | 25 | 40 | 2 | 25 | 40.29 | 2.02 |
2 | 10 | 20 | 5 | 10 | 20.04 | −1.21 |
3 | 16 | 10 | 1 | 16 | 9.85 | 1.00 |
4 | 73 | 5 | 0 | 75 | 3.98 | −0.11 |
To test the Periods function with real data, the time series of yearly sunspot numbers was chosen. This well-known series contains the sunspot numbers by year [
8.4- and 5.7-year cycles when analyzing the sunspot numbers from 1700 to 1960. Nordemann et al. [
A first analysis was carried out using the following code.
>> [Final_model, Cyclic_Descent] = Periods (sunspots, t)
The upper panel in
To improve the fitted model and also to illustrate the flexibility of the Periods function, the sunspot series was reanalyzed using step = 0.25 in order to evaluate fractional (quarterly) sunspot cycles, since some authors have reported periods in fractions of years. The middle panel of
Periods with small differences between them may be regarded as closely related, for instance 5.25, 5.5 and 5.75. In some cases it may not be desirable to report cycles with such small differences. The neig argument can be used to remove neighboring values above and below a previously identified period preventing these values from being tested in subsequent steps of the cyclic descent. To illustrate this, the previous analysis was repeated adding neig = 2. The result is shown in the bottom panel of
It must be noted that the removal of neighboring periods is solely a user decision and is therefore subjective. It is up to the user to decide whether it makes sense to obtain all the possible periods in a given analysis or to limit the search to non-contiguous cycles. In any case, it is recommended to carry out some testing and compare the results before deciding on this matter. Other options that can be explored by the user are the inclusion or exclusion of periods based on previous knowledge of the data (arguments: include, exclude). Again, this is the user's decision.
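The removal of neighboring periods can be pictured with the short Python sketch below. It assumes that neig counts positions on the grid of trial periods on each side of the identified period; this reading is not confirmed by the description of the argument and should be treated as a guess. The helper name `drop_neighbors` is hypothetical.

```python
import numpy as np

def drop_neighbors(periods, found, neig):
    """Remove `found` and `neig` grid neighbors on each side from the grid."""
    periods = np.asarray(periods, dtype=float)
    idx = int(np.argmin(np.abs(periods - found)))
    lo = max(idx - neig, 0)
    hi = min(idx + neig, len(periods) - 1)
    keep = np.ones(len(periods), dtype=bool)
    keep[lo:hi + 1] = False   # empty slice when neig = -1, so nothing is removed
    return periods[keep]

grid = 3.0 + 0.25 * np.arange(41)            # 3.00, 3.25, ..., 13.00
remaining = drop_neighbors(grid, 11.0, 2)    # drops 10.50 through 11.50
print(remaining[(remaining > 10.0) & (remaining < 12.0)])
```

Under this assumption, with step = 0.25 and neig = 2 the closest periods that can still be selected after finding 11.00 are 10.25 and 11.75.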
It should also be noted that the order and values of the harmonics, and the R2 value, differ slightly between the Cyclic Descent process and the Final Model fitted by the method. The reason is that, once the Cyclic Descent process has found a certain number of harmonics, these are used as seed values for fitting a model to the observed data. The fitted values are then rearranged, and there are usually slight changes in the values of the harmonics, phases, etc., and a slight increase in the R2 value of the fitted model.
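Why the final fit can only improve on the sequential one is easy to see in a small Python sketch. Here the periods are held fixed and all amplitudes and phases are re-estimated together by multiple linear regression; whether Periods also adjusts the periods at this stage is not something this sketch tries to reproduce. The helper names (`design`, `fit_final_model`, `predict`) and the test series are assumptions made for the illustration.

```python
import numpy as np

def design(t, periods):
    """Design matrix: intercept plus a cosine and a sine column per period."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for p in periods:
        w = 2.0 * np.pi / p
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

def fit_final_model(t, x, periods):
    """Joint least-squares fit of all harmonics at fixed periods."""
    coef, *_ = np.linalg.lstsq(design(t, periods),
                               np.asarray(x, dtype=float), rcond=None)
    a, b = coef[1::2], coef[2::2]   # cosine and sine coefficients
    return {"coef": coef, "amplitude": np.hypot(a, b), "phase": np.arctan2(b, a)}

def predict(model, t_new, periods):
    """Evaluate the fitted model at new time points (forecast or hindcast)."""
    return design(t_new, periods) @ model["coef"]

# Invented series with two harmonics that are not orthogonal over the sample.
rng = np.random.default_rng(2)
t = np.arange(1, 221)
x = (40.0 * np.cos(2.0 * np.pi * t / 25.0 - 2.0)
     + 10.0 * np.cos(2.0 * np.pi * t / 16.0 - 1.0)
     + rng.normal(0.0, 5.0, t.size))
periods = [25.0, 16.0]

# Sequential fit: one harmonic at a time on the residuals.
resid = x.copy()
for p in periods:
    one = fit_final_model(t, resid, [p])
    resid = resid - predict(one, t, [p])
rss_sequential = float(np.sum(resid ** 2))

# Joint fit of both harmonics, then a ten-step forecast.
final = fit_final_model(t, x, periods)
rss_joint = float(np.sum((x - predict(final, t, periods)) ** 2))
forecast = predict(final, np.arange(221, 231), periods)
print(rss_sequential, rss_joint)
```

The sequentially fitted series is itself a linear combination of the columns used in the joint fit, so the joint residual sum of squares can never be larger; this corresponds to the slight increase in R2 described above.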
There are other function arguments that the user can control to modify the behavior of Periods, as explained in the description of the function arguments above. Users are encouraged to explore the effect of changing the default values of these arguments on the function output.
Cyclic Descent Order | Period | Amplitude | Phase | Lag | Cum.R2 |
---|---|---|---|---|---|
1 | 11.00 | 30.15 | −2.51 | −4.40 | 0.28 |
2 | 10.00 | 21.19 | −0.16 | −0.26 | 0.42 |
3 | 103.00 | 16.63 | −2.54 | −41.41 | 0.50 |
4 | 11.75 | 12.11 | −1.61 | −3.02 | 0.55 |
5 | 52.75 | 11.49 | −2.39 | −20.75 | 0.59 |
6 | 8.50 | 9.85 | 2.14 | 2.89 | 0.62 |
7 | 151.25 | 9.67 | −1.62 | −38.96 | 0.65 |
8 | 13.00 | 8.15 | 0.29 | 0.59 | 0.67 |
9 | 66.75 | 7.90 | 1.76 | 18.65 | 0.69 |
10 | 43.75 | 6.50 | −0.85 | −5.92 | 0.70 |
11 | 9.25 | 5.94 | −1.90 | −2.80 | 0.71 |
12 | 21.50 | 5.88 | 0.55 | 1.88 | 0.72 |
13 | 28.25 | 5.51 | 0.02 | 0.08 | 0.73 |
14 | 7.50 | 4.76 | −2.33 | −2.78 | 0.74 |
15 | 5.50 | 4.72 | 0.37 | 0.33 | 0.74 |
16 | 15.25 | | −0.76 | −1.86 | 0.75 |
Comparing the 14 periods found by [
Another good example for testing the Periods function with real data is the MEI (Multivariate El Niño Southern Oscillation Index) time series, which is the first principal component of six variables. This is a monthly series constructed to monitor El Niño/La Niña activity and intensity in the tropical Pacific Ocean from 1950 to date [
Again, a first analysis was carried out using the code.
>> [Final_model, Cyclic_Descent] = Periods (MEI, t)
The upper panel in
harmonics. Note that the 44-month period (3.7 years) is the first period found and, as in the sunspot time series, this means that it has the largest amplitude and is therefore the most influential period in the time series. It is followed by 136- and 60-month periods (11.6 and 5 years, respectively), indicating the importance of those periods in the index.
It can be seen that many periods have values close to each other. Therefore, if the user wants to discern better between the periods found, taking, for example, only the first ten highly significant periods, the MEI series can be reanalyzed using hn = 10 and alpha = 0.0001 in order to choose only the first ten periods significant at 99.99%
Most of the cycles found by this method have already been reported in the literature. For example, the first period, the 60-month cycle, has already been reported for MEI [
It is worth noting the effect of removing neighboring periods: it makes the identification of close periods much clearer and reduces the number of periods found to a minimum. However, if the objective is forecasting beyond the time series, it is recommended to use the highest possible number of periods. As stated in the previous section, it is the user's decision whether to obtain all the possible periods in a given analysis or to limit the search to non-contiguous cycles, and it is highly recommended to test and compare the results before deciding on this matter.
Cyclic Descent Order | Period (months) | Period (years) | Amplitude | Phase | Lag | Cum. R2 |
---|---|---|---|---|---|---|
1 | 60 | 5.0 | 0.45 | −2.51 | −23.96 | 0.11 |
2 | 44 | 3.7 | 0.43 | 1.38 | 9.69 | 0.21 |
4 | 77 | 6.4 | 0.32 | 2.75 | 33.75 | 0.26 |
3 | 139 | 11.6 | 0.33 | −1.65 | −36.43 | 0.32 |
7 | 29 | 2.4 | 0.29 | −2.19 | −10.09 | 0.36 |
5 | 68 | 5.7 | 0.26 | −2.62 | −28.34 | 0.40 |
6 | 206 | 17.2 | 0.24 | −1.08 | −35.43 | 0.43 |
9 | 53 | 4.4 | 0.21 | −3.04 | −25.68 | 0.45 |
10 | 37 | 3.1 | 0.20 | 2.86 | 16.82 | 0.47 |
11 | 18 | 1.5 | 0.16 | 0.55 | 1.56 | 0.49 |
We are not implying that our results on the sunspot number cycles and the multivariate ENSO index are more accurate than those from previous studies; however, we believe that the proposed function can be a useful exploratory tool for users who are not particularly trained in time series analysis.
The decomposition of a time series into several signals as proposed here is not unique. Several methods for this purpose have been proposed, for example the method of frames, matching pursuit and best orthogonal basis [
The main advantages of the Periods function are the following: a full description of the harmonics by their three parameters (amplitude, period and phase), including the lag phase, which permits an easy graphical interpretation and is useful for comparing lags between several responses of the same phenomenon. The output periodicities of the cyclic descent method can be interpreted or related to other known periodic events because they are found sequentially, in order of their importance in explaining the residuals. This order and the lag can be relevant in such interpretations. For forecasting purposes, the final fitted model will be more useful because of its higher statistical significance. The fitted parameters can be slightly different from those of the cyclic descent, especially the amplitudes, which indicate the relative importance of the harmonics in the fitted model (
The Periods function is available free of charge on request from the corresponding author. Furthermore, there is an R version for users with no access to MATLAB. Feel free to use and distribute both versions.
We want to thank D. Lluch-Belda (+) for the support and comments for this paper. EGR and VMGM are grant holders of the Sistema Nacional de Investigadores (CONACYT). VMGM and HV were supported by a fellowship from COFAA-IPN. EGR and ARR want to thank CICESE ULP for the support provided.
Eduardo González-Rodríguez, Héctor Villalobos, Víctor Manuel Gómez-Muñoz, Alejandro Ramos-Rodríguez (2015) Computational Method for Extracting and Modeling Periodicities in Time Series. Open Journal of Statistics, 05, 604-617. doi: 10.4236/ojs.2015.56062