Wheat is a staple agricultural grain commodity used within the United States and is grown in nearly every state. Modeling the price of Hard Red Winter wheat (the most common type of wheat) is of extreme economic and social importance. The 2008 financial crisis had a drastic effect on the price of food in real terms, tightening household budgets and increasing the percentage of US citizens classed below the poverty line. Understanding the influential factors in the econometric modeling of the price of wheat allows for more effective governmental intervention and price stabilization. Results indicate that the price of wheat is influenced by a combination of 5 separate functions: “supply”, “demand”, “macroeconomic”, “climate” and “natural resource” related functions. These functions derive from a wide variety of data sources. The functions were determined and then incorporated into an Ordinary Least Squares (OLS) regression model taking into account variable interaction, variable transformation and time. This regression exercise resulted in a good model, explaining just over 90% of the variation in the price of wheat. Yet results indicate that the model, though sensitive to sharp decreases in the price of wheat, is insensitive to sharp increases. Ways of improving the price model are discussed. These include the addition of other variables, such as financial speculation and increased use of climate related variables, and the use of alternative statistical modeling techniques in place of OLS regression, such as SVAR models and Spline GARCH models. This research implies that further work on modeling the price of wheat within the US could usefully improve on these results.
Wheat is the most important agricultural commodity in the United States and is considered to be the principal cereal grain, grown in nearly every state [
The 2008 financial crisis affected the entire globe, causing a sharp spike in the price of wheat and other agricultural commodities [
A wide range of sources of literature provide useful information in regard to the determination of the price of wheat. These include supply factors, demand factors, stock factors, economic factors, and additional commodities prices influencing the wheat price. These different factors integrate a large volume of different data interactions to create mathematical functions or operators which interact with each other as regressors to calculate the price of wheat [
- To what extent can the variation in the price of wheat within the United States (US Dollar per bushel), be explained by a combination of economically related functions, created by an array of variables related to the US economy?
Secondary to this problem is the question:
- What are the probable causes of the unexplained variation, not related to random error within the wheat price model?
Thirdly:
- Do any weather related regressors influence the variation in the price of wheat at a statistically significant level?
The research questions stated above justify the general hypotheses of this investigation:
Ho: There is a statistically significant relationship between the yearly average price of HRW wheat and supply, demand, macroeconomic, climate and natural resource related functions and the variation within the price of HRW can be explained to an economically acceptable degree.
Ha: There is no statistically significant relationship between the yearly average price of HRW wheat and supply, demand, macroeconomic, climate and resource related functions and the variation within the price of HRW is not accounted for by said functions.
The dependent variable in question is the price of wheat (Hard Red Winter, for reasons explained in the Introduction) and is in the form of the yearly average price (US$) within the US agricultural marketing season. The data series of the dependent variable dates from the year 1970 to the year 2014, giving a 45-year data set (N = 45) of individual yearly prices given to the nearest cent (i.e. US$ to two decimal places). The source of this data is the United States Department of Agriculture―National Agricultural Statistics Service [
To determine what is required for each function used in the creation of a model, an extensive data mining mission was undertaken. The mining of “big data” is the practice of examining a large database in order to generate new information. In total over 1336 different variables were examined [
To find a suitable model with acceptable explanatory power, the general price-prediction formula for agricultural commodities was used:
Equation 1.
Price = constant (β0) + “supply function”*(β1) + “demand function”*(β2) + “macroeconomic function”*(β3) + “natural resource function”*(β4) + “climate function”*(β5)
- Climate function included for research purposes
The “supply function” in simple terms is determined by the amount of acreage harvested for the grain and the yield in bushels. This figure is calculated by multiplying the number of kernels per head of the crop by the number of heads in 3ft by 0.0319 [
The “demand function” consists of an integration of components of the supply function and market demand. Widely used within other commodity price models, the Stocks-to-Use ratio indicates the level of carryover stock for a given commodity as a percentage of the total demand. The Stocks-to-Use ratio is calculated as the market season beginning stock level plus total yearly production minus total use, divided by the total use, then multiplied by 100 to give a Stocks-to-Use % [
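The stocks-to-use calculation described above can be sketched in a few lines. This is a minimal illustration only: the function name and the example figures (in million bushels) are hypothetical and are not taken from the study's data.

```python
def stocks_to_use(beginning_stocks, production, total_use):
    """Return the stocks-to-use ratio as a percentage.

    Carryover stock = beginning stocks + production - total use,
    expressed relative to total use.
    """
    if total_use <= 0:
        raise ValueError("total_use must be positive")
    carryover = beginning_stocks + production - total_use
    return 100.0 * carryover / total_use


# Hypothetical figures (million bushels), for illustration only.
example_stu = stocks_to_use(beginning_stocks=600.0,
                            production=2000.0,
                            total_use=2100.0)
print(f"Stocks-to-use: {example_stu:.1f}%")
```

A low ratio signals tight carryover stocks relative to demand, which is consistent with the negative relationship between the ratio and price reported in this paper.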
The “macroeconomic function” within the model is derived from a calculation of the US yearly average inflation percentage inferred from the Consumer Price Index. Previous research, such as the work of Furlong and Ingenito [
The price of crude oil has previously been cited as an influential regressor for agricultural commodity prices. Oil (in the form of petrol and diesel) is an agricultural production input, used in the production of fertilizer, mechanical fuel and transportation; changes in the price of oil will therefore influence the production of grain commodities and, in turn, commodity price [
As indicated by the third section of the Problem Statement, research by Keatinge et al. [
Dependent variable (Y):
- US$ HRW wheat per bushel
Independent variables (X):
- Wheat yield of bushels per acre (BPA).
- US stocks-to-use ratio (STU).
- US yearly average inflation rate (AIR).
- The price of crude oil per barrel (WTI-Cushing) adjusted for inflation (COB).
- ENSO related Oceanic Nino Index (ONI).
The nonparametric correlations between the dependent variable and the independent variables were assessed to understand the primary relationships. The results show that both the price of crude oil per barrel adjusted for inflation and the yield of bushels of wheat per acre had a moderately strong positive relationship with the price of wheat, whereas the Stocks-to-use ratio exhibited a strong negative relationship. For these three variables the correlation was significant at the 95% confidence level (2-tailed). The US inflation rate and the ENSO index showed small negative correlations but were not flagged as significant at the 95% confidence level (see Statistical Appendix for tabled results).
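The paper does not name the nonparametric coefficient used. As a sketch, assuming Spearman's rank correlation (Kendall's tau would be another candidate), the coefficient can be computed as the Pearson correlation of the ranks. The five data points below are hypothetical and only illustrate a negative price versus stocks-to-use relationship.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yearly values, for illustration only.
price = [2.5, 3.0, 3.4, 4.9, 6.8]          # US$ per bushel
stocks_to_use_pct = [60.0, 45.0, 50.0, 30.0, 20.0]
rho = spearman_rho(price, stocks_to_use_pct)
print(f"Spearman rho = {rho:.2f}")
```

Significance testing (the 2-tailed test mentioned above) is left out of this sketch; a statistics package would normally report it alongside the coefficient.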
Ordinary Least Squares (OLS) regression was implemented to establish the relationship between the variables and create an acceptable model to predict the price of HRW wheat. The first model was fitted without considering interaction among the regressors, using the enter method (all independent variables were included in the equation in one step). The model resulted in an F value of 51.172, indicating that the regression model has good explanatory ability (51.172 > F (0.05, 5, 39 degrees of freedom) = 2.46). The F statistic is supported by the R2 value of 0.868 and adjusted R2 of 0.851, indicating that the variables explain roughly 85% of the variation within the dependent variable after adjusting for the number of regressors. Furthermore, all independent variables pass the t-test at the 95% confidence level, meaning that each has a statistically significant influence on the dependent variable. In addition, the Durbin-Watson (DW) statistic returned a value of 1.808 (close to 2.000). The bounds of the DW statistic at the 95% confidence level, where N = 45 and K’ = 5, are dL = 1.287 and dU = 1.776. Since the DW value of 1.808 exceeds the upper bound (dU = 1.776), we fail to reject the null hypothesis of zero autocorrelation in the residuals, indicating no evidence of positive first-order serial correlation. The initial model (Model 1) can be displayed as:
Equation 2.
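The summary statistics quoted for Model 1 can be cross-checked with standard textbook formulas. The sketch below is not the SPSS procedure itself. It assumes the total sum of squares of 144.237 from the ANOVA table applies to Model 1 as well (same dependent variable) and uses the Model 1 residual sum of squares of 19.078 reported later in the paper. All function names are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Overall F statistic: regression mean square / residual mean square."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)


def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Bounds test for positive first-order autocorrelation."""
    if dw < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if dw > d_upper:
        return "fail to reject H0: no evidence of positive autocorrelation"
    return "inconclusive"


# Values reported in the paper for Model 1 (N = 45, 5 regressors).
ss_total, ss_residual_m1 = 144.237, 19.078   # total SS assumed shared
f_model1 = f_statistic(ss_total - ss_residual_m1, 5, ss_residual_m1, 39)
adj_r2_model1 = adjusted_r2(0.868, 45, 5)
decision_model1 = dw_decision(1.808, d_lower=1.287, d_upper=1.776)

print(f"F = {f_model1:.3f}, adjusted R2 = {adj_r2_model1:.3f}")
print(decision_model1)
```

Note that the Durbin-Watson statistic addresses serial correlation in the residuals only; multicollinearity among regressors would need a separate diagnostic such as variance inflation factors.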
This model may be further improved by two methods: considering the interaction between variables and introducing time as a variable. As the dependent variable is linear in time we can introduce a successive variable indicating the year. The 5 independent variables and time were data-mined to create as many interactions as possible between the variables and transformations of the variables themselves (natural logs, squaring, cubing etc.) in an attempt to improve the explanatory power of the model in comparison to Model 1. An additional way to improve the model is to change how the variables are introduced into the model. After the extensive data mining procedure and an assessment of a variety of different methods of entering the variables, the forward stepwise procedure was found to produce the best model (Model 2), with the following independent variable interactions:
Equation 3.
To clarify, Model 2 contains the variable interactions:
- Bushels per acre * Crude oil per barrel.
- The Natural Log of the Stocks-to-use ratio.
- Bushels per acre * Yearly average inflation rate.
- Crude oil per barrel * Oceanic Nino Index
- Date in the form of the year.
- Oceanic Nino Index * Oceanic Nino Index
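As an illustration of how the transformed regressors of Model 2 could be built from the five base variables and the year, the sketch below constructs one row of predictors. The dictionary keys follow the predictor names listed under the ANOVA table (spelling kept as in the source), which also include the untransformed yield; the function name and the input values are hypothetical.

```python
import math


def model2_terms(bpa, stu, air, cob, oni, year):
    """Build the Model 2 regressors from the base variables.

    bpa: wheat yield, bushels per acre
    stu: stocks-to-use ratio, percent (must be positive for the log)
    air: yearly average inflation rate, percent
    cob: inflation-adjusted crude oil price, US$ per barrel
    oni: Oceanic Nino Index
    """
    if stu <= 0:
        raise ValueError("stocks-to-use ratio must be positive")
    return {
        "Bushels_Crudeoil": bpa * cob,
        "Ln_stockstouse": math.log(stu),
        "Bushels_Inflation": bpa * air,
        "Crudeoil_ENSO": cob * oni,
        "Yeild_Bushel": bpa,
        "Year": year,
        "ENSO_Squared": oni ** 2,
    }


# Hypothetical single-year inputs, for illustration only.
terms = model2_terms(bpa=40.0, stu=30.0, air=3.0, cob=50.0, oni=-0.5, year=2000)
for name, value in terms.items():
    print(f"{name:18s} {value:10.3f}")
```

Building the terms explicitly makes it easy to see that the squared ONI term is symmetric in the sign of the index, while the crude oil interaction is not.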
Model 2 provided an improvement on Model 1. The model resulted in an F value of 60.068, increasing the explanatory power (60.068 > F (0.05, 7, 37 degrees of freedom) = 2.275). The model had an improved R2 of 0.919 and an adjusted R2 of 0.904, and all variable interactions passed the t-test at the 95% confidence level, indicating that each has a statistically significant influence within the model. Model 2 also improved on the Durbin-Watson statistic, with a value of 2.132 (close to 2.000). The result of 2.132 > DW upper (dU) 1.895 means we fail to reject the null hypothesis of zero autocorrelation in the residuals, implying there is no evidence of positive first-order serial correlation in the residual series, see Tables 1-3 and
To provide theoretical justification of the model we must perform two actions: first, a comparison of the two models to establish statistically which model is better, and second, a validation of the assumptions of Ordinary Least Squares regression modeling.

Regressors | R | R2 | Adjusted R2 | Std. Error of the Estimate | R Square Change | F Change | Df1 | Df2 | Sig. F Change
---|---|---|---|---|---|---|---|---|---
1 | 0.761 | 0.579 | 0.569 | 1.18802 | 0.579 | 59.194 | 1 | 43 | 0.000
2 | 0.905 | 0.819 | 0.810 | 0.78825 | 0.240 | 55.678 | 1 | 42 | 0.000
3 | 0.923 | 0.852 | 0.842 | 0.72044 | 0.033 | 9.279 | 1 | 41 | 0.004
4 | 0.934 | 0.873 | 0.860 | 0.67637 | 0.021 | 6.516 | 1 | 40 | 0.015
5 | 0.947 | 0.896 | 0.883 | 0.62028 | 0.023 | 8.562 | 1 | 39 | 0.006
6 | 0.952 | 0.907 | 0.892 | 0.59523 | 0.011 | 4.351 | 1 | 38 | 0.044
7 | 0.959 | 0.919 | 0.904 | 0.56150 | 0.012 | 5.703 | 1 | 37 | 0.022

Durbin-Watson = 2.132.

Source | Sum of Squares | Df | Mean Square | F | Sig.
---|---|---|---|---|---
Regression | 132.572 | 7 | 18.939 | 60.068 | 0.000a
Residual | 11.666 | 37 | 0.315 | |
Total | 144.237 | 44 | | |

aPredictors: (constant), Bushels_Crudeoil, Ln_stockstouse, Bushels_Inflation, Crudeoil_ENSO, Yeild_Bushel, Year, ENSO_Squared.

Model 2 Regressors | Unstandardized B | Std. Error | Standardized Beta | T | Sig.
---|---|---|---|---|---
Wheat Price (constant) | −83.763 | 38.104 | | −2.198 | 0.034
Bushels * Crude Oil | 0.001 | 0.000 | 0.868 | 10.783 | 0.000
Ln Stocks to Use Ratio | −1.446 | 0.273 | −0.343 | −5.294 | 0.000
Bushels * Inflation | −0.005 | 0.002 | −0.261 | −3.193 | 0.003
Crude Oil * ENSO | −0.010 | 0.003 | −0.190 | −3.855 | 0.000
Yield Bushels Per Acre | −0.162 | 0.040 | −0.452 | −4.057 | 0.000
Year | 0.049 | 0.019 | 0.354 | 2.529 | 0.016
ENSO² | 0.339 | 0.142 | 0.119 | 2.388 | 0.022
To find the best model two comparison methods were used: the corrected Akaike Information Criterion (AICc) of both models was compared and an F-test was performed. Model 1 had a Sum-of-squares of 19.078, N = 45, Parameters = 5, resulting in a corrected AIC of −24.41. Model 2 had a Sum-of-squares of 11.66, N = 45, Parameters = 7, resulting in a corrected AIC of −40.77. Model 2 therefore has the lower corrected AIC, indicating a better balance between goodness of fit and model complexity than Model 1.
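The paper does not print the AICc formula it used. A least-squares form of the corrected AIC with K equal to the stated parameter count plus one reproduces both reported values to within rounding, so the sketch below assumes that convention; the function name is hypothetical.

```python
import math


def aicc_least_squares(ss_residual, n_obs, n_params):
    """Corrected AIC for a least-squares fit.

    Assumption: K = n_params + 1, the convention that reproduces the
    values reported in the text.
    AICc = N * ln(SS / N) + 2K + 2K(K + 1) / (N - K - 1)
    """
    k = n_params + 1
    aic = n_obs * math.log(ss_residual / n_obs) + 2 * k
    correction = 2 * k * (k + 1) / (n_obs - k - 1)
    return aic + correction


# Sums of squares, N and parameter counts as reported in the paper.
aicc_model1 = aicc_least_squares(19.078, 45, 5)
aicc_model2 = aicc_least_squares(11.66, 45, 7)

print(f"Model 1 AICc: {aicc_model1:.2f}")
print(f"Model 2 AICc: {aicc_model2:.2f}")
print(f"Difference (Model 1 - Model 2): {aicc_model1 - aicc_model2:.2f}")
```

Only differences in AICc between models fitted to the same data are meaningful; the absolute values carry no interpretation on their own.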
Additionally, an F-test was performed on the models as a secondary comparison method. As mentioned above, Model 1 has a Sum-of-squares of 19.078 with 40 degrees of freedom and Model 2 has a Sum-of-squares of 11.66 with 38 degrees of freedom, resulting in a percentage difference in the Sum-of-squares of 63.62% and an F value of 12.087.
Equation 4.
Equation 5.
Since the F value exceeds the critical F value, there is statistical evidence to suggest that Model 2 performs better than Model 1, according to both the reduction in the AIC and the results of the F-test.
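The reported F value and percentage difference are consistent with the usual extra-sum-of-squares comparison, sketched below with the sums of squares and degrees of freedom exactly as quoted in the text. The function name is hypothetical, and the test formally assumes the simpler model is nested within the more complex one, which is a simplification here.

```python
def extra_sum_of_squares_f(ss_simple, df_simple, ss_complex, df_complex):
    """F = [(SS1 - SS2) / (df1 - df2)] / (SS2 / df2)."""
    if df_simple <= df_complex:
        raise ValueError("the simpler model must have more residual df")
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)
    denominator = ss_complex / df_complex
    return numerator / denominator


# Values as quoted in the text for Model 1 and Model 2.
ss_model1, df_model1 = 19.078, 40
ss_model2, df_model2 = 11.66, 38

f_comparison = extra_sum_of_squares_f(ss_model1, df_model1, ss_model2, df_model2)
pct_difference = 100.0 * (ss_model1 - ss_model2) / ss_model2

print(f"F = {f_comparison:.3f} on ({df_model1 - df_model2}, {df_model2}) df")
print(f"Relative difference in sum of squares: {pct_difference:.2f}%")
```

The resulting F is compared against the critical value for (2, 38) degrees of freedom at the chosen significance level.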
We must also validate that the model meets the assumptions of OLS regression (normally distributed error, independent error structure, homoscedastic error and zero covariance of error within the predicted values). First we must examine the error terms produced by the regression model, i.e. the mean of the error terms should be equal to zero, the residuals should be distributed symmetrically about the expected values and the error terms should follow a “normal” distribution.
The results indicate that the residuals pass the first two criteria but fail on the third (the mean of the unstandardized residuals is 0 and they are distributed symmetrically about the mean; see Statistical Appendix). Testing for normality reveals that the error terms are not normally distributed, so the model initially fails the assumptions of OLS. The tests for normality show the model passing the Kolmogorov-Smirnov test but failing the Shapiro-Wilk test. The results indicate that two observations, 2007 and 2012, skew the distribution and are considered outliers; this is also confirmed by the Cook's distance values of these observations (2007 = 0.2878 and 2012 = 0.126386). Upon removal of these cases and retesting, we find that the error terms meet the assumption of normality, passing both the Kolmogorov-Smirnov test and the Shapiro-Wilk test, therefore validating the assumptions of OLS regression.
Another assumption of OLS regression is that the error series is independent of Y and the regressors. The results indicate that this assumption is met with all covariance values for the error and variable series equaling zero (see Statistical Appendix for full results).
Equation 6.
The last assumption of OLS regression is that the variance in the distribution of stochastic error is homoscedastic. The results indicate that the error is indeed homoscedastic, as shown by the horizontal fit-line of the scatterplot of the standardized residual against the standardized predicted value (Statistical Appendix). The results above confirm that the best model selected meets the assumptions of OLS and provide theoretical justification of the model.
As the results from the previous two sections indicate, the model created is generally a good fit-for-purpose, given the amount of variables/interactions and use of “natural data” when explaining the variation in the price of wheat. The figure below (
To summarize, the OLS regression analysis, taking into account variable interaction, provided a satisfactory model to establish a relationship between “supply”, “demand”, “macroeconomic”, “climate” and “natural resource” related functions to explain the variation in the price of wheat. The model returned an F-value of 60.068 and an R2 of 0.919/adjusted R2 of 0.904. Upon removal of outliers the model met the assumptions of OLS regression, providing theoretical justification of the model, and Model 2 was shown statistically to be the better model (lowest AIC and F-test results). But with an adjusted R2 of 0.904, nearly 10% of the variance within the price of wheat remains to be explained. The section below discusses this issue and attempts to explain the remaining variation that is unexplained by the model.
The results of this investigation provide an interesting insight into the modeling of the price of wheat within the US. The model, although generally strong in its explanatory power, leaves just under 10% of the variation within the price of wheat unexplained. Therefore we must look for additional factors which may explain this remaining variation. Two methods of improvement suggest themselves and are discussed below.
To improve the model stated, adding additional information may provide an explanation of the difference between the real price of wheat and the model predicted price of wheat. Masters [
Identifying the causes of unpredictable/outlier prices is necessary to implement and evaluate governmental policy (both national and international) for intervention in, and adjustment of, the price of not only wheat but other related agricultural commodities [
Another area where additional information may provide a better model would be increasing and developing the climatic related function within the model. In a similar research project to this (regression modeling the price of soybean in the US), Karthilkeyan and Harlalka [
Another method which may improve modeling capability would be to apply different statistical modeling methods other than OLS regression. Janzen et al. [
Roache [
The results of this investigation allow us to firstly answer the problem questions stated in this paper and secondly to “fail to reject” the general hypothesis stated. Therefore to conclude this investigation we can state, in answer to the problem questions stated earlier, that:
- We can explain approximately 90% of the variation of the price of wheat within the United States (adjusted R2 of 0.904) according to statistical evidence based on the OLS regression model of a combination of “supply”, “demand”, “macroeconomic”, “climate” and “natural resource” related functions.
- Several factors could account for the variation within the price of wheat that the model leaves unexplained. Possible remedies range from adding additional data sources, such as climatic data and financial speculation data, and improving variable interactions, to applying different statistical modeling techniques, such as SVAR and Spline-GARCH modeling.
- Lastly, weather related regressors were found to be statistically significant in the contribution to the price of wheat at a 95% confidence level, as shown by the ENSO related ONI interaction within the model.
The results of this investigation and the points raised in the discussion have various implications and inspire ideas for future research. The main implication is that the price of wheat can be modeled to a reasonable accuracy but there are areas in which future research could focus, such as developing methods to measure financial speculation etc. The recreation of this investigation on a monthly scale may be the best course of future action. Further research into the addition of weather related variables would also be an appropriate course of action.
Fergus J. D. Keatinge would like to thank the creators of the statistical software package IBM SPSS, The World Bank for providing the data required for the analysis and Dr. Timothy Fik of the University of Florida for his teaching, ideas and suggestions for this research.
Fergus J. D. Keatinge (2015) Influential Factors in the Econometric Modeling of the Price of Wheat in the United States of America. Agricultural Sciences, 06, 758-771. doi: 10.4236/as.2015.68073
Statistical Appendix