Analysis of variance (ANOVA) is a common method for analysing experiments. However, depending on the design and/or the analysis scheme, it can be a hard task. ExpDes, an acronym for Experimental Designs, is a package that aims to make this task easier. Devoted to fixed-effects models and balanced experiments (no missing data), ExpDes allows the user to handle additional treatments in a single run, covers several experimental designs and produces standard, easy-to-interpret output. It was developed at the Exact Sciences Institute of the Federal University of Alfenas, Brazil. Stable versions of ExpDes have been available on CRAN (Comprehensive R Archive Network) since 2012. According to users' feedback, the package has been used in undergraduate and graduate teaching and for data analysis, in Brazil and in many other countries. ExpDes differs from other R tools in its ease of use and the cleanliness of its output.
An experiment is a planned inquiry to obtain new facts or to confirm or deny the results of previous experiments, where such inquiry will aid in a decision [
According to [
· Formulation of statistical hypotheses that are germane to the scientific hypothesis;
· Determination of the experimental conditions (independent variable) to be used, the measurement (dependent variable) to be recorded, and the extraneous conditions (nuisance variables) that must be controlled;
· Specification of the number of subjects (experimental units) required and the population from which they will be sampled;
· Specification of the procedure for assigning the subjects to the experimental conditions;
· Determination of the statistical analysis to be performed.
In short, an experimental design identifies the independent, dependent, and nuisance variables and indicates the way in which the randomization and statistical aspects of an experiment are to be carried out.
The most common experimental plans are the Completely Randomized Design (CRD), the Randomized Block Design (RBD) and the Latin Square Design (LSD), while the most popular schemes are single-factor, factorial and split-plot experiments [
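The difference between the first two plans lies in how treatments are assigned to experimental units. The sketch below uses only base R's `sample()`. It permutes all plots freely for a CRD and randomizes within each block for an RBD. The treatment labels, the number of replicates and the seed are hypothetical.

```r
# Minimal sketch (base R only): randomizing treatment labels to plots under a
# completely randomized design (CRD) and a randomized block design (RBD).
# Treatment names, replicate/block count and seed are hypothetical.
set.seed(2012)                          # reproducible randomization
treatments <- c("A", "B", "C", "D")     # hypothetical treatment labels
r <- 3                                  # replicates (CRD) or blocks (RBD)

# CRD: all plots are exchangeable, so the full vector of labels is permuted.
crd_layout <- data.frame(plot  = seq_len(length(treatments) * r),
                         treat = sample(rep(treatments, times = r)))

# RBD: each block receives every treatment once, randomized within the block.
rbd_layout <- do.call(rbind, lapply(seq_len(r), function(b) {
  data.frame(block = b, treat = sample(treatments))
}))

print(crd_layout)
print(rbd_layout)
```

A Latin square would further require that each treatment appear once in every row and every column, which constrains randomization even more.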
The analysis of variance was introduced by Sir Ronald A. Fisher and is essentially an arithmetic process for partitioning a total sum of squares into components associated with recognized sources of variation. It has been used to advantage in all fields of research where data are measured quantitatively [
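The partition of the total sum of squares can be checked numerically. The sketch below uses simulated data for a one-way (CRD) layout. It computes the treatment and error components by hand and compares them with the table produced by base R's `aov()`. The data and group names are assumptions made only for illustration.

```r
# Sketch: partition the total sum of squares of a one-way (CRD) layout into
# a treatment (between-group) and an error (within-group) component.
# The response values are simulated; nothing here comes from ExpDes.
set.seed(1)
trt <- factor(rep(c("T1", "T2", "T3"), each = 4))
y   <- c(rnorm(4, 10), rnorm(4, 12), rnorm(4, 11))

grand_mean <- mean(y)
group_mean <- ave(y, trt)               # each observation's treatment mean

ss_total <- sum((y - grand_mean)^2)
ss_treat <- sum((group_mean - grand_mean)^2)
ss_error <- sum((y - group_mean)^2)

# The same decomposition as reported by aov()
tab <- summary(aov(y ~ trt))[[1]]
print(tab)
```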
According to [
· to examine the relative contribution of different sources of variation (factors or combinations of factors, i.e. the predictor variables) to the total variability in the response variable; and
· to test the null hypothesis (H0) that population group or treatment means are equal.
That is, according to [
Nevertheless, for the ANOVA’s F test to be valid, three basic assumptions are required [
1) Errors are independent and normally distributed;
2) Errors present homogeneity of variance; and
3) Additivity of terms of the model.
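Assumption 2 can be examined before trusting the F test. The sketch below applies Bartlett's test, available in base R as `bartlett.test()`, to simulated data. The 5% threshold and the data are assumptions for illustration. Other homogeneity tests, such as Levene's, would serve the same purpose.

```r
# Sketch: checking assumption 2 (homogeneity of error variance) with
# Bartlett's test from base R's stats package. Simulated data only.
set.seed(42)
trt <- factor(rep(c("T1", "T2", "T3", "T4"), each = 5))
y   <- rnorm(20, mean = rep(c(20, 22, 21, 25), each = 5), sd = 2)

bt <- bartlett.test(y ~ trt)   # H0: all treatment variances are equal
print(bt)

# A small p-value would cast doubt on the equal-variance assumption.
if (bt$p.value < 0.05) {
  message("Variances look heterogeneous")
} else {
  message("No evidence against homogeneity of variances")
}
```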
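For instance, for the completely randomized design, one assumes that

$$y_{ij} = \theta_i + e_{ij} = \theta + \tau_i + e_{ij}$$

and

$$e_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2),$$

where $y_{ij}$ is the j-th replication of the i-th treatment; $\theta_i = \theta + \tau_i$ is the mean of the i-th treatment; $e_{ij}$ is the random error associated with $y_{ij}$; and $\sigma^2$ is the common error variance.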
If the omnibus hypothesis of equality of means is rejected, a researcher is still faced with the problem of deciding which of the means are not equal. A significant F test indicates that something has happened in an experiment that has a small probability of happening by chance [
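One widely used follow-up to a significant F test is Tukey's honestly significant difference. The sketch below uses base R's `TukeyHSD()` on an `aov()` fit with simulated data. It is one possible procedure among many, not the only correct choice. The 5% level is an assumption.

```r
# Sketch: after a significant F test, Tukey's HSD (base R's TukeyHSD())
# identifies which pairs of treatment means differ. Simulated data.
set.seed(7)
trt <- factor(rep(c("A", "B", "C"), each = 6))
y   <- rnorm(18, mean = rep(c(10, 10, 14), each = 6), sd = 1)

fit <- aov(y ~ trt)
p_F <- summary(fit)[[1]][["Pr(>F)"]][1]    # omnibus F-test p-value
hsd <- TukeyHSD(fit, "trt", conf.level = 0.95)

if (p_F < 0.05) {
  print(hsd)   # one row per pair: diff, lwr, upr, p adj
} else {
  message("F test not significant; no pairwise follow-up")
}
```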
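On the other hand, when treatments are quantitative, linear regression analysis can be incorporated into the ANOVA [

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i,$$

where yi is the value of the dependent variable for experimental unit i; xik is the value of the k-th independent variable (k = 1, ..., p) for unit i; β0, β1, ..., βp are unknown parameters; and ei is the random error term.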
Several features of the model deserve further comment:
1) The observed value of yi is the sum of two components: the constant predictor term, $\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$, and the random error term, ei.
2) The expected value of the error term equals zero, E(ei) = 0; it follows that the expected value of yi is equal to the constant predictor term.
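3) If the independent variables are quantitative, the unknown parameters can be interpreted as follows: the parameter β0 is the Y intercept of the regression line, and each βk (k = 1, ..., p) is the change in the expected value of yi for a one-unit increase in xik when the other independent variables are held constant.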
4) When there is only one independent variable, the model is a simple regression model; when there are two or more independent variables, it is a multiple regression model.
5) The error term, ei, is assumed to have constant variance σ² for all values of the independent variables.
6) The error terms are assumed to be uncorrelated. The value of ei is not related to the value of ei' for all i ≠ i'. Because the ei’s are uncorrelated, the yi’s are also uncorrelated.
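Several of the features listed above can be seen in a small fit. The sketch below fits a simple regression model by ordinary least squares with base R's `lm()`, using simulated doses and responses. It also shows that, with an intercept in the model, the residuals sum to zero, which is the sample counterpart of E(ei) = 0. The dose levels and true coefficients are assumptions.

```r
# Sketch: fitting y_i = b0 + b1 x_i + e_i by ordinary least squares with lm().
# The doses and responses are simulated, hypothetical values.
set.seed(3)
dose <- rep(c(0, 10, 20, 30), each = 3)        # quantitative treatment
y    <- 5 + 0.4 * dose + rnorm(length(dose), sd = 1)

fit <- lm(y ~ dose)
b   <- coef(fit)   # b[1]: intercept (beta0); b[2]: slope (beta1)
print(b)

# With an intercept in the model, OLS residuals sum to (numerically) zero.
print(sum(residuals(fit)))
```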
When planning a factorial experiment, it is often desirable to include certain extra treatments falling outside the usual factorial scheme. Reference [
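With one extra treatment, the treatment sum of squares splits into a factorial part and a single-degree-of-freedom contrast "additional vs factorial". The sketch below verifies this split in base R for a simulated 2 x 3 factorial in a CRD plus one additional treatment. It is an illustration of the arithmetic under these assumptions, not ExpDes's internal code. The factor levels and replicate count are hypothetical.

```r
# Sketch: splitting the treatment sum of squares of a "factorial + one
# additional treatment" experiment (CRD) into the factorial part and the
# 1-df contrast "additional vs factorial". Levels and data are hypothetical.
set.seed(10)
r    <- 3
fact <- expand.grid(F1 = c("a1", "a2"), F2 = c("b1", "b2", "b3"), repet = 1:r)
resp   <- rnorm(nrow(fact), mean = 50, sd = 3)   # 2 x 3 x 3 = 18 values
respAd <- rnorm(r, mean = 45, sd = 3)            # additional treatment

y   <- c(resp, respAd)
grp <- factor(c(paste(fact$F1, fact$F2), rep("Ad", r)))   # 7 treatments

ss_between <- function(y, g) {
  # sum over observations of (group mean - overall mean)^2
  sum((ave(y, g) - mean(y))^2)
}

ss_all   <- ss_between(y, grp)                                  # 6 df
ss_fact  <- ss_between(resp, factor(paste(fact$F1, fact$F2)))   # 5 df
ss_contr <- ss_all - ss_fact                                    # 1 df

# The contrast also equals nF * nA / (nF + nA) * (mean(Ad) - mean(Fact))^2
nF <- length(resp); nA <- length(respAd)
ss_direct <- nF * nA / (nF + nA) * (mean(respAd) - mean(resp))^2
print(c(all = ss_all, factorial = ss_fact, add_vs_fact = ss_contr))
```

The 5 df of the factorial part can then be split further into F1, F2 and F1:F2 as in an ordinary factorial analysis.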
The analysis of experimental designs can already be performed in R with some specific packages. First, the base package stats contains standard (general) functions for analyzing data from designed experiments, such as lm() and aov(). Package stats also has functions to get and set contrast matrices, functions for multiple comparisons, and some convenience functions such as model.tables(), replications() and plot.design() [
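Two of the convenience functions just mentioned can be tried on a tiny data set. In the sketch below, `replications()` confirms that a simulated one-way layout is balanced and `model.tables()` reports the treatment means of an `aov()` fit. The data frame and its column names are hypothetical.

```r
# Sketch of two convenience functions from base R's stats package:
# replications() checks balance, model.tables() reports means.
set.seed(5)
df <- data.frame(trt = factor(rep(c("T1", "T2", "T3"), each = 4)),
                 y   = rnorm(12, mean = 30))

reps <- replications(~ trt, data = df)   # balanced: one count per term
print(reps)

fit <- aov(y ~ trt, data = df)
mt  <- model.tables(fit, type = "means") # grand mean and treatment means
print(mt)
```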
In this section, we briefly describe some contributed packages with the same purpose.
AlgDesign: Algorithmic Experimental Designs. According to [
dae: Design and ANOVA of Experiments. The package dae [
DoE.base: Package DoE.base [
experiment: Package experiment, according to [
GAD: General ANOVA Designs. According to [
The package ExpDes [
ExpDes was created in 2009 and has since been used in classes at the Federal University of Alfenas, Brazil. For almost three years, its Portuguese and English versions were expanded, debugged and distributed on the website https://sites.google.com/site/ericbferreira/. It was released in Brazil during 2011, and the first seminar on it took place at the Federal University of Ouro Preto, Brazil.
The main purpose of ExpDes is to analyze simple experiments under completely randomized designs (crd()), randomized block designs (rbd()) and Latin square designs (latsd()). It also enables the analysis of treatments in factorial schemes with 2 and 3 factors (fat2.crd(), fat3.crd(), fat2.rbd(), fat3.rbd()) and of split-plot designs (split2.crd(), split2.rbd()).
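The three basic designs differ only in which nuisance terms enter the linear model. The base-R sketch below (not ExpDes code) fits all three with `aov()` on one simulated 4 x 4 layout. Treatments follow a cyclic, unrandomized Latin square, and rows double as blocks for the RBD. Both choices are simplifying assumptions made for brevity.

```r
# Sketch (base R, not ExpDes code): the linear models behind the three basic
# designs, fitted with aov() on one simulated 4 x 4 layout.
set.seed(2)
k   <- 4
lay <- expand.grid(row = factor(1:k), col = factor(1:k))
# Cyclic Latin square: treatment index (row + col) mod k, so each treatment
# appears once in every row and every column (unrandomized, for brevity).
lay$trt   <- factor(LETTERS[(as.integer(lay$row) + as.integer(lay$col)) %% k + 1])
lay$block <- lay$row                     # rows reused as blocks for the RBD
lay$y     <- rnorm(nrow(lay), mean = 10 + as.integer(lay$trt))

crd_fit <- aov(y ~ trt, data = lay)               # CRD: treatment only
rbd_fit <- aov(y ~ block + trt, data = lay)       # RBD: + block effect
lsd_fit <- aov(y ~ row + col + trt, data = lay)   # LSD: + row and column

summary(lsd_fit)
```

Each added nuisance term removes its degrees of freedom from the residual: 12 for the CRD, 9 for the RBD and 6 for the Latin square here.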
Another feature is the analysis of experiments with one additional treatment under completely randomized and randomized block designs with 2 or 3 factors (fat2.ad.crd(), fat2.ad.rbd(), fat3.ad.crd() and fat3.ad.rbd()).
After loading the package and reading and attaching the data, a single command is enough to analyze any of these situations. For instance, consider a two-factor factorial scheme under a completely randomized design plus an additional treatment:
fat2.ad.crd(factor1, factor2, repet, resp, respAd, quali = c(TRUE, FALSE), mcomp = "tukey", fac.names = c("F1", "F2"), sigT = 0.05, sigF = 0.05)
Besides the two factors (factor1 and factor2), the repetitions (repet), the response variable (resp) and the response of the additional treatment (respAd), one must indicate whether each factor is qualitative (quali), the desired multiple comparison test (mcomp, used only for significant qualitative factors), the factor names to be used throughout the output report (fac.names), and the significance levels for the multiple comparison test (sigT, default 5%) and for the F test (sigF, default 5%).
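The call shown in the text expects plain vectors with one entry per factorial plot, plus a separate vector for the additional treatment. The sketch below builds such vectors for a hypothetical 2 x 3 factorial with 4 replicates, using simulated responses. The ExpDes call itself is left commented out so the block runs without the package. The levels and response means are assumptions.

```r
# Sketch: arranging hypothetical data in the vector layout used by the
# fat2.ad.crd() call (one entry per factorial plot, plus a separate vector
# for the additional treatment). All values are simulated.
set.seed(11)
r <- 4                                          # replicates per treatment
design <- expand.grid(repet   = 1:r,
                      factor2 = c(0, 50, 100),  # quantitative (quali = FALSE)
                      factor1 = c("A", "B"))    # qualitative  (quali = TRUE)
factor1 <- design$factor1
factor2 <- design$factor2
repet   <- design$repet
resp    <- rnorm(nrow(design), mean = 20 + 0.05 * factor2)
respAd  <- rnorm(r, mean = 18)                  # additional treatment only

# With ExpDes installed, the analysis would then be a single call:
# library(ExpDes)
# fat2.ad.crd(factor1, factor2, repet, resp, respAd, quali = c(TRUE, FALSE),
#             mcomp = "tukey", fac.names = c("F1", "F2"),
#             sigT = 0.05, sigF = 0.05)
```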
The output begins with a legend of the factors, followed by the conventional analysis of variance table (
Information on the normality of residuals, according to the Shapiro-Wilk test [
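A normality check of this kind can be reproduced with base R's `shapiro.test()` applied to the residuals of a fitted ANOVA model. The sketch below does this on simulated data and reads the result at a 5% level (an assumption). It mirrors the type of check described but is not ExpDes's own code.

```r
# Sketch: Shapiro-Wilk test on the residuals of a fitted ANOVA model,
# using base R's shapiro.test(). Simulated data only.
set.seed(8)
trt <- factor(rep(c("T1", "T2", "T3"), each = 8))
y   <- rnorm(24, mean = rep(c(5, 6, 7), each = 8))

fit <- aov(y ~ trt)
sw  <- shapiro.test(residuals(fit))   # H0: residuals are normally distributed
print(sw)

# Reading the result at a 5% significance level:
if (sw$p.value < 0.05) {
  message("Residuals do not appear normal (p = ", signif(sw$p.value, 3), ")")
} else {
  message("No evidence against normality (p = ", signif(sw$p.value, 3), ")")
}
```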
Based on the pre-specified significance level, the functions decide whether the interactions are significant and accordingly analyse either the interactions or the single factors (
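The decision rule described above can be sketched in base R. The sketch first tests the F1:F2 interaction. If it is significant, it tests F1 within the levels of F2 using the nested formula `F2/F1`, which keeps the pooled residual. Otherwise it reports the main effects. The factor names, the `sigF` value and the simulated data are assumptions, and ExpDes's actual output is more detailed.

```r
# Sketch of the interaction-first decision in base R. Factor names, the 5%
# level and the simulated data are assumptions.
set.seed(4)
d <- expand.grid(F1 = c("a1", "a2"), F2 = c("b1", "b2", "b3"), repet = 1:4)
d$y <- rnorm(nrow(d), mean = 10 + (d$F1 == "a2") * 2)

sigF <- 0.05
tab  <- summary(aov(y ~ F1 * F2, data = d))[[1]]
rownames(tab) <- trimws(rownames(tab))       # drop padding in row names
p_int <- tab["F1:F2", "Pr(>F)"]

if (p_int < sigF) {
  # Significant interaction: test F1 within the levels of F2 (nested
  # formula), keeping the pooled residual mean square.
  print(summary(aov(y ~ F2 / F1, data = d)))
  decision <- "unfold interaction"
} else {
  # Non-significant interaction: study the main effects separately.
  print(tab[c("F1", "F2"), c("Df", "F value", "Pr(>F)")])
  decision <- "main effects"
}
```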
As illustrated in
can”), Student-Newman-Keuls (“snk”), Student’s t test (“lsd”), Bonferroni (“lsdb”) and a Bootstrap multiple comparison test (“ccboot”).
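To illustrate a Bonferroni-protected comparison, the sketch below runs pairwise t tests with a pooled error variance through base R's `pairwise.t.test()`. It compares unadjusted and Bonferroni-adjusted p-values on simulated data. This is analogous in spirit to the "lsdb" option, not its implementation.

```r
# Sketch: pairwise t tests with a pooled error variance, unadjusted vs
# Bonferroni-adjusted (base R's pairwise.t.test()). Simulated data.
set.seed(9)
trt <- factor(rep(c("T1", "T2", "T3", "T4"), each = 5))
y   <- rnorm(20, mean = rep(c(10, 11, 13, 13), each = 5))

raw  <- pairwise.t.test(y, trt, p.adjust.method = "none",       pool.sd = TRUE)
bonf <- pairwise.t.test(y, trt, p.adjust.method = "bonferroni", pool.sd = TRUE)
print(bonf)   # lower-triangular matrix of adjusted p-values
```

With 4 treatments there are 6 comparisons, so each Bonferroni p-value is the unadjusted one multiplied by 6 (capped at 1).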
For quantitative treatments, a linear regression routine (ordinary least squares) incorporated into the analysis of variance is available. It fits polynomial models up to the third degree, presenting the individual analysis of variance and the lack-of-fit test (
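The regression-within-ANOVA idea can be sketched with base R's `lm()` and `anova()`. For each polynomial degree from 1 to 3, the polynomial model is compared with the cell-means model (one mean per dose). This comparison is the lack-of-fit test. The doses and responses are simulated, and ExpDes's own tables and statistics may be laid out differently.

```r
# Sketch: polynomial regression (degrees 1 to 3) on a quantitative treatment,
# with a lack-of-fit test for each degree. Simulated data only.
set.seed(6)
dose <- rep(c(0, 25, 50, 75, 100), each = 4)
y    <- 10 + 0.3 * dose - 0.002 * dose^2 + rnorm(length(dose), sd = 1)

# Cell-means model: one mean per dose level (pure-error reference).
full <- lm(y ~ factor(dose))

lof_p  <- numeric(3)
lof_df <- numeric(3)
for (deg in 1:3) {
  poly_fit <- lm(y ~ poly(dose, deg, raw = TRUE))
  lof <- anova(poly_fit, full)          # does the cell-means model fit better?
  lof_df[deg] <- lof$Df[2]              # lack-of-fit degrees of freedom
  lof_p[deg]  <- lof[["Pr(>F)"]][2]
  cat(sprintf("degree %d: lack-of-fit df = %d, p = %.4f\n",
              deg, as.integer(lof_df[deg]), lof_p[deg]))
}
```

A non-significant lack of fit suggests that the polynomial of that degree describes the treatment means adequately.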
The routines are largely autonomous: given only the significance level adopted by the researcher and the type of each treatment (qualitative or quantitative), the analysis is performed automatically. Tests for single factors or interactions are carried out only when necessary.
ExpDes (Experimental Designs) is meant to be a valuable package that helps researchers analyze experimental data with little work or complication. It is free, and there is no intention to charge for its use, even as new features are added.
The ExpDes package will be maintained and upgraded by the authors. As next steps, the authors plan to work on tests of homogeneity of variances, other multiple comparison tests, the analysis of unbalanced experiments and the enhancement of the regression procedures.
Special acknowledgements to FAPEMIG for financial support.