STATISTICAL GUIDE FOR AUTHORS
Important points to consider when planning an experiment are: the amount of replication, the method of randomization, and the system of blocking (if any) to be used. Blocking allows major components of variability to be eliminated from estimates of treatment effects and experimental error. Randomization ensures unbiased comparisons and estimates of error variance. Replication should be sufficient to ensure that treatment differences of practical importance can be detected as statistically significant.
Simple designs (completely randomized, randomized block, split-plot) cater for most experimental situations. Seek the help of a statistician if a more complicated design seems necessary.
The power of a test is useful at the planning stage
in determining the required size of an experiment or sample. For example, if
it is required that the power of a test is at least 0.9 when the true effect
is some given value, it is possible to calculate the necessary size of
experiment or sample size (Lynch and Walsh, 1998, Appendix 5). However,
power calculations have no role in the analysis of data.
ANALYSIS OF VARIANCE
The main use of analysis of variance is the
calculation of the residual variance for an experimental design. Variance
ratio (F) tests for various hypotheses associated with the design (not
necessarily the hypotheses of interest) are produced as a by-product. This
is straightforward when data are balanced (when there is the same number of
observations for each treatment or treatment combination). The analysis of
unbalanced data requires more care. With two factors A and B, and unequal
replication of the different combinations of levels of A and B, effects and
sums of squares for B need to be adjusted for the effects of A, and
Analysis of variance can also be the basis for variance component estimation. Again, this is straightforward when data are balanced. With unbalanced data, the ANOVA table is not unique, and it is preferable to use another method, such as restricted maximum likelihood (Cox and Solomon, 2002).
The dependent variable Y is random with expectation a linear function of X (the independent variable). The values of X are often chosen by or under the control of the experimenter, while the values of Y are subject to experimental error. Sometimes both X and Y are random, in which case there are two regression lines, Y on X and X on Y. The first is relevant if it is required to predict Y from X, and vice-versa. An exception to this general rule occurs in calibration problems, where Y is measured at a series of fixed values of X, and the fitted line is subsequently used to predict X from Y.
When data are grouped, it may be necessary to separate between-group and within-group regressions. These may be quite different.
Stepwise selection procedures may be used to reduce
an unwieldy set of independent variables to a more manageable subset. There
may be a large number of subsets, all giving approximately the same quality
of fit. If the purpose of the regression is prediction, it may not matter
which subset is chosen. If the purpose is scientific understanding, it is
more important that the reduced equation be biologically sensible than that
it be chosen by a statistically optimal procedure.
Repeated measurements on the same experimental unit
should not be regarded as independent. Sometimes a split-plot design with
experimental units as main plots can be used. In growth studies, a simple
method of analysis is to do a separate analysis on a series of summary
measures for each experimental unit (e.g. mean, linear, quadratic trend) .
TRANSFORMATIONS, GENERALIZED LINEAR MODELS
Special methods may be required if the data are skew
or otherwise non-normally distributed, if the residual variance is not
constant, or effects (treatment differences) are not constant on the
original scale of measurement. Note that it is the residuals obtained after
fitting regression or ANOVA which should be normally distributed, not the
raw data. One approach is based on transformation of the data. Some commonly
used transformations are the square root, for data in the form of counts,
and the arcsine transformation, for binomial data. If the variance seems to
vary as the square of the mean, a logarithmic transformation may be useful.
Standard errors are not constant for comparisons on the original scale, so
estimates, standard errors, and test results are best given on the
transformed scale. The equivalent effects on the original scale are most
easily given in the form of confidence intervals, obtained by back
transformation of the end-points. An alternative to analysis on a
transformed scale is to fit a generalized linear model (Dobson, 2002). This
allows the problems of variance heterogeneity and additivity of effects to
be tackled separately. The two approaches usually produce similar results.
ESTIMATION AND TESTING
Do not confuse standard deviation and standard error. The standard deviation is a measure of variability in sample or population. The standard error is a measure of the precision of an estimate.
Multiple comparison procedures, such as Duncan's multiple range test, Scheffe's test, or Bonferroni adjustment of P-values, are relevant when a hypothesis is to be treated as one of a larger set, or when it has been selected in the light of the data. However, a well-designed experiment sets out to test a small number of clearly stated hypotheses. The treatments are usually structured , e.g., a combination of factors each with a small number of qualitative levels, or quantitative levels varying on a continuous scale. In this case comparisons are determined a priori and there is no need for multiple comparison tests.
Statistical significance should not be confused with practical, or biological, significance. A small real effect, of no practical importance, may be statistically significant in a very large sample. A non-significant result does not demonstrate that there is no effect. It means that the data are compatible with there being no effect, and in small samples, this can happen even when the real effect is large.
Computer-intensive methods, such as Gibbs sampling (Gilks
et al, 1996, Chapters 1 and 2) and bootstrapping (Davison and Hinkley,
1997), can be useful in providing standard errors, confidence intervals, or
significance tests in non-standard problems when conventional methods fail.
However, these methods are easy to misuse. Use with caution.
The statistical report should omit extraneous detail, but be informative enough to allow the reader to make independent judgments wherever possible. For example, in a regression analysis, plotting the raw data together with the regression line allows the reader to assess the need for transformation or the existence of outliers. Giving standard errors and degrees of freedom with estimates allows readers to choose their own significance levels for hypothesis tests or confidence intervals.
Do not over-use hypothesis tests. If an estimate is several times greater than its standard error, or is small relative to the standard error, a test may be superfluous. An estimate with standard error, or confidence interval, is often more useful and gives more information.
There are many statistical software packages, each with its own strengths and weaknesses. Do not assume that everyone is familiar with your chosen statistical package. In particular, avoid terminology which is specific to a particular brand of software. For example, terms such as 'least-squares mean' and 'type III sum of squares' may not be generally understood.
Observations which seem to be inconsistent with the main body of data should not be excluded from the analysis without good reason. If in doubt, it may be useful to analyze the data both with and without an anomalous value to assess the sensitivity of the analysis to its presence. In any case, omission of outlying values should be reported. More generally, any shortcomings in design or analysis should be reported with an indication of the possible effect on the results.
Round values to a reasonable number of decimal
places. For example, an estimate given as 5.3125 with standard error 1.7082
has too many decimal places, and might as well be given as 5.3 with standard
error 1.71. The standard error is usually given with one more decimal than
the estimate. It is rarely necessary to have more than 2 or 3 significant
Cox, D.R. and Solomon, P.J. (2002). Components of
Variance. Chapman and Hall/CRC, Boca Raton, Florida.
Copyright ©  [EAAP, INRA, BSAS]