A major new International Journal of Animal Bioscience



  Statistical Guide






Important points to consider when planning an experiment are: the amount of replication, the method of randomization, and the system of blocking (if any) to be used. Blocking allows major components of variability to be eliminated from estimates of treatment effects and experimental error. Randomization ensures unbiased comparisons and estimates of error variance. Replication should be sufficient to ensure that treatment differences of practical importance can be detected as statistically significant.

Simple designs (completely randomized, randomized block, split-plot) cater for most experimental situations. Seek the help of a statistician if a more complicated design seems necessary.

The power of a test is useful at the planning stage in determining the required size of an experiment or sample. For example, if it is required that the power of a test is at least 0.9 when the true effect is some given value, it is possible to calculate the necessary size of experiment or sample size (Lynch and Walsh, 1998, Appendix 5). However, power calculations have no role in the analysis of data.
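The normal-approximation formula behind such calculations can be sketched in a few lines. This is a minimal sketch for a two-sample comparison of means; the effect size, standard deviation, significance level, and power used below are illustrative values, not recommendations.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.9):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Illustrative values: detect a difference of 1.5 units, SD = 2.0
print(n_per_group(delta=1.5, sigma=2.0))  # 38 per group
```

For an exact calculation based on the t distribution, or for more complex designs, dedicated software or a statistician should be consulted.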


The main use of analysis of variance is the calculation of the residual variance for an experimental design. Variance ratio (F) tests for various hypotheses associated with the design (not necessarily the hypotheses of interest) are produced as a by-product. This is straightforward when data are balanced (when there is the same number of observations for each treatment or treatment combination). The analysis of unbalanced data requires more care. With two factors A and B, and unequal replication of the different combinations of levels of A and B, effects and sums of squares for B need to be adjusted for the effects of A, and vice-versa.
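For balanced data, the decomposition is simple enough to compute by hand. The following one-way sketch, with invented data, shows the between- and within-group sums of squares behind the F ratio.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group mean square
    divided by the residual (within-group) mean square."""
    grand = mean(x for g in groups for x in g)
    n_total = sum(len(g) for g in groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three treatments, three observations each (balanced)
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]]))  # approximately 7.0
```

With unbalanced data this simple decomposition no longer applies, which is exactly why the adjusted sums of squares described above are needed.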


Analysis of variance can also be the basis for variance component estimation. Again, this is straightforward when data are balanced. With unbalanced data, the ANOVA table is not unique, and it is preferable to use another method, such as restricted maximum likelihood (Cox and Solomon, 2002).



The dependent variable Y is random with expectation a linear function of X (the independent variable). The values of X are often chosen by or under the control of the experimenter, while the values of Y are subject to experimental error. Sometimes both X and Y are random, in which case there are two regression lines, Y on X and X on Y. The first is relevant if it is required to predict Y from X, and vice-versa. An exception to this general rule occurs in calibration problems, where Y is measured at a series of fixed values of X, and the fitted line is subsequently used to predict X from Y.
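The distinction between the two lines can be illustrated with a short sketch on invented data: the slope of Y on X is Sxy/Sxx, that of X on Y is Sxy/Syy, and their product equals r², so the two lines coincide only when the correlation is perfect.

```python
from statistics import mean

def slopes(x, y):
    """Slopes of the two regression lines: Y on X, and X on Y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / syy

x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
b_yx, b_xy = slopes(x, y)
print(b_yx, b_xy)  # 1.4 0.7
# Their product is r squared (here 0.98), so the lines coincide
# only when the correlation is exactly plus or minus 1.
```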

When data are grouped, it may be necessary to separate between-group and within-group regressions. These may be quite different.

Stepwise selection procedures may be used to reduce an unwieldy set of independent variables to a more manageable subset. There may be a large number of subsets, all giving approximately the same quality of fit. If the purpose of the regression is prediction, it may not matter which subset is chosen. If the purpose is scientific understanding, it is more important that the reduced equation be biologically sensible than that it be chosen by a statistically optimal procedure.


Repeated measurements on the same experimental unit should not be regarded as independent. Sometimes a split-plot design with experimental units as main plots can be used. In growth studies, a simple method of analysis is to do a separate analysis on a series of summary measures for each experimental unit (e.g. mean, linear and quadratic trends).


Special methods may be required if the data are skewed or otherwise non-normally distributed, if the residual variance is not constant, or if effects (treatment differences) are not constant on the original scale of measurement. Note that it is the residuals obtained after fitting the regression or ANOVA which should be normally distributed, not the raw data. One approach is based on transformation of the data. Commonly used transformations include the square root, for data in the form of counts, and the arcsine, for binomial proportions. If the variance seems to vary as the square of the mean, a logarithmic transformation may be useful. Standard errors are not constant for comparisons on the original scale, so estimates, standard errors, and test results are best given on the transformed scale. The equivalent effects on the original scale are most easily given in the form of confidence intervals, obtained by back-transformation of the end-points.

An alternative to analysis on a transformed scale is to fit a generalized linear model (Dobson, 2002). This allows the problems of variance heterogeneity and non-additivity of effects to be tackled separately. The two approaches usually produce similar results.
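A back-transformed confidence interval might be computed as follows. The data are invented, and the t quantile is taken from tables for 5 degrees of freedom; note that exp of the mean of the logs is the geometric, not arithmetic, mean.

```python
import math
from statistics import mean, stdev

# Invented right-skewed sample, e.g. a concentration in six animals
data = [1.5, 2.2, 3.1, 4.8, 7.3, 10.9]

logs = [math.log(x) for x in data]
n = len(logs)
m = mean(logs)
se = stdev(logs) / math.sqrt(n)   # standard error on the log scale

t = 2.571  # t quantile, 0.975, 5 df (from tables)
lower, upper = m - t * se, m + t * se

# Back-transform the end-points to the original scale
print(f"{math.exp(lower):.2f} to {math.exp(upper):.2f},"
      f" geometric mean {math.exp(m):.2f}")
```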


Do not confuse the standard deviation with the standard error. The standard deviation is a measure of variability in a sample or population; the standard error is a measure of the precision of an estimate.

Multiple comparison procedures, such as Duncan's multiple range test, Scheffé's test, or the Bonferroni adjustment of P-values, are relevant when a hypothesis is to be treated as one of a larger set, or when it has been selected in the light of the data. However, a well-designed experiment sets out to test a small number of clearly stated hypotheses. The treatments are usually structured, e.g. a combination of factors each with a small number of qualitative levels, or quantitative levels varying on a continuous scale. In this case comparisons are determined a priori and there is no need for multiple comparison tests.
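When comparisons genuinely are selected post hoc, the Bonferroni adjustment is the simplest of these corrections: each P-value is multiplied by the number of tests, capped at 1. A minimal sketch with invented P-values:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: multiply each by the number
    of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three post-hoc comparisons; only the first survives adjustment at 0.05
print(bonferroni([0.01, 0.04, 0.20]))
```

The adjustment is conservative, which is one more reason to prefer a small set of a priori comparisons where the design allows it.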

Statistical significance should not be confused with practical, or biological, significance. A small real effect, of no practical importance, may be statistically significant in a very large sample. A non-significant result does not demonstrate that there is no effect. It means that the data are compatible with there being no effect, and in small samples, this can happen even when the real effect is large.

Computer-intensive methods, such as Gibbs sampling (Gilks et al., 1996, Chapters 1 and 2) and bootstrapping (Davison and Hinkley, 1997), can be useful in providing standard errors, confidence intervals, or significance tests in non-standard problems when conventional methods fail. However, these methods are easy to misuse; use them with caution.


The statistical report should omit extraneous detail, but be informative enough to allow the reader to make independent judgments wherever possible. For example, in a regression analysis, plotting the raw data together with the regression line allows the reader to assess the need for transformation or the existence of outliers. Giving standard errors and degrees of freedom with estimates allows readers to choose their own significance levels for hypothesis tests or confidence intervals.

Do not over-use hypothesis tests. If an estimate is several times greater than its standard error, or is small relative to the standard error, a test may be superfluous. An estimate with standard error, or confidence interval, is often more useful and gives more information.

There are many statistical software packages, each with its own strengths and weaknesses. Do not assume that everyone is familiar with your chosen statistical package. In particular, avoid terminology which is specific to a particular brand of software. For example, terms such as 'least-squares mean' and 'type III sum of squares' may not be generally understood.

Observations which seem to be inconsistent with the main body of data should not be excluded from the analysis without good reason. If in doubt, it may be useful to analyze the data both with and without an anomalous value to assess the sensitivity of the analysis to its presence. In any case, omission of outlying values should be reported. More generally, any shortcomings in design or analysis should be reported with an indication of the possible effect on the results.

Round values to a reasonable number of decimal places. For example, an estimate given as 5.3125 with standard error 1.7082 has too many decimal places, and might as well be given as 5.3 with standard error 1.71. The standard error is usually given with one more decimal than the estimate. It is rarely necessary to have more than 2 or 3 significant digits.


Cox, D.R. and Solomon, P.J. (2002). Components of Variance. Chapman and Hall/CRC, Boca Raton, Florida.

Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press.

Dobson, A.J. (2002). An Introduction to Generalized Linear Models. Chapman and Hall, London.

Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, London.

Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer Associates.





Copyright [2008] [EAAP, INRA, BSAS]