10. Statistical considerations
This chapter presents a simplified guide to statistical techniques useful in handling the data that result from protein quality experiments. Note that formulae are usually given in two forms: the first is the conceptual form that indicates the derivation, and the second is the computational form. Explicit formulae are not given for the techniques of analysis of variance and analysis of covariance because of their complexity. It is assumed throughout that errors are independent and normally distributed. Further details and discussion can be found in standard statistical texts.
Biological experiments and observations can never be precisely replicated, in the sense that the same results cannot occur twice. Not only are there always uncontrolled and uncontrollable factors, but there is always inherent biological variability that contributes to the variability of the data.
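Data consisting of n observations on a single variable, x, can usually be best summarized by the mean
$$\bar{x} = \frac{\sum x}{n}$$
Further, the standard deviation
$$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$$
gives information on how variable the data are, and can be used to calculate the standard error of the mean
$$s_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$
which indicates how well the mean has been determined.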
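As a minimal sketch of these summary statistics, the following Python function computes the mean, the standard deviation (with the n - 1 divisor), and the standard error of the mean. The function name `summarize` and the data are hypothetical and serve only as an illustration.

```python
import math

def summarize(xs):
    """Return (mean, standard deviation, standard error of the mean)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    # Conceptual form: sum of squared deviations from the mean.
    ss = sum((x - mean) ** 2 for x in xs)
    sd = math.sqrt(ss / (n - 1))
    sem = sd / math.sqrt(n)
    return mean, sd, sem

# Hypothetical observations, for illustration only.
mean, sd, sem = summarize([2.0, 4.0, 4.0, 6.0])
print(mean, sd, sem)
```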
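For pairs of variables, x and y, the correlation coefficient can be calculated to measure how strongly the variables are related:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}$$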
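The correlation coefficient can be sketched in Python using the computational form (sums of products corrected for the means). The function name `correlation` and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient via the computational form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sums of squares and products.
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations.
r = correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
print(r)
```

The computational form can lose precision when the means are large relative to the spread of the data; in that case the conceptual form, based on deviations from the means, may be the safer choice.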
If the two variables can be assumed to be functionally related (with x an independent variable and y a dependent variable), the parameters of the function can be estimated, along with their standard errors, by regression analysis. The most common procedure is that of linear regression where the variables (or transformations of them, such as the log or square root) are assumed to be related by a straight line:
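$$y = a + b\,(x - \bar{x})$$
In this case, the data can be used to determine estimates of the parameters
$$a = \bar{y}, \qquad b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}$$
and of their standard errors
$$s_a = \frac{s_{y \cdot x}}{\sqrt{n}}, \qquad s_b = \frac{s_{y \cdot x}}{\sqrt{\sum (x - \bar{x})^2}}$$
where
$$s_{y \cdot x} = \sqrt{\frac{\sum \left[y - a - b\,(x - \bar{x})\right]^2}{n - 2}}$$
is the estimate of the error inherent in the data.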
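A rough Python sketch of this linear regression follows. It fits the line in the centered form y = a + b(x - mean of x), so that a is the mean of y, and it estimates the error about the line with n - 2 degrees of freedom, which is the usual convention and is assumed here. The function name `fit_centered_line`, the dictionary keys, and the data are hypothetical.

```python
import math

def fit_centered_line(xs, ys):
    """Least-squares fit of y = a + b * (x - x_bar), with standard errors."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = y_bar            # intercept of the centered line
    b = sxy / sxx        # slope
    # Residual standard deviation about the fitted line (n - 2 df).
    rss = sum((y - a - b * (x - x_bar)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))
    return {
        "a": a, "b": b, "x_bar": x_bar, "n": n, "sxx": sxx,
        "s": s, "s_a": s / math.sqrt(n), "s_b": s / math.sqrt(sxx),
    }

# Hypothetical data, for illustration only.
fit = fit_centered_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])
print(fit)
```

Centering x on its mean makes the estimates of a and b uncorrelated, which is what allows their standard errors to be combined simply in later calculations.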
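These parameter estimates and their standard errors, $s_a$ and $s_b$, can be combined to give an estimate of the dependent variable y at any value $x_0$ of x, and of the error implicit in that estimate:
$$\hat{y}_0 = a + b\,(x_0 - \bar{x}), \qquad s_{\hat{y}_0} = \sqrt{s_a^2 + s_b^2\,(x_0 - \bar{x})^2}$$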
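The combination of the parameter estimates and their standard errors can be sketched as below. The sketch assumes the centered form of the line, y = a + b(x - mean of x), for which the estimates of a and b are uncorrelated and their variances simply add. The function name `predict_y` and the numerical inputs are hypothetical.

```python
import math

def predict_y(a, b, x_bar, s_a, s_b, x0):
    """Estimate y at x0 on the centered line, with its standard error."""
    y_hat = a + b * (x0 - x_bar)
    # Variances of a and b add because the centered estimates are uncorrelated.
    se = math.sqrt(s_a ** 2 + (s_b * (x0 - x_bar)) ** 2)
    return y_hat, se

# Hypothetical parameter estimates and standard errors.
y_hat, se = predict_y(a=5.0, b=2.2, x_bar=2.5, s_a=0.5, s_b=0.4, x0=3.5)
print(y_hat, se)
```

The standard error is smallest at the mean of x and grows as x0 moves away from it, which is why extrapolation beyond the range of the data is less reliable.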
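Moreover, an estimate and a confidence interval of the independent variable $x_0$, given an observation $y_0$ on the dependent variable, can be derived. The estimate is
$$\hat{x}_0 = \bar{x} + \frac{y_0 - a}{b}$$
and the 95% confidence limits are
$$\bar{x} + \frac{1}{1 - c^2}\left[\frac{y_0 - a}{b} \pm \frac{t_{.975}\, s_{y \cdot x}}{b}\sqrt{(1 - c^2)\left(1 + \frac{1}{n}\right) + \frac{(y_0 - a)^2}{b^2 \sum (x - \bar{x})^2}}\,\right]$$
where $c = t_{.975}\, s_b / b$, $s_b$ is the standard error of the slope, $s_{y \cdot x}$ is the standard deviation of the observations about the fitted line, and $t_{.975}$ is the upper 97.5% point of the t distribution with n - 2 degrees of freedom.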
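The following Python sketch illustrates one common way to compute such an estimate and interval for x from a single new observation y0, using the form of the limits found in standard texts. It should be read as an illustration under stated assumptions, not as the exact formula intended in this chapter. The function name `inverse_predict` and all inputs are hypothetical; the critical value `t_crit` must be looked up in a t table for n - 2 degrees of freedom.

```python
import math

def inverse_predict(y0, a, b, x_bar, s, sxx, n, t_crit):
    """Estimate x from y0 on the line y = a + b * (x - x_bar).

    s is the residual standard deviation, sxx the sum of squared
    deviations of x from its mean, t_crit the tabulated t value.
    Returns (estimate, lower limit, upper limit).
    """
    s_b = s / math.sqrt(sxx)
    c2 = (t_crit * s_b / b) ** 2
    if c2 >= 1.0:
        # Slope not significantly different from zero: no finite interval.
        raise ValueError("c^2 >= 1, confidence interval is unbounded")
    d_hat = (y0 - a) / b  # estimated distance of x from x_bar
    half = (t_crit * s / abs(b)) * math.sqrt(
        (1.0 - c2) * (1.0 + 1.0 / n) + d_hat ** 2 / sxx
    )
    lower = x_bar + (d_hat - half) / (1.0 - c2)
    upper = x_bar + (d_hat + half) / (1.0 - c2)
    return x_bar + d_hat, lower, upper

# Hypothetical inputs; 4.303 is the tabulated t(.975) for 2 degrees of freedom.
x_hat, lower, upper = inverse_predict(
    y0=6.0, a=5.0, b=2.2, x_bar=2.5, s=math.sqrt(0.9), sxx=5.0, n=4, t_crit=4.303
)
print(x_hat, lower, upper)
```

Note that the interval is generally not symmetric about the estimate, and that it becomes very wide as c approaches 1, that is, when the slope is poorly determined.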
Frequently the ratio of two estimated quantities is of interest:
R = p/q
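The standard error of this ratio can be approximated from the standard errors, $s_p$ and $s_q$, of the quantities involved:
$$s_R \approx R\,\sqrt{\left(\frac{s_p}{p}\right)^2 + \left(\frac{s_q}{q}\right)^2}$$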
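A minimal Python sketch of a commonly used approximation for the standard error of a ratio is shown below: the relative errors of the two quantities are combined in quadrature. It assumes that p and q are estimated independently and that neither is close to zero. The function name `ratio_with_se` and the inputs are hypothetical.

```python
import math

def ratio_with_se(p, s_p, q, s_q):
    """Return R = p / q and its approximate standard error.

    Assumes p and q are independent estimates with standard
    errors s_p and s_q, and that neither p nor q is zero.
    """
    r = p / q
    se = abs(r) * math.sqrt((s_p / p) ** 2 + (s_q / q) ** 2)
    return r, se

# Hypothetical estimates and standard errors.
ratio, se_ratio = ratio_with_se(p=10.0, s_p=1.0, q=5.0, s_q=0.5)
print(ratio, se_ratio)
```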
A set of n observations on a single variable in a single population can be used to compare that population to a reference standard mean, $\mu$, by the one-sample t-test. The test statistic
$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s_x / \sqrt{n}}$$
is calculated and its absolute value compared to the appropriate critical value of the t distribution for n - 1 degrees of freedom.
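The one-sample t-test can be sketched in Python as follows. The function returns the statistic and its degrees of freedom; the critical value still has to be taken from a t table. The function name `one_sample_t` and the data are hypothetical.

```python
import math

def one_sample_t(xs, mu):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical observations compared with a reference mean of 3.0.
t_one, df_one = one_sample_t([2.0, 4.0, 4.0, 6.0], mu=3.0)
print(t_one, df_one)
```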
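Two populations can be compared in terms of a single variable by the two-sample t-test (with $n_1$ observations $x_1$ and $n_2$ observations $x_2$):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad s^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$
This statistic has $n_1 + n_2 - 2$ degrees of freedom.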
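The two-sample t-test with n1 + n2 - 2 degrees of freedom is the pooled-variance form, which assumes the two populations have equal variances. A minimal Python sketch under that assumption follows; the function name `two_sample_t` and the data are hypothetical.

```python
import math

def two_sample_t(x1, x2):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df  # common variance estimate
    t = (m1 - m2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, df

# Hypothetical samples from two populations.
t_two, df_two = two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t_two, df_two)
```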
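If the measurements are on the same individuals, e.g., before and after a treatment, then the paired t-test is appropriate, with $d = x_1 - x_2$:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where $\bar{d}$ and $s_d$ are the mean and standard deviation of the n differences. This has n - 1 degrees of freedom.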
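The paired t-test amounts to a one-sample t-test on the differences, tested against zero. A short Python sketch is given below; the function name `paired_t` and the before/after values are hypothetical.

```python
import math

def paired_t(x1, x2):
    """Paired t statistic on d = x1 - x2, with n - 1 degrees of freedom."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    ds = [u - v for u, v in zip(x1, x2)]
    n = len(ds)
    d_bar = sum(ds) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in ds) / (n - 1))
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical measurements on the same four individuals.
before = [10.0, 12.0, 9.0, 11.0]
after = [8.0, 11.0, 7.0, 10.0]
t_pair, df_pair = paired_t(before, after)
print(t_pair, df_pair)
```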
For the comparison of several populations, the techniques of analysis of variance are appropriate. If analysis of variance shows that there are differences between the populations, then individual pairs of populations can be compared using Tukey's or Scheffe's tests.
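If it is of interest to compare two correlation coefficients, $r_1$ and $r_2$ (based on $n_1$ and $n_2$ pairs of observations), they can be transformed into approximately normally distributed variables by
$$z = \tfrac{1}{2} \ln\frac{1 + r}{1 - r}$$
and the difference tested with
$$\frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}$$
This statistic can be compared with tabulated $t_\infty$ values, i.e., with the standard normal distribution. The slopes of two regression lines can be compared using the two-sample t-test; however, if the whole regression line is of interest, or if more than two need to be compared, then the techniques of analysis of covariance are appropriate.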
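The comparison of two correlation coefficients mentioned above is commonly done with Fisher's z transformation, and a minimal Python sketch of that approach follows. It assumes two independent samples, each with more than three pairs of observations. The function name `compare_correlations` and the inputs are hypothetical.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Statistic for comparing two independent correlation coefficients.

    Each r is transformed to z = 0.5 * ln((1 + r) / (1 - r)); the
    difference is divided by its approximate standard error and can
    be referred to the standard normal distribution.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than three pairs")
    z1 = 0.5 * math.log((1.0 + r1) / (1.0 - r1))
    z2 = 0.5 * math.log((1.0 + r2) / (1.0 - r2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical correlations from two samples of 28 pairs each.
z_stat = compare_correlations(0.8, 28, 0.5, 28)
print(z_stat)
```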
From a statistical point of view, experiments need to be designed to (a) measure as directly as possible the effects of interest, and (b) measure those effects so that they can be estimated with as little error as possible. In general, this means that all factors that might affect the variables of interest need to be controlled, or at least measured as covariates, and that sample size needs to be as large as possible (since $s_{\bar{x}} = s_x / \sqrt{n}$). Because all the factors involved are often not apparent, and practical considerations limit sample size, a pilot experiment is usually appropriate. This will allow the investigator to sort out the factors and to determine the approximate size of the effect being sought and of the errors involved. From these considerations, the full experiment can be designed.
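Because the standard error of the mean falls only with the square root of the sample size, a standard deviation from a pilot experiment gives a rough idea of how many observations the full experiment needs. The sketch below simply rearranges that relation to n = (s / target)^2. This rearrangement, the function name `required_n`, and the numbers are illustrative assumptions and not a substitute for a formal power calculation.

```python
import math

def required_n(pilot_sd, target_sem):
    """Smallest n for which pilot_sd / sqrt(n) is at most target_sem."""
    if pilot_sd <= 0 or target_sem <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil((pilot_sd / target_sem) ** 2)

# Hypothetical pilot result: sd of 2.0, desired standard error of 0.5.
n_needed = required_n(2.0, 0.5)
print(n_needed)
```

Halving the target standard error quadruples the required sample size, which is why practical considerations so often limit the precision that can be achieved.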