The validity of a research design is a measure of the degree to which its conclusions reflect real phenomena.
"Internal validity" of a design refers to the extent to which the detected outcome changes can be attributed to the intervention or treatment rather than to other causes. Unless the internal validity of a design is high, the finding that a particular relationship is causal will not be particularly convincing. Some of the major threats to the internal validity (i.e. confounding factors) are summarized in table 1.2. The primary reason for the proper choice of a comparison group and for statistical adjustment techniques is to control for these threats as best as possible when randomized allocation is not feasible. The expression "gross outcome" refers to a measured change in the outcome variable in the population without controlling for the threats to internal validity. Gross outcome does not eliminate the effects of confounding variables and therefore does not enable the evaluator to distinguish between change that occurred as a result of the programme and change that would have occurred anyway because of other factors. "Net outcome," however, does explicitly address those factors, other than the programme, that bring about measured changes in outcome variables. Net outcomes thus control for the numerous threats to internal validity.
In addition to internal validity, evaluators must be concerned with the external validity of the evaluation. External validity refers to the generalizability of the conclusions drawn to other populations, settings, and circumstances. Both the internal and the external validity of an evaluation are fundamentally functions of the design chosen. Simply put, each of the commonly employed designs for evaluation displays a different ability to control for threats to internal and external validity.
While it is beyond the scope of this chapter to discuss the conventional non-experimental, quasi-experimental, and true experimental designs and the extent to which they address confounding variables, table 1.3 diagrams six conventional designs. The reader who is unfamiliar with or uncertain about the array of available designs is urged to consult standard works on this subject, such as those by Cook and Campbell (10), Poister (11), or Judd and Kenny (12). Let it suffice to suggest that choosing among the conventional techniques involves a trade-off between the difficulties of data collection (first, on comparison groups; second, over time) and the plausibility of the causal inference drawn. The choice also depends to some extent on the analytical capacity available, as discussed in the next section.
TABLE 1.3. Conventional Evaluation Designs*

| Design | Referred to as | Analysis | Delivers |
| --- | --- | --- | --- |
| 1. X O | One-shot case study | None | Adequacy |
| 2. O X O | One-group pre-test/post-test | Compare before/after | Adequacy |
| 3. Group 1: X O; Group 2: O | Static group comparison | Compare groups | Adequacy |
| 4. X (varies) O | Correlational | (a) Compare sub-groups; (b) correlate treatment levels with outcome, controlling for those measured confounding variables which are not themselves highly correlated with treatment | Adequacy; some inference on net outcome |
| 5. Group 1: O X O; Group 2: O O | Non-equivalent control group design | Compare groups with statistical control for confounding | More plausible inferences on net outcome |
| 6. O O O X O O O | Interrupted time series | Before/after; time-series analysis | |

* X = treatment. For designs 1-3, O = observation of outcome; for designs 4-6, O = observation of both outcome and confounding variables.
Decisions on levels of analysis to be used are important because:
- the necessary skills for complex analyses may be lacking in developing countries;
- interpretation can be improved by advanced analyses, at least in terms of plausibility of conclusions;
- time and cost depend heavily on the extent of analysis.
More advanced analyses have, on occasion, modified conclusions, and often this has been clarifying. In some cases the clarification of further analysis avoided wrong conclusions that could, in fact, also have been detected with a commonsense look at the data. For example, introducing socio-economic status into an analysis of the Narangwal experiment actually reversed the apparent direction of effect of the programme (13). Probably, however, this conclusion could have been reached simply by dividing the sample into two or more socio-economic groups. On the other hand, unresolved differences in conclusions between investigators using different analytical techniques have sometimes occurred because the assumptions underlying the analyses were not the same; often the investigators did not realize this discrepancy themselves. This warns against too much reliance on highly sophisticated techniques, particularly in developing countries, where advanced analyses may not be widely feasible. Certainly, efforts should be made to seek the simplest analytical procedures, and this starts with the design of the evaluation.
We distinguish between "basic" and "advanced" analyses. Basic analysis refers to: categorical data analysis, for comparison of frequencies (e.g. prevalences) between groups; correlation analysis, for investigating the degree of association between two variables (e.g. whether prevalence of malnutrition is correlated with a possible determinant); and analysis of variance, used to determine whether differences exist between mean values of indicators for a number of groups. The methods of advanced analysis reckoned to be most suitable for the problems we are interested in are the methods of multivariable analysis (e.g. ordinary least squares regression analysis, discriminant analysis, logit analysis, probit analysis, etc.) for investigating associations between outcome and a number of possible determinants, in this case obviously including programme delivery.
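As an illustration of what these basic analyses involve in practice, the following is a minimal sketch in Python; all data are invented for the example, and any standard statistical package would serve equally well:

```python
# A minimal sketch of the three "basic" analyses, using illustrative
# (invented) data; scipy is assumed to be available.
from scipy import stats

# 1. Categorical data analysis: compare prevalence of malnutrition
#    between programme and comparison groups (chi-square test).
#    Rows: group; columns: malnourished / not malnourished.
table = [[30, 70],   # programme group
         [45, 55]]   # comparison group
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# 2. Correlation analysis: association between a possible determinant
#    (e.g. household income) and an outcome (e.g. weight-for-age z-score).
income = [12, 20, 35, 40, 55, 60, 80]
waz    = [-2.1, -1.8, -1.2, -1.0, -0.4, -0.6, 0.1]
r, p = stats.pearsonr(income, waz)
print(f"r = {r:.2f}, p = {p:.3f}")

# 3. Analysis of variance: do mean indicator values differ between
#    several groups (e.g. three levels of programme delivery)?
f, p = stats.f_oneway([-1.9, -1.5, -1.2], [-1.1, -0.8, -0.6], [-0.5, -0.2, 0.0])
print(f"F = {f:.2f}, p = {p:.3f}")
```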
In deciding the overall plan of an evaluation, a balance needs to be struck between design, extent of data collection, level of analysis, and plausibility or certainty required of the conclusions. To some extent, good design requires less sophisticated analysis: for example, designs with adequate control groups, or before-after data, may require less investment in both data collection and analysis than an uncontrolled (by design) post-programme correlational analysis. The appropriate analyses by design are indicated in the third column of table 1.3.
When the capacity for advanced statistical analysis is not available, as may frequently be the case, particularly in poor countries, much can still be achieved by commonsense treatment of the data and by comparison of suitably defined groups. Indeed, even when more advanced techniques are used, it is important to be clear conceptually about which groups are being considered. For example, socio-economic status and/or sanitary conditions are very often primary determinants of differences in outcome variables such as nutritional status or health, and these factors can confound conclusions on programme effects. Both can be measured, socio-economic status for example by income, quality of housing, etc. Analyses are then done by suitable groupings.
If programme delivery varies, even if there is no non-programme group as such, tabulation of results as in table 1.4 can be informative and valid. The interpretation of the different patterns could be as follows. Example 1, in which the only group with poor nutritional status is that with low socio-economic status and poor programme delivery, tends to indicate that the programme is having an effect; the conclusion is possibly that delivery should be improved for the low socio-economic group. Example 2 indicates that socio-economic factors account for most of the difference in nutritional status, and that more detailed examination of whether the programme can have an effect is needed. Example 3 indicates that the programme is related to most of the differences in nutritional status; it also indicates that more efficient delivery is required, because those not receiving the programme could benefit from it. Additional confounding variables such as sanitation could be added to such a table, although numbers per cell would decrease. Moreover, information may be lost by categorizing socio-economic status in this way if it can be measured as a continuous variable.
TABLE 1.4. Comparisons of Outcomes for Different Levels of Programme Delivery and Socio-economic Status

| Example | High socio-economic status, high delivery | High socio-economic status, low delivery | Low socio-economic status, high delivery | Low socio-economic status, low delivery |
| --- | --- | --- | --- | --- |
| 1 | + | + | + | - |
| 2 | + | + | - | - |
| 3 | + | - | + | - |

+ means satisfactory outcome indicator values (e.g. good nutritional status); - means poor outcome indicator values (e.g. poor nutritional status).
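Where individual records are available, a table like 1.4 is simply a cross-tabulation of outcome by socio-economic status and delivery. A minimal sketch, assuming a pandas DataFrame whose column names and data are invented for illustration:

```python
# A minimal sketch of building a table like 1.4 from individual records.
# Column names (ses, delivery, outcome_ok) are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "ses":      ["high", "high", "low", "low", "low", "high"],
    "delivery": ["high", "low",  "high", "low", "low", "high"],
    # True = satisfactory outcome (e.g. good nutritional status)
    "outcome_ok": [True, True, True, False, False, True],
})

# Proportion with a satisfactory outcome in each SES x delivery cell;
# a cell reads "+" in table 1.4 terms when most outcomes are good.
cell_means = df.groupby(["ses", "delivery"])["outcome_ok"].mean()
print(cell_means.unstack("delivery"))
```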
To combine several variables and make the most use of the available information, multiple regression techniques are often applied. For evaluation, the outcome (nutritional status) is the dependent variable, and programme delivery is treated as one independent variable along with other determinants (confounding variables) such as, in this example, socio-economic status and sanitation. The purpose is then to examine the significance (in the statistical sense) and the importance (the magnitude of the effect) of programme delivery when other determinants are allowed for. It must be emphasized, however, that when the substantial computing power required for multiple regression is not available, tabulations by group, as in table 1.4, can still give important results.
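A minimal sketch of the regression just described, with outcome regressed on programme delivery plus confounders; the variable names and data are illustrative assumptions, not drawn from any study cited here:

```python
# Outcome (nutritional status) regressed on programme delivery,
# socio-economic status, and sanitation; all values are invented.
import numpy as np

# One row per child: delivery level, socio-economic score, sanitation score.
delivery   = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
ses        = np.array([2, 3, 1, 1, 2, 1, 3, 2], dtype=float)
sanitation = np.array([1, 1, 0, 1, 0, 0, 1, 0], dtype=float)
# Outcome: weight-for-age z-score.
waz = np.array([-0.8, -1.0, -1.5, -1.1, -2.0, -2.2, -0.5, -1.9])

# Design matrix with an intercept column; ordinary least squares fit.
X = np.column_stack([np.ones_like(delivery), delivery, ses, sanitation])
coef, *_ = np.linalg.lstsq(X, waz, rcond=None)
print(f"estimated effect of delivery, net of confounders: {coef[1]:.2f}")
```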
It may even be possible to derive some conclusions where there is no difference in delivery but differences in socio-economic status still exist. The possibilities are given in table 1.5. Here, example 1 indicates that the programme is having an inadequate effect and should be further examined. Example 2 indicates that the programme may be having an adequate effect, although it is also possible that socio-economic status simply does not account for any differences. Example 3 indicates that the programme is having no effect and should be further examined or discontinued. Example 4 is, in practice, unlikely to occur. Such tabulations give useful insights into programme adequacy, and also raise questions on targeting and delivery, as discussed in the next section.
TABLE 1.5. Comparisons of Outcomes for Different Levels of Socio-economic Status Where Programme Delivery Does Not Vary

| Example | High socio-economic status | Low socio-economic status |
| --- | --- | --- |
| 1 | + | - |
| 2 | + | + |
| 3 | - | - |
| 4 | - | + |
+ and - as in table 1.4.
Both for planning and for evaluation, it is important to distinguish between different population groups. The main groups of concern are the population as a whole, the target group, the "needy" (e.g. the malnourished), and the programme recipients.

If programme staff have contact with the recipients, obtaining data on them may be relatively easy. This would be the case in the example of a feeding programme, but perhaps not in, say, a water-supply project. If outcome data are available from recipients, these can, to a limited extent, substitute for survey data on the population as a whole.
The distinction between population groups allows construction of a series of 2 x 2 tables that lead to some important indicators for planning and evaluating targeting, as shown in figure 1.1 (FIG. 1.1. Construction of 2 x 2 Tables Quantifying Target Groups, "Needy," and Programme Recipients. A: planning (pre-programme). B and C: evaluation during programme. When delivery is exactly as targeted, recipients = targeted, and table C is exactly like table A.). In planning, the two important indicators are: a. the proportion of total targeted who are needy (needy targeted/total targeted), which indicates the degree of "planned focusing" of the programme towards nutrition; b. the proportion of total needy who are targeted (needy targeted/total needy), which reflects the "planned coverage" of the programme.
The concepts of coverage and focusing have commonsense meanings, both for planning and for evaluation. "Coverage," a basic value that needs to be manipulated for different programme designs, is equivalent to sensitivity in the epidemiological literature (14); evidently the aim is to optimize coverage. "Focusing," which is equivalent to positive predictive value (14), is a less familiar concept. If targeting is to focus resources, focusing should be at least greater than the prevalence of need in the population as a whole. That is, the proportion of needy in the targeted population should be greater than the proportion of needy in the population as a whole; the same could apply, but is to our knowledge seldom done, for any evaluation of "poverty orientation." There are a number of procedures for choosing appropriate indicators and their screening levels to identify the needy, and for efficiently deciding on cut-off points to define need (see discussion in [14]).
For evaluation, the delivery is compared with the targeting and with degree of need, in order to generate further indicators, as shown in figure 1.1. This requires determining whether the recipients were in fact targeted and whether they are needy (e.g. malnourished).
An intermediate stage comparing targeted with recipients (part B of fig. 1.1) gives indicators of delivery, e.g.: c. the proportion of total targeted who are recipients, which should be 100 per cent if the programme is fully implemented; and of leakage, e.g.: d. the proportion of total recipients who are targeted or, conversely, the proportion of total recipients who are not targeted. These should be 100 per cent and 0 per cent respectively if there is no leakage to non-targeted groups.
If there is full implementation and no leakage, then the "actual focusing" and "actual coverage" are the same as those planned (see part C of fig. 1.1). If there is deviation from the plan, one way of assessing this is to calculate these "actual" indicators, comparing "needy" with "recipients." Again, actual focusing (needy recipients/all recipients) should be at least greater than the population prevalence of the needy. For example, if the prevalence of malnutrition in the region served by the programme is 35 per cent, and the actual focusing is 20 per cent, the evaluator is alerted to a serious problem. Even without knowledge of costs, such indicators give a useful means of evaluating process; with costs, as discussed in the next section, they could lead to decisions as to whether the programme is within the range likely to give an adequate or acceptable outcome, even if the expected effects on recipients were achieved. A worked example is given in Mason et al. (15, chap. 4).
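The indicators a-d are simple ratios of cell counts from the 2 x 2 tables of figure 1.1. A minimal sketch, with all counts invented for illustration:

```python
# Targeting indicators built from the 2 x 2 tables of figure 1.1.
# All counts below are invented for illustration.
needy_targeted      = 400   # needy AND targeted
total_targeted      = 600
total_needy         = 700
targeted_recipients = 500   # targeted AND actually receiving
total_recipients    = 650
needy_recipients    = 420   # needy AND actually receiving
population          = 2000

planned_focusing = needy_targeted / total_targeted       # (a): like PPV
planned_coverage = needy_targeted / total_needy          # (b): like sensitivity
implementation   = targeted_recipients / total_targeted  # (c): 100% if fully implemented
leakage          = 1 - targeted_recipients / total_recipients  # (d): recipients not targeted
actual_focusing  = needy_recipients / total_recipients

# Focusing should exceed the population prevalence of need; otherwise
# the programme is not concentrating resources on the needy.
prevalence = total_needy / population
print(f"planned focusing {planned_focusing:.0%} vs prevalence {prevalence:.0%}")
print(f"planned coverage {planned_coverage:.0%}, implementation {implementation:.0%}, "
      f"leakage {leakage:.0%}, actual focusing {actual_focusing:.0%}")
```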
If data on needy recipients and targeted populations are available from baseline studies, then some conclusions can also be drawn on outcome during programme implementation based only on outcome data on the recipients. This is so if the assumption can be made that the change in outcome variables in the population at large is likely to be small compared with that in recipients, and if baseline (pre-programme) data are available. In this case the need for population surveys for evaluation is reduced. This argument is also given in Mason et al. (15, chap. 4).
Cost-benefit and cost-effectiveness analyses are commonly used for assessing many types of programmes, both during planning and for evaluation. In the case of food and nutrition programmes, cost-effectiveness is the more suitable approach, since a monetary figure cannot reasonably be put on the outcome. This kind of analysis, however, is not often used, and a major advance in these evaluations could be made by much more systematic introduction of the techniques and thinking involved. These do not necessarily depend on accurate data; indeed, some form of cost-effectiveness thinking is implicit in the planning of almost any programme: the sense that some level of expenditure per unit of expected outcome would not be worth it is almost always in the back of someone's mind. We consider that the summary parameter of effect per unit cost (which goes to zero when there is no effect) is a useful start, and this is the one mainly discussed here.
A dose-response type of curve relating effects to cost is likely to apply to intervention programmes. This is familiar in economics (as in total product and utility curves, etc.), but not often considered for nutrition programmes. It means that the relationships shown in figure 1.2 (FIG. 1.2. Effect/Cost Curves (scale only for illustration). A: effect. B: effect/cost.) are likely to apply. Probably there are as yet insufficient data to put a scale on the x-axis, but some research on existing data might allow hypotheses to be put forward. In this hypothetical example, a cost per head of the target population of around $13 gives the maximum cost-effectiveness, calculated as the number of cases prevented per thousand dollars (fig. 1.2 B); but this rate of expenditure gives less than the maximum overall effect (fig. 1.2 A). The two curves are directly related: for example, at $10 per head expenditure, if 100 cases per thousand population are prevented (A), this is 100 cases per $10,000, or 10 cases per thousand dollars (B). The effect/cost in B for any value of cost per head is equal to the total effect, as read off in A, divided by the corresponding cost per head. Put another way, the height of the curve in B at any value of cost per head is the slope of the line joining the origin to the corresponding point on the curve in A.
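A minimal sketch of this relation, using an invented logistic dose-response curve for A; the functional form and numbers are illustrative assumptions only, chosen roughly to match the hypothetical example above:

```python
# The effect/cost curve (B in fig. 1.2) is the total-effect curve (A)
# divided by cost per head. The dose-response below is invented.
import numpy as np

cost_per_head = np.linspace(1, 40, 400)          # dollars per head
# Hypothetical curve A: cases prevented per 1,000 population.
effect = 200 / (1 + np.exp(-(cost_per_head - 9) / 3))

# Curve B: cases prevented per thousand dollars. For a population of
# 1,000, total cost in thousands of dollars equals cost_per_head, so
# B is simply effect / cost_per_head (cf. the $10-per-head example).
effect_per_1000_dollars = effect / cost_per_head

best = cost_per_head[np.argmax(effect_per_1000_dollars)]
print(f"cost-effectiveness is maximal near ${best:.0f} per head")
```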
One important advantage of such methods would be to allow assessment of whether the level of effort in a programme is at least in the range in which an outcome effect could be expected, taking account also of the level of malnutrition in the target group. It is our impression that often a programme could reasonably be expected to have little effect because the level of expenditure is too low relative to the expected dose-response. This idea has been referred to as "situation assessment" (see [5]).
Effects per unit cost may also be used to define the extent to which an accurate assessment of outcome is needed. For example (using relationships similar to those in figure 1.2), it might be postulated that a change from 20 per cent prevalence to 10 per cent prevalence after the treatment is the maximum feasible (e.g. from 200 malnourished in a population of 1,000 to 100 malnourished) at a cost of, say, $10 per head (i.e. $10,000 for the population of 1,000). This is equivalent to proposing an effect per unit cost of 10 cases prevented or rehabilitated per $1,000. Clearly, this should have been regarded as good value for money at the stage of planning the project. Similarly, no change would mean that the effect per cost was zero. Somewhere between these two, a level of change could be set below which the programme's resources were regarded as not being well spent, for reasons which could relate to targeting, type of activity, adequacy of delivery, etc. For example, rehabilitation of 5 cases per $1,000 could be regarded as the minimum acceptable effect/cost ratio. This means that the maximum acceptable post-programme prevalence is 15 per cent (i.e. a maximum of 150 malnourished in the population of 1,000). In this case, it is only necessary to know whether the with-programme prevalence is above or below the adequacy cut-off point of 15 per cent.
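The arithmetic of this example is easily checked; a minimal sketch using only the numbers given above:

```python
# Translating a minimum acceptable effect/cost ratio into a maximum
# acceptable post-programme prevalence, with the figures from the text.
population    = 1000
baseline_prev = 0.20                     # 200 malnourished before
cost_per_head = 10.0                     # dollars
total_cost    = population * cost_per_head            # $10,000

min_ratio = 5.0                          # minimum acceptable cases per $1,000
min_cases_prevented = min_ratio * total_cost / 1000   # 50 cases

max_prev = baseline_prev - min_cases_prevented / population
print(f"adequacy cut-off prevalence: {max_prev:.0%}")  # 15%
```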
So far we have not defined or commented on specific potential outcome indicators, and have used nutritional status as measured by anthropometry as the general example. This was in line with our brief for contributing to the MIT workshop and with our view that the major problems lie in designs of evaluation rather than in the measurements to be taken. In addition, most of the chapters that follow are devoted to a discussion of various outcome indicators.
Nevertheless, it should be pointed out that the relationship between indicators and objectives often needs to be clarified. Sometimes the indicator precisely measures the objective. A feeding programme aims to increase the weight gain in a target population of pre-school children, and this weight gain itself is measured. In this case, the responsiveness of the indicator is equivalent to the effectiveness of the programme.
In other circumstances the indicator is a proxy for the main objective: a feeding programme aims to increase the food intake of a target group of children, but the food intake itself is not open to measurement, so anthropometry is used as a proxy for food intake. In this case we need an indicator that responds to increased food intake: thus, for example, Habicht and Butz showed that height gain is more responsive than weight gain (in the statistical sense of greater significance) and is therefore a better proxy for food intake (4). Such relations between the indicator and the objective need to be established in advance.
There is an urgent need for research to establish the responsiveness characteristics of indicators. This should be done in a manner similar to table 1.6., where some relevant data were obtained to allow comparison of the responsiveness of different indicators. Although table 1.6. is unsatisfactory in that only a few indicators have been objectively evaluated, it serves to demonstrate the sort of evaluation of indicators that now needs to be undertaken much more widely to establish a firm basis for selection in the future.
Finally, the issue of sample size in relation to the choice of indicators merits careful consideration in attempts to evaluate the results of any administered treatment.
Investigators must define carefully the unit of reference for which the sample size is to be estimated, clearly differentiating "observational units" from the "unit of interest" for the evaluation. The latter, which is made up of a cluster of observational units, is the principal determinant of sample size. In other words, although information may be collected from individuals (observational units), the evaluation of effects may centre on aggregates of individuals who constitute, say, families.
Whatever the "unit of interest," the number of such units (sample size) should be estimated under pre-specified conditions of accepted risk of detecting an effect when in fact it does not exist, and of not detecting the effect when it does exist. In the procedures for the statistical testing of specific hypotheses relating to treatment effects, the relative frequency (probability) of occurrence of the first kind of error is used to define the level of significance for performing the test, while the frequency of non-occurrence of the second kind of error is used to define the power of the test (frequency of correct detection of effects).
Under these premises, and provided the investigator can supply a priori information on the magnitude of the minimum treatment effect worth identifying (the expected control-treatment difference), together with information on the variability (standard deviation) of the response under consideration, it is possible to estimate the approximate size of the sample required to detect the treatment effect (for a textbook treatment of this issue, see [9]).
TABLE 1.6. Mean Indicator Response to Supplementary Feeding

Field Trials

| Type of Malnutrition | Type of Analysis | Indicator | Age | Duration of Suppl. | Per cent Suppl. Diet | Deficit rel. to Std. | Response to Suppl. | Pooled SD | Responsiveness = 1/2 (Respon./SD)² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PEM | Suppl. vs. control | Attained wt. | 36 mo | 36 mo | 17% Cal, 35% Pro | 4.5 kg (Denver) | 0.9 kg | 1.3 kg | 0.24 |
| | | Ht. | 36 mo | 36 mo | 17% Cal, 36% Pro | | 2.3 cm | 3.9 cm | 0.17 |
| | | Arm circum. | 36 mo | 36 mo | 17% Cal, 36% Pro | | 0.35 cm | 0.9 cm | 0.06 |
| | | Triceps skinfold | 36 mo | 36 mo | 17% Cal, 36% Pro | | 0.15 mm | 1.1 mm | 0.01 |
| | | Subscapular skinfold | 36 mo | 36 mo | 17% Cal, 36% Pro | | 0 | 1.1 mm | 0 |
| Source: see (16) | | | | | | | | | |
| Vit. A | Pre & post intervention | Serum retinol | Pre-school | 1-2 yr | >100% vit. A req. | Std: 20 mcg/dl | 12.3% decline in prevalence of values <20 mcg/dl | 11.8 | 0.54 |
| Source: see (1) | | | | | | | | | |
| Iron deficiency anaemia | Intervention vs. control group | Hgb (g/dl) | 9 mo | 6 mo | 15 mg Fe + 100 mg ascorbic acid per 100 g full-fat milk powder | 1.21 | 1.07 g | 1.0 g | 0.57 |
| | | Sat. (%) | 9 mo | 6 mo | (as above) | 8.2 | 4.8% | 6.0% | 0.32 |
| | | FEP (mcg/dl RBC) | 9 mo | 6 mo | (as above) | 39 | 26 mcg | 33 mcg | 0.31 |
| | | % children with Hgb <11.0 g/dl | 9 mo | 6 mo | (as above) | | 27.2% | 2.3% | 63.9 |
| | | Hgb (g/dl) | 15 mo | 9 mo | (as above) | 0.92 | 1.02 g | 0.94 g | 0.59 |
| | | Sat. (%) | 15 mo | 9 mo | (as above) | 6.7 | 7.2% | 8.0% | 0.405 |
| | | FEP (mcg/dl RBC) | 15 mo | 9 mo | (as above) | 38 | 24 mcg | 41 mcg | 0.17 |
| | | % children with Hgb <11.0 g/dl | 15 mo | 9 mo | (as above) | | 25.2% | 2.0% | 85.8 |
| Source: E. Rios et al., "Prevention of Iron Deficiency in Infants by Milk Fortification," in B. Underwood, ed., Nutrition Intervention Strategies (forthcoming) | | | | | | | | | |

Clinical Trials

| Type of Malnutrition | Type of Analysis | Indicator | Age | Duration of Suppl. | Per cent Suppl. Diet | Deficit rel. to Std. | Response to Suppl. | Pooled SD | Responsiveness = 1/2 (Respon./SD)² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PEM | Response to protein supplement | VO2 max/kg (direct measure) | 33 ± 9.7 yr | 2.5 mo (80 d) | protein from 5.6% to 19.6% of calories | 20 | 9.7 | 5.14 | 1.89 |
| | | VO2 max, difference between means (L/min, direct measure) | 39 ± 9.7 yr | 2.5 mo (80 d) | protein from 5.6% to 19.6% of calories | 1.49* | 0.75 | 0.314 | 2.85 |
| | | Heart-rate response to a workload of 250 kgM/min | 39 ± 9.7 yr | 124 d | 2,240 ± 357 kcal/d, 100 g protein | 55* | 30 | 6.63 | 10.24 |
| Sources: Barac-Nieto et al., Am. J. Clin. Nutr., 33: 2268-2275 (1980); Maksud et al., Eur. J. Appl. Physiol., 35: 173-182 (1976); Spurr et al., Am. J. Clin. Nutr., 32: 767-778 (1979) | | | | | | | | | |
| PEM | Response to Rx, clinical group only | Serum prealbumin (mg/100 ml) | 18-30 mo | 22 days | Nido + Nesmida in increasing amounts (% not given) | All with clinical PEM | 1.8 mg | 2.79 | 1.64 |
| | | Serum albumin (g/100 ml) | 18-30 mo | 22 days | (as above) | (as above) | 1.5 g | 0.36 | 8.80 |
| Source: Ingenbleek et al., Lancet, ii: 106 (1972) | | | | | | | | | |
| PEM | Normal vs. pre & post PEM | Serum albumin | 18-30 mo | 22 days | to plateau of 3.5 g prot. and 150 kcal/kg BW/d | 52.8% of control | 1.48 g/100 ml | 0.38 | 0.18 |
| | | Thyroxine-binding prealbumin (TBA) | 18-30 mo | 22 days | (as above) | 28.5% of control | 15.93 mg/100 ml | 2.79 | 1.63 |
| | | Retinol-binding protein (RBP) | 18-30 mo | 22 days | (as above) | 31.9% of control | 3.79 mg/100 ml | 0.80 | 11.22 |
| | | Plasma retinol | 18-30 mo | 22 days | (as above) | 27.4% of control | 30.64 mcg/100 ml | 7.49 | 8.37 |
| Source: Ingenbleek et al., Clin. Chim. Acta, 63: 61 (1975) | | | | | | | | | |
| Kwashiorkor | Comparison of pre & post intervention | Third component of complement, C3 (mg/100 ml) | 6 mo - 6 yr | 2 wks | from 0.8 g prot., 88 kcal/kg/d to 3.5-4 g prot., 140 kcal/kg/d plus multivitamins | 34 mg | 22 mg | 4 mg | 15.13 |
| Source: Neumann et al., Am. J. Clin. Nutr., 28: 89-104 (1975) | | | | | | | | | |
| PEM | Comparison of pre & post intervention | % T-lymphocytes in blood | Children | 6-16 wks | "correction of deficit" | | 37% | 9.2% | 8.1 |
| Source: Chandra, Brit. Med. J., 3: 608-609 (1974) | | | | | | | | | |
| PEM | Comparison of pre & post intervention | % T-lymphocytes in blood | 1-5 yr | 50 days | from 1 g prot., 100 kcal/kg/d to 4 g prot., 175 kcal/kg/d | | 35.7% | 2.9% | 75.8 |
| Source: Kulapongs et al., in R.M. Suskind, ed., Malnutrition and the Immune Response (Raven Press, New York, 1977), 99-103 | | | | | | | | | |
| Iron deficiency anaemia | Comparison of pre & post intervention | Bactericidal capacity of PMN leucocytes | 1-8 yr | 1 dose | iron, parenterally | | 33 | 14 | 2.8 |
| Source: Chandra, Arch. Dis. Child., 48: 864-866 (1973) | | | | | | | | | |

* With respect to values in workers of the same region (normals).
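The responsiveness statistic used throughout table 1.6 is straightforward to compute once the response and pooled standard deviation are known; a minimal sketch, using values taken from a few of the field-trial rows:

```python
# Responsiveness = 1/2 * (response / pooled SD)^2, as defined in table 1.6.
def responsiveness(response, pooled_sd):
    return 0.5 * (response / pooled_sd) ** 2

rows = {
    "attained weight (kg)":  (0.9, 1.3),
    "height (cm)":           (2.3, 3.9),
    "triceps skinfold (mm)": (0.15, 1.1),
    "haemoglobin (g/dl)":    (1.07, 1.0),
}
for name, (resp, sd) in rows.items():
    print(f"{name}: {responsiveness(resp, sd):.2f}")
# Higher values identify indicators more likely to detect a real response.
```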
The procedure for estimating sample size for the comparison of two independent samples utilizes the relation:

n = 2K²σ²/d² (per group)

where:

- σ is an estimate of the standard deviation of the variable under consideration;
- d is the difference to be detected; this should be a fraction (f) of the responses shown in table 1.6, below which one is indifferent to whether there is a response;
- K² is a multiplier for different levels of significance and various associated powers of the test, as follows:
VALUES OF K²

| Power (%) | Two-tailed test, 1% significance | Two-tailed, 5% | Two-tailed, 10% | One-tailed test, 1% significance | One-tailed, 5% | One-tailed, 10% |
| --- | --- | --- | --- | --- | --- | --- |
| 80 | 11.7 | 7.9 | 6.2 | 10.0 | 6.2 | 4.5 |
| 90 | 14.9 | 10.5 | 8.6 | 13.0 | 8.6 | 6.6 |
| 95 | 17.8 | 13.0 | 10.8 | 15.8 | 10.8 | 8.6 |

Source: Snedecor and Cochran (17).
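A minimal sketch of this relation in Python, assuming the per-group formula n = 2K²(σ/d)² and computing K² from the normal distribution rather than reading it from the table (the values agree closely):

```python
# Sample size per group for comparing two independent means.
import math
from statistics import NormalDist

def k_squared(alpha, power, two_tailed=True):
    # K^2 = (z_alpha + z_beta)^2, where alpha is the significance level.
    z_alpha = NormalDist().inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2

def n_per_group(sigma, d, alpha=0.05, power=0.80, two_tailed=True):
    return math.ceil(2 * k_squared(alpha, power, two_tailed) * (sigma / d) ** 2)

# Example: detect a 0.9 kg weight response (cf. table 1.6) with SD 1.3 kg,
# at 5% two-tailed significance and 80% power.
print(round(k_squared(0.05, 0.80), 2))  # ~7.85, cf. 7.9 in the table
print(n_per_group(1.3, 0.9))            # ~33 children per group
```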
In closing, it should be restated that the above procedure is only an approximation and usually underestimates the required sample size; its indiscriminate application may lead to absurd answers. It is advisable, therefore, that the question of sample size always be considered in the context of each particular situation and with proper statistical consultation. Ultimately, the successful estimation of sample size is the result of experience that bridges the realms of art and science.
Arroyave, G., J.R. Aguilar, M. Flores and M.A. Guzman, "Evaluation of Sugar Fortification with Vitamin A at the National Level," Scientific Publication No. 384 (PAHO, Washington, D.C., 1979).

Beghin, I., and the FAO, "Selection of Specific Nutritional Components for Agricultural and Rural Development Projects" (Nutrition Unit, Institute of Tropical Medicine, Antwerp, Belgium, 1980) (mimeo).

Casley, D., and D. Lury, A Handbook on Monitoring and Evaluation of Agricultural and Rural Development Projects (Johns Hopkins Press, Baltimore, 1982).

Davis, C.E., "The Effect of Regression to the Mean in Epidemiologic and Clinical Studies," Am. J. Epidemiol., 104: 493-498 (1976).

Drake, W.D., R.I. Miller and M. Humphrey, "Final Report: Analysis of Community-Level Nutrition Programs," Project on Analysis of Community-Level Nutrition Programs, Vol. I (USAID, Office of Nutrition, Washington, D.C., 1980).

Furby, L., "Interpreting Regression Toward the Mean in Developmental Research," Developmental Psychology, 8: 172-179 (1973).

Gwatkin, D.R., J.R. Wilcox and J.D. Wray, "Can Health and Nutrition Interventions Make a Difference?" Overseas Development Council Monograph No. 13 (Overseas Development Council, Washington, D.C., 1980).

Saretsky, H., "The OEO P.C. Experiment and the John Henry Effect," Phi Delta Kappan, 53: 579-581 (1972).

Wray, J.D., "Malnutrition is a Problem of Ecology," Bibl. Nutr. Diet., 14: 142-160 (1970).

Wray, J.D., "Twenty Questions: A Checklist for Planning and Evaluating Nutrition Programmes for Young Children," in D.B. Jelliffe and E.F.P. Jelliffe, eds., Nutrition Programmes for Preschool Children (Institute of Public Health, Zagreb, 1973).