John B. Mason and J.-P. Habicht
The principles for the evaluation of ongoing programmes were set out in chapter 1. It was argued there that proper evaluation of such programmes should de-emphasize the researcher's concern with causality and prediction and seek to provide answers to questions likely to be of interest to managers, administrators, and funders. It was also pointed out that evaluation should proceed in a sequence of stages, so that easier and cheaper answers are found first, before embarking on more elaborate types of investigation which may prove to be either unnecessary or impossible. This chapter takes up this sequence of stages in the evaluation of ongoing programmes.
Since finding answers to all the possible questions from evaluation - for example, as given in the list on page 00 - is an extensive and costly task, it is necessary to delineate what is needed and what can be done with given resources. These considerations concern, on the one hand, the needs of managers, administrators or funders, and researchers, who require different degrees of certainty or plausibility for their answers and conclusions; and, on the other, the feasibility and cost of carrying out the various types of investigation needed to respond to these needs.
In practice, therefore, the outcome and its possible association with programme delivery should be examined during the later stages of the evaluation process. Before getting this far, a series of stages should be followed to determine whether it is necessary to examine outcome at all; and if so, what associations between programme delivery and effect should be looked for. The rest of this chapter, therefore, sets out a sequence of stages that starts with the less expensive investigations and only proceeds to examine whether cause-effect relations can be shown after studies on the programme design and its implementation have shown these effects to be likely to exist. These stages are as follows:
Stage 1: Preliminary tasks - answering the questions of evaluation for whom, of what, and in what context
Stage 2: Evaluating the plan of the programme
Stage 3: Evaluating the implementation of the programme
Stage 4: Evaluating the gross outcome of the programme
Stage 5: Evaluating the net outcome of the programme
Stage 6: Move to built-in evaluation
A summary of each of these stages is given as a table in the corresponding section; taken together, these tables summarize the procedure set out in this chapter. In addition, because of the importance of, and dearth of experience in, built-in evaluations, the whole of chapter 14 is devoted to this subject.
TABLE 2.1 Stage 1: Preliminary Tasks
Decide:
- continuation or modification of delivery of the programme?
- replication of the programme?
- estimation of the net effects of the programme?
Reach consensus on the objectives of the programme.
Scout the programme area.
Plan the evaluation.
A first question should be: who is doing the evaluation, for whom, and for what purpose? (See table 2.1 for a schematic representation of this process.) Those responsible for carrying out the evaluation could be outsiders, management, the funding body, or a combination of these. The users of the evaluation results could be management, administrators and funders, research bodies (see the list of issues to be addressed, on p. 00), or all of these; the users should determine the purposes of the evaluation. If the purpose is to provide information for making decisions, the actual options to be decided upon should be specified. The purposes in turn should dictate the questions and issues addressed, and hence the decisions to be taken, again as outlined in the previous chapter. These purposes in effect constitute terms of reference and should be made explicit at the outset. At the same time, the degree of certainty needed to make the consequent decisions, and hence the methods to be used, must be decided.
The next initial step is to agree upon the programme objectives, because an evaluation should result in judgements about how well a programme is meeting its objectives. Those involved in programmes often have very different perceptions of programme objectives, and these perceptions may change markedly over time. Therefore, discussions must examine programme objectives and resolve contradictions that may exist among these, so that a consensus on the objectives of the programme can be set out explicitly.
In practice, it is essential to get a feel for the programme. This has been referred to as "scouting" (1), or going and talking to people in the programme area, both to those responsible for the programme, and to those participating in or benefiting from it. This is worth several days' or even weeks' effort, providing initial impressions as well as a basis for proceeding. It should also involve drawing up some sort of conceptual model to explain how the programme is supposed to achieve its effects if this is not already available in the programme documents - which means making the processes of the project explicit. Such a model also helps to identify the constraints on implementation, and the possible variables confounding apparent effects. An example of such a model based on the sequence of events in a food aid programme could be as follows: food is delivered to a central warehouse; it is distributed to MCH clinics in target areas; the food is given monthly in specified quantities to mothers of children of less than 80 per cent weight for age; at the same time malnourished children come to the MCH clinic and are identified; the mothers give their children the supplementary food, which adds to, rather than replaces, the previously existing intake, to a specified extent; this in turn leads to improved nutritional status of the target group.
From such information the evaluation itself should be planned, specifying the different stages (e.g., along the lines suggested here). Constructing a flow chart of the sequence of steps involved in each stage may be useful. An important decision even at this stage is whether it will be feasible to collect fresh data, either by case study or sample survey, should the next stage indicate that this is worthwhile. Such data collection is expensive, and resources should be concentrated on those programmes where it is necessary. Stages 2 and 3 usually do not require a survey, and effort devoted to these earlier stages may pay off by making further data collection unnecessary. Stages 4 and 5 usually will require fresh data collection.
The evaluation of the plan of the programme seeks to answer the question: what effect was expected, or could reasonably have been expected, if the programme were implemented as planned? A systematic approach to this question involves a number of steps. (See table 2.2.)
TABLE 2.2 Stage 2: Evaluating the Plan of the Programme
Examine overall objectives
- quantities of inputs?
- target groups (numbers, characteristics)?
- permissible deviations in targeting and delivery?
Evaluate implementation objectives
Evaluate targeting objectives
Evaluate outcome objectives
If adequate effect is not expected from evaluation of the plan: reconsider the programme; further stages of evaluation may be unnecessary
The overall objectives of the programme must be explicitly stated in stage 1. The evaluation of these objectives is then concerned with checking their feasibility, given the resources available and their intended effect per unit cost. The initial steps concern the following questions:
- quantities of inputs
- target groups (numbers, characteristics)
- permissible leakage (percentage of recipients who are non-targeted) and delivery (proportion of the targeted who are recipients)
The next step is to systematically evaluate implementation (input) objectives, targeting objectives (including permissible delivery and leakage), and outcome objectives, including determination of adequate levels of achievement of these.
Evaluating Implementation Objectives
This step should first check that sufficient details on supplies, services, costs, and so on, are available to provide a basis for evaluating the overall extent to which the programme has been implemented. The programme budget and work plan should provide the needed information. If not, further inquiries may be needed.
At this point, calculate the planned expenditure per head of the target population. Due allowance must be made both for direct and visible costs (e.g., food, transport, additional administration) and for less visible costs (e.g., the opportunity cost of the time of personnel who would otherwise be occupied elsewhere in the absence of the programme). In certain projects the planned level of expenditure per head may be too low to permit a realistic expectation of any detectable (or important) effect of the programme on nutrition. As noted in chapter 1, there is a scarcity of empirical data on actual levels of expenditure, certainly in relation to effect. However, it is important to check whether there is any basis in experience under the prevailing conditions to suppose that the levels of expenditure planned can have the expected effect. In the absence of local data, the figures quoted by Gwatkin et al. (2) and Beghin (3) may provide some guidance. As a very rough rule of thumb, it seems to us that an expenditure below a minimum of $10 per recipient per year is very unlikely to produce any measurable effect on outcome. Once such estimates are made for the programme, they need to be checked with the agencies concerned and put together with the evidence for expected effects, to arrive at approximate levels of expenditure on different activities and to assess whether these are within the range for which any effect can realistically be expected. They should also be calculated systematically with data from pilot studies and operational programmes, and related to expected effect.
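Such a check amounts to a one-line calculation. The sketch below uses entirely hypothetical cost figures, together with the rough $10-per-recipient threshold suggested above.

```python
# Illustrative sketch (all figures hypothetical): planned expenditure per
# recipient, including less visible opportunity costs, checked against the
# rough $10-per-recipient-per-year rule of thumb quoted in the text.

direct_costs = 40_000       # food, transport, additional administration ($/year)
opportunity_costs = 15_000  # staff time otherwise spent elsewhere ($/year)
planned_recipients = 6_000  # planned number of recipients

spend_per_recipient = (direct_costs + opportunity_costs) / planned_recipients
print(f"Planned expenditure: ${spend_per_recipient:.2f} per recipient per year")

MINIMUM_SPEND = 10.0  # rough rule-of-thumb floor quoted above
if spend_per_recipient < MINIMUM_SPEND:
    print("Below the level at which a measurable effect can realistically be expected.")
```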
Evaluating Targeting Objectives
The target populations of the programme should be defined in the plan by their numbers and characteristics. The plan must also provide for adequate means of identifying these groups; if for example "malnourished children" were the stated target group, a procedure for screening and admitting them to the programme must be planned for. If an estimate of the prevalence of malnutrition at the beginning of the programme is available, the indicators in part A of Figure 1.1 should be calculated. If such an estimate is not available and cannot be approximated, these indicators cannot be calculated. For planning purposes, the appropriate indicators are the proportion of targeted who are needy (planned focusing), and the proportion of needy who are targeted (planned coverage). This provides the basis for comparing planned with actual focusing and coverage. Estimates of probable effects can be obtained if these indicators can be calculated from pre-programme data, using only data on programme participants, as discussed in Mason et al. (4, Chapter IV).
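These two indicators reduce to simple ratios from a cross-classification of the population by need and by targeting, as in part A of Figure 1.1. A minimal sketch of the arithmetic follows; the counts are hypothetical, chosen to echo the worked examples later in this chapter.

```python
# Minimal sketch of the targeting indicators, using hypothetical counts
# from a 2 x 2 cross-classification (needy/not needy x targeted/not targeted).

needy_targeted = 4_000
not_needy_targeted = 2_000
needy_not_targeted = 56_000
not_needy_not_targeted = 138_000

targeted = needy_targeted + not_needy_targeted                        # 6,000
needy = needy_targeted + needy_not_targeted                           # 60,000
population = targeted + needy_not_targeted + not_needy_not_targeted  # 200,000

focusing = needy_targeted / targeted   # proportion of the targeted who are needy
coverage = needy_targeted / needy      # proportion of the needy who are targeted
prevalence = needy / population        # population prevalence of need

print(f"focusing   = {focusing:.1%}")    # 66.7%
print(f"coverage   = {coverage:.1%}")    # 6.7%
print(f"prevalence = {prevalence:.1%}")  # 30.0%
```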
Apart from their use in evaluating the actual implementation of a programme, these indicators allow evaluation of the programme plan itself. For example, if focusing is not greater than the population prevalence of the needy, the programme is not operationally targeted to the needy. (For instance, a programme that was not targeted at all, covering the whole population of an area, would have a value of focusing equal to the population prevalence.) Judgements on how far the programme plan was actually oriented to the malnourished, and on the relative efficiency of different targeting strategies, may be useful in making recommendations on modifications to the programme.
The calculations can be extended further by putting in costs. For example, if there are differential costs involved in reaching alternative target groups, knowledge of their relative nutritional status allows calculation of the optimum combination of delivery to different groups to minimize cost per malnourished reached.
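As an illustration (the group names, delivery costs, and prevalences below are hypothetical), the cost per malnourished person reached in each candidate group is simply the delivery cost per head divided by the prevalence of malnutrition in that group:

```python
# Hypothetical sketch: cost per malnourished person reached, for alternative
# target groups with differential delivery costs and prevalences.
groups = {
    # group: (delivery cost per head per year in $, prevalence of malnutrition)
    "clinic attenders":  (5.0, 0.50),
    "whole villages":    (3.0, 0.30),
    "remote households": (12.0, 0.45),
}

for name, (cost_per_head, prevalence) in groups.items():
    cost_per_malnourished = cost_per_head / prevalence
    print(f"{name}: ${cost_per_malnourished:.2f} per malnourished person reached")

# Ranking the groups by this ratio shows which combination of delivery
# minimizes the cost per malnourished person reached.
```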
Evaluating Outcome Objectives
The distinction has been made between the population as a whole, the (intended) target group(s), and programme recipients. The expected outcome per head decreases going from recipients to target groups to population. In evaluating the plan, it needs to be decided (if it was not already decided in planning) to which population category the objectives apply. This may be done in one of two ways: an overall objective could be stated as a reduction of, say, malnutrition in the population as a whole, with targeting used to focus resources on groups with a high prevalence so as to improve efficiency; or the objective could be stated specifically as improving nutrition in the target group.
This stage will mainly involve back-of-the-envelope calculations, but these are nevertheless important, since they help decide what effects to look for and when they are adequate, both in themselves and with due account taken of costs. At one extreme, abolition of the problem will be much easier to detect than, for example, a 10 per cent reduction in its prevalence. Clearly, it is necessary to start by assessing expected effects on recipients, and then to adjust these for "planned" or "unavoidable" under-implementation and/or leakage to non-target groups. Numbers as well as prevalences should be used. The target group and the population should be the units of evaluation. The recipients should also be considered, in order to calculate the expenditure per caput of recipients. This will allow some estimate of whether an adequate effect on the recipients (whether or not these are the same as the targeted) can be expected.
There is no widely accepted method for setting expected and adequate levels of outcome. However, some suggestions can be made, as follows.
1. A minimum level of expected effects on the targeted that is regarded as worth the effort, or cost, could be set. This minimum identifies the level of outcome below which the programme is unsatisfactory. It can be set as, e.g., cases improved per dollar spent. If the programme does not meet this minimum adequacy level, it should either be re-examined carefully or discontinued. This is a choice which must have been considered in advance as part of setting objectives. The assessment has to be made at some stage, and it is better to reach preliminary conclusions as early as possible, since if such conclusions cannot be reached, results from data collection will not be interpretable anyway. It is preferable to face this dilemma before embarking on data collection and analysis than to find out later that the results cannot be used for making decisions on the project. Defining this minimum level is also needed to design data collection and analysis, by defining what levels of change need to be detected.
2. Experience from other programmes could be used, particularly pilot projects, to arrive at some statement of the expected levels of effects. Data from recent reviews of 21 nutrition and health projects, including Habicht and Butz (5), Gwatkin et al. (2), and Drake et al. (6), show some consistency in results. Pilot projects seem to achieve, for example, a reduction in the prevalence of malnutrition of around 5 to 15 cases per 100 for a baseline prevalence of 30 per cent, or an improvement of about 2 to 6 per cent in mean weight for age. The cost estimates for implementation at pilot level (which may differ considerably from estimates for scaled-up operations) are around $1 to $8 per head per year. The time periods vary, as does whether the effects are on the entire target population or on the specific individuals who fully participated in the projects. With a cost of $5 per head per year, and a reduction in the prevalence of malnutrition of 10 percentage points, the effect per cost works out at 20 cases of malnutrition rehabilitated or prevented per $1,000 of expenditure. (The distinction between cost per head of population [e.g., target population] and cost per case prevented or rehabilitated is obviously important, the two being related by the programme's effectiveness.) Useful values for effects per unit cost in terms of cases prevented or rehabilitated are given in Beghin (3).
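The arithmetic behind that last figure can be set out explicitly, as in this short sketch:

```python
# Checking the arithmetic quoted above: $5 per head per year and a fall in
# prevalence of 10 percentage points imply 20 cases per $1,000.
cost_per_head = 5.0          # $ per head of target population per year
prevalence_reduction = 0.10  # e.g., from 30% to 20%

cases_per_head = prevalence_reduction           # 0.10 cases improved per head
cost_per_case = cost_per_head / cases_per_head  # $50 per case
cases_per_1000_dollars = 1_000 / cost_per_case  # 20 cases per $1,000
print(f"{cases_per_1000_dollars:.0f} cases prevented or rehabilitated per $1,000")
```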
One purpose of considering outcome objectives is to eliminate the need to go further in the evaluation of some projects. There may be many that could not be expected to have enough effect to warrant their continuation, much less their expansion, because the inputs could not produce the hoped-for outcome, the targeting is inadequate, and so on.
At this stage, relevant outcome indicators should be tentatively identified. The starting point is clearly the outcome objective. If decreasing the prevalence of protein-energy malnutrition in preschool children is an objective, suitable indicators of nutritional status in these children will be required. The choice of indicators will depend primarily on data availability and responsiveness of indicators to the intervention. If improved school performance is an objective, data on this will be required.
For those programmes that survive this reappraisal, the process provides a clue as to how easy or difficult it will be to detect an effect, and how large that effect must be to be considered adequate. Our experience suggests that such calculations will show that a substantial number of projects require no further evaluation, or at least merit little further expenditure on seeking impacts that are at best hardly important to the overall problem. Several (hypothetical) examples are discussed below.
1. School feeding programmes are an obvious example; their objectives are all too frequently misdirected. These objectives may be specified by numbers of school children affected, the number of schools, the quantities and/or quality of food provided, etc. "Reduction in malnutrition" is often specified in these programmes as the planned outcome. But these programmes are targeted at an age group that generally has a low prevalence of malnutrition; they reach, in many countries, only those children who are better off and can, hence, enrol in schools; and finally they often emphasize foods of low priority - e.g., protein when the deficit in the diet is primarily of total food. Thus, an objective stated as substantially decreasing malnutrition is often likely to be shown to be unrealistic by examining the work plan.
2. Pre-school supplementary feeding programmes, on the other hand, are aimed at the age group most affected by malnutrition. A typical example might be the following: all the malnourished children attending clinics in an area are to be provided with 500 kcal per day for three months. Suppose the area contains 1,000,000 people, of whom 200,000 are under five years of age. The expenditure is $500,000 per year. Thirty per cent of the children are malnourished, that is 60,000, and 10 per cent of these attend clinics, i.e., 6,000. The questions which immediately come to mind when evaluation of the programme is being considered are:
- Are the children who receive food in fact malnourished?
- Do these children improve?
- Does the programme have a substantial effect on the overall problem? (The answer to this is clear, since 90 per cent of the needy children do not attend clinics.)
If 50 per cent of the needy children receiving food were rehabilitated, this would mean that 3,000 children were improved, at a cost of $500,000. This gives an effect per unit cost of 6 rehabilitated per $1,000. It might be decided in planning that this was just acceptable. If so, it means that if an evaluation found that less than 50 per cent of those malnourished children attending clinics were rehabilitated, or that less than 10 per cent of the malnourished attend clinics at all, the programme should be carefully reconsidered, because the effect per unit cost will be below 6 rehabilitated per $1,000.
3. Another example, comparable with the previous one, might be a targeted food distribution programme with eligibility criteria. Again, consider a million people, 200,000 under-fives, an expenditure of $500,000 per year, and assume 30 per cent (60,000) of the under-fives are malnourished. If the programme reaches all these children, the expenditure would be around $8 per head per year. For a typical family, out of a total expenditure of, say, $300 per year, some $200 may be spent on food, i.e., perhaps $30 per year for a child. Hence, the $8 per child per year provided as food is quite significant, and a reduction from a 30 per cent malnutrition rate to, say, 20 per cent in the population could reasonably be expected. This implies that, of the 60,000 children initially malnourished, 20,000 are no longer malnourished. Hence for each $1,000 of expenditure, 40 children are rehabilitated. In this case, if a change of prevalence from 30 per cent to 20 per cent was indeed detected, there might be good reason to suppose that we are looking at an effective programme. Hence, further investigation might be in order to gain more confidence that this improvement was indeed due to the programme. Similarly, an effect per unit cost of 20 cases prevented per $1,000 might be regarded as just adequate, so that the evaluation would be designed to determine whether the prevalence was now above or below 25 per cent.
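Both of these worked examples reduce to the same effect-per-unit-cost calculation, sketched here with the figures used above:

```python
# Effect per unit cost for the two hypothetical programmes above.
def cases_per_1000_dollars(annual_cost: float, cases_improved: float) -> float:
    """Cases improved per $1,000 of annual expenditure."""
    return 1_000 * cases_improved / annual_cost

# Example 2: 6,000 clinic attenders, 50 per cent rehabilitated = 3,000 cases.
print(cases_per_1000_dollars(500_000, 3_000))   # 6.0

# Example 3: prevalence falls from 30% to 20% of 200,000 under-fives,
# i.e., 20,000 children no longer malnourished.
print(cases_per_1000_dollars(500_000, 20_000))  # 40.0
```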
4. A rehabilitation clinic might cost $10,000 per year to run and succeed in rehabilitating 100 cases of malnutrition. This gives a figure of 10 cases per $1,000. However, even if this effect per unit cost is acceptable, there will usually be logistical and personnel reasons for not scaling up such a project, so that its coverage could never be satisfactory. Therefore, even if the clinic were fully effective, the programme may not be justified a priori; hence outcome evaluation is not necessary, and evaluation of the plan alone will suffice.
5. In famine relief, the only real issue is whether severe malnutrition remains in the location (a camp, or an area) after activities have been going on for some time. The role of food in improving the situation probably requires no elaborate assessment: evidently food is needed, and there is unlikely to be any question of withholding it on the grounds that its effect cannot be shown. An evaluation here is likely to focus on procedures, in order to gather lessons for future, more efficient implementation of famine relief. Again, no detailed analysis of cost-effectiveness, or attempts to attribute changes in outcome to food inputs, is likely to be appropriate for administrative or managerial purposes. There may still be a research need to establish such cause-and-effect relations.
Evaluating the implementation of a programme is reasonably straightforward in principle. (See table 2.3.) It involves the following specific questions:
- Was the selection of recipients correct?
- Was expenditure or other measure of delivery per capita of recipients adequate?
- Was coverage adequate?
TABLE 2.3 Stage 3: Evaluating Implementation
Does programme reach intended target group?
Assess targeting as: focusing and coverage
Assess level of delivery as: expenditure (or other measure of delivery) per capita of recipients
Do deviations from objectives affect expected outcome (for target group or population)?
How should implementation be improved?
Targeting and Delivery
The aim of evaluating targeting could be summarized as filling in part B of figure 1.1. This then allows derivation of indicators of delivery and leakage, meaning the proportion of the targeted who are recipients and the proportion of recipients who are not targeted, respectively.
Quantitative assessment of the level of delivery, e.g., expenditure per capita, must rely on administrative records. Relevant records pertain to financial aspects, delivery of goods, staffing, and so on. This stage of the study, therefore, uses administrative records to examine whether delivery is according to plan, and reassesses delivery in order to update estimates of likely effects (e.g., to see if expenditure per recipient is still within the range in which an effect can be expected).
Assessment of the extent to which the programme has been delivered to the needy, and to which the needy are covered by the programme, requires outcome data, but is logically included here, as it concerns the implementation rather than the effect of the programme. If these ratios can be calculated from available data, they will allow important estimates to be made of the success of the programme's implementation.
Again, a priori decisions may be needed as to how far deviations from planned targeting are acceptable. In any event, these values will show how the actual delivery to the target group needs to be modified. Whether the right people are reached is measured by focusing; whether enough are reached is measured by coverage.
This examination and reassessment relies on process indicators. Some examples relevant to nutrition/health programmes might be the following: timing of delivery of supplies and equipment; participation, measured by attendance or by receipt of food, immunizations, health care, or health and nutrition education; and staff performance, measured by the number of contacts with recipients per worker, or by the number of contact hours per worker.
Improving Implementation
There are two purposes to these analyses. The first is to see whether programme delivery is sufficiently in line with the plan that an adequate outcome can still be expected. This assessment should be based on the premises used to design the programme, perhaps modified in the light of more recent knowledge.
The second purpose is to allow identification of constraints and failures in implementation, both in terms of the degree of implementation (e.g., quantities of goods and services) and in terms of targeting. Recommendations from these results may be among the most valuable outputs of the evaluation, certainly for the programme's management, and also for its administrators and funders.
The evaluation of the adequacy of programme effects relies in the first instance on assessment of gross outcome - that is, on detected change in outcome indicators, without allowing for the change that might have occurred anyway. (See table 2.4.) This involves comparison of the gross outcome with a pre-established standard of adequacy of effects. By definition, the gross outcome of a programme refers to the absolute or relative change observed in one or more indicators of programme effects. The evaluation of gross outcome has to cope with three issues:
- How are outcome indicators to be chosen?
- How are they to be measured?
- How are they to be evaluated?
TABLE 2.4 Stage 4: Evaluating Gross Outcome
Choose outcome indicators. Consider:
Measure gross outcome. If data only obtainable on programme participants, try:
If data obtainable on non-participants and confounding variables, see Stage 5, Table 2.5
Evaluate gross outcome
Choosing Outcome Indicators
The choice of outcome indicators involves a number of considerations (specific examples are given in the relevant chapters elsewhere in this book):
- responsiveness of the indicator to the intervention (see table 1.6.);
- feasibility and cost of collecting and interpreting the necessary data;
- whether the indicator is a direct or an indirect (proxy) measure of the achievement of the objective. Direct measures are usually preferable.
Interviews and observations by an ethnographer or someone with similar skills should precede any systematic, quantitative data collection. Such interviews will often permit restricting quantitative data collection to a few variables and permit simpler sampling than if one tries to cover every eventuality. This can result in major savings in time and money, especially if the programme is inadequately implemented so that no quantitative collection of outcome data is justified.
Measuring Gross Outcome
The measurement of gross outcome does not always preclude subsequently moving, by analysis, to an assessment of net outcome. This depends both on the design of the evaluation and on whether data on confounding variables are collected for later analysis.
Measurement of gross outcome preferably entails collection of baseline data from programme subjects before the programme begins, and a similar effort at a point sufficiently later to allow the programme to have some measurable effect. Such data, however, may not always be readily available. Baseline data on programme subjects are often lacking, possibly because an evaluation of the programme's effects was not intended at the beginning. Furthermore, the collection of such data may involve considerable cost if, for example, an extensive sample survey is required at two points in time. We consider below the special case in which only data on programme participants are available, without baseline data. It is possible to rely on routinely collected programme data to get approximate measures of gross outcome. Under these restrictive but common conditions, three possible designs come to mind, using data only from programme contacts:
- The time-in-programme method
- Rapid collection of time-series data
- Using variations in socio-economic status and programme delivery, which can be cross-tabulated or correlated.
The Time-in-Programme Method
The time-in-programme method involves comparing outcome information for children who have been in a programme for a substantial period with that for others just entering, on a cross-sectional basis. An improvement in those who have been "treated" for a certain time over those just entering can then lead to an estimate of gross outcome. Comparisons should be made between children of the same age, to exclude the effects of ageing. One obvious confounding factor is self-selection - e.g., those who enter later may have a different nutritional status on entry from those admitted to the programme at the beginning. This can be checked if nutritional status on entry into the programme is recorded, as is often the case. Another problem is regression to the mean: if malnourished children are selected, they may be an extreme group that would improve anyway. Nonetheless, with common-sense allowance for such factors, this procedure provides some internal control.
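A minimal sketch of this comparison follows; the records are hypothetical. Children long in the programme are compared with new entrants in the same age band, and nutritional status on entry is retained so that self-selection can be checked.

```python
# Illustrative sketch of the time-in-programme comparison (hypothetical data).
from statistics import mean

# (age in months, months in programme, weight-for-age % of reference now,
#  weight-for-age % of reference on entry)
records = [
    (24, 12, 82, 74), (25, 11, 80, 73), (24, 0, 75, 75),
    (25, 0, 74, 74), (36, 12, 85, 76), (36, 0, 77, 77),
]

AGE_BAND = (24, 26)  # compare within one age band to exclude ageing effects
in_band = [r for r in records if AGE_BAND[0] <= r[0] <= AGE_BAND[1]]
treated = [r for r in in_band if r[1] >= 6]   # long in programme
entering = [r for r in in_band if r[1] == 0]  # just entering

print("mean W/A now, treated  :", mean(r[2] for r in treated))
print("mean W/A now, entering :", mean(r[2] for r in entering))
# Check self-selection: were the two groups similar on entry?
print("mean W/A entry, treated :", mean(r[3] for r in treated))
print("mean W/A entry, entering:", mean(r[3] for r in entering))
```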
Rapid Collection of Time-series Data
The effects of certain programmes may, in fact, be detectable by time-series observations within short periods. The time needed for carrying out and analysing a cross-sectional survey is itself at least six months - time enough for a feeding programme to show its effects on the nutritional status of participants. Time-series observations can, at least, serve as a check that the programme has an effect on those to whom it is delivered. Questions remain, however, such as whether the participants are the target group, and so forth. Examining these latter questions may require further data.
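As a sketch of such a check (with hypothetical figures), a simple least-squares trend can be fitted to monthly prevalence among participants; a downward trend is evidence of gross outcome only, since seasonal or other changes could produce the same pattern.

```python
# Hypothetical sketch: monthly prevalence of malnutrition among participants,
# with a least-squares trend as a first check for a short-period effect.
import numpy as np

months = np.arange(6)
prevalence = np.array([0.30, 0.29, 0.27, 0.26, 0.24, 0.23])

slope, intercept = np.polyfit(months, prevalence, 1)
print(f"trend: {slope * 100:+.1f} percentage points per month")
# For these invented figures: trend of about -1.4 percentage points per month.
```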
Correlation Studies: Using Variations in Programme Delivery and Socio-economic Status
This design relies on a large enough cross-section of the participant population, who have been subjected to the programme with varying intensity, and depends on collecting data on potential confounding variables which may independently influence the outcome, as well as on the outcome variables themselves. If the number of observations is large enough, and if the observations on each variable are sufficiently diverse, then it may be possible to use statistical techniques of control to do what the experimental conditions preclude, namely, to allow for the influence of the confounding variables chosen (see item no. 4 in table 1.3., p. 12). Because of the problems associated with this type of design, its primary usefulness is not in establishing causal relationships but rather in suggesting hypotheses that should later be more carefully examined. It can increase the plausibility of inferences on causality, but it requires more advanced analytical techniques.
An example of a correlational study would be to collect data on a cross-section of the population, measuring a range of factors possibly affecting outcome, including data on programme delivery. The analysis would then examine the degree of correlation of outcome with programme delivery, taking account of other factors. In its usual form the analysis involves multiple regression techniques requiring extensive computer facilities; multiple group comparisons, which are more easily calculated, may give similar answers in some circumstances.
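A sketch of the multiple regression form of the analysis, using synthetic data, is given below. The variable names and coefficients are invented for illustration; the second fit shows how omitting a confounder such as household income overstates the programme's apparent association with outcome (the positive confounding discussed at the end of this chapter).

```python
# Synthetic illustration: regress outcome (weight-for-age) on programme
# delivery while controlling for a potential confounder (household income).
import numpy as np

rng = np.random.default_rng(0)
n = 500
income = rng.normal(0.0, 1.0, n)                   # confounder (standardized)
delivery = 0.5 * income + rng.normal(0.0, 1.0, n)  # better-off areas get more
outcome = 80 + 2.0 * delivery + 3.0 * income + rng.normal(0.0, 4.0, n)

# Ordinary least squares: outcome ~ 1 + delivery + income
X = np.column_stack([np.ones(n), delivery, income])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"association of delivery with outcome, adjusted for income: {coef[1]:.2f}")

# Omitting the confounder inflates the apparent effect of delivery:
X0 = np.column_stack([np.ones(n), delivery])
coef0, *_ = np.linalg.lstsq(X0, outcome, rcond=None)
print(f"unadjusted association: {coef0[1]:.2f}")
```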
When feasible, correlational analyses provide the most plausible conclusions obtainable under the restrictive conditions of no comparison groups and no pre-programme data. However, they are demanding in terms of data and analysis, and where possible more powerful designs should include some form of comparison groups and/or time-series data.
Evaluating Gross Outcome
Evaluating gross outcome essentially relies on a limited number of variables pertaining to nutritional status (weight, height, and age, which may be indicators themselves or from which indicators may be derived), the extent of delivery of the programme services, and the length of time that a participant benefits from a programme. Such information is used to establish relationships between programme variables on the one hand and measures of gross outcome on the other. For example, a positive correlation between the extent of delivery and the weight-for-age of the participants may indicate that a longer period of participation is associated with a more adequate outcome. This may be taken as an indication that the gross outcome of the programme has been positive. Where a measure of the nutritional status of a participant is also available from before he joined the programme, a comparison of this information with a similar measure after some length of time also measures the gross outcome, provided maturational effects and regression toward the mean are taken into account.
Finally, comparison of estimated gross outcome with programme costs will allow at least a rough estimate of whether the effects per unit cost are in the range likely to be regarded as adequate. If they are far below the adequacy level, it may not be necessary to estimate net outcome at all, since doing so may only further decrease the estimated effect per unit cost, or at least may not dramatically improve it. Many of the confounding factors discussed in table 1.2. usually act to give over-estimates of the net outcome (positive confounding). With an inadequate gross effect per unit cost, it may be worth considering whether more accurate assessments are in fact likely to increase this value - i.e., whether there has been negative confounding (e.g., Chernichovsky (7)). It is our impression that the more careful the evaluation study, the less effect on outcome is usually found. Therefore, unless there is good reason to believe that without-programme values would have deteriorated sharply, all the necessary information for outcome evaluation may now be available, and it may be unnecessary to go further with data collection for analysis.