Roy I. Miller and David E. Sahn
One reason for holding the conference leading to the publication of this document is the general dissatisfaction with the state of the art in the evaluation of food and nutrition programmes. Previous efforts to evaluate such interventions have had several disappointing outcomes. Often, evaluations have been restricted to a review of the process and procedures employed in the delivery of services because of a lack of available data. When impact data have been available, most evaluations have failed to demonstrate nutritional or health impact, or they have produced inconclusive results. Even in the few cases where nutritional and health benefits have been shown, critics have hastened to point out the methodological weaknesses of those evaluations. (A more detailed discussion of the evaluation methodologies used in recent food-aid evaluation programmes and the findings of those evaluations can be found in [1].) Because of those weaknesses (in data collection, measurement, research design, and interpretation of results), different approaches to analysis can reveal competing explanations for the observed outcomes or, in many cases, entirely different outcomes altogether.
Traditionally, nutrition interventions have been evaluated using the basic strategies of social science research. Hypotheses are formulated, and an experimental or quasi-experimental design is established and applied to test those hypotheses. The element of this strategy that enables the evaluator to attribute observed changes in nutritional or health status to an intervention is the use of controls: if the participant group fares better than the non-participant group, it is assumed that the programme is the cause. When circumstances preclude identifying a randomly assigned control population to be compared with a treatment group, it is possible to use statistical controls (multivariate techniques such as regression), reflexive controls (comparisons of the treatment group with itself at different points in time), or other analytical techniques to account for or minimize the effects of extraneous factors. (For an example of a mixed strategy using statistical controls to account for differences between a control group and a treatment group, see . For a more general discussion of the array of quasi-experimental techniques, see .)
In field settings, the implementation of these strategies for selecting suitable controls has proven to be a difficult and challenging task. The primary source of this difficulty is the lack of constancy of the "real" world and the inability, outside a laboratory, to maintain experimental conditions for a sustained period of time. Specifically, evaluations have faltered because they have:
A consequence of these flaws in evaluation is that there remain numerous competing explanations, other than the programme's impact, for changes in the target and/or control groups, whatever the research design and/or analytic methodology. In the literature, common sources of competing explanations, often called "threats to validity," have been catalogued and illustrated. (For a general discussion, see [4]. For a discussion related specifically to nutrition, see .)
Even more disturbing than evaluations with negative or ambiguous results is the large number of nutrition projects and programmes that are never evaluated at all. Evaluation has been viewed as a threat to programme continuity or as an expense hardly justifiable in light of the need to concentrate resources on service delivery. Rarely has evaluation been viewed as a tool to help learn how to achieve greater nutritional impact. The result is that the potential of evaluation as a means of improving project design and implementation has not yet been realized.
In response to the difficulties in carrying out adequate evaluations in the health/nutrition field, a number of chapters in this publication provide valuable guidance. Specifically, the choice and utilization of measures and indicators of nutritional status, as well as the collection and analysis of nutrition-related data, are discussed in considerable depth. Despite the informative nature of these chapters, there remains considerable uncertainty as to whether a "one point in time" evaluation of an ongoing feeding or nutrition programme can yield useful and conclusive results, even if we overcome the difficulties in measuring nutritional progress alluded to above. While it remains unclear that such an approach will ever yield reliable data or a definitive indication of impact, even more disconcerting is the small likelihood that evaluation, as traditionally practiced, will improve project performance, justifying the time and resources expended. An alternative approach to evaluation must be considered in order to minimize the methodological problems discussed above and, concurrently, to broaden the usefulness of evaluation in the planning and implementation phases of a large-scale intervention.
An alternative to more traditional methods of evaluation is the inclusion of a system of self-evaluation as an integral part of an intervention. Fundamentally, such an approach involves the integration of evaluation as an ongoing and indispensable element of the planning and implementation of a project. The concept of built-in self-evaluation has been given a great deal of lip service in the past few years and, recently, has been described in the literature in considerable detail. (A proposed system for P.L. 480 Title II programmes in India appears in .)
A built-in internal evaluation system conceptually merges the tasks of programme monitoring and evaluation. Data are generated throughout the chain of programme events, beginning with the cataloguing of inputs, continuing with the substantiation of the delivery of services and, finally, the measurement of nutritional impact. This is done on a continuous basis at all project sites. Such information, to be gathered by project staff (as distinguished from outside evaluation teams), is limited to selected key indicators of project operations and impact. The selection of data elements to be gathered and summary statistics to be computed should be based on a review of the minimal information needs of programme managers and a clear understanding of how each element and statistic will be employed in the decision-making process. A premium is placed on coverage at all project sites and on the collation and presentation of such data in simple and understandable formats that can be translated easily into action by managers and their supervisors. In summary, a built-in evaluation system is a self-evaluation system that provides programme planners, field workers, and managers responsible for implementation with continuing feedback on their performance.
An important by-product of an internal system that generates data for use in decision-making in the field is the opportunity to aggregate data from individual sites to assess the performance of the programme as a whole. Thus, the system also provides information for use in promoting rational changes in project design on the grand scale. However, for this to work effectively, the information must not flow only in the one usual direction, up through the management hierarchy. It must also flow back down from the central decision-makers, in the form of feedback, to those deployed in the field. The fact that the data were originally collected by field-level people adds strength to recommendations from the "top" that emanate from such data.
From the viewpoint of a funding organization or other agency interested in allocating resources among projects, the primary objection to internal evaluation systems is that their credibility is quite low. James Austin writes:
"This issue frequently takes the form of an outsider vs. insider dichotomy with the latter presumed to be biased because of vested interest in showing performance." [7]
In response, we suggest, first, that if built-in evaluation systems are framed and used as constructive enterprises, as opposed to threatening outside assessments of competency, the problem of objectivity will be mitigated. Furthermore, the concept of built-in evaluation does not preclude external evaluations. Quite the contrary: the potential for success of outside evaluation is enhanced by the wealth of data already available. It is conceivable that such outside evaluation could take on a far broader scope than now possible, not only because of the quality and quantity of data generated by a built-in system, but because of the sensitivity of programme management to the need for, and usefulness of, evaluation activities.
To illustrate how a built-in evaluation system might serve as the basis of an external evaluation, consider the funding agency that wishes to compare the cost-effectiveness of two programmes in order to better allocate resources. Usually, a team of consultants is fielded. On arrival at the site, they learn that there is a dearth of relevant data and that the data that do exist are unreliable and/or incomplete. Furthermore, they must ask interminable questions about the local situation in order to begin to understand the possible explanations for trends observed in the data. With a built-in monitoring/evaluation system in place, it would be relatively simple to: (a) assign costs to the inputs that are carefully monitored, (b) consider benefits in terms of the achievement of impact, and (c) relate the variable costs that accompany changes in the service delivery system to the variable achievement of impact. Clearly, it would be much easier for decision-makers to glean accurate information from a well-functioning internal evaluation system.
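The cost-effectiveness comparison outlined in (a) to (c) reduces to simple arithmetic once a built-in system supplies the cost and impact figures. The following sketch is illustrative only; the function name and all figures are our own assumptions, not data from any actual programme.

```python
# Hypothetical sketch: comparing the cost-effectiveness of two programmes
# using figures a built-in monitoring/evaluation system would supply.
# All names and numbers are illustrative assumptions.

def cost_per_case_averted(total_input_cost, cases_before, cases_after):
    """Cost units per case of malnutrition averted over the period."""
    averted = cases_before - cases_after
    if averted <= 0:
        return None  # no measurable impact; the ratio is undefined
    return total_input_cost / averted

programme_a = cost_per_case_averted(total_input_cost=120_000,
                                    cases_before=900, cases_after=700)
programme_b = cost_per_case_averted(total_input_cost=150_000,
                                    cases_before=800, cases_after=680)

print(f"Programme A: {programme_a:.0f} cost units per case averted")  # 600
print(f"Programme B: {programme_b:.0f} cost units per case averted")  # 1250
```

With both ratios computed from routinely monitored inputs and impact, the funding agency can compare programmes without fielding a data-gathering mission first.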
A second major objection to built-in evaluation is that the costs are too high to justify the alleged benefit. We believe that this is a somewhat shortsighted view emanating from a poor conceptual understanding of the activity. If the resources for data collection and analysis are thought of as being taken from the programme for an outside activity, clearly this argument is valid. But if real gains are derived from evaluation activities in terms of learning how to get more out of the resources applied to the delivery of services, the question becomes whether or not those gains merit the costs. We hypothesize that this will prove to be the case.
Given the discussion above, it is appropriate to consider the merits of built-in evaluation in greater detail. This we do in the next section. However, as with so many other ideas, the divergence between theory and practice remains the greatest obstacle to implementation. The remainder of this paper will attempt to elucidate and make practicable the concept of built-in evaluation.
In the interest of brevity, we will concentrate on delineating just four of the major reasons for incorporating evaluation as a regular activity in a nutrition intervention.
Programme Operation Improved
The primary reason for incorporating evaluation activities into a nutrition programme is that the knowledge derived will help programme managers improve the quality of their intervention. At the most basic level, the monitoring of the service delivery system will help sharpen the implementation of the intervention. For example, careful monitoring of the stocks and flows of programme inputs may facilitate rationalization of the flow of supplies from warehouses to project sites. This will help avoid losses due to spoilage, contamination, or deterioration that result when commodities and supplies are overstocked at the community level.
At a higher level, evaluation results provide a basis and incentive for programme redesign. It is our belief that the long duration of most interventions guarantees that the environment surrounding each intervention must change. Therefore, the "best" programme design must change too. For example, a poor harvest might reduce food availability, raise prices, and decrease food intake by the poor. The resulting deterioration in nutritional status would signal the need for a temporary increase in ration size in a supplementary feeding programme. Furthermore, components of an intervention may meet with success and, as a consequence, no longer be needed. For example, a nutrition education programme geared to inducing mothers to use oral rehydration techniques at the first sign of diarrhoeal disease may work so well that attention in nutrition education classes should shift to some other activity, for example, boiling water or home gardening. Responding to changing circumstances, as described in these examples, is contingent upon: (a) noting that a change in nutritional status has taken place, and (b) involving functionaries in the determination of why the change occurred and in the formulation of an appropriate response.
Built-in evaluation further facilitates the use of data to improve programme operations for two good reasons: (a) the data will be available in a more timely fashion than data collected in any other way, and (b) the aura of participation surrounding built-in evaluation at all levels will increase the receptivity of site managers, their supervisors, and even the participants themselves, to modifications in programme activities suggested by the findings of the system. It is far less likely that signals generated by a built-in, self-run evaluation system will be discarded or dismissed as incorrect.
Data Quality Improved
By building evaluation functions explicitly into an intervention, programme designers provide a genuine incentive for field workers to collect and record accurate data. All too often, programmes are initiated with a set of forms to be filled out in the field and, in some cases, transmitted to some central office. Field workers rarely understand the necessity and purpose of the forms. More often than not, forms, especially those with anthropometric data, simply clutter the health or feeding center and remain unused. Even when forms are shipped to the central office, field workers soon come to realize that no response is forthcoming. Data collection appears to them to be a futile and cumbersome activity, and all motivation for filling out forms accurately is lost. We have encountered field personnel who merely copy last month's form rather than prepare a new one because they perceive that the forms are useless.
However, if a programme is initiated with a set of forms for collecting limited quantities of data and these are used actively for management at the local level, field workers can perceive an immediate purpose in their efforts. When the data are aggregated at higher levels, with feedback given to the field-level functionaries, there is an even stronger motivation for collecting data properly.
A by-product of generating feedback at higher levels of management is the rapid identification of poorly collected and/or falsified data. The review of trends emanating from a single location by a skilled manager is the best protection against spurious or incorrect data. It is immediately apparent, upon review of longitudinal trends, where the data system is breaking down: any place with inordinately large or surprisingly little change has, in all probability, a worker not processing the data correctly.
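The supervisor's review of longitudinal trends can be partly mechanized. The sketch below, with assumed thresholds and data, flags a site whose indicator series shows an inordinately large month-to-month change or no change at all (the copied-form pattern noted earlier).

```python
# Illustrative sketch of the supervisory review of longitudinal trends.
# The threshold and all data are assumptions for demonstration.

def flag_center(monthly_values, max_change=15.0):
    """Return reasons to question a series of monthly indicator values
    (e.g., % of children below 70% of a weight-for-age standard)."""
    reasons = []
    changes = [b - a for a, b in zip(monthly_values, monthly_values[1:])]
    if any(abs(c) > max_change for c in changes):
        reasons.append("inordinately large month-to-month change")
    if changes and all(c == 0 for c in changes):
        reasons.append("no change at all: form possibly copied")
    return reasons

print(flag_center([22.0, 21.5, 40.0, 20.8]))  # implausible jump flagged
print(flag_center([18.0, 18.0, 18.0, 18.0]))  # identical every month flagged
print(flag_center([22.0, 21.0, 20.5, 19.8]))  # plausible trend, nothing flagged
```

A skilled manager would still review the flagged sites in person; the rule only directs attention to the places where the data system is most likely breaking down.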
Quantity of Data Increased
If one hundred people spend fifteen minutes each day collecting data, four people would have to spend over six hours each to collect the same amount of data, assuming that all of the data can be collected at the same location. By incorporating data collection as a routine part of an intervention, the collection procedure is spread out over a far larger number of people and places. Thus, the opportunity for generating additional data, more variables, or more cases on the same variables, goes up dramatically.
Also, by sharing the burden of data collection, the cost goes down. Initial costs, the training of so many workers, are high, but recurring costs are minimal. The task becomes another function performed in the field by people already there, often at no additional cost.
The opportunity to generate longitudinal sequences of data on individuals through a built-in evaluation system offers another substantial benefit. The measurement techniques available in the field for ascertaining nutritional status are inherently weak. Also, nutritional status is a dynamic condition that can change rapidly in the face of adversity or improved circumstances. The chance to review the velocity of growth in individuals, in reference to a standard, is of considerable importance and becomes possible through an internal data system.
Contextual Information Provided
Having played the role of outside evaluator several times, we have become highly sensitive to our inability to interpret locally-generated data because of our lack of knowledge of the local context. To illustrate: during a recent evaluation of a supplementary feeding programme in Sri Lanka, we encountered an unanticipated result. The nutritional status of preschoolers on the tea plantations, generally assumed to suffer the most severe malnutrition, was better than in the urban, rural, or suburban areas canvassed during our activities. A visit to the tea plantation from which over half the cases in the sample were drawn revealed that this plantation had the model health facility for plantations in the country. (The plantation manager was highly talented and truly concerned with health issues.) Moreover, the medical practitioner in charge noted that the infant mortality rate, even at this model clinic, was abnormally high, because the most malnourished infants were not surviving. Had we not learned of these quirks in our sample, we would have had to challenge the accepted notion that malnutrition was more prevalent on the tea estates. We would have been wrong.
Understanding the local context is critical for proper interpretation of locally-generated data. Even when statistically valid sampling techniques are used to generate programme-wide samples, contextual information (knowledge of programme selection procedures, economic trends, the impact of parallel programmes, and so forth) is needed for proper interpretation. Observation of the local context is heightened when analysis begins at the local level, as it must with a built-in evaluation system.
As a starting point for describing a built-in evaluation system, we can identify three components of such a system: the data, the analytic methodology, and the management support structure. Then, having described these basic components, we can offer several principles to guide their design.
Data

The underpinnings of a built-in evaluation system lie in the data collection and recording procedures. Analytic results can be no better (more accurate) than the data used in the computational algorithms. To be effective, a data system should include two types of indicators: impact indicators and process indicators. Measures of impact are needed to determine the degree to which a programme is achieving its goal. Process indicators are needed to ascertain the provision of inputs and their costs, as well as the quality and consistency of the service delivery system. Taken together, they make it possible to relate project activities to impact. Traditionally, process-oriented data have been the subject of project monitoring systems, but have been divorced from any attempt to substantiate or explain the achievement of impact. An example of an impact indicator for a supplementary feeding programme might be the percentage of two-year-olds below 70 per cent of a weight-for-age standard, while a process indicator might be the number of kilograms of the supplement distributed each month. (Note that these indicators are not necessarily directly measurable: they may be computed from simpler data elements stored in the data system. To illustrate, in order to derive the percentage of two-year-olds below 70 per cent of the standard, the age and weight of each child must be ascertained, the weight-for-age score calculated, and the percentage of the standard computed. Only then can the overall percentage below some given level thought to define a malnourished state be calculated. Similarly, the amount of food distributed might be computed by subtracting stocks on hand at month's end from the sum of the stocks on hand at the first of the month plus all shipments received during the month.)
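The derivation of the two example indicators from simple stored data elements can be sketched along the following lines. This is a minimal illustration: the reference weight is a placeholder, not an actual growth-standard value, and in practice the weight-for-age score would be looked up by each child's age and sex.

```python
# Minimal sketch: deriving the two example indicators from simple
# data elements. The reference weight and all figures are placeholders,
# not actual growth-standard values.

def pct_below_standard(weights_kg, reference_weight_kg, cutoff=0.70):
    """Impact indicator: % of children below `cutoff` of the weight-for-age
    standard, simplified here to one reference weight for two-year-olds."""
    below = sum(1 for w in weights_kg if w / reference_weight_kg < cutoff)
    return 100.0 * below / len(weights_kg)

def food_distributed(opening_stock_kg, shipments_kg, closing_stock_kg):
    """Process indicator: kilograms of supplement distributed in the month,
    computed as opening stock plus shipments received minus closing stock."""
    return opening_stock_kg + sum(shipments_kg) - closing_stock_kg

two_year_old_weights = [8.0, 9.5, 11.2, 7.9, 10.4]  # kg, hypothetical
print(pct_below_standard(two_year_old_weights, reference_weight_kg=12.2))  # 40.0
print(food_distributed(opening_stock_kg=500,
                       shipments_kg=[250, 250],
                       closing_stock_kg=320))  # 680
```

Note that the distribution figure is inferred from stock movements, so spoilage or pilferage would inflate it; the analytic procedures discussed next must allow for this.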
Analytic Methodology

Data are just "numbers" unless a defined procedure for reviewing the numbers is carried out. Although certain analytic procedures may seem obvious, it is our experience that considerable skill is required to arrive at the proper interpretation of the statistics used to summarize a batch of "numbers." For example, it is intuitively appealing to compare the percentage of two-year-olds below 70 per cent of the standard at two points in time to estimate impact. However, the experienced analyst will look at drop-outs in the intervening period (are the malnourished disappearing from the programme rolls faster than the well-nourished?) as well as new registrants during that period (were the new children entering the programme actually better off?). Seasonality, a change in the economic system, or a bad harvest might also account for changes in the nutritional impact indicators over time. Similarly, distribution estimates may be abnormally high due to spoilage or pilferage.
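The drop-out and new-registrant problem can be illustrated with a matched-cohort check: restricting the before/after comparison to children measured at both points in time removes that particular competing explanation. The identifiers and data below are assumptions for illustration.

```python
# Sketch of the analyst's check on a crude before/after comparison.
# Child identifiers and statuses are hypothetical.

def pct_malnourished(status_by_child):
    """% of children flagged malnourished (True) in an {id: bool} mapping."""
    return 100.0 * sum(status_by_child.values()) / len(status_by_child)

time1 = {"c1": True, "c2": True, "c3": False, "c4": False}
time2 = {"c2": True, "c3": False, "c4": False, "c5": False}
# c1 (malnourished) dropped out; c5 (well-nourished) newly enrolled.

crude_change = pct_malnourished(time2) - pct_malnourished(time1)

cohort = set(time1) & set(time2)  # children measured at both points
cohort_change = (pct_malnourished({c: time2[c] for c in cohort})
                 - pct_malnourished({c: time1[c] for c in cohort}))

print(crude_change)   # -25.0: apparently a large improvement
print(cohort_change)  # 0.0: the children actually followed did not improve
```

Here the crude comparison suggests impact that is entirely an artefact of who left and who joined the rolls, which is precisely the trap the experienced analyst is trained to avoid.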
An analyst must be trained to review the competing explanations for observed changes in outcome measurements (both impact and process) and to accept only those that withstand an effort at discreditation. It is difficult to conceive of a data system that will systematically collect data on all possible alternative explanations in a "real-life" social setting. Thus, the burden of identifying the most plausible competing explanations falls on the local staff, who live in the area and are aware of the changing conditions in their communities. Furthermore, a sense of timing must be introduced; that is, the analyst should learn to wait until trends become clear and not draw conclusions precipitously.
Management Support Structure
To be effective, a built-in evaluation system should be supported at the local level with expertise drawn from higher levels of management. Ordinarily, a hierarchical organizational structure - one calling for the supervision of several local distribution centers by someone of higher authority and/or greater responsibility (and often a higher educational background or more thorough training) - oversees an intervention. The supervisory function provided at these higher levels is very important: first, to provide assistance to the manager of each distribution center in the analysis of data, and second, to transfer knowledge derived at one center to the managers of other centers. Our observation of existing programmes suggests that this mid-level management is often lacking in practice though existing on paper. (In the developing world, the logistics of moving supervisory personnel from center to center often preclude a viable supervisory activity.) Such a situation would totally disrupt a built-in evaluation system.
We suggest that supervisors be well versed in the principle of "management by exception". In reviewing data collected at distribution centers, supervisors should identify centers with abnormally bad indicators or extraordinarily good ones. The former need extra help; the latter may hold the keys to success. (Incidentally, centers with abnormal indicators on either end of the spectrum are often those making egregious errors in data collection and recording.) By singling out centers performing at the extremes, supervisors can direct their efforts where they can best be used and, simultaneously, gain insight into which project components or community characteristics lend themselves to the attainment of objectives.
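Management by exception can be sketched as a simple screening rule: single out centers whose indicator lies far from the programme-wide norm on either end. The two-standard-deviation cut-off and all figures are illustrative assumptions.

```python
# Illustrative "management by exception" screen across centers.
# The cut-off rule and figures are assumptions for demonstration.

from statistics import mean, stdev

def exceptional_centers(indicator_by_center, n_sd=2.0):
    """Return centers whose indicator lies more than n_sd standard
    deviations from the mean across all centers, on either end."""
    values = list(indicator_by_center.values())
    m, s = mean(values), stdev(values)
    return {name: v for name, v in indicator_by_center.items()
            if abs(v - m) > n_sd * s}

center_indicators = {  # % of two-year-olds below 70% of weight-for-age
    "center_a": 21.0, "center_b": 19.5, "center_c": 48.0,
    "center_d": 20.5, "center_e": 22.0, "center_f": 18.0,
}
print(exceptional_centers(center_indicators))  # only center_c stands out
```

The flagged center then receives the supervisor's attention first, whether it needs extra help or holds a lesson worth transferring to the other centers.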
We now offer six principles to guide the design of the data component, the analysis component and, finally, the management structure.
1. Data should be generated routinely at the local level. The very name "built-in evaluation system" suggests that the data collection procedure originate at the point of service delivery, i.e., in targeted supplementary feeding programmes, at the center or clinic performing the targeting function, delivering health and nutrition services, and/or distributing the food. Although it is possible to conceive of a system relying on data generated by a traveling team of survey specialists, such a team would lack familiarity with the context from which the data were drawn. Routine local data collection, on the other hand, is done by those with the greatest knowledge of local conditions and, therefore, the best ability to detect irregularities in the numbers and to interpret the results.
2. The data collected must be used for management of the local center or clinic as well as for evaluation of the programme as a whole. It is not uncommon practice in many existing programmes, especially ones attempting to target services to the most vulnerable, to weigh children as a prerequisite for inclusion in the intervention. In some cases, the data are used as a diagnostic tool to determine what additional services are required by the recipient. But it is rare, indeed, that data on individuals are aggregated for a service center or health clinic to determine the overall impact of the package of services provided. In the absence of such a management-oriented activity, data systems tend to deteriorate, particularly as humanitarian interests, political pressures, or the press of time force practitioners to abandon data collection activities for the sake of delivering more service.
Aggregation of the data facilitates better management. For example, if it is discovered that two-year-olds are consistently in greater need of, and more responsive to, food supplements than are children of other ages, good managers may consider altering their targeting criteria. They may choose to concentrate their efforts more on two-year-olds or introduce community outreach to find two-year-olds not yet in the programme. It is reasonable to believe that once managers perceive the usefulness of data for sound management, they will continue to take care to collect the data accurately and completely.
3. The quantities of data recorded and analysed for any purpose should be kept to a bare minimum, particularly at the initiation of the system. Because data collection and analysis for any purpose is costly, both in time and money, it is important not to overburden the intervention with data-related chores. Too much data can prove to be as harmful as none at all. When food distribution center managers are inundated with data, they are unable to apply any quality control to the collection and recording activities; they are also less likely to look at and analyse what information is available because of the immensity of the task. A system calling for the examination of a few key indicators is, therefore, a workable and appropriate starting point. The design problem, and the creative task, is selecting the proper summary statistics, given the components of the intervention and the skill levels of the managers. It should be anticipated that the initial system design will quickly prove inadequate as questions of interpretation are raised at the local levels. Therefore, provision for expanding the system should be included in the initial design. However, expansion should be dictated by experience in the field and not the conventional wisdom of outside consultants.
4. Analytic procedures must be well-defined and understood at the local level. The interpretation of changes in selected impact and/or process indicators can be very difficult. For example, a drop in the supply of a food supplement at a center can be due to many factors: spoilage, an unanticipated increase in the number of programme participants, a failure of a shipment to arrive, and so forth. The manager must be sufficiently skilled to recognize that the change in supply is important, to search for the reason, and to take corrective action if needed. At a more complex level, community-level functionaries must carry out not only tasks such as weighing children, but also more complex functions such as understanding the rudimentary implications and meaning of such data. That is, the village-level worker must be able to interpret the growth parameters being collected and impart such knowledge to project participants. Furthermore, workers must be trained to perform simple aggregation of data at the village level. This facilitates the recognition and anticipation of community-wide trends. But of even greater importance, when we say "well-defined and understood," we mean that the manager must be able not only to carry out the computations, but also to identify and examine multiple explanations of the results. This suggests that an area for intense training must be the analysis of data. Training should not stop with form filling-out exercises.
5. The staff implementing the intervention must be committed to goal-oriented management. Too often, programme implementers accept the intuitive argument that feeding hungry people is inherently beneficial as sufficient grounds for carrying out a supplementary feeding programme. Unfortunately, the evidence does not often validate this argument. Therefore, a built-in evaluation system must be based on the notion that a programme, as initially defined, might not reach its stated goals. As a result, managerial talent must be employed at all levels of programme design and implementation to use data to verify and/or facilitate the attainment of goals. This necessitates the abandonment of doctrinaire preconceptions and the substitution for them of an attitude that fosters creativity through recognition that the attainment of goals requires a process of iterative learning and experimentation.
6. The built-in evaluation system must be thought of as a dynamic entity subject to evolutionary growth throughout an intervention. We have already alluded to the need to initiate the system with a minimal set of data and a comprehensible set of analytic procedures. The system will grow as managers at all levels of the organizational hierarchy perceive the need for additional information to interpret the basic indicators. But, more importantly, the system will grow in response to the more sophisticated questions asked by managers once the simpler questions are answered satisfactorily. For example, a system might be designed, first and foremost, to verify that a package of services has had a desired impact. Once that is shown, it is logical to ask which components of the package are "cost-effective." This will require a modification of the built-in evaluation system. In this way, the system will evolve over time in response to the needs of management of the intervention.