William M. Rand
Abstract
This article discusses the use of probability models and statistical techniques in the investigation and formulation of human nutrient requirements. The power of probability formulation and the expanding pace of data gathering make it essential for the nutrition community to develop a better understanding of these methods and to use them routinely.
Introduction
As research into the human response to food evolves from controlled laboratory studies in industrial countries to field studies around the world, the variability and complexity of the findings become almost overwhelming. The ways in which people respond to foods and to the nutrients they contain are sensitive to a multitude of factors, ranging from geographic to physiological. Synthesizing these research findings into simple, precise recommendations of what people should eat is very important, but it is also very difficult. In the last few years probability models and statistical techniques have been used in the investigation and formulation of human nutrient requirements [1; 2]. Also appearing recently have been a number of extensive expositions of statistics directed to the nonmathematical biomedical worker [e.g. 3; 4]. Because of the power and wide applicability of these methods, the nutrition community needs to be more aware of the value of the "probability approach."
Dietary guidelines and recommendations must include allowances that take into account not only individual biological variability but also differences in the availability of nutrients from foods and diets and in the interactions among nutrients consumed at a single meal. Data gathered from different countries around the world repeatedly show the variability in the human response to food, both within and between individuals. Thus, recommendations for nutrient intake levels must be dealt with by a probability approach.
The probability approach
The variability of a nutrient requirement can be described by considering the requirement to be a "random variable" with a probability distribution. Thus, the requirement of a population for a specific nutrient may be considered to have a Gaussian distribution with some specific mean and standard deviation. This approach recognizes that, while at any given time each individual in a population needs to consume a unique amount of that nutrient in order to maintain a level of well-being, these individual requirements themselves may be different at different times. It can be assumed, however, that, although the individuals change, the population remains constant if environmental conditions (socio-economics, disease, etc.) are stable and its requirements can be estimated from the mean levels and variability of a sample of individuals. Thus, for a specific nutrient, given a representative sample, one can estimate the intake levels that will satisfy the requirements of any given proportion of individuals. It is not possible, however, to predict which individuals will fall above or below a given percentile.
The probability distribution of a nutrient requirement formalizes the fact that a requirement level is not a fixed number, applicable for everyone and for all time. It makes explicit the reality that any reasonable level of intake will leave a certain fraction of the population below requirement, although this may be different individuals at different times.
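The two statements above (an intake level that covers a chosen proportion of the population, and the fraction left below requirement at any given intake) can be sketched numerically. This is a minimal illustration, assuming a Gaussian requirement distribution as in the text; the function names and the mean and standard deviation used here are hypothetical and are not taken from any study.

```python
from statistics import NormalDist


def intake_covering(mean_req, sd_req, proportion):
    """Intake level meeting the requirement of `proportion` of individuals,
    assuming the requirement is Gaussian with the given mean and SD."""
    return NormalDist(mean_req, sd_req).inv_cdf(proportion)


def fraction_below_requirement(mean_req, sd_req, intake):
    """Fraction of the population whose requirement exceeds `intake`."""
    return 1.0 - NormalDist(mean_req, sd_req).cdf(intake)


# Hypothetical requirement distribution (arbitrary units per day).
mean_req, sd_req = 100.0, 15.0

level = intake_covering(mean_req, sd_req, 0.975)
print(round(level, 1))  # roughly mean + 1.96 SD, about 129.4
print(fraction_below_requirement(mean_req, sd_req, mean_req))  # 0.5 at the mean
```

The calculation gives the size of the group left below requirement, not its membership, which is consistent with the point that individuals cannot be predicted to fall above or below a given percentile.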
Multivariate aspects of the probability approach
Given that the requirement for each nutrient has a probability distribution, the concept of joint or multiple probability distributions is a natural extension to describe how two or more requirements, each with its own variability, are related. Not only can different nutrient requirements be assumed to have their own distributions, but interactions among nutrients can be formalized within the probability model. The simplest models are the bivariate Gaussian distributions, which describe the relationships by the correlation coefficient (for example, the negative correlations between dispensable or non-essential amino acids where the intake of one reduces the need for others). Of course, much more complex interactions among nutrients have been postulated, both synergistic and antagonistic, that require more complex probability models.
One can apply the concept of joint distributions to other aspects of the requirement problem. For example, in formulating recommendations for energy intake, both the amount needed for normal functioning and the health consequences of overfeeding must be considered. Energy intake recommendations need to be based on the joint distribution of minimal need (requirement) and maximal tolerance. This critical balance between two aspects of a nutrient is obvious in the case of energy, but it is a problem that also arises with other nutrients that are essential and yet have adverse effects above certain levels.
Another example of the use of joint distributions is in the estimation of the fraction of the population at risk because of intake falling below requirement. This is not a straightforward problem whenever intake and requirement are related. Often, as in this case, the symmetrical Gaussian distribution will fit the data, with the correlation coefficient describing how the two are related.
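As a sketch of why the correlation matters, suppose intake and requirement are jointly Gaussian. The difference (intake minus requirement) is then also Gaussian, and the fraction at risk is the probability that this difference is negative. The function name and all numerical values below are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist


def fraction_at_risk(mean_intake, sd_intake, mean_req, sd_req, rho):
    """P(intake < requirement) under a bivariate Gaussian model.

    rho is the correlation coefficient between intake and requirement.
    """
    # Variance of (intake - requirement) for correlated Gaussian variables.
    var_diff = sd_intake**2 + sd_req**2 - 2.0 * rho * sd_intake * sd_req
    if var_diff <= 0.0:
        raise ValueError("degenerate case: difference has no variability")
    diff = NormalDist(mean_intake - mean_req, sqrt(var_diff))
    return diff.cdf(0.0)


# Hypothetical population: mean intake exceeds mean requirement.
for rho in (0.0, 0.5):
    print(rho, round(fraction_at_risk(120.0, 20.0, 100.0, 15.0, rho), 3))
```

With these illustrative values, a positive correlation (people with higher requirements tending to eat more) lowers the estimated fraction at risk compared with treating intake and requirement as independent.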
The usefulness of the probability approach
The concept of the distribution of requirement for a particular group of individuals provides a general model for many critical aspects of nutrition. This offers a valuable way of organizing and communicating data and of designing and describing investigations, and, additionally, gives access to a range of powerful statistical tools for comparing, contrasting, and exploring nutritional data.
A probability approach forces confrontation with some of the underlying problems of the definition of requirement. Currently, for each nutrient there may be a unique definition or set of definitions, and often each group of researchers exploring a nutrient may use its own definitions. Sometimes these are precisely defined (e.g. protein in terms of maintaining body nitrogen balance), but often they are based on subjective judgments (e.g. of how much energy should be provided for physical activity, or how large iron stores should be). A common probability model, even if it does not standardize the various definitions, will assist by making them explicit and more available for comment, question, and testing.
One troublesome feature of variability within a population is that intake levels that are high enough to meet the requirements of almost all individuals can be unreasonably high for many individuals. A frequent approach is to divide the population into groups of similar individuals (for example, on the basis of sex, age, and/or physiological state, such as pregnancy) and to calculate the requirements separately for each group. Different groupings are often used for different nutrients, and in addition this approach has a number of inherent problems (such as the difficulties introduced by splitting by age when what is needed is a split by physiological state, e.g. pre- and post-menarche). Moreover, traditional models deal primarily with the inherent biological variability between individuals, usually ignoring the differences within individuals over time, as well as the variability introduced by the method of estimation. Explicit probability modelling of requirement as a random variable permits the use of standard statistical techniques (such as analysis of covariance) to estimate population requirements more precisely and realistically by taking into explicit account other population characteristics. An approach to describing requirements in terms appropriate for nearly all family members is that adopted by a Latin American workshop [5] of expressing them per 1,000 calories of the household diet.
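Expressing a requirement per 1,000 calories is simple arithmetic, sketched below. This is one possible reading of the idea and not necessarily the workshop's own procedure; the family members, nutrient amounts, and energy values are hypothetical.

```python
def per_1000_kcal(nutrient_amount, energy_kcal):
    """Nutrient amount expressed per 1,000 kcal of energy."""
    if energy_kcal <= 0:
        raise ValueError("energy must be positive")
    return 1000.0 * nutrient_amount / energy_kcal


# Hypothetical daily requirements: (nutrient in g, energy in kcal).
family = {
    "adult": (50.0, 2500.0),
    "child": (30.0, 1200.0),
}

densities = {who: per_1000_kcal(n, e) for who, (n, e) in family.items()}
print(densities)

# Under this reading, a household diet at the highest density would meet
# every member's requirement, provided each member's energy need is met.
print(max(densities.values()))
```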
Viewing nutritional requirements in the context of multivariate distributions provides the investigator access to a full range of multivariate statistical techniques [3; 4] for such purposes as exploring the results of field studies. These techniques include multivariate analysis of variance for comparing populations simultaneously along several dimensions, and the exploratory multivariate techniques that are available for examining complex relationships (e.g. clustering, factor analysis, and canonical correlation). One specific example is the use of risk analysis to provide explicit cost/benefit formulations for planners by giving them models with which to estimate the numbers of people affected by raising or lowering recommendations.
Finally, the application of statistical techniques to data that already exist can give insight into what new data are needed to extend our understanding of human nutrition, and will guide the design of efficient experiments to obtain these data.
Summary
There is a growing need for the formal use of statistics and probability for the conceptualization of human nutrient requirements and human responses to foods. For these purposes, probability statistics provides the natural tools for dealing with the complex variability encountered. Probability models offer ways of thinking about nutrient requirements that allow easy expression and data synthesis, that give access to a large body of statistical techniques that have been developed to deal with similar problems, and that suggest and assist in the design of critical experiments. The probability approach is a natural formalization of the ways in which many nutritionists have long viewed their problems. However, the power of probability formalization and the expanding pace of data gathering make it essential for the nutrition community to develop a better understanding of these approaches and to employ them routinely.
References
Kevin M. Sullivan, Jonathan Gorstein, Andrew G. Dean, and Ronald R. Fichtner
Abstract
This paper describes the recent availability of microcomputer software for computing paediatric anthropometry, specifically height-for-age, weight-for-height, and weight-for-age indices based on the CDC/WHO international reference population. The primary software packages discussed are CASP, Epi Info, Anthro, ISSA, and IQ.
Background
The availability of user-friendly microcomputer software is essential to facilitate the assessment of the nutritional status of populations. Although growth reference tables, charts, and other devices may be useful in identifying individual children as having abnormal anthropometry, such as wasting or stunting, these devices are less useful in describing the nutritional status of groups of children. Computer programs that calculate anthropometric indices are useful because they produce precise anthropometric values, so that measures of central tendency (such as median or mean values) and of dispersion (such as standard deviations) can be calculated and compared with other studies or with the reference population.
In February 1989 the US Centers for Disease Control (CDC) and the Interagency Food and Nutrition Surveillance Programme (IFNS) jointly convened a technical meeting on the future development of software for analysing anthropometric data for the assessment of nutritional status, particularly that of young children. A report of that meeting, with a description of available software, appeared in an earlier issue of the Food and Nutrition Bulletin [1].
In this paper we provide an update on the software available for computing paediatric anthropometry, specifically height-for-age, weight-for-height, and weight-for-age indices based on the CDC/WHO international reference population [2]. The primary software packages are CASP, Epi Info, Anthro, ISSA, and IQ.
CASP
CASP (Centers for Disease Control Anthropometry Software Program) has been available since 1984; the current version is 3.1. This software was developed at the CDC and distributed and supported by the CDC, WHO, and other organizations. A user's manual and handbook for the software is currently available in English and Spanish. The CASP software is composed of a number of programs that compute the anthropometry indices interactively or in a batch processing mode and that also provide some analytic tools. Although CASP has proved to be a useful and important program, it does have certain limitations. Some parts of the software are difficult to learn, and users have reported difficulty with its error-handling capabilities. The strengths of CASP are that, compared with the other programs mentioned below, it requires less RAM (256K) and is easier to run on computers without hard disks. In addition, the BATCH program in CASP, which performs a batch-processing of ASCII records, may continue to be of use.
During the technical meeting on software for nutritional surveillance, participants agreed that, rather than upgrading the CASP software, a preferable strategy would be to use Epi Info version 5 as the primary software package for anthropometric calculations.
FIG. 1. Example of questionnaire provided with Epi Info for anthropometric calculations
Table. Centile distribution (02/21/90)
Index (centile range) | 0.0-9.9 | 10.0-19.9 | 20.0-29.9 | 30.0-39.9 | 40.0-49.9 | 50.0-59.9 | 60.0-69.9 | 70.0-79.9 | 80.0-89.9 | 90.0-100 | Total
H/A no. | 54 | 14 | 11 | 6 | 2 | 3 | 5 | 1 | 2 | 2 | 100
H/A % | 54.0 | 14.0 | 11.0 | 6.0 | 2.0 | 3.0 | 5.0 | 1.0 | 2.0 | 2.0 | 100.0
W/H no. | 62 | 17 | 4 | 3 | 5 | 1 | 4 | 1 | 1 | 2 | 100
W/H % | 62.0 | 17.0 | 4.0 | 3.0 | 5.0 | 1.0 | 4.0 | 1.0 | 1.0 | 2.0 | 100.0
W/A no. | 77 | 7 | 3 | 2 | 2 | 2 | 0 | 2 | 1 | 4 | 100
W/A % | 77.0 | 7.0 | 3.0 | 2.0 | 2.0 | 2.0 | 0.0 | 2.0 | 1.0 | 4.0 | 100.0
Reference population % | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 100.0
Epi Info
Epi Info consists of a series of microcomputer programs developed for epidemiologic investigations by the Epidemiology Program Office of the CDC and the Global Programme on AIDS of WHO. Using Epi Info, health workers can perform sample-size calculations, create questionnaires, enter and analyse data, produce reports, and export or import data. The ANALYSIS module can perform cross-tabulations, stratified analysis, and regression and matched case control analysis and can also create bar, histogram, line, and scatter plots.
Chapter 23 of the documentation for Epi Info version 5 discusses how to calculate anthropometric indices. In general, a questionnaire is created in Epi Info, and, after age, sex, weight, and height are entered for an individual in the ENTER module, the anthropometric indices are calculated. Figure 1 presents a sample questionnaire provided with Epi Info, which is very similar to the data-entry screen in CASP. The calculations are presented on the screen and are stored in the individual's record. (Earlier versions of Epi Info cannot perform the anthropometry calculations.) Sample analysis programs for Z-score and centile distributions are also provided. An example of the centile distribution output is shown in figure 2.
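The kind of calculation involved can be sketched as follows. This is a simplified illustration and not Epi Info's own routine: it assumes that the reference median and standard deviation for a child's age and sex have already been looked up, and that the reference distribution is approximately Gaussian when converting a Z-score to a centile. The reference values, measurements, and function names are hypothetical.

```python
from statistics import NormalDist


def z_score(measurement, ref_median, ref_sd):
    """Distance of a measurement from the reference median, in SD units."""
    return (measurement - ref_median) / ref_sd


def centile(z):
    """Centile corresponding to a Z-score, assuming a Gaussian reference."""
    return 100.0 * NormalDist().cdf(z)


def decile_counts(centiles):
    """Count children in the ranges 0.0-9.9, 10.0-19.9, ..., 90.0-100."""
    counts = [0] * 10
    for c in centiles:
        counts[min(int(c // 10), 9)] += 1  # a centile of 100 joins the top range
    return counts


# Hypothetical reference values for one age and sex: median 12.0 kg, SD 1.2 kg.
weights = [9.6, 11.4, 12.0, 13.5]
zs = [z_score(w, 12.0, 1.2) for w in weights]
cs = [centile(z) for z in zs]
print([round(z, 2) for z in zs])
print(decile_counts(cs))
```

In a population resembling the reference, each of the ten ranges would hold about 10% of the children, so a pile-up in the lowest range signals a shift relative to the reference.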
FIG. 3. Example of screen for Z-score distribution from Anthro
Because of Epi Info's modular design, customized systems can be created for specific applications. We envision that public health or research institutions that regularly deal with anthropometric data would have one or two individuals who would become skilled at using Epi Info and would create customized applications to meet their specific needs. For example, a customized Epi Info version would have a simple menu system with menu items such as Enter/Edit Data, Standard Analyses, and Monthly Report. Those entering data would not need extensive knowledge of microcomputers or Epi Info, and the screens and simplified documentation would be in the user's own language.
Currently the screens and manual are available in English; however, there are plans to translate both the program and the manual into French and Spanish. We are also investigating the usefulness of a software package that would allow for the batch processing of Epi Info data files for the calculation of anthropometric indices.
Anthro
The Anthro software package was developed jointly by the Division of Nutrition of the CDC and the Nutrition Unit of WHO to meet the requests of a number of health researchers who use dBase or software packages that can read or write dBase files. Anthro performs three basic tasks: First, it performs a batch processing of existing dBase files (although the program also allows users to import ASCII and comma delimited files into dBase files). Second, it performs standard anthropometric analyses, such as Z-score and centile distributions, and tabulates the prevalence of abnormal anthropometry. Examples of the Z-score distribution and descriptive statistics provided are shown in figures 3 and 4. Lastly, there is an anthropometric calculator for the interactive computation of anthropometric indices on a case-by-case basis.
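A group-level summary of this kind can be sketched in a few lines. This is an illustration only, not Anthro's code: the cutoff of -2 Z-scores is the conventional threshold for low anthropometry and is an assumption here, not something stated in the text, and the Z-scores and function name are hypothetical.

```python
from statistics import mean, stdev


def summarize_z(z_scores, cutoff=-2.0):
    """Descriptive statistics and prevalence below a Z-score cutoff."""
    n = len(z_scores)
    below = sum(1 for z in z_scores if z < cutoff)
    return {
        "n": n,
        "mean": mean(z_scores),
        "sd": stdev(z_scores),  # sample standard deviation; needs n >= 2
        "prevalence_pct": 100.0 * below / n,
    }


# Hypothetical height-for-age Z-scores for five children.
summary = summarize_z([-3.0, -2.5, -1.0, 0.0, 0.5])
print(summary)
```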
The screens for Anthro are available in English, Spanish, and French. As of this writing, the documentation is available only in English, but Spanish and French documentation will soon be available.
Other software
Since the earlier description of ISSA (Integrated System for Survey Analysis, created by the Institute for Resource Development) [1], a new module has been added that provides for designing questionnaires and creates the data-description file for data entry.
The IQ software (International Questionnaire Development System, from the Tulane School of Public Health and Tropical Medicine), described previously, also performs anthropometric calculations.
The CDC has made available the anthropometric subroutine source code in Fortran, dBase, Pascal, and Basic. A number of programmers/researchers have incorporated these routines into customized database applications. An IBM-mainframe version of the Fortran code, which includes head- and arm circumference curves, is also available from the CDC.
A utility program that converts CASP data files (i.e. files with the ETB extension) to Epi Info, dBase, ASCII, or comma-delimited file formats is being distributed with both CASP and Epi Info. This program, called Etbconv, can be run either from a menu or from the command line.
FIG. 4. Example of screen for Z-score distribution statistics from Anthro
Summary and conclusions
Epi Info version 5 is recommended as the software package of choice for those interested in the calculation and analysis of anthropometric indicators. Users with special needs are encouraged to create customized Epi Info applications. For users who employ dBase or software programs that can import or export dBase files, Anthro may be of use for batch processing of the data bases. Epi Info, ISSA, and IQ are generalized data-entry programs that have the additional capacity to perform anthropometric calculations. The latter two programs may be useful to those developing highly complex surveys. CASP will continue to be available, at least into the near future, but the software will not be updated. We recommend that current CASP users switch to Epi Info.
Sources of information
Additional information on the software packages can be obtained from their respective organizations:
Anthro, CASP, Epi Info, and Etbconv:
Division of Nutrition, CCDPHP, Centers for Disease Control, 1600 Clifton Road (MS A08), Atlanta, GA 30333, USA
or
Nutrition Unit, World Health Organization, 1211 Geneva 27, Switzerland
ISSA/DHS:
IRD/Westinghouse, 8850 Stanford Blvd., Suite 4000, Columbia, MD 21045, USA
IQ:
School of Public Health and Tropical Medicine, Tulane University, 1501 Canal Street, Suite 713, New Orleans, LA 70112, USA
Acknowledgements
The authors would like to thank Trevor Croft of the Institute for Resource Development for providing information on recent changes to ISSA, and Dr. Frederick Trowbridge of the CDC for his helpful comments.
References