

The United Nations University is an organ of the United Nations established by the General Assembly in 1972 to be an international community of scholars engaged in research, advanced training, and the dissemination of knowledge related to the pressing global problems of human survival, development, and welfare. Its activities focus mainly on peace and conflict resolution, development in a changing world, and science and technology in relation to human welfare. The University operates through a worldwide network of research and post-graduate training centres, with its planning and coordinating headquarters in Tokyo, Japan.

The International Food Data Systems Project (INFOODS) is a comprehensive effort, begun within the United Nations University's Food and Nutrition Programme, to improve data on the nutrient composition of foods from all parts of the world, with the goal of ensuring that eventually adequate and reliable data can be obtained and interpreted properly worldwide. At present in many cases such data do not exist or are incomplete, incompatible, and inaccessible.

This volume is the third in a series that provides guidelines on the organization and content of food composition tables and data bases, methods for analysing foods and compiling those tables, and procedures for the accurate international interchange of the data. It presents an overview of, and guidelines for, how information that results from the analyses of foods should be compiled into useful food composition tables.

Executive summary

Data on the composition of foods are essential for nutrition research, product development, nutrition education, trade of foods and food products between and within countries, and development of nutrition and agricultural policies by government agencies. Food composition data have been compiled into many data bases throughout the world. As the uses of those data increase, a larger number of individuals and organizations become involved in their compilation, and thus the need for guidelines on their gathering, formatting, and documentation increases. This document describes and presents recommendations for the procedures involved with compiling the values for food composition data bases and tables. Specifically addressed are the five major ways to obtain data on the nutrient content of foods:

Two main themes of this manual are (1) the importance of careful definition of the data base, in terms of its ultimate use, before the actual process of compilation begins, and (2) the importance of careful documentation of the procedures used to obtain the data. The data gathering, manipulation, and estimation techniques associated with each data point in a data base should be clearly identified and described. In addition to fostering compatibility and consistency between data bases, adequate documentation will also give the users an awareness of the extent and limitations of the data and a clear indication of necessary and potential improvements to the data base.

Compiling a data base is a complex endeavour requiring major effort; it is the goal of this document to provide guidelines to data base compilers to assist them in focusing their efforts, in the hope that future data bases will be more compatible, more consistent, and more useful to a wider audience.

William M. Rand is Professor of Biostatistics in the Department of Community Health of the Tufts University School of Medicine, Boston, Massachusetts, USA.

Jean A. T. Pennington is Associate Director for Dietary Surveillance in the Division of Nutrition, Center for Food Safety and Applied Nutrition, Food and Drug Administration, Washington, D.C., USA.

Suzanne P. Murphy is Assistant Research Nutritionist in the Department of Nutritional Sciences, University of California, Berkeley, California, USA.

John C. Klensin is Principal Research Scientist in the Department of Architecture, Project Coordinator for INFOODS, and Director of the INFOODS Secretariat at the Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

The technical committee for this document consisted of the above and:
G. Beecher, Nutrient Composition Laboratory, Department of Agriculture, USA
L. Bergström, National Food Administration, Sweden
I. M. Buzzard, Nutrition Coordinating Center, University of Minnesota, USA
D. Feskanich, INFOODS Secretariat, Massachusetts Institute of Technology, USA
F. N. Hepburn, Human Nutrition Information Service, Department of Agriculture, USA
J. M. Holden, Nutrient Composition Laboratory, Department of Agriculture, USA
L. W. Hoover, University of Missouri, USA
B. Perloff, Human Nutrition Information Service, Department of Agriculture, USA
G. J. Petot, Case Western Reserve University, USA
W. Polacchi, Italy
N. Rawson, Campbell Soup Company, USA
W. A. van Staveren, Agricultural University, the Netherlands
K. Yasumoto, Kyoto University, Japan


Public health nutrition activities; agricultural, nutritional, and epidemiological research; food industry and trade decisions; and government planning and policies concerning nutrition and agriculture all depend on accurate knowledge of what is in foods. Currently, these data are not always adequate for existing needs. Often they are incomplete, inaccurate, inconsistent, incompatible, or inaccessible. While there is much excellent information on food composition throughout the world, its ultimate utility could be increased by better communication and interchange of both information and ideas among countries.

INFOODS was formed in 1983, under a mandate of the United Nations University, to develop operational communication paths between the gatherers, the compilers, and the users of food composition data [70, 69]. As part of its activities, funded primarily by agencies of the United States Government (the National Cancer Institute, the Food and Drug Administration, the National Heart, Lung, and Blood Institute, and the Department of Agriculture), INFOODS was commissioned to prepare a series of guidelines on how to collect and analyse foods for nutrients and other substances, how to record and communicate food composition data, and how to use food composition data in research and practice. This document focuses on the issues involved in gathering together, and estimating where necessary, the specific data needed for a food composition table or data base. It should be useful to developers as well as users of food composition data bases, at both local and national levels.

A prime concern of INFOODS has been fostering formal and informal discussions to outline areas in which guidelines would be useful. Starting with the initial INFOODS meeting in Bellagio, Italy [73], and continuing through the initial regional meetings of EUROFOODS [102], ASIAFOODS [71], LATINFOODS [12], and OCEANIAFOODS [20] and annual meetings of the U.S. National Nutrient Databank Conference, there have been groups focusing on how one actually obtains and combines food composition data into a data base. In late 1986 an international meeting was held in Washington, D.C., to discuss the general problems of "missing data", i.e., how to deal with situations in which no analytic data exist. This meeting produced the general plan for the current document. A first draft of the document was circulated to those who had attended that meeting and to the various regional liaison groups; the current version incorporates their very helpful suggestions.

Part I. The data base

Tables listing the components of specific foods have been published for over 150 years. Food composition tables have been developed by international agencies, governments, universities, industries, and individuals around the world [28, 37, 72]. Each of these tables differs from others in terms of the foods examined, the nutrients analysed, and the data presented. Many of these tables also differ in their methods of gathering and handling data. As food composition data assume more scientific, academic, and political importance, compatibility and consistency of analysis and presentation become more critical.

This part of the manual discusses specific aspects of data bases, stressing the importance of planning and documentation. Its primary emphasis is on the importance of careful definition of foods and nutrients and documentation of data sources and methods of data manipulation. The utility of food composition data depends on the information that accompanies them.

1. Data base considerations

Importance and status of food composition data
Analytic and non-analytic data
Types of data bases
Data base development

Importance and status of food composition data

Food composition data are important to a spectrum of users ranging from international organizations to private individuals [11, 13, 36, 65, 100, 105].

At the international level:

At the regional level:

At the individual level:

Each of these activities requires accurate data on the composition of foods, and requires that these data be in a form that permits easy access, intelligent manipulation, and confident usage.

There are currently over 150 food composition tables in use around the world [28].

This number excludes most of the tables that exist for the United States [37] since, for the most part, these depend heavily on data from the United States Department of Agriculture (USDA). Because of the overlap between tables and inadequacies of documentation, an exact count of the number of unique foods for which there are data is difficult to determine, but they number in the thousands. Even so, the number of foods consumed is probably several times greater than the number for which analytic data exist.

Nutrient profiles of the foods that have been analysed are not always complete. Major food composition tables routinely contain data on 25 to 50 food components, but do not include all compounds that routinely occur in foods and are suspected of having biological activity. The biological activity of a nutrient in any given food derives both from the total profile of components in that food (and others with which it may be combined) and from the physical condition of the consuming individual. Nutrient bioavailability is one of the frontiers of modern nutritional research and an area not yet reflected in most food composition data bases.

Without information on their statistical distribution, the values available in food composition tables may be misleading. The amount of a nutrient in a food item has a probability distribution and cannot be adequately described by the results of a single analysis, or averages of analyses performed on samples drawn for convenience rather than for representativeness.

There should be better documentation of work in the areas of the generation and use of food composition data. There are few generally accepted standards and guidelines for gathering, aggregating, saving, identifying, manipulating, or using food composition data. One goal of this manual is to provide some of these guidelines so that future developers can avoid the deficiencies of the past.

Analytic and non-analytic data

Analytic data are values based on laboratory analyses, including those obtained by well-defined conversion factors and straightforward formulae. Thus, protein data calculated by multiplying nitrogen content by a constant are considered analytic data. By contrast, non-analytic data are values which involve either no chemical analyses (e.g., using a value of zero for the cholesterol in an orange because it is a plant product) or the use of analytic data with varying amounts of estimation involved (e.g., the calculation of the vitamin content of a stew from the vitamin content of its raw ingredients). Non-analytic data are often, but not always, less accurate than analytic data.
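As a small illustration of a value obtained by a well-defined conversion factor, the sketch below derives protein from a nitrogen measurement. The factor 6.25 is the commonly used general nitrogen-to-protein factor and is an assumption here, as are the function name and the sample measurement; food-specific factors exist, and a data base should record which factor was applied.

```python
# Assumed general nitrogen-to-protein conversion factor (illustrative only).
GENERAL_N_FACTOR = 6.25


def protein_from_nitrogen(nitrogen_g_per_100g, factor=GENERAL_N_FACTOR):
    """Return protein (g per 100 g) from measured nitrogen (g per 100 g)."""
    if nitrogen_g_per_100g < 0:
        raise ValueError("nitrogen content cannot be negative")
    return nitrogen_g_per_100g * factor


# Hypothetical laboratory result: 2.0 g nitrogen per 100 g of food.
protein = protein_from_nitrogen(2.0)
print(protein)
```

Because the only inputs are a laboratory value and a documented constant, the result would still count as analytic data in the sense used above.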

The terms "calculated" and "imputed" are often used for data that are not analytic, "calculated" implying more trustworthiness than "imputed". While many people distinguish between these two terms, there is little agreement on their precise meaning. To avoid conflict, we use the term "estimated" for those data which are not strictly laboratory values.

A data base should contain the most accurate and precise data that are available. To many, this suggests that the entries should be purely analytic data; however, there are several reasons why this is neither a practical nor a theoretical goal in food composition data bases.

Impractical Analyses

For most foods and many nutrients, there exist analytic procedures that produce reasonable data [83, 86]. However, these procedures differ in complexity and expense, and in the course of research, decisions will be made which effectively declare certain of the analyses impractical, prohibitively expensive, time-consuming, or labour-intensive. These decisions are made because the costs cannot be justified on the basis of the potential uses of the data (the lack of involvement of a particular food or nutrient with a health problem, or the infrequency of consumption of a food by the population of interest) or simply because of cost-benefit trade-offs, e.g., where the choice is between analysing a number of foods for one nutrient and, at the same cost, completing a single analysis for another nutrient.

Multiple Measurements

If more than a single observation is available for a nutrient in a specific food, it may be necessary to summarize these "replicate" data. While certain data bases, such as those reflecting raw laboratory values, may include all the analytic data that exist as individual entries, most users need some indication of the most likely values for the nutrients of the food in question, as well as some indication of how variable the data are. In order to provide this information, it is necessary to manipulate the given data statistically to derive data base entries. A more complete discussion of these manipulations is contained in chapter 3.
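The kind of summary described here can be sketched as follows. This is only a minimal illustration using the mean as the "most likely value" and the sample standard deviation and range as indicators of variability; it is not necessarily the procedure recommended in chapter 3, and the function name and replicate values are hypothetical.

```python
import statistics


def summarize_replicates(values):
    """Summarise replicate analyses of one nutrient in one food."""
    if not values:
        raise ValueError("at least one observation is required")
    n = len(values)
    return {
        "n": n,
        "mean": statistics.mean(values),
        # The sample standard deviation needs at least two observations.
        "sd": statistics.stdev(values) if n > 1 else None,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical replicate values for one nutrient (mg per 100 g).
summary = summarize_replicates([10.0, 12.0, 14.0])
print(summary)
```

Keeping the number of observations alongside the summary lets a user judge how much weight the entry can bear.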

Aggregated Foods

Frequently, data are needed for representative foods which do not in fact exist: one cannot determine the exact type of potato that a consumer purchases, since the consumer chooses from many different cultivars which are grown, harvested, and stored under a variety of conditions. What can be analysed are the various types or varieties of the food available to the consumer. Then these data can be combined into a single entry in a data base, an entry which does not represent any existing food, but can be used in certain well-defined situations. The appropriate procedures for combining data in this manner, producing weighted averages, are detailed in chapter 3. (Note that another alternative is to prepare and analyse a food sample that consists of specific proportions of different cultivars or brands and that this is occasionally done.)
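A weighted average of this kind can be sketched as below. The cultivar values and the market-share weights are invented for illustration, and the choice of market share as the weight is an assumption; the appropriate weighting depends on the intended use of the entry.

```python
def weighted_average(values_and_weights):
    """Combine nutrient values using (value, weight) pairs."""
    total_weight = sum(weight for _, weight in values_and_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(value * weight for value, weight in values_and_weights)
    return weighted_sum / total_weight


# Hypothetical nutrient values (mg per 100 g) for three potato cultivars,
# each paired with an assumed market share used as its weight.
cultivars = [(20.0, 0.5), (10.0, 0.25), (30.0, 0.25)]
aggregate_value = weighted_average(cultivars)
print(aggregate_value)
```

The resulting entry describes no single cultivar, which is why the weights and their source belong in the documentation for that entry.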

Number of Different Foods

With limited resources for the compilation of food composition data bases and the current state of development of analytic techniques, data base compilers cannot analyse all foods for all the nutrients that users require. The alternatives are using analytic data that already exist, using data on similar foods, and using recipe data for multiple-ingredient foods.

It must be recognized that these various procedures all have certain inherent limitations, such as potentially low accuracy or impressions of unrealistically high precision. These two independent aspects of data, accuracy and precision, are critically important.

Discrepancies between different estimated nutrient values reflect different sources of foods and nutrients, methods of estimation, modes of expression for the data, judgements of coders, or values for yields, retentions, and water/fat changes during preparation. This situation can be improved by careful definition and documentation of estimation procedures.
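To show how such choices propagate, the sketch below estimates a nutrient in a cooked dish from its raw ingredients. The formula (sum of retained nutrient divided by cooked weight) is one simple approach assumed here for illustration, not a method prescribed by this manual, and all weights, nutrient values, retention fractions, and yield factors are hypothetical.

```python
def recipe_nutrient_per_100g(ingredients, yield_factor):
    """Estimate a nutrient per 100 g of cooked dish.

    ingredients: list of (raw weight in g, nutrient per 100 g raw,
                 retention fraction after cooking) tuples.
    yield_factor: cooked weight divided by total raw weight.
    """
    if yield_factor <= 0:
        raise ValueError("yield factor must be positive")
    raw_weight = sum(weight for weight, _, _ in ingredients)
    retained = sum(
        weight / 100.0 * nutrient * retention
        for weight, nutrient, retention in ingredients
    )
    cooked_weight = raw_weight * yield_factor
    return retained / cooked_weight * 100.0


# Same hypothetical stew, two different assumed retention fractions.
stew_low_retention = [(200.0, 10.0, 0.5), (300.0, 20.0, 0.5)]
stew_full_retention = [(200.0, 10.0, 1.0), (300.0, 20.0, 1.0)]

print(recipe_nutrient_per_100g(stew_low_retention, yield_factor=0.8))
print(recipe_nutrient_per_100g(stew_full_retention, yield_factor=0.8))
```

The two calls differ only in the assumed retention, yet the estimates differ by a factor of two, which is why the factors used need to be recorded with the estimate.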

Types of data bases

There are several different ways of classifying food composition data bases, just as there are different ways of classifying the quality and origins of the data in them. The most important distinction is between REFERENCE data bases and SPECIAL-PURPOSE or APPLICATION data bases. These two types of data bases (which are not totally distinct but rather represent two ends of a spectrum) reflect an important two-level structure in the community of data base compilers. As a generalization, some individuals and organizations seek to maintain and provide general storehouses of food composition data for diverse users to draw upon, and others build specific data bases tailored to specific uses (such as the analysis of consumption data from a particular survey). There are also groups that maintain a large, general-purpose data base as well as produce subsets of it for specific applications, sometimes supporting themselves and the maintenance of the general data base by sale of the specific application data bases and services.

More specifically, REFERENCE data bases are those designed primarily to provide the raw material for the construction of other, APPLICATION data bases. Reference data bases may contain analytic data for specific food items (e.g., a particular cultivar of apples), analytic data for composite foods (e.g., apples by market share), data that have been compiled and aggregated from various sources (e.g., data on "apples" from various sources weighted by number of samples), or any combination of these. A key property of a reference data base should be that it contains complete information about sampling, sources, and methods used to produce data, in addition to complete descriptions of the nutrients and foods included. This information is essential for evaluating, selecting, and combining the data into more application-oriented food tables or data bases. The standard examples of reference data bases are those of the FAO (Food and Agriculture Organization) [47, 46], INCAP (the Institute of Nutrition of Central America and Panama) [48], the United States [96], the United Kingdom [59], and Japan [75]; however, the full set of data for reference data bases is not usually published, remaining instead in the files of the issuing organization.

Alternatively, an APPLICATION, or SPECIAL-PURPOSE, data base is one that is keyed to a specific application or problem. It generally does not contain extensive descriptive or primary data; however, it does need references to this information. An application data base may be a subset of a single reference data base (a specific set of foods or class of nutrients); it may draw data from several reference data bases (similar foods in different parts of the world); or it may supplement data from reference data bases with data gathered from other sources. Application data bases tend to be more compact and easier to handle and use than the general reference data bases. Often they are organized differently from the very general format required by the reference data base.
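The relationship between the two types of data base can be sketched as a simple data structure. The field names, the example entries, and the `build_application_db` helper below are all hypothetical; the sketch only illustrates the idea that a reference entry carries full documentation, while an application entry keeps a compact value plus a pointer back to that documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEntry:
    food: str        # full food description
    nutrient: str    # nutrient name
    value: float     # amount per 100 g
    unit: str
    data_type: str   # "analytic" or "estimated"
    source: str      # citation or laboratory record (placeholder text here)
    method: str      # analytic or estimation method used


def build_application_db(reference, foods, nutrients):
    """Select a subset of a reference data base for one application.

    Each application entry keeps the value and unit, plus the index of the
    reference entry so that the documentation can still be looked up.
    """
    application = {}
    for ref_id, entry in enumerate(reference):
        if entry.food in foods and entry.nutrient in nutrients:
            application[(entry.food, entry.nutrient)] = {
                "value": entry.value,
                "unit": entry.unit,
                "ref_id": ref_id,
            }
    return application


# Hypothetical reference entries (values invented for illustration).
reference = [
    ReferenceEntry("apple, raw", "vitamin C", 5.0, "mg",
                   "analytic", "laboratory record A", "titration"),
    ReferenceEntry("apple, raw", "cholesterol", 0.0, "mg",
                   "estimated", "assumed zero for plant foods", "none"),
    ReferenceEntry("potato, boiled", "vitamin C", 8.0, "mg",
                   "analytic", "laboratory record B", "titration"),
]

app_db = build_application_db(reference, {"apple, raw"}, {"vitamin C"})
print(app_db)
```

Storing only a reference identifier keeps the application data base compact while preserving the route back to sampling, source, and method information.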
