The variability of the data

By the "goodness" of the data we mean suitability for the purpose at hand - how well the data will permit us to get on with whatever task we are involved with. For most users, finding a value in a table is sufficient. They will then use it as they need, assuming that this number is the best available estimate of some specific nutrient in some specific food.

Table 1. Whole chicken egg, fresh, raw, per 100 g edible portion

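| Table | Water (g) | Energy (kcal) | Protein (g) | Fat (g) | Carbohydrate (g) | Calcium (mg) | Iron (mg) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| USDA (new) | 74.57 | 158 | 12.14 | 11.15 | 1.2 | 56 | 2.09 |
| USDA (old) | 73.7 | 163 | 12.9 | 11.5 | 0.9 | 54 | 2.3 |
| United Kingdom | 74.8 | 147 | 12.3 | 10.9 | Trace | 52 | 2 |
| Federal Republic of Germany | 74.1 | 167.08 | 12.9 | 11.2 | 0.7 | 56 | 2.1 |
| Sweden | 74.4 | 150 | 12.7 | 9.4 | 2.7 | 51 | 2.1 |
| Denmark | 74.6 | 155.8 | 12.1 | 11.2 | 1.2 | | 2 |
| Finland | 74 | 145 | 12.8 | 11.7 | 0.5 | | 2.5 |
| Norway | 75 | 155 | 13 | 11 | 0.7 | | 2.1 |
| Italy | 73.9 | 156 | 13 | 11.1 | 1 | 50 | 2.5 |
| East Asia | 73.7 | 163 | 12.9 | 11.5 | 0.8 | 61 | 3.2 |
| China 1 | 71 | 170 | 14.7 | 11.6 | 1.6 | 55 | 2.7 |
| China 2 | 70.8 | 187 | 11.8 | 15 | 1.3 | 58 | 4.3 |
| China 3 | 73 | 174 | 13.1 | 13.5 | 3.6 | | |
| China 4 | 70 | 175 | 15.3 | 11.9 | 1.6 | 64 | 0 |
| China 5 | 73 | 160 | 12.7 | 11.3 | 2 | 55 | 2.8 |
| Republic of Korea | 74 | 160 | 12.7 | 12.1 | 1.2 | | |
| Japan | 70.7 | 199 | 12.2 | 15.2 | 0.9 | 65 | 1.8 |
| Malaysia | 73.2 | 166 | 13.3 | 12.5 | 0 | 57 | 3 |
| India | 73.7 | 173 | 13.3 | 13.3 | | 60 | 2.1 |
| Africa | 77 | 140 | 11.8 | 9.6 | 0.6 | 45 | 2.6 |
| Near East | 72.8 | 160 | 12.1 | 11.4 | 1.2 | 55 | 2.9 |
| INCAP | 75.3 | 148 | 11.3 | 9.8 | 2.7 | 54 | 2.5 |
| Brazil | | 163 | 12.9 | 11.5 | 0.8 | 61 | 3.7 |
| Australia | | 160 | 12.6 | 11.6 | 0.8 | 54 | 2.4 |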

a. Data are taken from standard national and regional tables.

If the user were so confused that he consulted an "expert," that expert might give either or both of two answers: (a) "the values in the different tables are really measurements of different objects, often by different methods," or (b) "the differences do not matter." While the first answer is probably true, the second is often false - in general it does matter. For example:

1. A person on a specific diet will receive differing advice depending on what data base is used to analyse his/her food intake.

2. A small difference in an individual diet can become a large difference when projected to a population estimate - the level at which important decisions such as resource allocation are made (table 1).

3. In general, apparent and unexplained inconsistencies reduce the confidence in all such data, in the system which provides such data, and in the science that works with such data.
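The second point can be illustrated with a little arithmetic. The sketch below takes two of the calcium values for milk reported in table 2 (USDA, 119 mg, and Swedish NFA, 113 mg per 100 g) and projects the difference to a population. The daily milk consumption and the population size are hypothetical figures chosen only to show the scaling; they do not come from any table.

```python
# Calcium in milk, mg per 100 g, as reported in table 2.
CALCIUM_USDA = 119
CALCIUM_SWEDISH_NFA = 113

# Hypothetical assumptions, for illustration only.
MILK_G_PER_PERSON_PER_DAY = 250
POPULATION = 10_000_000


def daily_intake_mg(mg_per_100g, grams_eaten):
    """Nutrient intake in mg for a given amount of food eaten."""
    return mg_per_100g * grams_eaten / 100


# Difference in estimated intake for one person, depending on the table used.
per_person_gap_mg = (
    daily_intake_mg(CALCIUM_USDA, MILK_G_PER_PERSON_PER_DAY)
    - daily_intake_mg(CALCIUM_SWEDISH_NFA, MILK_G_PER_PERSON_PER_DAY)
)

# The same difference summed over the whole population (mg -> kg).
population_gap_kg = per_person_gap_mg * POPULATION / 1_000_000

print(f"Per person: {per_person_gap_mg:.1f} mg/day")
print(f"Population: {population_gap_kg:.0f} kg/day")
```

Under these assumptions a gap of 15 mg per person per day, which looks negligible for an individual, becomes 150 kg of calcium per day at the population level.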

Table 2. Calcium (mg) in 100 g milk

USDA (8.1) Mean = 119 SE = 0.251; N = 1,054 (SD = 8.15)
McCance and Widdowson Mean = 120 Range = (110-130)
Souci/Fachmann/Kraut Average = 120 Variation = (107-133)
Swedish NFA Average = 113 Variation = (100-122.1)

The problem hinges on the fact that it is unlikely that these data really are inconsistent - they only look inconsistent. Most data bases present a single value for a specific nutrient in a specific food. This leads back to the first answer above - that different things were measured. If we look closely at the "food" component of figure 1 we note that it should be expanded as in figure 2, to show that every sample of a food is quite likely to differ from every other sample, and this is before the chemists take over and add their own variability.

While it is important to realize that there are a number of specific sources of food composition data variability, the major point is that few tables even hint that such variabilities exist. Moreover, those tables that do, such as those shown in table 2, do not do so in a consistent fashion, nor are they very helpful about just how to use this added information.
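One way to see how differently the entries in table 2 express variability is to bring them onto a roughly common footing. The sketch below recovers the standard deviation from the USDA standard error and sample size (SD = SE x sqrt(N)) and then, assuming an approximately Normal distribution (our assumption, not something any of the tables states), computes a mean +/- 2 SD interval to set beside the ranges printed in the other tables.

```python
import math

# USDA entry from table 2: calcium in milk, mg per 100 g.
mean, se, n = 119, 0.251, 1054

# The standard error of the mean is SD / sqrt(N), so SD = SE * sqrt(N).
sd = se * math.sqrt(n)

# Assumption: values are roughly Normal, so about 95% lie within 2 SD.
low, high = mean - 2 * sd, mean + 2 * sd

# Ranges as printed in the other three tables. The tables do not say
# how these ranges were derived, so they are not strictly comparable.
reported_ranges = {
    "McCance and Widdowson": (110, 130),
    "Souci/Fachmann/Kraut": (107, 133),
    "Swedish NFA": (100, 122.1),
}

print(f"USDA SD = {sd:.2f}; mean +/- 2 SD = ({low:.1f}, {high:.1f})")
for table, (range_low, range_high) in reported_ranges.items():
    width = range_high - range_low
    print(f"{table}: ({range_low}, {range_high}), width {width:.1f}")
```

The recovered SD agrees with the 8.15 printed in the USDA entry, which confirms that SE and SD there describe the same 1,054 samples. The other three ranges cannot be checked in this way because the tables give no sample size or definition for them.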

The point to be stressed is that there is not a single food of each kind - there is no Platonic ideal "egg." Foods are not mathematical ideals, but must be considered as probabilistic or statistical objects, with statistical distributions of their nutrients. Any compilation or use of food composition data must be firmly based in this fact.

Fig. 2.

We are faced with the fact that a data base needs to contain more than just a single value for each food-nutrient combination. The description of a distribution is not straightforward; few distributions can be described adequately with just a few numbers. (The obvious counterexample is the Normal or Gaussian distribution, the familiar bell-shaped curve. However, few measurements follow this distribution precisely.) The "statistics" that can be used to describe an arbitrary distribution include the mean, mode, median, quartiles, percentiles, standard deviations, and mean deviations. Each has its adherents and rationale. The improvement of food composition data requires careful investigation of both where the data come from and what they are to be used for. Inherent in the viewing of food composition data as "data" are several implications that need to be stressed.
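As a rough illustration of these statistics, the sketch below computes several of them for the water values listed in table 1, using Python's standard statistics module (Python 3.8 or later). Treating the entries of different national tables as if they were a sample from one distribution is a simplification made only for the example; the function name describe is our own.

```python
import statistics

# Water content (g per 100 g) of raw whole egg, in the order listed in
# table 1. Tables that give no water value are left out.
water = [74.57, 73.7, 74.8, 74.1, 74.4, 74.6, 74, 75, 73.9, 73.7, 71,
         70.8, 73, 70, 73, 74, 70.7, 73.2, 73.7, 77, 72.8, 75.3]


def describe(values):
    """Return several common summary statistics for a list of numbers."""
    mean = statistics.mean(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "quartiles": (q1, q3),
        "standard deviation": statistics.stdev(values),
        # Mean deviation: average absolute distance from the mean.
        "mean deviation": sum(abs(v - mean) for v in values) / len(values),
        "range": (min(values), max(values)),
    }


summary = describe(water)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Even for this small set the statistics do not coincide: the median is about 73.8 and the mode is 73.7, while the values range from 70 to 77. Which of them is the "right" single number depends on what it is to be used for.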

1. Each type of user is likely to require different statistics. These need to be carefully defined and justified. For example, someone wanting to estimate intake would perhaps be satisfied with a mean or median value, while someone worrying about meeting requirements would want an upper or lower limit. In order for data banks to be well designed, for them to include "good" data, each user must decide the best data representation for his/her specific application.

2. Data banks must be designed to provide information about the variabilities of their holdings. Ultimately this requires access to raw data, but, well before that, standardized and documented algorithms for data manipulation are needed.

3. Users must be made aware of the inherent variabilities of the data, of the magnitudes and implications of these variabilities, and of the procedures for handling this inescapable aspect of food composition data.

4. The sources of data variabilities need to be sorted out as a preliminary step, estimating their magnitude, exploring their importance, and reducing those that can be reduced by approaches ranging from standardizing analytic techniques to developing regional values.
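The first two implications can be sketched as a data structure. The record below is hypothetical (the class NutrientRecord, its fields, and its methods are our own, not part of any INFOODS specification): it keeps the individual measurements rather than a single value, so that different users can draw different statistics from the same holding. The measurements shown are invented for illustration, and a real system might well prefer percentiles to the minimum and maximum used here.

```python
from dataclasses import dataclass
import statistics


@dataclass
class NutrientRecord:
    """Hypothetical record that keeps raw measurements, not one value."""
    food: str
    nutrient: str
    unit: str
    measurements: list

    def central(self):
        """A central value, e.g. for estimating typical intake."""
        return statistics.median(self.measurements)

    def lower_bound(self):
        """A cautious low value, e.g. for checking a requirement is met."""
        return min(self.measurements)

    def upper_bound(self):
        """A cautious high value, e.g. for checking an upper limit."""
        return max(self.measurements)


# Invented analytic results, for illustration only.
record = NutrientRecord("milk", "calcium", "mg/100 g",
                        [112, 118, 119, 121, 125])

print("Intake estimate:", record.central(), record.unit)
print("Requirement check:", record.lower_bound(), record.unit)
print("Upper-limit check:", record.upper_bound(), record.unit)
```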

The INFOODS system


Laboratory of Architecture and Planning and INFOODS Secretariat,
Massachusetts Institute of Technology, Cambridge, Massachusetts, USA


A major goal of INFOODS is to improve the availability and accessibility of food composition data. This goal, coupled with the existing structure within the field and the resources currently and likely to be available to INFOODS, mandates that the basic structure of INFOODS consist of a small secretariat, multiple regional centres, and guidelines about how "things" should fit together and operate. This paper presents an overview of how we expect INFOODS to operate as a nutritional "system," with special emphasis on data interchange from the perspective of the user.

Data interchange and regional centres

INFOODS is basically a collaboration of people and organizations involved with the generation, compilation, and use of food composition data. Key to this is the concept of moving data around - data interchange. The current model of the operation of INFOODS shows a number of regional centres, each acting as a focus for composition data within its region. We hope and expect that there will not be more than about a dozen such regions, and that industries and even governmental data banks will not constitute regions. Within a region, user centres and user groups - ranging in size from individual desk-top computers to large governmental or university installations - will use the regional centre as their access point for data from other regions, and perhaps as a means of accessing data from within the region. No particular model of regional centre operation is required by this. We expect that some regions will maintain all regional data centrally, that others will maintain no data but will have convenient mechanisms for accessing data from distributed repositories, and that still others will operate with a mix of local and distributed data. We see the flow of transactions as:

1. User facility queries the regional centre as to whether it has particular data available and how it is obtained.

2. If the regional centre does not have the data, and is not aware of its availability within the region (or whether another regional centre has it), it makes an INFOODS inquiry as to where the data might be found. In order to make this inquiry the only requirement is that there be a single international focus once the regional centres are in full operation. It is an inquiry about the location of data ("if this exists, who has it?") rather than about the data, and does not necessarily imply any computerized facility or electronic communication at all.

3. INFOODS either responds to the original regional centre with the availability information, in which case that regional centre requests the data from a second centre, or it requests that the second regional centre forward the data to the first.

4. The second regional centre does whatever is needed to obtain and prepare the data within its region, then forwards that data (in INFOODS interchange format) to the first regional centre. The interchange format is insensitive to mode of transmission - electronic networks, magnetic tapes, or even messages written on pieces of paper. Transmission modes will be worked out between pairs of regional centres on the basis of what is available and the real or anticipated level of demand.

5. The regional centre forwards the data to the user facility, possibly after some format conversion or editing operations agreed to within the region.
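The five steps above can be sketched as a small simulation. Everything in it is hypothetical: the classes RegionalCentre and Locator, the function request, the region names, and the data values are ours, and the "interchange format" is stood in for by a plain dictionary because this paper does not define the real format. The sketch follows the variant of step 3 in which the data come straight from the second centre.

```python
class RegionalCentre:
    """Hypothetical regional centre holding data for its own region."""

    def __init__(self, name, holdings):
        self.name = name
        self.holdings = holdings  # maps (food, nutrient) -> value

    def has(self, key):
        return key in self.holdings

    def export(self, key):
        # Step 4: prepare the data in a stand-in interchange format.
        return {"source": self.name, "key": key, "data": self.holdings[key]}


class Locator:
    """Stand-in for the single international focus: it knows who has
    what, but holds no composition data itself."""

    def __init__(self, centres):
        self.centres = centres

    def who_has(self, key, asking):
        for centre in self.centres:
            if centre is not asking and centre.has(key):
                return centre
        return None


def request(key, home, locator):
    """Follow steps 1-5 for a user attached to regional centre `home`."""
    if home.has(key):                    # step 1: ask the regional centre
        return home.export(key)
    other = locator.who_has(key, home)   # steps 2-3: "who has it?"
    if other is None:
        return None                      # nobody is known to hold the data
    return other.export(key)             # steps 4-5: forward to the user


# Invented holdings, for illustration only.
region_a = RegionalCentre("Region A", {("egg", "iron"): 2.1})
region_b = RegionalCentre("Region B", {("rice", "iron"): 0.8})
infoods = Locator([region_a, region_b])

print(request(("egg", "iron"), region_a, infoods))
print(request(("rice", "iron"), region_a, infoods))
print(request(("yam", "iron"), region_a, infoods))
```

Note that the locator never touches the data themselves; it answers only the question of where the data are, which is why it need not be a computerized facility at all.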

Regional decisions

This approach implies several decisions that each region will make separately. While INFOODS may make suggestions in these areas, the effective operation of the network will not depend on those suggestions being followed (or even asked for and supplied).

The first of these decisions is where the data themselves will be retained. The overall INFOODS notion requires only that a regional centre be able to provide requested information when it is requested, or within some reasonable time thereafter. The data could all be kept at a regional facility, or some could be kept centrally and some at various sites within the region, or all could be kept at local sites, with the regional centre acting as a collection and redistribution point only. Indeed, it would be possible for a region not to have any centralized computing facilities at all, but simply to receive requests for data and dispatch those requests to facilities within the region that had the data and could respond to the requests.

If we consider the possibility of each local facility - a country, a ministry, or academic or research facility - having its own unique computerized tools for managing nutrient data, the importance of the standard format for the interchange of data becomes even clearer. The INFOODS notion is that data moved between regional centres will be organized into this interchange format, with translation to and from that format occurring within the regions. The second regional decision will concern just where that translation is made. Will it be made at the regional centre, which might maintain a table of the formats required by local facilities within the region? Or will local facilities be expected to accept (and, if they are producers, create) data in the interchange format? We expect that this decision, too, will differ from region to region. It may even differ within a region, with some local formats being supported by the regional facility and local facilities that require other formats being required to translate from the interchange form themselves.
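The mixed arrangement described at the end of this paragraph can be sketched as a lookup table of translators held at the regional centre. The facility names, the two local formats, and the record contents are all hypothetical, and a plain dictionary stands in for the interchange format, which this paper does not specify.

```python
import json


def to_delimited_line(record):
    """Hypothetical local format 1: one comma-separated line."""
    return (f'{record["food"]},{record["nutrient"]},'
            f'{record["value"]},{record["unit"]}')


def to_json_text(record):
    """Hypothetical local format 2: JSON text with sorted keys."""
    return json.dumps(record, sort_keys=True)


# The regional centre's table of formats required by local facilities.
LOCAL_FORMATS = {
    "facility_a": to_delimited_line,
    "facility_b": to_json_text,
}


def deliver(record, facility):
    """Translate for facilities the centre supports; otherwise pass the
    interchange form through for the facility to translate itself."""
    translate = LOCAL_FORMATS.get(facility)
    if translate is None:
        return record
    return translate(record)


# Invented record, for illustration only.
record = {"food": "egg", "nutrient": "iron", "value": 2.1, "unit": "mg/100 g"}

print(deliver(record, "facility_a"))
print(deliver(record, "facility_b"))
print(deliver(record, "facility_c"))  # unsupported: receives interchange form
```

With N local formats, translating through one interchange format needs at most N translators in each direction, whereas direct translation between every pair of formats would need on the order of N x N.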

Third, a region will have to make its own decisions about how much data originating outside the region it should retain and for how long. At one extreme, the nature of the interchange and query arrangements should be such that there would be no technical obstruction to a single region assembling all of the world's data and retaining them locally. Since the cost of doing so includes an ongoing effort to keep that body of data current (or to determine when subsets of the data are no longer current), we have recommended that no region actually attempt to do this. A regional facility that discovered that particular data were requested repeatedly might reasonably decide to retain them, rather than requesting them from another region each time an inquiry arrived. At the other extreme, a region that chose to have no regional centre at all, but only a communications network, would presumably retain no data from outside the region except at local facilities.
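One possible retention rule for a regional facility is sketched below: keep extra-regional data only once they have been requested repeatedly, and discard them after a fixed age so that stale values are not served indefinitely. The class RetentionPolicy, the threshold, and the maximum age are hypothetical choices for illustration, not INFOODS recommendations.

```python
class RetentionPolicy:
    """Hypothetical policy: retain extra-regional data once they have
    been requested `threshold` times, and drop them after `max_age` days."""

    def __init__(self, fetch, threshold=3, max_age=365):
        self.fetch = fetch        # function that asks another region
        self.threshold = threshold
        self.max_age = max_age
        self.requests = {}        # key -> number of requests fetched
        self.retained = {}        # key -> (data, day stored)

    def get(self, key, today):
        stored = self.retained.get(key)
        if stored is not None:
            data, day_stored = stored
            if today - day_stored <= self.max_age:
                return data               # still considered current
            del self.retained[key]        # too old: request it again
        data = self.fetch(key)
        self.requests[key] = self.requests.get(key, 0) + 1
        if self.requests[key] >= self.threshold:
            self.retained[key] = (data, today)
        return data


# Usage with a stand-in for a request to another region.
fetch_log = []


def fetch_from_other_region(key):
    fetch_log.append(key)
    return {"key": key}


policy = RetentionPolicy(fetch_from_other_region, threshold=2, max_age=10)
for day in (1, 2, 3, 4):
    policy.get("egg", day)
print("Fetches after four requests:", len(fetch_log))

policy.get("egg", 20)  # retained copy is now older than max_age
print("Fetches after the copy expired:", len(fetch_log))
```

The maximum age is a crude stand-in for the real problem the paragraph raises, namely determining when retained data are no longer current.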

Local decisions

Just as there are decisions that we expect each region will make separately, even if several of them reach the same conclusions, there are decisions that we expect will usually be made at facilities within the regions. For convenience, we describe these as "local decisions" and "local facilities," recognizing that some facilities that operate within regions and below the regional level may, themselves, operate as subregions with several levels of smaller or more local machines dependent on them for data or other services.

Each local site will be able to make its own decisions about what programs it should run and what hardware they should be run on. That flexibility is needed for at least two reasons - the need to be able to accommodate local interests or requirements for different types of data processing or entry, and the reality of constrained choices in hardware, and sometimes software, in various parts of the world (for example, if an institution has decided to run only IBM mainframe hardware and no stand-alone machines, then there is no value in a recommendation for "standard" software that requires a desk-top machine).

Not unlike the regions, each local facility will have to make its own decisions about what data are retained and for how long. While at the regional level this question applies primarily to the foods to be retained, at the local level there is the additional question of what data - nutrients, quality information, data-source information, other descriptive information, non-nutrients, and so forth - about each food should be kept as well.

Similarly, requests for data to be imported into a particular local facility will need to be made at that facility. Neither INFOODS nor a regional centre can make sensible recommendations about what data (especially extra-regional data) a particular local facility should retain or make available to its users. In this context, any software maintained by INFOODS, especially end-user (small machine) software and the user interfaces of regional software, will be produced primarily as demonstrations of the feasibility of the general structures, of at least one way to do things, and, in the case of interchange formats, as references for how those formats are handled. INFOODS is also likely to produce some subroutine-level code that will be suitable for embedding in other systems to provide debugged interfaces to facilities defined as part of the INFOODS work. There is no obligation on any facility, local or regional, to use any of that software directly; its principal purpose will be demonstration and reference. None the less, we expect that it will be directly useful to some facilities, and they will certainly be invited to use it if they wish. The decision to do so is another that can be made regionally or locally, as appropriate.

If we have a single group of recommendations for everyone, whether local facility or regional centre, it is that programs and systems be designed on the assumption that change, probably a great deal of change, will occur and is inevitable. In particular, as technical groups and individual scientists in different parts of the world progress with their work, we can expect to see explosions in both data and requirements: more foods, more nutrients, more desire to accommodate non-nutrients and source, quality, and description information, and a continuing flow of new and updated data as analytic methods improve. For those of us who have become attached to relatively small machines, there may also be changes in systems and architectures every few years as our perceived needs expand to match new generations of hardware and software. We argue strenuously below that the best system may be one that is closely tailored to a particular set of needs and users, with no pretence at being "general purpose," but even such special systems should be designed so that the next study to be performed, or the next nutrient to be entered, does not completely disrupt them.
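One simple way to keep "the next nutrient to be entered" from disrupting a system is to store components as an open mapping rather than as a fixed set of columns. The sketch below is a hypothetical illustration of that idea only; the class FoodEntry and its methods are our own, and the value given for the newly added component is invented.

```python
class FoodEntry:
    """Hypothetical food record whose components are stored as an open
    mapping, so new nutrients or descriptors need no structural change."""

    def __init__(self, name):
        self.name = name
        self.components = {}  # component name -> value, unit, extra info

    def set(self, component, value, unit, **info):
        """Add or update a component, with optional descriptive fields."""
        self.components[component] = {"value": value, "unit": unit, **info}

    def get(self, component, default=None):
        """Return the stored value, or `default` if it was never entered."""
        entry = self.components.get(component)
        return default if entry is None else entry["value"]


egg = FoodEntry("whole chicken egg, fresh, raw")
egg.set("water", 74.57, "g/100 g", source="USDA (new), table 1")

# Later, a component nobody planned for is added without any redesign.
# The value below is invented, for illustration only.
egg.set("new component", 1.0, "mg/100 g", quality="unverified")

print(egg.get("water"))
print(egg.get("new component"))
print(egg.get("not yet analysed"))  # missing data are reported, not invented
```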

What does the user need?

One theme that surfaces frequently in discussions is that users have only two important needs, "data and more data." We suggest that this is not true. The reason why it is not leads us to the first of our major system design suggestions. Data, as data, are essentially worthless. Computer systems should serve us in two important ways - to organize and catalogue data so that they can be managed, retrieved, and processed, and to provide assistance to process the data into information, something that we do need and probably need more of. Information has to be defined in terms of particular needs, and it is essential that the needs of food composition data be articulated and documented.
