The United Nations University is an organ of the United Nations established by the General Assembly in 1972 to be an international community of scholars engaged in research, advanced training, and the dissemination of knowledge related to the pressing global problems of human survival, development, and welfare. Its activities focus mainly on peace and conflict resolution, development in a changing world, and science and technology in relation to human welfare. The University operates through a worldwide network of research and post-graduate training centres, with its planning and coordinating headquarters in Tokyo, Japan.
The International Food Data Systems Project (INFOODS) is a comprehensive effort, begun within the United Nations University's Food and Nutrition Programme, to improve data on the nutrient composition of foods from all parts of the world, with the goal of ensuring that eventually adequate and reliable data can be obtained and interpreted properly worldwide. At present in many cases such data do not exist or are incomplete, incompatible, and inaccessible.
This volume is the fourth in a series that provides information and guidelines about requirements for food composition data, the identification of nutrient and non-nutrient components of foods, the computer representation and accurate interchange of food composition data, and the organization, compilation, and content of food composition tables and data bases. It presents the structure and rules for moving data files between countries and regional organizations in a way that preserves all of the information available. The approach also alerts developers of data bases to areas in which ambiguities are likely and special care should be taken, and it identifies some mechanisms for improving overall nutrient data base quality.
Many people made significant contributions to the development of the INFOODS Interchange System. In particular, Dr. David Peterson and Ms. Roselyn M. Romberg contributed many ideas and some text to earlier versions of this document. Comments from Mr. Craig Franklin and Dr. Zita Wenzel, as well as Dr. Peterson, helped to determine the special forms of using the SGML Standard. Professor Vernon R. Young, Dr. Lenore Arab-Kohlmeier, Ms. Diane Feskanich, and several others provided feedback at critical times about the relationship of the evolving system to possible practice with nutritional data and their use. Anders Møller, Lena Bergström, Brucy Gray, Pam Verdier, and the New Zealand Division of Scientific and Industrial Research provided data against which the conversion models could be effectively tested, some of which are incorporated, with permission, in the examples. The material on the description of data files being interchanged and on the description and classification of foods is a formalization of material, some of it still unpublished, developed by the INFOODS Committee on Terminology and Nomenclature, headed by Professor Stewart Truswell. The data description section derives from several discussions and position papers about the basic character of ideal descriptive statistics for small samples and unknown distributions with Dr. Ree Dawson and Professor William M. Rand, both of whom also made frequent and helpful comments about other parts of the manuscript and the working documents that were the foundation for it. Finally, the system was presented in technical detail at a special Oceaniafoods technical workshop on food composition data base organization and interchange. The participants in that workshop provided invaluable feedback both on technical issues and on the clarity of some of the concepts.
Any work of this type is ultimately a synthesis of many ideas and concepts. Much of the credit should go to those who contributed; the blame for the interpretations rests, as always, with the author.
AN INTERCHANGE SYSTEM FOR FOOD COMPOSITION DATA
A major goal of INFOODS has been the development of easy and accurate interchange of food composition data among countries and regional organizations. Such data exchange will obviate the perceived need for a single international data centre which holds all of the world's data, replacing it with distributed arrangements in which most data are held by their compilers, or by regional data centres operated by organizations of which data compilers or owners are members, until the data are actually needed.
It is not sufficient merely to move data back and forth. Food composition data are complex and often are, or should be, accompanied by extensive description of the foods being reported upon and the methods of analysis used. It has become clear in the last few years that the introductory material in a printed table may be nearly as important as the data values (see, for example, Arab et al. [1]). The need for such description and explanation arises through the necessity of comparing data from widely differing cultures. Not all food composition tables and data bases have the same level of description, however, and the informal text of an introduction is not the best way to communicate the information that is available, especially if it is to be processed automatically (e.g., by a computer), rather than simply read by trained scientists.
Other distinctions have been noted about various types of tables and data bases. Some data bases are oriented toward end users, others for national reference purposes, and still others are the fundamental collections of laboratory-level data before aggregation [24]. An effective interchange mechanism must be able to handle any of these types of data, without obscuring the differences in the types of information contained in each.
As one examines international data interchange, it becomes clear that the primary criterion for designing and evaluating a data interchange system is that it preserve whatever information actually is available, without forcing the data supplier to provide any more information than is known or imposing any more burden than is absolutely necessary. It would not be reasonable to try to require data suppliers to supply information which they do not know, or do not normally keep for their own purposes. Similarly, while in an ideal situation everyone might do things in the same way, the interchange system must be able to accommodate methods of reporting and data organization that some scientists might consider inappropriate. The inclusion of a way of expressing a particular concept in this document is therefore not necessarily a recommendation of that concept. Indeed, in a few cases, the text recommends against styles of data presentation and identification for which provisions are nonetheless made. Because identical and accurate sampling and analytic procedures, food selection, data description, and reporting are unlikely to ever occur in all tables, successful and meaningful exchange of food composition data has necessitated developing new conventions and technologies to organize and identify the many and varied components of these data.
Accurate comparison of data values requires very precise identification of how the values were derived and what they mean. When existing food composition tables and data bases are considered without their sometimes detailed introductions and appendices, there are often major ambiguities concerning the exact identification of foods, nutrients, units, and analytic and sampling methods. More careful comparison of food composition tables shows that different tables provide information about different nutrients, different types of foods, and different amounts and types of supporting information about samples, quality, recipes, and so on.
While any approach must accommodate the data that exist, the nutrient composition field continues to evolve. New food coding systems are introduced frequently, and changing hypotheses about the relationship between foods and health result in the introduction of nutrients that were not previously considered interesting into tables and data bases. If an interchange arrangement is to be useful for more than a few years, it must be "extensible", i.e., it must provide for new terminology, technology, and areas of interest to be defined and added to the system without compromising existing files and programs.
The differences in values and the ambiguities of data and food identification inherent in existing food composition data require that any interchange model operate on the assumption that actual tables and data bases cannot be expected to conform to a single standard or format. The interchange strategy must be descriptive of what decisions have been made about foods, food classification, nutrients, chemistry, or description and how those decisions have been carried out. At the same time, as suggested above, it cannot be dominated by norms about the "right" way to do things: even questionable data, poorly organized, may be more useful than no data at all, especially if the nature of the problems can be carefully identified and understood.
Partially as a result of the fact that particular data may be acceptable for some purposes and not for others, another goal of the interchange system is to permit tracing the flow of values, through copying (borrowing) or calculation, from one table to another and, more important, to be able to trace and assign responsibility for those values. All of the requirements for information that must be supplied with interchange files are the result of this tracking requirement.
To permit data interchange without loss of quality, and to encourage improvements in quality, data description, and data definition, INFOODS has designed a system of regional data centres and has developed an "interchange system" by which whatever data exist and are of interest can be transferred among regional centres with precise identification of values and without any loss of information. The interchange system is both a model of how data can be transported between regional centres and a data interchange format definition. As the latter, it is derived from principles of "generic markup" which are becoming increasingly important in the processing and exchange of textual documents. The standard for generic markup is specified in widely adopted international standards based on an International Organization for Standardization document, ISO 8879 [53]. Using generic markup has several special attractions, including its growing availability, the ability for people to directly inspect the format and content of the files, and the lack of dependence on any particular medium or data-transport arrangement. The other alternatives which are possible in principle were systematically eliminated as infeasible or too restrictive [55].
The interchange system will be used internationally, to facilitate exchanging data among countries and regions of the world. As with other INFOODS work, the interchange system uses existing international standards whenever possible, even when the invention of a nearly equivalent set of conventions specific to food composition data might result in short-term convenience or compactness. For example, provision is made for expressing food names in national languages and character sets where necessary, but only when consistent international standards for those character sets have been established.
THE REGIONAL DATA MODEL
While the details are not discussed in this manual, operating regional data centres, affiliated with INFOODS, are assumed as part of the interchange system. Those data centres act as a focus for food composition data base activity in their regions of the world and as the host for data interchange activity. When data are needed, for example, in most circumstances the user requiring the data would contact his or her own regional data centre, which would make arrangements to obtain them from a distant regional data centre, which might, in turn, obtain them from an organization within its region. The interchange mechanisms described in this manual are required only for use between regions. While they may be suitable for use between a regional data centre and data providers or users within its region, and may also be suitable for the ongoing storage of some reference or archival data bases, regions are free to work out their own arrangements for intra-regional communications and data interchange. A region that has specified its own data interchange formats and arrangements will presumably provide the capability to convert between the formats and conventions specified in this manual and its own formats at its regional data centre.
A regional data centre will typically be operated as part of an INFOODS regional liaison group, but this is not a requirement; either could exist independently of the other, and the term "regional data centre" is used instead of "regional centre" to stress this distinction. In principle, the regional data centre for a particular region need not even be located in that region, although it would usually be desirable for it to be.
In addition to acting as a focus for data interchange activities for its region, a regional data centre is expected to act as a registrar of international food record identifiers for the associated region, maintain current lists of interchange system tags and other identifiers, and keep records of tables and data bases originating in the region. It may also maintain some data locally, either from within the region (for easy export or as part of regional support functions) or from outside the region but frequently needed within it. In either of these cases, the regional data centre is expected to make special provision to ensure that its copies of data sets are kept up-to-date or that they are discarded when they are no longer current.
THE INTERCHANGE SYSTEM AS A CONCEPTUAL DATA BASE MODEL
While the principal design goal for the interchange system is information-preserving exchange of data among regional centres, its provisions for precise identification of nutrients and other food components, detailed recording of varying amounts of data about each nutrient and descriptions of those values, and ability to accommodate multiple coding, classification, and description systems may make it appropriate for national or regional use for archival and perhaps reference data bases. INFOODS has not made a specific recommendation that it should be used this way, but if the character of the data and description associated with a data base creates difficulties in using conventional data base systems with statistical or scientific data [4, 18, 27] the architecture of the interchange system, and software developed to handle it, might be considered as an alternative.
THE CONCEPT OF AUTHORITY
Food composition data, like most other scientific data, are rarely "true" or "false" in any absolute sense. Instead, the data values, the choice of foods, the decisions about whether two samples represent the same food, or a set of samples adequately represent some particular food, all represent scientific choices, not completely deterministic outcomes of perfect processes. In particular, it is possible, indeed likely, that different but equally skilled scientists would make different decisions, especially under different circumstances or assumptions about the user population and its needs.
As part of the important goal of preserving to the greatest extent possible all of the information about data being stored or transferred, an interchange system must move beyond traditional styles of exchanging only individual values in two important ways:
Similar issues apply at the level of "individual foods". As discussed in Chapter 5, each collection of data associated with "a food" is associated with a food record identifier. A data base may contain multiple records for a given food, with different sets of values. If it does, each of these records will have a different food record identifier. The decision about whether a single food should have one or several food records is made by the table compiler. The interchange system imposes only two rules: (i) If previously published and identified data for an entire food (i.e., a single food record) are copied together, the food record identifier must be the same as the corresponding one in the original table or data base. That is, the authority and responsibility for the integrity of the data rest primarily with the compiler of the original table or data base (this does not extend to the decision to include the data in the particular new data base). (ii) By contrast, if a food record is assembled from multiple sources (e.g., proximates and vitamins from one country and minerals from another), several key scientific decisions go into the compilation and combination process, and a new food record identifier is assigned to the newly created food record.
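The two identifier rules can be illustrated with a minimal Python sketch. It is not part of the interchange system itself: the `FoodRecord` class, the `copy_record` and `assemble_record` functions, the identifier format, and the component labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class FoodRecord:
    record_id: str   # food record identifier (format here is hypothetical)
    source: str      # table or data base responsible for the record
    values: dict     # component label -> value (labels are illustrative)


# Hypothetical counter used to mint identifiers for newly assembled records.
_next_number = count(1)


def copy_record(original: FoodRecord) -> FoodRecord:
    """Rule (i): a whole record copied together keeps its identifier,
    so responsibility stays with the original compiler."""
    return FoodRecord(original.record_id, original.source, dict(original.values))


def assemble_record(compiler: str, *parts: FoodRecord) -> FoodRecord:
    """Rule (ii): values combined from several records form a new record,
    which receives a new identifier owned by the assembling compiler."""
    merged = {}
    for part in parts:
        merged.update(part.values)  # later parts overwrite duplicate labels
    new_id = f"{compiler}-{next(_next_number):04d}"
    return FoodRecord(new_id, compiler, merged)


proximates = FoodRecord("AAA-0001", "Table A", {"protein": 1.2})
minerals = FoodRecord("BBB-0042", "Table B", {"iron": 0.3})

copied = copy_record(proximates)                         # identifier preserved
combined = assemble_record("CCC", proximates, minerals)  # identifier newly assigned
```

In this sketch the identifier travels with a record only while the record stays whole; any recombination is treated as a new scientific decision and is given a new identifier.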
THE ROLE OF THIS MANUAL
This manual defines the organizing principles and formats of the interchange system, that is, the model by which data about food composition can be transferred from one facility (typically a regional centre) to another while structuring and preserving whatever information may be available. It also specifies the ways in which the interchange system and its elements can be extended to account for changes in scientific conventions or knowledge without requiring data bases to be changed or programs to be rewritten if the changes are not important relative to the content or users of those particular data bases or programs.
The interchange system, of which an overview appears in the next chapter, depends on these principles and on conventions about the syntax in which textual and numerical values are written. As with conventional textual use of generic markup, the essential syntax uses a collection of carefully-defined "elements" which, in turn, are identified by a collection of specifically-defined "generic identifiers". Generic identifiers are predefined word-like strings of characters used to distinguish one element type from another.
More precise definitions of these terms, and examples of how they apply, appear in the chapters that follow. Later chapters specify those elements which are part of the interchange structure itself; the structure of elements used to describe the origins of, and responsibility for, an interchange file; foods and the properties of data. While the structure of elements that contain data values about the quantities of individual components present in foods is specified here, the generic identifiers for the food components themselves are specified elsewhere, primarily in the food component identification listing [17]. The information in that book may be needed for in-depth understanding of some of the examples that appear here. With the exception of a few areas for which specific generic identifiers have not been assigned at the time of publication, every element that appears in this manual is described either in the reference sections or in the food component identification listing.
The general model of the interchange system is applicable to a great deal of food-related data which are not yet defined for use with it. Decisions to limit the extent of what to define have been conditioned by finite resources, the focus of the initial INFOODS mandate, and lack of clarity either of the needs or of the appropriate solutions. When additional elements of these are needed, working papers that begin to explore their development will be commissioned. These as yet unneeded areas and definitions include the use of national character sets for other than names of food, listing of recipes for mixed dishes, listing of food economics values (e.g., food balance data or food prices), and listing of food components that are not normally considered nutrients (e.g., food additives and contaminants).
PURPOSE AND AUDIENCE
This manual provides sufficient information about the interchange system to permit programs to be correctly written that will produce and interpret interchange files. Readers who are only concerned about a general introduction to the interchange system should concentrate on Part I, reading quickly through the balance with the confidence that most of the details are not important to them. Nonetheless, this is a technical document, and some terminology is used in very precise ways. The glossary contains all such terms, and should be consulted when there is doubt about whether a word is being used casually or with some special meaning.
Finally, this manual does not discuss the particular methods of transporting an interchange file from one location to another. The interchange system is designed to be insensitive to the choice of media (e.g., magnetic tapes or floppy diskettes) or transport mechanisms (e.g., computer networks or the post), depending only on a specially-delimited "interchange file". Since an interchange file consists only of text, it can be transported by any medium (including file transfer or electronic mail in computer networks; magnetic or optical recording on tapes, disks, paper, or diskettes; or even such older media as punched cards or paper tape), so long as the medium is able to transport eight-bit characters accurately. If elements that can contain "national characters" are removed from the file before it is sent, or ignored when it is received, then transmission with media that can process only seven-bit characters, or even by low-quality computer printout or by telefax followed by optical scanning, is feasible. The only requirement is that the interchange file must be clearly separable from other information, a requirement that the file definition itself enforces. Sender and receiver should, of course, reach agreement about the media and mechanisms to be used before data are actually transmitted. Conventions about media and mechanisms for interchange among INFOODS regional data centres will be developed depending on the facilities available at those centres.
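As a rough illustration of the seven-bit fallback, the following Python sketch removes elements that may hold national characters and then checks that the remaining text uses only seven-bit characters. The element name `natname`, the sample file content, and both function names are assumptions made for this example; the real generic identifiers are defined elsewhere in the manual. The sketch also assumes that every such element has an explicit end tag and is not nested, which a full SGML parser would not need to assume.

```python
import re

# Hypothetical generic identifier for an element holding national characters.
NATIONAL_ELEMENT = "natname"


def strip_national_elements(text: str, name: str = NATIONAL_ELEMENT) -> str:
    """Remove every <name>...</name> element, including its content."""
    tag = re.escape(name)
    pattern = re.compile(rf"<{tag}>.*?</{tag}>", re.DOTALL)
    return pattern.sub("", text)


def is_seven_bit_safe(text: str) -> bool:
    """True when every character fits in seven bits (plain ASCII)."""
    return text.isascii()


# Illustrative content only; not a valid interchange file.
sample = "<food><name>Rye bread</name><natname>Rugbrød</natname></food>"
cleaned = strip_national_elements(sample)
```

Here `sample` would need an eight-bit-clean medium, while `cleaned` could also travel over a seven-bit one, at the cost of losing the national-language name.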