Concerns of users of nutrient data bases
Managing food composition data at the national level
Maintaining a food composition data base for multiple research studies: the NCC food table
Managing a nutrient data-base system: meeting users' needs and expectations
Introduction
Accessibility
Installation and updating efforts
Data availability
Computational concerns
Data-base and software products
References
LORETTA W. HOOVER
University of Missouri-Columbia, Columbia, Missouri, USA
Numerous uses of nutrient data bases have been identified in the professional literature during the past twenty years [5, 6, 12]. Users of nutrient analysis software vary greatly in their degree of sophistication and hold widely varying ideas about what a nutrient data base should contain and what features the associated computer programs should offer. Professionals are seeking reliable data and systems but have no definitive measures for identifying such systems.
A two-tier system of users, with differing needs, has evolved. The first tier interacts directly with depositories of nutrient data: USDA, food manufacturers, and other sources. This group of users includes researchers, clinical practitioners in complex organizations with data-processing support, vendors of nutrient analysis software systems and services, and some private practitioners. These experienced and knowledgeable users, who have maintained nutrient data bases for several years, are acquainted with the issues of accessibility, installation and updating effort, availability, and computational concerns; they are concerned about the existence of software and data bases that are questionable with respect to accuracy of data, computational results, and dietary guidance.
The second tier of users consists of those individuals who acquire software packages or analysis services from vendors or through resource-sharing arrangements. These are often first-time users who may be unaware of methodological and structural issues. They are often concerned about compatibility with a given brand of hardware, cost, and how to select a suitable system.
The first tier of more sophisticated users is likely to benefit most from a data network such as the one envisioned by INFOODS. The second tier will benefit indirectly, as more adequate and comprehensive data bases and software packages become available in the market-place. Meeting the needs of the first tier of users should be given priority.
Accessibility factors of concern include source, cost, and timeliness. These factors are important both at the time of initial acquisition of a data base and when an existing data base is being updated.
Maintaining a data base involves much redundant effort on the part of the first tier of users. Since no single source is available for all categories of data, many developers are involved in securing data about commercial products, ethnic foods, and regional specialities from the professional literature, food manufacturers, and unpublished sources. The aggregate time and cost spent on these activities in addition to the periodic costs associated with acquisition of machine-readable data constitute a significant expense for most users in the first tier. Minimizing the total costs for acquiring and maintaining a data base is a concern, particularly for those who do not market their data bases or services.
Timely delivery of data is another aspect of accessibility. Often users have an immediate need for data on specific nutrients or for certain foods. For example, recipes cannot be coded when data are lacking for certain ingredients. Meeting deadlines on research projects may be difficult when data are not readily available.
Knowledge of the availability of new data is also essential. Notification systems are needed to alert users when new data are released. Since some users may wish to access only a portion of the new data, opportunities for extracting selected data records would permit users to maintain up-to-date data bases tailored to their special needs.
With on-demand access, many users would be able to minimize the size of the data base maintained locally, since one could retrieve data from a central depository with the nutrient profile reflecting the most reliable values for each constituent. Acquiring data on a "just-in-time" basis would permit users to avoid data maintenance responsibilities for some data records until a need arises.
The effort required to install and maintain a nutrient data base depends on the specific characteristics of the installation. Because of the effort required when a data record format is changed, a standard format that can remain stable for an extended period of time is desirable. Each time the data record format from a depository is modified, users must either recreate their data bases and reformat data of local interest prior to merging with new data or reformat the new data to be compatible with their existing data-base design.
Many data-base developers utilize nutrient profiles for ingredients to estimate the nutrients in mixed dishes. Information about the quantity of each ingredient is stored in recipe data bases along with cross-referencing codes used to retrieve the nutrient profiles from a nutrient data base when the nutrients per portion are calculated. These recipe data bases facilitate the recalculation of nutrient profiles for mixed dishes by computer. The recalculation process can be invoked at different times: periodically, such as monthly; whenever the contents of the nutrient data base are changed; whenever a change is made in the recipe formulation; or when a dietary record is processed with a mixed dish coded as a consumed item. With the maturation of computerized systems for food production and patient care, the use of recipe data bases is more prevalent, particularly in health-care organizations. For these users, the overhead associated with a data record format change is multiplied, since a recipe data base may also require recreation and associated software must be modified to process the new data structures.
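The cross-referencing arrangement described above can be illustrated with a minimal sketch. The food codes, recipe identifier, nutrient names, and all values below are invented for illustration and do not come from any actual data base; the calculation simply sums ingredient contributions and divides by the number of portions, ignoring yield and retention adjustments.

```python
# Hypothetical nutrient data base: food code -> nutrients per 100 g.
# All codes and values are invented for illustration only.
NUTRIENT_DB = {
    "F001": {"energy_kcal": 100.0, "protein_g": 20.0},
    "F002": {"energy_kcal": 350.0, "protein_g": 12.0},
}

# Hypothetical recipe data base: each ingredient is a (food code, grams) pair.
RECIPE_DB = {
    "R100": {"portions": 4, "ingredients": [("F001", 200.0), ("F002", 100.0)]},
}


def nutrients_per_portion(recipe_id, recipe_db=RECIPE_DB, nutrient_db=NUTRIENT_DB):
    """Sum ingredient nutrients for a recipe and divide by the portion count."""
    recipe = recipe_db[recipe_id]
    totals = {}
    for food_code, grams in recipe["ingredients"]:
        profile = nutrient_db[food_code]  # cross-reference lookup by food code
        for nutrient, per_100g in profile.items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return {name: total / recipe["portions"] for name, total in totals.items()}


print(nutrients_per_portion("R100"))
```

Because the recipe stores only codes and quantities, rerunning the function after the nutrient data base changes yields an updated profile without editing the recipe itself, which is also why a change of codes or record format in the nutrient data base propagates to every recipe.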
Since conversion is expensive and time-consuming, some users may need the opportunity to acquire new data without being required to change to a new data format. Users with minimal resources or limited technical support must be able to maintain compatibility with existing software.
Data should be available in machine-readable form. Manual data entry is time consuming and error-prone; the probability of locating mistakes is low. Verified data from a reputable source in machine-readable form helps to assure the integrity of nutrient data bases.
Since so many data bases are updated with nutrient profiles for brand-name products and fast foods, a clearing-house or depository for brand-name product information is needed. These data should also be distributed in machine-readable form. The numbering scheme for these products should be co-ordinated with the coding scheme adopted by depositories providing nutrient profiles for generic foods. Also, the measuring units for the amount of food and food constituents should be consistent with those available for generic foods.
A universal or standardized code would be useful to some users to facilitate data-base updating and inter-data-base communications. In a survey of data-base developers conducted in March 1984 by the Data Base Committee of the Ninth National Nutrient Data Bank Conference, 28 of a total of 52 respondents preferred a standardized code or vocabulary.
Recoding within an existing system when the keys to data records are changed is a major task. Cross-reference or linking files are needed to facilitate recoding in other associated data bases such as recipe files. These linking files, which contain the new code number paired with the former code number for each food item, should be available in machine-readable form.
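A linking file of this kind might be applied to a recipe file roughly as follows. This is a sketch under stated assumptions: the comma-separated layout, the column names old_code and new_code, and every code shown are hypothetical, since no actual file format is specified here.

```python
import csv
import io

# Hypothetical linking file: each row pairs a former code with its new code.
LINKING_FILE = """old_code,new_code
F001,01001
F002,20081
"""


def load_linking_file(text):
    """Read old-to-new code pairs from CSV text into a dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["old_code"]: row["new_code"] for row in reader}


def recode_recipe(ingredients, code_map):
    """Return recoded (code, grams) pairs plus any former codes with no new code."""
    recoded, unmapped = [], []
    for old_code, grams in ingredients:
        if old_code in code_map:
            recoded.append((code_map[old_code], grams))
        else:
            recoded.append((old_code, grams))  # left unchanged for manual review
            unmapped.append(old_code)
    return recoded, unmapped


code_map = load_linking_file(LINKING_FILE)
recoded, unmapped = recode_recipe([("F001", 200.0), ("F999", 50.0)], code_map)
print(recoded, unmapped)
```

Reporting the unmapped codes rather than silently dropping them lets the user decide how to handle items that were deleted or not carried forward.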
Another practice which would benefit end-users when a new coding system is adopted is application of the new coding system to all food items in an existing data base at one time. Even though new data may not be available at a given time for all entries in the data base, this approach would permit users to accomplish conversion to the new coding scheme and data structure as a single task rather than having to repeat the recoding task each time new data are released. However, if new data are not provided for some of the food items or some items are deleted, users should be informed so that they can replace the obsolete codes with new ones in associated data bases, such as recipe data bases.
Availability of data from a primary depository in user-specified formats would permit customized data bases to be downloaded so that minimal technical expertise would be required of the end-user. On-line retrieval is desirable when the amount of data can be accommodated by a user's equipment and data-transfer costs are economical.
Maintaining compatibility with existing software and other computerized systems reduces the overheads associated with installing new data bases or updating existing ones. Many nutrient data-base users in health-care organizations must compete with other users in their facility for support from the data-processing department. Efficient use of those resources is necessary when data-base conversion is required. Thus, multiple optional arrangements for data transfer would allow users to avoid some installation and updating costs.
The numbers of foods and food constituents per food desired in a nutrient data base vary according to the information requirements of users. Those engaged in research are usually interested in a high level of specificity in food description for those foods of interest to them. In contrast, users in other settings are generally satisfied to use data bases where the items are described in less detail.
In the survey of data-base developers, data on brand-name foods were requested for cereals, candies, fast foods, frozen entrees, margarines, formulated "recipe" items, and fortified foods. Thirty-eight of 52 respondents indicated a need for brand-name data. Some users requested that values be given for all nutrients where Recommended Dietary Allowances (RDAs) have been established. Often, users have little basis for imputing missing values in the nutrient profiles. The nutrient profiles provided in conjunction with nutritional labelling often lack data for constituents of interest to nutritionists.
More complete data are needed for several food constituents. Of particular concern are data on fibre, individual sugars, other carbohydrate fractions, and trace elements. Some developers are also seeking data for the caffeine and alcohol content of foods. The prevalence of missing data in many data bases is a problem. Users are seeking complete profiles for a broad range of nutrients for foods commonly consumed.
As indicated above, many nutrient data-base users estimate nutrient profiles for mixed dishes based on institutional or family recipes. Estimates for some nutrients for recipes are included in many cookbooks. Both food-service management systems and nutrient analysis systems are being designed to facilitate the calculation process. To focus attention on some problems associated with this practice of nutrient estimation, Hoover and Perloff [8] included a simple recipe for a tuna noodle casserole as a computational task in a methodology for assessing computer software.
Several methods are being used to estimate nutrient profiles for recipes, with each requiring associated data not usually present in nutrient data bases. In USDA Handbook No. 8 [1], nutrient-retention factors, ingredient-weight adjustment factors, and nutrient profiles for raw ingredients have been used to calculate nutrients for recipes. Additional information about ingredient yields has been provided by USDA in another publication, Handbook No. 102 [10]. Although provisional nutrient retention information has been made available [11], more information is needed for more foods and preparation methods.
In the 1960s, a different calculation method was implemented in food-service software systems [2, 3]. The major difference in calculation method was use of nutrient profiles for the finished form of each ingredient rather than the nutrient profiles for raw ingredients and nutrient-retention factors. Numerous software systems are now using this yield-factor method.
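The contrast between the two approaches can be sketched for a single ingredient and a single nutrient. The formulas below are one plausible reading of the descriptions above, not the published USDA procedure or any vendor's implementation, and the ingredient weight, nutrient values, and factors are invented for illustration.

```python
def retention_method(raw_per_100g, raw_grams, retention):
    """Raw-ingredient profile scaled by raw weight, then by a retention factor."""
    return raw_per_100g * raw_grams / 100.0 * retention


def yield_method(cooked_per_100g, raw_grams, yield_factor):
    """Finished-form profile applied to the estimated cooked weight."""
    cooked_grams = raw_grams * yield_factor  # weight remaining after preparation
    return cooked_per_100g * cooked_grams / 100.0


# Hypothetical ingredient: 200 g raw weight, nutrient expressed in mg per 100 g.
by_retention = retention_method(raw_per_100g=10.0, raw_grams=200.0, retention=0.75)
by_yield = yield_method(cooked_per_100g=8.0, raw_grams=200.0, yield_factor=0.5)
print(by_retention, by_yield)
```

With these made-up inputs the two estimates differ, which illustrates why results from systems using different methods and different supporting data may not be comparable.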
Marsh [9] has described a study that focuses attention on the calculation of nutrients for mixed dishes. An approach for calculating nutrients based on "dish retention" was compared with the two methods mentioned above. Although all three calculation methods were used in the study, none of the methods was identified as best in a preliminary discussion of the findings.
Although elaborate procedures can be used to estimate nutrient retention and ingredient yields in a finished product, the actual nutrient profile for a recipe is not known unless a laboratory analysis is performed. Constituent over- or underestimation could adversely impact dietary guidance or menu planning. Further information is needed to identify the best methodology for calculating nutrients for recipes. Without a standard methodology, the results from various nutrient-analysis systems are not likely to be comparable. Depending on the method identified as most reliable to support the calculation process, additional ingredient information will be required. Some of this ingredient information, such as yield factors, may be suitable for incorporation into nutrient data bases.
As computer technology became available, some data-base developers constructed customized data bases, including many constituents not then present in data bases from government sources or published tables [4]. Some of these large data bases have since been made available to other users with access to mainframe computers.
More recently, the availability of microcomputers has provided data-processing access for most professionals and a segment of the lay public. The number of products available to meet the needs of the second tier of users has expanded rapidly during the past few years. A total of 69 analysis systems are described in the fourth edition of the Nutrient Data Bank Directory [7]. Many of the packages were developed for use on microcomputers. The number of foods and nutrients varies. Because initial data storage was limited on microcomputers, some of the early nutrient data bases developed for small machines contain fewer foods and nutrients than are now available. Some developers have concentrated on providing data for popular foods with complete profiles for nutrients of greatest interest.
With so many packages readily available, many individuals are confused when making a choice and do not know how to assess the products. Those users having their first experience with computers are often unaware of what issues to consider. Hence, the first tier of users must assume the responsibility for supplying credible nutrient data bases and analysis software.
1. C. F. Adams and L. J. Fincher, Procedures for Calculating Nutritive Values of Home-prepared Foods: As Used in Agriculture Handbook No. 8, Composition of Foods - Raw, Processed, Prepared, Revised 1963, ARS 61-13 (USDA, Washington, D.C., 1966).
2. J. T. Andrews, "Development of a Standardized Recipe Data File for Computer Systems," in A. N. Moore and B. H. Tuthill, eds., Computer Assisted Food Management Systems (University of Missouri Technical Education Services, Columbia, Mo., 1971), pp. 33-45.
3. J. L. Balintfy, "Menu Planning by Computer," Communications of the ACM, 7(4): 255-259 (1964).
4. A. A. Hertzler and L. W. Hoover, "Review of Nutrient Databases: Development of Food Tables and Use with Computers," J. Am. Diet. Assoc., 70: 20-31 (1977).
5. L. W. Hoover, "Computers in Dietetics: State-of-the-Art, 1976," J. Am. Diet. Assoc., 68: 39-42 (1976).
6. L. W. Hoover, Computers in Nutrition, Dietetics and Foodservice Management: A Bibliography (University of Missouri-Columbia Printing Services, Columbia, Mo., 1983).
7. L. W. Hoover, ed., Nutrient Data Bank Directory, 4th ed. (University of Missouri-Columbia Printing Services, Columbia, Mo., 1984).
8. L. W. Hoover and B. P. Perloff, Model for Review of Nutrient Database System Capabilities, 2nd ed. (University of Missouri-Columbia Printing Services, Columbia, Mo., 1984).
9. A. Marsh, "Problems Associated with Recipe Analysis," in R. Tobleman, ed., Proceedings of the Eighth National Nutrient Data Bank Conference (US Department of Commerce National Technical Information Service, Washington, D.C., 1983), pp. 29-38.
10. R. H. Matthews and Y. J. Garrison, Food Yields Summarized by Different Stages of Preparation, Agriculture Handbook No. 102 (USDA, Washington, D.C., 1975).
11. Provisional Table on Percent Retention of Nutrients in Food Preparation (USDA Nutrient Research Group, Washington, D.C., 1982).
12. J. Youngwirth, "The Evolution of Computers in Dietetics: A Review," J. Am. Diet. Assoc., 82: 62-67 (1983).
Introduction
Data input
Data output
Special considerations
Conclusions
References
FRANK N. HEPBURN
Nutrition Monitoring Division, Human Nutrition Information Service, US Department of Agriculture, Washington, D.C., USA
The management of food composition data at the national level is carried out with the US Department of Agriculture's National Nutrient Data Bank (NDB). A distinction should be made between this management and nutrient data-base management in the more usual sense, such as is carried out in support of the many computerized dietary analysis systems - often called nutrient data-base systems - now in operation. They differ in that the NDB summarizes individual analytical values into a nutrient data base of representative values for foods. These in turn can serve as the foundation for the dietary analysis systems. Essentially, the NDB is the provider of summarized data, and the managers of data systems built upon those summarized data are the NDB's primary users. It is the purpose of this paper to provide insight into the NDB's present mode of operation, describe modifications for improvement now under way, discuss efforts for improving the quality of data, and indicate new applications of the system that may benefit INFOODS.
The Nutrient Data Bank was conceived and established as a computerized means of storing and compiling data on the nutrient composition of foods and of providing average, or representative, nutrient values to data users. Because the computerized system serves as the mechanism for the revision of Agriculture Handbook No. 8, the expansion of data stored in the NDB parallels progress on the handbook revision. The current publication status is shown in table 1 [9]. Food groups covered by AH-8, section nos. 18-22, are those most actively pursued in the data-entering stage at the present time.
The essential features of the NDB system have been described in detail elsewhere [5, 8]. For this discussion it may be helpful to describe briefly the NDB at each of its three levels. Data Base 1 (DB1) consists of the individual entries of nutrients in a food item, together with detailed descriptions of the food item and particulars concerning the measured value. At present, over 800,000 individual records are stored in the NDB, and additions continue to be made at the rate of about 6,000 to 9,000 per month.
Table 1. Status of Agriculture Handbook No. 8 revisions
Sections published [9] | Sections in preparation |
8-1 Dairy and egg products | 8-13 Beef products |
8-2 Spices and herbs | 8-14 Beverages |
8-3 Baby foods | 8-15 Fish and shellfish |
8-4 Fats and oils | 8-16 Legumes |
8-5 Poultry products | 8-17 Lamb, veal, and game |
8-6 Soups, sauces, and gravies | 8-18 Bakery products |
8-7 Sausages and luncheon meats | 8-19 Sugars and sweets |
8-8 Breakfast cereals | 8-20 Cereal grains, flours, and pasta |
8-9 Fruits and fruit juices | 8-21 Fast foods |
8-10 Pork and pork products | 8-22 Mixed dishes |
8-11 Vegetables and vegetable products | 8-23 Miscellaneous foods |
8-12 Nut and seed products |
Data Base 2 (DB2) consists of summarized values of nutrients in food items that have like descriptions. Individual values are averaged and standard deviations calculated for each grouping. Data at this stage of summary provide the opportunity to examine specific food descriptions, such as year of harvest or region of growth. The application of DB2 information to development of an international data base of cereal grain foods was described in a previous publication [6]. At present, data in DB2 are generally too limited for meaningful statistical distinctions, but the potential for such use by INFOODS should be kept in mind as a means of providing more detailed access to data than is now possible.
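The summarizing step from individual values to grouped values can be sketched as follows. The record layout, food descriptions, and values are hypothetical, and the sample standard deviation is assumed because the text does not say which form the NDB uses.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical DB1-style records: (food description, nutrient, measured value).
RECORDS = [
    ("wheat flour, white", "iron_mg", 1.0),
    ("wheat flour, white", "iron_mg", 2.0),
    ("wheat flour, white", "iron_mg", 3.0),
    ("wheat flour, whole", "iron_mg", 4.0),
]


def summarize(records):
    """Group values by (description, nutrient) and report n, mean, and SD."""
    groups = defaultdict(list)
    for description, nutrient, value in records:
        groups[(description, nutrient)].append(value)
    summary = {}
    for key, values in groups.items():
        # A standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else None
        summary[key] = {"n": len(values), "mean": mean(values), "sd": sd}
    return summary


for key, stats in summarize(RECORDS).items():
    print(key, stats)
```

The single-observation group in this example shows in miniature why groupings with few values support little statistical distinction.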
Data Base 3 (DB3) contains data at the level familiarly known in Agriculture Handbook No. 8. The aim is to provide data that are representative of foods across the nation on a year-round basis. To this end, groupings within DB2 that are indistinguishable at point of purchase or consumption, or that have nearly identical nutrient profiles, may be combined to yield overall mean values. The total number of observations and standard error are also calculated. A provision of the NDB system allows the components to be weighted to produce averages that are more representative for the nation. DB3 is also used to create the computerized version, the USDA Nutrient Data Base for Standard Reference (available from National Technical Information Service, 5285 Port Royal Road, Springfield, VA 22161).
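The weighting of DB2 groupings into a single representative value can be sketched as a weighted mean. The weights here stand for something like relative market share and, like the means and counts, are invented for illustration; the actual NDB weighting scheme and its standard-error calculation are not described in this paper, so the standard error is left out.

```python
def combine(groups):
    """Combine grouping means into one weighted mean and a total observation count.

    groups: list of dicts with keys 'mean', 'n', and 'weight' (hypothetical layout).
    """
    total_weight = sum(g["weight"] for g in groups)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_mean = sum(g["mean"] * g["weight"] for g in groups) / total_weight
    return {"mean": weighted_mean, "n": sum(g["n"] for g in groups)}


# Two hypothetical DB2 groupings of the same food.
groupings = [
    {"mean": 2.0, "n": 3, "weight": 0.75},
    {"mean": 4.0, "n": 1, "weight": 0.25},
]
print(combine(groupings))
```

An unweighted average of the two grouping means would give 3.0; the weighted value moves toward the grouping assumed to be more widely consumed.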
The Nutrient Data Bank is still in its formative period and has not yet reached the stage of continuous maintenance management. At this time, attention is still focused on completing the revision of Agriculture Handbook No. 8, and data management is thus devoted primarily to control of data input and output.