



2. Technical overview


BASIC TERMS AND DEFINITIONS

The structure of an interchange file is described in terms of elements, or precisely identified blocks of data. The element is the basic "building block" of an interchange file, and serves to identify and contain the actual data being exchanged. Elements provide a structure for the data which is logically ordered for machines and relatively easy to follow for human beings. A typical element might be:

<NA> 5 </NA>

Elements are identified by tags, which identify and surround contents. In the example above, <NA> and </NA> are the tags which surround the content "5". Some elements use only a single tag, and are delimited by the next tag in sequence, whatever it might be. For example:

<date> 1983.11.04

Here, the content is the string "1983.11.04" in ISO standard date format [41], meaning "4 November 1983", the actual data content of the element. Contents may be data values (i.e., numerals or unrestricted strings of text), keywords (i.e., special values from a restricted list), other elements, or a combination of values, keywords, and elements. Elements that occur within other elements are said to be subsidiary or nested, and the term immediate is used to denote direct nesting, without intermediate elements, when the distinction is important. The following example, a brief but typical food component or <comp> element, contains a combination of data values, keywords, and nested elements and illustrates these concepts:

<comp>
<VITC> 30 </VITC> <NA> 0.12 <unit/> MMOL </unit/> </NA>
</comp>

In this example, the food component element consists of two tags, <comp> and </comp>, called the start-tag and end-tag respectively, and a content of two nested elements. The first element is the vitamin C element, whose tags are <VITC> and </VITC> and whose content is the data value "30", meaning 30 milligrams per 100 grams edible portion of food (the units are specified as the default in the definition of the tag associated with the identified food component [17]). The second subsidiary element is the sodium element, whose tags are <NA> and </NA> and whose content consists of a value and a subsidiary element which specifies the unit of measure. The unit element's tags are <unit/> and </unit/> and its content is the keyword "MMOL", which stands for "millimoles". The <VITC> and <NA> elements are immediately subsidiary to <comp>. <unit/> is immediately subsidiary to <NA>, subsidiary (but not immediately subsidiary) to <comp>, and not subsidiary to <VITC> at all. When it is clear from context which is meant, as in the case above, the start-tag is referred to as if it were the element. For example, in the previous sentence it would be more precise to say "The <unit/> element is immediately subsidiary to the <NA> element...".

Spaces before and after elements and line breaks are ignored in the interchange system. Hence the example above could be written all on one line, or with the sodium and vitamin C elements on separate lines, and so forth.
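The insensitivity to layout can be illustrated with a small Python sketch. It is not part of the interchange system: the function name tokenize and the token representation are assumptions made for illustration, and collapsing runs of whitespace inside data is a simplification.

```python
import re

# Hypothetical helper, not defined by the interchange system: it splits text
# into ("tag", name) and ("data", text) tokens.
TOKEN_RE = re.compile(r"<([^<>]+)>|([^<>]+)")

def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tag, data = match.group(1), match.group(2)
        if tag is not None:
            tokens.append(("tag", tag.strip()))
        else:
            # Assumption: surrounding spaces and line breaks carry no meaning.
            data = " ".join(data.split())
            if data:
                tokens.append(("data", data))
    return tokens

one_line = "<comp> <VITC> 30 </VITC> <NA> 0.12 <unit/> MMOL </unit/> </NA> </comp>"
multi_line = """<comp>
<VITC> 30 </VITC>
<NA> 0.12 <unit/> MMOL </unit/> </NA>
</comp>"""

# Both layouts yield the same sequence of tags and data.
same = tokenize(one_line) == tokenize(multi_line)
```

Under these assumptions, the two layouts of the <comp> example produce identical token sequences, which is the sense in which spacing and line breaks are ignored.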

STRUCTURE OF AN INTERCHANGE FILE

In order to permit processors for interchange files to interpret them accurately and efficiently, interchange files must adhere to certain structural conventions. Consistent structure for all interchange files facilitates ease of use and interpretation of the data, both by people and by machines.

Every interchange file contains a single <infoods 85> element. Other types of information, such as data about the transport medium (e.g., magnetic tape density), electronic mail headers, telex information, mailing addresses, and informal text associated with the transportation of the file may surround but are not part of an interchange file.

The <infoods 85> start-tag is the only tag in the interchange system which requires an "attribute" indicating the version of the interchange system in use, in this case the version dating from 1985. The first tag of an interchange file must appear, therefore, as <infoods 85> and the last one must be </infoods>.

The <infoods 85> element's content is made up of two or more subsidiary elements, appearing in this order:

The <header> element identifies the sender and the source of the data.
The <dflt> element identifies defaults which apply to the entire data file, such as weights and measures.
The <food> element classifies the specific food, identifies any relevant measures, and supplies the relevant nutrient composition data for the food.

The structure of an interchange file is therefore:

<infoods 85>
<header>
source and sender elements
</header>
<dflt> default elements </dflt>
<food>
<classif>
<ifri> food record identifier </ifri>
other classification elements
</classif>
<fddflt> per-food default elements </fddflt>
<comp> food component data elements </comp>
<drvd-comp> derived food component elements </drvd-comp>
</food>
other food elements, starting in <food> and ending in </food>
</infoods>

While the <header> element is supplied once and not repeated, and the <dflt> element is either omitted or supplied once, the first <food> element would ordinarily be followed by additional <food> elements, since it would be rare to transmit information about only a single food. All interchange files must adhere to the structure outlined in the example above. (Again, line breaks are ignored in actual interchange; they are used in this book merely to enhance readability.)
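A receiving program might check these top-level rules before doing anything else. The Python sketch below is illustrative only: check_structure is a hypothetical name, and the check assumes that the names header, dflt, and food never occur as nested start-tags.

```python
import re

def check_structure(text):
    """Return a list of layout problems; an empty list means none were found."""
    tags = [tag.strip() for tag in re.findall(r"<([^<>]+)>", text)]
    problems = []
    if not tags or tags[0] != "infoods 85":
        problems.append("first tag must be <infoods 85>")
    if not tags or tags[-1] != "/infoods":
        problems.append("last tag must be </infoods>")
    # Assumption: these three start-tags appear only at the top level.
    top_level = [tag for tag in tags if tag in ("header", "dflt", "food")]
    if not re.fullmatch(r"header( dflt)?( food)+", " ".join(top_level)):
        problems.append("expected one <header>, an optional <dflt>, then one or more <food>")
    return problems

good = "<infoods 85> <header> </header> <dflt> </dflt> <food> </food> <food> </food> </infoods>"
bad = "<in foods 85> <food> </food> </infoods>"

good_problems = check_structure(good)   # no problems expected
bad_problems = check_structure(bad)     # wrong first tag, missing header
```

The sketch looks only at the order of the primary elements; it does not verify that tags are properly nested or that required subsidiary elements are present.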

OVERVIEW OF THE INTERCHANGE FILE PRIMARY ELEMENTS AND ELEMENT GROUPS

The Header

The <header> element of an interchange file provides information about the sender of the file and the source of the data. This information is critical in identifying the data for interpretation and for archival and tracking purposes. The <header> element is composed of two subsidiary elements, the <sender> element and the <source> element, each of which is composed of a number of required elements with several additional elements optional. The list of <header> elements and their definitions is inspired by the work of the INFOODS Committee on Terminology [33].

The Sender Subsidiary Element

The <sender> element of the header is composed of elements that identify the sender of the interchange file. This is the person or organization responsible for preparing the file at hand for transmission, not the person or organization responsible for the data values. The information in this element must be available to the receiver or user of the file to permit contacting the right person if there are problems with the organization of the data.

Required elements include those for name, organization, address, location or country of sender, postal code, and date of transmission of the file. While some of the information is redundant, the repetitions are important for sorting and classification purposes. Optional elements include those for additional information which is useful but not critical, such as the sender's title, electronic mail address, international telephone numbers (voice and fax), telex number, and cable code.

The Source Subsidiary Element

The <source> element of the header is composed of elements which identify the source of the data (typically a table or data base and compiler) being transmitted in interchange form. Only one data source is allowed per interchange file. Possible data sources may include food tables and other publications, nutrient data bases, laboratories, and so on. Optional elements include the address of the analytic lab if the source is a laboratory, the publisher's address for a literature source, or the ISBN number for a book.

The idea of a "source" involves several issues about what foods should be reported, or used, as a single "table" entity. It is most easily understood by analogy to the concept of data for a single food. The realities of chemical analysis and laboratory measurement make it improbable that nutrient values for a single analysis will all be from the same individual food item (e.g., the same apple), nor would we expect values derived from a single apple to have any special merit. Instead, one samples, homogenizes, and combines items to construct a laboratory sample [11]. The decision as to which apples are representative of "apple" or even of a particular cultivar and set of growing conditions is a substantive scientific one, and the criteria of "sameness" are neither trivial nor obvious.

While the <sender> element describes the origins of the interchange file, the <source> element describes the origin of the data values themselves. Information provided with <source> might be used to obtain additional scientific information about the data; information provided with <sender> is useful for technical problems with the interchange itself. In addition, <source> is expected to contain the information needed to reference the data in a publication that uses them. By contrast, <sender> would provide information for an acknowledgement of someone who had been particularly helpful.

The following is a complete sample <header> element:

<header>
<sender> <date> 1988.06.07
<fullname> Dr. J. D. Smith <fsnm> Smith
<orgz> EUROFOODS Regional Centre <addr/> Department of Human Nutrition <-> Agricultural University <-> De Dreijen 12 <->6703 BC Wageningen <-> The Netherlands </addr/>
<country> NL <postcode> 6703 BC
<title/> Coordinator of the Laboratory </title/>
<phone/> +31 83 70 8 25 89 </phone/>
<telex/> NL 45015 </telex/>
</sender>
<source>
<ref/> Souci, S.W., W. Fachmann, H. Kraut. Food Composition and Nutrition Tables, 1986/87. Stuttgart: Wissenschaftliche Verlagsgesellschaft mbH, 1986.
<pub/> Wissenschaftliche Verlagsgesellschaft mbH </pub/>
<isbn> 3-8047-0833-1 </ref/>
<addr/>
Postfach 40 <-> D-7000 Stuttgart 1 <-> Deutschland
</addr/>
<country> DE <postcode> D-7000
</source> </header>

The above illustrates the combination of elements that do and do not require end-tags, and of elements nested within other elements. The special tag <-> is discussed under "Repeated and Counted Elements".
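As an illustration of how a receiver might handle two details of the sample header, the Python sketch below reads a date in the year.month.day form and splits an address at each <-> tag. Both function names are hypothetical, and treating <-> as a simple separator between repeated parts is an assumption; the governing rules are those given under "Repeated and Counted Elements".

```python
from datetime import date

def parse_interchange_date(text):
    # Dates in the examples are written year.month.day, e.g. 1988.06.07.
    year, month, day = (int(part) for part in text.strip().split("."))
    return date(year, month, day)

def split_repeats(content):
    # Assumption: <-> separates the repeated parts of an element's content.
    return [part.strip() for part in content.split("<->")]

address = ("Department of Human Nutrition <-> Agricultural University "
           "<-> De Dreijen 12 <->6703 BC Wageningen <-> The Netherlands")

sent_on = parse_interchange_date(" 1988.06.07 ")
address_lines = split_repeats(address)
```

Because surrounding spaces are discarded, the missing space after the third <-> in the sample address makes no difference to the result.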

Defaults

Default values for each component, such as the unit of measurement expressed per 100 grams of edible portion of the food, are included in the definition of the food component element [17], which is part of its registration. Default values which apply to data in the entire interchange file are specified in the <dflt> element. Subsidiary elements to <dflt> must reflect the structure of the food component or per-food default to which they refer. For example, if the data values for total protein, calculated from total nitrogen, for every food in the file were calculated using the standard conversion factor of 6.25, the <dflt> element for the file would look like this:

<dflt> <comp> <PROCNT> * STD </PROCNT> </comp> </dflt>

The <comp> and </comp> tags appear here because the <dflt> element is treated as occurring at the same level as the <food> element itself. Hence, <comp> must be used to indicate that the subsidiary information applies to the specific food components.

The "STD" indicates that the standard conversion factor was used for all values ("*") supplied for total protein, calculated from total nitrogen, in the file. See the definition of <PROCNT> for more information.

The <dflt> tag itself acts as a "macro", affecting the interpretation of food component information. The rules by which it is applied are discussed in Chapter 3. Unlike <header> (and <sender> and <source> ), <dflt> is optional and need not be supplied. If there are no default values, the element is omitted entirely.

The Foods

A <food> element contains the necessary classification information to properly identify a food, along with optional indicators of standard measures or other per-food defaults, followed by the actual nutrient data for that food. A <food> element consists of a maximum of four subsidiary elements:

<classif>
<fddflt>
<comp>
<drvd-comp>

where <classif> consists of information that identifies the data records and describes the food, <fddflt> identifies per-food defaults, <comp> contains the food component data (optional, but generally supplied), and <drvd-comp> contains the derived component data (optional, but often supplied depending on available data).

Classification Subsidiary Element

The <classif> element consists of the international food record identifier element, which is required, and any other classification elements necessary to identify the food for which data is provided. A very simple <classif> element that did not contain any food coding or classification information might look like this:

<classif> <ifri> ER.UK.M-W78.171 </ifri>
<bvname> Eggs poached </bvname> </classif>

In this example, the <classif> element is composed of the international food record identifier <ifri> element whose content identifies the food as that from the table classified as "EUROFOODS, United Kingdom, McCance and Widdowson 1978, Food Number 171", in this case, "Eggs, poached". The use of the <bvname> element indicates that the name is expressed in the ISO 646 basic character set.
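The way the identifier breaks down into its parts can be sketched in Python. This is an illustration only: it assumes that the parts are separated by full stops, as the two identifiers shown in this chapter suggest, and the field names are hypothetical rather than taken from the definition of <ifri>.

```python
def split_ifri(content):
    # Assumption: the identifier has four parts separated by full stops.
    region, country, table, food_number = content.strip().split(".")
    return {
        "region": region,            # hypothetical field names
        "country": country,
        "table": table,
        "food_number": food_number,
    }

poached_eggs = split_ifri(" ER.UK.M-W78.171 ")
```

For the identifier above, the four parts are "ER", "UK", "M-W78", and "171", which correspond to EUROFOODS, United Kingdom, McCance and Widdowson 1978, and Food Number 171.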

Per-Food Default Subsidiary Elements

Default values which apply to data supplied for a single food may be specified in the <fddflt> element. For example, the <meas/> element supplies a denominator for data according to some common or standard measure. Such data may be provided instead of, or in addition to, data supplied according to the default measures registered for each nutrient. For example, in

<food>
<classif>
<ifri> NOA.USDA.HB8 4-78.09003 </ifri>
<FDA-FFV-8707> A143 B1245 C167 E150 F03 H003 J003 K03 M003 N03 P24 </FDA-FFV-8707>
<EUROCODE2> 10303 </EUROCODE2>
<bvname> apple, raw, with skin </bvname>
<bvname> malus sylvestris </bvname>
</classif>
<fddflt> <meas/> <-> piece <qty/> 150 </qty/>
<refuse/> 12 <cmt/> approx 8%; core and seeds considered inedible </cmt/> </refuse/>
<cmt/> approximately 3 per pound, 2.75 inch diameter </cmt/> </meas/>
</fddflt>
<comp> <NA> 3 <-> 7 </NA> </comp> </food>

sodium data is supplied first according to the defaults registered for the <NA> element, milligrams per 100 grams edible portion, and then according to the common measure, in this case, "piece", i.e., "per apple". The special character sequence "<-> " is used to separate multiple sets of values for the nutrient. It must appear with <fddflt> <meas/> as well as subsidiary to <NA> in order to specify that the special measurement applies to the second set of values, rather than the first set.
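The pairing of value sets with measures can be sketched in Python as follows. The function names and the label used for the registered default are assumptions made for illustration; the sketch also assumes that an empty position before a <-> in <meas/> means "use the registered default".

```python
def split_sets(content):
    # Split the content at each <-> and discard surrounding spaces.
    return [part.strip() for part in content.split("<->")]

def label_values(measure_content, value_content, default_label="registered default"):
    measures = split_sets(measure_content)
    values = split_sets(value_content)
    labelled = []
    for position, value in enumerate(values):
        # Assumption: an empty or missing measure means the registered default.
        measure = measures[position] if position < len(measures) else ""
        labelled.append((measure or default_label, float(value)))
    return labelled

# From the apple example: <meas/> <-> piece ... and <NA> 3 <-> 7 </NA>
sodium = label_values("<-> piece", "3 <-> 7")
```

Here the first value, 3, is paired with the registered default (milligrams per 100 grams edible portion) and the second value, 7, with the measure "piece".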

The <fddflt> element is similar to <dflt> in that it is essentially an abbreviation indicator or macro with a specific range of applications. See the next chapter for details.

Food Component and Derived Food Component Subsidiary Elements

The <comp> and <drvd-comp> elements are composed of the elements for the distinct nutritive and non-nutritive components of the food. These typically consist of elements containing values (expressed in default units for the component) and, in certain cases, specific keywords from restricted lists to further identify or qualify the values or methods expressed. The initial set of generic identifiers subsidiary to <comp> and <drvd-comp> is specified in Identification of Food Components for INFOODS Data Interchange [17], and more may be registered as needed. A simple set of <comp> and <drvd-comp> elements might look like this:

<comp> <FE> 2.0 </FE> </comp>
<drvd-comp> <CHEMSC> 0.58 FAO73 </CHEMSC> </drvd-comp>

In this example, two data values are supplied for the food in question. The first value is for total iron, 2 milligrams per 100 grams edible portion [17]; the second value is for chemical score, 58% calculated using the 1973 FAO reference protein pattern [17].
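A receiver needs to separate the numeric value from any keywords that qualify it. The Python sketch below shows one way to do so; the function name is hypothetical, and the rule applied (the first numeric token is the value, every other token is a keyword) is an assumption rather than part of the interchange system.

```python
def split_value_and_keywords(content):
    value, keywords = None, []
    for token in content.split():
        try:
            number = float(token)
        except ValueError:
            keywords.append(token)      # not numeric, so treat as a keyword
        else:
            if value is None:
                value = number          # first numeric token is the value
            else:
                keywords.append(token)
    return value, keywords

iron = split_value_and_keywords(" 2.0 ")
chemical_score = split_value_and_keywords(" 0.58 FAO73 ")
```

For the two elements above, this yields the value 2.0 with no keywords for <FE>, and the value 0.58 with the keyword "FAO73" for <CHEMSC>.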

DESCRIPTION OF THE DATA THEMSELVES

Most food composition tables contain, in addition to point values for each nutrient, some statistical description of them, typically a number of samples and a standard deviation or standard error. While most tables contain mean values, other statistics about location are occasionally supplied. The requirement that the interchange system support the representation of any data that are available implies that it must be able to include any statistics that are available, and to state what those statistics mean. Statistical description of data is particularly important in interchange, where the receiver of a data file may need to assess the value for use in an unanticipated application or context, e.g., copying into a food data base for another country or imputing values for a similar food. A collection of optional elements is available for identifying which statistics are being reported and precisely identifying those statistics. Definitions of those elements, accompanied by an extensive discussion of the issues surrounding them, are given in later chapters.

SUMMARY

The Interchange System provides a format for each item of data required for the successful exchange of food composition information. Using the Interchange System's format of elements with uniquely assigned tags, an interchange file is readily interpretable both by machines and by people.

Explanations of semantic and syntactic conventions and detailed discussions of elements are found in the next four chapters.

