



<loctype>
The <loctype> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and immediately follows the data value associated with <specific component> or <specific derived component> elements. It is used to specify the exact meaning of the "best estimate of location" for its associated food component (i.e., the <comp> or <drvd-comp> element to which it is subsidiary). "Loctype" may be thought of as an abbreviation for "location type" or "type of location estimate".

Description

Only the start-tag is permitted. The content consists of an unformatted string (whose first, and usually only, "word" is a keyword) and terminates when another tag is encountered. The keywords represent names of location statistics. If this element is omitted, most data base users will infer that the food component value represents a mean value. Use of <loctype> with "mean" reinforces and confirms that belief when that is appropriate; use of <loctype> with another value indicates that the mean was not considered to be the best estimate of location.

Format

The content of <loctype> consists of one member of the following list. The keywords in this list correspond exactly to the elements for specific location statistics that appear starting on the next page; the two lists will be expanded in parallel, and the correspondence between the generic identifier for the location elements and the keyword below, and between qualifying parts of the element and the additional information below, will be preserved.

KEYWORD
mean
median
tmean N M
locpctl N

Example

<comp> <ash> 0.69 <loctype> mean </ash>
<enerc> 354.027 FDS </enerc>
<procnt> 17.270 USDA 6.38 <loctype> mean </procnt> </comp>

The use of <procnt> here illustrates a case in which the primary information associated with a generic identifier is more complex than a single numeric value. As mentioned in the first paragraph above, the data description information follows all of the information that is directly associated with the <comp> "tagname" [17]. For <procnt>, this information consists of three values: the estimate of location, a keyword that specifies the source of the conversion factor used, and the actual conversion factor. "<Loctype> mean" specifies that 17.270 is the mean, not 6.38, which is just a conversion factor.
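The keyword list in the Format section can be checked mechanically. The following is a minimal sketch, not part of the interchange specification: the function name parse_loctype and the table LOCTYPE_ARITY are hypothetical, and the number of qualifiers per keyword is simply read off the KEYWORD list (none for mean and median, two for tmean, one for locpctl).

```python
from typing import List, Tuple

# Number of numeric qualifiers per keyword, read off the KEYWORD list.
# The table and the function name are illustrative assumptions.
LOCTYPE_ARITY = {"mean": 0, "median": 0, "tmean": 2, "locpctl": 1}


def parse_loctype(content: str) -> Tuple[str, List[float]]:
    """Split <loctype> content into its keyword and numeric qualifiers."""
    words = content.split()
    if not words:
        raise ValueError("empty <loctype> content")
    keyword, qualifiers = words[0], words[1:]
    if keyword not in LOCTYPE_ARITY:
        raise ValueError("unknown loctype keyword: %r" % keyword)
    expected = LOCTYPE_ARITY[keyword]
    if len(qualifiers) != expected:
        raise ValueError(
            "%s expects %d qualifier(s), got %d"
            % (keyword, expected, len(qualifiers))
        )
    return keyword, [float(q) for q in qualifiers]


print(parse_loctype("mean"))             # ('mean', [])
print(parse_loctype("tmean 0.10 0.10"))  # ('tmean', [0.1, 0.1])
```

The sketch assumes that the caller has already isolated the text between the <loctype> start-tag and the next tag.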

 

<mean>
The <mean> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a mean value for its associated food component (<comp> or <drvd-comp> element) that is not the best estimate of location.

Description

Only the start-tag is permitted. The content consists of a single floating-point value representing the estimated or sample mean value for the component. Typically, this element would be used only if the best estimate of location differed from the mean.

Format

The content of <mean> is a single floating-point value representing the mean.

Examples

<enerc> 325 FDS <mean> 354.2 </enerc>

This would normally be interpreted as implying that the table compiler believed that the value 325 provides a better estimate of the total energy available than the actual mean. One would hope to find an associated <cmt/> element explaining this situation or a <loctype> element that explains how the value of 325 was derived. For example, we might see:

<enerc> 325 <loctype> median <mean> 354.2 </enerc>

This suggests a conclusion that the median provides a better estimate than the mean, but that the mean value is reported for comparison purposes.

<fat> 0.59 <loctype> locpctl 80 <mean> 0.42 </fat>

This would normally be interpreted as implying that the table compiler wished to report a nominal 80th percentile value, 0.59, as a better estimate of the total fat present than the mean of 0.42. As above, one would hope to find an associated <cmt/> element explaining this situation.

 

<median>
The <median> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a median value associated with its food component (<comp> or <drvd-comp> element) that is not the best estimate of location. We take the median to be the middle value of the sorted sample when the sample contains an odd number of values, and the arithmetic mean of the two middle values when the sample size is even. Other definitions should be described with a <cmt/> element.
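The definition of the median given above can be written out as a short sketch. The function name sample_median is hypothetical and is used only for illustration.

```python
from typing import Sequence


def sample_median(values: Sequence[float]) -> float:
    """Median as defined above: the middle value of the sorted sample for
    an odd sample size, the mean of the two middle values for an even one."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2.0


print(sample_median([3, 1, 2]))     # 2.0
print(sample_median([4, 1, 3, 2]))  # 2.5
```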

Description

Only the start-tag is permitted. The content consists of a single floating-point value representing the estimated or sample median value for the component.

Format

The content of <median> is a single floating-point value representing the median.

Examples

<enerc> 354.2 FDS <median> 325 </enerc>
<fat> 0.59 <loctype> locpctl 80 <median> 0.42 </fat>

This would normally be interpreted as implying that the table compiler wished to report a nominal 80th percentile value, 0.59, as a better estimate of the total fat present than the (possibly unknown) mean and that the median value was 0.42. One would hope to find an associated <cmt/> element explaining this situation.

 

<locpctl>
The <locpctl> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a percentile-like value that is not the best estimate of location.

<Locpctl> is a somewhat dubious location statistic, not a "percentage point" value used to describe the distribution (see <pctpts>). In some countries, nutritional labels on food packaging are required to show, for certain food components, not a mean value but a value such that a certain percentage of the packages or portions will contain "at least that much" or "no more than that much" of the food component (depending on whether it is considered desirable or undesirable). This type of reporting requirement makes the actual value given at least as much a property of a manufacturer's decision as one of sample data: "safety margins" may be included to allow for possible future alterations in the recipe or to provide extra protection against accusations of non-compliant labelling.

While this type of information may be useful to the consumer, it would ideally never appear in a food composition table, since it is impossible to compare even approximately with, e.g., mean values. In the large sample case, we assume that the mean converges on "the centre point" (the median or the 50% point), while this type of value would have its percentage point as a lower bound. However, since, for several reasons, it is not unusual for these values to appear in food composition tables, this element is provided to identify them.

"Locpctl" may be thought of as an abbreviation for "location percentile".

Description

Only the start-tag is permitted. The content consists of two floating-point values. The first represents the estimate and the second represents the percentile chosen. This element gives an estimate of location when a particular percentile value has special (e.g., regulatory) meaning to the data base compiler. A separate element, <pctpts>, should be used to list various percentage points as a means of describing the distribution of the data.

Format

The content of <locpctl> is a pair of floating-point values representing the value and the percentage point with which it is associated. The second value is expressed as a percentage, not a fraction, since that is the usual form of the specification.

Example

<fat> 0.42 <locpctl> 0.59 80 </fat>

 

<tmean>
The <tmean> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a trimmed mean value that is not the best estimate of location. "Tmean" may be thought of as an abbreviation for "trimmed mean".

Description

Only the start-tag is permitted. The content consists of three floating-point values representing the estimated or sample trimmed mean value for the component, the lower trimming fraction, and the upper trimming fraction. A fraction is used since all of the literature on trimmed means appears to use fractions. Since the normal practice is to trim symmetrically, the second and third values will typically be the same.

Format

The content of <tmean> is three floating-point values representing the trimmed mean, the lower trimming fraction, and the upper trimming fraction.

Example

<enerc> 354.2 FDS <tmean> 325 0.10 0.10 </enerc>
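The three values carried by <tmean> can be related to the underlying sample as follows. This is a sketch under stated assumptions, not a procedure prescribed by this document: the function name trimmed_mean is hypothetical, and the number of values discarded at each end is taken to be the trimming fraction times the sample size, rounded down. Other conventions exist and, if used, would be worth recording in a <cmt/> element.

```python
from typing import Sequence


def trimmed_mean(values: Sequence[float], lower: float, upper: float) -> float:
    """Mean of the sample after discarding the given fractions of the
    smallest and largest values (assumed rule: fraction * n, rounded down)."""
    if lower < 0 or upper < 0 or lower + upper >= 1:
        raise ValueError("trimming fractions must be >= 0 and sum to less than 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty sample")
    # The small tolerance guards against products such as 28.999999999999996.
    drop_low = int(n * lower + 1e-9)
    drop_high = int(n * upper + 1e-9)
    kept = ordered[drop_low:n - drop_high]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample of 10 values
print(trimmed_mean(data, 0.10, 0.10))    # 5.5
```

For this hypothetical sample the element would read <tmean> 5.5 0.10 0.10, and the sample size before trimming is 10.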

 

<smsz>
The <smsz> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify the effective sample size (sometimes called "N" or, confusingly, "sample population" or "pop") associated with the statistical estimates.

By "effective sample size" we mean the sample size after informal data cleaning and similar processes (which might be described with the <sclean/> element), that is, the sample size used in computing the other statistics reported. If outliers are removed from the data as part of a cleaning process and, e.g., a mean is reported that reflects the smaller data set, the <smsz> element should show the sample size with the outliers already removed and the value should be reported with <mean> or "<loctype> mean". However, if a fractional trimming process is used instead of subjective outlier elimination, <smsz> should show the sample size before trimming and <tmean> should be used to express the trimmed mean and the trimming fractions. "Smsz" may be thought of as an abbreviation for "sample size".

Description

Only the start-tag is permitted. The content consists of a single integer value representing the effective sample size for the statistical estimates. If the statistics represent "trimmed" values, the <smsz> element represents the sample size before trimming.

Format

The content of <smsz> is a single integer value representing the sample size.

Example

<procnt> 17.27 USDA 6.38 <serr> 0.5085 <smsz> 24 </procnt>

This would indicate that the mean and standard error values for protein were calculated from 24 samples.

 

<sdv>
The <sdv> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify the sample estimate of the population standard deviation value (i.e., with a denominator of N-1). "Sdv" may be thought of as an abbreviation for "standard deviation".

Description

Only the start-tag is permitted. The content consists of a single floating-point value representing the estimated population standard deviation value (computed with a denominator of N-1) for the component.

Format

The content of <sdv> is a single floating-point value representing the estimated population standard deviation.

Example

<CA> 9.4 <loctype> mean <sdv> 1.6 </CA>

The estimate of location is specifically identified as the mean in this example.
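The N-1 denominator mentioned in the definition of <sdv> can be made concrete with a short sketch. The function name sample_sdv and the data are hypothetical.

```python
import math
from typing import Sequence


def sample_sdv(values: Sequence[float]) -> float:
    """Sample estimate of the population standard deviation (N-1 denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))


print(sample_sdv([1, 2, 3]))  # 1.0
```

The function statistics.stdev in the Python standard library uses the same denominator and should give the same result.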

 

<serr>
The <serr> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a standard error value for the food component. "Serr" may be thought of as an abbreviation for "standard error".

Description

Only the start-tag is permitted. The content consists of a single floating-point value representing the standard error value for the component.

Format

The content of <serr> is a single floating-point value representing the standard error.

Examples

<mg> 4 <serr> 0.5 <smsz> 17 </mg>
<thia> 0.025 <smsz> 15 <serr> 0.0032 </thia>
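When <serr> and <smsz> appear together, as in the examples above, a reader can recover an approximate standard deviation. The sketch below assumes that the reported value is the usual standard error of the mean, that is, the estimated standard deviation divided by the square root of the sample size; this document does not state the formula, so that is an assumption. The function names are hypothetical.

```python
import math


def standard_error(sdv: float, smsz: int) -> float:
    """Standard error of the mean, assuming serr = sdv / sqrt(smsz)."""
    if smsz < 1:
        raise ValueError("sample size must be at least 1")
    return sdv / math.sqrt(smsz)


def implied_sdv(serr: float, smsz: int) -> float:
    """Standard deviation implied by a reported standard error and sample size."""
    if smsz < 1:
        raise ValueError("sample size must be at least 1")
    return serr * math.sqrt(smsz)


# Values from the <mg> example: <serr> 0.5 with <smsz> 17.
print(round(implied_sdv(0.5, 17), 3))  # 2.062
```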

 

<jserr/>
The <jserr/> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify a non-parametric estimate of standard error based on data resampling [22, 6]. "Jserr" may be thought of as an abbreviation for "jackknife standard error".

Description

Both start-tag and end-tag are required. The content consists of a single floating-point value representing the standard error value, obtained by jackknifing, for the component and an optional <cmt/> element. The <cmt/> element should be used to describe special circumstances or assumptions associated with the jackknife procedure, e.g., grouping and the number of groups.

Format

The content of <jserr/> is a single floating-point value representing the standard error and an optional <cmt/> element.

Examples

<mg> 4 <jserr/> 0.5 </jserr/> <smsz> 17 </mg>

<thia> 0.025 <smsz> 5000 <jserr/> 0.0032 <cmt/> Jackknife on ten groups
</cmt/> </jserr/> </thia>
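For readers unfamiliar with the technique, the following is a minimal sketch of a delete-one jackknife estimate of standard error. It is not the grouped procedure mentioned in the second example (which would delete one group at a time), and the function name jackknife_serr is hypothetical.

```python
import math
from typing import Callable, List, Sequence


def jackknife_serr(values: Sequence[float],
                   statistic: Callable[[List[float]], float]) -> float:
    """Delete-one jackknife standard error of `statistic` over `values`."""
    data = list(values)
    n = len(data)
    if n < 2:
        raise ValueError("at least two values are needed")
    # Recompute the statistic n times, leaving out one value each time.
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(replicates) / n
    spread = sum((r - centre) ** 2 for r in replicates)
    return math.sqrt((n - 1) / n * spread)


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


print(jackknife_serr([2, 4, 4, 4, 5, 5, 7, 9], mean))
```

When the statistic is the mean, this estimate coincides with the ordinary standard error (the N-1 standard deviation divided by the square root of the sample size); the jackknife is of more interest for statistics such as the median or a trimmed mean.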

 

<cnfi>
The <cnfi> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify the confidence interval for the estimate of location of the food component.

The comprehensibility and usefulness of confidence intervals tend to be fairly low when they are calculated from the small sample sizes typical of food composition data. Also, while a confidence interval can, in principle, be computed for any statistic and may, as provided for here, be asymmetric, most readers will tend to construe it as a two-sided symmetric estimate for the mean. Other applications or situations should, if possible, be described with <cmt/> elements, or, preferably, other statistics and elements should be used.

"Cnfi" may be thought of as an abbreviation for "confidence interval".

Description

Only the start-tag is permitted. The content consists of three floating-point values representing respectively the lower confidence bound, the upper confidence bound, and the probability value, expressed as a fraction, for which the confidence interval is computed.

Format

The content of <cnfi> consists of three floating-point values representing the confidence interval and associated probability.

Example

<p> 16.0 <sdv> 2.2 <cnfi> 11.6 20.4 0.95 </p>

This food would have a phosphorus value between 11.6 and 20.4, with p=0.95.
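The three values of <cnfi> lend themselves to simple consistency checks when a file is read. The sketch below is illustrative only; the names ConfidenceInterval and parse_cnfi are hypothetical, and the checks (lower bound not above upper bound, probability strictly between 0 and 1) are assumptions about what a careful reader would want rather than rules stated in this document.

```python
from typing import NamedTuple


class ConfidenceInterval(NamedTuple):
    lower: float
    upper: float
    probability: float


def parse_cnfi(content: str) -> ConfidenceInterval:
    """Read the content of <cnfi>: lower bound, upper bound, probability."""
    words = content.split()
    if len(words) != 3:
        raise ValueError("<cnfi> expects exactly three values")
    lower, upper, probability = (float(w) for w in words)
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be a fraction between 0 and 1")
    return ConfidenceInterval(lower, upper, probability)


ci = parse_cnfi("11.6 20.4 0.95")
print(ci.lower <= 16.0 <= ci.upper)  # True
```

The second check also catches a probability written as a percentage (95 instead of 0.95).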

 

<detect-lvl>
The <detect-lvl> element is an immediate component of <specific component> and <specific derived component> elements. It is used to identify the detection level of the instruments or method used to determine a particular value. It may be supplied for general information; its use is strongly recommended when a value is reported as "TR" (i.e., a trace), since a trace with one method might be a measurable value with another.

Description

Only the start-tag is permitted. The content consists of a single floating-point value in the same units as the estimate of location (for <specific component> elements) or the value (for <specific derived component> elements).

Format

The content of <detect-lvl> consists of a single floating-point value.

Example

<na> TR <detect-lvl> 0.005 </na>

The food contained a trace of sodium, but no amount below 0.005 mg could be detected and measured by the method and instruments in use.

 

<sclean/>
The <sclean/> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It specifies the methods used to "clean" the data and their implications. "Sclean" may be thought of as an abbreviation for "sample cleaning".

Description

Both start-tag and end-tag are required. The content consists of elements only; there is no immediate data. The subsidiary elements include <cmt/>; other subsidiary elements will be defined in the future.

Format

The content of this element is an optional <cmt/> element and additional elements that will be defined as they are needed.

Example

<enerc> 354.2 FDS
<sclean/> <cmt/> outlier values eliminated by inspection </cmt/> </sclean/>
</enerc>

 

<edistr/>
The <edistr/> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It is used to specify the empirical distribution for the value of the food component. "Edistr" may be thought of as an abbreviation for "empirical distribution".

Description

Both start-tag and end-tag are required. The content consists entirely of elements; there is no immediate data. Some elements are defined in this document; others will be added in the future. The defined subsidiary elements are <cmt/>, <bounds>, <mdbds>, <sum7>, and <pctpts>. Ordinary estimates of variation and estimates of the population distribution, e.g., the estimated population standard deviation or the confidence interval, are included as elements immediately subsidiary to the <specific component> or <specific derived component>, not as elements subsidiary to this element.

Format

The content of the <edistr/> element consists of elements that describe the distribution of the data values. See the description of the subsidiary elements for examples.

 

<sdistr/>
The <sdistr/> element is an immediate subsidiary of <specific component> elements. More specifically, it is a component of <data description> and follows the data value (and any <loctype> element, if present) associated with <specific component> or <specific derived component> elements. It specifies detailed information about the subjective distribution for the value of the food component or, more specifically, beliefs about that distribution. "Sdistr" may be thought of as an abbreviation for "subjective distribution".

Description

Both start-tag and end-tag are required. The content consists entirely of elements; there is no immediate data. The defined subsidiary elements include <cmt/>; other elements will be specified in the future.

Format

The content of <sdistr/> consists of elements that describe subjective beliefs about the distribution of the data values. Until specific elements are defined, <cmt/> should be used with a free text description.

 

<bounds>
The <bounds> element is an immediate subsidiary of the <edistr/> element, used to list the minimum and maximum values encountered in the sample. The bounds are often erroneously called the "range", which is really the difference between the upper and lower bound.

Description

Only the start-tag is permitted. The content consists of two floating-point values representing the minimum and maximum values encountered in the sample data. These values can be misleading when reported for small samples and confused with actual minimum and maximum values in the population [24], so one of the distribution reports (described in the pages that follow) that provides more information and more obviously reflects the sample is to be preferred when adequate data are available.

Format

The content of <bounds> consists of two floating-point values representing, in order, the minimum value and the maximum value of the sample data.

Example

<NA> 50 <loctype> mean <edistr/> <bounds> 35 90 </edistr/> </NA>

 

<mdbds>
The <mdbds> element is an immediate subsidiary of the <edistr/> element, used to list the median, hinges, and bounds of the distribution of the data.

Description

Only the start-tag is permitted. The content consists of five floating-point values representing the bounds, hinges (robust estimates of the quartiles), and median of the sample data [34]. Since the data in <mdbds> are a subset of those represented by <sum7>, <mdbds> should be omitted if there are sufficient data to include <sum7>.

Format

The content of <mdbds> consists of five floating-point values representing, in order, the minimum value, the lower hinge, the median, the upper hinge, and the maximum value of the sample data. These values, also known as a "five-number summary", can be used to summarize the distribution of the sample data.

Example

<mdbds> 72 79 86.5 90 92
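The five values can be computed from a sample as sketched below. The sketch assumes the common definition of the hinges as the medians of the lower and upper halves of the sorted sample, with the overall median counted in both halves when the sample size is odd; this document only cites [34] for the definition, so that choice is an assumption. The function names and the sample are hypothetical, and the sample was chosen so that the result matches the example above.

```python
from typing import List, Sequence, Tuple


def _median(ordered: List[float]) -> float:
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Minimum, lower hinge, median, upper hinge, maximum of the sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty sample")
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return (float(ordered[0]), _median(lower_half), _median(ordered),
            _median(upper_half), float(ordered[-1]))


sample = [72, 75, 79, 83, 86, 87, 90, 90, 91, 92]  # hypothetical data
print(five_number_summary(sample))  # (72.0, 79.0, 86.5, 90.0, 92.0)
```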

 

<sum7>
The <sum7> element is an immediate subsidiary of the <edistr/> element, used to list the median, fences, and bounds of the distribution of the data. "Sum7" may be thought of as an abbreviation for "seven-number summary".

Description

Only the start-tag is permitted. The content consists of seven floating-point values representing the bounds, fences, hinges, and median of the sample data [34].

Format

The content of <sum7> consists of seven floating-point values representing, in order, the minimum value, the lower fence, the lower hinge, the median, the upper hinge, the upper fence, and the maximum value of the sample data. These values, known as a "seven-number summary", can be used to summarize the distribution of the sample data.

Example

<sum7> 72 78 79 86.5 90 90.2 92

 

<pctpts>
The <pctpts> element is an immediate subsidiary of the <edistr/> element. It specifies the percentage points of the actual distribution of the sample data for some food component. For a given percentage point, the data value provided is such that the stated percentage of the sample data is smaller than that value. See <locpctl> for a discussion of a slightly related location statistic. "Pctpts" may be thought of as an abbreviation for "percentage points".

Description

Only the start-tag is permitted. The content consists of ordered pairs of values, where the first member of each pair represents the percentage value, expressed as a fraction, and the second one represents the data value for that population percentile. The statistical literature is not consistent about whether percentage points should be expressed as percentages or fractions. We have chosen fractions for this element in the hope that they will make table or data base checking slightly easier. By convention, the 0 and 100% (1.0) percentage points are rarely reported, but, if they are, they are respectively the minimum and maximum values that appear in the sample.

Format

The content of <pctpts> is a sequence of floating-point values representing the percentage value (as a fraction) and the data value at that percentile. In other words, the format is

<pctpts> percentile1 datavalue1 percentile2 datavalue2 ...

Example

<pctpts> .60 .38 .80 1.25 .95 2.44
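Because the content is a flat sequence of numbers, a reader has to regroup it into pairs. The sketch below shows one way of doing so; the function name parse_pctpts is hypothetical. The checks that the fractions are strictly increasing and that the data values do not decrease are assumptions added for table checking, not requirements stated in this document.

```python
from typing import List, Tuple


def parse_pctpts(content: str) -> List[Tuple[float, float]]:
    """Regroup <pctpts> content into (fraction, data value) pairs."""
    numbers = [float(w) for w in content.split()]
    if len(numbers) % 2 != 0:
        raise ValueError("<pctpts> content must be pairs of values")
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    for fraction, _ in pairs:
        # A value such as 60 suggests a percentage was written by mistake.
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("percentage points must be fractions in [0, 1]")
    # Assumed checks: fractions increase and data values never decrease.
    for (f0, v0), (f1, v1) in zip(pairs, pairs[1:]):
        if f1 <= f0 or v1 < v0:
            raise ValueError("percentage points are not in increasing order")
    return pairs


print(parse_pctpts(".60 .38 .80 1.25 .95 2.44"))
# [(0.6, 0.38), (0.8, 1.25), (0.95, 2.44)]
```

Note that <locpctl> expresses its percentile as a percentage while <pctpts> uses fractions, so the range check is a useful guard against mixing the two conventions.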

