Decisions concerning selection of research site, "sampling universe," and sample size are often the most painful and difficult steps in the entire research process. How many cases are enough for meaningful statistical analysis? How broad a population base do we need for establishing significant generalizations? Decisions about sample size and shape are conditioned by economic, practical, and logistic concerns, as well as by theoretical and methodological considerations. In many cases the primary limitations are budgetary; larger samples require more money. The usual anthropological research methodology is based on intensive interviewing and observation of each "case." Even with adequate financing it is often logistically difficult to acquire such complex data for more than perhaps 200 households.
A major issue is the trade-off between depth of information on individual cases versus the extensiveness of coverage. Considering the number of different variables that must always be included in any field research design, even the most focused and specialized study typically requires the equivalent of an interview schedule of some 200 to 300 items. At least two to three hours of interview-observation time per household is a conservative estimate of the minimal amount of time needed for data-gathering. Even if researchers have the field forces to obtain this concentration of information for several hundred cases, the data analysis requirements also pose serious problems.
As an approximate rule of thumb suggested by some methodologists, a community-based sample should be at least 100 cases in order to permit complex multivariate analysis. Even if one's research population happens to be quite small - say a village of 200 families - the goal of 100 cases is still recommended on statistical grounds. Multicommunity samples would ideally be larger - perhaps 75 to 100 per community.
The seemingly arbitrary use of 100 as a minimum sample size is important particularly in those many situations in which parametric statistics are used. Lazerwitz (1968, pp. 284-285) has commented as follows:
When can the sample mean be considered to fall on a sufficiently normal distribution? It is suggested that a minimum sample size of 100 should be obtained before any mean or proportion can be considered to have a sufficiently normal distribution. The same requirement of 100 should be applied to the size of subclasses. For both the total sample or any subclass, the requirement of 100 represents a conservative level and a skilled statistician can readily work with small samples. But let the semi-skilled seek the more cautious level of 100 cases.
This rule of thumb should not be interpreted to mean that statistical analysis is impossible or unnecessary with smaller samples. In many instances valid inferences can be drawn using small samples, though it is well to explore the possibilities of using non-parametric procedures when samples are quite small. In some instances statistical analysis of 15 to 20 cases can be a useful procedure. For bivariate assessment there are a number of non-parametric procedures including the Chi square, Mann-Whitney U tests, and Fisher's Exact Test. Degrees of association between variables can be examined using Spearman's rho, Kendall's tau, Lambda, and a number of other non-parametric tests of association (Freeman, 1965; Siegel, 1956).
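As a minimal sketch of how one of these procedures works on a small sample, Fisher's Exact Test for a 2 x 2 table can be computed directly from the hypergeometric distribution. The function name and the 18-household table below are hypothetical illustrations, not data from any study cited here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact Test p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x cases in the top-left cell,
        # holding the row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical data: 18 households cross-classified on two yes/no variables.
p_value = fisher_exact_two_sided(7, 2, 2, 7)
print(f"two-sided p = {p_value:.3f}")
```

Because the test enumerates every table consistent with the observed margins, it makes no appeal to large-sample approximations, which is why it suits samples of 15 to 20 cases.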
Many researchers have noted the following important advantages of large samples:
First, the larger the sample the more likely are its mean values and other parameters to reflect accurately the sampling universe, provided that random sampling is employed. A random sample of 20 cases out of a thousand can easily be "off centre" in terms of the age distribution, types of households, and other parameters. Fifty or a hundred cases is statistically more likely to reflect the larger population accurately.
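The point can be illustrated with a small simulation. The sketch below assumes an invented universe of 1,000 household heads with ages drawn uniformly between 18 and 80; the function name, the age range, and the number of repeated draws are all assumptions made for illustration. It repeatedly draws random samples of 20, 50, and 100 cases and reports how widely the sample means scatter.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, draws=2000, seed=1):
    """Standard deviation of the means of repeated simple random samples."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(draws)
    ]
    return statistics.stdev(means)


# Hypothetical sampling universe: ages of 1,000 household heads.
universe_rng = random.Random(0)
population = [universe_rng.randint(18, 80) for _ in range(1000)]

for n in (20, 50, 100):
    spread = spread_of_sample_means(population, n)
    print(f"n = {n:3d}: sample means vary with SD of about {spread:.2f} years")
```

The scatter of the sample means shrinks as the sample grows, which is the sense in which larger random samples are less likely to be "off centre."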
Second, as is well known, the larger the sample, the more likely that a "borderline" association between variables X and Y will be "statistically significant." That is, an association of .25 in a small sample may have a probability of .08, whereas exactly the same degree of association in a larger sample could have a probability of .04. Simple common sense might dictate that, wherever possible, one would seek to use large samples.
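The arithmetic behind this point can be sketched with the Fisher z approximation for a correlation coefficient. This is one common approximation, chosen here for simplicity and not named in the text; the function name and the sample sizes of 50 and 68 are illustrative assumptions. Under this approximation an association of .25 has a two-sided probability of about .08 with 50 cases and about .04 with 68 cases.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation r observed in n cases.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate under the hypothesis of no association.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals twice the upper-tail normal probability.
    return math.erfc(abs(z) / math.sqrt(2))


for n in (50, 68, 200):
    print(f"r = .25, n = {n:3d}: p = {correlation_p_value(0.25, n):.3f}")
```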
Some caveats about large samples should be considered, however. Paul Meehl (1967) has pointed out that the larger one's sample, the more likely it is that one will find statistically significant differences, even trivial ones. To demonstrate this idea, Meehl and associates examined a sample of 55,000 high-school students, testing for intercorrelations among all conceivable pair-wise relationships. They found "statistically significant" relationships in 91 per cent of all the pairs tested. Most of those correlations were rather small in actual magnitude, and probably reflected various extraneous, hidden variables, as well as errors in measurement.
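Meehl's observation can be made concrete by asking how small a correlation reaches the conventional .05 level at a given sample size. The sketch below again assumes the Fisher z approximation and a two-sided critical value of 1.96; the function name is hypothetical and the calculation is an illustration, not a reconstruction of Meehl's own analysis.

```python
import math


def smallest_significant_r(n, z_crit=1.96):
    """Smallest |r| that is significant at the two-sided .05 level for n cases."""
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (20, 100, 1000, 55000):
    print(f"n = {n:6d}: |r| must exceed about {smallest_significant_r(n):.3f}")
```

With 55,000 cases a correlation below .01 already passes the threshold, whereas with 100 cases the association must approach .20, which is why statistical significance alone says little about magnitude in very large samples.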
In a similar vein, Hays (1963, p. 333) has stated that:
There is a real danger in detecting trivial associations as significant results when the sample size is very large. If the experimenter wants significance to be very likely to reflect a sizeable association in his data, and also wants to be sure that he will not be led by a significant result into some blind alley, then he should pay attention to both aspects of sample size.
Above all, nutritional anthropologists must seek meaningful theoretical models and practical, applicable data. The goals are seldom achieved simply through demonstration of a statistically significant correlation in and of itself. Associations among variables must be of sufficient magnitude to give them both theoretical and policy-shaping importance.
Given that relatively small samples are the usual pattern in anthropological research, investigators cannot afford to "waste" cases. In the first place, to ensure a sample of 100 for data analysis, it is usually necessary to initiate structured data-gathering with at least 120 cases, to allow for losses due to mobility, later refusals, and other factors. Every effort must be made to maintain rapport with individual households in order to avoid "drop-outs," and to maintain data quality control. In some long-term projects, in which research households are visited numbers of times over several months, it may be useful to have specially trained local "public relations persons" who periodically visit households to identify any complaints or problems and to smooth over the inevitable tensions and misunderstandings that arise in interactions between fieldworkers and local families.
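The allowance for losses is simple arithmetic, sketched below. Starting with 120 households to finish with 100 corresponds to an expected loss of one household in six; the function name and the other loss rates shown are assumptions for illustration, and the appropriate rate will depend on local conditions.

```python
from fractions import Fraction
from math import ceil


def initial_sample_size(target_cases, expected_loss):
    """Households to enrol so that about target_cases remain after losses.

    expected_loss is the anticipated proportion lost, given as a string or
    Fraction (for example "0.15" or "1/6") so that the arithmetic stays exact.
    """
    retained = 1 - Fraction(expected_loss)
    if retained <= 0:
        raise ValueError("expected_loss must be below 1")
    return ceil(Fraction(target_cases) / retained)


for loss in ("0.10", "1/6", "0.25"):
    print(f"expected loss {loss}: start with {initial_sample_size(100, loss)} households")
```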
In addition to those precautions, initial sample selection should ensure that every household or individual truly fits the criteria intended in data analysis. If a sample, for example, includes a small number of individuals of an ethnic group different from the one that constitutes the rest of the sample and if the sample is too small for separate statistical analysis, that complication in the sample should be eliminated. Similarly, in a study focused on infant feeding only those households with infants should be selected.
Selection of cases should be aimed to maximize variation in the key dependent variable of interest. For example, in a study of determinants of hypertension in St. Lucia (a population with generally high rates of hypertension), Dressler chose to focus on a sample of 40- to 50-year-old males, the category that previous research had indicated would have the highest variation in blood pressures (Dressler, 1979).
Different sampling strategies must be developed when a dependent variable or some aspect of research design involves a relatively infrequent condition in a population. For example, a study of food-use behaviours among diabetic people would be ill-served by simple random sampling, as the number of cases turned up would be small relative to the overall sample. Such research requires case finding as an early step, before sample selection. If the population is quite large (in a city, for example), a relatively large number of cases may be identified, from which a random sample of diabetics can then be drawn. In such research, a corresponding, matched sample of non-diabetics may be needed, if the logic of the research design involves hypotheses concerning the ways in which the diabetics are "different" in food-use patterns from the general population. Such a matched sample should correspond to the diabetic population as closely as possible in age, sex, neighbourhood, and socio-economic status, as well as other relevant variables when possible.
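One way to assemble such a matched sample is sketched below: each case is paired with the closest unused comparison person of the same sex, neighbourhood, and socio-economic level whose age falls within a set tolerance. The field names, the five-year tolerance, the greedy pairing rule, and the records themselves are all hypothetical; other matching schemes are equally defensible.

```python
def match_controls(cases, candidates, age_tolerance=5):
    """Greedy one-to-one matching on sex, neighbourhood, SES, and age."""
    unused = list(candidates)
    pairs = []
    for case in cases:
        eligible = [
            c for c in unused
            if c["sex"] == case["sex"]
            and c["neighbourhood"] == case["neighbourhood"]
            and c["ses"] == case["ses"]
            and abs(c["age"] - case["age"]) <= age_tolerance
        ]
        if eligible:
            # Prefer the candidate closest in age, then retire that candidate.
            best = min(eligible, key=lambda c: abs(c["age"] - case["age"]))
            unused.remove(best)
        else:
            best = None  # no acceptable match; flag for further case finding
        pairs.append((case, best))
    return pairs


# Hypothetical records for illustration only.
cases = [
    {"id": "D1", "age": 52, "sex": "F", "neighbourhood": "north", "ses": "low"},
    {"id": "D2", "age": 47, "sex": "M", "neighbourhood": "south", "ses": "middle"},
]
candidates = [
    {"id": "N1", "age": 55, "sex": "F", "neighbourhood": "north", "ses": "low"},
    {"id": "N2", "age": 50, "sex": "F", "neighbourhood": "north", "ses": "low"},
    {"id": "N3", "age": 60, "sex": "M", "neighbourhood": "south", "ses": "middle"},
]

pairs = match_controls(cases, candidates)
for case, control in pairs:
    print(case["id"], "->", control["id"] if control else "unmatched")
```

Cases left unmatched indicate where the pool of potential comparison persons needs to be enlarged before the design can proceed.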
Nested Sampling Strategies
Whenever possible, researchers should seek to compromise between large and small samples by means of a strategy of "nested samples." The simplest example is one in which the researcher carries out a full census of all households in a target area, from which a smaller random sample is selected. The census makes possible very clear specification of the ways in which the random sample "fits with" or does not truly represent the larger universe. If a full-scale, 100 per cent census of a region is either impossible or impractical, a fairly ambitious survey, considerably larger than one's final sample, can be useful for describing certain broad features of the research population.
Farther along in the research process, it is often useful to define specific subsamples of one's research population for special attention. Within one's study sample, special attention may be focused on, and special research procedures may be used in, specific subsets, such as single-parent households and employed mothers. In some instances, special subsets of 15 to 20 cases may be the focus for especially intensive additional data-gathering.
Quite often, intriguing questions arise in the course of data analysis, for which callbacks to selected households might provide important additional explanations. Unexpected results in interview data always crop up, and these may be explored to some degree with further data from small subsamples of households. Such specialized follow-up probes gain strength if they represent a random subsample of the sample or perhaps a random sample of a subset, for example, all the women who work outside the home. Still another variation is to gather intensive ethnographic materials on a small series of cases that reflect major variations in the data set, perhaps one each in large, medium, and small families, or two each in high-, medium-, and low-income families, and so on. In a similar vein, if it is possible to gather biological and clinical data on a small sample of cases (individuals or households), such a special sample should, whenever possible, be drawn from the already well-studied sample so that researchers have maximum opportunities to relate the bio-clinical variables to the broader spectrum of socio-cultural and economic data (see also Robbins et al., 1969).
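The nesting described in this section can be sketched in a few lines: a census, a random sample drawn from it, and a random follow-up subsample drawn from one subset of that sample. The census of 600 households, the sample of 120, the follow-up group of 15, and the variable names are all invented for illustration.

```python
import random

rng = random.Random(42)

# Level 1: hypothetical census of every household in the target area.
census = [
    {
        "id": i,
        "household_size": rng.randint(1, 10),
        "mother_employed": rng.random() < 0.3,
    }
    for i in range(600)
]

# Level 2: simple random sample drawn from the census.
sample = rng.sample(census, 120)

# Level 3: random follow-up subsample of one subset of the sample.
employed = [h for h in sample if h["mother_employed"]]
follow_up = rng.sample(employed, min(15, len(employed)))


def mean_size(households):
    """Mean household size, used to check how well the sample fits the census."""
    return sum(h["household_size"] for h in households) / len(households)


print(f"census mean size: {mean_size(census):.2f}")
print(f"sample mean size: {mean_size(sample):.2f}")
print(f"follow-up cases:  {len(follow_up)}")
```

Because every level is drawn from the one above it, any variable recorded in the census can be used to state how far the sample, or the follow-up group, departs from the larger universe.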
In most societies, the household is the primary unit within which economic, social and cultural resources are organized and applied to human needs (Scrimshaw and Pelto, 1979). As anthropologists have become more quantitative in research approaches, the natural unit for sampling has generally been the household. Exceptions arise in research such as that dealing with special clinical populations and users of special facilities or programmes. However, even in those exceptions, researchers should bear in mind that households nearly always contain the most important micro-environmental factors affecting individual behaviours. This fact is especially important in nutritional anthropology, as households are practically by definition the prime units within which food is acquired, stored, shared, and consumed. Most research in nutritional anthropology requires a variety of household data, beginning with household composition and socio-economic status, but including a variety of other features. This is not to suggest that all action takes place in the household; some studies have been weakened through failure to examine important extra-household food patterns. None the less, when the focus is on individual behaviours and features, for example, in studies of malnourished children, lactating women, or diabetic patients, significant aspects of their food-use activities and beliefs are constrained within the household structure. Complex regional, national, and international processes and programmes have their ultimate expressions in individual consumption behaviours, carried into the household through contacts by household members at stores and other facilities, and affected by mass media that reach the household through television, newspapers, leaflets, and sometimes special contact persons.
At the same time, all researchers nowadays should be sensitive to the ways in which forces from the larger economic, social, and political system make their impact on local behaviours. The focus on households should not be used to screen out the effects of broader system features. Rather, the challenge is to identify ways in which one can locate those broader "system effects" in their expression within households. Thus, researchers should expend effort to analyse linkages between micro-level phenomena of food behaviours and the macro-levels of food distribution systems and other forces in order to link the data of nutritional anthropology to broad, policy-relevant materials (Marchione, 1977).
Basic Variables in the Household
Certain household-level variables are so consistently important in affecting behaviours that they should be routinely included in variable lists regardless of the specific areas of topical focus. Measures of economic resources and socio-economic status are certainly indispensable, no matter how seemingly similar the economic levels in a region or community. DeWalt, for example, found wealth differences to be important in a small community of farmers in Mexico where all were supposed to have had access to land plots of the same size when land redistribution occurred (DeWalt, 1979).
The following partial list reviews some of the usual variables included in any reasonably comprehensive data base in which households are the research units:
This is only a very partial listing of frequently encountered factors in community research. Each of these suggested variables, or blocks of variables, will have applicability depending on special features found in the research communities. Careful ethnographic research is always essential for identifying locally relevant variables and for identifying specific indicators for measuring them.
In some instances researchers might respond to such lists with the comment that "I'm not interested in migration, wage labour, and these other factors." A caveat is in order: when certain variables or conditions are not part of one's theoretical interests, it is none the less essential that these seemingly extraneous variables be under some sort of descriptive control. It is, of course, impossible to include every relevant factor in one's research design, but to ensure overall credibility those major confounding variables can be controlled statistically, as well as explained and put in perspective through ethnographic observations.
One of the legacies of traditional anthropological research has been the persistence of "culture traits" or behavioural units that are portrayed as categorical or nominal in basic character. People either have them or they do not. Thus, households are "nuclear or extended" or sometimes "matrifocal"; residence is "neolocal, patrilocal, matrilocal or bifocal"; ethnic identity is a single categorical label; communities are either rural or urban. These typological features are still widely found in anthropological literature, and give rise to persistent debates, often because of the difficulties of fitting particular cases into the confines of the categories. The persistence of such "categorical thinking" in anthropology has sometimes led people to the view that cultural systems are qualitatively different from biological and physical systems.
In recent years, however, quantitatively oriented researchers have developed many scales and indices for key variables that can be expressed as ordinal values - even in instances that previously seemed very categorical in their nature. Socio-cultural system features such as matrilineal and patrilineal have been expressed as variations along complex dimensions (White, 1967), and it is now common to label cultural groups in terms of "degree of horticultural pursuits" or "per cent fishing activity" and so on. Degree of religiosity of households and individuals has been operationalized in the form of Guttman scales (Ness, 1976) as well as additive indices (DeWalt and Pelto, 1985). "Degree of acculturation" seems obviously a continuous variable, despite the tendency for some researchers to convert their data into typologies of "native-oriented," "dualistic," and other categories.
Padilla (1980) has measured "degree of ethnicity" among Mexican Americans, using a complex series of linguistic, behavioural, and other indicators. In a similar vein, Garcia and Lega (1979) have developed a scale of Cuban ethnic identity, using a series of items about food preferences, knowledge of national cultural heroes, and choices of favourite music.
Refinement of social, cultural, and economic variables as ordinal scales often provides stronger statistical analysis and avoids the embarrassments of trying to wrestle numerically with recalcitrant nominal categories. Scales and ratings of "nutritional knowledge," "dietary complexity," "adherence to traditional food patterns," and other cognitive components are also made more effective when they can be constructed as ranges of variation expressed by a series of items in an interview. These continuous variables then "work better" in relation to biological measures such as anthropometry and clinical observations (see Note 1).
Procedure for Constructing Variables in Field Studies
The main steps in constructing any variable in the field are the following:
The emphasis in this paper has been on developing strong quantitative materials for statistical testing of research questions. It must be understood, however, that those data depend on strong qualitative ethnographic work, especially in connection with the search for effective, locally relevant data that can be transformed into quantified form. The kinds of research strategies suggested here are simply impossible to achieve unless they are developed through detailed ethnographic field-work.
In recent years a great deal of argument has been waged over the relative usefulness of so-called "emic" and "etic" approaches to data gathering. Conventionally, "emic" refers to variables and constructs that reflect locally defined meanings and categories, in terms of the "insiders' or natives' point of view." Although the concept of emic data has sometimes been invoked to emphasize cultural uniqueness, the more usual (and useful) attitude is that local, culturally relevant meanings of events and behaviours can be ascertained through ethnographic field research and then transformed into more generalizable, cross-cultural terms. Thus, "emic" language becomes the basis for more effective definition of our "etic" constructs.
Exploring a concept such as "degree of ethnicity" provides an apt example of the translation process (Garcia and Lega, 1979). Garcia and Lega explored a number of emically relevant items of Cuban identity in North American settings and isolated indicators such as:
These terms, they found, had special meaning for Cubans, as contrasted with other Spanish-speaking persons, in the United States. Individuals' responses to these questions could, therefore, be combined to form a scale of "Cubanness," in which persons who answered all the questions correctly were considered to be "more Cuban" than persons who lacked the special cultural information embedded in the items.
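Combining responses into such a scale amounts to counting correct answers, as in the sketch below. The item labels, the answer key, and the respondents are placeholders invented for illustration; they are not the items used by Garcia and Lega, and a simple additive count is only one of several ways a scale might be scored.

```python
def additive_scale(responses, answer_key):
    """Count the items a respondent answered in agreement with the key."""
    return sum(
        1 for item, correct in answer_key.items()
        if responses.get(item) == correct
    )


# Placeholder items and answers; the real questionnaire items are not shown here.
answer_key = {"item_1": "a", "item_2": "c", "item_3": "b", "item_4": "a"}

respondents = {
    "R01": {"item_1": "a", "item_2": "c", "item_3": "b", "item_4": "a"},
    "R02": {"item_1": "a", "item_2": "b", "item_3": "b"},
    "R03": {"item_1": "d", "item_2": "a", "item_3": "c", "item_4": "b"},
}

scores = {rid: additive_scale(resp, answer_key) for rid, resp in respondents.items()}
for rid, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(rid, score)
```

The resulting scores form an ordinal variable that can be related to other household and individual measures, which a single categorical ethnic label cannot.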
In a somewhat similar way, Johnson (1978) examined locally relevant definitions of soils and land types on a Brazilian plantation. The people classified lands into a variety of categories for which the dimensions of "strong-weak" and "hot-cold" were particularly relevant. After obtaining informants' statements about the appropriate crops to be planted in the different types of land, Johnson tallied actual plantings to check the fit between the emic definitions and actual behaviours. He was able to show that there was indeed a high degree of correspondence between types of lands and the stated "appropriate crops" planted. As with Cuban ethnicity, the transformation of "emic" data into quantifiable categories for statistical analysis was readily accomplished once the basic ethnographic work had been completed (Johnson, 1978).
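A tally of this kind can be sketched as a simple agreement rate between stated rules and observed plantings. The land types, crops, and plot records below are invented placeholders, not Johnson's data, and the function name is hypothetical.

```python
from collections import Counter


def agreement_rate(plantings, appropriate_crops):
    """Share of observed plantings that follow the informants' stated rules."""
    matches = sum(
        1 for land_type, crop in plantings
        if crop in appropriate_crops.get(land_type, set())
    )
    return matches / len(plantings)


# Placeholder emic rules: which crops informants say belong on which land type.
appropriate_crops = {
    "land_type_A": {"crop_1", "crop_2"},
    "land_type_B": {"crop_3"},
}

# Placeholder field observations: (land type, crop actually planted).
plantings = [
    ("land_type_A", "crop_1"),
    ("land_type_A", "crop_2"),
    ("land_type_A", "crop_3"),
    ("land_type_B", "crop_3"),
    ("land_type_B", "crop_3"),
]

print(Counter(plantings))
print(f"agreement with stated rules: {agreement_rate(plantings, appropriate_crops):.0%}")
```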
In the area of nutrition and food behaviours researchers will often alternate between emic and etic approaches to data, particularly because dietary patterns and related food ideas must be initially understood in local, "native" terms. The chapter by Goode in this volume reviews the ways in which meal patterns, recipes, and other food data are likely to prove quite misleading unless qualitative ethnographic research is invested in the study of emic definitions and categories. The inventorying of locally available, wild and domesticated foods is another example of a research task that depends heavily on an emic approach (Messer, 1977). Once the emic categories of foods are sorted out, various methods for eliciting food intakes can be modified to facilitate transformations into etic analysis of dietary and nutritional components.
The aims of some research projects in nutritional anthropology may not correspond at all to the quantitative strategies outlined here. Some projects are focused on describing specific cultural features of food use, for which a comprehensive household analysis may be unnecessary. Detailed explorations of emic food domains or "cognitive maps" may be carried out with key informants in a manner that entails little in the way of comprehensive sampling across the community.
On the other hand, whenever researchers embark on hypothesis-testing research concerning food use and nutritional issues, the logic of research design, including definitions of variables and selection of adequate samples, must be considered. Often budgetary considerations will restrict research to smaller samples, or shorter time-periods than would be optimal, but these limitations in the scope of research should be considered carefully, with knowledge of the broader array of data needed for full effectiveness. Often small-scale projects can be designed in an open-ended manner, permitting later additions and expansions, including replication in neighbouring communities.
The research styles described here are quite eclectic and involve a mix of qualitative and quantified methodology. In addition to descriptive work at the community level, the ethnographic task should also include description of broader economic and political networks that affect the community. The relationships of the local communities or regions to such national and international networks must be described because they help to shape food production, distribution, and consumption and because they help to link the local area to macro-level processes.
The linkages between local, micro-level food patterns and the macro-levels of commercial production and distribution systems require types of research that are as yet poorly developed in the social sciences. We often find excellent, detailed analyses of food use and nutrition patterns (micro-level) linked only impressionistically and anecdotally to national-level processes. Thus, the carefully prepared local-level data are difficult to use effectively in policy making. Nutritional anthropologists, along with other social-science researchers, can strengthen the significance of community- and region-based research by developing new methodologies for analysis of the linkages, the systemic interrelationships, between micro-level and macro-level phenomena (DeWalt and Pelto, 1985).
Many research programmes of the 1980s place nutritional anthropologists in partnership with medical personnel, agricultural scientists, economists, and other types of specialists. One of the common denominators that aids interdisciplinary communication is the ability to relate social and cultural variables directly to the data and designs of other disciplines. Whereas in the past the anthropologists' task has sometimes ended with the provision of qualitative ethnographic descriptions, leaving the other disciplines to struggle with issues of quantification, it is more productive when each discipline is involved in quantified data in the overall design.
Practical, applied research on issues of food and nutrition involves the study of complex systems of interaction in which the variables and features selected for study always account for only a fraction (sometimes a small fraction) of the total variations of given dependent variables. Complex interactions among different kinds of variables - which long ago led to the anthropological credo of "holistic research" - produce serious difficulties for interpreting interactions, statistically and non-statistically. The complex indeterminacy of human behaviour in natural communities also leads to the conclusion that strong credibility of research results depends on a union of excellent qualitative ethnographic work and strong quantitative methodology.
NOTE
1. Some methodologists have maintained that ordinal data cannot be used in complex statistical analysis in association with biological (interval level) measures, because they do not meet the assumptions required of interval data. On the other hand, a number of statisticians have argued the opposite - that ordinal measures can be used, with appropriate caution, in multiple regressions, path analysis, and other complex statistical analyses (Namboodiri, Carter, and Blalock, 1975). It is also pointed out that the supposedly precise biological data in field studies (e.g. caloric intakes) are often based on observational techniques with no more precision than that used for the socio-cultural measures.
Asher, H. 1976. Causal Modeling. Sage Publications, Beverly Hills, Calif.
Baer, R. 1984. Nutritional Aspects of Commercial Agriculture. Paper presented at annual meeting of the American Anthropological Association, Denver, Colorado, 1984.
Brush, S. 1978. Mountain, Field and Family: Economy and Human Ecology of an Andean Valley. University of Pennsylvania Press, Philadelphia, Pa.
Campbell, D. T., and J. C. Stanley. 1963. Experimental and Quasi-experimental Designs for Research. Rand-McNally, Chicago, Ill.
Cook, T. D., and D. T. Campbell. 1979. Quasi-experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, New York.
DeWalt, B. R. 1979. Modernization in a Mexican Ejido. A Case Study in Economic Adaptation. Cambridge University Press, Cambridge, Mass.
DeWalt, B. R., and P. J. Pelto. 1985. Micro and Macro Levels of Analysis in Anthropology: Issues in Theory and Research. Westview Press, Boulder, Colo.
DeWalt, K., and G. H. Pelto. 1977. Food Use and Household Ecology in a Mexican Community. In: T. K. Fitzgerald, ed., Nutrition and Anthropology in Action. Van Gorcum, Amsterdam.
Dewey, K. G. 1981. Nutritional Consequences of the Transformation from Subsistence to Commercial Agriculture in Tabasco, Mexico. Human Ecol., 9(2): 151-187.
Dressler, W. 1979. Disorganization Adaptation and Arterial Blood Pressure. Medical Anthropol., 3(2): 225-248.
Freeman, L. C. 1965. Elementary Applied Statistics for Students in Behavioral Science. Wiley, New York.
Garcia, M., and L. Lega. 1979. Development of a Cuban Ethnic Identity Questionnaire. Hispanic J. of Behav. Sci., 1(3): 247-262.
Harris, M. 1968. The Rise of Anthropological Theory. Crowell, New York.
Hays, W. L. 1963. Statistics for Psychologists. Holt, Rinehart, & Winston, New York.
Hull, C. H., and N. H. Nie. 1979. SPSS Update: New Procedures and Facilities. McGraw-Hill, New York.
Johnson, A. W. 1978. Quantification in Cultural Anthropology. An Introduction to Research Design. Stanford University Press, Stanford, Calif.
Kandell, R., and G. H. Pelto. 19#0. The Health Food Movement: Social Revitalization or Alternative Health Maintenance System? In: N. Jerome, R. Kandell, and G. Pelto, eds., Nutritional Anthropology. Redgrave, Pleasantville, N.Y.
Kaplan, A. 1964. The Conduct of Inquiry. Chandler Publishing Co., San Francisco, Calif.
Kerlinger, F. N. 1973. Foundations of Behavioral Research. Holt, Rinehart, & Winston, New York.
Klecka, W. R. 1980. Discriminant Analysis. Sage University Paper, 19. Sage Publications, Beverly Hills, Calif.
Langbein, L., and A. J. Lichtman. 1978. Ecological Inference. Sage University Paper, 10. Sage Publications, Beverly Hills, Calif.
Lazerwitz, B. 1968. Sampling Theory and Procedures. In: H. M. Blalock Jr and A. B. Blalock, eds., Methodology in Social Research. McGraw-Hill, New York.
Marchione, T. 1977. Food and Nutrition in Self-reliant National Development: Impact on Child Nutrition of Jamaican Government Policy. Med. Anthropol. 1(1): 57-79.
Meehl, P. 1967. Theory-testing in Psychology and Physics: A Methodological Paradox. Phil. Sci., 34(2): 103-115.
Messer, E. 1977. The Ecology of Vegetarian Diet in a Modernizing Mexican Community. In: T. K. Fitzgerald, ed., Nutrition and Anthropology in Action. Van Gorcum, Amsterdam.
Muñoz de Chavez, M. 1974. The Epidemiology of Good Nutrition. Ecology of Food and Nutrition, 3: 223-230.
Namboodiri, N. K., L. F. Carter, and H. M. Blalock Jr. 1975. Applied Multivariate Analysis and Experimental Designs. McGraw-Hill, New York.
Ness, R. 1976. Illness and Adaptation in a Newfoundland Outport. Ph. D. dissertation. University of Connecticut, Storrs, Conn.
Padilla, A. M., ed. 1980. Acculturation Theory, Models and Some New Findings. Westview Press, Boulder, Colo.
Pelto, G. H. 1984. Report of an Ethnographic Study concerning the Determinants of Infant Feeding Patterns in Northern Cameroon. Consultant Report. Educational Development Center, Newton, Mass.
Pelto, P. J., and G. H. Pelto. 1978. Anthropological Research: The Structure of Inquiry, 2nd ed. Cambridge University Press, Cambridge.
Robbins, M., A. V. Williams, P. L. Kilbride et al. 1969. Factor Analysis and Case Selection in Complex Societies: A Buganda Example. Human Organiz., 28: 227-234.
Rogers, E., and L. Svenning. 1969. Modernization among Peasants: The Impact of Communication. Holt, Rinehart, & Winston, New York.
Russell, B. 1953. On the Notion of Cause, with Applications to the Free Will Problem. In: H. Feigl and M. Brodbeck, eds., Readings in the Philosophy of Science. Appleton, Century, Crofts, New York.
Scrimshaw, S., and G. H. Pelto. 1979. Family Composition and Structure in relation to Nutrition and Health Programs. In: R. Klein, H. Read Riecken et al., eds., Evaluation of the Impact of Nutrition and Health Programs. Plenum Press, New York.
Siegel, S. 1956. Non-parametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.
Susser, M. 1973. Causal Thinking in the Health Sciences. Oxford University Press, Oxford.
White, D. 1967. Concomitant Variation in Kinship Structure. M. A. thesis. University of Minnesota, Minneapolis, Minn.