12. Data recording and processing

Michael Guzman, Ricardo Sibrian, and Rafael Flores

Data recording
Data processing
Concluding remarks


In contrast with laboratory investigations, which commonly give rise to relatively few observations, large-scale nutrition intervention programmes require the collection, orderly handling and management of large quantities of data.

Since the data ultimately constitute the link between the design of the intervention and the evaluation of results, their management and handling clearly merit careful consideration. In this context, the procedures required for the collection of data and their subsequent treatment, including the definition of the plan for analysis and the expected outputs, should be an integral part of the study design. Such procedures should therefore be explicitly defined in the standard operating protocol (SOP) of the study.

Because of the highly specialized nature of the skills required for both proper data management and analysis, it is advisable that specialists in these fields be included in the evaluation staff. Under such an arrangement, these specialists participate fully in the planning and execution of the evaluation.

Some basic procedures relating to various aspects of data recording and processing are described in this chapter. Although the coverage is neither complete nor exhaustive, it is hoped that the topics considered provide general guidelines that may be useful as a frame of reference for identifying appropriate data management procedures under the specific circumstances of a particular study.

The processes to be described can best be summarized in terms of a gross flow chart, as illustrated in figure 12.1 (see FIG. 12.1. Stages of Data Recording and Processing). The different stages depicted here on a macro basis can, and must, be expanded in detail in accord with the conditions pertaining to any specific investigation. Two examples of such expansion are presented later in the text, in connection with the preparation of forms and questionnaires and with the description of the sequence of events in the process of data analysis.

Data recording

The general purpose of data recording is to set in writing and assure the preservation of the data collected in the course of field or laboratory studies.

The experimental design of each study determines the types of data to be collected in terms of the objectives and resources available for the study. The types of data commonly used in field studies often relate, among others, to morbidity, anthropometry, diet, immunology and anthropology. Whatever the nature of the data, however, suitable forms or questionnaires are needed to record the information to be gathered. It is often convenient to prepare these forms or questionnaires by discipline or type of data. Precoded forms or questionnaires that permit the direct registry of data are to be preferred since, with proper training, their use often results in fewer errors. Additionally, only one protocol or set of forms should be used to collect and code the information to be recorded in the field or in the laboratory for each unit of study (e.g., family or individual).


Form or Questionnaire Preparation

The objective of this stage is to produce all the needed forms and questionnaires in their final versions, as they will be used in the field or laboratory. These forms and questionnaires must be accompanied by a set of detailed instructions explicitly set out in a coding manual. In general, the preparation of forms or questionnaires involves three steps, which comprise a series of coordinated actions as shown schematically in figure 12.2 (see FIG. 12.2. Coordinated Actions in the Preparation of Forms and Questionnaires for Data Recording).

The forms and questionnaires contain the information needed by both the investigator and the data processing personnel, and generally consist of two parts: a heading and a body.

The heading of the forms or questionnaires includes information needed mainly to prepare appropriate data files in accord with the objectives of the study as defined by the responsible investigator. The heading may, however, also include information that allows the investigator to recall a subject, either for further interviewing or for checking the original recordings. Clearly, the kind of items in this part of the form or questionnaire varies with the nature of the study, but it generally must include information of the type specified in the first 14 items of the scheme suggested in table 12.1.

In table 12.1, the body of the form begins at item 15 and includes the actual data and information required to satisfy the objectives of the study. As many fields and digits as are necessary to complete the recording may be used in the body of the form. However, it is always advisable to consult the personnel who will be responsible for data processing and analysis in order to avoid problems related to data management.

TABLE 12.1. Sample Questionnaire Form

Item  Identification                                     Field position
----  -------------------------------------------------  ---------------------
1     General information (i.e., protocol page number)   Open (not for coding)
2     General information (i.e., name of subject)        Open (not for coding)
3     General information (i.e., address of subject)     Open (not for coding)
4     Study identification                               1-3
5     Area (type of data)                                4-5
6     Form identification                                6-7
7     Date                                               8-13
8     Examiner/Interviewer identification                14-15
9     First level of enquiry (country)                   16-17
10    Second level of enquiry (community)                18-19
11    Third level of enquiry (family)                    20-21
12    Individual identification                          22-25
13    Sex                                                26
14    Birth date                                         27-32
15    Data                                               33-
. . .  . . .                                             . . .
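The field positions in table 12.1 amount to a fixed-width record layout, which can be sketched in code. The following is a minimal illustration in Python, assuming each coded form is stored as one line of text; the field names and the function `parse_record` are hypothetical, and only the positions are taken from the table. Dates are kept as six-character strings because the table does not state their format.

```python
# Field positions from table 12.1 (1-based, inclusive).
# The field names are illustrative assumptions, not part of the table.
HEADING_FIELDS = {
    "study": (1, 3),
    "area": (4, 5),
    "form": (6, 7),
    "date": (8, 13),
    "examiner": (14, 15),
    "country": (16, 17),
    "community": (18, 19),
    "family": (20, 21),
    "individual": (22, 25),
    "sex": (26, 26),
    "birth_date": (27, 32),
}
BODY_START = 33  # item 15: the data begin at position 33


def parse_record(line: str) -> dict:
    """Split one coded record into its heading fields and its data body."""
    if len(line) < BODY_START - 1:
        raise ValueError("record is shorter than the 32-character heading")
    # Convert 1-based inclusive positions to Python slices.
    record = {
        name: line[start - 1:end]
        for name, (start, end) in HEADING_FIELDS.items()
    }
    record["body"] = line[BODY_START - 1:]
    return record
```

Keeping the layout in a single table-like structure means that a change agreed with the data processing personnel only has to be made in one place.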

Some general comments about the heading or identification portion of the data record are in order. Each study and type of data or area should be assigned a code. For each type of data there may be as many forms as needed for complete recording, and therefore, each form also requires the assignment of a proper code identification. Since the study sample generally relates to country, region, community or similar geographic location classes, these items also must be identified a priori with specific codes. The data processing personnel, in the computer center or elsewhere, who will be responsible for handling the data for a given study should collaborate with the investigator in the assignment of these codes since, as stated earlier, these will be used mainly to organize and control the files and expected outputs within the system of data processing.

With the above information, the investigator will prepare a first version of the forms or questionnaires and a first version of the corresponding coding manual. In particular, the coding manual must provide specific answers to the following questions:

  1. How is the form or questionnaire to be filled?
  2. How is each item included in the form or questionnaire to be coded?
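The second question can be made concrete by holding the coding manual in a machine-readable form, so that every recorded value can be checked against the codes it allows. The sketch below is only an illustration in Python: the items, the code values and the function `check_codes` are hypothetical and would be replaced by those agreed for the actual study.

```python
# Hypothetical codebook: items and code values are illustrative only.
CODEBOOK = {
    "sex": {"1": "male", "2": "female", "9": "not recorded"},
    "area": {"01": "anthropometry", "02": "diet", "03": "morbidity"},
}


def check_codes(record: dict) -> list:
    """Return (item, value) pairs whose value is not allowed by the codebook.

    An item that is absent from the record is reported with the value None.
    """
    problems = []
    for item, allowed in CODEBOOK.items():
        value = record.get(item)
        if value not in allowed:
            problems.append((item, value))
    return problems
```

A check of this kind can be run during the field test of the forms to reveal codes that interviewers use but the manual does not define.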

Once the researcher has developed the first version of the forms and questionnaires, the next steps involve the application of procedures for testing and revising the original drafts. For this purpose the investigator will use a small sample (10-20 experimental units) to actually carry out the complete process of data collection; in the process the investigator will check all forms and questionnaires for ease of handling and use under field conditions. The adequacy of instructions and codes in the actual process of recording data also will be tested at this time.

The field tests will permit proper adjustments and improvement of the recording forms and accessory materials, prior to preparing them for production in sufficient volume to satisfy the needs of the study. The investigator must also consult with the personnel responsible for processing data prior to producing the definitive versions of the forms and questionnaires to be used in the evaluation. In the particular case of questionnaires, their reliability should be scrutinized using appropriate test-retest procedures (1). The testing required for developing the forms and questionnaires offers the opportunity to include activities related to the training and coordinating of examiners and interviewers. Otherwise, the training and standardization procedures must take place later, but always prior to the initiation of actual data collection (2, 3).
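As an illustration of a test-retest check, the answers given by the same subjects to one categorical question on two occasions can be compared with a chance-corrected agreement index. The text does not name a particular statistic; the sketch below uses Cohen's kappa as one common choice, and the function name and input format are assumptions.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two rounds of answers to one categorical question.

    `first` and `second` hold the coded answers of the same subjects,
    in the same order, from the test and the retest.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both rounds must have the same, non-zero length")
    n = len(first)
    # Proportion of subjects giving the same answer on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal frequencies.
    count_first, count_second = Counter(first), Counter(second)
    expected = sum(count_first[k] * count_second[k] for k in count_first) / (n * n)
    if expected == 1.0:
        return 1.0  # every answer in both rounds fell in one category
    return (observed - expected) / (1.0 - expected)
```

Values near 1 indicate that a question is answered consistently, while values near 0 indicate agreement no better than chance and suggest that the question needs revision.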


Data Collection

Data collection can be initiated when the personnel responsible for it have been properly trained and have reached a satisfactory level of standardization. In addition, forms, questionnaires and coding manuals must be considered operational. The description of the recording forms, and of the techniques and procedures to be employed, should be integrated into a standard operating protocol (SOP) for the evaluation (2). In the course of long-term studies, changes in procedure may be mandatory. Accordingly, it is advisable to produce the SOP in loose-leaf form so that insertions can be made as required. In this connection, however, it is essential that all changes introduced in the course of the evaluation be fully documented in terms of justification, nature of the change and date of implementation.
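Where the SOP is also kept in electronic form, the three elements required for each change (justification, nature of the change and date of implementation) can be held in a simple change log. The following Python sketch is an assumption about how such a log might look; the class `SopChange`, its `section` field and the function `changes_in_force` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SopChange:
    section: str          # part of the SOP affected (hypothetical field)
    nature: str           # what was changed
    justification: str    # why the change was needed
    implemented_on: date  # date the change took effect in the field


def changes_in_force(log, on):
    """Return the changes already implemented on the date `on`, oldest first."""
    applicable = [change for change in log if change.implemented_on <= on]
    return sorted(applicable, key=lambda change: change.implemented_on)
```

With such a log, the analyst can establish which version of a procedure applied to any record from the date in its heading.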

Several types of errors may arise during the data collection stages and may produce biases that affect the interpretation of results. These errors are generally associated with failure to complete interviews, missing data, interviewer mistakes and, on the part of the respondents, conceptual misunderstandings, lack of knowledge and intentional misrepresentation of the truth. To minimize the effects of these factors, special attention must be given to proper supervision throughout the data collection stages. Emphasis should be placed on correct household selection, formulation of questions, recording of answers and the application of proper follow-up procedures to reduce non-responses. Supervision can take place through direct observation by field supervisors, through live recording of the interviews, or both (4). In any case, full documentation of the execution of all aspects and levels of activity is essential. This includes field procedures and data collection, editing, input and analysis. In particular, causes of missing data must be fully documented, since such information is essential for identifying possible biases arising from sample attrition.
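One way to use the documented causes of missing data is to tabulate them by sampling unit, so that an uneven pattern of attrition becomes visible. The sketch below assumes that each record carries a community code and a cause of missing data (None when the record is complete); these key names and the function `missing_data_summary` are hypothetical.

```python
from collections import Counter


def missing_data_summary(records):
    """Count the documented causes of missing data within each community.

    Each record is a dict with the keys 'community' and 'missing_cause';
    'missing_cause' is None (or absent) when the record is complete.
    """
    summary = {}
    for record in records:
        cause = record.get("missing_cause")
        if cause is None:
            continue  # complete record, nothing to tabulate
        summary.setdefault(record["community"], Counter())[cause] += 1
    return summary
```

A community in which one cause, such as refusal, dominates would then warrant closer examination before its results are compared with those of other communities.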



The coding stage can be initiated even before the actual collection of data. For example, some items in the heading of the form can be precoded using computer facilities. Computers may also be used to produce self-printed forms that already contain information on the types of data to be collected, the geographic classification (country, community) and the observation unit (family, individual). More generally, however, forms and questionnaires are coded after data collection. In that case, it is advisable that the coding be completed as soon as possible, preferably on the same day that the data were collected.
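Precoding of the heading can be sketched as follows, using the field widths implied by table 12.1. This Python example is only an illustration: the field names and the function `precode_heading` are hypothetical, and the fields that can only be filled in the field (for instance date and examiner) are left blank.

```python
# Heading fields in order, with widths derived from table 12.1.
# The field names are illustrative assumptions.
LAYOUT = [
    ("study", 3), ("area", 2), ("form", 2), ("date", 6), ("examiner", 2),
    ("country", 2), ("community", 2), ("family", 2), ("individual", 4),
    ("sex", 1), ("birth_date", 6),
]


def precode_heading(**known):
    """Build a 32-character heading with the known fields filled in.

    Known values are zero-padded to the field width; all other fields are
    left as blanks to be completed during or after data collection.
    """
    unknown = set(known) - {name for name, _ in LAYOUT}
    if unknown:
        raise KeyError(f"fields not in the layout: {sorted(unknown)}")
    parts = []
    for name, width in LAYOUT:
        if name in known:
            text = str(known[name]).zfill(width)
            if len(text) > width:
                raise ValueError(f"{name}={known[name]!r} exceeds {width} digits")
            parts.append(text)
        else:
            parts.append(" " * width)
    return "".join(parts)
```

For instance, calling `precode_heading(study=1, area=2, form=1, country=3, community=12, family=5, individual=42)` fills positions 1-7 and 16-25 and leaves the remaining heading positions blank.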

