## Analysis of anthropometric variables

Testing for normality

Anthropometric characters tend to be continuous, and many tests are constructed on the assumption that the data approximate a normal distribution. An easy way of seeing whether the distribution is skewed is to compare the values of the mean and median. For a normal distribution the mean and median are numerically identical; as the distribution becomes more skewed, the difference between them increases. There are a number of statistical tests available for testing 'normality' and the researcher may well get different results depending on which test is used. For example, the Kolmogorov-Smirnov test examines the cumulative distribution, which conflates skewness and kurtosis, while the Cox test determines the extent of skewness and kurtosis separately. Since skewness is more constraining than kurtosis, the Cox test is preferable. Nevertheless, with large samples skewness and/or kurtosis may be statistically significant even though the magnitude of the effect(s) is very small.
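The mean-median comparison and the separate assessment of skewness and kurtosis can be sketched in a few lines. This is a minimal illustration that uses moment-based coefficients with their large-sample standard errors (sqrt(6/n) and sqrt(24/n)); it is a generic approach and not necessarily the procedure the text calls the Cox test. The sample is hypothetical, generated only for the example, and the function name `shape_summary` is an assumption.

```python
import math
import random
import statistics


def shape_summary(values):
    """Return mean, median, skewness and excess kurtosis with rough z-scores."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Central moments (population form, adequate for large n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # Large-sample standard errors under normality.
    return {
        "mean": mean,
        "median": median,
        "skew": skew,
        "z_skew": skew / math.sqrt(6.0 / n),
        "kurtosis": excess_kurtosis,
        "z_kurtosis": excess_kurtosis / math.sqrt(24.0 / n),
    }


rng = random.Random(1)
# Hypothetical right-skewed, BMI-like sample; these are NOT the survey data.
sample = [rng.lognormvariate(math.log(21.0), 0.125) for _ in range(4000)]
summary = shape_summary(sample)
print({key: round(value, 3) for key, value in summary.items()})
```

With several thousand observations, even a modest skewness coefficient gives a large z-score, which is the situation described above: statistical significance despite a small effect.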

Table 1. Mean BMIs of mothers by birth outcome

| Birth outcome | n | Mean | SD |
|---|---|---|---|
| Child died | 345 | 20.36 | 2.66 |
| Child survived | 3805 | 21.25 | 2.68 |
| Total | 4150 | 21.18 | 2.69 |

F-test = 1.02, not significant.
t-test = 5.88, P < 0.001.

If the distribution of an anthropometric character does show significant skewness, a simple logarithmic transformation (either log10 or loge) will probably normalize it. For instance, body mass index (BMI, kg/m²) is skewed in some populations because of the extended tail at the upper end of the distribution.
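The effect of a logarithmic transformation on an upper-tailed distribution can be checked directly by comparing the skewness coefficient before and after transforming. The sketch below uses hypothetical values with an extended upper tail, not the survey data; the helper `skewness` is an illustrative name.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
# Hypothetical BMI values with an extended upper tail (NOT the survey data).
bmi = [rng.lognormvariate(math.log(21.0), 0.2) for _ in range(5000)]
# Natural logarithm; math.log10 changes only the scale, not the shape.
log_bmi = [math.log(v) for v in bmi]

skew_raw = skewness(bmi)
skew_log = skewness(log_bmi)
print(f"skewness before: {skew_raw:.3f}, after log transform: {skew_log:.3f}")
```

The choice between log10 and loge does not affect the result, because the two transformations differ only by a constant multiplier.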

Table 2. One-way analysis of variance and a posteriori test

Educational level and BMI in Bangladesh

| Group | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Educational level | None | Primary | Secondary | Tertiary |
| Mean | 20.32 | 20.79 | 21.41 | 22.24 |
| n | 1182 | 698 | 1355 | 915 |

• Analysis of variance

| Source | d.f. | Sum of squares | Mean squares | F ratio | P |
|---|---|---|---|---|---|
| Between groups | 3 | 2079.1 | 693.20 | 102.37 | <0.0001 |
| Within groups | 4146 | 28076.4 | 6.77 | | |
| Total | 4149 | 30155.75 | | | |

• Multiple range test: Student-Newman-Keuls procedure
*Denotes pairs of groups significantly different at the 0.050 level

| Mean | Group | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 20.32 | Group 0 | | | | |
| 20.79 | Group 1 | * | | | |
| 21.41 | Group 2 | * | * | | |
| 22.24 | Group 3 | * | * | * | |

Cross-sectional statistical analyses

To illustrate the types of tests which can be used, data from a large Bangladeshi survey of 4150 mother-child pairs in which mothers' anthropometric data were related to birth outcome have been used. The study was conducted in 10 medical centres in Bangladesh and all the women were full term. Mothers with antepartum haemorrhage, or undergoing miscarriage and abortion, multiple pregnancy, eclampsia or with gross fetal abnormalities were excluded.

Table 3. Analysis of variance of BMI by educational level and gravidity

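Cell means (n in parentheses). Total population: 21.18 (n = 4150)

| Education level | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Mean BMI | 20.32 (1182) | 20.79 (698) | 21.41 (1355) | 22.24 (915) |

| Gravidity | 0 | 1 | 2 | 3 | 4 | 5+ |
|---|---|---|---|---|---|---|
| Mean BMI | 20.98 (1882) | 21.44 (1013) | 21.29 (604) | 21.35 (349) | 21.38 (122) | 20.95 (180) |

| Educational level | Gravidity 0 | Gravidity 1 | Gravidity 2 | Gravidity 3 | Gravidity 4 | Gravidity 5+ |
|---|---|---|---|---|---|---|
| 0 | 20.01 (429) | 20.62 (246) | 20.29 (189) | 20.40 (151) | 21.23 (62) | 20.29 (105) |
| 1 | 20.52 (286) | 20.66 (163) | 20.74 (116) | 21.57 (67) | 21.19 (29) | 21.90 (37) |
| 2 | 21.16 (681) | 21.51 (346) | 21.76 (174) | 22.00 (101) | 22.16 (22) | 21.57 (31) |
| 3 | 21.85 (486) | 22.63 (258) | 22.67 (125) | 23.43 (30) | 21.18 (9) | 23.07 (7) |

• Analysis of variance

| Source of variation | Sum of squares | d.f. | Mean square | F | P |
|---|---|---|---|---|---|
| Education | 2272.92 | 3 | 757.64 | 113.53 | 0.001 |
| Gravidity | 372.50 | 5 | 74.50 | 11.16 | 0.001 |
| 2-way interaction: Education × Gravidity | 168.80 | 15 | 11.25 | 1.69 | 0.047 |
| Residual | 27534.84 | 4126 | 6.67 | | |
| Total | 30155.76 | 4149 | 7.27 | | |

• Multiple classification analysis

Grand mean = 21.18

| Variable + category | n | Unadjusted dev'n | Eta | Adjusted for independents dev'n | Beta |
|---|---|---|---|---|---|
| Education | | | 0.26 | | 0.28 |
| 0 none | 1182 | -0.86 | | -0.94 | |
| 1 primary | 698 | -0.39 | | -0.42 | |
| 2 secondary | 1355 | 0.24 | | 0.27 | |
| 3 tertiary | 915 | 1.06 | | 1.13 | |
| Gravidity | | | 0.08 | | 0.11 |
| 0 | 1882 | -0.20 | | -0.31 | |
| 1 | 1013 | 0.26 | | 0.18 | |
| 2 | 604 | 0.11 | | 0.18 | |
| 3 | 349 | 0.17 | | 0.48 | |
| 4 | 122 | 0.21 | | 0.65 | |
| 5+ | 180 | -0.23 | | 0.31 | |

Multiple R² = 0.081; Multiple R = 0.285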

Continuous dependent variable and an independent variable with 2 categories (t-test and F-test)

One question of interest is whether there is any significant relationship between mothers' BMI and birth outcome, i.e. whether the infant dies. Since there are only two categories (child death or no child death), a simple t-test will suffice. The simple t-test assumes that the sample variances do not differ significantly, so a test for homogeneity of variances (F-test) is usually performed before going on to the t-test. If the F-test shows significant heterogeneity, a separate-variance t-test is used; most computer-based statistical packages (e.g. SPSS/PC+) provide both the pooled and separate-variance t-tests.
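As a rough illustration of the two quantities mentioned here, the variance ratio and the separate-variance t statistic can be computed from group summaries alone. The sketch below uses the means, SDs and group sizes reported in Table 1; the function names are assumptions, and the separate-variance statistic is written in the usual Welch form with Welch-Satterthwaite degrees of freedom, which may differ in detail from a given package's output.

```python
import math


def variance_ratio(sd_a, sd_b):
    """F ratio for homogeneity of variances: larger variance over smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def separate_variance_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Separate-variance (Welch) t statistic and its approximate d.f."""
    va = sd_a ** 2 / n_a
    vb = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Summary values from Table 1: child survived versus child died.
f_ratio = variance_ratio(2.68, 2.66)
t_sep, df_sep = separate_variance_t(21.25, 2.68, 3805, 20.36, 2.66, 345)
print(f"F = {f_ratio:.2f}, separate-variance t = {t_sep:.2f}, d.f. = {df_sep:.0f}")
```

A variance ratio this close to 1 is consistent with the non-significant F-test reported in Table 1, in which case the pooled form of the t-test is the one to read.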

The comparison of mean BMIs of mothers by birth outcome is presented in Table 1. Since there was no significant difference in sample variances, a pooled t-test statistic was calculated. The results show a highly significant difference in means: mothers whose child died have, on average, a lower BMI. In these analyses a two-tailed t-test was used because the null hypothesis (H0) was that there was no difference between means. If, however, some previous study had shown a significantly reduced BMI in mothers whose child had died, a directional alternative hypothesis (H1) would have been tested and a one-tailed t-test used. The calculations of one- and two-tailed t-tests are identical; the only difference is in the interpretation of the probability tables.
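The pooled t statistic and the one- versus two-tailed reading of the same statistic can be reproduced approximately from the summary values in Table 1. This is a sketch under two stated assumptions: the inputs are the rounded means and SDs from the table, so the result is close to but not exactly the published 5.88, and the tail probability uses the standard normal distribution as a stand-in for the t distribution, which is reasonable only because the degrees of freedom run into the thousands.

```python
import math
from statistics import NormalDist


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se, df


# Summary values from Table 1: child survived versus child died.
t_stat, df = pooled_t(21.25, 2.68, 3805, 20.36, 2.66, 345)

# Normal approximation to the t distribution (assumption: very large d.f.).
upper_tail = 1.0 - NormalDist().cdf(abs(t_stat))
p_two_tailed = 2.0 * upper_tail   # H1: the means differ in either direction
p_one_tailed = upper_tail         # H1: one named group has the lower mean

print(f"t = {t_stat:.2f} on {df} d.f.")
print(f"two-tailed P = {p_two_tailed:.2e}, one-tailed P = {p_one_tailed:.2e}")
```

The statistic is computed once; only the tail area read from the distribution changes between the two hypotheses.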

Continuous dependent variable and an independent variable with 3 or more categories (one-way analysis of variance)

It is frequently reported that BMI varies between people with different educational levels, where the educational level is taken as a proxy for a combination of knowledge of health matters and socio-economic status. In Bangladesh it is usual to grade people's educational attainment into four levels: no education (coded as 0 here), primary (1), secondary (2) and tertiary (3). The mean BMIs for the four groups are shown in Table 2 together with the analysis of variance (ANOVA). Many computer packages also include tests of a posteriori differences (i.e. the F-test is significant and the researcher wants to know which means differ significantly). There are a number of a posteriori tests; the one illustrated here is the Student-Newman-Keuls procedure, but other frequently used tests are the Scheffé test and a posteriori t-tests.

The ANOVA shows that there are highly significant differences between the four means. The a posteriori test reveals that every pair of group means differs significantly.
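The between-groups part of the one-way ANOVA can be recovered from the group means and sizes in Table 2. The sketch below does that and then forms the F ratio; the within-groups sum of squares cannot be recomputed without the raw data, so it is taken from Table 2 as given. Because the inputs are rounded means, the result matches the published values only approximately. The function name is illustrative.

```python
def between_groups_ss(means, counts):
    """Between-groups sum of squares and grand mean from group summaries."""
    total_n = sum(counts)
    grand_mean = sum(m * n for m, n in zip(means, counts)) / total_n
    ss = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, counts))
    return ss, grand_mean


# Group means and sizes from Table 2 (educational levels 0 to 3).
means = [20.32, 20.79, 21.41, 22.24]
counts = [1182, 698, 1355, 915]
ss_within = 28076.4  # taken from Table 2; requires raw data to recompute

ss_between, grand_mean = between_groups_ss(means, counts)
df_between = len(means) - 1
df_within = sum(counts) - len(means)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"grand mean = {grand_mean:.2f}")
print(f"between-groups SS = {ss_between:.1f} on {df_between} d.f.")
print(f"F = {f_ratio:.2f} on {df_between} and {df_within} d.f.")
```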

Continuous dependent variables and two independent variables with 2 or more categories (ANOVA)

A slightly more complex analysis is used when the researcher wants to examine the simultaneous effect of two or more discrete characters on a continuous variable. One example is examining the relationship between BMI and both educational level and gravidity. The same categories for educational level are used as described previously. Gravidity has been coded from 0 (primigravida) to 5 (the last category referring to mothers who have 5 or more children). The results of the ANOVA are presented in Table 3. They show significant additive effects of both educational level and gravidity and a borderline significant interaction effect. The multiple classification analysis compares each group with the overall (grand) mean. It is clear, for instance, that the initial pattern of means for gravidity, which shows lower means for primigravida and multigravida (5+) women, changes when educational level is taken into account. The multiple R² provides a measure of how much of the variation in BMI is explained by educational level and gravidity. In this example the two independent variables account for 8.1% of the total variation.
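The unadjusted part of the multiple classification analysis can be reconstructed from the marginal means in Table 3: each deviation is the category mean minus the grand mean, and eta is the square root of the between-category share of the total sum of squares. The sketch below is an illustration under that reading; the adjusted deviations and beta need the additive model fitted to the raw data and are not reproduced. The function name is an assumption.

```python
import math


def unadjusted_mca(means, counts, ss_total):
    """Deviations from the grand mean and eta for one categorical variable."""
    total_n = sum(counts)
    grand_mean = sum(m * n for m, n in zip(means, counts)) / total_n
    deviations = [m - grand_mean for m in means]
    ss_between = sum(n * d ** 2 for n, d in zip(counts, deviations))
    eta = math.sqrt(ss_between / ss_total)
    return deviations, eta


SS_TOTAL = 30155.76  # total sum of squares from Table 3

edu_dev, edu_eta = unadjusted_mca(
    [20.32, 20.79, 21.41, 22.24], [1182, 698, 1355, 915], SS_TOTAL
)
grav_dev, grav_eta = unadjusted_mca(
    [20.98, 21.44, 21.29, 21.35, 21.38, 20.95],
    [1882, 1013, 604, 349, 122, 180],
    SS_TOTAL,
)

print("education deviations:", [round(d, 2) for d in edu_dev], "eta:", round(edu_eta, 2))
print("gravidity deviations:", [round(d, 2) for d in grav_dev], "eta:", round(grav_eta, 2))
```

The gravidity deviations show the pattern described in the text: the first and last categories fall below the grand mean before educational level is taken into account.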

Table 4. Regression analysis of BMI on mother's age

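| Multiple R | R² | Adjusted R² | Standard error |
|---|---|---|---|
| 0.14 | 0.0197 | 0.0194 | 2.6696 |

• Analysis of variance

| Source | d.f. | Sum of squares | Mean square |
|---|---|---|---|
| Regression | 1 | 593.71 | 593.71 |
| Residual | 4148 | 29562.05 | 7.13 |

F = 83.31, P < 0.0001

| Variable | B | SE B | Beta | t | P |
|---|---|---|---|---|---|
| Age | 0.077 | 0.0084 | 0.140 | 9.13 | 0.0001 |
| (Constant) | 19.300 | 0.210 | | 91.94 | 0.001 |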

Table 5. Test for curvilinearity of BMI against mother's age

Step 1. Age entered

| Multiple R | R² | Adjusted R² | Standard error |
|---|---|---|---|
| 0.140 | 0.0197 | 0.0194 | 2.6696 |

• Analysis of variance

| Source | d.f. | Sum of squares | Mean square |
|---|---|---|---|
| Regression | 1 | 593.71 | 593.71 |
| Residual | 4148 | 29562.05 | 7.13 |

F = 83.31, P < 0.0001

| Variable | B | SE B | Beta | t | P |
|---|---|---|---|---|---|
| Age | 0.077 | 0.0084 | 0.140 | 9.13 | 0.0001 |
| (Constant) | 19.300 | 0.210 | | 91.94 | 0.0001 |

Step 2. Age² entered

| Multiple R | R² | Adjusted R² | Standard error |
|---|---|---|---|
| 0.154 | 0.0238 | 0.0233 | 2.6644 |

• Analysis of variance

| Source | d.f. | Sum of squares | Mean square |
|---|---|---|---|
| Regression | 2 | 716.70 | 358.35 |
| Residual | 4147 | 29439.05 | 7.10 |

F = 50.48, P < 0.0001

| Variable | B | SE B | Beta | t | P |
|---|---|---|---|---|---|
| Age | 0.356 | 0.068 | 0.652 | 5.26 | 0.0001 |
| Age² | -0.005 | 0.001 | -0.515 | -4.16 | 0.0001 |
| (Constant) | 15.76 | 0.875 | | 18.00 | 0.0001 |

Continuous dependent variable and a continuous independent variable (regression analysis)

Regression analysis is used to examine the bivariate relationship between two continuous variables when one is dependent on the other, or when the researcher wants to plot the best-fitting line. Alternatively, correlation analysis can be used if there is no dependent/independent relationship. The results of regressing BMI on age are shown in Table 4. There is a clear positive relationship, with BMI increasing with mother's age, and the regression line suggests that for each additional year of age BMI increases by about 0.08 kg/m². It is always advisable to examine the residual plot because, if the association is linear, the residuals will be scattered symmetrically about zero. In this analysis the examination of the residuals for BMI and age (not shown) revealed a curvilinear pattern, which suggests that a quadratic term should be included in the analysis. The next section details how to test for a curvilinear relationship.
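The residual check can be illustrated numerically without a plot: fit the straight line, then compare the average residual in the lower, middle and upper age bands. The sketch below uses hypothetical data generated with a built-in curve (the generating coefficients loosely echo Table 5, and the noise level and age range are arbitrary assumptions); it does not use the survey data. Under a truly linear association all three band averages would be close to zero.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(3)
# Hypothetical ages and BMIs with a curved relationship (NOT the survey data).
ages = [rng.uniform(15, 45) for _ in range(20000)]
bmis = [15.76 + 0.356 * a - 0.005 * a * a + rng.gauss(0, 1.5) for a in ages]

intercept, slope = fit_line(ages, bmis)
residuals = [y - (intercept + slope * x) for x, y in zip(ages, bmis)]


def mean_residual(lo, hi):
    """Average residual for ages in the half-open band [lo, hi)."""
    picked = [r for x, r in zip(ages, residuals) if lo <= x < hi]
    return sum(picked) / len(picked)


young = mean_residual(15, 25)
middle = mean_residual(25, 35)
old = mean_residual(35, 45.01)
print(f"slope = {slope:.3f}")
print(f"mean residuals: young {young:.3f}, middle {middle:.3f}, old {old:.3f}")
```

Negative averages at both ends with a positive average in the middle indicate an inverted-U departure from the straight line, which is the kind of pattern that calls for a quadratic term.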

Table 6. Analysis of variance of BMI with age and age squared, educational level and gravidity

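| Source of variation | Sum of squares | d.f. | Mean square | F | P |
|---|---|---|---|---|---|
| Covariates | | | | | |
| Age | 196.656 | 1 | 196.656 | 29.892 | 0.001 |
| Age² | 122.992 | 1 | 122.992 | 18.695 | 0.001 |
| Main effects | | | | | |
| Education | 1909.395 | 3 | 636.465 | 96.743 | 0.001 |
| Gravidity | 69.508 | 5 | 13.902 | 2.113 | 0.061 |
| 2-way interactions | | | | | |
| Education × Gravidity | 181.932 | 15 | 12.129 | 1.844 | 0.024 |
| Residual | 27131.445 | 4124 | 6.579 | | |
| Total | 30155.752 | 4149 | 7.268 | | |

• Multiple classification analysis

Grand mean = 21.178

| Variable + category | n | Unadjusted dev'n | Eta | Adjusted for independents + covariates dev'n | Beta |
|---|---|---|---|---|---|
| Education | | | 0.26 | | 0.27 |
| 0 none | 1182 | -0.86 | | -0.92 | |
| 1 primary | 698 | -0.39 | | -0.36 | |
| 2 secondary | 1355 | 0.24 | | 0.31 | |
| 3 tertiary | 915 | 1.06 | | 0.99 | |
| Gravidity | | | 0.08 | | 0.05 |
| 0 | 1882 | -0.20 | | 0.08 | |
| 1 | 1013 | 0.26 | | 0.18 | |
| 2 | 604 | 0.11 | | -0.02 | |
| 3 | 349 | 0.17 | | 0.07 | |
| 4 | 122 | 0.21 | | 0.16 | |
| 5+ | 180 | -0.23 | | 0.38 | |

Multiple R² = 0.094; Multiple R = 0.307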

Test for curvilinearity for a continuous dependent variable and a continuous independent variable (regression analysis)

With the inclusion of a quadratic term, the generalized regression equation changes from $Y = a + bX$ to $Y = a + bX + cX^2$. The analyses of BMI against mother's age (linear and quadratic) are presented in Table 5. The quadratic term for age is shown as Age² in Table 5 and it is highly significant (t = -4.162, P < 0.0001), indicating significant curvilinearity. The effect of the negative quadratic coefficient (-0.005) is to lower predicted BMIs at higher ages.
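The significance of the quadratic term can be cross-checked from the two analysis-of-variance tables in Table 5: the increase in the regression sum of squares when Age² is added, divided by the residual mean square of the larger model, gives an F ratio on 1 d.f. whose square root equals the absolute t value of the added term. The sketch below performs that check and also evaluates the fitted curve. The turning point is only indicative, since it is computed from coefficients rounded to three decimals; `predicted_bmi` is an illustrative helper.

```python
import math

# Sums of squares and d.f. from Table 5.
ss_reg_linear = 593.71        # Step 1: age only
ss_reg_quadratic = 716.70     # Step 2: age and age squared
ss_res_quadratic = 29439.05
df_res_quadratic = 4147

# F for adding a single term; its square root is |t| for that term.
ms_res = ss_res_quadratic / df_res_quadratic
f_change = (ss_reg_quadratic - ss_reg_linear) / ms_res
t_abs = math.sqrt(f_change)


def predicted_bmi(age, a=15.76, b=0.356, c=-0.005):
    """Fitted quadratic from Table 5, Step 2 (rounded coefficients)."""
    return a + b * age + c * age ** 2


# Age at which the fitted curve peaks: -b / (2c), using the rounded coefficients.
turning_age = -0.356 / (2 * -0.005)

print(f"F for Age squared = {f_change:.2f}, |t| = {t_abs:.3f}")
for age in (20, 30, 40):
    print(age, round(predicted_bmi(age), 2))
print(f"approximate turning point: {turning_age:.1f} years")
```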

Continuous dependent variable and a continuous independent variable and a number of discrete independent variables (ANOVA or multiple regression analysis)

The previous analyses have shown that there are relationships between BMI and educational level, gravidity and maternal age. The simultaneous effects of these variables can be examined using analysis of variance.

In this analysis of variance the effects of the continuous characters (age and age²) have been removed first, before determining the effects of educational level and gravidity, but researchers are usually free to choose the order in which terms are removed. The results are presented in Table 6 and show that, after removing the linear and quadratic effects of age, the impact of education remains very much as it was, whereas gravidity is no longer significant (P = 0.061). In addition there is a significant interaction between education and gravidity (P = 0.024). About 9% of the variance of BMI is explained by the three variables.
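The proportion of variance explained can be related back to the sums of squares in the ANOVA tables. One reading, offered here as an assumption rather than as the documented method of the statistical package, is that the multiple R² of the multiple classification analysis refers to the additive model, so the interaction sum of squares is set aside along with the residual. That reading reproduces the printed values for both Table 3 and Table 6.

```python
import math


def additive_r_squared(ss_total, ss_residual, ss_interaction):
    """Share of total variation explained by covariates and main effects only."""
    return (ss_total - ss_residual - ss_interaction) / ss_total


# Table 3: education and gravidity.
r2_table3 = additive_r_squared(30155.76, 27534.84, 168.80)
# Table 6: age, age squared, education and gravidity.
r2_table6 = additive_r_squared(30155.752, 27131.445, 181.932)
multiple_r_table6 = math.sqrt(r2_table6)

print(f"Table 3: R2 = {r2_table3:.3f}")
print(f"Table 6: R2 = {r2_table6:.3f}, R = {multiple_r_table6:.3f}")
```

Adding age and age² therefore raises the explained share only from about 8% to about 9%, which is consistent with the small R² of the age regression on its own.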

Multiple regression analysis would give similar results to ANOVA and its use is discussed in the next section.