

Analysis of anthropometric variables


Testing for normality

Anthropometric characters tend to be continuous, and many tests are constructed on the assumption that the data approximate a normal distribution. An easy way of seeing whether the distribution is skewed is to compare the values of the mean and median. For a normal distribution the mean and median are numerically identical; as the distribution becomes more skewed, the difference between them increases. There are a number of statistical tests available for testing 'normality', and the researcher may well get different results depending on which test is used. For example, the Kolmogorov-Smirnov test examines the cumulative distribution, which conflates skewness and kurtosis, while the Cox test determines the extent of skewness and kurtosis separately. Since skewness is more constraining than kurtosis, the Cox test is preferable. Nevertheless, with large samples skewness and/or kurtosis may be statistically significant even though the magnitude of the effect(s) is very small.
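The mean-median comparison and the separate measures of skewness and kurtosis can be sketched in a few lines of Python. This is a minimal illustration only, not one of the formal tests named above: the BMI values are hypothetical, and the function names skewness and excess_kurtosis are chosen here for the example.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment coefficient of kurtosis minus 3 (0 for a normal distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0


# Hypothetical BMI values with a long upper tail (not data from the survey).
bmi = [17.9, 18.6, 19.2, 19.8, 20.1, 20.5, 21.0, 21.7, 23.4, 29.8]

print("mean - median:", round(mean(bmi) - median(bmi), 2))
print("skewness:", round(skewness(bmi), 2))
print("excess kurtosis:", round(excess_kurtosis(bmi), 2))
```

A mean that exceeds the median, together with a positive skewness coefficient, points to an extended upper tail.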

Table 1. Mean BMIs of mothers by birth outcome

Birth outcome      n       Mean     SD
Child died         345     20.36    2.66
Child survived     3805    21.25    2.68
Total              4150    21.18    2.69

F-test = 1.02, not significant.
t-test = 5.88, P < 0.001.

If the distribution of an anthropometric character does show significant skewness, a simple logarithmic transformation (either log10 or loge) will probably normalize it. For instance, body mass index (BMI: kg/m²) is skewed in some populations because of the extended tail at the upper end of the distribution.
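The effect of a logarithmic transformation on an upper tail can be illustrated as follows. The sketch assumes hypothetical BMI values (not survey data) and uses a simple moment coefficient of skewness, defined here as the helper skewness, to compare the raw and log10-transformed values.

```python
from math import log10
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness (0 for a symmetric distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical BMI values with an extended upper tail (not survey data).
bmi = [17.9, 18.6, 19.2, 19.8, 20.1, 20.5, 21.0, 21.7, 23.4, 29.8]
log_bmi = [log10(x) for x in bmi]

raw_skew = skewness(bmi)
log_skew = skewness(log_bmi)
print("skewness before transformation:", round(raw_skew, 2))
print("skewness after log10 transformation:", round(log_skew, 2))
```

The transformation compresses the upper tail, so the skewness coefficient falls; whether it falls far enough to be non-significant has to be checked on the data in hand.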

Table 2. One-way analysis of variance and a posteriori test

Educational level and BMI in Bangladesh

            0        1          2            3
            None     Primary    Secondary    Tertiary
Mean        20.32    20.79      21.41        22.24
n           1182     698        1355         915

• Analysis of variance

Source            d.f.    Sum of squares    Mean squares    F ratio    P
Between groups    3       2079.1            693.20          102.37     <0.0001
Within groups     4146    28076.4           6.77
Total             4149    30155.75

• Multiple range test: Student-Newman-Keuls procedure
*Denotes pairs of groups significantly different at the 0.050 level

Mean     Group      0    1    2    3
20.32    Group 0
20.79    Group 1    *
21.41    Group 2    *    *
22.24    Group 3    *    *    *

Cross-sectional statistical analyses

To illustrate the types of tests which can be used, data from a large Bangladeshi survey of 4150 mother-child pairs in which mothers' anthropometric data were related to birth outcome have been used. The study was conducted in 10 medical centres in Bangladesh and all the women were full term. Mothers with antepartum haemorrhage, or undergoing miscarriage and abortion, multiple pregnancy, eclampsia or with gross fetal abnormalities were excluded.

Table 3. Analysis of variance of BMI by educational level and gravidity

Cell means (n in parentheses):

Total population: 21.18 (n = 4150)

Educational level    0         1         2         3
                     20.32     20.79     21.41     22.24
                     (1182)    (698)     (1355)    (915)

Gravidity            0         1         2         3         4         5+
                     20.98     21.44     21.29     21.35     21.38     20.95
                     (1882)    (1013)    (604)     (349)     (122)     (180)

                     Gravidity
Educational level    0         1         2         3         4         5+
0                    20.01     20.62     20.29     20.40     21.23     20.29
                     (429)     (246)     (189)     (151)     (62)      (105)
1                    20.52     20.66     20.74     21.57     21.19     21.90
                     (286)     (163)     (116)     (67)      (29)      (37)
2                    21.16     21.51     21.76     22.00     22.16     21.57
                     (681)     (346)     (174)     (101)     (22)      (31)
3                    21.85     22.63     22.67     23.43     21.18     23.07
                     (486)     (258)     (125)     (30)      (9)       (7)

• Analysis of variance

Source of variation        Sum of squares    d.f.    Mean square    F         P
Education                  2272.92           3       757.64         113.53    0.001
Gravidity                  372.50            5       74.50          11.16     0.001
2-way interaction:
  Education x Gravidity    168.80            15      11.25          1.69      0.047
Residual                   27534.84          4126    6.67
Total                      30155.76          4149    7.27

• Multiple classification analysis
Grand mean = 21.18

                               Unadjusted        Adjusted for independents
Variable + category    n       dev'n    Eta      dev'n    Beta
Education
  0 none               1182    -0.86             -0.94
  1 primary            698     -0.39             -0.42
  2 secondary          1355     0.24              0.27
  3 tertiary           915      1.06              1.13
                                        0.26              0.28
Gravidity
  0                    1882    -0.20             -0.31
  1                    1013     0.26              0.18
  2                    604      0.11              0.18
  3                    349      0.17              0.48
  4                    122      0.21              0.65
  5+                   180     -0.23              0.31
                                        0.08              0.11
Multiple R²                                               0.081
Multiple R                                                0.285

Continuous dependent variable and an independent variable with 2 categories (t-test and F-test)

One question of interest is whether there is any significant relationship between mothers' BMI and birth outcome, i.e. does the infant die? Since there are only two categories (death or no child death), a simple t-test will suffice. The simple t-test assumes that the sample variances do not differ significantly, so a test for homogeneity of variances (F-test) is usually performed before going on to the t-test. If the F-test shows significant heterogeneity, a separate variance t-test is used; most computer-based statistical packages (e.g. SPSS/PC+) provide both the pooled and separate variance t-tests.

The comparison of mean BMIs of mothers by birth outcome is presented in Table 1. Since there was no significant difference in sample variances, a pooled t-test statistic was calculated. The results show that there is a highly significant difference in means; mothers whose child died have, on average, a lower BMI. In these analyses a two-tailed t-test was used because the null hypothesis (H0) was that there was no difference between means. If, however, some previous study had shown a significantly reduced BMI in mothers whose child had died, the alternative hypothesis (H1) would have been directional and a one-tailed t-test would have been used. The calculations of one- and two-tailed t-tests are identical; the only difference is in the interpretation of the probability tables.
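The variance ratio and the pooled t statistic can be recomputed from the summary figures in Table 1. The sketch below assumes only the published means, standard deviations and group sizes; the function names variance_ratio and pooled_t are illustrative, and small differences from the published values are to be expected because the inputs are rounded.

```python
from math import sqrt


def variance_ratio(sd_a, sd_b):
    """F ratio for homogeneity of variances: larger variance over smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Summary statistics from Table 1 (child survived vs. child died).
f_ratio = variance_ratio(2.68, 2.66)
t_stat, df = pooled_t(21.25, 2.68, 3805, 20.36, 2.66, 345)

print("F ratio:", round(f_ratio, 2))
print("t:", round(t_stat, 2), "with", df, "d.f.")
```

The F ratio is close to 1, which is why the pooled rather than the separate variance t-test is appropriate here.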

Continuous dependent variable and an independent variable with 3 or more categories (one-way analysis of variance)

It is frequently reported that BMI varies between people with different educational levels, where the educational level is taken as a proxy for a combination of knowledge of health matters and socio-economic status. In Bangladesh it is usual to grade people's educational attainment into four levels: no education (coded as 0 here), primary (1), secondary (2) and tertiary (3). The mean BMIs for the four groups are shown in Table 2 together with the analysis of variance (ANOVA). Many computer packages also include tests of a posteriori differences (i.e. the F-test is significant and the researcher wants to know which means differ significantly). There are a number of a posteriori tests; the one illustrated here is the Student-Newman-Keuls procedure, but other frequently used tests are the Scheffé test and a posteriori t-tests.

The ANOVA shows that there are highly significant differences between the four means. The a posteriori test reveals that every pair of group means differs significantly.
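The between-groups sum of squares and the F ratio in Table 2 can be approximated from the group means and sizes. The sketch below is an illustration under stated assumptions: the within-groups sum of squares is taken directly from Table 2 because it cannot be recovered without the individual records, and the function name between_groups_ss is chosen here for the example. Results differ slightly from the table because the means are rounded to two decimals.

```python
def between_groups_ss(means, sizes):
    """Sum over groups of n * (group mean - grand mean) squared."""
    total_n = sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / total_n
    return sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes))


# Group means and sizes from Table 2 (educational levels 0 to 3).
means = [20.32, 20.79, 21.41, 22.24]
sizes = [1182, 698, 1355, 915]
within_ss = 28076.4  # within-groups sum of squares, as published in Table 2

df_between = len(means) - 1
df_within = sum(sizes) - len(means)

ss_between = between_groups_ss(means, sizes)
f_ratio = (ss_between / df_between) / (within_ss / df_within)

print("between-groups SS:", round(ss_between, 1))
print("F ratio:", round(f_ratio, 2), "on", df_between, "and", df_within, "d.f.")
```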

Continuous dependent variables and two independent variables with 2 or more categories (ANOVA)

A slightly more complex analysis is used when the researcher wants to examine the simultaneous effect of two or more discrete characters on a continuous variable. One example is examining the relationship between BMI and both educational level and gravidity. The same categories for educational level are used as described previously. Gravidity has been coded from 0 (primigravida) to 5 (the last category referring to mothers who have 5 or more children). The results of the ANOVA are presented in Table 3. They show that there are significant additive effects of both educational level and gravidity and a borderline significant interaction effect. The multiple classification analysis compares each group in relation to the overall (grand) mean. It is clear, for instance, that the initial pattern of means for gravidity, which shows lower means for primigravida and multigravida (5+) women, changes when educational level is taken into account. The multiple R² provides a measure of how much of the variation in BMI is explained by educational level and gravidity. In this example the two independent variables account for 8.1% of the total variation.
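The 8.1% figure can be reproduced from the sums of squares in Table 3 if one assumes that the multiple R² of the multiple classification analysis refers to the additive (main effects) model, i.e. that the interaction sum of squares is not counted as explained. That reading is an assumption made for this sketch, as is the function name additive_r_squared; the arithmetic itself uses only the published values.

```python
def additive_r_squared(total_ss, residual_ss, interaction_ss):
    """Share of the total sum of squares explained by the main effects only."""
    return (total_ss - residual_ss - interaction_ss) / total_ss


# Sums of squares from Table 3.
r_squared = additive_r_squared(
    total_ss=30155.76, residual_ss=27534.84, interaction_ss=168.80
)
multiple_r = r_squared ** 0.5

print("multiple R squared:", round(r_squared, 3))
print("multiple R:", round(multiple_r, 3))
```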

Table 4. Regression analysis of BMI on mother's age

Multiple R        0.140
R²                0.0197
Adjusted R²       0.0194
Standard error    2.6696

• Analysis of variance

              d.f.    Sum of squares    Mean square
Regression    1       593.71            593.71
Residual      4148    29562.05          7.13

F = 83.31, P < 0.0001

Variable      B         SE B      Beta     t        P
Age           0.077     0.0084    0.140    9.13     0.0001
(Constant)    19.300    0.210              91.94    0.001

Table 5. Test of curvilinearity of BMI against mother's age

Step 1. Age entered

Multiple R        0.140
R²                0.0197
Adjusted R²       0.0194
Standard error    2.6696

• Analysis of variance

              d.f.    Sum of squares    Mean square
Regression    1       593.71            593.71
Residual      4148    29562.05          7.13

F = 83.31, P < 0.0001

Variable      B         SE B      Beta     t        P
Age           0.077     0.0084    0.140    9.13     0.0001
(Constant)    19.300    0.210              91.94    0.0001

Step 2. Age² entered

Multiple R        0.154
R²                0.0238
Adjusted R²       0.0233
Standard error    2.6644

• Analysis of variance

              d.f.    Sum of squares    Mean square
Regression    2       716.70            358.35
Residual      4147    29439.05          7.10

F = 50.48, P < 0.0001

Variable      B         SE B      Beta      t        P
Age           0.356     0.068     0.652     5.26     0.0001
Age²          -0.005    0.001     -0.515    -4.16    0.0001
(Constant)    15.76     0.875               18.00    0.0001

Continuous dependent variable and a continuous independent variable (regression analysis)

Regression analysis is used to examine the bivariate relationship between two continuous variables when one is dependent on the other or when the researcher wants to plot the best fitting line. Alternatively, correlation analysis can be used if there is no dependent/independent relationship. The results of regressing BMI on age are shown in Table 4. There is a clear positive relationship, with BMI increasing with mother's age, and the regression line suggests that for each yearly increment in age BMI increases by about 0.08 kg/m². It is always advisable to examine the residual plot because, if the association is linear, the residuals will be symmetrically arranged. In this analysis the examination of the residuals for BMI and age (not shown) revealed a curvilinear pattern, which suggests that a quadratic term should be included in the analysis. The next section details how to test for a curvilinear relationship.
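The fitted line in Table 4 can be used directly to predict BMI, and R² follows from the sums of squares. The sketch below uses only the published coefficients and sums of squares; the function names predict_bmi and r_squared are illustrative, and the ages are arbitrary examples.

```python
def predict_bmi(age, intercept=19.300, slope=0.077):
    """Predicted BMI from the linear regression in Table 4."""
    return intercept + slope * age


def r_squared(regression_ss, residual_ss):
    """Proportion of the total sum of squares explained by the regression."""
    return regression_ss / (regression_ss + residual_ss)


for age in (20, 30, 40):
    print("age", age, "predicted BMI", round(predict_bmi(age), 2))

r2 = r_squared(593.71, 29562.05)
print("R squared:", round(r2, 4))
```

Although the slope is highly significant, age on its own explains only about 2% of the variation in BMI.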

Table 6. Analysis of variance of BMI with age and age squared, educational level and gravidity

Source of variation        Sum of squares    d.f.    Mean square    F         P
Covariates
  Age                      196.656           1       196.656        29.892    0.001
  Age²                     122.992           1       122.992        18.695    0.001
Main effects
  Education                1909.395          3       636.465        96.743    0.001
  Gravidity                69.508            5       13.902         2.113     0.061
2-way interactions
  Education x Gravidity    181.932           15      12.129         1.844     0.024
Residual                   27131.445         4124    6.579
Total                      30155.752         4149    7.268

• Multiple classification analysis
Grand mean = 21.178

                               Unadjusted        Adjusted for independents + covariates
Variable + category    n       dev'n    Eta      dev'n    Beta
Education
  0 none               1182    -0.86             -0.92
  1 primary            698     -0.39             -0.36
  2 secondary          1355     0.24              0.31
  3 tertiary           915      1.06              0.99
                                        0.26              0.27
Gravidity
  0                    1882    -0.20              0.08
  1                    1013     0.26              0.18
  2                    604      0.11             -0.02
  3                    349      0.17              0.07
  4                    122      0.21              0.16
  5+                   180     -0.23              0.38
                                        0.08              0.05

Multiple R² = 0.094
Multiple R = 0.307






Test for curvilinearity for a continuous dependent variable and a continuous independent variable (regression analysis)

With the inclusion of a quadratic term, the generalized regression equation changes from Y = a + bX to Y = a + bX + cX². The analyses of BMI against mother's age (linear and quadratic) are presented in Table 5. The quadratic term for age is shown as Age² in Table 5 and it is highly significant (t = -4.162, P < 0.0001), indicating significant curvilinearity. The effect of a negative quadratic term (-0.005) is to lower predicted BMIs at higher ages.
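The gain from adding the quadratic term can be checked from the two steps of Table 5: the increase in the regression sum of squares, divided by the residual mean square of the fuller model, gives an F statistic on 1 degree of freedom whose square root equals the absolute value of t for the added term. The sketch below uses the published figures; the function names are illustrative, and the predictions are approximate because the coefficients are rounded to three decimals.

```python
from math import sqrt


def predict_linear(age):
    """Step 1 of Table 5: BMI = 19.300 + 0.077 * age."""
    return 19.300 + 0.077 * age


def predict_quadratic(age):
    """Step 2 of Table 5: BMI = 15.76 + 0.356 * age - 0.005 * age squared."""
    return 15.76 + 0.356 * age - 0.005 * age ** 2


def f_change(reg_ss_full, reg_ss_reduced, residual_ms_full, extra_terms=1):
    """F statistic for the terms added between the reduced and full model."""
    return (reg_ss_full - reg_ss_reduced) / extra_terms / residual_ms_full


f_stat = f_change(716.70, 593.71, 7.10)
t_abs = sqrt(f_stat)
print("F for Age squared:", round(f_stat, 2), "|t| =", round(t_abs, 3))

for age in (20, 30, 40):
    print(age, round(predict_linear(age), 2), round(predict_quadratic(age), 2))
```

At age 40 the quadratic equation predicts a lower BMI than the straight line, which is the flattening that the negative coefficient produces.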

Continuous dependent variable and a continuous independent variable and a number of discrete independent variables (ANOVA or multiple regression analysis)

The previous analyses have shown that there are relationships between BMI and educational level, gravidity and maternal age. The simultaneous effects of these variables can be examined using analysis of variance.

In this analysis of variance the effects of the continuous characters (age and age²) have been removed before determining the effect of educational level and gravidity, but researchers are usually free to choose the order in which terms are removed. The results are presented in Table 6 and show that, after removing the linear and quadratic effects of age, the impact of education remained very much as it was, whereas gravidity is no longer significant (P = 0.061). In addition there is a significant interaction between education and gravidity (P = 0.024). About 9% of the variance of BMI is explained by the three variables.
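Each F ratio in Table 6 is the mean square for that term divided by the residual mean square, which can be verified from the published values. The sketch below is a plain recomputation; the dictionary labels are written out here for readability and are not package output.

```python
RESIDUAL_MEAN_SQUARE = 6.579  # residual mean square from Table 6

# Mean squares from Table 6, in the order the terms were removed.
mean_squares = {
    "Age": 196.656,
    "Age squared": 122.992,
    "Education": 636.465,
    "Gravidity": 13.902,
    "Education x Gravidity": 12.129,
}

f_ratios = {
    term: mean_square / RESIDUAL_MEAN_SQUARE
    for term, mean_square in mean_squares.items()
}

for term, f_ratio in f_ratios.items():
    print(term, round(f_ratio, 3))
```

Gravidity now has an F ratio of only about 2.1 on 5 and 4124 degrees of freedom, which is why it falls just short of significance once age is taken into account.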

Multiple regression analysis would give similar results to ANOVA and its use is discussed in the next section.

