SLIDE 1

Analysis of variance and regression December 4, 2007

SLIDE 2

Variance component models

  • Variance components
  • One-way anova with random variation
    – estimation
    – interpretations
  • Two-way anova with random variation
  • Crossed random effects
  • Ecological analyses

SLIDE 3

Lene Theil Skovgaard,
Dept. of Biostatistics, Institute of Public Health, University of Copenhagen
e-mail: L.T.Skovgaard@biostat.ku.dk
http://staff.pubhealth.ku.dk/~lts/regression07_2

SLIDE 4

Variance Component Models, December 2007 1

Terminology for correlated measurements:

  • Multivariate outcome: several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.
  • Cluster design: the same outcome (response) measured on all individuals in a number of families/villages/school classes.
  • Repeated measurements: the same outcome (response) measured in different situations (or at different spots) for the same individual.
  • Longitudinal measurements: the same outcome (response) measured consecutively over time for each individual.

SLIDE 5

Variance component models

Generalisations of ANOVA-type models or regression models, involving several sources of random variation (variance components):

  • environmental variation
    – between regions, hospitals or countries
  • biological variation
    – variation between individuals, families or animals
  • within-individual variation
    – variation between arms, teeth, injection sites, days
  • variation due to uncontrollable circumstances
    – time of day, temperature, observer
  • measurement error

SLIDE 6

Typical studies involve data from:

  • a number of family members from a sample of households
  • pupils from a sample of school classes
  • measurements on several spots of each individual

Alternative name (for some of them): Multilevel models

  • variation on each level (variance component)
  • possibly systematic effects (covariates) on each level

SLIDE 7

Examples of hierarchies:

  individual → context/cluster

  level 1   → level 2    → level 3
  subjects  → twin pairs → countries
  subjects  → families   → regions
  students  → classes    → schools
  visits    → subjects   → centres

SLIDE 8

Merits

  • Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family. This is analogous to the paired comparison situation.
  • When planning subsequent investigations, knowledge of the relative sizes of the variance components helps in deciding the number of repetitions needed at each level (if possible).

SLIDE 9

Drawbacks

  • When making inference (estimation and testing), it is important to take all sources of variation into account, and effects have to be evaluated using the relevant variation!
  • Bias may result if one or more sources of variation are disregarded.

SLIDE 10

Measurements ’belonging together’ in the same cluster look alike (are correlated). If we fail to take this correlation into account, we will experience:

  • possible bias in the mean value structure
  • low efficiency (type 2 error) for evaluation of level 1 covariates (within-cluster effects)
  • too small standard errors (type 1 error) for estimates of level 2 effects (between-cluster effects)

SLIDE 11

Concepts of the day:

  • advantage/necessity of random effects
  • generalisations of ANOVA-type models

Examples with small data sets

  • some of them too small to allow for trustworthy interpretations
  • illustrative precisely because of their limited size

Illustrated with SAS PROC MIXED

SLIDE 12

One-way analysis of variance – with random variation:

Comparison of k ’groups/clusters’, satisfying

  • The groups are not of individual interest, and it is of no interest to test whether they have identical means
  • The groups may be thought of as representatives from a population that we want to describe.

Example: 10 consecutive measurements of blood pressure on a sample of 50 women:

  • We ’know’ that the women differ – and we do not care!
  • We only want to learn something about blood pressure in the female population in general

SLIDE 13

Example of one-way anova structure: 6 rabbits are vaccinated, each in 6 spots on the back.

Response Y: swelling in cm²

Model: swelling = ’grand mean’ + ’rabbit deviation’ + ’variation’

$$y_{rs} = \mu + \alpha_r + \varepsilon_{rs}, \qquad \varepsilon_{rs} \sim N(0, \sigma^2),$$

where r = 1, …, R = 6 denotes the rabbit and s = 1, …, S = 6 denotes the spot.

The variation can be regarded either as ’within-rabbit variation’ or ’measurement error’ (probably a combination of the two).

SLIDE 14

Rabbit means: $\mu_r = \mu + \alpha_r$

SLIDE 15

anova-table:

             SS        df               MS=SS/df   F
  Between    12.8333   R − 1 = 5        2.5667     4.39
  Within     17.5266   R(S − 1) = 30    0.5842
  Total      30.3599   RS − 1 = 35      0.8674

Test for identical rabbit means: F = 4.39 ∼ F(5, 30), P = 0.004.

But: we are not interested in these particular 6 rabbits, only in rabbits in general, as a species!

We assume these 6 rabbits to have been randomly selected from the species.

SLIDE 16

We choose to model rabbit variation instead of rabbit levels:

swelling = ’grand mean’ + ’between-rabbit variation’ + ’within-rabbit variation’

$$y_{rs} = \mu + a_r + \varepsilon_{rs},$$

where the $a_r$’s and the $\varepsilon_{rs}$’s are assumed independent and normally distributed with $\mathrm{Var}(a_r) = \omega_B^2$ and $\mathrm{Var}(\varepsilon_{rs}) = \sigma_W^2$.

The variation between rabbits has been made random. $\omega_B^2$ and $\sigma_W^2$ are variance components, and the model is also called a two-level model.

SLIDE 17

Fixed vs. random effects?

  • Fixed:
    – all values of the factor present (typically only a few, e.g. treatment)
    – allows inference for these particular factor values only
    – must include a reasonable number of observations for each factor value
  • Random:
    – a representative sample of values of the factor is present
    – allows inference to be extended beyond the values in the experiment and to the population of possible factor values (e.g. geographical areas, classes, rabbits)
    – is necessary when we have a covariate for this level

SLIDE 18

Interpretation: All observations have common mean and variance:

$$y_{rs} \sim N(\mu,\ \omega_B^2 + \sigma_W^2)$$

but: measurements made on the same rabbit are correlated, with the intra-class correlation

$$\mathrm{Corr}(y_{r1}, y_{r2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$$

Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits. All measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.

SLIDE 19

Estimation of variance components

First step is to determine the mean values of the mean squares (in balanced situations, with S observations per group; in the rabbit example S = R = 6, so the two coincide numerically):

$$E(MS_B) = S\,\omega_B^2 + \sigma_W^2, \qquad E(MS_W) = \sigma_W^2$$

and from this we get the estimates

$$\tilde\sigma_W^2 = MS_W, \qquad \tilde\omega_B^2 = \frac{MS_B - MS_W}{S}$$
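As a small illustration of the moment estimates above, here is a Python sketch (not part of the original SAS material). The function name `variance_components` is hypothetical; the inputs are the mean squares from the rabbit anova table, and truncating a negative between-group estimate at zero follows the convention stated in these slides.

```python
def variance_components(msb, msw, s):
    """Moment estimates (between, within) for a balanced one-way layout.

    msb, msw: between- and within-group mean squares
    s:        number of observations per group
    """
    sigma2_w = msw
    # A negative moment estimate is reported as zero.
    omega2_b = max((msb - msw) / s, 0.0)
    return omega2_b, sigma2_w


# Mean squares from the rabbit anova table, 6 spots per rabbit.
omega2_b, sigma2_w = variance_components(msb=2.5667, msw=0.5842, s=6)
print(f"between: {omega2_b:.4f}, within: {sigma2_w:.4f}")
```

With the rabbit mean squares this gives a between-rabbit component of about 0.33, in line with the PROC MIXED output shown later in the deck.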

SLIDE 20

Note: It may happen that $\tilde\omega_B^2$ becomes negative!

  • by a coincidence
  • as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot

In this case, it will be reported as a zero.

SLIDE 21

Reading in data in SAS:

  /* one row per spot (a-f), one column per rabbit (y1-y6) */
  data rabbit_orig;
    input spot $ y1-y6;
    cards;
  a 7.9 8.7 7.4 7.4 7.1 8.2
  b 6.1 8.2 7.7 7.1 8.1 5.9
  c 7.5 8.1 6.0 6.4 6.2 7.5
  d 6.9 8.5 6.8 7.7 8.5 8.5
  e 6.7 9.9 7.3 6.4 6.4 7.3
  f 7.3 8.3 7.3 5.8 6.4 7.7
  ;
  run;

  /* reshape to one record per rabbit and spot */
  data rabbit;
    set rabbit_orig;
    rabbit=1; swelling=y1; output;
    rabbit=2; swelling=y2; output;
    rabbit=3; swelling=y3; output;
    rabbit=4; swelling=y4; output;
    rabbit=5; swelling=y5; output;
    rabbit=6; swelling=y6; output;
  run;

SLIDE 22

In SAS, the estimation can be performed as:

  proc mixed data=rabbit;
    class rabbit;
    model swelling = / s;
    random rabbit;
  run;

  Covariance Parameter Estimates

  Cov Parm    Estimate
  rabbit        0.3304
  Residual      0.5842

  Solution for Fixed Effects

                         Standard
  Effect      Estimate      Error    DF   t Value   Pr > |t|
  Intercept     7.3667     0.2670     5     27.59     <.0001
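The estimates above can be checked by hand from the raw data. The following Python sketch (an illustration added here, not the author's SAS code) computes the one-way anova mean squares and the moment estimates of the variance components. For balanced data like these, with a positive between-rabbit estimate, the moment estimates coincide with the REML estimates that PROC MIXED reports by default.

```python
# Swelling (cm^2): one row per spot a-f, one column per rabbit 1-6,
# copied from the SAS data step in these slides.
data = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]
S = len(data)       # spots per rabbit
R = len(data[0])    # rabbits
rabbits = [[row[r] for row in data] for r in range(R)]

grand_mean = sum(map(sum, rabbits)) / (R * S)
rabbit_means = [sum(y) / S for y in rabbits]

ss_between = S * sum((m - grand_mean) ** 2 for m in rabbit_means)
ss_within = sum((v - m) ** 2 for y, m in zip(rabbits, rabbit_means) for v in y)
ms_between = ss_between / (R - 1)
ms_within = ss_within / (R * (S - 1))
f_value = ms_between / ms_within

sigma2_w = ms_within                       # 'Residual'
omega2_b = (ms_between - ms_within) / S    # 'rabbit'
se_intercept = (ms_between / (R * S)) ** 0.5

print(f"MS between {ms_between:.4f}, MS within {ms_within:.4f}, F {f_value:.2f}")
print(f"rabbit {omega2_b:.4f}, Residual {sigma2_w:.4f}")
print(f"Intercept {grand_mean:.4f}, s.e. {se_intercept:.4f}")
```

Note that the standard error of the intercept uses the between-rabbit mean square, which is why it has only R − 1 = 5 degrees of freedom.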

SLIDE 23

Interpretation of variance components:

                                                    Proportion of
  Variation   Variance component         Estimate   variation
  Between     $\omega_B^2$                 0.3304      36%
  Within      $\sigma_W^2$                 0.5842      64%
  Total       $\omega_B^2 + \sigma_W^2$    0.9146     100%

Typical differences (95% prediction intervals):

  • for spots on the same rabbit: $\pm 2\sqrt{2 \times 0.5842} = \pm 2.16$ cm²
  • for spots on different rabbits: $\pm 2\sqrt{2 \times 0.9146} = \pm 2.70$ cm²
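The two intervals can be reproduced with a few lines of Python (a sketch added for illustration; the variable names are not from the slides). The reasoning is that the difference between two spots on the same rabbit has variance 2σ²_W, because the rabbit effect cancels, whereas for different rabbits both components contribute.

```python
from math import sqrt

omega2_b, sigma2_w = 0.3304, 0.5842   # estimates from the PROC MIXED output

# Same rabbit: the rabbit effect cancels in the difference.
same_rabbit = 2 * sqrt(2 * sigma2_w)
# Different rabbits: both variance components enter the difference.
different_rabbits = 2 * sqrt(2 * (omega2_b + sigma2_w))

print(f"same rabbit:       +/- {same_rabbit:.2f} cm^2")
print(f"different rabbits: +/- {different_rabbits:.2f} cm^2")
```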

SLIDE 24

Interpretation of the size of the variance components:

  • Approx. 2/3 of the variation in the measurements comes from the variation within rabbits.

Maybe there is a systematic difference between the injection spots? Two-way anova:

  Source   DF   Type III SS   Mean Square   F Value   Pr > F
  rabbit    5     12.833333      2.566667      4.69   0.0037
  spot      5      3.833333      0.766667      1.40   0.2584

It does not look as if there is any systematic difference (P=0.26).

SLIDE 25

Design considerations

Imaginary experiment with measurements on R rabbits, and S spots for each rabbit:

$$\mathrm{Var}(\bar y) = \frac{\omega_B^2}{R} + \frac{\sigma_W^2}{RS}$$

For S = #spots, varying from 1 to 10:

[Figure: standard error (0.15 to 0.45) against number of rabbits (5 to 20), one curve for each S]
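The curves in such a plot can be tabulated directly from the variance formula. The Python sketch below is an illustration only; the function name `se_mean` is hypothetical and the default variance components are the rabbit estimates from these slides.

```python
from math import sqrt


def se_mean(r, s, omega2_b=0.3304, sigma2_w=0.5842):
    """Standard error of the overall mean with r rabbits and s spots per rabbit."""
    return sqrt(omega2_b / r + sigma2_w / (r * s))


for r in (5, 10, 15, 20):
    row = ", ".join(f"S={s}: {se_mean(r, s):.3f}" for s in (1, 2, 6, 10))
    print(f"R={r:2d}  {row}")
```

Adding spots only reduces the second term: with 6 rabbits the standard error can never fall below $\sqrt{0.3304/6} \approx 0.235$, however many spots are used, while 20 rabbits with a single spot each already do better.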

SLIDE 26

Effective sample size

If we had only one observation for each of k rabbits, how many would we need to obtain the same precision?

$$k = \frac{R \times S}{1 + \rho(S - 1)}$$

We have here

$$\rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} = \frac{0.3304}{0.3304 + 0.5842} = 0.361 \ \Rightarrow\ k = 12.8$$

Effectively, we have only approximately two independent observations from each rabbit!

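A short Python check of this calculation (an added sketch; `effective_sample_size` is a hypothetical helper name):

```python
def effective_sample_size(r, s, rho):
    """Number of independent observations giving the same precision
    as r clusters with s correlated observations each."""
    return r * s / (1 + rho * (s - 1))


omega2_b, sigma2_w = 0.3304, 0.5842   # estimates from the slides
rho = omega2_b / (omega2_b + sigma2_w)
k = effective_sample_size(r=6, s=6, rho=rho)

print(f"rho = {rho:.3f}, k = {k:.1f}, per rabbit = {k / 6:.1f}")
```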
SLIDE 27

Quantification of overall swelling

  method               estimate (s.e.)
  1: forget rabbit     7.367 (0.155)
  2: fixed rabbit      7.367 (0.127)
  3: rabbit averages   7.367 (0.267)
  4: random rabbit     7.367 (0.267)

  1. We pool all 36 measurements, mixing up the two variance components, and assuming independence
  2. We estimate the mean swelling of exactly these 6 rabbits (using only within-rabbit variation)
  3. We only look at averages for each rabbit (ecological analysis)
  4. We estimate the mean swelling of rabbits as a species (the correct approach)

SLIDE 28

Estimation of individual rabbit means:

  • simple averages, $\bar y_{r\cdot}$, rely on individual measurements only
  • BLUPs (best linear unbiased predictors; EBLUPs, empirical BLUPs, when the variance components are estimated) rely on the assumption that individuals come from the same population, and become weighted averages:

$$\frac{\tilde\omega_B^2}{\tilde\omega_B^2 + \tilde\sigma_W^2/S}\,\bar y_{r\cdot} + \frac{\tilde\sigma_W^2/S}{\tilde\omega_B^2 + \tilde\sigma_W^2/S}\,\bar y_{\cdot\cdot}$$

which have been shrunk towards the overall mean, $\bar y_{\cdot\cdot}$
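To see the size of the shrinkage in the rabbit example, the weighted average can be evaluated directly. This is a Python sketch for illustration; the rabbit averages are computed from the data step in these slides and rounded to four decimals, and the variance components are the estimates reported by PROC MIXED.

```python
omega2_b, sigma2_w, S = 0.3304, 0.5842, 6   # estimates and spots per rabbit

# Rabbit averages (rabbits 1-6), computed from the data step in these slides.
rabbit_means = [7.0667, 8.6167, 7.0833, 6.8000, 7.1167, 7.5167]
grand_mean = sum(rabbit_means) / len(rabbit_means)   # valid because the design is balanced

# Weight on the rabbit's own average; the remainder goes to the overall mean.
weight = omega2_b / (omega2_b + sigma2_w / S)
blups = [weight * m + (1 - weight) * grand_mean for m in rabbit_means]

print(f"weight on own average: {weight:.3f}")
for r, (m, b) in enumerate(zip(rabbit_means, blups), start=1):
    print(f"rabbit {r}: average {m:.3f} -> BLUP {b:.3f}")
```

Each rabbit keeps roughly 77% of its deviation from the overall mean. With fewer spots on a rabbit, $\tilde\sigma_W^2/S$ grows and the shrinkage becomes stronger.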

SLIDE 29

[Figure: points labelled by rabbit number, 1 to 6]

SLIDE 30

When the 3 smallest measurements from rabbit 2 (largest level) are omitted, the results become:

  method                             estimate (s.e.)
  1: forget rabbit                   7.291 (0.163)
  2: fixed rabbit                    7.291 (0.136)
  3a: rabbit averages (weighted)     7.291 (0.265)
  3b: rabbit averages (unweighted)   7.436 (0.333)
  4: random rabbit                   7.390 (0.298)
  reference                          7.367 (0.267)

  1:    we have omitted some of the largest observations
  2+3a: rabbit 2 has a lower weight in the average due to only 3 observations
  3b:   average for rabbit 2 has increased
  4:    rabbit 2 has a lower weight in the average due to a larger standard error

SLIDE 31

EBLUPs for the reduced data set: Larger shrinkage than before, for rabbit no. 2

[Figure: points labelled by rabbit number, 1 to 6]

SLIDE 32

Confidence limits for the variance components:

  • Intra-individual variation $\sigma_W^2$: $0.373 < \sigma_W^2 < 1.044$
  • Inter-individual variation $\omega_B^2$: $0.057 < \omega_B^2 < 2.48$

So, we should take care not to over-interpret...

SLIDE 33

We imagine that the rabbits are grouped in two (grp=1,2)

  proc mixed data=rabbit;
    class grp rabbit;
    model swelling = grp / s;
    random rabbit(grp);
  run;

  Cov Parm       Estimate
  rabbit(grp)      0.3633   <-- this changes
  Residual         0.5842   <-- this stays the same

  Solution for Fixed Effects

                                Standard
  Effect      grp   Estimate       Error   DF   t Value   Pr > |t|
  Intercept           7.1444      0.3919    4     18.23     <.0001
  grp         1       0.4444      0.5542    4      0.80     0.4675
  grp         2            .           .    .         .

SLIDE 34

Such a comparison cannot be performed in the usual way (ignoring the rabbits), since we then perform the comparison/test against a wrong variation. The type I error rate will be inflated!

  proc glm data=rabbit;
    class grp;
    model swelling=grp / solution;
  run;

                                   T for H0:                  Std Error of
  Parameter        Estimate        Parameter=0   Pr > |T|     Estimate
  INTERCEPT        7.144444444 B         33.06     0.0001     0.21610872
  GRP        1     0.444444444 B          1.45     0.1551     0.30562388
             2     0.000000000 B           .        .          .

SLIDE 35

Two-level model:

  level   unit          variation         covariates
  1       rabbit*spot   within rabbit     spot
  2       rabbit        between rabbits   group, overall mean

When the random rabbit variation is ignored:

  • low efficiency (type 2 error) for evaluation of level 1 covariates (spot)
  • too small standard errors (type 1 error) for estimates of level 2 effects (group, overall mean)

SLIDE 36

Factor diagrams:

In the traditional one-way anova:

  [I] = [R*S] → [R] → 0

In case of grouping:

  [I] = [R*S] → [R] → G → 0

We have here used the notation

  • arrows indicating simplifications / groupings
  • [ ] for the random effects, corresponding to variance components on the various levels.

SLIDE 37

Example: Number of nuclei per cell in the rat pancreas, used for the evaluation of cytostatica

Henrik Winther Nielsen, Inst. Med. Anat.

  • 4 rats (R)
  • 3 sections for each rat (S)
  • 5 randomly chosen fields from each section (F)

3-level model:

  fields       → sections   → rats
  $\sigma^2$     $\tau^2$     $\omega^2$

Factor diagram: [I] = [R*S*F] → [R*S] → [R] → 0

SLIDE 38

[Figure: data points labelled 1, 2, 3]

  Covariance Parameter Estimates

  Cov Parm        Estimate
  rat              0.01787
  section(rat)    0.002878
  Residual          0.1968

  Solution for Fixed Effects

                         Standard
  Effect      Estimate      Error   DF   t Value   Pr > |t|
  Intercept     0.7935    0.08936    3      8.88     0.0030

SLIDE 39

Estimation of variance components

                                                          Proportion of
  Variation   Variance component               Estimate   variation
  Rats        $\omega^2$                         0.0179      8.2%
  Sections    $\tau^2$                           0.0029      1.3%
  Fields      $\sigma^2$                         0.1968     90.4%
  Total       $\omega^2 + \tau^2 + \sigma^2$     0.2176    100%

SLIDE 40

Typical differences:

  • for sections on different rats: $\pm 2\sqrt{2 \times (0.0179 + 0.0029 + 0.1968)} = \pm 1.319$
  • for different sections on the same rat: $\pm 2\sqrt{2 \times (0.0029 + 0.1968)} = \pm 1.264$
  • for different fields on the same section: $\pm 2\sqrt{2 \times 0.1968} = \pm 1.255$
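The same rule as in the rabbit example applies at three levels: only the variance components that the two measurements do not share enter the difference. A Python sketch (added for illustration; `typical_difference` is a hypothetical helper):

```python
from math import sqrt

omega2, tau2, sigma2 = 0.0179, 0.0029, 0.1968   # rats, sections, fields


def typical_difference(*unshared_components):
    """+/- 2 standard deviations of the difference between two measurements,
    given the variance components they do not have in common."""
    return 2 * sqrt(2 * sum(unshared_components))


different_rats = typical_difference(omega2, tau2, sigma2)
same_rat = typical_difference(tau2, sigma2)
same_section = typical_difference(sigma2)

print(f"different rats:                   +/- {different_rats:.3f}")
print(f"different sections, same rat:     +/- {same_rat:.3f}")
print(f"different fields, same section:   +/- {same_section:.3f}")
```

The three values are nearly identical because the field-to-field variation dominates.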

SLIDE 41

The correlation between two measurements on the same rat becomes:

  • if they are measured on the same section:

$$\mathrm{Corr}(y_{rs1}, y_{rs2}) = \frac{\omega^2 + \tau^2}{\omega^2 + \tau^2 + \sigma^2} = 0.096$$

  • if they are measured on different sections:

$$\mathrm{Corr}(y_{r11}, y_{r22}) = \frac{\omega^2}{\omega^2 + \tau^2 + \sigma^2} = 0.082$$
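Both correlations follow from dividing the shared variance components by the total variance. A minimal Python check, using the rounded estimates from the table of variance components (an added sketch, not from the original slides):

```python
omega2, tau2, sigma2 = 0.0179, 0.0029, 0.1968   # rats, sections, fields
total = omega2 + tau2 + sigma2

# Same section: rat and section effects are shared.
corr_same_section = (omega2 + tau2) / total
# Different sections on the same rat: only the rat effect is shared.
corr_diff_section = omega2 / total

print(f"same section:       {corr_same_section:.3f}")
print(f"different sections: {corr_diff_section:.3f}")
```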

SLIDE 42

Examples of hierarchies:

  individual → cluster

  level 1   → level 2    → level 3
  spots     → rabbits
  fields    → sections   → rats
  subjects  → twin pairs → countries
  subjects  → families   → regions
  students  → classes    → schools
  visits    → subjects   → centres

On all levels, we may have random variation (random effects or variance components), as well as covariates.

SLIDE 43

Example: 2 groups of dogs (5 resp. 6 dogs). Outcome: Osmolality, measured at 4 different times (with treatments along the way) Average profiles:

SLIDE 44

Do we have repetitions?

SLIDE 45

Residual plot (after suitable analysis):

We see a clear trumpet shape, corresponding to the fact that dogs with a high level also vary more than dogs with a low level.

Solution: Make a logarithmic transformation!

SLIDE 46

Profiles on logarithmic scale with corresponding residual plot:

SLIDE 47

Two-level model:

  level   unit       variation      covariates
  1       dog*time   within dogs    grp*time, time
  2       dog        between dogs   group, overall mean

  proc mixed data=dogs;
    class grp time dog;
    model losmol=grp time grp*time / outpm=fit1 ddfm=satterth;
    random dog(grp);
  run;

SLIDE 48

  Class Level Information

  Class   Levels   Values
  grp          2   1 2
  time         4   50 100 170 290
  dog         11   1 2 3 4 5 6 7 8 9 10 11

  Covariance Parameter Estimates

  Cov Parm    Estimate
  dog(grp)     0.06587
  Residual     0.03554

  Type 3 Tests of Fixed Effects

              Num   Den
  Effect       DF    DF   F Value   Pr > F
  grp           1     9      2.85   0.1257
  time          3    27     21.35   <.0001
  grp*time      3    27      2.50   0.0805

P=0.08 for test of interaction, i.e. no convincing indication of this.

SLIDE 49

When there is no interaction, we simply omit the term from the model (but we could also just use averages, since the design is balanced)

  proc mixed covtest data=dogs;
    class grp time dog;
    model losmol=grp time / outpm=fit2 ddfm=satterth s;
    random dog(grp);
  run;

  Covariance Parameter Estimates

                         Standard       Z
  Cov Parm    Estimate      Error   Value     Pr Z
  dog(grp)     0.06453    0.03534    1.83   0.0339
  Residual     0.04088    0.01056    3.87   <.0001

SLIDE 50

  Solution for Fixed Effects

                                       Standard
  Effect      grp   time   Estimate       Error   DF   t Value   Pr > |t|
  Intercept                  0.5422      0.1235    9      4.39     0.0017
  grp         1              0.2795      0.1656    9      1.69     0.1257
  grp         2                   .           .    .         .
  time              50       0.1215     0.08621   30      1.41     0.1691
  time              110     -0.2173     0.08621   30     -2.52     0.0173
  time              170     -0.4608     0.08621   30     -5.35     <.0001
  time              290           .           .    .         .

  Type 3 Tests of Fixed Effects

            Num   Den
  Effect     DF    DF   F Value   Pr > F
  grp         1     9      2.85   0.1257
  time        3    30     17.66   <.0001

SLIDE 51

In contrast, if we forget the random dog-effect, and perform a traditional two-way anova:

  proc glm data=dogs;
    class grp time;
    model losmol=grp time / solution;
  run;

  The GLM Procedure
  Dependent Variable: losmol

  Source   DF   Type III SS   Mean Square   F Value   Pr > F
  grp       1    0.85213853    0.85213853      8.48   0.0059
  time      3    2.16564038    0.72188013      7.19   0.0006

SLIDE 52

                                         Standard
  Parameter       Estimate                  Error   t Value   Pr > |t|
  Intercept       0.5421708688 B       0.10504427      5.16     <.0001
  grp       1     0.2794864906 B       0.09595797      2.91     0.0059
  grp       2     0.0000000000 B                .         .          .
  time      50    0.1214934244 B       0.13514314      0.90     0.3742
  time      110  -0.2172752206 B       0.13514314     -1.61     0.1160
  time      170  -0.4608255057 B       0.13514314     -3.41     0.0015
  time      290   0.0000000000 B                .         .          .

  • Type 2 error for effect of time (level 1 covariate): time is evaluated in an unpaired fashion
  • Type 1 error for effect of grp (level 2 covariate): we think we have more information than we actually have (we disregard the correlation)

SLIDE 53

Factor diagram:

[Diagram: nodes [I] = [Dog*Time], Grp*Time, [Dog], Time and Grp, connected by arrows]

We note the following:

  • The effect of GRP*TIME is evaluated against DOG*TIME
  • If GRP*TIME is not considered significant, we thereafter evaluate
    – TIME against DOG*TIME
    – GRP against DOG, also called DOG(GRP)

SLIDE 54

Interpretation of group effect:

The estimated group difference is 0.279 (0.166), corresponding to a 95% confidence interval (estimate ± 2 s.e.) of (−0.053, 0.611).

But this is on a logarithmic scale! We perform a back transformation with the exponential function and may conclude that group 1 lies exp(0.279) = 1.322 times higher than group 2, i.e. 32.2% higher. The 95% confidence interval is (exp(−0.053), exp(0.611)) = (0.948, 1.842).
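The back transformation is easy to verify numerically. The Python sketch below is an added illustration; it assumes that losmol is the natural logarithm of osmolality (consistent with the use of exp in the slide) and that the interval is formed as estimate ± 2 standard errors, which reproduces the limits quoted above.

```python
from math import exp

estimate, se = 0.279, 0.166   # group difference on the log scale

# Interval on the log scale (assumed here: estimate +/- 2 standard errors).
lower, upper = estimate - 2 * se, estimate + 2 * se

# Back-transform: differences of logs become ratios.
ratio = exp(estimate)
ratio_lower, ratio_upper = exp(lower), exp(upper)

print(f"log scale: {estimate:.3f} ({lower:.3f}, {upper:.3f})")
print(f"ratio:     {ratio:.3f} ({ratio_lower:.3f}, {ratio_upper:.3f})")
print(f"group 1 is {100 * (ratio - 1):.1f}% higher than group 2")
```

Since the interval for the ratio contains 1, the group difference is not significant, in agreement with P = 0.13 in the PROC MIXED output.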

SLIDE 55

Example of a non-hierarchical model:

Visual acuity: time in msec. from a stimulus (light flash) to the electrical response at the back of the cortex, measured for

  • 7 individuals (patient)
  • 2 eyes for each individual (eye)
  • 4 lens magnifications (power) for each eye

Crowder & Hand (1990)

[Figure: two panels with points labelled by patient number, 1 to 7]

SLIDE 56

Predictors of visual acuity:

  • Main effects:
    – Systematic (mean value): eye, power
    – Random: patient
  • Interactions:
    – Systematic (mean value): eye*power
    – Random: patient*eye, patient*power

Example of crossed random factors

SLIDE 57

  proc mixed data=visual;
    class patient eye power;
    model acuity=eye power eye*power / ddfm=satterth;
    random patient patient*eye patient*power;
  run;

  Cov Parm         Estimate
  patient           20.2857
  patient*eye       11.6845
  patient*power      4.0238
  Residual          12.8333

  Type 3 Tests of Fixed Effects

  Effect      Num DF   Den DF   F Value   Pr > F
  eye              1        6      0.78   0.4112
  power            3       18      2.25   0.1177
  eye*power        3       18      1.06   0.3925

SLIDE 58

Factor diagram:

[Diagram: nodes [I] = [Pa*Ey*Po], Ey*Po, [Pa*Ey], [Pa*Po], Ey, Po and [Pa], connected by arrows]

SLIDE 59

  level   unit                  covariates
  1       single measurements   Ey*Po
  2       interactions
  2A      [Pa*Ey]               Ey
  2B      [Pa*Po]               Po
  2       individuals, [Pa]     overall level

SLIDE 60

SLIDE 61

Blood pressure and social inequity

15569 women in 17 regions of Malmö

Covariates:

  • Individual (level 1):
    – low educational achievement (x1) (less than 9 years of school)
    – age group (x2)
  • Regional (level 2):
    – rate of people with low educational achievement (z1), from the ’Skåne Council Statistics Office’

SLIDE 62

Ecological analysis

Level 2 analysis (analysis of regional averages):

  Y:  average blood pressure in residential area
  Z1: rate of people with low educational achievement

Estimate of regression coefficient: 4.655 (1.420)

It seems to be an important explanatory variable!?

SLIDE 63

Size of circle indicates size of investigation

Obvious effect of aggregated covariate

SLIDE 64

SLIDE 65

Estimates from variance component model:

  Covariates     Individual        Variation between   Rate of low       Variation between
  in model       low-education     individuals         education         regions
                 x1                $\sigma_W^2$        z1                $\omega_B^2$
  none           –                 96.034              –                 0.347
  age            –                 92.213              –                 0.258
  x1, age        1.152 (0.170)     91.830              –                 0.143
  z1, age        –                 91.484              4.058 (1.345)     0.121
  x1, z1, age    1.093 (0.167)     91.256              2.966 (1.250)     0.087

SLIDE 66

We note the following:

  • Region as a random effect could only account for 0.4% of the variation in blood pressure!
  • Ecological variable (’Rate of low education’) will have very little impact!
  • Ecological analysis ’sums up’ the two effects, but is not able to distinguish between the two effects
    – It overestimates the level 2 effect
    – It cannot be interpreted as a level 1 effect
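The 0.4% figure is the regional share of the total variance in the model without covariates. A Python sketch of the calculation for each model in the table of estimates (added for illustration; the dictionary layout is just a convenient way to hold the numbers from the slides):

```python
# (within-individual variance, between-region variance) per model,
# taken from the table of variance component estimates.
models = {
    "none":        (96.034, 0.347),
    "age":         (92.213, 0.258),
    "x1, age":     (91.830, 0.143),
    "z1, age":     (91.484, 0.121),
    "x1, z1, age": (91.256, 0.087),
}

region_share = {
    name: omega2_b / (sigma2_w + omega2_b)
    for name, (sigma2_w, omega2_b) in models.items()
}

for name, share in region_share.items():
    print(f"{name:12s} region share of variance: {100 * share:.2f}%")
```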

SLIDE 67

SLIDE 68

Covariate effects on level 1 and level 2 can be very different.

Example: Reading ability, as a function of age and cohort:

[Figure: reading ability as a function of age and cohort, points labelled by number]

SLIDE 69

  Misspecification         Result
  missing random effect    • type 2 error for x (unpaired)
                           • type 1 error for z (too many df’s, wrong variation)
  missing z                • estimate of $\omega_B^2$ too big
                           • estimate of $\sigma_W^2$ perhaps too big (in unbalanced designs)
  missing x                • estimate of $\omega_B^2$ too big or too small
                           • estimate of $\sigma_W^2$ too big

SLIDE 70

Simulated data: Random effect of individual

[Figure: y (2 to 14) against p (1.0 to 4.0)]

SLIDE 71

Estimates:

  Level   Variation             standard deviation
  1       within individuals    $\hat\sigma_W = 1.59$
  2       between individuals   $\hat\omega_B = 1.23$

SLIDE 72

Individual (level 1) covariate x, e.g. time/age:

[Figure: y (2 to 14) against x (2 to 14)]

SLIDE 73

Estimates:

  Level   Variation             standard deviation      regression coefficient
  1       within individuals    $\hat\sigma_W = 0.41$   $\hat\beta_x = 1.028\ (0.046)$
  2       between individuals   $\hat\omega_B = 4.43$

SLIDE 74

Addition of a level 2 covariate: z, e.g. age:

[Figure: y (2 to 14) against x (2 to 12)]

SLIDE 75

Estimates:

  Level   Variation             standard deviation      regression coefficient
  1       within individuals    $\hat\sigma_W = 0.41$   $\hat\beta_x = 1.033\ (0.046)$
  2       between individuals   $\hat\omega_B = 1.14$   $\hat\beta_z = -1.316\ (0.206)$

SLIDE 76

Comparison of estimates:

            within individual                       between individual
  Model     $\hat\beta_x$     sd, $\hat\sigma_W$    $\hat\beta_z$      sd, $\hat\omega_B$
  –         –                 1.59                  –                  1.23
  x         1.028 (0.046)     0.41                  –                  4.43
  z         –                 1.59                  −0.284 (0.201)     1.03
  x, z      1.033 (0.046)     0.41                  −1.316 (0.206)     1.14

SLIDE 77

Example: suicide and religion

Ecological analysis of regions: % suicides increases with % protestants, i.e. Protestants are more likely to commit suicide.

Or??

SLIDE 78

  level   unit          variation                        covariates
  1       individuals   within region, $\sigma_W^2$      religion, x
  2       regions       between regions, $\omega_B^2$    % protestants, z

True explanation: Interaction between individual effect (x) and region covariate (z).

More suicides among catholics, in regions with many protestants.