Analysis of variance and regression 2009-3-11 Lene Theil Skovgaard - PowerPoint PPT Presentation

Repeated Measurements, lts, 7-5-09

Recap: two-level model
◮ Observations $Y_{gdt}$ (group, dog, time)
◮ Systematic effect of time_no and grp
◮ Random dog level, $\mathrm{Var}(a_{gd}) = \omega_B^2$
◮ Residual variation within dogs, $\mathrm{Var}(\varepsilon_{gdt}) = \sigma_W^2$

    proc mixed data=dog;
      class grp time_no dog;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      random dog(grp);
    run;

31 / 114

The option ddfm=satterth (or ddfm=kenwardroger):
◮ When the distributions are exact, it has no effect
  ◮ i.e. in balanced situations
◮ When approximations are necessary, these two methods are considered the best
  ◮ in unbalanced situations, i.e. for almost all observational designs
  ◮ in case of missing observations
◮ It may give rise to fractional degrees of freedom
◮ The computations may require a little more time, but in most cases this will not be noticeable
◮ When in doubt, use it!

32 / 114

This model assumes so-called compound symmetry, i.e. that all measurements on the same individual are equally correlated:

$$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$$

This means that the distance in time is not taken into account! Observations on the same individual are exchangeable.

33 / 114

Two-level model with random dog level:

    Class      Levels   Values
    grp             2   1 2
    time_no         4   1 2 3 4
    dog            11   1 2 3 4 5 6 7 8 9 10 11

    Covariance Parameter Estimates
                             Standard       Z
    Cov Parm     Estimate       Error   Value     Pr Z
    dog(grp)      0.06587     0.03532    1.86   0.0311
    Residual      0.03554    0.009672    3.67   0.0001

    Type 3 Tests of Fixed Effects
                    Num   Den
    Effect           DF    DF   F Value   Pr > F
    grp               1     9      2.85   0.1257
    time_no           3    27     21.35   <.0001
    grp*time_no       3    27      2.50   0.0805

P=0.08 for the test of interaction, i.e. no convincing indication of an interaction.

34 / 114

Factor diagram:

[Figure: factor diagram with arrows connecting the factors [I] = [Dog*Time], [Dog], Grp*Time, Grp and Time]

We have used the notation [ ] for the random effects, corresponding to variance components. We may note the following:
◮ The effect of Grp*Time is evaluated against Dog*Time
◮ If Grp*Time is not considered significant, we thereafter evaluate
  ◮ Time against Dog*Time
  ◮ Grp against Dog(Grp)

35 / 114

The variance component model with random dog level specifies the covariance structure:

$$\begin{pmatrix}
\omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 \\
\omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2
\end{pmatrix}
= (\omega_B^2 + \sigma_W^2)
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}$$

called the compound symmetry structure. The correlation $\rho$ is here estimated as

$$\rho = \mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} \approx \frac{0.06587}{0.06587 + 0.03554} = 0.65$$

36 / 114
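The arithmetic behind the estimated correlation can be checked with a few lines of Python. This is only a sketch: the two variance components are the estimates reported for the dog example, while the function names are hypothetical helpers introduced for illustration.

```python
# Variance component estimates reported for the dog example.
omega2_B = 0.06587  # between-dog variance, Var(a_gd)
sigma2_W = 0.03554  # within-dog (residual) variance, Var(eps_gdt)


def cs_correlation(between, within):
    """Correlation between two measurements on the same dog (hypothetical helper)."""
    return between / (between + within)


def cs_covariance(between, within, n_times):
    """Compound-symmetry covariance matrix as a list of rows (hypothetical helper)."""
    return [
        [between + within if i == j else between for j in range(n_times)]
        for i in range(n_times)
    ]


rho = cs_correlation(omega2_B, sigma2_W)
V = cs_covariance(omega2_B, sigma2_W, 4)

print(f"rho = {rho:.2f}")  # about 0.65, as on the slide
for row in V:
    print("  ".join(f"{v:.5f}" for v in row))
```

Every off-diagonal entry of `V` equals the between-dog variance, which is why the implied correlation is the same for all pairs of time points.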

Note that the specification 'random dog(grp);' can be written in two other ways:

    random intercept / subject=dog(grp);

    repeated time_no / type=CS subject=dog(grp);

In the following, we shall see generalizations of the constructions above.

37 / 114

Compound symmetry analysis (just checking...)

    proc mixed data=dog;
      class grp time_no dog;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      repeated time_no / type=cs subject=dog(grp) rcorr;
    run;

    Estimated R Correlation Matrix for dog(grp) 1 1
    Row      Col1     Col2     Col3     Col4
      1    1.0000   0.6496   0.6496   0.6496
      2    0.6496   1.0000   0.6496   0.6496
      3    0.6496   0.6496   1.0000   0.6496
      4    0.6496   0.6496   0.6496   1.0000

    Covariance Parameter Estimates
    Cov Parm   Subject    Estimate
    CS         dog(grp)    0.06587
    Residual               0.03554

    Fit Statistics
    -2 Res Log Likelihood       14.8
    AIC (smaller is better)     18.8

    Type 3 Tests of Fixed Effects
                    Num   Den
    Effect           DF    DF   F Value   Pr > F
    grp               1     9      2.85   0.1257
    time_no           3    27     21.35   <.0001
    grp*time_no       3    27      2.50   0.0805

38 / 114

Since the interaction was not significant, we omit it from the model:

    Covariance Parameter Estimates
                             Standard       Z
    Cov Parm     Estimate       Error   Value     Pr Z
    dog(grp)      0.06453     0.03534    1.83   0.0339
    Residual      0.04088     0.01056    3.87   <.0001

    Solution for Fixed Effects
                                         Standard
    Effect      grp  time_no  Estimate      Error   DF  t Value  Pr > |t|
    Intercept                   0.5422     0.1235    9     4.39    0.0017
    grp         1               0.2795     0.1656    9     1.69    0.1257
    grp         2                    0          .    .        .         .
    time_no          1          0.1215    0.08621   30     1.41    0.1691
    time_no          2         -0.2173    0.08621   30    -2.52    0.0173
    time_no          3         -0.4608    0.08621   30    -5.35    <.0001
    time_no          4               0          .    .        .         .

    Type 3 Tests of Fixed Effects
                Num   Den
    Effect       DF    DF   F Value   Pr > F
    grp           1     9      2.85   0.1257
    time_no       3    30     17.66   <.0001

39 / 114

The variance component model (compound symmetry) with random dog level specifies the covariance structure:

$$(\omega_B^2 + \sigma_W^2)
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix}$$

But: the assumption of equal correlation for all pairs of observations taken on the same individual is not necessarily reasonable! Observations taken close to each other in time will often be more closely correlated than observations taken further apart!

40 / 114

In the dog example, the empirical correlation matrix is

$$\begin{pmatrix}
1 & 0.60 & 0.60 & 0.48 \\
0.60 & 1 & 0.73 & 0.63 \\
0.60 & 0.73 & 1 & 0.95 \\
0.48 & 0.63 & 0.95 & 1
\end{pmatrix}$$

Rather large differences are seen between the individual correlations. So what?

41 / 114

Unstructured covariance

If we do not assume any special structure for the covariance, we may let it be arbitrary = unstructured. This is done in MIXED by using 'type=UN' and remembering the option hlm:

    proc mixed data=dog;
      class grp dog time_no;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      repeated time_no / type=UN hlm subject=dog(grp) rcorr;
    run;

42 / 114

    Estimated R Correlation Matrix for dog(grp) 1 1
    Row      Col1     Col2     Col3     Col4
      1    1.0000   0.6010   0.5978   0.4817
      2    0.6010   1.0000   0.7310   0.6336
      3    0.5978   0.7310   1.0000   0.9464
      4    0.4817   0.6336   0.9464   1.0000

    Fit Statistics
    -2 Res Log Likelihood        2.3
    AIC (smaller is better)     22.3

    Type 3 Hotelling-Lawley-McKeon Statistics
                    Num   Den
    Effect           DF    DF   F Value   Pr > F
    time              3     7     90.97   <.0001
    grp*time_no       3     7      4.91   0.0381

43 / 114

Advantages of the unstructured covariance
◮ We do not force a wrong covariance structure upon our observations.
◮ We gain some insight into the actual structure of the covariance.

Drawbacks of the unstructured covariance
◮ We use quite a lot of parameters to describe the covariance structure. The result may therefore be unstable.
◮ It cannot be used for small data sets.
◮ It can only be used in case of balanced data (all subjects have to be measured at identical times).

Can we do something 'in between'?

44 / 114

Comparison of covariance structures, using the likelihood
◮ Good models have large values of the likelihood $L$ and therefore small values of the deviance $-2 \log L$
◮ Use differences in deviances ($\Delta = -2 \log Q$) and compare to a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters

Comparison of compound symmetry and unstructured covariance:

$$-2 \log Q = 14.8 - 2.3 = 12.5 \sim \chi^2(10 - 2) = \chi^2(8) \Rightarrow P = 0.13$$

45 / 114
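The comparison of compound symmetry with the unstructured covariance can be reproduced numerically. The sketch below assumes 4 time points (so the unstructured matrix has 4·5/2 = 10 parameters) and uses the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom; the function names are hypothetical.

```python
import math


def n_params_unstructured(n_times):
    """Number of free parameters in an unstructured n_times x n_times covariance."""
    return n_times * (n_times + 1) // 2


def chi2_sf_even(x, df):
    """P(chi-square_df > x) for EVEN df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


dev_cs, dev_un = 14.8, 2.3          # -2 Res Log Likelihood for CS and UN
delta = dev_cs - dev_un             # -2 log Q
df = n_params_unstructured(4) - 2   # CS has 2 covariance parameters
p_value = chi2_sf_even(delta, df)

print(f"-2 log Q = {delta:.1f} on {df} df, P = {p_value:.2f}")
```

With these inputs the P-value rounds to 0.13, in agreement with the slide.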

◮ The default likelihood is the REML likelihood, where the mean value structure has been 'eliminated'
◮ The traditional likelihood may be obtained using an extra option:

    proc mixed method=ml;

◮ Comparison of covariance structures: use either of the two likelihoods
◮ Comparison of mean value structures: use only the traditional likelihood (ML)

46 / 114

Autoregressive structure of first order

In case of equidistant times, this specifies the following covariance structure

$$\sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}$$

i.e. the correlation decreases (in powers) with the distance between observations. The non-equidistant analogue is

$$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho^{|t_1 - t_2|}$$

47 / 114

The autoregressive correlation pattern:

[Figure: the autoregressive correlation pattern]

48 / 114

Autoregressive structure of first order (TYPE=AR(1))

    Estimated R Correlation Matrix for dog(grp) 1 1
    Row      Col1     Col2     Col3     Col4
      1    1.0000   0.7950   0.6321   0.5025
      2    0.7950   1.0000   0.7950   0.6321
      3    0.6321   0.7950   1.0000   0.7950
      4    0.5025   0.6321   0.7950   1.0000

    Covariance Parameter Estimates
                                       Standard       Z
    Cov Parm   Subject    Estimate        Error   Value     Pr Z
    AR(1)      dog(dog)     0.7950      0.09035    8.80   <.0001
    Residual                0.1114      0.04188    2.66   0.0039

    Fit Statistics
    -2 Res Log Likelihood        9.8
    AIC (smaller is better)     13.8

    Type 3 Tests of Fixed Effects
                    Num    Den
    Effect           DF     DF   F Value   Pr > F
    grp               1   8.89      2.49   0.1497
    time_no           3   25.6     29.97   <.0001
    grp*time_no       3   25.6      2.94   0.0522

49 / 114
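As an illustration of how a single parameter generates the whole AR(1) correlation matrix, the following sketch rebuilds the matrix from the reported estimate 0.7950. The helper name is hypothetical, and small deviations in the last digit are expected because the estimate is rounded.

```python
def ar1_correlation(rho, n_times):
    """AR(1) correlation matrix for equidistant times: entry (i, j) is rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


rho_hat = 0.7950  # AR(1) estimate from the covariance parameter table
R = ar1_correlation(rho_hat, 4)

for row in R:
    print("  ".join(f"{v:.4f}" for v in row))
```

Lag 2 gives roughly 0.632 and lag 3 roughly 0.502, matching the printed R correlation matrix up to rounding.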

Note: comparison of models with different covariance structures using a $\chi^2$-test on $-2 \log Q$ (the difference between the $-2 \log L$'s) requires that the models are nested. This is not the case for CS and AR(1)!

Therefore, we have to compare both of them with the model which combines the two covariance structures:

    proc mixed data=dog;
      class grp dog time_no;
      model losmol = grp time_no grp*time_no / ddfm=satterth;
      random intercept / subject=dog(grp) vcorr;
      repeated time_no / type=AR(1) subject=dog(grp);
    run;

50 / 114

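In case of equidistant times, this combined model specifies the following covariance structure

$$\begin{pmatrix}
\omega^2 + \sigma^2 & \omega^2 + \sigma^2\rho & \omega^2 + \sigma^2\rho^2 & \omega^2 + \sigma^2\rho^3 \\
\omega^2 + \sigma^2\rho & \omega^2 + \sigma^2 & \omega^2 + \sigma^2\rho & \omega^2 + \sigma^2\rho^2 \\
\omega^2 + \sigma^2\rho^2 & \omega^2 + \sigma^2\rho & \omega^2 + \sigma^2 & \omega^2 + \sigma^2\rho \\
\omega^2 + \sigma^2\rho^3 & \omega^2 + \sigma^2\rho^2 & \omega^2 + \sigma^2\rho & \omega^2 + \sigma^2
\end{pmatrix}$$

51 / 114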

    Estimated V Correlation Matrix for dog(grp) 1 1
    Row      Col1     Col2     Col3     Col4
      1    1.0000   0.7930   0.6381   0.5222
      2    0.7930   1.0000   0.7930   0.6381
      3    0.6381   0.7930   1.0000   0.7930
      4    0.5222   0.6381   0.7930   1.0000

    Covariance Parameter Estimates
    Cov Parm   Subject    Estimate
    dog(grp)               0.01966
    AR(1)      dog(grp)     0.7483
    Residual               0.09103

    Fit Statistics
    -2 Res Log Likelihood        9.8
    AIC (smaller is better)     15.8

    Type 3 Tests of Fixed Effects
                    Num    Den
    Effect           DF     DF   F Value   Pr > F
    grp               1   8.88      2.49   0.1493
    time_no           3   17.2     29.53   <.0001
    grp*time_no       3   17.2      2.93   0.0633

52 / 114
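The V correlation matrix of the combined model follows from the three covariance parameter estimates: each covariance is the dog-level variance plus the residual variance times a power of the AR(1) parameter. A minimal sketch, with a hypothetical helper name and the reported estimates as inputs:

```python
def intercept_plus_ar1_correlation(omega2, sigma2, rho, n_times):
    """Marginal correlation for random intercept + AR(1) residuals.

    Covariance at lag k is omega2 + sigma2 * rho**k; the variance is omega2 + sigma2.
    """
    total = omega2 + sigma2
    return [
        [(omega2 + sigma2 * rho ** abs(i - j)) / total for j in range(n_times)]
        for i in range(n_times)
    ]


omega2_hat = 0.01966  # random dog level
rho_hat = 0.7483      # AR(1) parameter
sigma2_hat = 0.09103  # residual variance

Vcorr = intercept_plus_ar1_correlation(omega2_hat, sigma2_hat, rho_hat, 4)
for row in Vcorr:
    print("  ".join(f"{v:.4f}" for v in row))
```

The lag 1, 2 and 3 correlations come out close to 0.793, 0.638 and 0.522, in line with the printed matrix. Note that the correlation no longer decays to zero with increasing lag; it levels off at omega2 / (omega2 + sigma2).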

Comparison of covariance structures:

    Model                 -2 log L   cov. par.
    UN                         2.3          10
    both AR(1) and CS          9.8           3
    AR(1)                      9.8           2
    CS = random dog           14.8           2

    Comparison                        ∆ = -2 log Q   df       P
    UN vs. both AR(1) and CS                   7.5    7    0.38
    both AR(1) and CS vs. AR(1)                0.0    1    1.00
    both AR(1) and CS vs. CS                   5.0    1   0.025

Conclusions?
◮ The autoregressive structure is probably the best compromise.
◮ Our data set is too small!

53 / 114
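The three P-values in the comparison can be recomputed from the deviances and parameter counts. The sketch below uses only the standard library; `chi2_sf` is a hypothetical helper that relies on the finite-series expressions for the chi-square upper tail with integer degrees of freedom, and the pairing of models (each structure against the combined AR(1) and CS model, and that model against UN) is read off from the deviance differences 7.5, 0.0 and 5.0.

```python
import math


def chi2_sf(x, df):
    """P(chi-square_df > x) for a positive integer df, using finite series."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 0.0
        for j in range(df // 2):
            total += term
            term *= half / (j + 1)
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus 2*phi(sqrt(x)) times a finite series
    # with terms x^(r - 1/2) / (1 * 3 * ... * (2r - 1)), r = 1 .. (df - 1) / 2.
    tail = math.erfc(math.sqrt(half))
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2.0 * phi * total


# (-2 Res Log Likelihood, number of covariance parameters)
fits = {"UN": (2.3, 10), "AR(1)+CS": (9.8, 3), "AR(1)": (9.8, 2), "CS": (14.8, 2)}


def compare(small, large):
    """Likelihood ratio test of the smaller (nested) model within the larger one."""
    dev_s, k_s = fits[small]
    dev_l, k_l = fits[large]
    delta, df = dev_s - dev_l, k_l - k_s
    return delta, df, chi2_sf(delta, df)


results = {
    "AR(1)+CS within UN": compare("AR(1)+CS", "UN"),
    "AR(1) within AR(1)+CS": compare("AR(1)", "AR(1)+CS"),
    "CS within AR(1)+CS": compare("CS", "AR(1)+CS"),
}
for name, (delta, df, p) in results.items():
    print(f"{name}: delta = {delta:.1f}, df = {df}, P = {p:.3f}")
```

CS and AR(1) are never tested against each other directly, since neither is nested in the other; both are tested against the combined model.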

What if we had had double or triple measurements at each time?
◮ If we always have the same number of repetitions, a correct and optimal approach is to analyze averages
◮ If the number of repetitions varies, analysis of averages may still be valid (depending on the reason for the imbalance), although not optimal
◮ The easiest approach is to modify the random-statement to:

    random dog dog*time_no;

54 / 114

Actually, the times are not equidistant! Measurements are taken at 50, 110, 170 and 290 minutes. Then what?

The non-equidistant analogue to the autoregressive structure is

$$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho^{|t_1 - t_2|}$$

which is written as TYPE=SP(POW)(time). The code below uses type=sp(exp)(hours), which describes the same structure in the parametrization $\exp(-|t_1 - t_2| / \theta)$, i.e. with $\rho = \exp(-1/\theta)$.

For technical reasons, we have to rescale time to hours=time/60.

    proc mixed covtest data=dog;
      class grp hours dog;
      model losmol = grp hours grp*hours / s ddfm=satterth;
      repeated hours / subject=dog(grp) type=sp(exp)(hours) rcorr;
    run;

55 / 114

    Class Level Information
    Class    Levels   Values
    grp           2   1 2
    hours         4   0.8333333333 1.8333333333 2.8333333333 4.8333333333
    dog          11   1 2 3 4 5 6 7 8 9 10 11

    Estimated R Matrix for dog(grp) 1 1
    Row      Col1     Col2     Col3     Col4
      1    1.0000   0.8064   0.6502   0.4228
      2    0.8064   1.0000   0.8064   0.5243
      3    0.6502   0.8064   1.0000   0.6502
      4    0.4228   0.5243   0.6502   1.0000

    Type 3 Tests of Fixed Effects
                  Num    Den
    Effect         DF     DF   F Value   Pr > F
    grp             1   9.31      2.56   0.1433
    hours           3   25.5     23.23   <.0001
    grp*hours       3   25.5      2.78   0.0614

56 / 114
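To see how the unequal spacing enters, the sketch below rebuilds the correlation matrix from the measurement times and the one-hour correlation 0.8064 read off the printed matrix. The helper names are hypothetical; the exponential parametrization exp(-d/theta) is included only to show that it gives the same numbers as the power form.

```python
import math


def spatial_power_correlation(times, rho):
    """Correlation rho^|s - t| for arbitrary (possibly unequally spaced) times."""
    return [[rho ** abs(s - t) for t in times] for s in times]


def spatial_exp_correlation(times, theta):
    """Same structure written as exp(-|s - t| / theta)."""
    return [[math.exp(-abs(s - t) / theta) for t in times] for s in times]


minutes = [50, 110, 170, 290]
hours = [m / 60 for m in minutes]   # rescaled time, as on the slide

rho_per_hour = 0.8064               # correlation at a distance of one hour
theta = -1.0 / math.log(rho_per_hour)

R_pow = spatial_power_correlation(hours, rho_per_hour)
R_exp = spatial_exp_correlation(hours, theta)

for row in R_pow:
    print("  ".join(f"{v:.4f}" for v in row))
```

The last time point is two hours after the third, so the entry in row 3, column 4 equals the lag-two value (about 0.650) rather than the lag-one value.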

Example: Calcium supplement or placebo for adolescent women, to improve the rate of bone gain

Outcome: BMD = bone mineral density, in g/cm², measured every 6 months (5 visits)

57 / 114

Factor diagram:

[Figure: factor diagram with arrows connecting the factors [I] = [Girl*Visit], [Girl], Grp*Visit, Grp and Visit]

Two-level model with:
◮ Observations $Y_{git}$ (group=g, girl=individual=i, visit=time=t)
◮ Systematic (fixed) effects of group and visit, with a possible interaction
◮ Random girl level, $\mathrm{Var}(a_{gi}) = \omega_B^2$
◮ Residual variation within girls, $\mathrm{Var}(\varepsilon_{git}) = \sigma_W^2$

58 / 114

Variance component model (same as for the dog example):

$$Y_{git} = \mu + \alpha_g + \beta_t + \gamma_{gt} + a_{gi} + \varepsilon_{git}$$

where

$$\mathrm{Var}(a_{gi}) = \omega_B^2, \qquad \mathrm{Var}(\varepsilon_{git}) = \sigma_W^2$$

As before, we have assumed compound symmetry, i.e. that all measurements on the same girl are equally correlated:

$$\mathrm{Corr}(Y_{git_1}, Y_{git_2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$$

59 / 114

Empirical correlation structure:

    Row         COL1         COL2         COL3         COL4         COL5
      1   1.00000000   0.96987049   0.94138162   0.92499715   0.89865454
      2   0.96987049   1.00000000   0.97270895   0.95852788   0.93987185
      3   0.94138162   0.97270895   1.00000000   0.98090996   0.95919348
      4   0.92499715   0.95852788   0.98090996   1.00000000   0.97553849
      5   0.89865454   0.93987185   0.95919348   0.97553849   1.00000000

Is compound symmetry reasonable? Other possibilities:
◮ Unstructured: $T(T+1)/2 = 15$ covariance parameters ($T = 5$)
◮ 'Patterned', e.g. an autoregressive structure
◮ Random regression

60 / 114

Compound symmetry results for the calcium example:

    Covariance Parameter Estimates (REML)
    Cov Parm       Estimate
    GIRL(GRP)    0.00443925
    Residual     0.00023471

    Tests of Fixed Effects
    Source       NDF   DDF   Type III F   Pr > F
    GRP            1   110         2.63   0.1078
    VISIT          4   382       619.42   0.0001
    GRP*VISIT      4   382         5.30   0.0004

No doubt, we see an interaction GRP*VISIT, or?

61 / 114

Results: Autoregressive covariance structure:

$$\sigma^2
\begin{pmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4 \\
\rho & 1 & \rho & \rho^2 & \rho^3 \\
\rho^2 & \rho & 1 & \rho & \rho^2 \\
\rho^3 & \rho^2 & \rho & 1 & \rho \\
\rho^4 & \rho^3 & \rho^2 & \rho & 1
\end{pmatrix}$$

    Covariance Parameter Estimates (REML)
    Cov Parm   Subject        Estimate
    AR(1)      GIRL(GRP)    0.97083335
    Residual                0.00441242

    Tests of Fixed Effects
    Source       NDF   DDF   Type III F   Pr > F
    GRP            1   110         2.74   0.1005
    VISIT          4   381       233.91   0.0001
    GRP*VISIT      4   381         2.86   0.0232

62 / 114

Comparison of test results for the test of no interaction GRP*VISIT:

    Covariance structure    Test statistic ∼ distribution    P value
    Independence            0.35 ∼ F(4,491)                  0.84
    Compound symmetry       5.30 ∼ F(4,382)                  0.0004
    Autoregressive          2.86 ∼ F(4,382)                  0.023
      + local               2.90 ∼ F(4,205)                  0.023
    Unstructured            2.72 ∼ F(4,107)                  0.034

63 / 114

Predicted mean time profiles are almost identical for all choices of covariance structure
◮ For balanced designs, they agree completely and equal the simple averages
◮ They agree for time points with no missing values (here the first visit)

64 / 114

Repeated Measurements, lts, 7-5-09 Predicted profiles for the unstructured covariance: ◮ The evolution over time looks pretty linear ◮ Include time=visit as a quantitative covariate? ◮ What about the baseline difference? 65 / 114

Test of linear time trend (time=visit, not included in the class-statement):

    proc mixed data=calcium;
      class grp girl visit;
      model bmd = grp time grp*time visit grp*visit / ddfm=satterth;
      repeated visit / type=UN subject=girl(grp);
    run;

    Type 3 Tests of Fixed Effects
                  Num    Den
    Effect         DF     DF   F Value   Pr > F
    grp             1    110      0.36   0.5485
    time            0      .         .        .
    time*grp        0      .         .        .
    visit           3   97.7      3.61   0.0160
    grp*visit       3   97.7      1.03   0.3849   <-- not significant

66 / 114

    proc mixed data=calcium;
      class grp girl visit;
      model bmd = grp time grp*time visit / s ddfm=satterth;
      repeated visit / type=UN subject=girl(grp) r;
    run;

    Solution for Fixed Effects
                                       Standard
    Effect      grp  visit  Estimate      Error     DF  t Value  Pr > |t|
    Intercept                 0.8699    0.01220    138    71.29    <.0001
    grp         C           0.006565    0.01131    109     0.58    0.5629
    grp         P                  0          .      .        .         .
    time                     0.01755   0.001825    118     9.62    <.0001
    time*grp    C           0.004330   0.001520   97.2     2.85    0.0054
    time*grp    P                  0          .      .        .         .
    visit            1      -0.01765   0.006013   95.8    -2.94    0.0042
    visit            2      -0.01384   0.004246   95.1    -3.26    0.0016
    visit            3      -0.00680   0.002370   93.6    -2.87    0.0050
    visit            4             0          .      .        .         .
    visit            5             0          .      .        .         .

67 / 114

    Type 3 Tests of Fixed Effects
                 Num    Den
    Effect        DF     DF   F Value   Pr > F
    grp            1    109      0.34   0.5629
    time           0      .         .        .
    time*grp       1   97.2      8.12   0.0054
    visit          3   98.8      3.65   0.0151

There is some deviation from linearity (P=0.0151), which we ought to investigate further:
◮ Tendency to slower increase with time
◮ Transformation, etc.

68 / 114

The time course is reasonably linear, but maybe the girls have different growth rates (slopes)?

If we let $Y_{git}$ denote BMD for the i'th girl (in the g'th group) at time t (t = 1, ..., 5), we could look at the model:

$$y_{git} = a_{gi} + b_{gi} t + \varepsilon_{git}, \qquad \varepsilon_{git} \sim N(0, \sigma_W^2)$$

But we cannot allow all these fixed parameters in the model!

69 / 114

We might fit a straight line for each girl:

[Figure: individual regression lines for each girl, by group]

Slopes in the calcium group seem to be bigger...

70 / 114

Results from individual regression:

    Group         level at age 11      slope
    P             0.8697 (0.0086)      0.0206 (0.0014)
    C             0.8815 (0.0088)      0.0244 (0.0014)
    difference    0.0118 (0.0123)      0.0039 (0.0019)
    P value       0.34                 0.050

71 / 114

We generalize the idea of a random level to random regression:

We let each individual (girl) have
◮ her own level $a_{gi}$
◮ her own slope $b_{gi}$
but

72 / 114

... we bind these individual 'parameters' ($a_{gi}$ and $b_{gi}$) together by normal distributions

$$\begin{pmatrix} a_{gi} \\ b_{gi} \end{pmatrix} \sim N_2\left( \begin{pmatrix} \alpha_g \\ \beta_g \end{pmatrix}, G \right), \qquad
G = \begin{pmatrix} \tau_a^2 & \omega \\ \omega & \tau_b^2 \end{pmatrix}
= \begin{pmatrix} \tau_a^2 & \rho\tau_a\tau_b \\ \rho\tau_a\tau_b & \tau_b^2 \end{pmatrix}$$

G describes the population variation of the lines, i.e. the inter-individual variation (as seen on p. 70).

73 / 114

We estimate in this model by writing:

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp time time*grp / ddfm=satterth s;
      random intercept time / type=un subject=girl(grp) g v vcorr;
    run;

    Estimated G Matrix
    Row   Effect      grp   girl       Col1       Col2
      1   Intercept   C     101    0.004105   3.733E-6
      2   time        C     101    3.733E-6   0.000048

    Estimated V Matrix for girl(grp) 101 C
    Row       Col1       Col2       Col3       Col4       Col5
      1   0.004285   0.004211   0.004263   0.004314   0.004366
      2   0.004211   0.004435   0.004410   0.004509   0.004608
      3   0.004263   0.004410   0.004681   0.004703   0.004850
      4   0.004314   0.004509   0.004703   0.005022   0.005092
      5   0.004366   0.004608   0.004850   0.005092   0.005459

74 / 114

    Estimated V Correlation Matrix for girl(grp) 101 C
    Row      Col1     Col2     Col3     Col4     Col5
      1    1.0000   0.9660   0.9518   0.9300   0.9027
      2    0.9660   1.0000   0.9677   0.9553   0.9364
      3    0.9518   0.9677   1.0000   0.9700   0.9594
      4    0.9300   0.9553   0.9700   1.0000   0.9725
      5    0.9027   0.9364   0.9594   0.9725   1.0000

    Covariance Parameter Estimates
                                        Standard       Z
    Cov Parm   Subject      Estimate       Error   Value     Pr Z
    UN(1,1)    girl(grp)    0.004105    0.000575    7.13   <.0001
    UN(2,1)    girl(grp)    3.733E-6    0.000053    0.07   0.9435
    UN(2,2)    girl(grp)    0.000048    8.996E-6    5.30   <.0001
    Residual                0.000125    0.000010   11.99   <.0001

    Fit Statistics
    -2 Res Log Likelihood      -2341.6
    AIC (smaller is better)    -2333.6

75 / 114
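The V matrix printed for the random regression model is determined by the G matrix and the residual variance. For observations at times s and t, the implied covariance is $\tau_a^2 + \omega(s + t) + \tau_b^2 st$, plus $\sigma_W^2$ on the diagonal. The sketch below assumes time = 1, ..., 5 (the visit numbers) and uses the rounded estimates from the output, so agreement is only to about five decimals; the function name is hypothetical.

```python
import math

# Rounded estimates from the output: G matrix (intercept, slope) and residual variance.
G = [[0.004105, 3.733e-6],
     [3.733e-6, 0.000048]]
sigma2_W = 0.000125
times = [1, 2, 3, 4, 5]  # assumed coding of time as visit number


def marginal_covariance(G, sigma2, times):
    """V = Z G Z' + sigma2 * I for a random intercept and slope, with Z rows (1, t)."""
    n = len(times)
    V = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(times):
        for j, t in enumerate(times):
            V[i][j] = G[0][0] + G[0][1] * (s + t) + G[1][1] * s * t
            if i == j:
                V[i][j] += sigma2
    return V


V = marginal_covariance(G, sigma2_W, times)
corr_intercept_slope = G[0][1] / math.sqrt(G[0][0] * G[1][1])

for row in V:
    print("  ".join(f"{v:.6f}" for v in row))
print(f"Corr(intercept, slope) = {corr_intercept_slope:.4f}")
```

Because of the $\tau_b^2 st$ term the variance grows with time, which is visible along the diagonal of the printed V matrix. The last line gives the correlation between intercept and slope of about 0.0084.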

    Solution for Fixed Effects
                                 Standard
    Effect      grp   Estimate      Error     DF  t Value  Pr > |t|
    Intercept           0.8471   0.008645    110    97.98    <.0001
    grp         C     0.007058    0.01234    110     0.57    0.5685
    grp         P            0          .      .        .         .
    time               0.02242   0.001098   95.8    20.42    <.0001
    time*grp    C     0.004494   0.001571   96.4     2.86    0.0052
    time*grp    P            0          .      .        .         .

    Type 3 Tests of Fixed Effects
                 Num    Den
    Effect        DF     DF   F Value   Pr > F
    grp            1    110      0.33   0.5685
    time           1   96.4    985.55   <.0001
    time*grp       1   96.4      8.18   0.0052

Thus, we find an extra increase in BMD of 0.0045 (0.0016) g per cm² per half year when giving calcium supplement.

76 / 114

Note concerning MIXED-notation
◮ It is necessary to use TYPE=UN in the RANDOM-statement in order to allow intercept and slope to be arbitrarily correlated
◮ The default option in RANDOM is TYPE=VC, which only specifies variance components with different variances, i.e. it takes intercept and slope to be uncorrelated
◮ If TYPE=UN is omitted, we may experience convergence problems and sometimes totally incomprehensible results.

In this particular case, the correlation between intercept and slope is not that impressive, actually only 0.0084 (the intercept refers to visit=0, which is not completely out of range in this example).

77 / 114

It turns out that
◮ the girls are only seen approximately twice a year

The actual dates are available and are translated into ctime, the internal date representation in SAS, denoting days since ....

We can no longer use the construction type=UN, but we can still use the random-statement and the CS structure in the repeated-statement.

A lot of other covariance structures will still be possible, e.g. the generalization of the autoregressive type=AR(1), as we already used for the dog example: type=SP(POW)(ctime)

78 / 114

Furthermore,
◮ the girls were not precisely 11 years old at the first visit

As a covariate, we ought to have the specific age of each girl, but unfortunately this is not available. Note that this will mostly affect the intercept estimates!

79 / 114

If we use the newly constructed ctime as covariate:

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp ctime ctime*grp / ddfm=satterth s;
      random intercept ctime / type=un subject=girl(grp) g;
    run;

    Iteration History
    Iteration   Evaluations   -2 Res Log Like     Criterion
            0             1    -1221.35800531
            1             2    -2316.64715219    0.02023229
            2             1    -2316.64847895    0.02011117
            3             1    -2316.64847962    0.02010938
            4             1    -2316.64848338    0.02010936
          ...
           47             1    -2317.30142024    0.01737561
           48             1    -2317.30142030    0.01737561
           49             1    -2317.30142036    0.01737561
           50             1    -2317.30142043    0.01737561

    WARNING: Did not converge.

80 / 114

The variable ctime has much too large values, with a very small range, and we get numerical instability. We normalise, to approximate age or age11:

    age=(ctime-11475)/365.25+12;
    age11=age-11;   /* intercept at age 11 */

    Variable     N          Mean       Minimum       Maximum
    -------------------------------------------------------
    ctime      501      11475.08      11078.00      11931.00
    bmd        501     0.9219202     0.7460000     1.1260000
    visit      560     3.0000000     1.0000000     5.0000000
    age        501    12.0002186    10.9130732    13.2484600
    age11      501     1.0002186    -0.0869268     2.2484600
    -------------------------------------------------------

81 / 114
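The rescaling is a simple linear transformation, and the summary table can be used to check it. The sketch below mirrors the two SAS assignments; the Python function names are hypothetical, and the constants 11475, 365.25, 12 and 11 are taken from the slide.

```python
def ctime_to_age(ctime, ref_day=11475, ref_age=12.0):
    """Approximate age in years from the SAS day count, as in age=(ctime-11475)/365.25+12."""
    return (ctime - ref_day) / 365.25 + ref_age


def age_to_age11(age):
    """Shift the origin so that the intercept refers to age 11."""
    return age - 11.0


for ctime in (11078.0, 11475.08, 11931.0):  # minimum, mean, maximum of ctime
    age = ctime_to_age(ctime)
    print(f"ctime = {ctime:9.2f}  age = {age:.7f}  age11 = {age_to_age11(age):.7f}")
```

After the transformation the covariate ranges over roughly two years around zero instead of taking values near 11500, which is what removes the numerical instability.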

Random regression, covariate age:

$$y_{gpt} = a_{gp} + b_{gp}(\mathrm{age} - 11) + \varepsilon_{gpt}$$

82 / 114

Random regression, using actual age (age11=age-11):

    proc mixed covtest data=calcium;
      class grp girl;
      model bmd = grp age11 age11*grp / ddfm=satterth s outpm=predicted_mean;
      random intercept age11 / type=un subject=girl(grp) g vcorr;
    run;

    Estimated G Matrix
    Row   Effect      grp   girl       Col1       Col2
      1   Intercept   C     101    0.004215   0.000095
      2   age11       C     101    0.000095   0.000180

    Estimated V Correlation Matrix for girl(grp) 101 C
    Row      Col1     Col2     Col3     Col4     Col5
      1    1.0000   0.9664   0.9537   0.9321   0.9056
      2    0.9664   1.0000   0.9687   0.9566   0.9385
      3    0.9537   0.9687   1.0000   0.9697   0.9590
      4    0.9321   0.9566   0.9697   1.0000   0.9723
      5    0.9056   0.9385   0.9590   0.9723   1.0000

83 / 114

    Covariance Parameter Estimates
                                        Standard       Z
    Cov Parm   Subject      Estimate       Error   Value     Pr Z
    UN(1,1)    girl(grp)    0.004215    0.000580    7.26   <.0001
    UN(2,1)    girl(grp)    0.000095    0.000104    0.91   0.3617
    UN(2,2)    girl(grp)    0.000180    0.000034    5.21   <.0001
    Residual                0.000124    0.000010   12.01   <.0001

    Solution for Fixed Effects
                                  Standard
    Effect       grp   Estimate      Error     DF  t Value  Pr > |t|
    Intercept            0.8667   0.008688    110    99.75    <.0001
    grp          C      0.01113    0.01240    110     0.90    0.3715   <--
    grp          P            0          .      .        .         .
    age11               0.04529   0.002152     96    21.05    <.0001
    age11*grp    C     0.008891   0.003076   96.6     2.89    0.0048   <--
    age11*grp    P            0          .      .        .         .

In this model, we quantify the effect of a calcium supplement as 0.0089 (0.0031) g per cm² per year.

84 / 114

Results from random regression:

    Group         level at age 11      slope
    P             0.8667 (0.0087)      0.0453 (0.0022)
    C             0.8778 (0.0088)      0.0542 (0.0022)
    difference    0.0111 (0.0124)      0.0089 (0.0031)
    P value       0.37                 0.0048

Compare to the results from individual regressions (page 71).

85 / 114

Comparison of slopes for different covariance structures:

    Covariance structure            -2 log L   cov.par.       AIC   Difference in slopes   P
    Independence                     -1245.0          1   -1243.0   0.0094 (0.0086)        0.27
    Compound symmetry                -2251.7          2   -2247.7   0.0089 (0.0020)        < 0.0001
    Exponential (autoregressive)     -2372.0          2   -2368.0   0.0094 (0.0032)        0.0038
    Random regression                -2350.1          4   -2342.1   0.0089 (0.0031)        0.0048

86 / 114
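The AIC column in this comparison is consistent with the usual REML-based definition, AIC = −2 log L + 2 × (number of covariance parameters). The following sketch recomputes it from the other two columns; the dictionary and function names are hypothetical, and the numbers are those of the table.

```python
def reml_aic(minus2_loglik, n_cov_params):
    """AIC based on the (restricted) likelihood, counting covariance parameters only."""
    return minus2_loglik + 2 * n_cov_params


# (-2 log L, number of covariance parameters) for each covariance structure
fits = {
    "Independence": (-1245.0, 1),
    "Compound symmetry": (-2251.7, 2),
    "Exponential (autoregressive)": (-2372.0, 2),
    "Random regression": (-2350.1, 4),
}

aic = {name: reml_aic(dev, k) for name, (dev, k) in fits.items()}
best = min(aic, key=aic.get)

for name, value in aic.items():
    print(f"{name:30s} AIC = {value:.1f}")
print("Smallest AIC:", best)
```

Since all four models have the same mean value structure, their REML likelihoods, and hence these AIC values, are comparable. The smallest AIC belongs to the exponential structure.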

Repeated Measurements, lts, 7-5-09 Predicted values from random regression It looks as if there is a difference right from the start (although we have previously seen this to be insignificant , P=0.37). Baseline adjustment? 87 / 114

If the first visit is a baseline measurement (which it is), and randomization has been performed:
◮ The two groups are known to be equal at baseline
◮ To include this measurement in the comparison between groups
  ◮ may weaken a possible difference between these (type 2 error)
  ◮ may convert a treatment effect to an interaction
◮ Dissimilarities may be present in small studies
◮ For 'slowly varying' outcomes, even a small difference may produce non-treatment related differences, i.e. bias

88 / 114

Simulated situation:
◮ Baseline + 2 follow-ups
◮ 2 groups, 20 individuals in each group
◮ Group 1: constant 5
◮ Group 2: 5 at baseline, 6 at follow-ups
◮ Correlation between repeated measurements: 0.9

89 / 114

Analysis of all 3 times:

    proc mixed;
      class grp time individual;
      model outcome = grp time grp*time / ddfm=satterth s;
      random individual(grp);
    run;

    Type 3 Tests of Fixed Effects
                 Num   Den
    Effect        DF    DF   F Value   Pr > F
    grp            1    38      1.08   0.3059
    time           2    76     25.95   <.0001
    grp*time       2    76     16.86   <.0001

◮ Simulation of constant group difference
◮ Finding: Significant interaction!

90 / 114

Analysis of follow-up times, with baseline as covariate:

    proc mixed;
      where time>1;
      class grp time individual;
      model outcome = baseline grp time grp*time / ddfm=satterth s;
      random individual(grp);
    run;

    Solution for Fixed Effects
                                      Standard
    Effect      grp  time  Estimate      Error     DF  t Value  Pr > |t|
    Intercept                1.5769     0.4366   37.8     3.61    0.0009
    baseline                 0.8743    0.08197     37    10.67    <.0001
    grp         1           -0.7825     0.1642   49.6    -4.76    <.0001
    grp         2                 0          .      .        .         .
    time             2      -0.1516    0.08975     38    -1.69    0.0994
    time             3            0          .      .        .         .
    grp*time    1    2      0.07651     0.1269     38     0.60    0.5502
    grp*time    1    3            0          .      .        .         .
    grp*time    2    2            0          .      .        .         .
    grp*time    2    3            0          .      .        .         .

    Type 3 Tests of Fixed Effects
                 Num   Den
    Effect        DF    DF   F Value   Pr > F
    baseline       1    37    113.76   <.0001
    grp            1    37     24.14   <.0001
    time           1    38      3.19   0.0821
    grp*time       1    38      0.36   0.5502

91 / 114

Approaches for handling baseline differences:
◮ Use follow-up data only (exclude baseline from the analysis)
  - most reasonable if the correlation between repeated measurements is very low
◮ Subtract baseline from successive measurements
  - most reasonable if the correlation between repeated measurements is very high
◮ Use the baseline measurement as a covariate
  - may be used for any degree of correlation

92 / 114

Baseline included as a covariate
◮ will hardly change the results for the slopes
  - since these are within-individual quantities
  A small change is expected because of the exclusion of visit 1 from the analysis, and because slope is correlated with....
◮ may affect the difference between groups at fixed ages
  - e.g. an endpoint age of 13 years

93 / 114

Using age13 as covariate:

    proc mixed covtest noclprint data=calcium;
      class grp girl;
      model bmd = grp age13 grp*age13 / ddfm=satterth s;
      random intercept age13 / type=un subject=girl(grp) g;
    run;

    Solution for Fixed Effects
                                  Standard
    Effect       grp   Estimate      Error     DF  t Value  Pr > |t|
    Intercept            0.9573   0.009819    108    97.49    <.0001
    grp          C      0.02891    0.01402    108     2.06    0.0416
    grp          P            0          .      .        .         .
    age13               0.04529   0.002152     96    21.05    <.0001
    age13*grp    C     0.008891   0.003076   96.6     2.89    0.0048
    age13*grp    P            0          .      .        .         .

Estimated gain at age 13: 0.0289 (0.0140) g per cm²

94 / 114
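Moving the origin of the age covariate from 11 to 13 leaves the slopes unchanged and shifts each level by two years' worth of slope. This can be checked against the age11 fit reported earlier (placebo level 0.8667, group difference 0.01113, placebo slope 0.04529, slope difference 0.008891). The function name below is a hypothetical helper.

```python
def shift_origin(level, slope, years):
    """Level of a straight line after moving the origin `years` forward."""
    return level + slope * years


# Estimates from the random regression with age11 = age - 11
level_p_11 = 0.8667     # placebo level at age 11 (Intercept)
diff_11 = 0.01113       # calcium minus placebo at age 11 (grp C)
slope_p = 0.04529       # placebo slope per year (age11)
slope_diff = 0.008891   # extra slope in the calcium group (age11*grp C)

level_p_13 = shift_origin(level_p_11, slope_p, 2)
diff_13 = shift_origin(diff_11, slope_diff, 2)

print(f"Placebo level at age 13:    {level_p_13:.4f}")
print(f"Group difference at age 13: {diff_13:.4f}")
```

Both values agree with the Intercept and grp C rows of the age13 output to the printed precision. The standard error of the difference does change (0.0140 instead of 0.0124) because the uncertainty of the slope difference is carried forward two years.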

Excluding baseline (4 visits only), without baseline as covariate:

    proc mixed covtest noclprint data=calcium;
      where visit>1;
      class grp girl;
      model bmd = grp age13 grp*age13 / ddfm=satterth s;
      random intercept age13 / type=un subject=girl(grp) g;
    run;

    Solution for Fixed Effects
                                  Standard
    Effect       grp   Estimate      Error     DF  t Value  Pr > |t|
    Intercept            0.9574   0.009721    102    98.49    <.0001
    grp          C      0.02474    0.01383    102     1.79    0.0765
    grp          P            0          .      .        .         .
    age13               0.04634   0.002288   92.3    20.25    <.0001
    age13*grp    C     0.007456   0.003277   92.5     2.28    0.0252
    age13*grp    P            0          .      .        .         .

Estimated gain at age 13: 0.0247 (0.0138) g per cm²

95 / 114

Including baseline as covariate

    proc mixed covtest noclprint data=calcium;
      where visit>1;
      class grp girl;
      model bmd = baseline grp age13 grp*age13 / ddfm=satterth s;
      random intercept age13 / type=un subject=girl(grp) g;
    run;

    Solution for Fixed Effects
                                  Standard
    Effect       grp   Estimate      Error     DF  t Value  Pr > |t|
    Intercept           0.01825    0.02690    106     0.68    0.4989
    baseline             1.0797    0.03054    102    35.36    <.0001
    grp          C      0.01728   0.006236    101     2.77    0.0067
    grp          P            0          .      .        .         .
    age13               0.04597   0.002287   93.1    20.11    <.0001
    age13*grp    C     0.007419   0.003276   93.2     2.26    0.0258
    age13*grp    P            0          .      .        .         .

Estimated gain at age 13: 0.0173 (0.0062) g per cm²

96 / 114

Including baseline as covariate
◮ explains some (but not all) of the difference between groups at age 13
    without baseline:       0.0247 (0.0138)
    baseline as covariate:  0.0173 (0.0062)
◮ increases the precision of the estimated difference (the standard error becomes smaller)

It even becomes significant!

97 / 114

Specification of mixed models:
◮ Systematic variation:
  ◮ Between-individual covariates: treatment, sex, age, baseline value...
  ◮ Within-individual covariates: time, cumulative dose, temperature...
  is specified "as usual"
◮ Random variation
  ◮ Interactions between systematic and random effects are always random

98 / 114

Sources of random variation:
1. Random effects:
2. Serial correlation:
3. Measurement error:

99 / 114

SAS, PROC MIXED
◮ model describes the systematic part (fixed effects, mean value structure)
◮ random describes the random effects
◮ repeated describes the serial correlation
◮ local adds an additional measurement error

100 / 114
