Workshop 8.2a: Heterogeneity, Murray Logan, 23 Jul 2016


Section 1 Linear modelling assumptions

Assumptions
$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$

Linear modelling assumptions
$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$
• Linearity
• Normality
• Homogeneity of variance
• Zero covariance (=independence)
$$V = \mathrm{cov} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \sigma^2 \end{pmatrix}$$

Dealing with Heterogeneity
    y    x
 41.9    1
 48.5    2
 43.0    3
 51.4    4
 51.2    5
 37.7    6
 50.7    7
 65.1    8
 51.7    9
 38.9   10
 70.6   11
 51.4   12
 62.7   13
 34.9   14
 95.3   15
 63.9   16

Dealing with Heterogeneity
> data1 <- read.csv('../data/D1.csv')
> summary(data1)
       x               y
 Min.   : 1.00   Min.   :34.90
 1st Qu.: 4.75   1st Qu.:42.73
 Median : 8.50   Median :51.30
 Mean   : 8.50   Mean   :53.68
 3rd Qu.:12.25   3rd Qu.:63.00
 Max.   :16.00   Max.   :95.30
$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$
• estimate $\beta_0$, $\beta_1$ and $\sigma^2$
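The estimation step above can be sketched with an ordinary least squares fit. This is a minimal illustration, not the workshop's own code: the data frame is typed in from the y/x table on the earlier slide rather than read from `../data/D1.csv`, and the object names `data1.lm`, `beta` and `sigma2` are assumptions made for the example.

```r
# Data transcribed from the y/x table on the earlier slide
data1 <- data.frame(
  y = c(41.9, 48.5, 43, 51.4, 51.2, 37.7, 50.7, 65.1,
        51.7, 38.9, 70.6, 51.4, 62.7, 34.9, 95.3, 63.9),
  x = 1:16
)

# Ordinary least squares fit of y_i = beta0 + beta1 * x_i + e_i
data1.lm <- lm(y ~ x, data = data1)

beta   <- coef(data1.lm)               # estimates of beta0 and beta1
sigma2 <- summary(data1.lm)$sigma^2    # estimate of sigma^2 (residual variance)
```

Under the homogeneity assumption, a single value `sigma2` describes the spread of every observation around the fitted line.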

Dealing with Heterogeneity

Dealing with Heterogeneity   σ 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0      0 0 0 0 0    σ 2  0 0 0 0 0 0 0 0 0     0 0 0 0 0    σ 2  0 0 0 0 0 0 0 0 0      0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0     0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0     0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0     0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0   V = cov =   0 0 0 0 0     σ 2 0 0 0 0 0 0 0 0 0       Variance-covariance matrix

Dealing with Heterogeneity
$y_i = \beta_0 + \beta_1 \times x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$
• Linearity
• Normality
• Homogeneity of variance
• Zero covariance (=independence)
$$V = \sigma^2 \times \underbrace{\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 1 \end{pmatrix}}_{\text{Identity matrix}} = \underbrace{\begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \sigma^2 \end{pmatrix}}_{\text{Variance-covariance matrix}}$$

Dealing with Heterogeneity
[Figure: scatterplot of y against x]
• variance proportional to X
• variance inversely proportional to X

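Dealing with Heterogeneity
• variance inversely proportional to X
$$V = \sigma^2 \times \frac{1}{\sqrt{X}} \times \underbrace{\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 1 \end{pmatrix}}_{\text{Identity matrix}} = \underbrace{\begin{pmatrix} \sigma^2 \times \frac{1}{\sqrt{X_1}} & 0 & \cdots & 0 \\ 0 & \sigma^2 \times \frac{1}{\sqrt{X_2}} & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \sigma^2 \times \frac{1}{\sqrt{X_n}} \end{pmatrix}}_{\text{Variance-covariance matrix}}$$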

Dealing with Heterogeneity   1 0 · · · 0 √ X 1 .   . 1 · · · 0 .   √ V = σ 2 × ω ,  X 2  where ω =  . .  . .  1  · · · . . √   X i 1 · · · · · · 0 √ X n � �� Weights matrix

Dealing with Heterogeneity
Calculating weights
> 1/sqrt(data1$x)
 [1] 1.0000000 0.7071068 0.5773503 0.5000000 0.4472136 0.4082483 0.3779645 0.3535534 0.3333333
[10] 0.3162278 0.3015113 0.2886751 0.2773501 0.2672612 0.2581989 0.2500000
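As a small sketch of how these weights fill the weights matrix ω from the previous slide, the vector can be placed on the diagonal of an otherwise zero matrix. The value of `sigma2` below is hypothetical and chosen only for illustration; the slides do not give it at this point.

```r
x <- 1:16                       # covariate values from the data table
sigma2 <- 2                     # hypothetical sigma^2, for illustration only

omega <- diag(1 / sqrt(x))      # weights matrix: 1/sqrt(x_i) on the diagonal
V <- sigma2 * omega             # V = sigma^2 * omega, zero off the diagonal
```

Because every off-diagonal element stays zero, this structure changes only the variances and leaves the independence assumption untouched.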

Generalized least squares (GLS)
1. use OLS to estimate fixed effects
2. use these estimates to estimate variances via ML
3. use these to re-estimate fixed effects (OLS)
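The re-estimation in step 3 can be sketched with plain matrix algebra for the simplest case, where the variance structure is treated as known up to σ². This sketch assumes variance proportional to x² (the structure later fitted as `varFixed(~x^2)`), so step 2 has no extra variance parameters to estimate. The data are transcribed from the earlier table and the name `beta_gls` is an assumption made for the example.

```r
data1 <- data.frame(
  y = c(41.9, 48.5, 43, 51.4, 51.2, 37.7, 50.7, 65.1,
        51.7, 38.9, 70.6, 51.4, 62.7, 34.9, 95.3, 63.9),
  x = 1:16
)

X <- cbind(1, data1$x)          # design matrix: intercept and slope columns
V <- diag(data1$x^2)            # assumed variance structure (up to sigma^2)
Vinv <- solve(V)                # inverse of the diagonal matrix

# Generalized least squares estimator: (X' V^-1 X)^-1 X' V^-1 y
beta_gls <- drop(solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% data1$y))
```

With a diagonal V this is the same as weighted least squares with weights 1/x², which is why observations with large x pull less on the fitted line.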

Generalized least squares (GLS)
ML is biased (for variance) when N is small:
• use REML
• max. likelihood of residuals rather than data
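For the ordinary linear model with homogeneous variance, the size of this bias can be written down directly. The block below is a worked illustration under that assumption, with n observations, p estimated fixed effects and RSS the residual sum of squares; the numbers use n = 16 and p = 2, and the RSS is back-calculated from the residual standard error of 13.70973 that the REML fit of the unweighted model reports in these slides.

```latex
\hat{\sigma}^2_{\mathrm{ML}} = \frac{\mathrm{RSS}}{n},
\qquad
\hat{\sigma}^2_{\mathrm{REML}} = \frac{\mathrm{RSS}}{n - p}

% Worked values: RSS = (n - p) \times 13.70973^2 \approx 2631.39
\hat{\sigma}^2_{\mathrm{ML}} \approx \frac{2631.39}{16} \approx 164.46,
\qquad
\hat{\sigma}^2_{\mathrm{REML}} \approx \frac{2631.39}{14} \approx 187.96
```

The ML estimate is smaller by the factor (n − p)/n, which matters most when n is small, as it is here.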

Variance structures

Variance function        | Variance structure                                   | Description
varFixed(~x)             | $V = \sigma^2 \times x$                              | variance proportional to x (the covariate)
varExp(form=~x)          | $V = \sigma^2 \times e^{2\delta \times x}$           | variance proportional to the exponential of x multiplied by a constant
varPower(form=~x)        | $V = \sigma^2 \times |x|^{2\delta}$                  | variance proportional to the absolute value of x raised to a constant power
varConstPower(form=~x)   | $V = \sigma^2 \times (\delta_1 + |x|^{\delta_2})^2$  | a variant on the power function
varIdent(form=~1|A)      | $V = \sigma^2_j \times I$                            | when A is a factor, variance is allowed to be different for each level (j) of the factor
varComb(form=~x|A)       | $V = \sigma^2_j \times x \times I$                   | combination of two of the above
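To make the covariate-based structures concrete, the sketch below writes each one as a plain R function of x. These helpers are hypothetical illustrations and not part of nlme; they assume the forms documented for the corresponding nlme variance classes (variance σ²·x for `varFixed`, σ²·e^(2δx) for `varExp`, σ²·|x|^(2δ) for `varPower`, σ²·(δ₁ + |x|^δ₂)² for `varConstPower`).

```r
# Hypothetical helper functions: the variance implied by each structure
var_fixed <- function(x, sigma2) sigma2 * x
var_exp <- function(x, sigma2, delta) sigma2 * exp(2 * delta * x)
var_power <- function(x, sigma2, delta) sigma2 * abs(x)^(2 * delta)
var_const_power <- function(x, sigma2, delta1, delta2) {
  sigma2 * (delta1 + abs(x)^delta2)^2
}

x <- 1:16
# With delta = 1 the power structure gives variance proportional to x^2
v_power <- var_power(x, sigma2 = 1, delta = 1)
```

In nlme the δ parameters are estimated from the data, whereas `varFixed` has nothing to estimate beyond σ².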

Generalized least squares (GLS)
> library(nlme)
> data1.gls <- gls(y~x, data1,
+                  method='REML')
> plot(data1.gls)
[Figure: standardized residuals against fitted values for data1.gls]
> library(nlme)
> data1.gls1 <- gls(y~x, data=data1, weights=varFixed(~x),
+                   method='REML')
> plot(data1.gls1)
[Figure: standardized residuals against fitted values for data1.gls1]

Generalized least squares (GLS)
> library(nlme)
> data1.gls2 <- gls(y~x, data=data1, weights=varFixed(~x^2),
+                   method='REML')
> plot(data1.gls2)
[Figure: standardized residuals against fitted values for data1.gls2]

Generalized least squares (GLS)
wrong
> plot(resid(data1.gls) ~
+      fitted(data1.gls))
> plot(resid(data1.gls2) ~
+      fitted(data1.gls2))
[Figure: resid(data1.gls) against fitted(data1.gls)]

Generalized least squares (GLS)
CORRECT
> plot(resid(data1.gls,'normalized') ~
+      fitted(data1.gls))
> plot(resid(data1.gls2,'normalized') ~
+      fitted(data1.gls2))
[Figure: resid(data1.gls, "normalized") against fitted(data1.gls)]
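The difference between the two kinds of residual can be sketched in base R. The block below is a stand-in, not the workshop's code: it uses `lm` with weights 1/x² in place of `gls(..., weights=varFixed(~x^2))`, assumes the modelled standard deviation of each observation is σ·x, and rescales the raw residuals accordingly. The data are transcribed from the earlier table and the names `data1.wls`, `raw` and `std` are assumptions made for the example.

```r
data1 <- data.frame(
  y = c(41.9, 48.5, 43, 51.4, 51.2, 37.7, 50.7, 65.1,
        51.7, 38.9, 70.6, 51.4, 62.7, 34.9, 95.3, 63.9),
  x = 1:16
)

# Weighted fit assuming Var(e_i) = sigma^2 * x_i^2 (weights are inverse variances)
data1.wls <- lm(y ~ x, data = data1, weights = 1 / x^2)

raw <- resid(data1.wls)                 # raw residuals, on the scale of y
sigma_hat <- summary(data1.wls)$sigma   # residual standard error of the weighted fit

# Standardized residuals: each raw residual divided by its modelled sd
std <- raw / (sigma_hat * data1$x)
```

The raw residuals still fan out with x because they ignore the variance structure, so only the standardized values show whether the chosen structure has dealt with the heterogeneity.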

Generalized least squares (GLS)
> plot(resid(data1.gls,'normalized') ~ data1$x)
> plot(resid(data1.gls2,'normalized') ~ data1$x)
[Figure: resid(data1.gls, "normalized") against data1$x]
[Figure: resid(data1.gls2, "normalized") against data1$x]

Generalized least squares (GLS)
> AIC(data1.gls, data1.gls1, data1.gls2)
           df      AIC
data1.gls   3 127.6388
data1.gls1  3 121.0828
data1.gls2  3 118.9904
> library(MuMIn)
> AICc(data1.gls, data1.gls1, data1.gls2)
           df     AICc
data1.gls   3 129.6388
data1.gls1  3 123.0828
data1.gls2  3 120.9904
> #OR
> anova(data1.gls, data1.gls1, data1.gls2)
           Model df      AIC      BIC    logLik
data1.gls      1  3 127.6388 129.5559 -60.81939
data1.gls1     2  3 121.0828 123.0000 -57.54142
data1.gls2     3  3 118.9904 120.9076 -56.49519
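The AICc values differ from the AIC values by a constant here, which follows from the usual small-sample correction. The block below is a worked check, assuming the standard correction formula with k = 3 estimated parameters (the df column) and n = 16 observations.

```latex
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
             = \mathrm{AIC} + \frac{2 \times 3 \times 4}{16 - 3 - 1}
             = \mathrm{AIC} + 2

% data1.gls:  127.6388 + 2 = 129.6388
% data1.gls1: 121.0828 + 2 = 123.0828
% data1.gls2: 118.9904 + 2 = 120.9904
```

Since all three models have the same k and n, the correction is identical for each and the ranking is unchanged: data1.gls2 has the lowest value on either criterion.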

Generalized least squares (GLS)
> summary(data1.gls)
Generalized least squares fit by REML
  Model: y ~ x
  Data: data1
       AIC      BIC    logLik
  127.6388 129.5559 -60.81939

Coefficients:
               Value Std.Error  t-value p-value
(Intercept) 40.33000  7.189442 5.609615  0.0001
x            1.57074  0.743514 2.112582  0.0531

 Correlation:
  (Intr)
x -0.879

Standardized residuals:
        Min          Q1         Med          Q3         Max
-2.00006105 -0.29319830 -0.02282621  0.35357567  2.29099872

Residual standard error: 13.70973
Degrees of freedom: 16 total; 14 residual

> summary(data1.gls2)
Generalized least squares fit by REML
  Model: y ~ x
  Data: data1
       AIC      BIC    logLik
  118.9904 120.9075 -56.49519

Variance function:
 Structure: fixed weights
 Formula: ~x^2

Coefficients:
               Value Std.Error   t-value p-value
(Intercept) 41.21920  1.493556 27.598018  0.0000
x            1.49282  0.469988  3.176287  0.0067

 Correlation:
  (Intr)
x -0.671

Standardized residuals:
        Min          Q1         Med          Q3         Max
-1.49259798 -0.59852829 -0.07669281  0.77799410  1.54157863

Residual standard error: 1.393108
Degrees of freedom: 16 total; 14 residual
