Lecture 8: Model assessment, nested models, and hypothesis testing
Ani Manichaikul
amanicha@jhsph.edu
27 April 2007
Another Example: Mortality
- British Smoke, Pollution & Mortality Data
[Figure: scatterplot matrix of airborne smoke particles, SO2 concentration, and London mortality]
Mortality Example: Model
Let:
- Y = the daily mortality for London (deaths)
- X1 = airborne smoke particles (mg/m3) (smoke)
- X2 = SO2 (ppm) (so2)
Model:
- 1) Yi = β0 + β1(X1 - 2) + β2(X2 - 0.5) + εi
- 2) εi ~ N(0, σ2)
- Mortality is a linear function of the concentration of airborne smoke particles AND the SO2 level
Mortality Example: Interpretations
Model: E( Y | X ) = β0 + β1(X1 - 2) + β2(X2 - 0.5)
- β0: E( Y | X1 = 2, X2 = 0.5 ) = β0 + β1(0) + β2(0) = β0
- Therefore: β0 = the mean number of deaths per day when smoke particle concentrations are 2 mg/m3 and SO2 concentrations are 0.5 ppm
Mortality Example: Interpretations
- β1:
  E( Y | X1 = x + 1, X2 ) = β0 + β1(x - 1) + β2(X2 - 0.5)
  E( Y | X1 = x, X2 ) = β0 + β1(x - 2) + β2(X2 - 0.5)
  Δ E( Y | X ) = β1
- Therefore: β1 = expected change in mortality on days when particles are 1 mg/m3 higher, if SO2 is unchanged
Mortality Example: Interpretations
- β2:
  E( Y | X1 = ?, X2 = ? ) =
  E( Y | X1 = ?, X2 = ? ) =
  Δ E( Y | X ) = β2
- Therefore: β2 =
Mortality Example: Results

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =   36.57
       Model |  205097.531     2  102548.765           Prob > F      =  0.0000
    Residual |  33654.2025    12  2804.51687           R-squared     =  0.8590
-------------+------------------------------           Adj R-squared =  0.8355
       Total |  238751.733    14  17053.6952           Root MSE      =  52.958

------------------------------------------------------------------------------
      deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 smokecenter |  -220.3244   58.14314    -3.79   0.003    -347.0074   -93.64135
   so2center |   1051.816   212.5959     4.95   0.000     588.6096    1515.023
       _cons |   174.7703   29.16174     5.99   0.000     111.2323    238.3083
------------------------------------------------------------------------------
Mortality Example: Inference
- Overall F-test: are ANY of the covariates significant?
- H0: β1 = β2 = 0
- Fobs(2, 12) = 36.57; p-val = 0.0000
- Decision: at least one of the β's is nonzero
Parameter Estimates (95% C.I.) & individual t-tests
β0:
- b0 = 174.8 (111.2, 238.3)
- H0: β0 = 0; tobs(12) = 5.99; p-val = 0.000
Parameter Estimates (95% C.I.) & individual t-tests
β1:
- b1 = -220.3 (-347.0, -93.6)
- H0: β1 = 0; tobs(12) = -3.79; p-val = 0.003
Parameter Estimates (95% C.I.) & individual t-tests
β2:
- b2 = 1051.8 (588.6, 1515.0)
- H0: β2 = 0; tobs(12) = 4.95; p-val = 0.000 (means p-val < 0.001)
- Note: s2 = MSE = 2805; s = √MSE = 'Root MSE' = 53
Parameter Interpretations: with Estimates
- b0: when smoke particles and SO2 are around their average levels (2 mg/m3 and 0.5 ppm, respectively), the estimated mean number of deaths is 174.8 / day
- b1: since b1 = -220 deaths per mg/m3, the estimated mean mortality is 22 deaths/day lower on days when particles are 0.1 mg/m3 higher, if SO2 is unchanged
- b2: (You do!)
Estimating
- Suppose we were interested in the estimated mean number of deaths when smoke particle concentrations were 3 mg/m3 and SO2 levels were 0.65 ppm
  E( Y | X ) = β0 + β1(X1 - 2) + β2(X2 - 0.5), so:
- E(Deaths) = b0 + b1(smoke - 2) + b2(so2 - 0.5)
            = 174.8 - 220 (3 - 2) + 1052 (0.65 - 0.5)
            ≈ 113 deaths
- How about if smoke particle concentrations were 3 mg/m3 and SO2 levels were 0.45 ppm?
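In Stata, one way to get this fitted value, along with a 95% CI, is lincom after the MLR; a minimal sketch, assuming the dataset with the centered variables smokecenter and so2center from the output above is in memory:

. reg deaths smokecenter so2center
. * smoke = 3 is 1 above the centering point (2); so2 = .65 is .15 above (.5)
. lincom _cons + 1*smokecenter + 0.15*so2center    // about 113 deaths/day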
Association
- The estimate for airborne smoke particles is b1 = -220, implying that smoke particles and mortality have a negative relationship
- i.e. an increase in smoke particles is associated with a decrease in mortality, after adjusting for SO2 levels.
Negative Association??
- BUT WAIT! Look at the plot of deaths vs smoke presented previously. Shouldn't the relationship be positive instead?!
- Let's run Simple Linear Regressions (SLRs) of mortality on smoke & SO2 and see what we get.
Simple Linear Regression
Same notation:
- Y = the daily mortality for London (deaths)
- X1 = airborne smoke particles (mg/m3) (smoke)
- X2 = SO2 (ppm) (so2)
SLR Models
- Smoke:
  1) Yi = β0 + β1(X1 - 2) + εi
  2) εi ~ N(0, σ2)
- SO2:
  1) Yi = β0* + β1*(X2 - 0.5) + εi*
  2) εi* ~ N(0, σ2*)
SLR: Deaths ~ Smoke
[Figure: scatterplot of London mortality vs. airborne smoke particles]
Death ~ Smoke: Results

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  1,    13) =   17.34
       Model |  136449.517     1  136449.517           Prob > F      =  0.0011
    Residual |  102302.216    13  7869.40127           R-squared     =  0.5715
-------------+------------------------------           Adj R-squared =  0.5386
       Total |  238751.733    14  17053.6952           Root MSE      =   88.71

------------------------------------------------------------------------------
      deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 smokecenter |   63.76092   15.31226     4.16   0.001     30.68078    96.84105
       _cons |   299.3407   24.64457    12.15   0.000     246.0993     352.582
------------------------------------------------------------------------------

Parameter estimates: b0 = 299.3, b1 = 63.8 (b1 is positive?!!)
Amount of variation described: R2 = SSM / SST = 57%
Residual variability left over (undescribed by this SLR): SSE = 102302.216
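As a quick check of the R2 arithmetic, the ratio of SSM to SST can be computed directly, e.g. with a Stata one-liner:

. display 136449.517/238751.733    // .5715, i.e. R-squared ≈ 57%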
SLR: Death ~ SO2
[Figure: scatterplot of London mortality vs. SO2 concentration]
Death ~ SO2: Results

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  1,    13) =   28.99
       Model |  164827.112     1  164827.112           Prob > F      =  0.0001
    Residual |  73924.6211    13  5686.50932           R-squared     =  0.6904
-------------+------------------------------           Adj R-squared =  0.6666
       Total |  238751.733    14  17053.6952           Root MSE      =  75.409

------------------------------------------------------------------------------
      deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   so2center |   256.2356   47.59353     5.38   0.000      153.416    359.0551
       _cons |   272.2286   19.57285    13.91   0.000      229.944    314.5131
------------------------------------------------------------------------------

Parameter estimates: b0 = 272.2, b1 = 256.2
Amount of variation described: R2 = SSM / SST = 69%
Residual variability left over (undescribed by this SLR): SSE = 73924.6211
Confounding in this Example
Recall our parameter interpretations:
- β1 = expected change in mortality on days when particles are 1 mg/m3 higher, if SO2 is unchanged
- Suppose we examine the relationship between smoke particle concentrations and SO2 levels (SLR):
SLR: Smoke ~ SO2
[Figure: scatterplot of airborne smoke particles vs. SO2 concentration]
Confounding
- Smoke particle concentrations and SO2 levels are highly related! How can we talk about changing smoke particle concentrations while leaving SO2 levels unchanged??
- This phenomenon is called 'confounding': both covariates are related to the outcome and to each other.
- Confounding is the reason we found differences between the SLR models and the MLR model.
Residuals: part “left over”
Residuals
- Residuals are deviations: what's 'left over' in the response, Y, from what was expected given the predictor, X
- The residuals are the part of Y that can't be predicted by X!
Adjusted Variable Plots
Idea:
- Explain all that we can in London daily mortality using SO2 levels
- Explain all that we can in smoke particle concentrations using SO2 levels
- Explain everything that's 'left over' in mortality with everything that's 'left over' in smoke particle concentrations. The slope of this line will be the MLR coefficient!
Adjusted Variable Plot
[Figure: AVP of deaths vs. smoke; residuals of DEATHS ~ SO2 plotted against residuals of SMOKE ~ SO2]
Recipe for AVP
Recipe for obtaining the MLR slope for X1 from an AVP (adjusted for X2):
1. Regress Y on X2, save residuals as RY|X2
2. Regress X1 on X2, save residuals as RX1|X2
3. Plot RY|X2 vs RX1|X2 (Adjusted Variable Plot)
4. Regress RY|X2 on RX1|X2: RY|X2 = β0* + β1* RX1|X2 + ε
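In Stata the recipe is a few lines; a sketch assuming deaths, smokecenter, and so2center are in memory (the residual variable names ry_x2 and rx1_x2 are just illustrative):

. reg deaths so2center           // step 1: regress Y on X2
. predict ry_x2, residuals       //         save residuals R_Y|X2
. reg smokecenter so2center      // step 2: regress X1 on X2
. predict rx1_x2, residuals      //         save residuals R_X1|X2
. scatter ry_x2 rx1_x2           // step 3: adjusted variable plot
. reg ry_x2 rx1_x2               // step 4: slope = the MLR coefficient for smoke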
Notes on AVPs
- β1* is identical to the coefficient of X1 from an MLR of Y on X1 and X2
- β0* is zero (zero intercept)
- The AVP display may be misleading if Y and/or X1 are not linearly related to the other predictors
AVP for Mortality Example
- Regress deaths on (centered) SO2, save residuals
  - Removes the effects of SO2 on mortality
  - Deaths = 272 + 256 SO2c + RY|X2
- Regress smoke on SO2 (both centered), save residuals
  - Removes the effects of SO2 on smoke particles
  - Smokec = -.44 + 3.6 SO2c + RX1|X2
- Regress RY|X2 on RX1|X2
  - i.e. regress deaths adjusted for SO2 on smoke particles adjusted for SO2
  - RY|X2 = 0.0 - 220 RX1|X2
AVP Interpretation
- The parameter from this last regression, β1* = -220, is the same as the related parameter from the MLR of deaths on smoke particles and SO2:
  E(Deaths) = β0 + β1(smoke - 2) + β2(so2 - 0.5) = 174.8 - 220 (smoke - 2) + 1052 (SO2 - 0.5)
- This aids in our interpretation of β1: the effect of airborne smoke particles on daily mortality after having removed (or adjusted out) all the effects of SO2 levels.
- This is what is usually meant by the term 'adjustment'
MLR and Scientific Inference
- The single most important idea today may be the realization that MLR can shift interpretations markedly!
- From SLR of the air pollution data:
  E(Deaths) = 299 + 64(smoke - 2)
- Expected deaths increase by an estimated 64 per mg/m3 increase in British smoke
MLR and Scientific Inference
- From MLR of the air pollution data:
  E(Deaths) = 174.8 - 220(smoke - 2) + 1052(SO2 - 0.5)
- Controlling for SO2, expected deaths decrease by 220 per mg/m3 of British smoke
- The interpretation and value of a regression coefficient depend critically on what other variables are in the model!!
Simple Linear Regression
[Figure: SLR of deaths on smoke; scatterplot of London mortality vs. airborne smoke particles]
Multiple Linear Regression
[Figure: AVP of deaths vs. smoke; residuals of DEATHS ~ SO2 plotted against residuals of SMOKE ~ SO2]
MLR Lesson:
- The interpretation and value of a regression coefficient depend critically on what other variables are in the model
Types of predictors
- primary predictor
  - always in model
- other predictor(s)
  - can we improve prediction after adjusting for the primary predictor?
  - interaction may be a component here
- potential confounder(s) (e.g. demographics)
  - only important if they change the effect of the primary predictor
Nested models
- One model is nested within another if the second model contains all of the variables of the first model plus one or more additional variables
- Here, the parent model is nested within the extended model
Difference in assessing variables: "nested models"
- other predictor(s)
  - assess with a t test if a single variable defines the predictor
  - assess with an F test (today) if two or more variables are needed to define the predictor
- potential confounder(s)
  - compare the CI of the primary predictor to see whether the new parameter is significantly different (Lecture 23)
Dataset
- Class health dataset
- Outcome: number of credits
- Primary predictor: housing (on or off campus)
- Other predictors:
  - health status (good/excellent or fair/poor)
  - year in school
Parent model (Model 1)

. reg credits housing

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(  1,    24) =    0.06
       Model |  .176282088     1  .176282088           Prob > F      =  0.8074
    Residual |  69.6333335    24   2.9013889           R-squared     =  0.0025
-------------+------------------------------           Adj R-squared = -0.0390
       Total |  69.8096156    25  2.79238462           Root MSE      =  1.7033

------------------------------------------------------------------------------
     credits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     housing |   .1666667   .6761572     0.25   0.807    -1.228853    1.562187
       _cons |       16.2   .5135783    31.54   0.000     15.14003    17.25997
------------------------------------------------------------------------------

Ŷi = b0 + b1(Housing)i, where Housing = 1 if on-campus, 0 if off-campus
Extended model (Model 2)

. reg credits housing healthgood

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(  2,    23) =    0.20
       Model |   1.1834815     2  .591740751           Prob > F      =  0.8215
    Residual |  68.6261341    23  2.98374496           R-squared     =  0.0170
-------------+------------------------------           Adj R-squared = -0.0685
       Total |  69.8096156    25  2.79238462           Root MSE      =  1.7274

------------------------------------------------------------------------------
     credits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     housing |   .1541237   .6860262     0.22   0.824     -1.26503    1.573277
  healthgood |   .4139175   .7124214     0.58   0.567    -1.059838    1.887673
       _cons |    15.9366   .6904955    23.08   0.000      14.5082      17.365
------------------------------------------------------------------------------

Ŷi = b0 + b1(Housing)i + b2(Healthgood)i, where Healthgood = 1 if excellent/good, 0 if fair/poor
Comparing models 1 and 2
- If we remove healthgood from model 2, we are left with model 1
- Model 1 is nested in model 2
- To decide whether model 2 is better than model 1, use the t test for the new variable, healthgood
  - p = 0.567 > α = 0.05 for the test of H0: β2 = 0
  - Fail to reject H0
  - Conclude model 2 is no better than model 1
What if we add more than one variable?
- The t test on each row only tests that variable in the presence of everything else in the model
- When more than one variable is added at a time, the t test is not sufficient
  - The t test only tests one variable at a time
- Use the F test instead to compare nested models that differ by more than one variable
When would more than one variable need to be added??
- Many modeling scenarios require adding more than one variable at once to go from the parent model to the extended model
- One that arises frequently is when a categorical variable needs to be added
Coding a categorical predictor
- A categorical predictor (such as year in program) cannot be added as a single variable
- If we add year (1, 2, 3, or 4) to the model in its original form, then software thinks it is a continuous predictor
- As a continuous predictor, the difference in mean number of credits taken would be assumed to change by a constant amount for each additional year
Coding a categorical predictor
- A categorical predictor should always be recoded as a set of dummy variables
- Choose one category (year = 1) as the reference group
- For each other category (such as year = 2), create a dummy variable for membership in that category:

. gen year2=1 if year==2
. replace year2=0 if year~=2
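The same pattern builds the dummy for the combined 3rd/4th-year category used below; a sketch (the combining of years 3 and 4 is described on the next slide):

. gen year34=1 if year==3 | year==4
. replace year34=0 if year~=3 & year~=4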
Example
- Year 1 = reference group (no dummy variable for this group)
- Year2 = 1 for those in year 2, 0 else
- Year34 = 1 for those in yr 3/4, 0 else
  - very few observations, so these categories were combined
- For someone in year 3: Year2 = 0, Year34 = 1
- For a first year: Year2 = 0, Year34 = 0
Model 3

. reg credits housing year2 year34

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(  3,    22) =    2.94
       Model |  19.9853465     3  6.66178216           Prob > F      =  0.0555
    Residual |  49.8242691    22  2.26473951           R-squared     =  0.2863
-------------+------------------------------           Adj R-squared =  0.1890
       Total |  69.8096156    25  2.79238462           Root MSE      =  1.5049

------------------------------------------------------------------------------
     credits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     housing |  -1.402299   .8537457    -1.64   0.115    -3.172859    .3682613
       year2 |   .7068966   .7215468     0.98   0.338    -.7894999    2.203293
      year34 |   -2.10197   1.087462    -1.93   0.066    -4.357228    .1532874
       _cons |   17.34483   .9268436    18.71   0.000     15.42267    19.26698
------------------------------------------------------------------------------

Ŷi = b0 + b1(Housing)i + b2(Year2)i + b3(Year34)i
- What is the mean number of credits taken by second year students who live on campus?
  Ŷ = 17.3 - 1.4(Housing) + 0.7(Year2) - 2.1(Year34) = 17.3 - 1.4(1) + 0.7(1) - 2.1(0) = 16.6
- What is the mean number of credits taken by first year students who live on campus?
  Ŷ = 17.3 - 1.4(Housing) + 0.7(Year2) - 2.1(Year34) = 17.3 - 1.4(1) + 0.7(0) - 2.1(0) = 15.9
- b2 = 16.6 - 15.9 = 0.7 is the difference in mean number of credits
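These two fitted means can also be read off with lincom after fitting Model 3; a sketch:

. reg credits housing year2 year34
. lincom _cons + housing + year2    // 2nd year, on campus: about 16.6 credits
. lincom _cons + housing            // 1st year, on campus: about 15.9 credits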
Interpretation
- b0: first year students who live off campus take an average of 17.3 credits
- b2: after adjusting for housing, second year students take an average of 0.71 more credits than first year students
- b3: after adjusting for housing, 3rd and 4th year students take an average of 2.1 fewer credits than first year students
Notice
- Coding: Year2 = 0 for anyone not in year 2
- Interpretation: the coefficient for Year2 compares second year students to first year students (the reference category), not to everyone who is not in year 2
Evaluation
- We cannot evaluate Year using the t test for each row, because two variables are needed to define Year and the t tests are separate
- We must use an F test to evaluate Year, by comparing the residual sums of squares (RSS) in the parent model and in the extended model.
Parent model (Model 1)

. reg credits housing

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(  1,    24) =    0.06
       Model |  .176282088     1  .176282088           Prob > F      =  0.8074
    Residual |  69.6333335    24   2.9013889           R-squared     =  0.0025
-------------+------------------------------           Adj R-squared = -0.0390
       Total |  69.8096156    25  2.79238462           Root MSE      =  1.7033

------------------------------------------------------------------------------
     credits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     housing |   .1666667   .6761572     0.25   0.807    -1.228853    1.562187
       _cons |       16.2   .5135783    31.54   0.000     15.14003    17.25997
------------------------------------------------------------------------------

RSSparent = 69.63 (the Residual SS); residual dfparent = 24
Extended model (Model 3)

. reg credits housing year2 year34

      Source |       SS       df       MS              Number of obs =      26
-------------+------------------------------           F(  3,    22) =    2.94
       Model |  19.9853465     3  6.66178216           Prob > F      =  0.0555
    Residual |  49.8242691    22  2.26473951           R-squared     =  0.2863
-------------+------------------------------           Adj R-squared =  0.1890
       Total |  69.8096156    25  2.79238462           Root MSE      =  1.5049

------------------------------------------------------------------------------
     credits |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     housing |  -1.402299   .8537457    -1.64   0.115    -3.172859    .3682613
       year2 |   .7068966   .7215468     0.98   0.338    -.7894999    2.203293
      year34 |   -2.10197   1.087462    -1.93   0.066    -4.357228    .1532874
       _cons |   17.34483   .9268436    18.71   0.000     15.42267    19.26698
------------------------------------------------------------------------------

RSSextended = 49.82 (the Residual SS); residual dfextended = 22
The F test
Numerator of F-statistic: (RSSparent - RSSextended) / (number of variables added)
Denominator of F-statistic: RSSextended / (residual dfextended)

Fobs = [(69.6 - 49.8) / 2] / (49.8 / 22) = 4.4

H0: all new β's = 0 in the population
HA: at least one new β is not 0 in the population
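Stata computes this same partial F test directly with the test command after fitting the extended model; a sketch:

. reg credits housing year2 year34
. test year2 year34    // joint H0: both year coefficients are 0; reports F(2, 22) ≈ 4.4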
The F table
- Recall: the F distribution is very similar to the χ2 distribution
- The F distribution is automatically 2-sided (like the χ2)
- df change the shape of the F distribution (like the χ2), but now there are two sets of df: the numerator df and the denominator df
The F table
- numerator df: # of variables added = 2
- denominator df: residual dfextended = 22
- Using α = 0.05, find Fcr
  - 1 - α = 0.95
  - Find the quantile in R, using the appropriate degrees of freedom
- Fcr = 3.49 is shown in the 2nd row of the table (this is the nearest printed row, denominator df = 20; the exact value for df = 22 is about 3.44)
Conclusion
- Fcr = 3.49 < Fobs = 4.4, so p < α
- Reject H0: conclude that adding year improves prediction after adjusting for housing
- Notice: neither individual t test was statistically significant, but the F test was still significant
- We must always use the F test to evaluate multiple X's at once
The F test: notes
- The F test can be used to compare any two nested models
- If only one variable is added, it's easier to compare the models using the t test for that variable
  - t2 = F if one variable is added
- For any regression, the estimated variance of the residuals is RSS / (residual df)
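The t2 = F identity is easy to verify with Model 2 from earlier; a sketch:

. reg credits housing healthgood
. test healthgood    // F(1, 23) = (0.58)^2 ≈ 0.34, with the same p-value (0.567) as the t test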
Nested Models
- Comparing nested models
  - 1 new variable: use the t test for that variable
  - 2+ new variables: use the F test
- Categorical predictor
  - set one group as the reference
  - create a dummy variable for each other group
  - include/exclude all dummy variables together
  - evaluate the categorical predictor with the F test