
Lecture 13. Multiple regression (2020)


(1) Introduction

Now there is one response variable Y and two predictor variables, X and Z. Data: (X1, Z1, Y1), ..., (Xn, Zn, Yn). We want to either a) predict the value of Y associated with particular values of X and Z, or b) describe the relationship between Y, X and Z, or c) estimate the effect of changes in X and Z on Y.


(2) Data example

  Race               Time (mins)  Distance (miles)  Climb (1000 ft)
  Greenmantle Dash       16.08          2.5             0.65
  Carnethy 5 Hill        48.35          6.0             2.50
  Craig Dunain           33.65          6.0             0.90
  Ben Rha                45.60          7.5             0.80
  Ben Lomond             62.27          8.0             3.07
  Goat Fell              73.22          8.0             2.87
  Bens of Jura          204.62         16.0             7.50
  Cairnpapple            36.37          6.0             0.80
  Scolty                 29.75          5.0             0.80
  Traprain Law           39.75          6.0             0.65
  ... and so on ...


(3) Prediction equation

As for simple linear regression, it may be that a) the predictors X, Z and the response Y are all random, or b) the values of the predictors X and Z are fixed, e.g. by experimental design. In either case, there is a prediction equation

  Y = b0 + b1X + b2Z + e

where the prediction error e is assumed to be N(0, σ²).
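A minimal R sketch of this setup with simulated data (all parameter values below are illustrative, not from the lecture):

  # Simulate from Y = b0 + b1*X + b2*Z + e, e ~ N(0, sigma^2), then fit
  set.seed(1)
  n <- 50
  X <- runif(n, 0, 10)
  Z <- runif(n, 0, 5)
  Y <- 3 + 1.5 * X + 0.8 * Z + rnorm(n, sd = 2)   # b0 = 3, b1 = 1.5, b2 = 0.8
  coef(lm(Y ~ X + Z))                             # estimates of b0, b1, b2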


(4) The multiple regression surface

[Figure: the fitted regression plane in (x, z, y) space]


(5) Sums of squares and products

The starting point for all calculations is this 3 × 3 matrix of sums of squares and products:

  ( Sxx  Sxz  Sxy )
  ( Szx  Szz  Szy )
  ( Syx  Syz  Syy )


(6) Estimation equations

The estimates b̂1 and b̂2 are the solutions of the two equations

  b1 Sxx + b2 Sxz = Sxy
  b1 Szx + b2 Szz = Szy

When appropriate, corrected sums of squares and products are replaced by variances and covariances.

'Partial' regression coefficient b1 is the effect on E(Y) of changing X while holding Z constant. 'Partial' regression coefficient b2 is the effect on E(Y) of changing Z while holding X constant.
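These two equations can be solved numerically as a 2 × 2 linear system. A minimal R sketch with simulated data (names and values illustrative), checking the solution against lm:

  # Solve b1*Sxx + b2*Sxz = Sxy and b1*Szx + b2*Szz = Szy directly
  set.seed(2)
  X <- rnorm(20); Z <- rnorm(20)
  Y <- 1 + 2 * X + 3 * Z + rnorm(20)
  Sxx <- sum((X - mean(X))^2)
  Szz <- sum((Z - mean(Z))^2)
  Sxz <- sum((X - mean(X)) * (Z - mean(Z)))
  Sxy <- sum((X - mean(X)) * (Y - mean(Y)))
  Szy <- sum((Z - mean(Z)) * (Y - mean(Y)))
  solve(matrix(c(Sxx, Sxz, Sxz, Szz), 2), c(Sxy, Szy))  # b1-hat, b2-hat
  coef(lm(Y ~ X + Z))[2:3]                              # same values from lm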

(7) Partial regression coefficients

When X is increased by one unit, the total effect on Y is the sum of two parts: one due to the change in X, the other due to the concomitant change in Z. If the model includes X and not Z, we see only the total effect. Including both X and Z in the model allows us to separate the two parts. The partial regression coefficient estimates the part specific to X.


(8) Estimate of regression coefficient

The estimate of b1 is

  (Sxy − Sxz Syz / Szz) / (Sxx − Sxz² / Szz)

Compare this with the estimate Sxy/Sxx obtained when Z is ignored. The denominator is the residual sum of squares obtained after regressing X on Z. The numerator is the sum of products of Y and the residual of X after fitting Z. There is a similar expression for the estimate of b2.
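This residual interpretation is easy to check numerically. A quick R sketch (simulated, illustrative data):

  # b1-hat equals the slope of Y on the residuals of X after fitting Z
  set.seed(3)
  X <- rnorm(30); Z <- rnorm(30)
  Y <- 1 + 2 * X + 3 * Z + rnorm(30)
  x.res <- resid(lm(X ~ Z))          # residual of X after regressing X on Z
  sum(x.res * Y) / sum(x.res^2)      # the formula above
  coef(lm(Y ~ X + Z))["X"]           # the same value from the full fit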


(9) Residuals and fitted values

The fitted value is now Ŷ = Ȳ + b̂1(X − X̄) + b̂2(Z − Z̄), and the anova equation still holds:

  Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)²

The regression SSQ simplifies to b̂1 Sxy + b̂2 Szy, with 2 d.f. The residual sum of squares has n − 3 d.f.
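The decomposition can be verified directly in R (simulated, illustrative data):

  # Verify: total SSQ = regression SSQ + residual SSQ
  set.seed(4)
  X <- rnorm(25); Z <- rnorm(25)
  Y <- 2 + X + 0.5 * Z + rnorm(25)
  fit <- lm(Y ~ X + Z)
  sum((Y - mean(Y))^2)                               # total SSQ (n - 1 d.f.)
  sum((fitted(fit) - mean(Y))^2) + sum(resid(fit)^2) # regression + residual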


(10) The anova table

Sums of squares and mean squares are set out in an anova table, as for simple linear regression, but now the degrees of freedom for the regression, residual and total sums of squares are 2, n − 3, and n − 1. The ANOVA F statistic (with 2 and n − 3 d.f.) tests the null hypothesis that b1 = b2 = 0, i.e. that E(Y) = b0 (constant). The regression sum of squares may be split into two components, each with 1 d.f. See later.


(11) Tests for regression coefficients

There is a t test for the hypothesis b1 = 0. As usual, the test statistic is the estimate of b1 divided by its standard error. The null distribution is t with n − 3 d.f. For simple linear regression, Sxx determined the size of the s.e.; now this role is played by Sxx − Sxz²/Szz. Correlation between the two predictors reduces this quantity and 'inflates' the standard error. There is a similar result for b2 (switch x and z in the previous paragraph).


(12) Cow and her relatives

[Pedigree diagram: Mother and Father are the parents of Cow; Father is also the sire of Halfsib, a paternal half-sister of Cow]


(13) Estimated breeding value

Y is the breeding value of a cow; X and Z are the phenotypes of its mother and its paternal half-sister. We want to use X and Z to predict Y. The covariance matrix for X, Z and Y is

  (  VP     0     VA/2 )
  (  0      VP    VA/4 )
  (  VA/2   VA/4  VA   )

where VP = VA + VE.


(14) Estimated breeding value

From this covariance matrix, the two equations to be solved are

  b1 VP = VA/2
  b2 VP = VA/4

and the prediction is Ŷ = h²(X/2 + Z/4), where h² = VA/VP is the heritability of the trait.
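A tiny numerical illustration of this prediction (all variance components and phenotypes below are hypothetical values, not from the lecture):

  # Predicted breeding value from mother (X) and paternal half-sib (Z) phenotypes
  VA <- 40; VE <- 60            # hypothetical additive and environmental variances
  VP <- VA + VE                 # phenotypic variance
  h2 <- VA / VP                 # heritability = 0.4
  X <- 5; Z <- 2                # phenotypes, as deviations from the population mean
  h2 * (X / 2 + Z / 4)          # predicted breeding value = 0.4 * 3 = 1.2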


END OF LECTURE


Lecture 14. Hill race data (2020)


(15) A special case

If Z takes values 0 and 1, the model gives

  E(Y) = b0 + b2X        when Z = 0
  E(Y) = b0 + b1 + b2X   when Z = 1.

The common slope of the parallel lines is b2. The intercept for the first line is b0, and the intercept for the second line is b0 + b1; so b1 is the difference between the intercepts (the constant vertical distance between the two lines).
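In R, this model can be fitted by putting the 0/1 indicator directly in the formula. A minimal sketch with simulated data (names and values illustrative):

  # Parallel-lines model: the 0/1 indicator Z shifts the intercept only
  set.seed(5)
  X <- runif(40, 0, 10)
  Z <- rep(0:1, each = 20)
  Y <- 2 + 5 * Z + 1.5 * X + rnorm(40)
  coef(lm(Y ~ Z + X))   # (Intercept) ~ b0, Z coefficient ~ b1, X coefficient ~ b2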


(16) A special case

[Figure: the two parallel lines of Y = b0 + b1Z + b2X plotted against X, with intercept b0 for Z = 0 and intercept b0 + b1 for Z = 1; the vertical gap between the lines is b1]


(17) Hill-race data

The difficulty of a hill race is measured by a) X = total distance covered, and b) Z = total climb required. Given the distance, climb, and record time for 31 Scottish hill races, multiple regression can find a relationship between the record time Y and the two measures of difficulty X and Z.


(18) Hill-race data

[Scatter plot: record time (20–100 mins) against distance (2–14 miles) for the hill races, with a single fitted regression line]


(19) Hill-race data

For this analysis, values of climb are grouped as low (climb < 1000 feet, Z = 0) or high (climb > 1000 feet, Z = 1), corresponding to the light and dark gray dots on the graph.

              Estimate  Std Error      t
  (Distance)    6.8731     0.4564  15.06
  (Climb)      10.3651     2.3175   4.472

Both partial regression coefficients are highly significant (P < 0.001). The single regression line shown on the previous slide fails to capture the effect of different amounts of climb.

(20) An F test

Anovas for the regression on distance alone and for the regression on both distance and climb:

                       DF    SSQ
  Distance              1  12081
  Residual             29   1474

  Distance + Climb      2  12695
  Residual             28    860

The two anovas can be combined into one:

                                 DF    SSQ
  Distance (ignoring Climb)       1  12081
  Climb (adjusted for Distance)   1    614
  Residual                       28    860


(21) An F test

                     DF    SSQ    MSQ     F
  Distance            1  12081  12081
  Climb (adjusted)    1    614    614  20.0
  Residual           28    860   30.7

Test b(Climb) = 0: F = 20.0 on 1 and 28 d.f. (P < 0.001). Adding climb to the equation significantly improves the fit, so the hypothesis is firmly rejected. There is strong evidence for an effect of climb after allowing for the effect of distance. Exactly the same result was obtained with the t test based on the estimated partial regression coefficient (F = t²).
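In R this F test is a one-line model comparison. A sketch reusing the course data setup that appears in Lecture 15; the hilo01 indicator name is an assumption, and the climb threshold uses the table's units of 1000 ft:

  # F test for adding grouped climb after distance
  library(sda)                                      # course package with the hills data
  hills31 <- subset(hills, Time < 100)              # the 31 races analysed here
  hills31$hilo01 <- as.numeric(hills31$Climb > 1)   # Climb is in 1000s of feet
  fit1 <- lm(Time ~ Distance, data = hills31)
  fit2 <- lm(Time ~ Distance + hilo01, data = hills31)
  anova(fit1, fit2)   # F on 1 and 28 d.f.; equals t^2 from summary(fit2)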


(22) Using the original climb data

What happens if we use the original climb data rather than the grouped (0/1) version? The model is now

  E(Y) = (b0 + b1Z) + b2X

On the (X, Y) graph, this specifies a family of parallel lines. The vertical position of the line changes smoothly and continuously as Z changes, and the regression coefficient b1 measures the rate at which this happens (in units of minutes per 1000 feet). The grouped (0/1) version of Z gave just two lines, one for low-climb races, the other for high-climb races.

(23) Comparing the two analyses

The regression coefficient and s.e. for distance are similar in the two analyses. The table below shows the estimated effects of climb.

              Estimate  Std Error      t
  Grouped Z    10.3651     2.3175  4.472
  Original Z    6.8288     1.1134  6.133

The ungrouped analysis tells us that the (X, Y) line moves up (the predicted race time increases) by 6.8 mins for every additional 1000 feet of climb. The grouped analysis told us that the line for a 'high' climb race is 10.4 mins above the line for a 'low' climb race.


(24) Diagnostic plot for hill race data

[Figure: Residuals vs Fitted plot for lm(Time ~ Distance + Climb); fitted values run from about 20 to 80, residuals from about −10 to 10]


(25) Diagnostic plot for hill race data

[Figure: Normal Q−Q plot of standardized residuals against theoretical quantiles for lm(Time ~ Distance + Climb)]


END OF LECTURE


Lecture 15. Using R, . . . (2020)


(26) Using R

The lm function deals with multiple regression. Diagnostic plots and analysis of variance tables are produced as for simple linear regression.

  library(sda)                                # course package providing the hills data
  hills31 <- subset(hills, Time < 100)        # keep the 31 races with Time < 100 mins
  fit <- lm(Time ~ Distance + Climb, data = hills31)
  summary(fit)                                # coefficients, standard errors, t tests
  anova(fit)                                  # sequential (extra) sums of squares
  plot(fit, which = 1:2, add.smooth = FALSE)  # residuals vs fitted, normal Q-Q
  confint(fit, parm = 2:3)                    # confidence intervals for the two slopes


(27) summary and anova

summary(fit) produces estimates and standard errors for the partial regression coefficients. Each coefficient is adjusted for all other effects in the model, so the results do not depend on the order of terms. anova(fit) produces 'extra' sums of squares, which do depend on the order of terms.


(28) The anova function

With Time ~ Distance + Climb:

             DF      SSQ      MSQ      F
  Distance    1  12080.7  12080.7  537.0
  Climb       1    844.8    844.8   37.6
  Residual   28    628.7     22.5

With Time ~ Climb + Distance:

             DF      SSQ      MSQ      F
  Climb       1   8441.3   8441.3  375.9
  Distance    1   4484.2   4484.2  199.7
  Residual   28    628.7     22.5

The total regression sum of squares (2 d.f.) is the same in both cases.
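To reproduce these tables, fit the model with the terms in each order; drop1() gives F tests that do not depend on term order. A sketch reusing the data setup from the Using R slide:

  # Sequential anova depends on term order; drop1 does not
  library(sda)                                 # course package with the hills data
  hills31 <- subset(hills, Time < 100)
  anova(lm(Time ~ Distance + Climb, data = hills31))   # Distance first
  anova(lm(Time ~ Climb + Distance, data = hills31))   # Climb first
  drop1(lm(Time ~ Distance + Climb, data = hills31), test = "F")  # order-free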


(29) Example model formulas

x and z are numeric vectors, A is a factor. Some possibilities for the right-hand side of an lm formula:

  Formula   Interpretation
  1         one-sample t test
  x         simple linear regression
  x + z     multiple regression
  A         one-way analysis of variance
  A + x     parallel lines
  A * x     separate lines for each level of A
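A short sketch contrasting the last two formulas (simulated, illustrative data):

  # 'A + x' fits parallel lines; 'A * x' fits separate lines per level of A
  set.seed(6)
  x <- runif(40, 0, 10)
  A <- factor(rep(c("low", "high"), each = 20))
  y <- 2 + 3 * (A == "high") + 1.5 * x + rnorm(40)
  coef(lm(y ~ A + x))   # one common slope, two intercepts
  coef(lm(y ~ A * x))   # adds an A:x interaction: a slope for each level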


(30) Hill-race data with added factor

  Race               Time  Distance  Climb  hilo
  Greenmantle Dash  16.08      2.5    0.65  low
  Carnethy 5 Hill   48.35      6.0    2.50  high
  Craig Dunain      33.65      6.0    0.90  low
  Ben Rha           45.60      7.5    0.80  low
  Ben Lomond        62.27      8.0    3.07  high
  Goat Fell         73.22      8.0    2.87  high
  Cairnpapple       36.37      6.0    0.80  low
  Scolty            29.75      5.0    0.80  low
  Traprain Law      39.75      6.0    0.65  low
  Dollar            43.05      5.0    2.00  high
  Lomonds of Fife   65.00      9.5    2.20  high
  Cairn Table       44.13      6.0    0.50  low
  ... and so on ...


(31) The general case

With more than two predictors, the prediction equation is

  Y = b0 + b1X + b2Z + b3W + ···

With p predictors, the regression SSQ has p d.f., and the residual SSQ has n − p − 1 d.f. We must have n > p + 1 (otherwise it is impossible to estimate the error variance σ²). For example, when p = 1, we need at least two data points in order to fit a line, and at least one more data point to provide an estimate of σ².


(32) Milk yields

Milk yield and genotype at 6 biallelic SNPs were recorded on 91 cows. SNPs: btn, btn2, dgat1, lep2, ghr8, gh5. Genotype was recorded as an allele count (A1A1, A1A2, A2A2 recorded as 0, 1, and 2). There are 2⁶ − 1 = 63 possible regression equations.
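A sketch of how all 63 equations could be enumerated. The milk data frame built below is entirely simulated (the real 91-cow data set is not reproduced here); only the SNP names come from the slides:

  # Fit all 2^6 - 1 = 63 regressions on non-empty subsets of the 6 SNPs
  snps <- c("btn", "btn2", "dgat1", "lep2", "ghr8", "gh5")
  set.seed(7)
  milk <- as.data.frame(matrix(rbinom(91 * 6, 2, 0.5), ncol = 6,
                               dimnames = list(NULL, snps)))  # fake allele counts
  milk$kgmilk <- rnorm(91, 4000, 600)                         # fake milk yields
  fits <- list()
  for (k in 1:6) {
    for (s in as.data.frame(combn(snps, k))) {                # each subset of size k
      fits[[paste(s, collapse = "+")]] <- lm(reformulate(s, "kgmilk"), data = milk)
    }
  }
  length(fits)   # 63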


(33) Multiple regression (p = 6)

[Table: allele counts (0, 1 or 2) at the six SNPs (btn, btn2, dgat1, lep2, ghr8, gh5) and milk yield (kgmilk) for the first 12 of the 91 cows, "and another 79 cows"]


(34) Confounders

Sometimes the effect of one of the predictors (X, say) is of main interest, and the other (Z) is included as a potential 'confounding', or 'lurking', variable. Z is included in the analysis to ensure that the effect of X is adjusted for the effect of Z.

Example: Y = growth rate, X = diet (A or B), Z = food intake.


(35) Confounders

[Figure: growth plotted against food intake, with separate clusters of points for diet A and diet B]


(36) Collinearity

If there is a strong correlation between X and Z, the partial regression coefficients will have large standard errors. When the correlation is close to ±1 ('collinearity'), the fitting procedure breaks down and one or more variables must be dropped from the regression equation.
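A small simulation showing the inflated standard errors (illustrative values):

  # Nearly collinear predictors inflate the standard errors of partial slopes
  set.seed(8)
  x <- rnorm(50)
  z <- x + rnorm(50, sd = 0.05)           # z is almost a copy of x
  y <- 1 + 2 * x + 2 * z + rnorm(50)
  summary(lm(y ~ x + z))$coefficients     # large s.e. for both partial slopes
  summary(lm(y ~ x))$coefficients         # x alone: much smaller s.e.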


(37) Diagnostics

Diagnostic methods based on residuals and fitted values work in exactly the same way for multiple regression as they do for the simple case of one predictor. Comments already made about outliers, cause and effect, etc., for simple linear regression remain relevant for multiple regression.


END OF LECTURE