SLIDE 1

Multivariate Linear Regression

Max Turgeon

STAT 4690–Applied Multivariate Analysis

SLIDE 2

Multivariate Linear Regression model

  • We are interested in the relationship between p outcomes $Y_1, \ldots, Y_p$ and q covariates $X_1, \ldots, X_q$.
  • We will write $Y = (Y_1, \ldots, Y_p)$ and $X = (1, X_1, \ldots, X_q)$.
  • We will assume a linear relationship: $E(Y \mid X) = B^T X$, where B is a $(q + 1) \times p$ matrix of regression coefficients.
  • We will also assume homoscedasticity: $\mathrm{Cov}(Y \mid X) = \Sigma$, where $\Sigma$ is positive-definite.
  • In other words, the (conditional) covariance of Y does not depend on X. (The sketch below simulates from this model.)
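A minimal simulation sketch (not from the original slides) can make these assumptions concrete; the values of B and Σ below are arbitrary illustrative choices.

library(mvtnorm)

set.seed(4690)
n <- 100; p <- 2; q <- 1
X <- cbind(1, rnorm(n))                      # n x (q + 1) design, first column = intercept
B <- matrix(c(1, 2, -1, 0.5), nrow = q + 1)  # (q + 1) x p coefficient matrix (illustrative)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = p) # positive-definite, does not depend on X
# E(Y | X) = XB row-wise, with homoscedastic errors
Y <- X %*% B + mvtnorm::rmvnorm(n, sigma = Sigma)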

SLIDE 3

Relationship with Univariate regression i

  • Let $\sigma_i^2$ be the i-th diagonal element of $\Sigma$.
  • Let $\beta_i$ be the i-th column of B.
  • From the model above, we get p univariate regressions (see the sketch below):
  • $E(Y_i \mid X) = X^T \beta_i$;
  • $\mathrm{Var}(Y_i \mid X) = \sigma_i^2$.
  • However, we will use the correlation between outcomes for hypothesis testing.
  • This follows from the assumption that each component $Y_i$ is linearly associated with the same covariates X.
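A quick sketch (not on the original slide) of this column-wise equivalence, using the plastic film data that appears in the examples later in the deck:

library(heplots)

fit_mlm <- lm(cbind(tear, gloss, opacity) ~ rate, data = Plastic)
fit_uni <- lm(tear ~ rate, data = Plastic)
# The "tear" column of the multivariate coefficient matrix equals the
# coefficients from the univariate regression of tear on rate
all.equal(coef(fit_mlm)[, "tear"], coef(fit_uni))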

SLIDE 4

Relationship with Univariate regression ii

  • If we assumed a different set of covariates $X_i$ for each outcome $Y_i$ and still wanted to use the correlation between the outcomes, we would get the Seemingly Unrelated Regressions (SUR) model.
  • This model is sometimes used by econometricians.

SLIDE 5

Least-Squares Estimation i

  • Let $Y_1, \ldots, Y_n$ be a random sample of size n, and let $X_1, \ldots, X_n$ be the corresponding sample of covariates.
  • We will write Y and X for the matrices whose i-th rows are $Y_i$ and $X_i$, respectively.
  • We can then write $E(Y \mid X) = XB$.
  • For least-squares estimation, we will be looking for the estimator $\hat{B}$ of B that minimises a least-squares criterion: $LS(B) = \mathrm{tr}\left[(Y - XB)^T (Y - XB)\right]$.
  • Note: This criterion is also known as the (squared) Frobenius norm; i.e. $LS(B) = \|Y - XB\|_F^2$.

SLIDE 6

Least-Squares Estimation ii

  • Note 2: If you expand the matrix product and look at the diagonal, you can see that the Frobenius norm is equivalent to the sum of the squared entries.
  • To minimise LS(B), we could use matrix derivatives…
  • Or, we can expand the matrix product along the diagonal and compute the trace.
  • Let $Y_{(j)}$ be the j-th column of Y.

SLIDE 7

Least-Squares Estimation iii

  • In other words, $Y_{(j)} = (Y_{1j}, \ldots, Y_{nj})$ contains the n values for the outcome $Y_j$. We then have

$$LS(B) = \mathrm{tr}\left[(Y - XB)^T (Y - XB)\right] = \sum_{j=1}^{p} (Y_{(j)} - X\beta_j)^T (Y_{(j)} - X\beta_j) = \sum_{j=1}^{p} \sum_{i=1}^{n} (Y_{ij} - \beta_j^T X_i)^2.$$

SLIDE 8

Least-Squares Estimation iv

  • For each j, the sum $\sum_{i=1}^{n} (Y_{ij} - \beta_j^T X_i)^2$ is simply the least-squares criterion for the corresponding univariate linear regression, which is minimised at $\hat{\beta}_j = (X^T X)^{-1} X^T Y_{(j)}$.
  • But since LS(B) is a sum of p positive terms, each minimised at $\hat{\beta}_j$, the whole sum is minimised at $\hat{B} = \left( \hat{\beta}_1 \; \cdots \; \hat{\beta}_p \right)$.
  • Or put another way: $\hat{B} = (X^T X)^{-1} X^T Y$.

SLIDE 9

Comments i

  • We still have not made any distributional assumptions on Y.
  • We do not need to assume normality to derive the least-squares estimator.
  • The least-squares estimator is unbiased: $E(\hat{B} \mid X) = (X^T X)^{-1} X^T E(Y \mid X) = (X^T X)^{-1} X^T X B = B.$

SLIDE 10

Comments ii

  • We did not use the covariance matrix Σ anywhere in the estimation process. But note that

$$\begin{aligned}
\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j) &= \mathrm{Cov}\left( (X^TX)^{-1}X^T Y_{(i)}, \; (X^TX)^{-1}X^T Y_{(j)} \right) \\
&= (X^TX)^{-1}X^T \, \mathrm{Cov}\left( Y_{(i)}, Y_{(j)} \right) \left( (X^TX)^{-1}X^T \right)^T \\
&= (X^TX)^{-1}X^T (\sigma_{ij} I_n) X (X^TX)^{-1} \\
&= \sigma_{ij} (X^TX)^{-1},
\end{aligned}$$

where $\sigma_{ij}$ is the (i, j)-th entry of Σ. (The sketch below shows this formula in practice.)
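A sketch (not on the original slide) of how this formula is used: plugging an estimate of $\sigma_{ij}$ into $\sigma_{ij}(X^TX)^{-1}$. It assumes the plastic film data introduced in the examples that follow, and uses the unbiased residual covariance S defined later in the deck.

library(heplots)

fit <- lm(cbind(tear, gloss, opacity) ~ rate, data = Plastic)
X <- model.matrix(fit)
# Unbiased estimate of Sigma (see the slide on maximum likelihood estimation)
S <- crossprod(residuals(fit)) / (nrow(X) - ncol(X))
# Estimated covariance between the coefficient vectors for tear and gloss
S["tear", "gloss"] * solve(crossprod(X))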

SLIDE 11

Example i

# Let's revisit the plastic film data
library(heplots)
library(tidyverse)

Y <- Plastic %>%
  select(tear, gloss, opacity) %>%
  as.matrix
X <- model.matrix(~ rate, data = Plastic)
head(X)

SLIDE 12

Example ii

##   (Intercept) rateHigh
## 1           1        0
## 2           1        0
## 3           1        0
## 4           1        0
## 5           1        0
## 6           1        0

(B_hat <- solve(crossprod(X)) %*% t(X) %*% Y)

SLIDE 13

Example iii

##             tear gloss opacity
## (Intercept) 6.49  9.57    3.79
## rateHigh    0.59 -0.51    0.29

# Compare with lm output
fit <- lm(cbind(tear, gloss, opacity) ~ rate,
          data = Plastic)
coef(fit)

##             tear gloss opacity
## (Intercept) 6.49  9.57    3.79
## rateHigh    0.59 -0.51    0.29

SLIDE 14

Geometry of LS i

  • Let $P = X(X^TX)^{-1}X^T$.
  • P is symmetric and idempotent: $P^2 = X(X^TX)^{-1}X^TX(X^TX)^{-1}X^T = X(X^TX)^{-1}X^T = P.$
  • Let $\hat{Y} = X\hat{B}$ be the fitted values, and $\hat{E} = Y - \hat{Y}$ the residuals.
  • We have $\hat{Y} = PY$.
  • We also have $\hat{E} = (I - P)Y$.

SLIDE 15

Geometry of LS ii

  • Putting all this together, we get $\hat{Y}^T\hat{E} = (PY)^T(I - P)Y = Y^TP(I - P)Y = Y^T(P - P^2)Y = 0.$
  • In other words, the fitted values and the residuals are orthogonal.
  • Similarly, we can see that $X^T\hat{E} = 0$ and $PX = X$.

SLIDE 16

Geometry of LS iii

  • Interpretation: $\hat{Y}$ is the orthogonal projection of Y onto the column space of X.

SLIDE 17

Example (cont’d) i

Y_hat <- fitted(fit)
residuals <- residuals(fit)
crossprod(Y_hat, residuals)

##                 tear        gloss       opacity
## tear    9.489298e-16 2.959810e-15 -4.720135e-15
## gloss   1.424461e-15 1.109357e-15 -1.150262e-14
## opacity -7.268852e-16 1.211209e-15  1.648459e-16

SLIDE 18

Example (cont’d) ii

crossprod(X, residuals)

##             tear        gloss       opacity
## (Intercept)    0 5.828671e-16 -4.440892e-16
## rateHigh       0 1.387779e-16  4.440892e-16

SLIDE 19

Example (cont’d) iii

# Is this really zero?
isZero <- function(mat) {
  all.equal(mat, matrix(0, ncol = ncol(mat), nrow = nrow(mat)),
            check.attributes = FALSE)
}

isZero(crossprod(Y_hat, residuals))

## [1] TRUE

SLIDE 20

Example (cont’d) iv

isZero(crossprod(X, residuals))

## [1] TRUE

SLIDE 21

Bootstrapped Confidence Intervals i

  • We still have not made any assumption about the distribution of Y, beyond the conditional mean and covariance function.
  • Let’s see how much further we can go.
  • We will use the bootstrap to derive confidence intervals for our quantities of interest.
  • The bootstrap is a resampling technique for estimating the sampling distribution of an estimator of interest.
  • It is particularly useful when we think the usual assumptions may not hold, or when the sampling distribution would be difficult to derive.

SLIDE 22

Bootstrapped Confidence Intervals ii

  • Let’s say we want to estimate the sampling distribution of the correlation coefficient.
  • We have a sample of pairs $(U_1, V_1), \ldots, (U_n, V_n)$, from which we estimated the correlation $\hat{\rho}$.
  • The idea is to resample with replacement from our sample to mimic the process of “repeating the experiment”.

SLIDE 23

Bootstrapped Confidence Intervals iii

  • For each bootstrap sample $(U_1^{(b)}, V_1^{(b)}), \ldots, (U_n^{(b)}, V_n^{(b)})$, we compute the sample correlation $\hat{\rho}^{(b)}$.
  • We now have a whole sample of correlation coefficients $\hat{\rho}^{(1)}, \ldots, \hat{\rho}^{(B)}$.
  • From its quantiles, we can derive a confidence interval for $\rho$.

SLIDE 24

Example i

library(candisc)
dataset <- HSB[, c("math", "sci")]
(corr_est <- cor(dataset)[1, 2])

## [1] 0.6495261

SLIDE 25

Example ii

# Choose a number of bootstrap samples
B <- 5000
corr_boot <- replicate(B, {
  samp_boot <- sample(nrow(dataset), replace = TRUE)
  dataset_boot <- dataset[samp_boot, ]
  cor(dataset_boot)[1, 2]
})
quantile(corr_boot, probs = c(0.025, 0.975))

SLIDE 26

Example iii

##      2.5%     97.5%
## 0.6037029 0.6924364

hist(corr_boot, breaks = 50)
abline(v = corr_est, col = 'red', lty = 2)

SLIDE 27

Example iv

[Figure: Histogram of corr_boot (x: corr_boot, y: Frequency), with the point estimate marked by a dashed red line.]

SLIDE 28

Bootstrapped Confidence Intervals (cont’d) i

  • Going back to our multivariate linear regression setting, we can bootstrap our estimate of the matrix of regression coefficients!
  • We will sample with replacement the rows of Y and X.
  • It’s important to sample the same rows in both matrices: we want to keep the relationship between Y and X intact.
  • For each bootstrap sample, we can compute the estimate $\hat{B}^{(b)}$.
  • From these samples, we can compute confidence intervals for each entry in B.

SLIDE 29

Bootstrapped Confidence Intervals (cont’d) ii

  • We can also technically compute confidence regions for multiple entries in B.
  • E.g. a whole column or a whole row.
  • But multivariate quantiles are tricky…

SLIDE 30

Example (cont’d) i

B_boot <- replicate(B, {
  samp_boot <- sample(nrow(Y), replace = TRUE)
  X_boot <- X[samp_boot, ]
  Y_boot <- Y[samp_boot, ]
  solve(crossprod(X_boot)) %*% t(X_boot) %*% Y_boot
})
# The output is a 3-dim array
dim(B_boot)

SLIDE 31

Example (cont’d) ii

## [1]    2    3 5000

B_boot[,,1]

##                  tear      gloss   opacity
## (Intercept) 6.5545455  9.5090909 3.7818182
## rateHigh    0.4787879 -0.1535354 0.7515152

# CI for effect of rate on tear
quantile(B_boot["rateHigh", "tear",], probs = c(0.025, 0.975))

SLIDE 32

Example (cont’d) iii

##      2.5%     97.5%
## 0.2738049 0.9125000

# CI for effect of rate on gloss
quantile(B_boot["rateHigh", "gloss",], probs = c(0.025, 0.975))

##       2.5%      97.5%
## -0.8967100 -0.1040152

SLIDE 33

Example (cont’d) iv

# CI for effect of rate on opacity
quantile(B_boot["rateHigh", "opacity",], probs = c(0.025, 0.975))

##      2.5%     97.5%
## -1.367702  2.100000

SLIDE 34

Example (cont’d) v

library(ggforce)
B_boot["rateHigh",,] %>%
  t() %>%
  as.data.frame() %>%
  ggplot(aes(x = .panel_x, y = .panel_y)) +
  geom_point() +
  geom_autodensity() +
  geom_density2d() +
  facet_matrix(vars(everything()), layer.diag = 2,
               layer.upper = 3)

SLIDE 35

[Figure: Scatterplot matrix of the bootstrap estimates for tear, gloss, and opacity, with densities on the diagonal and 2-d density contours in the upper panels.]

SLIDE 36

# There is some correlation, but not much
B_boot["rateHigh",,] %>%
  t() %>%
  cor()

##                tear     gloss     opacity
## tear     1.00000000 0.2124412 -0.07018573
## gloss    0.21244116 1.0000000  0.17158618
## opacity -0.07018573 0.1715862  1.00000000

SLIDE 37

Maximum Likelihood Estimation i

  • We now introduce distributional assumptions on Y: $Y \mid X \sim N_p(B^TX, \Sigma)$.
  • These are the same conditions on the mean and covariance as above. The only difference is that we now assume the residuals are normally distributed.
  • Note: The distribution above is conditional on X. It could happen that the marginal distribution of Y is not normal.

SLIDE 38

Maximum Likelihood Estimation ii

  • Theorem: Suppose X has full rank q + 1, and assume that $n \geq q + p + 1$. Then the least-squares estimator $\hat{B} = (X^TX)^{-1}X^TY$ of B is also the maximum likelihood estimator. Moreover, we have
  • 1. $\hat{B}$ is normally distributed.
  • 2. The maximum likelihood estimator for Σ is $\hat{\Sigma} = \frac{1}{n}\hat{E}^T\hat{E}$.
  • 3. $n\hat{\Sigma}$ follows a Wishart distribution $W_{n-q-1}(\Sigma)$ on $n - q - 1$ degrees of freedom.
  • 4. The maximised likelihood is $L(\hat{B}, \hat{\Sigma}) = (2\pi)^{-np/2}|\hat{\Sigma}|^{-n/2}\exp(-pn/2)$ (verified numerically in the sketch below).
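A numerical sketch (not on the original slide) of item 4: compare the closed-form maximised log-likelihood with a direct evaluation of the multivariate normal density at the residuals. It reuses the Plastic fit from the earlier example.

resids <- residuals(fit)
n <- nrow(resids); p <- ncol(resids)
Sigma_ML <- crossprod(resids) / n
# Closed form: log L = -(np/2) log(2 pi) - (n/2) log|Sigma_hat| - np/2
closed_form <- -n * p / 2 * log(2 * pi) - n / 2 * log(det(Sigma_ML)) - n * p / 2
direct <- sum(mvtnorm::dmvnorm(resids, sigma = Sigma_ML, log = TRUE))
all.equal(closed_form, direct)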

SLIDE 39

Maximum Likelihood Estimation iii

  • Note: Looking at the degrees of freedom of the Wishart distribution, we can infer that $\hat{\Sigma}$ is a biased estimator of Σ. An unbiased estimator is $S = \frac{1}{n - q - 1}\hat{E}^T\hat{E}$.

SLIDE 40

Confidence and Prediction Regions i

  • Suppose we have a new observation $X_0$. We are interested in making predictions and inference about the corresponding outcome vector $Y_0$.
  • First, since $\hat{B}$ is an unbiased estimator of B, we see that $E(X_0^T\hat{B}) = X_0^T E(\hat{B}) = X_0^T B = E(Y_0)$. Therefore, it makes sense to estimate $Y_0$ using $X_0^T\hat{B}$.

SLIDE 41

Confidence and Prediction Regions ii

  • What is the estimation error? Let’s look at the covariance of $X_0^T\hat{\beta}_i$ and $X_0^T\hat{\beta}_j$:

$$\mathrm{Cov}\left( X_0^T\hat{\beta}_i, X_0^T\hat{\beta}_j \right) = X_0^T \mathrm{Cov}\left( \hat{\beta}_i, \hat{\beta}_j \right) X_0 = \sigma_{ij} X_0^T (X^TX)^{-1} X_0.$$

  • What is the forecasting error? In that case, we also need to take into account the extra variation coming from the residuals.
  • In other words, we also need to sample a new “error” term $E_0 = (E_{01}, \ldots, E_{0p})$ independently of $X_0$.

SLIDE 42

Confidence and Prediction Regions iii

  • Let $\tilde{Y}_0 = X_0^T B + E_0$ be the new value.
  • The forecast error is given by $\tilde{Y}_0 - X_0^T\hat{B} = E_0 - X_0^T(\hat{B} - B).$
  • Since $E(\tilde{Y}_0 - X_0^T\hat{B}) = 0$, we can still deduce that $X_0^T\hat{B}$ is an unbiased predictor of $Y_0$.

SLIDE 43

Confidence and Prediction Regions iv

  • Now let’s look at the covariance of the forecast errors in each component:

$$\begin{aligned}
E\left[ \left( \tilde{Y}_{0i} - X_0^T\hat{\beta}_i \right) \left( \tilde{Y}_{0j} - X_0^T\hat{\beta}_j \right) \right]
&= E\left[ \left( E_{0i} - X_0^T(\hat{\beta}_i - \beta_i) \right) \left( E_{0j} - X_0^T(\hat{\beta}_j - \beta_j) \right) \right] \\
&= E(E_{0i}E_{0j}) + X_0^T E\left[ (\hat{\beta}_i - \beta_i)(\hat{\beta}_j - \beta_j)^T \right] X_0 \\
&= \sigma_{ij} + \sigma_{ij} X_0^T(X^TX)^{-1}X_0 \\
&= \sigma_{ij}\left( 1 + X_0^T(X^TX)^{-1}X_0 \right).
\end{aligned}$$

  • Therefore, we can see that the difference between the estimation error and the forecasting error is $\sigma_{ij}$.

SLIDE 44

Example i

# Recall our model
fit <- lm(cbind(tear, gloss, opacity) ~ rate,
          data = Plastic)
new_x <- data.frame(rate = factor("High",
                                  levels = c("Low", "High")))
(prediction <- predict(fit, newdata = new_x))

##   tear gloss opacity
## 1 7.08  9.06    4.08

SLIDE 45

Example ii

X <- model.matrix(fit)
S <- crossprod(resid(fit)) / (nrow(Plastic) - ncol(X))
new_x <- model.matrix(~ rate, new_x)
quad_form <- drop(new_x %*% solve(crossprod(X)) %*% t(new_x))

# Estimation covariance
(est_cov <- S * quad_form)

SLIDE 46

Example iii

##                 tear       gloss      opacity
## tear     0.014027778 0.003994444 -0.006083333
## gloss    0.003994444 0.021027778  0.014716667
## opacity -0.006083333 0.014716667  0.409916667

# Forecasting covariance
(fct_cov <- S * (1 + quad_form))

##                tear      gloss     opacity
## tear     0.15430556 0.04393889 -0.06691667
## gloss    0.04393889 0.23130556  0.16188333
## opacity -0.06691667 0.16188333  4.50908333

SLIDE 47

Example iv

# Estimation CIs
cbind(drop(prediction) - 1.96 * sqrt(diag(est_cov)),
      drop(prediction) + 1.96 * sqrt(diag(est_cov)))

##             [,1]     [,2]
## tear    6.847860 7.312140
## gloss   8.775781 9.344219
## opacity 2.825115 5.334885

SLIDE 48

Example v

# Forecasting CIs
cbind(drop(prediction) - 1.96 * sqrt(diag(fct_cov)),
      drop(prediction) + 1.96 * sqrt(diag(fct_cov)))

##                [,1]      [,2]
## tear     6.31007778  7.849922
## gloss    8.11735297 10.002647
## opacity -0.08198204  8.241982

SLIDE 49

Likelihood Ratio Tests i

  • We can use a likelihood ratio test to assess the evidence in support of two nested models.
  • Write

$$B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 & X_2 \end{pmatrix},$$

where $B_1$ is $(r + 1) \times p$, $B_2$ is $(q - r) \times p$, $X_1$ is $n \times (r + 1)$, $X_2$ is $n \times (q - r)$, and $r \geq 0$ is a non-negative integer.

SLIDE 50

Likelihood Ratio Tests ii

  • We want to compare the following models:

$$\text{Full model}: E(Y \mid X) = B^T X \qquad\qquad \text{Nested model}: E(Y \mid X_1) = B_1^T X_1$$

  • According to our previous theorem, the corresponding maximised likelihoods are

SLIDE 51

Likelihood Ratio Tests iii

$$\text{Full model}: L(\hat{B}, \hat{\Sigma}) = (2\pi)^{-np/2} |\hat{\Sigma}|^{-n/2} \exp(-pn/2) \qquad \text{Nested model}: L(\hat{B}_1, \hat{\Sigma}_1) = (2\pi)^{-np/2} |\hat{\Sigma}_1|^{-n/2} \exp(-pn/2)$$

  • Therefore, taking the ratio of the likelihoods of the nested model to the full model, we get

$$\Lambda = \frac{L(\hat{B}_1, \hat{\Sigma}_1)}{L(\hat{B}, \hat{\Sigma})} = \left( \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|} \right)^{n/2}.$$

SLIDE 52

Likelihood Ratio Tests iv

  • Or equivalently, we get Wilks’ lambda statistic: $\Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_1|}.$
  • As discussed in the lecture on MANOVA, there is no closed-form solution for the distribution of this statistic under the null hypothesis $H_0: B_2 = 0$, but there are many approximations.
  • Two important special cases:
  • When r = 0, we are testing the full model against the empty model (i.e. only the intercept).

SLIDE 53

Likelihood Ratio Tests v

  • When $X_2$ only contains one covariate, we are testing the full model against a simpler model without that covariate. In other words, we are testing for the significance of that covariate (see the sketch below for a direct computation of Wilks’ lambda).
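A sketch (not on the original slide) computing Wilks’ lambda directly as a ratio of determinants. It refits the two Plastic models that appear later in the deck; the result should match the Wilks statistic that anova() reports there.

full_model <- lm(cbind(tear, gloss, opacity) ~ rate * additive, data = Plastic)
rate_model <- lm(cbind(tear, gloss, opacity) ~ rate, data = Plastic)
n <- nrow(Plastic)
Sigma_hat  <- crossprod(residuals(full_model)) / n  # ML estimate, full model
Sigma_hat1 <- crossprod(residuals(rate_model)) / n  # ML estimate, nested model
# Wilks' lambda = |Sigma_hat| / |Sigma_hat1|
det(Sigma_hat) / det(Sigma_hat1)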

SLIDE 54

Other Multivariate Test Statistics i

  • The Wilks’ lambda statistic can actually be expressed in terms of the (generalized) eigenvalues of a pair of matrices (H, E):
  • $E = n\hat{\Sigma}$ is the error matrix.
  • $H = n(\hat{\Sigma}_1 - \hat{\Sigma})$ is the hypothesis matrix.
  • Under our assumptions about the rank of X and the sample size, E is (almost surely) invertible, and therefore we can look at the nonzero eigenvalues of $HE^{-1}$:
  • Let $\eta_1 \geq \cdots \geq \eta_s$ be those nonzero eigenvalues, where $s = \min(p, q - r)$.

SLIDE 55

Other Multivariate Test Statistics ii

  • Equivalently, these eigenvalues are the nonzero roots of the determinantal equation $\det\left( (\hat{\Sigma}_1 - \hat{\Sigma}) - \eta\hat{\Sigma} \right) = 0.$
  • The four classical multivariate test statistics are:

$$\begin{aligned}
\text{Wilks' lambda} &: \prod_{i=1}^{s} \frac{1}{1 + \eta_i} = \frac{|E|}{|E + H|} \\
\text{Pillai's trace} &: \sum_{i=1}^{s} \frac{\eta_i}{1 + \eta_i} = \mathrm{tr}\left( H(H + E)^{-1} \right) \\
\text{Hotelling-Lawley trace} &: \sum_{i=1}^{s} \eta_i = \mathrm{tr}\left( HE^{-1} \right) \\
\text{Roy's largest root} &: \frac{\eta_1}{1 + \eta_1}
\end{aligned}$$

SLIDE 56

Other Multivariate Test Statistics iii

  • Under the null hypothesis $H_0: B_2 = 0$, all four statistics can be well-approximated using the F distribution.
  • Note: When r = q − 1, all four tests are equivalent.
  • In general, as the sample size increases, all four tests give similar results. For finite sample sizes, Roy’s largest root has good power only if the leading eigenvalue $\eta_1$ is significantly larger than the other ones (see the sketch below).
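A sketch (not on the original slide) computing all four statistics from the eigenvalues of $HE^{-1}$, reusing full_model and rate_model from the sketch above (they also appear in the example that follows).

E_mat <- crossprod(residuals(full_model))
H_mat <- crossprod(residuals(rate_model)) - E_mat
eta <- Re(eigen(H_mat %*% solve(E_mat), only.values = TRUE)$values)
eta <- sort(eta[eta > 1e-8], decreasing = TRUE)  # nonzero eigenvalues, largest first
# Roy's statistic as defined on the slide; note that R's anova() reports eta_1 instead
c("Wilks"            = prod(1 / (1 + eta)),
  "Pillai"           = sum(eta / (1 + eta)),
  "Hotelling-Lawley" = sum(eta),
  "Roy"              = eta[1] / (1 + eta[1]))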

SLIDE 57

Example i

# Going back to our example
full_model <- lm(cbind(tear, gloss, opacity) ~ rate * additive,
                 data = Plastic)
anova(full_model, test = "Wilks") %>%
  broom::tidy() %>%
  knitr::kable(digits = 3)

SLIDE 58

Example ii

term            df   Wilks   approx.F   num.Df   den.Df   p.value
(Intercept)      1   0.001   5950.906        3       14     0.000
rate             1   0.382      7.554        3       14     0.003
additive         1   0.523      4.256        3       14     0.025
rate:additive    1   0.777      1.339        3       14     0.302
Residuals       16
SLIDE 59

Example iii

anova(full_model, test = "Roy") %>%
  broom::tidy() %>%
  knitr::kable(digits = 3)

term            df        Roy   approx.F   num.Df   den.Df   p.value
(Intercept)      1   1275.194   5950.906        3       14     0.000
rate             1      1.619      7.554        3       14     0.003
additive         1      0.912      4.256        3       14     0.025
rate:additive    1      0.287      1.339        3       14     0.302
Residuals       16
SLIDE 60

Example iv

# Fit a model with only rate
rate_model <- lm(cbind(tear, gloss, opacity) ~ rate,
                 data = Plastic)

# Removing the dfs from approx
anova(full_model, rate_model, test = "Wilks") %>%
  broom::tidy() %>%
  dplyr::select(-num.Df, -den.Df) %>%
  knitr::kable(digits = 3)

SLIDE 61

Example v

res.df   df   Gen.var.   Wilks   approx.F   p.value
    16        0.407
    18    2   0.479       0.43      2.447      0.05

SLIDE 62

Example vi

anova(full_model, rate_model, test = "Roy") %>%
  broom::tidy() %>%
  dplyr::select(-num.Df, -den.Df) %>%
  knitr::kable(digits = 3)

res.df   df   Gen.var.     Roy   approx.F   p.value
    16        0.407
    18    2   0.479       1.084      5.418      0.01

SLIDE 63

Example vii

# Let's look at the eigenvalues
E <- crossprod(residuals(full_model))
H <- crossprod(residuals(rate_model)) - E
result <- eigen(H %*% solve(E),
                only.values = TRUE)
result$values[seq_len(2)]

## [1] 1.083657 0.115087

SLIDE 64

Information Criteria i

  • We can use hypothesis testing for model building:
  • Add covariates that significantly improve the model (forward selection);
  • Remove non-significant covariates (backward elimination).
  • Another approach is to use Information Criteria.
  • The general form of Akaike’s information criterion: $-2 \log L(\hat{B}, \hat{\Sigma}) + 2d$, where d is the number of parameters to estimate.

SLIDE 65

Information Criteria ii

  • In multivariate regression, this would be $d = (q + 1)p + p(p + 1)/2$.
  • Therefore, we get (up to a constant): $AIC = n \log|\hat{\Sigma}| + 2(q + 1)p + p(p + 1)$.
  • The intuition behind AIC is that it estimates the Kullback-Leibler divergence between the posited model and the true data-generating mechanism.
  • So smaller is better.
  • Model selection using information criteria proceeds as follows:

SLIDE 66

Information Criteria iii

  • 1. Select models of interest $\{M_1, \ldots, M_K\}$. They do not need to be nested, and they do not need to involve the same variables.
  • 2. Compute the AIC for each model.
  • 3. Select the model with the smallest AIC.
  • The set of interesting models should be selected using domain-specific knowledge when possible.
  • If that is not feasible, you can look at all possible models between the empty model and the full model.
  • There are many variants of AIC, each with their own trade-offs.
  • For more details, see Timm (2002) Section 4.2.d.

SLIDE 67

Example (cont’d) i

## AIC(full_model)
# Error in logLik.lm(full_model) :
#   'logLik.lm' does not support multiple responses

class(full_model)

## [1] "mlm" "lm"

SLIDE 68

Example (cont’d) ii

logLik.mlm <- function(object, ...) {
  resids <- residuals(object)
  Sigma_ML <- crossprod(resids) / nrow(resids)
  ans <- sum(mvtnorm::dmvnorm(resids,
                              sigma = Sigma_ML,
                              log = TRUE))
  df <- prod(dim(coef(object))) +
    choose(ncol(Sigma_ML) + 1, 2)
  attr(ans, "df") <- df
  class(ans) <- "logLik"
  return(ans)
}

SLIDE 69

logLik(full_model)

## 'log Lik.' -51.45783 (df=18)

AIC(full_model)

## [1] 138.9157

AIC(rate_model)

## [1] 143.7768
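A sketch (not on the original slides) checking the closed-form AIC from the Information Criteria slides against AIC(full_model); the two agree once we add back the constant $np(1 + \log 2\pi)$ that was dropped there.

resids <- residuals(full_model)
n <- nrow(resids); p <- ncol(resids)
d <- prod(dim(coef(full_model))) + p * (p + 1) / 2  # (q + 1)p + p(p + 1)/2
aic_upto_const <- n * log(det(crossprod(resids) / n)) + 2 * d
aic_upto_const + n * p * (1 + log(2 * pi))  # should match AIC(full_model) above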

SLIDE 70

Example of model selection i

# Model selection
lhs <- "cbind(tear, gloss, opacity) ~"
rhs_form <- c("1", "rate", "additive",
              "rate+additive", "rate*additive")

purrr::map_df(rhs_form, function(rhs) {
  form <- formula(paste(lhs, rhs))
  fit <- lm(form, data = Plastic)
  return(data.frame(model = rhs, aic = AIC(fit),
                    stringsAsFactors = FALSE))
})

SLIDE 71

Example of model selection ii

##           model      aic
## 1             1 155.4330
## 2          rate 143.7768
## 3      additive 150.9542
## 4 rate+additive 137.9592
## 5 rate*additive 138.9157

SLIDE 72

Multivariate Influence Measures i

  • Earlier we introduced the projection matrix $P = X(X^TX)^{-1}X^T$ and we noted that $\hat{Y} = PY$.
  • Looking at one row at a time, we can see that

$$\hat{Y}_i = \sum_{j=1}^{n} P_{ij}Y_j = P_{ii}Y_i + \sum_{j \neq i} P_{ij}Y_j,$$

where $P_{ij}$ is the (i, j)-th entry of P.

SLIDE 73

Multivariate Influence Measures ii

  • In other words, the diagonal element $P_{ii}$ represents the leverage (or influence) of observation $Y_i$ on the fitted value $\hat{Y}_i$.
  • Observation $Y_i$ is said to have high leverage if $P_{ii}$ is large compared to the other elements on the diagonal.
  • Let $S = \frac{1}{n - q - 1}\hat{E}^T\hat{E}$ be the unbiased estimator of Σ, and let $\hat{E}_i$ be the i-th row of $\hat{E}$.
  • We define the multivariate internally Studentized residuals as follows: $r_i = \frac{\hat{E}_i^T S^{-1} \hat{E}_i}{1 - P_{ii}}.$

SLIDE 74

Multivariate Influence Measures iii

  • If we let $S_{(i)}$ be the estimator of Σ where we have removed row i from the residual matrix $\hat{E}$, we define the multivariate externally Studentized residuals as follows: $T_i^2 = \frac{\hat{E}_i^T S_{(i)}^{-1} \hat{E}_i}{1 - P_{ii}}.$
  • An observation $Y_i$ may be considered a potential outlier if $\left( \frac{n - q - p - 1}{p(n - q - 2)} \right) T_i^2 > F_\alpha(p, n - q - 2).$

SLIDE 75

Multivariate Influence Measures iv

  • Yet another measure of influence is the multivariate Cook’s distance: $C_i = \frac{P_{ii}}{(1 - P_{ii})^2} \, \hat{E}_i^T S^{-1} \hat{E}_i / (q + 1).$
  • An observation $Y_i$ may be considered a potential outlier if $C_i$ is larger than the median of a chi-square distribution with $\nu = p(n - q - 1)$ degrees of freedom.

SLIDE 76

Example i

library(openintro)
model <- lm(cbind(startPr, totalPr) ~ nBids + cond +
              sellerRate + wheels + stockPhoto,
            data = marioKart)
X <- model.matrix(model)
P <- X %*% solve(crossprod(X)) %*% t(X)
lev_values <- diag(P)
hist(lev_values, 50)

SLIDE 77

Example ii

[Figure: Histogram of lev_values (x: lev_values, y: Frequency).]

SLIDE 78

Example iii

n <- nrow(marioKart)
resids <- residuals(model)
S <- crossprod(resids) / (n - ncol(X))
S_inv <- solve(S)
const <- lev_values / ((1 - lev_values)^2 * ncol(X))
cook_values <- const * diag(resids %*% S_inv %*% t(resids))
hist(cook_values, 50)

SLIDE 79

Example iv

[Figure: Histogram of cook_values (x: cook_values, y: Frequency).]

SLIDE 80

Example v

# Cut-off value
(cutoff <- qchisq(0.5, ncol(S) * (n - ncol(X))))

## [1] 273.3336

which(cook_values > cutoff)

## named integer(0)
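A sketch (not on the original slides) of the externally Studentized residuals $T_i^2$ and the F-based cut-off from slide 74, reusing model quantities from the marioKart example above; the denominator $n - q - 2$ used for $S_{(i)}$ is one plausible normalization after dropping a row.

p <- ncol(resids)
q <- ncol(X) - 1  # number of covariates (drop the intercept column)
T2 <- vapply(seq_len(n), function(i) {
  # Estimate Sigma without row i of the residual matrix
  S_i <- crossprod(resids[-i, , drop = FALSE]) / (n - q - 2)
  drop(resids[i, ] %*% solve(S_i) %*% resids[i, ]) / (1 - lev_values[i])
}, numeric(1))
# Flag potential outliers at alpha = 0.05
const <- (n - q - p - 1) / (p * (n - q - 2))
which(const * T2 > qf(0.95, df1 = p, df2 = n - q - 2))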

SLIDE 81

Strategy for Multivariate Model Building

  • 1. Try to identify outliers.
  • This should be done graphically at first.
  • Once the model is fitted, you can also look at influence measures.
  • 2. Perform a multivariate test of hypothesis.
  • 3. If there is evidence of a multivariate difference, calculate Bonferroni confidence intervals and investigate component-wise differences.
  • The projection of the confidence region onto each variable generally leads to confidence intervals that are too large.

SLIDE 82

Multivariate Regression and MANOVA i

  • Recall from our lecture on MANOVA: assume the data comes from g populations:

$$\begin{matrix} Y_{11}, & \ldots, & Y_{1n_1} \\ \vdots & \ddots & \vdots \\ Y_{g1}, & \ldots, & Y_{gn_g} \end{matrix}$$

where $Y_{\ell 1}, \ldots, Y_{\ell n_\ell} \sim N_p(\mu_\ell, \Sigma)$.

SLIDE 83

Multivariate Regression and MANOVA ii

  • We obtain an equivalent model if we set

$$Y = \begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1n_1} \\ \vdots \\ Y_{g1} \\ \vdots \\ Y_{gn_g} \end{pmatrix}, \qquad
X = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{1}_{n_{g-1}} & \mathbf{0} & \cdots & \mathbf{1}_{n_{g-1}} \\ \mathbf{1}_{n_g} & \mathbf{0} & \cdots & \mathbf{0} \end{pmatrix}.$$

SLIDE 84

Multivariate Regression and MANOVA iii

  • Here, Y is $n \times p$ and X is $n \times g$.
  • The first column of X is all ones.
  • The $(i, \ell + 1)$ entry of X is 1 iff the i-th row belongs to the ℓ-th group.
  • Note: It is common to have a different constraint on the parameters $\tau_\ell$ in regression; here, we assume that $\tau_g = 0$.
  • In other words, we model group membership using a single categorical covariate and therefore g − 1 dummy variables (see the sketch below).
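A sketch (not on the original slides) showing that R’s default treatment coding builds exactly this kind of design matrix, with one caveat: R sets the first level’s effect to zero, a different but equivalent constraint to the slide’s $\tau_g = 0$.

# With the Plastic data, rate plays the role of the grouping variable (g = 2)
head(model.matrix(~ rate, data = Plastic))
# The MANOVA fit and the multivariate regression fit are the same model
fit_manova <- manova(cbind(tear, gloss, opacity) ~ rate, data = Plastic)
fit_mlm <- lm(cbind(tear, gloss, opacity) ~ rate, data = Plastic)
all.equal(coef(fit_manova), coef(fit_mlm))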

SLIDE 85

Multivariate Regression and MANOVA iv

  • More complicated designs for MANOVA can also be expressed in terms of linear regression:
  • For example, for two-way MANOVA, we would have two categorical variables. We would also need to include an interaction term to get all combinations of the two treatments (see the sketch below).
  • In general, fractional factorial designs can be expressed as a linear regression with a carefully selected series of dummy variables.
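A sketch (not on the original slides) of the two-way case, using both factors in the Plastic data:

twoway <- lm(cbind(tear, gloss, opacity) ~ rate * additive, data = Plastic)
# The design matrix has an intercept, one dummy per factor,
# and their product for the interaction
head(model.matrix(twoway))
# The same multivariate tests as a two-way MANOVA
anova(twoway, test = "Wilks")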