

1. Multivariate Linear Regression • Max Turgeon • STAT 4690 Applied Multivariate Analysis

2. Multivariate Linear Regression model
• We are interested in the relationship between p outcomes Y_1, ..., Y_p and q covariates X_1, ..., X_q.
• We will write Y = (Y_1, ..., Y_p) and X = (1, X_1, ..., X_q).
• We will assume a linear relationship: E(Y | X) = B^T X, where B is a (q + 1) × p matrix of regression coefficients.
• We will also assume homoscedasticity: Cov(Y | X) = Σ, where Σ is positive-definite.
• In other words, the (conditional) covariance of Y does not depend on X.
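
As a minimal illustration, one can simulate data satisfying both assumptions. This sketch is not from the deck; the dimensions (q = 2 covariates, p = 2 outcomes), the coefficient values, and the use of MASS::mvrnorm for the errors are arbitrary choices.

  library(MASS)

  set.seed(4690)
  n <- 100
  X0 <- cbind(1, matrix(rnorm(n * 2), ncol = 2))  # n x (q+1) design, q = 2
  B0 <- matrix(c(1, 0.5, -0.3,
                 2, 0.0,  1.0), ncol = 2)         # (q+1) x p coefficients, p = 2
  Sigma0 <- matrix(c(1, 0.5,
                     0.5, 1), ncol = 2)           # positive-definite covariance

  # Conditional mean is linear in X; the error covariance Sigma0
  # is the same for every row, i.e. it does not depend on X
  Y0 <- X0 %*% B0 + mvrnorm(n, mu = c(0, 0), Sigma = Sigma0)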

3. Relationship with Univariate regression i
• Let σ_i^2 be the i-th diagonal element of Σ.
• Let β_i be the i-th column of B.
• From the model above, we get p univariate regressions:
  • E(Y_i | X) = X^T β_i;
  • Var(Y_i | X) = σ_i^2.
• This follows from the assumption that each component Y_i is linearly associated with the same covariates X.
• However, we will use the correlation between outcomes for hypothesis testing.
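
Continuing the simulated sketch from above, we can check this equivalence numerically: fitting each outcome separately with lm should recover the corresponding column of B0.

  # X0 already contains an intercept column, so suppress lm's own intercept
  fit1 <- lm(Y0[, 1] ~ X0 - 1)
  fit2 <- lm(Y0[, 2] ~ X0 - 1)
  cbind(coef(fit1), coef(fit2))  # columns should be close to B0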

4. Relationship with Univariate regression ii
• If we assumed a different set of covariates X_i for each outcome Y_i and still wanted to use the correlation between the outcomes, we would get the Seemingly Unrelated Regressions (SUR) model.
• This model is sometimes used by econometricians.
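
For reference, a hedged sketch of how a SUR model can be fit in R with the systemfit package; the variable names y1, x1, y2, x2 and the data frame econ_data are hypothetical placeholders, not from the slides.

  library(systemfit)

  # Two outcomes with *different* covariate sets, estimated jointly so that
  # the correlation between the two error terms is exploited
  eqs <- list(first = y1 ~ x1, second = y2 ~ x2)
  sur_fit <- systemfit(eqs, method = "SUR", data = econ_data)  # econ_data is hypothetical
  summary(sur_fit)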

5. Least-Squares Estimation i
• Let Y_1, ..., Y_n be a random sample of size n, and let X_1, ..., X_n be the corresponding sample of covariates.
• We will write Y and X for the matrices whose i-th rows are Y_i and X_i, respectively.
• We can then write E(Y | X) = XB.
• For Least-Squares Estimation, we will be looking for the estimator B̂ of B that minimises a least-squares criterion:

  LS(B) = tr[(Y − XB)^T (Y − XB)].

• Note: This criterion is also known as the (squared) Frobenius norm; i.e. LS(B) = ‖Y − XB‖_F^2.
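
Using the simulated X0, Y0 from the earlier sketch, we can verify numerically that the trace criterion and the squared Frobenius norm agree for any candidate B (here, a random one).

  B_cand <- matrix(rnorm(3 * 2), ncol = 2)  # arbitrary candidate for B
  R <- Y0 - X0 %*% B_cand                   # residual matrix at B_cand
  c(trace = sum(diag(crossprod(R))),        # tr[(Y - XB)^T (Y - XB)]
    frobenius = sum(R^2))                   # sum of squared entries: same value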

6. Least-Squares Estimation ii
• Note 2: If you expand the matrix product and look at the diagonal, you can see that the Frobenius norm is equivalent to the sum of the squared entries.
• To minimise LS(B), we could use matrix derivatives...
• Or, we can expand the matrix product along the diagonal and compute the trace.
• Let Y^{(j)} be the j-th column of Y.

7. Least-Squares Estimation iii
• In other words, Y^{(j)} = (Y_{1j}, ..., Y_{nj}) contains the n values for the outcome Y_j. We then have

  LS(B) = tr[(Y − XB)^T (Y − XB)]
        = ∑_{j=1}^{p} (Y^{(j)} − Xβ_j)^T (Y^{(j)} − Xβ_j)
        = ∑_{j=1}^{p} ∑_{i=1}^{n} (Y_{ij} − β_j^T X_i)^2.

8. Least-Squares Estimation iv
• For each j, the sum ∑_{i=1}^{n} (Y_{ij} − β_j^T X_i)^2 is simply the least-squares criterion for the corresponding univariate linear regression, which is minimised at β̂_j = (X^T X)^{−1} X^T Y^{(j)}.
• But since LS(B) is a sum of p positive terms, each minimised at β̂_j, the whole sum is minimised at B̂ = (β̂_1 ⋯ β̂_p).
• Or put another way: B̂ = (X^T X)^{−1} X^T Y.
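
A quick numerical check of this column-by-column result, again on the simulated X0, Y0:

  # Full multivariate LS solution, all outcomes at once
  B_hat0 <- solve(crossprod(X0)) %*% t(X0) %*% Y0

  # Univariate LS estimate for the first outcome only
  b1 <- solve(crossprod(X0)) %*% t(X0) %*% Y0[, 1]

  all.equal(drop(B_hat0[, 1]), drop(b1))  # TRUE: column 1 of B_hat0 is b1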

9. Comments i
• We do not need to assume normality to derive the least-squares estimator.
• We still have not made any distributional assumptions on Y.
• The least-squares estimator is unbiased:

  E(B̂ | X) = (X^T X)^{−1} X^T E(Y | X)
            = (X^T X)^{−1} X^T XB
            = B.
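
Unbiasedness can be illustrated with the simulation setup from earlier: holding X0 fixed, regenerate the outcomes many times and average the LS estimates (the 2000 replicates below are an arbitrary choice).

  reps <- replicate(2000, {
    Y_new <- X0 %*% B0 + mvrnorm(n, mu = c(0, 0), Sigma = Sigma0)
    solve(crossprod(X0)) %*% t(X0) %*% Y_new  # one LS estimate per replicate
  })
  apply(reps, 1:2, mean)  # entrywise average: close to the true B0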

10. Comments ii
• We did not use the covariance matrix Σ anywhere in the estimation process. But note that:

  Cov(β̂_i, β̂_j) = Cov((X^T X)^{−1} X^T Y^{(i)}, (X^T X)^{−1} X^T Y^{(j)})
                = (X^T X)^{−1} X^T Cov(Y^{(i)}, Y^{(j)}) ((X^T X)^{−1} X^T)^T
                = (X^T X)^{−1} X^T (σ_{ij} I_n) X (X^T X)^{−1}
                = σ_{ij} (X^T X)^{−1},

  where σ_{ij} is the (i, j)-th entry of Σ.
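
The same replicates can be used to check this formula empirically, for instance for the two intercept estimates (first row of B̂, outcomes 1 and 2):

  # Empirical covariance across replicates...
  cov(reps[1, 1, ], reps[1, 2, ])
  # ...versus the theoretical value sigma_12 * [(X^T X)^{-1}]_{11}
  Sigma0[1, 2] * solve(crossprod(X0))[1, 1]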

11. Example i

# Let's revisit the plastic film data
library(heplots)
library(tidyverse)

Y <- Plastic %>%
  select(tear, gloss, opacity) %>%
  as.matrix
X <- model.matrix(~ rate, data = Plastic)
head(X)

12. Example ii

##   (Intercept) rateHigh
## 1           1        0
## 2           1        0
## 3           1        0
## 4           1        0
## 5           1        0
## 6           1        0

(B_hat <- solve(crossprod(X)) %*% t(X) %*% Y)

##             tear gloss opacity
## (Intercept) 6.49  9.57    3.79
## rateHigh    0.59 -0.51    0.29

13. Example iii

# Compare with lm output
fit <- lm(cbind(tear, gloss, opacity) ~ rate,
          data = Plastic)
coef(fit)

##             tear gloss opacity
## (Intercept) 6.49  9.57    3.79
## rateHigh    0.59 -0.51    0.29

14. Geometry of LS i
• Let P = X(X^T X)^{−1} X^T.
• P is symmetric and idempotent:

  P^2 = X(X^T X)^{−1} X^T X(X^T X)^{−1} X^T = X(X^T X)^{−1} X^T = P.

• Let Ŷ = XB̂ be the fitted values, and Ê = Y − Ŷ the residuals.
• We have Ŷ = PY.
• We also have Ê = (I − P)Y.
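
These properties are easy to verify numerically, e.g. reusing the X, Y, and fit objects from the Plastic example above:

  P <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix for the Plastic design

  all.equal(P, t(P))      # symmetric
  all.equal(P, P %*% P)   # idempotent
  all.equal(P %*% Y, fitted(fit), check.attributes = FALSE)  # Y_hat = P Y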

15. Geometry of LS ii
• Putting all this together, we get

  Ŷ^T Ê = (PY)^T (I − P)Y = Y^T P(I − P)Y = Y^T (P − P^2)Y = 0.

• In other words, the fitted values and the residuals are orthogonal.
• Similarly, we can see that X^T Ê = 0 and PX = X.

16. Geometry of LS iii
• Interpretation: Ŷ is the orthogonal projection of Y onto the column space of X.

17. Example (cont'd) i

Y_hat <- fitted(fit)
residuals <- residuals(fit)
crossprod(Y_hat, residuals)

##                  tear        gloss       opacity
## tear    -9.489298e-16 2.959810e-15 -4.720135e-15
## gloss   -1.424461e-15 1.109357e-15 -1.150262e-14
## opacity -7.268852e-16 1.211209e-15  1.648459e-16

18. Example (cont'd) ii

crossprod(X, residuals)

##             tear        gloss       opacity
## (Intercept)    0 5.828671e-16 -4.440892e-16
## rateHigh       0 1.387779e-16  4.440892e-16

19. Example (cont'd) iii

# Is this really zero?
isZero <- function(mat) {
  all.equal(mat, matrix(0, ncol = ncol(mat), nrow = nrow(mat)),
            check.attributes = FALSE)
}

isZero(crossprod(Y_hat, residuals))

## [1] TRUE

20. Example (cont'd) iv

isZero(crossprod(X, residuals))

## [1] TRUE

21. Bootstrapped Confidence Intervals i
• We still have not made any assumption about the distribution of Y, beyond the conditional mean and covariance function.
• Let's see how much further we can go.
• We will use the bootstrap to derive confidence intervals for our quantities of interest.
• The bootstrap is a resampling technique for estimating the sampling distribution of an estimator of interest.
• It is particularly useful when we think the usual assumptions may not hold, or when the sampling distribution would be difficult to derive.

22. Bootstrapped Confidence Intervals ii
• Let's say we want to estimate the sampling distribution of the correlation coefficient.
• We have a sample of pairs (U_1, V_1), ..., (U_n, V_n), from which we estimated the correlation ρ̂.
• The idea is to resample with replacement from our sample to mimic the process of “repeating the experiment”.

23. Bootstrapped Confidence Intervals iii
• For each bootstrap sample (U_1^{(b)}, V_1^{(b)}), ..., (U_n^{(b)}, V_n^{(b)}), we compute the sample correlation ρ̂^{(b)}.
• We now have a whole sample of correlation coefficients ρ̂^{(1)}, ..., ρ̂^{(B)}.
• From its quantiles, we can derive a confidence interval for ρ.

24. Example i

library(candisc)

dataset <- HSB[, c("math", "sci")]
(corr_est <- cor(dataset)[1, 2])

## [1] 0.6495261

25. Example ii

# Choose a number of bootstrap samples
B <- 5000

corr_boot <- replicate(B, {
  samp_boot <- sample(nrow(dataset),
                      replace = TRUE)
  dataset_boot <- dataset[samp_boot, ]
  cor(dataset_boot)[1, 2]
})

quantile(corr_boot, probs = c(0.025, 0.975))

26. Example iii

##      2.5%     97.5%
## 0.6037029 0.6924364

hist(corr_boot, breaks = 50)
abline(v = corr_est, col = 'red', lty = 2)

27. Example iv

[Figure: histogram of corr_boot (x-axis: corr_boot, roughly 0.55 to 0.70; y-axis: Frequency), with the point estimate marked by a dashed red vertical line.]

28. Bootstrapped Confidence Intervals (cont'd) i
• Going back to our multivariate linear regression setting, we can bootstrap our estimate of the matrix of regression coefficients!
• We will sample with replacement the rows of Y and X.
• It's important to sample the same rows in both matrices. We want to keep the relationship between Y and X intact.
• For each bootstrap sample, we can compute the estimate B̂^{(b)}.
• From these samples, we can compute confidence intervals for each entry in B.

29. Bootstrapped Confidence Intervals (cont'd) ii
• We can also technically compute confidence regions for multiple entries in B.
• E.g. a whole column or a whole row (see the sketch after the example below).
• But multivariate quantiles are tricky...

30. Example (cont'd) i

B_boot <- replicate(B, {
  samp_boot <- sample(nrow(Y),
                      replace = TRUE)
  X_boot <- X[samp_boot, ]
  Y_boot <- Y[samp_boot, ]
  solve(crossprod(X_boot)) %*% t(X_boot) %*% Y_boot
})

# The output is a 3-dim array
dim(B_boot)

31. Example (cont'd) ii

## [1]    2    3 5000

B_boot[,,1]

##                  tear      gloss   opacity
## (Intercept) 6.5545455  9.5090909 3.7818182
## rateHigh    0.4787879 -0.1535354 0.7515152

# CI for effect of rate on tear
quantile(B_boot["rateHigh", "tear", ],
         probs = c(0.025, 0.975))

32. Example (cont'd) iii

##      2.5%     97.5%
## 0.2738049 0.9125000

# CI for effect of rate on gloss
quantile(B_boot["rateHigh", "gloss", ],
         probs = c(0.025, 0.975))

##       2.5%      97.5%
## -0.8967100 -0.1040152
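
As promised on slide 29, one simple way to look at the joint uncertainty for a whole row of B is to plot the bootstrap estimates against each other. A minimal sketch, reusing the B_boot array and the B_hat estimate from the slides above:

  # Bootstrap cloud for the rateHigh row: effect on tear vs effect on gloss
  plot(B_boot["rateHigh", "tear", ], B_boot["rateHigh", "gloss", ],
       xlab = "Effect of rate on tear", ylab = "Effect of rate on gloss")
  abline(v = B_hat["rateHigh", "tear"], h = B_hat["rateHigh", "gloss"],
         lty = 2, col = "red")  # point estimates for reference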
