Theory of Regression Analysis with Applications
T Padma Ragaleena


SLIDE 1

Theory of Regression Analysis with Applications

T Padma Ragaleena

National Institute of Science Education and Research Bhubaneswar

20 November 2019


SLIDE 2

Multiple-linear regression model


SLIDE 3

Regression model

Response | Regressor 1 | Regressor 2 | ... | Regressor k
y        | x1          | x2          | ... | xk
y1       | x11         | x12         | ... | x1k
y2       | x21         | x22         | ... | x2k
...      | ...         | ...         | ... | ...
yn       | xn1         | xn2         | ... | xnk

$Y = X\beta + \epsilon$ where $\epsilon \sim N(0, \sigma^2 I)$

We also assume: $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$ for all $i \neq j$.

Y is a random vector; the $x_i$'s are not random and they are known with negligible error.

We assume the existence of at least an approximate linear relationship between the response variable and the regressors.
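As a concrete illustration (not from the slides), here is a minimal R sketch that simulates data satisfying these assumptions and fits the model with lm(); all names are illustrative.

set.seed(1)
n <- 100; k <- 3
X <- matrix(rnorm(n * k), n, k)   # regressors, treated as fixed
beta <- c(2, 1, -1, 0.5)          # true intercept and slopes
eps <- rnorm(n, sd = 1)           # epsilon ~ N(0, sigma^2 I), uncorrelated
y <- beta[1] + X %*% beta[-1] + eps
fit <- lm(y ~ X)                  # multiple linear regression
coef(fit)                         # estimates should be close to beta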


SLIDE 4

Why are we assuming a distribution for ε? To get p-values and confidence intervals for quantities of interest (hypothesis testing).

Why did we choose the normal distribution? It describes random errors in real-world processes reasonably well, and there is a well-developed mathematical theory behind the normal distribution.

Are non-normal distributions useful? Yes: in financial models, errors are assumed to come from a heavy-tailed distribution; the normal distribution is not suitable there.


SLIDE 5

Least Square Estimates

How do we estimate β? $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ measures the amount of deviation of the predicted values from the true values. One way to get a "good estimate" for β is to minimize the SSE. So we minimize S(β) = (y − Xβ)′(y − Xβ) with respect to β and call the minimizing vector the Least Square Estimate (LSE) for the model, denoted by β̂.

In order to find the β which minimizes S(β), we use the following property of Hilbert spaces:

Closest point theorem: Let M be a closed convex subset of a Hilbert space H and let x ∈ H. Then there exists a unique y0 ∈ M such that ||x − y0|| ≤ ||x − m|| for all m ∈ M. Moreover, when M is a closed subspace, y0 − x ∈ M⊥.

Using this theorem, we get:

$\hat{\beta} = (X'X)^{-1}X'Y$ = Least Square Estimate
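A quick numerical check (an illustrative sketch, not from the slides) that the closed form matches lm():

set.seed(1)
n <- 100; X <- matrix(rnorm(n * 3), n, 3)
y <- 2 + X %*% c(1, -1, 0.5) + rnorm(n)
X1 <- cbind(1, X)                              # design matrix with intercept column
beta_hat <- solve(t(X1) %*% X1, t(X1) %*% y)   # (X'X)^{-1} X'y
cbind(beta_hat, coef(lm(y ~ X)))               # the two columns agree up to rounding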


SLIDE 6

Least square estimates

In Hilbert spaces, y0 is called the projection of x onto the subspace M. Similarly, H = X(X′X)⁻¹X′ is called the projection (hat) matrix because ŷ = Hy.

For Hilbert spaces, we know that the projection map defined as P(x) = y0 is idempotent. Here also, H is idempotent, i.e. H² = H.
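A small sketch (illustrative names) verifying both properties numerically:

set.seed(1)
X1 <- cbind(1, matrix(rnorm(60), 20, 3))
y  <- rnorm(20)
H  <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)   # hat / projection matrix
max(abs(H %*% H - H))                        # ~ 0: H is idempotent
max(abs(H %*% y - fitted(lm(y ~ X1 - 1))))   # ~ 0: Hy gives the fitted values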


SLIDE 7

Properties of least square estimates (LSE)

The LSE is an unbiased estimate of β, and β̂ is the maximum likelihood estimator for β. Least square estimators are Best Linear Unbiased Estimators, i.e. BLUE (Gauss-Markov theorem).

Gauss-Markov theorem: Let Y = Xβ + ε be a regression model such that each ε_i follows a distribution with mean 0 and variance σ², and cov(ε_i, ε_j) = 0 for i ≠ j. Then the LSE are Best Linear Unbiased Estimators.

Observe that no normality is assumed for the errors. "β̂ is best" means Var(a′β̂) ≤ Var(a′β̃) for all a ∈ R^p, where β̃ is any other linear unbiased estimate.
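An empirical sanity check of unbiasedness (an illustrative simulation, not from the slides):

set.seed(2)
n <- 60; x <- rnorm(n)
est <- replicate(2000, {
  y <- 1 + 2 * x + rnorm(n)   # mean-zero, constant-variance errors
  coef(lm(y ~ x))
})
rowMeans(est)                 # approximately (1, 2): the LSE is unbiased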


SLIDE 8

Coefficient of determination

$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, i.e. SST = SSRes + SSR.

SST measures the total variation of the y_i's around ȳ. SSRes measures the variation that could not be explained by the model. SSR is the variation that can be explained by the model.

Then $R^2 = 1 - \frac{SSRes}{SST} \in [0, 1]$ gives the proportion of variation in the y_i that could be explained by the model.
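A short sketch (illustrative data) checking the decomposition and R² by hand:

set.seed(3)
x <- rnorm(40); y <- 1 + 0.8 * x + rnorm(40)
fit   <- lm(y ~ x)
SST   <- sum((y - mean(y))^2)
SSRes <- sum(resid(fit)^2)
SSR   <- sum((fitted(fit) - mean(y))^2)
all.equal(SST, SSRes + SSR)   # TRUE: the decomposition holds
1 - SSRes / SST               # R^2 by hand ...
summary(fit)$r.squared        # ... matches lm's value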


SLIDE 9

Coefficient of determination

Consider the data containing the temperature (x variable) and the log of the light intensity (y variable) of 47 stars in the star cluster CYG OB1 (the CYGOB1 data from the HSAUR package):

data("CYGOB1")
model1 <- lm(logli ~ logst, data = CYGOB1)
summary(model1)$r.squared
0.04427374

Regression captures only 4.4% of the variation. This is not a good model.


SLIDE 10

Tests of significance

H0 : βj = 0 for all j, against H1 : at least one βj ≠ 0, tests whether there exists any linear relationship between the response and the predictors.

Test statistic: under the null hypothesis,

$F^* = \frac{SSR/k}{SSRes/(n-p)} \sim F_{k,\,n-p}$

Under a level of significance α, we have enough evidence to reject H0 in favour of H1 if F* ≥ F_{α; k, n−p}, or reject the null hypothesis in favour of H1 if the

p-value ≤ α
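For illustration (using the LifeCycleSavings model that appears later in this deck), the overall F statistic reported by summary() can be turned into a p-value directly:

g  <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
fs <- summary(g)$fstatistic                   # F*, k, and n - p
fs
pf(fs[1], fs[2], fs[3], lower.tail = FALSE)   # p-value for H0: all slopes are zero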


SLIDE 11

Tests of significance

Once we know that the previous null hypothesis is rejected, our next aim is to find out which coefficients βj are non-zero.

H0 : βj = 0 against H1 : βj ≠ 0.

Test statistic: under the null hypothesis,

$t^* = \frac{\hat{\beta}_j}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_j)}} \sim t_{n-k-1}$

Under a level of significance α, we have enough evidence to reject H0 in favour of H1 if |t*| ≥ t_{α/2; n−k−1}, or reject the null hypothesis in favour of H1 if the

p-value ≤ α
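In R these per-coefficient t tests come with the fit (illustrated again on the LifeCycleSavings model):

g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
round(summary(g)$coefficients, 4)   # Estimate, Std. Error, t value, Pr(>|t|)
confint(g)                          # 95% confidence intervals for each beta_j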


SLIDE 12

Tests of significance

A more general hypothesis is to test r linearly independent hypotheses, i.e. H0 : a_{i0}β0 + a_{i1}β1 + ... + a_{ik}βk = b_i for all i = 1, 2, ..., r. In other words, the hypothesis we want to test is H0 : Aβ = b, where A is a known r × p matrix of full row rank.

Test statistic: under the null hypothesis,

$F^* = \frac{(A\hat{\beta} - b)'\,(A(X'X)^{-1}A')^{-1}\,(A\hat{\beta} - b)}{r\,\hat{\sigma}^2} \sim F_{r,\,n-p}$

Under a level of significance α, we have enough evidence to reject H0 in favour of H1 if F* ≥ F_{α; r, n−p}, or reject the null hypothesis in favour of H1 if the

p-value ≤ α
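One way to run such a test in R (a sketch, assuming the car package) is linearHypothesis(), here testing the r = 2 restrictions that the two population-structure coefficients are both zero:

library(car)   # provides linearHypothesis()
g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
linearHypothesis(g, c("pop15 = 0", "pop75 = 0"))   # F test of A beta = b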


SLIDE 13

Regression Diagnostics

Our aim is to check whether our model satisfies the regression assumptions; a few remedies are suggested when the assumptions are violated. The validity of these assumptions is needed for the results to be meaningful: if they are violated, the results can be incorrect or misleading. So the underlying assumptions have to be verified before attempting regression modeling.


SLIDE 14

Residuals

Residuals e_i = y_i − ŷ_i can be thought of as realizations of the error terms. Thus any departure from the assumptions on the errors should show up in the residuals. We can show that e = (I − H)ε, hence Var(e) = σ²(I − H). So even though the errors ε_i are assumed to be uncorrelated and independent, the residuals e_i are correlated and hence dependent.
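A quick numerical check (illustrative data) that the residuals are a linear image of the data through I − H:

set.seed(4)
n <- 30; x <- rnorm(n); y <- 1 + x + rnorm(n)
X1 <- cbind(1, x)
H  <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
max(abs(resid(lm(y ~ x)) - (diag(n) - H) %*% y))   # ~ 0: e = (I - H) y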


SLIDE 15

Normality assumption

The Q-Q plot is a graphical tool used to assess normality. It plots the theoretical quantiles (horizontal axis) against the sample quantiles (vertical axis). Using the residual values e_i, an empirical distribution is constructed, from which we get the sample quantiles.

If X is a discrete random variable, then ξ_p is called the pth quantile of X if P(X ≤ ξ_p) ≥ p and P(X ≥ ξ_p) ≥ 1 − p. If X is a continuous random variable, then the pth quantile is the unique ξ_p such that P(X ≤ ξ_p) = p.


SLIDE 16

Q-Q plot

Here we want to check whether the residuals e_i come from a normal distribution. Given the residual values, we can estimate the cdf from which these points have come as

$\hat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} I(e_i \le x)$

If e_(1) ≤ e_(2) ≤ ... ≤ e_(n), then e_(i) will be the (i/n)th quantile.

Plot $\hat{F}^{-1}(i/n) = e_{(i)}$ against $\Phi^{-1}(i/n)$.

If the normality assumption is followed, then the plot has to be an approximate y = x line.
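A hand-rolled Q-Q plot next to the built-in one (a sketch; the (i − 0.5)/n plotting positions avoid Φ⁻¹(1) = ∞):

g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
e <- sort(rstandard(g))                 # ordered standardized residuals
n <- length(e)
theo <- qnorm((seq_len(n) - 0.5) / n)   # theoretical normal quantiles
plot(theo, e, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1)                            # points near y = x => normality plausible
qqnorm(rstandard(g)); qqline(rstandard(g))   # built-in equivalent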


SLIDE 17

Normal Q-Q plot


SLIDE 18

Non-normal Q-Q plot


SLIDE 19

Data Example

Consider the "LifeCycleSavings" data set in R. This is a model proposed by Franco Modigliani to estimate the savings ratio of a country.

g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

"sr" is the savings ratio; "pop15" is the percentage of people under 15; "pop75" is the percentage of people over 75; "dpi" is per capita disposable income; "ddpi" is the percentage growth rate of dpi.


SLIDE 20

Q-Q plot


SLIDE 21

Kolmogorov-Smirnov Test

We should not rely on graphical tools alone to draw conclusions. A formal test of the normality assumption is the Kolmogorov-Smirnov test.

Suppose X1, X2, ..., Xn are assumed to come from a known continuous distribution P. We want to test the null hypothesis H0: the samples come from P, against H1: they do not come from P. Let Fexp be the cdf associated with the null hypothesis, and let the empirical distribution function Fobs be given by

$F_{obs}(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$

The test statistic is $D = \sup_x |F_{exp}(x) - F_{obs}(x)|$.
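Presumably the call behind the output on the next slide is something like the following sketch, testing the studentized residuals against a standard normal:

g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
ks.test(as.numeric(rstudent(g)), "pnorm")   # D and the two-sided p-value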


SLIDE 22

Kolmogorov-Smirnov test

One-sample Kolmogorov-Smirnov test
data: as.numeric(rstudent(g))
D = 0.067991, p-value = 0.9628
alternative hypothesis: two-sided

The very high p-value gives no evidence against the normality assumption. Other tests, such as the Anderson-Darling test and the Shapiro-Wilk test, also exist to check the normality assumption.


SLIDE 23

Constant Variance assumption

Fitted values vs Residuals for data from standard normal distribution


SLIDE 24

Constant Variance assumption

Fitted values vs Residuals for data from normal distribution with non-constant variance


SLIDE 25

Constant Variance assumption

Fitted values vs Residuals for LifeCycleSavings dataset
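This plot can be reproduced with a couple of lines (a sketch):

g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
plot(fitted(g), resid(g), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # roughly constant vertical spread supports constant variance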


SLIDE 26

Linearity Assumption

We can check the linearity assumption using the lack-of-fit test. In order to apply this test we should make sure that all the other assumptions hold and only linearity is being questioned.

Requirement: more than one observation of the response at (at least some) values of x, i.e. x_i ⇒ y_{i1}, ..., y_{i n_i}.

$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$

i.e. SSRes = SSPE + SSLOF.

If the true regression function is linear:

$\frac{SSLOF/(m-2)}{SSPE/(n-m)} \sim F_{m-2,\,n-m}$


SLIDE 27

Box-Cox transformation

The Box-Cox transformation is used to correct the normality assumption when it is violated. Consider the "gala" data in R (from the faraway package).


SLIDE 28

Box-Cox transformation

Find the λ that maximizes the (profile) likelihood.
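A sketch of how such a plot is produced; the model formula is an assumption based on the standard gala example from the faraway book:

library(faraway); library(MASS)   # gala data; boxcox()
gfit <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = gala)
bc <- boxcox(gfit)                # plots the profile log-likelihood over lambda
bc$x[which.max(bc$y)]             # maximizing lambda, near 1/3 for these data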


SLIDE 29

Box-Cox transformation

After applying a cube root transformation


SLIDE 30

Box-Cox method

One-sample Kolmogorov-Smirnov test
data: as.numeric(rstudent(gfit3))
D = 0.093249, p-value = 0.935
alternative hypothesis: two-sided

Hence our transformation of the response was very useful: the high p-value gives no evidence against normality.


SLIDE 31

Variance stabilizing transformations

One of the common reasons for the violation of constant variance is that the response variable follows a distribution in which the variance is a function of the mean, i.e. σ² = ω(µ).

AIM: we wish to find a function f such that Var(f(Y)) is roughly constant, i.e. we "transform" the response variable. By a first-order Taylor expansion,

f(Y) ≈ f(µ) + (Y − µ)f′(µ)  ⇒  [f(Y) − f(µ)]² ≈ (Y − µ)²[f′(µ)]²

hence Var(f(Y)) ≈ Var(Y) × [f′(µ)]² = ω(µ)[f′(µ)]². Choosing f so that f′(µ) ∝ 1/√ω(µ) makes this approximately constant.
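For example (an illustrative simulation): for Poisson data ω(µ) = µ, so f(y) = √y has f′(µ) = 1/(2√µ) and should roughly stabilize the variance:

set.seed(6)
mu   <- c(2, 10, 50)
raw  <- sapply(mu, function(m) var(rpois(1e5, m)))        # grows with mu
stab <- sapply(mu, function(m) var(sqrt(rpois(1e5, m))))  # ~ 0.25 throughout
rbind(raw, stab)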


SLIDE 32

Multicollinearity

The problem of multicollinearity is said to exist when two or more regressor variables are strongly correlated; in other words, when the columns of X exhibit near linear dependencies. In the case of perfect multicollinearity, X′X is not invertible, and the least squares estimate is no longer unique.


SLIDE 33

Least Square Estimates

How do we estimate β? $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ measures the amount of deviation of the predicted values from the true values. One way to get a "good estimate" for β is to minimize the SSE. So we minimize S(β) = (y − Xβ)′(y − Xβ) with respect to β and call the minimizing vector the Least Square Estimate (LSE) for the model, denoted by β̂.

In order to find the β which minimizes S(β), we use the following property of Hilbert spaces:

Closest point theorem: Let M be a closed convex subset of a Hilbert space H and let x ∈ H. Then there exists a unique y0 ∈ M such that ||x − y0|| ≤ ||x − m|| for all m ∈ M. Moreover, when M is a closed subspace, y0 − x ∈ M⊥.

Using this theorem, we get:

$\hat{\beta} = (X'X)^{-1}X'Y$ = Least Square Estimate


SLIDE 34

Problem with multi-collinearity

We can show that Var(β̂j) = C_{jj}σ², where C_{jj} is the jth diagonal element of (X′X)⁻¹. In this case,

$C_{jj} = \frac{1}{1 - R_j^2}$

where R_j² is the coefficient of determination when we regress x_j on the remaining predictors. If multicollinearity exists, Var(β̂j) → ∞ as R_j² → 1. This would mean that our estimates are unreliable.


SLIDE 35

Variance Inflation Factors (VIF)

A VIF exists for each of the predictors in a multiple regression model. The VIF for the jth predictor is given by

$VIF_j = \frac{1}{1 - R_j^2}$

Rule of thumb: VIF > 4 warrants further investigation; VIF > 10 indicates serious multicollinearity.

The following are the VIF values when BP (blood pressure) is regressed on BSA (body surface area) and weight:

data$Weight  data$BSA
   4.276401  4.276401

We see some evidence of multicollinearity, but we need more evidence to confirm.
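A sketch of the computation; the 'bloodpress' data frame and its columns are hypothetical stand-ins matching the slide's example, and car::vif does the work:

library(car)                                      # provides vif()
fit <- lm(BP ~ Weight + BSA, data = bloodpress)   # 'bloodpress' is hypothetical
vif(fit)
# With two predictors both VIFs coincide: 1 / (1 - r^2), r = cor(Weight, BSA)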


SLIDE 36

Ridge Regression

The ridge coefficients minimize a penalized residual sum of squares:

$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{k} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2$

is minimized with respect to β. This is equivalent to minimizing $\sum_{i=1}^{n}\big(y_i - \beta_0 - \sum_{j=1}^{k} x_{ij}\beta_j\big)^2$ subject to $\sum_{j=1}^{k} \beta_j^2 < s$.

$\hat{\beta}_{ridge} = (X'X + \lambda I)^{-1}X'Y$
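A minimal sketch of the closed form on simulated, centered and scaled data (λ = 2 is arbitrary; scaling conventions differ from MASS::lm.ridge, so coefficients need not match it exactly):

set.seed(7)
n <- 50
X <- scale(matrix(rnorm(n * 3), n, 3))   # centered and scaled regressors
y <- X %*% c(1, 2, 0) + rnorm(n)
lambda <- 2
solve(t(X) %*% X + lambda * diag(3), t(X) %*% (y - mean(y)))   # ridge estimate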


SLIDE 37

Data example

Consider the "meatspec" data in R from the faraway package.

modified HKB estimator is 2.363535e-08
modified L-W estimator is 0.907997
smallest value of GCV at 3.25e-08

So the value of λ obtained is 3.25e-08.
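Presumably this output comes from MASS::lm.ridge followed by select(); a sketch (the λ grid is an assumption):

library(MASS); library(faraway)   # lm.ridge()/select(); meatspec data
rfit <- lm.ridge(fat ~ ., data = meatspec, lambda = seq(0, 1e-7, length.out = 41))
select(rfit)   # prints the HKB, L-W, and smallest-GCV choices of lambda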


SLIDE 38

Ridge trace
