
More Regression Algebra

James H. Steiger

Department of Psychology and Human Development, Vanderbilt University


Outline

1. Introduction
2. Random Multiple Linear Regression: The Model
3. Random Multiple Linear Regression: Solution for β
4. Orthogonality Properties
     Least Squares β Weights Imply Orthogonality
     Orthogonality Implies a Least Squares β
5. Error Covariance Structure
6. Coefficient of Determination
7. Additivity of Covariances
8. Applications
     Regression Component Analysis

Introduction

A number of important multivariate methods build on the algebra of multivariate linear regression, because they are least squares multiple regression systems, i.e., systems where one or more criteria are predicted as linear combinations of one or more predictors, with optimal prediction defined by a least squares criterion.

In this module, we discuss some key results in multiple regression and multivariate regression that have significant implications in the context of other multivariate methods. We illustrate the algebra with a couple of theoretical derivations.

Random Multiple Linear Regression: The Model

Unlike the fixed score multiple regression model frequently employed, this one assumes that both predictor and criterion variables are random. Suppose we have a random variable y that we wish to predict from a set of random variables that are in the random vector x. To simplify matters, assume all variables are in deviation score form, i.e., have means of zero. The prediction system is linear, so we may write

    y = β′x + e    (1)

Solution for β Weights

We choose β to minimize the expected squared error, i.e., to minimize E(e²). It is easy to see (C.P.) that

    E(e²) = σ²y − 2σyxβ + β′Σxxβ    (2)

Minimizing this involves taking the partial derivative of E(e²) with respect to β, setting the resulting expression to zero, and solving for β. The well-known result is that

    β = Σxx⁻¹σxy    (3)
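Equation (3) is easy to verify numerically. As an illustration, here is a minimal numpy sketch (simulated data; the true coefficients, noise level, and sample size are arbitrary) that computes β from the sample covariances of deviation scores and confirms it matches an ordinary least squares fit.

    import numpy as np

    # Estimate beta = Sigma_xx^{-1} sigma_xy from sample (deviation-score) data
    # and compare with an ordinary least squares fit.
    rng = np.random.default_rng(0)
    n, p = 1000, 3
    x = rng.normal(size=(n, p))
    beta_true = np.array([0.5, -0.3, 0.8])
    y = x @ beta_true + rng.normal(scale=0.5, size=n)

    # Put everything in deviation-score form (means of zero).
    x = x - x.mean(axis=0)
    y = y - y.mean()

    Sigma_xx = (x.T @ x) / (n - 1)          # covariance matrix of the predictors
    sigma_xy = (x.T @ y) / (n - 1)          # covariances of the predictors with y

    beta = np.linalg.solve(Sigma_xx, sigma_xy)   # beta = Sigma_xx^{-1} sigma_xy

    # Identical to the least squares slopes computed directly on deviation scores.
    beta_ls, *_ = np.linalg.lstsq(x, y, rcond=None)
    print(np.allclose(beta, beta_ls))       # True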


Multivariate Linear Regression: Solution for β Weights

The preceding result assumed a single criterion variable y. In least squares multivariate linear regression, we have two or more criteria, so the model becomes

    y = β′x + e    (4)

In this case, we wish to select β to minimize the overall average squared error, i.e., to minimize Tr E(ee′). It turns out that the solution is essentially the same as before, i.e.,

    β = Σxx⁻¹Σxy    (5)
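Here is a companion sketch (again with arbitrary simulated data) showing that the matrix of weights Σxx⁻¹Σxy, computed from sample covariances, does at least as well on the trace criterion as a randomly perturbed set of weights.

    import numpy as np

    # Multivariate case: two criteria predicted from three predictors.
    # B = Sigma_xx^{-1} Sigma_xy (called beta in the text) minimizes the trace
    # of the error covariance matrix.
    rng = np.random.default_rng(1)
    n = 2000
    x = rng.normal(size=(n, 3))
    B_true = np.array([[0.6, -0.2],
                       [0.1,  0.7],
                       [-0.4, 0.3]])
    y = x @ B_true + rng.normal(scale=0.3, size=(n, 2))
    x -= x.mean(axis=0)
    y -= y.mean(axis=0)

    S_xx = (x.T @ x) / (n - 1)
    S_xy = (x.T @ y) / (n - 1)
    B = np.linalg.solve(S_xx, S_xy)          # one column of weights per criterion

    def trace_error(W):
        e = y - x @ W
        return np.trace((e.T @ e) / (n - 1))  # trace of the sample error covariance

    # The least squares weights never do worse than a perturbed set of weights.
    print(trace_error(B) <= trace_error(B + 0.05 * rng.normal(size=B.shape)))  # True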

Orthogonality Properties

Least Squares β Weights Imply Orthogonality

Suppose we have a linear regression system where β = Σxx⁻¹Σxy. There are a number of immediate consequences. One consequence is that x and e are orthogonal, because their covariance matrix is a null matrix:

    Cov(x, e) = E(xe′)
              = E(x(y − β′x)′)
              = E(xy′) − E(xx′β)
              = Σxy − ΣxxΣxx⁻¹Σxy
              = Σxy − IΣxy
              = 0

Of course, if x and e are orthogonal, ŷ and e must also be orthogonal.
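A quick numerical check of this property (simulated deviation-score data; any example would do): with the least squares β, the sample covariances of the residuals with the predictors, and with ŷ, are zero up to rounding error.

    import numpy as np

    # With the least squares beta, Cov(x, e) is a null matrix and Cov(y-hat, e) = 0.
    rng = np.random.default_rng(2)
    n = 500
    x = rng.normal(size=(n, 3))
    y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
    x -= x.mean(axis=0)
    y -= y.mean()

    beta = np.linalg.solve((x.T @ x) / (n - 1), (x.T @ y) / (n - 1))
    y_hat = x @ beta
    e = y - y_hat

    cov_xe = (x.T @ e) / (n - 1)          # Cov(x, e): ~0 in every entry
    cov_yhat_e = (y_hat @ e) / (n - 1)    # Cov(y-hat, e): also ~0
    print(np.allclose(cov_xe, 0), np.isclose(cov_yhat_e, 0))   # True True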

Orthogonality Implies a Least Squares β

We have seen that a least squares β implies orthogonality. It turns out that, in a linear system of the form y = β′x + e, orthogonality of x and e implies that the β must be the least squares β. (You can prove this as a homework assignment.)
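The following is only a numerical illustration of that converse, not the requested proof (simulated data; the perturbation of β is arbitrary): any weights other than the least squares weights leave the residuals correlated with at least one predictor.

    import numpy as np

    # Illustration (not a proof): a non-least-squares beta breaks orthogonality.
    rng = np.random.default_rng(3)
    n = 500
    x = rng.normal(size=(n, 2))
    y = x @ np.array([0.8, -0.4]) + rng.normal(size=n)
    x -= x.mean(axis=0)
    y -= y.mean()

    beta_ls = np.linalg.solve(x.T @ x, x.T @ y)
    beta_bad = beta_ls + np.array([0.1, 0.0])        # a non-least-squares beta

    for b in (beta_ls, beta_bad):
        e = y - x @ b
        print(np.round((x.T @ e) / (n - 1), 6))      # ~0 only for beta_ls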


Error Covariance Structure

As a straightforward consequence of the formula for a least squares β, the covariance matrix of the errors in least squares regression is

    Σee = Σyy − β′Σxxβ = Σyy − ΣyxΣxx⁻¹Σxy

In this case, Σee is the partial covariance matrix of the variables in y, with those in x partialled out.
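As a sketch (simulated data with two criteria and two predictors), Σee computed from this formula agrees with the covariance matrix of the actual regression residuals.

    import numpy as np

    # Sigma_ee = Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy, the covariance
    # matrix of y with x partialled out.
    rng = np.random.default_rng(4)
    n = 2000
    x = rng.normal(size=(n, 2))
    y = x @ np.array([[0.9, 0.0], [0.2, 0.6]]) + rng.normal(scale=0.4, size=(n, 2))
    x -= x.mean(axis=0)
    y -= y.mean(axis=0)

    S_xx = (x.T @ x) / (n - 1)
    S_yy = (y.T @ y) / (n - 1)
    S_xy = (x.T @ y) / (n - 1)

    Sigma_ee = S_yy - S_xy.T @ np.linalg.solve(S_xx, S_xy)   # partial covariance of y given x

    # Same thing computed directly from the residuals of the regression.
    B = np.linalg.solve(S_xx, S_xy)
    e = y - x @ B
    print(np.allclose(Sigma_ee, (e.T @ e) / (n - 1)))        # True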


Coefficient of Determination

The coefficient of determination R²pop is the square of the correlation between the predicted scores and the criterion scores. As a generalization of something we showed in Psychology 310, it is easy to prove (C.P.) that Cov(yj, ŷj) = Var(ŷj), and we shall use that fact below. The correlation between the jth criterion variable yj and the predictors is given by

    Rj = Cov(yj, ŷj) / √(Var(yj) Var(ŷj))
       = Var(ŷj) / √(Var(yj) Var(ŷj))
       = √(Var(ŷj) / Var(yj))

whence

    R²j = Var(ŷj) / Var(yj)

We then obtain

    R²j = Var(ŷj) / Var(yj) = σ′yjx Σxx⁻¹ σxyj / σ²yj = σ′yjx βj / σ²yj
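A numerical sketch of this identity (one simulated criterion, arbitrary coefficients): R² computed as Var(ŷ)/Var(y) agrees with σ′yx Σxx⁻¹ σxy / σ²y.

    import numpy as np

    # R^2 computed two ways: as Var(y-hat)/Var(y) and from the covariances.
    rng = np.random.default_rng(5)
    n = 5000
    x = rng.normal(size=(n, 3))
    y = x @ np.array([0.7, -0.5, 0.2]) + rng.normal(size=n)
    x -= x.mean(axis=0)
    y -= y.mean()

    S_xx = (x.T @ x) / (n - 1)
    s_xy = (x.T @ y) / (n - 1)
    var_y = y @ y / (n - 1)

    beta = np.linalg.solve(S_xx, s_xy)
    y_hat = x @ beta

    R2_from_variances = (y_hat @ y_hat / (n - 1)) / var_y
    R2_from_covariances = (s_xy @ beta) / var_y          # sigma_yx' beta / sigma^2_y
    print(np.isclose(R2_from_variances, R2_from_covariances))   # True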


Additivity of Covariances

In a least squares linear regression system, we may write y = ŷ + e, and, because the predicted and error portions are uncorrelated, we may write

    Var(y) = Var(ŷ) + Var(e)    (6)

Furthermore, since ŷ = β′x, we may also write

    Var(y) = Σyy = [β′Σxxβ] + [Σyy − β′Σxxβ]    (7)

This formula gives explicit expressions for the partitioning of variances and covariances into predicted and error components in least squares multivariate regression.
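Here is a small sketch (two simulated criteria) confirming the additivity numerically: the sample covariance matrix of y equals the covariance matrix of ŷ plus that of e.

    import numpy as np

    # Additivity of variances and covariances in a two-criterion regression.
    rng = np.random.default_rng(6)
    n = 1000
    x = rng.normal(size=(n, 3))
    y = x @ rng.normal(size=(3, 2)) + rng.normal(scale=0.5, size=(n, 2))
    x -= x.mean(axis=0)
    y -= y.mean(axis=0)

    S_xx = (x.T @ x) / (n - 1)
    S_xy = (x.T @ y) / (n - 1)
    S_yy = (y.T @ y) / (n - 1)

    B = np.linalg.solve(S_xx, S_xy)          # least squares weights (β in the text)
    y_hat = x @ B
    e = y - y_hat

    S_yhat = (y_hat.T @ y_hat) / (n - 1)     # equals B' Sigma_xx B
    S_ee = (e.T @ e) / (n - 1)               # equals Sigma_yy - B' Sigma_xx B
    print(np.allclose(S_yy, S_yhat + S_ee))  # True: Var(y) = Var(y-hat) + Var(e)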


Applications

In this section, we examine a few well-known applications of the theory developed in previous sections.


Regression Component Analysis

“Component analysis” is a well-known alternative to common factor analysis. Both component and factor analysis are commonly thought of as “factor analytic methods,” although they have some important differences. The best known example of component analysis is Principal Component Analysis, or PCA. PCA is a special case of a more general system known as “regression component analysis” (Schönemann and Steiger, 1976).

A set of “components” x of a set of random variables y is any set of linear combinations of y. Specifically, we write

    x = B′y    (8)

A regression component system is of the form

    y = Fx + e    (9)

where x = B′y is a set of components of y, and F, known as the “component pattern”, is the set of least squares linear regression weights for predicting y from x.
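A brief sketch of such a system (the weight matrix B below is just a random p × m matrix, chosen only for illustration): compute the components x = B′y, obtain F by least squares, and note that the components are orthogonal to the errors, as the earlier results require.

    import numpy as np

    # A regression component system: components x = B'y, and component pattern
    # F holding the least squares weights for predicting y back from x.
    rng = np.random.default_rng(7)
    n, p, m = 1000, 5, 2
    y = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated variables
    y -= y.mean(axis=0)

    B = rng.normal(size=(p, m))          # any p x m weight matrix defines components
    x = y @ B                            # component scores, x = B'y case by case

    S_xx = (x.T @ x) / (n - 1)
    S_xy = (x.T @ y) / (n - 1)
    F = np.linalg.solve(S_xx, S_xy).T    # F = Sigma_yx Sigma_xx^{-1}, the component pattern

    e = y - x @ F.T                      # residuals of y = Fx + e
    print(np.allclose((x.T @ e) / (n - 1), 0))   # components orthogonal to the errors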

Notice that the system is completely tautological in one sense, since e = (I − FB′)y, and so of course

    y = F(B′y) + (I − FB′)y    (10)

Once B is established for a given y, the components are uniquely defined. In a sense, examining B establishes the relationship between the components and the variables used to construct them. The real “payoff” for RCA is when the p × m matrix B has only a few columns, so that m, the number of components, is much smaller than p, the number of variables in y, and yet the error variance is small.

In a regression component system, once B is defined, then for any set of data the “component pattern” F is automatically defined. Conversely, any given F corresponds to a derivable B. As an example, suppose we try to derive the facts about F and B. To begin with, suppose that the scores in y are in deviation score form. Since x, e, and ŷ are all linear combinations of y, they must also be in deviation score form. Now let me ask you to derive Σyx, the covariance matrix between y and the components in x, in terms of Σyy and B. Before clicking on the button to move to the next slide, take a few seconds to see if you can derive the answer. (Hint: Σyx = E(yx′).)

Here is the solution.

    Σyx = E(yx′)    (11)

But x = B′y, so

    Σyx = E(y(B′y)′) = E(yy′B) = E(yy′)B = ΣyyB    (12)

Here is another fairly straightforward problem for you. Express Σxx, the variance-covariance matrix of the x components, in terms of B and Σyy, the variance-covariance matrix of the variables in y. When you have your answer, click on the button to move on to the next slide.


Here is the solution.

    Σxx = E(xx′) = E(B′yy′B) = B′E(yy′)B = B′ΣyyB    (13)

Finally, show how to construct a formula for computing F, the component pattern, from B and Σyy. Hint: remember that in a regression system, the linear weights β′ for predicting y from x are computed as ΣyxΣxx⁻¹. When you have your answer, click on the button to continue on to the next slide.

The solution is as follows. In this context, we have already established that Σyx = ΣyyB, and that Σxx = B′ΣyyB. In a regression component system, F plays the same role as β′ in the general multivariate linear regression model. So

    F = ΣyxΣxx⁻¹ = ΣyyB(B′ΣyyB)⁻¹    (14)
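As a closing sketch (simulated data; numpy), the formula can be checked against a direct regression of y on the components, and when B holds the leading eigenvectors of Σyy (the PCA special case mentioned earlier), F reduces to B itself.

    import numpy as np

    # F = Sigma_yy B (B' Sigma_yy B)^{-1}, checked against the direct regression
    # of y on the component scores.
    rng = np.random.default_rng(8)
    n, p, m = 2000, 4, 2
    y = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
    y -= y.mean(axis=0)
    S_yy = (y.T @ y) / (n - 1)

    vals, vecs = np.linalg.eigh(S_yy)
    B = vecs[:, ::-1][:, :m]                      # leading m principal component weights

    F_formula = S_yy @ B @ np.linalg.inv(B.T @ S_yy @ B)

    # Direct computation: regress y on the component scores x = B'y.
    x = y @ B
    F_direct = np.linalg.solve((x.T @ x) / (n - 1), (x.T @ y) / (n - 1)).T
    print(np.allclose(F_formula, F_direct))       # True

    # For principal components (orthonormal B of eigenvectors), F equals B itself.
    print(np.allclose(F_formula, B))              # True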