Second-Order Bias-Corrected AIC for Selecting Structural Equation Models



slide-1
SLIDE 1

Vienna WS 1

Second-Order Bias-Corrected AIC for Selecting Structural Equation Models

Kentaro HAYASHI Department of Psychology, University of Hawaii at Manoa (E-mail: hayashik@hawaii.edu)

AND

Hirokazu YANAGIHARA Department of Mathematics, Hiroshima University (E-mail: yanagi@math.sci.hiroshima-u.ac.jp)

slide-2
SLIDE 2

Vienna WS 2

Introduction

We derive a second-order bias correction of the Akaike Information Criterion (AIC) in structural equation models (SEM) under the normality assumption when the candidate model is overspecified.

Note: “Overspecified” means that the candidate models (the $f$'s) include the true model ($\phi$).

Contents:

  • 1. Introduction

(a) Structural Equation Models (SEM)
(b) AIC and CV (Cross-Validation) Criterion

slide-3
SLIDE 3

Vienna WS 3

  • 2. General Theory

(a) True and Candidate Models
(b) Likelihood and MLE
(c) Risk, Bias, and Information Criterion
(d) Estimated Bias

  • 3. Recent and Current Studies

(a) Notations (Derivatives, Expectation of Moment Matrices, Estimates of Expected Moment Matrices, and Coefficients in Bias-Correction Terms)
(b) Evaluating Bias of Information Criteria
(c) Asymptotic Expansion of Expectation of Estimated “Beta” Term
(d) Bias of AIC (Main Result of Current Study)
(e) Useful Formulas in Obtaining the Coefficient Terms

slide-4
SLIDE 4

Vienna WS 4

Structural Equation Models (SEM)

References: Bollen (1989), Bartholomew and Knott (1999), Skrondal and Rabe-Hesketh (2004), Yuan and Bentler (2007)

  • SEM is one of the most frequently used multivariate techniques in the social sciences.

  • SEM aims to express the covariance structure using a relatively small number of parameters.

Notation: $\Sigma(\theta)$

slide-5
SLIDE 5

Vienna WS 5

  • The single most famous SEM is the confirmatory factor analysis (CFA) model:

$$y = \mu + \Lambda f + \varepsilon,$$

where
$y$ ($p \times 1$): Observed variables,
$\mu$ ($p \times 1$): Population means,
$\Lambda$ ($p \times m$): Factor loadings (Path coefficients),
$f$ ($m \times 1$): Factors,
$\varepsilon$ ($p \times 1$): Errors.

Note: CFA is a linear model, but the factors ($f$) are latent (NOT observed) variables.
slide-6
SLIDE 6

Vienna WS 6

  • Typical assumptions: Errors are mutually uncorrelated, and factors and errors are uncorrelated. That is, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$, and $\mathrm{Cov}(f_i, \varepsilon_j) = 0$.

  • The covariance structure of the observed variables ($y$) is expressed as a function of:

$\Lambda$ ($p \times m$): Factor loadings (Path coefficients),
$\Phi$ ($m \times m$): Factor correlations, and
$\Psi$ ($p \times p$): Error variances.

  • That is, the covariance structure under CFA is expressed as:

$$\Sigma(\theta) = \Lambda \Phi \Lambda' + \Psi, \quad \text{where } \theta = (\lambda', \phi', \psi')' = (\theta_1, \ldots, \theta_q)'.$$
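The covariance structure above can be sketched numerically. The snippet below is only an illustration: the function name `cfa_covariance` and all parameter values (a two-factor model with four observed variables) are hypothetical and are not taken from the slides.

```python
import numpy as np

def cfa_covariance(Lambda, Phi, Psi):
    """Model-implied covariance Sigma(theta) = Lambda Phi Lambda' + Psi."""
    return Lambda @ Phi @ Lambda.T + Psi

# Hypothetical two-factor model, p = 4 observed variables, m = 2 factors.
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.9]])           # factor loadings (p x m)
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])              # factor correlations (m x m)
Psi = np.diag([0.36, 0.51, 0.64, 0.19])   # error variances (p x p, diagonal)

Sigma = cfa_covariance(Lambda, Phi, Psi)
print(Sigma)
```

Here the error variances were chosen so that every observed variable has unit variance; the number of free parameters (loadings, factor correlation, error variances) is far smaller than the $p(p+1)/2$ distinct entries of an unstructured covariance matrix would be for larger $p$.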

slide-7
SLIDE 7

Vienna WS 7

AIC (Akaike Information Criterion)

  • When the candidate model is overspecified, AIC is a first-order bias-corrected estimator of the risk function based on the expected predictive Kullback-Leibler (KL) discrepancy between the true model and the candidate model. That is,

$$\mathrm{E}[\mathrm{AIC}] = R_{KL} + O(n^{-1}).$$

  • AIC tends to choose a model with many parameters as the best model (when the full model has too many parameters).

  • Reason: AIC tends to underestimate the bias when the candidate model has many parameters (because the bias term of AIC is derived from the asymptotic theory of $\hat{\theta}$).

slide-8
SLIDE 8

Vienna WS 8

  • This (negative) property of AIC, namely that a candidate model with too many parameters is chosen as the best model, tends to appear among overspecified models.
slide-9
SLIDE 9

Vienna WS 9

Cross-Validation (CV)

  • Even when the candidate model is misspecified, second-order bias-corrected estimators of the risk function have been proposed under general conditions by correcting the bias of the cross-validation (CV) criterion (Stone, 1974) (e.g., Yanagihara, Tonda and Matsumoto, 2006). That is,

$$\mathrm{E}[\mathrm{CCV}] = R_{KL} + O(n^{-2}).$$

  • However, obtaining these bias-corrected criteria requires many computational tasks, and the criteria have large variance.

slide-10
SLIDE 10

Vienna WS 10

True and Candidate Models (General)

Let $y_1, \ldots, y_n$ be $p$-dimensional random observation vectors, where $n$ is the sample size.

  • The true model:

$$M_\phi : y_1, \ldots, y_n \sim \text{i.i.d. } \phi(y),$$

where $\phi(y)$ is an unknown probability density function.

  • The candidate model:

$$M_f : y_1, \ldots, y_n \sim \text{i.i.d. } f(y \mid \theta),$$

where $\mathcal{F} = \{ f(y \mid \theta);\ \theta \in \Theta \subseteq \mathbb{R}^q \}$ and $\theta = (\theta_1, \ldots, \theta_q)'$.

slide-11
SLIDE 11

Vienna WS 11

Candidate Model in SEM

Under the candidate model,

$$N S \sim W_p(N, \Sigma(\theta)) \;\Rightarrow\; N S = W_1 + \cdots + W_N, \quad W_1, \ldots, W_N \sim \text{i.i.d. } W_p(1, \Sigma(\theta)),$$

where $N = n - 1$.

$W_1, \ldots, W_N$ can be regarded as independent observations. Therefore, the candidate model is:

$$M_f : W_1, \ldots, W_N \sim \text{i.i.d. } W_p(1, \Sigma(\theta)).$$
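The decomposition of $NS$ into $N$ rank-one Wishart terms can be made concrete. The sketch below uses one standard construction, which is an assumption here since the slides do not say how the $W_i$ are obtained: project the data onto an orthonormal basis of the space orthogonal to the constant vector. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
Y = rng.standard_normal((n, p))        # rows play the role of y_1, ..., y_n
N = n - 1
S = np.cov(Y, rowvar=False)            # unbiased sample covariance (divisor N)

# Orthonormal rows spanning the orthogonal complement of the vector of ones.
# QR of [1 | random columns]: the first column of Q is proportional to the ones vector.
M = np.column_stack([np.ones(n), rng.standard_normal((n, N))])
Q, _ = np.linalg.qr(M)
H = Q[:, 1:].T                         # N x n, with H @ ones = 0 and H @ H.T = I_N

Z = H @ Y                              # N transformed rows z_1, ..., z_N
W = np.einsum('ki,kj->kij', Z, Z)      # W_k = z_k z_k', each of rank one

print(np.allclose(W.sum(axis=0), N * S))
```

When the $y_i$ are i.i.d. normal with a common mean and covariance $\Sigma(\theta)$, the rows of `Z` are i.i.d. $N_p(0, \Sigma(\theta))$, so each `W[k]` follows $W_p(1, \Sigma(\theta))$.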

slide-12
SLIDE 12

Vienna WS 12

Likelihood and MLE (General)

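  • Log-likelihood function:

$$L(\theta \mid Y) = \sum_{i=1}^{n} \log f(y_i \mid \theta), \quad \text{where } Y = (y_1, \ldots, y_n)'.$$

  • Maximum likelihood estimator (MLE) of $\theta$:

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid Y).$$

  • Convergence of the MLE in the misspecified model (White, 1982):

$$\lim_{n \to \infty} \hat{\theta} = \theta_0, \quad \mathrm{E}_y\!\left[ \left. \frac{\partial}{\partial \theta} \log f(y \mid \theta) \right|_{\theta = \theta_0} \right] = 0,$$

where $\mathrm{E}_y$ denotes an expectation with respect to $y$ under the true model $\phi(y)$.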

slide-13
SLIDE 13

Vienna WS 13

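Likelihood and MLE in SEM

In SEM, the discrepancy function is:

$$F_{KL}(S \mid \theta) = -\log \left| S \Sigma^{-1} \right| + \operatorname{tr}(S \Sigma^{-1}) - p \;\Rightarrow\; \frac{1}{N} \sum_{i=1}^{N} F_{KL}(W_i \mid \theta),$$

where $\Sigma = \Sigma(\theta)$.

Therefore, the log-likelihood is:

$$-2 L(\theta \mid S) = N\, F_{KL}(S \mid \theta),$$

and the MLE is:

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid S).$$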
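As a rough illustration of the discrepancy function, the sketch below evaluates the usual normal-theory form $-\log|S\Sigma^{-1}| + \operatorname{tr}(S\Sigma^{-1}) - p$. That this is exactly the form intended on the slide is an assumption, and the function name `f_kl` and the matrices are hypothetical.

```python
import numpy as np

def f_kl(S, Sigma):
    """Normal-theory discrepancy: -log|S Sigma^{-1}| + tr(S Sigma^{-1}) - p."""
    p = S.shape[0]
    # Sigma^{-1} S has the same determinant and trace as S Sigma^{-1}.
    A = np.linalg.solve(Sigma, S)
    _, logdet = np.linalg.slogdet(A)
    return -logdet + np.trace(A) - p

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

perfect_fit = f_kl(Sigma, Sigma)        # S equals the model-implied covariance
inflated = f_kl(2.0 * Sigma, Sigma)     # S is twice the model-implied covariance
print(perfect_fit, inflated)
```

The discrepancy is zero when $S = \Sigma(\theta)$ and positive otherwise, so minimizing it over $\theta$ is equivalent to maximizing $L(\theta \mid S)$.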

slide-14
SLIDE 14

Vienna WS 14

Risk Function, Bias, and Information Criterion

  • Risk function based on the expected predictive KL discrepancy:

$$R_{KL} = \mathrm{E}_y \mathrm{E}_u\!\left[ -2 L(\hat{\theta} \mid U) \right],$$

where $U = (u_1, \ldots, u_n)'$ is an $n \times p$ future observation matrix (independent of $Y$), and $u_i$ has the same distribution as $y_i$ ($i = 1, \ldots, n$).

  • Bias:

$$B = R_{KL} - \mathrm{E}_y\!\left[ -2 L(\hat{\theta} \mid Y) \right].$$

slide-15
SLIDE 15

Vienna WS 15

  • Information criterion (IC):

$$\mathrm{IC} = -2 L(\hat{\theta} \mid Y) + \hat{B},$$

where $\hat{B}$ is a consistent estimator of $B$.

Note: The individual ICs differ in the term used for $\hat{B}$.

slide-16
SLIDE 16

Vienna WS 16

Estimated Bias in Information Criteria

  • AIC: $\hat{B} = 2q$.

  • TIC (Takeuchi information criterion; Takeuchi, 1976):

$$\hat{B} = 2 \operatorname{tr}\{ \hat{I}(\hat{\theta}) \hat{J}(\hat{\theta})^{-1} \}.$$

  • CV:

$$\hat{B} = -2 \sum_{i=1}^{n} \log f(y_i \mid \hat{\theta}_{[-i]}) + 2 L(\hat{\theta} \mid Y),$$

where $\hat{\theta}_{[-i]}$ is the jackknife estimator of $\theta$ defined by

$$\hat{\theta}_{[-i]} = \arg\max_{\theta} \left\{ \sum_{j \neq i}^{n} \log f(y_j \mid \theta) \right\}.$$
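The AIC and CV bias terms can be compared on a toy example. The sketch below assumes a univariate normal candidate model with $\theta = (\mu, \sigma^2)$, so $q = 2$; this model, the simulated data, and the helper names `loglik` and `mle` are illustrative choices and do not come from the slides.

```python
import numpy as np

def loglik(y, mu, sigma2):
    """Sum of log N(y_i | mu, sigma2) over the entries of y."""
    y = np.atleast_1d(y)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                        - 0.5 * (y - mu) ** 2 / sigma2))

def mle(y):
    """MLE of (mu, sigma2) for the normal model (variance with divisor n)."""
    return y.mean(), y.var()

rng = np.random.default_rng(1)
y = rng.normal(loc=0.0, scale=1.0, size=50)
q = 2                                   # number of parameters

mu_hat, s2_hat = mle(y)
L_hat = loglik(y, mu_hat, s2_hat)

# AIC: estimated bias is 2q.
B_aic = 2 * q

# CV: refit without y_i, then score the held-out y_i.
loo = 0.0
for i in range(len(y)):
    mu_i, s2_i = mle(np.delete(y, i))
    loo += loglik(y[i], mu_i, s2_i)
B_cv = -2 * loo + 2 * L_hat

aic = -2 * L_hat + B_aic
cv = -2 * L_hat + B_cv                  # algebraically equal to -2 * loo
print(B_aic, B_cv, aic, cv)
```

The loop makes the computational cost of CV visible: the model is refitted $n$ times, whereas the AIC bias term needs no refitting at all.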

slide-17
SLIDE 17

Vienna WS 17

Notations

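  • A. Derivatives:
  • 1. First-order (Gradient):

$$g(y \mid \theta) = -\frac{\partial}{\partial \theta} \log f(y \mid \theta),$$

  • 2. Second-order (Hessian):

$$H(y \mid \theta) = -\frac{\partial^2}{\partial \theta\, \partial \theta'} \log f(y \mid \theta),$$

  • 3. Third-order:

$$C(y \mid \theta) = -\left( \frac{\partial}{\partial \theta'} \otimes \frac{\partial^2}{\partial \theta\, \partial \theta'} \right) \log f(y \mid \theta),$$

  • 4. Fourth-order:

$$Q(y \mid \theta) = -\left( \frac{\partial^2}{\partial \theta\, \partial \theta'} \otimes \frac{\partial^2}{\partial \theta\, \partial \theta'} \right) \log f(y \mid \theta),$$

where $\otimes$ is the Kronecker product.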

slide-18
SLIDE 18

Vienna WS 18

  • B. Expectation of Moment Matrices
  • 1. Information: $I(\theta) = \mathrm{E}_y[ g(y \mid \theta)\, g(y \mid \theta)' ]$,
  • 2. Jacobian: $J(\theta) = \mathrm{E}_y[ H(y \mid \theta) ]$,
  • 3. Expected third-order moment matrix: $K(\theta) = \mathrm{E}_y[ C(y \mid \theta) ]$,
  • 4. Expected fourth-order moment matrix: $L(\theta) = \mathrm{E}_y[ Q(y \mid \theta) ]$.

slide-19
SLIDE 19

Vienna WS 19

  • C. Estimates of Expected Moment Matrices:
  • 1. Estimated Information: $\hat{I}(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(y_i \mid \theta)\, g(y_i \mid \theta)'$,
  • 2. Estimated Jacobian: $\hat{J}(\theta) = \frac{1}{n} \sum_{i=1}^{n} H(y_i \mid \theta)$,
  • 3. Estimated 3rd-order moment: $\hat{K}(\theta) = \frac{1}{n} \sum_{i=1}^{n} C(y_i \mid \theta)$,
  • 4. Estimated 4th-order moment: $\hat{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} Q(y_i \mid \theta)$.
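To show how the estimated information and Jacobian are formed from per-observation derivatives, the sketch below again assumes a univariate normal model with $\theta = (\mu, \sigma^2)$, using the sign convention that $g$ and $H$ are derivatives of $-\log f$. The model and the helper name `g_and_H` are illustrative assumptions.

```python
import numpy as np

def g_and_H(y, mu, s2):
    """Per-observation g = -d log f / d theta and H = -d^2 log f / d theta d theta'."""
    r = y - mu
    # Derivatives of log f with respect to (mu, sigma^2), negated.
    g = -np.stack([r / s2, -0.5 / s2 + 0.5 * r**2 / s2**2], axis=1)   # n x 2
    H = np.empty((len(y), 2, 2))
    H[:, 0, 0] = 1.0 / s2
    H[:, 0, 1] = H[:, 1, 0] = r / s2**2
    H[:, 1, 1] = -0.5 / s2**2 + r**2 / s2**3
    return g, H

rng = np.random.default_rng(2)
y = rng.normal(size=200)
mu_hat, s2_hat = y.mean(), y.var()      # MLE (variance with divisor n)

g, H = g_and_H(y, mu_hat, s2_hat)
I_hat = g.T @ g / len(y)                # (1/n) sum of g g'
J_hat = H.mean(axis=0)                  # (1/n) sum of H

# TIC-type bias estimate 2 tr{I_hat J_hat^{-1}}.
B_tic = 2 * np.trace(I_hat @ np.linalg.inv(J_hat))
print(I_hat, J_hat, B_tic)
```

For this particular model the TIC-type term works out to one plus the sample kurtosis, so it is close to $2q = 4$ when the data really are normal and departs from it otherwise.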

slide-20
SLIDE 20

Vienna WS 20

D(a). Coefficients in Bias-Correction Terms (General Case)

Below, $I$, $J$, $K$, $L$ abbreviate $I(\theta)$, $J(\theta)$, $K(\theta)$, $L(\theta)$; $g$, $H$, $C$ abbreviate $g(y \mid \theta)$, $H(y \mid \theta)$, $C(y \mid \theta)$; and $g_i$, $H_i$ abbreviate $g(y_i \mid \theta)$, $H(y_i \mid \theta)$.

$$
\begin{aligned}
\alpha_1 &= \operatorname{tr}\{ I J^{-1} \}, \\
\alpha_2 &= \mathrm{E}_y[\, g' J^{-1} H J^{-1} g \,], \\
\alpha_3 &= \mathrm{E}_y[\, g' J^{-1} K \{ J^{-1} g \otimes J^{-1} g \} \,], \\
\alpha_4 &= \mathrm{E}_y[\, \operatorname{tr}\{ J^{-1} I J^{-1} H J^{-1} H \} \,], \\
\alpha_5 &= \mathrm{E}_y[\, \operatorname{tr}\{ C ( J^{-1} g \otimes J^{-1} I J^{-1} ) \} \,], \\
\alpha_6 &= \mathrm{E}_y[\, g' J^{-1} H \,]\, J^{-1}\, \mathrm{E}_y[\, H J^{-1} g \,], \\
\alpha_7 &= \mathrm{E}_y[\, \operatorname{tr}\{ K ( J^{-1} H J^{-1} g \otimes J^{-1} I J^{-1} ) \} \,], \\
\alpha_8 &= \operatorname{tr}\{ K ( J^{-1} K \operatorname{vec}( J^{-1} I J^{-1} ) \otimes J^{-1} I J^{-1} ) \}, \\
\alpha_9 &= \mathrm{E}_y[\, g_1' J^{-1} H_2 J^{-1} H_1 J^{-1} g_2 \,], \\
\alpha_{10} &= \mathrm{E}_y[\, \operatorname{tr}\{ K ( J^{-1} g \otimes J^{-1} H J^{-1} I J^{-1} ) \} \,], \\
\alpha_{11} &= \operatorname{tr}\{ K' J^{-1} K ( J^{-1} I J^{-1} \otimes J^{-1} I J^{-1} ) \}, \\
\alpha_{12} &= \operatorname{tr}\{ L ( J^{-1} I J^{-1} \otimes J^{-1} I J^{-1} ) \}.
\end{aligned}
$$

slide-21
SLIDE 21

Vienna WS 21

D(b). Case with Overspecified Models ($I(\theta) = J(\theta)$)

Below, $I$, $K$, $L$ abbreviate $I(\theta)$, $K(\theta)$, $L(\theta)$; $g$, $H$, $C$ abbreviate $g(y \mid \theta)$, $H(y \mid \theta)$, $C(y \mid \theta)$; and $g_i$, $H_i$ abbreviate $g(y_i \mid \theta)$, $H(y_i \mid \theta)$.

$$
\begin{aligned}
\alpha_1 &= q, \\
\alpha_2 &= \mathrm{E}_y[\, g' I^{-1} H I^{-1} g \,], \\
\alpha_3 &= \mathrm{E}_y[\, g' I^{-1} K \{ I^{-1} g \otimes I^{-1} g \} \,], \\
\alpha_4 &= \mathrm{E}_y[\, \operatorname{tr}\{ I^{-1} H I^{-1} H \} \,], \\
\alpha_5 &= \mathrm{E}_y[\, \operatorname{tr}\{ C ( I^{-1} g \otimes I^{-1} ) \} \,], \\
\alpha_6 &= \mathrm{E}_y[\, g' I^{-1} H \,]\, I^{-1}\, \mathrm{E}_y[\, H I^{-1} g \,], \\
\alpha_7 &= \mathrm{E}_y[\, \operatorname{tr}\{ K ( I^{-1} H I^{-1} g \otimes I^{-1} ) \} \,], \\
\alpha_8 &= \operatorname{tr}\{ K ( I^{-1} K \operatorname{vec}( I^{-1} ) \otimes I^{-1} ) \}, \\
\alpha_9 &= \mathrm{E}_y[\, g_1' I^{-1} H_2 I^{-1} H_1 I^{-1} g_2 \,], \\
\alpha_{10} &= \mathrm{E}_y[\, \operatorname{tr}\{ K ( I^{-1} g \otimes I^{-1} H I^{-1} ) \} \,], \\
\alpha_{11} &= \operatorname{tr}\{ K' I^{-1} K ( I^{-1} \otimes I^{-1} ) \}, \\
\alpha_{12} &= \operatorname{tr}\{ L ( I^{-1} \otimes I^{-1} ) \}.
\end{aligned}
$$
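The simplification in the overspecified case can be traced on the first coefficient, which in the general case is the trace of $I(\theta) J(\theta)^{-1}$. The short derivation below is a sketch; the symbol $\mathrm{Id}_q$ for the $q \times q$ identity matrix is introduced here only to avoid a clash with the information matrix $I(\theta)$.

```latex
\begin{align*}
\alpha_1 &= \operatorname{tr}\{ I(\theta) J(\theta)^{-1} \}
          = \operatorname{tr}\{ I(\theta) I(\theta)^{-1} \}
          = \operatorname{tr}(\mathrm{Id}_q) = q, \\
J(\theta)^{-1} I(\theta) J(\theta)^{-1} &= I(\theta)^{-1}
  \quad \text{whenever } I(\theta) = J(\theta).
\end{align*}
```

The second line is what collapses every sandwich of the form $J^{-1} I J^{-1}$ in the general-case coefficients to a single $I^{-1}$.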

slide-22
SLIDE 22

Vienna WS 22

Bias of AIC (Main Result of Current Study)

( )

AIC 1 1 2 3 4 5 6 7 2 8 9 10 11 12

1 2( ) 3 4 4 4 4 4 8 2 ( ). B q n O n α α α α α α α α α α α α α

= − + − − + + + + − + + − + − +

slide-23
SLIDE 23

Vienna WS 23

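Useful Formulas in Obtaining the Coefficient Terms

If $W \sim W_p(1, \Sigma(\theta))$, then for any symmetric matrices $A$, $B$, and $C$:

1. $\mathrm{E}[\operatorname{tr}(WA)] = \operatorname{tr}(A)$,
2. $\mathrm{E}[\operatorname{tr}(WA)\operatorname{tr}(WB)] = \operatorname{tr}(A)\operatorname{tr}(B) + 2\operatorname{tr}(AB)$,
3. $\mathrm{E}[\operatorname{tr}(WA)\operatorname{tr}(WB)\operatorname{tr}(WC)] = \operatorname{tr}(A)\operatorname{tr}(B)\operatorname{tr}(C) + 2\{ \operatorname{tr}(A)\operatorname{tr}(BC) + \operatorname{tr}(B)\operatorname{tr}(AC) + \operatorname{tr}(C)\operatorname{tr}(AB) \} + 8\operatorname{tr}(ABC)$.

Note: These identities hold as written when $\Sigma(\theta)$ is the $p \times p$ identity matrix; for a general $\Sigma(\theta)$, replace $A$ by $\Sigma(\theta)^{1/2} A\, \Sigma(\theta)^{1/2}$ on the right-hand sides, and likewise for $B$ and $C$.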
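The three moment formulas can be checked by simulation. The sketch below takes the scale matrix to be the identity, the case in which the formulas hold exactly as written, and uses $W = zz'$ with $z \sim N_p(0, \mathrm{Id}_p)$ so that $\operatorname{tr}(WA) = z'Az$. The matrices, sample size, and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
p, draws = 2, 2_000_000

# Arbitrary symmetric test matrices.
A = np.array([[1.0, 0.5], [0.5, 2.0]])
B = np.array([[2.0, -0.3], [-0.3, 1.0]])
C = np.array([[1.0, 0.2], [0.2, 1.0]])

Z = rng.standard_normal((draws, p))     # each row z gives W = z z' ~ W_p(1, Id_p)

def quad(M):
    """tr(W M) = z' M z for every simulated z."""
    return np.einsum('ni,ij,nj->n', Z, M, Z)

x, y, z = quad(A), quad(B), quad(C)

tr = np.trace
exact1 = tr(A)
exact2 = tr(A) * tr(B) + 2 * tr(A @ B)
exact3 = (tr(A) * tr(B) * tr(C)
          + 2 * (tr(A) * tr(B @ C) + tr(B) * tr(A @ C) + tr(C) * tr(A @ B))
          + 8 * tr(A @ B @ C))

# Monte Carlo estimates of the three expectations.
mc1, mc2, mc3 = x.mean(), (x * y).mean(), (x * y * z).mean()
print(exact1, mc1)
print(exact2, mc2)
print(exact3, mc3)
```

The simulated means should agree with the closed forms up to Monte Carlo error, which is largest for the third-order formula because the product of three quadratic forms has a heavy right tail.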

slide-24
SLIDE 24

Vienna WS 24

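These expectations imply:

4. $\mathrm{E}[\{\operatorname{tr}(A) - \operatorname{tr}(WA)\}\{\operatorname{tr}(B) - \operatorname{tr}(WB)\}] = 2\operatorname{tr}(AB)$,
5. $\mathrm{E}[\{\operatorname{tr}(A) - \operatorname{tr}(WA)\}\{\operatorname{tr}(B) - \operatorname{tr}(WB)\}\{\operatorname{tr}(C) - \operatorname{tr}(WC)\}] = -8\operatorname{tr}(ABC)$.

Note: If $B$ is not symmetric, then:

5(b). $\mathrm{E}[\{\operatorname{tr}(A) - \operatorname{tr}(WA)\}\{\operatorname{tr}(B) - \operatorname{tr}(WB)\}\{\operatorname{tr}(C) - \operatorname{tr}(WC)\}] = -4\{\operatorname{tr}(ABC) + \operatorname{tr}(AB'C)\}$.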
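As a worked step, the second-order centered identity follows from the first two moment formulas (the mean of tr(WA) and the mean of the product tr(WA)tr(WB)) by expanding the product and taking expectations term by term. The derivation below is a sketch written for symmetric $A$ and $B$.

```latex
\begin{align*}
\mathrm{E}\big[\{\operatorname{tr}(A) - \operatorname{tr}(WA)\}\{\operatorname{tr}(B) - \operatorname{tr}(WB)\}\big]
 &= \operatorname{tr}(A)\operatorname{tr}(B)
    - \operatorname{tr}(A)\,\mathrm{E}[\operatorname{tr}(WB)]
    - \operatorname{tr}(B)\,\mathrm{E}[\operatorname{tr}(WA)]
    + \mathrm{E}[\operatorname{tr}(WA)\operatorname{tr}(WB)] \\
 &= \operatorname{tr}(A)\operatorname{tr}(B)
    - 2\operatorname{tr}(A)\operatorname{tr}(B)
    + \operatorname{tr}(A)\operatorname{tr}(B) + 2\operatorname{tr}(AB) \\
 &= 2\operatorname{tr}(AB).
\end{align*}
```

The third-order centered identity is obtained in the same way from all three moment formulas; every term involving a plain trace cancels and only the cubic term survives, with its sign flipped by the three negations.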

slide-25
SLIDE 25

Vienna WS 25

Thank you!