Vienna WS 1
Second-Order Bias-Corrected AIC for Selecting Structural Equation Models
Kentaro HAYASHI, Department of Psychology, University of Hawaii at Manoa (E-mail: hayashik@hawaii.edu)
AND
Hirokazu YANAGIHARA, Department of Mathematics,
Introduction
We derive a second-order bias correction of the Akaike Information Criterion (AIC) for structural equation models (SEM) under the normality assumption when the model is overspecified.
Note: “Overspecified” means that the candidate models (the $f$'s) include the true model ($\phi$).
Contents:
- 1. Introduction
(a) Structural Equation Models (SEM)
(b) AIC and CV (Cross-Validation) Criterion
- 2. General Theory
(a) True and Candidate Models
(b) Likelihood and MLE
(c) Risk, Bias, and Information Criterion
(d) Estimated Bias
- 3. Recent and Current Studies
(a) Notations (Derivatives, Expectation of Moment Matrices, Estimates of Expected Moment Matrices, and Coefficients in Bias-Correction Terms)
(b) Evaluating Bias of Information Criteria
(c) Asymptotic Expansion of Expectation of Estimated “Beta” Term
(d) Bias of AIC (Main Result of Current Study)
(e) Useful Formulas in Obtaining the Coefficient Terms
Structural Equation Models (SEM)
References: Bollen (1989), Bartholomew and Knott (1999), Skrondal and Rabe-Hesketh (2004), Yuan and Bentler (2007)
- SEM is one of the most frequently used multivariate techniques in the social sciences.
- SEM aims to express the covariance structure of the observed variables using a relatively small number of parameters. Notation: $\Sigma(\theta)$.
- The best-known SEM is the confirmatory factor analysis (CFA) model:
$y = \mu + \Lambda f + \varepsilon$,
where $y$ ($p \times 1$): Observed variables, $\mu$ ($p \times 1$): Population means, $\Lambda$ ($p \times m$): Factor loadings (Path coefficients), $f$ ($m \times 1$): Factors, $\varepsilon$ ($p \times 1$): Errors.
Note: CFA is a linear model, but the factors ($f$) are latent (NOT observed) variables.
- Typical assumptions: the errors are mutually uncorrelated, and the factors and errors are uncorrelated. That is, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ ($i \neq j$) and $\mathrm{Cov}(f_i, \varepsilon_j) = 0$.
- The covariance structure of the variables ($y$) is expressed as a function of:
$\Lambda$ ($p \times m$): Factor loadings (Path coefficients), $\Phi$ ($m \times m$): Factor correlations, and $\Psi$ ($p \times p$): Error variances.
- That is, the covariance structure under CFA is expressed as:
$\Sigma(\theta) = \Lambda\Phi\Lambda' + \Psi$, where $\theta = (\lambda', \varphi', \psi')' = (\theta_1, \ldots, \theta_q)'$.
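A minimal sketch of how the CFA covariance structure $\Sigma(\theta) = \Lambda\Phi\Lambda' + \Psi$ is assembled from its parameter blocks. It assumes a hypothetical two-factor model with four indicators, a diagonal $\Psi$, and illustrative parameter values; the helpers `matmul`, `transpose` and `cfa_covariance` are hypothetical names, and plain Python lists are used so that no external library is assumed.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def cfa_covariance(lam, phi, psi_diag):
    """Sigma(theta) = Lambda Phi Lambda' + Psi, with Psi diagonal (error variances)."""
    common = matmul(matmul(lam, phi), transpose(lam))   # Lambda Phi Lambda'
    p = len(lam)
    return [[common[i][j] + (psi_diag[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


# Hypothetical values: p = 4 indicators, m = 2 factors (illustration only).
lam = [[0.8, 0.0],
       [0.6, 0.0],
       [0.0, 0.7],
       [0.0, 0.5]]
phi = [[1.0, 0.3],
       [0.3, 1.0]]                      # factor correlations (unit factor variances)
psi_diag = [0.36, 0.64, 0.51, 0.75]     # error variances

sigma = cfa_covariance(lam, phi, psi_diag)
for row in sigma:
    print([round(x, 3) for x in row])
```

With these hypothetical values every diagonal element of $\Sigma(\theta)$ equals one. If the four nonzero loadings, the single factor correlation and the four error variances are the free parameters, then $q = 9$.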
AIC (Akaike Information Criterion)
- When the candidate model is overspecified, AIC is a first-order bias-corrected estimator of the risk function based on the expected predictive Kullback-Leibler (KL) discrepancy between the true model and the candidate model. That is,
$E[\mathrm{AIC}] = R_{KL} + O(n^{-1})$.
- AIC tends to choose a model with many parameters as the best model (when the full model has too many parameters).
- Reason: AIC tends to underestimate the bias when the candidate model has many parameters, because the bias term of AIC is derived from the asymptotic theory of $\hat\theta$.
- This undesirable property of AIC (choosing a candidate model with too many parameters as the best model) tends to appear in overspecified models.
Cross-Validation (CV)
- Even when the candidate model is misspecified, second-order bias-corrected estimators of the risk function have been proposed under general conditions by correcting the bias of the cross-validation (CV) criterion of Stone (1974); see, e.g., Yanagihara, Tonda and Matsumoto (2006). That is,
$E[\mathrm{CCV}] = R_{KL} + O(n^{-2})$.
- However, obtaining these bias-corrected criteria requires heavy computation, and the criteria themselves have large variance.
True and Candidate Models (General)
Let $y_1, \ldots, y_n$ be $p$-dimensional random observation vectors, where $n$ is the sample size.
- The true model:
$M_\phi:\ y_1, \ldots, y_n \overset{\text{i.i.d.}}{\sim} \phi(y)$, where $\phi(y)$ is an unknown probability density function.
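- The candidate model:
$M_f:\ y_1, \ldots, y_n \overset{\text{i.i.d.}}{\sim} f(y \mid \theta)$, where $f(y \mid \theta) \in \mathcal{F} = \{ f(y \mid \theta);\ \theta \in \Theta \}$ and $\theta = (\theta_1, \ldots, \theta_q)' \in \Theta \subseteq \mathbb{R}^q$.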
Candidate Model in SEM
Under the candidate model (with normally distributed observations),
$NS \sim W_p(N, \Sigma(\theta)) \;\Rightarrow\; NS = W_1 + \cdots + W_N, \quad W_1, \ldots, W_N \overset{\text{i.i.d.}}{\sim} W_p(1, \Sigma(\theta))$,
where $N = n - 1$ and $S$ is the sample covariance matrix. Hence $W_1, \ldots, W_N$ can be regarded as independent observations. Therefore, the candidate model is:
$M_f:\ W_1, \ldots, W_N \overset{\text{i.i.d.}}{\sim} W_p(1, \Sigma(\theta))$.
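The decomposition $NS = W_1 + \cdots + W_N$ can be checked numerically. The slides do not say how the $W_i$ are constructed; one standard construction, assumed here, is the Helmert transformation. The $n - 1$ orthonormal Helmert contrasts $z_i$ of the observations satisfy $\sum_i z_i z_i' = NS$ exactly, and under normality they are i.i.d. $N_p(0, \Sigma)$, so each $W_i = z_i z_i' \sim W_p(1, \Sigma)$. The sketch below uses hypothetical helper names and simulated data, and it only verifies the algebraic identity.

```python
import math
import random


def scatter_centered(ys):
    """N*S = sum_i (y_i - ybar)(y_i - ybar)', where N = n - 1."""
    n, p = len(ys), len(ys[0])
    ybar = [sum(y[k] for y in ys) / n for k in range(p)]
    out = [[0.0] * p for _ in range(p)]
    for y in ys:
        d = [y[k] - ybar[k] for k in range(p)]
        for a in range(p):
            for b in range(p):
                out[a][b] += d[a] * d[b]
    return out


def helmert_vectors(ys):
    """z_i = (y_1 + ... + y_i - i*y_{i+1}) / sqrt(i(i+1)), for i = 1, ..., n-1."""
    n, p = len(ys), len(ys[0])
    zs, running = [], [0.0] * p
    for i in range(1, n):
        running = [running[k] + ys[i - 1][k] for k in range(p)]   # y_1 + ... + y_i
        scale = 1.0 / math.sqrt(i * (i + 1))
        zs.append([(running[k] - i * ys[i][k]) * scale for k in range(p)])
    return zs


def sum_of_outer_products(zs):
    """W_1 + ... + W_N with W_i = z_i z_i' (each W_i has rank one)."""
    p = len(zs[0])
    out = [[0.0] * p for _ in range(p)]
    for z in zs:
        for a in range(p):
            for b in range(p):
                out[a][b] += z[a] * z[b]
    return out


random.seed(1)
ys = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]   # n = 8, p = 3
ns = scatter_centered(ys)
ws = sum_of_outer_products(helmert_vectors(ys))                        # N = 7 terms
print(max(abs(ns[a][b] - ws[a][b]) for a in range(3) for b in range(3)))
```

The printed maximum discrepancy is at the level of floating-point rounding, because the Helmert rows are orthonormal and orthogonal to the vector of ones.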
Likelihood and MLE (General)
- Log-likelihood function:
$L(\theta \mid Y) = \sum_{i=1}^{n} \log f(y_i \mid \theta)$, where $Y = (y_1, \ldots, y_n)'$.
- Maximum likelihood estimator (MLE) of $\theta$: $\hat\theta = \arg\max_{\theta} L(\theta \mid Y)$.
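- Convergence of the MLE in the misspecified model (White, 1982):
$\lim_{n\to\infty} \hat\theta = \theta_0$, where $\theta_0$ satisfies $E_y\left[ \frac{\partial}{\partial\theta} \log f(y \mid \theta) \Big|_{\theta = \theta_0} \right] = 0$, and $E_y$ denotes an expectation with respect to $y$ under the true model $\phi(y)$.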
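A small sketch of the MLE and of White's (1982) limit under misspecification. It assumes a hypothetical one-parameter exponential candidate $f(y \mid \theta) = \theta e^{-\theta y}$ ($q = 1$), while the data actually come from a Gamma(2, 1) distribution, so the candidate does not contain the true model $\phi$. The MLE is found by Newton-Raphson on the score equation and compared with its closed form $1/\bar y$. The pseudo-true value solves $E_y[1/\theta - y] = 0$, which gives $\theta_0 = 1/E_y[y] = 0.5$. Function names and the seed are illustrative choices.

```python
import random


def score(theta, ys):
    """dL/dtheta for L(theta|Y) = sum_i (log theta - theta * y_i)."""
    return len(ys) / theta - sum(ys)


def hessian(theta, ys):
    """d^2 L / dtheta^2 for the same log-likelihood."""
    return -len(ys) / theta ** 2


def mle_newton(ys, theta=0.1, tol=1e-12, max_iter=200):
    """Maximize L(theta|Y) by Newton-Raphson on the score equation."""
    for _ in range(max_iter):
        theta_new = theta - score(theta, ys) / hessian(theta, ys)
        if theta_new <= 0:            # stay inside the parameter space (0, inf)
            theta_new = theta / 2
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta


random.seed(2)
# True model phi: Gamma(shape=2, scale=1); the exponential candidate is misspecified.
ys = [random.gammavariate(2.0, 1.0) for _ in range(20000)]
theta_hat = mle_newton(ys)
print(theta_hat, len(ys) / sum(ys))     # Newton result vs closed form 1 / ybar
print("pseudo-true theta_0 = 1 / E[y] =", 1 / 2.0)
```

Even though the candidate is misspecified, $\hat\theta$ settles near $\theta_0 = 0.5$ rather than near any "true" exponential rate, which is the point of White's result.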
Likelihood and MLE in SEM
In SEM, the discrepancy function is:
$F_{KL}(S \mid \theta) = -\log|\Sigma(\theta)^{-1} S| + \mathrm{tr}\{\Sigma(\theta)^{-1} S\} - p \;\Rightarrow\; \frac{1}{N}\sum_{i=1}^{N} F_{KL}(W_i \mid \theta)$.
Therefore, the log-likelihood is $-2L(\theta \mid S) = N F_{KL}(S \mid \theta)$, and the MLE is $\hat\theta = \arg\max_{\theta} L(\theta \mid S)$.
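The discrepancy function can be evaluated directly. The sketch below computes $F_{KL}(S \mid \theta) = -\log|\Sigma(\theta)^{-1}S| + \mathrm{tr}\{\Sigma(\theta)^{-1}S\} - p$ for a given sample covariance $S$ and model-implied $\Sigma(\theta)$. It uses a small Gauss-Jordan routine for the determinant and inverse; `det_and_inverse` and `f_kl` are hypothetical helper names, and the matrices are illustrative. $F_{KL}$ is zero when $\Sigma(\theta) = S$ and positive otherwise, so maximizing $L(\theta \mid S)$ amounts to minimizing $F_{KL}$.

```python
import math


def det_and_inverse(a):
    """Gauss-Jordan elimination with partial pivoting; returns (determinant, inverse)."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            det = -det                       # a row swap flips the sign
        pivot = m[col][col]
        det *= pivot
        m[col] = [x / pivot for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return det, [row[n:] for row in m]


def f_kl(s, sigma):
    """F_KL(S|theta) = -log|Sigma^{-1} S| + tr(Sigma^{-1} S) - p."""
    p = len(s)
    det_sigma, sigma_inv = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(s)
    trace = sum(sigma_inv[i][k] * s[k][i] for i in range(p) for k in range(p))
    return -math.log(det_s / det_sigma) + trace - p    # |Sigma^{-1} S| = |S| / |Sigma|


S = [[2.0, 0.5], [0.5, 1.0]]       # illustrative sample covariance matrix
I2 = [[1.0, 0.0], [0.0, 1.0]]      # illustrative model-implied Sigma(theta)
print(f_kl(S, S), f_kl(S, I2))
```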
Risk Function, Bias, and Information Criterion
- Risk function based on the expected predictive KL discrepancy:
$R_{KL} = E_y E_u\left[-2L(\hat\theta \mid U)\right]$, where $U = (u_1, \ldots, u_n)'$ is an $n \times p$ future observation matrix (independent of $Y$), and $u_i$ is distributed according to the same distribution as $y_i$ ($i = 1, \ldots, n$).
- Bias:
$B = R_{KL} - E_y\left[-2L(\hat\theta \mid Y)\right]$.
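When the true model is known, $R_{KL}$ and $B$ can be approximated by simulation. The following is a minimal sketch under a hypothetical setup: the candidate is $N(\mu, 1)$ with known unit variance ($q = 1$), and it contains the true model $N(0, 1)$. For this model a direct calculation gives $E_y E_u[-2L(\hat\mu \mid U)] = n\log 2\pi + n + 1$ and $E_y[-2L(\hat\mu \mid Y)] = n\log 2\pi + n - 1$, so $B = 2 = 2q$ exactly. The Monte Carlo average should reproduce this value up to simulation error. The names `neg2_loglik` and `simulate_bias`, the sample size and the number of replications are illustrative choices.

```python
import math
import random


def neg2_loglik(mu, data):
    """-2 L(mu | data) for the candidate N(mu, 1) with known unit variance."""
    return sum(math.log(2 * math.pi) + (x - mu) ** 2 for x in data)


def simulate_bias(n=10, reps=20000, mu_true=0.0, seed=3):
    """Monte Carlo estimate of B = R_KL - E_y[-2 L(mu_hat | Y)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu_true, 1.0) for _ in range(n)]   # observed data Y
        u = [rng.gauss(mu_true, 1.0) for _ in range(n)]   # independent future data U
        mu_hat = sum(y) / n                               # MLE of mu from Y only
        total += neg2_loglik(mu_hat, u) - neg2_loglik(mu_hat, y)
    return total / reps


print(simulate_bias())   # should be close to 2 = 2q
```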
- Information criterion (IC):
$\mathrm{IC} = -2L(\hat\theta \mid Y) + \hat B$, where $\hat B$ is a consistent estimator of $B$. Note: different ICs correspond to different choices of $\hat B$.
Estimated Bias in Information Criteria
- AIC: $\hat B = 2q$.
- TIC (Takeuchi information criterion; Takeuchi, 1976):
$\hat B = 2\,\mathrm{tr}\{\hat I(\hat\theta)\hat J(\hat\theta)^{-1}\}$.
- CV:
$\hat B = -2\sum_{i=1}^{n} \log f(y_i \mid \hat\theta_{[-i]}) + 2L(\hat\theta \mid Y)$, where $\hat\theta_{[-i]}$ is the jackknife (leave-one-out) estimator of $\theta$ defined by
$\hat\theta_{[-i]} = \arg\max_{\theta} \sum_{j \neq i} \log f(y_j \mid \theta)$.
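The three choices of $\hat B$ can be compared on one data set. The following sketch assumes a hypothetical exponential candidate $f(y \mid \theta) = \theta e^{-\theta y}$ ($q = 1$). For this model $g(y \mid \theta) = y - 1/\theta$ and $H(y \mid \theta) = 1/\theta^2$, the MLE is $1/\bar y$, and the leave-one-out MLE has the closed form $(n-1)/(\sum_j y_j - y_i)$, so no numerical optimization is needed. The data and function names are illustrative.

```python
import math


def log_f(y, theta):
    """log f(y|theta) for the hypothetical exponential candidate theta * exp(-theta * y)."""
    return math.log(theta) - theta * y


def estimated_biases(ys):
    """B_hat for AIC, TIC and CV under the exponential candidate (q = 1)."""
    n, total = len(ys), sum(ys)
    theta_hat = n / total                                 # MLE: 1 / ybar
    loglik = sum(log_f(y, theta_hat) for y in ys)         # L(theta_hat | Y)

    b_aic = 2.0 * 1                                       # AIC: 2q with q = 1

    # TIC: 2 tr{I_hat J_hat^{-1}}, with g = y - 1/theta and H = 1/theta**2.
    i_hat = sum((y - 1.0 / theta_hat) ** 2 for y in ys) / n
    j_hat = 1.0 / theta_hat ** 2
    b_tic = 2.0 * i_hat / j_hat

    # CV: the leave-one-out MLE theta_[-i] = (n - 1) / (total - y_i) is closed-form here.
    loo = sum(log_f(y, (n - 1) / (total - y)) for y in ys)
    b_cv = -2.0 * loo + 2.0 * loglik

    return {"AIC": b_aic, "TIC": b_tic, "CV": b_cv}, loglik


biases, loglik = estimated_biases([1.0, 2.0, 3.0])
for name, b_hat in biases.items():
    # IC = -2 L(theta_hat | Y) + B_hat
    print(name, "B_hat =", round(b_hat, 4), "IC =", round(-2.0 * loglik + b_hat, 4))
```

AIC uses $2q$ regardless of the data. TIC and CV instead estimate the bias from the sample, which is why they remain usable under misspecification but also vary more from sample to sample.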
Notations
- A. Derivatives:
- 1. First-order (Gradient):
$g(y \mid \theta) = -\frac{\partial}{\partial\theta} \log f(y \mid \theta)$,
- 2. Second-order (Hessian):
$H(y \mid \theta) = -\frac{\partial^2}{\partial\theta\,\partial\theta'} \log f(y \mid \theta)$,
- 3. Third-order:
$C(y \mid \theta) = -\left(\frac{\partial}{\partial\theta'} \otimes \frac{\partial^2}{\partial\theta\,\partial\theta'}\right) \log f(y \mid \theta)$,
- 4. Fourth-order:
$Q(y \mid \theta) = -\left(\frac{\partial^2}{\partial\theta\,\partial\theta'} \otimes \frac{\partial^2}{\partial\theta\,\partial\theta'}\right) \log f(y \mid \theta)$, where $\otimes$ is the Kronecker product.
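For a one-parameter model ($q = 1$) the Kronecker products reduce to ordinary products. In that case $g$, $H$, $C$ and $Q$ are simply successive negative derivatives of $\log f$. The sketch below assumes a hypothetical exponential candidate $\log f(y \mid \theta) = \log\theta - \theta y$. It writes the four derivatives in closed form and checks each one with a central finite difference of the previous one. The evaluation point and step size `h` are arbitrary choices.

```python
import math


def log_f(y, t):
    """log f(y|theta) for the hypothetical exponential candidate."""
    return math.log(t) - t * y


# Closed-form derivatives for q = 1, following the sign convention above.
def g(y, t):
    return y - 1.0 / t          # -(d/dt) log f


def H(y, t):
    return 1.0 / t ** 2         # -(d^2/dt^2) log f


def C(y, t):
    return -2.0 / t ** 3        # -(d^3/dt^3) log f


def Q(y, t):
    return 6.0 / t ** 4         # -(d^4/dt^4) log f


def central_diff(fun, t, h=1e-4):
    """Central finite-difference approximation of fun'(t)."""
    return (fun(t + h) - fun(t - h)) / (2 * h)


y, t = 1.7, 0.8
checks = {
    "g": (-central_diff(lambda s: log_f(y, s), t), g(y, t)),
    "H": (central_diff(lambda s: g(y, s), t), H(y, t)),   # d/dt of g is H
    "C": (central_diff(lambda s: H(y, s), t), C(y, t)),   # d/dt of H is C
    "Q": (central_diff(lambda s: C(y, s), t), Q(y, t)),   # d/dt of C is Q
}
for name, (numeric, exact) in checks.items():
    print(name, numeric, exact)
```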
- B. Expectation of Moment Matrices
- 1. Information:
$I(\theta) = E_y[g(y \mid \theta)\, g(y \mid \theta)']$,
- 2. Jacobian:
$J(\theta) = E_y[H(y \mid \theta)]$,
- 3. Expected third-order moment matrix:
$K(\theta) = E_y[C(y \mid \theta)]$,
- 4. Expected fourth-order moment matrix:
$L(\theta) = E_y[Q(y \mid \theta)]$.
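In the scalar case these expectations can be computed in closed form. The following minimal sketch again assumes the hypothetical exponential candidate, so that $I(\theta) = E_y[(y - 1/\theta)^2] = \mathrm{Var}(y) + (E[y] - 1/\theta)^2$ and $J(\theta) = 1/\theta^2$. It illustrates the condition $I(\theta) = J(\theta)$ used for overspecified models. At $\theta_0$ the condition holds when the true distribution is exponential. It fails under a Gamma(2, 1) truth, where $\mathrm{tr}\{I(\theta_0)J(\theta_0)^{-1}\} = 1/2$ instead of $q = 1$. The function name `expected_I_J` is illustrative.

```python
def expected_I_J(mean_y, var_y, theta):
    """I(theta) = E_y[g^2] and J(theta) = E_y[H] for the exponential candidate,
    where g(y|theta) = y - 1/theta and H(y|theta) = 1/theta**2 (q = 1)."""
    i_val = var_y + (mean_y - 1.0 / theta) ** 2     # E[(y - 1/theta)^2]
    j_val = 1.0 / theta ** 2                       # H does not depend on y here
    return i_val, j_val


# Correctly specified: y ~ Exponential(rate 2), mean 1/2, variance 1/4, theta_0 = 2.
i_ok, j_ok = expected_I_J(0.5, 0.25, 2.0)
# Misspecified: y ~ Gamma(shape 2, scale 1), mean 2, variance 2, theta_0 = 1 / E[y] = 0.5.
i_bad, j_bad = expected_I_J(2.0, 2.0, 0.5)

print(i_ok, j_ok, i_ok / j_ok)       # I = J, so tr{I J^{-1}} = 1 = q
print(i_bad, j_bad, i_bad / j_bad)   # I != J, so tr{I J^{-1}} differs from q
```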
- C. Estimates of Expected Moment Matrices:
- 1. Estimated Information:
$\hat I(\theta) = \frac{1}{n}\sum_{i=1}^{n} g(y_i \mid \theta)\, g(y_i \mid \theta)'$,
- 2. Estimated Jacobian:
$\hat J(\theta) = \frac{1}{n}\sum_{i=1}^{n} H(y_i \mid \theta)$,
- 3. Estimated 3rd-order moment:
$\hat K(\theta) = \frac{1}{n}\sum_{i=1}^{n} C(y_i \mid \theta)$,
- 4. Estimated 4th-order moment:
$\hat L(\theta) = \frac{1}{n}\sum_{i=1}^{n} Q(y_i \mid \theta)$.
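The estimated moment matrices are plain sample averages evaluated at a given $\theta$, typically $\hat\theta$. A minimal sketch for the hypothetical exponential candidate ($q = 1$) follows. For this particular model $H$, $C$ and $Q$ do not depend on $y$, but the averages are kept explicit to mirror the definitions. With correctly specified (exponential) data, $\hat I(\hat\theta)/\hat J(\hat\theta)$ should be close to one, which is the sample counterpart of $I(\theta) = J(\theta)$. The data set, seed and sample size are illustrative.

```python
import random


def estimated_moments(ys, theta):
    """I_hat, J_hat, K_hat, L_hat = (1/n) sums of g g', H, C, Q (exponential candidate, q = 1)."""
    n = len(ys)
    i_hat = sum((y - 1.0 / theta) ** 2 for y in ys) / n   # g(y|theta) = y - 1/theta
    # H, C and Q are constant in y for this model; averages kept to mirror the definitions.
    j_hat = sum(1.0 / theta ** 2 for _ in ys) / n
    k_hat = sum(-2.0 / theta ** 3 for _ in ys) / n
    l_hat = sum(6.0 / theta ** 4 for _ in ys) / n
    return i_hat, j_hat, k_hat, l_hat


# Tiny data set evaluated at its MLE theta_hat = 1 / ybar = 0.5.
print(estimated_moments([1.0, 2.0, 3.0], 0.5))

# Correctly specified exponential data: I_hat / J_hat should be near 1.
random.seed(4)
ys = [random.expovariate(2.0) for _ in range(50000)]
theta_hat = len(ys) / sum(ys)
i_hat, j_hat, _, _ = estimated_moments(ys, theta_hat)
print(i_hat / j_hat)
```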
D(a). Coefficients in Bias-Correction Terms (General Case)
{ }
1 1 1 1 2 1 1 1 3 1 1 1 4 1 5
tr{ ( ) ( ) }, [ ( | ) ( ) ( | ) ( ) ( | )], [ ( | ) ( ) ( ){ ( ) ( | ) ( ) ( | )}], tr ( ) ( ) ( ) ( | ) ( ) ( | ) , tr{ ( | )( ( ) E E E E α α α α α
− − − − − − − − − −
= ′ = ′ = ⊗ ⎡ ⎤ = ⎢ ⎥ ⎣ ⎦ =
y y y y
I J g y J H y J g y g y J K J g y J g y J I J H y J H y C y J θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
{ }
1 1 1 1 1 6 1 1 1 1 7 1 8
( | ) ( ) ( ) ( ) )} , ( | ) ( ) ( | ) ( ) ( | ) ( ) ( | ) , tr ( )( ( ) ( | ) ( ) ( | ) ( ) ( ) ( ) ) , tr ( )( ( ) ( )vec( ( E E E α α α
− − − − − − − − − −
⎡ ⎤ ⊗ ⎣ ⎦ ⎡ ⎤ ⎡ ⎤ ′ = ⎣ ⎦ ⎣ ⎦ ⎡ ⎤ = ⊗ ⎢ ⎥ ⎣ ⎦ =
y y y
g y J I J g y J H y J H y J g y K J H y J g y J I J K J K J θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
{ }
1 1 1 1 1 1 1 9 1 2 1 2 1 1 1 1 10 1 11
) ( ) ( ) ) ( ) ( ) ( ) ) , ( | ) ( ) ( | ) ( ) ( | ) ( ) ( | ) , tr{ ( )( ( ) ( | ) ( ) ( | ) ( ) ( ) ( ) } , tr{ ( ) ( ) ( )( ( E E
ϕ ϕ
α α α
− − − − − − − − − − − −
⊗ ⎡ ⎤ ′ = ⎣ ⎦ ⎡ ⎤ = ⊗ ⎣ ⎦ ′ = I J J I J g y J H y J H y J g y K J g y J H y J I J K J K J θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
1 1 1 1 1 1 1 1 12
) ( ) ( ) ( ) ( ) ( ) )}, tr{ ( )( ( ) ( ) ( ) ( ) ( ) ( ) )}. α
− − − − − − − −
⊗ = ⊗ I J J I J L J I J J I J θ θ θ θ θ θ θ θ θ θ θ θ
D(b). Case with Overspecified Models: $I(\theta) = J(\theta)$
{ }
1 1 1 2 1 1 1 3 1 1 4 1 1 5 6
, [ ( | ) ( ) ( | ) ( ) ( | )], [ ( | ) ( ) ( ){ ( ) ( | ) ( ) ( | )}], tr ( ) ( | ) ( ) ( | ) , tr{ ( | )( ( ) ( | ) ( ) )} , q E E E E E α α α α α α
− − − − − − − − −
= ′ = ′ = ⊗ ⎡ ⎤ = ⎢ ⎥ ⎣ ⎦ ⎡ ⎤ = ⊗ ⎣ ⎦ =
y y y y y
g y I H y I g y g y I K I g y I g y I H y I H y C y I g y I θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
{ } { }
1 1 1 1 1 1 7 1 1 1 1 8 1 9 1 2
( | ) ( ) ( | ) ( ) ( | ) ( ) ( | ) , tr ( )( ( ) ( | ) ( ) ( | ) ( ) ) , tr ( )( ( ) ( )vec( ( ) ( ) ( ) ) ( ) ) , ( | ) ( ) ( E E E α α α
− − − − − − − − − − −
⎡ ⎤ ⎡ ⎤ ′ ⎣ ⎦ ⎣ ⎦ ⎡ ⎤ = ⊗ ⎢ ⎥ ⎣ ⎦ = ⊗ ′ =
y y y
g y I H y I H y J g y K I H y I g y I K J K J I J I g y I H y θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
1 1 1 2 1 1 1 10 1 1 1 11 1 1 12
| ) ( ) ( | ) ( ) ( | ) , tr{ ( )( ( ) ( | ) ( ) ( | ) ( ) } , tr{ ( ) ( ) ( )( ( ) ( ) )}, tr{ ( )( ( ) ( ) )}. E α α α
− − − − − − − − − −
⎡ ⎤ ⎣ ⎦ ⎡ ⎤ = ⊗ ⎣ ⎦ ′ = ⊗ = ⊗
y
I H y I g y K I g y I H y I K I K I I L I I θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ θ
Bias of AIC (Main Result of Current Study)
( )
AIC 1 1 2 3 4 5 6 7 2 8 9 10 11 12
1 2( ) 3 4 4 4 4 4 8 2 ( ). B q n O n α α α α α α α α α α α α α
−
= − + − − + + + + − + + − + − +
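Useful Formulas in Obtaining the Coefficient Terms
If $W \sim W_p(1, \Sigma(\theta))$, then for any symmetric matrices $A$, $B$, and $C$ (the formulas are written for $\Sigma(\theta) = I_p$; for general $\Sigma(\theta)$, apply them to $\Sigma(\theta)^{-1/2} W \Sigma(\theta)^{-1/2} \sim W_p(1, I_p)$):
1. $E[\mathrm{tr}(WA)] = \mathrm{tr}(A)$,
2. $E[\mathrm{tr}(WA)\,\mathrm{tr}(WB)] = \mathrm{tr}(A)\,\mathrm{tr}(B) + 2\,\mathrm{tr}(AB)$,
3. $E[\mathrm{tr}(WA)\,\mathrm{tr}(WB)\,\mathrm{tr}(WC)] = \mathrm{tr}(A)\,\mathrm{tr}(B)\,\mathrm{tr}(C) + 2\{\mathrm{tr}(A)\,\mathrm{tr}(BC) + \mathrm{tr}(B)\,\mathrm{tr}(AC) + \mathrm{tr}(C)\,\mathrm{tr}(AB)\} + 8\,\mathrm{tr}(ABC)$.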
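These expectations imply:
4. $E[\{\mathrm{tr}(A) - \mathrm{tr}(WA)\}\{\mathrm{tr}(B) - \mathrm{tr}(WB)\}] = 2\,\mathrm{tr}(AB)$,
5. $E[\{\mathrm{tr}(A) - \mathrm{tr}(WA)\}\{\mathrm{tr}(B) - \mathrm{tr}(WB)\}\{\mathrm{tr}(C) - \mathrm{tr}(WC)\}] = -8\,\mathrm{tr}(ABC)$.
Note: If $B$ is not symmetric, then:
5(b). $E[\{\mathrm{tr}(A) - \mathrm{tr}(WA)\}\{\mathrm{tr}(B) - \mathrm{tr}(WB)\}\{\mathrm{tr}(C) - \mathrm{tr}(WC)\}] = -4\{\mathrm{tr}(ABC) + \mathrm{tr}(AB'C)\}$.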
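Formulas 2 and 4 can be checked by simulation. The following minimal sketch uses $p = 2$, $\Sigma(\theta) = I_2$, and hypothetical symmetric matrices $A$ and $B$. A $W_2(1, I_2)$ matrix is $W = zz'$ with $z \sim N_2(0, I_2)$, so $\mathrm{tr}(WA) = z'Az$. For these matrices $\mathrm{tr}(A) = \mathrm{tr}(B) = 3$ and $\mathrm{tr}(AB) = 4$, so the Monte Carlo averages should be close to $17$ (formula 2) and $8$ (formula 4). The seed and number of replications are arbitrary choices.

```python
import random


def quad_form(z, a):
    """tr(W A) for W = z z', which equals the quadratic form z' A z."""
    p = len(z)
    return sum(z[i] * a[i][j] * z[j] for i in range(p) for j in range(p))


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1.0, 0.5], [0.5, 2.0]]   # hypothetical symmetric matrices
B = [[2.0, 0.0], [0.0, 1.0]]
tr_a, tr_b, tr_ab = trace(A), trace(B), trace(matmul(A, B))

rng = random.Random(5)
reps = 200_000
sum2 = sum4 = 0.0
for _ in range(reps):
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # W = z z' ~ W_2(1, I_2)
    qa, qb = quad_form(z, A), quad_form(z, B)
    sum2 += qa * qb                                    # formula 2
    sum4 += (tr_a - qa) * (tr_b - qb)                  # formula 4

print("formula 2:", sum2 / reps, "exact:", tr_a * tr_b + 2 * tr_ab)
print("formula 4:", sum4 / reps, "exact:", 2 * tr_ab)
```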