Model selection and estimation for latent variable models

Presented by Emi Tanaka
School of Mathematics and Statistics
dr.emi.tanaka@gmail.com @statsgen
Download pdf of these slides here.
June 26th 2019 @ EcoSta2019
1 / 18

Motivation

Many scientific disciplines use latent variable (LV) models, or their special case, factor analytic (FA) models (e.g. medicine, economics and agriculture).

Data: n = 64 corn hybrids, p = 6 trials, 1-3 replicates at each trial. Trait: yield.

2 / 18
Which corn hybrid is the best?

[Figure: the transposed data matrix Y⊤]

3 / 18
Factor analytic model
A dimension reduction: p variables are written as a smaller number k of underlying, unobserved factors, where k ≤ p (usually much less).
Key symbol: Λ_{p×k}, the factor loading matrix.
Key point: like a linear regression, but the common factors are not observed.
4 / 18
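Not on the slides, but the covariance identity behind the FA model can be checked by simulation; a minimal Python sketch (variable names are mine, not from platent), verifying that the implied covariance of each observation is ΛΛ⊤ + Ψ:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 6, 2, 200_000                    # p variables, k < p latent factors

Lambda = rng.normal(size=(p, k))           # factor loadings (p x k)
Psi = np.diag(rng.uniform(0.5, 1.5, p))    # diagonal specific variances

# y_i = Lambda f_i + eps_i, with f_i ~ N(0, I_k), eps_i ~ N(0, Psi)
f = rng.normal(size=(n, k))
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
y = f @ Lambda.T + eps

# The model-implied covariance of y_i is Lambda Lambda^T + Psi
implied = Lambda @ Lambda.T + Psi
sample = np.cov(y, rowvar=False)
print(np.abs(sample - implied).max())      # small for large n
```

With n this large, the sample covariance matches the implied one to roughly two decimal places.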
Factor analytic model - multivariate form

Putting it all together in a matrix form: yᵢ = μ + Λfᵢ + ϵᵢ for i = 1, …, n, or Y⊤ = 1_n μ⊤ + FΛ⊤ + E.
5 / 18
Univariate form

Thinking as a linear mixed model:

y = Xβ + Zu + e,   (u⊤, e⊤)⊤ ∼ N(0, diag(G, R))

Here we have
➤ y = vec(Y⊤), X = 1_n ⊗ I_p and β = μ
➤ Z = diag(I_n ⊗ Λ, I_np), u = (f⊤, ϵ⊤)⊤ and G = diag(I_nk, I_n ⊗ Ψ)
➤ What about e and R?

6 / 18
All cells are not made equal

The transposed data matrix Y⊤: most of the cells have 3 replicates, but some have 2 replicates and one has a single observation.

7 / 18
Two-stage analysis

➤ Quite often data is processed to fit the rectangular structure.
➤ In this case, "observations" in the data matrix are estimates.
➤ Estimates may have different precisions.
➤ These precisions may be used as weights for the second step.
➤ We take the diagonal entries, c_ii, of the np × np precision matrix C_yy⁻¹ as weights for the next step, i.e. take R as a known diagonal matrix with diagonal entries c_ii.
➤ Alternatively, we can do a one-stage analysis (better, and it can also handle missing values), but not in this talk.

8 / 18
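As a hedged illustration of stage one (the talk does not give this code; the scalar precision r/σ² below is a simplification of the diagonal of the full C_yy⁻¹, and the data and variance are invented), per-cell means and weights under unequal replication might be computed as:

```python
from collections import defaultdict

# Toy raw data with unequal replication per (site, hybrid) cell
raw = [
    ("S1", "G01", 140.0), ("S1", "G01", 146.0), ("S1", "G01", 147.0),
    ("S2", "G01", 66.0),  ("S2", "G01", 69.0),
    ("S3", "G01", 105.0),
]

# Stage 1: per-cell means; with r replicates and residual variance
# sigma2, a cell mean has precision r / sigma2
sigma2 = 100.0   # assumed known here, purely for illustration
cells = defaultdict(list)
for site, hybrid, y in raw:
    cells[(site, hybrid)].append(y)

stage1 = {cell: (sum(ys) / len(ys), len(ys) / sigma2)
          for cell, ys in cells.items()}   # cell -> (mean, weight)
for cell, (mean, weight) in sorted(stage1.items()):
    print(cell, round(mean, 1), weight)
```

The cell with a single observation ends up with the smallest weight, which is the pattern visible in the `weights` column of `corndata` on the next slide.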
Processed data
corndata
# A tibble: 384 x 4
   site  hybrid yield weights
   <fct> <fct>  <dbl>   <dbl>
 1 S1    G01    144.  0.00999
 2 S2    G01     67.5 0.00993
 3 S3    G01    105.  0.00857
 4 S4    G01    154.  0.0113
 5 S5    G01    110.  0.0143
 6 S6    G01     88.3 0.00361
 7 S1    G02    156.  0.00999
 8 S2    G02     79.8 0.00662
 9 S3    G02     61.7 0.00857
10 S4    G02    138.  0.0113
# … with 374 more rows
➤ Our R-package platent fits this model with the FA order selected by our OFAL algorithm (Hui et al. 2018, Biometrics).
➤ The current capability is limited to the above model.

library(platent) # still in development
fit_ofal(yield ~ site + id(hybrid):rr(site) + id(hybrid):diag(site),
         weights = weights, data = corndata)
fit_ofal(yield ~ site + id(hybrid):fa(site),
         weights = weights, data = corndata)
The model being fitted, with R̂ fixed at the known diagonal weight matrix:

y = Xβ + Zu + e,  where X = 1_n ⊗ I_p, β = μ, Z = I_np, u = (I_n ⊗ Λ)f + ϵ, and

(u⊤, e⊤)⊤ ∼ N(0_2np, diag(I_n ⊗ (ΛΛ⊤ + Ψ), R̂))

9 / 18
OFAL algorithm
We need to estimate:
➤ fixed effects β and
➤ variance parameters θ = (vec(Λ)⊤, diag(Ψ)⊤)⊤.

Estimating fixed effects
For given θ, use the BLUE for β:

β̂ = (X⊤V⁻¹X)⁻¹X⊤V⁻¹y

where var(y) = V is a function of θ. Here: V = I_n ⊗ (ΛΛ⊤ + Ψ) + R̂.

10 / 18
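A minimal numeric sketch of the BLUE/GLS step (Python rather than the talk's R; `blue` is my name, not a platent function). With V proportional to I it reduces to ordinary least squares, which gives a quick sanity check:

```python
import numpy as np

def blue(X, V, y):
    """GLS / BLUE: (X^T V^-1 X)^-1 X^T V^-1 y, via solves (no explicit inverses)."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)

# Sanity check: with V proportional to I, the BLUE equals OLS
beta_gls = blue(X, 2.0 * np.eye(50), y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_gls, beta_ols))     # True
```

In the talk's model, V would be I_n ⊗ (ΛΛ⊤ + Ψ) + R̂ evaluated at the current θ.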
Estimating variance parameters
Recall Λ is a p × k factor loading matrix:

    ⎡ λ11 λ12 ⋯ λ1k ⎤
    ⎢ λ21 λ22 ⋯ λ2k ⎥
Λ = ⎢  ⋮   ⋮  ⋱  ⋮  ⎥
    ⎢ λk1 λk2 ⋯ λkk ⎥
    ⎢  ⋮   ⋮     ⋮  ⎥
    ⎣ λp1 λp2 ⋯ λpk ⎦

What k, i.e. how many factors, should we have? Assume that Λ0 is a p × d pseudo factor loading matrix where k ≤ d ≤ p.
Estimating variance parameters

REML or ML estimate: the typical (frequentist) approach

θ̂_ML/REML = arg max_θ ℓ(θ | y, X, Z, β)

OFAL estimate: our approach via penalised likelihood

θ̂_OFAL = arg max_θ { ℓ(θ) − s Σ_{l=1}^{d} ω_{g,l} √( Σ_{i=1}^{p} Σ_{j=l}^{d} λ²_{0,ij} ) − s Σ_{i=1}^{p} Σ_{j=1}^{d} ω_{e,ij} |λ_{0,ij}| }

where
➤ s is a tuning parameter,
➤ ω_{g,l} is a group-wise adaptive weight for the l-th column of Λ0, and
➤ ω_{e,ij} is an element-wise adaptive weight for the (i, j)-th entry of Λ0, with

ω_g = (ω_{g,1}, ω_{g,2}, …, ω_{g,d})⊤ and Ω_e = [ω_{e,ij}], a p × d matrix.

11 / 18
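To make the nested group structure of the penalty concrete, here is a small Python sketch that evaluates it for a toy Λ0 (the function and inputs are invented for illustration; platent does not expose this):

```python
import numpy as np

def ofal_penalty(Lambda0, s, w_g, W_e):
    """OFAL penalty: s * sum_l w_g[l] * sqrt(sum of squared entries of
    columns l..d) + s * sum_{i,j} W_e[i,j] * |Lambda0[i,j]|."""
    p, d = Lambda0.shape
    group = sum(w_g[l] * np.sqrt((Lambda0[:, l:] ** 2).sum())
                for l in range(d))
    element = (W_e * np.abs(Lambda0)).sum()
    return s * (group + element)

Lambda0 = np.array([[1.0, 0.0],
                    [2.0, 0.0],
                    [0.0, 0.0]])           # trailing column already zero
w_g = np.ones(2)
W_e = np.ones((3, 2))

# The first group covers all columns; the last group only the last column,
# so a zero trailing column contributes nothing to the penalty.
print(ofal_penalty(Lambda0, 1.0, w_g, W_e))
```

Because the groups are nested (column l belongs to groups 1, …, l), trailing columns are penalised most heavily, which is what drives the order selection.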
OFAL Demonstration

[Figure: heatmap of the pseudo loading matrix Λ0]

Say s ω_{e,15} → ∞, then you would expect λ_{0,15} → 0.

12 / 18
OFAL Demonstration

[Figure: heatmap of the pseudo loading matrix Λ0]

Say s ω_{g,5} → ∞, then you would expect √( Σ_{l=1}^{p} (λ²_{0,l5} + λ²_{0,l6}) ) → 0. A sum of squares is zero only if each element is zero, so λ_{0,l5} → 0 and λ_{0,l6} → 0 for l = 1, …, p.

13 / 18
EM algorithm

E-Step

Q(θ) = E(ℓ(θ) | y, X, β̂^(r), θ̂^(r))

➤ with respect to the conditional density f(u | y);
➤ where ℓ is the complete log-likelihood (or residual log-likelihood);
➤ β̂^(r) and θ̂^(r) are the estimates of β and θ, respectively, for the r-th iteration.

14 / 18
M-Step

θ̂^(r+1) = arg max_θ { Q(θ) − s Σ_{l=1}^{k} ω_{g,l} ( Σ_{i=1}^{p} Σ_{j=l}^{k} λ²_{ij} )^{1/2} − s Σ_{i=1}^{p} Σ_{j=1}^{k} ω_{e,ij} |λ_{ij}| }

If θ̂ is a local maximiser of the above, then there exists a local maximiser (θ̃, τ̃) of the problem below such that θ̃ = θ̂. See proof in Hui et al. (2018).

(θ̃^(r+1), τ̃^(r+1)) = arg max_{θ, τ≥0} { Q(θ) − (ns²/4) Σ_{l=1}^{d} (ω_{g,l}/τ_l) Σ_{j=1}^{p} Σ_{k=l}^{d} λ²_{jk} − s Σ_{j=1}^{p} Σ_{k=1}^{d} ω_{e,jk} |λ_{jk}| − Σ_{l=1}^{d} ω_{g,l} τ_l }

➤ Reformulates the problem into an elastic-net type regularisation problem.
➤ Employ coordinate-wise optimisation to obtain the loading estimates.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics
15 / 18
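The slide mentions coordinate-wise optimisation; as a generic illustration only (this is a plain lasso coordinate descent with soft-thresholding, not the actual OFAL M-step, and all inputs are toy), the mechanics look like:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the closed-form coordinate update for an
    l1-penalised quadratic."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso(X, y, s, n_iter=200):
    """Generic coordinate descent for 0.5*||y - Xb||^2 + s*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)           # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]  # partial residual excluding j
            b[j] = soft_threshold(X[:, j] @ r, s) / col_ss[j]
    return b

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

b = cd_lasso(X, y, s=50.0)
print(np.round(b, 2))   # the truly-null coordinates are shrunk to exactly zero
```

Each coordinate update is available in closed form, which is what makes the reformulated M-step cheap to iterate.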
Adaptive Weights & Tuning Parameter Selection

Adaptive Weights
➤ Fit the model with the unpenalised likelihood first to obtain an estimate Λ̃.
➤ Perform the eigendecomposition Λ̃Λ̃⊤ = QDQ⊤. Take Λ* as the first d columns of QD^{1/2}.
➤ Construct the adaptive weights as

ω_{g,l} = ( Σ_{j=l}^{d} Σ_{i=1}^{p} λ*²_{ij} )^{−1/2} = ( Σ_{k=l}^{d} D_{kk} )^{−1/2} and ω_{e,ij} = |λ*_{ij}|⁻¹.

Tuning Parameter
➤ The tuning parameter s may be selected by some information criterion (e.g. AIC, BIC, EBIC). We used ERIC*.

*Hui, Warton & Foster (2015) Tuning Parameter Selection for the Adaptive Lasso Using ERIC. JASA
16 / 18
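My reading of the weight construction as a Python sketch (the exact normalisation may differ from the paper; the unpenalised estimate here is a random stand-in and all names are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
p, d = 6, 3

# Stand-in for an unpenalised estimate Lambda_tilde (p x d)
Lt = rng.normal(size=(p, d))

# Eigendecomposition of Lt Lt^T; Lambda* = first d columns of Q D^{1/2}
vals, Q = np.linalg.eigh(Lt @ Lt.T)
order = np.argsort(vals)[::-1]             # eigh returns ascending order
vals, Q = vals[order], Q[:, order]
Lstar = Q[:, :d] * np.sqrt(np.maximum(vals[:d], 0.0))

# Adaptive weights: small for strong columns/entries, large for weak ones
w_g = np.array([1.0 / np.sqrt((Lstar[:, l:] ** 2).sum()) for l in range(d)])
W_e = 1.0 / np.abs(Lstar)
print(w_g)   # increasing: later (weaker) columns are penalised more
```

Ordering the columns by eigenvalue is what gives the group weights their increasing pattern, so trailing factors are pushed towards zero first.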
Performance & Future

➤ Simulation results suggest competitive performance. See Hui et al. (2018) for details.
➤ I only show this for an FA (Gaussian) model here, but our paper also shows results from a negative binomial generalised linear latent variable model.
➤ We need more research into:
  ➤ adaptive weight construction;
  ➤ computationally efficient approaches; and
  ➤ high-dimensional problems & other non-normal responses.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics
17 / 18