

SLIDE 1

Model selection and estimation for latent variable models

Presented by Emi Tanaka, School of Mathematics and Statistics. dr.emi.tanaka@gmail.com @statsgen. June 26th 2019 @ EcoSta2019. Download a pdf of these slides here.

SLIDE 2

Motivation

Many scientific disciplines use latent variable (LV) models, or their special case, factor analytic (FA) models (e.g. medicine, economics and agriculture).

Data: $n = 64$ corn hybrids, $p = 6$ trials, with 1-3 replicates at each trial. Trait: yield.

SLIDE 3

Which corn hybrid is the best?

The transposed data matrix $\mathbf{Y}^\top$.

SLIDE 4

Factor analytic model

A dimension reduction technique: the $p$ observed variables are written in terms of a smaller number $k$ of underlying, unobserved factors, where $k \le p$ (usually $k$ is much smaller). For observation $i$,

$$\boldsymbol{y}_i = \boldsymbol{\mu} + \Lambda \boldsymbol{f}_i + \boldsymbol{\epsilon}_i$$

Key symbol: the factor loading matrix $\Lambda_{p \times k}$.

Key point: it is like a linear regression, but the common factors $\boldsymbol{f}_i$ are not observed.
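As an illustration of the structure above (not code from the talk), a minimal NumPy sketch of the marginal covariance implied by the FA model, $\operatorname{var}(\boldsymbol{y}_i) = \Lambda\Lambda^\top + \Psi$, with hypothetical values for $p$, $k$, $\Lambda$ and $\Psi$:

```python
import numpy as np

# Hypothetical dimensions: p = 6 observed variables, k = 2 latent factors
p, k = 6, 2
rng = np.random.default_rng(0)

Lambda = rng.normal(size=(p, k))          # factor loading matrix (p x k)
Psi = np.diag(rng.uniform(0.5, 1.0, p))   # diagonal specific variances

# Marginal covariance implied by y_i = mu + Lambda f_i + eps_i,
# with f_i ~ N(0, I_k) and eps_i ~ N(0, Psi)
V = Lambda @ Lambda.T + Psi

# V is a valid covariance matrix: symmetric and positive definite
assert np.allclose(V, V.T)
assert np.all(np.linalg.eigvalsh(V) > 0)
```

Even with $k \ll p$, the $k$-factor structure captures the dominant correlation among the $p$ variables with far fewer parameters than an unstructured covariance.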


SLIDE 5

Factor analytic model - multivariate form

Putting it all together in matrix form.

SLIDE 6

Univariate form

Thinking of it as a linear mixed model:

$$\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{u} + \boldsymbol{e}, \qquad
\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix}, \begin{bmatrix} \mathbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \mathbf{R} \end{bmatrix} \right)$$

Here we have:

➤ $\boldsymbol{y} = \operatorname{vec}(\mathbf{Y}^\top)$;
➤ $\mathbf{X} = \mathbf{1}_n \otimes \mathbf{I}_p$ and $\boldsymbol{\beta} = \boldsymbol{\mu}$;
➤ $\mathbf{Z} = [\,\mathbf{I}_n \otimes \Lambda \;\; \mathbf{I}_{np}\,]$ and $\boldsymbol{u} = (\boldsymbol{f}^\top, \boldsymbol{\epsilon}^\top)^\top$;
➤ $\mathbf{G} = \operatorname{diag}(\mathbf{I}_{nk},\, \mathbf{I}_n \otimes \Psi)$.

What about $\boldsymbol{e}$ and $\mathbf{R}$?
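A small NumPy sketch (illustrative only; the dimensions and loading values are hypothetical) of how these Kronecker-structured design matrices fit together:

```python
import numpy as np

# Hypothetical small dimensions: n = 4 hybrids, p = 3 trials, k = 2 factors
n, p, k = 4, 3, 2
rng = np.random.default_rng(1)

Lambda = rng.normal(size=(p, k))

# Design matrices in the mixed-model form y = X beta + Z u + e
X = np.kron(np.ones((n, 1)), np.eye(p))                     # (np x p)
Z = np.hstack([np.kron(np.eye(n), Lambda), np.eye(n * p)])  # (np x (nk + np))

f = rng.normal(size=n * k)       # latent factor scores
eps = rng.normal(size=n * p)     # specific effects
u = np.concatenate([f, eps])

# Z u reproduces (I_n (x) Lambda) f + eps
assert np.allclose(Z @ u, np.kron(np.eye(n), Lambda) @ f + eps)
assert X.shape == (n * p, p)
```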


SLIDE 7

All cells are not made equal

The transposed data matrix $\mathbf{Y}^\top$. Most of the cells have 3 replicates, but some have 2 replicates and one has a single observation.


SLIDE 8

Two-stage analysis

➤ Quite often the data are processed to fit the rectangular structure.
➤ In this case, the "observations" in the data matrix are estimates.
➤ These estimates may have different precisions.
➤ These precisions may be used as weights for the second step.
➤ We take the diagonal entries, $c_{ii}$, of the $np \times np$ precision matrix $\mathbf{C}^{-1}_{yy}$ as weights for the next step, i.e. take $\mathbf{R}$ as a known diagonal matrix with diagonal entries $c_{ii}$.
➤ Alternatively, we can do a one-stage analysis (better, and it can also handle missing values), but not in this talk.


SLIDE 9

Processed data

corndata
# A tibble: 384 x 4
   site  hybrid yield weights
   <fct> <fct>  <dbl>   <dbl>
 1 S1    G01    144.  0.00999
 2 S2    G01     67.5 0.00993
 3 S3    G01    105.  0.00857
 4 S4    G01    154.  0.0113
 5 S5    G01    110.  0.0143
 6 S6    G01     88.3 0.00361
 7 S1    G02    156.  0.00999
 8 S2    G02     79.8 0.00662
 9 S3    G02     61.7 0.00857
10 S4    G02    138.  0.0113
# … with 374 more rows

➤ Our R package platent fits this model, with the FA order selected by our OFAL algorithm (Hui et al. 2018, Biometrics).
➤ The current capability is limited to the model above.

library(platent)  # still in development
fit_ofal(yield ~ site + id(hybrid):rr(site) + id(hybrid):diag(site),
         weights = weights, data = corndata)
fit_ofal(yield ~ site + id(hybrid):fa(site),
         weights = weights, data = corndata)

$$\boldsymbol{y} = \overbrace{(\mathbf{1}_n \otimes \mathbf{I}_p)}^{\mathbf{X}}\,\boldsymbol{\beta} + \overbrace{\mathbf{I}_{np}}^{\mathbf{Z}}\,\overbrace{\left[(\mathbf{I}_n \otimes \Lambda)\boldsymbol{f} + \boldsymbol{\epsilon}\right]}^{\boldsymbol{u}} + \boldsymbol{e}$$

$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{0}_{np} \\ \boldsymbol{0}_{np} \end{bmatrix}, \begin{bmatrix} \mathbf{I}_n \otimes (\Lambda\Lambda^\top + \Psi) & \boldsymbol{0}_{np \times np} \\ \boldsymbol{0}_{np \times np} & \hat{\mathbf{R}} \end{bmatrix} \right)$$


SLIDE 10

OFAL algorithm

We need to estimate:

➤ the fixed effects $\boldsymbol{\beta}$, and
➤ the variance parameters $\theta = (\operatorname{vec}(\Lambda)^\top, \operatorname{diag}(\Psi)^\top)^\top$.

Estimating fixed effects

For given $\theta$, use the BLUE for $\boldsymbol{\beta}$:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \boldsymbol{y}$$

where $\operatorname{var}(\boldsymbol{y}) = \mathbf{V}$ is a function of $\theta$. Here $\mathbf{V} = \mathbf{I}_n \otimes (\Lambda\Lambda^\top + \Psi) + \hat{\mathbf{R}}$.

Estimating variance parameters

Recall $\Lambda$ is a $p \times k$ factor loading matrix:

$$\Lambda = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1k} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{k1} & \lambda_{k2} & \cdots & \lambda_{kk} \\ \vdots & \vdots & & \vdots \\ \lambda_{p1} & \lambda_{p2} & \cdots & \lambda_{pk} \end{bmatrix}$$

What $k$, i.e. how many factors, should we have? Assume that $\Lambda_0$ is a $p \times d$ pseudo factor loading matrix, where $k \le d \le p$.
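A generic NumPy sketch of this BLUE/GLS computation (illustrative only; the design, response and $\mathbf{V}$ below are hypothetical), using linear solves rather than an explicit inverse:

```python
import numpy as np

def blue(X, V, y):
    """Generalised least squares / BLUE: (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)   # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)   # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Sanity check on hypothetical data: when V = sigma^2 I, the BLUE
# reduces to ordinary least squares
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)
V = 2.5 * np.eye(30)

beta_gls = blue(X, V, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(beta_gls, beta_ols)
```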

SLIDE 11

Estimating variance parameters

REML or ML estimate (the typical frequentist approach):

$$\hat{\theta}^{\text{ML/REML}} = \arg\max_{\theta}\ \ell(\theta \mid \boldsymbol{y}, \mathbf{X}, \mathbf{Z}, \boldsymbol{\beta})$$

OFAL estimate (our approach, via a penalised likelihood):

$$\hat{\theta}^{\text{OFAL}} = \arg\max_{\theta} \left\{ \ell(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \sqrt{\sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{0,ij}^2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{0,ij}| \right\}$$

where

➤ $s$ is a tuning parameter,
➤ $\omega_{g,l}$ is a group-wise adaptive weight for the $l$th column of $\Lambda_0$, and
➤ $\omega_{e,ij}$ is an element-wise adaptive weight for the $(i,j)$th entry of $\Lambda_0$,

with

$$\omega_g = \begin{bmatrix} \omega_{g,1} \\ \omega_{g,2} \\ \vdots \\ \omega_{g,d} \end{bmatrix}, \qquad
\Omega_e = \begin{bmatrix} \omega_{e,11} & \omega_{e,12} & \cdots & \omega_{e,1d} \\ \omega_{e,21} & \omega_{e,22} & \cdots & \omega_{e,2d} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{e,d1} & \omega_{e,d2} & \cdots & \omega_{e,dd} \\ \vdots & \vdots & & \vdots \\ \omega_{e,p1} & \omega_{e,p2} & \cdots & \omega_{e,pd} \end{bmatrix}$$


SLIDE 12

OFAL Demonstration

[Demonstration of the $p \times d$ pseudo factor loading matrix $\Lambda_0$ as the tuning parameter varies.]

Say $s \to \infty$: then, via the element-wise penalty with weight $\omega_{e,15}$, you would expect $\hat{\lambda}_{15} \to 0$.


SLIDE 13

OFAL Demonstration

Say $s \to \infty$: then, via the group-wise penalty with weight $\omega_{g,5}$, you would expect

$$\sqrt{\sum_{l=1}^{p} \left( \lambda_{l5}^2 + \lambda_{l6}^2 \right)} \to 0.$$

A sum of squares is zero only if each element is zero, so $\lambda_{l5} \to 0$ and $\lambda_{l6} \to 0$ for $l = 1, \ldots, p$.


SLIDE 14

EM algorithm

E-Step

$$Q(\theta) = \mathbb{E}\left[\ell(\theta) \mid \boldsymbol{y}, \mathbf{X}, \hat{\boldsymbol{\beta}}^{(r)}, \hat{\theta}^{(r)}\right]$$

➤ the expectation is taken with respect to the conditional density $f(\boldsymbol{u} \mid \boldsymbol{y})$;
➤ $\ell$ is the complete log-likelihood (or residual log-likelihood);
➤ $\hat{\boldsymbol{\beta}}^{(r)}$ and $\hat{\theta}^{(r)}$ are the estimates of $\boldsymbol{\beta}$ and $\theta$, respectively, at the $r$th iteration.


SLIDE 15

M-Step

$$\hat{\theta}^{(r+1)} = \arg\max_{\theta} \left\{ Q(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \left( \sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{ij}^2 \right)^{1/2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{ij}| \right\}$$

If $\hat{\theta}^{(r+1)}$ is a local maximiser of the problem above, then there exists a local maximiser $(\tilde{\theta}^{(r+1)}, \tilde{\tau}^{(r+1)})$ of the problem below such that $\tilde{\theta}^{(r+1)} = \hat{\theta}^{(r+1)}$. See the proof in Hui et al. (2018).

$$(\tilde{\theta}^{(r+1)}, \tilde{\tau}^{(r+1)}) = \arg\max_{\theta,\, \tau \ge 0} \left\{ Q(\theta) - \frac{ns^2}{4} \sum_{l=1}^{d} \omega_{g,l} \tau_l - \sum_{k=1}^{d} \frac{\sum_{j=1}^{p} \lambda_{jk}^2}{\sum_{l=1}^{k} \omega_{g,l} \tau_l} - s \sum_{j=1}^{p} \sum_{k=1}^{d} \omega_{e,jk} |\lambda_{jk}| \right\}$$

➤ This reformulates the problem into an elastic-net type regularisation problem.
➤ Coordinate-wise optimisation is then employed to obtain the loading estimates.
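Coordinate-wise updates for elastic-net type problems typically reduce to soft-thresholding. The sketch below is a generic illustration of that building block, not the platent implementation; the quadratic coefficient `a` and linear coefficient `z` are hypothetical stand-ins for the terms arising from $Q(\theta)$ and the $\tau$ terms:

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def coordinate_update(z, a, s_w_e):
    """One coordinate-wise update for a penalised quadratic in one loading:
    maximise -(a/2) * lam**2 + z * lam - s_w_e * abs(lam), with a > 0."""
    return soft_threshold(z, s_w_e) / a

# A large enough penalty drives the loading exactly to zero -- the
# sparsity mechanism behind the order selection
assert coordinate_update(0.3, 1.0, 0.5) == 0.0
assert abs(coordinate_update(2.0, 1.0, 0.5) - 1.5) < 1e-12
```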

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics

SLIDE 16

Adaptive Weights & Tuning Parameter Selection

Adaptive Weights

➤ First fit the model with the unpenalised likelihood to obtain an estimate, $\tilde{\Lambda}$.
➤ Perform the eigendecomposition $\tilde{\Lambda}\tilde{\Lambda}^\top = \mathbf{Q}\mathbf{D}\mathbf{Q}^\top$. Take $\Lambda^*$ as the first $d$ columns of $\mathbf{Q}\mathbf{D}^{1/2}$.
➤ Construct the adaptive weights as

$$\omega_{g,l} = \sum_{j=l}^{d} \left( \sum_{i=1}^{p} (\lambda^*_{ij})^2 \right)^{-1/2} = \sum_{k=l}^{d} D_{kk}^{-1/2} \qquad \text{and} \qquad \omega_{e,ij} = |\lambda^*_{ij}|^{-1}.$$

Tuning Parameter

➤ The tuning parameter $s$ may be selected using an information criterion (e.g. AIC, BIC, EBIC). We used ERIC*.

*Hui, Warton & Foster (2015) Tuning Parameter Selection for the Adaptive Lasso Using ERIC. JASA
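A NumPy sketch of this weight construction (illustrative; $\tilde{\Lambda}$ is a hypothetical unpenalised estimate). It also checks the identity that the group weights equal sums of inverse square-root eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
p, d = 6, 4

# Hypothetical unpenalised loading estimate (p x d)
Lambda_tilde = rng.normal(size=(p, d))

# Eigendecomposition of Lambda_tilde Lambda_tilde^T (largest first)
evals, Q = np.linalg.eigh(Lambda_tilde @ Lambda_tilde.T)
order = np.argsort(evals)[::-1]
evals, Q = evals[order], Q[:, order]

# Lambda* = first d columns of Q D^{1/2}
Lambda_star = Q[:, :d] * np.sqrt(evals[:d])

# Group-wise and element-wise adaptive weights
w_g = np.array([np.sum(np.sum(Lambda_star[:, l:] ** 2, axis=0) ** -0.5)
                for l in range(d)])
W_e = 1.0 / np.abs(Lambda_star)

# Column norms of Lambda* recover the eigenvalues, so
# w_g[l] equals sum over k >= l of D_kk^{-1/2}
assert np.allclose(w_g, [np.sum(evals[l:d] ** -0.5) for l in range(d)])
```

Small estimated loadings (or small trailing eigenvalues) yield large weights, so those entries and columns are penalised more heavily: the usual adaptive-lasso logic.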


SLIDE 17

Performance & Future

➤ Simulation results suggest competitive performance; see Hui et al. (2018) for details.
➤ I only show this for an FA (Gaussian) model here, but our paper also shows results from a negative binomial generalised linear latent variable model.
➤ We need more research into:
  ➤ adaptive weight construction;
  ➤ computationally efficient approaches; and
  ➤ high-dimensional problems & other non-normal responses.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics

SLIDE 18

These slides, made using the xaringan R package, can be found at bit.ly/ecosta2019.

Our methods paper: Hui, Tanaka & Warton (2018), Biometrics.

Follow the platent R package development at http://github.com/emitanaka/platent

Comments/feedback welcome! dr.emi.tanaka@gmail.com @statsgen

Acknowledgement: Big thanks go to my collaborator Dr. Francis Hui! His EcoSta2019 talk is this afternoon at 4.10pm in Room S1A01.