Model selection and estimation for latent variable models


  1. Model selection and estimation for latent variable models. Presented by Emi Tanaka, School of Mathematics and Statistics. dr.emi.tanaka@gmail.com @statsgen. Download a pdf of these slides here. June 26th 2019 @ EcoSta2019.

  2. Motivation. Many scientific disciplines use latent variable (LV) models, or their special case, factor analytic (FA) models (e.g. medicine, economics and agriculture). Data: n = 64 corn hybrids, p = 6 trials, 1-3 replicates at each trial. Trait: yield.

  3. Which corn hybrid is the best? [Figure: the transposed data matrix Y^⊤]

  4. Factor analytic model. A dimension reduction: the p variables are written in terms of a smaller number k of underlying, unobserved factors, where k ≤ p (usually k is much smaller than p). Key symbol: the p × k factor loading matrix Λ. Key point: it is like a linear regression, but the common factors are not observed.
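A minimal sketch of the model being described, using the symbols that appear on slides 5-6 (μ, f, ϵ, Ψ):

    y_i = μ + Λ f_i + ϵ_i,   f_i ~ N(0_k, I_k),   ϵ_i ~ N(0_p, Ψ),   i = 1, …, n,

where y_i holds the p trial values for hybrid i, f_i is the k-vector of unobserved common factor scores, and Ψ is a diagonal matrix of trial-specific variances.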

  5. Factor analytic model - multivariate form. Putting it all together in matrix form; see the sketch below.
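A plausible vectorised form, consistent with the univariate form on slide 6:

    vec(Y^⊤) = (1_n ⊗ I_p)μ + (I_n ⊗ Λ)f + ϵ + e,

with f ~ N(0, I_nk), ϵ ~ N(0, I_n ⊗ Ψ), and e the residual error, so that var(vec(Y^⊤)) = I_n ⊗ (ΛΛ^⊤ + Ψ) + R.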

  6. Univariate form. Thinking of it as a linear mixed model:

    y = Xβ + Zu + e,   (u^⊤, e^⊤)^⊤ ~ N( 0, diag(G, R) )

Here we have
➤ y = vec(Y^⊤)
➤ X = 1_n ⊗ I_p and β = μ
➤ Z = [I_n ⊗ Λ  I_np] and u = (f^⊤, ϵ^⊤)^⊤
➤ G = diag(I_nk, I_n ⊗ Ψ)
(These matrices are sketched in R below.) What about e and R?
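To make the bookkeeping concrete, here is a small R sketch (not from the talk; n, p, k, the random Λ and Ψ, and the use of the Matrix package are all illustrative) that builds these design and covariance matrices:

    library(Matrix)  # sparse matrices keep the Kronecker products manageable

    n <- 64; p <- 6; k <- 2                      # hybrids, trials, number of factors
    Lambda <- Matrix(rnorm(p * k), p, k)         # illustrative p x k loading matrix
    Psi    <- Diagonal(p, x = runif(p, 0.5, 2))  # diagonal specific variances

    X <- kronecker(Matrix(1, n, 1), Diagonal(p))    # X = 1_n (x) I_p, np x p
    Z <- cbind(kronecker(Diagonal(n), Lambda),      # I_n (x) Lambda, np x nk
               Diagonal(n * p))                     # I_np
    G <- bdiag(Diagonal(n * k),                     # var(f) = I_nk
               kronecker(Diagonal(n), Psi))         # var(eps) = I_n (x) Psi

    ZGZt <- Z %*% G %*% t(Z)  # add R, the error covariance, to get V = var(y)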

  7. All cells are not made equal. [Figure: the transposed data matrix Y^⊤] Most cells have 3 replicates, but some have 2 and one cell has a single observation.

  8. Two-stage analysis
➤ Quite often the data are processed to fit the rectangular structure.
➤ In this case, the "observations" in the data matrix are estimates.
➤ Estimates may have different precisions.
➤ These precisions may be used as weights for the second step (sketched below).
➤ We take the diagonal entries c_ii of the np × np precision matrix C_yy^{-1} as weights for the next step, i.e. take R̂ as a known diagonal matrix with diagonal entries given by the c_ii.
➤ Alternatively, we can do a one-stage analysis (better, and it can also handle missing values), but not in this talk.
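A rough stage-one sketch (not code from the talk; raw_data, plot_yield and the use of dplyr are assumptions):

    # Stage 1: within each trial, estimate each hybrid's mean yield and
    # carry its precision (reciprocal variance of the mean) forward as a weight.
    library(dplyr)

    stage1 <- raw_data |>                            # assumed: one row per plot
      group_by(site, hybrid) |>
      summarise(yield   = mean(plot_yield),
                weights = n() / var(plot_yield),     # precision of the cell mean
                .groups = "drop")
    # cells with a single replicate have no variance estimate
    # and would need special handling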

  9. Processed data

    y = Xβ + Zu + e,  where X = 1_n ⊗ I_p, Z = I_np and u = (I_n ⊗ Λ)f + ϵ,

    (u^⊤, e^⊤)^⊤ ~ N( 0_{2np}, diag( I_n ⊗ (ΛΛ^⊤ + Ψ), R̂ ) )

➤ Our R-package platent fits this model, with the FA order selected by our OFAL algorithm (Hui et al. 2018, Biometrics).
➤ Its current capability is limited to the above model.

The corndata object:

    # A tibble: 384 x 4
       site  hybrid yield weights
       <fct> <fct>  <dbl>   <dbl>
     1 S1    G01    144.  0.00999
     2 S2    G01     67.5 0.00993
     3 S3    G01    105.  0.00857
     4 S4    G01    154.  0.0113
     5 S5    G01    110.  0.0143
     6 S6    G01     88.3 0.00361
     7 S1    G02    156.  0.00999
     8 S2    G02     79.8 0.00662
     9 S3    G02     61.7 0.00857
    10 S4    G02    138.  0.0113
    # … with 374 more rows

    library(platent) # still in development
    fit_ofal(yield ~ site + id(hybrid):rr(site) + id(hybrid):diag(site),
             weights = weights, data = corndata)
    fit_ofal(yield ~ site + id(hybrid):fa(site),
             weights = weights, data = corndata)

  10. OFAL algorithm

Recall Λ is a p × k factor loading matrix:

    Λ = [ λ_11 λ_12 ⋯ λ_1k
          λ_21 λ_22 ⋯ λ_2k
          ⋮    ⋮    ⋱  ⋮
          λ_p1 λ_p2 ⋯ λ_pk ]

What k, i.e. how many factors, should we have? Assume that Λ_0 is a p × d pseudo factor loading matrix where k ≤ d ≤ p.

We need to estimate:
➤ the fixed effects β, and
➤ the variance parameters θ = (vec(Λ)^⊤, diag(Ψ)^⊤)^⊤.

Estimating fixed effects. For given θ, use the BLUE for β:

    β̂ = (X^⊤ V^{-1} X)^{-1} X^⊤ V^{-1} y,

where var(y) = V is a function of θ. Here: V = I_n ⊗ (ΛΛ^⊤ + Ψ) + R̂.
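A small R sketch of this BLUE computation (illustrative only; it continues the earlier sketch, with y, the np-vector of cell means, and Rhat, the diagonal matrix built from the weights, assumed to be available):

    # BLUE of beta for a given theta = (Lambda, Psi)
    Vmat  <- kronecker(Diagonal(n), tcrossprod(Lambda) + Psi) + Rhat
    VinvX <- solve(Vmat, X)                 # V^{-1} X without forming V^{-1}
    Vinvy <- solve(Vmat, y)
    beta_hat <- solve(crossprod(X, VinvX), crossprod(X, Vinvy))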

  11. Estimating variance parameters

REML or ML estimate, the typical (frequentist) approach:

    θ̂_ML/REML = argmax_θ ℓ(θ | y, X, Z, β)

OFAL estimate, our approach via a penalised likelihood:

    θ̂_OFAL = argmax_θ { ℓ(θ) − s Σ_{l=1}^{d} ω_{g,l} √( Σ_{i=1}^{p} Σ_{j=l}^{d} λ_{0,ij}² ) − s Σ_{i=1}^{p} Σ_{j=1}^{d} ω_{e,ij} |λ_{0,ij}| }

where
➤ s is a tuning parameter,
➤ ω_{g,l} is a group-wise adaptive weight for the l-th column of Λ_0 (collected in the vector ω_g = (ω_{g,1}, …, ω_{g,d})^⊤), and
➤ ω_{e,ij} is an element-wise adaptive weight for the (i,j)-th entry of Λ_0 (collected in the p × d matrix Ω_e).
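To make the ordered group structure explicit, here is a small R function (a sketch, not platent internals; Lambda0 is a plain p × d matrix) that evaluates the OFAL penalty:

    # OFAL penalty: group l covers ALL columns l, l+1, ..., d of Lambda0,
    # so shrinking group l to zero removes every trailing factor at once.
    ofal_penalty <- function(Lambda0, s, w_group, w_elem) {
      d <- ncol(Lambda0)
      group_term <- sum(sapply(seq_len(d), function(l)
        w_group[l] * sqrt(sum(Lambda0[, l:d]^2))))
      elem_term  <- sum(w_elem * abs(Lambda0))   # element-wise lasso part
      s * (group_term + elem_term)
    }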

  12. OFAL demonstration. Say s ω_{e,15} → ∞; then you would expect λ_{15} → 0. [Figure: the pseudo factor loading matrix Λ_0]

  13. OFAL demonstration. Say s ω_{g,5} → ∞; then you would expect √( Σ_{l=1}^{p} (λ_{l5}² + λ_{l6}²) ) → 0 (here d = 6, so group 5 covers columns 5 and 6). A sum of squares is zero only if each element is zero, so λ_{l5} → 0 and λ_{l6} → 0 for l = 1, …, p. [Figure: the pseudo factor loading matrix Λ_0]

  14. EM algorithm

    Q(θ) = E( ℓ(θ) | y, X, β̂^{(r)}, θ̂^{(r)} )

➤ E-step: the expectation is with respect to the conditional density f(u | y);
➤ ℓ is the complete log-likelihood (or residual log-likelihood);
➤ β̂^{(r)} and θ̂^{(r)} are the estimates of β and θ, respectively, at the r-th iteration.
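Schematically, the iteration looks like the R skeleton below (all helper functions here are hypothetical placeholders, not platent functions):

    # Hypothetical EM skeleton for the penalised fit; e_step() and
    # penalised_m_step() stand in for the computations on slides 14-15.
    theta <- theta_init
    beta  <- beta_init
    for (r in seq_len(max_iter)) {
      Q_fun     <- e_step(y, X, Z, beta, theta)              # build Q(theta)
      theta_new <- penalised_m_step(Q_fun, s, w_group, w_elem)  # slide 15
      beta      <- blue(y, X, theta_new)                     # BLUE, slide 10
      if (max(abs(theta_new - theta)) < 1e-6) break          # convergence check
      theta <- theta_new
    }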

  15. M-Step 1/2

    θ̂^{(r+1)} = argmax_θ { Q(θ) − s Σ_{l=1}^{d} ω_{g,l} √( Σ_{i=1}^{p} Σ_{j=l}^{d} λ_{ij}² ) − s Σ_{i=1}^{p} Σ_{j=1}^{d} ω_{e,ij} |λ_{ij}| }

If θ̂ is a local maximiser of the above, then there exists a local maximiser (θ̃, τ̃) of the below such that θ̂ = θ̃. See the proof in Hui et al. (2018).

    (θ̃^{(r+1)}, τ̃^{(r+1)}) = argmax_{θ, τ ≥ 0} { Q(θ) − Σ_{l=1}^{d} [ (s² ω_{g,l}² / (4τ_l)) Σ_{j=1}^{p} Σ_{k=l}^{d} λ_{jk}² + τ_l ] − s Σ_{j=1}^{p} Σ_{k=1}^{d} ω_{e,jk} |λ_{jk}| }

➤ This reformulates the problem into an elastic-net type regularisation problem.
➤ Employ coordinate-wise optimisation to obtain the loading estimates.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics
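The equivalence rests on a standard variational identity (a gloss, not a display from the talk): for c > 0 and a > 0,

    c √a = min_{τ ≥ 0} { c² a / (4τ) + τ },   attained at τ = c √a / 2.

Applying it with c = s ω_{g,l} and a = Σ_{j=1}^{p} Σ_{k=l}^{d} λ_{jk}² replaces each group-lasso term by a ridge-type quadratic in the loadings plus an auxiliary τ_l, which is what gives the elastic-net form above.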

  16. Adaptive weights & tuning parameter selection

Adaptive weights
➤ Fit the model with the unpenalised likelihood first to obtain an estimate Λ̃_0 of Λ_0.
➤ Perform the eigendecomposition Λ̃_0 Λ̃_0^⊤ = QDQ^⊤. Take Λ* as the first d columns of QD^{1/2}.
➤ Construct the adaptive weights as

    ω_{g,l} = ( Σ_{i=1}^{p} Σ_{j=l}^{d} (λ*_{ij})² )^{-1/2} = ( Σ_{k=l}^{d} D_{kk} )^{-1/2}   and   ω_{e,ij} = |λ*_{ij}|^{-1}.

Tuning parameter
➤ The tuning parameter s may be selected by an information criterion (e.g. AIC, BIC, EBIC). We used ERIC*.

* Hui, Warton & Foster (2015) Tuning Parameter Selection for the Adaptive Lasso Using ERIC. JASA
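In R, the weight construction could look like this sketch (variable names are illustrative, not platent internals; Lambda0_tilde is the unpenalised p × d estimate):

    # Adaptive weights from the unpenalised estimate Lambda0_tilde
    d  <- ncol(Lambda0_tilde)
    ed <- eigen(tcrossprod(Lambda0_tilde))           # Lambda0 Lambda0^T = Q D Q^T
    Lambda_star <- ed$vectors[, 1:d] %*%
      diag(sqrt(pmax(ed$values[1:d], 0)), nrow = d)  # first d columns of Q D^{1/2}

    w_elem  <- 1 / abs(Lambda_star)                  # omega_{e,ij} = |lambda*_ij|^{-1}
    w_group <- sapply(seq_len(d), function(l)
      1 / sqrt(sum(Lambda_star[, l:d]^2)))           # omega_{g,l}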

  17. Performance & future work
➤ Simulation results suggest competitive performance. See Hui et al. (2018) for details.
➤ I only show this for an FA (Gaussian) model here, but our paper also shows results for a negative binomial generalised linear latent variable model.
➤ We need more research into:
  ➤ adaptive weight construction;
  ➤ computationally efficient approaches; and
  ➤ high-dimensional problems & other non-normal responses.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics

  18. These slides, made using the xaringan R-package, can be found at bit.ly/ecosta2019. Our methods paper: Hui, Tanaka & Warton (2018) Biometrics. Follow the platent R-package development at http://github.com/emitanaka/platent. Comments/feedback welcome! dr.emi.tanaka@gmail.com @statsgen. Acknowledgement: big thanks go to my collaborator Dr. Francis Hui! His EcoSta2019 talk is this afternoon at 4.10pm in Room S1A01.
