

SLIDE 1

Model selection and estimation for latent variable models

Presented by Emi Tanaka, School of Mathematics and Statistics. dr.emi.tanaka@gmail.com @statsgen. June 26th 2019 @ EcoSta2019. Download a pdf of these slides here.

SLIDE 2

Motivation

Many scientific disciplines use latent variable (LV) models, or their special case, factor analytic (FA) models (e.g. medicine, economics and agriculture).

Data: $n = 64$ corn hybrids, $p = 6$ trials, with 1-3 replicates at each trial. Trait: yield.

SLIDE 3

Which corn hybrid is the best?

The transposed data matrix $\mathbf{Y}^\top$.

SLIDE 4

Factor analytic model

A dimension reduction technique: the $p$ observed variables are written in terms of a smaller number $k$ of underlying, unobserved factors, where $k \le p$ (usually $k$ is much smaller). For observation $i$,

$$\boldsymbol{y}_i = \boldsymbol{\mu} + \Lambda \boldsymbol{f}_i + \boldsymbol{\epsilon}_i$$

Key symbol: the factor loading matrix $\Lambda_{p \times k}$.

Key point: it is like a linear regression, but the common factors $\boldsymbol{f}_i$ are not observed.
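As an illustration of the structure above (not code from the talk), a minimal NumPy sketch of the marginal covariance implied by the FA model, $\operatorname{var}(\boldsymbol{y}_i) = \Lambda\Lambda^\top + \Psi$, with hypothetical values for $p$, $k$, $\Lambda$ and $\Psi$:

```python
import numpy as np

# Hypothetical dimensions: p = 6 observed variables, k = 2 latent factors
p, k = 6, 2
rng = np.random.default_rng(0)

Lambda = rng.normal(size=(p, k))          # factor loading matrix (p x k)
Psi = np.diag(rng.uniform(0.5, 1.0, p))   # diagonal specific variances

# Marginal covariance implied by y_i = mu + Lambda f_i + eps_i,
# with f_i ~ N(0, I_k) and eps_i ~ N(0, Psi)
V = Lambda @ Lambda.T + Psi

# V is a valid covariance matrix: symmetric and positive definite
assert np.allclose(V, V.T)
assert np.all(np.linalg.eigvalsh(V) > 0)
```

Even with $k \ll p$, the $k$-factor structure captures the dominant correlation among the $p$ variables with far fewer parameters than an unstructured covariance.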


SLIDE 5

Factor analytic model - multivariate form

Putting it all together in matrix form.

SLIDE 6

Univariate form

Thinking of it as a linear mixed model:

$$\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{u} + \boldsymbol{e}, \qquad
\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{bmatrix}, \begin{bmatrix} \mathbf{G} & \boldsymbol{0} \\ \boldsymbol{0} & \mathbf{R} \end{bmatrix} \right)$$

Here we have:

➤ $\boldsymbol{y} = \operatorname{vec}(\mathbf{Y}^\top)$;
➤ $\mathbf{X} = \mathbf{1}_n \otimes \mathbf{I}_p$ and $\boldsymbol{\beta} = \boldsymbol{\mu}$;
➤ $\mathbf{Z} = [\,\mathbf{I}_n \otimes \Lambda \;\; \mathbf{I}_{np}\,]$ and $\boldsymbol{u} = (\boldsymbol{f}^\top, \boldsymbol{\epsilon}^\top)^\top$;
➤ $\mathbf{G} = \operatorname{diag}(\mathbf{I}_{nk},\, \mathbf{I}_n \otimes \Psi)$.

What about $\boldsymbol{e}$ and $\mathbf{R}$?
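A small NumPy sketch (illustrative only; the dimensions and loading values are hypothetical) of how these Kronecker-structured design matrices fit together:

```python
import numpy as np

# Hypothetical small dimensions: n = 4 hybrids, p = 3 trials, k = 2 factors
n, p, k = 4, 3, 2
rng = np.random.default_rng(1)

Lambda = rng.normal(size=(p, k))

# Design matrices in the mixed-model form y = X beta + Z u + e
X = np.kron(np.ones((n, 1)), np.eye(p))                     # (np x p)
Z = np.hstack([np.kron(np.eye(n), Lambda), np.eye(n * p)])  # (np x (nk + np))

f = rng.normal(size=n * k)       # latent factor scores
eps = rng.normal(size=n * p)     # specific effects
u = np.concatenate([f, eps])

# Z u reproduces (I_n (x) Lambda) f + eps
assert np.allclose(Z @ u, np.kron(np.eye(n), Lambda) @ f + eps)
assert X.shape == (n * p, p)
```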


SLIDE 7

All cells are not made equal

The transposed data matrix $\mathbf{Y}^\top$. Most of the cells have 3 replicates, but some have 2 replicates and one has a single observation.


SLIDE 8

Two-stage analysis

➤ Quite often the data are processed to fit the rectangular structure.
➤ In this case, the "observations" in the data matrix are estimates.
➤ These estimates may have different precisions.
➤ These precisions may be used as weights for the second step.
➤ We take the diagonal entries, $c_{ii}$, of the $np \times np$ precision matrix $\mathbf{C}^{-1}_{yy}$ as weights for the next step, i.e. take $\mathbf{R}$ as a known diagonal matrix with diagonal entries $c_{ii}$.
➤ Alternatively, we can do a one-stage analysis (better, and it can also handle missing values), but not in this talk.


SLIDE 9

Processed data

corndata
# A tibble: 384 x 4
   site  hybrid yield weights
   <fct> <fct>  <dbl>   <dbl>
 1 S1    G01    144.  0.00999
 2 S2    G01     67.5 0.00993
 3 S3    G01    105.  0.00857
 4 S4    G01    154.  0.0113
 5 S5    G01    110.  0.0143
 6 S6    G01     88.3 0.00361
 7 S1    G02    156.  0.00999
 8 S2    G02     79.8 0.00662
 9 S3    G02     61.7 0.00857
10 S4    G02    138.  0.0113
# … with 374 more rows

➤ Our R package platent fits this model, with the FA order selected by our OFAL algorithm (Hui et al. 2018, Biometrics).
➤ The current capability is limited to the model above.

library(platent)  # still in development
fit_ofal(yield ~ site + id(hybrid):rr(site) + id(hybrid):diag(site),
         weights = weights, data = corndata)
fit_ofal(yield ~ site + id(hybrid):fa(site),
         weights = weights, data = corndata)

$$\boldsymbol{y} = \overbrace{(\mathbf{1}_n \otimes \mathbf{I}_p)}^{\mathbf{X}}\,\boldsymbol{\beta} + \overbrace{\mathbf{I}_{np}}^{\mathbf{Z}}\,\overbrace{\left[(\mathbf{I}_n \otimes \Lambda)\boldsymbol{f} + \boldsymbol{\epsilon}\right]}^{\boldsymbol{u}} + \boldsymbol{e}$$

$$\begin{bmatrix} \boldsymbol{u} \\ \boldsymbol{e} \end{bmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{0}_{np} \\ \boldsymbol{0}_{np} \end{bmatrix}, \begin{bmatrix} \mathbf{I}_n \otimes (\Lambda\Lambda^\top + \Psi) & \boldsymbol{0}_{np \times np} \\ \boldsymbol{0}_{np \times np} & \hat{\mathbf{R}} \end{bmatrix} \right)$$


SLIDE 10

OFAL algorithm

We need to estimate:

➤ the fixed effects $\boldsymbol{\beta}$, and
➤ the variance parameters $\theta = (\operatorname{vec}(\Lambda)^\top, \operatorname{diag}(\Psi)^\top)^\top$.

Estimating fixed effects

For given $\theta$, use the BLUE for $\boldsymbol{\beta}$:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \boldsymbol{y}$$

where $\operatorname{var}(\boldsymbol{y}) = \mathbf{V}$ is a function of $\theta$. Here $\mathbf{V} = \mathbf{I}_n \otimes (\Lambda\Lambda^\top + \Psi) + \hat{\mathbf{R}}$.

Estimating variance parameters

Recall $\Lambda$ is a $p \times k$ factor loading matrix:

$$\Lambda = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1k} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{k1} & \lambda_{k2} & \cdots & \lambda_{kk} \\ \vdots & \vdots & & \vdots \\ \lambda_{p1} & \lambda_{p2} & \cdots & \lambda_{pk} \end{bmatrix}$$

What $k$, i.e. how many factors, should we have? Assume that $\Lambda_0$ is a $p \times d$ pseudo factor loading matrix, where $k \le d \le p$.
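A generic NumPy sketch of this BLUE/GLS computation (illustrative only; the design, response and $\mathbf{V}$ below are hypothetical), using linear solves rather than an explicit inverse:

```python
import numpy as np

def blue(X, V, y):
    """Generalised least squares / BLUE: (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)   # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)   # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Sanity check on hypothetical data: when V = sigma^2 I, the BLUE
# reduces to ordinary least squares
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)
V = 2.5 * np.eye(30)

beta_gls = blue(X, V, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(beta_gls, beta_ols)
```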

SLIDE 11

Estimating variance parameters

REML or ML estimate (the typical frequentist approach):

$$\hat{\theta}^{\text{ML/REML}} = \arg\max_{\theta}\ \ell(\theta \mid \boldsymbol{y}, \mathbf{X}, \mathbf{Z}, \boldsymbol{\beta})$$

OFAL estimate (our approach, via a penalised likelihood):

$$\hat{\theta}^{\text{OFAL}} = \arg\max_{\theta} \left\{ \ell(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \sqrt{\sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{0,ij}^2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{0,ij}| \right\}$$

where

➤ $s$ is a tuning parameter,
➤ $\omega_{g,l}$ is a group-wise adaptive weight for the $l$th column of $\Lambda_0$, and
➤ $\omega_{e,ij}$ is an element-wise adaptive weight for the $(i,j)$th entry of $\Lambda_0$,

with

$$\omega_g = \begin{bmatrix} \omega_{g,1} \\ \omega_{g,2} \\ \vdots \\ \omega_{g,d} \end{bmatrix}, \qquad
\Omega_e = \begin{bmatrix} \omega_{e,11} & \omega_{e,12} & \cdots & \omega_{e,1d} \\ \omega_{e,21} & \omega_{e,22} & \cdots & \omega_{e,2d} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{e,d1} & \omega_{e,d2} & \cdots & \omega_{e,dd} \\ \vdots & \vdots & & \vdots \\ \omega_{e,p1} & \omega_{e,p2} & \cdots & \omega_{e,pd} \end{bmatrix}$$


SLIDE 12

OFAL Demonstration

[Demonstration of the $p \times d$ pseudo factor loading matrix $\Lambda_0$ as the tuning parameter varies.]

Say $s \to \infty$: then, via the element-wise penalty with weight $\omega_{e,15}$, you would expect $\hat{\lambda}_{15} \to 0$.


SLIDE 13

OFAL Demonstration

Say $s \to \infty$: then, via the group-wise penalty with weight $\omega_{g,5}$, you would expect

$$\sqrt{\sum_{l=1}^{p} \left( \lambda_{l5}^2 + \lambda_{l6}^2 \right)} \to 0.$$

A sum of squares is zero only if each element is zero, so $\lambda_{l5} \to 0$ and $\lambda_{l6} \to 0$ for $l = 1, \ldots, p$.


SLIDE 14

EM algorithm

E-Step

$$Q(\theta) = \mathbb{E}\left[\ell(\theta) \mid \boldsymbol{y}, \mathbf{X}, \hat{\boldsymbol{\beta}}^{(r)}, \hat{\theta}^{(r)}\right]$$

➤ the expectation is taken with respect to the conditional density $f(\boldsymbol{u} \mid \boldsymbol{y})$;
➤ $\ell$ is the complete log-likelihood (or residual log-likelihood);
➤ $\hat{\boldsymbol{\beta}}^{(r)}$ and $\hat{\theta}^{(r)}$ are the estimates of $\boldsymbol{\beta}$ and $\theta$, respectively, at the $r$th iteration.


SLIDE 15

M-Step

$$\hat{\theta}^{(r+1)} = \arg\max_{\theta} \left\{ Q(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \left( \sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{ij}^2 \right)^{1/2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{ij}| \right\}$$

If $\hat{\theta}^{(r+1)}$ is a local maximiser of the problem above, then there exists a local maximiser $(\tilde{\theta}^{(r+1)}, \tilde{\tau}^{(r+1)})$ of the problem below such that $\tilde{\theta}^{(r+1)} = \hat{\theta}^{(r+1)}$. See the proof in Hui et al. (2018).

$$(\tilde{\theta}^{(r+1)}, \tilde{\tau}^{(r+1)}) = \arg\max_{\theta,\, \tau \ge 0} \left\{ Q(\theta) - \frac{ns^2}{4} \sum_{l=1}^{d} \omega_{g,l} \tau_l - \sum_{k=1}^{d} \frac{\sum_{j=1}^{p} \lambda_{jk}^2}{\sum_{l=1}^{k} \omega_{g,l} \tau_l} - s \sum_{j=1}^{p} \sum_{k=1}^{d} \omega_{e,jk} |\lambda_{jk}| \right\}$$

➤ This reformulates the problem into an elastic-net type regularisation problem.
➤ Coordinate-wise optimisation is then employed to obtain the loading estimates.
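Coordinate-wise updates for elastic-net type problems typically reduce to soft-thresholding. The sketch below is a generic illustration of that building block, not the platent implementation; the quadratic coefficient `a` and linear coefficient `z` are hypothetical stand-ins for the terms arising from $Q(\theta)$ and the $\tau$ terms:

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def coordinate_update(z, a, s_w_e):
    """One coordinate-wise update for a penalised quadratic in one loading:
    maximise -(a/2) * lam**2 + z * lam - s_w_e * abs(lam), with a > 0."""
    return soft_threshold(z, s_w_e) / a

# A large enough penalty drives the loading exactly to zero -- the
# sparsity mechanism behind the order selection
assert coordinate_update(0.3, 1.0, 0.5) == 0.0
assert abs(coordinate_update(2.0, 1.0, 0.5) - 1.5) < 1e-12
```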

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics

SLIDE 16

Adaptive Weights & Tuning Parameter Selection

Adaptive Weights

➤ First fit the model with the unpenalised likelihood to obtain an estimate, $\tilde{\Lambda}$.
➤ Perform the eigendecomposition $\tilde{\Lambda}\tilde{\Lambda}^\top = \mathbf{Q}\mathbf{D}\mathbf{Q}^\top$. Take $\Lambda^*$ as the first $d$ columns of $\mathbf{Q}\mathbf{D}^{1/2}$.
➤ Construct the adaptive weights as

$$\omega_{g,l} = \sum_{j=l}^{d} \left( \sum_{i=1}^{p} (\lambda^*_{ij})^2 \right)^{-1/2} = \sum_{k=l}^{d} D_{kk}^{-1/2} \qquad \text{and} \qquad \omega_{e,ij} = |\lambda^*_{ij}|^{-1}.$$

Tuning Parameter

➤ The tuning parameter $s$ may be selected using an information criterion (e.g. AIC, BIC, EBIC). We used ERIC*.

*Hui, Warton & Foster (2015) Tuning Parameter Selection for the Adaptive Lasso Using ERIC. JASA
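A NumPy sketch of this weight construction (illustrative; $\tilde{\Lambda}$ is a hypothetical unpenalised estimate). It also checks the identity that the group weights equal sums of inverse square-root eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
p, d = 6, 4

# Hypothetical unpenalised loading estimate (p x d)
Lambda_tilde = rng.normal(size=(p, d))

# Eigendecomposition of Lambda_tilde Lambda_tilde^T (largest first)
evals, Q = np.linalg.eigh(Lambda_tilde @ Lambda_tilde.T)
order = np.argsort(evals)[::-1]
evals, Q = evals[order], Q[:, order]

# Lambda* = first d columns of Q D^{1/2}
Lambda_star = Q[:, :d] * np.sqrt(evals[:d])

# Group-wise and element-wise adaptive weights
w_g = np.array([np.sum(np.sum(Lambda_star[:, l:] ** 2, axis=0) ** -0.5)
                for l in range(d)])
W_e = 1.0 / np.abs(Lambda_star)

# Column norms of Lambda* recover the eigenvalues, so
# w_g[l] equals sum over k >= l of D_kk^{-1/2}
assert np.allclose(w_g, [np.sum(evals[l:d] ** -0.5) for l in range(d)])
```

Small estimated loadings (or small trailing eigenvalues) yield large weights, so those entries and columns are penalised more heavily: the usual adaptive-lasso logic.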


SLIDE 17

Performance & Future

➤ Simulation results suggest competitive performance; see Hui et al. (2018) for details.
➤ I only show this for an FA (Gaussian) model here, but our paper also shows results from a negative binomial generalised linear latent variable model.
➤ We need more research into:
  ➤ adaptive weight construction;
  ➤ computationally efficient approaches; and
  ➤ high-dimensional problems & other non-normal responses.

Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics

SLIDE 18

These slides, made using the xaringan R package, can be found at bit.ly/ecosta2019.

Our methods paper: Hui, Tanaka & Warton (2018), Biometrics.

Follow the platent R package development at http://github.com/emitanaka/platent

Comments/feedback welcome! dr.emi.tanaka@gmail.com @statsgen

Acknowledgement: Big thanks go to my collaborator Dr. Francis Hui! His EcoSta2019 talk is this afternoon at 4.10pm in Room S1A01.