SLIDE 1
On the maximum likelihood degree of linear mixed models with two variance components
Mariusz Grządziel
Department of Mathematics, Wrocław University of Environmental and Life Sciences
Będlewo, 2 December 2016
1 / 13
SLIDE 2
Presentation based on:
M. Grządziel, On the maximum likelihood degree of linear mixed models with two variance components, arXiv preprint.
SLIDE 3
The model and the likelihood function I
Let us consider the model $N(Y, X\beta, \Sigma(s))$, where $Y$ is an $n \times 1$ normally distributed random vector with $E(Y) = X\beta$ and
$$\mathrm{Cov}(Y) = \Sigma(s) = \sigma_1^2 V + \sigma_2^2 I_n, \qquad (1)$$
where:
◮ $X$ is an $n \times p$ matrix of full rank, $p < n$,
◮ $\beta$ is a $p \times 1$ vector,
◮ $V$ is an $n \times n$ nonnegative definite (nnd) symmetric matrix, $V \neq 0$, $\mathrm{rank}(V) < n$,
◮ $s = (\sigma_1^2, \sigma_2^2)'$ is an unknown vector of variance components belonging to $S = \{s : \sigma_1^2 \geq 0,\ \sigma_2^2 > 0\}$.
Twice the log-likelihood function is given, up to an additive constant, by
$$l_0(\beta, s, Y) := -\log|\Sigma(s)| - (Y - X\beta)'\Sigma^{-1}(s)(Y - X\beta). \qquad (2)$$
The ML estimator of $(\beta, s)$ is defined as the maximizer of $l_0(\beta, s, Y)$ over $(\beta, s) \in \mathbb{R}^p \times S$.
SLIDE 4
The model and the likelihood function II
Let $M := I_n - XX^+$. It can be shown that
$$l_0(\beta, s, Y) \leq l_0(\tilde{\beta}(s), s, Y) = -\log|\Sigma(s)| - Y'R(s)Y,$$
where $R(s) := (M\Sigma(s)M)^+$ and $\tilde{\beta}(s) := (X'\Sigma^{-1}(s)X)^{-1}X'\Sigma^{-1}(s)Y$. It can be checked that $l_0(\beta, s, Y) < l_0(\tilde{\beta}(s), s, Y)$ for $\beta \neq \tilde{\beta}(s)$. The problem of computing the ML estimator of $(\beta, s)$ thus reduces to finding the maximizer of
$$l(s, Y) := -\log|\Sigma(s)| - Y'R(s)Y$$
over $s \in S$; this maximizer will be referred to as the ML estimator of $s$. It can also be observed that, for a given value $y$ of the vector $Y$, the ML estimate of $s$ exists if and only if the ML estimate of $(\beta, s)$ exists.
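The reduction above is easy to check numerically. Below is a minimal numpy sketch with an illustrative design $X$, matrix $V$ and data $y$ (all simulated here, not from the talk): the profiled function $l(s, Y)$ coincides with the full log-likelihood evaluated at $\tilde{\beta}(s)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # illustrative n x p design
G = rng.normal(size=(n, 3))
V = G @ G.T                                            # nnd, rank(V) = 3 < n
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.pinv(X)                  # M = I_n - X X^+

def full_loglik(beta, s, y):
    """l0(beta, s, Y), up to the additive constant dropped in (2)."""
    sigma = s[0] * V + s[1] * np.eye(n)
    r = y - X @ beta
    return -np.linalg.slogdet(sigma)[1] - r @ np.linalg.solve(sigma, r)

def profile_loglik(s, y):
    """l(s, Y) = -log|Sigma(s)| - Y' R(s) Y with R(s) = (M Sigma(s) M)^+."""
    sigma = s[0] * V + s[1] * np.eye(n)
    R = np.linalg.pinv(M @ sigma @ M)
    return -np.linalg.slogdet(sigma)[1] - y @ R @ y

# beta_tilde(s) = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y attains the profile value
s = (0.5, 1.0)
sigma = s[0] * V + s[1] * np.eye(n)
beta_tilde = np.linalg.solve(X.T @ np.linalg.solve(sigma, X),
                             X.T @ np.linalg.solve(sigma, y))
```

For any other $\beta$ the full log-likelihood is strictly smaller, so maximizing $l(s, Y)$ over $S$ recovers the ML estimator.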
SLIDE 5
Multimodality of the likelihood function
The likelihood function can have multiple local maxima (Hodges and Henn 2014; Lavine et al. 2015), so numerical methods based on local search may converge to a local (rather than the global) maximum. Alternative approach: find all stationary points of the likelihood function, using the fact that the ML equations are rational. In the case of the model with two variance components, finding all stationary points of the likelihood function reduces to finding all roots of a certain univariate polynomial (Gross et al. 2012; MG 2014).
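As a sketch of that last step: once the likelihood equations have been reduced to a single univariate polynomial, all stationary points can be recovered from its complex roots, e.g. via numpy's companion-matrix solver. The coefficients below are hypothetical placeholders; the actual polynomial depends on the data and on the eigenstructure of $V$.

```python
import numpy as np

# Hypothetical coefficients of the reduced univariate polynomial,
# highest-degree term first (placeholder values, not from the talk).
coeffs = [1.0, -3.2, 1.1, 2.4, -0.7]

roots = np.roots(coeffs)                 # all complex roots (companion matrix)

# Only real roots landing in the parameter region can correspond to
# stationary points of the likelihood; keep them as candidates and
# compare likelihood values at the candidates afterwards.
real_roots = roots[np.abs(roots.imag) < 1e-10].real
candidates = real_roots[real_roots >= 0]
```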
SLIDE 6
The ML degree I
Gross et al. (2012): The ML degree is the number of complex solutions to the (rational) likelihood equations when the data are generic. Indeed, the number of complex solutions is constant with probability one, and a data set is generic if it does not belong to the null set on which the number of complex solutions differs.
Drton et al. (2009): A basic principle of algebraic geometry is that the number of solutions of a system of polynomial or rational equations that depends rationally on parameters is constant except on an algebraic subset of the parameter space. In our case, the rational equations under investigation are the likelihood equations and the "varying parameters" are the data.
The ML degree may be interpreted as a measure of the computational complexity of the problem of solving the ML equations algebraically.
SLIDE 7
The ML degree II
Let $B$ be an $(n-p) \times n$ matrix satisfying the conditions
$$BB' = I_{n-p}, \qquad B'B = M. \qquad (3)$$
Let
$$BVB' = \sum_{i=1}^{d-1} m_i E_i \qquad (4)$$
be the spectral decomposition of $BVB'$, where $m_1 > \cdots > m_{d-1} > m_d = 0$ denotes the decreasing sequence of distinct eigenvalues of $BVB'$ and the $E_i$ are orthogonal projectors satisfying the condition $E_i E_j = 0_{n-p}$ for $i \neq j$. Let $E_d$ be such that $\sum_{i=1}^{d} E_i = I_{n-p}$.
Let us note that the quantities $d$, $m_i$, $E_i$ do not depend on the choice of $B$ in (3).
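The objects $B$, $d$, $m_i$ can be computed directly. A small numpy sketch, assuming an illustrative one-way-type $V$ (not data from the talk): the rows of $B$ are taken as an orthonormal eigenbasis of the range of $M$, which yields $BB' = I_{n-p}$ and $B'B = M$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 7, 2, 3
X = rng.normal(size=(n, p))                  # illustrative full-rank design
Z = np.repeat(np.eye(q), [3, 2, 2], axis=0)  # one-way incidence matrix
V = Z @ Z.T                                  # nnd, rank(V) = q < n

M = np.eye(n) - X @ np.linalg.pinv(X)

# Eigenvalues of the projector M are 0 or 1; the eigenvectors with
# eigenvalue 1 form an orthonormal basis of its range.
w, U = np.linalg.eigh(M)
B = U[:, w > 0.5].T                          # (n - p) x n, BB' = I, B'B = M

# Distinct eigenvalues m_1 > ... > m_d of BVB' (grouped up to round-off)
eigvals = np.linalg.eigvalsh(B @ V @ B.T)
distinct = []
for ev in sorted(eigvals, reverse=True):
    if not distinct or distinct[-1] - ev > 1e-8:
        distinct.append(ev)
d = len(distinct)
```

Since $\mathrm{rank}(BVB') \leq q < n - p$ here, the smallest distinct eigenvalue is $m_d = 0$, matching the convention above.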
SLIDE 8
The ML degree III
Theorem 1
Let $d_0$ stand for the number of distinct eigenvalues of the matrix $V$. If the model (1) satisfies the condition
$$\mathcal{M}([X, V]) = \mathbb{R}^n, \qquad (5)$$
where $\mathcal{M}(\cdot)$ denotes the column space, then its ML degree is bounded from above by $2d + d_0 - 4$.
SLIDE 9
The REML degree of the model
The restricted maximum likelihood (REML) estimator of $s = (\sigma_1^2, \sigma_2^2)'$ is the ML estimator of $s$ in the model
$$N(z, 0_{n-p}, \sigma_1^2 BVB' + \sigma_2^2 I_{n-p}) \quad \text{with } z = BY.$$
The REML degree of the model (1): the ML degree of the model $N(z, 0_{n-p}, \sigma_1^2 BVB' + \sigma_2^2 I_{n-p})$.
Theorem 2
Under the assumptions of Theorem 1, the REML degree of the model (1) is bounded from above by $2d - 3$.
SLIDE 10
One-way classification I
The random effects model for the unbalanced one-way classification:
$$Y_{ij} = \mu + \alpha_i + e_{ij}; \quad i = 1, \ldots, q; \ j = 1, \ldots, n_i, \qquad (6)$$
where $Y_{ij}$ is the $j$th observation in the $i$th treatment group, $\mu$ is the overall mean, $\alpha_i$ is the effect due to the $i$th level of the treatment factor and $e_{ij}$ is the error term. The model can be expressed in the matrix form
$$Y = 1_n \mu + Z\alpha + \epsilon, \qquad (7)$$
where $n = \sum_{k=1}^{q} n_k$, $\alpha = (\alpha_1, \ldots, \alpha_q)'$, $\epsilon = (e_{11}, \ldots, e_{qn_q})'$ and
$$Z = \begin{pmatrix} 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\ 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{n_q} & 0_{n_q} & \cdots & 1_{n_q} \end{pmatrix}. \qquad (8)$$
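The incidence matrix $Z$ in (8) is simple to build for arbitrary group sizes. A short numpy sketch with illustrative unbalanced group sizes:

```python
import numpy as np

sizes = [4, 2, 3]            # illustrative group sizes n_1, ..., n_q
q, n = len(sizes), sum(sizes)

# Row (i, j) of Z has a single 1 in column i: observation j of group i.
Z = np.zeros((n, q))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni

# The corresponding V = Z Z' is block diagonal with blocks of ones.
V = Z @ Z.T
```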
SLIDE 11
One-way classification II
The one-way classification model with the general mean structure considered in Gross et al. (2012) can be expressed as
$$Y = X\beta + Z\alpha + \epsilon, \qquad (9)$$
where $\beta \in \mathbb{R}^p$ is a fixed mean parameter and $X$ is an $n \times p$ matrix of rank $p < n$ such that
$$1_n \in \mathrm{span}(X). \qquad (10)$$
SLIDE 12
One-way classification — the ML degree and the REML degree
Gross et al. (2012):
◮ The ML degree and the REML degree for the one-way classification random model are given;
◮ Conjecture: the ML degree for the one-way classification model with the general mean structure is bounded from above by $3q - 3$; the REML degree of this model is bounded from above by $2q - 3$.
MG (2016): the conjecture is true under the assumption $\mathrm{span}([X, Z]) = \mathbb{R}^n$.
SLIDE 13
Conclusion
The results obtained indicate that the approach proposed in Gnot et al. (2002), Gross et al. (2012) and MG (2014), in which all critical points of the log-likelihood function are found by solving a system of algebraic equations, may prove to be efficient for linear mixed models with two variance components.