1. Maximum Likelihood Density Estimation under Total Positivity
Elina Robeva (MIT)
Joint work with Bernd Sturmfels, Ngoc Tran, and Caroline Uhler (arXiv:1806.10120)
ICERM Workshop on Nonlinear Algebra in Applications, November 12, 2018

2. Density estimation
Given i.i.d. samples X = {x_1, …, x_n} ⊂ R^d from an unknown distribution on R^d with density p, can we estimate p?
• Parametric: assume that p lies in some parametric family and estimate its parameters. This is a finite-dimensional problem, but it can be too restrictive: the real-world distribution might not lie in the specified parametric family.
• Non-parametric: assume that p lies in a non-parametric family, e.g. impose shape constraints on p (convex, log-concave, monotone, etc.). This is an infinite-dimensional problem, and the constraints need to be strong enough to rule out spiky behavior, yet weak enough that the function class remains large.

3. Shape-constrained density estimation
• Monotonically decreasing densities: [Grenander 1956, Rao 1969]
• Convex densities: [Anevski 1994; Groeneboom, Jongbloed, and Wellner 2001]
• Log-concave densities: [Cule, Samworth, and Stewart 2008]
• Generalized additive models with shape constraints: [Chen and Samworth 2016]
• This talk: totally positive and log-concave densities

4. MTP2 distributions
• A distribution with density p on X ⊆ R^d is multivariate totally positive of order 2 (MTP2) if
    p(x) p(y) ≤ p(x ∧ y) p(x ∨ y) for all x, y ∈ X,
  where x ∧ y and x ∨ y are the componentwise minimum and maximum.
• MTP2 is the same as log-supermodular:
    log p(x) + log p(y) ≤ log p(x ∧ y) + log p(x ∨ y) for all x, y ∈ X.
• A random vector X taking values in R^d is positively associated if cov(φ(X), ψ(X)) ≥ 0 for all non-decreasing functions φ, ψ : R^d → R.
• MTP2 implies positive association (Fortuin–Kasteleyn–Ginibre inequality, 1971).
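As a quick sanity check of the definition, here is a minimal numerical sketch (not from the talk): it samples random pairs of points and verifies the MTP2 inequality for a bivariate Gaussian whose inverse covariance has nonpositive off-diagonal entries, a family known to be MTP2 (see the examples slide below).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Bivariate Gaussian whose precision matrix Sigma^{-1} has nonpositive
# off-diagonal entries (an M-matrix); such Gaussians are MTP2.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
p = multivariate_normal(mean=[0.0, 0.0], cov=Sigma).pdf

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y = rng.uniform(-3, 3, size=(2, 2))
    meet, join = np.minimum(x, y), np.maximum(x, y)   # x ∧ y and x ∨ y
    # MTP2 inequality: p(x) p(y) <= p(x ∧ y) p(x ∨ y)
    assert p(x) * p(y) <= p(meet) * p(join) + 1e-12
print("MTP2 inequality held on all sampled pairs")
```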

5. Properties of MTP2 distributions
Theorem (Fallat, Lauritzen, Sadeghi, Uhler, Wermuth, and Zwiernik, 2015). If X = (X_1, …, X_d) is MTP2, then
(i) any marginal distribution is MTP2,
(ii) any conditional distribution is MTP2,
(iii) X has the marginal independence structure X_i ⊥⊥ X_j ⇔ cov(X_i, X_j) = 0.
Theorem (Karlin and Rinott, 1980). If p(x) > 0 and p is MTP2 in every pair of coordinates when the others are held constant, then p is MTP2. A numerical sketch of this pairwise criterion follows.
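The Karlin–Rinott criterion is easy to check for a strictly positive discrete density. Below is a small sketch (my own illustration, with a hypothetical helper pairwise_mtp2) that verifies the pairwise 2×2 inequalities for a table on {0,1}^3 whose log is supermodular, namely p(x) ∝ exp(x_1 x_2 + x_2 x_3).

```python
import itertools
import numpy as np

def pairwise_mtp2(p):
    """Karlin–Rinott check: log-supermodularity in every pair of
    coordinates, with all remaining coordinates held fixed."""
    d = p.ndim
    for i, j in itertools.combinations(range(d), 2):
        q = np.moveaxis(p, (i, j), (0, 1))       # pair (i, j) up front
        for rest in np.ndindex(q.shape[2:]):     # fix the other coordinates
            m = q[(slice(None), slice(None)) + rest]
            for a, a2 in itertools.combinations(range(m.shape[0]), 2):
                for b, b2 in itertools.combinations(range(m.shape[1]), 2):
                    # need p(a, b2) p(a2, b) <= p(a, b) p(a2, b2)
                    if m[a, b2] * m[a2, b] > m[a, b] * m[a2, b2] + 1e-12:
                        return False
    return True

# p(x) ∝ exp(x1*x2 + x2*x3) on {0,1}^3: supermodular exponent, hence MTP2.
idx = np.array(list(np.ndindex(2, 2, 2))).reshape(2, 2, 2, 3)
p = np.exp(idx[..., 0] * idx[..., 1] + idx[..., 1] * idx[..., 2])
p /= p.sum()
print(pairwise_mtp2(p))  # True, so by Karlin–Rinott p is MTP2
```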

6. Examples of MTP2 distributions
• A Gaussian random vector X ∼ N(μ, Σ) is MTP2 whenever Σ^{-1} is an M-matrix, i.e. its off-diagonal entries are nonpositive.
• The joint distribution of observed variables influenced by one hidden variable Z. [Figure: star tree with latent root Z and observed leaves X_1, …, X_5]
• Very common in real data: e.g. IQ test scores, phylogenetic data, financial econometrics data, and others.
• Many models imply MTP2: ferromagnetic Ising models, order statistics of i.i.d. variables, Brownian motion tree models, and latent tree models (e.g. single-factor analysis models), as illustrated in the sketch below.
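To connect the last example with the Gaussian criterion, here is a small sketch (my own illustration): a Gaussian single-factor model X = λZ + noise with nonnegative loadings has covariance Σ = λλ^T + diag(ψ), and its precision matrix turns out to be an M-matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-factor model: X_i = lam_i * Z + eps_i with nonnegative loadings,
# so Sigma = lam lam^T + diag(psi).
lam = rng.uniform(0.5, 1.5, size=5)
psi = rng.uniform(0.5, 1.0, size=5)
Sigma = np.outer(lam, lam) + np.diag(psi)

K = np.linalg.inv(Sigma)                  # precision matrix
off_diag = K[~np.eye(5, dtype=bool)]
print(np.all(off_diag <= 1e-12))          # True: K is an M-matrix, X is MTP2
```

(By the Sherman–Morrison formula, the off-diagonal entries of K are −(λ_i/ψ_i)(λ_j/ψ_j)/(1 + ∑ λ_k²/ψ_k), which are nonpositive whenever the loadings are nonnegative.)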

7. Maximum likelihood estimation
Given i.i.d. samples X = {x_1, …, x_n} ⊂ R^d with weights w = (w_1, …, w_n) (where w_1, …, w_n ≥ 0 and ∑ w_i = 1) from a distribution with density p on R^d, can we estimate p?
The log-likelihood of observing X with weights w, if the samples are drawn i.i.d. from p, is (up to an additive constant)
    ℓ_p(X, w) := ∑_{i=1}^n w_i log p(x_i).
We would like to solve
    maximize_p ∑_{i=1}^n w_i log p(x_i)
    s.t. p is an MTP2 density.
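The objective is straightforward to evaluate for any candidate density. A minimal sketch (my own illustration, scoring a toy data set under an assumed Gaussian candidate):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(logpdf, X, w):
    """Weighted log-likelihood  l_p(X, w) = sum_i w_i * log p(x_i)."""
    return float(np.dot(w, logpdf(X)))

# Toy data with uniform weights, scored under a candidate Gaussian density.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 1.5]])
w = np.full(len(X), 1.0 / len(X))
candidate = multivariate_normal(mean=X.mean(axis=0), cov=np.cov(X.T))
print(log_likelihood(candidate.logpdf, X, w))
```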

8. Maximum likelihood estimation under MTP2
Suppose we observe two points X = {x_1, x_2} ⊂ R^2. We can find a sequence of MTP2 densities p_1, p_2, p_3, … such that ℓ_{p_n}(X) → ∞ as n → ∞.
[Figure: densities p_1, p_2, p_3 concentrating ever more tightly around x_1, x_2 and the lattice points x_1 ∧ x_2 and x_1 ∨ x_2]
Thus, the MLE doesn't exist.
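One concrete family that exhibits this blow-up (my own choice for illustration; the talk's picture is of the same flavor): product densities f(u) g(v) are always MTP2 because the defining inequality holds with equality, so a product of two-spike uniform mixtures puts unbounded density at x_1 = (0, 1) and x_2 = (1, 0).

```python
import numpy as np

def log_density(pt, eps):
    """log of p(u, v) = f(u) f(v) with f = 0.5 U[0, eps] + 0.5 U[1, 1 + eps].
    A product density is always MTP2 (equality in the definition)."""
    def log_f(u):
        in_spike = (0 <= u <= eps) or (1 <= u <= 1 + eps)
        return np.log(0.5 / eps) if in_spike else -np.inf
    return log_f(pt[0]) + log_f(pt[1])

x1, x2 = (0.0, 1.0), (1.0, 0.0)
for eps in [0.1, 0.01, 0.001]:
    ll = 0.5 * log_density(x1, eps) + 0.5 * log_density(x2, eps)
    print(f"eps = {eps}: log-likelihood = {ll:.2f}")  # grows without bound
```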

9. Maximum likelihood estimation under MTP2
To ensure that the likelihood function is bounded, we impose the condition that p is log-concave:
    maximize_p ∑_{i=1}^n w_i log p(x_i)
    s.t. p is an MTP2 density and p is log-concave.
A function f : R^d → R is log-concave if its logarithm is concave.
• Log-concavity is a natural assumption: it ensures the density is continuous, and it includes many well-known parametric families.
• Log-concave families: Gaussian; Uniform(a, b); Gamma(k, θ) for k ≥ 1; Beta(a, b) for a, b ≥ 1.
• Maximum likelihood estimation under log-concavity is a well-studied problem (Cule et al. 2008, Dümbgen et al. 2009, Schuhmacher et al. 2010, …).
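The parameter ranges in the second bullet can be checked directly; e.g. for Gamma(k, θ), (log p)''(x) = −(k − 1)/x², which is nonpositive exactly when k ≥ 1. A quick numerical confirmation via second differences (my own sketch):

```python
import numpy as np
from scipy.stats import gamma

# Second differences of log p on a grid are <= 0 for a log-concave density.
x = np.linspace(0.1, 10, 1_000)
for k in [0.5, 1.0, 3.0]:
    log_p = gamma(a=k, scale=1.0).logpdf(x)
    concave = bool(np.all(np.diff(log_p, n=2) <= 1e-9))
    print(f"k = {k}: log-concave on the grid: {concave}")
# k = 0.5 fails; k >= 1 passes, matching the condition on the slide.
```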

10. Maximum likelihood estimation under log-concavity
    maximize_p ∑_{i=1}^n w_i log p(x_i)
    s.t. p is a density and p is log-concave.
Theorem (Cule, Samworth, and Stewart 2008).
• With probability 1, a log-concave maximum likelihood estimator p̂ exists and is unique.
• Moreover, log p̂ is a 'tent function' supported on the convex hull of the data, P(X) = conv(x_1, …, x_n).
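A tent function is the least concave function lying above prescribed heights y_i at the points x_i; its value at x is the largest t with (x, t) ∈ conv{(x_i, y_i)}, which is a small linear program. A sketch of evaluating one (my own illustration, not the estimator itself; the MLE additionally optimizes the heights y_i):

```python
import numpy as np
from scipy.optimize import linprog

def tent(x, pts, heights):
    """Tent-function value at x: max of sum_i a_i * heights_i over
    a >= 0 with sum_i a_i = 1 and sum_i a_i * pts_i = x.
    Returns -inf outside conv(pts), where the LP is infeasible."""
    n = len(pts)
    A_eq = np.vstack([pts.T, np.ones(n)])
    b_eq = np.append(np.atleast_1d(x), 1.0)
    res = linprog(-heights, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * n)
    return -res.fun if res.success else -np.inf

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
heights = np.array([0.0, -1.0, -1.0, -3.0])
print(tent(np.array([0.5, 0.5]), pts, heights))   # -1.0, on the concave roof
print(tent(np.array([2.0, 2.0]), pts, heights))   # -inf: outside conv(pts)
```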
