

1. Algorithmes Gradient-Proximaux pour l'inférence statistique
Gersende Fort, Institut de Mathématiques de Toulouse, CNRS, Toulouse, France

2. Based on joint works with Yves Atchadé (Univ. Michigan, USA), Jean-François Aujol (IMB, Bordeaux, France), Eric Moulines (Ecole Polytechnique, France), Adeline Samson and Edouard Ollier (Univ. Grenoble Alpes, France), Charles Dossal (IMT), Laurent Risser (IMT).
→ On Perturbed Proximal-Gradient algorithms (JMLR, 2017)
→ Stochastic Proximal Gradient Algorithms for Penalized Mixed Models (Stat & Computing, 2018)
→ Acceleration for perturbed Proximal Gradient algorithms (work in progress)
→ Algorithmes Gradient Proximaux Stochastiques (GRETSI, 2017)

3. Outline
Motivations
  Pharmacokinetic
  General Case: Latent Variable Models
  Votes in the US congress
  General case: Discrete graphical models
  Conclusion, part I
Penalized ML through Perturbed Stochastic-Gradient algorithms
  Asymptotic behavior of the algorithm
  Numerical illustration

4. Motivation 1: Pharmacokinetic (1/2)
$N$ patients. At time $0$: dose $D$ of a drug. For patient $i$, evolution of the concentration at times $t_{ij}$, $1 \le j \le J_i$: observations $\{Y_{ij}, 1 \le j \le J_i\}$.
Model:
$Y_{ij} = F(t_{ij}, X_i) + \epsilon_{ij}$, where the $\epsilon_{ij}$ are i.i.d. $\sim \mathcal{N}(0, \sigma^2)$;
$X_i = Z_i \beta + d_i \in \mathbb{R}^L$, where the $d_i$ are i.i.d. $\sim \mathcal{N}_L(0, \Omega)$ and independent of $\epsilon$;
$Z_i$ is a known matrix such that each row of $X_i$ has an intercept (fixed effect) and covariates.

5. Motivation 1: Pharmacokinetic (1/2), continued
Same model as above. Example of model $F$: monocompartmental, with digestive absorption,
$F(t, [\ln Cl, \ln V, \ln A]) = C(Cl, V, A, D) \left( \exp\!\big(-\tfrac{Cl}{V} t\big) - \exp(-A t) \right)$
For each patient $i$:
$\ln Cl_i = \beta_{0,Cl} + \beta_{1,Cl} Z^i_{1,Cl} + \cdots + \beta_{K,Cl} Z^i_{K,Cl} + d_{Cl,i}$
$\ln V_i = \beta_{0,V} + \cdots + d_{V,i}$ (idem, with covariates $Z^i_{k,V}$ and coefficients $\beta_{k,V}$)
$\ln A_i = \beta_{0,A} + \cdots + d_{A,i}$ (idem, with covariates $Z^i_{k,A}$ and coefficients $\beta_{k,A}$)
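To make the forward model concrete, here is a minimal Python sketch of such a concentration curve. The slide does not spell out the constant $C(Cl, V, A, D)$, so the standard first-order-absorption constant $D\,A / (V (A - Cl/V))$ is assumed here purely for illustration; the parameter values in the usage line are hypothetical.

```python
import numpy as np

def pk_concentration(t, log_Cl, log_V, log_A, dose):
    """One-compartment model with first-order (digestive) absorption.

    Sketch only: the multiplicative constant C(Cl, V, A, D) is not written
    out on the slide, so the standard constant D*A / (V*(A - Cl/V)) is
    assumed here for illustration.
    """
    Cl, V, A = np.exp(log_Cl), np.exp(log_V), np.exp(log_A)
    ke = Cl / V                      # elimination rate constant
    C = dose * A / (V * (A - ke))    # assumed form of C(Cl, V, A, D)
    return C * (np.exp(-ke * t) - np.exp(-A * t))

# Example: concentration profile for one hypothetical patient
t = np.linspace(0.0, 24.0, 50)
conc = pk_concentration(t, log_Cl=np.log(1.0), log_V=np.log(8.0),
                        log_A=np.log(1.5), dose=100.0)
```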

6. Motivation 1: Pharmacokinetic (1/2), continued
Statistical analysis: estimation of $\theta = (\beta, \sigma^2, \Omega)$ under sparsity constraints on $\beta$; selection of the covariates based on $\hat\beta$.
→ Penalized Maximum Likelihood

7. Motivation 1: Pharmacokinetic (2/2)
Model: as above, $Y_{ij} = f(t_{ij}, X_i) + \epsilon_{ij}$ and $X_i = Z_i \beta + d_i$.
Likelihoods:
Complete likelihood: the distribution of $\{Y_{ij}, X_i;\ 1 \le i \le N,\ 1 \le j \le J_i\}$ has an explicit expression,
$\prod_{i=1}^N \prod_{j=1}^{J_i} \mathcal{N}\big(f(t_{ij}, X_i), \sigma^2\big)[Y_{ij}] \;\; \prod_{i=1}^N \mathcal{N}_L(Z_i \beta, \Omega)[X_i]$
Likelihood: the distribution of $\{Y_{ij};\ 1 \le i \le N,\ 1 \le j \le J_i\}$ is not explicit.
ML: here, the likelihood is not concave.
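Since the complete likelihood factorizes as above, it can be evaluated directly. A minimal Python sketch, assuming the forward model `f`, per-patient arrays `Y[i]`, `t[i]`, and known design matrices `Z[i]` are supplied by the user (these names and shapes are illustrative, not part of the slides):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def complete_log_likelihood(Y, t, X, Z, beta, sigma2, Omega, f):
    """Complete log-likelihood of {Y_ij, X_i; 1 <= i <= N, 1 <= j <= J_i}.

    Y, t : length-N lists; Y[i] and t[i] are arrays of length J_i
    X    : (N, L) array of individual parameters X_i
    Z    : length-N list of known (L, K) design matrices, so that E[X_i] = Z_i beta
    f    : forward model f(t_ij, X_i), vectorized in its first argument
    """
    ll = 0.0
    for i in range(len(Y)):
        # observation term: prod_j N(f(t_ij, X_i), sigma^2)[Y_ij]
        ll += norm.logpdf(Y[i], loc=f(t[i], X[i]), scale=np.sqrt(sigma2)).sum()
        # random-effect term: N_L(Z_i beta, Omega)[X_i]
        ll += multivariate_normal.logpdf(X[i], mean=Z[i] @ beta, cov=Omega)
    return ll
```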

8. General case: Latent variable models
The log-likelihood of the observations $Y$ is of the form (dependence upon $Y$ is omitted)
$\theta \mapsto \log L(\theta)$, with $L(\theta) = \int_{\mathsf{X}} p_\theta(x) \, \mu(\mathrm{d}x)$,
where $\mu$ is a $\sigma$-finite positive measure on a set $\mathsf{X}$, and $x$ collects the missing/latent data.
Previous example: $x \leftarrow (X_1, \cdots, X_N)$, $\mu \leftarrow$ Lebesgue measure on $\mathbb{R}^{LN}$.
In these models, the complete likelihood $p_\theta(x)$ can be evaluated explicitly, but the likelihood has no closed-form expression.
The exact integral could be replaced by a Monte Carlo approximation, but this is known to be inefficient. Numerical methods based on the a posteriori distribution of the missing data are preferred (see e.g. Expectation-Maximization approaches).
→ What about the gradient of the (log-)likelihood?

9. Latent variable model: Gradient of the likelihood
$\log L(\theta) = \log \int p_\theta(x) \, \mu(\mathrm{d}x)$
Under regularity conditions, $\theta \mapsto \log L(\theta)$ is $C^1$ and
$\nabla \log L(\theta) = \dfrac{\int \partial_\theta p_\theta(x) \, \mu(\mathrm{d}x)}{\int p_\theta(z) \, \mu(\mathrm{d}z)} = \int \partial_\theta \log p_\theta(x) \; \underbrace{\dfrac{p_\theta(x) \, \mu(\mathrm{d}x)}{\int p_\theta(z) \, \mu(\mathrm{d}z)}}_{\text{the a posteriori distribution}}$

10. Latent variable model: Gradient of the likelihood, continued
The gradient of the log-likelihood
$\nabla_\theta \{\log L(\theta)\} = \int H_\theta(x) \, \pi_\theta(\mathrm{d}x)$
is an intractable expectation w.r.t. $\pi_\theta$, the conditional distribution of the latent variable given the observations $Y$ (known up to a constant). For all $(x, \theta)$, $H_\theta(x)$ can be evaluated.
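In practice this expectation is approximated by sampling from $\pi_\theta$, typically with MCMC. A generic Python sketch of the plain Monte Carlo idea (both `sample_posterior` and `H` are hypothetical placeholders for model-specific routines; this is not the algorithm of the cited papers):

```python
import numpy as np

def mc_gradient(theta, sample_posterior, H, n_samples=1000):
    """Monte Carlo approximation of grad log L(theta) = int H_theta(x) pi_theta(dx).

    sample_posterior(theta, n) : returns n (approximate) draws from pi_theta,
                                 e.g. produced by an MCMC kernel targeting the
                                 conditional law of the latent data given Y
    H(theta, x)                : evaluates H_theta(x) for one draw x
    """
    draws = sample_posterior(theta, n_samples)
    return np.mean([H(theta, x) for x in draws], axis=0)
```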

11. Motivation 2: relationships in a graph (1/2)
$p$ nodes in a graph (e.g. $p$ senators from the US congress); each node takes values in $\{-1, 1\}$ (e.g. each node codes for no/yes in a vote); $N$ pictures of the graph (e.g. $N$ votes).
Model: the observations $Y^{(i)} \in \{-1,1\}^p$ are i.i.d. with distribution
$\pi_\theta(y) \propto \exp\Big( \sum_{i=1}^p \theta_i y_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^p \theta_{ij} y_i y_j \Big)$
Statistical analysis: estimation of $\theta$ under penalty (sparse graph; regularization since $N \ll p^2/2$); classification of the nodes.
→ Penalized Maximum Likelihood

12. Motivation 2: relationships in a graph (2/2)
Model: the observations $Y^{(n)} \in \{-1,1\}^p$ are i.i.d. with distribution
$\pi_\theta(y) = \frac{1}{Z_\theta} \exp\Big( \sum_{i=1}^p \theta_i y_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^p \theta_{ij} y_i y_j \Big)$
Log-likelihood: with $Y \stackrel{\text{def}}{=} (Y^{(1)}, \cdots, Y^{(N)})$,
$\ell(\theta) = \sum_{i=1}^p \theta_i \sum_{n=1}^N Y_i^{(n)} + \sum_{i=1}^{p-1} \sum_{j=i+1}^p \theta_{ij} \sum_{n=1}^N Y_i^{(n)} Y_j^{(n)} - N \log Z_\theta = \langle \Theta, S(Y) \rangle - N \log Z_\theta = \langle \Psi(\theta), S(Y) \rangle + \Phi(\theta)$
Likelihood: not explicit, since
$Z_\theta \stackrel{\text{def}}{=} \sum_{y \in \{-1,1\}^p} \exp\Big( \sum_{i=1}^p \theta_i y_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^p \theta_{ij} y_i y_j \Big)$
ML: here, the likelihood is concave.
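To see why $Z_\theta$ is the bottleneck, here is a brute-force Python sketch that enumerates all $2^p$ configurations; it is only feasible for small $p$ (for $p$ of the order of a hundred senators the sum is hopeless), which is exactly the intractability exploited later. Splitting $\theta$ into a vector of $\theta_i$ and an upper-triangular matrix of $\theta_{ij}$ is an implementation choice, not notation from the slides.

```python
import itertools
import numpy as np

def log_partition(theta_lin, theta_quad):
    """Brute-force log Z_theta for the binary model on {-1,1}^p.

    theta_lin  : (p,) vector of the theta_i
    theta_quad : (p, p) matrix whose strict upper triangle holds the theta_ij
    """
    p = len(theta_lin)
    logs = []
    for y in itertools.product([-1, 1], repeat=p):   # 2^p configurations
        y = np.asarray(y)
        logs.append(theta_lin @ y + y @ np.triu(theta_quad, k=1) @ y)
    logs = np.array(logs)
    m = logs.max()                                   # log-sum-exp for stability
    return m + np.log(np.exp(logs - m).sum())
```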

13. General Case: Discrete graphical models
$N$ independent observations of an undirected graph with $p$ nodes. Each node takes values in a finite alphabet $\mathsf{X}$: $N$ i.i.d. observations $Y^{(i)}$ in $\mathsf{X}^p$ with distribution
$y = (y_1, \cdots, y_p) \mapsto \pi_\theta(y) \stackrel{\text{def}}{=} \frac{1}{Z_\theta} \exp\Big( \sum_{k=1}^p \theta_{kk} B(y_k, y_k) + \sum_{1 \le j < k \le p} \theta_{kj} B(y_k, y_j) \Big) = \frac{1}{Z_\theta} \exp\big( \langle \theta, \bar{B}(y) \rangle \big)$
where $\bar{B}$ is a symmetric function and $\theta$ is a symmetric $p \times p$ matrix. The normalizing constant (partition function) $Z_\theta$ cannot be computed: it is a sum over $|\mathsf{X}|^p$ terms.

14. Markov random field: Likelihood
Likelihood of the form (the scalar product between matrices is the Frobenius inner product)
$\frac{1}{N} \log L(\theta) = \Big\langle \theta, \frac{1}{N} \sum_{i=1}^N \bar{B}(Y_i) \Big\rangle - \log Z_\theta$
The likelihood is intractable.

15. Markov random field: Gradient of the likelihood
Gradient of the form
$\nabla_\theta \Big\{ \frac{1}{N} \log L(\theta) \Big\} = \frac{1}{N} \sum_{i=1}^N \bar{B}(Y_i) - \int_{\mathsf{X}^p} \bar{B}(y) \, \pi_\theta(y) \, \mu(\mathrm{d}y)$
with $\pi_\theta(y) \stackrel{\text{def}}{=} \frac{1}{Z_\theta} \exp\big( \langle \theta, \bar{B}(y) \rangle \big)$.
The gradient of the (log-)likelihood is intractable.
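The second term is therefore estimated by Monte Carlo, sampling from $\pi_\theta$ with MCMC. A Python sketch for the binary special case of Motivation 2, where the off-diagonal terms are $y_k y_j$ and the diagonal carries the linear coefficients; the single-site Gibbs sampler and the sample sizes are illustrative choices, not the exact scheme of the cited papers.

```python
import numpy as np

def suff_stat(y):
    # bar{B}(y) in the binary case: diagonal entries y_k, off-diagonal entries y_k * y_j
    S = np.outer(y, y).astype(float)
    np.fill_diagonal(S, y)
    return S

def gibbs_estimate(theta, n_samples=2000, burn_in=500, seed=0):
    """Monte Carlo estimate of E_{pi_theta}[bar{B}(y)] by single-site Gibbs sampling.

    theta : symmetric (p, p) matrix; diagonal = theta_k, off-diagonal = theta_kj.
    """
    rng = np.random.default_rng(seed)
    p = theta.shape[0]
    y = rng.choice([-1, 1], size=p)
    acc = np.zeros((p, p))
    for it in range(burn_in + n_samples):
        for k in range(p):
            # half the log-odds of y_k = +1 given the other coordinates
            local = theta[k, k] + np.delete(theta[k], k) @ np.delete(y, k)
            y[k] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * local)) else -1
        if it >= burn_in:
            acc += suff_stat(y)
    return acc / n_samples

def approx_gradient(theta, Y):
    # (1/N) sum_i bar{B}(Y_i)  minus  a Monte Carlo estimate of E_{pi_theta}[bar{B}(y)]
    emp = np.mean([suff_stat(y) for y in Y], axis=0)
    return emp - gibbs_estimate(theta)
```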
