SLIDE 1

An empirical Bayes procedure for the selection of Gaussian graphical models

Bayesian estimation for decomposable Gaussian graphical models. Jean-Michel Marin, I3M, Université Montpellier 2; joint work with Sophie Donnet, Université Paris Dauphine. Journée MSTGA, CIRAD Montpellier, 01/06/2011

SLIDE 2

Introduction

The last decade has witnessed the emergence of applied problems typified by very high-dimensional variables, in marketing databases or gene expression studies for instance. Graphical modelling is a form of multivariate analysis that uses graphs to represent models. Graphs enable concise representations of associational and causal relations between the variables under study.

SLIDE 3

There are two main types of graphical models:

  • undirected graphical models;
  • directed acyclic graphical models.

Lauritzen (1996). We shall concentrate on undirected graphs.

SLIDE 4

Example of an undirected graph

1841 employees of a car factory, 6 binary variables:

  • S: smoking (yes or no)
  • M: strenuous mental work (yes or no)
  • P: strenuous physical work (yes or no)
  • B: blood pressure (< 140 or ≥ 140)
  • L: ratio of lipoproteins (< 3 or ≥ 3)
  • F: family history of coronary heart disease (yes or no)

Madigan and Raftery (1994)

SLIDE 5

[Figure: the undirected graph for the car-factory example.]

SLIDE 6

If the graph is known, the parameters of the model are easily estimated. However, a quite challenging issue is the determination of the set of most appropriate graphs for a given dataset. We consider this problem in the case of decomposable Gaussian graphical models. Dawid and Lauritzen (1993)

SLIDE 7

Plan

  • Background on Bayesian model selection
  • Background on decomposable Gaussian graphical models
  • Bayesian tools for Gaussian graphical models
  • An empirical Bayes procedure via the SAEM-MCMC algorithm
  • A new Metropolis-Hastings sampler to explore the space of graphs
  • Numerical experiments

SLIDE 8

Background on Bayesian model selection

Several models are available for the same observation:

Mi : y ∼ fi(y|θi), i ∈ I,

where I can be finite or infinite.

SLIDE 9

Probabilise the entire model/parameter space:

  • allocate probabilities pi to all models Mi;
  • define priors πi(θi) on each parameter space Θi;
  • compute

P(Mi|y) = pi ∫_{Θi} fi(y|θi) πi(θi) dθi / Σ_j pj ∫_{Θj} fj(y|θj) πj(θj) dθj ;

  • take the largest P(Mi|y) to determine the “best” model, or use the averaged predictive

Σ_j P(Mj|y) ∫_{Θj} fj(y′|θj, y) πj(θj|y) dθj .
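In code, the posterior model probabilities above amount to normalising prior-weighted marginal likelihoods. A minimal sketch in plain Python, with purely illustrative marginal likelihood values (not taken from the talk):

```python
def model_posteriors(prior_probs, marginal_liks):
    """P(M_i|y) = p_i m_i(y) / sum_j p_j m_j(y), where m_i(y) is the
    marginal likelihood, i.e. the integral of f_i(y|theta_i) pi_i(theta_i)."""
    weighted = [p * m for p, m in zip(prior_probs, marginal_liks)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Three candidate models with equal prior weights (illustrative numbers):
post = model_posteriors([1 / 3, 1 / 3, 1 / 3], [0.02, 0.005, 0.001])
```

Selecting the model with the largest entry of `post` is the "best model" rule; weighting predictions by `post` is the averaged-predictive rule.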

SLIDE 10

Background on decomposable Gaussian graphical models

Let G = (V, E) be an undirected graph:

  • V = {1, . . . , p} is the vertex set;
  • E ⊆ {(i, j) : 1 ≤ i < j ≤ p} is the edge set: if (a, b) ∈ E, then vertices a and b are adjacent in G.

A graph or subgraph is complete if all its vertices are joined by an edge. A complete subgraph that is not contained within another complete subgraph is called a clique.

SLIDE 11

Let C = {C1, . . . , Ck} be the set of cliques of G. An ordering (C1, . . . , Ck) of the cliques is said to be perfect if the vertices of each clique Ci that are also contained in any previous clique C1, . . . , Ci−1 are all members of one previous clique; that is, for all i = 2, 3, . . . , k,

Si = Ci ∩ (∪_{j=1}^{i−1} Cj) ⊆ Ch

for some h = h(i) ∈ {1, 2, . . . , i − 1}. S = {S2, . . . , Sk} is the set of separators associated with the perfect ordering (C1, . . . , Ck). An undirected graph that admits a perfect ordering is said to be decomposable.
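Decomposability is equivalent to chordality, so it can be tested with maximum cardinality search. The sketch below is an illustration, not the authors' code: it returns True iff the reversed MCS ordering is a perfect elimination ordering.

```python
def is_decomposable(vertices, edges):
    """Maximum cardinality search: a graph is decomposable (chordal)
    iff the MCS ordering, reversed, is a perfect elimination ordering."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    weight = {v: 0 for v in vertices}
    order, numbered = [], set()
    for _ in vertices:
        # Pick an unnumbered vertex with the most numbered neighbours.
        v = max((u for u in vertices if u not in numbered),
                key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for u in adj[v] - numbered:
            weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    for i, v in enumerate(order):
        prev = {u for u in adj[v] if pos[u] < i}
        if prev:
            u = max(prev, key=pos.get)
            if not (prev - {u}) <= adj[u]:
                return False  # a chord is missing: no perfect ordering
    return True

# A 4-cycle has no perfect ordering; adding a chord makes it decomposable.
cycle = is_decomposable([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (1, 4)])
chorded = is_decomposable([1, 2, 3, 4],
                          [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)])
```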

SLIDE 12

The following graph (used as a benchmark in what follows) is decomposable: k = 5, with cliques C1 = {1, 2, 3}, C2 = {2, 3, 5, 6}, C3 = {2, 4, 5}, C4 = {5, 6, 7} and C5 = {6, 7, 8, 9}, and separators S2 = {2, 3}, S3 = {2, 5}, S4 = {5, 6} and S5 = {6, 7}.
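The separators quoted above can be recovered mechanically from the ordered clique list. A small check of the running-intersection property, in plain Python, using the benchmark cliques:

```python
def separators(cliques):
    """Separators S_i = C_i ∩ (C_1 ∪ … ∪ C_{i-1}) of an ordered clique
    list, or None if the ordering is not perfect (some S_i is contained
    in no single earlier clique)."""
    seps = []
    for i in range(1, len(cliques)):
        earlier = set().union(*cliques[:i])
        S = cliques[i] & earlier
        if not any(S <= C for C in cliques[:i]):
            return None
        seps.append(S)
    return seps

# Benchmark graph of the slides:
cliques = [{1, 2, 3}, {2, 3, 5, 6}, {2, 4, 5}, {5, 6, 7}, {6, 7, 8, 9}]
seps = separators(cliques)  # [{2, 3}, {2, 5}, {5, 6}, {6, 7}]
```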

SLIDE 13

If (2, 6) ∉ E and (3, 5) ∉ E, the graph is no longer decomposable: k = 5, with C1 = {1, 2, 3}, C2 = {2, 4, 5}, C3 = {3, 6}, C4 = {5, 6, 7} and C5 = {6, 7, 8, 9}.

SLIDE 14

With p vertices, the number of possible edges is T = p(p−1)/2 and the total number of graphs is 2^T. The total number of decomposable graphs with p vertices can be calculated for moderate values of p. For instance: if p = 6, there are 32,768 graphs, of which 18,154 are decomposable (around 55%); if p = 8, there are 268,435,456 graphs, of which 30,888,596 are decomposable (around 12%).
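The total graph counts quoted for p = 6 and p = 8 are easy to reproduce (the decomposable counts require enumeration, which is not shown here):

```python
def num_graphs(p):
    """Total number of undirected graphs on p labelled vertices:
    2**T with T = p(p-1)/2 possible edges."""
    T = p * (p - 1) // 2
    return 2 ** T

frac_6 = 18_154 / num_graphs(6)       # fraction decomposable, p = 6 (≈ 0.55)
frac_8 = 30_888_596 / num_graphs(8)   # fraction decomposable, p = 8 (≈ 0.12)
```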

SLIDE 15

A pair (A, B) of subsets of the vertex set V of an undirected graph G is said to form a decomposition of G if

  • V = A ∪ B;
  • A ∩ B is complete;
  • A ∩ B separates A from B (any path from a vertex in A to a vertex in B goes through A ∩ B).

SLIDE 16

To each vertex v ∈ V we associate a random variable yv. For A ⊆ V, yA = (yv)v∈A denotes the collection of random variables {yv : v ∈ A}. To ease notation, let y = yV. The probability distribution of y is said to be Markov with respect to G if, for any decomposition (A, B) of G, yA is independent of yB given yA∩B (global Markov property). A graphical model is a family of distributions on y that are Markov with respect to a graph.

SLIDE 17

A Gaussian graphical model is such that

y | G, ΣG ∼ Np(0p, ΣG),   (1)

where ΣG is a positive definite matrix which ensures that the distribution of y is Markov with respect to G. ΣG ensures that the distribution of y is Markov if and only if

(i, j) ∉ E ⟺ (ΣG^−1)_(i,j) = 0 .

Dempster (1972) (covariance selection models)
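Dempster's covariance-selection characterisation can be read directly off the precision matrix. A minimal sketch in plain Python, with a hand-made 3×3 precision matrix chosen purely for illustration:

```python
def graph_from_precision(K, tol=1e-10):
    """Edge set implied by a precision matrix K = Sigma^{-1}:
    (i, j) is an edge iff the (i, j) entry of K is non-zero."""
    p = len(K)
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(K[i][j]) > tol}

# Tridiagonal precision: variables 0 and 2 are conditionally
# independent given variable 1, so the edge (0, 2) is absent.
K = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
edges = graph_from_precision(K)  # {(0, 1), (1, 2)}
```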

SLIDE 18

In a Gaussian graphical model, the global, local and pairwise Markov properties are equivalent. Local Markov property: every variable is conditionally independent of the remaining ones, given its neighbours. Pairwise Markov property: any non-adjacent pair of random variables are conditionally independent given the remaining ones.

SLIDE 19

The mean parameter is typically set to zero: the data we analyze are expressed as deviations from the sample mean. We observe a sample y1, . . . , yn from (1) (the data are centered). We would like to identify the set of most relevant graphs. For the multivariate random phenomenon under consideration, we are interested in the set of most relevant conditional independence structures ⟹ explore a huge graph space.

SLIDE 20

Bayesian tools for Gaussian graphical models

We consider the Bayesian paradigm. Conditionally on G, we use a Hyper-Inverse Wishart (HIW) distribution associated with the graph G as the prior distribution on ΣG:

ΣG | G, δG, ΦG ∼ HIWG(δG, ΦG),

where δG > 0 and ΦG is a p × p symmetric positive definite matrix. Dawid and Lauritzen (1993), Giudici and Green (1999), Armstrong et al. (2006)

SLIDE 21

Conditionally on G, the HIW distribution is conjugate:

ΣG | y1, . . . , yn, G, δG, ΦG ∼ HIWG( δG + n, ΦG + Σ_{i=1}^n yi yi^T ) .   (2)

Moreover, for such a prior,

f(y1, . . . , yn | G, δG, ΦG) = (2π)^{−np/2} hG(δG, ΦG) / hG( δG + n, ΦG + Σ_{i=1}^n yi yi^T ) ,

where hG is the normalizing constant of the HIW distribution associated with the graph G. Roverato (2002) extends the Hyper-Inverse Wishart distribution to the non-decomposable case.
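For decomposable G, hG itself factorises over a perfect ordering into clique and separator terms, hG(δ, Φ) = Π_C h(δ, Φ_C) / Π_S h(δ, Φ_S) (Dawid and Lauritzen, 1993). A structural sketch of this bookkeeping, with the per-complete-set constant `log_h` left as a user-supplied function, since its exact form depends on the chosen HIW parameterisation:

```python
def log_h_G(cliques, separators, log_h):
    """log h_G as a sum of clique terms minus separator terms;
    log_h(A) must return the log normalizing constant of the
    (inverse-Wishart) law on the complete set A."""
    return (sum(log_h(C) for C in cliques)
            - sum(log_h(S) for S in separators))

# Toy stand-in for log_h, just to exercise the clique/separator sums:
toy_log_h = len
value = log_h_G([{1, 2}, {2, 3}], [{2}], toy_log_h)  # 2 + 2 - 1 = 3
```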

SLIDE 22

Let Y = (y1, . . . , yn) and SY = Σ_{i=1}^n yi yi^T. If we assume a uniform prior distribution on the space of graphs, π(G) ∝ 1:

π(G | Y, δG, ΦG) ∝ f(Y | G, δG, ΦG) .

A uniform distribution on the space of graphs is typically not satisfactory: with p vertices, the number of possible edges is p(p−1)/2 and, under a uniform prior over all graphs, the prior number of edges has its mode around p(p−1)/4.

Wong, Carter and Kohn (2003), Jones et al. (2005), Armstrong et al. (2009), Carvalho and Scott (2009)

SLIDE 23

An alternative to the naive uniform prior is to place a Bernoulli distribution with parameter r on the inclusion or not of each edge:

π(G|r) ∝ r^kG (1 − r)^(p(p−1)/2 − kG) ,

where kG is the number of edges of G. The parameter r has to be calibrated; if r = 1/2, this prior reduces to the uniform one. We easily deduce that

π(G | Y, δG, ΦG, r) ∝ [ hG(δG, ΦG) / hG(δG + n, ΦG + SY) ] π(G|r) .   (3)
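The edge-inclusion prior is one line of code; the sketch below also illustrates that with r = 1/2 the log prior no longer depends on kG (the uniform case), while a small r favours sparse graphs. The p = 9, r = 1/(p−1) values are purely illustrative:

```python
import math

def log_edge_prior(k_G, p, r):
    """Unnormalised log pi(G|r) = k_G log r + (T - k_G) log(1 - r),
    with T = p(p-1)/2 possible edges and k_G edges present in G."""
    T = p * (p - 1) // 2
    return k_G * math.log(r) + (T - k_G) * math.log(1.0 - r)

sparse = log_edge_prior(3, 9, 1 / (9 - 1))   # few edges, small r: favoured
dense = log_edge_prior(20, 9, 1 / (9 - 1))   # many edges: penalised
```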

SLIDE 24

An empirical Bayes procedure via the SAEM-MCMC algorithm

(3) is sensitive to the specification of the hyper-parameters δG, ΦG and r! Typically, δG = δ and ΦG = Φ. Different strategies: Giudici and Green (1999) and others propose to fix r = 1/2 and use hierarchical prior modeling: δ and Φ are considered as random quantities and a prior distribution is assigned to δ and Φ. The difficulty with this approach is that the prior distributions on δ and Φ also depend on hyper-parameters...

SLIDE 25

Jones et al. (2005) and others fix the values of δ, Φ and r using heuristics that are more or less justified and never completely satisfactory. r is set to 1/(p − 1), encouraging sparse graphs, and δ = 3, which is the minimal integer such that the first moment of the prior distribution on ΣG exists. They set Φ = τIp and, using the fact that the mode of the marginal prior for each variance term σii equals τ/(δ + 2), τ is fixed to δ + 2 if the data set is standardized.

SLIDE 26

Armstrong et al. (2009) fix δ = 4, asserting that such a value gives a suitably non-informative prior for ΣG, and use hierarchical prior modeling on Φ and r. They consider different possibilities for Φ, all of the form Φ = τA where the matrix A is fixed. In all cases, for the hyper-parameter τ, they use a uniform prior distribution on the interval [0, Γ] where Γ is very large. Finally, they also use a hierarchical prior on r, r ∼ β(1, 1), which by integration leads to

π(G) ∝ 1 / ( p(p−1)/2 choose kG ) .

SLIDE 27

Carvalho and Scott (2009) also use a hierarchical prior on r, such that

π(G) ∝ 1 / ( p(p−1)/2 choose kG ) .

They suggest a HIW g-prior approach with g = 1/n, which consists of fixing δ = 1 and Φ = SY/n.

SLIDE 28

δ measures the amount of information in the prior relative to the sample (see (2)); we propose to fix δ = 1, so that the prior carries the same weight as one observation. Moreover, we propose to standardize the data and to use Φ = τIp. This choice encourages sparse graphs (on average, each variable has major interactions with a relatively small number of other variables). τ and r play the role of shrinkage factors: it is important to choose τ and r on the appropriate scale!

SLIDE 29

Empirical Bayes strategy: we propose to fix τ and r at their maximum likelihood estimates. How to calculate the maximum likelihood estimates of τ and r? We use a Markov chain Monte Carlo (MCMC) version of the Stochastic Approximation EM (SAEM) algorithm. Delyon, Lavielle and Moulines (1999), Kuhn and Lavielle (2004)

SLIDE 30

The maximization of f(Y|τ, r) cannot be done in closed form:

f(Y|τ, r) ∝ Σ_{G∈Dp} [ hG(δ, τIp) / hG(n + δ, τIp + SY) ] π(G|r) ,

where Dp denotes the set of decomposable graphs on p vertices. The observed data Y = (y1, . . . , yn) arise as partial observations of the complete data (Y, G, ΣG). Let θ = (τ, r). EM algorithm:

Q(θ|θ′) = E_{ΣG,G} { ln f(Y, G, ΣG|θ) | Y, θ′ } .

At iteration k, the E-step evaluates Qk(θ) = Q(θ | θk−1) while the M-step updates θk−1 by maximizing Qk(θ).

SLIDE 31

For cases where the E-step is intractable, Delyon, Lavielle and Moulines (1999) propose the SAEM algorithm: the E-step is replaced by a stochastic approximation of Qk(θ). At iteration k, the E-step is divided into a simulation step, producing (G^(k), ΣG^(k)), and a stochastic approximation step:

Qk(θ) = Qk−1(θ) + γk [ ln f(Y, G^(k), ΣG^(k) | θ) − Qk−1(θ) ] ,

where (γk)k∈N is a sequence of positive numbers decreasing to zero.
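The stochastic-approximation recursion is the same update that underlies a running average. A toy scalar illustration (not the graphical-model Q, just the update rule), with γk = 1/k:

```python
def sa_update(q_prev, sample_value, gamma_k):
    """One SAEM-style stochastic approximation step:
    Q_k = Q_{k-1} + gamma_k * (sample_value - Q_{k-1})."""
    return q_prev + gamma_k * (sample_value - q_prev)

# With gamma_k = 1/k the recursion reproduces the running mean:
q = 0.0
for k, x in enumerate([2.0, 4.0, 6.0], start=1):
    q = sa_update(q, x, 1.0 / k)
# q == 4.0, the mean of the three samples
```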

SLIDE 32

In Gaussian graphical models, we cannot generate directly a realization from the conditional distribution of (G, ΣG) given Y and θk−1. For such cases, Kuhn and Lavielle (2004) suggest replacing the simulation step by an MCMC scheme: generate M realizations from an ergodic Markov chain with stationary distribution G, ΣG | Y, θk−1 and use the last simulation in the SAEM algorithm.

SLIDE 33

It is very easy to generate a realization from ΣG | G, Y, θk−1. Moreover, the pdf of G | Y, θk−1 is available up to a normalizing constant. In the MCMC step of the SAEM-MCMC algorithm, we generate M realizations from an ergodic Markov chain with stationary distribution G | Y, θk−1 and use the last simulation to generate ΣG. Once the sequence of θk has converged, we use only the MCMC algorithm to explore the space of graphs.

SLIDE 34

A new Metropolis-Hastings sampler to explore the space of graphs

We propose a new Metropolis-Hastings algorithm. Let K denote the empirical correlation matrix. At iteration t of the algorithm:

1) Choose at random to delete or add an edge of G^(t−1);

a) If delete, enumerate G−(G^(t−1)), the decomposable graphs obtained from G^(t−1) by deleting one edge, and generate Gp from the distribution

P(Gp = G^(t−1) \ (i, j) | G^(t−1)) ∝ 1/|K(i, j)| ;

SLIDE 35

b) If add, enumerate G+(G^(t−1)), the decomposable graphs obtained from G^(t−1) by adding one edge, and generate Gp from the distribution

P(Gp = G^(t−1) ∪ (i, j) | G^(t−1)) ∝ |K(i, j)| ;

2) Calculate the acceptance probability ρ(G^(t−1), Gp);

3) With probability ρ(G^(t−1), Gp), accept Gp and set G^(t) = Gp; otherwise reject Gp and set G^(t) = G^(t−1).
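The data-driven proposal can be sketched as follows: candidate moves are weighted by the empirical correlation |K(i, j)|, large for additions and small for deletions. Plain Python, with the decomposable-candidate enumeration and the acceptance ratio left out; the 3×3 correlation matrix is purely illustrative:

```python
def proposal_probs(candidate_edges, K, add):
    """Normalised proposal over candidate edges: weight |K[i][j]|
    when adding an edge, 1/|K[i][j]| when deleting one."""
    weights = []
    for (i, j) in candidate_edges:
        w = abs(K[i][j])
        weights.append(w if add else 1.0 / w)
    total = sum(weights)
    return [w / total for w in weights]

# Toy empirical correlation matrix (illustrative values):
K = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
probs_add = proposal_probs([(0, 1), (0, 2), (1, 2)], K, add=True)
# Strongly correlated pairs are proposed for addition most often.
```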

SLIDE 36

Numerical experiments

Simulated datasets: p = 9, n = 100, δ = 1 and τ = 0.03. γk = 1 during the first iterations (1 ≤ k ≤ 100) and γk = (k − 100)^−1 during the subsequent iterations. M = 500 during the first 5 iterations and then M = 10.
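The step-size schedule above satisfies the SAEM requirement (positive, decreasing to zero after the burn-in); written out as code:

```python
def gamma(k, burn_in=100):
    """SAEM step sizes of the experiments: gamma_k = 1 for the first
    burn_in iterations (EM-like moves with no memory), then
    gamma_k = 1/(k - burn_in), decreasing to zero."""
    return 1.0 if k <= burn_in else 1.0 / (k - burn_in)

schedule = [gamma(k) for k in (1, 100, 101, 110, 200)]
# -> [1.0, 1.0, 1.0, 0.1, 0.01]
```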

SLIDE 37

[Figure] Simulated datasets: evolution of the SAEM-MCMC estimates τ̂(k) (left) and r̂(k) (right) on 2 datasets.

SLIDE 38

Real datasets

Frets' heads dataset contains head measurements on the first and the second adult son in a sample of 25 families. The 4 variables are the head length of the first son, the head breadth of the first son, the head length of the second son and the head breadth of the second son. In this case p = 4, and 61 of the 64 possible graphs are decomposable. On this dataset we aim to show that the hyper-parameters τ and r have to be carefully chosen.

SLIDE 39

[Figure] Frets' heads dataset: the three most probable graphs on vertices 1–4 under each hyper-parameter choice, with posterior probabilities:

Jones et al. (2005): 0.24076, 0.16924, 0.11761
Carvalho and Scott (2009): 0.30512, 0.19979, 0.10813

SLIDE 40

[Figure] SAEM empirical Bayes calibration: the three most probable graphs, with posterior probabilities 0.28613, 0.18219, 0.1264.

SLIDE 41

In the Fowl bones dataset, bone measurements are taken on n = 276 white leghorn fowl. The 6 variables are skull length, skull breadth, humerus (wings), ulna (wings), femur (legs) and tibia (legs). We aim to illustrate the fact that a careful choice of the transition kernel in the MCMC algorithm ensures a better exploration of the support of the posterior distribution.

SLIDE 42
[Figure] Fowl bones dataset: densities of the relative errors on the posterior probabilities for the 107 most probable graphs; add-and-delete kernel in solid line, data-driven kernel in dashed line.
