SLIDE 1 LOCAL and GLOBAL INDEPENDENCE
in DISCRETE BAYESIAN GRAPHICAL MODELS
Jacek Wesołowski (GUS & Politechnika Warszawska, Warszawa), joint work with H. Massam (York Univ., Toronto). XLII Konferencja "STATYSTYKA MATEMATYCZNA", Będlewo, Nov. 28 - Dec. 2, 2016.
SLIDE 2 Plan
1. Introduction
2. Markovian structure imposed by DAGs
3. Local and global independence vs. HD law
SLIDE 3 Plan
1. Introduction
2. Markovian structure imposed by DAGs
3. Local and global independence vs. HD law
SLIDE 4 Discrete model
Let X = (X_v, v ∈ V) be a random vector assuming values in I = ×_{v∈V} I_v, where #(I_v) < ∞, v ∈ V. We write p(i) := P_X(i) = P(X = i), i ∈ I.
Let X_1, ..., X_n be iid with distribution P_X and let M_i = Σ_{j=1}^n I(X_j = i), i ∈ I. Then M = (M_i, i ∈ I) has a multinomial distribution, i.e.
P(M = m) = \binom{n}{m} ∏_{i∈I} p(i)^{m_i}, m = (m_i, i ∈ I), Σ_{i∈I} m_i = n.
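A minimal simulation sketch of this setup (an illustrative aside, not from the slides; the probability table `p` below is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative p(i) on a 4-cell state space I (values are assumptions).
p = np.array([0.1, 0.2, 0.3, 0.4])
n = 1000

# n iid copies of X; M_i counts how many of the X_j fall in cell i.
X = rng.choice(len(p), size=n, p=p)
M = np.bincount(X, minlength=len(p))
print(M, M.sum())  # M ~ Multinomial(n, p); the counts sum to n
```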
SLIDE 5 Dirichlet law as an a priori distribution
The Bayesian approach means that one imposes some prior distribution on the parameter π = (p(i), i ∈ I). Since the only restrictions on π are p(i) ≥ 0, i ∈ I, and Σ_{i∈I} p(i) = 1, we need a probability measure supported on a unit simplex of the proper dimension.
A random vector (Y_1, ..., Y_r) has a (classical) Dirichlet distribution D(α_i, i = 1, ..., r) if the distribution of (Y_1, ..., Y_{r−1}) has the density
f(y_1, ..., y_{r−1}) = [Γ(α) / ∏_{i=1}^r Γ(α_i)] ∏_{i=1}^r y_i^{α_i − 1} I_{T_r}(y),
where α = Σ_{i=1}^r α_i and y_r = 1 − y_1 − ... − y_{r−1}.
SLIDE 6 Dirichlet conjugacy and moments
If π = (p(i), i ∈ I) has a Dirichlet distribution D(α_i, i ∈ I), then the a posteriori law is also Dirichlet: π|M ∼ D(α_i + M_i, i ∈ I).
Exercise: Prove conjugacy of the Dirichlet law using only the form of its joint moments
E ∏_{i∈I} p(i)^{r_i} = ∏_{i∈I} (α_i)_{r_i} / (α)_r,
where r = Σ_{i∈I} r_i and (a)_s = Γ(a+s)/Γ(a).
Note that in this case the moments uniquely determine the distribution.
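As a hedged numerical aside (not from the slides), the moment formula can be checked by Monte Carlo; the values of α and r below are arbitrary illustrative choices:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def poch(a, s):
    # (a)_s = Gamma(a+s)/Gamma(a), computed on the log scale for stability
    return math.exp(math.lgamma(a + s) - math.lgamma(a))

alpha = np.array([1.5, 2.0, 0.7, 3.0])   # illustrative Dirichlet weights
r = np.array([2, 1, 0, 3])               # illustrative moment exponents

# Monte Carlo estimate of E prod_i p(i)^{r_i} under D(alpha)
Y = rng.dirichlet(alpha, size=200_000)
mc = np.mean(np.prod(Y ** r, axis=1))

exact = np.prod([poch(a, s) for a, s in zip(alpha, r)]) / poch(alpha.sum(), r.sum())
print(mc, exact)  # the two numbers should agree to 2-3 digits
```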
SLIDE 7 Example
Let X = (X_1, X_2, X_3) assume values in I = {0, 1}^3. Obviously,
P(X = i) = P(X_1 = i_1) P(X_2 = i_2 | X_1 = i_1) P(X_3 = i_3 | X_1 = i_1, X_2 = i_2).
This is different from the Markov structure imposed by
P(X = i) = P(X_1 = i_1) P(X_2 = i_2 | X_1 = i_1) P(X_3 = i_3 | X_2 = i_2),
associated with the ordered graph 1 → 2 → 3, which is equivalent to
p(000)p(101) = p(100)p(001)   (1)
and
p(010)p(111) = p(110)p(011).   (2)
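An illustrative check (my addition, with arbitrary Dirichlet-sampled conditionals) that any joint built along 1 → 2 → 3 satisfies (1) and (2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary conditionals for the chain 1 -> 2 -> 3 on {0,1}^3.
p1 = rng.dirichlet([1, 1])            # P(X1 = .)
p2g1 = rng.dirichlet([1, 1], size=2)  # row k: P(X2 = . | X1 = k)
p3g2 = rng.dirichlet([1, 1], size=2)  # row k: P(X3 = . | X2 = k)

p = {(i1, i2, i3): p1[i1] * p2g1[i1][i2] * p3g2[i2][i3]
     for i1 in (0, 1) for i2 in (0, 1) for i3 in (0, 1)}

# Cross-product conditions (1) and (2), i.e. X1 independent of X3 given X2.
print(np.isclose(p[0,0,0] * p[1,0,1], p[1,0,0] * p[0,0,1]))  # (1): True
print(np.isclose(p[0,1,0] * p[1,1,1], p[1,1,0] * p[0,1,1]))  # (2): True
```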
SLIDE 8 Example, cont.
Conditions (1) and (2) are equivalent to each of the Markov structures imposed by the two other ordered graphs with skeleton 1 − 2 − 3:
1 ← 2 ← 3, i.e. P(X = i) = P(X_1 = i_1 | X_2 = i_2) P(X_2 = i_2 | X_3 = i_3) P(X_3 = i_3);
1 ← 2 → 3, i.e. P(X = i) = P(X_1 = i_1 | X_2 = i_2) P(X_2 = i_2) P(X_3 = i_3 | X_2 = i_2).
SLIDE 9 Example, cont.: prior on π
We seek a convenient prior on π, i.e. a probability measure on the (5-dimensional) manifold in [0, ∞)^8 described by the equations
x_1 + ... + x_8 = 1, x_1 x_2 = x_3 x_4, x_5 x_6 = x_7 x_8.
Some Dirichlet-like distribution would be fine!
SLIDE 10 Example, cont.: one more ordered graph
The graph 1 → 2 ← 3 introduces a different Markov structure:
P(X = i) = P(X_1 = i_1) P(X_2 = i_2 | X_1 = i_1, X_3 = i_3) P(X_3 = i_3).
Equivalently,
(p(000) + p(010))(p(101) + p(111)) = (p(100) + p(110))(p(001) + p(011)).
So we seek a probability measure on the (6-dimensional) manifold in [0, ∞)^8 defined by
x_1 + ... + x_8 = 1, (x_1 + x_5)(x_2 + x_6) = (x_3 + x_7)(x_4 + x_8).
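A companion sketch (again with arbitrary illustrative conditionals) checking that a joint built along 1 → 2 ← 3 satisfies the displayed sum constraint, which encodes marginal independence of X_1 and X_3:

```python
import numpy as np

rng = np.random.default_rng(3)

# Conditionals for the immoral DAG 1 -> 2 <- 3 (illustrative values).
p1 = rng.dirichlet([1, 1])                  # P(X1 = .)
p3 = rng.dirichlet([1, 1])                  # P(X3 = .)
p2g13 = rng.dirichlet([1, 1], size=(2, 2))  # P(X2 = . | X1, X3)

p = {(i1, i2, i3): p1[i1] * p2g13[i1][i3][i2] * p3[i3]
     for i1 in (0, 1) for i2 in (0, 1) for i3 in (0, 1)}

# The quadratic constraint of slide 10, i.e. X1 independent of X3.
lhs = (p[0,0,0] + p[0,1,0]) * (p[1,0,1] + p[1,1,1])
rhs = (p[1,0,0] + p[1,1,0]) * (p[0,0,1] + p[0,1,1])
print(np.isclose(lhs, rhs))  # True
```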
SLIDE 11 Plan
1. Introduction
2. Markovian structure imposed by DAGs
3. Local and global independence vs. HD law
SLIDE 12 DAG
For a graph G = (V, E) define a DAG (directed acyclic graph) with skeleton G by changing all unordered edges in E into arrows in an acyclic way. A DAG can be identified with a parent function p : V → 2^V defined by p(v) = {w ∈ V : w → v}, v ∈ V, and having the "acyclicity" property: ∀ k ≥ 1, {v} ∩ p^k(v) = ∅. We will also use another function, q : V → 2^V, defined by q(v) = {v} ∪ p(v), v ∈ V.
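A minimal sketch (not from the slides) of a DAG stored as its parent function, with the acyclicity property checked by iterating p setwise; the chain 1 → 2 → 3 is an illustrative choice:

```python
# A DAG as a parent function p : V -> 2^V (here the chain 1 -> 2 -> 3),
# plus q(v) = {v} ∪ p(v) and a check of the acyclicity property.
parents = {1: set(), 2: {1}, 3: {2}}

def q(v):
    return {v} | parents[v]

def is_acyclic(parents):
    # {v} ∩ p^k(v) must be empty for all k >= 1, where p acts on sets.
    for v in parents:
        reach = set(parents[v])
        for _ in range(len(parents)):
            if v in reach:
                return False
            reach |= {w for u in reach for w in parents[u]}
        if v in reach:
            return False
    return True

print(q(2), is_acyclic(parents))  # {1, 2} True
```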
SLIDE 13 p-Markov model
Let p be a DAG with a chordal skeleton G = (V, E). X (or π = (p(i), i ∈ I)) is called p-Markov iff
p(i) = P(X = i) = ∏_{v∈V} p^{v|p(v)}_{i_v|i_{p(v)}}, ∀ i ∈ I,
where p^{v|p(v)}_{i_v|i_{p(v)}} := P(X_v = i_v | X_{p(v)} = i_{p(v)}).
Note that
p^{v|p(v)}_{m|n} = p^{q(v)}_{(n,m)} / p^{p(v)}_n, m ∈ I_v, n ∈ I_{p(v)},
where p^A_n = Σ_{j∈I_{V\A}} p((j, n)) = P(X_A = n), n ∈ I_A, A ⊂ V.
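The factorization can be sketched in code as follows (my addition; the parent function is the chain again, and the conditional tables are illustrative Dirichlet draws):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

# Binary variables; a moral DAG given by its parent function.
parents = {1: (), 2: (1,), 3: (2,)}
V = sorted(parents)

# cond[v][n] = P(X_v = . | X_{p(v)} = n): one row per parent configuration n.
cond = {v: {n: rng.dirichlet([1, 1])
            for n in itertools.product((0, 1), repeat=len(parents[v]))}
        for v in V}

def joint(i):
    # p(i) = prod_v P(X_v = i_v | X_{p(v)} = i_{p(v)})
    x = dict(zip(V, i))
    return np.prod([cond[v][tuple(x[w] for w in parents[v])][x[v]] for v in V])

total = sum(joint(i) for i in itertools.product((0, 1), repeat=len(V)))
print(total)  # 1.0: the factorization defines a probability on I
```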
SLIDE 14 Moral DAGs
A DAG p with chordal skeleton G = (V, E) is moral if ∀ v ∈ V the subgraph induced in G by p(v) ⊂ V is complete. π is p′-Markov for a moral DAG p′ with a chordal skeleton G iff π is p-Markov with respect to any moral DAG p with the same skeleton G. The family of DAGs with skeleton 1 − 2 − 3 splits into the moral DAGs 1 → 2 → 3, 1 ← 2 ← 3, 1 ← 2 → 3 and the immoral DAG 1 → 2 ← 3.
SLIDE 15 Cliques and separators
Let G = (V, E) be a chordal graph. Any maximal induced complete subgraph is called a clique. Denote by C the set of cliques of G. A perfect ordering of cliques is a numbering C_1, ..., C_K of the elements of C such that
∀ j = 2, ..., K ∃ i < j : S_j := C_j ∩ (C_1 ∪ ... ∪ C_{j−1}) ⊂ C_i.
S = {(S_1 = ∅), S_j, j = 2, ..., K} is called the set of separators.
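A brute-force sketch (my addition) that finds perfect orderings and their separators for a small chordal graph; the chain 1 − 2 − 3 − 4 with cliques {1,2}, {2,3}, {3,4} is an illustrative example:

```python
from itertools import permutations

# Cliques of the chordal chain graph 1 - 2 - 3 - 4.
cliques = [{1, 2}, {2, 3}, {3, 4}]

def perfect_orderings(cliques):
    # o = (C1,...,CK) is perfect if each S_j = C_j ∩ (C_1 ∪ ... ∪ C_{j-1})
    # is contained in some earlier clique C_i, i < j.
    for o in permutations(cliques):
        seps, ok = [set()], True
        for j in range(1, len(o)):
            S = o[j] & set().union(*o[:j])
            if not any(S <= o[i] for i in range(j)):
                ok = False
                break
            seps.append(S)
        if ok:
            yield o, seps

for o, seps in perfect_orderings(cliques):
    print(o, seps)
# e.g. ({1,2}, {2,3}, {3,4}) with separators [set(), {2}, {3}]
```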
SLIDE 16 G-Markov model
For a chordal G = (V, E) we say that X (or π) is G-Markov if
p(i) = ∏_{C∈C} p^C_{i_C} / ∏_{S∈S} p^S_{i_S}, i ∈ I,
where p^A_{i_A} = P(X_A = i_A) and X_A = (X_v, v ∈ A), for A ⊂ V.
Equivalently, X (or π) is p-Markov for (any) moral DAG p with skeleton G, i.e.
p(i) = ∏_{v∈V} p^{v|p(v)}_{i_v|i_{p(v)}}, i ∈ I.
Equivalently, X_w ⊥ X_v | X_{V\{w,v}} whenever {w, v} ∉ E.
SLIDE 17 Dawid & Lauritzen, Ann. Statist. (1993)
Assume that π is G-Markov, where G = (V, E) is a chordal graph. We say that π has a hyper-Dirichlet distribution, HD(α^C_m, m ∈ I_C, C ∈ C), iff its moments are
E ∏_{i∈I} p(i)^{r_i} = ∏_{C∈C} ∏_{m∈I_C} (α^C_m)_{r^C_m} / ∏_{S∈S} ∏_{n∈I_S} (α^S_n)_{r^S_n},
where for S ∋ S ⊂ C ∈ C
α^S_n = Σ_{m∈I_{C\S}} α^C_{(m,n)}, n ∈ I_S, and r^A_m = Σ_{n∈I_{V\A}} r_{(m,n)}, m ∈ I_A.
SLIDE 18 HD distribution
Equivalently, for any moral DAG p (with skeleton G), in the decomposition
p(i) = ∏_{v∈V} p^{v|p(v)}_{i_v|i_{p(v)}}, i ∈ I,
the vectors of conditional probabilities (p^{v|p(v)}_{i_v|i_{p(v)}}, i_v ∈ I_v), i_{p(v)} ∈ I_{p(v)}, v ∈ V, are independent and have classical Dirichlet distributions D(α^{v|p(v)}_{i_v|i_{p(v)}}, i_v ∈ I_v).
Then ∀ C ∈ C and ∀ i_C ∈ I_C
α^C_{i_C} = α^{v|p(v)}_{i_v|i_{p(v)}} whenever C = {v} ∪ p(v) = q(v).
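A sketch of one draw of π along the moral DAG 1 → 2 → 3 (my addition; the clique hyperparameters aC1, aC2 are illustrative values chosen consistent on the separator, as the identification above requires):

```python
import numpy as np

rng = np.random.default_rng(5)

# HD hyperparameters on the cliques {1,2} and {2,3} of the chain 1 - 2 - 3.
# Consistency on the separator {2}: column sums of aC1 equal row sums of
# aC2, both (4, 4).
aC1 = np.array([[1., 2.], [3., 2.]])   # alpha^{C1}_{(i1, i2)}
aC2 = np.array([[1., 3.], [2., 2.]])   # alpha^{C2}_{(i2, i3)}

# Moral DAG 1 -> 2 -> 3: draw the conditional rows independently.
p1 = rng.dirichlet(aC1.sum(axis=1))                       # P(X1 = .)
p2g1 = np.array([rng.dirichlet(aC1[k]) for k in (0, 1)])  # rows ~ D(aC1[k,.])
p3g2 = np.array([rng.dirichlet(aC2[k]) for k in (0, 1)])  # rows ~ D(aC2[k,.])

pi = {(i1, i2, i3): p1[i1] * p2g1[i1, i2] * p3g2[i2, i3]
      for i1 in (0, 1) for i2 in (0, 1) for i3 in (0, 1)}
print(sum(pi.values()))  # 1.0; pi is one draw from the hyper-Dirichlet law
```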
SLIDE 19 Multinomial mixture
Let X_1, ..., X_m be observations on X and M = (Σ_{k=1}^m I(X_k = i), i ∈ I). The conditional law of M given π is a multinomial distribution with parameters m and π = (p(i), i ∈ I).
SLIDE 20 HD as a conjugate prior law
Th. If the a priori law of π is HD(α^C_m, m ∈ I_C, C ∈ C), then the posterior law of π|M is also hyper-Dirichlet, HD(α^C_m + M^C_m, m ∈ I_C, C ∈ C), where
M^C_m = Σ_{n∈I_{V\C}} M_{(m,n)}, m ∈ I_C.
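In code, the update is just an addition on clique margins; a sketch continuing the chain example (the count table M below is illustrative):

```python
import numpy as np

# Hyper-Dirichlet update for the chain 1 - 2 - 3: the posterior adds the
# clique-marginal counts M^C to the prior weights alpha^C.
aC1 = np.array([[1., 2.], [3., 2.]])   # prior alpha^{C1} on (i1, i2)
aC2 = np.array([[1., 3.], [2., 2.]])   # prior alpha^{C2} on (i2, i3)

# Illustrative count table M over I = {0,1}^3, axes (i1, i2, i3).
M = np.array([[[5, 1], [2, 7]],
              [[0, 4], [3, 8]]])

MC1 = M.sum(axis=2)   # M^{C1}_{(i1,i2)}: sum over i3
MC2 = M.sum(axis=0)   # M^{C2}_{(i2,i3)}: sum over i1

print(aC1 + MC1)      # posterior clique weights on {1,2}
print(aC2 + MC2)      # posterior clique weights on {2,3}
```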
SLIDE 21 Proof
The generalized Bayes rule reads
E[∏_{i∈I} p(i)^{r_i} | M = m] = E ∏_{i∈I} p(i)^{r_i + m_i} / E ∏_{i∈I} p(i)^{m_i}.
Apply the moment formula for the HD distribution in the numerator and denominator:
E[∏_{i∈I} p(i)^{r_i} | M = m] = ∏_{C∈C} ∏_{j∈I_C} (α^C_j)_{r^C_j + m^C_j} / (α^C_j)_{m^C_j} · ∏_{S∈S} ∏_{n∈I_S} (α^S_n)_{m^S_n} / (α^S_n)_{r^S_n + m^S_n},
where m^A_n = Σ_{j∈I_{V\A}} m_{(n,j)}, n ∈ I_A, A ⊂ V.
SLIDE 22 Proof, cont.
Since (a)_{b+c} / (a)_b = (a + b)_c, the last formula gives
E[∏_{i∈I} p(i)^{r_i} | M = m] = ∏_{C∈C} ∏_{j∈I_C} (α^C_j + m^C_j)_{r^C_j} / ∏_{S∈S} ∏_{n∈I_S} (α^S_n + m^S_n)_{r^S_n},
which is the moment formula of HD(α^C_j + m^C_j, j ∈ I_C, C ∈ C). ✷
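A quick numeric check (my aside) of the Pochhammer identity used in this step:

```python
import math

def poch(a, s):
    # (a)_s = Gamma(a+s)/Gamma(a)
    return math.exp(math.lgamma(a + s) - math.lgamma(a))

a, b, c = 2.5, 3.0, 4.0
print(poch(a, b + c) / poch(a, b), poch(a + b, c))  # equal values
```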
SLIDE 23 p-Dirichlet and P-Dirichlet distributions
Let p be a moral DAG with a chordal skeleton G = (V, E). A G-Markov random vector π has a p-Dirichlet law iff the random vectors (p^{v|p(v)}_{m|n}, m ∈ I_v), n ∈ I_{p(v)}, v ∈ V, have (classical) Dirichlet laws and are independent.
Let P be a family of moral DAGs with a chordal skeleton G = (V, E). We say that a G-Markov π has a P-Dirichlet distribution if it has a p-Dirichlet distribution ∀ p ∈ P.
SLIDE 24 HD as a special P-Dirichlet law
Let P be the family of all moral DAGs with the chordal skeleton G. If a G-Markov π has a P-Dirichlet distribution, then π has a HD distribution.
Question: Can we have a similar description of the HD law through a smaller family P?
SLIDE 25 p-perfect ordering of cliques
Let p be a moral DAG with a (chordal) skeleton G = (V, E). A perfect ordering of cliques o = (C_1, ..., C_K) is called p-perfect (notation: o_p) if
∀ ℓ = 1, ..., K ∃ v ∈ C_ℓ \ S_ℓ : S_ℓ = p(v).
Lemma. For any moral DAG p there exists a p-perfect ordering of cliques.
SLIDE 26 Pairing a separator with a clique
For S ∈ S and C ∈ C such that S ⊂ C we say that S and C are paired by a perfect ordering of cliques o = (C_1, ..., C_K) (notation: S →_o C) iff ∃ ℓ ∈ {1, ..., K} : S = S_ℓ and C = C_ℓ.
We say that a family P of moral DAGs (with a chordal skeleton G = (V, E)) is a pairing family if ∀ S ∈ S, C ∈ C such that S ⊂ C ∃ p ∈ P : S →_{o_p} C.
SLIDE 27 HD as a P-Dirichlet law
Th. (MW'16) Let P be a family of moral DAGs with a chordal skeleton G = (V, E). Assume that P is a pairing family with p(V) = S. If L is a P-Dirichlet law, then L is necessarily a hyper-Dirichlet distribution.
Of course, p(V) = {p(v), v ∈ V}.
SLIDE 28 Plan
1. Introduction
2. Markovian structure imposed by DAGs
3. Local and global independence vs. HD law
SLIDE 29 Independencies
If π (G-Markov wrt a chordal G = (V, E)) has a hyper-Dirichlet distribution, then for any moral DAG p with skeleton G:
the random vectors ((p^{v|p(v)}_{i|n}, i ∈ I_v), n ∈ I_{p(v)}), v ∈ V, are independent (global independence of parameters, notation: GI(p));
for an arbitrary fixed v ∈ V, the random vectors (p^{v|p(v)}_{i|n}, i ∈ I_v), n ∈ I_{p(v)}, are independent (local independence of parameters, notation: LI(p)).
SLIDE 30 Heckerman, Geiger and Chickering (1995); Geiger and Heckerman (1997)
Let G = (V, E) be a complete graph with d vertices. Any DAG (all such DAGs are moral) is uniquely determined by an ordering of the vertices: p = (v_1, ..., v_d) iff #p(v_j) = j − 1, j = 1, ..., d.
For a complete graph, they proved that (under some smoothness assumptions for densities) the independence conditions GI and LI wrt the DAGs p = (1, 2, 3, ..., d − 1, d) and p′ = (d, 1, 2, ..., d − 2, d − 1) imply that the distribution of π is necessarily classical Dirichlet, which is the hyper-Dirichlet law of a complete graph.
SLIDE 31 Separation and characterization of P-Dirichlet
We say that a family P of moral DAGs (with a chordal skeleton G = (V, E)) is a separating family if ∀ v ∈ V ∃ p, p′ ∈ P : p(v) ≠ p′(v).
Th. (MW'16) Let π be G-Markov, where G = (V, E) is chordal. Let P be a separating family of moral DAGs with skeleton G. If ∀ p ∈ P the independence conditions GI(p) and LI(p) hold, then π has a P-Dirichlet distribution.
SLIDE 32 Characterization of the hyper-Dirichlet law
Cor. 0 Let P be a pairing and separating family of moral DAGs (with a chordal skeleton G = (V, E)) satisfying p(V) = S. If ∀ p ∈ P the independence conditions GI(p) and LI(p) hold, then π has a hyper-Dirichlet distribution.
SLIDE 33 The case of a chain
For a chain G = 1 − 2 − ... − d consider two DAGs:
p = 1 → 2 → ... → d and p′ = 1 ← 2 ← ... ← d.
Cor. 1 If the random vectors (p^{j|j−1}_{ℓ|k}, ℓ ∈ I_j), k ∈ I_{j−1}, j = 1, ..., d (with I_0 = ∅), are jointly independent, and the random vectors (p^{j|j+1}_{ℓ|k}, ℓ ∈ I_j), k ∈ I_{j+1}, j = 1, ..., d (with I_{d+1} = ∅), are also jointly independent, then π has a hyper-Dirichlet distribution.
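The two families in Cor. 1 are the conditionals of the same joint read in the two directions; a sketch (my addition, with illustrative conditionals) recovering the reversed parameterization from a joint built along p:

```python
import numpy as np

rng = np.random.default_rng(6)

# Joint on {0,1}^3 built along p = 1 -> 2 -> 3 (illustrative conditionals).
p1 = rng.dirichlet([1, 1])
p2g1 = rng.dirichlet([1, 1], size=2)
p3g2 = rng.dirichlet([1, 1], size=2)
P = np.einsum('a,ab,bc->abc', p1, p2g1, p3g2)   # P[i1, i2, i3]

# Conditionals for the reversed DAG p' = 1 <- 2 <- 3 (second family in Cor. 1).
P12, P23 = P.sum(axis=2), P.sum(axis=0)
p1g2 = (P12 / P12.sum(axis=0)).T                # rows k: P(X1 = . | X2 = k)
p2g3 = (P23 / P23.sum(axis=0)).T                # rows k: P(X2 = . | X3 = k)
p3 = P23.sum(axis=0)                            # P(X3 = .)

# Both parameterizations recover the same joint law.
print(np.allclose(P, np.einsum('ba,cb,c->abc', p1g2, p2g3, p3)))  # True
```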
SLIDE 34 The case of a tree
Let T = (V, E) be a tree, i.e. a connected undirected graph without cycles. A vertex v ∈ V is a leaf if it has only one neighbour. Let L ⊂ V denote the set of leaves of the tree T. For a DAG p with skeleton T, a vertex v ∈ V is called a root if p(v) = ∅. If p is a moral DAG (with skeleton T), then the (unique) root vertex v determines the DAG uniquely (notation: p_v).
Cor. 2 Assume that π has the independence properties GI(p_v) and LI(p_v) ∀ v ∈ L. Then π has a hyper-Dirichlet distribution.
SLIDE 35 The case of a complete graph
Recall that any DAG with skeleton G being a complete graph (all such DAGs are moral) is uniquely determined by an ordering of the vertices: (v_1, ..., v_d) means that #p(v_j) = j − 1, j = 1, ..., d. For two such DAGs, p = (v_1, ..., v_d) and p′ = (v′_1, ..., v′_d), consider the condition
∀ j = 2, ..., d: p(v_j) ≠ p′(v′_j).   (3)
Cor. 3 Assume that π has the independence properties GI and LI wrt p and p′ satisfying (3). Then π has a classical Dirichlet distribution.
SLIDE 36 Heckerman, Geiger and Chickering (1995), revisited
For a complete graph, HGC assumed the independence conditions GI and LI wrt the DAGs p = (1, 2, 3, ..., d − 1, d) and p′ = (d, 1, 2, ..., d − 2, d − 1). Note that for j = 2, 3, ..., d we have d ∈ p′(v′_j) and d ∉ p(v_j), so that condition (3) is satisfied. Consequently, the HGC characterization is an immediate consequence of Cor. 3. Therefore it holds without the regularity assumptions made in HGC (1995).
SLIDE 37 Literature
[1] ANDERSSON, S.A., MADIGAN, D., PERLMAN, M.D. (1997) A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25, 505-541.
[2] DAWID, A.P., LAURITZEN, S.L. (1993) Hyper-Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist. 21, 1272-1317.
[3] GEIGER, D., HECKERMAN, D. (1997) A characterization of the Dirichlet distribution through global and local parameter independence. Ann. Statist. 25, 1344-1369.
[4] HECKERMAN, D., GEIGER, D., CHICKERING, D.M. (1995) Learning Bayesian networks: The combination of knowledge and statistical data. Mach. Learn. 20, 197-243.
[5] LAURITZEN, S.L. (1996) Graphical Models. Oxford Univ. Press.
[6] MASSAM, H., WESOŁOWSKI, J. (2016) A new prior for discrete DAG models with a restricted set of directions. Ann. Statist. 44, 1010-1037 (with Supplement).