  1. LOCAL and GLOBAL INDEPENDENCE of PARAMETERS in DISCRETE BAYESIAN GRAPHICAL MODELS. Jacek Wesołowski (GUS & Politechnika Warszawska, Warszawa), with H. Massam (York Univ., Toronto). XLII Konferencja "STATYSTYKA MATEMATYCZNA", Będlewo, Nov. 28 - Dec. 2, 2016.

  2. Plan: 1. Introduction; 2. Markovian structure imposed by DAGs; 3. Local and global independence vs. the HD law.

  3. Section divider: 1. Introduction.

  4. Discrete model. Let $X = (X_v,\ v \in V)$ be a random vector assuming values in $\mathcal{I} = \times_{v \in V} \mathcal{I}_v$, where $\#(\mathcal{I}_v) < \infty$, $v \in V$. We write $p(i) := P_X(i) = P(X = i)$, $i \in \mathcal{I}$. Let $X_1, \ldots, X_n$ be iid with distribution $P_X$, and let $M_i = \sum_{j=1}^{n} I(X_j = i)$, $i \in \mathcal{I}$. Then $M = (M_i,\ i \in \mathcal{I})$ has a multinomial distribution, i.e. $P(M = m) = \binom{n}{m} \prod_{i \in \mathcal{I}} p(i)^{m_i}$, where $m = (m_i,\ i \in \mathcal{I})$ and $\sum_{i \in \mathcal{I}} m_i = n$.
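As a minimal numerical sketch of the counts $M_i$ and the multinomial probability above (the two-variable state space, the probabilities, and the sample size are toy assumptions, not from the talk):

```python
import math
import random
from collections import Counter

# Assumed toy state space I = {0,1} x {0,1} with an assumed distribution p(i).
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def multinomial_pmf(m, n, p):
    """P(M = m) = (n choose m) * prod_i p(i)^{m_i}."""
    coef = math.factorial(n)
    for i in m:
        coef //= math.factorial(m[i])   # multinomial coefficient (n choose m)
    prob = float(coef)
    for i in m:
        prob *= p[i] ** m[i]
    return prob

random.seed(0)
n = 100
sample = random.choices(states, weights=[p[s] for s in states], k=n)
M = Counter(sample)          # M_i = number of sample points X_j equal to i
assert sum(M.values()) == n  # the counts always sum to the sample size
```

For instance, with $n = 2$ and one observation in each of the states $(0,0)$ and $(1,1)$, the pmf evaluates to $2 \cdot 0.1 \cdot 0.4 = 0.08$.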

  5. Dirichlet law as an a priori distribution. The Bayesian approach means that one imposes some distribution on $\pi = (p(i),\ i \in \mathcal{I})$. Since the only restrictions on $\pi$ are $p(i) \ge 0$, $i \in \mathcal{I}$, and $\sum_{i \in \mathcal{I}} p(i) = 1$, we need a probability measure supported on a unit simplex of proper dimension. A random vector $(Y_1, \ldots, Y_r)$ has a (classical) Dirichlet distribution $D(\alpha_i,\ i = 1, \ldots, r)$ if the density of the distribution of $(Y_1, \ldots, Y_{r-1})$ has the form $f(y_1, \ldots, y_{r-1}) = \frac{\Gamma(\alpha)}{\prod_{i=1}^{r} \Gamma(\alpha_i)} \prod_{i=1}^{r} y_i^{\alpha_i - 1}\, I_{T_r}(y)$, where $\alpha = \sum_{i=1}^{r} \alpha_i$ and $y_r = 1 - y_1 - \ldots - y_{r-1}$.

  6. Dirichlet conjugacy and moments. If $\pi = (p(i),\ i \in \mathcal{I})$ has a Dirichlet distribution $D(\alpha_i,\ i \in \mathcal{I})$, then the a posteriori law is also Dirichlet: $\pi \mid M \sim D(\alpha_i + M_i,\ i \in \mathcal{I})$. Exercise: prove conjugacy of the Dirichlet law using only the form of its joint moments $E \prod_{i \in \mathcal{I}} p(i)^{r_i} = \frac{\prod_{i \in \mathcal{I}} (\alpha_i)_{r_i}}{(\alpha)_r}$, where $r = \sum_{i \in \mathcal{I}} r_i$ and $(a)_s = \frac{\Gamma(a+s)}{\Gamma(a)}$. Note that in this case the moments uniquely determine the distribution.
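A small sketch of the conjugate update and the Pochhammer symbol $(a)_s$ used in the moment formula (the prior parameters and the observed counts below are assumed toy numbers):

```python
from math import gamma

def rising(a, s):
    """Pochhammer symbol (a)_s = Gamma(a + s) / Gamma(a)."""
    return gamma(a + s) / gamma(a)

# Assumed prior parameters alpha_i and observed counts M_i over four cells.
alpha = {"00": 1.0, "01": 2.0, "10": 3.0, "11": 4.0}
M     = {"00": 5,   "01": 0,   "10": 2,   "11": 3}

# Conjugacy: the posterior is Dirichlet with parameters alpha_i + M_i.
post = {i: alpha[i] + M[i] for i in alpha}

# First posterior moment, from the stated formula with r_i = 1 for one cell:
# E[p(i) | M] = (alpha_i + M_i) / (alpha + n).
a_tot = sum(post.values())
mean = {i: post[i] / a_tot for i in post}
```

Here `rising(2, 3)` gives $\Gamma(5)/\Gamma(2) = 24$, and the posterior mean of the first cell is $(1 + 5)/(10 + 10) = 0.3$.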

  7. Example. Let $X = (X_1, X_2, X_3)$ assume values in $\mathcal{I} = \{0, 1\}^3$. Obviously, $P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1)\, P(X_3 = i_3 \mid X_1 = i_1, X_2 = i_2)$. This is different from the Markov structure imposed by $P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1)\, P(X_3 = i_3 \mid X_2 = i_2)$, associated with the ordered graph $1 \to 2 \to 3$, which is equivalent to $p(000)\, p(101) = p(100)\, p(001)$ (1) and $p(010)\, p(111) = p(110)\, p(011)$ (2).
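The equivalence can be checked numerically: building $p(i)$ from the chain factorization with assumed (toy) conditional tables, conditions (1) and (2) hold exactly.

```python
import itertools

# Assumed toy conditional tables for the chain 1 -> 2 -> 3.
p1 = {0: 0.3, 1: 0.7}                                  # P(X1)
p2 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}        # P(X2 | X1)
p3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}        # P(X3 | X2)

# Joint law via the chain factorization.
p = {(i1, i2, i3): p1[i1] * p2[i1][i2] * p3[i2][i3]
     for i1, i2, i3 in itertools.product((0, 1), repeat=3)}

# Conditions (1) and (2): p(000)p(101) = p(100)p(001), p(010)p(111) = p(110)p(011).
lhs1, rhs1 = p[0, 0, 0] * p[1, 0, 1], p[1, 0, 0] * p[0, 0, 1]
lhs2, rhs2 = p[0, 1, 0] * p[1, 1, 1], p[1, 1, 0] * p[0, 1, 1]
```

Both sides of (1) reduce to $p_1(0) p_1(1)\, p_2(0 \mid 0) p_2(0 \mid 1)\, p_3(0 \mid 0) p_3(1 \mid 0)$, which is why the identity holds for any choice of tables.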

  8. Example, cont. Conditions (1) and (2) are equivalent to each of the Markov structures imposed by the two other ordered graphs with skeleton $1 - 2 - 3$: $1 \leftarrow 2 \leftarrow 3$, i.e. $P(X = i) = P(X_1 = i_1 \mid X_2 = i_2)\, P(X_2 = i_2 \mid X_3 = i_3)\, P(X_3 = i_3)$; and $1 \leftarrow 2 \to 3$, i.e. $P(X = i) = P(X_1 = i_1 \mid X_2 = i_2)\, P(X_2 = i_2)\, P(X_3 = i_3 \mid X_2 = i_2)$.

  9. Example, cont.: prior on $\pi$. We seek a convenient prior on $\pi$, i.e. a probability measure on the (5-dimensional) manifold in $[0, \infty)^8$ described by the equations $x_1 + \ldots + x_8 = 1$, $x_1 x_2 = x_3 x_4$, $x_5 x_6 = x_7 x_8$. Some Dirichlet-like distribution would be fine!

  10. Example, cont.: one more ordered graph. The graph $1 \to 2 \leftarrow 3$ introduces a different Markov structure: $P(X = i) = P(X_1 = i_1)\, P(X_2 = i_2 \mid X_1 = i_1, X_3 = i_3)\, P(X_3 = i_3)$. Equivalently, $(p(000) + p(010))(p(101) + p(111)) = (p(100) + p(110))(p(001) + p(011))$. So we seek a probability measure on the (6-dimensional) manifold in $[0, \infty)^8$ defined by $x_1 + \ldots + x_8 = 1$ and $(x_1 + x_5)(x_2 + x_6) = (x_3 + x_7)(x_4 + x_8)$.

  11. Section divider: 2. Markovian structure imposed by DAGs.

  12. DAG. For a graph $G = (V, E)$, define a DAG (directed acyclic graph) with skeleton $G$ by changing all unordered edges in $E$ into arrows in an acyclic way. A DAG can be identified with a parent function $p: V \to 2^V$ defined by $p(v) = \{w \in V : w \to v\}$, $v \in V$, having the "acyclicity" property $\{v\} \cap p^k(v) = \emptyset$ for all $k \ge 1$. We will also use another function, $q: V \to 2^V$, defined by $q(v) = \{v\} \cup p(v)$, $v \in V$.
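The parent-function view of a DAG is easy to operate on directly. A minimal sketch (the chain graph below is the running example from the talk; the function names are mine): represent the DAG as the map $v \mapsto p(v)$ and check acyclicity by iterating $p^k(v)$.

```python
# Parent function of the chain 1 -> 2 -> 3: p(1) = {}, p(2) = {1}, p(3) = {2}.
parent = {1: set(), 2: {1}, 3: {2}}

def iterate_parents(parent, vs):
    """p(A) = union of the parent sets of the vertices in A."""
    return set().union(*(parent[v] for v in vs)) if vs else set()

def is_acyclic(parent):
    """Check {v} ∩ p^k(v) = ∅ for all k >= 1 (p^k stabilises within |V| steps)."""
    for v in parent:
        layer = parent[v]                # p^1(v)
        for _ in range(len(parent)):
            if v in layer:
                return False             # v is its own ancestor: a cycle
            layer = iterate_parents(parent, layer)
    return True

# q(v) = {v} ∪ p(v), the closed parent set used later for cliques.
q = {v: {v} | parent[v] for v in parent}
```

The two-cycle `{1: {2}, 2: {1}}` fails the check, while any topologically orderable parent function passes.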

  13. $p$-Markov model. Let $p$ be a DAG with a chordal skeleton $G = (V, E)$. $X$ (or $\pi = (p(i),\ i \in \mathcal{I})$) is called $p$-Markov iff $p(i) = P(X = i) = \prod_{v \in V} p^{v \mid p(v)}_{i_v \mid i_{p(v)}}$ for all $i \in \mathcal{I}$, where $p^{v \mid p(v)}_{i_v \mid i_{p(v)}} := P(X_v = i_v \mid X_{p(v)} = i_{p(v)})$. Note that $p^{v \mid p(v)}_{m \mid n} = \frac{p^{q(v)}((m, n))}{p^{p(v)}(n)}$, $m \in \mathcal{I}_v$, $n \in \mathcal{I}_{p(v)}$, where $p^A_n = \sum_{j \in \mathcal{I}_{V \setminus A}} p((j, n)) = P(X_A = n)$, $n \in \mathcal{I}_A$, $A \subset V$.

  14. Moral DAGs. A DAG $p$ with chordal skeleton $G = (V, E)$ is moral if for every $v \in V$ the subgraph induced in $G$ by $p(v) \subset V$ is complete. $\pi$ is $p'$-Markov for a moral DAG $p'$ with a chordal skeleton $G$ iff $\pi$ is $p$-Markov with respect to any moral DAG $p$ with the same skeleton $G$. The family of DAGs with skeleton $1 - 2 - 3$ splits into the moral DAGs $1 \to 2 \to 3$, $1 \leftarrow 2 \leftarrow 3$, $1 \leftarrow 2 \to 3$, and an immoral DAG $1 \to 2 \leftarrow 3$.

  15. Cliques and separators. Let $G = (V, E)$ be a chordal graph. Any maximal induced complete subgraph is called a clique. Denote by $\mathcal{C}$ the set of cliques of $G$. A perfect ordering of cliques is a numbering $C_1, \ldots, C_K$ of the elements of $\mathcal{C}$ such that for all $j = 2, \ldots, K$ there exists $i < j$ with $S_j := C_j \cap \bigcup_{l=1}^{j-1} C_l \subset C_i$. $\mathcal{S} = \{(S_1 = \emptyset), S_j,\ j = 2, \ldots, K\}$ is called the set of separators.
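The defining condition (the running intersection property) can be sketched as a direct check on a candidate ordering; the cliques below are those of the path graph $1 - 2 - 3$ (and, in the failing example, of the path $1 - 2 - 3 - 4$):

```python
def separators(cliques):
    """Return [S_1 = ∅, S_2, ..., S_K] if the given clique ordering is
    perfect, i.e. each S_j = C_j ∩ (C_1 ∪ ... ∪ C_{j-1}) lies inside some
    earlier clique C_i; return None otherwise."""
    seps = [set()]                       # S_1 = ∅ by convention
    seen = set(cliques[0])
    for j in range(1, len(cliques)):
        s = cliques[j] & seen
        if not any(s <= cliques[i] for i in range(j)):
            return None                  # running intersection property fails
        seps.append(s)
        seen |= cliques[j]
    return seps

# Cliques of the path 1 - 2 - 3, in a perfect order:
print(separators([{1, 2}, {2, 3}]))      # [set(), {2}]
```

For the path $1 - 2 - 3 - 4$, the ordering $(\{1,2\}, \{3,4\}, \{2,3\})$ is not perfect: $S_3 = \{2, 3\}$ is contained in no earlier clique, and the check returns `None`.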

  16. $G$-Markov model. For a chordal $G = (V, E)$ we say that $X$ (or $\pi$) is $G$-Markov if $p(i) = \frac{\prod_{C \in \mathcal{C}} p^C(i_C)}{\prod_{S \in \mathcal{S}} p^S(i_S)}$, $i \in \mathcal{I}$, where $p^A(i_A) = P(X_A = i_A)$ and $X_A = (X_v,\ v \in A)$ for $A \subset V$. Equivalently, $X$ (or $\pi$) is $p$-Markov for (any) moral DAG $p$ with skeleton $G$, i.e. $p(i) = \prod_{v \in V} p^{v \mid p(v)}_{i_v \mid i_{p(v)}}$, $i \in \mathcal{I}$. Equivalently, $X_w \perp X_v \mid X_{V \setminus \{w, v\}}$ whenever $\{w, v\} \notin E$.

  17. Dawid & Lauritzen, Ann. Statist. (1993). Assume that $\pi$ is $G$-Markov, where $G = (V, E)$ is a chordal graph. We say that $\pi$ has a hyper-Dirichlet distribution, $HD(\alpha^C_m,\ m \in \mathcal{I}_C,\ C \in \mathcal{C})$, iff its moments are $E \prod_{i \in \mathcal{I}} p(i)^{r_i} = \frac{\prod_{C \in \mathcal{C}} \prod_{m \in \mathcal{I}_C} (\alpha^C_m)_{r^C_m}}{\prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} (\alpha^S_n)_{r^S_n}}$, where for $\mathcal{S} \ni S \subset C \in \mathcal{C}$, $\alpha^S_n = \sum_{m \in \mathcal{I}_{C \setminus S}} \alpha^C_{(m, n)}$, $n \in \mathcal{I}_S$, and $r^A_m = \sum_{n \in \mathcal{I}_{V \setminus A}} r_{(m, n)}$, $m \in \mathcal{I}_A$.

  18. HD distribution. Equivalently, for any moral DAG $p$ (with skeleton $G$), in the decomposition $p(i) = \prod_{v \in V} p^{v \mid p(v)}_{i_v \mid i_{p(v)}}$, $i \in \mathcal{I}$, the vectors of conditional probabilities $(p^{v \mid p(v)}_{i_v \mid i_{p(v)}},\ i_v \in \mathcal{I}_v)$, $i_{p(v)} \in \mathcal{I}_{p(v)}$, $v \in V$, are independent and have classical Dirichlet distributions $D(\alpha^{v \mid p(v)}_{i_v \mid i_{p(v)}},\ i_v \in \mathcal{I}_v)$. Then for all $C \in \mathcal{C}$ and all $i_C \in \mathcal{I}_C$, $\alpha^C_{i_C} = \alpha^{v \mid p(v)}_{i_v \mid i_{p(v)}}$ whenever $C = \{v\} \cup p(v) = q(v)$.
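This characterisation gives a direct way to sample a hyper-Dirichlet $\pi$: draw each conditional-probability vector independently from a classical Dirichlet law and multiply. A sketch for the chain $1 \to 2 \to 3$ (the $\alpha$ values are assumed toy numbers; Dirichlet draws are built from normalised gamma variates, a standard construction):

```python
import random
import itertools

random.seed(1)

def dirichlet(alphas):
    """Classical Dirichlet draw via normalised independent Gamma(a, 1) draws."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(g)
    return [x / s for x in g]

# Independent Dirichlet draws for each vertex and each parent configuration.
p1 = dirichlet([1.0, 2.0])                       # P(X1 = .)
p2 = {n: dirichlet([1.0, 1.0]) for n in (0, 1)}  # P(X2 = . | X1 = n)
p3 = {n: dirichlet([2.0, 0.5]) for n in (0, 1)}  # P(X3 = . | X2 = n)

# Assemble pi via the p-Markov decomposition for the chain 1 -> 2 -> 3.
pi = {(a, b, c): p1[a] * p2[a][b] * p3[b][c]
      for a, b, c in itertools.product((0, 1), repeat=3)}
```

By construction the result is a random point on the 5-dimensional Markov manifold of slide 9: it sums to one and satisfies conditions (1) and (2) exactly.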

  19. Multinomial mixture. Let $X_1, \ldots, X_m$ be observations on $X$ and $M = (M_i,\ i \in \mathcal{I})$, where $M_i = \sum_{k=1}^{m} I(X_k = i)$, $i \in \mathcal{I}$. The conditional law of $M$ given $\pi$ is a multinomial distribution with parameters $m$ and $\pi = (p(i),\ i \in \mathcal{I})$.

  20. HD as a conjugate prior law. Th. If the a priori law of $\pi$ is $HD(\alpha^C_m,\ m \in \mathcal{I}_C,\ C \in \mathcal{C})$, then the posterior law of $\pi \mid M$ is also hyper-Dirichlet, $HD(\alpha^C_m + M^C_m,\ m \in \mathcal{I}_C,\ C \in \mathcal{C})$, where $M^C_m = \sum_{n \in \mathcal{I}_{V \setminus C}} M_{(m, n)}$, $m \in \mathcal{I}_C$.
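The only computation the update needs is the clique-marginal count $M^C_m$, obtained by summing the full count table over the coordinates outside $C$. A sketch with assumed toy counts over three binary variables, marginalising onto the clique $\{1, 2\}$:

```python
from collections import Counter

# Assumed toy counts M_i over I = {0,1}^3 (cells with count 0 omitted).
M = Counter({(0, 0, 0): 3, (0, 1, 1): 2, (1, 0, 0): 4, (1, 1, 1): 1})

# Marginalise onto the clique C = {1, 2} (coordinate positions 0 and 1):
# M^C_m = sum over n of M_{(m, n)}.
C = (0, 1)
MC = Counter()
for i, cnt in M.items():
    MC[tuple(i[k] for k in C)] += cnt

# The posterior HD parameters are then alpha^C_m + MC[m] for each m in I_C.
```

Here `MC` collects, e.g., $M^C_{(0,0)} = 3$ and $M^C_{(1,1)} = 1$, and the total count is preserved.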

  21. Proof. The generalized Bayes rule reads $E\left(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\middle|\, M = m\right) = \frac{E \prod_{i \in \mathcal{I}} p(i)^{r_i + m_i}}{E \prod_{i \in \mathcal{I}} p(i)^{m_i}}$. Applying the moment formula for the HD distribution in the numerator and denominator gives $E\left(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\middle|\, M = m\right) = \prod_{C \in \mathcal{C}} \prod_{j \in \mathcal{I}_C} \frac{(\alpha^C_j)_{r^C_j + m^C_j}}{(\alpha^C_j)_{m^C_j}} \prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} \frac{(\alpha^S_n)_{m^S_n}}{(\alpha^S_n)_{r^S_n + m^S_n}}$, where $m^A_n = \sum_{j \in \mathcal{I}_{V \setminus A}} m_{(n, j)}$, $n \in \mathcal{I}_A$, $A \subset V$.

  22. Proof, cont. Since $\frac{(a)_{b+c}}{(a)_b} = (a + b)_c$, the last formula gives $E\left(\prod_{i \in \mathcal{I}} p(i)^{r_i} \,\middle|\, M = m\right) = \frac{\prod_{C \in \mathcal{C}} \prod_{j \in \mathcal{I}_C} (\alpha^C_j + m^C_j)_{r^C_j}}{\prod_{S \in \mathcal{S}} \prod_{n \in \mathcal{I}_S} (\alpha^S_n + m^S_n)_{r^S_n}}$, which is the moment formula of $HD(\alpha^C_m + m^C_m)$. $\Box$

  23. $p$-Dirichlet and $\mathcal{P}$-Dirichlet distributions. Let $p$ be a moral DAG with a chordal skeleton $G = (V, E)$. A $G$-Markov random vector $\pi$ has a $p$-Dirichlet law iff the random vectors $(p^{v \mid p(v)}_{m \mid n},\ m \in \mathcal{I}_v)$, $n \in \mathcal{I}_{p(v)}$, $v \in V$, have (classical) Dirichlet laws and are independent. Let $\mathcal{P}$ be a family of moral DAGs with a chordal skeleton $G = (V, E)$. We say that a $G$-Markov $\pi$ has a $\mathcal{P}$-Dirichlet distribution if it has a $p$-Dirichlet law for all $p \in \mathcal{P}$.

  24. HD as a special $\mathcal{P}$-Dirichlet law. Let $\mathcal{P}$ be the family of all moral DAGs with the chordal skeleton $G$. If a $G$-Markov $\pi$ has a $\mathcal{P}$-Dirichlet distribution, then $\pi$ has an HD distribution. Question: can we have a similar description of the HD law through a smaller family $\mathcal{P}$?

  25. $p$-perfect ordering of cliques. Let $p$ be a moral DAG with a (chordal) skeleton $G = (V, E)$. A perfect ordering of cliques $o = (C_1, \ldots, C_K)$ is called $p$-perfect (notation: $o_p$) if for all $\ell = 1, \ldots, K$ there exists $v \in C_\ell \setminus S_\ell$ such that $S_\ell = p(v)$. Lemma. For any moral DAG $p$ there exists a $p$-perfect ordering of cliques.
