Bayesian Networks
Graphical Models
Siamak Ravanbakhsh
Fall 2019

Previously on Probabilistic Graphical Models: probability distribution and density functions; random variables; Bayes' rule; conditional independence; expectation and variance.
Outline: what is a Bayesian network? Its factorization and conditional independencies; how to read them from the graph, and how they are related; the equivalence class of Bayesian networks.
Given a number of random variables X_1, …, X_n, how do we represent P(X_1, …, X_n)?
For discrete domains with Val(X_i) = {1, …, D} ∀i, the tabular representation
  P(X_1 = x_1, …, X_n = x_n) = θ_{x_1, …, x_n}
needs O(D^n) parameters: exponential in n (the curse of dimensionality). We need to leverage some structure in P.

Assuming full independence, X_i ⊥ X_j ∀ i, j, gives a linear-sized representation: for a particular assignment d in the discrete domain,
  P(X_1 = x_1^d, …, X_n = x_n^d) = ∏_i P(X_i = x_i^d) = ∏_i θ_{i,d}
But the independence assumption is too restrictive.
Chain rule: pick an ordering of the variables and parameterize each term:
  P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n−1})
Does it compress the representation? No; the new number of parameters is
  (D − 1) + (D^2 − D) + ⋯ + (D^n − D^{n−1}) = D^n − 1
Instead, simplify the conditionals in
  P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n−1})
for a flexible compression of P: a Bayesian network!
Compare the chain rule
  P(X) = P(X_1) P(X_2 ∣ X_1) P(X_3 ∣ X_1, X_2) ⋯ P(X_n ∣ X_1, …, X_{n−1})
with an extreme form of simplification:
  P(X) = P(X_1) P(X_2 ∣ X_1) P(X_3 ∣ X_1) ⋯ P(X_n ∣ X_1)
# params: (D − 1) + (n − 1)(D^2 − D), i.e. O(nD^2) instead of O(D^n).
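These parameter counts are easy to verify numerically; the two helper functions below (the function names are my own) count free parameters for n variables with D values each.

```python
def chain_rule_params(n, D):
    """#parameters for the full chain rule:
    term i is a table of size D**(i-1) * (D - 1)."""
    return sum((D - 1) * D ** (i - 1) for i in range(1, n + 1))

def first_order_params(n, D):
    """#parameters when every X_i (i > 1) depends only on X_1:
    (D - 1) for P(X_1), plus (n - 1) tables of D*(D - 1) entries."""
    return (D - 1) + (n - 1) * (D * (D - 1))

n, D = 10, 3
assert chain_rule_params(n, D) == D ** n - 1   # no compression at all
print(chain_rule_params(n, D), first_order_params(n, D))
```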
Naive Bayes uses exactly this simplification:
  P(class, X) = P(class) P(X_2 ∣ class) P(X_3 ∣ class) ⋯ P(X_n ∣ class)
independence assumption: X_i ⊥ X_{−i} ∣ class.
For classification, use Bayes' rule:
  P(class ∣ X) ∝ P(class) P(X_2 ∣ class) P(X_3 ∣ class) ⋯ P(X_n ∣ class)
Example: medical diagnosis (what if two symptoms are correlated?)
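A minimal naive-Bayes sketch for the diagnosis example; the class prior and symptom likelihoods below are made-up numbers, and `posterior` is a hypothetical helper, not code from the course.

```python
# Naive Bayes posterior: P(class | x) ∝ P(class) ∏_i P(x_i | class).
# All CPT numbers here are invented for illustration.
prior = {"flu": 0.2, "healthy": 0.8}
likelihood = {                       # P(symptom present | class)
    "fever": {"flu": 0.9, "healthy": 0.05},
    "cough": {"flu": 0.8, "healthy": 0.10},
}

def posterior(observed):
    """observed: dict symptom -> True/False; returns P(class | observed)."""
    scores = {}
    for c, p in prior.items():
        for s, present in observed.items():
            p *= likelihood[s][c] if present else 1 - likelihood[s][c]
        scores[c] = p
    z = sum(scores.values())         # normalize (the Bayes-rule denominator)
    return {c: v / z for c, v in scores.items()}

post = posterior({"fever": True, "cough": True})
print(post)   # observing both symptoms raises P(flu) far above its prior 0.2
```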
More generally, simplify the full conditionals in
  P(X) = P(X_1) P(X_2 ∣ X_1) ⋯ P(X_n ∣ X_1, …, X_{n−1})
and represent the result using a Directed Acyclic Graph (DAG):
  P(X) = ∏_i P(X_i ∣ Pa_{X_i})
a Bayesian network, where Pa_{X_i} are the parents of X_i and the ordering used is a topological ordering of the DAG.

Identifying a DAG: does the graph have a topological ordering? Equivalently, is there no directed path from a node to itself?
Example: is this a DAG? A topological ordering: G, A, B, D, C, E, F; another one: A, B, C, G, D, E, F. How about the other graph?
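A topological ordering exists iff the directed graph is acyclic, and Kahn's algorithm checks this directly. The lecture's example graph lives only in the figure, so the small graphs below are hypothetical stand-ins.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return a topological ordering of the nodes,
    or None if the directed graph contains a cycle (not a DAG)."""
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:            # removing u frees its children
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(nodes) else None

print(topological_order("ABCG", [("A", "B"), ("B", "C"), ("G", "C")]))
print(topological_order("AB", [("A", "B"), ("B", "A")]))   # cycle -> None
```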
Example (the student network): I: intelligence, D: exam difficulty, G: grade (A/B/C), S: SAT score, L: recommendation letter.
  P(I, D, G, S, L) = P(I) P(D) P(G ∣ I, D) P(S ∣ I) P(L ∣ G)
Each factor is stored as a Conditional Probability Table (CPT). Evaluating one joint assignment is a product of CPT entries, e.g.
  P(i, d, g, s, l) = P(i) P(d) P(g ∣ i, d) P(s ∣ i) P(l ∣ g) = .7 × .6 × .08 × .8 × .4 ≈ .01
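The joint evaluation is just a product of CPT lookups; the sketch below hard-codes only the five CPT entries used for this one assignment on the slide.

```python
# One CPT entry per factor, matching the numbers on the slide
# (only the entries needed for this single assignment are given).
factors = {
    "P(i)":     0.7,
    "P(d)":     0.6,
    "P(g|i,d)": 0.08,
    "P(s|i)":   0.8,
    "P(l|g)":   0.4,
}

joint = 1.0
for name, value in factors.items():
    joint *= value
print(f"P(i,d,g,s,l) = {joint:.4f}")   # 0.0108, which the slide rounds to .01
```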
Answering probabilistic queries: P(Y = y ∣ E = e)? where E = e is the evidence. This is an inference problem; how to calculate it efficiently comes later. For example,
  P(L = l^1 ∣ S = s^1) = P(L = l^1, S = s^1) / P(S = s^1)
where the marginal is obtained by summing out the other variables:
  P(S = s^1) = Σ_{d,i,g,l} P(d, i, g, s^1, l)
given low intelligence ... and an easy exam
causal reasoning (topdown) P(l ) ≈
1
.50
P(l ∣
1
i ) ≈ .389 P(l ∣
1
i , d ) ≈ .52
more intelligent A B C better SAT score more difficult better
Evidential reasoning (bottom-up): from the marginal prior to the posterior of intelligence:
  P(i^1) ≈ .30
given a bad letter: P(i^1 ∣ l^0) ≈ .14
… and a bad grade: P(i^1 ∣ l^0, g^3) ≈ .08

Explaining away (v-structure): a difficult exam explains away the bad grade:
  P(i^1 ∣ l^0, g^3, d^1) ≈ .11
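These reasoning patterns can be reproduced by brute-force enumeration of the joint. The CPT values below are the standard student-network numbers from Koller & Friedman's textbook example, which this lecture appears to follow; the `query` helper is my own.

```python
from itertools import product

# CPTs of the standard student network (assumed to match the lecture's figure).
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {("i0", "d0"): {"g1": .30, "g2": .40, "g3": .30},
       ("i0", "d1"): {"g1": .05, "g2": .25, "g3": .70},
       ("i1", "d0"): {"g1": .90, "g2": .08, "g3": .02},
       ("i1", "d1"): {"g1": .50, "g2": .30, "g3": .20}}
P_S = {"i0": {"s0": .95, "s1": .05}, "i1": {"s0": .20, "s1": .80}}
P_L = {"g1": {"l0": .10, "l1": .90}, "g2": {"l0": .40, "l1": .60},
       "g3": {"l0": .99, "l1": .01}}

def joint(d, i, g, s, l):
    """P(D,I,G,S,L) as the product of the five CPT entries."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

def query(target, evidence):
    """P(target | evidence) by summing the joint over all assignments."""
    num = den = 0.0
    for d, i, g, s, l in product(P_D, P_I, ["g1", "g2", "g3"],
                                 ["s0", "s1"], ["l0", "l1"]):
        x = {"D": d, "I": i, "G": g, "S": s, "L": l}
        if all(x[k] == v for k, v in evidence.items()):
            p = joint(d, i, g, s, l)
            den += p
            if all(x[k] == v for k, v in target.items()):
                num += p
    return num / den

print(round(query({"L": "l1"}, {}), 3))                                 # 0.502
print(round(query({"L": "l1"}, {"I": "i0"}), 3))                        # 0.389
print(round(query({"I": "i1"}, {"L": "l0"}), 2))                        # 0.14
print(round(query({"I": "i1"}, {"L": "l0", "G": "g3", "D": "d1"}), 2))  # 0.11
```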
Associating P with a DAG:
- factorization of the joint probability: P(X) = ∏_i P(X_i ∣ Pa_{X_i})
- conditional independencies in P, read from the DAG
Example (student network): P(I, D, G, S, L) = P(I) P(D) P(G ∣ I, D) P(S ∣ I) P(L ∣ G)
In general, the quality of the letter (L) only depends on the grade (G):
  L ⊥ D, I, S ∣ G
How about the following assertions?
  D ⊥ S ?   D ⊥ S ∣ I ?   D ⊥ S ∣ L ?
Why? Can we read these from the graph?
Local conditional independencies: for any node X_i,
  X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}
For the student network this gives
  I_ℓ(G) = { D ⊥ I, S;  I ⊥ D;  G ⊥ S ∣ I, D;  S ⊥ G, L, D ∣ I;  L ⊥ D, I, S ∣ G }
To show that X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i} holds for any node X_i, use the factorized form P(X) = ∏_i P(X_i ∣ Pa_{X_i}); it implies
  P(X_i, NonDesc_{X_i} ∣ Pa_{X_i}) = P(X_i ∣ Pa_{X_i}) P(NonDesc_{X_i} ∣ Pa_{X_i})
Example: given the factorization
  P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(S ∣ I) P(L ∣ G)
show S ⊥ G ∣ I:
  P(G, S ∣ I) = Σ_{d,l} P(d, I, G, S, l) / Σ_{d,g,s,l} P(d, I, g, s, l)
    = [ P(I) P(S ∣ I) Σ_{d,l} P(d) P(G ∣ d, I) P(l ∣ G) ] / [ P(I) Σ_{d,g,s,l} P(d) P(g ∣ d, I) P(s ∣ I) P(l ∣ g) ]
    = P(S ∣ I) Σ_{d,l} P(d) P(G ∣ d, I) P(l ∣ G) / 1
    = P(S ∣ I) P(G ∣ I)
This works in the other direction as well: from the local CIs
  I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}, ∀i }
find a topological ordering X_{i_1}, …, X_{i_n} (parents before children), use the chain rule,
  P(X) = P(X_{i_1}) ∏_{j=2}^n P(X_{i_j} ∣ X_{i_1}, …, X_{i_{j−1}})
and simplify each conditional using the local CIs:
  P(X) = P(X_{i_1}) ∏_{j=2}^n P(X_{i_j} ∣ Pa_{X_{i_j}})

Example (student network): local CIs
  I_ℓ(G) = { (D ⊥ I, S), (I ⊥ D), (G ⊥ S ∣ I, D), (S ⊥ G, L, D ∣ I), (L ⊥ D, I, S ∣ G) }
With the topological ordering D, I, G, L, S, the chain rule gives
  P(D, I, G, S, L) = P(D) P(I ∣ D) P(G ∣ D, I) P(L ∣ D, I, G) P(S ∣ D, I, G, L)
which the local CIs simplify to
  P(D, I, G, S, L) = P(D) P(I) P(G ∣ D, I) P(L ∣ G) P(S ∣ I)
P factorizes according to G,
  P(X) = ∏_i P(X_i ∣ Pa_{X_i})
⇔ the local CIs I_ℓ(G) hold in P, i.e. I_ℓ(G) ⊆ I(P).
In that case G is an I-map for P: it does not mislead us about independencies in P.
Recap: simplification of the chain rule gives
  P(X) = ∏_i P(X_i ∣ Pa_{X_i})
a Bayes-net, represented using a DAG (naive Bayes is a special case). Local conditional independencies hold in a Bayes-net, and conversely imply a Bayes-net. Note: the motivation is not just a compressed representation, but faster inference and learning as well.
Beyond the local CIs I_ℓ(G) = { X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i}, ∀i }: for any subsets of variables X, Y and Z we can ask, X ⊥ Y ∣ Z? Global CIs I(G): the set of all such CIs. From the factorized form of P,
  I_ℓ(G) ⊆ I(G) ⊆ I(P)
Algorithm for reading global CIs from the graph: directed separation (d-separation).
Example: C ⊥ D ∣ B, F ?
For three random variables, a chain X → Y → Z:
  P(X, Y, Z) = P(X) P(Y ∣ X) P(Z ∣ Y)
conditional independence holds:
  P(Z ∣ X, Y) = P(X, Y, Z) / P(X, Y) = P(X) P(Y ∣ X) P(Z ∣ Y) / [ P(X) P(Y ∣ X) ] = P(Z ∣ Y)
marginal independence does not: P(X, Z) ≠ P(X) P(Z).

A fork X ← Y → Z:
  P(X, Y, Z) = P(Y) P(X ∣ Y) P(Z ∣ Y)
conditional independence holds:
  P(X, Z ∣ Y) = P(X, Y, Z) / P(Y) = P(X ∣ Y) P(Z ∣ Y)
marginal independence does not: P(X, Z) ≠ P(X) P(Z).
A collider X → Y ← Z (a.k.a. v-structure):
  P(X, Y, Z) = P(X) P(Z) P(Y ∣ X, Z)
marginal independence holds:
  P(X, Z) = Σ_Y P(X, Y, Z) = P(X) P(Z) Σ_Y P(Y ∣ X, Z) = P(X) P(Z)
conditional independence does not: P(X, Z ∣ Y) ≠ P(X ∣ Y) P(Z ∣ Y).
The same is true for any descendant W of Y: P(X, Z ∣ W) ≠ P(X ∣ W) P(Z ∣ W); even observing a descendant of Y makes X and Z dependent.
To decide whether (X ⊥ Y ∣ Z) ∈ I(G) in general, consider all paths between the variables in X and Y; each path must be blocked by one of the patterns above.
Example: X_1, X_2 ⊥ Y_1 ∣ Z_1, Z_2 ?
Had we not observed Z_1: (X_1, X_2 ⊥ Y_1 ∣ Z_2) ∈ I(G).
d-separation (a.k.a. the Bayes-Ball algorithm): X ⊥ Y ∣ Z ? See whether at least one ball starting from X reaches Y, where Z is shaded.
(image from: https://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html)

input: graph G and sets X, Y, Z
- mark the variables in Z and all of their ancestors in G
- breadth-first search starting from X
- stop any trail that reaches a blocked node: a node that is in Z and not the middle of a collider (v-structure), or an unmarked middle of a collider
- X ⊥ Y ∣ Z holds iff no node in Y is reached
Linear time complexity.
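The steps above can be sketched as a reachability search over (node, direction) states, in the spirit of Koller & Friedman's "reachable" procedure; the implementation details below are mine, tested on the student network.

```python
def d_separated(X, Y, Z, parents):
    """True iff every trail between X and Y is blocked given Z.
    parents: dict node -> list of parents (this defines the DAG)."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    # mark Z and all of its ancestors (needed for the collider rule)
    marked, stack = set(), list(Z)
    while stack:
        v = stack.pop()
        if v not in marked:
            marked.add(v)
            stack.extend(parents[v])
    # search over (node, direction): 'up' = entered from a child,
    # 'down' = entered from a parent
    visited, frontier, reached = set(), [(x, "up") for x in X], set()
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in Z:
            reached.add(node)
        if direction == "up" and node not in Z:
            frontier += [(p, "up") for p in parents[node]]
            frontier += [(c, "down") for c in children[node]]
        elif direction == "down":
            if node not in Z:                 # chains/forks pass through
                frontier += [(c, "down") for c in children[node]]
            if node in marked:                # collider with observed descendant
                frontier += [(p, "up") for p in parents[node]]
    return reached.isdisjoint(Y)

student = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
print(d_separated({"D"}, {"S"}, set(), student))   # True:  D ⊥ I, S
print(d_separated({"D"}, {"S"}, {"G"}, student))   # False: explaining away
print(d_separated({"D"}, {"L"}, {"G"}, student))   # True:  D ⊥ L | G
```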
Examples (student network): D ⊥ L ∣ G ?  D ⊥ I, S ∣ ∅ ?  D, L ⊥ S ∣ I, G ?
Recap: conditional independencies of the distribution are inferred from the graph:
- local CIs: X_i ⊥ NonDescendants_{X_i} ∣ Parents_{X_i}
- global CIs: d-separation
Graph and distribution are combined through the factorization of the distribution according to the graph:
  P(X) = ∏_i P(X_i ∣ Pa_{X_i})
The factorization of the distribution, the local conditional independencies, and the global conditional independencies identify the same family of distributions.
Two DAGs are I-equivalent if I(G) = I(G′); then P factorizes on both of these graphs.
From the d-separation algorithm, having the same undirected skeleton and the same v-structures is sufficient.
It is not necessary, though: two DAGs may have different v-structures yet I(G) = I(G′) = ∅; here the v-structures are irrelevant for I-equivalence because the parents are connected (moral parents!).
Theorem: I(G) = I(G′) ⇔ same undirected skeleton and same immoralities (v-structures whose parents are not connected).
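The theorem gives a direct test for I-equivalence: compare skeletons and immoralities. A sketch, assuming DAGs given as edge lists:

```python
def skeleton(edges):
    """Undirected skeleton: the set of edges with direction dropped."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """v-structures x -> z <- y whose parents x, y are NOT adjacent."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    skel = skeleton(edges)
    out = set()
    for z, ps in parents.items():
        for x in ps:
            for y in ps:
                if x < y and frozenset((x, y)) not in skel:
                    out.add((frozenset((x, y)), z))
    return out

def i_equivalent(e1, e2):
    return skeleton(e1) == skeleton(e2) and immoralities(e1) == immoralities(e2)

chain    = [("X", "Z"), ("Z", "Y")]   # X -> Z -> Y
chain_r  = [("Y", "Z"), ("Z", "X")]   # X <- Z <- Y
collider = [("X", "Z"), ("Y", "Z")]   # X -> Z <- Y  (an immorality)
print(i_equivalent(chain, chain_r))    # True
print(i_equivalent(chain, collider))   # False
```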
Examples: X ⊥ Y ∣ Z ? Do these pairs of DAGs have the same set of CIs? No! For instance, X ⊥ Z ∣ W holds in one of them but not in the other.
Which graph G to use for P?
G is an I-map for P: I(G) ⊆ I(P).
G is a minimal I-map for P: G is an I-map for P, and removing any edge destroys this property.
Example: for P(X, Y, Z, W) = P(X ∣ Y, Z) P(W) P(Y ∣ Z) P(Z), one of the DAGs shown over X, Y, Z, W is a (minimal) I-map, and the other is NOT an I-map.
Constructing a minimal I-map.
input: I(P) or an oracle; an ordering X_1, …, X_n
for i = 1 … n:
  find a minimal U ⊆ {X_1, …, X_{i−1}} such that (X_i ⊥ {X_1, …, X_{i−1}} − U ∣ U)
  set Pa_{X_i} ← U
so that X_i ⊥ NonDesc_{X_i} ∣ Pa_{X_i} holds.
Different orderings give different graphs.
Example (student network): the orderings D, I, S, G, L (a topological ordering); L, S, G, I, D; and L, D, S, I, G give three different graphs, and all of them are minimal I-maps.
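A sketch of this construction, with a numeric independence test standing in for the oracle; the three-variable chain distribution X → Y → Z below is made up for illustration, and all function names are mine.

```python
from itertools import product, combinations

VALS = [0, 1]

def joint(x, y, z):
    """A made-up chain X -> Y -> Z: P(x) P(y|x) P(z|y)."""
    px = 0.3 if x else 0.7
    py = (0.9 if y else 0.1) if x else (0.2 if y else 0.8)
    pz = (0.7 if z else 0.3) if y else (0.1 if z else 0.9)
    return px * py * pz

def prob(fixed):
    """P of the event {vars in `fixed` take the given values}."""
    return sum(joint(x, y, z) for x, y, z in product(VALS, repeat=3)
               if all({"X": x, "Y": y, "Z": z}[k] == v for k, v in fixed.items()))

def indep(A, B, C):
    """Numeric oracle for (A ⊥ B | C): P(a,b,c) P(c) == P(a,c) P(b,c)."""
    for vals in product(VALS, repeat=len(A) + len(B) + len(C)):
        a = dict(zip(A, vals))
        b = dict(zip(B, vals[len(A):]))
        c = dict(zip(C, vals[len(A) + len(B):]))
        if abs(prob({**a, **b, **c}) * prob(c)
               - prob({**a, **c}) * prob({**b, **c})) > 1e-9:
            return False
    return True

def minimal_imap(ordering):
    """For each X_i, pick a smallest U among predecessors with
    (X_i ⊥ predecessors - U | U), and set Pa_{X_i} = U."""
    pa = {}
    for i, v in enumerate(ordering):
        prev = ordering[:i]
        for k in range(len(prev) + 1):     # smallest subsets first
            found = next((set(U) for U in combinations(prev, k)
                          if indep([v], [w for w in prev if w not in U], list(U))),
                         None)
            if found is not None:
                pa[v] = found
                break
    return pa

print(minimal_imap(["X", "Y", "Z"]))   # recovers the chain: Pa(Y)={X}, Pa(Z)={Y}
print(minimal_imap(["X", "Z", "Y"]))   # a different ordering gives a denser graph
```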
Which graph G to use for P? A minimal I-map only guarantees I(G) ⊆ I(P); a perfect map (P-map) has I(G) = I(P).
P may not have a P-map in the form of a BN.
Example:
  p(x, y, z) = 1/12 if x ⊕ y ⊕ z = 0,  and  1/6 if x ⊕ y ⊕ z = 1
Here (X ⊥ Y), (Y ⊥ Z), (X ⊥ Z) ∈ I(P), but (X ⊥ Y ∣ Z), (Y ⊥ Z ∣ X), (X ⊥ Z ∣ Y) ∉ I(P): no DAG over three variables encodes exactly this set of CIs.
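The claimed (in)dependencies of this XOR distribution can be checked by enumeration:

```python
from itertools import product

def p(x, y, z):
    """The XOR distribution from the slide, over binary x, y, z."""
    return 1 / 12 if (x ^ y ^ z) == 0 else 1 / 6

def prob(fixed):
    return sum(p(x, y, z) for x, y, z in product([0, 1], repeat=3)
               if all({"X": x, "Y": y, "Z": z}[k] == v for k, v in fixed.items()))

# pairwise (marginal) independence holds: P(A, B) = P(A) P(B) for each pair
for a, b in [("X", "Y"), ("Y", "Z"), ("X", "Z")]:
    for va, vb in product([0, 1], repeat=2):
        assert abs(prob({a: va, b: vb}) - prob({a: va}) * prob({b: vb})) < 1e-12

# but conditional independence fails: P(X, Y | Z=0) != P(X | Z=0) P(Y | Z=0)
pz0 = prob({"Z": 0})
lhs = prob({"X": 0, "Y": 0, "Z": 0}) / pz0
rhs = (prob({"X": 0, "Z": 0}) / pz0) * (prob({"Y": 0, "Z": 0}) / pz0)
print(lhs, rhs)   # 1/6 vs 1/4: no DAG over X, Y, Z is a perfect map
```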
If P has a P-map, is it unique?
Example: I(P) = { (X ⊥ Y, Z ∣ ∅), (X ⊥ Y ∣ Z), (X ⊥ Z ∣ Y) } has two P-maps, an isolated X with either Y → Z or Y ← Z: a P-map is unique up to I-equivalence.
How to find P-maps? Discussed when we cover learning BNs.
Summary: the factorization of the distribution, the local CIs, and the global CIs identify the same family of distributions; this family can be represented using an equivalence class of graphs, with alternative factorizations and different local CIs but the same global CIs.