

1. Bayesian networks
   Andrea Passerini (passerini@disi.unitn.it)
   Machine Learning

2. Graphical models
   Why
   All probabilistic inference and learning amounts to repeated applications of the sum and product rules.
   Probabilistic graphical models are graphical representations of the qualitative aspects of probability distributions, allowing us to:
   - visualize the structure of a probabilistic model in a simple and intuitive way
   - discover properties of the model, such as conditional independencies, by inspecting the graph
   - express complex computations for inference and learning in terms of graphical manipulations
   - represent multiple probability distributions with the same graph, abstracting from their quantitative aspects (e.g. discrete vs continuous distributions)

3. Bayesian Networks (BN)
   BN Semantics
   A BN structure G is a directed graphical model in which:
   - each node represents a random variable x_i
   - each edge represents a direct dependency between two variables
   [Figure: example DAG over x_1, ..., x_7]
   The structure encodes the following local independence assumptions:
   I_ℓ(G) = {∀i : x_i ⊥ NonDescendants_{x_i} | Parents_{x_i}}
   i.e. each variable is independent of its non-descendants given its parents.

4. Bayesian Networks
   Graphs and Distributions
   Let p be a joint distribution over variables X, and let I(p) be the set of independence assertions holding in p.
   G is an independency map (I-map) for p if p satisfies the local independences in G:
   I_ℓ(G) ⊆ I(p)
   Note
   The reverse is not necessarily true: there can be independences in p that are not modelled by G.

5. Bayesian Networks
   Factorization
   We say that p factorizes according to G if:
   p(x_1, ..., x_m) = ∏_{i=1}^{m} p(x_i | Pa_{x_i})
   - If G is an I-map for p, then p factorizes according to G.
   - If p factorizes according to G, then G is an I-map for p.
   Example
   p(x_1, ..., x_7) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1, x_2, x_3) p(x_5 | x_1, x_3) p(x_6 | x_4) p(x_7 | x_4, x_5)
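
A minimal sketch of the factorized joint for the example graph, assuming binary variables and made-up CPT values (only the graph structure comes from the slide):

```python
import itertools

# Parents taken from the example factorization; CPT values are illustrative.
# Each CPD maps an assignment of the parents to P(x_i = 1 | parents).
parents = {1: [], 2: [], 3: [], 4: [1, 2, 3], 5: [1, 3], 6: [4], 7: [4, 5]}
cpd = {
    1: {(): 0.3}, 2: {(): 0.5}, 3: {(): 0.8},
    4: {pa: 0.1 + 0.25 * sum(pa) for pa in itertools.product([0, 1], repeat=3)},
    5: {pa: 0.2 + 0.3 * sum(pa) for pa in itertools.product([0, 1], repeat=2)},
    6: {(0,): 0.4, (1,): 0.9},
    7: {pa: 0.1 + 0.4 * sum(pa) for pa in itertools.product([0, 1], repeat=2)},
}

def joint(x):
    """p(x_1,...,x_7) = prod_i p(x_i | Pa_{x_i}) for a full assignment x (dict i -> 0/1)."""
    p = 1.0
    for i in range(1, 8):
        pa = tuple(x[j] for j in parents[i])
        p1 = cpd[i][pa]
        p *= p1 if x[i] == 1 else 1.0 - p1
    return p

# Sanity check: the factorized joint sums to 1 over all 2^7 assignments.
total = sum(joint(dict(enumerate(bits, start=1)))
            for bits in itertools.product([0, 1], repeat=7))
print(round(total, 10))  # 1.0
```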

6. Bayesian Networks
   Proof: I-map ⇒ factorization
   1. If G is an I-map for p, then p satisfies (at least) the local independences:
      {∀i : x_i ⊥ NonDescendants_{x_i} | Parents_{x_i}}
   2. Order the variables topologically relative to G, i.e. so that x_i → x_j ⇒ i < j.
   3. Decompose the joint probability using the chain rule:
      p(x_1, ..., x_m) = ∏_{i=1}^{m} p(x_i | x_1, ..., x_{i-1})
   4. In a topological order, the predecessors x_1, ..., x_{i-1} of x_i are all non-descendants, so the local independences imply that for each x_i:
      p(x_i | x_1, ..., x_{i-1}) = p(x_i | Pa_{x_i})

7. Bayesian Networks
   Proof: factorization ⇒ I-map
   1. If p factorizes according to G, the joint probability can be written as:
      p(x_1, ..., x_m) = ∏_{i=1}^{m} p(x_i | Pa_{x_i})
   2. Consider the last variable x_m (the steps are repeated for the other variables). By the product and sum rules:
      p(x_m | x_1, ..., x_{m-1}) = p(x_1, ..., x_m) / p(x_1, ..., x_{m-1}) = p(x_1, ..., x_m) / Σ_{x_m} p(x_1, ..., x_m)
   3. Applying the factorization and isolating the only term containing x_m, we get:
      = ∏_{i=1}^{m} p(x_i | Pa_{x_i}) / Σ_{x_m} ∏_{i=1}^{m} p(x_i | Pa_{x_i})
      = [p(x_m | Pa_{x_m}) ∏_{i=1}^{m-1} p(x_i | Pa_{x_i})] / [∏_{i=1}^{m-1} p(x_i | Pa_{x_i}) Σ_{x_m} p(x_m | Pa_{x_m})]
      = p(x_m | Pa_{x_m})
      since the factors not containing x_m cancel and Σ_{x_m} p(x_m | Pa_{x_m}) = 1.
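
To make the conclusion concrete, a tiny numeric check on a hypothetical 3-node network x1 → x3 ← x2 with made-up binary CPTs: the conditional p(x3 | x1, x2) recovered from the factorized joint via the product and sum rules coincides with the CPD p(x3 | Pa_{x3}).

```python
import itertools

p1 = {0: 0.6, 1: 0.4}   # p(x1)
p2 = {0: 0.7, 1: 0.3}   # p(x2)
p3 = {(a, b): 0.1 + 0.4 * (a + b) for a in (0, 1) for b in (0, 1)}  # p(x3=1 | x1, x2)

def joint(x1, x2, x3):
    q = p3[(x1, x2)]
    return p1[x1] * p2[x2] * (q if x3 == 1 else 1 - q)

for x1, x2 in itertools.product((0, 1), repeat=2):
    marg = sum(joint(x1, x2, v) for v in (0, 1))   # p(x1, x2) by the sum rule
    cond = joint(x1, x2, 1) / marg                 # p(x3=1 | x1, x2) by the product rule
    print(x1, x2, round(cond, 6), p3[(x1, x2)])    # the last two columns match
```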

8. Bayesian Networks
   Definition
   A Bayesian Network is a pair (G, p) where p factorizes over G and is represented as a set of conditional probability distributions (CPDs) associated with the nodes of G.
   Factorized Probability
   p(x_1, ..., x_m) = ∏_{i=1}^{m} p(x_i | Pa_{x_i})

9. Bayesian Networks
   Example: toy regulatory network
   Genes A and B have independent prior probabilities; gene C can be enhanced by both A and B.

   P(A = active) = 0.3    P(A = inactive) = 0.7
   P(B = active) = 0.3    P(B = inactive) = 0.7

   P(C | A, B):
                   A = active              A = inactive
                   B=active   B=inactive   B=active   B=inactive
   C = active      0.9        0.6          0.7        0.1
   C = inactive    0.1        0.4          0.3        0.9
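
These CPTs are enough to answer queries by the sum and product rules. A minimal sketch (the numbers are from the slide; the choice of query is ours):

```python
# Toy regulatory network, with 1 = active and 0 = inactive.
pA = {1: 0.3, 0: 0.7}
pB = {1: 0.3, 0: 0.7}
pC = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.1}  # P(C=active | A, B)

# Sum rule over the parents: P(C=1) = sum_{A,B} P(C=1 | A, B) P(A) P(B)
p_c_active = sum(pC[(a, b)] * pA[a] * pB[b] for a in (0, 1) for b in (0, 1))
print(round(p_c_active, 3))  # 0.403
```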

10. Conditional independence
    Introduction
    Two variables a, b are independent (written a ⊥ b | ∅) if:
    p(a, b) = p(a) p(b)
    Two variables a, b are conditionally independent given c (written a ⊥ b | c) if:
    p(a, b | c) = p(a | c) p(b | c)
    Independence assumptions can be verified by repeated applications of the sum and product rules.
    Graphical models allow us to verify them directly through the d-separation criterion.

11. d-separation
    Tail-to-tail
    Joint distribution: p(a, b, c) = p(a | c) p(b | c) p(c)
    a and b are not independent (written a ⊥̸ b | ∅):
    p(a, b) = Σ_c p(a | c) p(b | c) p(c) ≠ p(a) p(b)
    a and b are conditionally independent given c:
    p(a, b | c) = p(a, b, c) / p(c) = p(a | c) p(b | c)
    c is tail-to-tail w.r.t. the path from a to b, as it is connected to the tails of the two arrows.
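
A quick numeric illustration of the tail-to-tail case a ← c → b, with hypothetical binary CPTs (the values are made up, not from the slides):

```python
pc = {1: 0.4, 0: 0.6}
pa_c = {1: 0.9, 0: 0.2}   # p(a=1 | c)
pb_c = {1: 0.8, 0: 0.3}   # p(b=1 | c)

def bern(p1, v):          # p(v) for a Bernoulli with P(1) = p1
    return p1 if v == 1 else 1 - p1

def joint(a, b, c):
    return bern(pa_c[c], a) * bern(pb_c[c], b) * pc[c]

# Marginally: p(a=1, b=1) differs from p(a=1) p(b=1)
pab = sum(joint(1, 1, c) for c in (0, 1))
pa = sum(joint(1, b, c) for b in (0, 1) for c in (0, 1))
pb = sum(joint(a, 1, c) for a in (0, 1) for c in (0, 1))
print(round(pab, 4), round(pa * pb, 4))              # 0.324 vs 0.24: dependent

# Given c = 1: p(a=1, b=1 | c=1) equals p(a=1 | c=1) p(b=1 | c=1)
pab_c = joint(1, 1, 1) / pc[1]
print(round(pab_c, 4), round(pa_c[1] * pb_c[1], 4))  # 0.72 vs 0.72: independent
```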

12. d-separation
    Head-to-tail
    Joint distribution: p(a, b, c) = p(b | c) p(c | a) p(a) = p(b | c) p(a | c) p(c)
    a and b are not independent:
    p(a, b) = p(a) Σ_c p(b | c) p(c | a) ≠ p(a) p(b)
    a and b are conditionally independent given c:
    p(a, b | c) = p(a, b, c) / p(c) = p(b | c) p(a | c)
    c is head-to-tail w.r.t. the path from a to b, as it is connected to the head of one arrow and to the tail of the other.
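
The analogous check for the chain a → c → b, again with made-up CPTs:

```python
pa = {1: 0.5, 0: 0.5}
pc_a = {1: 0.9, 0: 0.1}   # p(c=1 | a)
pb_c = {1: 0.8, 0: 0.2}   # p(b=1 | c)

def bern(p1, v):
    return p1 if v == 1 else 1 - p1

def joint(a, b, c):
    return pa[a] * bern(pc_a[a], c) * bern(pb_c[c], b)

pab = sum(joint(1, 1, c) for c in (0, 1))
pb = sum(joint(a, 1, c) for a in (0, 1) for c in (0, 1))
print(round(pab, 4), round(pa[1] * pb, 4))       # 0.37 vs 0.25: dependent

# Given c = 1, the path is blocked: p(a, b | c) = p(a | c) p(b | c)
pc1 = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
pab_c = joint(1, 1, 1) / pc1
pa_c = sum(joint(1, b, 1) for b in (0, 1)) / pc1
print(round(pab_c, 4), round(pa_c * pb_c[1], 4))  # 0.72 vs 0.72: independent
```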

13. d-separation
    Head-to-head
    Joint distribution: p(a, b, c) = p(c | a, b) p(a) p(b)
    a and b are independent:
    p(a, b) = Σ_c p(c | a, b) p(a) p(b) = p(a) p(b)
    a and b are not conditionally independent given c:
    p(a, b | c) = p(c | a, b) p(a) p(b) / p(c) ≠ p(a | c) p(b | c)
    c is head-to-head w.r.t. the path from a to b, as it is connected to the heads of the two arrows.
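
And the collider case a → c ← b, where the pattern reverses (CPTs again hypothetical):

```python
pa, pb = {1: 0.5, 0: 0.5}, {1: 0.5, 0: 0.5}
pc_ab = {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.1}  # p(c=1 | a, b)

def joint(a, b, c):
    q = pc_ab[(a, b)]
    return pa[a] * pb[b] * (q if c == 1 else 1 - q)

pab = sum(joint(1, 1, c) for c in (0, 1))
print(round(pab, 4), round(pa[1] * pb[1], 4))    # 0.25 vs 0.25: independent

# Conditioning on c = 1 couples a and b:
pc1 = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
pab_c = joint(1, 1, 1) / pc1
pa_c = sum(joint(1, b, 1) for b in (0, 1)) / pc1
pb_c = sum(joint(a, 1, 1) for a in (0, 1)) / pc1
print(round(pab_c, 4), round(pa_c * pb_c, 4))    # 0.45 vs 0.49: dependent
```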

14. d-separation: basic rules summary
    [Figure: the three canonical structures — tail-to-tail (a ← c → b), head-to-tail (a → c → b), head-to-head (a → c ← b), each with and without evidence on c]

                      tail-to-tail   head-to-tail   head-to-head
    no evidence on c  dependent      dependent      independent
    evidence on c     independent    independent    dependent

15. Example of head-to-head connection
    Setting
    A fuel system in a car:
    - battery B, either charged (B = 1) or flat (B = 0)
    - fuel tank F, either full (F = 1) or empty (F = 0)
    - electric fuel gauge G, either full (G = 1) or empty (G = 0)
    Conditional probability tables (CPT)
    Battery and tank have independent prior probabilities:
    P(B = 1) = 0.9    P(F = 1) = 0.9
    The fuel gauge is conditioned on both (and unreliable!):
    P(G = 1 | B = 1, F = 1) = 0.8
    P(G = 1 | B = 1, F = 0) = 0.2
    P(G = 1 | B = 0, F = 1) = 0.2
    P(G = 1 | B = 0, F = 0) = 0.1

16. Example of head-to-head connection
    Probability of empty tank
    Prior: P(F = 0) = 1 − P(F = 1) = 0.1
    Posterior after observing an empty fuel gauge:
    P(F = 0 | G = 0) = P(G = 0 | F = 0) P(F = 0) / P(G = 0) ≃ 0.257
    Note
    The probability that the tank is empty increases after observing that the fuel gauge reads empty (though not as much as one might expect, because of the strong prior and the unreliable gauge).

17. Example of head-to-head connection
    Derivation
    P(G = 0 | F = 0) = Σ_{B ∈ {0,1}} P(G = 0, B | F = 0)
                     = Σ_{B ∈ {0,1}} P(G = 0 | B, F = 0) P(B | F = 0)
                     = Σ_{B ∈ {0,1}} P(G = 0 | B, F = 0) P(B) = 0.81
    P(G = 0) = Σ_{B ∈ {0,1}} Σ_{F ∈ {0,1}} P(G = 0, B, F)
             = Σ_{B ∈ {0,1}} Σ_{F ∈ {0,1}} P(G = 0 | B, F) P(B) P(F) = 0.315

18. Example of head-to-head connection
    Probability of empty tank
    Posterior after observing that the battery is also flat:
    P(F = 0 | G = 0, B = 0) = P(G = 0 | F = 0, B = 0) P(F = 0 | B = 0) / P(G = 0 | B = 0) ≃ 0.111
    Note
    - The probability that the tank is empty decreases after observing that the battery is also flat.
    - The battery condition explains away the observation that the fuel gauge reads empty.
    - The probability is still greater than the prior, because the fuel gauge observation still gives some evidence in favour of an empty tank.
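
Both posteriors can be checked mechanically from the CPTs of slide 15 with the sum and product rules; a short sketch (the CPT values are from the slides, the code is ours):

```python
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # P(G=1 | B, F)

def joint(b, f, g):
    q = pG1[(b, f)]
    return pB[b] * pF[f] * (q if g == 1 else 1 - q)

# P(F=0 | G=0) = P(G=0, F=0) / P(G=0)
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(round(num / den, 3))    # 0.257

# P(F=0 | G=0, B=0): the flat battery explains the reading away
num2 = joint(0, 0, 0)
den2 = sum(joint(0, f, 0) for f in (0, 1))
print(round(num2 / den2, 3))  # 0.111
```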

19. d-separation
    General Head-to-head
    Let a descendant of a node x be any node which can be reached from x along a path following the direction of the arrows.
    A head-to-head node c unblocks the dependency path between its parents if either c itself or any of its descendants receives evidence.

20. General d-separation criterion
    d-separation definition
    Given a generic Bayesian network and arbitrary nonintersecting sets of nodes A, B, C, the sets A and B are d-separated by C (dsep(A; B | C)) if all paths from any node in A to any node in B are blocked.
    A path is blocked if it includes at least one node such that either:
    - the arrows on the path meet tail-to-tail or head-to-tail at the node, and the node is in C, or
    - the arrows on the path meet head-to-head at the node, and neither the node nor any of its descendants is in C.
    d-separation implies conditional independence
    The sets A and B are independent given C (A ⊥ B | C) if they are d-separated by C.
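
A compact sketch of this check, following the path-blocking definition literally; it is fine for small graphs, while practical implementations use a linear-time reachability algorithm instead. The representation (a dict mapping each node to its parents) and all names are ours:

```python
def descendants(graph, x):
    """All nodes reachable from x following the direction of the arrows."""
    children = {n: [m for m in graph if n in graph[m]] for n in graph}
    seen, stack = set(), list(children[x])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen

def d_separated(graph, A, B, C):
    """True iff every undirected path from A to B is blocked by C."""
    C = set(C)

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            if prev in graph[node] and nxt in graph[node]:   # head-to-head
                if node not in C and not (descendants(graph, node) & C):
                    return True   # collider with no evidence on it or below
            elif node in C:
                return True       # tail-to-tail or head-to-tail node in C
        return False

    def paths(node, target, visited):
        """Enumerate simple undirected paths from node to target."""
        if node == target:
            yield visited + [node]
            return
        neighbours = set(graph[node]) | {m for m in graph if node in graph[m]}
        for n in neighbours - set(visited) - {node}:
            yield from paths(n, target, visited + [node])

    return all(blocked(p) for a in A for b in B for p in paths(a, b, []))
```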

21. Example of general d-separation
    [Figure: graph with edges a → e, f → e, f → b, e → c]
    a ⊥̸ b | c
    Nodes a and b are not d-separated by c:
    - node f is tail-to-tail and not observed
    - node e is head-to-head and its child c is observed
    a ⊥ b | f
    Nodes a and b are d-separated by f:
    - node f is tail-to-tail and observed
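
Reusing the d_separated sketch from the previous slide on this example, assuming the edges a → e, f → e, f → b, e → c from the figure:

```python
g = {'a': [], 'f': [], 'e': ['a', 'f'], 'b': ['f'], 'c': ['e']}
print(d_separated(g, {'a'}, {'b'}, {'c'}))   # False: c unblocks the collider e
print(d_separated(g, {'a'}, {'b'}, {'f'}))   # True: f is tail-to-tail and observed
```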
