SLIDE 1 Bayesian Belief Networks
(also called Bayes Nets)
Interesting because:
- The Naive Bayes assumption of conditional independence of attributes is too restrictive.
  (But it’s intractable without some such assumptions...)
- Bayesian Belief Networks describe conditional independence among subsets of variables.
  This allows combining prior knowledge about (in)dependencies among variables with observed training data.
SLIDE 2
Conditional Independence
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given a value of Z:
(∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)
More compactly, we write P(X|Y, Z) = P(X|Z).
Note: Naive Bayes uses conditional independence to justify
P(A1, A2|V) = P(A1|A2, V) P(A2|V) = P(A1|V) P(A2|V)
Generalizing the above definition:
P(X1 ... Xl | Y1 ... Ym, Z1 ... Zn) = P(X1 ... Xl | Z1 ... Zn)
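As a small numeric illustration of this definition, the following Python sketch builds a toy joint distribution that factorizes as P(x, y, z) = P(z) P(x|z) P(y|z) (so X is conditionally independent of Y given Z by construction) and checks that P(X|Y, Z) = P(X|Z); all numbers are illustrative, not taken from the slides:

    # Toy check of conditional independence: X is independent of Y given Z.
    P_Z = {0: 0.5, 1: 0.5}
    P_X_given_Z = {0: 0.2, 1: 0.9}   # P(X=1 | Z=z)
    P_Y_given_Z = {0: 0.4, 1: 0.7}   # P(Y=1 | Z=z)

    def p_xyz(x, y, z):
        # Joint probability under the factorization P(z) * P(x|z) * P(y|z)
        px = P_X_given_Z[z] if x else 1 - P_X_given_Z[z]
        py = P_Y_given_Z[z] if y else 1 - P_Y_given_Z[z]
        return P_Z[z] * px * py

    for z in (0, 1):
        for y in (0, 1):
            # P(X=1 | Y=y, Z=z), computed from the joint, should equal P(X=1 | Z=z)
            p_x_given_yz = p_xyz(1, y, z) / (p_xyz(1, y, z) + p_xyz(0, y, z))
            print(f"P(X=1|Y={y},Z={z}) = {p_x_given_yz:.3f}  vs  P(X=1|Z={z}) = {P_X_given_Z[z]}")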
SLIDE 3 A Bayes Net
[Figure: a Bayes net with nodes Storm, BusTourGroup, Lightning, Campfire, Thunder, and ForestFire, together with the conditional probability table for Campfire given its parents Storm (S) and BusTourGroup (B):]

           S,B    S,¬B   ¬S,B   ¬S,¬B
    C      0.4    0.1    0.8    0.2
   ¬C      0.6    0.9    0.2    0.8
The network is defined by
- A directed acyclic graph, representing a set of conditional independence
assertions: each node (representing a random variable) is asserted to be conditionally independent of its nondescendants, given its immediate predecessors. Example: P(Thunder|ForestFire, Lightning) = P(Thunder|Lightning)
- A table of local conditional probabilities for each node/variable.
SLIDE 4 A Bayes Net (Cont’d)
A Bayes net represents the joint probability distribution over all variables Y1, Y2, ..., Yn. This joint distribution is fully defined by the graph plus the local conditional probabilities:
P(y1, ..., yn) = P(Y1 = y1, ..., Yn = yn) = ∏_{i=1}^{n} P(yi | Parents(Yi))
where Parents(Yi) denotes the immediate predecessors of Yi in the graph.
In our example: P(Storm, BusTourGroup, ..., ForestFire) is obtained as one such product of local table entries.
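As a concrete illustration of this factorization, here is a minimal Python sketch for the three-node fragment Storm, BusTourGroup, Campfire. The Campfire table is the one from the previous slide; the priors for Storm and BusTourGroup are made-up illustrative values, since the slides do not give them:

    # Joint probability as a product of local conditional probabilities:
    # P(storm, bus, campfire) = P(storm) * P(bus) * P(campfire | storm, bus)
    P_storm = {True: 0.3, False: 0.7}   # assumed prior (not given on the slides)
    P_bus   = {True: 0.5, False: 0.5}   # assumed prior (not given on the slides)
    P_campfire = {                      # P(Campfire=True | Storm, BusTourGroup), from the CPT above
        (True, True): 0.4, (True, False): 0.1,
        (False, True): 0.8, (False, False): 0.2,
    }

    def joint(storm, bus, campfire):
        p_c = P_campfire[(storm, bus)]
        if not campfire:
            p_c = 1.0 - p_c
        return P_storm[storm] * P_bus[bus] * p_c

    print(joint(True, True, True))   # 0.3 * 0.5 * 0.4 = 0.06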
SLIDE 5
Inference in Bayesian Nets
Question: Given a Bayes net, can one infer the probabilities of values of one or more network variables, given the observed values of (some) others?
Example: Given the Bayes net below, compute: (a) P(S), (b) P(A, S), (c) P(A).
[Figure: a Bayes net with nodes L, F, S, A, G; L and F are the parents of S, and S is the parent of both A and G.]
P(L) = 0.4    P(F) = 0.6
P(S|L,F) = 0.8    P(S|L,¬F) = 0.6    P(S|¬L,F) = 0.5    P(S|¬L,¬F) = 0.3
P(A|S) = 0.7    P(A|¬S) = 0.3
P(G|S) = 0.8    P(G|¬S) = 0.2
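One way to answer (a)-(c) is brute-force enumeration: write the joint as the product of the local tables above and sum out the unobserved variables. A minimal Python sketch (the structure L, F → S and S → A, G is read off the figure):

    from itertools import product

    P_L = 0.4
    P_F = 0.6
    P_S = {(True, True): 0.8, (True, False): 0.6,    # P(S=True | L, F)
           (False, True): 0.5, (False, False): 0.3}
    P_A = {True: 0.7, False: 0.3}                    # P(A=True | S)
    P_G = {True: 0.8, False: 0.2}                    # P(G=True | S)

    def bern(p_true, value):
        return p_true if value else 1.0 - p_true

    def joint(l, f, s, a, g):
        # Bayes-net factorization of the joint distribution
        return (bern(P_L, l) * bern(P_F, f) * bern(P_S[(l, f)], s) *
                bern(P_A[s], a) * bern(P_G[s], g))

    def marginal(**fixed):
        # Sum the joint over every variable the caller did not fix
        names = ("l", "f", "s", "a", "g")
        total = 0.0
        for values in product((True, False), repeat=len(names)):
            assignment = dict(zip(names, values))
            if all(assignment[k] == v for k, v in fixed.items()):
                total += joint(**assignment)
        return total

    print(marginal(s=True))            # (a) P(S)    = 0.54
    print(marginal(a=True, s=True))    # (b) P(A, S) = 0.378
    print(marginal(a=True))            # (c) P(A)    = 0.516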
SLIDE 6 Inference in Bayesian Nets (Cont’d)
Answer(s):
- If only one variable has an unknown (probability) value, then it is easy to infer it.
- In the general case, we can compute the probability distribution for any subset of network variables, given the distribution for any subset of the remaining variables. But...
- Exact inference of probabilities for an arbitrary Bayes net is an NP-hard problem!!
SLIDE 7 Inference in Bayesian Nets (Cont’d)
In practice, we can succeed in many cases:
- Exact inference methods work well for some net structures.
- Monte Carlo methods “simulate” the network randomly
to calculate approximate solutions [Pradhan & Dagum, 1996]. (In theory, even approximate inference of probabilities in Bayes Nets can be NP-hard!! [Dagum & Luby, 1993])
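To illustrate the Monte Carlo idea on the small net from Slide 5, here is a hedged sketch of plain forward (ancestral) sampling; this is only the simplest such scheme, not the specific estimators of the cited papers:

    import random

    def sample_A_once():
        # Sample each variable given its (already sampled) parents, then read off A.
        l = random.random() < 0.4                                 # P(L)
        f = random.random() < 0.6                                 # P(F)
        p_s = {(True, True): 0.8, (True, False): 0.6,
               (False, True): 0.5, (False, False): 0.3}[(l, f)]   # P(S | L, F)
        s = random.random() < p_s
        a = random.random() < (0.7 if s else 0.3)                 # P(A | S)
        return a

    N = 100_000
    print(sum(sample_A_once() for _ in range(N)) / N)   # ≈ 0.516, the exact value of P(A)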
SLIDE 8 Learning Bayes Nets (I)
There are several variants of this learning task:
- The network structure might be either known or unknown
(in the latter case it has to be inferred from the training data).
- The training examples might provide values of all network
variables, or just of some of them.
The simplest case: if the structure is known and we can observe the values of all the variables in the training examples, then it is easy to estimate the conditional probability table entries. (Analogous to training a Naive Bayes classifier; see the sketch below.)
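A minimal sketch of this simplest case for a single node (Campfire, with parents Storm and BusTourGroup): each CPT entry is just the relative frequency of the child value for that parent configuration. The five training examples are invented for illustration:

    from collections import Counter

    # Fully observed training examples (invented for illustration)
    data = [
        {"Storm": True,  "BusTourGroup": True,  "Campfire": True},
        {"Storm": True,  "BusTourGroup": False, "Campfire": False},
        {"Storm": False, "BusTourGroup": True,  "Campfire": True},
        {"Storm": False, "BusTourGroup": True,  "Campfire": True},
        {"Storm": False, "BusTourGroup": False, "Campfire": False},
    ]

    def estimate_cpt(examples, child, parents):
        # Relative frequency of child=True for each configuration of the parents
        true_counts, totals = Counter(), Counter()
        for ex in examples:
            key = tuple(ex[p] for p in parents)
            totals[key] += 1
            if ex[child]:
                true_counts[key] += 1
        return {key: true_counts[key] / totals[key] for key in totals}

    print(estimate_cpt(data, "Campfire", ["Storm", "BusTourGroup"]))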
SLIDE 9 Learning Bayes Nets (II)
When
- the structure of the Bayes Net is known, and
- the variables are only partially observable in the training data,
learning the entries in the conditional probability tables is similar to (learning the weights of hidden units when) training a neural network with hidden units:
− We can learn the net’s conditional probability tables using gradient ascent!
− We converge to the network h that (locally) maximizes P(D|h).
SLIDE 10 Gradient Ascent for Bayes Nets
Let wijk denote one entry in the conditional probability table for the variable Yi in the network:
wijk = P(Yi = yij | Parents(Yi) = the list uik of values)
It can be shown (see the next two slides) that
∂ ln Ph(D) / ∂wijk = Σ_{d∈D} Ph(yij, uik | d) / wijk
We can therefore perform gradient ascent by repeatedly:
1. updating all wijk using the training data D:
   wijk ← wijk + η Σ_{d∈D} Ph(yij, uik | d) / wijk
2. renormalizing the wijk to assure that Σ_j wijk = 1 and 0 ≤ wijk ≤ 1
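A hedged sketch of one update-plus-renormalization pass in Python. The nested dict w[i][k][j] holds the entries wijk, and posterior(i, j, k, d) stands for whatever inference routine computes Ph(yij, uik | d); both names are assumptions chosen for illustration, not a fixed API:

    def gradient_ascent_step(w, data, posterior, eta=0.01):
        # w[i][k][j] = w_ijk = P(Y_i = y_ij | Parents(Y_i) = u_ik)
        # posterior(i, j, k, d) = P_h(y_ij, u_ik | d), computed by some inference routine
        # Step 1: gradient ascent update of every table entry
        for i in w:
            for k in w[i]:
                for j in w[i][k]:
                    grad = sum(posterior(i, j, k, d) / w[i][k][j] for d in data)
                    w[i][k][j] += eta * grad
        # Step 2: renormalize so that sum_j w_ijk = 1 for every parent configuration u_ik
        for i in w:
            for k in w[i]:
                total = sum(w[i][k].values())
                for j in w[i][k]:
                    w[i][k][j] /= total
        return w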
SLIDE 11 Gradient Ascent for Bayes Nets: Calculus
∂ ln Ph(D) / ∂wijk = ∂/∂wijk ln ∏_{d∈D} Ph(d)
                   = Σ_{d∈D} ∂ ln Ph(d) / ∂wijk
                   = Σ_{d∈D} (1 / Ph(d)) · ∂ Ph(d) / ∂wijk
Summing over all values yij′ of Yi, and uik′ of Ui = Parents(Yi):
∂ ln Ph(D) / ∂wijk = Σ_{d∈D} (1 / Ph(d)) · ∂/∂wijk Σ_{j′k′} Ph(d | yij′, uik′) Ph(yij′, uik′)
                   = Σ_{d∈D} (1 / Ph(d)) · ∂/∂wijk Σ_{j′k′} Ph(d | yij′, uik′) Ph(yij′ | uik′) Ph(uik′)
Note that wijk ≡ Ph(yij | uik), therefore...
SLIDE 12 Gradient Ascent for Bayes Nets: Calculus (Cont’d)
∂ ln Ph(D) / ∂wijk = Σ_{d∈D} (1 / Ph(d)) · ∂/∂wijk [ Ph(d | yij, uik) wijk Ph(uik) ]
                   = Σ_{d∈D} (1 / Ph(d)) · Ph(d | yij, uik) Ph(uik)
(applying Bayes’ theorem)
                   = Σ_{d∈D} (1 / Ph(d)) · Ph(yij, uik | d) Ph(d) Ph(uik) / Ph(yij, uik)
                   = Σ_{d∈D} Ph(yij, uik | d) Ph(uik) / Ph(yij, uik)
                   = Σ_{d∈D} Ph(yij, uik | d) / Ph(yij | uik)
                   = Σ_{d∈D} Ph(yij, uik | d) / wijk
SLIDE 13 Learning Bayes Nets (II, Cont’d)
The EM algorithm (see next slides) can also be used. Repeatedly:
- 1. Calculate/estimate from the data the probabilities of the unobserved
variables, assuming that the current hypothesis h (i.e., the current values of wijk) holds.
- 2. Calculate a new h (i.e., new values of wijk) so as to maximize
E[ln P(D|h)], where D now includes both the observed and the unobserved variables.
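A bare skeleton of this loop, with expectation_step and maximization_step as placeholder names for the two operations described above (they are assumptions chosen for illustration, not a specific library API):

    def em_for_bayes_net(w, data, expectation_step, maximization_step, n_iters=50):
        for _ in range(n_iters):
            # 1. estimate the probabilities of the unobserved variables under the current hypothesis w
            expected = expectation_step(w, data)
            # 2. choose new CPT entries w maximizing E[ln P(D|h)] given those expectations
            w = maximization_step(expected, data)
        return w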
SLIDE 14 Learning Bayes Nets (III)
When the structure is unknown, algorithms usually use greedy search to trade off network complexity (adding/removing edges or nodes) against degree of fit to the data. Example: the K2 algorithm [Cooper & Herskovits, 1992]: when data is fully observable, use a score metric to choose among alternative networks. They report an experiment on (re-)learning a network with 37 nodes and 46 arcs describing anesthesia problems in a hospital
operating room. Using 3000 examples, the program succeeds
almost perfectly: it misses one arc and adds an arc which is not in the original net.
SLIDE 15 Summary: Bayesian Belief Networks
- Combine prior knowledge with observed data
- The impact of prior knowledge (when correct!) is to lower
the sample complexity
- Active/Recent research area
– Extend from boolean to real-valued variables
– Parameterized distributions instead of tables
– Extend to first-order instead of propositional systems
– More effective inference methods
– ...