
Graphical Models - Part I
Greg Mori - CMPT 419/726
Bishop PRML Ch. 8, some slides from Russell and Norvig AIMA2e

Outline
• Probabilistic Models
• Bayesian Networks


Probabilistic Models

• We now turn our focus to probabilistic models for pattern recognition
• Probabilities express beliefs about uncertain events, useful for decision making and for combining sources of information
• The key quantity in probabilistic reasoning is the joint distribution

      p(x_1, x_2, ..., x_K)

  where x_1 to x_K are all variables in the model
• We address two problems:
  • Inference: answering queries given the joint distribution
  • Learning: deciding what the joint distribution is (involves inference)
• All inference and learning problems involve manipulations of the joint distribution

Reminder - Three Tricks

• Bayes' rule:

      p(Y|X) = p(X|Y) p(Y) / p(X) = α p(X|Y) p(Y)

• Marginalization:

      p(X) = Σ_y p(X, Y=y)   or   p(X) = ∫ p(X, Y=y) dy

• Product rule:

      p(X, Y) = p(X) p(Y|X)

• All three work with extra conditioning, e.g.:

      p(X|Z) = Σ_y p(X, Y=y | Z)
      p(Y|X, Z) = α p(X|Y, Z) p(Y|Z)
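The three tricks are easy to check numerically. Below is a minimal sketch, not from the slides: the joint table over two binary variables is invented for illustration, and the code recovers p(Y|X) both directly via the product rule and via Bayes' rule.

```python
# Hypothetical joint p(X, Y) over two binary variables;
# axis 0 indexes X, axis 1 indexes Y.
import numpy as np

p_xy = np.array([[0.30, 0.10],   # p(X=0,Y=0), p(X=0,Y=1)
                 [0.20, 0.40]])  # p(X=1,Y=0), p(X=1,Y=1)

# Marginalization: p(X) = sum_y p(X, Y=y)
p_x = p_xy.sum(axis=1)

# Product rule rearranged: p(Y|X) = p(X, Y) / p(X)
p_y_given_x = p_xy / p_x[:, None]

# Bayes' rule: p(Y|X) = alpha * p(X|Y) p(Y), alpha normalizes over Y
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y[None, :]
unnormalized = p_x_given_y * p_y[None, :]  # p(X|Y) p(Y), indexed [x, y]
bayes = unnormalized / unnormalized.sum(axis=1, keepdims=True)

assert np.allclose(bayes, p_y_given_x)  # both routes agree
```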

Joint Distribution

• Consider a model with 3 boolean random variables: cavity, catch, toothache

                    toothache            ¬toothache
                 catch    ¬catch      catch    ¬catch
    cavity       .108     .012        .072     .008
    ¬cavity      .016     .064        .144     .576

• Can answer queries such as

      p(¬cavity | toothache) = p(¬cavity, toothache) / p(toothache)
                             = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                             = 0.4

• In general, to answer a query on random variables Q = Q_1, ..., Q_N given evidence E = e, with E = E_1, ..., E_M, e = e_1, ..., e_M, and remaining unobserved variables H:

      p(Q | E=e) = p(Q, E=e) / p(E=e) = Σ_h p(Q, E=e, H=h) / Σ_{q,h} p(Q=q, E=e, H=h)

Problems

• The joint distribution is large: with K boolean random variables, 2^K entries
• Inference is slow: the summations above take O(2^K) time
• Learning is difficult: data is needed for 2^K parameters
• Analogous problems arise for continuous random variables
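As a sketch of how such a query is answered mechanically, the code below stores the slide's joint table as a NumPy array (the axis ordering and variable names are my own choices) and computes p(¬cavity | toothache) by summing out catch.

```python
import numpy as np

# joint[cavity, toothache, catch], with index 1 = true, 0 = false
joint = np.zeros((2, 2, 2))
joint[1, 1, 1] = 0.108  # cavity, toothache, catch
joint[1, 1, 0] = 0.012  # cavity, toothache, ¬catch
joint[1, 0, 1] = 0.072
joint[1, 0, 0] = 0.008
joint[0, 1, 1] = 0.016
joint[0, 1, 0] = 0.064
joint[0, 0, 1] = 0.144
joint[0, 0, 0] = 0.576

# p(¬cavity, toothache): fix cavity=0, toothache=1, sum over catch
numerator = joint[0, 1, :].sum()
# p(toothache): sum over cavity and catch
denominator = joint[:, 1, :].sum()

print(numerator / denominator)  # 0.4
```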

Reminder - Independence

• A and B are independent iff

      p(A|B) = p(A)   or   p(B|A) = p(B)   or   p(A, B) = p(A) p(B)

[Figure: the joint over Toothache, Catch, Cavity, Weather decomposes into {Toothache, Catch, Cavity} and {Weather}]

• p(Toothache, Catch, Cavity, Weather) = p(Toothache, Catch, Cavity) p(Weather)
• 32 entries reduced to 12 (Weather takes one of 4 values)
• Absolute independence is powerful but rare
• Dentistry is a large field with hundreds of variables, none of which are independent. What to do?

Reminder - Conditional Independence

• p(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries
• If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:

      (1) p(catch | toothache, cavity) = p(catch | cavity)

• The same independence holds if I haven't got a cavity:

      (2) p(catch | toothache, ¬cavity) = p(catch | ¬cavity)

• Catch is conditionally independent of Toothache given Cavity:

      p(Catch | Toothache, Cavity) = p(Catch | Cavity)

• Equivalent statements:
  • p(Toothache | Catch, Cavity) = p(Toothache | Cavity)
  • p(Toothache, Catch | Cavity) = p(Toothache | Cavity) p(Catch | Cavity)
  • Toothache ⊥⊥ Catch | Cavity

Conditional Independence contd.

• Write out the full joint distribution using the chain rule:

      p(Toothache, Catch, Cavity)
        = p(Toothache | Catch, Cavity) p(Catch, Cavity)
        = p(Toothache | Catch, Cavity) p(Catch | Cavity) p(Cavity)
        = p(Toothache | Cavity) p(Catch | Cavity) p(Cavity)

  i.e. 2 + 2 + 1 = 5 independent numbers
• In many cases, the use of conditional independence greatly reduces the size of the representation of the joint distribution

Graphical Models

• Graphical models provide a visual depiction of a probabilistic model
• Conditional independence assumptions can be seen in the graph
• Inference and learning algorithms can be expressed in terms of graph operations
• We will look at 2 types of graph (which can be combined):
  • Directed graphs: Bayesian networks
  • Undirected graphs: Markov Random Fields
  • Factor graphs (won't cover)
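Before turning to graphs, a quick numerical check: the joint table from the earlier slide does satisfy Toothache ⊥⊥ Catch | Cavity, so the five-number factorization rebuilds all eight entries. A minimal sketch, with my own axis ordering:

```python
import numpy as np

# joint[cavity, toothache, catch], index 1 = true, 0 = false
joint = np.array([[[0.576, 0.144],    # ¬cavity, ¬toothache, (¬catch, catch)
                   [0.064, 0.016]],   # ¬cavity, toothache
                  [[0.008, 0.072],    # cavity, ¬toothache
                   [0.012, 0.108]]])  # cavity, toothache

p_cav = joint.sum(axis=(1, 2))                      # p(Cavity)
p_t_given_cav = joint.sum(axis=2) / p_cav[:, None]  # p(Toothache | Cavity)
p_c_given_cav = joint.sum(axis=1) / p_cav[:, None]  # p(Catch | Cavity)

# Conditional independence: p(T, C | Cav) = p(T | Cav) p(C | Cav)
p_tc_given_cav = joint / p_cav[:, None, None]
factored = p_t_given_cav[:, :, None] * p_c_given_cav[:, None, :]
assert np.allclose(p_tc_given_cav, factored)

# The factorization rebuilds all 8 joint entries from 5 numbers
rebuilt = factored * p_cav[:, None, None]
assert np.allclose(rebuilt, joint)
```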

Bayesian Networks

• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions
• Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (link ≈ "directly influences")
  • a conditional distribution for each node given its parents: p(X_i | pa(X_i))
• In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values

Example

[Figure: Weather as an isolated node; Cavity with arrows to Toothache and Catch]

• Topology of the network encodes conditional independence assertions:
  • Weather is independent of the other variables
  • Toothache and Catch are conditionally independent given Cavity

Example

• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
• Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

Example contd.

[Figure: Burglary and Earthquake with arrows to Alarm; Alarm with arrows to JohnCalls and MaryCalls]

    P(B) = .001        P(E) = .002

    B  E | P(A|B,E)       A | P(J|A)       A | P(M|A)
    T  T |   .95          T |  .90         T |  .70
    T  F |   .94          F |  .05         F |  .01
    F  T |   .29
    F  F |   .001
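A minimal sketch (my own encoding, not from the course materials) of these CPTs as Python dictionaries keyed by parent values; each entry stores the probability the variable is true given its parents, and p(false | ...) is one minus that.

```python
p_burglary = 0.001                                   # P(B)
p_earthquake = 0.002                                 # P(E)
p_alarm = {(True, True): 0.95, (True, False): 0.94,  # P(A | B, E)
           (False, True): 0.29, (False, False): 0.001}
p_john = {True: 0.90, False: 0.05}                   # P(J | A)
p_mary = {True: 0.70, False: 0.01}                   # P(M | A)

# Example lookup: probability the alarm rings given a burglary, no quake
print(p_alarm[(True, False)])  # 0.94
```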

Compactness

• A CPT for boolean X_i with k boolean parents has 2^k rows for the combinations of parent values
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers
• i.e. grows linearly with n, vs. O(2^n) for the full joint distribution
• For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Global Semantics

• Global semantics defines the full joint distribution as the product of the local conditional distributions:

      P(x_1, ..., x_n) = ∏_{i=1}^n P(x_i | pa(X_i))

• e.g.

      P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
                             = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
                             ≈ 0.00063

Constructing Bayesian Networks

• Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
  1. Choose an ordering of variables X_1, ..., X_n
  2. For i = 1 to n: add X_i to the network and select parents from X_1, ..., X_{i−1} such that

         p(X_i | pa(X_i)) = p(X_i | X_1, ..., X_{i−1})

• This choice of parents guarantees the global semantics:

      p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | X_1, ..., X_{i−1})   (chain rule)
                       = ∏_{i=1}^n p(X_i | pa(X_i))             (by construction)
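The worked example above is a one-line product; the snippet below simply checks the slide's arithmetic (the variable name p is mine).

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as a product of the burglary net's CPT entries
p = 0.90 * 0.70 * 0.001 * (1 - 0.001) * (1 - 0.002)
#   P(j|a)  P(m|a)  P(a|¬b,¬e)  P(¬b)      P(¬e)
print(p)  # ≈ 0.00063
```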

Example

Suppose we choose the ordering M, J, A, B, E. We add MaryCalls, JohnCalls, Alarm, Burglary, and Earthquake in turn, testing which of the earlier variables must be parents:

      P(J|M) = P(J)?               No
      P(A|J,M) = P(A|J)?           No
      P(A|J,M) = P(A)?             No
      P(B|A,J,M) = P(B|A)?         Yes
      P(B|A,J,M) = P(B)?           No
      P(E|B,A,J,M) = P(E|A)?       No
      P(E|B,A,J,M) = P(E|A,B)?     Yes

[Figure: the network built in this order, with arrows MaryCalls → JohnCalls, {MaryCalls, JohnCalls} → Alarm, Alarm → Burglary, and {Alarm, Burglary} → Earthquake]
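The Yes/No answers above can be verified numerically from the original burglary network. A self-contained sketch, assuming the CPTs from the earlier slide: it enumerates the full joint and compares, for instance, P(J|M) against P(J).

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, v):
    """Probability of boolean value v given p(variable = True)."""
    return p_true if v else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint via the product of local conditionals."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def marginal(**fixed):
    """Sum the joint over all variables not pinned in `fixed`."""
    total = 0.0
    for b, e, a, j, m in product([False, True], repeat=5):
        assignment = dict(b=b, e=e, a=a, j=j, m=m)
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(b, e, a, j, m)
    return total

# P(J=T | M=T) vs P(J=T): they differ, so with the ordering M, J, ...
# MaryCalls must be made a parent of JohnCalls
p_j_given_m = marginal(j=True, m=True) / marginal(m=True)
p_j = marginal(j=True)
print(p_j_given_m, p_j)  # differ -> P(J|M) != P(J)
```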
