
Probabilistic Reasoning wrt Time (Chapter 15, R&N)



  1. Probabilistic Reasoning wrt Time (Chapter 15, R&N)

  2. Decision Theoretic Agents
     • Introduction to Probability [Ch13]
     • Belief networks [Ch14]
     • Dynamic Belief Networks [Ch15]
        - Foundations
        - Markov Chains (Classification)
        - Hidden Markov Models (HMM)
        - Kalman Filter
        - General: Dynamic Belief Networks (DBN)
        - Applications
        - Future Work, Extensions, ...
     • Single Decision [Ch16]
     • Sequential Decisions [Ch17]
     • Game Theory [Ch 17.6 – 17.7]

  3. Markovian Models
     • In general, X_{t+1} depends on everything: X_t, X_{t-1}, …
     • Markovian means: the future is independent of the past once you know the present
          P(X_{t+1} | X_t, X_{t-1}, …) = P(X_{t+1} | X_t)
     • Markov Chain: the "state" (everything important) is visible
          P(x_{t+1} | x_t, 〈everything else〉) = P(x_{t+1} | x_t)
     • Eg, First-Order Markov Chains:
          1. Random walk along the x axis, changing x-position ±1 at each time step (see the sketch below)
          2. Predicting rain
     • Stationarity: P(x_2 | x_1) = P(x_3 | x_2) = … = P(x_{t+1} | x_t)
     • Hidden Markov Model: state information is not visible
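A minimal sketch of example 1, the ±1 random walk, in Python; the step probability p_up and the starting point are assumed parameters, not values from the slides:

```python
# First-order Markov chain: the next position depends only on the
# current one. p_up is an illustrative assumption.
import random

def random_walk(steps, p_up=0.5, x0=0):
    """Simulate the chain: x moves +1 with probability p_up, else -1."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        x += 1 if random.random() < p_up else -1
        trajectory.append(x)
    return trajectory

print(random_walk(10))  # e.g. [0, 1, 0, 1, 2, 3, 2, 1, 2, 3, 4]
```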

  4. Using Markov Chain, for Classification
     • Two classes of DNA… with different di-nucleotide distributions
     • Use this to classify a nucleotide sequence x = 〈ACATTGACCA…〉:
          P(x | +) = p_+(x_1) p_+(x_2 | x_1) p_+(x_3 | x_2) … p_+(x_k | x_{k-1})
                   = ∏_{i=1..k} p_+(x_i | x_{i-1}) = ∏_{i=1..k} a^+_{x_{i-1} x_i}
       using the Markov property

  5. Using Markov Chain, for Classification
     • Is x = 〈ACATTGACCAT〉 positive?
          P(x | +) = p_+(x_1) p_+(x_2 | x_1) p_+(x_3 | x_2) … p_+(x_k | x_{k-1})
                   = p_+(A) p_+(C | A) p_+(A | C) … p_+(T | A)
                   = 0.25 × 0.274 × 0.171 × … × 0.355
          P(x | –) = p_–(x_1) p_–(x_2 | x_1) p_–(x_3 | x_2) … p_–(x_k | x_{k-1})
                   = p_–(A) p_–(C | A) p_–(A | C) … p_–(T | A)
                   = 0.25 × 0.205 × 0.322 × … × 0.239
     • Pick the larger: classify as + if P(x | +) > P(x | –) (see the sketch below)
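A sketch of this classification rule. Only a few transition entries appear on the slide (p_+(C|A) = 0.274, p_–(C|A) = 0.205, etc.); the full 4×4 tables below are uniform placeholders, so the numbers are illustrative only. Summing log-probabilities instead of multiplying avoids underflow on long sequences:

```python
import math

p0 = {b: 0.25 for b in "ACGT"}  # initial distribution (slide uses 0.25)
p_plus  = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}  # placeholder table
p_minus = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}  # placeholder table
p_plus["A"]["C"], p_minus["A"]["C"] = 0.274, 0.205        # from the slide
p_plus["C"]["A"], p_minus["C"]["A"] = 0.171, 0.322        # from the slide

def log_likelihood(x, trans):
    """log P(x) under a first-order Markov chain with transition table trans."""
    ll = math.log(p0[x[0]])
    for prev, cur in zip(x, x[1:]):
        ll += math.log(trans[prev][cur])
    return ll

x = "ACATTGACCAT"
label = "+" if log_likelihood(x, p_plus) > log_likelihood(x, p_minus) else "-"
print(label)
```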

  6. Results (Markov Chain)
     • [Table: confusion counts over 48 sequences, actual class vs Predict + / Predict –]
     • Here: everything is visible
     • Sometimes, can't see the "states"

  7. Phydeaux, the Dog
     • Known correlations:
        - state {G, H} to observations {s, f, y}
        - state {G, H} on day t to state {G, H} on day t+1
     • Sometimes Grumpy, sometimes Happy, but he hides his emotional state…
       Only observations: {slobbers, frowns, yelps}
     • Transitions: P(G→G) = 0.85, P(G→H) = 0.15; P(H→H) = 0.95, P(H→G) = 0.05
     • Emissions:
          p(s | g) = 0.15    p(s | h) = 0.80
          p(f | g) = 0.75    p(f | h) = 0.15
          p(y | g) = 0.10    p(y | h) = 0.05
     • Challenge: given the observation sequence 〈s, s, f, y, y, f, …〉,
       what were Phydeaux's states? 〈H, H, H, G, G, G, …〉?? (see the sketch below)
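This challenge is HMM task 5 below ("most likely explanation"), usually solved with the Viterbi algorithm. A minimal sketch using the transition and emission numbers reconstructed above; the uniform initial distribution is an assumption, since the slide gives none:

```python
trans = {"G": {"G": 0.85, "H": 0.15}, "H": {"G": 0.05, "H": 0.95}}
emit  = {"G": {"s": 0.15, "f": 0.75, "y": 0.10},
         "H": {"s": 0.80, "f": 0.15, "y": 0.05}}
init  = {"G": 0.5, "H": 0.5}  # assumed: no initial distribution on the slide

def viterbi(obs):
    """Most likely state sequence for an observation list (HMM task 5)."""
    states = list(init)
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (init[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {s: max(((p * trans[r][s] * emit[s][o], path + [s])
                        for r, (p, path) in best.items()),
                       key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(list("ssfyyf")))  # ['H','H','G','G','G','G'] under these assumed parameters
```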

  8. Umbrella + Rain Situation
     • State: X_t ∈ {+rain, –rain}
     • Observation: E_t ∈ {+umbrella, –umbrella}
     • Simple belief net: the chain R_0 → R_1 → R_2 → …, with each U_t a child of R_t
     • Note: Umbrella_t depends only on Rain_t; Rain_t depends only on Rain_{t-1}

  9. HMM Tasks
     1. Filtering / Monitoring: P(X_t | e_{1:t})
        What is P(R_3 = + | U_1 = +, U_2 = +, U_3 = –)?
        Need a distribution over the current state to make rational decisions
     2. Prediction: P(X_{t+k} | e_{1:t})
        What is P(R_5 = – | U_1 = +, U_2 = +, U_3 = –)?
        Use to evaluate possible courses of action
     3. Smoothing / Hindsight: P(X_{t-k} | e_{1:t})
        What is P(R_1 = – | U_1 = +, U_2 = +, U_3 = –)?
     4. Likelihood: P(e_{1:t})
        What is P(U_1 = +, U_2 = +, U_3 = –)?
        For comparing different models … classification
     5. Most likely explanation: argmax_{x_{1:t}} P(x_{1:t} | e_{1:t})
        Given 〈U_1 = +, U_2 = +, U_3 = –〉, what is the most likely value for 〈R_1, R_2, R_3〉?
        Compute assignments, for DNA, sounds, …

  10. 1. Filtering
     • At time 3: have P(R_2 | u_{1:2}) = 〈P(+r_2 | +u_1, +u_2), P(–r_2 | +u_1, +u_2)〉
     • … then observe u_3 = –
     • P(R_3 | u_{1:3}) = P(R_3 | u_{1:2}, u_3)
                        = α P(u_3 | R_3, u_{1:2}) P(R_3 | u_{1:2})   [α = 1/P(u_3 | u_{1:2})]
                        = α P(u_3 | R_3) P(R_3 | u_{1:2})
     • P(R_3 | u_{1:2}) = Σ_{r_2} P(R_3, r_2 | u_{1:2})
                        = Σ_{r_2} P(R_3 | r_2, u_{1:2}) P(r_2 | u_{1:2})
                        = Σ_{r_2} P(R_3 | r_2) P(r_2 | u_{1:2})

  11. 1. Filtering
     • At time t: have P(X_t | e_{1:t})
     • … then update from e_{t+1}:
          P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
       where P(e_{t+1} | X_{t+1}) is the emission probability, P(X_{t+1} | x_t) the transition
       probability, and P(x_t | e_{1:t}) the state distribution at time t
     • Called the "Forward Algorithm" (see the sketch after slide 13)

  12. P(x_t, e_{1:t}) vs P(x_t | e_{1:t})
     To compute P(X_t = a | e_{1:t}): just compute 〈P(X_t = 1, e_{1:t}), …, P(X_t = k, e_{1:t})〉, then
     1. Compute P(e_{1:t}) = Σ_i P(X_t = i, e_{1:t})
     2. Return P(X_t = a | e_{1:t}) = P(X_t = a, e_{1:t}) / P(e_{1:t})
                                    = P(X_t = a, e_{1:t}) / Σ_i P(X_t = i, e_{1:t})
     Normalizing constant: α = 1 / P(e_{1:t})

  13. Filtering – Forward Algorithm
     • Let f_{1:t} = P(X_t | e_{1:t}) = 〈P(X_t = 1 | e_{1:t}), …, P(X_t = r | e_{1:t})〉
          f_{1:t+1}(x_{t+1}) = P(x_{t+1} | e_{1:t+1}) = α P(e_{t+1} | x_{t+1}) Σ_{x_t} P(x_{t+1} | x_t) f_{1:t}(x_t)
     • f_{1:t+1} = α Forward(f_{1:t}, e_{t+1}): the update needs only f_{1:t} and e_{t+1},
       detached from the rest of the evidence!
     • Update (for discrete state variables): constant time & constant space!
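A sketch of one Forward() update for a discrete-state HMM, directly following the recursion above; each call touches only the fixed-size vector f and the fixed model tables, hence constant time and space per step:

```python
def forward_step(f, transition, emission_col):
    """One filtering update.
    f[i]            = P(X_t = i | e_1:t)
    transition[i][j] = P(X_{t+1} = j | X_t = i)
    emission_col[j]  = P(e_{t+1} | X_{t+1} = j)
    Returns f' with f'[j] = P(X_{t+1} = j | e_1:t+1)."""
    n = len(f)
    predicted = [sum(transition[i][j] * f[i] for i in range(n))  # one-step prediction
                 for j in range(n)]
    unnorm = [emission_col[j] * predicted[j] for j in range(n)]  # weight by evidence
    alpha = 1.0 / sum(unnorm)                                    # normalizing constant
    return [alpha * u for u in unnorm]
```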

  14. Filtering Process
     [Diagram: predict State.t from State.t-1; update State.t from Percept.t;
      predict State.t+1 from State.t; …]

  15. Forward() Process
     Given: P(R_0) = 〈0.5, 0.5〉; evidence 〈U_1 = +, U_2 = +〉
     • Predict state distribution (before evidence):
          P(R_1) = Σ_{r_0} P(R_1 | r_0) P(r_0) = 〈0.7, 0.3〉 × 0.5 + 〈0.2, 0.8〉 × 0.5 = 〈0.45, 0.55〉
     • Incorporate "Day 1 evidence" +u_1:
          P(R_1 | +u_1) = α P(+u_1 | R_1) P(R_1) = α 〈0.9, 0.2〉 .* 〈0.45, 0.55〉
                        = α 〈0.405, 0.11〉 ≈ 〈0.786, 0.214〉
     • Predict (from t = 1 to t = 2, before new evidence):
          P(R_2 | +u_1) = Σ_{r_1} P(R_2 | r_1) P(r_1 | +u_1)
                        = 〈0.7, 0.3〉 × 0.786 + 〈0.2, 0.8〉 × 0.214 ≈ 〈0.593, 0.407〉
     • Incorporate "Day 2 evidence" +u_2:
          P(R_2 | +u_1, +u_2) = α P(+u_2 | R_2) P(R_2 | +u_1) = α 〈0.9, 0.2〉 .* 〈0.593, 0.407〉
                              = α 〈0.533, 0.081〉 ≈ 〈0.868, 0.132〉
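The same two days of evidence can be checked numerically; this small self-contained script (states ordered (+r, –r), with numpy's elementwise product standing in for .*) reproduces the slide's numbers:

```python
import numpy as np

T = np.array([[0.7, 0.3],    # P(R_{t+1} | R_t = +r)
              [0.2, 0.8]])   # P(R_{t+1} | R_t = -r)
O_u = np.array([0.9, 0.2])   # P(+u | +r), P(+u | -r)

f = np.array([0.5, 0.5])     # P(R_0)
for day in (1, 2):
    f = T.T @ f              # predict: sum_r P(R' | r) P(r | evidence so far)
    f = O_u * f              # weight by the day's evidence +u
    f = f / f.sum()          # normalize
    print(day, f)            # day 1: ~[0.786 0.214]; day 2: ~[0.868 0.132]
```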

  16. HMM Tasks (recap of slide 9)

  17. 4. Likelihood
     • How to compute the likelihood P(e_{1:t})?
     • Let L_{1:t} = P(X_t, e_{1:t})
          L_{1:t+1} = P(X_{t+1}, e_{1:t+1}) = Σ_{x_t} P(x_t, X_{t+1}, e_{1:t}, e_{t+1})
                    = Σ_{x_t} P(e_{t+1} | X_{t+1}, x_t, e_{1:t}) P(X_{t+1} | x_t, e_{1:t}) P(x_t, e_{1:t})
                    = P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) L_{1:t}(x_t)
     • Note: same Forward() algorithm!!
     • To compute the actual likelihood: P(e_{1:t}) = Σ_{x_t} P(X_t = x_t, e_{1:t}) = Σ_{x_t} L_{1:t}(x_t)
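A sketch of this computation: the same propagation as Forward(), but without normalizing, so the vector holds L_{1:t}(x) = P(x, e_{1:t}) and its sum is P(e_{1:t}). The umbrella numbers from slide 15 serve as the usage example:

```python
def likelihood(p0, transition, emissions, obs):
    """Return P(e_1:t); emissions[i][o] = P(o | X = i)."""
    n = len(p0)
    L = list(p0)                                   # starts as P(X_0)
    for o in obs:
        L = [emissions[j][o] * sum(transition[i][j] * L[i] for i in range(n))
             for j in range(n)]                    # predict, then weight; no normalizing
    return sum(L)

T = [[0.7, 0.3], [0.2, 0.8]]                       # rain transitions, states (+r, -r)
E = [{"+u": 0.9, "-u": 0.1}, {"+u": 0.2, "-u": 0.8}]
print(likelihood([0.5, 0.5], T, E, ["+u", "+u"]))  # P(+u1, +u2) ~ 0.317
```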

  18. Best Model of Phydeaux?
     • Model I:
        - Transitions: P(G→G) = 0.85, P(G→H) = 0.15; P(H→H) = 0.95, P(H→G) = 0.05
        - Emissions: p(s | g) = 0.15, p(f | g) = 0.75, p(y | g) = 0.10;
                     p(s | h) = 0.80, p(f | h) = 0.15, p(y | h) = 0.05
     • Model II:
        - Transitions: P(G→G) = 0.75, P(G→H) = 0.25; P(H→H) = 0.75, P(H→G) = 0.25
        - Emissions: p(s | g) = 0.10, p(f | g) = 0.80, p(y | g) = 0.10;
                     p(s | h) = 0.50, p(f | h) = 0.25, p(y | h) = 0.25
     • Challenge: given observation sequence 〈s, s, f, y, y, f, …〉,
       which model of Phydeaux is "correct"?? Want P_I(e) vs P_II(e) (see the sketch below)
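A model-selection sketch reusing the likelihood() function above: score the same observation sequence under both reconstructed models and keep the larger. The uniform initial distribution is again an assumption:

```python
obs = list("ssfyyf")
p0 = [0.5, 0.5]                               # assumed P(state_0), states (G, H)
T1 = [[0.85, 0.15], [0.05, 0.95]]             # model I transitions
E1 = [{"s": 0.15, "f": 0.75, "y": 0.10},      # model I emissions from G
      {"s": 0.80, "f": 0.15, "y": 0.05}]      # ... and from H
T2 = [[0.75, 0.25], [0.25, 0.75]]             # model II transitions
E2 = [{"s": 0.10, "f": 0.80, "y": 0.10},      # model II emissions from G
      {"s": 0.50, "f": 0.25, "y": 0.25}]      # ... and from H
pI, pII = likelihood(p0, T1, E1, obs), likelihood(p0, T2, E2, obs)
print("model I" if pI > pII else "model II", pI, pII)
```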

  19. Use HMMs to Classify Words in Speech Recognition
     • Use one HMM for each word: hmm_j for the j-th word
     • Convert the acoustic signal to a sequence of fixed-duration frames (eg, 60 ms)
       (assumes the start/end of each word in the speech signal is known)
     • Map each frame to the nearest "codebook" frame (discrete symbol x_t)
     • To classify a sequence of frames e_{1:T} = 〈e_1, …, e_T〉:
        1. Compute P(e_{1:T} | hmm_j): the likelihood that e_{1:T} was generated by each word model hmm_j
        2. Return argmax_j { P(e_{1:T} | hmm_j) }: the word j whose hmm_j gave the highest likelihood
       (see the sketch below)
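The two-step rule as code, again reusing likelihood() from the earlier sketch; word_models is a hypothetical dictionary mapping each word to its HMM parameters:

```python
def classify(frames, word_models):
    """word_models: {word: (p0, transition, emissions)}.
    Return the word whose HMM gives the frame sequence the highest likelihood."""
    return max(word_models,
               key=lambda w: likelihood(*word_models[w], frames))
```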

  20. HMM Tasks (recap of slide 9)
