Introduction to Bayesian Belief Nets



  1. Introduction to Bayesian Belief Nets. Russ Greiner, Dep't of Computing Science, Alberta Ingenuity Centre for Machine Learning, University of Alberta. http://www.cs.ualberta.ca/~greiner/bn.html


  3. Motivation
     • Gates says [LATimes, 28/Oct/96]: Microsoft's competitive advantage is its expertise in "Bayesian networks"
     • Current products:
       • Microsoft Pregnancy and Child Care (MSN)
       • Answer Wizard (Office, …)
       • Print Troubleshooter, Excel Workbook Troubleshooter, Office 95 Setup, Media Troubleshooter, Windows NT 4.0 Video Troubleshooter, Word Mail Merge Troubleshooter

  4. Motivation (II)
     • US Army: SAIP (Battalion Detection from SAR, IR…; Gulf War)
     • NASA: Vista (DSS for Space Shuttle)
     • GE: Gems (real-time monitor for utility generators)
     • Intel: infer possible processing problems from end-of-line tests on semiconductor chips
     • KIC:
       • medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
       • DSS for capital equipment: locomotives, gas-turbine engines, office equipment

  5. Motivation (III)
     • Lymph-node pathology diagnosis
     • Manufacturing control
     • Software diagnosis
     • Information retrieval
     • Types of tasks:
       • Classification/Regression
       • Sensor Fusion
       • Prediction/Forecasting

  6. Outline
     • Existing uses of Belief Nets (BNs)
     • What is a BN?
     • Specific examples of BNs
     • Contrast with Rules, Neural Nets, …
     • Possible applications of BNs
     • Challenges:
       • How to reason efficiently
       • How to learn BNs

  7. (Clinical workflow cartoon) Symptoms: chief complaint, history, … Signs: physical exam, test results, … Plan: diagnosis, treatment, …

  8. Objectives: Decision Support System
     • Determine which tests to perform and which repair to suggest, based on costs, sensitivity/specificity, …
     • Use all sources of information:
       • symbolic (discrete observations, history, …)
       • signal (from sensors)
     • Handle partial information
     • Adapt to track the fault distribution

  9. Underlying Task
     • Situation: Given observations { O1 = v1, …, Ok = vk } (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
     • Approach 1: Use rules of the form obs1 & … & obsm → Dxi
     • but… we need a rule for each situation:
       • for each diagnosis Dxr
       • for each set of possible values vj for Oj
       • for each subset of observations { Ox1, Ox2, … } ⊂ { Oj }
     • Can't use "If Temp > 100 & BP = High & Cough = Yes → DiseaseX" if we only know Temp and BP
     • Seldom completely certain

  10. Underlying Task
     • Situation: Given observations { O1 = v1, …, Ok = vk } (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
     • Approach 2: Compute the probability of each Dxi given the observations { obsj }: P( Dx = u | O1 = v1, …, Ok = vk )
     • Challenge: How to express probabilities?

  11. How to Deal with Probabilities
     • Sufficient: "atomic events": P( Dx = u, O1 = v1, …, ON = vN ) for all 2^(1+N) assignments u ∈ {T, F}, vj ∈ {T, F}:
       P( Dx=T, O1=T, O2=T, …, ON=T ) = 0.03
       P( Dx=T, O1=T, O2=T, …, ON=F ) = 0.4
       …
       P( Dx=T, O1=F, O2=F, …, ON=T ) = 0
       …
       P( Dx=F, O1=F, O2=F, …, ON=F ) = 0.01
     • Then:
       Marginalize: P( Dx=u, O1=v1, …, O7=v7 ) = Σ_{v8,…,vN} P( Dx=u, O1=v1, …, O7=v7, …, ON=vN )
       Conditionalize: P( Dx=u | O1=v1, …, O7=v7 ) = P( Dx=u, O1=v1, …, O7=v7 ) / P( O1=v1, …, O7=v7 )
     • But… even with a binary Dx and 20 binary observations, that is > 2,097,000 numbers!
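A minimal Python sketch of the marginalize/conditionalize step above, using a made-up three-variable joint in place of the full 2^(1+N) table (the variable names and numbers are illustrative, not from the slides):

```python
# Hypothetical joint over Dx and two observations, P(Dx, O1, O2).
# These numbers are illustrative only; they just need to sum to 1.
joint = {
    (1, 1, 1): 0.03, (1, 1, 0): 0.04,
    (1, 0, 1): 0.01, (1, 0, 0): 0.02,
    (0, 1, 1): 0.05, (0, 1, 0): 0.10,
    (0, 0, 1): 0.15, (0, 0, 0): 0.60,
}

def marginal(dx, o1):
    """P(Dx=dx, O1=o1): sum the atomic events over the unobserved O2."""
    return sum(joint[(dx, o1, o2)] for o2 in (0, 1))

def conditional(dx, o1):
    """P(Dx=dx | O1=o1) = P(Dx=dx, O1=o1) / P(O1=o1)."""
    numer = marginal(dx, o1)
    denom = sum(marginal(d, o1) for d in (0, 1))
    return numer / denom

print(conditional(1, 1))   # P(Dx=1 | O1=1)
# The catch: with N observations the table has 2**(1+N) entries,
# which is exactly why the slides move to factored (belief-net) form.
```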

  12. Problems with "Atomic Events"
     • Representation is not intuitive
       ⇒ should make "connections" explicit and use "local information": P(Jaundice | Hepatitis), P(LightDim | BadBattery), …
     • Too many numbers: O(2^N)
       • Hard to store
       • Hard to use [must add 2^r values to marginalize out r variables]
       • Hard to learn [takes O(2^N) samples to learn 2^N parameters]
     ⇒ Include only the necessary "connections" ⇒ Belief Nets

  13. (Figure) Hepatitis? Given Jaundiced? Given BloodTest? …but what if +BloodTest and not Jaundiced?

  14. Encoding Causal Links
     • Simple Belief Net (H → B, H → J, B → J):
       P(H=1) = 0.05, P(H=0) = 0.95
       h | P(B=1|H=h) | P(B=0|H=h)
       1 | 0.95       | 0.05
       0 | 0.03       | 0.97
       h b | P(J=1|h,b) | P(J=0|h,b)
       1 1 | 0.8        | 0.2
       1 0 | 0.8        | 0.2
       0 1 | 0.3        | 0.7
       0 0 | 0.3        | 0.7
     • Node ~ Variable
     • Link ~ "Causal dependency"
     • "CPTable" ~ P(child | parents)
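A minimal sketch of slide 14's three-node net in Python, with the CPTs stored as dictionaries and the joint assembled by the chain rule P(H, B, J) = P(H) · P(B|H) · P(J|H, B); the variable names are mine, the numbers are the slide's:

```python
# CPTs from slide 14 (H = Hepatitis, B = Blood test, J = Jaundice).
p_H = {1: 0.05, 0: 0.95}                          # P(H=h)
p_B_given_H = {1: 0.95, 0: 0.03}                  # P(B=1 | H=h)
p_J_given_HB = {(1, 1): 0.8, (1, 0): 0.8,         # P(J=1 | H=h, B=b)
                (0, 1): 0.3, (0, 0): 0.3}

def bern(p1, value):
    """P(X=value) for a binary variable with P(X=1)=p1."""
    return p1 if value == 1 else 1.0 - p1

def joint(h, b, j):
    """P(H=h, B=b, J=j) via the chain rule along the links."""
    return (bern(p_H[1], h)
            * bern(p_B_given_H[h], b)
            * bern(p_J_given_HB[(h, b)], j))

# Sanity check: the eight atomic events must sum to 1.
total = sum(joint(h, b, j) for h in (0, 1) for b in (0, 1) for j in (0, 1))
print(round(total, 10))   # 1.0
print(joint(1, 1, 1))     # P(H=1, B=1, J=1) = 0.05 * 0.95 * 0.8 = 0.038
```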

  15. Encoding Causal Links
       P(H=1) = 0.05
       h | P(B=1|H=h)
       1 | 0.95
       0 | 0.03
       h b | P(J=1|h,b)
       1 1 | 0.8
       1 0 | 0.8
       0 1 | 0.3
       0 0 | 0.3
     • P(J | H, B=0) = P(J | H, B=1) ∀ J, H!  ⇒ P(J | H, B) = P(J | H)
     • J is INDEPENDENT of B, once we know H
     • Don't need the B → J arc!

  16. Encoding Causal Links
       P(H=1) = 0.05
       h | P(B=1|H=h)
       1 | 0.95
       0 | 0.03
       h | P(J=1|h)
       1 | 0.8
       0 | 0.3
     • P(J | H, B=0) = P(J | H, B=1) ∀ J, H!  ⇒ P(J | H, B) = P(J | H)
     • J is INDEPENDENT of B, once we know H
     • Don't need the B → J arc!

  17. Encoding Causal Links
       P(H=1) = 0.05
       h | P(B=1|H=h)
       1 | 0.95
       0 | 0.03
       h | P(J=1|h)
       1 | 0.8
       0 | 0.3
     • P(J | H, B=0) = P(J | H, B=1) ∀ J, H!  ⇒ P(J | H, B) = P(J | H)
     • J is INDEPENDENT of B, once we know H
     • Don't need the B → J arc!
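The observation driving slides 15-17, that the rows of P(J=1|h,b) do not depend on b, can be checked mechanically; a small sketch (variable names assumed as above):

```python
# P(J=1 | H=h, B=b) from the slide: identical for b=0 and b=1.
p_J_given_HB = {(1, 1): 0.8, (1, 0): 0.8,
                (0, 1): 0.3, (0, 0): 0.3}

# If each entry is the same for every value of B, the B -> J arc is redundant.
independent_of_B = all(
    p_J_given_HB[(h, 1)] == p_J_given_HB[(h, 0)] for h in (0, 1)
)
print(independent_of_B)   # True

if independent_of_B:
    # Collapse the CPT to P(J=1 | H=h), dropping the B -> J arc.
    p_J_given_H = {h: p_J_given_HB[(h, 0)] for h in (0, 1)}
    print(p_J_given_H)    # {1: 0.8, 0: 0.3}
```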

  18. Sufficient Belief Net
       P(H=1) = 0.05
       h | P(B=1|H=h)
       1 | 0.95
       0 | 0.03
       h | P(J=1|h)
       1 | 0.8
       0 | 0.3
     • Requires: P(H=1), P(B=1 | H=h), P(J=1 | H=h) known (only 5 parameters, not 7)
     • Hence: P(H=1 | B=1, J=0) = (1/α) P(H=1) P(B=1 | H=1) P(J=0 | B=1, H=1) = (1/α) P(H=1) P(B=1 | H=1) P(J=0 | H=1)
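Slide 18's posterior P(H=1 | B=1, J=0) can be computed directly from the five parameters; a sketch using the slide's numbers:

```python
# The five parameters of the sufficient belief net (slide 18).
p_H1 = 0.05
p_B1_given_H = {1: 0.95, 0: 0.03}   # P(B=1 | H=h)
p_J1_given_H = {1: 0.80, 0: 0.30}   # P(J=1 | H=h)

def unnormalized(h, b, j):
    """P(H=h) * P(B=b|H=h) * P(J=j|H=h), using J independent of B given H."""
    ph = p_H1 if h == 1 else 1 - p_H1
    pb = p_B1_given_H[h] if b == 1 else 1 - p_B1_given_H[h]
    pj = p_J1_given_H[h] if j == 1 else 1 - p_J1_given_H[h]
    return ph * pb * pj

# P(H=1 | B=1, J=0) = unnormalized(1,1,0) / sum over h of unnormalized(h,1,0)
num = unnormalized(1, 1, 0)
den = unnormalized(1, 1, 0) + unnormalized(0, 1, 0)
print(num / den)   # ~0.32: positive blood test but no jaundice
```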

  19. "Factoring" (H → B, H → J)
     • B does depend on J: if J=1, then it is likely that H=1, and hence that B=1
     • but… ONLY THROUGH H: if we know H=1, then it is likely that B=1, and it doesn't matter whether J=1 or J=0!
       ⇒ P(J=0 | B=1, H=1) = P(J=0 | H=1)
     • N.b., B and J ARE correlated a priori: P(J | B) ≠ P(J)
     • GIVEN H, they become uncorrelated: P(J | B, H) = P(J | H)
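Slide 19's claim, correlated a priori but uncorrelated given H, can be verified numerically from the same CPTs; a sketch:

```python
# CPTs as before: H -> B, H -> J.
p_H1 = 0.05
p_B1_given_H = {1: 0.95, 0: 0.03}
p_J1_given_H = {1: 0.80, 0: 0.30}

def joint(h, b, j):
    """P(H=h, B=b, J=j) under the factored model."""
    ph = p_H1 if h == 1 else 1 - p_H1
    pb = p_B1_given_H[h] if b == 1 else 1 - p_B1_given_H[h]
    pj = p_J1_given_H[h] if j == 1 else 1 - p_J1_given_H[h]
    return ph * pb * pj

def prob(j=None, b=None, h=None):
    """Sum the joint over any unspecified variables."""
    vals = lambda x: (0, 1) if x is None else (x,)
    return sum(joint(hh, bb, jj)
               for hh in vals(h) for bb in vals(b) for jj in vals(j))

# A priori, J and B are correlated: P(J=1 | B=1) != P(J=1).
print(prob(j=1, b=1) / prob(b=1), prob(j=1))
# Given H, they become uncorrelated: P(J=1 | B=1, H=1) == P(J=1 | H=1) = 0.8.
print(prob(j=1, b=1, h=1) / prob(b=1, h=1), prob(j=1, h=1) / prob(h=1))
```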

  20. Factored Distribution
     • Symptoms are independent, given the Disease: P(B | J) ≠ P(B), but P(B | J, H) = P(B | H)
       (H = Hepatitis, J = Jaundice, B = (positive) Blood test)
     • ReadingAbility and ShoeSize are dependent, P(ReadAbility | ShoeSize) ≠ P(ReadAbility), but become independent given Age: P(ReadAbility | ShoeSize, Age) = P(ReadAbility | Age)
       (Age → ShoeSize, Age → Reading)

  21. "Naïve Bayes"
     • Classification task: Given { O1 = v1, …, On = vn }, find the hi that maximizes P(H = hi | O1 = v1, …, On = vn)
     • Structure: H → O1, O2, …, On; given H, the observations are independent: P(Oj | H, Ok, …) = P(Oj | H)
     • P(H = hi | O1 = v1, …, On = vn) = (1/α) P(H = hi) ∏j P(Oj = vj | H = hi)
     • Find the argmax over {hi}

  22. Naïve Bayes (con't)
     • P(H = hi | O1 = v1, …, On = vn) = (1/α) P(H = hi) ∏j P(Oj = vj | H = hi)
     • Normalizing term: α = P(O1 = v1, …, On = vn) = Σi P(H = hi) ∏j P(Oj = vj | H = hi)
       (No need to compute it, as it is the same for all hi)
     • Easy to use for classification
     • Can use even if some vj's are not specified
     • If k Dx's and n Oi's, requires only k priors and n·k pairwise conditionals (not 2^(n+k); relatively easy to learn)
       n  | 2^(n+1) − 1   | 1 + 2n
       10 | 2,047         | 21
       30 | 2,147,483,647 | 61
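A minimal naïve Bayes sketch along the lines of slides 21-22: the two-class problem, its priors, and its per-observation conditionals are invented for illustration; classification uses the unnormalized product P(H=hi) · ∏j P(Oj=vj | H=hi):

```python
# Hypothetical naive Bayes model: binary class H, three binary observations.
prior = {"well": 0.9, "sick": 0.1}                      # P(H = h_i)
# P(O_j = 1 | H = h_i) for each observation; 1 - value gives P(O_j = 0 | H).
cond = {
    "well": {"fever": 0.05, "cough": 0.10, "fatigue": 0.20},
    "sick": {"fever": 0.70, "cough": 0.60, "fatigue": 0.80},
}

def score(h, observations):
    """Unnormalized P(H=h | obs): prior times product of conditionals.
    Unspecified observations are simply left out (slide 22's point)."""
    s = prior[h]
    for name, value in observations.items():
        p1 = cond[h][name]
        s *= p1 if value == 1 else 1.0 - p1
    return s

def classify(observations):
    scores = {h: score(h, observations) for h in prior}
    alpha = sum(scores.values())              # normalizer, same for every h_i
    posterior = {h: s / alpha for h, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

# Fever and cough observed, fatigue unknown.
print(classify({"fever": 1, "cough": 1}))
```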

  23. Bigger Networks
       Structure: GeneticPH and LiverTrauma are parents of Hepatitis; Hepatitis is parent of Jaundice and Bloodtest.
       P(GeneticPH=1) = 0.20, P(LiverTrauma=1) = 0.32
       g lt | P(H=1|g,lt)
       1 1  | 0.82
       1 0  | 0.10
       0 1  | 0.45
       0 0  | 0.04
       h | P(J=1|h)
       1 | 0.8
       0 | 0.3
       h | P(B=1|h)
       1 | 0.98
       0 | 0.01
     • Intuition: show CAUSAL connections: GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice
     • If GeneticPH, then expect Jaundice: GeneticPH ⇒ Hepatitis ⇒ Jaundice
     • But only via Hepatitis: GeneticPH without Hepatitis does not make Jaundice any more likely
       P(J | G) ≠ P(J), but P(J | G, H) = P(J | H)
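Slide 23's claim that GeneticPH raises the probability of Jaundice, but only via Hepatitis, can be checked by summing out the hidden nodes; a sketch with the slide's CPTs (variable names abbreviated):

```python
# CPTs from slide 23: G = GeneticPH, LT = LiverTrauma, H = Hepatitis,
# J = Jaundice (the Bloodtest CPT is not needed for this query).
p_G1, p_LT1 = 0.20, 0.32
p_H1 = {(1, 1): 0.82, (1, 0): 0.10, (0, 1): 0.45, (0, 0): 0.04}  # P(H=1|g,lt)
p_J1 = {1: 0.8, 0: 0.3}                                          # P(J=1|h)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

def p_J1_given_G(g):
    """P(J=1 | G=g) = sum over lt, h of P(lt) * P(h|g,lt) * P(J=1|h)."""
    total = 0.0
    for lt in (0, 1):
        for h in (0, 1):
            total += bern(p_LT1, lt) * bern(p_H1[(g, lt)], h) * p_J1[h]
    return total

print(p_J1_given_G(1))   # ~0.47: jaundice is more likely given GeneticPH
print(p_J1_given_G(0))   # ~0.39: lower without GeneticPH
# But given Hepatitis, GeneticPH adds nothing: P(J=1 | G, H=1) = P(J=1 | H=1) = 0.8
```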

  24. Belief Nets
     • DAG structure
     • Each node ≡ a variable v
     • v depends (only) on its parents, plus a conditional probability table: P(vi | parenti = ⟨0,1,…⟩)
     • v is INDEPENDENT of its non-descendants, given assignments to its parents
     • Example (D and I are parents of H; H is parent of J and B): given H = 1, D has no influence on J, J has no influence on B, etc.

  25. Less Trivial Situations
     • N.b., obs1 is not always independent of obs2 given H
     • E.g., FamilyHistoryDepression 'causes' MotherSuicide and Depression; MotherSuicide causes Depression (with or without FamilyHistoryDepression)
       P(FHD=1) = 0.001
       f | P(MS=1|FHD=f)
       1 | 0.10
       0 | 0.03
       f m | P(D=1|FHD=f, MS=m)
       1 1 | 0.97
       1 0 | 0.90
       0 1 | 0.08
       0 0 | 0.04
     • Here, P(D | MS, FHD) ≠ P(D | FHD)!
     • Can be done using a Belief Network, but we need to specify: P(FHD): 1 number, P(MS | FHD): 2 numbers, P(D | MS, FHD): 4 numbers
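For slide 25, a quick check with the given numbers that D is not independent of MS once FHD is known, plus the marginalized P(D=1 | FHD=1) for comparison; a sketch:

```python
# CPTs from slide 25 (7 numbers in total, as the slide notes).
p_FHD1 = 0.001
p_MS1_given_FHD = {1: 0.10, 0: 0.03}
p_D1 = {(1, 1): 0.97, (1, 0): 0.90, (0, 1): 0.08, (0, 0): 0.04}  # P(D=1|f,m)

f = 1
# Conditioning on MS still changes P(D=1), even with FHD fixed:
print(p_D1[(f, 1)], p_D1[(f, 0)])          # 0.97 vs 0.90 -> not independent

# Marginalizing MS out instead: P(D=1 | FHD=1)
p_ms1 = p_MS1_given_FHD[f]
p_d1_given_f = p_ms1 * p_D1[(f, 1)] + (1 - p_ms1) * p_D1[(f, 0)]
print(p_d1_given_f)                        # 0.1*0.97 + 0.9*0.90 = 0.907
```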

  26. Example: Car Diagnosis (figure)

  27. MammoNet (figure)

  28. ALARM (A Logical Alarm Reduction Mechanism): 8 diagnoses, 16 findings, … (figure)

  29. Troop Detection (figure)

  30. ARCO1: Forecasting Oil Prices (figure)

  31. ARCO1: Forecasting Oil Prices (figure)

  32. Forecasting Potato Production (figure)

  33. Warning System (figure)
