Probabilistic Reasoning with Bayesian Networks: course notes 2019

  1. Probabilistic Reasoning with Bayesian Networks, course notes 2019, © L.C. van der Gaag, S. Renooij, UU, ICS Master Programmes: Computing Science, Artificial Intelligence

  2. Probabilistic reasoning with Bayesian networks
     Lecturer: Silja Renooij (s.renooij@uu.nl)
     Prerequisites: probability theory & graph theory
     Literature: syllabus & slides & study manual
     Form: lectures & exercises (formative self-assessment; tip: discuss exercises on the Blackboard forum)
     Grading: practical assignments (formative) & written exam (summative)
     Additional info: see course website: http://www.cs.uu.nl/docs/vakken/prob/

  3. Chapter 1: Introduction

  4. Reasoning under uncertainty
     In numerous application areas of knowledge-based decision-support systems we have
     • uncertainty concerning the general domain knowledge;
     • problem-specific information that is often uncertain, incomplete and even contradictory.
     A decision-support system should be capable of dealing with these types of knowledge.

  5. Application of probability theory
     Consider a discrete joint probability distribution Pr on a set of random variables $V = \{V_1, \ldots, V_n\}$. In general we have that:
     • the representation of Pr requires exponential space: consider e.g. $n = 2$ binary-valued variables, or $n = 40$; what if they have 5 values each? (And how do you get the numbers?)
     • calculating the (conditional) probability of a value of a variable by conditioning and marginalisation requires exponential time: consider e.g. computing $\Pr(V_1 = \mathit{true})$ from $\Pr(V)$, or $\Pr(V_1 = \mathit{true} \mid V_2 = \mathit{true})$.
     This cannot be improved without additional knowledge about the probability distribution.
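
To make both points concrete, here is a minimal Python sketch, assuming binary variables stored as an explicit joint table (a dict from value tuples to probabilities). The helper names (`table_size`, `random_joint`, `marginal`, `conditional`) and the randomly generated numbers are hypothetical, for illustration only; Python indices start at 0, so $V_1$ is index 0.

```python
import random
from itertools import product


def table_size(n_vars: int, n_values: int) -> int:
    """Number of entries in an explicit joint table over n_vars variables."""
    return n_values ** n_vars


def random_joint(n_vars: int, seed: int = 0) -> dict:
    """Hypothetical joint distribution over n_vars binary variables (random numbers)."""
    rng = random.Random(seed)
    configs = list(product([True, False], repeat=n_vars))
    weights = [rng.random() for _ in configs]
    total = sum(weights)
    return {config: w / total for config, w in zip(configs, weights)}


def marginal(joint: dict, index: int, value: bool) -> float:
    """Pr(V_index = value): marginalisation sums over all 2^n table entries."""
    return sum(p for config, p in joint.items() if config[index] == value)


def conditional(joint: dict, index: int, value: bool,
                given_index: int, given_value: bool) -> float:
    """Pr(V_index = value | V_given_index = given_value) by conditioning."""
    both = sum(p for config, p in joint.items()
               if config[index] == value and config[given_index] == given_value)
    return both / marginal(joint, given_index, given_value)


print(table_size(2, 2), table_size(40, 2))   # 4 1099511627776
joint = random_joint(4)
print(marginal(joint, 0, True))              # Pr(V_1 = true)
print(conditional(joint, 0, True, 1, True))  # Pr(V_1 = true | V_2 = true)
```

Both the table and every query touch all $2^n$ entries; for $n = 40$ binary variables that is already about $1.1 \times 10^{12}$ numbers, and with 5 values per variable it is $5^{40}$.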

  6. Diagnosis problem: pioneering in the 1960s
     Let $H = \{h_1, \ldots, h_n\}$, $n \geq 1$, be a set of hypotheses, and let $E = \{e_1, \ldots, e_m\}$, $m \geq 1$, be a set of relevant findings (evidence). Determine the ‘best’ diagnosis given findings $e \subseteq E$.
     The approach: compute for each $h \subseteq H$ the probability
     $$\Pr(h \mid e) = \frac{\Pr(e \mid h)\,\Pr(h)}{\Pr(e)}$$
     Drawback: an exponential number of probabilities needs to be computed; storage is also exponential.

  7. Pioneering in the 1960s
     Determine the diagnosis given findings $e \subseteq E$.
     The approach: assume that the hypotheses $h_i \in H$ are mutually exclusive and collectively exhaustive: $\bigcup_{i=1}^{n} \{h_i\} = \Omega$. Then compute for each $h_i \in H$:
     $$\Pr(h_i \mid e) = \frac{\Pr(e \mid h_i)\,\Pr(h_i)}{\Pr(e)} = \frac{\Pr(e \mid h_i)\,\Pr(h_i)}{\sum_{k=1}^{n} \Pr(e \mid h_k)\,\Pr(h_k)}$$
     Drawback: we compute only $n - 1$ probabilities, but the computation still requires an exponential number of probabilities.
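
A minimal Python sketch of this normalisation step, assuming the likelihoods $\Pr(e \mid h_k)$ for the observed $e$ are already available; the function name `posteriors` and the numbers in the usage line are hypothetical. Obtaining those likelihoods for every possible set of findings is what remains exponential.

```python
def posteriors(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Pr(h_i | e) for mutually exclusive, collectively exhaustive h_1, ..., h_n.

    priors[k] is Pr(h_k) and likelihoods[k] is Pr(e | h_k) for the observed e.
    """
    joint = [prior * lik for prior, lik in zip(priors, likelihoods)]  # Pr(e | h_k) Pr(h_k)
    evidence = sum(joint)  # Pr(e) = sum over k of Pr(e | h_k) Pr(h_k)
    return [j / evidence for j in joint]


# Hypothetical numbers for three hypotheses.
post = posteriors([0.5, 0.3, 0.2], [0.1, 0.4, 0.5])
print([round(p, 3) for p in post])  # [0.185, 0.444, 0.37]
```

Because the posteriors sum to 1, the last one follows from the other $n - 1$.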

  8. Pioneering in the 1960s
     Determine the diagnosis given findings $e = \{e_p, \ldots, e_q\}$, $1 \leq p, q \leq m$.
     The approach: assume in addition that all findings $e_1, \ldots, e_m$ are conditionally independent given $h_i$, $i = 1, \ldots, n$. Then:
     $$\Pr(h_i \mid e) = \frac{\Pr(e_p, \ldots, e_q \mid h_i)\,\Pr(h_i)}{\sum_{k=1}^{n} \Pr(e_p, \ldots, e_q \mid h_k)\,\Pr(h_k)} = \frac{\Pr(e_p \mid h_i) \cdot \ldots \cdot \Pr(e_q \mid h_i)\,\Pr(h_i)}{\sum_{k=1}^{n} \Pr(e_p \mid h_k) \cdot \ldots \cdot \Pr(e_q \mid h_k)\,\Pr(h_k)}$$
     Benefit: only $m \cdot n$ conditional probabilities and $n - 1$ prior probabilities are required for the computation.
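
Under the independence assumption the likelihood of a set of findings becomes a product, so only the $m \cdot n$ numbers $\Pr(e_j \mid h_i)$ and the priors are needed. A minimal Python sketch, assuming findings are referred to by their index $j$ and that unobserved findings are simply left out of the product; the name `naive_bayes_posteriors` and all numbers are hypothetical.

```python
import math


def naive_bayes_posteriors(priors, cond_probs, findings):
    """Pr(h_i | e_p, ..., e_q), assuming findings are independent given each h_i.

    priors[i]        -- Pr(h_i)
    cond_probs[i][j] -- Pr(e_j | h_i): the m * n conditional probabilities
    findings         -- indices j of the observed findings e_p, ..., e_q
    """
    scores = [prior * math.prod(cond_probs[i][j] for j in findings)
              for i, prior in enumerate(priors)]
    total = sum(scores)  # normalising sum over k = 1, ..., n
    return [score / total for score in scores]


# Hypothetical example: n = 2 hypotheses, m = 3 findings, e_1 and e_2 observed.
priors = [0.7, 0.3]
cond_probs = [[0.8, 0.1, 0.5],
              [0.3, 0.6, 0.5]]
post = naive_bayes_posteriors(priors, cond_probs, findings=[0, 1])
print([round(p, 3) for p in post])  # [0.509, 0.491]
```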

  9. GLADYS
     GLADYS (GLASGOW DYSPEPSIA SYSTEM) is a system for diagnosing dyspepsia.
     [Figure: global structure of the system, with the components Interview, Probabilistic component (developed with data collected from ± 1200 patients), Differential diagnosis, and Therapy selection.]
     D.J. Spiegelhalter, R.P. Knill-Jones (1984). Statistical and knowledge-based approaches to clinical decision-support systems with an application in gastroenterology. Journal of the Royal Statistical Society (Series A), vol. 147, pp. 35-77.

  10. Symptoms and diseases
      Context: patients with an ulcer. Question: which type?

      | Finding | Value | duodenal ulcer (n = 248) | gastric ulcer (n = 43) |
      |---|---|---|---|
      | Sex | male | 169 | 17 |
      | | female | 79 | 26 |
      | Age | < 26 | 43 | 1 |
      | | 26-40 | 82 | 5 |
      | | 41-55 | 87 | 19 |
      | | > 55 | 36 | 18 |
      | Daily pain | yes | 21 | 11 |
      | | no | 214 | 27 |
      | Effect of food on pain | worsens | 44 | 11 |
      | | no effect | 82 | 9 |
      | | relieves | 104 | 17 |
      | Probability | | 0.85 | 0.15 |

  11. The idea
      Let Pr be a joint distribution on the diagnosis search space including hypothesis $h$ and observed findings $e$. The prior odds for $h$, and the posterior odds for $h$ given $e$, are defined by
      $$O(h) = \frac{\Pr(h)}{1 - \Pr(h)} = \frac{\Pr(h)}{\Pr(\neg h)}, \qquad O(h \mid e) = \frac{\Pr(h \mid e)}{\Pr(\neg h \mid e)}$$
      Assume that all findings $e_i \in e$ are conditionally independent given $h$ (and given $\neg h$); then
      $$O(h \mid e) = \frac{\Pr(e \mid h) \cdot \Pr(h)}{\Pr(e \mid \neg h) \cdot \Pr(\neg h)} = \prod_i \frac{\Pr(e_i \mid h)}{\Pr(e_i \mid \neg h)} \cdot O(h)$$
      Now consider the following transformation: $10 \cdot \ln O(h \mid e)$ ...

  12. The idea (cntd)
      Applying the transformation $10 \cdot \ln$ to
      $$O(h \mid e) = \prod_i \lambda_i \cdot O(h), \quad \text{where } \lambda_i = \frac{\Pr(e_i \mid h)}{\Pr(e_i \mid \neg h)},$$
      results in a score $s$:
      $$s = 10 \cdot \ln O(h \mid e) = 10 \cdot \ln O(h) + \sum_i 10 \cdot \ln \lambda_i = w_0 + \sum_i w_i$$
      where $w_i$ is a weight for finding $e_i$. The probability $\Pr(h \mid e)$ is now computed from $s$:
      $$\Pr(h \mid e) = \frac{O(h \mid e)}{1 + O(h \mid e)} = \frac{e^{s/10}}{1 + e^{s/10}} = \frac{1}{1 + e^{-s/10}}$$
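
A minimal Python sketch of the score transformation, assuming the likelihood ratios $\lambda_i$ and the prior $\Pr(h)$ are given; the function names and the numbers in the usage lines are hypothetical. Summing the weights and mapping the score back should give the same probability as multiplying the odds directly, which is what the last line compares.

```python
import math


def prior_weight(prior: float) -> float:
    """w_0 = 10 * ln O(h), with O(h) = Pr(h) / (1 - Pr(h))."""
    return 10 * math.log(prior / (1 - prior))


def weight(likelihood_ratio: float) -> float:
    """w_i = 10 * ln(lambda_i), lambda_i = Pr(e_i | h) / Pr(e_i | not h)."""
    return 10 * math.log(likelihood_ratio)


def probability_from_score(s: float) -> float:
    """Pr(h | e) = 1 / (1 + e^(-s/10))."""
    return 1 / (1 + math.exp(-s / 10))


# Hypothetical prior and likelihood ratios.
prior, ratios = 0.85, [1.7, 0.5]
s = prior_weight(prior) + sum(weight(r) for r in ratios)
odds = prior / (1 - prior) * math.prod(ratios)        # O(h | e) computed directly
print(probability_from_score(s), odds / (1 + odds))  # the two values agree
```

The log scale turns the product of likelihood ratios into a sum; once rounded to integers, the weights can be added up by hand.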

  13. A scoring system
      $h$: duodenal ulcer (du); $\neg h$: gastric ulcer (gu)

      | Sex | du (n = 248) | gu (n = 43) |
      |---|---|---|
      | male (m) | 169 | 17 |
      | female (f) | 79 | 26 |

      Calculation of probabilities, likelihood ratios and weights:
      $$\Pr(m \mid du) = \tfrac{169}{248} \approx 0.68,\ \Pr(m \mid gu) \approx 0.40 \;\Rightarrow\; \lambda_m = \frac{\Pr(m \mid du)}{\Pr(m \mid gu)} = \frac{0.68}{0.40} \approx 1.7 \;\Rightarrow\; w_m = 10 \cdot \ln \lambda_m \approx 5$$
      $$\Pr(f \mid du) = \tfrac{79}{248} \approx 0.32,\ \Pr(f \mid gu) \approx 0.60 \;\Rightarrow\; \lambda_f = \frac{\Pr(f \mid du)}{\Pr(f \mid gu)} = \frac{0.32}{0.60} \approx 0.53 \;\Rightarrow\; w_f = 10 \cdot \ln \lambda_f \approx -6$$
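
The weight calculation can be reproduced from the counts with a short Python sketch; the names `N_DU`, `N_GU`, `SEX_COUNTS` and `sex_weights` are hypothetical. It uses the exact fractions rather than the rounded 0.68, 0.40, 0.32 and 0.60, which changes the unrounded weights slightly (about 5.4 and −6.4) but not the rounded ones.

```python
import math

# Counts from the table: h = duodenal ulcer (du), not h = gastric ulcer (gu).
N_DU, N_GU = 248, 43
SEX_COUNTS = {"male": (169, 17), "female": (79, 26)}


def sex_weights() -> dict:
    """w = 10 * ln(lambda), with lambda = Pr(value | du) / Pr(value | gu)."""
    weights = {}
    for value, (count_du, count_gu) in SEX_COUNTS.items():
        likelihood_ratio = (count_du / N_DU) / (count_gu / N_GU)
        weights[value] = 10 * math.log(likelihood_ratio)
    return weights


print({value: round(w) for value, w in sex_weights().items()})  # {'male': 5, 'female': -6}
```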

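  14. Symptoms and their weights

      | Finding | Value | duodenal ulcer (n = 248) | gastric ulcer (n = 43) | weight |
      |---|---|---|---|---|
      | Sex | male | 169 | 17 | 5 |
      | | female | 79 | 26 | −6 |
      | Age | < 26 | 43 | 1 | 18 |
      | | 26-40 | 82 | 5 | 10 |
      | | 41-55 | 87 | 19 | −2 |
      | | > 55 | 36 | 18 | −10 |
      | Daily pain | yes | 21 | 11 | −12 |
      | | no | 214 | 27 | 3 |
      | Effect of food on pain | worsens | 44 | 11 | −4 |
      | | no effect | 82 | 9 | 4 |
      | | relieves | 104 | 17 | 0 |
      | Prior | | 0.85 | 0.15 | 17 |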

  15. An example diagnosis
      A 30-year-old woman reports to the clinic. She has pain in the abdominal area, but not on a daily basis; the pain worsens as soon as she eats. Calculation of the score:
      • the initial score: + 17
      • the patient is female: − 6
      • her age is 30: + 10
      • she is in pain, but not every day: + 3
      • food intake worsens the pain: − 4
      The total score is + 20. Given that the patient has one of the two diseases, duodenal ulcer or gastric ulcer, she has a duodenal ulcer with probability
      $$\left(1 + e^{-20/10}\right)^{-1} \approx 1.14^{-1} \approx 0.88$$
      and a gastric ulcer with probability 0.12.
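
The same arithmetic as a short Python sketch; the dictionary keys are hypothetical labels for the rows of the weight table that apply to this patient.

```python
import math

# Weights of the findings that apply to this patient (from the weight table).
patient_weights = {
    "prior": 17,
    "female": -6,
    "age 26-40": 10,
    "no daily pain": 3,
    "food worsens pain": -4,
}

score = sum(patient_weights.values())    # s = w_0 + sum of the w_i
p_du = 1 / (1 + math.exp(-score / 10))  # Pr(duodenal ulcer | findings)
p_gu = 1 - p_du                          # Pr(gastric ulcer | findings)
print(score, round(p_du, 2), round(p_gu, 2))  # 20 0.88 0.12
```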

  16. Reviewing ‘Idiot’s Bayes’
      The naive Bayes approach is
      • mathematically correct, and
      • computationally easy.
      However,
      • the underlying assumptions are usually unacceptable;
      • and, at the time, for larger applications:
        • the number of hypotheses was often large, making it infeasible to compute each $\Pr(h_i \mid e)$;
        • there was often not enough information for reliable probability assessments.

  17. History: diagnosis in the 1970s
      [Figure: hypotheses $h_1, h_2, \ldots, h_i, \ldots, h_n$ above findings $e_1, e_2, \ldots, e_j, \ldots, e_m$, annotated with $\Pr(h_n \mid e_2 \wedge e_m)$.]
      The most likely hypothesis given observed findings is determined as follows:
      • prune the search space using heuristic rules;
      • approximate the missing probabilities required, for example with $\Pr(e_i \wedge e_j) = \min\{\Pr(e_i), \Pr(e_j)\}$;
      • select the hypothesis with the highest probability.
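
To see why such approximations are problematic, a small Python sketch compares the min-rule with the exact value for a hypothetical joint distribution over two findings (all numbers invented for illustration). The minimum is always an upper bound on $\Pr(e_i \wedge e_j)$ and coincides with it only when one finding implies the other; here it overestimates the joint probability threefold.

```python
# Hypothetical joint distribution over two binary findings (e_i, e_j).
joint = {
    (True, True): 0.10,
    (True, False): 0.30,
    (False, True): 0.20,
    (False, False): 0.40,
}

pr_ei = sum(p for (ei, _), p in joint.items() if ei)  # Pr(e_i), about 0.4
pr_ej = sum(p for (_, ej), p in joint.items() if ej)  # Pr(e_j), about 0.3
exact = joint[(True, True)]                           # Pr(e_i and e_j) = 0.1
approx = min(pr_ei, pr_ej)                            # 1970s-style approximation
print(exact, approx)
```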

  18. Reviewing the quasi-probabilistic models
      The quasi-probabilistic models are
      • computationally easy, and
      • easy to use, even for larger applications.
      However, these models are
      • mathematically incorrect, and
      • not convincing even as an approximation model.

  19. The rehabilitation of probability theory in the 1980s
      Judea Pearl introduces Bayesian belief networks as a representational device,
      • plus algorithms for inferring (computing) ‘beliefs’ from the represented ones:
        • first for trees and polytrees (singly connected graphs);
        • then for multiply-connected graphs;
        • for the latter, the algorithm by Steffen Lauritzen & David Spiegelhalter was the first to find widespread use.
      Also see “Inference in Bayesian Networks: a Historical Perspective” by Adnan Darwiche.

  20. The Bayesian network framework
      A Bayesian network is a very compact representation of a joint probability distribution Pr. Such a network comprises
      • qualitative knowledge of Pr: a graphical representation of the independences between the variables involved;
      • quantitative knowledge of Pr: conditional probability distributions that describe Pr ‘locally’ per group of variables.
      Associated with a Bayesian network are algorithms for computing probabilities and for processing evidence.
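
A minimal Python sketch of this idea, assuming binary variables and a small hypothetical network (structure and numbers invented for illustration): the joint probability of a full assignment is the product of the local conditional probabilities $\Pr(V \mid \mathrm{parents}(V))$, and the network specifies fewer numbers than an explicit joint table. The names `PARENTS`, `CPT`, `joint_probability` and `parameter_counts` are hypothetical.

```python
from itertools import product

# Hypothetical network: A -> B, A -> C, B -> D, C -> D (all variables binary).
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# CPT[v][parent values] = Pr(v = true | parents take these values)
CPT = {
    "A": {(): 0.2},
    "B": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.3},
    "D": {(True, True): 0.9, (True, False): 0.7,
          (False, True): 0.5, (False, False): 0.05},
}


def joint_probability(assignment: dict) -> float:
    """Pr(assignment) as the product of the local conditional probabilities."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1 - p_true
    return p


def parameter_counts() -> tuple:
    """(numbers specified in the network, free numbers in an explicit joint table)."""
    network = sum(2 ** len(parents) for parents in PARENTS.values())
    full_table = 2 ** len(PARENTS) - 1
    return network, full_table


names = list(PARENTS)
total = sum(joint_probability(dict(zip(names, values)))
            for values in product([True, False], repeat=len(names)))
print(round(total, 10), parameter_counts())  # 1.0 (9, 15)
```

With four variables the saving is modest (9 versus 15 numbers), but when each variable has only a few parents the network grows roughly linearly in the number of variables, while the explicit table grows exponentially.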

  21. An example: Classical Swine Fever (CSF)
      The classical swine fever network is a decision-support system for the early detection of classical swine fever (Dutch: varkenspest).
      • early detection of CSF is important, but hard;
      • the network has been developed in cooperation with two veterinarians of the Central Veterinary Institute of Wageningen UR;
      • it is part of the European EPIZONE project;
      • veterinarians all over the country collected data with PDAs.

  22. The Classical swine fever network: initial graphical structure

  23. The Classical swine fever network: probability tables
      [Figure: probability table for $\Pr(\mathit{Appetite} \mid \mathit{BodyTemp} \wedge \mathit{Malaise})$]

  24. Classical swine fever: prior probabilities
      [Figure: network with prior probabilities; visible node labels include Faeces, Prim. Other Infection, Reproduction phase, Respiratory problems.]

  25. Classical swine fever: diagnostic reasoning
