T-61.3050 Machine Learning: Basic Principles
Bayesian Networks (lecture slides transcript)


  1. T-61.3050 Machine Learning: Basic Principles. Bayesian Networks. Kai Puolamäki, Laboratory of Computer and Information Science (CIS), Department of Computer Science and Engineering, Helsinki University of Technology (TKK). Autumn 2007.

  2. Outline
     1 Bayesian Networks: Reminders; Inference; Finding the Structure of the Network
     2 Probabilistic Inference: Bernoulli Process; Posterior Probabilities
     3 Estimating Parameters: Estimates from Posterior; Bias and Variance; Conclusion

  3. Rules of Probability
     P(E, F) = P(F, E): probability of both E and F happening.
     P(E) = Σ_F P(E, F) (sum rule, marginalization)
     P(E, F) = P(F | E) P(E) (product rule, conditional probability)
     Consequence: P(F | E) = P(E | F) P(F) / P(E) (Bayes' formula)
     We say E and F are independent if P(E, F) = P(E) P(F) (for all values of E and F).
     We say E and F are conditionally independent given G if P(E, F | G) = P(E | G) P(F | G), or equivalently P(E | F, G) = P(E | G).
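The rules above can be checked numerically on a toy joint distribution over two Boolean events. The probability values in this sketch are made up for illustration:

```python
# Numerical check of the sum rule, product rule and Bayes' formula
# on an invented joint distribution over Boolean events E and F.
P = {  # P(E, F) for all four combinations; entries sum to 1
    (True, True): 0.2, (True, False): 0.3,
    (False, True): 0.1, (False, False): 0.4,
}

def marginal_E(e):
    # Sum rule: P(E) = sum_F P(E, F)
    return sum(p for (ei, _), p in P.items() if ei == e)

def marginal_F(f):
    return sum(p for (_, fi), p in P.items() if fi == f)

def cond_F_given_E(f, e):
    # Product rule rearranged: P(F | E) = P(E, F) / P(E)
    return P[(e, f)] / marginal_E(e)

def cond_E_given_F(e, f):
    return P[(e, f)] / marginal_F(f)

# Bayes' formula: P(F | E) = P(E | F) P(F) / P(E)
lhs = cond_F_given_E(True, True)
rhs = cond_E_given_F(True, True) * marginal_F(True) / marginal_E(True)
assert abs(lhs - rhs) < 1e-12  # both sides agree
```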

  4. Bayesian Networks
     A Bayesian network is a directed acyclic graph (DAG) that describes a joint distribution over the vertices X_1, ..., X_d such that
        P(X_1, ..., X_d) = Π_{i=1}^{d} P(X_i | parents(X_i)),
     where parents(X_i) is the set of vertices from which there is an edge to X_i.
     Example: the network C -> A, C -> B gives P(A, B, C) = P(A | C) P(B | C) P(C).
     (A and B are conditionally independent given C.)
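A minimal sketch of the example network C -> A, C -> B, with invented conditional probability values, verifying both the factorization and the conditional independence of A and B given C:

```python
# Tiny Bayesian network C -> A, C -> B over Boolean variables.
# The conditional probability values below are illustrative assumptions.
P_C = {True: 0.6, False: 0.4}
P_A_given_C = {True: 0.7, False: 0.2}   # P(A=1 | C)
P_B_given_C = {True: 0.9, False: 0.3}   # P(B=1 | C)

def p(a, b, c):
    # Factorization: P(A, B, C) = P(A | C) P(B | C) P(C)
    pa = P_A_given_C[c] if a else 1 - P_A_given_C[c]
    pb = P_B_given_C[c] if b else 1 - P_B_given_C[c]
    return pa * pb * P_C[c]

# The joint distribution sums to one:
total = sum(p(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
assert abs(total - 1) < 1e-12

# Conditional independence: P(A | B, C) equals P(A | C)
c = True
p_a_given_bc = p(True, True, c) / sum(p(a, True, c) for a in (True, False))
assert abs(p_a_given_bc - P_A_given_C[c]) < 1e-12
```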

  5. Outline (repeated from slide 2).

  6. Inference in Bayesian Networks
     When the structure of the Bayesian network and the probability factors are known, one usually wants to do inference by computing conditional probabilities. This can be done with the help of the sum and product rules.
     Example: probability of the cat being on the roof if it is cloudy, P(F | C)?
     [Figure 3.5 of Alpaydin (2004): Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler -> Wet grass, Rain -> Wet grass, Rain -> rooF, with
        P(C) = 0.5,
        P(S | C) = 0.1, P(S | ~C) = 0.5,
        P(R | C) = 0.8, P(R | ~C) = 0.1,
        P(W | R, S) = 0.95, P(W | R, ~S) = 0.90, P(W | ~R, S) = 0.90, P(W | ~R, ~S) = 0.10,
        P(F | R) = 0.1, P(F | ~R) = 0.7]

  7. Inference in Bayesian Networks
     Example: probability of the cat being on the roof if it is cloudy, P(F | C)?
     S, R and W are unknown or hidden variables; F and C are observed variables. Conventionally, we denote the observed variables by gray nodes (see figure on the right).
     The joint distribution factorizes as P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C).
     We use the product rule P(F | C) = P(F, C) / P(C), where P(C) = Σ_F P(F, C).
     We must sum over, or marginalize over, the hidden variables S, R and W: P(F, C) = Σ_S Σ_R Σ_W P(C, S, R, W, F).

  8. Inference in Bayesian Networks
     P(F, C) = P(C, S, R, W, F) + P(C, ~S, R, W, F)
             + P(C, S, ~R, W, F) + P(C, ~S, ~R, W, F)
             + P(C, S, R, ~W, F) + P(C, ~S, R, ~W, F)
             + P(C, S, ~R, ~W, F) + P(C, ~S, ~R, ~W, F)
     We obtain similar formulas for P(F, ~C), P(~F, C) and P(~F, ~C).
     Notice: we have used the shorthand F to denote F = 1 and ~F to denote F = 0.
     In principle, we know the numeric value of each term of the joint distribution, P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C), hence we can compute the probabilities.
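The eight-term sum can be evaluated directly with the conditional probabilities given in the figure. A brute-force sketch that enumerates every assignment of the hidden variables S, R, W:

```python
from itertools import product

# Brute-force marginalization over the hidden variables S, R, W of the
# sprinkler network (CPT values as in Figure 3.5 of Alpaydin, 2004).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}            # P(S=1 | C)
P_R = {True: 0.8, False: 0.1}            # P(R=1 | C)
P_W = {(True, True): 0.95, (False, True): 0.90,
       (True, False): 0.90, (False, False): 0.10}  # P(W=1 | S, R)
P_F = {True: 0.1, False: 0.7}            # P(F=1 | R)

def joint(c, s, r, w, f):
    # P(C, S, R, W, F) = P(F|R) P(W|S,R) P(S|C) P(R|C) P(C)
    pc = P_C if c else 1 - P_C
    ps = P_S[c] if s else 1 - P_S[c]
    pr = P_R[c] if r else 1 - P_R[c]
    pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
    pf = P_F[r] if f else 1 - P_F[r]
    return pf * pw * ps * pr * pc

# Eight terms: all assignments of the hidden S, R, W
p_fc = sum(joint(True, s, r, w, True)
           for s, r, w in product((True, False), repeat=3))
print(round(p_fc, 4))  # 0.11
```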

  9. Inference in Bayesian Networks
     There are 2^5 terms in the sums. Generally, marginalization is NP-hard; the most straightforward approach would involve a computation of O(2^d) terms.
     We can often do better by smartly re-arranging the sums and products. Behold: do the marginalization over W first:
     P(C, S, R, F) = Σ_W P(C, S, R, W, F)
                   = Σ_W P(F | R) P(W | S, R) P(S | C) P(R | C) P(C)
                   = P(F | R) [Σ_W P(W | S, R)] P(S | C) P(R | C) P(C)
                   = P(F | R) P(S | C) P(R | C) P(C).

  10. Inference in Bayesian Networks
     Now we can marginalize over S easily:
     P(C, R, F) = Σ_S P(F | R) P(S | C) P(R | C) P(C) = P(F | R) [Σ_S P(S | C)] P(R | C) P(C) = P(F | R) P(R | C) P(C).
     We must still marginalize over R:
     P(C, F) = P(F | R) P(R | C) P(C) + P(F | ~R) P(~R | C) P(C) = 0.1 × 0.8 × 0.5 + 0.7 × 0.2 × 0.5 = 0.11.
     P(C, ~F) = P(~F | R) P(R | C) P(C) + P(~F | ~R) P(~R | C) P(C) = 0.9 × 0.8 × 0.5 + 0.3 × 0.2 × 0.5 = 0.39.
     P(C) = P(C, F) + P(C, ~F) = 0.5.
     P(F | C) = P(C, F) / P(C) = 0.22.
     P(~F | C) = P(C, ~F) / P(C) = 0.78.
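The elimination steps above can be reproduced in a few lines. W and S sum out trivially (their conditional distributions sum to one), leaving only the sum over R:

```python
# Variable elimination in the order W, S, R for the sprinkler network,
# reproducing the slide's numbers (CPTs from Figure 3.5 of Alpaydin, 2004).
P_R = {True: 0.8, False: 0.1}   # P(R=1 | C)
P_F = {True: 0.1, False: 0.7}   # P(F=1 | R)
P_C = 0.5

def p_cf(f):
    # After W and S are summed out: P(C, F) = sum_R P(F|R) P(R|C) P(C),
    # here conditioned on C = 1.
    total = 0.0
    for r in (True, False):
        pf = P_F[r] if f else 1 - P_F[r]
        pr = P_R[True] if r else 1 - P_R[True]
        total += pf * pr * P_C
    return total

p_c = p_cf(True) + p_cf(False)       # P(C) = 0.5
p_f_given_c = p_cf(True) / p_c       # P(F | C)
print(p_cf(True), p_cf(False), p_f_given_c)
```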

  11. Bayesian Networks: Inference
     To do inference in Bayesian networks one has to marginalize over variables. For example: P(X_1) = Σ_{X_2} ... Σ_{X_d} P(X_1, ..., X_d).
     If we have Boolean arguments the sum has O(2^d) terms. This is inefficient! Generally, marginalization is an NP-hard problem.
     If the Bayesian network is a tree: Sum-Product Algorithm (a special case being Belief Propagation).
     If the Bayesian network is "close" to a tree: Junction Tree Algorithm.
     Otherwise: approximate methods (variational approximation, MCMC, etc.)

  12. Sum-Product Algorithm
     Idea: a sum of products is difficult to compute; a product of sums is easy to compute, if the sums have been re-arranged smartly.
     Example: a disconnected Bayesian network with d vertices, computing P(X_1).
     Sum of products: P(X_1) = Σ_{X_2} ... Σ_{X_d} P(X_1) ... P(X_d).
     Product of sums: P(X_1) = P(X_1) [Σ_{X_2} P(X_2)] ... [Σ_{X_d} P(X_d)] = P(X_1).
     The Sum-Product Algorithm works if the Bayesian network is a directed tree. For details, see e.g. Bishop (2006).

  13. Sum-Product Algorithm Example
     Network: D -> A, D -> B, D -> C, so P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).
     Task: compute P~(D) = Σ_A Σ_B Σ_C P(A, B, C, D).

  14. Sum-Product Algorithm Example
     The factor graph is composed of vertices (ellipses) A, B, C, D and factors (squares) P(A|D), P(B|D), P(C|D), P(D), describing the factors of the joint probability P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).
     The Sum-Product Algorithm re-arranges the product (check!):
     P~(D) = [Σ_A P(A | D)] [Σ_B P(B | D)] [Σ_C P(C | D)] P(D)
           = Σ_A Σ_B Σ_C P(A, B, C, D).   (1)
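A small numerical check of the rearrangement in Eq. (1) for the star network D -> A, D -> B, D -> C; the CPT values here are invented for illustration:

```python
from itertools import product

# Compare the naive sum of products (2^3 terms) with the sum-product
# rearrangement (three small sums). CPT numbers are illustrative.
P_D = {True: 0.3, False: 0.7}
P_A = {True: 0.6, False: 0.2}   # P(A=1 | D)
P_B = {True: 0.8, False: 0.5}   # P(B=1 | D)
P_C = {True: 0.4, False: 0.9}   # P(C=1 | D)

def cond(table, x, d):
    return table[d] if x else 1 - table[d]

def naive(d):
    # sum_A sum_B sum_C P(A|D) P(B|D) P(C|D) P(D): 2^3 terms
    return sum(cond(P_A, a, d) * cond(P_B, b, d) * cond(P_C, c, d) * P_D[d]
               for a, b, c in product((True, False), repeat=3))

def sum_product(d):
    # [sum_A P(A|D)] [sum_B P(B|D)] [sum_C P(C|D)] P(D): 3 small sums
    sa = sum(cond(P_A, a, d) for a in (True, False))
    sb = sum(cond(P_B, b, d) for b in (True, False))
    sc = sum(cond(P_C, c, d) for c in (True, False))
    return sa * sb * sc * P_D[d]

for d in (True, False):
    assert abs(naive(d) - sum_product(d)) < 1e-12
    # Each CPT sums to one, so here the marginal is simply P(D):
    assert abs(naive(d) - P_D[d]) < 1e-12
```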
