T-61.3050 Machine Learning: Basic Principles: Bayesian Networks (Kai Puolamäki)


SLIDE 1

Bayesian Networks
Probabilistic Inference
Estimating Parameters

T-61.3050 Machine Learning: Basic Principles

Bayesian Networks
Kai Puolamäki

Laboratory of Computer and Information Science (CIS)
Department of Computer Science and Engineering
Helsinki University of Technology (TKK)

Autumn 2007

Kai Puolamäki T-61.3050

SLIDE 2

Bayesian Networks | Probabilistic Inference | Estimating Parameters (Reminders, Inference, Finding the Structure of the Network)

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 3

Rules of Probability

P(E, F) = P(F, E): probability of both E and F happening.
P(E) = Σ_F P(E, F) (sum rule, marginalization).
P(E, F) = P(F | E) P(E) (product rule, conditional probability).
Consequence: P(F | E) = P(E | F) P(F) / P(E) (Bayes' formula).
We say E and F are independent if P(E, F) = P(E) P(F) (for all values of E and F).
We say E and F are conditionally independent given G if P(E, F | G) = P(E | G) P(F | G), or equivalently P(E | F, G) = P(E | G).
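The rules above can be checked mechanically on a small joint distribution. A minimal Python sketch; the joint-table numbers are made up for illustration, not taken from the lecture:

```python
# Joint distribution P(E, F) over two binary events (illustrative numbers).
P = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.4}

# Sum rule: P(E) = sum_F P(E, F).
P_E = sum(P[(1, f)] for f in (0, 1))   # 0.5
P_F = sum(P[(e, 1)] for e in (0, 1))   # 0.4

# Product rule: P(F | E) = P(E, F) / P(E).
P_F_given_E = P[(1, 1)] / P_E          # 0.6
P_E_given_F = P[(1, 1)] / P_F          # 0.75

# Bayes' formula: P(F | E) = P(E | F) P(F) / P(E).
assert abs(P_F_given_E - P_E_given_F * P_F / P_E) < 1e-12
print(P_F_given_E)  # 0.6
```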

SLIDE 4

Bayesian Networks

A Bayesian network is a directed acyclic graph (DAG) that describes a joint distribution over the vertices X1, ..., Xd such that

P(X1, ..., Xd) = Π_{i=1}^{d} P(Xi | parents(Xi)),

where parents(Xi) is the set of vertices from which there is an edge to Xi.

Example: a network with edges C → A and C → B gives
P(A, B, C) = P(A | C) P(B | C) P(C).
(A and B are conditionally independent given C.)

SLIDE 5

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 6

Inference in Bayesian Networks

When the structure of the Bayesian network and the probability factors are known, one usually wants to do inference by computing conditional probabilities. This can be done with the help of the sum and product rules.

Example: what is the probability of the cat being on the roof if it is cloudy, P(F | C)?

Cloudy: P(C) = 0.5
Sprinkler: P(S | C) = 0.1, P(S | ~C) = 0.5
Rain: P(R | C) = 0.8, P(R | ~C) = 0.1
Wet grass: P(W | R, S) = 0.95, P(W | R, ~S) = 0.90, P(W | ~R, S) = 0.90, P(W | ~R, ~S) = 0.10
Roof: P(F | R) = 0.1, P(F | ~R) = 0.7

Figure 3.5 of Alpaydin (2004).

SLIDE 7

Inference in Bayesian Networks

Example: probability of the cat being on the roof if it is cloudy, P(F | C)?
S, R and W are unknown or hidden variables; F and C are observed variables. Conventionally, we denote the observed variables by gray nodes in the figure.
We use the product rule P(F | C) = P(F, C)/P(C), where P(C) = Σ_F P(F, C).
We must sum over, or marginalize over, the hidden variables S, R and W:
P(F, C) = Σ_S Σ_R Σ_W P(C, S, R, W, F).
P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C).

SLIDE 8

Inference in Bayesian Networks

P(F, C) = P(C, S, R, W, F) + P(C, ~S, R, W, F)
        + P(C, S, ~R, W, F) + P(C, ~S, ~R, W, F)
        + P(C, S, R, ~W, F) + P(C, ~S, R, ~W, F)
        + P(C, S, ~R, ~W, F) + P(C, ~S, ~R, ~W, F).

We obtain similar formulas for P(F, ~C), P(~F, C) and P(~F, ~C).
Notice: we use the shorthand F to denote F = 1 and ~F to denote F = 0.
In principle, we know the numeric value of each term of the joint distribution, hence we can compute the probabilities.
P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C).

SLIDE 9

Inference in Bayesian Networks

There are 2^5 terms in the sums. Generally, marginalization is NP-hard; the most straightforward approach would involve computing O(2^d) terms.
We can often do better by smartly re-arranging the sums and products. Behold: do the marginalization over W first:
P(C, S, R, F) = Σ_W P(F | R) P(W | S, R) P(S | C) P(R | C) P(C)
             = P(F | R) [Σ_W P(W | S, R)] P(S | C) P(R | C) P(C)
             = P(F | R) P(S | C) P(R | C) P(C),
since Σ_W P(W | S, R) = 1.
P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C).

SLIDE 10

Inference in Bayesian Networks

Now we can marginalize over S easily:
P(C, R, F) = Σ_S P(F | R) P(S | C) P(R | C) P(C)
          = P(F | R) [Σ_S P(S | C)] P(R | C) P(C)
          = P(F | R) P(R | C) P(C).
We must still marginalize over R:
P(C, F) = P(F | R) P(R | C) P(C) + P(F | ~R) P(~R | C) P(C)
        = 0.1 × 0.8 × 0.5 + 0.7 × 0.2 × 0.5 = 0.11.
P(C, ~F) = P(~F | R) P(R | C) P(C) + P(~F | ~R) P(~R | C) P(C)
         = 0.9 × 0.8 × 0.5 + 0.3 × 0.2 × 0.5 = 0.39.
P(C) = P(C, F) + P(C, ~F) = 0.5.
P(F | C) = P(C, F)/P(C) = 0.22.
P(~F | C) = P(C, ~F)/P(C) = 0.78.
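The same number can be obtained by brute-force enumeration over all hidden variables, which is a useful sanity check of the clever summation order. A Python sketch using the conditional probability tables of the sprinkler network above:

```python
from itertools import product

# CPTs of the sprinkler network (Figure 3.5 of Alpaydin, 2004).
P_C = {1: 0.5, 0: 0.5}              # P(C = c)
P_R = {1: 0.8, 0: 0.1}              # P(R = 1 | C = c)
P_S = {1: 0.1, 0: 0.5}              # P(S = 1 | C = c)
P_F = {1: 0.1, 0: 0.7}              # P(F = 1 | R = r)
P_W = {(1, 1): 0.95, (1, 0): 0.90,  # P(W = 1 | R = r, S = s)
       (0, 1): 0.90, (0, 0): 0.10}

def bern(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1.0 - p

def joint(c, s, r, w, f):
    # P(C, S, R, W, F) = P(F | R) P(W | S, R) P(S | C) P(R | C) P(C)
    return (bern(P_F[r], f) * bern(P_W[(r, s)], w) *
            bern(P_S[c], s) * bern(P_R[c], r) * P_C[c])

# Marginalize over the hidden variables S, R, W (2^3 terms per configuration).
P_FC = sum(joint(1, s, r, w, 1) for s, r, w in product((0, 1), repeat=3))
P_Cm = sum(joint(1, s, r, w, f) for s, r, w, f in product((0, 1), repeat=4))
print(round(P_FC, 4), round(P_FC / P_Cm, 2))  # 0.11 0.22
```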

SLIDE 11

Bayesian Networks: Inference

To do inference in Bayesian networks one has to marginalize over variables. For example:
P(X1) = Σ_{X2} ... Σ_{Xd} P(X1, ..., Xd).
If we have Boolean arguments the sum has O(2^d) terms. This is inefficient! Generally, marginalization is an NP-hard problem.
If the Bayesian network is a tree: Sum-Product Algorithm (a special case being Belief Propagation).
If the Bayesian network is "close" to a tree: Junction Tree Algorithm.
Otherwise: approximate methods (variational approximation, MCMC etc.).

SLIDE 12

Sum-Product Algorithm

Idea: a sum of products is difficult to compute; a product of sums is easy to compute, if the sums have been re-arranged smartly.
Example: a disconnected Bayesian network with d vertices, computing P(X1).
Sum of products: P(X1) = Σ_{X2} ... Σ_{Xd} P(X1) ... P(Xd).
Product of sums: P(X1) = P(X1) [Σ_{X2} P(X2)] ... [Σ_{Xd} P(Xd)] = P(X1).
The Sum-Product Algorithm works if the Bayesian network is a directed tree. For details, see e.g. Bishop (2006).

SLIDE 13

Sum-Product Algorithm: Example

A network with edges D → A, D → B, D → C:
P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).
Task: compute P̃(D) = Σ_A Σ_B Σ_C P(A, B, C, D).

SLIDE 14

Sum-Product Algorithm: Example

P(A, B, C, D) = P(A | D) P(B | D) P(C | D) P(D).
The factor graph is composed of vertices (ellipses) and factors (squares), describing the factors of the joint probability.
The Sum-Product Algorithm re-arranges the product (check!):
P̃(D) = [Σ_A P(A | D)] [Σ_B P(B | D)] [Σ_C P(C | D)] P(D) = Σ_A Σ_B Σ_C P(A, B, C, D). (1)
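Equation (1) can be verified numerically. A sketch with randomly generated conditional tables (the tables and P(D) = 0.3 are illustrative assumptions, not from the slides); it checks that the 2^3-term sum of products equals the product of sums:

```python
from itertools import product
import random

random.seed(0)

def rand_cpt():
    """Random table P(X = 1 | D = d) for d in {0, 1} (illustrative)."""
    return {d: random.random() for d in (0, 1)}

P_A, P_B, P_C = rand_cpt(), rand_cpt(), rand_cpt()
P_D = 0.3  # P(D = 1), an arbitrary choice

def bern(p, x):
    return p if x == 1 else 1.0 - p

def p_tilde_slow(d):
    # Sum of products: 2^3 terms.
    return sum(bern(P_A[d], a) * bern(P_B[d], b) * bern(P_C[d], c) * bern(P_D, d)
               for a, b, c in product((0, 1), repeat=3))

def p_tilde_fast(d):
    # Product of sums: each bracket sums to one, so the result is P(D = d).
    return (sum(bern(P_A[d], a) for a in (0, 1)) *
            sum(bern(P_B[d], b) for b in (0, 1)) *
            sum(bern(P_C[d], c) for c in (0, 1)) * bern(P_D, d))

for d in (0, 1):
    assert abs(p_tilde_slow(d) - p_tilde_fast(d)) < 1e-12
```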

SLIDE 15

Observations

A Bayesian network forms a partial order of the vertices. To find a (one) total ordering of the vertices: remove a vertex with no outgoing edges (zero out-degree) from the network and output the vertex; iterate until the network is empty. (This way you can also check that the network is a DAG.)
If all variables are Boolean, storing a full Bayesian network of d vertices, or the full joint distribution, as a look-up table takes O(2^d) bytes. If the highest number of incoming edges (in-degree) is k, then storing a Bayesian network of d vertices as a look-up table takes O(d 2^k) bytes.
When computing marginals, disconnected parts of the network do not contribute.
Conditional independence is "easy" to see.
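The total-ordering procedure can be sketched in Python. This uses Kahn's algorithm, which removes zero in-degree vertices (the slide's zero out-degree variant yields the reverse order); it also detects cycles, i.e. checks that the network is a DAG. The sprinkler network is used as the example graph:

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm: repeatedly remove and output a vertex with
    in-degree zero. Raises ValueError if the graph contains a cycle."""
    indeg = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for u, v in edges:              # edge u -> v
        indeg[v] += 1
        children[u].append(v)
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("not a DAG: a cycle remains")
    return order

# Sprinkler network: C -> S, C -> R, R -> F, S -> W, R -> W.
order = topological_order("CSRWF",
                          [("C", "S"), ("C", "R"), ("R", "F"),
                           ("S", "W"), ("R", "W")])
print(order)  # a total order in which parents always precede children
```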

SLIDE 16

Bayesian Network: Classification

[Figure: a Bayesian network for classification, from the Alpaydin (2004) Ch 3 slides.]

SLIDE 17

Naive Bayes' Classifier

[Figure 3.7 of Alpaydin (2004): class C with children x1, x2, ..., xd and factors P(C) and p(xi | C).]

The Xi are conditionally independent given C:
P(X, C) = P(x1 | C) P(x2 | C) ... P(xd | C) P(C).
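The factorization above makes classification a product of one-dimensional tables. A toy Python sketch with three binary features; all the probability numbers are made up for illustration:

```python
# Toy naive Bayes: P(x, C) = P(C) * prod_i P(x_i | C).  Illustrative numbers.
P_C = {0: 0.6, 1: 0.4}
P_x_given_C = {                      # P(x_i = 1 | C) for i = 1, 2, 3
    0: [0.2, 0.7, 0.5],
    1: [0.8, 0.3, 0.5],
}

def joint(x, c):
    """P(x, C = c) under the naive Bayes factorization."""
    p = P_C[c]
    for xi, pi in zip(x, P_x_given_C[c]):
        p *= pi if xi == 1 else 1.0 - pi
    return p

def classify(x):
    # P(C | x) is proportional to P(x, C); pick the class with the larger joint.
    return max(P_C, key=lambda c: joint(x, c))

print(classify((1, 0, 1)))  # 1
```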

SLIDE 18

Naive Bayes' Classifier

[Figure: C with children X^1, ..., X^N; equivalently, plate notation with X inside a plate of size N and C outside.]

A plate is used as a shorthand notation for repetition. The number of repetitions is in the bottom right corner. Gray nodes denote observed variables.

SLIDE 19

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 20

Finding the Structure of the Network

Often the network structure is given by an expert. In probabilistic modeling, the network structure defines the structure of the model. Finding an optimal Bayesian network structure is NP-hard.
Idea: go through all possible network structures M and compute the likelihood of the data X given the network structure, P(X | M). Choose the network complexity appropriately, and choose the network that, for a given network complexity, gives the best likelihood.
The Bayesian approach: choose the structure M that maximizes P(M | X) ∝ P(X | M) P(M), where P(M) is a prior probability for the network structure M (more complex networks should have smaller prior probability).

SLIDE 21

Finding a Network

A full Bayesian network of d vertices and d(d − 1)/2 edges describes the training set fully and the test set probably poorly. As before, in finding the network structure we must control the complexity so that the model generalizes.
Usually one must resort to approximate solutions to find the network structure (e.g., the deal package in R). A feasible exact algorithm exists for up to d = 32 variables, with a running time of o(d^2 2^(d−2)). See Silander et al. (2006), A Simple Approach for Finding the Globally Optimal Bayesian Network Structure, in Proc. 22nd UAI.

SLIDE 22

Finding a Network

[Figure: network over Sky, AirTemp, Humidity, Wind, Water, Forecast and EnjoySport, found by Bene at http://b-course.hiit.fi/bene]

t | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1 | Sunny | Warm    | Normal   | Strong | Warm  | Same     | 1
2 | Sunny | Warm    | High     | Strong | Warm  | Same     | 1
3 | Rainy | Cold    | High     | Strong | Warm  | Change   | 0
4 | Sunny | Warm    | High     | Strong | Cool  | Change   | 1

SLIDE 23

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 24

Boys or Girls?

Figure: sex ratio by country, population aged below 15. Blue represents more women, red more men than the world average of 1.06 males/female. Image from Wikimedia Commons, author Dbachmann, GFDL v1.2.

SLIDE 25

Bernoulli Process

The world-average probability that a newborn child is a boy (X = 1) is about θ = 0.512; the probability of a girl (X = 0) is then 1 − θ = 0.488.
Bernoulli process: P(X = x | θ) = θ^x (1 − θ)^(1−x), x ∈ {0, 1}.
Assume we observe the genders of N newborn children, X = {x^t}_{t=1}^N. What is the sex ratio?
Joint distribution: P(x^1, ..., x^N, θ) = P(x^1 | θ) ... P(x^N | θ) P(θ).
Notice that we must fix some prior for θ, P(θ).

[Figure: θ with children X^1, ..., X^N; equivalently, plate notation with X inside a plate of size N and θ outside.]

SLIDE 26

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 27

Comparing Models

The likelihood ratio (Bayes factor) is defined by
BF(θ2; θ1) = P(X | θ2) / P(X | θ1).
If we believe, before seeing any data, that the probability of model θ1 is P(θ1) and of model θ2 is P(θ2), then the ratio of their posterior probabilities is given by
P(θ2 | X) / P(θ1 | X) = [P(θ2) / P(θ1)] × BF(θ2; θ1).
This ratio allows us to compare our degrees of belief in two models. The posterior probability density allows us to compare our degrees of belief among an infinite number of models after observing the data.

SLIDE 28

Discrete vs. Continuous Random Variables

The Bernoulli parameter θ is a real number in [0, 1]. Previously we considered binary (0/1) random variables. Generalization to multinomial random variables that can have values 1, 2, ..., K is straightforward.
Generalization to a continuous random variable: divide the interval [0, 1] into K equally sized intervals of width ∆θ = 1/K. Define the probability density p(θ) such that the probability of θ being in the interval S_i = [(i − 1)∆θ, i∆θ], i ∈ {1, ..., K}, is P(θ ∈ S_i) = p(θ′)∆θ, where θ′ is some point in S_i.
In the limit ∆θ → 0:
E_{P(θ)}[f(θ)] = Σ_θ P(θ) f(θ)  →  E_{p(θ)}[f(θ)] = ∫ dθ p(θ) f(θ).
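The limit above can be illustrated numerically: a discretized expectation computed as Σ_i p(θ'_i) f(θ'_i) ∆θ approaches the integral as ∆θ shrinks. A sketch using a Beta(2, 2) density (my choice of example, not from the slides), whose mean is exactly 1/2:

```python
def expect_discrete(f, K):
    """E[f(theta)] under p(theta) = Beta(2, 2), discretized into K bins."""
    dtheta = 1.0 / K
    total = 0.0
    for i in range(K):
        theta = (i + 0.5) * dtheta           # midpoint theta' of bin S_i
        p = 6.0 * theta * (1.0 - theta)      # Beta(2, 2) density
        total += p * f(theta) * dtheta       # P(theta in S_i) ~ p(theta') * dtheta
    return total

# E[theta] under Beta(2, 2) is exactly 1/2; the sum converges as dtheta -> 0.
for K in (10, 100, 1000):
    print(K, expect_discrete(lambda t: t, K))
```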

SLIDE 29

Discrete vs. Continuous Random Variables

[Figure: a discrete distribution P(θ) on bins of width ∆θ approximating a continuous density p(θ).]

P(θ ∈ [(i − 1)∆θ, i∆θ]) = p(θ′)∆θ. In the limit ∆θ → 0:
E_{P(θ)}[f(θ)] = Σ_θ P(θ) f(θ)  →  E_{p(θ)}[f(θ)] = ∫ dθ p(θ) f(θ).

SLIDE 30

Estimating the Sex Ratio

Task: estimate the Bernoulli parameter θ, given N observations of the genders of newborns in an unnamed country.
Assume the "true" Bernoulli parameter to be estimated in the unnamed country is θ = 0.55, the global average being 51.2%.
Posterior probability density after seeing N newborns X = {x^t}_{t=1}^N:
p(θ | X) = p(X | θ) p(θ) / p(X) ∝ p(θ) Π_{t=1}^{N} θ^{x^t} (1 − θ)^(1−x^t).
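The posterior density above can be computed on a grid, which is how plots like the ones on the following slides can be produced. A sketch with a flat prior and a small made-up data set (5 boys, 3 girls); the mode of the resulting posterior should sit near 5/8:

```python
def bernoulli_posterior(xs, prior, K=1000):
    """Posterior p(theta | X) on a midpoint grid of K bins,
    p(theta | X) proportional to prior(theta) * theta^n1 * (1-theta)^n0."""
    n1 = sum(xs)
    n0 = len(xs) - n1
    grid = [(i + 0.5) / K for i in range(K)]
    post = [prior(t) * t ** n1 * (1.0 - t) ** n0 for t in grid]
    z = sum(post) / K                 # normalization via the grid Riemann sum
    return grid, [p / z for p in post]

# Flat prior p(theta) = 1, eight illustrative observations: 5 boys, 3 girls.
grid, post = bernoulli_posterior([1, 1, 1, 0, 1, 0, 1, 0], lambda t: 1.0)
mode = grid[max(range(len(grid)), key=lambda i: post[i])]
print(mode)  # near the posterior mode 5/8 = 0.625
```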

SLIDE 31

Estimating the Sex Ratio

What is our degree of belief in the gender ratio before seeing any data (the prior probability density p(θ))?
Very agnostic view: p(θ) = 1 (flat prior).
Something similar to elsewhere (empirical prior).
Conspiracy theory: almost all newborns are boys, or all are girls (boundary prior).

[Figure, N = 0: flat prior (P = 0.55), empirical prior (P = 0.78), boundary prior (P = 0.51). The "true" θ = 0.55 is shown by the red dotted line. The densities have been scaled to have a maximum of one.]

SLIDES 32-46

Estimating the Sex Ratio

[Figure series: posterior probability density of θ on [0, 1] as the number of observations N grows, for each of the three priors. The legend values P shown on each slide were:]

N    | flat prior | empirical prior | boundary prior
0    | 0.55       | 0.78            | 0.51
1    | 0.30       | 0.75            | 0.07
2    | 0.57       | 0.78            | 0.55
3    | 0.76       | 0.81            | 0.79
4    | 0.59       | 0.78            | 0.58
8    | 0.83       | 0.84            | 0.85
16   | 0.47       | 0.75            | 0.45
32   | 0.72       | 0.83            | 0.71
64   | 0.86       | 0.89            | 0.85
128  | 0.91       | 0.93            | 0.90
256  | 0.80       | 0.87            | 0.80
512  | 0.59       | 0.70            | 0.59
1024 | 0.36       | 0.45            | 0.36
2048 | 0.42       | 0.49            | 0.42
4096 | 0.12       | 0.14            | 0.11

SLIDE 47

Observations

With few data points, the results depend strongly on the prior assumptions (inductive bias). As the number of data points grows, the results converge to the same answer.
The conspiracy theory fades out quickly as we notice that there are both male and female babies. The only hypotheses with zero posterior probability are θ = 0 and θ = 1.
It takes quite a lot of observations to pin the result down to a reasonable accuracy.
The posterior probability can be a very small number; therefore, we usually work with logs of probabilities.

SLIDE 48

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 49

Predictions from the Posterior

The posterior represents our best knowledge. Predictor for a new data point:
p(x | X) = E_{p(θ|X)}[p(x | θ)] = ∫ dθ p(x | θ) p(θ | X).
The calculation of the integral may be infeasible. Solution: estimate θ by θ̂ and use the predictor p(x | X) ≈ p(x | θ̂).

SLIDE 50

Estimates from the Posterior

Definition (Maximum Likelihood Estimate): θ̂_ML = arg max_θ log p(X | θ).
Definition (Maximum a Posteriori Estimate): θ̂_MAP = arg max_θ log p(θ | X).
(With a flat prior the MAP estimate reduces to the ML estimate.)

[Figure, Maximum a Posteriori Estimate (N = 8): flat prior (P = 0.83), empirical prior (P = 0.84), boundary prior (P = 0.85).]

SLIDE 51

Bernoulli Density

Two states, x ∈ {0, 1}, one parameter θ ∈ [0, 1]:
P(X = x | θ) = θ^x (1 − θ)^(1−x).
P(X | θ) = Π_{t=1}^{N} θ^{x^t} (1 − θ)^(1−x^t).
L = log P(X | θ) = (Σ_t x^t) log θ + (N − Σ_t x^t) log(1 − θ).
∂L/∂θ = 0 ⇒ θ̂_ML = (1/N) Σ_t x^t.
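The closed-form result θ̂_ML = (1/N) Σ_t x^t can be double-checked by maximizing the log-likelihood L directly on a grid. A sketch with a made-up sample of eight observations (six ones):

```python
import math

xs = [1, 0, 1, 1, 0, 1, 1, 1]   # illustrative observations, 6 ones out of 8

def loglik(theta):
    """L = (sum_t x^t) log(theta) + (N - sum_t x^t) log(1 - theta)."""
    n1 = sum(xs)
    n0 = len(xs) - n1
    return n1 * math.log(theta) + n0 * math.log(1.0 - theta)

# Maximize L on a fine grid and compare with the closed form.
grid = [i / 1000 for i in range(1, 1000)]   # avoid log(0) at the endpoints
theta_grid = max(grid, key=loglik)
theta_ml = sum(xs) / len(xs)
print(theta_grid, theta_ml)  # 0.75 0.75
```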

SLIDE 52

Multinomial Density

K states, x ∈ {1, ..., K}, K real parameters θ_k ≥ 0 with the constraint Σ_{k=1}^{K} θ_k = 1.
One observation is an integer i in {1, ..., K}, represented by the indicator vector x_k = δ_ik.
P(X = i | θ) = Π_{k=1}^{K} θ_k^{x_k}.
P(X | θ) = Π_{t=1}^{N} Π_{k=1}^{K} θ_k^{x^t_k}.
L = log P(X | θ) = Σ_{t=1}^{N} Σ_{k=1}^{K} x^t_k log θ_k.
∂L/∂θ_k = 0 (subject to the constraint) ⇒ θ̂_k,ML = (1/N) Σ_t x^t_k.

SLIDE 53

Gaussian Density

A real number x is Gaussian (normal) distributed with mean µ and variance σ², or x ~ N(µ, σ²), if its density function is
p(x | µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).
L = log p(X | µ, σ²) = −(N/2) log(2π) − N log σ − Σ_{t=1}^{N} (x^t − µ)²/(2σ²).
ML estimates:
m = (1/N) Σ_{t=1}^{N} x^t,
s² = (1/N) Σ_{t=1}^{N} (x^t − m)².

[Figure: the density p(x | µ = 0, σ² = 1) of N(0, 1).]
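A quick simulation check of the ML formulas above, on synthetic data (true µ = 2 and σ = 3 are my illustrative choices): with a large sample, m and s² should land close to the true parameters.

```python
import random

random.seed(1)
N = 10000
xs = [random.gauss(2.0, 3.0) for _ in range(N)]   # true mu = 2, sigma = 3

m = sum(xs) / N                                    # ML estimate of mu
s2 = sum((x - m) ** 2 for x in xs) / N             # ML estimate of sigma^2
print(m, s2)  # close to mu = 2 and sigma^2 = 9
```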

SLIDE 54

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 55

Bias and Variance

Setup: an unknown parameter θ is estimated by d(X) based on a sample X. Example: estimate σ² by d = s².
Bias: b_θ(d) = E[d] − θ.
Variance: E[(d − E[d])²].
Mean square error of the estimator:
r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance.

Figure 4.1 of Alpaydin (2004).
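The decomposition r(d, θ) = Bias² + Variance can be verified by simulation. A sketch (all sample sizes and distribution parameters are my illustrative choices) that estimates σ² = 9 with the biased estimator s² over many repeated samples and checks the identity on the empirical moments:

```python
import random

random.seed(2)
theta = 9.0               # true parameter: variance of N(0, 3^2)
N, trials = 5, 200000

def s2_sample():
    """One draw of the estimator d = s^2 on a sample of size N."""
    xs = [random.gauss(0.0, 3.0) for _ in range(N)]
    m = sum(xs) / N
    return sum((x - m) ** 2 for x in xs) / N

ds = [s2_sample() for _ in range(trials)]
mean_d = sum(ds) / trials
mse = sum((d - theta) ** 2 for d in ds) / trials        # E[(d - theta)^2]
bias2 = (mean_d - theta) ** 2                           # (E[d] - theta)^2
var = sum((d - mean_d) ** 2 for d in ds) / trials       # E[(d - E[d])^2]
print(mse, bias2 + var)  # the two agree (up to floating point)
```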

SLIDE 56

Bias and Variance: Unbiased Estimator of Variance

An estimator is unbiased if b_θ(d) = 0.
Assume X is sampled from a Gaussian distribution, and estimate σ² by s²:
s² = (1/N) Σ_t (x^t − m)².
We obtain E_{p(x|µ,σ²)}[s²] = ((N − 1)/N) σ².
So s² is not an unbiased estimator, but (N/(N − 1)) s² is:
σ̂² = (1/(N − 1)) Σ_{t=1}^{N} (x^t − m)².
s² is, however, asymptotically unbiased (that is, the bias vanishes as N → ∞).
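The factor (N − 1)/N shows up clearly in simulation. A sketch (σ² = 4 and N = 4 are illustrative choices): averaging s² over many samples gives roughly 3, not 4, while the corrected estimator recovers σ².

```python
import random

random.seed(3)
N, trials = 4, 100000     # small N makes the bias factor (N-1)/N = 3/4 visible

def sample_s2():
    xs = [random.gauss(0.0, 2.0) for _ in range(N)]   # sigma^2 = 4
    m = sum(xs) / N
    return sum((x - m) ** 2 for x in xs) / N

mean_s2 = sum(sample_s2() for _ in range(trials)) / trials
print(mean_s2)                  # close to (N-1)/N * sigma^2 = 3.0, not 4.0
print(mean_s2 * N / (N - 1))    # the corrected estimator: close to 4.0
```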

SLIDE 57

Bayes' Estimator

Bayes' estimator: θ̂_Bayes = E_{p(θ|X)}[θ] = ∫ dθ θ p(θ | X).
Example: x^t ~ N(θ, σ0²), t ∈ {1, ..., N}, and θ ~ N(µ, σ²), where µ, σ² and σ0² are known constants. Task: estimate θ.
p(X | θ) = (2πσ0²)^(−N/2) exp(−Σ_t (x^t − θ)²/(2σ0²)),
p(θ) = (1/√(2πσ²)) exp(−(θ − µ)²/(2σ²)).
It can be shown that p(θ | X) is Gaussian distributed with
θ̂_Bayes = E_{p(θ|X)}[θ] = [(N/σ0²)/(N/σ0² + 1/σ²)] m + [(1/σ²)/(N/σ0² + 1/σ²)] µ,
where m = (1/N) Σ_t x^t is the sample mean.
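The formula above is a convex combination of the sample mean m and the prior mean µ: the Bayes estimate shrinks m towards µ, with the data winning as N grows. A sketch with illustrative constants (µ = 0, σ² = 1, σ0² = 4, true θ = 1.5):

```python
import random

random.seed(4)
mu, sigma2 = 0.0, 1.0          # prior: theta ~ N(mu, sigma^2)
sigma0_2 = 4.0                 # observation noise: x^t ~ N(theta, sigma0^2)
theta_true = 1.5
N = 20
xs = [random.gauss(theta_true, sigma0_2 ** 0.5) for _ in range(N)]
m = sum(xs) / N                # sample mean

# Weight of the data term in the convex combination.
w = (N / sigma0_2) / (N / sigma0_2 + 1.0 / sigma2)
theta_bayes = w * m + (1.0 - w) * mu
print(m, theta_bayes)  # the Bayes estimate lies between m and mu
```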

SLIDE 58

Outline
1. Bayesian Networks: Reminders, Inference, Finding the Structure of the Network
2. Probabilistic Inference: Bernoulli Process, Posterior Probabilities
3. Estimating Parameters: Estimates from Posterior, Bias and Variance, Conclusion

SLIDE 59

About Estimators

Point estimates collapse the information contained in the posterior distribution into one point. Advantages of point estimates:
Computations are easier: no need to do the integral.
A point estimate may be more interpretable.
Point estimates may be good enough. (If the model is approximate anyway, it may make no sense to compute the integral exactly.)
Alternative to point estimates: do the integral analytically or using approximate methods (MCMC, variational methods etc.).
One should always use a test set to validate the results. The best estimate is the one performing best on the validation/test set.

SLIDE 60

Conclusion

Next lecture: more about model selection (Alpaydin (2004) Ch 4).
Problem session on 5 October.