Bayes Networks 3, Robert Platt, Northeastern University

SLIDE 1

Bayes Networks 3

Robert Platt, Northeastern University
All slides in this file are adapted from CS188 UC Berkeley

SLIDE 2

Bayes’ Nets

  • Representation
  • Conditional Independences
  • Probabilistic Inference
  • Enumeration (exact, exponential complexity)
  • Variable elimination (exact, worst-case exponential complexity, often better)
  • Inference is NP-complete
  • Sampling (approximate)
  • Learning Bayes’ Nets from Data
SLIDE 3

Inference

  • Inference: calculating some useful quantity from a joint probability distribution
  • Examples:
  • Posterior probability: P(Q | E1 = e1, …, Ek = ek)
  • Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)

SLIDE 4

Inference by Enumeration

  • General case:
  • Evidence variables: E1 … Ek = e1 … ek
  • Query* variable: Q
  • Hidden variables: H1 … Hr
  • All variables

* Works fine with multiple query variables, too

  • We want: P(Q | e1 … ek)
  • Step 1: Select the entries consistent with the evidence
  • Step 2: Sum out H to get joint of Query and evidence
  • Step 3: Normalize
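The three steps can be sketched over a toy joint distribution held in a dict. This is a minimal illustration of the procedure, not code from the slides; the variable names (`Q`, `H`, `E`) and the probabilities are made up:

```python
# Hypothetical example: query Q, hidden H, evidence E observed as '+e'.
# The full joint P(Q, H, E) is a dict keyed by assignment tuples.
joint = {
    ('+q', '+h', '+e'): 0.10, ('+q', '+h', '-e'): 0.05,
    ('+q', '-h', '+e'): 0.20, ('+q', '-h', '-e'): 0.05,
    ('-q', '+h', '+e'): 0.15, ('-q', '+h', '-e'): 0.15,
    ('-q', '-h', '+e'): 0.05, ('-q', '-h', '-e'): 0.25,
}

def enumerate_query(joint, evidence_index, evidence_value, query_index):
    # Step 1: select the entries consistent with the evidence.
    selected = {k: v for k, v in joint.items() if k[evidence_index] == evidence_value}
    # Step 2: sum out the hidden variables to get the joint of query and evidence.
    unnormalized = {}
    for assignment, p in selected.items():
        q = assignment[query_index]
        unnormalized[q] = unnormalized.get(q, 0.0) + p
    # Step 3: normalize to get P(Q | e).
    z = sum(unnormalized.values())
    return {q: p / z for q, p in unnormalized.items()}

posterior = enumerate_query(joint, evidence_index=2, evidence_value='+e', query_index=0)
```

Note the cost: step 1 touches every entry of the joint, which is exponential in the number of variables; that is exactly the problem variable elimination addresses.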

SLIDE 5

Inference by Enumeration in Bayes’ Net

  • Given unlimited time, inference in BNs is easy
  • Reminder of inference by enumeration by example:

[Figure: the burglary network, with nodes B, E, A, J, M]

SLIDE 6

Inference by Enumeration?

SLIDE 7

Inference by Enumeration vs. Variable Elimination

  • Why is inference by enumeration so slow?
  • You join up the whole joint distribution before you sum out the hidden variables
  • Idea: interleave joining and marginalizing!
  • Called “Variable Elimination”
  • Still NP-hard, but usually much faster than inference by enumeration
  • First we’ll need some new notation: factors

SLIDE 8

Factor Zoo Summary

  • In general, when we write P(Y1 … YN | X1 … XM)
  • It is a “factor,” a multi-dimensional array
  • Its values are P(y1 … yN | x1 … xM)
  • Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array

SLIDE 9

Example: Traffic Domain

  • Random Variables
  • R: Raining
  • T: Traffic
  • L: Late for class!

[Network: R → T → L]

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9
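A minimal way to hold these CPTs in code is as dict-based factors; the representation below is my own sketch (the tables are exactly the slide’s values):

```python
# The traffic-domain CPTs above, stored as dict "factors": each key is a
# tuple of assignments (parent value first), each value a probability.
P_R = {('+r',): 0.1, ('-r',): 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Sanity check: each conditional row (fixed parent value) sums to 1.
row_sums = {r: sum(p for (r2, t), p in P_T_given_R.items() if r2 == r)
            for r in ('+r', '-r')}
```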

SLIDE 10

Inference by Enumeration: Procedural Outline

  • Track objects called factors
  • Initial factors are local CPTs (one per node)
  • Any known values are selected
  • E.g. if we know L = +l, the initial factors are P(R), P(T|R), P(+l|T)
  • Procedure: Join all factors, then eliminate all hidden variables

Initial factors:

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

With the evidence L = +l selected:

P(+l|T):
+t +l 0.3
-t +l 0.1

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

SLIDE 11

Operation 1: Join Factors

  • First basic operation: joining factors
  • Combining factors:
  • Just like a database join
  • Get all factors over the joining variable
  • Build a new factor over the union of the variables involved
  • Example: Join on R
  • Computation for each entry: pointwise products

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

→

P(R,T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81
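The pointwise-product join above is a one-liner once the factors are dicts. A sketch (the dict representation is mine; the numbers are the slide’s):

```python
# Join P(R) and P(T|R) on R: each (r, t) entry of the new factor is
# the pointwise product P(r) * P(t | r), reproducing the table above.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}

P_RT = {(r, t): P_R[r] * p for (r, t), p in P_T_given_R.items()}
```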

SLIDE 12

Example: Multiple Joins

SLIDE 13

Example: Multiple Joins

[Network: R → T → L]

Join R:

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

→

P(R,T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

Join T:

P(R,T,L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729

SLIDE 14

Operation 2: Eliminate

  • Second basic operation: marginalization
  • Take a factor and sum out a variable
  • Shrinks a factor to a smaller one
  • A projection operation
  • Example:

P(R,T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

→

P(T):
+t 0.17
-t 0.83
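Summing out is just adding up the entries that agree on the remaining variables. A sketch over the joined factor above (dict representation is mine):

```python
# Eliminate (sum out) R from P(R, T): entries that agree on T are added,
# shrinking the factor from 4 entries to 2.
P_RT = {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
        ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}

P_T = {}
for (r, t), p in P_RT.items():
    P_T[t] = P_T.get(t, 0.0) + p   # marginalize over R
```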

SLIDE 15

Multiple Elimination

Sum out R, then sum out T:

P(R,T,L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729

→

P(T,L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

→

P(L):
+l 0.134
-l 0.866

SLIDE 16

Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

SLIDE 17

Marginalizing Early (= Variable Elimination)

SLIDE 18

Traffic Domain

[Network: R → T → L]

  • Inference by Enumeration: Join on r, Join on t, Eliminate r, Eliminate t
  • Variable Elimination: Join on r, Eliminate r, Join on t, Eliminate t

SLIDE 19

Marginalizing Early! (aka VE)

[Network: R → T → L]

Initial factors:

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

Join R:

P(R,T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

P(L|T): (unchanged)

Sum out R:

P(T):
+t 0.17
-t 0.83

P(L|T): (unchanged)

Join T:

P(T,L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

Sum out T:

P(L):
+l 0.134
-l 0.866
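The contrast between the two orders can be checked in a few lines: both give the same P(L), but enumeration builds the full 8-entry joint while VE’s largest intermediate factor has only 4 entries. A dict-based sketch (representation mine, numbers the slide’s):

```python
# Traffic-domain CPTs.
P_R = {'+r': 0.1, '-r': 0.9}
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L_given_T = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
               ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Marginalizing late (enumeration): build the full 8-entry joint first.
joint = {(r, t, l): P_R[r] * P_T_given_R[(r, t)] * P_L_given_T[(t2, l)]
         for (r, t) in P_T_given_R for (t2, l) in P_L_given_T if t2 == t}
P_L_late = {}
for (r, t, l), p in joint.items():
    P_L_late[l] = P_L_late.get(l, 0.0) + p

# Marginalizing early (VE): sum R out as soon as it is joined, then T.
P_T = {}
for (r, t), p in P_T_given_R.items():
    P_T[t] = P_T.get(t, 0.0) + P_R[r] * p              # join on R, eliminate R
P_L_early = {}
for (t, l), p in P_L_given_T.items():
    P_L_early[l] = P_L_early.get(l, 0.0) + P_T[t] * p  # join on T, eliminate T
```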

SLIDE 20

Evidence

  • If evidence, start with factors that select that evidence
  • No evidence uses these initial factors: P(R), P(T|R), P(L|T)
  • Computing P(L | +r), the initial factors become: P(+r), P(T|+r), P(L|T)
  • We eliminate all vars other than query + evidence

Initial factors:

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

With the evidence R = +r selected:

P(+r):
+r 0.1

P(T|+r):
+r +t 0.8
+r -t 0.2

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

SLIDE 21

Evidence II

  • Result will be a selected joint of query and evidence
  • E.g. for P(L | +r), we would end up with:
  • To get our answer, just normalize this!
  • That’s it!

P(+r, L):
+r +l 0.026
+r -l 0.074

Normalize →

P(L | +r):
+l 0.26
-l 0.74
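The normalization step is a one-liner: divide each entry of the selected joint by their sum. A sketch using the slide’s numbers:

```python
# Selected joint P(+r, L) from the slide; normalizing turns it into P(L | +r).
selected = {'+l': 0.026, '-l': 0.074}
z = sum(selected.values())                     # the entries sum to P(+r) = 0.1
posterior = {l: p / z for l, p in selected.items()}
```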

SLIDE 22

General Variable Elimination

  • Query: P(Q | e1 … ek)
  • Start with initial factors:
  • Local CPTs (but instantiated by evidence)
  • While there are still hidden variables (not Q or evidence):
  • Pick a hidden variable H
  • Join all factors mentioning H
  • Eliminate (sum out) H
  • Join all remaining factors and normalize
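The loop above can be sketched end-to-end; run on the traffic network it reproduces P(L) = ⟨0.134, 0.866⟩. The factor representation (a tuple of variable names plus a table dict) is my own scaffolding, not notation from the slides:

```python
from itertools import product

# A factor is (variables, table): `variables` is a tuple of names and
# `table` maps each assignment tuple (one value per variable) to a number.
DOMAIN = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}

def join(f1, f2):
    """Pointwise-product join of two factors over the union of their vars."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assignment in product(*(DOMAIN[v] for v in out_vars)):
        env = dict(zip(out_vars, assignment))
        table[assignment] = (t1[tuple(env[v] for v in v1)] *
                             t2[tuple(env[v] for v in v2)])
    return out_vars, table

def eliminate(factor, var):
    """Sum out `var`, shrinking the factor by one dimension."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for assignment, p in table.items():
        key = assignment[:i] + assignment[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

def variable_elimination(factors, hidden):
    # While there are hidden variables: join all factors mentioning H, sum out H.
    for h in hidden:
        mentioning = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors.append(eliminate(joined, h))
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return result[0], {k: v / z for k, v in result[1].items()}

# Traffic-domain factors; query L, hidden variables R and T.
P_R = (('R',), {('+r',): 0.1, ('-r',): 0.9})
P_T_R = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                      ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
P_L_T = (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                      ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})
vars_out, P_L = variable_elimination([P_R, P_T_R, P_L_T], hidden=['R', 'T'])
```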

SLIDE 23

Example

Choose A

SLIDE 24

Example

Choose E. Finish with B.

Normalize

SLIDE 25

Same Example in Equations

  • The marginal can be obtained from the joint by summing out
  • Use the Bayes’ net joint distribution expression
  • Use x*(y+z) = xy + xz: joining on a, and then summing out, gives f1
  • Use x*(y+z) = xy + xz again: joining on e, and then summing out, gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!

SLIDE 26

Another Variable Elimination Example

Computational complexity critically depends on the largest factor generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables), all factors generated are of size 2, as they each have only one variable (Z, Z, and X3 respectively).

SLIDE 27

Variable Elimination Ordering

  • For the query P(Xn | y1, …, yn), work through the following two different orderings as done in the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
  • Answer: 2^(n+1) versus 2^2 (assuming binary variables)
  • In general: the ordering can greatly affect efficiency.
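The two factor sizes can be checked by simulating elimination symbolically, tracking only each factor's scope (its set of variables). The helper below is my own sketch for this slide, assuming binary variables and evidence already selected (so each factor's scope is {Z} or {Z, Xi}), with n = 4:

```python
def largest_factor(initial_scopes, elimination_order):
    """Track factor scopes through VE and return the entry count of the
    largest factor ever built, assuming every variable is binary."""
    biggest = 0
    scopes = [set(s) for s in initial_scopes]
    for h in elimination_order:
        mentioning = [s for s in scopes if h in s]
        rest = [s for s in scopes if h not in s]
        joined = set().union(*mentioning)      # join all factors mentioning h
        biggest = max(biggest, 2 ** len(joined))
        joined.discard(h)                      # sum out h
        scopes = rest + [joined]
    return biggest

# Network from the slide: Z is a parent of each Xi; query is Xn, so we
# eliminate Z and X1 ... X(n-1).
n = 4
scopes = [{'Z'}] + [{'Z', f'X{i}'} for i in range(1, n + 1)]
z_first = largest_factor(scopes, ['Z'] + [f'X{i}' for i in range(1, n)])
z_last = largest_factor(scopes, [f'X{i}' for i in range(1, n)] + ['Z'])
```

Eliminating Z first joins every factor at once, producing a scope of n+1 variables (2^(n+1) entries); eliminating Z last keeps every intermediate scope at two variables (2^2 entries).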

SLIDE 28

VE: Computational and Space Complexity

  • The computational and space complexity of variable elimination is determined by the largest factor
  • The elimination ordering can greatly affect the size of the largest factor.
  • E.g., the previous slide’s example: 2^(n+1) vs. 2^2
  • Does there always exist an ordering that only results in small factors?
  • No!
SLIDE 29

Worst Case Complexity?

  • CSP:
  • If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
  • Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference in general.

SLIDE 30

Polytrees

  • A polytree is a directed graph with no undirected cycles
  • For polytrees you can always find an ordering that is efficient
  • Try it!!
  • Cut-set conditioning for Bayes’ net inference
  • Choose set of variables such that if removed only a polytree remains
  • Exercise: Think about how the specifics would work out!
SLIDE 31

Bayes’ Nets

  • Representation
  • Conditional Independences
  • Probabilistic Inference
  • Enumeration (exact, exponential complexity)
  • Variable elimination (exact, worst-case exponential complexity, often better)
  • Inference is NP-complete
  • Sampling (approximate)
  • Learning Bayes’ Nets from Data