Probabilistic Graphical Models Part I: Bayesian Belief Networks
Selim Aksoy
Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr
CS 551, Fall 2015
◮ Graphs are an intuitive way of representing and visualizing the relationships among many variables.
◮ Probabilistic graphical models provide a tool to deal with two problems: uncertainty and complexity.
◮ Hence, they provide a compact representation of joint probability distributions.
◮ The graph structure specifies the statistical dependencies among the variables.
Figure: Example graph structures. (a) Undirected graph. (b) Directed graph.
◮ Marginal independence: $x \perp y \iff P(x, y) = P(x)\,P(y)$.
◮ Conditional independence: $x \perp y \mid z \iff P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$, or equivalently $P(x \mid y, z) = P(x \mid z)$.
◮ Marginal and conditional independence examples:
◮ Amount of speeding fine ⊥ Type of car | Speed
◮ Lung cancer ⊥ Yellow teeth | Smoking
◮ (Position, Velocity)t+1 ⊥ (Position, Velocity)t−1 | (Position, Velocity)t
◮ Child’s genes ⊥ Grandparents’ genes | Parents’ genes
◮ Ability of team A ⊥ Ability of team B
◮ not(Ability of team A ⊥ Ability of team B | Outcome of A vs. B game)
◮ Bayesian networks (BN) are probabilistic graphical models that are based on directed acyclic graphs.
◮ There are two components of a BN model: M = {G, Θ}.
◮ Each node in the graph G represents a random variable, and the edges represent direct dependencies between the variables.
◮ The set Θ of parameters specifies the conditional probability distribution associated with each variable.
◮ Edges represent “causation”, so no directed cycles are allowed: the graph G must be a directed acyclic graph (DAG).
◮ Markov property: Each node is conditionally independent of its non-descendants given its parents.
◮ The joint probability of a set of variables $x_1, \ldots, x_n$ is given by the chain rule as
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$.
◮ The conditional independence relationships encoded in the network allow each factor to be conditioned only on the parents $\pi_i$ of node $x_i$, so
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$.
◮ Once we know the joint probability distribution encoded in the network, we can answer all possible inference questions about the variables by marginalization.
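As a concrete illustration, here is a minimal runnable Python sketch of this factorization for a binary chain x1 → x2 → x3. All CPT values below are invented purely for illustration; only the structure of the computation matters.

```python
from itertools import product

# Made-up CPTs for the chain x1 -> x2 -> x3 (binary variables).
p_x1_true = 0.3                          # P(x1 = T)
p_x2_true = {True: 0.9, False: 0.2}      # P(x2 = T | x1)
p_x3_true = {True: 0.7, False: 0.1}      # P(x3 = T | x2)

def bern(p_true, value):
    """Probability of a binary `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(x1, x2, x3):
    """P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x2)."""
    return (bern(p_x1_true, x1)
            * bern(p_x2_true[x1], x2)
            * bern(p_x3_true[x2], x3))

# Sanity check: the eight joint entries must sum to 1.
assert abs(sum(joint(*v) for v in product([True, False], repeat=3)) - 1.0) < 1e-12
```

The factorization stores three small local tables instead of a full joint table with $2^3$ entries; the savings grow exponentially with the number of variables.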
◮ You have a new burglar alarm installed at home.
◮ It is fairly reliable at detecting burglary, but it sometimes also responds to minor earthquakes.
◮ You have two neighbors, Ali and Veli, who promised to call you at work when they hear the alarm.
◮ Ali always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
◮ Veli likes loud music and sometimes misses the alarm.
◮ Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
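The network for this story has Burglary (B) and Earthquake (E) as parents of Alarm (A), which in turn is the parent of AliCalls and VeliCalls. Below is a minimal Python sketch of the model. The CPT values are the standard textbook numbers for this example; they are an assumption here, since the text above does not list them.

```python
# Assumed CPTs (standard textbook values for the burglary example).
P_B = 0.001                          # P(B = T)
P_E = 0.002                          # P(E = T)
P_A = {(True, True): 0.95,           # P(A = T | B, E)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_ALI = {True: 0.90, False: 0.05}    # P(AliCalls = T | A)
P_VELI = {True: 0.70, False: 0.01}   # P(VeliCalls = T | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, ali, veli):
    """P(b, e, a, ali, veli) = P(b) P(e) P(a | b, e) P(ali | a) P(veli | a)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_ALI[a], ali) * bern(P_VELI[a], veli))
```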
◮ What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both Ali and Veli call?
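Under the assumed CPT values from the sketch above, this query is a single entry of the joint distribution and can be read directly off the factorization:

$P(\neg B, \neg E, A, \text{Ali}, \text{Veli}) = P(\neg B)\,P(\neg E)\,P(A \mid \neg B, \neg E)\,P(\text{Ali} \mid A)\,P(\text{Veli} \mid A) = 0.999 \times 0.998 \times 0.001 \times 0.9 \times 0.7 \approx 6.3 \times 10^{-4}$

(equivalently, `joint(False, False, True, True, True)` in the code sketch above).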
◮ What is the probability that there is a burglary given that Ali calls?
◮ What about if Veli also calls right after Ali hangs up?
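Both questions can be answered by brute-force enumeration: sum the joint distribution over every assignment consistent with the evidence, then normalize. A sketch, reusing `joint()` and the assumed CPTs from the burglary sketch above:

```python
from itertools import product

def posterior_burglary(ali, veli=None):
    """P(Burglary = T | evidence) by enumerating all assignments.
    `veli=None` means VeliCalls is unobserved."""
    num = den = 0.0
    for b, e, a, v in product([True, False], repeat=4):
        if veli is not None and v != veli:
            continue                      # skip assignments contradicting evidence
        p = joint(b, e, a, ali, v)
        den += p
        if b:
            num += p
    return num / den

print(posterior_burglary(ali=True))              # ~0.016 with the assumed CPTs
print(posterior_burglary(ali=True, veli=True))   # ~0.284: the second call raises it sharply
```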
◮ Suppose we observe the fact that the grass is wet. There are two possible causes for this: either it rained or the sprinkler was on.
◮ We see that it is more likely that the grass is wet because it rained.
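A numeric illustration of this comparison, and of the related “explaining away” effect, is sketched below. The CPT values are those of Murphy’s classic sprinkler example and are an assumption here (the text above does not list them): observing that it actually rained lowers the posterior probability that the sprinkler was on.

```python
from itertools import product

# Assumed CPTs: Cloudy -> {Sprinkler, Rain} -> WetGrass.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}        # P(Sprinkler = T | Cloudy)
P_R = {True: 0.8, False: 0.2}        # P(Rain = T | Cloudy)
P_W = {(True, True): 0.99,           # P(WetGrass = T | Sprinkler, Rain)
       (True, False): 0.9,
       (False, True): 0.9,
       (False, False): 0.0}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def sprinkler_joint(c, s, r, w):
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

def posterior_sprinkler(rain=None):
    """P(Sprinkler = T | WetGrass = T [, Rain]) by enumeration."""
    num = den = 0.0
    for c, s, r in product([True, False], repeat=3):
        if rain is not None and r != rain:
            continue
        p = sprinkler_joint(c, s, r, True)   # evidence: WetGrass = True
        den += p
        if s:
            num += p
    return num / den

print(posterior_sprinkler())            # P(S | W)    ~ 0.430
print(posterior_sprinkler(rain=True))   # P(S | W, R) ~ 0.194: rain explains away the sprinkler
```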
◮ Example applications include:
◮ Machine learning
◮ Statistics
◮ Computer vision
◮ Natural language processing
◮ Speech recognition
◮ Error-control codes
◮ Bioinformatics
◮ Medical diagnosis
◮ Weather forecasting
◮ Example systems include:
◮ PATHFINDER medical diagnosis system at Stanford
◮ Microsoft Office assistant and troubleshooters
◮ Space shuttle monitoring system at NASA Mission Control
◮ Evaluation (inference) problem: Given the model and the values of the observed variables, estimate the values of the hidden nodes.
◮ Learning problem: Given training data and prior information (e.g., expert knowledge, causal relationships), estimate the network structure, the parameters of the probability distributions, or both.
◮ If we observe the “leaves” and try to infer the values of the hidden causes, this is called diagnosis, or bottom-up reasoning.
◮ If we observe the “roots” and try to predict the effects, this is called prediction, or top-down reasoning.
◮ Exact inference is an NP-hard problem because the number of terms in the summations (or integrals, for continuous variables) grows exponentially with the number of variables.
◮ Some restricted classes of networks, namely the singly connected networks (also called polytrees), can be solved efficiently, in time linear in the number of nodes.
◮ There are also clustering algorithms that convert multiply connected networks into singly connected networks.
◮ However, for most networks we have to fall back on approximate inference methods such as the following (a minimal sampling sketch appears after this list):
◮ sampling (Monte Carlo) methods
◮ variational methods
◮ loopy belief propagation
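As one example, here is a minimal rejection-sampling sketch for the alarm network above (CPTs again the assumed textbook values): draw ancestral samples from the network and keep only those that match the evidence.

```python
import random

def sample_network(rng):
    """Draw one ancestral sample (parents before children) from the alarm network."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    ali = rng.random() < P_ALI[a]
    veli = rng.random() < P_VELI[a]
    return b, e, a, ali, veli

def estimate_b_given_ali(n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of P(B | AliCalls = T) by rejection sampling."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        b, _, _, ali, _ = sample_network(rng)
        if ali:                       # keep only samples matching the evidence
            kept += 1
            hits += b
    return hits / kept

print(estimate_b_given_ali())         # approaches ~0.016 as the sample count grows
```

Rejection sampling is wasteful when the evidence is rare (most samples are thrown away), which is why likelihood weighting and MCMC methods are preferred in practice.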
◮ The simplest situation is the one where the network structure is completely known (either specified by an expert or designed using causal relationships between the variables).
◮ Other situations with increasing complexity are: known structure with unobserved variables, unknown structure with observed variables, and unknown structure with unobserved variables.

            Observability
Structure   Full                             Partial
Known       Maximum likelihood estimation    EM (or gradient ascent)
Unknown     Search through model space       EM + search through model space
◮ The joint pdf of the variables with parameter set Θ is
$P(\mathbf{x} \mid \Theta) = \prod_{i=1}^{n} P(x_i \mid \pi_i, \theta_i)$
where $\theta_i$ is the vector of parameters for the conditional distribution of $x_i$.
◮ Given training data $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$ where $\mathbf{x}_j = (x_{j1}, \ldots, x_{jn})^T$, the log-likelihood of the parameters is
$\log P(\mathcal{X} \mid \Theta) = \sum_{j=1}^{m} \sum_{i=1}^{n} \log P(x_{ji} \mid \pi_i, \theta_i)$.
◮ The likelihood decomposes according to the structure of the network, so we can maximize the likelihood of each node’s parameters independently.
◮ An alternative is to assign a prior probability density function $p(\Theta)$ to the parameters and use the training data to compute the posterior $p(\Theta \mid \mathcal{X})$ and the Bayes estimate.
◮ We will study the special case of discrete variables where each conditional distribution is multinomial.
◮ Let each discrete variable $x_i$ have $r_i$ possible values (states), with conditional probabilities
$P(x_i = k \mid \pi_i = j, \theta_i) = \theta_{ijk}$
where $j$ denotes a particular configuration of the parents of $x_i$.
◮ Given $\mathcal{X}$, the MLE of $\theta_{ijk}$ can be computed as
$\hat{\theta}_{ijk} = N_{ijk} / N_{ij}$
where $N_{ijk}$ is the number of cases in $\mathcal{X}$ in which $x_i = k$ and $\pi_i = j$, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$.
◮ Thus, in the case of complete observability, learning just amounts to counting.
◮ For example, to compute the estimate for the W (wet grass) node in the sprinkler example, we count how often the grass is wet for each configuration of its parents S and R, and normalize by the parent-configuration counts (see the sketch below).
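A minimal sketch of this counting procedure for the W node, on a purely hypothetical fully observed dataset of (S, R, W) triples:

```python
from collections import Counter

# Hypothetical fully observed data: (sprinkler, rain, wet_grass) triples.
data = [
    (True, False, True), (False, True, True), (False, True, True),
    (False, False, False), (True, True, True), (False, True, False),
]

n_joint = Counter()    # N_ijk: counts of (s, r, w)
n_parent = Counter()   # N_ij:  counts of parent configurations (s, r)
for s, r, w in data:
    n_joint[(s, r, w)] += 1
    n_parent[(s, r)] += 1

def mle_w(s, r):
    """theta-hat = N(W = T, s, r) / N(s, r)."""
    return n_joint[(s, r, True)] / n_parent[(s, r)]

print(mle_w(False, True))   # 2/3 with the toy data above
```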
◮ Note that, if a particular event is not seen in the training data, it will be assigned a zero probability.
◮ We can avoid this using the Bayes estimate with a Dirichlet prior:
$\hat{\theta}_{ijk} = \dfrac{N_{ijk} + \alpha_{ijk}}{N_{ij} + \alpha_{ij}}$
where $\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}$ and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$ as before.
◮ $\alpha_{ij}$ is sometimes called the equivalent sample size for the Dirichlet prior.
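Continuing the counting sketch above, a symmetric prior with $\alpha_{ijk} = 1$ (Laplace smoothing) keeps unseen events from getting zero probability; the function below is a hypothetical illustration reusing the counters from that sketch.

```python
def bayes_w(s, r, alpha=1.0, r_i=2):
    """Dirichlet-smoothed estimate of P(W = T | s, r) with a symmetric prior:
    alpha_ijk = alpha, hence alpha_ij = r_i * alpha (r_i = 2 for a binary W)."""
    return (n_joint[(s, r, True)] + alpha) / (n_parent[(s, r)] + r_i * alpha)

print(bayes_w(True, False))   # 2/3 instead of the MLE's 1.0: no event gets probability 0
```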
◮ When the dependencies among the features are unknown, we generally proceed with the simplest assumption that the features are conditionally independent given the class.
◮ This corresponds to the naive Bayesian network that gives the class-conditional probabilities
$P(x_1, \ldots, x_n \mid w) = \prod_{i=1}^{n} P(x_i \mid w)$
for class $w$ and features $x_1, \ldots, x_n$.
Figure: The naive Bayes network: the class node $w$ is the parent of the feature nodes $x_1, x_2, \ldots, x_n$.
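A minimal naive Bayes sketch with binary features; all probabilities below are invented for illustration. Classification picks the class maximizing $\log P(w) + \sum_i \log P(x_i \mid w)$, which is the MAP decision under the conditional-independence assumption.

```python
import math

# Hypothetical model: two classes, three binary features.
p_feat = {
    "w1": [0.8, 0.1, 0.5],   # P(x_i = T | w1)
    "w2": [0.3, 0.7, 0.5],   # P(x_i = T | w2)
}
p_class = {"w1": 0.6, "w2": 0.4}   # class priors P(w)

def log_posterior(w, x):
    """log P(w) + sum_i log P(x_i | w), up to the shared normalizer P(x)."""
    lp = math.log(p_class[w])
    for p, xi in zip(p_feat[w], x):
        lp += math.log(p if xi else 1.0 - p)
    return lp

x = [True, False, True]
print(max(p_class, key=lambda w: log_posterior(w, x)))   # MAP class for x
```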