Probabilistic Graphical Models Part II: Undirected Graphical Models
Selim Aksoy
Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr
CS 551, Fall 2018
◮ We looked at directed graphical models whose structure encodes conditional independence assumptions among random variables using directed edges.
◮ Undirected graphical models are useful where one cannot naturally ascribe a directionality to the interaction between variables.
◮ An example model that satisfies:
◮ (A ⊥ C | {B, D})
◮ (B ⊥ D | {A, C})
◮ No other independencies
◮ These independencies cannot be represented perfectly by the structure of any Bayesian network.
◮ Four students are working together in pairs on a homework.
◮ Alice and Charles cannot stand each other, and Bob and Debbie had a relationship that ended badly.
◮ Only the following pairs meet: Alice and Bob; Bob and Charles; Charles and Debbie; Debbie and Alice.
◮ The professor accidentally misspoke in the class, giving rise to a possible misconception among the students.
◮ In study pairs, each student transmits her/his understanding of the material (possibly including the misconception) to her/his study partner.
◮ Four binary random variables are defined, representing whether each student has the misconception or not: A (Alice), B (Bob), C (Charles), and D (Debbie).
◮ Assume that for each X ∈ {A, B, C, D}, x1 denotes the case where the student has the misconception and x0 denotes the case where she/he does not.
◮ Alice and Charles never speak to each other directly, so A and C are conditionally independent given B and D.
◮ Similarly, B and D are conditionally independent given A and C.
◮ How to parametrize this undirected graph?
◮ We want to capture the affinities between related variables.
◮ Conditional probability distributions cannot be used because the interactions are symmetric and have no natural directionality.
◮ Marginals cannot be used because a product of marginals does not, in general, yield a consistent joint distribution.
◮ A general purpose function: factor (also called potential).
◮ Let D be a set of random variables.
◮ A factor φ is a function from Val(D) to R.
◮ A factor is nonnegative if all its entries are nonnegative.
◮ The set of variables D is called the scope of the factor.
◮ In the example in Figure 2, an example factor is φ1(A, B), whose scope is {A, B}; a small sketch of such a factor follows.
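◮ A factor over discrete variables can be stored as a simple lookup table. The sketch below is not part of the original slides; it represents φ1(A, B) as a Python dictionary from Val(A, B) to nonnegative reals, with values assumed from the standard version of the misconception example (the factor-table figure is not reproduced here).

    # Sketch of a table factor: scope {A, B}, binary encoding 0 = x0, 1 = x1.
    # The numeric values are assumed (standard misconception-example values).
    phi1 = {
        (0, 0): 30.0,  # a0, b0
        (0, 1): 5.0,   # a0, b1
        (1, 0): 1.0,   # a1, b0
        (1, 1): 10.0,  # a1, b1
    }
    # phi1 is nonnegative, but its entries are not probabilities and need not sum to 1.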
◮ The value associated with a particular assignment a, b denotes the affinity between these two values: the higher the value, the more compatible they are.
◮ For φ1, if A and B disagree, there is less weight.
◮ For φ3, if C and D disagree, there is more weight.
◮ A factor is not normalized, i.e., the entries are not probabilities and do not need to sum to one.
◮ The Markov network defines the local interactions between directly related variables.
◮ To define a global model, we need to combine these local interactions.
◮ We combine the local models by multiplying them, as in φ1(a, b) · φ2(b, c) · φ3(c, d) · φ4(d, a).
◮ However, there is no guarantee that the result of this product is a normalized joint distribution.
◮ Thus, it is normalized as P(a, b, c, d) = (1/Z) φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a), where Z = Σ_{a,b,c,d} φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a); a numerical sketch follows.
◮ Z is known as the partition function.
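◮ The sketch below (not from the slides) builds the joint distribution of the misconception network by multiplying the four local factors and normalizing; the factor values are assumed to be the standard ones for this example and reproduce the table that follows.

    from itertools import product

    # Assumed factor tables (binary encoding: 0 = x0, 1 = x1).
    phi1 = {(0, 0): 30,  (0, 1): 5,   (1, 0): 1,   (1, 1): 10}   # phi1(A, B)
    phi2 = {(0, 0): 100, (0, 1): 1,   (1, 0): 1,   (1, 1): 100}  # phi2(B, C)
    phi3 = {(0, 0): 1,   (0, 1): 100, (1, 0): 100, (1, 1): 1}    # phi3(C, D)
    phi4 = {(0, 0): 100, (0, 1): 1,   (1, 0): 1,   (1, 1): 100}  # phi4(D, A)

    def unnormalized(a, b, c, d):
        # Local models are combined by multiplication.
        return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

    # Partition function: sum of the unnormalized measure over all 16 assignments.
    Z = sum(unnormalized(*x) for x in product((0, 1), repeat=4))

    def P(a, b, c, d):
        # Normalized joint distribution.
        return unnormalized(a, b, c, d) / Z

    print(Z)              # 7201840
    print(P(0, 1, 1, 0))  # ~0.69, the most probable assignment (a0, b1, c1, d0)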
Assignment      Unnormalized   Normalized
a0 b0 c0 d0     300,000        0.04
a0 b0 c0 d1     300,000        0.04
a0 b0 c1 d0     300,000        0.04
a0 b0 c1 d1     30             4.1 · 10⁻⁶
a0 b1 c0 d0     500            6.9 · 10⁻⁵
a0 b1 c0 d1     500            6.9 · 10⁻⁵
a0 b1 c1 d0     5,000,000      0.69
a0 b1 c1 d1     500            6.9 · 10⁻⁵
a1 b0 c0 d0     100            1.4 · 10⁻⁵
a1 b0 c0 d1     1,000,000      0.14
a1 b0 c1 d0     100            1.4 · 10⁻⁵
a1 b0 c1 d1     100            1.4 · 10⁻⁵
a1 b1 c0 d0     10             1.4 · 10⁻⁶
a1 b1 c0 d1     100,000        0.014
a1 b1 c1 d0     100,000        0.014
a1 b1 c1 d1     100,000        0.014
◮ There is a tight connection between the factorization of the distribution and its independence properties.
◮ For example, P satisfies (X ⊥ Y | Z) if and only if we can write P in the form P(X, Y, Z) = φ1(X, Z) · φ2(Y, Z).
◮ From the example in Figure 2, P(A, B, C, D) ∝ [φ1(A, B) φ2(B, C)] · [φ3(C, D) φ4(D, A)], which implies (B ⊥ D | {A, C}); this is checked numerically in the sketch below.
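◮ As a numerical check of the last point, the snippet below reuses P and product from the earlier sketch and verifies that P(b, d | a, c) = P(b | a, c) · P(d | a, c) for every assignment, i.e., that (B ⊥ D | {A, C}) holds in the distribution.

    from math import isclose

    # Reuses P(a, b, c, d) and itertools.product from the earlier sketch.
    for a, c in product((0, 1), repeat=2):
        p_ac = sum(P(a, b, c, d) for b, d in product((0, 1), repeat=2))
        for b, d in product((0, 1), repeat=2):
            p_bd_given_ac = P(a, b, c, d) / p_ac
            p_b_given_ac = sum(P(a, b, c, dd) for dd in (0, 1)) / p_ac
            p_d_given_ac = sum(P(a, bb, c, d) for bb in (0, 1)) / p_ac
            assert isclose(p_bd_given_ac, p_b_given_ac * p_d_given_ac)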
◮ Factors do not correspond to either probabilities or to conditional probabilities.
◮ It is harder to estimate them from data.
◮ One idea for parametrization could be to associate a factor with each edge of the graph.
◮ This is not sufficient to parametrize a full distribution.
◮ A more general representation can be obtained by allowing factors whose scopes are arbitrary subsets of variables.
◮ Let X, Y, and Z be three disjoint sets of variables, and let φ1(X, Y) and φ2(Y, Z) be two factors.
◮ We define the factor product φ1 × φ2 to be a factor ψ : Val(X, Y, Z) → R with ψ(X, Y, Z) = φ1(X, Y) · φ2(Y, Z).
◮ The key aspect is the fact that the two factors φ1 and φ2 are multiplied in a way that matches up their common part Y, as in the sketch below.
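◮ A minimal sketch of the factor product for table factors over binary variables (not from the slides): φ1 has scope {X, Y}, φ2 has scope {Y, Z}, and the result has scope {X, Y, Z}.

    from itertools import product

    def factor_product(phi1, phi2):
        # psi(x, y, z) = phi1(x, y) * phi2(y, z); the shared variable Y is matched up.
        return {(x, y, z): phi1[x, y] * phi2[y, z]
                for x, y, z in product((0, 1), repeat=3)}

◮ For example, applying factor_product to φ1(A, B) and φ2(B, C) of the misconception network yields a factor over {A, B, C}.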
◮ Note that the factors are not marginals.
◮ In the misconception model, the marginal over A, B is quite different from the factor φ1(A, B).
◮ A factor is only one contribution to the overall joint distribution.
◮ The distribution as a whole has to take into consideration the contributions of all of the factors involved, as the sketch below illustrates.
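◮ Continuing the earlier sketch (reusing P and product), the marginal over A, B can be computed and compared with the factor φ1(A, B): the factor gives its largest weight to (a0, b0), whereas the most probable assignment of (A, B) under the joint distribution is (a0, b1).

    # Marginalize the joint over C and D (reuses P and product from the earlier sketch).
    marg_AB = {(a, b): sum(P(a, b, c, d) for c, d in product((0, 1), repeat=2))
               for a, b in product((0, 1), repeat=2)}
    print(marg_AB)  # approx. {(0, 0): 0.12, (0, 1): 0.69, (1, 0): 0.14, (1, 1): 0.04}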
◮ We can use the more general notion of factor product to define an undirected parametrization of a distribution.
◮ A distribution PΦ is a Gibbs distribution parametrized by a set of factors Φ = {φ1(D1), . . . , φK(DK)} if it is defined as PΦ(X1, . . . , Xn) = (1/Z) φ1(D1) × · · · × φK(DK), where Z is the partition function; a general sketch follows.
◮ The Di are the scopes of the factors.
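◮ The sketch below (not from the slides) generalizes the earlier construction: given any set of factors together with their scopes, the Gibbs distribution is the normalized product of the factors.

    from itertools import product

    def gibbs_distribution(n, factors):
        # n: number of binary variables X1, ..., Xn.
        # factors: list of (scope, table) pairs, where scope is a tuple of variable
        # indices and table maps assignments of the scope to nonnegative reals.
        def measure(x):
            value = 1.0
            for scope, table in factors:
                value *= table[tuple(x[i] for i in scope)]
            return value
        Z = sum(measure(x) for x in product((0, 1), repeat=n))  # partition function
        return lambda x: measure(x) / Z

◮ With the variable ordering (A, B, C, D), passing factors = [((0, 1), phi1), ((1, 2), phi2), ((2, 3), phi3), ((3, 0), phi4)] reproduces the joint distribution of the misconception example.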
◮ If our parametrization contains a factor whose scope includes both X and Y, we are introducing a direct interaction between them, so the graph should contain a corresponding edge.
◮ We say that a distribution PΦ with Φ = {φ1(D1), . . . , φK(DK)} factorizes over a Markov network H if each Di is a complete subgraph (clique) of H.
◮ The factors that parametrize a Markov network are often called clique potentials.
◮ If we observe some values, U = u, in the factor value table, we can simply eliminate the entries that are inconsistent with u.
◮ Let H be a Markov network over X and U = u a context. The reduced Markov network H[u] is a Markov network over the remaining nodes W = X − U, obtained by removing the observed nodes and their incident edges; a factor-reduction sketch follows.
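◮ The sketch below (not from the slides) reduces a table factor to a context U = u: it keeps only the entries consistent with u and drops the observed variables from the scope.

    def reduce_factor(scope, table, context):
        # scope: tuple of variable names; table: maps assignments of scope to reals;
        # context: dict from observed variable names to their observed values.
        keep = [i for i, name in enumerate(scope) if name not in context]
        reduced = {}
        for assignment, value in table.items():
            if all(assignment[i] == context[name]
                   for i, name in enumerate(scope) if name in context):
                reduced[tuple(assignment[i] for i in keep)] = value
        return tuple(scope[i] for i in keep), reduced

    # Example: reducing phi1(A, B) to the context B = 1 yields a factor over A alone:
    # reduce_factor(('A', 'B'), {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}, {'B': 1})
    # -> (('A',), {(0,): 5, (1,): 10})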
[Figure: Markov networks over the student variables. (a) A network over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, and Happy. (b) The same network without Grade. (c) The same network without Grade and SAT.]
◮ Conditioning on a context U in Markov networks eliminates the observed nodes and their incident edges from the graph; it can only remove dependencies.
◮ In a Bayesian network, conditioning on evidence can create new dependencies, since observing a common effect activates a v-structure.
◮ Let H be a Markov network and let X1—. . .—Xk be a path in H.
◮ Let Z ⊆ X be a set of observed variables.
◮ The path X1—. . .—Xk is active given Z if none of the Xi’s, i = 1, . . . , k, is in Z.
◮ A set of nodes Z separates X and Y in H, denoted sepH(X; Y | Z), if there is no active path between any node X ∈ X and any node Y ∈ Y given Z.
◮ We define the global independencies associated with H to be I(H) = {(X ⊥ Y | Z) : sepH(X; Y | Z)}; a separation check is sketched below.
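◮ Separation can be checked with plain graph reachability, as sketched below (not from the slides): remove the observed nodes Z from the graph and test whether any node in X can still reach a node in Y.

    from collections import deque

    def separates(adj, X, Y, Z):
        # adj: dict mapping each node of H to the set of its neighbours.
        start = set(X) - set(Z)
        visited, queue = set(start), deque(start)
        while queue:
            u = queue.popleft()
            if u in Y:
                return False  # an active path from X to Y exists given Z
            for v in adj[u]:
                if v not in visited and v not in Z:
                    visited.add(v)
                    queue.append(v)
        return True  # no active path remains, so Z separates X and Y in H

    # In the misconception network, {A, C} separates B and D:
    adj = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
    print(separates(adj, {'B'}, {'D'}, {'A', 'C'}))  # True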
◮ Like in Bayesian networks, once the joint distribution is defined, its parameters can be estimated from data.
◮ However, a key distinction between Markov networks and Bayesian networks is how the distribution is normalized.
◮ Markov networks use a global normalization constant called the partition function.
◮ Bayesian networks involve local normalization within each conditional probability distribution.
◮ The global factor couples all of the parameters across the network.
◮ The global parameter coupling has significant computational consequences for learning.
◮ Even the simple maximum likelihood parameter estimation problem cannot be solved in closed form.
◮ We generally have to resort to iterative methods such as gradient ascent on the likelihood.
◮ The good news is that the likelihood objective is concave, so the iterative methods are guaranteed to converge to a global optimum.
◮ The bad news is that each of the steps in the iterative algorithm requires running inference in the network, which can be expensive; the gradient expression sketched below shows why.
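◮ A standard result (not stated on the slides) makes the last point concrete: writing the Markov network in log-linear form with features fi and parameters θi, the gradient of the average log-likelihood is

    % Gradient of the average log-likelihood of a log-linear Markov network
    \frac{\partial}{\partial \theta_i} \frac{1}{M}\,\ell(\theta : \mathcal{D})
        = \mathbf{E}_{\mathcal{D}}[f_i(X)] - \mathbf{E}_{P_\theta}[f_i(X)]

◮ The first term is an empirical average over the data; the second is an expectation under the current model, which must be computed by running inference at every gradient step.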