Undirected Probabilistic Graphical Models CMSC 678 UMBC
Announcement 1: Progress Report on Project
Due Monday April 16th, 11:59 AM
Build on the proposal:
- Update to address comments
- Discuss the progress you've made
- Discuss what remains to be done
- Discuss any new blocks you've experienced (or anticipate experiencing)
Any questions?
Announcement 2: Assignment 4
Due Monday May 14th, 11:59 AM Topic: probabilistic & graphical modeling
Recap from last time…
Hidden Markov Model Representation
$$p(z_1, w_1, z_2, w_2, \ldots, z_N, w_N) = p(z_1 \mid z_0)\,p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\,p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\,p(z_i \mid z_{i-1})$$

$p(w_i \mid z_i)$: emission probabilities/parameters; $p(z_i \mid z_{i-1})$: transition probabilities/parameters
[Graph: hidden-state chain z1 → z2 → z3 → z4 → …, each state zi emitting its observation wi]
represent the probabilities and independence assumptions in a graph
Viterbi Algorithm
v = double[N+2][K*]
b = int[N+2][K*]
v[*][*] = 0
v[0][START] = 1
for (i = 1; i ≤ N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    pobs = pemission(obs[i] | state)
    for (old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      if (v[i-1][old] * pobs * pmove > v[i][state]) {
        v[i][state] = v[i-1][old] * pobs * pmove
        b[i][state] = old
      }
    }
  }
}
backpointers / book-keeping
computing v at time i-1 will correctly incorporate (maximize over) paths through time i-2: we correctly obey the Markov property
v(i, s) is the maximum probability of any path to state s from the beginning (that also emits the observations up through step i)
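A minimal runnable sketch of the same idea in Python with NumPy, assuming the START state is folded into an initial distribution pi and the explicit END state is omitted; the argument names (obs, pi, A, B) and the table layout are illustrative choices, not the slide's notation.

import numpy as np

def viterbi(obs, pi, A, B):
    # obs: length-N list of observation indices
    # pi:  length-K initial state distribution, p(z_1 = s)
    # A:   K x K transitions, A[s_prev, s] = p(s | s_prev)
    # B:   K x V emissions,   B[s, o]      = p(o | s)
    N, K = len(obs), len(pi)
    v = np.zeros((N, K))                    # v[i, s] = best probability of any path ending in s at step i
    back = np.zeros((N, K), dtype=int)      # backpointers / book-keeping
    v[0] = pi * B[:, obs[0]]
    for i in range(1, N):
        for s in range(K):
            scores = v[i - 1] * A[:, s] * B[s, obs[i]]
            back[i, s] = int(np.argmax(scores))
            v[i, s] = scores[back[i, s]]
    path = [int(np.argmax(v[-1]))]          # follow backpointers from the best final state
    for i in range(N - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return list(reversed(path)), float(v[-1].max())

In practice the products would be replaced by sums of log probabilities to avoid underflow, the same max-sum trick mentioned at the end of these slides.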
Marginal Probability (via the Forward Algorithm)
α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i
$$\alpha(i, s) = \sum_{s'} \alpha(i - 1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$$
how likely is it to get into state s this way?
what are the immediate ways to get into state s?
what's the total probability up until now?
Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][end]
There’s an analogous backwards algorithm
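A sketch of both passes under the same assumed (pi, A, B) layout as the Viterbi sketch above: α sums where Viterbi maximized, and the backward pass β is its mirror image.

import numpy as np

def forward(obs, pi, A, B):
    # alpha[i, s] = total probability of all paths that end in state s at step i
    #               and emit obs[0..i]
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    alpha[0] = pi * B[:, obs[0]]
    for i in range(1, N):
        alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]   # sum over the previous state
    return alpha, float(alpha[-1].sum())               # second value = likelihood of the sequence

def backward(obs, A, B):
    # beta[i, s] = total probability of emitting obs[i+1..N-1] given state s at step i
    N, K = len(obs), A.shape[0]
    beta = np.ones((N, K))
    for i in range(N - 2, -1, -1):
        beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
    return beta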
With Both Forward and Backward Values
α(i, s) · p(s' | s) · p(obs at i+1 | s') · β(i+1, s') = total probability of paths through the s → s' arc (at time i)
α(i, s) · β(i, s) = total probability of paths through state s at step i
$$p(z_i = s \mid w_1, \ldots, w_N) = \frac{\alpha(i, s)\,\beta(i, s)}{\alpha(N + 1, \text{END})}$$
$$p(z_i = s, z_{i+1} = s' \mid w_1, \ldots, w_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(w_{i+1} \mid s')\, \beta(i + 1, s')}{\alpha(N + 1, \text{END})}$$
EM For HMMs (Baum-Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i ≥ 0; --i) {
  for (next = 0; next < K*; ++next) {
    cobs(obs[i+1] | next) += α[i+1][next] * β[i+1][next] / L
    for (state = 0; state < K*; ++state) {
      u = pobs(obs[i+1] | next) * ptrans(next | state)
      ctrans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
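One way to finish the EM iteration, written against the forward/backward sketches above rather than the slide's index-by-index loop: accumulate the posterior counts (E-step) and renormalize them into new parameters (M-step). The vectorized form and the variable names are illustrative assumptions.

import numpy as np

def baum_welch_step(obs, pi, A, B):
    N, K = len(obs), len(pi)
    alpha, L = forward(obs, pi, A, B)
    beta = backward(obs, A, B)
    gamma = alpha * beta / L                       # p(z_i = s | obs): expected state occupancy
    xi = np.zeros((K, K))                          # expected transition counts
    for i in range(N - 1):
        xi += np.outer(alpha[i], B[:, obs[i + 1]] * beta[i + 1]) * A / L
    new_pi = gamma[0]
    new_A = xi / xi.sum(axis=1, keepdims=True)     # M-step: normalize counts into probabilities
    new_B = np.zeros_like(B)
    for i, o in enumerate(obs):
        new_B[:, o] += gamma[i]                    # expected emission counts
    new_B /= new_B.sum(axis=1, keepdims=True)
    return new_pi, new_A, new_B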
Bayesian Networks: Directed Acyclic Graphs

$$p(x_1, x_2, x_3, \ldots, x_N) = \prod_j p\big(x_j \mid \mathrm{pa}(x_j)\big)$$

$\mathrm{pa}(x_j)$: the parents of $x_j$; the factors can be multiplied in any topological-sort order
exact inference in general DAGs is NP-hard; inference in trees can be exact
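A tiny illustration of the factorization: multiply each node's conditional probability given its parents, visiting nodes in a topological order. The three-node DAG and its CPT values below are hypothetical.

parents = {"x1": [], "x2": [], "x3": ["x1", "x2"]}     # hypothetical DAG: x1 -> x3 <- x2
cpt = {
    "x1": {(): {0: 0.6, 1: 0.4}},
    "x2": {(): {0: 0.7, 1: 0.3}},
    "x3": {(a, b): {0: 0.9 - 0.2 * (a + b), 1: 0.1 + 0.2 * (a + b)}
           for a in (0, 1) for b in (0, 1)},
}

def joint(assignment):
    # p(x) = product over nodes j of p(x_j | parents(x_j))
    p = 1.0
    for node in ("x1", "x2", "x3"):                    # a topological order
        key = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][key][assignment[node]]
    return p

print(joint({"x1": 1, "x2": 0, "x3": 1}))              # 0.4 * 0.7 * 0.3 = 0.084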
D-Separation: Testing for Conditional Independence

Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z

d-separation: X & Y are d-separated if, for every path P between them, one of the following is true:
1. P has a chain X → Z → Y with an observed middle node (observing Z blocks the path from X to Y)
2. P has a fork X ← Z → Y with an observed parent node (observing Z blocks the path from X to Y)
3. P includes a "v-structure" or "collider" X → Z ← Y in which neither Z nor any of its descendants is observed (not observing Z blocks the path from X to Y)

For the v-structure, marginalizing out the unobserved collider leaves X and Y independent:
$$p(x, y, z) = p(x)\,p(y)\,p(z \mid x, y), \qquad p(x, y) = \sum_z p(x)\,p(y)\,p(z \mid x, y) = p(x)\,p(y)$$
Markov Blanket
The Markov blanket of a node x is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable $x_j$.

$$p(x_j \mid x_{k \ne j}) = \frac{p(x_1, \ldots, x_N)}{\int p(x_1, \ldots, x_N)\, dx_j} = \frac{\prod_l p(x_l \mid \mathrm{pa}(x_l))}{\int \prod_l p(x_l \mid \mathrm{pa}(x_l))\, dx_j} \qquad \text{(factorization of the graph)}$$

factor out the terms that do not depend on $x_j$:

$$= \frac{\prod_{l\,:\,l = j \text{ or } x_j \in \mathrm{pa}(x_l)} p(x_l \mid \mathrm{pa}(x_l))}{\int \prod_{l\,:\,l = j \text{ or } x_j \in \mathrm{pa}(x_l)} p(x_l \mid \mathrm{pa}(x_l))\, dx_j}$$
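A small sketch of reading the Markov blanket off a DAG stored as a parent map (the five-node graph here is hypothetical): take the node's parents, its children, and its children's other parents.

parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["c", "b"]}

def markov_blanket(node):
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])     # the children's other parents (co-parents)
    blanket.discard(node)
    return blanket

print(markov_blanket("c"))                 # {'a', 'b', 'd', 'e'}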
Markov Random Fields: Undirected Graphs

$$p(x_1, x_2, x_3, \ldots, x_N) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)$$

- $Z$: global normalization
- the product runs over the maximal cliques $C$ of the graph
- $\psi_C$: potential function (not necessarily a probability!)
- $\mathbf{x}_C$: the variables that are part of the clique $C$

clique: subset of nodes, where nodes are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique

Q: What restrictions should we place on the potentials $\psi_C$?
A: $\psi_C \ge 0$ (or $\psi_C > 0$)
Terminology: Potential Functions
$$\psi_C(\mathbf{x}_C) = \exp\!\big(-E(\mathbf{x}_C)\big)$$

$E$: energy function (for clique $C$); a distribution of this exponential form is a Boltzmann distribution
(get the total energy of a configuration by summing the individual energy functions)

$$p(x_1, x_2, x_3, \ldots, x_N) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)$$
Ambiguity in Undirected Model Notation
[Graph: three mutually connected variables X, Y, Z]

The same fully connected undirected graph is consistent with either factorization:

$$p(x, y, z) \propto \psi(x, y, z) \qquad \text{or} \qquad p(x, y, z) \propto \psi_1(x, y)\,\psi_2(y, z)\,\psi_3(x, z)$$
Example: Ising Model

x: original pixel/state
y: observed (noisy) pixel/state
[Image denoising (Bishop, 2006; Fig 8.30): the original image, a version with 10% noise, and two recovered solutions]

Q: What are the cliques?

$$E(\mathbf{x}, \mathbf{y}) = h \sum_i x_i \;-\; \beta \sum_{\{i,j\}} x_i x_j \;-\; \eta \sum_i x_i y_i$$

- $\eta \sum_i x_i y_i$: $x_i$ and $y_i$ should be correlated
- $\beta \sum_{\{i,j\}} x_i x_j$: neighboring pixels should be similar
- $h \sum_i x_i$: allow for a bias

Q: Why subtract β and η?
A: Better states get lower energy, and therefore higher potential: $\psi_C(\mathbf{x}_C) = \exp(-E(\mathbf{x}_C))$
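One simple way to search for a low-energy x given the noisy y is iterated conditional modes (ICM): sweep over pixels and set each one to whichever value in {-1, +1} gives lower local energy. This is a rough sketch of that idea, not the slides' exact procedure, and the parameter values are illustrative (Bishop also shows a graph-cut solution).

import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    # y: 2-D array of noisy pixels in {-1, +1}; returns a denoised x
    x = y.copy()
    H, W = y.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # neighbors of pixel (i, j) that lie inside the image
                nbr = sum(x[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                          if 0 <= a < H and 0 <= b < W)
                # energy contribution of setting x[i, j] = v, holding everything else fixed
                def local_energy(v):
                    return h * v - beta * v * nbr - eta * v * y[i, j]
                x[i, j] = min((-1, +1), key=local_energy)
    return x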
Markov Random Fields with Factor Graph Notation
x: original pixel/state y: observed (noisy) pixel/state
factor nodes are added according to maximal cliques
[Figure: factor-graph version of the model, with variable nodes plus unary and binary factor nodes]
factor graphs are bipartite
Different Factor Graph Notation for the Same Graph
[Figure: three different factor graphs, each a valid notation for the same graph over X, Y, Z]
Example: Linear Chain

Directed, generative (e.g., hidden Markov model [HMM]):
[Graph: z1 → z2 → z3 → z4, each state zi emitting its observation wi]

Directed, conditional (e.g., maximum entropy Markov model [MEMM]):
[Graph: z1 → z2 → z3 → z4, with each observation wi conditioning its state zi]

Undirected (e.g., conditional random field [CRF]):
[Graph: z1 - z2 - z3 - z4, each state zi linked to its observation wi]

The same undirected model drawn as a factor graph:
[Graph: factor nodes between consecutive states zi, zi+1 and between each zi and its observation]
Directed vs. Undirected Models: Moralization

[Directed graph: x1, x2, x3 are all parents of x4]

$$p(x_1, \ldots, x_4) = p(x_1)\,p(x_2)\,p(x_3)\,p(x_4 \mid x_1, x_2, x_3)$$

[Undirected (moralized) graph: x1, x2, x3, x4 pairwise connected]

parents of nodes in a directed graph must be connected in the corresponding undirected graph
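A sketch of the construction in plain Python dictionaries (no graph library assumed): keep every original edge without its direction and "marry" the parents of each node. The parent map matches the four-node example above.

from itertools import combinations

parents = {"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}

undirected = set()
for node, ps in parents.items():
    for p in ps:
        undirected.add(frozenset((p, node)))      # original edge, direction dropped
    for p, q in combinations(ps, 2):
        undirected.add(frozenset((p, q)))         # marry the parents
print(sorted(tuple(sorted(e)) for e in undirected))
# x1-x2, x1-x3, x1-x4, x2-x3, x2-x4, x3-x4: the moralized graph is fully connected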
Two Problems for Undirected Models

1. Finding the normalizer:
$$Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$$

2. Computing the marginals:
$$Z_n(v) = \sum_{\mathbf{x}\,:\,x_n = v} \prod_C \psi_C(\mathbf{x}_C)$$
(sum over all variable combinations, with the $x_n$ coordinate fixed)

Example: 3 variables, fix the 2nd dimension:
$$Z_2(v) = \sum_{x_1} \sum_{x_3} \prod_C \psi_C\big(\mathbf{x} = (x_1, v, x_3)\big)$$

Q: Why are these difficult?
A: Many different combinations: the sums range over every joint setting of the variables (see the sketch below)
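A brute-force sketch of both quantities for a tiny binary chain with two made-up pairwise potentials; the enumeration over all settings is exactly the exponential cost that makes Z and the marginals hard in general.

from itertools import product

def psi(a, b):
    return 2.0 if a == b else 0.5                  # hypothetical potential, nonnegative

def unnormalized(x):                               # cliques {x1, x2} and {x2, x3}
    return psi(x[0], x[1]) * psi(x[1], x[2])

Z = sum(unnormalized(x) for x in product((0, 1), repeat=3))

# Z_2(v): the same sum with the second coordinate fixed to v
Z2 = {v: sum(unnormalized((x1, v, x3)) for x1 in (0, 1) for x3 in (0, 1)) for v in (0, 1)}

print(Z, {v: Z2[v] / Z for v in Z2})               # p(x_2 = v) = Z_2(v) / Z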
Sum-Product Algorithm
Main idea: message passing
An exact inference algorithm for tree-like graphs
Belief propagation (forward-backward for HMMs) is a special case
Sum-Product

definition of marginal:
$$p(x_n = v) = \sum_{\mathbf{x}\,:\,x_n = v} p(x_1, x_2, \ldots, x_n, \ldots, x_N)$$

main idea: use the bipartite nature of the factor graph to efficiently compute the marginals, by passing messages $\mu_{m \to n}(x_n)$ from factors $m$ to variables $n$ (and messages back from variables to factors)

alternative marginal computation:
$$p(x_n) = \prod_{m \in M(n)} \mu_{m \to n}(x_n)$$
where $M(n)$ is the set of factors in which variable $n$ participates
Sum-Product

From variables to factors:
$$\nu_{n \to m}(x_n) = \prod_{m' \in M(n) \setminus m} \mu_{m' \to n}(x_n)$$
$M(n)$: the set of factors in which variable $n$ participates; the message defaults to 1 if the product is empty

From factors to variables:
$$\mu_{m \to n}(x_n) = \sum_{\mathbf{x}_m \setminus x_n} f_m(\mathbf{x}_m) \prod_{n' \in N(m) \setminus n} \nu_{n' \to m}(x_{n'})$$
$N(m)$: the set of variables that the $m$th factor depends on; the sum is over configurations of the $m$th factor's variables, with variable $n$ fixed; the product again defaults to 1 if empty
Example
[Figure: a small factor graph over x1, x2, x3, x4 with factors f, g, h]
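A concrete sum-product computation on a small chain factor graph x1 -- f -- x2 -- g -- x3 (a hypothetical example chosen for brevity, not necessarily the graph drawn on the slide), with a brute-force check that the message product really is the marginal.

import numpy as np
from itertools import product

f = np.array([[2.0, 1.0], [1.0, 3.0]])        # f(x1, x2), an arbitrary nonnegative table
g = np.array([[1.0, 2.0], [4.0, 1.0]])        # g(x2, x3)

nu_x1_to_f = np.ones(2)                       # leaf variables send the empty product = 1
nu_x3_to_g = np.ones(2)
mu_f_to_x2 = f.T @ nu_x1_to_f                 # sum_{x1} f(x1, x2) * nu(x1)
mu_g_to_x2 = g @ nu_x3_to_g                   # sum_{x3} g(x2, x3) * nu(x3)

p_x2 = mu_f_to_x2 * mu_g_to_x2                # product of incoming factor messages
p_x2 /= p_x2.sum()                            # normalize

brute = np.zeros(2)                           # brute-force marginal for comparison
for x1, x2, x3 in product((0, 1), repeat=3):
    brute[x2] += f[x1, x2] * g[x2, x3]
brute /= brute.sum()
print(p_x2, brute)                            # the two should match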
Max-Product (Max-Sum)
Problem: how to find the most likely (best) setting of the latent variables
Replace the sum (+) with a max in the factor-to-variable computations:

$$\mu_{m \to n}(x_n) = \max_{\mathbf{x}_m \setminus x_n} f_m(\mathbf{x}_m) \prod_{n' \in N(m) \setminus n} \nu_{n' \to m}(x_{n'})$$

(why "max-sum"? computationally, implement with logs, so that products of probabilities become sums of log-probabilities)