

  1. Undirected Probabilistic Graphical Models CMSC 678 UMBC

  2. Announcement 1: Progress Report on Project
Due Monday, April 16th, 11:59 AM. Build on the proposal: update to address comments; discuss the progress you’ve made; discuss what remains to be done; discuss any new blockers you’ve experienced (or anticipate experiencing). Any questions?

  3. Announcement 2: Assignment 4
Due Monday, May 14th, 11:59 AM. Topic: probabilistic & graphical modeling.

  4. Recap from last time…

  5. Hidden Markov Model Representation
p(z_1, x_1, z_2, x_2, …, z_N, x_N) = p(z_1 | z_0) p(x_1 | z_1) ⋯ p(z_N | z_{N−1}) p(x_N | z_N) = ∏_i p(x_i | z_i) p(z_i | z_{i−1})
p(z_i | z_{i−1}) are the transition probabilities/parameters; p(x_i | z_i) are the emission probabilities/parameters.
Graph: hidden states z_1 → z_2 → z_3 → z_4 → …, each emitting an observation w_1, w_2, w_3, w_4. Represent the probabilities and independence assumptions in a graph.
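As a quick illustration of this factorization, here is a minimal sketch in Python. The transition table `trans`, emission table `emit`, and the state/observation names are made-up assumptions, not values from the lecture.

```python
# Toy HMM: trans[prev][cur] plays the role of p(z_i | z_{i-1}) and
# emit[state][obs] the role of p(x_i | z_i); all numbers are illustrative.
trans = {"START": {"A": 0.6, "B": 0.4},
         "A":     {"A": 0.7, "B": 0.3},
         "B":     {"A": 0.2, "B": 0.8}}
emit = {"A": {"x": 0.9, "y": 0.1},
        "B": {"x": 0.3, "y": 0.7}}

def hmm_joint(states, observations):
    """p(z_1, x_1, ..., z_N, x_N) = prod_i p(x_i | z_i) * p(z_i | z_{i-1})."""
    prob, prev = 1.0, "START"          # z_0 is the designated start state
    for z, x in zip(states, observations):
        prob *= trans[prev][z] * emit[z][x]
        prev = z
    return prob

print(hmm_joint(["A", "B"], ["x", "y"]))   # 0.6 * 0.9 * 0.3 * 0.7
```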

  6. Viterbi Algorithm
v = double[N+2][K*]; b = int[N+2][K*] (backpointers / book-keeping)
v[i][s] is the maximum probability of any path to state s from the beginning (and emitting the observations).
v[*][*] = 0
v[0][START] = 1
for (i = 1; i <= N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    p_obs = p_emission(obs_i | state)
    for (old = 0; old < K*; ++old) {
      p_move = p_transition(state | old)
      if (v[i-1][old] * p_obs * p_move > v[i][state]) {
        v[i][state] = v[i-1][old] * p_obs * p_move
        b[i][state] = old
      }
    }
  }
}
Computing v at time i−1 correctly incorporates (maximizes over) paths through time i−2: we correctly obey the Markov property.
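A runnable version of the same recursion, sketched in Python/NumPy under the assumption that transitions and emissions are given as matrices with designated START/END state indices; log-probabilities replace the slide's raw products to avoid underflow, and the matrix names and indexing conventions are mine, not the lecture's.

```python
import numpy as np

def viterbi(obs, log_trans, log_emit, start, end):
    """Most likely state sequence for `obs` (a list of observation indices).

    log_trans[s_old, s_new] = log p(s_new | s_old)
    log_emit[s, o]          = log p(o | s)
    """
    K, N = log_trans.shape[0], len(obs)
    v = np.full((N + 2, K), -np.inf)        # best log-prob of any path ending in s at step i
    b = np.zeros((N + 2, K), dtype=int)     # backpointers
    v[0, start] = 0.0
    for i in range(1, N + 1):
        for s in range(K):
            scores = v[i - 1] + log_trans[:, s] + log_emit[s, obs[i - 1]]
            b[i, s] = int(np.argmax(scores))
            v[i, s] = scores[b[i, s]]
    scores = v[N] + log_trans[:, end]       # final transition into END (no emission)
    b[N + 1, end] = int(np.argmax(scores))
    path, s = [], end                       # follow backpointers from END back to step 1
    for i in range(N + 1, 1, -1):
        s = b[i, s]
        path.append(s)
    return path[::-1]
```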

  7. Marginal Probability (via the Forward Algorithm)
α(i, s) = Σ_{s'} α(i−1, s') * p(s | s') * p(obs at i | s)
α(i−1, s'): what's the total probability up until now? p(s | s'): what are the immediate ways to get into state s? Together: how likely is it to get into state s this way (and emit the observation)?
α(i, s) is the total probability of all paths: 1. that start from the beginning; 2. that end (currently) in s at step i; 3. that emit the observation obs at i.
Q: What do we return? (How do we return the likelihood of the sequence?) A: α[N+1][END]
There's an analogous backward algorithm.
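A minimal sketch of the forward recursion, reusing the same (assumed) matrix conventions as the Viterbi sketch above; the only change is that sums replace maxes.

```python
import numpy as np

def forward(obs, trans, emit, start, end):
    """Forward table and sequence likelihood.

    trans[s_old, s_new] = p(s_new | s_old); emit[s, o] = p(o | s).
    alpha[i, s] = total probability of all paths reaching state s at step i
    and emitting the first i observations; the likelihood is alpha[N+1, end].
    """
    K, N = trans.shape[0], len(obs)
    alpha = np.zeros((N + 2, K))
    alpha[0, start] = 1.0
    for i in range(1, N + 1):
        for s in range(K):
            # alpha(i, s) = sum_{s'} alpha(i-1, s') * p(s | s') * p(obs_i | s)
            alpha[i, s] = alpha[i - 1] @ trans[:, s] * emit[s, obs[i - 1]]
    alpha[N + 1, end] = alpha[N] @ trans[:, end]   # transition into END, no emission
    return alpha, alpha[N + 1, end]
```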

  8. With Both Forward and Backward Values
α(i, s) * β(i, s) = total probability of paths through state s at step i
p(z_i = s | x_1, ⋯, x_N) = α(i, s) * β(i, s) / α(N+1, END)
α(i, s) * p(s' | s) * p(obs at i+1 | s') * β(i+1, s') = total probability of paths through the s → s' arc (at time i)
p(z_i = s, z_{i+1} = s' | x_1, ⋯, x_N) = α(i, s) * p(s' | s) * p(obs_{i+1} | s') * β(i+1, s') / α(N+1, END)
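A small sketch of these two posteriors, assuming `alpha` and `beta` tables computed as above (the helper names are mine):

```python
def state_posteriors(alpha, beta, likelihood):
    """gamma[i, s] = p(z_i = s | x_1..x_N) = alpha(i, s) * beta(i, s) / p(x_1..x_N)."""
    return alpha * beta / likelihood

def arc_posterior(alpha, beta, trans, emit, obs, likelihood, i, s, s_next):
    """p(z_i = s, z_{i+1} = s' | x_1..x_N)
       = alpha(i, s) * p(s' | s) * p(obs_{i+1} | s') * beta(i+1, s') / p(x_1..x_N)."""
    return (alpha[i, s] * trans[s, s_next]
            * emit[s_next, obs[i]]            # obs is 0-indexed: obs[i] is the (i+1)-th symbol
            * beta[i + 1, s_next] / likelihood)
```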

  9. EM for HMMs (Baum–Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i >= 0; --i) {
  for (next = 0; next < K*; ++next) {
    c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for (state = 0; state < K*; ++state) {
      u = p_obs(obs_{i+1} | next) * p_trans(next | state)
      c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
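A sketch of the same E-step count accumulation in Python/NumPy, following the loop structure above; the array layouts and function name are illustrative assumptions.

```python
import numpy as np

def expected_counts(obs, trans, emit, alpha, beta, end):
    """One E-step pass: accumulate expected transition and emission counts."""
    K = trans.shape[0]
    N = len(obs)
    L = alpha[N + 1, end]                     # sequence likelihood
    c_trans = np.zeros_like(trans)            # c_trans[state, next]
    c_obs = np.zeros_like(emit)               # c_obs[next, symbol]
    for i in range(N):                        # obs[i] is the (i+1)-th emission
        for nxt in range(K):
            c_obs[nxt, obs[i]] += alpha[i + 1, nxt] * beta[i + 1, nxt] / L
            for state in range(K):
                u = emit[nxt, obs[i]] * trans[state, nxt]
                c_trans[state, nxt] += alpha[i, state] * u * beta[i + 1, nxt] / L
    return c_trans, c_obs
```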

  10. Bayesian Networks: Directed Acyclic Graphs
p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i)), where π(x_i) denotes the “parents of” x_i; the factors can be ordered by a topological sort.

  11. Bayesian Networks: Directed Acyclic Graphs
p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i))
Exact inference in general DAGs is NP-hard; inference in trees can be exact.
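To make the factorization concrete, a minimal sketch (Python) that evaluates the joint probability of a full assignment by walking a topological order of a tiny made-up network; the node names, CPT values, and the `joint` helper are illustrative assumptions, not part of the lecture.

```python
# p(x_1..x_N) = prod_i p(x_i | parents(x_i)) on a made-up network
# Rain -> WetGrass <- Sprinkler, with invented probabilities.
cpts = {
    "Rain":      {(): {True: 0.2, False: 0.8}},
    "Sprinkler": {(): {True: 0.3, False: 0.7}},
    "WetGrass":  {(True, True):   {True: 0.99, False: 0.01},
                  (True, False):  {True: 0.90, False: 0.10},
                  (False, True):  {True: 0.80, False: 0.20},
                  (False, False): {True: 0.05, False: 0.95}},
}
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}
topo_order = ["Rain", "Sprinkler", "WetGrass"]   # any topological sort works

def joint(assignment):
    """Multiply one conditional probability per node, parents before children."""
    p = 1.0
    for node in topo_order:
        parent_vals = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][parent_vals][assignment[node]]
    return p

print(joint({"Rain": True, "Sprinkler": False, "WetGrass": True}))  # 0.2 * 0.7 * 0.9
```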

  12. D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
X & Y are d-separated if for all paths P, one of the following is true: (1) P has a chain with an observed middle node (X → Z → Y); (2) P has a fork with an observed parent node (X ← Z → Y); (3) P includes a “v-structure” or “collider” (X → Z ← Y) with all descendants unobserved.

  13. D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
X & Y are d-separated if for all paths P, one of the following is true: (1) P has a chain with an observed middle node (X → Z → Y): observing Z blocks the path from X to Y; (2) P has a fork with an observed parent node (X ← Z → Y): observing Z blocks the path from X to Y; (3) P includes a “v-structure” or “collider” (X → Z ← Y) with all descendants unobserved: not observing Z blocks the path from X to Y.

  14. D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
X & Y are d-separated if for all paths P, one of the following is true: (1) P has a chain with an observed middle node (X → Z → Y): observing Z blocks the path from X to Y; (2) P has a fork with an observed parent node (X ← Z → Y): observing Z blocks the path from X to Y; (3) P includes a “v-structure” or “collider” (X → Z ← Y) with all descendants unobserved: not observing Z blocks the path from X to Y.
p(x, y, z) = p(x) p(y) p(z | x, y)
p(x, y) = Σ_z p(x) p(y) p(z | x, y) = p(x) p(y)
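The v-structure claim can be checked numerically. A small sketch (Python, with made-up binary tables): summing out the collider recovers p(x) p(y), so X and Y are marginally independent, but conditioning on the collider couples them ("explaining away").

```python
import itertools

# Made-up binary model p(x, y, z) = p(x) p(y) p(z | x, y) with Z a collider.
p_x = {0: 0.6, 1: 0.4}
p_y = {0: 0.3, 1: 0.7}
p_z_given_xy = {(x, y): {0: 0.9 if x == y else 0.2,
                         1: 0.1 if x == y else 0.8}
                for x in (0, 1) for y in (0, 1)}

def joint(x, y, z):
    return p_x[x] * p_y[y] * p_z_given_xy[(x, y)][z]

# Marginally: p(x, y) = sum_z p(x) p(y) p(z | x, y) = p(x) p(y).
for x, y in itertools.product((0, 1), repeat=2):
    pxy = sum(joint(x, y, z) for z in (0, 1))
    assert abs(pxy - p_x[x] * p_y[y]) < 1e-12

# Conditioned on Z = 1, X and Y are no longer independent.
pz1 = sum(joint(x, y, 1) for x in (0, 1) for y in (0, 1))
p_xy_given_z1 = {(x, y): joint(x, y, 1) / pz1 for x in (0, 1) for y in (0, 1)}
p_x_given_z1 = {x: sum(p_xy_given_z1[(x, y)] for y in (0, 1)) for x in (0, 1)}
p_y_given_z1 = {y: sum(p_xy_given_z1[(x, y)] for x in (0, 1)) for y in (0, 1)}
print(p_xy_given_z1[(1, 1)], p_x_given_z1[1] * p_y_given_z1[1])  # generally unequal
```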

  15. Markov Blanket
The Markov blanket is the set of nodes needed to form the complete conditional for a variable x_i:
p(x_i | x_{j≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i
= ∏_j p(x_j | π(x_j)) / ∫ ∏_j p(x_j | π(x_j)) dx_i     (factorization of the graph)
= ∏_{j : j = i or i ∈ π(x_j)} p(x_j | π(x_j)) / ∫ ∏_{j : j = i or i ∈ π(x_j)} p(x_j | π(x_j)) dx_i     (factor out terms not dependent on x_i)
The Markov blanket of a node is its parents, children, and children's parents.
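A structural way to read this off: given a DAG as a parents dictionary, the Markov blanket is the union of parents, children, and the children's other parents. A minimal sketch (Python, hypothetical `markov_blanket` helper and toy graph):

```python
def markov_blanket(node, parents):
    """parents: {node: set of its parent nodes}. Returns the Markov blanket of `node`."""
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | coparents

dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": set()}
print(markov_blanket("A", dag))   # {'B', 'C'}: child C and C's other parent B
```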

  16. Markov Random Fields: Undirected Graphs
p(x_1, x_2, x_3, …, x_N)

  17. Markov Random Fields: Undirected Graphs
clique: a subset of nodes in which the nodes are pairwise connected
maximal clique: a clique to which no node can be added while remaining a clique
p(x_1, x_2, x_3, …, x_N)

  18. Markov Random Fields: Undirected Graphs
clique: a subset of nodes in which the nodes are pairwise connected
maximal clique: a clique to which no node can be added while remaining a clique
p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
Z: global normalization; C ranges over the maximal cliques; x_C: the variables that are part of clique C; ψ_C: potential function (not necessarily a probability!)

  19. Markov Random Fields: Undirected Graphs
clique: a subset of nodes in which the nodes are pairwise connected
maximal clique: a clique to which no node can be added while remaining a clique
p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
Z: global normalization; C ranges over the maximal cliques; x_C: the variables that are part of clique C; ψ_C: potential function (not necessarily a probability!)

  20. Markov Random Fields: Undirected Graphs
clique: a subset of nodes in which the nodes are pairwise connected
maximal clique: a clique to which no node can be added while remaining a clique
p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
Z: global normalization; C ranges over the maximal cliques; x_C: the variables that are part of clique C; ψ_C: potential function (not necessarily a probability!)
Q: What restrictions should we place on the potentials ψ_C?

  21. Markov Random Fields: Undirected Graphs
clique: a subset of nodes in which the nodes are pairwise connected
maximal clique: a clique to which no node can be added while remaining a clique
p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
Z: global normalization; C ranges over the maximal cliques; x_C: the variables that are part of clique C; ψ_C: potential function (not necessarily a probability!)
Q: What restrictions should we place on the potentials ψ_C?
A: ψ_C ≥ 0 (or ψ_C > 0)
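For a small graph the normalizer Z can be computed by brute force, which makes the role of the potentials concrete. A minimal sketch (Python), assuming a made-up three-variable chain with two pairwise clique potentials; none of these values come from the lecture.

```python
import itertools

# p(x) = (1/Z) * prod_C psi_C(x_C) on a tiny binary chain x1 - x2 - x3.
potentials = {
    ("x1", "x2"): lambda a, b: 2.0 if a == b else 0.5,   # psi over clique {x1, x2}
    ("x2", "x3"): lambda a, b: 3.0 if a == b else 1.0,   # psi over clique {x2, x3}
}
variables = ["x1", "x2", "x3"]

def unnormalized(assignment):
    score = 1.0
    for clique, psi in potentials.items():
        score *= psi(*(assignment[v] for v in clique))
    return score

# Z sums the product of potentials over every joint configuration.
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in itertools.product((0, 1), repeat=len(variables)))

def prob(assignment):
    return unnormalized(assignment) / Z

print(prob({"x1": 1, "x2": 1, "x3": 1}))   # (2.0 * 3.0) / Z
```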

  22. Terminology: Potential Functions
p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
ψ_C(x_C) = exp(−E(x_C))     (the Boltzmann distribution)
E is the energy function (for clique C); get the total energy of a configuration by summing the individual energy functions.
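Because each potential is an exponentiated negative energy, the product of clique potentials equals the exponential of the negated total energy, which is why configuration energies simply add. A short check (Python, made-up clique energies):

```python
import math

# With Boltzmann potentials psi_C = exp(-E_C), prod_C psi_C = exp(-sum_C E_C).
clique_energies = [0.5, 1.2, -0.3]            # E_C(x_C) for each clique at some fixed x
product_of_potentials = math.prod(math.exp(-e) for e in clique_energies)
exp_of_total = math.exp(-sum(clique_energies))
print(abs(product_of_potentials - exp_of_total) < 1e-12)   # True
```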

  23. Ambiguity in Undirected Model Notation
For the same undirected graph over X, Y, Z, the factorization can be written either with a single potential, p(x, y, z) ∝ ψ(x, y, z), or with pairwise potentials, p(x, y, z) ∝ ψ_1(x, y) ψ_2(y, z) ψ_3(x, z).

  24. Example: Ising Model
Image denoising (Bishop, 2006; Fig 8.30): y = observed (noisy) pixel/state, with 10% noise; x = original pixel/state. [Figure: the original image, the noisy version, and two denoised solutions.]
Q: What are the cliques?

  25. Example: Ising Model
Image denoising (Bishop, 2006; Fig 8.30): y = observed (noisy) pixel/state, with 10% noise; x = original pixel/state. [Figure: the original image, the noisy version, and two denoised solutions.]
Q: What are the cliques?

  26. Example: Ising Model
Image denoising (Bishop, 2006; Fig 8.30): y = observed (noisy) pixel/state, with 10% noise; x = original pixel/state.
E(x, y) = h Σ_i x_i − β Σ_{i,j} x_i x_j − η Σ_i x_i y_i
The first term allows for a bias; the second (over neighboring pairs i, j) says neighboring pixels should be similar; the third says x_i and y_i should be correlated.
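A sketch of this energy for ±1 pixel grids (Python/NumPy); the coefficients h, β, η, the 4-connected neighborhood, and the toy images are illustrative choices, not values from the lecture or from Bishop.

```python
import numpy as np

def ising_energy(x, y, h=0.0, beta=1.0, eta=2.0):
    """E(x, y) = h*sum_i x_i - beta*sum_{i~j} x_i x_j - eta*sum_i x_i y_i."""
    bias = h * x.sum()
    pairwise = beta * ((x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum())
    data = eta * (x * y).sum()
    return bias - pairwise - data

rng = np.random.default_rng(0)
x = np.where(rng.random((8, 8)) < 0.5, 1, -1)       # hypothetical "clean" image
y = x * np.where(rng.random((8, 8)) < 0.1, -1, 1)   # flip roughly 10% of pixels as noise
print(ising_energy(x, y))   # lower energy = more probable configuration
```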
