Directed Probabilistic Graphical Models
CMSC 678 UMBC
Announcement 1: Assignment 3
Due Wednesday April 11th, 11:59 AM Any questions?
Announcement 2: Progress Report on Project
Due Monday April 16th, 11:59 AM
Build on the proposal:
Update to address comments
Discuss the progress you've made
Discuss what remains to be done
Discuss any new blocks you've experienced (or anticipate experiencing)
Any questions?
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Expectation Maximization (EM): E-step
Two step, iterative algorithm
parameters
uncertain counts
count(z_i, w_i) · p(z_i)        p^(t)(z) → p^(t+1)(z)
estimated counts
EM Math
E-step: count under uncertainty
M-step: maximize log-likelihood
𝒟(θ) = log-likelihood of complete data (X, Y)
𝒬(θ) = posterior log-likelihood of incomplete data Y
ℳ(θ) = marginal log-likelihood of observed data X
ℳ(θ) = 𝔼_{Y∼θ^(t)}[𝒟(θ) | X] − 𝔼_{Y∼θ^(t)}[𝒬(θ) | X]
(expectations are taken under the posterior distribution at the current parameters θ^(t))
EM does not decrease the marginal log-likelihood
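One standard way to see this, stated in the notation above (a sketch of the usual argument, not taken from the slides): comparing successive iterates,

\mathcal{M}(\theta^{(t+1)}) - \mathcal{M}(\theta^{(t)})
  = \underbrace{\mathbb{E}_{Y\sim\theta^{(t)}}\!\left[\mathcal{D}(\theta^{(t+1)}) - \mathcal{D}(\theta^{(t)}) \mid X\right]}_{\ge 0 \text{ by the M-step}}
  + \underbrace{\mathrm{KL}\!\left(p_{\theta^{(t)}}(Y \mid X)\,\|\,p_{\theta^{(t+1)}}(Y \mid X)\right)}_{\ge 0}

so every iteration either increases the marginal log-likelihood or leaves it unchanged.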
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Assume an original optimization problem
Lagrange multipliers
Assume an original optimization problem We convert it to a new optimization problem:
Lagrange multipliers
Lagrange multipliers: an equivalent problem?
Lagrange multipliers: an equivalent problem?
Lagrange multipliers: an equivalent problem?
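As a concrete sketch of the construction these slides set up (the generic recipe; f, g, and λ are placeholder names, not from the slides): to maximize f(θ) subject to an equality constraint g(θ) = 0, form the Lagrangian and look for its stationary points,

F(\theta, \lambda) = f(\theta) - \lambda\, g(\theta), \qquad
\frac{\partial F}{\partial \theta} = \nabla f(\theta) - \lambda \nabla g(\theta) = 0, \qquad
\frac{\partial F}{\partial \lambda} = -g(\theta) = 0 .

Setting ∂F/∂λ = 0 just re-imposes the constraint, and ∂F/∂θ = 0 says the gradients of the objective and the constraint must be parallel at the solution; this is exactly the recipe applied to the die-rolling likelihood below.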
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
w_1 = 1   w_2 = 5   w_3 = 4   ⋯
Generative Story: for roll i = 1 to N: w_i ~ Cat(θ)
θ = a probability distribution over the 6 sides of the die:   Σ_{k=1}^{6} θ_k = 1,   0 ≤ θ_k ≤ 1 ∀k
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
w_1 = 1   w_2 = 5   w_3 = 4   ⋯
Generative Story: for roll i = 1 to N: w_i ~ Cat(θ)
ℒ(θ) = Σ_i log p_θ(w_i) = Σ_i log θ_{w_i}
Maximize Log-likelihood
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
Generative Story: for roll i = 1 to N: w_i ~ Cat(θ)
ℒ(θ) = Σ_i log θ_{w_i}
Maximize Log-likelihood Q: What’s an easy way to maximize this, as written exactly (even without calculus)?
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
Generative Story: for roll i = 1 to N: w_i ~ Cat(θ)
ℒ(θ) = Σ_i log θ_{w_i}
Maximize Log-likelihood
Q: What's an easy way to maximize this, as written exactly (even without calculus)?
A: Just keep increasing each θ_k (we know θ must be a distribution, but the objective as written doesn't say so)
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
ℒ(θ) = Σ_i log θ_{w_i}   s.t.   Σ_{k=1}^{6} θ_k = 1
Maximize Log-likelihood (with distribution constraints)
(we can include the inequality constraints 0 ≤ θ_k, but it complicates the problem and, right now, is not needed)
solve using Lagrange multipliers
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
Maximize Log-likelihood (with distribution constraints)
F(θ, λ) = Σ_i log θ_{w_i} − λ (Σ_{k=1}^{6} θ_k − 1)
(we can include the inequality constraints 0 ≤ θ_k, but it complicates the problem and, right now, is not needed)
∂F/∂θ_k = Σ_{i: w_i = k} 1/θ_{w_i} − λ        ∂F/∂λ = −Σ_{k=1}^{6} θ_k + 1
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
Maximize Log-likelihood (with distribution constraints)
F(θ, λ) = Σ_i log θ_{w_i} − λ (Σ_{k=1}^{6} θ_k − 1)
(we can include the inequality constraints 0 ≤ θ_k, but it complicates the problem and, right now, is not needed)
Setting the derivatives to zero:   θ_k = (Σ_{i: w_i = k} 1) / λ,   subject to   Σ_{k=1}^{6} θ_k = 1
Probabilistic Estimation of Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
N different (independent) rolls
Maximize Log-likelihood (with distribution constraints)
F(θ, λ) = Σ_i log θ_{w_i} − λ (Σ_{k=1}^{6} θ_k − 1)
(we can include the inequality constraints 0 ≤ θ_k, but it complicates the problem and, right now, is not needed)
Solving with the constraint Σ_{k=1}^{6} θ_k = 1 gives λ = N, so
θ_k = (Σ_{i: w_i = k} 1) / (Σ_{k'} Σ_{i: w_i = k'} 1) = N_k / N
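As a quick sanity check of that closed form, a minimal Python sketch (the variable names and simulated data are illustrative, not from the slides):

import random
from collections import Counter

random.seed(0)
true_theta = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]            # a non-uniform die
rolls = random.choices(range(1, 7), weights=true_theta, k=10_000)

counts = Counter(rolls)                                  # N_k for each face k
N = len(rolls)
theta_hat = {k: counts[k] / N for k in range(1, 7)}      # MLE: theta_k = N_k / N
print(theta_hat)

With enough rolls, theta_hat lands close to true_theta.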
Example: Conditionally Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1) p(w_1 | z_1) ⋯ p(z_N) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i)
add complexity to better explain what we see
w_1 = 1   w_2 = 5   ⋯      z_1 = H   z_2 = T
penny: p(heads) = λ, p(tails) = 1 − λ      dollar coin: p(heads) = γ, p(tails) = 1 − γ      dime: p(heads) = ψ, p(tails) = 1 − ψ
Example: Conditionally Rolling a Die
p(w_1, w_2, …, w_N) = p(w_1) p(w_2) ⋯ p(w_N) = ∏_i p(w_i)
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1) p(w_1 | z_1) ⋯ p(z_N) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i)
add complexity to better explain what we see
penny: p(heads) = λ, p(tails) = 1 − λ      dollar coin: p(heads) = γ, p(tails) = 1 − γ      dime: p(heads) = ψ, p(tails) = 1 − ψ
for item i = 1 to N: z_i ~ Bernoulli(λ)
Generative Story
λ = distribution over the penny      γ = distribution over the dollar coin      ψ = distribution over the dime
if z_i = H: w_i ~ Bernoulli(γ)      else: w_i ~ Bernoulli(ψ)
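A small Python sketch of this generative story (the parameter values are arbitrary choices for illustration):

import random

lam, gamma, psi = 0.5, 0.9, 0.3      # penny, dollar coin, and dime head-probabilities
N = 5
samples = []
for i in range(N):
    z = 'H' if random.random() < lam else 'T'        # flip the penny: which coin comes next?
    p_heads = gamma if z == 'H' else psi             # dollar coin if heads, dime if tails
    w = 'H' if random.random() < p_heads else 'T'    # flip the chosen coin (this is what we observe)
    samples.append((z, w))
print(samples)

Only the w values are observed; the z values are the latent structure EM will later have to reason about.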
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Classify with Bayes Rule
argmax_Y p(Y | X) = argmax_Y [ log p(X | Y) + log p(Y) ]
(likelihood: p(X | Y);   prior: p(Y))
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
Bag of Words Representation
seen 2, sweet 1, whimsical 1, recommend 1, happy 1, …
classifier
Adapted from Jurafsky & Martin (draft)
Naïve Bayes: A Generative Story
Generative Story
π = distribution over K labels
for label k = 1 to K:
global parameters
θ_k = generate parameters
Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
y
for label k = 1 to K: θ_k = generate parameters
Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
for each feature j: x_ij ~ F_j(θ_{y_i})
x_i1   x_i2   x_i3   x_i4   x_i5
y
for label k = 1 to K: θ_k = generate parameters
local variables
Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
x_i1   x_i2   x_i3   x_i4   x_i5
y
for label k = 1 to K:
each xij conditionally independent of one another (given the label)
θ_k = generate parameters; for each feature j: x_ij ~ F_j(θ_{y_i})
Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
for label k = 1 to K: θ_k = generate parameters
for each feature j: x_ij ~ F_j(θ_{y_i})
x_i1   x_i2   x_i3   x_i4   x_i5
y
Maximize Log-likelihood
ℒ(θ, π) = Σ_i Σ_j log F_j(x_ij; θ_{y_i}) + Σ_i log π_{y_i}
s.t.   Σ_k π_k = 1,   and each θ_k is valid for F
Multinomial Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
for label k = 1 to K: θ_k = distribution over J feature values
for each feature j: x_ij ~ Cat(θ_{y_i})
x_i1   x_i2   x_i3   x_i4   x_i5
y
Maximize Log-likelihood
ℒ(θ, π) = Σ_i Σ_j log θ_{y_i, x_ij} + Σ_i log π_{y_i}
s.t.   Σ_k π_k = 1   and   Σ_j θ_kj = 1 ∀k
Multinomial Naïve Bayes: A Generative Story
for item i = 1 to N: y_i ~ Cat(π)
Generative Story
π = distribution over K labels
for label k = 1 to K: θ_k = distribution over J feature values
for each feature j: x_ij ~ Cat(θ_{y_i})
x_i1   x_i2   x_i3   x_i4   x_i5
y
Maximize Log-likelihood via Lagrange Multipliers
ℒ(θ, π, μ, λ) = Σ_i Σ_j log θ_{y_i, x_ij} + Σ_i log π_{y_i} − μ (Σ_k π_k − 1) − Σ_k λ_k (Σ_j θ_kj − 1)
Multinomial Naïve Bayes: Learning
Calculate class priors
For each class k:
    items_k = all items with class = k
    p(k) = |items_k| / (# items)
Calculate feature generation terms
For each class k:
    obs_k = items labeled as k
    For each feature j:
        n_kj = # of occurrences of j in obs_k
        p(j | k) = n_kj / Σ_j' n_kj'
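A compact Python sketch of exactly these counts (function and variable names are mine; smoothing and log-space scoring, which one would normally add, are omitted):

from collections import Counter, defaultdict

def train_multinomial_nb(docs):
    """docs: list of (label, tokens). Returns p(k) and p(j | k) as plain dicts."""
    n_docs = len(docs)
    class_counts = Counter(label for label, _ in docs)
    token_counts = defaultdict(Counter)                  # token_counts[k][j] = n_kj
    for label, tokens in docs:
        token_counts[label].update(tokens)
    priors = {k: c / n_docs for k, c in class_counts.items()}
    likelihoods = {k: {j: n / sum(cnt.values()) for j, n in cnt.items()}
                   for k, cnt in token_counts.items()}
    return priors, likelihoods

priors, likelihoods = train_multinomial_nb([
    ("pos", ["sweet", "whimsical", "recommend"]),
    ("neg", ["boring", "slow"]),
])
print(priors["pos"], likelihoods["pos"]["sweet"])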
Brill and Banko (2001) With enough data, the classifier may not matter
Adapted from Jurafsky & Martin (draft)
Summary: Naïve Bayes is Not So Naïve, but not without issue
Pro
Very fast, low storage requirements
Robust to irrelevant features
Very good in domains with many equally important features
Optimal if the independence assumptions hold
Dependable baseline for text classification (but often not the best)
Con
Model the posterior in one go? (e.g., use conditional maxent)
Are the features really uncorrelated?
Are plain counts always appropriate?
Are there "better" ways of handling missing/noisy data?
(automated, more principled)
Adapted from Jurafsky & Martin (draft)
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Hidden Markov Models
p(British Left Waffles on Falkland Islands)
(i): Adjective Noun Verb Prep Noun Noun      (ii): Noun Verb Noun Prep Noun Noun
Class-based model Bigram model
Model all class sequences
Σ_{z_1,…,z_N} p(z_1, w_1, z_2, w_2, …, z_N, w_N) = Σ_{z_1,…,z_N} ∏_i p(w_i | z_i) p(z_i | z_{i−1})
Hidden Markov Model
Goal: maximize (log-)likelihood In practice: we don’t actually observe these z values; we just see the words w
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
Hidden Markov Model
Goal: maximize (log-)likelihood In practice: we don’t actually observe these z values; we just see the words w
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
if we knew the probability parameters then we could estimate z and evaluate likelihood… but we don’t! :( if we did observe z, estimating the probability parameters would be easy… but we don’t! :(
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
transition probabilities/parameters
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states Transition and emission distributions do not change
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states Transition and emission distributions do not change
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
Q: How many different probability values are there with K states and V vocab items?
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states Transition and emission distributions do not change
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
Q: How many different probability values are there with K states and V vocab items?
A: V·K emission values and K² transition values
Hidden Markov Model Representation
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
z1
w1
…
w2 w3 w4
z2 z3 z4
represent the probabilities and independence assumptions in a graph
Hidden Markov Model Representation
p(z_1, w_1, z_2, w_2, …, z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_N | z_{N−1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i−1})
emission probabilities/parameters transition probabilities/parameters
z1
w1
…
w2 w3 w4
z2 z3 z4
p(w1 | z1)   p(w2 | z2)   p(w3 | z3)   p(w4 | z4)
p(z1 | z0)   p(z2 | z1)   p(z3 | z2)   p(z4 | z3)      p(z1 | z0) is the initial starting distribution
Each zi can take the value of one of K latent states Transition and emission distributions do not change
Example: 2-state Hidden Markov Model as a Lattice
z1 = N
w1
…
w2 w3 w4
z2 = N z3 = N z4 = N z1 = V z2 = V z3 = V z4 = V
…
Transition probabilities p(z' | z):
           N      V      end
  start   .7     .2     .1
  N       .15    .8     .05
  V       .6     .35    .05
Emission probabilities p(w | z):
           w1     w2     w3     w4
  N       .7     .2     .05    .05
  V       .2     .6     .1     .1
Example: 2-state Hidden Markov Model as a Lattice
z1 = N
w1
…
w2 w3 w4
p(w1 | N)   p(w2 | N)   p(w3 | N)   p(w4 | N)
z2 = N z3 = N z4 = N z1 = V z2 = V z4 = V
…
p(w1 | V)   p(w2 | V)   p(w3 | V)   p(w4 | V)
z3 = V
(transition and emission tables as above)
Example: 2-state Hidden Markov Model as a Lattice
z1 = N
w1
…
w2 w3 w4
p(w1 | N)   p(w2 | N)   p(w3 | N)   p(w4 | N)
p(N | start)   z2 = N z3 = N z4 = N z1 = V z2 = V z3 = V z4 = V   p(V | V)  p(V | V)  p(V | V)  p(V | start)
…
p(w1 | V)   p(w2 | V)   p(w3 | V)   p(w4 | V)
p(N | N)   p(N | N)   p(N | N)
(transition and emission tables as above)
Example: 2-state Hidden Markov Model as a Lattice
z1 = N
w1
…
w2 w3 w4
p(w1 | N)   p(w2 | N)   p(w3 | N)   p(w4 | N)
p(N | start)   z2 = N z3 = N z4 = N z1 = V z2 = V z3 = V z4 = V   p(V | V)  p(V | V)  p(V | V)  p(V | start)
…
p(V | N)  p(V | N)  p(V | N)   p(N | V)  p(N | V)  p(N | V)
p(w1 | V)   p(w2 | V)   p(w3 | V)   p(w4 | V)
p(N | N)   p(N | N)   p(N | N)
(transition and emission tables as above)
A Latent Sequence is a Path through the Graph
z1 = N
w1 w2 w3 w4
p(w1 | N)   p(w4 | N)
p(N | start)   z2 = N z3 = N z4 = N z1 = V z2 = V z3 = V z4 = V   p(V | V)   p(V | N)   p(N | V)
p(w2 | V)   p(w3 | V)
Q: What's the probability of the path (N, w1), (V, w2), (V, w3), (N, w4)?
A: (.7 * .7) * (.8 * .6) * (.35 * .1) * (.6 * .05) ≈ 0.000247
(transition and emission tables as above)
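A small Python check of that arithmetic, with the transition and emission tables written out as dictionaries (the data structures are mine; the numbers come from the tables above):

trans = {('start', 'N'): .7, ('start', 'V'): .2,
         ('N', 'N'): .15, ('N', 'V'): .8,
         ('V', 'N'): .6,  ('V', 'V'): .35}
emit = {('N', 'w1'): .7, ('N', 'w2'): .2, ('N', 'w3'): .05, ('N', 'w4'): .05,
        ('V', 'w1'): .2, ('V', 'w2'): .6, ('V', 'w3'): .1,  ('V', 'w4'): .1}

def path_prob(states, words):
    # p(z_1, w_1, ..., z_N, w_N) = prod_i p(w_i | z_i) p(z_i | z_{i-1})
    p, prev = 1.0, 'start'
    for z, w in zip(states, words):
        p *= trans[(prev, z)] * emit[(z, w)]
        prev = z
    return p

print(path_prob(['N', 'V', 'V', 'N'], ['w1', 'w2', 'w3', 'w4']))   # ~0.000247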
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number 'one' to the soldier behind you. If you are the rearmost soldier in the line, say the number 'one' to the soldier in front of you. If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number ‘one’ to the soldier behind you. If you are the rearmost soldier in the line, say the number ‘one’ to the soldier in front of you. If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number ‘one’ to the soldier behind you. If you are the rearmost soldier in the line, say the number ‘one’ to the soldier in front of you. If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number ‘one’ to the soldier behind you. If you are the rearmost soldier in the line, say the number ‘one’ to the soldier in front of you. If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
What's the Maximum Weighted Path?
(figure: a small graph with edge weights 9, 6, 7, 3, 32, 1, and 4; passing messages forward accumulates the running best totals, e.g. +3 gives 10 and 7, then +10 gives 19, 16, 42, and 11)
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
What’s the Maximum Value?
consider "any path ending in state B" (A→B, B→B, or C→B): maximize across the previous hidden state values
v(i, B) = max_{s'} v(i−1, s') * p(B | s') * p(obs at i | B)
v(i, B) is the maximum probability of any path to state B from the beginning (emitting the observations up through step i)
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A
What’s the Maximum Value?
consider "any path ending in state B" (A→B, B→B, or C→B): maximize across the previous hidden state values
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi-2 = C zi-2 = B zi-2 = A
v(i, B) = max_{s'} v(i−1, s') * p(B | s') * p(obs at i | B)
v(i, B) is the maximum probability of any path to state B from the beginning (emitting the observations up through step i)
What’s the Maximum Value?
consider "any path ending in state B" (A→B, B→B, or C→B): maximize across the previous hidden state values
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi-2 = C zi-2 = B zi-2 = A
computing v at time i-1 will correctly incorporate (maximize over) paths through time i-2: we correctly obey the Markov property
v(i, B) = max_{s'} v(i−1, s') * p(B | s') * p(obs at i | B)
v(i, B) is the maximum probability of any path to state B from the beginning (emitting the observations up through step i)
Viterbi algorithm
Viterbi Algorithm
v = double[N+2][K*]
b = int[N+2][K*]
v[*][*] = 0
v[0][START] = 1
for(i = 1; i ≤ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      if(v[i-1][old] * pobs * pmove > v[i][state]) {
        v[i][state] = v[i-1][old] * pobs * pmove
        b[i][state] = old
      }
    }
  }
}
backpointers/ book-keeping
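Once v and b are filled, the best state sequence is read out by walking the backpointers from the END state; a small Python-style sketch of that backtrace (b, N, START, and END refer to the quantities in the pseudocode above):

def backtrace(b, N, END):
    # follow backpointers from END at step N+1 back to START at step 0
    path = [END]
    for i in range(N + 1, 0, -1):
        path.append(b[i][path[-1]])      # b[i][s] = best predecessor of state s at step i
    return list(reversed(path))          # [START, z_1, ..., z_N, END]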
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Forward Probability
α(i, B) is the total probability of all paths to that state B from the beginning
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi-2 = C zi-2 = B zi-2 = A
Forward Probability
marginalize across the previous hidden state values
α(i, B) = Σ_{s'} α(i−1, s') * p(B | s') * p(obs at i | B)
computing α at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property
α(i, B) is the total probability of all paths to that state B from the beginning
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi-2 = C zi-2 = B zi-2 = A
Forward Probability
α(i, s) is the total probability of all paths:
α(i, s) = Σ_{s'} α(i−1, s') * p(s | s') * p(obs at i | s)
Forward Probability
α(i, s) is the total probability of all paths:
α(i, s) = Σ_{s'} α(i−1, s') * p(s | s') * p(obs at i | s)
how likely is it to get into state s this way? what are the immediate ways to get into state s? what’s the total probability up until now?
Forward Probability
α(i, s) is the total probability of all paths:
α(i, s) = Σ_{s'} α(i−1, s') * p(s | s') * p(obs at i | s)
how likely is it to get into state s this way? what are the immediate ways to get into state s? what’s the total probability up until now?
Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][end]
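A runnable Python sketch of this forward recursion, using dictionaries instead of the array layout of the Viterbi pseudocode (treating 'start' and 'end' as explicit boundary states is my choice of encoding):

def forward(words, states, trans, emit):
    # alpha[i][s] = total probability of all paths that end in state s after emitting words[:i]
    alpha = [{s: 0.0 for s in states} for _ in range(len(words) + 1)]
    for s in states:                                     # first step: out of 'start'
        alpha[1][s] = trans[('start', s)] * emit[(s, words[0])]
    for i in range(2, len(words) + 1):                   # remaining steps
        for s in states:
            alpha[i][s] = sum(alpha[i - 1][sp] * trans[(sp, s)] for sp in states) * emit[(s, words[i - 1])]
    return sum(alpha[len(words)][s] * trans[(s, 'end')] for s in states)   # total likelihood of the word sequence

# with the N/V tables from the lattice example:
trans = {('start', 'N'): .7, ('start', 'V'): .2, ('N', 'end'): .05, ('V', 'end'): .05,
         ('N', 'N'): .15, ('N', 'V'): .8, ('V', 'N'): .6, ('V', 'V'): .35}
emit = {('N', 'w1'): .7, ('N', 'w2'): .2, ('N', 'w3'): .05, ('N', 'w4'): .05,
        ('V', 'w1'): .2, ('V', 'w2'): .6, ('V', 'w3'): .1,  ('V', 'w4'): .1}
print(forward(['w1', 'w2', 'w3', 'w4'], ['N', 'V'], trans, emit))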
Outline
Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Forward & Backward Message Passing
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi+1 = C zi+1 = B zi+1 = A
α(i, s) is the total probability of all paths: 1. that start from the beginning 2. that end (currently) in s at step i 3. that emit the observation obs at i β(i, s) is the total probability of all paths: 1. that start at step i at state s 2. that terminate at the end
3. (that emit the observation obs at i+1)
Forward & Backward Message Passing
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi+1 = C zi+1 = B zi+1 = A
α(i, s) is the total probability of all paths: 1. that start from the beginning 2. that end (currently) in s at step i 3. that emit the observation obs at i β(i, s) is the total probability of all paths: 1. that start at step i at state s 2. that terminate at the end
3. (that emit the observation obs at i+1)
α(i, B) β(i, B)
Forward & Backward Message Passing
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A zi+1 = C zi+1 = B zi+1 = A
α(i, s) is the total probability of all paths: 1. that start from the beginning 2. that end (currently) in s at step i 3. that emit the observation obs at i β(i, s) is the total probability of all paths: 1. that start at step i at state s 2. that terminate at the end
3. (that emit the observation obs at i+1)
α(i, B) β(i, B) α(i, B) * β(i, B) = total probability of paths through state B at step i
Forward & Backward Message Passing
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A
α(i, s) is the total probability of all paths: 1. that start from the beginning 2. that end (currently) in s at step i 3. that emit the observation obs at i β(i, s) is the total probability of all paths: 1. that start at step i at state s 2. that terminate at the end
3. (that emit the observation obs at i+1)
α(i, B) β(i+1, s)
zi+1 = C zi+1 = B zi+1 = A
Forward & Backward Message Passing
zi-1 = C zi-1 = B zi-1 = A zi = C zi = B zi = A
α(i, B) β(i+1, s’)
zi+1 = C zi+1 = B zi+1 = A
α(i, s) is the total probability of all paths: 1. that start from the beginning 2. that end (currently) in s at step i 3. that emit the observation obs at i β(i, s) is the total probability of all paths: 1. that start at step i at state s 2. that terminate at the end
3. (that emit the observation obs at i+1)
α(i, B) * p(s' | B) * p(obs at i+1 | s') * β(i+1, s') = total probability of paths through the B→s' arc (at time i)
With Both Forward and Backward Values
α(i, s) * p(s' | s) * p(obs at i+1 | s') * β(i+1, s') = total probability of paths through the s→s' arc (at time i)
α(i, s) * β(i, s) = total probability of paths through state s at step i
p(z_i = s | w_1, ⋯, w_N) = α(i, s) * β(i, s) / α(N+1, END)
p(z_i = s, z_{i+1} = s' | w_1, ⋯, w_N) = α(i, s) * p(s' | s) * p(obs at i+1 | s') * β(i+1, s') / α(N+1, END)
Expectation Maximization (EM)
Two step, iterative algorithm
parameters
uncertain counts
estimated counts
Expectation Maximization (EM)
Two step, iterative algorithm
parameters
uncertain counts
estimated counts
pobs(w | s) ptrans(s’ | s)
Expectation Maximization (EM)
Two step, iterative algorithm
parameters
uncertain counts
estimated counts
pobs(w | s) ptrans(s’ | s)
p*(z_i = s | w_1, ⋯, w_N) = α(i, s) * β(i, s) / α(N+1, END)
p*(z_i = s, z_{i+1} = s' | w_1, ⋯, w_N) = α(i, s) * p(s' | s) * p(obs at i+1 | s') * β(i+1, s') / α(N+1, END)
M-Step
“maximize log-likelihood, assuming these uncertain counts”
p_new(s' | s) = c(s → s') / Σ_{s''} c(s → s'')
if we observed the hidden transitions…
M-Step
“maximize log-likelihood, assuming these uncertain counts”
p_new(s' | s) = E[c(s → s')] / Σ_{s''} E[c(s → s'')]
we don't observe the hidden transitions, but we can approximately count them
M-Step
“maximize log-likelihood, assuming these uncertain counts”
p_new(s' | s) = E[c(s → s')] / Σ_{s''} E[c(s → s'')]
we don't observe the hidden transitions, but we can approximately count them
we compute these in the E-step, with the forward (α) and backward (β) probabilities
EM For HMMs (Baum-Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for(i = N; i ≥ 0; --i) {
  for(next = 0; next < K*; ++next) {
    cobs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for(state = 0; state < K*; ++state) {
      u = pobs(obs_{i+1} | next) * ptrans(next | state)
      ctrans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
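The matching M-step renormalizes these expected counts into new parameters, exactly as on the previous slides; a minimal Python sketch for the transitions (assuming the ctrans accumulator above is stored as a dict keyed by (state, next)):

from collections import defaultdict

def m_step_transitions(ctrans):
    # p_new(next | state) = E[c(state -> next)] / sum over next' of E[c(state -> next')]
    totals = defaultdict(float)
    for (state, _), c in ctrans.items():
        totals[state] += c
    return {(state, nxt): c / totals[state] for (state, nxt), c in ctrans.items()}

print(m_step_transitions({('N', 'N'): 1.5, ('N', 'V'): 4.5}))   # {('N','N'): 0.25, ('N','V'): 0.75}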
Bayesian Networks: Directed Acyclic Graphs
p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i))
π(x_i) = the "parents of" x_i; the factors follow a topological sort of the DAG
Bayesian Networks: Directed Acyclic Graphs
p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i))
exact inference in general DAGs is NP-hard inference in trees can be exact
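A toy Python sketch of reading a joint probability off a DAG factorization (the three-variable network a → b with (a, b) → c and its probability tables are invented for illustration):

# p(a, b, c) = p(a) p(b | a) p(c | a, b): one factor per node, conditioned on its parents
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_c_given_ab = {(True, True): {True: 0.5, False: 0.5}, (True, False): {True: 0.4, False: 0.6},
                (False, True): {True: 0.7, False: 0.3}, (False, False): {True: 0.1, False: 0.9}}

def joint(a, b, c):
    return p_a[a] * p_b_given_a[a][b] * p_c_given_ab[(a, b)][c]

vals = (True, False)
print(joint(True, False, True))
print(sum(joint(a, b, c) for a in vals for b in vals for c in vals))   # sums to 1.0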
D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z
d-separation
X & Y are d-separated if for all paths P, one of the following is true:
1. P has a chain with an observed middle node
2. P has a fork with an observed parent node
3. P includes a "v-structure" or "collider" with all unobserved descendants
D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z
d-separation
X & Y are d-separated if for all paths P, one of the following is true:
1. P has a chain with an observed middle node: observing Z blocks the path from X to Y
2. P has a fork with an observed parent node: observing Z blocks the path from X to Y
3. P includes a "v-structure" or "collider" with all unobserved descendants: not observing Z blocks the path from X to Y
D-Separation: Testing for Conditional Independence
Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z
d-separation
X & Y are d-separated if for all paths P, one of the following is true:
1. P has a chain with an observed middle node: observing Z blocks the path from X to Y
2. P has a fork with an observed parent node: observing Z blocks the path from X to Y
3. P includes a "v-structure" or "collider" with all unobserved descendants: not observing Z blocks the path from X to Y
For the collider: p(x, y, z) = p(x) p(y) p(z | x, y), so p(x, y) = Σ_z p(x) p(y) p(z | x, y) = p(x) p(y)
Markov Blanket
The Markov blanket of a node x is its parents, children, and children's parents
p(x_i | x_{j≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i = ∏_k p(x_k | π(x_k)) / ∫ ∏_k p(x_k | π(x_k)) dx_i
factor out terms not dependent on xi
factorization
= ∏_{k: k=i or i∈π(x_k)} p(x_k | π(x_k)) / ∫ ∏_{k: k=i or i∈π(x_k)} p(x_k | π(x_k)) dx_i
the set of nodes needed to form the complete conditional for a variable xi