Mixed Membership Markov Models for Unsupervised Conversation Modeling
Michael J. Paul, Johns Hopkins University

Conversation Modeling: High Level Idea
• We’ll be modeling sequences of documents
  ◦ e.g. a sequence of email messages from a conversation
• We’ll use M4 = Mixed Membership Markov Models
• M4 is a combination of
  ◦ Topic models (LDA, PLSA, etc.)
    ▪ Documents are mixtures of latent classes/topics
  ◦ Hidden Markov models
    ▪ Documents in a sequence depend on the previous document
M.J. Paul. Mixed Membership Markov Models. EMNLP 2012. Jeju Island, Korea.

Generative Models of Text
• Some distinctions to consider…

                            Inter-document structure
Intra-document structure    Independent    Markov
Single-Class                Naïve Bayes    HMM
Mixed-Membership            LDA            This talk!

Overview
• Unsupervised Content Models
  ◦ Naïve Bayes
  ◦ Topic Models
• Unsupervised Conversation Modeling
  ◦ Hidden Markov Models
• Mixed Membership Markov Models (M4)
  ◦ Overview
  ◦ Inference
• Experiments with Conversation Data
  ◦ Thread reconstruction
  ◦ Speech act induction

Motivation: Unsupervised Models
• Huge amounts of unstructured and unannotated data on the Web
• Unsupervised models can help manage this data and are robust to variations in language and genre
• Tools like topic models can uncover interesting patterns in large corpora

(Unsupervised) Naïve Bayes
• Each document belongs to some category/class z
• Each class z is associated with its own distribution over words
[Plate diagram: θ (class distribution) → z (class) → w (words, N per document), shown for Doc 1, Doc 2, Doc 3]
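
The generative story on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `theta`, `phi`, `vocab` and the toy probabilities are hypothetical, not values from the talk. It shows the key property of Naïve Bayes: one class z is drawn per document, and every word in that document comes from that single class's word distribution.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_nb_document(theta, phi, vocab, n_words, rng):
    """Draw one class z for the whole document, then all words from class z."""
    z = sample_categorical(theta, rng)
    words = [vocab[sample_categorical(phi[z], rng)] for _ in range(n_words)]
    return z, words

# Hypothetical toy parameters (not from the slides).
vocab = ["football", "team", "court", "police"]
theta = [0.5, 0.5]                      # class distribution
phi = [[0.6, 0.4, 0.0, 0.0],            # class 0: sports-like words only
       [0.0, 0.0, 0.5, 0.5]]            # class 1: crime-like words only

rng = random.Random(0)
z, words = generate_nb_document(theta, phi, vocab, 5, rng)
print(z, words)
```

Because z is drawn once per document, a document generated this way can never mix sports-like and crime-like words, which is the limitation the later slides address.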

(Unsupervised) Naïve Bayes

Imaginary class labels    Probability distributions over words
“SPORTS”                  football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
“CRIME”                   charge 0.02, court 0.02, police 0.015, robbery 0.01, …
“POLITICS”                congress 0.02, president 0.02, election 0.015, senate 0.01, …

(Unsupervised) Naïve Bayes
• football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
• charge 0.02, court 0.02, police 0.015, robbery 0.01, …
• congress 0.02, president 0.02, election 0.015, senate 0.01, …

(Unsupervised) Naïve Bayes?
• football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
• charge 0.02, court 0.02, police 0.015, robbery 0.01, …
• congress 0.02, president 0.02, election 0.015, senate 0.01, …
What if an article belongs to more than one category?

(Unsupervised) Naïve Bayes?
• football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
• charge 0.02, court 0.02, police 0.015, robbery 0.01, …
• congress 0.02, president 0.02, election 0.015, senate 0.01, …
Example article:
  Headline: Jury Finds Baseball Star Roger Clemens Not Guilty On All Counts
  Text: A jury found baseball star Roger Clemens not guilty on six charges against him. Clemens was accused of lying to Congress in 2008 about his use of performance enhancing drugs.

Topic Models
• football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
• charge 0.02, court 0.02, police 0.015, robbery 0.01, …
• congress 0.02, president 0.02, election 0.015, senate 0.01, …
[Diagram: Doc 1, Doc 2 and Doc 3 shown alongside the three word distributions]

Topic Models
• One class distribution θ_d per document
• One class value per token (rather than per document)
[Plate diagram: θ → z → w (N tokens per document), with a separate θ for each of Doc 1, Doc 2, Doc 3]
T. Hofmann. Probabilistic Latent Semantic Indexing. SIGIR 1999.

Latent Dirichlet Allocation (LDA)
• α: Dirichlet prior
• One class distribution θ_d per document
• One class value per token (rather than per document)
[Plate diagram: α → θ → z → w (N tokens per document), with a separate θ for each of Doc 1, Doc 2, Doc 3]
D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. JMLR 2003.
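
To contrast with Naïve Bayes, here is a minimal sketch of the LDA generative process described on this slide: each document draws its own θ_d from a Dirichlet prior, and each token then draws its own class. The function names, the toy `phi` matrix and the value of `alpha` are hypothetical, chosen only for illustration. The Dirichlet draw uses the standard construction of normalizing independent Gamma variates.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_lda_document(alpha, phi, n_words, rng):
    """Draw theta_d once per document, then one class z per token."""
    theta_d = sample_dirichlet(alpha, rng)
    tokens = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta_d)), weights=theta_d, k=1)[0]
        w = rng.choices(range(len(phi[z])), weights=phi[z], k=1)[0]
        tokens.append((z, w))  # (class index, word index)
    return theta_d, tokens

# Hypothetical toy word distributions: 3 classes over a 4-word vocabulary.
phi = [[0.7, 0.1, 0.1, 0.1],
       [0.1, 0.7, 0.1, 0.1],
       [0.1, 0.1, 0.4, 0.4]]

rng = random.Random(0)
theta_d, tokens = generate_lda_document([0.5, 0.5, 0.5], phi, 8, rng)
print(theta_d, tokens)
```

Unlike the Naïve Bayes sketch, different tokens of the same document can carry different class indices here, so a single document can mix several classes.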

Overview
• Unsupervised Content Models
• Unsupervised Conversation Modeling
• Mixed Membership Markov Models
• Experiments with Conversation Data
• Conclusion

Conversation Modeling
• Documents on the web are more complicated than news articles

Conversation Modeling
• What’s missing from Naïve Bayes and LDA?
  ◦ They assume documents are generated independently of each other
• Messages in conversations aren’t at all independent
  ◦ Doesn’t make sense to pretend that they are
  ◦ But we’d like to represent this dependence in a reasonably simple way
• Solution: Hidden Markov Models

Block HMM
• Message emitted at each time step of Markov chain
• Each message in thread depends on the message to which it is a response
[Plate diagram: π (transition parameters, matrix); z (class) → w (words, N per message), with the z nodes chained across Message 1, Message 2, Message 3]
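
A rough sketch of the Block HMM generative process, assuming for simplicity a linear thread in which each message replies to the one before it. The `initial` distribution for the first message is an assumption of this sketch (the slide shows only the transition matrix π), and all parameter values below are hypothetical.

```python
import random

def generate_block_hmm_thread(initial, transitions, phi, n_messages, n_words, rng):
    """Each message gets one class; the class depends on the previous message's class."""
    thread = []
    prev = None
    for _ in range(n_messages):
        # First message uses the (assumed) initial distribution,
        # later messages use the transition row of the previous class.
        probs = initial if prev is None else transitions[prev]
        z = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # All words in the message are emitted from the single class z.
        words = rng.choices(range(len(phi[z])), weights=phi[z], k=n_words)
        thread.append((z, words))
        prev = z
    return thread

# Hypothetical toy parameters: 2 classes, 3-word vocabulary.
initial = [0.9, 0.1]
transitions = [[0.2, 0.8],   # row i: distribution of the next class given class i
               [0.6, 0.4]]
phi = [[0.8, 0.1, 0.1],
       [0.1, 0.1, 0.8]]

thread = generate_block_hmm_thread(initial, transitions, phi, 3, 4, random.Random(0))
print(thread)
```

Each message still carries exactly one class, which is what makes the Block HMM the sequential counterpart of Naïve Bayes.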

Bayesian Block HMM
• α: Dirichlet prior
• Each message in thread depends on the message to which it is a response
[Plate diagram: α → π; z → w (N words per message), with the z nodes chained across Message 1, Message 2, Message 3]
A. Ritter, C. Cherry, B. Dolan. Unsupervised Modeling of Twitter Conversations. HLT-NAACL 2010.

Block HMM

GREETING    hey 0.1, sup 0.06, hi 0.04, hello 0.01, …
QUESTION    what 0.03, what’s 0.025, how 0.02, is 0.02, …
LAUGHTER    lol 0.04, haha 0.04, :) 0.03, lmao 0.01, …

SPORTS      football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
CRIME       charge 0.02, court 0.02, police 0.015, robbery 0.01, …
POLITICS    congress 0.02, president 0.02, election 0.015, senate 0.01, …

Block HMM
• Nice and simple way to model dependencies between messages
• This is similar to Naïve Bayes
  ◦ One class per document!
• Let’s make it more like LDA
  ◦ Documents are mixtures of classes

Generative Models of Text

                            Inter-document structure
Intra-document structure    Independent    Markov
Single-Class
Mixed-Membership                           This talk!

Overview
• Unsupervised Content Models
• Unsupervised Conversation Modeling
• Mixed Membership Markov Models
• Experiments with Conversation Data
• Conclusion

Mixed Membership Markov Models (M4)
• Λ: transition parameters
• π: class distribution (function of z and λ)
• Like LDA
  ◦ One distribution π_d per doc
  ◦ One class z per token
• But now each message’s distribution depends on the class assignments of the previous message
[Plate diagram: Λ → π → z → w (N tokens per message), with each message’s z feeding the next message’s π, for Message 1, Message 2, Message 3]

Mixed Membership Markov Models (M4)
• Λ: transition parameters
• π: class distribution (function of z and λ)
• Probability of class j in message d: $\pi_{dj} \propto \exp(\lambda_j^{\top} z_{d-1})$, a log-linear function of the previous message
[Plate diagram: Λ → π → z → w (N tokens per message), for Message 1, Message 2, Message 3]
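
The log-linear formula above can be sketched as a softmax over class scores, followed by per-token sampling as in LDA. Several details here are assumptions of this sketch rather than statements from the slides: z_{d-1} is encoded as the vector of class proportions in the previous message, the first message uses an all-zero feature vector (which gives a uniform π), the thread is linear, and `lam[i][j]` stores the weight from class i in the previous message to class j in the current one. All names and toy values are hypothetical.

```python
import math
import random

def class_distribution(lam, z_prev):
    """pi_dj proportional to exp(sum_i lam[i][j] * z_prev[i]), normalized over j."""
    n_classes = len(lam)
    scores = [sum(lam[i][j] * z_prev[i] for i in range(n_classes))
              for j in range(n_classes)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate_m4_thread(lam, phi, n_messages, n_words, rng):
    """Per-token classes as in LDA, but pi_d depends on the previous message's classes."""
    n_classes = len(lam)
    z_prev = [0.0] * n_classes  # assumption: no parent message, so pi is uniform
    thread = []
    for _ in range(n_messages):
        pi_d = class_distribution(lam, z_prev)
        classes = rng.choices(range(n_classes), weights=pi_d, k=n_words)
        words = [rng.choices(range(len(phi[z])), weights=phi[z], k=1)[0]
                 for z in classes]
        thread.append((pi_d, classes, words))
        # Features for the next message: the sampled class proportions
        # (assumed encoding), not the distribution pi_d itself.
        z_prev = [classes.count(j) / n_words for j in range(n_classes)]
    return thread

# Hypothetical toy parameters: 2 classes, 3-word vocabulary.
lam = [[2.0, -1.0],
       [-1.0, 2.0]]
phi = [[0.8, 0.1, 0.1],
       [0.1, 0.1, 0.8]]
thread = generate_m4_thread(lam, phi, 3, 6, random.Random(0))
print(thread[0][0])  # pi for the first message
```

Note that the features passed forward are built from the sampled class assignments, matching the slide's statement that π is a function of z and λ.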

Mixed Membership Markov Models (M4)
• Why not transition directly from π to π?
• Makes more sense for next message to depend on actual classes of previous message (not the distribution over all possible classes)
[Plate diagram: Λ → π → z → w (N tokens per message), for Message 1, Message 2, Message 3]

Example
Suppose documents are mixtures of 4 classes: G R B
Then Λ is a 4x4 matrix with values such as:
• λ G→R = -0.2: “The presence of G in doc 1 slightly decreases the likelihood of having R in doc 2”
• λ B→B = 5.0: “The presence of B in doc 1 greatly increases the likelihood of having B in doc 2”
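
As a rough numerical check of these two statements, the log-linear formula from the earlier slide can be evaluated by hand. The setup is an assumption made only for illustration: every entry of Λ other than the two shown is taken to be 0, and doc 1 is represented by an indicator vector with a single active class (G in the first case, B in the second). With 4 classes, a weight of 0 everywhere would give each class probability 0.25.

```latex
% Doc 1 contains only G; all weights are 0 except \lambda_{G \to R} = -0.2 (assumed)
\pi_{2,R} = \frac{e^{-0.2}}{e^{0} + e^{-0.2} + e^{0} + e^{0}}
          = \frac{0.819}{3.819} \approx 0.214 \quad (< 0.25)

% Doc 1 contains only B; all weights are 0 except \lambda_{B \to B} = 5.0 (assumed)
\pi_{2,B} = \frac{e^{5.0}}{e^{5.0} + e^{0} + e^{0} + e^{0}}
          = \frac{148.4}{151.4} \approx 0.980 \quad (\gg 0.25)
```

Under these assumptions, the small negative weight lowers the probability of R only slightly below uniform, while the large positive weight makes B almost certain.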
