
Topic Modeling
Lecture 9: October 9, 2013

CS886-2 Natural Language Understanding
University of Waterloo

CS886 Lecture Slides (c) 2013 P. Poupart

Information Retrieval Example #1


Information Retrieval Example #2


Information Retrieval Example #3


Latent Semantic Analysis

  • Idea: singular value decomposition

– Infer latent space in which documents or words can be described more succinctly

  • Issues:

– How do we interpret this latent space?
– How many dimensions should it have?
– How can we represent uncertainty/ambiguities?
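As a concrete illustration of the SVD idea, here is a minimal numpy-only sketch; the toy term-document counts and the choice of k = 2 latent dimensions are assumptions for illustration, not the lecture's data.

import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
], dtype=float)

k = 2                                         # number of latent dimensions (picked by hand)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T        # documents described in the k-dim latent space

def cosine(a, b):
    # Cosine similarity between two latent-space vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(doc_vecs[0], doc_vecs[1]))       # similarity of documents 0 and 1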



Latent Dirichlet Allocation

  • Idea: probabilistic generative model for documents

– Latent variables often correspond to topics
– Some machine learning techniques can automatically infer the # of topics
– Probabilistic framework allows us to quantify uncertainty/ambiguities
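The generative story can be sketched in a few lines of numpy; the topic count, vocabulary size, and symmetric hyperparameters below are assumed toy values, not from the slides.

import numpy as np

rng = np.random.default_rng(0)
T, V, D, N = 3, 20, 5, 50          # topics, vocabulary size, documents, words per document
alpha, beta = 0.1, 0.01            # symmetric Dirichlet hyperparameters

phi = rng.dirichlet([beta] * V, size=T)        # one word distribution per topic
docs = []
for _ in range(D):
    theta = rng.dirichlet([alpha] * T)         # per-document topic mixture
    z = rng.choice(T, size=N, p=theta)         # latent topic assignment for each word
    w = [rng.choice(V, p=phi[t]) for t in z]   # word drawn from its assigned topic
    docs.append(w)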


Graphical Model

  • Picture



Plate Model

  • Picture


Dirichlet

  • Definition:

    Dir(p; α_h, α_t) ∝ p^(α_h − 1) (1 − p)^(α_t − 1)

  • p: probability of head
  • α_h − 1: # of heads
  • α_t − 1: # of tails
  • Mean: α_h / (α_h + α_t)

[Figure: Pr(p) as a function of p for Dir(p; 1, 1), Dir(p; 2, 8), Dir(p; 20, 80)]
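A quick way to check the three densities named in the figure is scipy's Beta distribution, which is the two-outcome Dirichlet (assuming scipy is available; this is only a sanity-check sketch).

from scipy.stats import beta

for a_h, a_t in [(1, 1), (2, 8), (20, 80)]:
    d = beta(a_h, a_t)   # Dir(p; a_h, a_t) over a single probability p
    print(f"Dir(p; {a_h}, {a_t}): mean = {d.mean():.3f}, density at p = 0.2: {d.pdf(0.2):.3f}")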


Conjugate Prior

  • Bayesian learning

– Prior: Pr(p) = Dir(p; α_h, α_t)
– Posterior: Pr(p | data)

  • Bayes theorem:

Pr(p | data) ∝ Pr(data | p) Pr(p)
            ∝ p^(N_h) (1 − p)^(N_t) · p^(α_h − 1) (1 − p)^(α_t − 1)
            ∝ Dir(p; α_h + N_h, α_t + N_t)

e.g., a Dir(p; 1, 1) prior updated with two observed heads gives Dir(p; 3, 1)
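Because the posterior stays in the Dirichlet family, the Bayesian update is just hyperparameter addition. A minimal sketch of the coin example (the helper function name is made up; the two-heads data follows the slide's example; plain Python, no libraries):

def dirichlet_update(alpha_h, alpha_t, n_heads, n_tails):
    # Posterior hyperparameters after observing n_heads heads and n_tails tails.
    return alpha_h + n_heads, alpha_t + n_tails

post_h, post_t = dirichlet_update(1, 1, 2, 0)   # Dir(p; 1, 1) prior, two observed heads
print(post_h, post_t)                           # 3 1  ->  Dir(p; 3, 1)
print(post_h / (post_h + post_t))               # posterior mean of p = 0.75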


Topic Modeling

  • Task

– Infer topics and parameters: Pr(z, θ, φ | w), i.e., the topic assignments z, the per-document topic mixtures θ, and the per-topic word distributions φ, given the observed words w

  • Two common approaches

– Gibbs sampling

  • Simple, but stochastic and slow

– Variational Bayes (variant of EM)

  • Complex, but deterministic and fast



Sampling Techniques

  • Direct sampling
  • Rejection sampling
  • Likelihood weighting
  • Importance sampling
  • Markov chain Monte Carlo (MCMC)

– Gibbs Sampling
– Metropolis-Hastings

  • Sequential Monte Carlo sampling (a.k.a. particle filtering)


Approximate Inference by Sampling

  • Expectation: E[f(X)] = ∫ Pr(x) f(x) dx

    – Approximate integral by sampling:

      E[f(X)] ≈ (1/n) Σ_{i=1..n} f(x^(i))   where x^(i) ~ Pr(X)

  • Inference query: Pr(X | e) = Σ_y Pr(X, y | e) = Σ_y Pr(X | y, e) Pr(y | e)

    – Approximate exponentially large sum by sampling:

      Pr(X | e) ≈ (1/n) Σ_{i=1..n} Pr(X | y^(i), e)   where y^(i) ~ Pr(Y | e)
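A tiny Monte Carlo sketch of the expectation approximation; the target is an assumed toy example, X ~ Uniform(0, 1) with f(x) = x², whose exact expectation is 1/3.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 1.0, size=n)   # samples x^(i) ~ Pr(X)
estimate = (x ** 2).mean()          # (1/n) Σ f(x^(i))
print(estimate)                     # ≈ 0.333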



Direct Sampling (a.k.a. forward sampling)

  • Unconditional inference queries only (i.e., Pr(X))

  • Bayesian networks only

– Idea: sample each variable given the values of its parents according to the topological order of the graph.


Direct Sampling Algorithm

Sort the variables X_1, …, X_N by topological order
For i = 1 to n do (sample n particles)
    For each variable X_j (in topological order) do
        Sample x_j^(i) ~ Pr(X_j | parents(X_j) = their values in particle i)

  • Approximation: Pr(X = x) ≈ (1/n) Σ_{i=1..n} 1(x^(i) = x)
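A minimal forward-sampling sketch for a hand-made two-variable Bayes net Rain → WetGrass; the CPT numbers are assumptions for illustration, not from the slides.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
counts = {}
for _ in range(n):
    rain = rng.random() < 0.2                  # Pr(Rain = true) = 0.2
    p_wet = 0.9 if rain else 0.1               # Pr(Wet = true | Rain)
    wet = rng.random() < p_wet                 # sample child given its sampled parent
    counts[(rain, wet)] = counts.get((rain, wet), 0) + 1

for assignment, c in sorted(counts.items()):
    print(assignment, c / n)                   # ≈ Pr(Rain, Wet)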


Example


Analysis

  • Complexity: O(nN) where N = # of variables and n = # of samples (each particle samples every variable once)
  • Accuracy

– Absolute error ε:  Pr( P̂(v) ∉ [P(v) − ε, P(v) + ε] ) ≤ 2 e^(−2nε²)

  • Sample size: n ≥ ln(2/δ) / (2ε²) to keep this probability below δ

– Relative error ε:  Pr( P̂(v) ∉ [P(v)(1 − ε), P(v)(1 + ε)] ) ≤ 2 e^(−n P(v) ε²/3)

  • Sample size: n ≥ 3 ln(2/δ) / (P(v) ε²)
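For example, plugging assumed target values ε = 0.01 and δ = 0.05 into the absolute-error bound gives the required number of particles:

import math

eps, delta = 0.01, 0.05
n = math.log(2 / delta) / (2 * eps ** 2)   # n ≥ ln(2/δ) / (2ε²)
print(math.ceil(n))                        # 18445 samples for ±0.01 error with prob. ≥ 0.95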


Markov Chain Monte Carlo

  • Iterative sampling technique that converges to the desired distribution in the limit

  • Idea: set up a Markov chain such that its stationary distribution is the desired distribution


Markov Chain

  • Definition: A Markov chain is a linear chain Bayesian network with a stationary conditional distribution known as the transition function

  • Initial distribution: Pr(x_0)
  • Transition distribution: Pr(x_{t+1} | x_t)



Asymptotic Behaviour

  • Let Pr_t(x_t) be the distribution at time step t:

    Pr_t(x_t) = Σ_{x_0..x_{t−1}} Pr(x_0) ∏_{i=0..t−1} Pr(x_{i+1} | x_i)
              = Σ_{x_{t−1}} Pr_{t−1}(x_{t−1}) Pr(x_t | x_{t−1})

  • In the limit (i.e., when t → ∞), the Markov chain may converge to a stationary distribution Pr_∞:

    Pr_∞(x') = Σ_x Pr_∞(x) Pr(x' | x)


Stationary distribution

  • Let T(x' | x) = Pr(x_{t+1} = x' | x_t = x) be a matrix that represents the transition function

  • If we think of Pr_∞ as a column vector, then Pr_∞ is an eigenvector of T with eigenvalue 1:  T Pr_∞ = Pr_∞
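This eigenvector view can be checked numerically with numpy; the 2-state transition matrix below is an assumed toy example.

import numpy as np

# Columns sum to 1: T[x_next, x_now] = Pr(x_next | x_now).
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])

eigvals, eigvecs = np.linalg.eig(T)
idx = np.argmin(np.abs(eigvals - 1.0))   # locate the eigenvalue-1 eigenvector
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()                       # normalize into a probability vector
print(pi)                                # stationary distribution ≈ [0.833, 0.167]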



Ergodic Markov Chain

  • Definition: A Markov chain is ergodic when there is a non-zero probability of reaching any state from any state in a finite number of steps

  • When the Markov chain is ergodic, there is a unique stationary distribution

  • Sufficient condition: detailed balance

Pr(x) Pr(x' | x) = Pr(x') Pr(x | x')

Detailed balance + ergodicity ⇒ unique stationary distribution


Markov Chain Monte Carlo

  • Idea: set up an ergodic Markov chain such that the unique stationary distribution is the desired distribution

  • Since the Markov chain is a linear chain Bayes net, we can use direct sampling (forward sampling) to obtain a sample of the stationary distribution



Generic MCMC Algorithm

Sample x_0 ~ Pr(X_0)
For t = 1 to n do (sample n particles)
    Sample x_t ~ Pr(X_t | x_{t−1})

  • Approximation: Pr(X = x) ≈ (1/n) Σ_{t=1..n} 1(x_t = x)
  • In practice, ignore the first k samples for a better estimate (burn-in period):

    Pr(X = x) ≈ 1/(n − k) Σ_{t=k+1..n} 1(x_t = x)
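A minimal sketch of this loop on an assumed toy 2-state chain (same transition-matrix convention as above), including the burn-in discard:

import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.5],                 # T[x_next, x_now] = Pr(x_next | x_now)
              [0.1, 0.5]])
n, k = 20_000, 1_000                      # total particles and burn-in length

x = rng.integers(0, 2)                    # x_0 ~ initial distribution (uniform here)
samples = []
for t in range(n):
    x = rng.choice(2, p=T[:, x])          # x_t ~ Pr(X_t | x_{t-1})
    samples.append(x)

kept = samples[k:]                        # drop the burn-in prefix
print(np.bincount(kept) / len(kept))      # ≈ stationary distribution [0.833, 0.167]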


Choosing a Markov Chain

  • Different Markov chains lead to different algorithms

– Gibbs sampling
– Metropolis-Hastings



Gibbs Sampling

  • Suppose Pr(X) is defined by a graphical model (Bayes net or Markov net)

  • Inference query: Pr(Y | e)? where Y ⊆ X
  • Idea: randomly assign values to all non-evidence variables, then repeatedly sample each non-evidence variable given the assigned values for all other variables


Gibbs Sampling Algorithm

Randomly assign values x_j^(0) to all non-evidence variables X_j
For i = 1 to n do (sample n particles)
    For each non-evidence variable X_j do
        Sample x_j^(i) ~ Pr(X_j | x_{~j}, e)   (all other variables held at their current values)

  • Approximation: Pr(Y = y | e) ≈ (1/n) Σ_{i=1..n} 1(y^(i) = y)
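A minimal Gibbs-sampling sketch on an assumed toy target, a bivariate Gaussian with correlation ρ = 0.8, where both full conditionals Pr(X | y) and Pr(Y | x) are known in closed form (exactly the ingredient Gibbs sampling needs):

import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n, k = 20_000, 1_000                 # particles and burn-in length
sd = np.sqrt(1 - rho ** 2)           # conditional standard deviation

x, y = 0.0, 0.0                      # arbitrary initial assignment
samples = []
for i in range(n):
    x = rng.normal(rho * y, sd)      # x ~ Pr(X | y), all other variables fixed
    y = rng.normal(rho * x, sd)      # y ~ Pr(Y | x)
    samples.append((x, y))

xs, ys = np.array(samples[k:]).T
print(np.corrcoef(xs, ys)[0, 1])     # ≈ 0.8, the target correlation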



Example


Practical Consideration

  • Burn-in period: ignore first k samples:

    Pr(Y = y | e) ≈ 1/(n − k) Σ_{i=k+1..n} 1(y^(i) = y)

  • Use most recent values to sample:

    x_j^(i) ~ Pr(X_j | x_1^(i), …, x_{j−1}^(i), x_{j+1}^(i−1), …, x_N^(i−1), e)

  • Use conditional independence to restrict the parent variables to the Markov blanket:

    x_j^(i) ~ Pr(X_j | x_k for all X_k ∈ MB(X_j))


Convergence

  • Let Pr(x'_j | x_{~j}, e) be the transition function of the Markov chain associated with Gibbs sampling

  • Theorem: Gibbs sampling converges to Pr(X | e) when all potentials are strictly positive.

  • Proof: Pr(x'_j | x_{~j}, e) satisfies detailed balance, i.e.,

    Pr(x | e) Pr(x'_j | x_{~j}, e) = Pr(x' | e) Pr(x_j | x'_{~j}, e)
