Review Topic Discovery with Phrases using the Pólya Urn Model - PowerPoint PPT Presentation





SLIDE 1

Review Topic Discovery with Phrases using the Pólya Urn Model

Geli Fei, Zhiyuan Chen, Bing Liu

University of Illinois at Chicago

Presenter: Alan Akbik

IBM Research Almaden / Berlin Institute of Technology

SLIDE 2

Product Aspects

 Large collection of product reviews

  • Example domain: Smartphones

 Task: Discover aspects that are being discussed in

the reviews

  • Battery - Battery life, AAA batteries

 “The battery life of this smartphone is great.”  “It uses AAA batteries.”

  • Screen - Screen size, touch screen
  • Camera - Resolution, image quality
SLIDE 3

Topic Models

 Widely used in review topic / aspect discovery

 Most models regard each topic as a distribution over individual terms (unigrams)

 Terms in each document are assigned to topics

  • Documents assigned to topics via terms

 The generation of topics is mostly governed by “higher-order co-occurrence” (Heinrich 2009)

  • i.e., how often words co-occur in different contexts
SLIDE 4

Topic Models

 Major issue: individual words may not convey the

same information as natural phrases

  • e.g. “battery life” vs. “life”

 Leading to three problems:

  • Interpretability - Topics are hard for users to interpret unless they are domain experts

  • Ambiguity - Hard to directly make use of the topical

words

  • False evidence - Causes extra or wrong co-occurrences

in topic generation, leading to poorer topics

SLIDE 5

Possible Solutions (1)

 Treat each whole phrase as one term

“The battery life of this smartphone is great”

<the> <battery_life> <of> <this> <smartphone> <is> <great>

 Problems:

  • Many phrases very rare
  • Remove important words

 “battery life” may not be in the same topic as “battery”, because we don’t observe their co-occurrence
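This first solution can be sketched as a simple tokenizer that merges each known phrase into one term. The `PHRASES` table, function name, and example sentence are illustrative, not from the paper's data:

```python
# Hypothetical phrase list: bigram -> merged term
PHRASES = {("battery", "life"): "battery_life"}

def merge_phrases(tokens):
    """Replace each known phrase by a single merged term."""
    out, i = [], 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if bigram in PHRASES:
            out.append(PHRASES[bigram])  # whole phrase becomes one term
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

merge_phrases("the battery life of this smartphone is great".split())
# ['the', 'battery_life', 'of', 'this', 'smartphone', 'is', 'great']
```

Note that `battery` and `life` vanish as individual terms, which is exactly the loss of co-occurrence evidence the slide describes.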

SLIDE 6

Possible Solutions (2)

 Keep individual words, add extra terms for

phrases

“The battery life of this smartphone is great”

<the> <battery> <life> <battery_life> <of> <this> <smartphone> <is> <great>

 Problems:

  • False evidence still exists
  • Many phrases rare

 “battery life” is much less frequent than “life”, so it is unlikely to be ranked on the top in a topic
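The second solution differs only in keeping the component words alongside the phrase term. A minimal sketch, again with an illustrative hand-built phrase list:

```python
PHRASES = {("battery", "life"): "battery_life"}

def words_plus_phrases(tokens):
    """Keep individual words and add each known phrase as an extra term."""
    out, i = [], 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if bigram in PHRASES:
            out.extend(bigram)            # keep the component words...
            out.append(PHRASES[bigram])   # ...and add the phrase as an extra term
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

words_plus_phrases("the battery life of this smartphone is great".split())
# ['the', 'battery', 'life', 'battery_life', 'of', 'this', 'smartphone', 'is', 'great']
```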
SLIDE 7

Challenge

How to retain connections between phrases and words while removing wrong co-occurrences?
SLIDE 8

Related Work

 Using n-grams in topic modeling (Mukherjee and Liu 2013;

Mukherjee et al. 2013).

 Identifying key phrases in the post-processing step based on

the discovered topical unigrams (Blei and Lafferty 2009; Liu et al. 2010; Zhao et al. 2011).

 Directly modeling word order in topic model (Wallach 2006;

Wang et al. 2007).

  • Breaking the “bag-of-words” assumption
  • Although the “bag-of-words” assumption does not always hold, it offers a great computational advantage
  • Our method still follows the “bag-of-words” assumption
SLIDE 9

Gibbs Sampling for LDA

 One of the most commonly used inference

techniques for topic models.

 Considers each term in the documents in turn

 Samples a topic for the current term, conditioned on the topic assignments of all other terms.
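The steps above can be sketched as a toy collapsed Gibbs sampler for LDA. The corpus, hyperparameter values, and function name are illustrative assumptions, not the paper's settings:

```python
import random
from collections import defaultdict

def gibbs_lda(docs, K, alpha=0.5, beta=0.1, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    ndk = [[0] * K for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
    nk = [0] * K                                # topic totals
    z = []
    for di, doc in enumerate(docs):             # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                k = z[di][wi]                   # remove the current assignment
                ndk[di][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # sample a topic conditioned on all other assignments
                weights = [(ndk[di][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights)[0]
                z[di][wi] = k
                ndk[di][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z, nkw
```

The key step is the conditional: a topic's weight for the current term grows with how often that topic is already used in the document and how often it already generated that word elsewhere.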

SLIDE 10

Simple Pólya Urn Model (SPU)

 Designed in the context of colored balls and

urns

 In the context of topic models:

  • A ball with a certain color: a term
  • The urn: contains a mixture of balls with various

colors (terms)

 Topic-word (topic-term) distribution is

reflected by the proportion of balls with a certain color in the urn

SLIDE 11

Simple Pólya Urn Model (SPU)

 Left: initial state

 Middle: draw a ball of a certain color

 Right: put two balls of the same color back

 Self-reinforcing property known as “the rich get richer”
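The SPU dynamic is easy to simulate: each draw returns the ball plus one more of the same color, so frequent colors get ever more frequent. The initial colors and draw count below are illustrative:

```python
import random

def simple_polya_urn(initial, draws, seed=0):
    """Simple Pólya urn: draw a ball, put two of that color back (net +1)."""
    rng = random.Random(seed)
    urn = list(initial)
    for _ in range(draws):
        ball = rng.choice(urn)
        urn.append(ball)  # the drawn color is reinforced
    return urn

urn = simple_polya_urn(["red", "blue"], 1000)
```

After many draws the urn is typically dominated by one color, the "rich get richer" property the slide names.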

SLIDE 12

Generalized Pólya Urn Model (GPU)

 GPU vs. SPU: apart from two balls with the same color being put back, a certain number of balls with some other colors are also put in the urn.

 We call this the promotion of these colored balls

 Using the idea in the sampling process:

  • SPU: seeing “staff” under a topic only increases the chance of

seeing it again under the same topic

  • GPU: also increases the chance of seeing “hotel staff” under the

topic
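The GPU differs from the SPU only in what goes back into the urn after a draw, which a promotion table captures directly. The table below, its 0.5 weights, and the "staff"/"hotel_staff" colors are illustrative, not the paper's parameters:

```python
import random

def generalized_polya_urn(counts, promotion, draws, seed=0):
    """Generalized Pólya urn: a draw also adds balls of related colors."""
    rng = random.Random(seed)
    counts = dict(counts)  # color -> (possibly fractional) ball count
    for _ in range(draws):
        colors = list(counts)
        ball = rng.choices(colors, [counts[c] for c in colors])[0]
        # put back the drawn color plus promoted amounts of related colors
        for c, amount in promotion.get(ball, {ball: 1.0}).items():
            counts[c] = counts.get(c, 0.0) + amount
    return counts

# Drawing "staff" also promotes "hotel_staff", and vice versa (weights assumed)
promotion = {"staff": {"staff": 1.0, "hotel_staff": 0.5},
             "hotel_staff": {"hotel_staff": 1.0, "staff": 0.5}}
counts = generalized_polya_urn({"staff": 1.0, "hotel_staff": 1.0}, promotion, 500)
```

With mutual promotion the two colors reinforce each other rather than compete, which is exactly the connection between a word and its phrases that the model wants to keep.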

SLIDE 13

Generalized Pólya Urn Model (GPU)

 In our application:

  • We use each whole phrase as a term to remove

wrong co-occurrences

  • And use GPU to regain the connection between

phrases and words

 Two directions of promotion:

  • Word to phrase: when a topic is assigned to an

individual word, phrases containing the word are promoted

  • Phrase to word: when a topic is assigned to a phrase,

each component word is promoted

SLIDE 14

Datasets and Preprocessing

 Data sets:

  • 30 categories of electronics reviews from Amazon (1,000

reviews in each category)

  • Hotel reviews from TripAdvisor (101,234 reviews)
  • Restaurant reviews from Yelp (25,459 reviews)

 Preprocessing:

  • Review sentences as documents

 Standard topic models cannot discover product aspects well when directly applied to reviews (Titov and McDonald, 2008)

  • Rule-based method for noun phrase detection

 Use rule-based method for efficiency
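A hypothetical sketch of such a rule-based detector, assuming Penn Treebank POS tags are already available: collect runs of adjectives/nouns and keep multi-word runs that end in a noun. These rules are a simplified stand-in, not the paper's actual rule set:

```python
def _flush(run, phrases):
    """Close off a candidate run: trim trailing non-nouns, keep multi-word runs."""
    while run and not run[-1][1].startswith("NN"):
        run.pop()
    if len(run) >= 2:
        phrases.append("_".join(w for w, _ in run))

def extract_noun_phrases(tagged):
    """Scan (word, tag) pairs for adjective/noun runs ending in a noun."""
    phrases, run = [], []
    for word, tag in tagged:
        if tag.startswith("NN") or tag.startswith("JJ"):
            run.append((word, tag))
        else:
            _flush(run, phrases)
            run = []
    _flush(run, phrases)
    return phrases
```

Pattern matching over tags like this is cheap, which is the efficiency argument the slide makes against full parsing.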

SLIDE 15

Experiments

 Four sets of experiments on 32 domains

  • Baseline #1, LDA(w): without considering phrases
  • Baseline #2, LDA(p): considers phrases, uses each

whole phrase as a term

  • Baseline #3, LDA(w_p): considers phrases, keeps

individual component words, and adds phrases as extra terms

  • LDA(p_GPU): Our proposed method
SLIDE 16

Parameter Setting

 Use the same set of parameters for all experiments

  • Set Dirichlet priors as in (Griffiths and Steyvers, 2004)

 Set document-topic prior β = 50/L, where L is the number of topics

 Set topic-term prior γ = 0.1

  • Set number of topics L = 15
  • Posterior inference was drawn after 2,000 Gibbs sampling iterations with 400 burn-in iterations

SLIDE 17

Parameters for GPU Model

 Not all words in a phrase are equally important

  • e.g. “staff” is more important than “hotel” in “hotel staff”

 Determine head nouns

  • Following (Wang et al., 2007), we take the last word in a noun phrase as the head noun

 GPU promotion

  • Word to phrase: promote a phrase by virtualcount when a topic is

assigned to its head noun

  • Phrase to word: promote 0.5 * virtualcount to the head noun and

0.25 * virtualcount to all other words when a topic is assigned to a phrase

  • Set virtualcount=0.1 empirically, based on how much to promote

phrases
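The promotion scheme above can be written down as a small table builder. The weights (virtualcount = 0.1, halved for phrase-to-head, quartered for the other words) follow this slide; the function name and the underscore-joined phrase format are assumptions:

```python
VIRTUALCOUNT = 0.1  # promotion strength, set empirically per the slide

def build_promotion(phrases):
    """Build the GPU promotion table for underscore-joined noun phrases."""
    promo = {}
    for phrase in phrases:
        words = phrase.split("_")
        head = words[-1]  # last word taken as the head noun
        # word -> phrase: assigning a topic to the head noun promotes the phrase
        promo.setdefault(head, {})[phrase] = VIRTUALCOUNT
        # phrase -> words: promote the head noun more than the other words
        p = promo.setdefault(phrase, {})
        p[head] = 0.5 * VIRTUALCOUNT
        for w in words[:-1]:
            p[w] = 0.25 * VIRTUALCOUNT
    return promo

build_promotion(["hotel_staff"])
# {'staff': {'hotel_staff': 0.1}, 'hotel_staff': {'staff': 0.05, 'hotel': 0.025}}
```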

SLIDE 18

Statistical Evaluation

 Two commonly used evaluation statistics:

  • Perplexity: measures the likelihood of unseen documents
  • KL-divergence: measures the distinctiveness of topics
  • Neither of them correlates well with human judgments

 We use topic coherence (Mimno et al. 2011)

  • It measures the degree of co-occurrence of topical words

under a topic

  • Has been shown to correlate with human judgment quite well
  • Generates a negative value; the higher, the better
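The UMass coherence of Mimno et al. (2011) can be sketched directly from its definition: over ordered pairs of top terms, sum the log of the smoothed co-document frequency divided by the document frequency of the higher-ranked term. This minimal version assumes every top term occurs in at least one document:

```python
import math

def topic_coherence(top_terms, docs):
    """UMass topic coherence over tokenized documents (higher is better)."""
    doc_sets = [set(d) for d in docs]
    def df(*terms):
        # number of documents containing all given terms
        return sum(all(t in s for t in terms) for s in doc_sets)
    score = 0.0
    for m in range(1, len(top_terms)):
        for l in range(m):
            score += math.log((df(top_terms[m], top_terms[l]) + 1)
                              / df(top_terms[l]))
    return score
```

Because co-document counts never exceed single-term counts, each summand is at most log 2 and the total is typically negative, matching the slide's remark.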
SLIDE 19

Statistical Evaluation

 Topic Coherence using top 15 topical terms

SLIDE 20

Statistical Evaluation

 Topic Coherence using top 30 topical terms

SLIDE 21

Human Evaluation

 Done by two annotators in two stages sequentially

  • Topic labeling (Kappa score: 0.838)
  • Topical terms labeling by computing precision@n

(Kappa score: 0.846)

  • We compute average p@15 and p@30 for each model on each domain
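Both statistics have standard textbook definitions; illustrative versions (not the annotators' actual tooling), with binary labels where 1 marks a correct topical term:

```python
def precision_at_n(labels, n):
    """Fraction of the top-n labeled items judged correct."""
    top = labels[:n]
    return sum(top) / len(top)

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    pe = sum((a.count(c) / n) * (b.count(c) / n)     # chance agreement
             for c in set(a) | set(b))
    return (po - pe) / (1 - pe)
```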
SLIDE 22

Human Evaluation

 Human evaluation on five domains

  • Hotel, Restaurant, Watch, Tablet, MP3Player
SLIDE 23

Example Topics

 Example topics by LDA(w) and LDA(p_GPU)

SLIDE 24

Future Work

 Design a topic quality metric for topics with phrases

 Systematically set the amount of promotion

based on the designed metrics

SLIDE 25

Thank You!