Review Topic Discovery with Phrases using the Pólya Urn Model
Geli Fei, Zhiyuan Chen, Bing Liu
University of Illinois at Chicago
Presenter: Alan Akbik
IBM Research Almaden / Berlin Institute of Technology
“The battery life of this smartphone is great.” “It uses AAA batteries.”
Topic models are widely used in review topic / aspect discovery. Most models regard each topic as a distribution over single words (unigrams)
Terms in each document are assigned to topics
The generation of topics is mostly governed by “higher-order” co-occurrence of words
Major issue: an individual word may not convey the complete meaning of an aspect, leading to three problems:
<the> <battery_life> <of> <this> <smartphone> <is> <great>
“battery life” may not be in the same topic as “battery”, because we never observe their co-occurrence
<the> <battery> <life> <battery_life> <of> <this> <smartphone> <is> <great>
“battery life” is much less frequent than “life”, so it is unlikely to be ranked among the top terms of a topic
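As a concrete illustration of the two tokenization schemes above, here is a minimal Python sketch (the phrase vocabulary and helper are hypothetical, not from the paper):

```python
# Minimal sketch of the two tokenization schemes above. The phrase
# vocabulary and helper are hypothetical, not from the paper.
PHRASES = {("battery", "life"): "battery_life"}

def tokenize(words, keep_components):
    """Replace known bigrams with a phrase token; optionally keep the
    component words so the phrase can still co-occur with them."""
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            if keep_components:
                tokens.extend(pair)           # keep "battery" and "life"
            tokens.append(PHRASES[pair])      # add "battery_life"
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

words = "the battery life of this smartphone is great".split()
print(tokenize(words, keep_components=False))  # first scheme: phrase only
print(tokenize(words, keep_components=True))   # second scheme: phrase + components
```

The second scheme restores the co-occurrence between “battery_life” and its component words, but at the cost of the frequency imbalance noted above.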
Existing approaches:
Using n-grams in topic modeling (Mukherjee and Liu 2013; …)
Identifying key phrases in a post-processing step based on …
Directly modeling word order in the topic model (Wallach 2006; …)
Illustration of the simple Pólya urn. Left: initial state. Middle: draw a ball of a certain color. Right: put two balls of the same color back. This self-reinforcing property is known as “the rich get richer”
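A few lines of Python are enough to simulate the simple urn and observe the rich-get-richer effect (colors, counts, and the step budget are illustrative):

```python
import random
from collections import Counter

def simple_polya_urn(steps=1000, colors=("red", "blue")):
    """Draw a ball, then put it back together with one more of its color."""
    urn = Counter({c: 1 for c in colors})  # initial state: one ball per color
    for _ in range(steps):
        drawn = random.choice(list(urn.elements()))  # draw a ball of a certain color
        urn[drawn] += 1                              # two of that color go back: rich get richer
    return urn

print(simple_polya_urn())  # one color typically ends up dominating
```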
GPU vs. SPU: apart from putting back two balls of the same color, the generalized Pólya urn (GPU) also puts back a number of balls of related colors
We call this the promotion of these colored balls. Using the idea in the sampling process: assigning a word to a topic increases the probability of seeing that word, and related words, again under the same topic
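A minimal sketch of how the GPU idea changes the count update inside a Gibbs sampler, assuming a hand-built promotion table (the words, weights, and data structures here are illustrative, not the paper's):

```python
# Sketch of the generalized Polya urn (GPU) count update inside a Gibbs
# sampler. The promotion table, words, and 0.5/0.25 weights are
# illustrative values, not the paper's.
from collections import defaultdict

# promotion[w] lists (related_word, extra_virtual_count) pairs
promotion = {
    "battery_life": [("battery", 0.5), ("life", 0.25)],
    "battery": [("battery_life", 0.5)],
}

topic_word_count = defaultdict(float)  # (topic, word) -> virtual count

def gpu_increment(topic, word):
    """SPU would only add 1 to (topic, word); the GPU additionally
    promotes related words under the same topic."""
    topic_word_count[(topic, word)] += 1.0
    for related, weight in promotion.get(word, []):
        topic_word_count[(topic, related)] += weight

gpu_increment(0, "battery_life")
print(dict(topic_word_count))
# {(0, 'battery_life'): 1.0, (0, 'battery'): 0.5, (0, 'life'): 0.25}
```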
In our application: promotion happens between phrases and their component words
Two directions of promotion: from a phrase to the words it contains, and from a word to the phrases containing it
Data sets:
Preprocessing:
Standard topic models cannot discover product aspects well when directly applied to reviews (Titov and McDonald, 2008)
Use a rule-based method for efficiency
Set document-topic prior β = 50/L, where L is the number of topics. Set topic-term prior γ = 0.1
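For reference, a plain-LDA baseline with these priors can be configured in gensim, which names the document-topic prior `alpha` and the topic-term prior `eta`; this is only a baseline sketch with a toy corpus, not the paper's GPU model:

```python
# Baseline plain-LDA configuration with the priors above, using gensim.
# This is a baseline sketch, not the paper's GPU model; the corpus is a toy.
from gensim import corpora, models

docs = [["battery", "life", "battery_life", "smartphone", "great"],
        ["screen", "display", "resolution", "sharp"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

L = 2  # number of topics
lda = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=L,
                      alpha=50.0 / L, eta=0.1)
print(lda.show_topics())
```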
Not all words in a phrase are equally important
Determine head nouns: use a rule-based method to pick one word of each phrase as the head noun
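One plausible reading of the rule, sketched with NLTK part-of-speech tags (the last-noun heuristic and the use of NLTK are assumptions, not necessarily the paper's exact rule):

```python
# One plausible rule: take the last noun of the phrase as its head
# (English noun compounds are right-headed). The heuristic and the use
# of NLTK are assumptions, not necessarily the paper's exact rule.
import nltk  # requires: nltk.download("averaged_perceptron_tagger")

def head_noun(phrase):
    tagged = nltk.pos_tag(phrase.split())
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    return nouns[-1] if nouns else phrase.split()[-1]  # fall back to last word

print(head_noun("battery life"))       # -> "life"
print(head_noun("screen resolution"))  # -> "resolution"
```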
GPU promotion:
When a topic is assigned to a phrase, the full virtual count is also assigned to its head noun, and 0.25 * virtual count to all other words in the phrase
When a topic is assigned to a word, it is also promoted to the phrases containing that word
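Putting the weights together, a sketch of the phrase-to-words direction (data structures and the helper name are illustrative):

```python
# Sketch of the phrase-to-words promotion above: full virtual count to the
# head noun, 0.25 * virtual count to every other word of the phrase.
# Data structures and the helper name are illustrative.
from collections import defaultdict

topic_word_count = defaultdict(float)  # (topic, word) -> virtual count

def promote_phrase(topic, phrase_words, head, virtual_count=1.0):
    """A topic assigned to a phrase is promoted to its component words."""
    for w in phrase_words:
        weight = 1.0 if w == head else 0.25
        topic_word_count[(topic, w)] += weight * virtual_count

promote_phrase(topic=3, phrase_words=["battery", "life"], head="life")
print(dict(topic_word_count))  # {(3, 'battery'): 0.25, (3, 'life'): 1.0}
```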
Two commonly used evaluation statistics:
We use topic coherence (Mimno et al. 2011), which measures how strongly the top terms under a topic co-occur in the corpus
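The statistic can be computed directly from document frequencies: for the top M terms v_1 … v_M of a topic, coherence sums log((D(v_m, v_l) + 1) / D(v_l)) over term pairs, where D counts (co-)document frequency. A short sketch (the toy documents are illustrative):

```python
import math

def topic_coherence(top_words, docs):
    """UMass topic coherence (Mimno et al. 2011); higher is better.
    `docs` is a list of word sets, `top_words` the top topical terms."""
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_df = sum(1 for d in docs if top_words[m] in d and top_words[l] in d)
            df = sum(1 for d in docs if top_words[l] in d)
            score += math.log((co_df + 1) / max(df, 1))  # guard against df == 0
    return score

docs = [{"battery", "life", "charge"}, {"battery", "charger"}, {"screen"}]
print(topic_coherence(["battery", "life", "charge"], docs))
```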
Results: topic coherence computed using the top 15 and the top 30 topical terms
Human evaluation: done by two annotators in two stages, sequentially