Geographical Topic Discovery and Comparison
Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, Thomas Huang
UIUC
To appear in WWW’11
Presenter: Jeff Huang
Outline: Motivation, Problem Formulation, Solution Sketch
3/21/2011 2
locations when the photos were taken.
interfaces for users to specify a location on the world map.
phones.
festivals, we can compare the cultural differences around the world.
candidates in presidential election in different places.
different regions and help make the marketing strategy.
coherent in geographical regions.
geographical locations.
associated with a GPS location.
In other words, the words that are often close in space are clustered in a topic.
tags and locations in Flickr, the desired geographical topics are the festivals in different areas, such as Cherry Blossom Festival in Washington DC and South by Southwest Festival in Austin, etc.
locations.
comparison
with tags and locations in Flickr, we would like to discover the geographical topics, i.e., what people eat in different
like to compare the food preference distributions in different geographical locations.
distribution of the topics given a specific location.
l = (x, y), where x is longitude and y is latitude.
and the number of topics K, we would like to discover K geographical topics, i.e., $\Theta = \{\theta_z\}_{z \in Z}$, where Z is the topic set and a geographical topic z is represented by a word distribution $\theta_z = \{p(w|z)\}_{w \in V}$ s.t. $\sum_{w \in V} p(w|z) = 1$.
also would like to know the topic distribution in different geographical locations for topic comparison, i.e., $p(z|l)$ for all $z \in Z$ in location l.
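The two outputs of the problem formulation can be pictured as plain probability tables. The following is a minimal sketch, not the authors' code: the topic names, tag counts, location, and topic scores are all hypothetical values chosen for illustration. It shows only the constraint that each topic's word distribution sums to 1 and that the topic distribution at a location is itself a distribution over topics.

```python
from typing import Dict


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {key: value / total for key, value in weights.items()}


# Hypothetical tag counts for K = 2 geographical topics (illustrative only).
raw_counts = {
    "cherry_blossom": {"cherry": 6.0, "blossom": 3.0, "dc": 1.0},
    "sxsw": {"sxsw": 5.0, "austin": 4.0, "music": 1.0},
}

# Each topic z is a word distribution p(w|z) over the vocabulary.
theta = {z: normalize(counts) for z, counts in raw_counts.items()}

# A location l = (x, y) with x = longitude and y = latitude (hypothetical point).
location = (-77.04, 38.89)

# Hypothetical unnormalized topic scores at that location, turned into p(z|l).
p_z_given_l = normalize({"cherry_blossom": 9.0, "sxsw": 1.0})

for z, words in theta.items():
    print(z, words, "sum =", sum(words.values()))
print("p(z|l) at", location, "=", p_z_given_l)
```

Comparing topics across places then amounts to comparing the p(z|l) tables computed at different locations.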
Topic Analysis (LGTA))
geographical topic may be from several different areas and these areas may not be close to each other.
these areas are not close to each other
documents
spatial structure of words
into the same geographical topic.
generated from regions instead of documents.
more likely to belong to the same region.
to be clustered into the same topic.
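The idea that topics are generated from regions rather than from documents can be illustrated with a small generative sketch. This is an illustration under stated assumptions, not the model as specified by the authors: the region and topic tables are hypothetical, and each region's location shape is simplified to an axis-aligned Gaussian with independent longitude and latitude spreads.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical topics: each is a word distribution p(w|z) that sums to 1.
TOPICS: Dict[str, Dict[str, float]] = {
    "cherry_blossom": {"cherry": 0.6, "blossom": 0.3, "dc": 0.1},
    "sxsw": {"sxsw": 0.5, "austin": 0.4, "music": 0.1},
}

# Hypothetical regions: importance, center (longitude, latitude),
# per-axis spread (assumed axis-aligned Gaussian), and topic mix p(z|r).
REGIONS: Dict[str, dict] = {
    "dc": {"importance": 0.5, "mean": (-77.04, 38.89), "std": (0.5, 0.5),
           "topics": {"cherry_blossom": 0.9, "sxsw": 0.1}},
    "austin": {"importance": 0.5, "mean": (-97.74, 30.27), "std": (0.5, 0.5),
               "topics": {"cherry_blossom": 0.1, "sxsw": 0.9}},
}


def sample_discrete(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one key with probability proportional to its weight."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_document(
    n_words: int, rng: random.Random
) -> Tuple[str, Tuple[float, float], List[str]]:
    """Generate (region, location, words) for one geo-tagged document."""
    # 1. Pick a region according to region importance.
    name = sample_discrete({r: v["importance"] for r, v in REGIONS.items()}, rng)
    region = REGIONS[name]
    # 2. Draw the location from the region's (assumed Gaussian) shape.
    x = rng.gauss(region["mean"][0], region["std"][0])
    y = rng.gauss(region["mean"][1], region["std"][1])
    # 3. For each word, draw a topic from the region, then a word from the topic.
    words = []
    for _ in range(n_words):
        z = sample_discrete(region["topics"], rng)
        words.append(sample_discrete(TOPICS[z], rng))
    return name, (x, y), words


if __name__ == "__main__":
    print(generate_document(5, random.Random(0)))
```

Because the topic mix is attached to the region, documents drawn from the same region share both nearby locations and similar words.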
[Figure labels: region importance; p(z|d) and p(w|z); location shape]
the dataset.
discovery) is based on both location and topic information.
the text and region information.
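One way to picture how location and text information can be combined is to score each candidate region by its importance, how well its shape explains the document's location, and how well its topic mix explains the document's words. The sketch below is an assumption-laden illustration rather than the authors' estimation procedure: the region and topic tables are hypothetical, the location shape is simplified to an axis-aligned Gaussian, and the function names are invented for this example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical topics p(w|z) and regions (illustrative values only).
TOPICS: Dict[str, Dict[str, float]] = {
    "cherry_blossom": {"cherry": 0.6, "blossom": 0.3, "dc": 0.1},
    "sxsw": {"sxsw": 0.5, "austin": 0.4, "music": 0.1},
}
REGIONS: Dict[str, dict] = {
    "dc": {"importance": 0.5, "mean": (-77.04, 38.89), "std": (0.5, 0.5),
           "topics": {"cherry_blossom": 0.9, "sxsw": 0.1}},
    "austin": {"importance": 0.5, "mean": (-97.74, 30.27), "std": (0.5, 0.5),
               "topics": {"cherry_blossom": 0.1, "sxsw": 0.9}},
}


def log_location_likelihood(loc: Tuple[float, float], region: dict) -> float:
    """Log density of an axis-aligned 2D Gaussian (assumed location shape)."""
    total = 0.0
    for value, mean, std in zip(loc, region["mean"], region["std"]):
        total += (-0.5 * ((value - mean) / std) ** 2
                  - math.log(std) - 0.5 * math.log(2 * math.pi))
    return total


def log_text_likelihood(words: List[str], region: dict) -> float:
    """Log probability of the words under the region's mixture of topics."""
    total = 0.0
    for word in words:
        p = sum(pz * TOPICS[z].get(word, 0.0) for z, pz in region["topics"].items())
        total += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen words
    return total


def region_posterior(loc: Tuple[float, float], words: List[str]) -> Dict[str, float]:
    """Posterior over regions given both the location and the text."""
    log_scores = {
        name: math.log(region["importance"])
        + log_location_likelihood(loc, region)
        + log_text_likelihood(words, region)
        for name, region in REGIONS.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {name: math.exp(score - top) for name, score in log_scores.items()}
    total = sum(unnorm.values())
    return {name: value / total for name, value in unnorm.items()}


if __name__ == "__main__":
    print(region_posterior((-77.0, 38.9), ["cherry", "blossom"]))
```

In an iterative fitting scheme, posteriors of this kind would be used to re-estimate the region parameters and topic distributions, so that neither the location term nor the text term alone decides the region assignment.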
range, etc.
[Figure panels: coast, desert, mountain; methods compared: LDM, TDM, GeoFolk, LGTA]
images that are without geo information?