Placing images on the world map: a microblog-based enrichment approach
Claudia Hauff & Geert-Jan Houben, TU Delft (NL)
SIGIR 2012
Claudia Hauff, 2012

Problem & Motivation
• Example images and their tags: "autumn, leaves, red, sad, melancholy" (latitude/longitude: 52.4/-3.2) and "autumn, fall, reflection, water, morning, trees, jetty" (latitude/longitude: 48.23/-74.34)
• Applications: travelogue illustration, travel timeline, personal archive organization, user profile, personalized travel recommendations, e-learning (cultural exposure), …
• In our Flickr data set, 80% of the images are not geo-tagged

Image with geo-tag. Source: http://www.flickr.com/photos/nathanpirtz/6963996476/

Image without geo-tag. Source: http://www.flickr.com/photos/29738009@N08/2975466425/

Image without geo-tag or tags, shown next to a tweet: "On a visit to the beautiful Japanese Garden in Portland, Oregon #mustsee #pdx" (2:03PM – 18 June 2010). Source: http://www.flickr.com/photos/nido/4737115541/

The past vs. our approach
• Location estimation based on text (mainly image tags): Serdyukov et al., Van Laere et al.
• Location estimation based on visual features: Lux et al.
• Hybrid approaches (visual features as a backup in the estimation): Kelm et al.
• This work: text-based; merges the user's traces on different social Web streams (cross-system exploitation)
Hypothesis: enriching the image's textual meta-data with the user's tweets improves the accuracy of the location estimation.

Why do people tweet? Why do we think our hypothesis holds?
• Tweet categories (Java et al.):
  • Daily chatter
  • Shared information and hyperlinks
  • Conversations
  • News
• The majority of users (~80%) focus on themselves (Naaman et al.)
• Users' view on why they tweet (Zhao et al.):
  • Keeping in touch
  • Collecting information (work- and spare-time-related)

From images to documents: formulating an information retrieval problem
• Given: a set of training images with known latitude/longitude
• Start with a single grid cell spanning the world map
• Iteratively add the training images and split dense cells, which yields small cells in regions with large amounts of training data
• Each cell is transformed into a "region document": the textual meta-data of all images in the cell is concatenated into one document
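The cell-splitting procedure above can be sketched as a quadtree. This is a minimal illustration under assumptions the slides do not state: a dense cell is split into four equal quadrants, and the hypothetical parameters `max_images` (split threshold) and `min_size` (smallest cell height in degrees, which prevents endless splitting of identical coordinates) are placeholders, not the values used in this work.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Image = Tuple[float, float, List[str]]  # (latitude, longitude, terms)


@dataclass
class Cell:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    images: List[Image] = field(default_factory=list)
    children: List["Cell"] = field(default_factory=list)

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive bounds; a point on a shared edge goes to the first matching child.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def add_image(cell: Cell, image: Image, max_images: int, min_size: float) -> None:
    """Insert an image into the leaf containing it; split that leaf if it gets too dense."""
    while cell.children:
        cell = next(c for c in cell.children if c.contains(image[0], image[1]))
    cell.images.append(image)
    if len(cell.images) > max_images and cell.lat_max - cell.lat_min > min_size:
        split(cell, max_images, min_size)


def split(cell: Cell, max_images: int, min_size: float) -> None:
    """Replace a leaf by four equal quadrants and redistribute its images."""
    lat_mid = (cell.lat_min + cell.lat_max) / 2
    lon_mid = (cell.lon_min + cell.lon_max) / 2
    cell.children = [
        Cell(cell.lat_min, lat_mid, cell.lon_min, lon_mid),
        Cell(cell.lat_min, lat_mid, lon_mid, cell.lon_max),
        Cell(lat_mid, cell.lat_max, cell.lon_min, lon_mid),
        Cell(lat_mid, cell.lat_max, lon_mid, cell.lon_max),
    ]
    images, cell.images = cell.images, []
    for image in images:
        add_image(cell, image, max_images, min_size)


def leaves(cell: Cell) -> List[Cell]:
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]


def region_documents(root: Cell) -> List[List[str]]:
    """One 'region document' per non-empty leaf: the concatenated terms of its images."""
    return [[t for _, _, terms in leaf.images for t in terms]
            for leaf in leaves(root) if leaf.images]


world = Cell(-90.0, 90.0, -180.0, 180.0)
training = [
    (52.00, 4.36, ["delft", "canal"]),
    (52.01, 4.35, ["tudelft"]),
    (-33.87, 151.21, ["sydney"]),
]
for img in training:
    add_image(world, img, max_images=2, min_size=0.01)

print(region_documents(world))  # [['sydney'], ['delft', 'canal', 'tudelft']]
```

A quadtree is just one simple way to get small cells where data is dense; other partitioning schemes would fit the slide's description equally well.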

From documents to location estimation: formulating an information retrieval problem
• A language model is derived from each world region (document)
• The test image I with textual meta-data $T_I = \{t_1, t_2, \dots, t_n\}$ acts as the query; the possible regions R where I may have been taken are ranked according to:
$$P(R \mid T_I) = \frac{P(T_I \mid R)\,P(R)}{P(T_I)} \propto P(R) \prod_{i=1}^{n} P(t_i \mid R)$$
  (P(T_I) is the same for every region, and the terms are assumed to be independent given the region.)
• Assign I the location of the top-ranked training image
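A minimal sketch of this ranking, computed in log space to avoid numerical underflow. The slides do not say how P(t_i | R) is smoothed; Dirichlet smoothing with a hypothetical parameter `mu` is assumed here so that a region lacking one query term is not assigned probability zero. Function and region names are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def rank_regions(query_terms: List[str],
                 region_docs: Dict[str, List[str]],
                 prior: Optional[Dict[str, float]] = None,
                 mu: float = 100.0) -> List[Tuple[str, float]]:
    """Rank regions by log P(R) + sum_i log P(t_i | R), highest score first.

    P(t | R) is Dirichlet-smoothed with the collection model (an assumption)."""
    collection = Counter(t for doc in region_docs.values() for t in doc)
    collection_size = sum(collection.values())
    ranked = []
    for region, doc in region_docs.items():
        tf = Counter(doc)
        # Uniform prior unless a region prior is supplied.
        p_region = prior[region] if prior is not None else 1.0 / len(region_docs)
        score = math.log(p_region)
        for term in query_terms:
            p_coll = collection[term] / collection_size
            if p_coll == 0.0:
                # Term unseen in every region: it would affect all regions
                # identically, so it cannot change the ranking and is skipped.
                continue
            score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
        ranked.append((region, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


docs = {
    "portland": ["japanese", "garden", "portland", "oregon"],
    "delft": ["delft", "canal", "tudelft"],
}
print(rank_regions(["japanese", "garden", "pdx"], docs)[0][0])  # portland
```

Passing a non-uniform `prior` is where the region prior discussed later in the talk would plug in.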

Eliminating noisy terms: geographic spread filtering
• Not all image tags/terms are equally useful, e.g. bowling, london, baby, british
• The spread of a term's training images on the world map is a good indicator of its usefulness

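Eliminating noisy terms: geographic spread filtering
Terms with a high geographic spread are filtered out. Example spread values:

| Term | Spread |
|---|---|
| bowling | 3.237 |
| baby | 1.809 |
| east | 0.695 |
| british | 0.363 |
| lakepukaki | 0.049 |
| london | 0.010 |
| sydney | 0.007 |

"british" is spread out because it appears with images of the UK, British Columbia, the British Virgin Islands, British restaurants in the US, and places with historic battles against the British.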
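The slides report spread values but do not define the measure. As a stand-in, the sketch below scores a term by the mean pairwise great-circle distance between its training images (in units of 1000 km) and keeps terms at or below a hypothetical `threshold`. It will not reproduce the numbers in the table; it only illustrates the filtering idea.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def spread(points: List[Point]) -> float:
    """Hypothetical spread score: mean pairwise distance, in units of 1000 km."""
    if len(points) < 2:
        return 0.0
    pairs = list(combinations(points, 2))
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs) / 1000.0


def geo_filter(term_locations: Dict[str, List[Point]], threshold: float) -> Set[str]:
    """Keep only terms whose training images are geographically concentrated."""
    return {term for term, pts in term_locations.items() if spread(pts) <= threshold}


locations = {
    "london": [(51.50, -0.12), (51.51, -0.13), (51.49, -0.10)],
    "baby": [(51.50, -0.12), (40.71, -74.00), (-33.87, 151.21)],
}
print(geo_filter(locations, threshold=1.0))  # {'london'}
```

Pairwise distances cost O(n²) per term; for millions of images a cheaper proxy (e.g. counting occupied grid cells) would be more practical.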

Adding additional knowledge
$$P(R \mid T_I) = \frac{P(T_I \mid R)\,P(R)}{P(T_I)} \propto P(R) \prod_{i=1}^{n} P(t_i \mid R)$$
• Region prior P(R): instead of a uniform probability, add knowledge about the world and its different regions, e.g. population density or climate ("New York City": 3,869,086 results vs. "Great Victoria Desert": 131 results)
• Set of terms T_I: the bag-of-words that describes an image can be extended with terms from the user's traces on the social Web, e.g. tweets posted within D days of the image being taken
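Both extensions can be sketched in a few lines. This is an illustration under stated assumptions: the tokenizer (lowercasing, whitespace split, stripping `#` and punctuation) is hypothetical, tweets are given as `(timestamp, text)` pairs, and the prior is simply proportional to a per-region population value plus a hypothetical `smoothing` constant so that empty regions keep non-zero mass.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer (an assumption): lowercase, split, strip '#' and punctuation."""
    tokens = (word.strip("#.,;:!?\"'()").lower() for word in text.split())
    return [tok for tok in tokens if tok]


def enrich_terms(image_terms: List[str], taken_at: datetime,
                 tweets: List[Tuple[datetime, str]], days: int) -> List[str]:
    """Append the terms of every tweet posted within +/- `days` days of the image."""
    window = timedelta(days=days)
    extra = [tok
             for posted_at, text in tweets
             if abs(posted_at - taken_at) <= window
             for tok in tokenize(text)]
    return list(image_terms) + extra


def population_prior(populations: Dict[str, float], smoothing: float = 1.0) -> Dict[str, float]:
    """Hypothetical region prior proportional to population (plus smoothing)."""
    total = sum(p + smoothing for p in populations.values())
    return {region: (p + smoothing) / total for region, p in populations.items()}


taken = datetime(2010, 6, 18, 14, 3)
user_tweets = [
    (datetime(2010, 6, 18, 14, 3),
     "On a visit to the beautiful Japanese Garden in Portland, Oregon #mustsee #pdx"),
    (datetime(2010, 7, 1, 9, 0), "Back home"),
]
print(enrich_terms(["garden"], taken, user_tweets, days=2))
```

The `days` argument corresponds to the +/-2 and +/-20 day windows evaluated in the results.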

Experimental setup
• Training data: MediaEval data set, 3.2M geo-tagged images
• Lack of usable Twitter resources: few tweets are geo-tagged
• Test data: starting from an 11-month Twitter data set of 20,000 users, we searched for the corresponding Flickr accounts via:
  • a crawl of friendfeed.com profiles
  • manual assessment of the posted tweets
• Figure: 252 users; 1.89M tweets; 0.15M images; Nov'10 – Sept'11; geo-tagged: 7477; 27,879 images; 30,951 images

Results
Percentage of test images located within a given distance of their true location, and median error:

| Run | 1km | 10km | 50km | 1000km | Median error |
|---|---|---|---|---|---|
| BaseLine_geo | 7.2% | 35.0% | 48.6% | 61.4% | 61km |
| BaseLine | 7.5% | 27.9% | 34.9% | 42.3% | 2513km |
| Population | 7.1% | 34.7% | 48.4% | 70.4% | 62km |
| +/-2 days | 4.3% | 16.9% | 25.2% | 41.8% | 1974km |
| +/-2 days | 9.0% | 38.2% | 54.7% | 71.2% | 22km |
| +/-20 days | 8.3% | 36.7% | 53.6% | 70.8% | 27km |
| +/-2 days, Population | 9.0% | 37.9% | 54.6% | 76.0% | 21km |

BaseLine: 5600 unique terms; BaseLine_geo: 466 unique terms.
• Image location estimation based on user traces across social Web platforms decreases the median error distance by up to 67%.
• The population density prior improves the accuracy (in the long range).
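The measures in this table (share of test images within 1, 10, 50 and 1000 km, and the median error) can be computed as sketched below. The haversine distance on a sphere of radius 6371 km is an assumption; the slides do not state which distance function was used. `relative_reduction` shows how a percentage decrease in median error is obtained.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def evaluate(predicted: Sequence[Point], actual: Sequence[Point],
             thresholds: Sequence[int] = (1, 10, 50, 1000)) -> Dict[str, float]:
    """Percentage of images within each threshold (km) plus the median error (km)."""
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    result = {f"within_{t}km": 100.0 * sum(e <= t for e in errors) / len(errors)
              for t in thresholds}
    result["median_km"] = median(errors)
    return result


def relative_reduction(before_km: float, after_km: float) -> float:
    """Decrease of a (median) error in percent."""
    return 100.0 * (before_km - after_km) / before_km


print(evaluate([(0.0, 0.0)] * 2, [(0.0, 0.05), (0.0, 20.0)]))
```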

What images benefit from Twitter enrichment? Number of tags
Figure: median error in km (log scale, 1 to 10,000 km) of BaseLine_geo, +/-2 days with λ = 0.0 and +/-2 days with λ = 0.8; the test set is split according to the number of tags after geo-filtering: 0 tags (1747), 1 tag (1783), 2 tags (2930).
The Twitter stream is particularly useful in cases of little or no textual meta-data.
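The tag-count analysis amounts to grouping per-image errors and taking a median per group, as sketched below for one run. The slide does not say whether "2 Tags" means exactly two or two or more; the hypothetical `last_bucket` parameter lumps images with at least that many tags together, which is an assumption.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_error_by_tag_count(
    results: Iterable[Tuple[int, float]], last_bucket: int = 2
) -> Dict[int, float]:
    """results: (number of tags left after geo-filtering, error in km) per test image.

    Images with `last_bucket` or more tags share the last group (an assumption)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for n_tags, error_km in results:
        groups[min(n_tags, last_bucket)].append(error_km)
    return {n: median(errors) for n, errors in sorted(groups.items())}


run = [(0, 100.0), (0, 300.0), (1, 50.0), (2, 10.0), (5, 30.0)]
print(median_error_by_tag_count(run))  # {0: 200.0, 1: 50.0, 2: 20.0}
```

Running this once per system (baseline and enriched runs) gives one bar per group and system, as in the figure.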

What images benefit from Twitter enrichment? Distance to home location (4515 test images)
Figure: per-image error in km of +/-2 days with λ = 0.8 (y-axis) against the error in km of BaseLine_geo (x-axis), both on log scales from 0.1 to 10,000 km, with images grouped by distance to the home location (1km, 10km, 100km, 1000km, >1000km). The plot is divided into the region where BaseLine_geo performs better than +/-2 days with λ = 0.8 and the region where it performs worse.
Locations further away from home are recognized with higher accuracy when using the Twitter stream.
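A sketch of the per-image comparison behind this figure: each test image is assigned to a distance-to-home bin, and the run with the smaller error is counted. Assumptions: the bin labels are read as upper bounds (so 5 km falls into "10km"), the distance to the home location is given as input (the slides do not say how the home location was determined), and the weight λ is not modelled here; the function only takes two error values per image.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical bins: each label is read as an upper bound on the distance to home.
BIN_EDGES_KM = (1.0, 10.0, 100.0, 1000.0)
BIN_LABELS = ("1km", "10km", "100km", "1000km", ">1000km")


def home_bin(distance_km: float) -> str:
    """Map a distance to its bin, e.g. 5 km -> '10km', 2500 km -> '>1000km'."""
    return BIN_LABELS[bisect_left(BIN_EDGES_KM, distance_km)]


def compare_by_home_distance(
    records: Iterable[Tuple[float, float, float]]
) -> Dict[str, Counter]:
    """records: (distance to home in km, BaseLine_geo error in km, enriched-run error in km).

    Counts, per bin, which run has the smaller error for each image."""
    tally = {label: Counter() for label in BIN_LABELS}
    for distance_km, baseline_err, enriched_err in records:
        if enriched_err < baseline_err:
            outcome = "enriched_better"
        elif baseline_err < enriched_err:
            outcome = "baseline_better"
        else:
            outcome = "tie"
        tally[home_bin(distance_km)][outcome] += 1
    return tally


records = [(0.5, 10.0, 20.0), (5000.0, 3000.0, 25.0), (5000.0, 40.0, 40.0)]
print(compare_by_home_distance(records)[">1000km"])
```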

Conclusions & future work
• Image location estimation based on user traces across social Web platforms outperforms the single-source baseline
• The Twitter stream is particularly useful in cases of little or no textual meta-data
• The population density prior improves the accuracy (in the long range)
• Future work:
  • Tweet filtering (personal experiences vs. news), e.g. "Japan announced meltdown yesterday; situation grim." and "Here in Toronto the police made multiple arrests today."
  • Improved combination of data gathered from social Web streams
  • Turning the task around: user account matching

Thank you! c.hauff@tudelft.nl