Query Performance Prediction for Microblog Search: A Preliminary Study
Maram Hasanain, Rana Malhas, Tamer Elsayed
11 July 2014, SoMeRA’14 Workshop, in conjunction with SIGIR’14

Why?
Example query: "sigir awards"
• Expectation: high-quality results
• Reality: some queries are difficult and return poor results

What’s QPP?
• A query (e.g. "sigir awards") goes through a retrieval model, which returns a result list (R)
• Query Performance Prediction (QPP) produces the estimated performance

QPP in Microblog Search?
• QPP is not a new problem
  RQ1: How well do the existing state-of-the-art predictors perform in the context of microblog search?
• Microblog search is different
  RQ2: Will the predictors' performance be consistent across different retrieval models, specifically temporal ones?

Setup of the Study

Overview
• Examine frequently used predictors for tweet search
• 2 types of predictors:
  o Content-based: consider terms in tweets and queries
  o Temporal: also consider the time factor
• 2 types of retrieval models:
  o Content-based, e.g. Query Likelihood
  o Temporal, e.g. Time-based Exponential Priors

QP Predictors
Content-based predictors (post-retrieval)
• Standard deviation (σ)
  o Normalized Standard Deviation (NSD)
  o Normalized Query Commitment (NQC)
• KL-divergence
  o Clarity (CLR)
• Information gain
  o Weighted Information Gain (WIG)
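The slide only names the score-based predictors, so the following is a minimal sketch of how NQC and WIG are commonly computed from retrieval scores, not necessarily the exact variants used in the study. The function names, the cutoff implied by the length of `top_scores`, and the `collection_score` argument (the score of the whole collection treated as one document) are assumptions made for illustration.

```python
import math
from typing import Sequence


def nqc(top_scores: Sequence[float], collection_score: float) -> float:
    """Standard deviation of the top-k retrieval scores, divided by the
    absolute score of the collection (assumed normalization)."""
    k = len(top_scores)
    mean = sum(top_scores) / k
    variance = sum((s - mean) ** 2 for s in top_scores) / k
    return math.sqrt(variance) / abs(collection_score)


def wig(top_scores: Sequence[float], collection_score: float,
        query_length: int) -> float:
    """Average gain of the top-k scores over the collection score,
    scaled by the square root of the query length (assumed form)."""
    k = len(top_scores)
    gain = sum(s - collection_score for s in top_scores) / k
    return gain / math.sqrt(query_length)
```

Both functions take log-scale retrieval scores (as a query likelihood model would produce), so a larger spread or a larger gain over the collection is read as a sign of an easier query.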

QP Predictors
Content-based predictors (pre-retrieval)
• Inverse document frequency (IDF)
  o SumIDF, MaxIDF, AvgIDF, …
• Collection-query similarity (SCQ)
  o SumSCQ, MaxSCQ, AvgSCQ, …
• Simplified clarity score (SCS)
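Pre-retrieval predictors need only collection statistics, which a short sketch can make concrete. The version below assumes a natural-log IDF of the form log(N / df) and the usual SCS definition as the KL-divergence between the query language model and the collection language model; the slides do not state which exact formulas were used, and all function and parameter names here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def idf_predictors(query_terms: Sequence[str], doc_freq: Dict[str, int],
                   num_docs: int) -> Dict[str, float]:
    """Sum, max and average IDF over query terms that occur in the collection."""
    idfs = [math.log(num_docs / doc_freq[t])
            for t in query_terms if doc_freq.get(t, 0) > 0]
    if not idfs:
        return {"SumIDF": 0.0, "MaxIDF": 0.0, "AvgIDF": 0.0}
    return {"SumIDF": sum(idfs), "MaxIDF": max(idfs),
            "AvgIDF": sum(idfs) / len(idfs)}


def scs(query_terms: Sequence[str], collection_tf: Dict[str, int],
        collection_tokens: int) -> float:
    """Simplified clarity score: sum of P(w|q) * log2(P(w|q) / P(w|C)).
    Terms unseen in the collection are skipped (an assumption)."""
    counts = Counter(query_terms)
    score = 0.0
    for term, count in counts.items():
        if collection_tf.get(term, 0) == 0:
            continue
        p_query = count / len(query_terms)
        p_collection = collection_tf[term] / collection_tokens
        score += p_query * math.log2(p_query / p_collection)
    return score
```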

QP Predictors
Temporal predictor (post-retrieval)
• KL-divergence
  o Temporal Clarity (t-CLR)
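The slide states only that temporal clarity is KL-divergence based. One plausible reading, sketched below, compares the distribution of result timestamps with the distribution of timestamps in the whole collection. Binning by day, weighting every result equally, and the names `result_days` and `collection_day_counts` are assumptions for illustration, not details confirmed by the source.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def temporal_clarity(result_days: Sequence[int],
                     collection_day_counts: Dict[int, int]) -> float:
    """KL-divergence (base 2) between the day distribution of the top
    results and the day distribution of the collection.

    Assumes every result day also appears in collection_day_counts,
    which holds when the results are drawn from that collection.
    """
    total_collection = sum(collection_day_counts.values())
    total_results = len(result_days)
    score = 0.0
    for day, count in Counter(result_days).items():
        p_result = count / total_results
        p_collection = collection_day_counts[day] / total_collection
        score += p_result * math.log2(p_result / p_collection)
    return score
```

Under this reading, results concentrated in a few days give a high score, while results spread like the collection itself give a score near zero.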

Retrieval Models
Content-based
• Query Likelihood (QL)
Temporal
• QL with temporal prior (t-EXP)
• Temporal relevance modeling (t-QRM)
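To illustrate how a temporal prior changes a content-based ranking, here is a minimal sketch of query likelihood combined with an exponential recency prior. Dirichlet smoothing, the value of `mu`, the `rate` parameter, and measuring document age in days are all assumptions; the slides name the models but give no formulas or parameter settings.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def ql_log_score(query_terms: Sequence[str], doc_terms: Sequence[str],
                 collection_prob: Dict[str, float], mu: float = 2500.0) -> float:
    """Log query likelihood with Dirichlet smoothing (assumed smoothing).
    Assumes every query term has a non-zero collection probability."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    return sum(
        math.log((tf[w] + mu * collection_prob[w]) / (doc_len + mu))
        for w in query_terms
    )


def exp_prior_log(age_days: float, rate: float = 0.01) -> float:
    """Log of an exponential prior rate * exp(-rate * age): newer is likelier."""
    return math.log(rate) - rate * age_days


def t_exp_log_score(query_terms: Sequence[str], doc_terms: Sequence[str],
                    collection_prob: Dict[str, float], age_days: float,
                    mu: float = 2500.0, rate: float = 0.01) -> float:
    """Content score plus temporal prior, both in log space."""
    return (ql_log_score(query_terms, doc_terms, collection_prob, mu)
            + exp_prior_log(age_days, rate))
```

With identical text, a tweet posted closer to the query time gets the higher combined score, which is the behavior a temporal model is meant to add on top of QL.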

Evaluation

Setup
Datasets
              Tweets2011    Tweets2013
  Source      TREC’11-12    TREC’13
  Tweets      ~16M          ~243M
  Queries     108           60
  Time span   ~2 weeks      ~2 months
Evaluating retrieval
• Evaluation measure: Average precision (AP)
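Average precision is the quantity every predictor is trying to estimate, so a small sketch of the standard definition may help. The function and argument names are illustrative, and binary relevance judgments are assumed.

```python
from typing import Iterable, Sequence


def average_precision(ranked_ids: Sequence[str],
                      relevant_ids: Iterable[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)
```

For example, with relevant tweets at ranks 1 and 3 and two relevant tweets in total, AP is (1/1 + 2/3) / 2.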

Setup
Evaluating prediction
• Correlation between predicted AP and actual AP
• Linear correlation: Pearson’s r
• Rank correlation: Kendall’s τ
Training/Testing
• 75% of queries for parameter tuning
• Repeated and averaged over 120 trials
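The two correlation measures can be sketched in plain Python as follows. This is an illustrative implementation, not the tooling used in the study: the Kendall variant shown is tau-a, which does not correct for ties, and the repeated random train/test splits are left out.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear correlation between predicted and actual values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Here `x` would hold a predictor's value per query and `y` the actual AP of the same queries, in the same order.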

Results (Tweets2011)
[Figure: Pearson’s correlation (y-axis, 0.00 to 0.60) per retrieval model (x-axis: QL, t-EXP, t-QRM) for the predictors t-CLR, CLR, WIG, NSD, NQC, and SumIdf.]
Chart annotations:
• t-CLR is best
• NQC: increase in performance
• SumIdf: comparable quality
• CLR: decline in quality
• WIG: decline in quality
• Not significant

Results (Tweets2013)
[Figure: Pearson’s correlation (y-axis, 0.00 to 0.60) per retrieval model (x-axis: QL, t-EXP, t-QRM) for the predictors t-CLR, CLR, WIG, NSD, NQC, and SumIdf.]
Chart annotations:
• CLR is best
• t-CLR has good performance
• NQC: increase in performance
• Not significant

Combining Predictors
• Using linear regression
• Feature selection to find the best combination of predictors
• Only over Tweets2011
• 40% of queries for parameter tuning
• Train and test the combined model by cross-validation with 60% of queries
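Combining predictors by linear regression amounts to learning one weight per predictor plus an intercept, with actual AP as the target. The sketch below fits ordinary least squares through the normal equations in plain Python. It is illustrative only: the feature selection step and the cross-validation protocol from the slide are not shown, the function names are hypothetical, and the solver assumes the predictors are not perfectly collinear.

```python
from typing import List, Sequence


def fit_linear(features: Sequence[Sequence[float]],
               targets: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, w1, ..., wm]."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    m = len(rows[0])
    # Augmented normal-equation system (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(m):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][m] for i in range(m)]


def predict(weights: Sequence[float], feature_row: Sequence[float]) -> float:
    """Predicted AP for one query from its predictor values."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))
```

Each row of `features` would hold the values of the selected predictors (for example t-CLR, WIG, and SCS) for one training query, and `targets` the actual AP of those queries.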

Combining Predictors (Tweets2011)
[Figure: Pearson’s correlation (y-axis, 0.00 to 0.60) per retrieval model (x-axis: QL, t-EXP, t-QRM), with two series: "Combined" and "Best".]
Chart labels:
• Predictor combinations shown: {t-CLR, CLR, WIG, SCS}, {t-CLR, WIG, SCS}, {t-CLR, NQC, NSD, SumIDF}
• Percentage labels: 21.6%, 27.8%, 46.5%
• t-CLR in best combinations
• Pre-retrieval predictors in best combinations

Summary
• First comprehensive study focusing on testing QPP in microblog search with different retrieval models.
• Temporal predictors might be more suitable for microblog search.
• Combining predictors improved prediction quality.
• Some pre-retrieval predictors show promising results.

Future Work
• Experiment with more temporal predictors and retrieval models
• Develop new…
  o Temporal predictors
  o Predictors considering tweet-specific features
• Use QPP in…
  o Selective query expansion
  o Dynamic query expansion

Thank You
