If I Had a Million Queries
Ben Carterette, Virgil Pavlu, Evangelos Kanoulas, Javed Aslam, James Allan
TREC 2008 Million Query Track
• Traditional TREC evaluation setup
  – Depth-100 pools judged
  – 50 queries
  – Infeasible (judgment effort) and insufficient
• Million Query evaluation setup
  – Reduce judgment effort by carefully selecting
    • documents to judge
    • types of queries to evaluate systems on
TREC 2008 Million Query Track
Questions:
1. Can low-cost methods reliably evaluate retrieval systems?
2. What is the minimum cost needed to reach a reliable result?
3. Are some query types more informative than others?
4. Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
Million Query Track Setup
• 8 participating sites, 25 retrieval runs
• 10,000 queries, GOV2 collection
• Retrieval results sent to the TREC crew @ NIST
• Assessors produce relevance judgments
Document Selection and Evaluation
• Two low-cost algorithms
  – MTC (Carterette, Allan & Sitaraman, 2006)
Document selection:
• Greedy online algorithm
• Selects the most discriminative documents
• Targets an accurate ranking of systems
Evaluation:
• Each document has a probability of relevance
• Measures computed as expectations over the relevance distribution
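The greedy selection idea above can be sketched in a few lines. This is a deliberately simplified illustration, not the actual MTC algorithm: it discriminates system pairs on precision@k rather than MAP, and the `runs` format and function names are assumptions for the toy example.

```python
# Simplified MTC-style greedy selection, sketched for precision@k.
# The real MTC weights documents by their effect on the difference in MAP;
# overlap at top-k is used here only to keep the idea visible.
# `runs` maps a system name to its ranked list of document ids (toy format).

def select_next(runs, judged, k=10):
    """Pick the unjudged document whose judgment would most help
    distinguish pairs of systems at rank cutoff k."""
    systems = list(runs)
    weight = {}
    for i in range(len(systems)):
        for j in range(i + 1, len(systems)):
            top_a = set(runs[systems[i]][:k])
            top_b = set(runs[systems[j]][:k])
            # A document retrieved in one top-k but not the other
            # discriminates this pair of systems.
            for doc in top_a ^ top_b:
                if doc not in judged:
                    weight[doc] = weight.get(doc, 0) + 1
    return max(weight, key=weight.get) if weight else None

runs = {"sysA": ["d1", "d2", "d3"], "sysB": ["d2", "d4", "d5"]}
print(select_next(runs, judged=set(), k=3))
```

Judging the selected document updates the systems' estimated scores, and the loop repeats until the judgment budget is exhausted.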
Document Selection and Evaluation
• Two low-cost algorithms
  – statAP (Aslam & Pavlu, 2008)
Document selection:
• Stratified random sampling
• Selects documents based on a prior belief of relevance
Evaluation:
• Applies well-established estimation techniques
• Targets accurate system scores
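The estimation idea behind statAP can be sketched with a minimal example: documents are drawn with known inclusion probabilities (higher for documents believed relevant), and totals are corrected by inverse-probability weighting. The stratification scheme and the sample below are toy assumptions; the track's actual AP estimator is more involved.

```python
# Horvitz-Thompson-style estimate of the number of relevant documents
# from a stratified sample with known inclusion probabilities.

def estimate_num_relevant(sample):
    """sample: list of (is_relevant, inclusion_probability) pairs."""
    return sum(rel / p for rel, p in sample)

# Toy sample: 3 relevant docs drawn from a high-prior stratum (p = 0.5)
# and 1 relevant doc from a low-prior stratum (p = 0.1).
sample = [(1, 0.5), (1, 0.5), (1, 0.5), (0, 0.5), (1, 0.1), (0, 0.1)]
print(estimate_num_relevant(sample))  # 3/0.5 + 1/0.1 = 16.0
```

The same weighting applies to the sums inside AP, which is why stratifying toward likely-relevant documents keeps the variance of the estimate low.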
Queries
• 10,000 queries sampled from the logs of a search engine
• Queries were assigned categories
  – Long (more than 6 words) vs. short
  – Gov-heavy (more than 3 clicks) vs. gov-slant

              short   long
  gov-slant   2,434   2,434
  gov-heavy   2,434   2,434
Judgments per Query
• Five different targets for the number of judgments
  – 8, 16, 32, 64, and 128 judgments
  – Equal total number of judgments per target across all queries
Relevance Judgments
• 784 of the 10,000 queries judged
• 15,211 total judgments
  – ~75% fewer than in past years
Relevance Judgments
• Distribution of queries per category and judgment target

  Category         8    16   32   64   128   Total
  Short-govslant   95   55   29   13   4     196
  Short-govheavy   118  40   26   10   3     197
  Long-govslant    98   52   26   13   8     197
  Long-govheavy    92   57   21   14   10    194
  Total            403  204  102  50   25    784
Evaluation Measure
• Weighted MAP:

  wMAP = (1/5) Σ_{j=1}^{5} MAP_j = (1/5) Σ_{j=1}^{5} (1/|Q_j|) Σ_{q ∈ Q_j} AP_q

  where Q_j is the set of queries with the j-th judgment target:

  Judgments   8    16   32   64   128   Total
  Total       403  204  102  50   25    784
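The weighted MAP above averages per-query AP within each judgment-target bucket first, then averages the five bucket MAPs, so every bucket contributes equally no matter how many queries it holds. A direct computation (the AP values below are made-up toy numbers):

```python
# wMAP: mean of per-bucket MAPs, where a bucket is one judgment target
# (8, 16, 32, 64, or 128 judgments per query).

def weighted_map(ap_by_bucket):
    """ap_by_bucket: dict mapping judgment target -> list of per-query AP values."""
    maps = [sum(aps) / len(aps) for aps in ap_by_bucket.values()]
    return sum(maps) / len(maps)

ap_by_bucket = {8: [0.2, 0.4], 16: [0.3], 32: [0.1, 0.5], 64: [0.6], 128: [0.25]}
print(weighted_map(ap_by_bucket))
```

With a plain mean over all 784 queries, the 403 eight-judgment queries would dominate; the bucket-wise weighting prevents that.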
TREC 2008 Million Query Track
Questions:
1. Can low-cost methods reliably evaluate retrieval systems?
2. What is the minimum cost needed to reach a reliable result?
3. Are some query types more informative than others?
4. Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
System Scores and Rankings
TREC 2008 Million Query Track
Questions:
1. Can low-cost methods reliably evaluate retrieval systems?
2. What is the minimum cost needed to reach a reliable result?
3. Are some query types more informative than others?
4. Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
Timing Info for Cost Analysis
• Query overhead

             refresh   view   last view   topic
  short      2.34      18.0   25.5        67.6
  long       2.54      24.5   31.0        86.5
  gov-slant  2.22      22.5   29.0        76.0
  gov-heavy  2.65      20.0   27.5        78.0
  average    2.41      22.0   29.0        76.0
Timing Info for Cost Analysis
• Judging time per category and judgment target

             8     16    32    64    128   average
  short      15.0  11.5  13.5  12.0  8.5   12.5
  long       17.0  14.0  16.5  10.0  10.5  13.0
  gov-slant  13.0  12.5  13.0  9.5   10.5  12.0
  gov-heavy  19.0  13.0  17.0  12.5  8.5   13.5
  average    15.0  13.0  15.0  11.0  9.0   13.0
Analysis of Variance
• σ_s  = variance due to systems
• σ_q  = variance due to queries
• σ_sq = variance due to query-system interaction
TREC 2008 Million Query Track
• Measure the stability of
  – Scores: variance due to systems / total variance
  – Rankings: variance due to systems / (variance due to systems + variance due to query-system interaction)
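These stability ratios can be computed from a system-by-query score matrix with a standard two-way ANOVA decomposition. A sketch under the usual random-effects assumptions (one score per system-query cell, so the interaction is estimated from the residual mean square); the score matrix here is random toy data:

```python
import numpy as np

def variance_components(scores):
    """scores: (n_systems, n_queries) matrix of per-query scores (e.g. AP).
    Returns estimates of (systems, queries, query-system interaction) variance."""
    n_s, n_q = scores.shape
    grand = scores.mean()
    # Mean squares for the two main effects.
    ms_s = n_q * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_s - 1)
    ms_q = n_s * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_q - 1)
    # Residual mean square estimates the interaction component.
    resid = (scores - scores.mean(axis=1, keepdims=True)
                    - scores.mean(axis=0, keepdims=True) + grand)
    ms_sq = (resid ** 2).sum() / ((n_s - 1) * (n_q - 1))
    var_s = max((ms_s - ms_sq) / n_q, 0.0)
    var_q = max((ms_q - ms_sq) / n_s, 0.0)
    return var_s, var_q, ms_sq

rng = np.random.default_rng(0)
scores = rng.random((5, 20))          # 5 systems x 20 queries, toy data
var_s, var_q, var_sq = variance_components(scores)
score_stability = var_s / (var_s + var_q + var_sq)
rank_stability = var_s / (var_s + var_sq)
print(score_stability, rank_stability)
```

Queries shift all systems' scores together, so the query component hurts score stability but not ranking stability; that is why the ranking ratio omits it.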
MAP Variance Components
• What is the minimum cost needed to reach a reliable result?
MAP Variance Components per Query Category
• Are some query types more informative than others?
Query Selection
• Are some query types more informative than others?
TREC 2008 Million Query Track
Questions:
1. Can low-cost methods reliably evaluate retrieval systems?
2. What is the minimum cost needed to reach a reliable result?
3. Are some query types more informative than others?
4. Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
Kendall's tau Analysis
• What is the minimum cost needed to reach a reliable result?
Kendall's tau Analysis
• Are some query types more informative than others?
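Kendall's tau measures agreement between two system rankings by counting concordant versus discordant pairs; here it compares the ranking from a reduced judgment set against a reference ranking. A self-contained sketch (the system names and scores are made up):

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping system -> score under two conditions.
    Returns (concordant - discordant) / total pairs, in [-1, 1]."""
    systems = list(scores_a)
    conc = disc = 0
    for s, t in combinations(systems, 2):
        prod = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if prod > 0:
            conc += 1      # pair ordered the same way in both rankings
        elif prod < 0:
            disc += 1      # pair ordered oppositely
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (conc - disc) / n_pairs

full = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.10}
cheap = {"A": 0.28, "B": 0.26, "C": 0.15, "D": 0.12}
print(kendall_tau(full, cheap))  # identical orderings -> 1.0
```

A tau of 0.9 or higher is the conventional threshold for treating two rankings as effectively equivalent.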
Relevance
• Percentage of relevant documents per query category and judgment target

  Category         8     16    32    64    128   avg
  Short-govslant   18.7  12.1  20.2  13.9  3.0   14.6
  Long-govslant    20.2  17.0  17.3  12.0  13.7  15.9
  Short-govheavy   24.6  30.8  30.4  23.4  37.4  28.3
  Long-govheavy    28.8  20.4  22.3  13.6  16.0  19.6
  avg              23.1  19.3  22.5  15.2  15.7  19.3
TREC 2008 Million Query Track
Questions:
1. Can low-cost methods reliably evaluate retrieval systems?
2. What is the minimum cost needed to reach a reliable result?
3. Are some query types more informative than others?
4. Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
Cost-Benefit Analysis
• Is it better to judge a lot of documents for a few queries or a few documents for a lot of queries?
  – 64 judgments & 50 queries
Conclusion
• Low-cost methods reliably evaluate retrieval systems with very few judgments
• Minimum cost to reach reliable results: 10-15 hours of judgment time
• Some query types are more informative than others
  – Gov-heavy more informative than gov-slant
• 64 judgments per query with around 50 queries is optimal for assessing systems' performance ranking