

SLIDE 1

Search Evaluation at Grooveshark

Yoni Teitelbaum 2013-07-02

SLIDE 2

Traditional Evaluation: TREC

Image Courtesy of TREC, http://trec.nist.gov

SLIDE 3

Disadvantages of TREC-Style Evaluation Methods

  • 1. Expensive:
      a. e.g., the 2005 GOV2 collection:
          i. > 45k relevance judgments [2]
          ii. > 25 million documents [3]
  • 2. Mostly news articles:
      a. a significantly different data set from GS songs


SLIDE 4

GS Weaknesses: Small Team, Few Resources

SLIDE 5

GS Strengths:

We’ve got a huge audience!

SLIDE 6

A/B Testing Using Click Data

A Group Sees: Song 1, Song 2, Song 3, Song 4
B Group Sees: Song 2, Song 3, Song 1, Song 4
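
The slide doesn't show how users get split into the two groups. Below is a minimal R sketch of one common approach: bucket each user deterministically from their user ID so the same person always sees the same variant. The function name and the crude hashing scheme are illustrative assumptions, not Grooveshark's actual implementation.

    # Illustrative only: derive an A/B bucket from the user ID so assignment
    # is stable across sessions. A real system would use a proper hash; the
    # character-code sum here is just a stand-in for the sketch.
    assign_group <- function(user_id) {
      if (sum(utf8ToInt(as.character(user_id))) %% 2 == 0) "A" else "B"
    }

    assign_group("user-12345")  # same user always lands in the same group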

SLIDE 7

What to Measure?

  • Average Rank of Click?
  • Bounce Rate (% of Searches Without a Click)
  • Average Amount of Time Spent on Search Page?
  • Median Rank of Click?
  • ...?
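
As a concrete illustration of the first two candidates, here is a small R sketch that computes average click rank and bounce rate from a click log. The column names and the toy data are assumptions made for the example, not the actual Grooveshark schema.

    # Hypothetical click log: one row per search; click_rank is NA when the
    # user clicked nothing on the results page.
    clicks <- data.frame(
      search_id  = 1:6,
      group      = c("A", "A", "A", "B", "B", "B"),
      click_rank = c(1, 3, NA, 2, NA, NA)
    )

    # Average rank of click per group (searches without a click are dropped).
    aggregate(click_rank ~ group, data = clicks, FUN = mean)

    # Bounce rate per group: share of searches that ended with no click at all.
    tapply(is.na(clicks$click_rank), clicks$group, mean)
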
SLIDE 8

So Which One's Better?

SLIDE 9

"Gold Standard" Algorithms4

Song 7, Song 2, Song 3, Song 5, Song 4, Song 6, Song 1, Song 8
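
The "gold standard" idea in reference [4] is to compare ranking functions whose relative quality is known in advance, e.g., an original ranking against a deliberately degraded copy of it. The R sketch below illustrates that general idea with random adjacent swaps; it is an illustration only, not the exact construction used in the paper.

    # Build a deliberately worse ranking from a ranking assumed to be good, so
    # any reasonable metric "should" prefer the original. Illustrative only.
    set.seed(42)
    degrade_ranking <- function(ranking, n_swaps = 2) {
      for (i in seq_len(n_swaps)) {
        j <- sample(length(ranking) - 1, 1)           # pick an adjacent pair
        ranking[c(j, j + 1)] <- ranking[c(j + 1, j)]  # swap it
      }
      ranking
    }

    original <- paste("Song", 1:8)
    degraded <- degrade_ranking(original)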

SLIDE 10

Low Power on Conventional Metrics

Image courtesy of Radlinski, Kurup, and Joachims, 2008.

SLIDE 11

Low Power Cont'd

Image courtesy of Radlinski, Kurup, and Joachims, 2008.

SLIDE 12

Interleaving Method [5]

Algorithm A: Song 1A, Song 2A, Song 3A
Algorithm B: Song 1B, Song 2B, Song 3B

SLIDE 13

Interleaving Method

User Sees...

Song 1A, Song 1B, Song 2A, Song 3A, Song 2B, Song 3B
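
To make the merging step concrete, here is a simplified R sketch of interleaving two ranked lists by alternating picks and dropping duplicates. Real interleaving schemes such as the one in reference [5] add details (e.g., randomizing which list contributes first); this sketch only shows the basic mechanic.

    # Simplified interleaving: alternate results from algorithms A and B so
    # neither gets a systematic positional advantage, then drop duplicates.
    interleave <- function(list_a, list_b) {
      out <- character(0)
      for (i in seq_len(max(length(list_a), length(list_b)))) {
        if (i <= length(list_a)) out <- c(out, list_a[i])
        if (i <= length(list_b)) out <- c(out, list_b[i])
      }
      unique(out)
    }

    interleave(c("Song 1A", "Song 2A", "Song 3A"),
               c("Song 1B", "Song 2B", "Song 3B"))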

SLIDE 14

R Script to Process Results
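
The script itself isn't reproduced in the deck. A hedged sketch of its core step, consistent with the "Binomial Test (R Script)" stage on the stack slide: credit the algorithm whose result was clicked first in each interleaved search, then test whether algorithm A wins more than half of those deciding searches. The counts below are made up for the example.

    # Hypothetical win counts from the interleaving experiment, not real data.
    wins_a <- 5300   # searches where an A result was clicked first
    wins_b <- 4700   # searches where a B result was clicked first

    # Two-sided binomial test against the null hypothesis that A and B are
    # equally likely to win a deciding search (p = 0.5).
    binom.test(wins_a, wins_a + wins_b, p = 0.5)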

SLIDE 15

Results From Interleaving Test

SLIDE 16

The Whole Stack

HTML client (JavaScript) → Server (PHP) → Hive/Hadoop (SQL) → Binomial Test (R Script)

SLIDE 17

References

1. Text Retrieval Conference. http://trec.nist.gov/
2. TREC list of judgments for the 2005 ad hoc query track. http://trec.nist.gov/data/terabyte/05/05.adhoc_qrels
3. University of Glasgow, Information Retrieval Group. http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm
4. F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect retrieval quality? In Conference on Information and Knowledge Management (CIKM), 2008.
5. T. Joachims. Evaluating retrieval performance using clickthrough data. In J. Franke, G. Nakhaeizadeh, and I. Renz, editors, Text Mining, pages 79-96. Physica/Springer Verlag, 2003.