Improving Search Through Efficient A/B Testing: A Case Study - PowerPoint PPT Presentation



SLIDE 1

Improving Search Through Efficient A/B Testing: A Case Study

Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen

SLIDE 2

Nokia Maps for Everyone!

SLIDE 3

Nokia Maps Team, Berlin

SLIDE 4

Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”

SLIDE 5

Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”

Easily discover places nearby with a tap wherever you are. View them in the map or in a list view.

SLIDE 6

Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”

Easily discover places nearby with a tap wherever you are. View them in the map or in a list view. Tap on a list item to see detail information.

SLIDE 7

Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”

Easily discover places nearby with a tap wherever you are. View them in the map or in a list view. Possible user actions:

  • SaveAsFavorite
  • CallThePlace
  • DriveTo

Tap on a list item to see detail information.

SLIDE 8

Problem: Which Places to Show?

  • Restaurants? Hotels? Shopping? …
  • Rank by ratings?
  • Distance?
  • Usage?
  • Trending?
  • …
SLIDE 9

Approach: A/B-Test Different Versions!

Here is classical Web A/B testing:

SLIDE 10

A/B-Test for Nearby Places

Version A: Best of Eat’n’Drink
Version B: Best of Hotels

The versions compete for user engagement = number of actions performed on places.

SLIDE 11

There Is A Better Approach For Ranked Lists

[Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”

  • Classical A/B testing converges slowly for ranked lists
  • Classical A/B testing often doesn’t reflect actual relevance
  • A/B tests for ranked result lists: rank interleaving
  • Use rank interleaving for faster statistical significance
SLIDE 12

Efficient A/B Testing: Rank Interleaving

Version A: Best of Eat’n’Drink
Version B: Best of Hotels

SLIDE 13

Efficient A/B Testing: Rank Interleaving

Version A: Best of Eat’n’Drink
Version B: Best of Hotels
Rank Interleaving: Version A + B

SLIDE 14

Randomized Mixing of Result Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. beta
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list <empty>

SLIDE 15

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. beta
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)
SLIDE 16

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. (beta)
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)

Duplicates below current item are removed

SLIDE 17

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. (beta)
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)
  • 3. gamma (from A)
  • 4. kappa (from B)
SLIDE 18

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. (beta)
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)
  • 3. gamma (from A)
  • 4. kappa (from B)
  • 5. tau (from B)
  • 6. delta (from A)
SLIDE 19

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. (beta)
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)
  • 3. gamma (from A)
  • 4. kappa (from B)
  • 5. tau (from B)
  • 6. delta (from A)
  • 7. epsilon (from A, extra)

Leftover results are appended but clicks are not counted

SLIDE 20

A/B Interleaving: Randomized Mixing of Lists

  • Interleaved list is filled with pairs of results, one item from each version.
  • A coin toss decides which version’s item comes first.

Version A

  • 1. alpha
  • 2. (beta)
  • 3. gamma
  • 4. delta
  • 5. epsilon

Version B

  • 1. beta
  • 2. kappa
  • 3. tau

Interleaved Result list

  • 1. alpha (from A)
  • 2. beta (from B)
  • 3. gamma (from A)
  • 4. kappa (from B)
  • 5. tau (from B)
  • 6. delta (from A)
  • 7. epsilon (from A, extra)

Final list shown to user
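The mixing procedure walked through on the preceding slides can be sketched in Python. This is a minimal sketch; the function name `interleave`, the `seed` parameter, and the per-pair coin toss via `random.Random` are our own illustration, not code from the talk:

```python
import random

def interleave(list_a, list_b, seed=None):
    """Fill the interleaved list with pairs of results, one item from each
    version; a coin toss decides which version's item comes first. Items
    already placed are skipped (duplicate removal), and leftover results
    from the longer list are appended but marked so clicks are not counted."""
    rng = random.Random(seed)
    mixed = []                      # list of (item, credited_version)
    placed = set()                  # items already in the interleaved list
    ia = ib = 0
    while ia < len(list_a) and ib < len(list_b):
        order = ('A', 'B') if rng.random() < 0.5 else ('B', 'A')
        for version in order:
            items = list_a if version == 'A' else list_b
            idx = ia if version == 'A' else ib
            while idx < len(items) and items[idx] in placed:
                idx += 1            # skip duplicates below the current item
            if idx < len(items):
                mixed.append((items[idx], version))
                placed.add(items[idx])
                idx += 1
            if version == 'A':
                ia = idx
            else:
                ib = idx
    for item in list_a[ia:] + list_b[ib:]:
        if item not in placed:      # leftovers: shown, but clicks uncounted
            mixed.append((item, 'extra'))
            placed.add(item)
    return mixed
```

With the example lists from the slides (A: alpha…epsilon, B: beta, kappa, tau), this yields the seven-item list shown above, with epsilon appended as the uncounted extra; only the order within each pair varies with the coin tosses.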

SLIDE 21

Declaring A Winner

  • Statistical significance test
  • Input (after Hadoop-based log processing): number of clicks on version A, number of clicks on version B
  • G-Test: improved version of Pearson’s chi-squared test; G > 6.635 corresponds to a 99% confidence level
  • Null hypothesis: click counts are equally distributed over both versions.
  • Test statistic:

G = 2 · Σ_{i ∈ {A,B}} counts_i · ln( counts_i / (total_counts / 2) )
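As a sketch of how the statistic might be computed (the function name and the zero-count guard are our own assumptions):

```python
import math

def g_statistic(clicks_a, clicks_b):
    """G-test statistic for the null hypothesis that clicks are split
    equally between the two versions (expected count = total / 2 each).
    Zero-count terms contribute 0, matching the limit of x * ln(x)."""
    expected = (clicks_a + clicks_b) / 2
    return 2 * sum(c * math.log(c / expected)
                   for c in (clicks_a, clicks_b) if c > 0)
```

G > 6.635 is the 99% critical value of the chi-squared distribution with one degree of freedom: a 600 vs. 400 click split clears the threshold easily, while 520 vs. 480 does not.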

SLIDE 22

Managing Multiple Versions

[Architecture diagram: data providers feed a QA/indexing cluster of SOLR cores (Place, Address, Spelling, etc.), replicated to production SOLR instances (instance-1, instance-2) coordinated by Zookeeper; a Search API servlet container (Federation/Ranking, Discovery, Spelling) serves users through a data frontend (REST API), with batch updates for recovery.]

SLIDE 23

Managing Multiple Versions


  • Every incoming query is replicated and routed to versions A and B
  • Each version is implemented as a specific type of SOLR query
  • We deploy more than 2 versions to production and switch between them using Zookeeper
  • Result mixing of A and B is implemented in a processing layer above SOLR

SLIDE 24

Caveat 1: Randomization

  • Don’t confuse users with changing results, i.e. provide a consistent user experience.
  • Solution: the random generator is seeded with the USER-ID for each query, so each user gets their own personal random generator.
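The seeding idea can be sketched like this. Hashing the user id with SHA-256 to derive an integer seed is our assumption; the talk only specifies that the generator is seeded with the USER-ID on each query:

```python
import hashlib
import random

def rng_for_user(user_id):
    """Re-seed the random generator with the USER-ID on every query:
    a given user always gets the same coin-toss sequence, so their
    interleaved result order stays consistent across queries."""
    seed = int(hashlib.sha256(user_id.encode('utf-8')).hexdigest(), 16)
    return random.Random(seed)
```

Two queries from the same user produce identical coin-toss sequences, while different users still get independent randomization across the population.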

SLIDE 25

Caveat 2: Healthy Click Data

  • We are relying on the integrity of transmitted user actions.
  • Sensitive to log contamination (unidentified QA traffic, spam).
  • [User-clicks plot]

SLIDE 26

Caveat 3: A/B Clicks vs. Coverage

  • Coverage = non-empty responses (in percent)
  • Example: A/B interleaving of eat&drink vs. eat&drink + going out; the click difference is not significant.
  • But coverage differs, as the percentage of responses with POIs nearby: 60% for eat&drink, 62% for eat&drink + going out.
  • Higher coverage decides the winner when there is no statistically significant click difference.
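Combining the two criteria, the decision rule might look like this. This is a sketch under our own naming; the 6.635 threshold is the 99% G-test value stated on the earlier slide:

```python
import math

def pick_winner(clicks_a, clicks_b, coverage_a, coverage_b):
    """Prefer the version with significantly more clicks; if the G-test
    statistic falls below the 99% threshold (6.635), fall back to the
    version with higher coverage."""
    expected = (clicks_a + clicks_b) / 2
    g = 2 * sum(c * math.log(c / expected)
                for c in (clicks_a, clicks_b) if c > 0)
    if g > 6.635:
        return 'A' if clicks_a > clicks_b else 'B'
    return 'A' if coverage_a >= coverage_b else 'B'
```

For instance, a 505 vs. 495 click split is not significant, so the 62%-coverage version wins; a 600 vs. 400 split is decided on clicks alone.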

SLIDE 27

Case Study: Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!

[Bar chart of action counts: Rate, Save (Fav’s), Contact: Call, Contact: URL, Share, Navigate: Drive, Navigate: Walk, Navigate: Add Info Provider]

SLIDE 28

Case Study: Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!

[Same bar chart of action counts as the previous slide.]

Some users select their driving destination with the help of Nearby Places. Hotels are a common destination in the car navigation use case.

SLIDE 29

Summary

  • Use A/B rank interleaving to optimize result relevance
  • Rank interleaving is easy to implement, and it works
  • In a distributed search architecture, manage your A/B test configurations conveniently using Zookeeper
  • Harness your Hadoop/search analytics stack for A/B test evaluations
  • Don’t make assumptions about your users!
  • [Joachims et al. 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”

SLIDE 30

Thanks! Get in touch: hannes.kruppa@nokia.com

Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen