Improving Search Through Efficient A/B Test - PowerPoint PPT Presentation

Improving Search Through Efficient A/B Testing: A Case Study
Nokia Maps “Place Discovery” Team, Berlin: Hannes Kruppa, Steffen Bickel, Mark Waldaukat, Felix Weigel, Ross Turner, Peter Siemen

Nokia Maps for Everyone!

Nokia Maps Team, Berlin

Nokia Maps: Nearby Places “Discover Places You Will Love, Anywhere”
Easily discover places nearby with a tap wherever you are. View them in the map or in a list view. Tap on a list item to see detailed information.
Possible user actions:
• SaveAsFavorite
• CallThePlace
• DriveTo
• …

Problem: Which Places to Show?
• Restaurants? Hotels? Shopping? …
• Rank by ratings?
• Distance?
• Usage?
• Trending?
• …

Approach: A/B-Test Different Versions! Here is classical web A/B testing:

A/B-Test for Nearby Places
Version A: Best of Eat’n’Drink
Version B: Best of Hotels
Versions compete for user engagement = number of actions performed on places.
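For contrast with interleaving, the classical setup on this slide can be sketched as follows. This is a minimal sketch under stated assumptions: users are split into the two versions by hashing their user ID (the slides do not say how users are assigned), and engagement is simply the number of logged place actions per version. `assign_version` and `engagement_per_version` are hypothetical names.

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple


def assign_version(user_id: str) -> str:
    """Hypothetical bucketing: a stable hash of the user ID picks version A or B."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def engagement_per_version(action_log: Iterable[Tuple[str, str]]) -> Counter:
    """Count place actions (e.g. SaveAsFavorite, CallThePlace, DriveTo) per version.

    action_log holds (user_id, action) pairs; each action is one unit of engagement.
    """
    counts: Counter = Counter()
    for user_id, _action in action_log:
        counts[assign_version(user_id)] += 1
    return counts
```

Each user only ever sees one version here, so every user contributes evidence to just one side of the comparison.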

There Is A Better Approach For Ranked Lists
[Joachims et al 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”
• Classical A/B testing converges slowly for ranked lists
• Classical A/B testing often doesn’t reflect actual relevance
• A/B tests for ranked result lists: rank interleaving
• Use rank interleaving to reach statistical significance faster

Efficient A/B Testing: Rank Interleaving
Version A (Best of Eat’n’Drink) + Version B (Best of Hotels) = rank-interleaved Version A + B

A/B Interleaving: Randomized Mixing of Lists
• The interleaved list is filled with pairs of results, one item from each version. A coin toss decides which version comes first in each pair.
Version A: 1. alpha 2. beta 3. gamma 4. delta 5. epsilon
Version B: 1. beta 2. kappa 3. tau
Interleaved result list, built step by step:
1. alpha (from A)
2. beta (from B)
   (beta is now shown, so it is removed from Version A: duplicates below the current item are removed)
3. gamma (from A)
4. kappa (from B)
5. tau (from B)
6. delta (from A)
7. epsilon (from A, extra): leftover results are appended, but clicks on them are not counted
Items 1 to 7 form the final list shown to the user.
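The mixing procedure above can be sketched in Python. This is a minimal sketch of the slide's steps, not the team's implementation. Assumptions: the coin toss is passed in as a callable, and if one version runs out of unseen results in the middle of a pair, the unpaired item is treated as an uncredited extra (the slides do not cover that case). `interleave` and `credit_clicks` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Entry = Tuple[str, Optional[str]]  # (result, credited version "A"/"B", or None if not counted)


def interleave(list_a: List[str], list_b: List[str],
               a_goes_first: Callable[[], bool]) -> List[Entry]:
    """Pairwise randomized interleaving of two ranked lists (sketch)."""
    rankings = {"A": list_a, "B": list_b}
    pos = {"A": 0, "B": 0}
    shown: Set[str] = set()
    result: List[Entry] = []

    def next_unused(version: str) -> Optional[str]:
        # Skip results that are already in the interleaved list (duplicate removal).
        ranking = rankings[version]
        while pos[version] < len(ranking) and ranking[pos[version]] in shown:
            pos[version] += 1
        return ranking[pos[version]] if pos[version] < len(ranking) else None

    # Fill the list with pairs while both versions still have unseen results.
    while next_unused("A") is not None and next_unused("B") is not None:
        first, second = ("A", "B") if a_goes_first() else ("B", "A")  # coin toss
        item = next_unused(first)
        shown.add(item)
        result.append((item, first))
        partner = next_unused(second)
        if partner is None:
            # Assumption: an incomplete pair is not credited, keeping pairs balanced.
            result[-1] = (item, None)
            break
        shown.add(partner)
        result.append((partner, second))

    # Leftover results are appended, but clicks on them are not counted.
    for version in ("A", "B"):
        while (item := next_unused(version)) is not None:
            shown.add(item)
            result.append((item, None))
    return result


def credit_clicks(interleaved: List[Entry], clicked: Set[str]) -> Dict[str, int]:
    """Count clicks per version, ignoring uncredited (extra) results."""
    counts = {"A": 0, "B": 0}
    for item, version in interleaved:
        if version is not None and item in clicked:
            counts[version] += 1
    return counts
```

With coin tosses A, A, B (for example `iter([True, True, False]).__next__`), the function reproduces the slide's final list, including epsilon as an uncredited extra. In production the toss could come from a seeded generator, e.g. `rng = random.Random(user_id)` with `lambda: rng.random() < 0.5`.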

Declaring A Winner
• Statistical significance test
• Input (after Hadoop-based log processing):
  • Number of clicks on version A
  • Number of clicks on version B
• G-test:
  • A likelihood-ratio alternative to Pearson's chi-squared test
  • G > 6.635 corresponds to the 99% confidence level (one degree of freedom)
• Null hypothesis: the counts are equally distributed over both versions.
• Test statistic:
$$G = 2 \sum_{i \in \{A, B\}} \mathrm{counts}_i \, \ln\!\left(\frac{\mathrm{counts}_i}{\mathrm{total\ counts}/2}\right)$$
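The test statistic and the decision rule translate directly into a few lines of Python. This is a minimal sketch, assuming the inputs are the two aggregated click counts; `g_statistic` and `declare_winner` are hypothetical names, and a zero count contributes 0 to the sum (the usual convention 0 · ln 0 = 0).

```python
import math
from typing import Optional, Tuple

G_CRITICAL_99 = 6.635  # chi-squared critical value, 1 degree of freedom, 99% confidence


def g_statistic(clicks_a: int, clicks_b: int) -> float:
    """G = 2 * sum_i counts_i * ln(counts_i / (total / 2)) over versions A and B."""
    expected = (clicks_a + clicks_b) / 2  # expected count under the null hypothesis
    g = 0.0
    for observed in (clicks_a, clicks_b):
        if observed > 0:  # 0 * ln(0) is taken as 0
            g += observed * math.log(observed / expected)
    return 2 * g


def declare_winner(clicks_a: int, clicks_b: int,
                   critical: float = G_CRITICAL_99) -> Tuple[Optional[str], float]:
    """Return ("A" or "B", G) if significant, else (None, G)."""
    g = g_statistic(clicks_a, clicks_b)
    if g <= critical:
        return None, g
    return ("A" if clicks_a > clicks_b else "B"), g
```

For example, 120 vs. 80 clicks gives G ≈ 8.05 and declares A the winner, while 110 vs. 90 clicks gives G ≈ 2.0 and no winner at the 99% level.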

Managing Multiple Versions
[Architecture diagram. Components: Users; Frontend (REST API); Search API Servlet Container with Federation/Ranking, Spelling, Discovery, Place and Address modules; RPC interaction; Zookeeper; SOLR instance-1 and SOLR instance-2, each hosting several cores of Types 1 to 4, with replication; batch updates; recovery; QA / Indexing Cluster; data providers]
• Every incoming query is replicated and routed to Versions A and B
• Each version is implemented as a specific type of SOLR query
• We deploy more than two versions to production and switch between them using Zookeeper
• Result mixing of A and B is implemented in a processing layer above SOLR
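The replicate-and-route step can be sketched as a parallel fan-out over a registry of versions. Assumptions: each version is a callable that would wrap one SOLR query type, and a plain dict stands in for the A/B configuration the slides keep in Zookeeper (no ZooKeeper client is used here). `ACTIVE_TEST`, `fan_out` and the version names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[str]]  # query -> ranked place IDs

# Hypothetical stand-in for the test configuration stored in Zookeeper:
# which of the deployed versions currently play the roles of A and B.
ACTIVE_TEST: Dict[str, str] = {"A": "eat_drink", "B": "hotels"}


def fan_out(query: str, versions: Dict[str, SearchFn],
            active: Dict[str, str]) -> Dict[str, List[str]]:
    """Replicate one query to the active A and B versions in parallel."""
    with ThreadPoolExecutor(max_workers=len(active)) as pool:
        futures = {slot: pool.submit(versions[name], query)
                   for slot, name in active.items()}
        return {slot: future.result() for slot, future in futures.items()}
```

Because more than two versions can be deployed at once, switching the test only means changing the active mapping (in the described setup, a Zookeeper update); the two result lists are then mixed in the layer above SOLR.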

Caveat 1: Randomization
• Don’t confuse users with changing results, i.e., provide a consistent user experience
• Solution:
  • The random generator is seeded with the USER-ID for each query
  • Each user gets their own personal random generator
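Per-user seeding can be sketched with Python's standard `random.Random`, which accepts a string seed and derives the same state from it on every run. This is a minimal sketch; `coin_for_query` and `tosses` are hypothetical names, and the returned callable is meant to serve as the coin toss of the interleaving step.

```python
import random
from typing import Callable, List


def coin_for_query(user_id: str) -> Callable[[], bool]:
    """Return a coin toss for one query, seeded with the user ID.

    A fresh generator is created per query, so the same user gets the same
    sequence of tosses (and therefore the same mixed list) every time.
    """
    rng = random.Random(user_id)  # str seeds are deterministic across runs
    return lambda: rng.random() < 0.5


def tosses(user_id: str, n: int) -> List[bool]:
    """The first n coin tosses a given user's query would see."""
    coin = coin_for_query(user_id)
    return [coin() for _ in range(n)]
```

Since every query restarts the generator, randomization happens across users rather than across repeated queries of the same user, which is exactly what keeps each user's results stable.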

Caveat 2: Healthy Click Data
• We are relying on the integrity of transmitted user actions
• Sensitive to log contamination (unidentified QA traffic, spam)
• [Figure: user-clicks plot]
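The slides name the problem but not a remedy, so the following is only a heuristic sketch of cleaning click logs before counting. Assumptions: click events are (user_id, version) records, QA accounts that are known can be listed explicitly, and a hypothetical per-user click cap catches unidentified QA or spam traffic. `clean_clicks`, `clicks_per_version` and `max_clicks_per_user` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Click = Tuple[str, str]  # (user_id, credited version "A" or "B")


def clean_clicks(clicks: Iterable[Click], qa_user_ids: Set[str],
                 max_clicks_per_user: int) -> List[Click]:
    """Drop clicks from known QA accounts and from users above a click cap."""
    clicks = list(clicks)
    per_user = Counter(user_id for user_id, _ in clicks)
    return [
        (user_id, version)
        for user_id, version in clicks
        if user_id not in qa_user_ids and per_user[user_id] <= max_clicks_per_user
    ]


def clicks_per_version(clicks: Iterable[Click]) -> Counter:
    """Aggregate the cleaned clicks into the A/B counts fed to the G-test."""
    return Counter(version for _, version in clicks)
```

A fixed cap is a blunt instrument; where to set it would have to come from the observed distribution of clicks per user.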

Caveat 3: A/B Clicks vs. Coverage
• Coverage = percentage of non-empty responses
• Example:
  • A/B interleaving of eat&drink vs. eat&drink + going out
  • The difference in clicks is not significant
  • But coverage differs (percentage of responses with POIs nearby):
    • 60% eat&drink
    • 62% eat&drink + going out
• Higher coverage decides when there is no statistically significant difference
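The tie-break rule can be sketched as a small decision function. Assumptions: coverage is measured as the fraction of non-empty responses, and the G statistic has already been computed from the click counts; `coverage` and `pick_version` are hypothetical names.

```python
from typing import Optional, Sequence

G_CRITICAL_99 = 6.635  # chi-squared critical value, 1 degree of freedom


def coverage(responses: Sequence[Sequence[str]]) -> float:
    """Fraction of responses that contain at least one place (non-empty)."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r) / len(responses)


def pick_version(clicks_a: int, clicks_b: int, coverage_a: float, coverage_b: float,
                 g: float, critical: float = G_CRITICAL_99) -> Optional[str]:
    """Clicks decide when the G-test is significant; otherwise higher coverage decides."""
    if g > critical and clicks_a != clicks_b:
        return "A" if clicks_a > clicks_b else "B"
    if coverage_a != coverage_b:
        return "A" if coverage_a > coverage_b else "B"
    return None  # no significant click difference and equal coverage
```

With the slide's numbers (no significant click difference, coverage 0.60 vs. 0.62), the version with eat&drink + going out would be picked.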

Case Study: Eat’n’Drink versus Hotels: Not the User Behaviour We Had Expected!
[Figure: number of user actions by type, Eat’n’Drink vs. Hotels. Action types: Rate, Save (Fav’s), Contact: Call, Contact: URL, Share, Navigate: Drive, Navigate: Walk, Navigate: Add, Info Provider. Axis: 0 to 1500 actions]
Some users select their driving destination with the help of Nearby Places. Hotels are a common destination in the car navigation use case.

Summary
• Use A/B rank interleaving to optimize result relevance
• Rank interleaving is easy to implement, and it works
• In a distributed search architecture, manage your A/B test configurations conveniently using Zookeeper
• Harness your Hadoop/search analytics stack for A/B test evaluations
• Don’t make assumptions about your users!
• [Joachims et al 2008]: “How Does Clickthrough Data Reflect Retrieval Quality?”

Thanks! Get in touch: hannes.kruppa@nokia.com
