A/B Testing in the Wild



slide-1
SLIDE 1

A/B Testing in the Wild

Emily Robinson @robinson_es

slide-2
SLIDE 2

Disclaimer: This talk represents my own views, not those of Etsy

slide-3
SLIDE 3

Overview

INTRODUCTION

  • Etsy
  • A/B Testing

CHALLENGES & LESSONS

  • Statistical
  • Business

slide-4
SLIDE 4

Etsy

slide-5
SLIDE 5

Etsy is a global creative commerce platform. We build markets, services and economic opportunity for creative entrepreneurs.

slide-6
SLIDE 6

Our Items

slide-7
SLIDE 7

By The Numbers

1.8M

active sellers

AS OF MARCH 31, 2017

29.7M

active buyers

AS OF MARCH 31, 2017

$2.84B

annual GMS

IN 2016

45M+

items for sale

AS OF MARCH 31, 2017

Photo by Kirsty-Lyn Jameson

slide-8
SLIDE 8

A/B Testing

slide-9
SLIDE 9

What is A/B Testing?

slide-10
SLIDE 10

Old Experience

slide-11
SLIDE 11

New Feature

slide-12
SLIDE 12

A/B Testing: It’s Everywhere

slide-13
SLIDE 13

Highly Researched

slide-14
SLIDE 14

My Perspective

Millions of visitors daily

Data Engineering Pipeline Set-Up

slide-15
SLIDE 15

Generating numbers is easy; generating numbers you should trust is hard!

slide-16
SLIDE 16

Why Statistics Anyway?

  • “Election surveys are done with a few thousand people” [1]
  • Targeting small effects
  • A 0.5% relative change in conversion rate (e.g. 6% to 6.03%) on a high-traffic page can be worth millions of dollars annually

[1] Online Experimentation at Microsoft
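
To make the "small effects" point concrete, here is a rough sketch of how many visitors per variant it would take to detect the 6% to 6.03% change. It uses the textbook normal-approximation formula for a two-sided two-proportion test. The 5% significance level and 80% power are assumed values, not figures from the talk, and `sample_size_per_arm` is a hypothetical helper name.

```python
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_power) ** 2 * variance_sum / (p_treatment - p_control) ** 2


# Assumed inputs: the 6% -> 6.03% example, alpha = 0.05, power = 0.80.
n = sample_size_per_arm(0.06, 0.0603)
print(f"about {n / 1e6:.1f} million visitors per variant")
```

Under these assumptions the answer is on the order of ten million visitors per variant, far more than the few thousand respondents of an election survey.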

slide-17
SLIDE 17

Example Experiment

slide-18
SLIDE 18

Listing Card Experiment

slide-19
SLIDE 19

Result

👏

slide-20
SLIDE 20

Listing Card Experiment: Redux

🎊 👏 💰 👎

slide-21
SLIDE 21

Statistical Challenges

slide-22
SLIDE 22

Level of Analysis

Visit:

activity by browser over a defined time period (30 minutes)

Browser:

cookie or device ID (for apps)

User:

Signed-in user ID
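
As a minimal sketch of the visit definition above, the snippet below groups a browser's events into visits. It assumes the 30 minutes is an inactivity timeout (a new visit starts after more than 30 minutes without activity); the slide does not say whether the window is defined this way. The function name `assign_visits` and the event tuples are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumed to be an inactivity window


def assign_visits(events):
    """Label (browser_id, timestamp) events with a per-browser visit number.

    A new visit starts when the browser has been inactive for longer
    than VISIT_TIMEOUT. Returns (browser_id, visit_number, timestamp) tuples.
    """
    last_seen = {}
    visit_number = {}
    labelled = []
    for browser_id, ts in sorted(events):
        previous = last_seen.get(browser_id)
        if previous is None or ts - previous > VISIT_TIMEOUT:
            visit_number[browser_id] = visit_number.get(browser_id, 0) + 1
        last_seen[browser_id] = ts
        labelled.append((browser_id, visit_number[browser_id], ts))
    return labelled


# Made-up events: browser b1 returns the next morning, which starts a second visit.
events = [
    ("b1", datetime(2017, 5, 1, 20, 0)),
    ("b1", datetime(2017, 5, 1, 20, 10)),
    ("b1", datetime(2017, 5, 2, 9, 0)),
    ("b2", datetime(2017, 5, 1, 20, 5)),
]
visits = assign_visits(events)
for row in visits:
    print(row)
```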

slide-23
SLIDE 23

Browser vs Visit: An Example

I really want my own lightsaber
slide-24
SLIDE 24
slide-25
SLIDE 25

Next Day

slide-26
SLIDE 26

Pros and Cons

Visit

  • Pro: Tighter attribution
  • Con: Violates independence assumption
  • Con: Cannibalization potential

Browser

  • Pro: Captures relevant later behavior
  • Con: Introduces noise
  • Con: Misses multiple events for proportion metrics

Our conclusion: offer both, browser generally better
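
The difference between the two levels can be illustrated with a toy conversion-rate calculation. This is a sketch on made-up data, with hypothetical function names: the same purchases give different rates depending on whether the unit is the visit or the browser, because a browser that buys on its second visit counts once as a converted browser but contributes one non-converting visit.

```python
# Each record is one visit: (browser_id, visit_number, purchased). Made-up data.
visit_records = [
    ("b1", 1, False),  # browses, does not buy
    ("b1", 2, True),   # comes back the next day and buys
    ("b2", 1, False),
    ("b3", 1, True),
]


def visit_conversion_rate(records):
    """Share of visits that include a purchase."""
    return sum(purchased for _, _, purchased in records) / len(records)


def browser_conversion_rate(records):
    """Share of browsers with at least one purchasing visit."""
    converted = {}
    for browser_id, _, purchased in records:
        converted[browser_id] = converted.get(browser_id, False) or purchased
    return sum(converted.values()) / len(converted)


print(visit_conversion_rate(visit_records))    # 2 of 4 visits
print(browser_conversion_rate(visit_records))  # 2 of 3 browsers
```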

slide-27
SLIDE 27

GMS per User

  • Generally this is the key metric
  • But it’s a very badly behaved distribution
  • Highly skewed and strictly non-negative: can’t use t-test
  • Many zeros: can’t log numbers
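
A small sketch of what such a distribution looks like, using synthetic data rather than real GMS figures. The 10% buyer share and the lognormal spend are assumptions chosen only to produce many zeros and a long right tail.

```python
import math
import random
from statistics import fmean, median

rng = random.Random(0)

# Hypothetical spend per user: most users buy nothing, a few spend a lot.
spend = [rng.lognormvariate(3, 1) if rng.random() < 0.1 else 0.0
         for _ in range(10_000)]

share_zero = sum(x == 0 for x in spend) / len(spend)
print(f"share of zeros: {share_zero:.0%}")
print(f"mean: {fmean(spend):.2f}, median: {median(spend):.2f}")

# A log transform is undefined for the zeros.
try:
    math.log(0.0)
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True
print("log(0) raises ValueError:", log_of_zero_failed)
```

The mean sits well above the median because a handful of large purchases dominate, which is the skew the slide refers to.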
slide-28
SLIDE 28

ACBV/ACVV

slide-29
SLIDE 29

Definitions

  • Power: the probability that, if there is an effect of a certain magnitude, we will detect it
  • Bootstrap: random sampling with replacement
  • Simulation: modeling random events
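
To illustrate the bootstrap definition, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. The function name `bootstrap_mean_ci`, the sample values, and the choice of the percentile method are assumptions for illustration; the talk does not say which bootstrap variant was used.

```python
import random


def bootstrap_mean_ci(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, same size as the original sample, and record each mean.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Made-up spend values with many zeros.
sample = [0, 0, 0, 12.5, 0, 40.0, 0, 0, 7.25, 0]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean {sum(sample) / len(sample):.2f}, 95% CI ({low:.2f}, {high:.2f})")
```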
slide-30
SLIDE 30

Test Selection Process

  • Take Real Experiment
  • Estimate Power of Different Tests

slide-31
SLIDE 31

Estimating Power

slide-32
SLIDE 32

Test Selection Process

  • Take Real Experiment
  • Estimate Power of Different Tests
  • Estimate Power for Different Effect Sizes
  • Find Best Simulation Method
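
The process above can be sketched as a bootstrap-based power simulation. Everything specific here is an assumption rather than the method from the talk: the `observed` data is synthetic (standing in for a real experiment), the effect is injected by multiplying treatment values by `1 + effect`, and the only test shown is a z-test on the difference in means. Any function with the same signature as the hypothetical `z_test_p_value` could be passed as `test` to compare the power of different tests.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0:
        return 1.0
    z = (mean_b - mean_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(observed, effect, test=z_test_p_value, n_per_arm=2000,
                   n_sims=200, alpha=0.05, seed=0):
    """Share of simulated experiments in which `test` rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Bootstrap both arms from the observed data, then inject the effect.
        control = rng.choices(observed, k=n_per_arm)
        treatment = [x * (1 + effect) for x in rng.choices(observed, k=n_per_arm)]
        if test(control, treatment) < alpha:
            rejections += 1
    return rejections / n_sims


# Synthetic stand-in for real experiment data: 90% zeros, skewed positive values.
data_rng = random.Random(1)
observed = [data_rng.lognormvariate(3, 1) if data_rng.random() < 0.1 else 0.0
            for _ in range(5000)]

power_by_effect = {effect: estimate_power(observed, effect)
                   for effect in (0.0, 0.5, 1.0)}
for effect, power in power_by_effect.items():
    print(f"effect {effect:+.0%}: estimated power {power:.2f}")
```

With no injected effect the rejection rate should sit near the significance level, which doubles as a sanity check on the test itself.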

slide-33
SLIDE 33

Simulation Method Comparison

slide-34
SLIDE 34

Estimating Power

slide-35
SLIDE 35

Power at 1% Increase in ACBV

slide-36
SLIDE 36

Business Challenges

slide-37
SLIDE 37

Working with Teams

slide-38
SLIDE 38

Proactive Communication

Early involvement:

No post-mortems

Demonstrate value:

Prioritization, feasibility, sequencing

Develop relationship:

Understand teammates

slide-39
SLIDE 39

Dealing with Ad Hoc Questions

Question:

What’s the conversion rate of visitors in Estonia on Saturday looking in the wedding category?

First Response:

What decision are you using this for?

slide-40
SLIDE 40

Helps Avoid This

slide-41
SLIDE 41

Checks Translation

slide-42
SLIDE 42

We often joke that our job … is to tell our clients that their new baby is ugly

Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained

slide-43
SLIDE 43

Business Partners & Experiments

  • Financial and emotional investment
  • Inaccurate expectations:
  • Features are built because team believes they’re useful
  • But experiment success rate across industry is less (sometimes far less) than 50%
slide-44
SLIDE 44

Peeking

Question: “What do the results mean?”

Answer: “It’s been up for 15 minutes…”

slide-45
SLIDE 45
slide-46
SLIDE 46

Daily Experiment Updates

*This is a Made-up Example

Offers Interpretation

Shows You’re Monitoring

slide-47
SLIDE 47

Want Fast Decision Making

slide-48
SLIDE 48

Cost of Peeking: 5% FPR to 20%!
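
The inflation can be reproduced with a small A/A simulation, sketched below. Both arms share the same true conversion rate, so every "significant" result is a false positive. The number of peeks (10), the batch sizes, the 10% conversion rate, and the function names are assumptions; the slide does not say how many peeks correspond to its 20% figure.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided pooled two-proportion z-test with n observations per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = (conv_b / n - conv_a / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_looks, batch_size, rate=0.1, alpha=0.05,
                        n_sims=1000, seed=0):
    """Share of A/A experiments declared significant at any of n_looks checks."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        for _look in range(n_looks):
            # Collect one more batch per arm, then peek at the cumulative result.
            conv_a += sum(rng.random() < rate for _ in range(batch_size))
            conv_b += sum(rng.random() < rate for _ in range(batch_size))
            n += batch_size
            if p_value(conv_a, conv_b, n) < alpha:
                false_positives += 1
                break  # stop at the first "significant" peek
    return false_positives / n_sims


# Same total sample (1,000 per arm): one look at the end vs ten evenly spaced peeks.
fpr_single = false_positive_rate(n_looks=1, batch_size=1000)
fpr_peeking = false_positive_rate(n_looks=10, batch_size=100)
print(f"single look: {fpr_single:.3f}, ten peeks: {fpr_peeking:.3f}")
```

Passing a smaller `alpha` to the peeking run gives a quick feel for how far the threshold would have to drop to bring the overall rate back toward 5%.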

slide-49
SLIDE 49

Solution 1: Adjust P-Value Threshold

Not Rigorous

Easy to Interpret

slide-50
SLIDE 50

Solution 2: “Outlaw” Peeking

Miss Bugs

Correct Way

slide-51
SLIDE 51

Solution 3: Continuous Monitoring

Complicated to Implement & Explain

Peek and Stay Rigorous

slide-52
SLIDE 52

And at the End of the Day …

From Julia Evans, @b0rk “How to be a Wizard Programmer”

slide-53
SLIDE 53

Resources

  • Controlled Experiments on the Web: Survey and Practical Guide
  • Overlapping Experiment Infrastructure: More, Better, Faster Experimentation
  • From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks
  • What works in e-commerce – a meta-analysis of 6700 online experiments
  • Online Controlled Experiments at Large-Scale
  • Online Experimentation at Microsoft
  • Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained
slide-54
SLIDE 54

Acknowledgments

  • Evan D’Agostini (for ACBV development & slides)
  • Jack Perkins & Anastasia Erbe (former & fellow search analysts)
  • Michael Berkowitz, Callie McRee, David Robinson, Bill Ulammandakh, & Dana Levin-Robinson (for presentation feedback)

  • Etsy Analytics team
  • Etsy Search UI & Search Ranking teams
slide-55
SLIDE 55

Thank You

tiny.cc/abslides

robinsones.github.io

@robinson_es