Don't trust the HiPPOs: A/B Testing Online Games Steve Collins CTO - PowerPoint PPT Presentation

Don’t trust the HiPPOs: A/B Testing Online Games Steve Collins CTO / Swrve

Me, me, me. 1986 1998 2007 2010

The HiPPO: Highest Paid Person's Opinion (http://www.kaushik.net - Occam's Razor Blog)

Example #1 A B +218% (www.abtests.com and skritter.com)

Example #2 B A +102% (www.abtests.com and diythemes.com)

"One accurate measurement is worth 1,000 expert opinions." Admiral Grace Murray Hopper

Game Service [diagram: Design, Dev., Beta (3 - 6 months, Minimum Viable Product); LAUNCH; then repeated Dev / Update / Tune cycles (Game Service)]

What is testing?

A/B Testing Overview 1. Split population 2. Show variations 3. Measure response 4. Choose winner 5. Deploy winner

What not to do... • Inflexible • Error prone • Complex • Forces app update

Solution: Data Driven Approaches http://bit.ly/oWVvX3

Serial Cycles [diagram: Dev, Update, Measure, repeated one after another]

Meta-data • button-style: 3D-bevel • call-to-action: "Add to cart" • colour: Orange [diagram: PM, ENG, ART]
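
A minimal sketch of the data-driven idea on this slide, assuming the game ships with default meta-data and fetches overrides from a server at session start. The key names come from the slide; the default values, the `resolve_meta` helper and the override payload are hypothetical.

```python
# Hypothetical defaults baked into the app build.
DEFAULT_META = {
    "button-style": "flat",
    "call-to-action": "Buy now",
    "colour": "Blue",
}


def resolve_meta(defaults, overrides):
    """Merge server-side overrides onto built-in defaults.

    Unknown keys are ignored so that a bad payload cannot introduce
    settings the build does not understand.
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in overrides.items() if k in defaults})
    return merged


# Hypothetical payload delivered by the game server (values from the slide).
server_overrides = {
    "button-style": "3D-bevel",
    "call-to-action": "Add to cart",
    "colour": "Orange",
}

button = resolve_meta(DEFAULT_META, server_overrides)
print(button)
```

Because the variation lives in data rather than code, changing it does not force an app update.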

Serial Cycles [diagram: PM: Measure / Adapt (meta-data); ENG, ART: Develop]

Parallel Cycles 1. Observe 2. Tune 3. Observe [diagram: meta-data Tune cycles alongside Dev / Update cycles]

Sessions [diagram: 14 days; V2.3, V2.4; Build; CDN / AppStore; Game Server]

Sessions [diagram: Session 1, Session 2, ..., Session n; 2 days, 12 days; V2.3, V2.4; Build; CDN / AppStore; Game Server]

Implementing Testing

State Hypothesis [chart: Sales Before vs. Sales After]

Agree OEC (overall evaluation criterion): percentage of players who purchase

1. Split population

User - Bucket Assignment • Consistent • Unbiased • Efficient md5_hash(UUID + testID) Ensure users are bucketed independently for each test
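
The `md5_hash(UUID + testID)` assignment above can be sketched as follows. This is an illustration, not the speaker's implementation: the function name `assign_bucket`, the modulo step and the example IDs are assumptions.

```python
import hashlib


def assign_bucket(user_id: str, test_id: str, num_buckets: int = 2) -> int:
    """Map a user to a bucket for one test.

    Consistent: the same user and test always give the same bucket.
    Unbiased:   MD5 output is close to uniform, so buckets are near-equal.
    Efficient:  no lookup table or stored state is required.
    """
    digest = hashlib.md5((user_id + test_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Hypothetical user UUID and test IDs.
user = "3f2c9a7e-1b4d-4c8e-9f00-5a6b7c8d9e0f"
print(assign_bucket(user, "store-button-colour"))
print(assign_bucket(user, "tutorial-length"))
```

Including the test ID in the hash input is what makes bucketing independent across tests: a user's bucket in one test says nothing about their bucket in another.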

Split population (into independent groups) [diagram: Control / Variant 1 / Variant 2; Control / Variant]

2. Show variations

Testing Architecture [diagram]

3. Measure response

Statistics of testing: An incredibly brief summary...

Conversions • Conversion Event: Item Purchase, Buy-in, Completed Tutorial, Fired in-game event, Added a friend, Watched movie, ... • #players = n • #conversions = r • conversion rate = r / n = p

Binomial Distribution $P[r, n] = \frac{n!}{(n - r)!\, r!}\, p^r (1 - p)^{n - r}$ Probability of r successes given n trials and probability of success p per trial [plots: n = 5, p = 0.3; n = 100, p = 0.1; n = 1000, p = 0.05]
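
A short sketch of the formula above, using only the Python standard library. The function name `binomial_pmf` is an assumption; the example parameters are the ones shown on the slide.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n trials, success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Distribution for one of the slide's examples: n = 5, p = 0.3.
for r in range(6):
    print(r, round(binomial_pmf(r, 5, 0.3), 4))
```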

Binomial Distribution Run 100 "experiments", n = 1000, p = 0.05 and compute the average "conversion rate" $\mu = np$, $\sigma = \sqrt{np(1 - p)}$ $\bar{x} = 49.96$; expected value is 50

Binomial Distribution Repeat this process... $\bar{x} = 50.36$, $\bar{x} = 49.98$, $\bar{x} = 50.30$, $\bar{x} = 49.96$

Binomial Distribution Keep repeating this process and plot the averages... [plots: 10 runs, 100 runs, 1000 runs, 10000 runs] These are sampling distributions of the mean

Central Limit Theorem As n increases, the sampling distribution of the mean becomes "normal", independent of the underlying distribution
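
The repeated "experiments" on the preceding slides can be simulated in a few lines. This is a sketch under stated assumptions: the helper names, the fixed seed and the choice of 50 runs (to keep it fast) are mine; n = 1000, p = 0.05 and 100 experiments per run are from the slides.

```python
import random
import statistics


def count_conversions(n, p, rng):
    """One 'experiment': n players, each converting with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def average_of_experiments(num_experiments, n, p, rng):
    """One 'run': the average conversion count over a batch of experiments."""
    return statistics.mean(
        count_conversions(n, p, rng) for _ in range(num_experiments)
    )


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n, p = 1000, 0.05

# 50 runs keeps this quick; the slides plot up to 10000 runs.
averages = [average_of_experiments(100, n, p, rng) for _ in range(50)]

expected_mean = n * p                                # mu = np
expected_spread = (n * p * (1 - p)) ** 0.5 / 100**0.5  # sigma / sqrt(100)

print(f"mean of averages:  {statistics.mean(averages):.2f} "
      f"(expected {expected_mean:.2f})")
print(f"stdev of averages: {statistics.stdev(averages):.2f} "
      f"(expected about {expected_spread:.2f})")
```

Plotting a histogram of `averages` for an increasing number of runs should show the bell shape the Central Limit Theorem predicts, even though each individual player outcome is a yes/no event.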

The Normal Distribution $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

The Null Hypothesis $H_0: \bar{x} = \mu$ (Accept) vs. $\bar{x} \neq \mu$ (Reject)

Comparing distributions A B $H_0: \mu_A = \mu_B$

Type-I Error: False Positive A B $H_0: \mu_A = \mu_B$

Type-II Error: False Negative B A $H_0: \mu_A = \mu_B$

p-Value The probability of observing as extreme a result assuming the null hypothesis is true [plot: 95% between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$, 2.5% in each tail] p-value = area outside these critical points = 0.05 here
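
One common way to turn the ideas above into a number is a two-proportion z-test on the conversion rates of A and B. The slides do not name a specific test, so treat this as an assumed choice; the function names and the example counts are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p_value(z: float) -> float:
    """Area in both tails of the standard normal beyond |z|."""
    return erfc(abs(z) / sqrt(2))


def compare_conversion_rates(r_a, n_a, r_b, n_b):
    """Two-proportion z-test under H0: both groups share one conversion rate.

    r_* are conversions, n_* are players per group. Assumes at least one
    conversion and one non-conversion overall, otherwise the standard
    error is zero and the test is undefined.
    """
    p_a, p_b = r_a / n_a, r_b / n_b
    pooled = (r_a + r_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    return z, two_sided_p_value(z)


# Hypothetical counts: control converts 50 of 1000, variant 75 of 1000.
z, p_value = compare_conversion_rates(50, 1000, 75, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A p-value below the chosen threshold (0.05 on the slide) would lead to rejecting the null hypothesis that A and B convert at the same rate.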

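Power $n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{\delta^2}$ where n = number of participants per group, $\sigma$ = standard deviation, $\delta$ = required change. Rule of thumb: $n \approx \frac{16\sigma^2}{\delta^2}$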

$n \approx \frac{16\sigma^2}{\delta^2}$ Control conversion rate = 5% Desired increase = 50% (i.e. to 7.5%) Standard deviation = 0.1 n = 256 (25 mins for 50k DAU game)* (* assume 40% retention rate, even daily session distribution and 2 buckets in test)

$n \approx \frac{16\sigma^2}{\delta^2}$ Control conversion rate = 5% Desired increase = 5% (i.e. to 5.25%) Standard deviation = 0.1 n = 25,600 (20 hrs for 50k DAU game)

$n \approx \frac{16\sigma^2}{\delta^2}$ Control conversion rate = 5% Desired increase = 5% (i.e. to 5.25%) Standard deviation = 0.5 n = 640,000 (21 days for 50k DAU game)
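
The three worked examples above follow directly from the rule of thumb, with δ taken as the absolute change in conversion rate (0.05 × 50% = 0.025, 0.05 × 5% = 0.0025). The sketch below reproduces them; the function names are assumptions. It also shows one usual origin of the constant 16, assuming a two-sided 5% significance level, 80% power and a factor of 2 for comparing two groups; that derivation is my assumption and is not stated on the slides.

```python
from statistics import NormalDist


def required_sample_size(sigma: float, delta: float) -> float:
    """Rule of thumb from the slides: n ~ 16 * sigma^2 / delta^2 per group."""
    return 16 * sigma**2 / delta**2


def rule_of_thumb_constant(alpha: float = 0.05, power: float = 0.8) -> float:
    """Assumed origin of the 16: 2 * (z_alpha + z_beta)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    return 2 * (z_alpha + z_beta) ** 2


control_rate = 0.05
for relative_lift, sigma in [(0.50, 0.1), (0.05, 0.1), (0.05, 0.5)]:
    delta = control_rate * relative_lift  # absolute change to detect
    n = required_sample_size(sigma, delta)
    print(f"lift {relative_lift:.0%}, sigma {sigma}: n = {round(n):,}")

print(f"2 * (z_alpha + z_beta)^2 = {rule_of_thumb_constant():.2f}")
```

The design point worth noticing is the inverse-square dependence on δ: detecting a change ten times smaller needs a hundred times more players per group.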

Challenges when Testing

Primacy • New users behave differently from old users • Familiarity with existing UI / resources / items etc. SOLUTION : Restrict tests to new users

Causality • There may be many reasons for a change in test statistic • Seasonality, events, trends, errors, etc. SOLUTION : use tight evaluation criteria (e.g. sales of item tested NOT overall revenue)

Testing QA • Tests can (will) introduce errors • Particularly with many variants SOLUTION (s) • ramp-up, roll-back capability • force user bucket capability
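
A hedged sketch of how the two capabilities above might sit on top of hash-based bucketing. Everything here is illustrative: the `bucket_for` function, its parameters, the separate ramp hash and the example IDs are assumptions, not the architecture from the talk.

```python
import hashlib


def _hash_int(user_id: str, salt: str) -> int:
    """Stable integer derived from the user and a per-purpose salt."""
    return int(hashlib.md5((user_id + salt).encode("utf-8")).hexdigest(), 16)


def bucket_for(user_id, test_id, variants, ramp=1.0, forced=None):
    """Return the variant for a user, or None if the user is not in the test.

    forced: optional {user_id: variant} map so QA can pin a device to a bucket.
    ramp:   fraction of users exposed; 0.0 acts as a roll-back switch.
    """
    if forced and user_id in forced:
        return forced[user_id]
    # A separate hash decides exposure, so ramping up never reshuffles
    # users who are already in the test.
    if _hash_int(user_id, test_id + ":ramp") % 10000 >= ramp * 10000:
        return None
    return variants[_hash_int(user_id, test_id) % len(variants)]


variants = ["control", "variant-1", "variant-2"]
qa_overrides = {"qa-device-7": "variant-2"}  # hypothetical QA device ID

print(bucket_for("qa-device-7", "store-test", variants, ramp=0.1, forced=qa_overrides))
print(bucket_for("player-123", "store-test", variants, ramp=0.1))
print(bucket_for("player-123", "store-test", variants, ramp=0.0))
```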

Temporal Effects • Daily, weekly, yearly • False signals • Ramp up bias SOLUTION • Run tests for sufficiently long to normalize for effects

Version Control • Multiple app-versions in flight • Resources may have changing schema • Can't always force upgrade SOLUTION (s) • Limit to one app-version; careful version control with schema

Testing in Online Games ...

Death to HiPPOs

‣ Homework: • http://exp-platform.com - Ron Kohavi et al. • http://statisticsforexperimenters.net/ - George Box et al. • http://www.kaushik.net - Occam's Razor Blog • http://www.abtests.com/
