Continuous Experimentation and A/B Testing: A Mapping Study

SLIDE 1

Continuous Experimentation and A/B Testing: A Mapping Study

Rasmus Ros and Per Runeson

SLIDE 2

A/B Testing

A B

  • Software changes
    – User interface tweaks
    – Algorithm parameters
    – “Old” vs “new” feature

http://bit.ly/rcose18ab

SLIDE 3

A/B Testing

A B

User data

SLIDE 4

A/B Testing

A B

Wait 1-2 weeks [1]

  • 1. Kevic, Katja, et al. "Characterizing Experimentation in Continuous Deployment: A Case Study on Bing." ICSE-SEIP. 2017. doi: 10.1109/ICSE-SEIP.2017.19

SLIDE 5

A/B Testing

A: 4.3% X
B: 4.4% X
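A difference such as 4.3% versus 4.4% on metric X only means something relative to the number of users in each group. The following is a minimal sketch of a two-sided two-proportion z-test, one common choice of hypothesis test for conversion-style metrics. The slides do not state group sizes or the test used, so the 10,000 users per group and the function name `two_proportion_z_test` are assumptions for illustration only.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical group sizes (not from the slides): 10,000 users each,
# giving 430 conversions for A (4.3%) and 440 for B (4.4%).
z, p = two_proportion_z_test(conv_a=430, n_a=10_000, conv_b=440, n_b=10_000)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Under these assumed group sizes the p-value is far above the usual 0.05 threshold, so the 0.1 percentage point difference would not be distinguishable from noise.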

SLIDE 6

A/B Testing

A: 4.3% X, 11% Y
B: 4.4% X, 10% Y

SLIDE 7

A/B Testing

A: 4.3% X, 11% Y
B: 4.4% X, 10% Y

Influence

? ? ? ? ?

SLIDE 8

A/B Testing

A: 4.3% X, 11% Y
B: 4.4% X, 10% Y

? ? ? ? ?

SLIDE 9

A/B Testing

A: 4.3% X, 11% Y, … Z
B: 4.4% X, 10% Y, … Z

? ? ? ? ?
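Once several metrics (X, Y, … Z) are compared in the same experiment, the chance of at least one false positive grows with the number of tests. A minimal sketch of one standard remedy, the Holm-Bonferroni step-down correction, is given here. The slides do not say which correction, if any, the authors have in mind, and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject (True) or keep (False) decision per hypothesis."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, 0-based) is compared
        # against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every remaining (larger) p-value is kept as well
    return reject


# Hypothetical p-values for metrics X, Y and Z (not from the slides).
metrics = ["X", "Y", "Z"]
p_values = [0.72, 0.004, 0.03]
for name, rejected in zip(metrics, holm_bonferroni(p_values)):
    print(name, "significant" if rejected else "not significant")
```

With these made-up values only Y survives the correction: Z would pass an uncorrected 0.05 threshold but not the adjusted threshold of 0.025.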

SLIDE 10

A/B Testing

A: 4.3% X, 11% Y, … Z
B: 4.4% X, 10% Y, … Z

? ? ? ? ?

Wait, there’s more!

  • Learning effects
  • Impression effects
  • Power calculations
  • Multiple testing
  • Early stopping
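To make the "Power calculations" item concrete, here is a minimal sketch of the textbook sample-size formula for comparing two proportions with a two-sided test. The rates 4.3% and 4.4% are the ones shown on the slides. The significance level (0.05), the power (0.8), and the function name `sample_size_per_group` are assumptions, not values given by the authors.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_a vs p_b (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_a + p_b) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return math.ceil(numerator / (p_a - p_b) ** 2)


# Rates from the slides; alpha and power are assumed defaults.
n = sample_size_per_group(0.043, 0.044)
print(f"about {n:,} users per group")
```

Under these assumptions the required size is in the hundreds of thousands of users per group, which is why small effects are hard to detect without large traffic.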

SLIDE 11

A/B Testing

A: 4.3% X, 11% Y, … Z
B: 4.4% X, 10% Y, … Z

? ? ? ? ?

Please stop!

SLIDE 12

A/B Test

  • Controlled experiment
    – 2 groups
    – Metric
    – Hypothesis test
  • Also A/B/n test and MVT

A B

SLIDE 13

A/B Test

  • Controlled experiment
    – 2 groups
    – Metric
    – Hypothesis test
  • Also A/B/n test and MVT

[Figure: three panels, A/B (variants A, B), A/B/n (variants A, B, … n), and MVT (factor combinations such as A1B1 and A2B2)]
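The three designs differ mainly in how many variants a user can land in. The sketch below shows one common way to assign users to variants deterministically by hashing, and how an MVT (multivariate test) expands into a full factorial of factor levels. The hashing scheme, the experiment name, and the variant labels are illustrative assumptions and are not taken from the slides.

```python
import hashlib
from itertools import product


def assign(user_id, experiment, variants):
    """Map a user to one variant; the same user always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


ab_variants = ["A", "B"]          # A/B test: two groups
abn_variants = ["A", "B", "C"]    # A/B/n test: more than two groups
# MVT: every combination of two factors with two levels each.
mvt_variants = [a + b for a, b in product(["A1", "A2"], ["B1", "B2"])]

print(mvt_variants)
for user in ["user-1", "user-2", "user-3"]:  # hypothetical user ids
    print(user, assign(user, "checkout-test", ab_variants))
```

Hashing on the experiment name together with the user id keeps assignments stable across visits while letting different experiments split users independently.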

SLIDE 14

Continuous Experimentation

  • Continuous iterative process [2]
    1) Vision
    2) Business goals
    3) Experiment
    4) Learnings
  • Roles, architecture, infrastructure, …
  • Synergies with continuous *
  • 2. Fagerholm, Fabian, et al. "The RIGHT Model for Continuous Experimentation." Journal of Systems and Software 123 (2017): 292-305.


SLIDE 17

Overview

  • Aim
    – State of research
    – Applicability of CE
  • Method
    – Phrase search and references
    – Thematic analysis
  • Result
    – 62 papers


SLIDE 19

Overview

  • Aim
    – State of research
    – Applicability of CE
  • Search
    – Phrase search and references
    – Thematic analysis
  • Result
    – 62 papers
    – http://lup.lub.lu.se/search/ws/files/40009496/extracted_data.csv

SLIDE 20

Research Trend

[Figure: papers per year, 2007 to 2017 (x-axis: Year, y-axis: Papers), by field: 43 Data science, 17 Software engineering, 2 Other]

Seminal paper:

  • 3. Kohavi, Ron, et al. "Controlled Experiments on the Web: Survey and Practical Guide." Data Mining and Knowledge Discovery (SIGKDD) (2009).

SLIDE 21

Research Questions

  • RQ1 What are the main topics researched within CE and how are they studied?
  • RQ2 Which kinds of organizations use CE and which sectors do they operate in?
  • RQ3 With what types of experiments has CE been applied?


SLIDE 24

Research Topics (RQ1)

Topic                    Total
Experiment process       7
Infrastructure           10
Challenges               19
Benefits                 3
Variability management   5
Metrics                  6
Statistical methods      16
Design of experiments    8
Domain considerations    6
Ethics                   1

SLIDE 25

Research Topics (RQ1)

  • Evaluation
    – Evaluation research
    – Experience report
  • Solution
    – Validation
    – Proposed solution

SLIDE 26

Research Topics (RQ1)

Research approach

Topic                    Total  Evaluation / Solution
Experiment process       7      7 in a single approach
Infrastructure           10     8 / 2
Challenges               19     17 / 2
Benefits                 3      3 in a single approach
Variability management   5      5 in a single approach
Metrics                  6      3 / 3
Statistical methods      16     1 / 15
Design of experiments    8      2 / 6
Domain considerations    6      3 / 3
Ethics                   1      1 in a single approach

SLIDE 27

Research Topics (RQ1)

  • Anna Karenina principle

SLIDE 28

Research Topics (RQ1)

  • Technical topics
  • Ethics guidelines

SLIDE 29

Research Topics (RQ1)

  • No data sets
  • One open source tool

SLIDE 30

Organizations – Where is A/B Testing Used? (RQ2)

  • Sectors
  • Business model
  • Company size

Subscribed or free (Σ 29): E-commerce 9, Search engine 5, Other 15
Perpetual (Σ 9): Finance 4, Gaming 2, Other 4
Embedded * (Σ 4)

SLIDE 31

Organizations – Where is A/B Testing Used? (RQ2)

  • Sectors
  • Business model
  • Company size

Business model
Business to consumer (B2C)  32
Business to business (B2B)  10

[Figure labels: Quality increase; Quality increase; Reputation; Vs.]

SLIDE 32

Organizations – Where is A/B Testing Used? (RQ2)

  • Sectors
  • Business model
  • Company size

Company size (employees)
Large (≥ 250)   29
Medium (< 250)  5
Small (< 50)    8

SLIDE 33

Experiments – What and How? (RQ3)

  • Treatment
  • Goal
  • Experiment design

Treatment
Visual change       64
Algorithmic change  23
New feature         4

SLIDE 34

Experiments – What and How? (RQ3)

  • Treatment
  • Goal
  • Experiment design

Goal
Engagement  58
Revenue     26
Knowledge   7

SLIDE 35

Experiments – What and How? (RQ3)

  • Treatment
  • Goal
  • Experiment design

Experiment design
A/B           77
A/B/n         3
MVT           6
Optimization  3
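The counts in the experiment design table can be turned into shares to see how strongly plain A/B tests dominate. This short sketch uses only the four counts from the table. Treating their sum as the denominator is an assumption, since the slides do not state the total number of experiments.

```python
# Counts copied from the experiment design table.
counts = {"A/B": 77, "A/B/n": 3, "MVT": 6, "Optimization": 3}

# Assumption: the four categories cover all counted experiments.
total = sum(counts.values())
shares = {design: n / total for design, n in counts.items()}

for design, share in shares.items():
    print(f"{design}: {share:.1%}")
```

Under that assumption, plain A/B tests account for 77 of 89 counted experiments, or roughly 87%.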

SLIDE 36

Slicing Experiments

[Figure: experiments sliced by experiment design (A/B, A/B/n, MVT, Optimization) and treatment (Visual change, Algorithmic change, New feature)]

SLIDE 37

Slicing Experiments

[Figure: experiments sliced by experiment design (A/B, A/B/n, MVT, Optimization) and treatment (Visual change, Algorithmic change, New feature); axes: Experiment Design Complexity, Treatment Complexity]

SLIDE 38

Slicing Experiments

[Figure: axes Experiment Design Complexity and Treatment Complexity, with labels Risk and Chaos]

SLIDE 39

Take Aways

1) Research gaps
    – Tool evaluations
    – Ethics guidelines
    – Embedded
2) Diverse organisations do A/B testing
3) Simple experimental designs
    – (Best) practice or bias?

SLIDE 43

“Out-of-Scope” A/B Testing Highlights

  • Qubit meta analysis
    https://www.qubit.com/wp-content/uploads/2017/12/qubit-research-meta-analysis.pdf
  • Errors in statistics
    https://blog.sumall.com/journal/optimizely-got-me-fired.html
    https://www.aarondefazio.com/tangentially/?p=83
  • Continuous Experimentation origin
    http://mcfunley.com/design-for-continuous-experimentation
  • A/B testing superfluous changes
    https://iterativepath.wordpress.com/2012/10/29/testing-40-shades-of-blue-ab-testing/
    http://stopdesign.com/archive/2009/03/20/goodbye-google.html
  • Ethics of experiments
    https://doi.org/10.1073/pnas.1320040111
    https://freedom-to-tinker.com/2014/07/08/on-the-ethics-of-ab-testing/
    http://observer.com/2014/07/okcupid-brags-about-manipulating-online-daters-in-secret-tests/