SLIDE 1

Adaptive Data Analysis

Machine learning in science and society
Christos Dimitrakakis
August 21, 2019

  • C. Dimitrakakis

Adaptive Data Analysis August 21, 2019 1 / 53

SLIDE 2

Introduction

SLIDE 3: Introduction to machine learning

1. Introduction to machine learning
   • Data analysis, learning and planning
   • Experiment design
   • Bayesian inference
   • Course overview
2. Nearest neighbours
3. Reproducibility

SLIDE 4

Scientific applications

SLIDE 5

Scientific applications

Interpretability, Reproducibility

SLIDE 6

Pervasive “intelligent” systems

• Home assistants
• Autonomous vehicles
• Web advertising
• Ridesharing
• Lending
• Public policy

SLIDE 7

Pervasive “intelligent” systems

• Home assistants
• Autonomous vehicles
• Web advertising
• Ridesharing
• Lending
• Public policy

Privacy, Fairness, Safety

SLIDE 8: Data analysis, learning and planning

What can machine learning do?

SLIDE 9

Can machines learn from data?

An unsupervised learning problem: topic modelling

SLIDE 10

Can machines learn from data?

A supervised learning problem: object recognition

SLIDE 11

Can machines learn from their mistakes?

Reinforcement learning

Take actions a_1, …, a_T, so as to maximise the utility U = ∑_{t=1}^T r_t
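The objective above can be made concrete with a toy bandit problem. Everything in this sketch is invented for illustration (the two arm means, the ε = 0.1 exploration rate, the horizon T): the agent picks actions a_t, receives rewards r_t, and accumulates the utility U = ∑ r_t.

```python
import random

rng = random.Random(0)

# Hypothetical two-armed bandit: arm 1 pays more on average than arm 0.
arm_means = [0.3, 0.7]

def pull(a):
    return arm_means[a] + rng.uniform(-0.1, 0.1)   # noisy reward r_t

T = 1000
counts, totals = [0, 0], [0.0, 0.0]
U = 0.0                                            # utility U = sum of r_t
for t in range(T):
    if 0 in counts or rng.random() < 0.1:          # explore occasionally
        a = rng.randrange(2)
    else:                                          # exploit the best arm so far
        a = max((0, 1), key=lambda i: totals[i] / counts[i])
    r = pull(a)
    counts[a] += 1
    totals[a] += r
    U += r

print(U)
```

Because each reward is at most ±0.1 from its mean, after one pull of each arm the empirical estimate of arm 1 always exceeds that of arm 0, so the greedy step settles on the better arm and U approaches 0.7 per step.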

SLIDE 12

Can machines make complex plans?

SLIDE 13

Machines can make complex plans!

SLIDE 14: Experiment design

The scientific process as machine learning

SLIDE 15

SLIDE 16

Adam, the robot scientist

SLIDE 17

Drug discovery

SLIDE 18

Drawing conclusions from results

hypothesis → experiment → result → conclusion

SLIDE 19: Bayesian inference

Tycho Brahe’s minute eye measurements

Figure: Tycho’s measurements of the orbit of Mars and the conclusion about the actual orbits, under the assumption of an earth-centric universe with circular orbits.

Hypothesis: Earth-centric, circular orbits
Conclusion: Specific circular orbits

SLIDE 20

Johannes Kepler’s alternative hypothesis

Hypothesis: Circular or elliptic orbits
Conclusion: Specific elliptic orbits

SLIDE 21

200 years later, Gauss formalised this statistically

SLIDE 22

A warning: The dead salmon mirage

SLIDE 23

A simple simulation study

src/reproducibility/mri_analysis.ipynb
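The notebook itself is not reproduced here, but the "dead salmon" warning of the previous slide can be illustrated with a minimal stand-in simulation. Under the null hypothesis p-values are uniform on [0, 1], so among many noise-only voxels an uncorrected test at level α flags roughly α·n of them as "significant". The voxel count, seed and α below are arbitrary choices for this sketch.

```python
import random

rng = random.Random(0)
n_voxels, alpha = 10_000, 0.05

# Under the null hypothesis, every voxel's p-value is uniform on [0, 1].
p_values = [rng.random() for _ in range(n_voxels)]

# Uncorrected testing: expect about alpha * n_voxels false "discoveries".
uncorrected = sum(p < alpha for p in p_values)

# Bonferroni correction: test each voxel at level alpha / n_voxels instead.
corrected = sum(p < alpha / n_voxels for p in p_values)

print(uncorrected, corrected)
```

With 10,000 pure-noise voxels, around 500 survive the uncorrected test, while the corrected test flags essentially none: this is the mechanism behind spurious fMRI findings.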

SLIDE 24

Planning future experiments

hypothesis → experiment → result → conclusion

SLIDE 25

Planning experiments is like Tic-Tac-Toe

SLIDE 26

Eve, another robot scientist

Discovered a malaria drug

SLIDE 27: Course overview

Machine learning in practice

Avoiding pitfalls

• Choosing hypotheses.
• Correctly interpreting conclusions.
• Using a good testing methodology.

Machine learning in society

• Privacy
• Fairness
• Safety

SLIDE 28

Machine learning in practice

Avoiding pitfalls

• Choosing hypotheses.
• Correctly interpreting conclusions.
• Using a good testing methodology.

Machine learning in society

• Privacy: credit risk.
• Fairness
• Safety

SLIDE 29

Machine learning in practice

Avoiding pitfalls

• Choosing hypotheses.
• Correctly interpreting conclusions.
• Using a good testing methodology.

Machine learning in society

• Privacy: credit risk.
• Fairness: job market.
• Safety

SLIDE 30

Machine learning in practice

Avoiding pitfalls

• Choosing hypotheses.
• Correctly interpreting conclusions.
• Using a good testing methodology.

Machine learning in society

• Privacy: credit risk.
• Fairness: job market.
• Safety: medicine.

SLIDE 31

Course structure

Module structure

• Activity-based, hands-on.
• Mini-lectures with short exercises in each class.
• Technical tutorials and labs in alternate weeks.

Modules

• Three mini-projects.
• Simple decision problems: credit risk.
• Sequential problems: medical diagnostics and treatment.

SLIDE 32

Technical topics

Machine learning problems

• Unsupervised learning.
• Supervised learning.
• Reinforcement learning.

Algorithms and models

• Bayesian inference and graphical models.
• Stochastic optimisation and neural networks.
• Backwards induction and Markov decision processes.

SLIDE 33

Further reading

Bennett et al. [2] describe how the usual uncorrected analysis of fMRI data leads to the conclusion that the dead salmon can reason about human images. Bennett et al. [1] discuss how to perform analyses of medical images in a principled way. They also introduce the use of simulations in order to test how well a particular method is going to perform.

Resources

• Online QA platform: https://piazza.com/class/jufgabrw4d57nh
• Course code and notes: https://github.com/olethrosdc/ml-society-science
• Book: https://github.com/olethrosdc/ml-society-science/notes.pdf

SLIDE 34: Nearest neighbours

1. Introduction to machine learning
2. Nearest neighbours
3. Reproducibility

SLIDE 35

Discriminating between diseases

(Figure: spectral statistics for the VVX strain and for BUT.)

SLIDE 36

Nearest neighbour: the hidden secret of machine learning

(Figure: scatter plot of spectral statistics for the BUT and VVJ strains.)

SLIDE 37

Comparing spectral data

(Figure: two spectra compared.)

SLIDE 39

The nearest neighbour algorithm

Algorithm 1: k-NN Classify

1: Input: data D = {(x_1, y_1), …, (x_T, y_T)}, k ≥ 1, distance d : X × X → R_+, new point x ∈ X
2: D = Sort(D, d)  % sort D so that d(x, x_i) ≤ d(x, x_{i+1})
3: p_y = ∑_{i=1}^k I{y_i = y} / k for each y ∈ Y
4: Return p ≜ (p_y)_{y ∈ Y}

Algorithm parameters

• Neighbourhood size k ≥ 1.
• Distance d : X × X → R_+.

What does the algorithm output when k = T?
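A minimal Python sketch of the k-NN classifier above; the function and variable names are my own, not the course's:

```python
from collections import Counter

def knn_probs(D, k, d, x):
    """Class probabilities p_y from the k nearest neighbours of x in D."""
    nearest = sorted(D, key=lambda pair: d(x, pair[0]))[:k]   # sort by distance to x
    counts = Counter(y for _, y in nearest)                   # count labels among the k
    return {y: counts[y] / k for y in counts}

# Toy 1-D example with an absolute-difference distance.
D = [(0.0, "A"), (0.1, "A"), (0.9, "B"), (1.0, "B")]
d = lambda u, v: abs(u - v)
print(knn_probs(D, 3, d, 0.2))
```

With k = T every point is a neighbour, so the output reduces to the empirical class frequencies of D, which answers the question above.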

SLIDE 40

Figure: The nearest neighbours algorithm was introduced by Fix and Hodges Jr [3], who also proved consistency properties.

SLIDE 41

Nearest neighbour: What type is the new bacterium?

(Figure: scatter plot of BUT and VVJ samples, with a new unlabelled point marked "?".)

SLIDE 42

Nearest neighbour: What type is the new bacterium?

(Figure: the same scatter plot of BUT and VVJ samples, with the new point marked "?".)

What if it is a completely different strain?

SLIDE 43

Separating the model from the classification policy

The k-NN algorithm returns a model giving class probabilities for new data points.

Deciding a class given the model:

π(a | x) = I{p_a ≥ p_y ∀ y}, where p = k-NN(D, k, d, x)
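The decision rule above simply picks an action a maximising p_a. A sketch, with hypothetical names:

```python
def decide(p):
    """Deterministic policy: choose the action a with p_a >= p_y for all y."""
    return max(p, key=p.get)

# Example with the model's class probabilities for a new point.
print(decide({"BUT": 0.7, "VVJ": 0.3}))
```

Separating the probability model from this policy means the same model can later feed a different policy, e.g. one that abstains when no class is clearly dominant.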

SLIDE 44

Hands on with Python console

src/decision-problems/knn-classify.py src/decision-problems/KNN.ipynb

SLIDE 45

Discussion: Shortcomings of k-nearest neighbour

• Choice of k.
• Choice of metric d.
• Representation of uncertainty.
• Scaling with large amounts of data.
• Meaning of label probabilities.

SLIDE 46

Learning outcomes

Understanding

• How kNN works.
• The effect of the hyperparameters k and d for nearest neighbour.
• The use of kNN to classify new data.

Skills

• Use a standard kNN class in Python.
• Optimise kNN hyperparameters in an unbiased manner.
• Calculate probabilities of class labels using kNN.

Reflection

• When is kNN a good model?
• How can we deal with large amounts of data?
• How can we best represent uncertainty?

SLIDE 47: Reproducibility

1. Introduction to machine learning
2. Nearest neighbours
3. Reproducibility
   • The human as an algorithm
   • Algorithmic sensitivity
   • Beyond the data you have: simulation and replication

SLIDE 48

Computational reproducibility: Can the study be repeated?

Can we, from the available information and data, exactly reproduce the reported methods and results?
• Jupyter notebooks
• svn, git or mercurial version control systems

Scientific reproducibility: Is the conclusion correct?

Can we, from the available information and a new set of data, reproduce the conclusions of the original study?

When publishing results about a new method, computational reproducibility is essential for scientific reproducibility.

SLIDE 49

SLIDE 50

The principle of independent evaluation

Data used for estimation cannot be used for evaluation.

SLIDE 51

χ (Data Collection)

Figure: The decision process in classification.

SLIDE 52

χ (Data Collection); D_T (Training)

Figure: The decision process in classification.

SLIDE 53

χ (Data Collection); D_T (Training); λ (Algorithm, hyperparameters)

Figure: The decision process in classification.

SLIDE 54

χ (Data Collection); D_T (Training); λ (Algorithm, hyperparameters); π (Classifier)

Figure: The decision process in classification.

SLIDE 55

χ (Data Collection); D_T (Training); λ (Algorithm, hyperparameters); π (Classifier); D_H (Holdout)

Figure: The decision process in classification.

SLIDE 57

χ (Data Collection); D_T (Training); λ (Algorithm, hyperparameters); π (Classifier); D_H (Holdout); U (Measurement)

Figure: The decision process in classification.

SLIDE 58

χ (Data Collection); D_T (Training); λ (Algorithm, hyperparameters); π (Classifier); D_H (Holdout); U (Measurement)

Figure: The decision process in classification.

Classification accuracy

E_χ[U(π)] = ∑_{x,y} P_χ(x, y) π(a = y | x),

where P_χ(x, y) is the data probability and π(a = y | x) is the decision probability.
SLIDE 59

χ Data Collection DT Training λ Algorithm, hyperparameters π Classifier DH Holdout U Measurement

Figure: The decision process in classification.

Classification accuracy

E_{D_H}[U(π)] = ∑_{(x,y) ∈ D_H} π(a = y | x) / |D_H|.
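For a deterministic classifier, π(a = y | x) is 1 when the prediction matches the label and 0 otherwise, so the holdout estimate above is just the fraction of D_H classified correctly. A sketch with invented names and data:

```python
def holdout_accuracy(pi, D_H):
    """Empirical estimate of E[U(pi)] over the holdout set D_H."""
    return sum(pi(x) == y for x, y in D_H) / len(D_H)

# Hypothetical threshold classifier evaluated on four holdout points.
D_H = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
pi = lambda x: int(x > 0.5)
print(holdout_accuracy(pi, D_H))
```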

SLIDE 60: The human as an algorithm

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 61

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 62

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters); π_1 (Classifier)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 63

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters); π_1 (Classifier); U_1 (Measurement)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 64

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters); π_1 (Classifier); U_1 (Measurement); λ_2 (Algorithm, hyperparameters)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 65

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters); π_1 (Classifier); U_1 (Measurement); λ_2 (Algorithm, hyperparameters); π_2 (Classifier)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 66

The human as an algorithm.

χ (Data Collection); D_T (Training); D_H (Holdout); λ_1 (Algorithm, hyperparameters); π_1 (Classifier); U_1 (Measurement); λ_2 (Algorithm, hyperparameters); π_2 (Classifier); U_2 (Measurement)

Figure: Selecting algorithms and hyperparameters through holdouts

SLIDE 67

Holdout sets

• Original data D, e.g. D = (x_1, …, x_T).
• Training data D_T ⊂ D, e.g. D_T = (x_1, …, x_n), n < T.
• Holdout data D_H = D \ D_T, used to measure the quality of the result.
• Algorithm λ with hyperparameters φ.
• Get algorithm output π = λ(D_T, φ).
• Calculate quality of output U(π, D_H).

Holdout and test sets for unbiased algorithm comparison

Algorithm 2: Unbiased adaptive evaluation through data partitioning

1: Partition data into D_T, D_H, D_*.
2: for λ ∈ Λ do
3:   for φ ∈ Φ_λ do
4:     π_{φ,λ} = λ(D_T, φ)
5:   end for
6:   Get π*_λ maximising U(π_{φ,λ}, D_H).
7:   u_λ = U(π*_λ, D_*)
8: end for
9: λ* = arg max_λ u_λ
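Algorithm 2 can be sketched in a few lines. Everything concrete here is invented for illustration: the threshold "algorithm" standing in for λ, the grid Φ, the label-noise rate and the split sizes. The discipline being demonstrated is that D_H is used only for selection and D_* only for the final report.

```python
import random

rng = random.Random(0)

# Toy data: x in [0, 1], true label 1 iff x > 0.5, with 10% label noise.
D = [(x, int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5))
     for x in (rng.random() for _ in range(300))]
rng.shuffle(D)
D_T, D_H, D_star = D[:100], D[100:200], D[200:]   # training / holdout / test

def fit(D_T, phi):
    # Stand-in "algorithm" lambda: a fixed-threshold classifier, hyperparameter phi.
    return lambda x: int(x > phi)

def U(pi, D_eval):
    # Accuracy of classifier pi on an evaluation set.
    return sum(pi(x) == y for x, y in D_eval) / len(D_eval)

Phi = [0.1, 0.3, 0.5, 0.7, 0.9]
best_phi = max(Phi, key=lambda phi: U(fit(D_T, phi), D_H))  # select on D_H only
u_final = U(fit(D_T, best_phi), D_star)                     # report on untouched D_*
print(best_phi, u_final)
```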

SLIDE 68

Final performance measurement

χ (Data Collection); D_T (Training); D_H (Holdout); D_* (Testing); η (human); λ_1 (Algorithm 1) → π_1 (Classifier 1) → U*_1 (Result 1); λ_2 (Algorithm 2) → π_2 (Classifier 2) → U*_2 (Result 2)

SLIDE 69: Algorithmic sensitivity

Independent data sets

χ (Experiment); λ (Algorithm); D_1 (1st sample) → π_1 (1st Result)

Figure: Multiple samples

SLIDE 70

Independent data sets

χ (Experiment); λ (Algorithm); D_1 (1st sample) → π_1 (1st Result); D_2 (2nd sample) → π_2 (2nd Result)

Figure: Multiple samples

SLIDE 71

Bootstrap samples

χ (Experiment); D_T (training); λ (Algorithm); D_1 (1st sample) → π_1 (1st Result); D_2 (2nd sample) → π_2 (2nd Result)

Figure: Bootstrap replicates of a single sample

SLIDE 72

Bootstrapping

Bootstrapping is a general technique that can be used to:
• Estimate the sensitivity of λ to the data x.
• Obtain a distribution of estimates π from λ and the data x.
• When estimating the performance of an algorithm on a small dataset D_*, use bootstrap samples of D_*.

Bootstrapping

1: Input: training data D, number of samples k
2: for i = 1, …, k do
3:   D(i) = Bootstrap(D)
4: end for
5: Return {D(i) : i = 1, …, k}

where Bootstrap(D) samples |D| points with replacement from D.
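The procedure can be sketched directly in Python. The choice of statistic (the sample mean), the data and the number of replicates are my own illustrative choices:

```python
import random

def bootstrap_samples(D, k, seed=0):
    """k bootstrap replicates, each |D| points drawn with replacement from D."""
    rng = random.Random(seed)
    return [[rng.choice(D) for _ in D] for _ in range(k)]

# Sensitivity of a simple estimator (the sample mean) to the data:
D = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
means = sorted(sum(s) / len(s) for s in bootstrap_samples(D, k=1000))
lo, hi = means[25], means[974]     # crude 95% interval over the replicates
print(lo, hi)
```

The spread of the replicate means is an estimate of how much the statistic would vary had we drawn a fresh dataset of the same size.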

SLIDE 73

Cross-validation

k-fold Cross-Validation

1: Input: training data D_T, number of folds k, algorithm λ, measurement function U
2: Create the partition D(1), …, D(k) so that ∪_{i=1}^k D(i) = D_T
3: for i = 1, …, k do
4:   Define D_T(i) = D_T \ D(i)
5:   π_i = λ(D_T(i))
6:   u_i = U(π_i, D(i))
7: end for
8: Return {u_1, …, u_k}
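The procedure can be sketched as follows; the majority-label "classifier" is an invented stand-in for λ, and the striding partition is one simple way to create disjoint folds:

```python
def k_fold_cv(D, k, fit, U):
    """Train on D minus each fold, evaluate on the held-out fold."""
    folds = [D[i::k] for i in range(k)]                      # disjoint partition of D
    scores = []
    for i in range(k):
        D_train = [z for j, fold in enumerate(folds) if j != i for z in fold]
        scores.append(U(fit(D_train), folds[i]))             # never score on training data
    return scores

# Stand-in algorithm: always predict the training data's majority label.
fit = lambda D_train: max((0, 1), key=[y for _, y in D_train].count)
U = lambda label, fold: sum(y == label for _, y in fold) / len(fold)

D = [(i, 1) for i in range(8)] + [(i, 0) for i in range(2)]
scores = k_fold_cv(D, 5, fit, U)
print(sum(scores) / len(scores))
```

Averaging the k scores uses every point for evaluation exactly once, which is why cross-validation is preferred over a single holdout when data is scarce.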

SLIDE 74: Beyond the data you have: simulation and replication

Simulation

Steps for a simulation pre-study

1. Define a data-generating process as close to the original dataset as possible.
2. Collect data according to your protocol.
3. Run the intended analysis.
4. See if the results are reasonable, or if you need more power.

SLIDE 75

Simulation

Simulation study

1. Create a simulation that allows you to collect data similar to the real one.
2. Collect data from the simulation and analyse it according to your protocol.
3. If the results are not as expected, alter the protocol or the simulation. In which cases do you get good results?
4. Finally, use the best-performing method as the protocol.

SLIDE 76

Independent replication

Replication study

1. Reinterpret the original hypothesis and experiment.
2. Collect data according to the original protocol, unless flawed.
3. Run the analysis again, unless flawed.
4. See if the conclusions are in agreement.

SLIDE 77

Learning outcomes

Understanding

• What is a hold-out set, cross-validation and bootstrapping.
• The idea of not reusing data input to an algorithm to evaluate it.
• The fact that algorithms can be implemented by both humans and machines.

Skills

• Use git and notebooks to document your work.
• Use hold-out sets or cross-validation to compare parameters/algorithms in Python.
• Use bootstrapping to get estimates of uncertainty in Python.

Reflection

• What is a good use case for cross-validation over hold-out sets?
• When is it a good idea to use bootstrapping?
• How can we use the above techniques to avoid the false discovery problem?
• Can these techniques fully replace independent replication?

SLIDE 78: References

[1] Craig M. Bennett, George L. Wolford, and Michael B. Miller. The principled control of false positives in neuroimaging. Social Cognitive and Affective Neuroscience, 4(4):417–422, 2009. URL https://pdfs.semanticscholar.org/19c3/d8b67564d0e287a43b1e7e0f496eb1e8a945.pdf.

[2] Craig M. Bennett, Abigail A. Baird, Michael B. Miller, and George L. Wolford. Journal of Serendipitous and Unexpected Results (jsur.org), 1(1):1–5, 2012. URL https://teenspecies.github.io/pdfs/NeuralCorrelates.pdf.

[3] Evelyn Fix and Joseph L. Hodges Jr. Discriminatory analysis: nonparametric discrimination: consistency properties. Technical report, California Univ Berkeley, 1951.
