Adaptive Data Analysis
Machine learning in science and society
Christos Dimitrakakis
August 21, 2019
Adaptive Data Analysis August 21, 2019 1 / 53
Introduction to machine learning
1 Introduction to machine learning
2 Nearest neighbours
3 Reproducibility
Data analysis, learning and planning
$\sum_{t=1}^{T} r_t$
Experiment design
Bayesian inference
Figure: Tycho’s measurements of the orbit of Mars and the conclusion about the actual orbits, under the assumption of an earth-centric universe with circular orbits.
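Bayesian inference weighs each candidate model's prior probability by how well the model predicts the observed data, then normalises. The sketch below illustrates Bayes' rule over a finite set of models; it is a generic illustration rather than the analysis behind the figure, and the coin-bias models, the uniform prior and the function names `posterior` and `bernoulli` are all hypothetical. Exact fractions keep the arithmetic easy to check.

```python
from fractions import Fraction


def posterior(prior, likelihood, data):
    """Bayes' rule over finitely many models: P(m | data) proportional to P(m) * prod_x P(x | m)."""
    unnormalised = {}
    for model, prior_prob in prior.items():
        lik = Fraction(1)
        for x in data:
            # Assumption: observations are independent given the model.
            lik *= likelihood(x, model)
        unnormalised[model] = prior_prob * lik
    evidence = sum(unnormalised.values())  # P(data), the normalising constant
    return {model: w / evidence for model, w in unnormalised.items()}


def bernoulli(x, theta):
    """Hypothetical likelihood: probability of outcome x in {0, 1} under bias theta."""
    return theta if x == 1 else 1 - theta


# Hypothetical models: three candidate coin biases with a uniform prior.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {theta: Fraction(1, 3) for theta in thetas}
post = posterior(prior, bernoulli, [1, 1, 0, 1])
```

After three heads and one tail, belief shifts towards the larger bias: `post[Fraction(3, 4)]` equals `Fraction(27, 46)`, while `post[Fraction(1, 4)]` drops to `Fraction(3, 46)`.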
Course overview
Nearest neighbours
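1: Input: data $D = \{(x_1, y_1), \ldots, (x_T, y_T)\}$, number of neighbours $k \geq 1$, distance $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$, new point $x \in \mathcal{X}$.
2: $D = \mathrm{Sort}(D, d)$ % Sort $D$ so that $d(x, x_i) \leq d(x, x_{i+1})$.
3: $p_y = \sum_{i=1}^{k} \mathbb{I}\{y_i = y\} / k$ for $y \in \mathcal{Y}$.
4: Return $p \triangleq (p_y)_{y \in \mathcal{Y}}$.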
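The nearest neighbours algorithm can be sketched in Python as follows. This is a minimal sketch, assuming the data is a list of `(x, y)` pairs, the distance $d$ is passed in as a function, and the label set $\mathcal{Y}$ is given explicitly; the name `knn_probabilities` and the guard requiring $k \leq |D|$ are illustrative additions, not part of the original algorithm.

```python
from collections import Counter


def knn_probabilities(data, k, d, x, labels):
    """Estimate p_y for every label y from the k nearest neighbours of x.

    data: list of (x_i, y_i) pairs; d: distance function with d(x, x_i) >= 0.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    # Step 2: sort the data so that d(x, x_i) <= d(x, x_{i+1}).
    ordered = sorted(data, key=lambda pair: d(x, pair[0]))
    # Step 3: p_y = (number of the first k points with label y) / k.
    counts = Counter(y for _, y in ordered[:k])
    return {y: counts[y] / k for y in labels}


data = [(0.0, "a"), (1.0, "a"), (2.0, "b"), (3.0, "b")]
p = knn_probabilities(data, k=3, d=lambda u, v: abs(u - v), x=0.4, labels=["a", "b"])
```

Here the three nearest points to $x = 0.4$ are at $0$, $1$ and $2$, so `p` is approximately `{"a": 2/3, "b": 1/3}`. Labels that never appear among the neighbours get probability 0, because `Counter` returns 0 for missing keys.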
Figure: The nearest neighbours algorithm was introduced by Fix and Hodges Jr [3], who also proved consistency properties.
Reproducibility
Figure: The decision process in classification.
Decision probability: $\pi(a \mid x)$, the probability that the classifier outputs label $a$ for input $x$.
Expected accuracy: $U(\pi) = \sum_{x,y} P(x, y)\, \pi(a = y \mid x)$.
Holdout estimate: $U(\pi, D_H) = \frac{1}{|D_H|} \sum_{(x,y) \in D_H} \pi(a = y \mid x)$.
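A holdout estimate of a stochastic classifier's accuracy averages, over the holdout set $D_H$, the probability that the classifier assigns to the true label. This is a minimal sketch, assuming the utility is classification accuracy and that a classifier is represented as a function returning a dictionary of label probabilities; `holdout_utility` and `threshold_policy` are hypothetical names.

```python
def holdout_utility(policy, holdout):
    """Average probability that the classifier assigns to the true label on D_H."""
    if not holdout:
        raise ValueError("the holdout set D_H must be non-empty")
    # policy(x) returns a dict mapping each label a to pi(a | x); missing labels get 0.
    return sum(policy(x).get(y, 0.0) for x, y in holdout) / len(holdout)


def threshold_policy(x):
    """Hypothetical classifier: leans towards "a" for small x and towards "b" for large x."""
    return {"a": 0.75, "b": 0.25} if x < 1.5 else {"a": 0.25, "b": 0.75}


holdout = [(0.0, "a"), (1.0, "b"), (2.0, "b"), (3.0, "b")]
score = holdout_utility(threshold_policy, holdout)
```

The four holdout points contribute $0.75$, $0.25$, $0.75$ and $0.75$, so `score` is `0.625`. A deterministic classifier is the special case where each dictionary puts probability 1 on a single label.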
The human as an algorithm
Figure: Selecting algorithms and hyperparameters through holdouts
1 Select $\hat{\lambda}$ maximising $U(\pi_{\phi,\lambda}, D_H)$.
2 Evaluate $U(\pi_{\phi,\hat{\lambda}}, D^*)$.
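Selecting a hyperparameter $\lambda$ on a holdout set $D_H$ and then measuring the chosen policy on separate data $D^*$ can be sketched as follows. Everything concrete here is a hypothetical stand-in: the algorithm family is $k$-nearest-neighbour majority voting on 1-D inputs (with $k$ playing the role of $\lambda$), $U$ is accuracy, and the toy datasets are invented for illustration.

```python
from collections import Counter


def accuracy(predict, data):
    """U(pi, D): fraction of points in D whose label pi predicts correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def knn_algorithm(k):
    """Hypothetical algorithm family: k-NN majority vote on 1-D inputs."""
    def train(train_data):
        def predict(x):
            nearest = sorted(train_data, key=lambda pair: abs(pair[0] - x))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return train


def select_then_test(lambdas, train_data, holdout, test):
    # Step 1: fit pi_lambda on the training data and score it on the holdout set D_H.
    policies = {lam: knn_algorithm(lam)(train_data) for lam in lambdas}
    holdout_scores = {lam: accuracy(pi, holdout) for lam, pi in policies.items()}
    best = max(holdout_scores, key=holdout_scores.get)
    # Step 2: measure the selected policy once, on data D* not used for selection.
    return best, holdout_scores, accuracy(policies[best], test)


train_data = [(0, "a"), (1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "b")]
holdout = [(1.9, "a"), (3.1, "b")]
test = [(0.5, "a"), (4.5, "b")]
best, holdout_scores, test_score = select_then_test([1, 3], train_data, holdout, test)
```

With this data, $k = 1$ copies the two noisy training labels and scores 0 on $D_H$, while $k = 3$ scores 1, so `best == 3`. $D^*$ is kept separate because the holdout score of the winning $\lambda$ is the maximum over several candidates and therefore tends to be optimistically biased.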
Algorithmic sensitivity
Figure: Multiple samples
Figure: Bootstrap replicates of a single sample
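1 Input: training data $D$, number of samples $k$.
2 For $i = 1, \ldots, k$:
3   $D^{(i)} \leftarrow$ $|D|$ points drawn uniformly with replacement from $D$.
4 return $\{D^{(1)}, \ldots, D^{(k)}\}$.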
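Generating bootstrap replicates of a single sample can be sketched as below. This is a minimal sketch, assuming the standard bootstrap in which each replicate has $|D|$ points drawn uniformly with replacement from $D$; the function name `bootstrap_replicates` and the toy data are hypothetical. The usage lines show one way to gauge sensitivity: recompute a statistic (here the mean) on every replicate and look at its spread.

```python
import random
import statistics


def bootstrap_replicates(data, k, seed=None):
    """Return k bootstrap replicates, each with len(data) points drawn with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(k)]


# Usage: how sensitive is the sample mean to resampling the data?
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9]
replicates = bootstrap_replicates(data, k=200, seed=0)
means = [statistics.mean(r) for r in replicates]
spread = statistics.stdev(means)
```

A large `spread` relative to the statistic itself suggests that conclusions drawn from this one sample may not reproduce on fresh data.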
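1 Input: training data $D$, number of folds $k$, algorithm $\lambda$, measurement function $U$.
2 Create a partition $D^{(1)}, \ldots, D^{(k)}$ of $D$, so that $\bigcup_{i=1}^{k} D^{(i)} = D$.
3 For $i = 1, \ldots, k$:
4   Define the training set $D^{(i)}_T = D \setminus D^{(i)}$.
5   Train $\pi_i = \lambda(D^{(i)}_T)$.
6   Measure $y_i = U(\pi_i, D^{(i)})$.
7 return $\{y_1, \ldots, y_k\}$.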
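$k$-fold cross-validation can be sketched in Python as follows. This is a minimal sketch, assuming the data is already shuffled so that strided slices give a fair partition into disjoint folds; `majority_algorithm` (always predict the most common training label) and `accuracy` are hypothetical stand-ins for the algorithm $\lambda$ and the measurement function $U$.

```python
from collections import Counter


def k_fold_scores(data, k, algorithm, measure):
    """Return [y_1, ..., y_k], where y_i measures the policy trained without fold i on fold i."""
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= len(data)")
    # Disjoint folds whose union is D (assumes the data is already shuffled).
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        # Training set D_T^(i): every fold except the i-th.
        train = [point for j, fold in enumerate(folds) if j != i for point in fold]
        policy = algorithm(train)
        scores.append(measure(policy, folds[i]))
    return scores


def majority_algorithm(train):
    """Hypothetical algorithm: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


data = [(i, "a" if i % 4 else "b") for i in range(12)]
scores = k_fold_scores(data, k=3, algorithm=majority_algorithm, measure=accuracy)
```

Each fold here holds three "a" points and one "b" point, and every training set has a majority of "a", so `scores` is `[0.75, 0.75, 0.75]`. The spread of the returned scores is one indication of how sensitive the algorithm is to the particular training data.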
Beyond the data you have: simulation and replication
1 Define a data-generating process as close to the original dataset as possible.
2 Collect data according to your protocol.
3 Run the intended analysis.
4 Check whether the results are reasonable, or whether you need more statistical power.
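A simulation-based power check can be sketched as follows. The data-generating process is hypothetical: two groups of $n$ Gaussian observations with known standard deviation `sigma`, the second shifted by `effect`. The intended analysis is assumed to be a two-sided $z$-test on the difference in means at level `alpha`. Power is estimated as the fraction of simulated studies in which the test detects the effect.

```python
import math
import random
import statistics


def simulate_power(effect, n, trials=2000, sigma=1.0, alpha=0.05, seed=0):
    """Fraction of simulated studies in which a two-sided z-test rejects 'no effect'."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2.0 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(trials):
        # Steps 1-2: draw one dataset from the hypothetical data-generating process.
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(effect, sigma) for _ in range(n)]
        # Step 3: run the intended analysis.
        z = (statistics.mean(treated) - statistics.mean(control)) / se
        rejections += abs(z) > z_crit
    return rejections / trials


# Step 4: does the planned sample size give enough power for the effect we expect?
powers = {n: simulate_power(effect=0.5, n=n) for n in (10, 40, 160)}
```

With `effect=0.5`, the estimated power grows with the group size, from roughly 0.2 at `n=10` to nearly 1 at `n=160`. Setting `effect=0.0` gives a sanity check: the rejection rate should then be close to `alpha`.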
1 Create a simulation that allows you to collect data similar to the real one.
2 Collect data from the simulation and analyse it according to your protocol.
3 If the results are not as expected, alter the protocol or the simulation. In which …
4 Finally, use the best-performing method as the protocol.
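Choosing a protocol by simulation can be sketched as below: because the simulator's true parameter is known, each candidate analysis can be scored directly. Every concrete choice is a hypothetical illustration. The simulator draws Gaussian data in which roughly 10% of points are gross outliers, the candidate protocols are the sample mean and the sample median as estimators of the centre, and performance is measured by mean squared error.

```python
import random
import statistics


def simulate_dataset(rng, n, true_mean):
    """Hypothetical simulator: Gaussian data, about 10% of points drawn with a much larger spread."""
    return [
        rng.gauss(true_mean, 1.0) if rng.random() > 0.1 else rng.gauss(true_mean, 20.0)
        for _ in range(n)
    ]


def choose_protocol(candidates, trials=500, n=30, true_mean=2.0, seed=1):
    rng = random.Random(seed)
    squared_errors = {name: 0.0 for name in candidates}
    for _ in range(trials):
        # Collect simulated data and analyse it with every candidate protocol.
        data = simulate_dataset(rng, n, true_mean)
        for name, estimator in candidates.items():
            squared_errors[name] += (estimator(data) - true_mean) ** 2
    mse = {name: total / trials for name, total in squared_errors.items()}
    # Keep the best-performing method as the protocol.
    return min(mse, key=mse.get), mse


best, mse = choose_protocol({"mean": statistics.mean, "median": statistics.median})
```

Under this contaminated simulator the median should win clearly, because a handful of outliers inflate the variance of the mean. Under a clean Gaussian simulator the mean would have the lower error instead, which is why the simulation should resemble the real data as closely as possible.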
1 Reinterpret the original hypothesis and experiment.
2 Collect data according to the original protocol, unless flawed.
3 Run the analysis again, unless flawed.
4 See if the conclusions are in agreement.
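One simple way to make step 4 concrete is sketched below. The agreement criterion is a hypothetical choice, not one taken from the slides. Each study's conclusion is summarised as "positive", "negative" or "inconclusive" from an approximate 95% normal-approximation confidence interval for the difference in group means. Two studies agree if they reach the same label. The toy datasets are invented for illustration.

```python
import math
import statistics


def effect_ci(control, treated, z=1.96):
    """Approximate 95% confidence interval for mean(treated) - mean(control)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff - z * se, diff + z * se


def conclusion(ci):
    low, high = ci
    if low > 0:
        return "positive"
    if high < 0:
        return "negative"
    return "inconclusive"


def replication_agrees(original, replication):
    """Each study is a (control, treated) pair; compare their summarised conclusions."""
    return conclusion(effect_ci(*original)) == conclusion(effect_ci(*replication))


original = ([0, 1, 0, 1, 0, 1], [3, 4, 3, 4, 3, 4])
replication = ([0, 2, 0, 2], [2, 4, 2, 4])
null_replication = ([0, 2, 0, 2], [0, 2, 0, 2])
```

Here `replication_agrees(original, replication)` is `True`, since both intervals lie above zero. `replication_agrees(original, null_replication)` is `False`, because the second interval straddles zero. A finer comparison might also check whether the effect sizes are compatible, not just their signs.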