SLIDE 1
Modern machine learning methods for trustworthy science
Tom Charnock, Institut d'Astrophysique de Paris
Why neural networks don't work (and how to use them)
SLIDE 2
SLIDE 3
Why neural networks don't work
Tom Charnock Institut d'Astrophysique de Paris
SLIDE 4
Apologies about the term bias
- When something is intrinsically unknowable, it is biased
- If there is some offset, which could in principle be corrected, it is biased
SLIDE 5
Apologies about the term bias
- When something is intrinsically unknowable, it is biased
- If there is some offset, which could in principle be corrected, it is biased
I (almost always) mean the top one
SLIDE 6
An approximation to a model, $f : d \to t$:
$\mathbb{NN}(w, \theta) : d \to t$
SLIDE 7
A crazy likelihood surface of how likely we are to get targets from data
SLIDE 8
What are we actually interested in?
SLIDE 9
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}\theta\, P(t|d, w, \theta)\, P(w, \theta)$
SLIDE 10
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}\theta\, P(t|d, w, \theta)\, P(w, \theta)$
- Posterior predictive density $P(t|d)$: how likely are the true targets given some data?
- Likelihood $P(t|d, w, \theta)$: how likely are the targets to be generated by a particular network?
- Probability density $P(w, \theta)$: what is the probability of obtaining a particular network with particular parameter values?
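The posterior predictive integral can be estimated by Monte Carlo: draw networks from $P(w, \theta)$ and average the likelihood over the draws. A minimal sketch, assuming a hypothetical one-parameter "network" `nn` and Gaussian noise (both stand-ins for illustration, not anything from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def nn(w, d):
    # Toy one-parameter "network": a linear map standing in for NN(w, theta)
    return w * d

def likelihood(t, d, w, sigma=0.1):
    # P(t | d, w): Gaussian noise around the network output
    return np.exp(-0.5 * ((t - nn(w, d)) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Monte Carlo estimate of the posterior predictive:
# P(t|d) = ∫ dw P(t|d, w) P(w)  ≈  average of the likelihood over prior samples
w_samples = rng.normal(loc=1.0, scale=0.5, size=10_000)  # draws from P(w)
d, t = 2.0, 2.1
predictive = likelihood(t, d, w_samples).mean()
print(predictive)
```

The same average works with any sampler for $P(w, \theta)$; only the source of the weight samples changes in the methods that follow.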
SLIDE 11
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}\theta\, P(t|d, w, \theta)\, P(w, \theta)$
SLIDE 12
Where does this information about the weights and hyperparameters come from?
SLIDE 13
Training and validation data
SLIDE 14
Training and validation data
Training data and targets: $\{d, t\}_\mathrm{train} \equiv \{d_i, t_i \,|\, i \in [1, n_\mathrm{train}]\}$
Validation data and targets: $\{d, t\}_\mathrm{val} \equiv \{d_i, t_i \,|\, i \in [1, n_\mathrm{val}]\}$
Posterior distribution of weights and hyperparameters:
$P(w, \theta \,|\, \{d, t\}_\mathrm{train}, \{d, t\}_\mathrm{val}) \propto \mathcal{L}(w, \theta \,|\, \{d, t\}_\mathrm{train}, \{d, t\}_\mathrm{val})\, P(w, \theta)$
SLIDE 15
The failing of traditional training
SLIDE 16
The failing of traditional training
- Approximator: $\mathbb{NN}(w, \theta) : d \to t$ for the model $f : d \to t$
- Cost function: $\Lambda(w, \theta) = -\ln P(t^*|d^*, w, \theta)$, smooth and convex
- Likelihood: $P(t^*|d^*, w, \theta)$, complex and non-convex in $w$ and $\theta$
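For Gaussian noise, the negative log-likelihood cost reduces to the familiar squared error plus a constant. A toy sketch, where the linear `nn` and the noise scale `sigma` are assumptions for illustration; in this one-parameter toy the cost is convex in $w$, whereas for a deep network the same expression is non-convex in the weights:

```python
import numpy as np

sigma = 0.1  # assumed Gaussian noise scale

def nn(w, d):
    # Stand-in for NN(w, theta); a real network is non-linear in w
    return w * d

def cost(w, d, t):
    # Λ(w) = -ln P(t|d, w) for Gaussian noise: squared error plus a constant
    return 0.5 * ((t - nn(w, d)) / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))

d_star, t_star = 2.0, 4.0
print(cost(2.0, d_star, t_star))  # minimum: nn(2, d*) = 4 = t*
```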
SLIDE 17
Optimising (or training) a network
SLIDE 18
Optimising (or training) a network
What are the maximum likelihood estimates of the weights?
$w_\mathrm{MLE} = \underset{w}{\mathrm{argmax}} \left[ P(\{t\}_\mathrm{train} \,|\, \{d\}_\mathrm{train}, w, \theta^*) \right]$
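In practice the argmax is found by gradient descent on the negative log-likelihood. A minimal sketch with an assumed linear model and synthetic training data (for a deep network this same loop only finds a local maximum of the likelihood):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set {d, t}_train from an assumed "true" model t = 3 d + noise
d_train = rng.uniform(-1, 1, size=50)
t_train = 3.0 * d_train + rng.normal(scale=0.1, size=50)

def grad_nll(w):
    # Gradient of Λ(w) = Σ_i (t_i - w d_i)² / (2 σ²) with respect to w
    return -np.sum((t_train - w * d_train) * d_train) / 0.1**2

# Plain gradient descent to a (local) maximum-likelihood estimate w_MLE
w = 0.0
for _ in range(200):
    w -= 1e-4 * grad_nll(w)
print(w)  # close to 3
```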
SLIDE 19
Local maximum likelihood estimates
SLIDE 20
The main problem...
SLIDE 21
We degenerate the posterior
$P(w, \theta \,|\, \{d, t\}_\mathrm{train}) \propto \mathcal{L}(w, \theta \,|\, \{d, t\}_\mathrm{train})\, P(w, \theta) \to \delta(w - w_\mathrm{MLE},\, \theta - \theta^*)$
SLIDE 22
We degenerate the posterior
$P(w, \theta \,|\, \{d, t\}_\mathrm{train}) \propto \mathcal{L}(w, \theta \,|\, \{d, t\}_\mathrm{train})\, P(w, \theta) \to \delta(w - w_\mathrm{MLE},\, \theta - \theta^*)$
SLIDE 23
All predictions are (probably incorrect) estimates
$P(t|d) = P(t \,|\, d, w_\mathrm{MLE}, \theta^*)$
SLIDE 24
There is no way to interpret how close $\mathbb{NN}(w_\mathrm{MLE}, \theta^*)$ is to $f$...
SLIDE 25
There is no way to interpret how close $\mathbb{NN}(w_\mathrm{MLE}, \theta^*)$ is to $f$...
- Because the likelihood is non-interpretably complex
SLIDE 26
Are there better methods?
SLIDE 27
Variational inference
SLIDE 28
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}w'\,\mathrm{d}\theta\, P(t|d, w, \theta)\, Q(w \,|\, w', \theta, \{d, t\}_\mathrm{train})\, P(w', \theta)$
SLIDE 29
Still depends on fixed weights in the complex likelihood surface and on the choice of variational distribution:
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}w'\,\mathrm{d}\theta\, P(t|d, w, \theta)\, Q(w \,|\, w', \theta, \{d, t\}_\mathrm{train})\, \delta(w' - w_\mathrm{MLE},\, \theta - \theta^*)$
$= \int \mathrm{d}w\, P(t|d, w, \theta^*)\, Q(w \,|\, w_\mathrm{MLE}, \theta^*, \{d, t\}_\mathrm{train})$
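Once fitted, such a network predicts by sampling weights from the variational distribution. The sketch below assumes a Gaussian $Q$ centred on fixed maximum-likelihood weights with a chosen width `s`; both the centre and the form of $Q$ are inputs we choose, which is exactly the dependence being pointed out:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical variational distribution Q(w | w_MLE, s): a Gaussian centred on
# the fixed maximum-likelihood weight, with a width s that we picked by hand.
w_mle, s = 3.0, 0.2

def nn(w, d):
    # Stand-in for NN(w, theta)
    return w * d

# Predictive samples: draw w ~ Q and push each draw through the network
d = 1.0
w_samples = rng.normal(w_mle, s, size=5000)
predictions = nn(w_samples, d)
print(predictions.mean(), predictions.std())  # spread is set by our choice of Q
```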
SLIDE 30
Bayesian neural networks
SLIDE 31
Bayesian neural networks
SLIDE 32
Sample the likelihood of the training data
$P(t|d) = \int \mathrm{d}w\,\mathrm{d}\theta\, P(t|d, w, \theta)\, P(w, \theta \,|\, \{d, t\}_\mathrm{train})$
$\propto \int \mathrm{d}w\,\mathrm{d}\theta\, P(t|d, w, \theta) \times \prod_{i=1}^{n_\mathrm{train}} P(t_i \,|\, d_i, w, \theta)\, P(w, \theta)$
SLIDE 33
Still dependent on the training data!
Classical network: $P(w, \theta \,|\, \{d, t\}_\mathrm{train}) \to \delta(w - w_\mathrm{MLE},\, \theta - \theta^*)$
Variational inference: $P(w, \theta \,|\, \{d, t\}_\mathrm{train}) = Q(w \,|\, w_\mathrm{MLE}, \theta^*, \{d, t\}_\mathrm{train})$
Bayesian networks: $P(w, \theta \,|\, \{d, t\}_\mathrm{train}) \propto \prod_{i=1}^{n_\mathrm{train}} P(t_i \,|\, d_i, w, \theta)\, P(w, \theta)$
SLIDE 34
Problems with physical models...
SLIDE 35
Problems with physical models...
SLIDE 36
How can we use a neural network then?
SLIDE 37
Build it into the physical model
SLIDE 38
Method 1 : Infer the data, physics and the neural network
SLIDE 39
SLIDE 40
Method 2 : Understand the likelihood (using neural physical engines)
SLIDE 41
SLIDE 42
Method 3 : Likelihood-free inference
SLIDE 43
SLIDE 44
Compare the distance between observed summaries and simulation summaries, and select results within some tolerance $\epsilon$
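This accept/reject scheme is rejection approximate Bayesian computation. A minimal sketch, assuming a hypothetical simulator whose summary is a sample mean, a uniform prior, and an absolute-difference distance (all choices made for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_summary(mu, n=100):
    # Hypothetical simulator: the summary is the mean of n noisy draws
    return rng.normal(mu, 1.0, size=n).mean()

observed_summary = 0.5
epsilon = 0.05  # tolerance on the summary distance

# Rejection ABC: draw from the prior, simulate, keep draws within epsilon
accepted = []
for _ in range(20_000):
    mu = rng.uniform(-2, 2)                   # draw from the prior
    s = simulate_summary(mu)                  # summary of one simulation
    if abs(s - observed_summary) < epsilon:   # distance criterion
        accepted.append(mu)
accepted = np.array(accepted)
print(accepted.mean(), len(accepted))  # approximate posterior samples of mu
```

Shrinking `epsilon` tightens the approximation to the true posterior at the cost of a lower acceptance rate, which is the usual ABC trade-off.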
SLIDE 45
Conclusions
SLIDE 46
Conclusions
- Neural networks are not to be trusted
- They can make trusty companions when the correct framework is introduced
- Using statistics we can build neural networks into the forward model to get unbiased results
SLIDE 47