Modern machine learning methods for trustworthy science - Tom Charnock - PowerPoint PPT Presentation





SLIDE 1

Modern machine learning methods for trustworthy science

Tom Charnock, Institut d'Astrophysique de Paris


SLIDE 2

Why neural networks don't work (and how to use them)

Tom Charnock, Institut d'Astrophysique de Paris


SLIDE 3

Why neural networks don't work

Tom Charnock, Institut d'Astrophysique de Paris


SLIDE 4

Apologies about the term "bias"

- When something is intrinsically unknowable, it is biased.
- If there is some offset, which could in principle be corrected, it is biased.

SLIDE 5

Apologies about the term "bias"

- When something is intrinsically unknowable, it is biased.
- If there is some offset, which could in principle be corrected, it is biased.

I (almost always) mean the top one.

SLIDE 6

An approximation to a model:

f : x → y
NN(w, η) : x → y
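
As a concrete toy sketch of this idea, NN(w, η) is just a parametrised function approximating f : x → y. The network shape, and treating η as the hidden width, are illustrative assumptions rather than the deck's actual model:

```python
import numpy as np

def nn(x, w, eta):
    """A tiny fully-connected network NN(w, eta): x -> y.
    w is a flat parameter vector; eta here is just the hidden width."""
    hidden = eta
    w1 = w[:hidden].reshape(1, hidden)                 # input -> hidden weights
    b1 = w[hidden:2 * hidden]                          # hidden biases
    w2 = w[2 * hidden:3 * hidden].reshape(hidden, 1)   # hidden -> output weights
    h = np.tanh(x @ w1 + b1)                           # hidden activations
    return (h @ w2).ravel()                            # y for each input x

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
w = rng.normal(size=3 * 4)                             # parameters for 4 hidden units
y = nn(x, w, eta=4)                                    # one prediction per input
```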

SLIDE 7

A crazy likelihood surface: how likely we are to get targets from the data

SLIDE 8

What are we actually interested in?

SLIDE 9

P(y|x) = ∫ P(y|x, w, η) P(w, η) dw dη

SLIDE 10

P(y|x) = ∫ P(y|x, w, η) P(w, η) dw dη

- Posterior predictive density P(y|x): how likely are the true targets given some data?
- Likelihood P(y|x, w, η): how likely are the targets to be generated by a particular network?
- Probability density P(w, η): what is the probability of obtaining a particular network with particular parameter values?

SLIDE 11

P(y|x) = ∫ P(y|x, w, η) P(w, η) dw dη

SLIDE 12

Where does this information about the weights and hyperparameters come from?

SLIDE 13

Training and validation data

SLIDE 14

Training and validation data

- Training data and targets: {x, y}_train ≡ {(x_i, y_i) | i ∈ [1, n_train]}
- Validation data and targets: {x, y}_val ≡ {(x_i, y_i) | i ∈ [1, n_val]}
- Posterior distribution of weights and hyperparameters:

P(w, η | {x, y}_train, {x, y}_val) ∝ P({x, y}_train, {x, y}_val | w, η) P(w, η)

SLIDE 15

The failing of traditional training

SLIDE 16

The failing of traditional training

- Approximator: f : x → y, NN(w, η) : x → y
- Cost function and likelihood: Λ(w, η) = −ln P(y | x, w, η)
- Smooth and convex near the optimum (w*, η*)
- Complex and non-convex in w and η
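
For a Gaussian likelihood, the cost Λ(w, η) = −ln P(y | x, w, η) reduces to a scaled squared error plus a constant. A hedged sketch, assuming unit-variance Gaussian noise (the function name and noise model are illustrative):

```python
import numpy as np

def neg_log_likelihood(y, pred, sigma=1.0):
    """Λ(w, η) = −ln P(y | x, w, η) for a Gaussian likelihood:
    quadratic (smooth, convex) in the network output pred, but
    non-convex once pred = NN(w, η)(x) is composed in."""
    n = len(y)
    residual = 0.5 * np.sum((y - pred) ** 2) / sigma**2
    constant = 0.5 * n * np.log(2.0 * np.pi * sigma**2)
    return residual + constant

y = np.array([1.0, 2.0])
print(neg_log_likelihood(y, y))  # residual term vanishes; only the constant remains
```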

SLIDE 17

Optimising (or training) a network

SLIDE 18

Optimising (or training) a network

What are the maximum likelihood estimates of the weights?

w_MLE = argmax_w [P({y}_train | {x}_train, w, η*)]
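
The argmax is usually chased by gradient descent on the negative log-likelihood. A toy sketch for a one-parameter linear model (the data, learning rate, and step count are illustrative choices):

```python
import numpy as np

# Toy model: y = a * x + noise; find the MLE of a by gradient descent
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)

a = 0.0                                    # initial weight
lr = 0.1                                   # learning rate
for _ in range(200):
    grad = -np.mean((y - a * x) * x)       # gradient of the mean negative log-likelihood
    a -= lr * grad                         # gradient descent step
print(a)                                   # converges close to the true slope 2.0
```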

SLIDE 19

Local maximum likelihood estimates

SLIDE 20

The main problem...

SLIDE 21

We degenerate the posterior

P(w, η | {x, y}_train) ∝ P({x, y}_train | w, η) P(w, η) → δ(w − w_MLE, η − η*)

SLIDE 22

We degenerate the posterior

P(w, η | {x, y}_train) ∝ P({x, y}_train | w, η) P(w, η) → δ(w − w_MLE, η − η*)

SLIDE 23

All predictions are (probably incorrect) point estimates

P(y|x) = δ(y − ŷ)

SLIDE 24

There is no way to interpret how close ŷ is to y...

SLIDE 25

There is no way to interpret how close ŷ is to y...

  • Because the likelihood is non-interpretably complex

SLIDE 26

Are there better methods?

SLIDE 27

Variational inference

SLIDE 28

P(y|x) = ∫ P(y|x, w, η) Q(w | ϑ, η, {x, y}_train) P(η, ϑ) dw dη dϑ

SLIDE 29

Still depends on fixed weights in the complex likelihood surface, and on the choice of variational distribution:

P(y|x) = ∫ P(y|x, w, η) Q(w | ϑ, η, {x, y}_train) × δ(ϑ − ϑ_MLE, η − η*) dw dη dϑ
       = ∫ P(y|x, w, η*) Q(w | ϑ_MLE, η*, {x, y}_train) dw.

SLIDE 30

Bayesian neural networks

SLIDE 31

Bayesian neural networks

SLIDE 32

Sample the likelihood of the training data

P(y|x) = ∫ P(y|x, w, η) P(w, η | {x, y}_train) dw dη
       ∝ ∫ P(y|x, w, η) × ∏_i P(y_i^train | x_i^train, w, η) P(w, η) dw dη.

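Sampling the weight posterior can be sketched with a random-walk Metropolis chain on a one-weight toy model; the data, prior, likelihood, and proposal scale are all illustrative choices:

```python
import numpy as np

# Metropolis sampling of P(w | {x,y}_train) ∝ ∏_i P(y_i | x_i, w) P(w)
# for a toy one-weight model y = w * x.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.5 * x + rng.normal(scale=0.2, size=50)

def log_post(w):
    log_like = -0.5 * np.sum((y - w * x) ** 2) / 0.2**2   # Gaussian likelihood
    log_prior = -0.5 * w**2                                # standard normal prior
    return log_like + log_prior

samples, w = [], 0.0
for _ in range(5000):
    prop = w + rng.normal(scale=0.1)                       # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(w):
        w = prop                                           # accept the proposal
    samples.append(w)
post = np.array(samples[1000:])                            # drop burn-in
print(post.mean(), post.std())                             # point estimate plus honest spread
```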
SLIDE 33

Still dependent on the training data!

- Classical network: P(w, η | {x, y}_train) → δ(w − w_MLE, η − η*)
- Variational inference: P(w, η | {x, y}_train) = Q(w | ϑ_MLE, η*, {x, y}_train)
- Bayesian networks: P(w, η | {x, y}_train) ∝ ∏_i P(y_i^train | x_i^train, w, η) P(w, η)

SLIDE 34

Problems with physical models...

SLIDE 35

Problems with physical models...

SLIDE 36

How can we use a neural network then?

SLIDE 37

Build it into the physical model

SLIDE 38

Method 1: Infer the data, physics and the neural network

SLIDE 39

SLIDE 40

Method 2: Understand the likelihood (using neural physical engines)

SLIDE 41


SLIDE 42

Method 3: Likelihood-free inference

SLIDE 43

SLIDE 44

Compare the distance between observed summaries and simulation summaries, and select results within ε
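
This accept/reject rule is rejection ABC (approximate Bayesian computation). A minimal sketch with a toy simulator and a mean summary statistic; the simulator, summary, prior range, and ε are all assumptions:

```python
import numpy as np

# Rejection ABC: keep simulation parameters whose summary statistic lies
# within epsilon of the observed summary.
rng = np.random.default_rng(0)

def simulate(theta, rng):
    """Toy simulator: data scattered around the parameter theta."""
    return theta + rng.normal(scale=0.5, size=100)

def summary(d):
    """Compress the data to a single summary statistic."""
    return d.mean()

observed = summary(simulate(3.0, np.random.default_rng(42)))  # pretend observation

epsilon = 0.1
thetas = rng.uniform(0.0, 6.0, size=20000)                    # draws from the prior
accepted = [t for t in thetas
            if abs(summary(simulate(t, rng)) - observed) < epsilon]
post = np.array(accepted)
print(post.mean(), len(post))                                 # approximate posterior samples
```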

SLIDE 45

Conclusions

SLIDE 46

Conclusions

- Neural networks are not to be trusted.
- They can make trusty companions when the correct framework is introduced.
- Using statistics, we can build neural networks into the forward model to get unbiased results.

SLIDE 47

For more information read my new blog:

bit.ly/ProbNN