Tutorials Interpretable Deep Learning: Towards Understanding & Explaining DNNs - PowerPoint PPT Presentation



SLIDE 1 (1/28)

Interpretable Deep Learning: Towards Understanding & Explaining DNNs

Part 3: Validating Explanations

Wojciech Samek, Grégoire Montavon, Klaus-Robert Müller

Tutorials

SLIDE 2 (2/28)

From ML Successes to Applications

Autonomous Driving
Medical Diagnosis
Networks (smart grids, etc.)
Visual Reasoning
AlphaGo beats Go human champ
Deep Net outperforms humans in image classification

SLIDE 3 (3/28)

Making ML Models Interpretable

SLIDE 4 (4/28)

Layer-Wise Relevance Propagation (LRP) [Bach'15]
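To make the propagation idea concrete, here is a minimal numpy sketch of one LRP backward step through a linear layer, using a positive-weights-only (z+) redistribution. The helper name and toy numbers are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def lrp_zplus(a, W, R_out, eps=1e-9):
    """One LRP backward step (z+ rule) for a linear layer.

    a     : (J,)  input activations of the layer (assumed non-negative)
    W     : (J,K) weight matrix
    R_out : (K,)  relevance assigned to the layer's output neurons
    Returns the relevance redistributed onto the layer's inputs.
    """
    Wp = np.maximum(W, 0.0)   # keep only positive weights (z+ rule)
    z = a @ Wp + eps          # (K,) total positive contribution per output
    s = R_out / z             # relevance per unit of contribution
    return a * (Wp @ s)       # redistribute proportionally to contributions

a = np.array([1.0, 2.0])
W = np.array([[1.0, -1.0], [0.5, 1.0]])
R = np.array([2.0, 1.0])
R_in = lrp_zplus(a, W, R)
print(R_in.sum())  # ~3.0: relevance is conserved across the layer
```

Conservation follows from the construction: each output neuron distributes exactly its own relevance over its inputs.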

SLIDE 5 (5/28)

Question: Suppose that we have propagated the relevance until a given layer. How should it be propagated one layer further?

Idea: By performing a Taylor expansion of the relevance.

Deep Taylor Decomposition [Montavon'17]

SLIDE 6 (6/28)

Deep Taylor Decomposition

Relevance neuron:
Taylor expansion:
Redistribution:

SLIDE 7 (7/28)

Revisiting the DTD Root Point

✔ ✔

Choice of root point (Deep Taylor generic):
1. nearest root
2. rescaled excitations
3. generalized

Generalized rule
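A generalized rule of this kind can be sketched in numpy as follows; the parameter γ interpolates between a basic rule (γ = 0) and one dominated by positive contributions. The function is a hypothetical illustration, not the slide's exact formula.

```python
import numpy as np

def lrp_generalized(a, W, R_out, gamma=0.25, eps=1e-9):
    """Generalized LRP rule: boost positive weights by a factor gamma.

    gamma = 0 recovers the basic rule; as gamma grows, only
    positively-contributing weights matter.
    """
    Wg = W + gamma * np.maximum(W, 0.0)  # boost positive weights
    z = a @ Wg + eps                     # contributions per output neuron
    s = R_out / z
    return a * (Wg @ s)                  # redistribute onto the inputs

a = np.array([1.0, 2.0])
W = np.array([[1.0, -1.0], [0.5, 1.0]])
R_out = np.array([2.0, 1.0])
R_in = lrp_generalized(a, W, R_out, gamma=0.0)
print(R_in.sum())  # ~3.0: conservation holds for any gamma (up to eps)
```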

SLIDE 8 (8/28)

The Special Case "γ = 1"

Question: Is there a connection between the two methods? Find the difference...

SLIDE 9 (9/28)

The Special Case "γ = 1"

which can also be rewritten as: For networks with bias zero, the procedure becomes equivalent to grad × input [see also Shrikumar'17].

[Shrikumar'17] Not Just a Black Box: Learning Important Features Through Propagating Activation Differences, arXiv
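The zero-bias equivalence can be checked numerically: a bias-free ReLU network is positively homogeneous, so the grad × input attributions sum exactly to the output. A small numpy sketch with toy weights (all names are mine):

```python
import numpy as np

def relu_net(x, W1, W2):
    # two-layer ReLU network with zero biases
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h

def grad_times_input(x, W1, W2):
    # manual gradient of the piecewise-linear net above
    mask = (W1 @ x > 0).astype(float)  # ReLU gates
    grad = (W2 * mask) @ W1            # chain rule, shape (D,)
    return x * grad

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal(4)
x = rng.standard_normal(3)

R = grad_times_input(x, W1, W2)
# With zero biases the attributions sum exactly to the output:
print(np.allclose(R.sum(), relu_net(x, W1, W2)))  # True
```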

SLIDE 10 (10/28)

Question: How to select the optimal parameter "γ"?

SLIDE 11 (11/28)

Explanation Selection
SLIDE 12 (12/28)

Explanation Selection

Question: How to assess explanation quality?

More direct approach: Try all parameters, and select the one producing the best explanations.

SLIDE 13 (13/28)

Evaluating Explanations

Human assessment:
  • Aesthetic properties
  • Usability of the explanation (e.g. to understand the classifier)

→ Requires an experimental study.

SLIDE 14 (14/28)

Evaluating Explanations

Idea: Testing if explanations satisfy certain axioms/properties. Examples:
  • Explanation must be self-consistent (e.g. conservation of evidence)
  • Explanation must be consistent in input domain (e.g. continuity)
  • Explanation must be consistent in the space of models (e.g. implementation invariance)

SLIDE 15 (15/28)

Example 1: Conservation

Possible explanations: Simple example:
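The conservation axiom is mechanically checkable: the relevance scores should sum to the function output. The helper below is an illustrative sketch; the toy linear model and all names are assumptions, not from the slides.

```python
import numpy as np

def check_conservation(f, explain, x, tol=1e-6):
    """Axiomatic check: relevance scores must sum to the function output."""
    R = explain(x)
    return abs(R.sum() - f(x)) < tol

# Toy linear model f(x) = w.x, explained by R_i = w_i * x_i,
# which conserves by construction.
w = np.array([0.5, -1.0, 2.0])
f = lambda x: float(w @ x)
explain = lambda x: w * x

print(check_conservation(f, explain, np.array([1.0, 2.0, 3.0])))  # True
```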

SLIDE 16 (16/28)

Example 1: Conservation
SLIDE 17 (17/28)

Example 1: Conservation
SLIDE 18 (18/28)

Why Grad × Input Scores Explode?

Answer: Neural network depth causes the function to become steep and the gradient very large. [cf. Bengio'94, Montufar'14]

[Figure: toy networks of depth 1, 2 and 3 (weights 1, 2, -1) illustrating how the gradient grows with depth]

[Bengio'94] Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Networks.
[Montufar'14] On the Number of Linear Regions of DNNs. NIPS 2014.
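The depth effect can be reproduced numerically: in a random deep linear chain, the norm of the input-output Jacobian grows exponentially with depth. A rough numpy sketch (the per-layer scale 1.2 is an arbitrary choice that makes the growth visible):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

norms = {}
for depth in (1, 5, 10, 20):
    # The Jacobian of a deep linear chain is the product of the layer
    # matrices; its norm grows (or shrinks) exponentially with depth.
    J = np.eye(n)
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * (1.2 / np.sqrt(n))
        J = W @ J
    norms[depth] = np.linalg.norm(J)
    print(depth, norms[depth])
```

With a per-layer gain above 1 the gradient explodes; below 1 it vanishes, the flip side discussed in [Bengio'94].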

SLIDE 19 (19/28)

Why Grad × Input Scores Explode?

division by zero

This can also be seen from the formulas:

SLIDE 20 (20/28)

Example 2: Continuity [Montavon'18]

Explanation scores must be continuous in input domain.

SLIDE 21 (21/28)

Continuity Demo

video input

Animations available at: http://www.heatmapping.org/evaluating
SLIDE 22 (22/28)

Why is Grad × Input Discontinuous?

Answer: Again, because of depth; specifically, because the function becomes highly non-smooth. [cf. Montufar'14, Balduzzi'17]

[Figure: toy networks of depth 1, 2 and 3 (weights 1, 2, -1) illustrating how the function becomes non-smooth with depth]

[Montufar'14] On the Number of Linear Regions of DNNs. NIPS 2014.
[Balduzzi'17] The Shattered Gradients Problem: If resnets are the answer [...] ICML 2017.
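The discontinuity is easy to reproduce on a one-dimensional toy model: across a ReLU kink the function value changes smoothly while the grad × input attribution jumps. A minimal sketch (model and names are illustrative):

```python
def relu_feature(x):
    # tiny piecewise-linear model with a kink at x = 1
    return max(x - 1.0, 0.0)

def grad_x_input(x, h=1e-9):
    # grad x input attribution via central finite differences
    g = (relu_feature(x + h) - relu_feature(x - h)) / (2 * h)
    return g * x

# The function value barely changes across the kink...
print(relu_feature(0.999), relu_feature(1.001))   # 0.0 vs ~0.001
# ...but the grad x input attribution jumps discontinuously:
print(grad_x_input(0.999), grad_x_input(1.001))   # 0.0 vs ~1.0
```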

SLIDE 23 (23/28)

Example 3: Impl. Invariance [Sundararajan'17]

Example: two networks implementing the maximum function:

Network (a): Network (b):

Gradient is implementation invariant, therefore the explanation is too. Counter-example for:

[Sundararajan'17] M. Sundararajan, A. Taly, Q. Yan: Axiomatic Attribution for Deep Networks. ICML 2017
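Two such implementations can be written down and compared directly. The pair below is a plausible reading of the example (the exact networks on the slide may differ): both compute max(x1, x2), and away from the tie x1 = x2 their gradients agree.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Two different ReLU implementations of max(x1, x2):
net_a = lambda x: relu(x[0] - x[1]) + x[1]
net_b = lambda x: relu(x[1] - x[0]) + x[0]

def grad(f, x, h=1e-6):
    # gradient via central finite differences
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([3.0, 1.0])
print(net_a(x), net_b(x))              # both 3.0: same function
print(grad(net_a, x), grad(net_b, x))  # both [1. 0.]: implementation invariant
```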
SLIDE 24 (24/28)

Implementation Matters for LRP

naive implementations vs. better implementation
SLIDE 25 (25/28)

A Blind Spot in Explanation Selection

Consider the simple explanation technique: redistributing uniformly on pixels. It is:
  • conservative
  • continuous
  • implementation invariant
but it is also completely uninformative.

→ Need to verify selectivity (i.e. the explanation should discriminate between relevant and irrelevant variables).

SLIDE 26 (26/28)

Pixel-Flipping [Bach'15, Samek'17]

Idea: Test that removing input variables with high assigned relevance makes the function value drop quickly.

compute heatmap → destroy pixels → check new function value
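The compute-destroy-check loop can be sketched as follows. The helper works on a flat input vector and uses a zero baseline as the "destroyed" value; both choices, like the toy model, are assumptions for illustration.

```python
import numpy as np

def pixel_flipping_curve(f, x, relevance, baseline=0.0):
    """Pixel-flipping: destroy the most relevant features first and
    record the function value after each step. A good explanation
    makes this curve drop quickly."""
    order = np.argsort(relevance)[::-1]  # most relevant first
    x = x.copy()
    curve = [f(x)]
    for i in order:
        x[i] = baseline                  # destroy one feature
        curve.append(f(x))
    return np.array(curve)

# Toy model and a perfect explanation (R_i = w_i * x_i):
w = np.array([3.0, 1.0, 2.0, 0.0])
f = lambda x: float(w @ x)
x = np.ones(4)
curve = pixel_flipping_curve(f, x, w * x)
print(curve)  # [6. 3. 1. 0. 0.]: a steep drop, as expected
```

Comparing the areas under such curves is one way to rank competing explanation methods on the same model.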

SLIDE 27 (27/28)

Pixel-Flipping Demo

input / explanation

Animations available at: http://www.heatmapping.org/evaluating

SLIDE 28 (28/28)

Conclusion for Part 3

Most explanation methods have hyperparameters. As no ground-truth explanations are available, standard model selection techniques do not apply. The problem of explanation selection can be addressed axiomatically (e.g. conservation, continuity, implementation invariance). However, axioms may not suffice to select a good explanation. It is also important to design experiments that test the explanation against the model (e.g. pixel-flipping).