Explaining Machine Learning Decisions
Grégoire Montavon, TU Berlin
Joint work with: Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin, Alexander Binder
18/09/2018, Intl. Workshop ML & AI, Telecom ParisTech 1/45

From ML Successes to Applications
ML successes: Deep Net outperforms humans in image classification; AlphaGo beats Go human champ; Visual Reasoning.
Applications: Medical Diagnosis; Autonomous Driving; Networks (smart grids, etc.). 2/45

Can we interpret what an ML model has learned? 3/45

First, we need to define what we want from interpretable ML. 4/45

Understanding Deep Nets: Two Views
Understanding what mechanism the network uses to solve a problem or implement a function.
Understanding how the network relates the input to the output variables. 5/45

interpreting predicted classes
explaining individual decisions 6/45

Interpreting Predicted Classes
Example: “What does a goose typically look like according to the neural network?”
[Figure: non-goose vs. goose. Image from Simonyan’13] 7/45

Explaining Individual Decisions
Example: “Why is a given image classified as a sheep?”
[Figure: non-sheep vs. sheep. Images from Lapuschkin’16] 8/45

Example: Autonomous Driving [Bojarski’17]
Bojarski et al. 2017. “Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car”
[Figure: PilotNet. Input, decision, explanation] 9/45

Example: Pascal VOC Classification [Lapuschkin’16]
Comparing Performance on Pascal VOC 2007 (Fisher Vector Classifier vs. Deep Net pretrained on ImageNet)
[Figure: Fisher classifier vs. (pretrained) deep net]
Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks 10/45

Example: Pascal VOC Classification [Lapuschkin’16]
Lapuschkin et al. 2016. Analyzing Classifiers: Fisher Vectors and Deep Neural Networks 11/45

Example: Medical Diagnosis [Binder’18]
Binder et al. 2018. “Towards computational fluorescence microscopy: Machine learning-based integrated prediction of morphological and molecular tumor profiles”
A: Invasive breast cancer, H&E stain
B: Normal mammary glands and fibrous tissue, H&E stain
C: Diffuse carcinoma infiltrate in fibrous tissue, Hematoxylin stain 12/45

Example: Quantum Chemistry [Schütt’17]
Schütt et al. 2017: Quantum-Chemical Insights from Deep Tensor Neural Networks
Molecular structure (e.g. atom positions) is mapped to molecular electronic properties (e.g. atomization energy) in two ways:
DFT calculation of the stationary Schrödinger Equation (PBE0, Perdew’86)
DTNN (Schütt’17), which also yields interpretable insight 13/45

Examples of Explanation Methods 14/45

Explaining by Decomposing
Importance of a variable is the share of the function score that can be attributed to it.
[Figure: input, DNN]
Decomposition property: the relevance scores sum to the function score, Σ_i R_i = f(x). 15/45

Explaining Linear Models
A simple method: 16/45
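The formula on this slide did not survive extraction. A minimal sketch of what such a simple method could look like, assuming a bias-free linear model f(x) = Σ_i w_i x_i and relevance scores R_i = w_i x_i (both are assumptions, as are the example weights and inputs):

```python
def linear_relevance(w, x):
    """Attribute f(x) = sum_i w_i * x_i to the inputs as R_i = w_i * x_i."""
    return [wi * xi for wi, xi in zip(w, x)]


# Hypothetical weights and input, chosen only for illustration.
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]

f_x = sum(wi * xi for wi, xi in zip(w, x))
R = linear_relevance(w, x)

# Decomposition: the shares add up to the function score.
assert abs(sum(R) - f_x) < 1e-12
```

With a bias term the scores would no longer sum exactly to f(x), which is one reason the sketch leaves the bias out.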

Explaining Linear Models
Taylor decomposition approach:
Insight: explanation depends on the root point. 17/45
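To illustrate the dependence on the root point, here is a small sketch assuming a bias-free linear model and first-order Taylor terms R_i = w_i (x_i - x̃_i), where x̃ is a root point with f(x̃) = 0. The two root points compared (the origin and the nearest root on the hyperplane f = 0) and all numbers are illustrative assumptions, not taken from the slide.

```python
def taylor_relevance(w, x, root):
    """First-order Taylor terms R_i = w_i * (x_i - root_i) for f(x) = sum_i w_i * x_i."""
    return [wi * (xi - ri) for wi, xi, ri in zip(w, x, root)]


def nearest_root(w, x):
    """Closest point to x (in Euclidean distance) on the hyperplane f = 0."""
    f_x = sum(wi * xi for wi, xi in zip(w, x))
    norm_sq = sum(wi * wi for wi in w)
    return [xi - f_x * wi / norm_sq for wi, xi in zip(w, x)]


# Hypothetical weights and input.
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]
f_x = sum(wi * xi for wi, xi in zip(w, x))

R_origin = taylor_relevance(w, x, [0.0, 0.0, 0.0])      # reduces to w_i * x_i
R_nearest = taylor_relevance(w, x, nearest_root(w, x))  # proportional to w_i ** 2

# Both explanations sum to f(x), yet they distribute it differently.
assert abs(sum(R_origin) - f_x) < 1e-9
assert abs(sum(R_nearest) - f_x) < 1e-9
```

In this example the origin yields a negative score for the second input, while the nearest root yields only positive scores, even though the model and the input are the same.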

Explaining Nonlinear Models
Second-order terms are hard to interpret and can be very large. 18/45
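A toy case makes the problem concrete. The sketch below uses a hypothetical function f(x) = x_1 x_2 (not from the slides) and a root point at the origin: the first-order terms vanish, so the entire function score ends up in the second-order term, which cannot be assigned to a single input.

```python
def f(x):
    """Hypothetical nonlinear function used only for illustration."""
    return x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0]]


def first_order_relevance(x, root):
    """First-order Taylor terms: gradient at the root point times the displacement."""
    g = grad_f(root)
    return [gi * (xi - ri) for gi, xi, ri in zip(g, x, root)]


x = [2.0, 3.0]
root = [0.0, 0.0]  # f(root) = 0, so the origin is a valid root point

R = first_order_relevance(x, root)
residual = f(x) - f(root) - sum(R)  # everything the first-order terms miss
```

Here `R` is zero for both inputs and `residual` equals f(x), so the first-order explanation carries no information at this root point.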

Explaining Nonlinear Models by Propagation
Layer-Wise Relevance Propagation (LRP) [Bach’15] 19/45
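The propagation rule shown on the slide is not recoverable from the text. As a hedged illustration of the general idea, the sketch below applies one commonly used LRP rule (the ε-stabilized rule) to a tiny bias-free ReLU network. The network, its weights, and the helper names are all hypothetical.

```python
def pre_activations(a, W):
    """z_k = sum_j a_j * W[j][k]; W[j][k] connects input j to output k (no bias)."""
    return [sum(a[j] * W[j][k] for j in range(len(a))) for k in range(len(W[0]))]


def relu(z):
    return [max(0.0, v) for v in z]


def lrp_epsilon(a, W, R_out, eps=1e-9):
    """Redistribute R_out to the layer inputs in proportion to a_j * W[j][k]."""
    z = pre_activations(a, W)
    # The stabilizer eps takes the sign of z_k to avoid division by zero.
    s = [R_out[k] / (z[k] + (eps if z[k] >= 0 else -eps)) for k in range(len(z))]
    return [a[j] * sum(W[j][k] * s[k] for k in range(len(z))) for j in range(len(a))]


# Hypothetical network: 2 inputs -> 2 ReLU units -> 1 linear output.
x = [1.0, 2.0]
W1 = [[1.0, -1.0], [0.5, 1.0]]
W2 = [[1.0], [2.0]]

h = relu(pre_activations(x, W1))
y = pre_activations(h, W2)

R_h = lrp_epsilon(h, W2, y)    # relevance of the hidden units
R_x = lrp_epsilon(x, W1, R_h)  # relevance of the inputs

# Relevance is (approximately) conserved from layer to layer.
assert abs(sum(R_h) - y[0]) < 1e-6
assert abs(sum(R_x) - y[0]) < 1e-6
```

The backward pass mirrors the forward pass layer by layer, which is what distinguishes propagation from a single Taylor expansion of the whole function.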

Explaining Nonlinear Models by Propagation
Is there an underlying mathematical framework? 20/45

Deep Taylor Decomposition (DTD) [Montavon’17]
Question: Suppose that we have propagated LRP scores (“relevance”) until a given layer. How should they be propagated one layer further?
Key idea: Let’s use Taylor expansions for this. 21/45

DTD Step 1: The Structure of Relevance
Observation: Relevance at each layer is a product of the activation and an approximately constant term. 22/45

DTD Step 1: The Structure of Relevance 23/45

DTD Step 2: Taylor Expansion 24/45

DTD Step 2: Taylor Expansion
Taylor expansion at root point:
Relevance can now be backward propagated. 25/45
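The expansion itself is missing from the extracted text. A single-neuron sketch of the idea, assuming (as in Step 1) that the relevance of a neuron is its ReLU activation times a constant c, and that the neuron has no bias. The weights, the constant, and the root point are hypothetical values chosen so that the root point lies on the neuron's zero boundary.

```python
def neuron_relevance(a, w, c):
    """R_k(a) = c * max(0, sum_j a_j * w_j): ReLU activation times a constant."""
    return c * max(0.0, sum(aj * wj for aj, wj in zip(a, w)))


def taylor_messages(a, w, c, root):
    """First-order terms c * w_j * (a_j - root_j).

    The gradient c * w_j is the one on the active side of the ReLU,
    so this assumes the neuron is active at a.
    """
    return [c * wj * (aj - rj) for aj, wj, rj in zip(a, w, root)]


a = [1.0, 2.0]      # lower-layer activations
w = [1.0, -0.25]    # weights into neuron k
c = 2.0             # the approximately constant term
root = [0.5, 2.0]   # hypothetical root point: sum_j root_j * w_j = 0

R_k = neuron_relevance(a, w, c)
messages = taylor_messages(a, w, c, root)

# Because the neuron is linear on its active side, the first-order
# terms redistribute R_k without residual.
assert neuron_relevance(root, w, c) == 0.0
assert abs(sum(messages) - R_k) < 1e-12
```

Summing these messages over all neurons k of the upper layer gives the relevance of each lower-layer neuron, which is the backward propagation step.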

DTD Step 3: Choosing the Root Point (Deep Taylor generic)
Choice of root point:
1. nearest root ✔
2. rescaled excitations (same as LRP-α1β0) 26/45
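For the second choice, the resulting propagation rule redistributes relevance in proportion to the positive contributions a_j w_jk⁺ only, where w⁺ = max(0, w). A minimal sketch under the assumption of non-negative activations and zero biases; the layer sizes, weights, and incoming relevance values are hypothetical.

```python
def lrp_zplus(a, W, R_out):
    """Redistribute R_out[k] to input j in proportion to a_j * max(0, W[j][k])."""
    n_in, n_out = len(a), len(W[0])
    # Sum of positive contributions into each output neuron.
    zp = [sum(a[j] * max(0.0, W[j][k]) for j in range(n_in)) for k in range(n_out)]
    return [
        sum(a[j] * max(0.0, W[j][k]) / zp[k] * R_out[k]
            for k in range(n_out) if zp[k] > 0)
        for j in range(n_in)
    ]


a = [1.0, 2.0]                  # non-negative activations (e.g. ReLU outputs)
W = [[1.0, -1.0], [0.5, 1.0]]   # W[j][k]: weight from input j to output k
R_out = [2.0, 1.0]              # relevance arriving from the layer above

R_in = lrp_zplus(a, W, R_out)

# Scores are non-negative and conserved.
assert all(r >= 0 for r in R_in)
assert abs(sum(R_in) - sum(R_out)) < 1e-12
```

Negative weights receive no share under this rule, so with non-negative incoming relevance the resulting scores are never negative.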

DTD: Choosing the Root Point (Deep Taylor generic)
Pixels domain
Choice of root point 27/45
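The slide's figure is not recoverable. One rule described in the deep Taylor decomposition paper [Montavon’17] for box-constrained inputs such as pixels is the z^B rule, which searches for the root point inside the box [l_i, h_i]. The sketch below assumes this is the rule meant here; the bounds, weights, and relevance values are hypothetical.

```python
def lrp_zb(x, W, R_out, low, high):
    """z^B rule: shares proportional to x_i*w - l_i*max(0, w) - h_i*min(0, w)."""
    n_in, n_out = len(x), len(W[0])
    q = [
        [x[i] * W[i][k] - low[i] * max(0.0, W[i][k]) - high[i] * min(0.0, W[i][k])
         for k in range(n_out)]
        for i in range(n_in)
    ]
    # Normalizer per output neuron; every q[i][k] is >= 0 inside the box.
    col = [sum(q[i][k] for i in range(n_in)) for k in range(n_out)]
    return [
        sum(q[i][k] / col[k] * R_out[k] for k in range(n_out) if col[k] > 0)
        for i in range(n_in)
    ]


x = [0.5, 1.0]                  # pixel values
low, high = [0.0, 0.0], [1.0, 1.0]
W = [[1.0, -1.0], [0.5, 1.0]]   # W[i][k]: weight from pixel i to neuron k
R_out = [2.0, 1.0]              # relevance of the first-layer neurons

R_pixels = lrp_zb(x, W, R_out, low, high)

assert all(r >= 0 for r in R_pixels)
assert abs(sum(R_pixels) - sum(R_out)) < 1e-9
```

Unlike the positive-weights rule, a pixel with a negative weight can still receive relevance here, because its distance to the upper bound h_i counts as evidence.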

DTD: Choosing the Root Point (Deep Taylor generic)
Embedding:
Choice of root point
Image source: TensorFlow tutorial 28/45
