SLIDE 1

When Neurons Fail

El Mahdi El Mhamdi, Rachid Guerraoui
BDA, Chicago, July 25th, 2016

SLIDE 2

Motivations

Table of Contents

1. Motivations
2. Problem statement
3. Results

SLIDE 3

Motivations: Universality

NNs everywhere

SLIDE 4

Motivations: Universality

Model

Figure: Feed-forward neural network

Nodes: neurons. Links: synapses.

SLIDE 5

Motivations: Universality

Model

$$F_{neu}(X) = \sum_{i=1}^{N_L} w_i^{(L+1)} y_i^{(L)}$$

with $y_j^{(l)} = \varphi\big(s_j^{(l)}\big)$ for $1 \le l \le L$; $y_j^{(0)} = x_j$ and $s_j^{(l)} = \sum_{i=1}^{N_{l-1}} w_{ji}^{(l)} y_i^{(l-1)}$.
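To make the notation concrete, here is a minimal sketch of this forward pass in Python; the sigmoid choice for φ, the weight shapes, and all names are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch of F_neu: hidden layers apply phi, the output layer is linear.
import numpy as np

def phi(s):
    # Sigmoid activation; any Lipschitz activation fits the model.
    return 1.0 / (1.0 + np.exp(-s))

def F_neu(x, weights):
    """weights = [W_1, ..., W_{L+1}]; W_l maps layer l-1 to layer l,
    so that s^(l) = W_l @ y^(l-1) and y^(l) = phi(s^(l))."""
    y = x                          # y^(0) = x
    for W in weights[:-1]:         # hidden layers 1..L
        y = phi(W @ y)             # y^(l) = phi(W^(l) y^(l-1))
    return weights[-1] @ y         # linear output: sum_i w_i^(L+1) y_i^(L)

rng = np.random.default_rng(0)
ws = [rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(1, 4))]
print(F_neu(rng.normal(size=3), ws))   # a two-hidden-layer example
```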

SLIDE 6

Motivations: Scalability

Software-simulated NNs

SLIDE 7

Motivations: Scalability

Hardware-based NNs

SyNAPSE (DARPA, IBM), Human Brain Project (SP9 on neuromorphic), Brains in Silicon at Stanford...

SLIDE 8

Motivations: Fault tolerance

How robust is this?

Crash failure: a component stops working.

SLIDE 9

Motivations: Fault tolerance

How robust is this?

Byzantine failure: a component sends arbitrary values.
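The two failure models of slides 8 and 9 can be injected into the forward pass sketched at slide 5. A minimal sketch, assuming a crashed neuron contributes 0 and picking an arbitrary output range for Byzantine neurons (both are our modeling choices, not taken from the slides):

```python
# Sketch of failure injection into a layer's activations y.
import numpy as np

def inject_failures(y, crashed, byzantine, rng):
    """Return y with crash and Byzantine failures applied at the given indices."""
    y = y.copy()
    y[crashed] = 0.0                # crash: the neuron stops contributing
    y[byzantine] = rng.uniform(     # Byzantine: arbitrary values
        -1e3, 1e3, size=len(byzantine))
    return y

rng = np.random.default_rng(1)
print(inject_failures(np.ones(6), crashed=[0], byzantine=[3], rng=rng))
```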

SLIDE 10

Motivations: Fault tolerance

Biological plausibility

Examples of extreme robustness in nature [1]

[1] Feuillet et al., 2007. Brain of a white-collar worker. The Lancet 370(9583), p. 262.

SLIDE 11

Motivations: Experimental observations

Classical training leads to non-robust NNs

E: difference between desired and actual outputs on a training set.

$$\Delta w_{ij}^{(l)} = \frac{dE}{dw_{ij}^{(l)}}$$

Robust weight distributions exist → reach them with learning!
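As a sketch of this classical rule (the learning rate eta and the descent sign are conventional additions; the slide only names dE/dw):

```python
# One classical training step: each weight follows the gradient of E.
def gradient_step(weights, grads, eta=0.01):
    """w_ij^(l) <- w_ij^(l) - eta * dE/dw_ij^(l), layer by layer."""
    return [W - eta * dW for W, dW in zip(weights, grads)]
```

Nothing in this update rewards robustness, which is the slide's point: a robust weight distribution can exist without being the one the gradient dynamics converge to.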

SLIDE 12

Motivations: Solution

Dropout

Randomly switch neurons off during the training phase.
Kerlirzin and Vallet (1991, 1993), Hinton et al. (2012, 2014).

Minimize $E_{av} = \sum_{D} E_D P(D)$ where $P(D) = (1-p)^{|D|} p^{N-|D|}$
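A minimal dropout sketch, under our reading of the slide's notation (D is the set of dropped neurons, so each neuron is kept independently with probability p); names are illustrative:

```python
# Dropout during training: sample one sub-network per pass by silencing
# each neuron of a layer with probability 1 - p.
import numpy as np

def dropout(y, p, rng, training=True):
    if not training:
        return y                        # test time: use the full network
    mask = rng.random(y.shape) < p      # keep each neuron with probability p
    return y * mask                     # dropped neurons output 0

rng = np.random.default_rng(0)
print(dropout(np.ones(8), p=0.5, rng=rng))
```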

SLIDE 13

Motivations: Lack of theory

Experimentally observed robustness [2]

[Figure annotations: over-provisioning; upper bound?]

[2] From Kerlirzin 1993, edited.

SLIDE 14

Problem statement

Table of Contents

1. Motivations
2. Problem statement
3. Results

SLIDE 15

Problem statement

Given a precision ε, derive a tight bound on failures that keeps ε-precision for any neural network [3] approximating a function F.

[3] Note: learning is taken for granted.

SLIDE 16

Problem statement

Theoretical background: universality

Theorem [4]: ∀(F, ε), ∃ NN generating $F_{neu}$ s.t. $\|F_{neu} - F\| < \epsilon$

[4] Cybenko 1989, Hornik 1991.
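An empirical illustration of the theorem: fit a one-hidden-layer network to a target F and measure the gap. The tooling (scikit-learn) and the target sin are our choices for the demo; how well learning actually closes the gap is a separate matter (see the footnote of slide 15).

```python
# Fit F = sin on [-3, 3] with one hidden layer and report the sup-norm gap.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-3, 3, 500).reshape(-1, 1)
y = np.sin(X).ravel()                          # the function F to approximate

nn = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=0)
nn.fit(X, y)
print(np.max(np.abs(nn.predict(X) - y)))       # ~ ||F_neu - F||, shrinks with width
```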

SLIDE 17

Problem statement

Minimal networks are not robust [5]. Given over-provision ε₀ (ε₀ < ε), what condition on failures preserves ε-precision?

[5] Not to mention: impossible to derive.

SLIDE 18

Results

Table of Contents

1. Motivations
2. Problem statement
3. Results

SLIDE 19

Results: Single layer, crash

$$f \le \frac{\epsilon - \epsilon_0}{w_m}$$

More over-provision → more robustness.
Unequal weight distribution → single point of failure.
No Byzantine FT without bounded synaptic capacity.
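A worked instance with illustrative numbers, under the reconstruction of the bound given above:

```python
# Illustrative numbers for the single-layer crash bound f <= (eps - eps0)/w_m.
eps, eps0 = 0.10, 0.02        # required precision and trained over-provision
w_m = 0.01                    # largest weight magnitude in the output layer
f_max = (eps - eps0) / w_m    # crashes the layer can absorb
print(int(f_max))             # -> 8 crashed neurons tolerated
```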

SLIDE 20

Results: General case

Multilayer networks, Byzantine failures

Failure at layer l propagates through layers l′ > l (Byzantine and crash).
Factors: weights, number of layers, number of neurons, Lipschitz coefficient of φ.
Total error propagated to the output should be ≤ ε − ε₀.

SLIDE 21

Results: General case

Multilayer networks, Byzantine failures

Bounded channel capacity (otherwise no robustness to Byzantine).

$$\text{Propagated error} \le C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right)$$

C: capacity, K: Lipschitz coefficient, $w_m^{(l)}$: maximal weight to layer l, $N_l$: number of neurons in layer l, $f_l$: number of failures in layer l.
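A direct transcription of the bound into code (the function name and list layout are ours); a failure pattern (f_1, ..., f_L) is tolerable when the returned value is at most ε − ε₀:

```python
# Worst-case propagated error, transcribed from the formula above.
# Lists are indexed 1..L (index 0 unused) to match the slide's notation.
def propagated_error(C, K, f, N, w_m):
    """f, N: failures / neurons per layer; w_m: max weight into layers 1..L+1."""
    L = len(N) - 1
    total = 0.0
    for l in range(1, L + 1):
        relay = 1.0
        for lp in range(l + 1, L + 1):      # correct neurons relaying the error
            relay *= (N[lp] - f[lp]) * w_m[lp]
        total += f[l] * K ** (L - l) * w_m[L + 1] * relay
    return C * total

f, N = [None, 1, 0, 2], [None, 10, 10, 10]    # L = 3 layers, illustrative sizes
w_m = [None, 0.1, 0.1, 0.1, 0.1]
print(propagated_error(C=1.0, K=1.0, f=f, N=N, w_m=w_m))  # compare to eps - eps0
```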

SLIDE 22

Results: General case

How to read the formula

$$C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right) \le \epsilon - \epsilon_0$$

SLIDE 23

Results: General case

How to read the formula

$$C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right) \le \epsilon - \epsilon_0$$

Left-hand side: the worst-case propagated error.

SLIDE 24

Results: General case

How to read the formula

$$C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right) \le \epsilon - \epsilon_0$$

Right-hand side: the error margin permitted by the over-provision.

SLIDE 25

Results: General case

How to read the formula

$$C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right) \le \epsilon - \epsilon_0$$

An error (at most C is transmitted) at the $f_l$ failed neurons of layer l propagates through the layers l′ > l. The factor $(N_{l'} - f_{l'})$ counts only the correct neurons propagating it, each multiplying it by $K w_m^{(l')}$.

SLIDE 26

Results: General case

Unbounded capacity

Taking C → ∞ in

$$C \left( \sum_{l=1}^{L} f_l\, K^{L-l}\, w_m^{(L+1)} \prod_{l'=l+1}^{L} (N_{l'} - f_{l'})\, w_m^{(l')} \right) \le \epsilon - \epsilon_0$$

then $f_l = 0$ for all l, since any non-zero $f_l$ makes the left-hand side grow without bound. No Byzantine FT.

SLIDE 27

Results: Applications

- Generalization to synaptic failures.
- Applications of the bound (memory cost, neuron duplication, synchrony).
- Other neural computing models.

SLIDE 28

Questions?

More details: https://infoscience.epfl.ch/record/217561
