

SLIDE 1

Center for Industrial Mathematics, Faculty 03 Mathematics/Computer Sciences

On the Invertibility of ReLU Networks

Inverse Problems and Machine Learning, Caltech

Jens Behrmann, joint work with: Sören Dittmer, Pascal Fernsel, Peter Maass

February 9, 2018

University of Bremen

SLIDE 2

Motivation: Inverting a network

Reconstruct the input x from features [1]: given z∗ = F(x∗) ∈ R^D for an input x∗ ∈ R^d, find x with F(x) ≈ z∗, where F : R^d → R^D is an MLP or CNN.


Further applications:
  • Inverse problems with learned forward operators
  • Theoretical understanding
  • ...

[1] Mahendran, A., Vedaldi, A. 2015: Understanding deep image representations by inverting them
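Not on the original slide: the reconstruction task above is commonly attacked by gradient descent on the feature mismatch, in the spirit of Mahendran et al. Below is a minimal sketch assuming PyTorch; the two-layer network, all dimensions, and all variable names are illustrative stand-ins, not the talk's setup.

```python
import torch

torch.manual_seed(0)

# Illustrative ReLU network F : R^d -> R^D (d = 64, D = 128 are arbitrary).
d, D = 64, 128
F = torch.nn.Sequential(
    torch.nn.Linear(d, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, D), torch.nn.ReLU(),
)

x_star = torch.randn(d)         # "unknown" input x*
z_star = F(x_star).detach()     # observed features z* = F(x*)

# Reconstruct x by minimizing ||F(x) - z*||^2 over the input.
x = torch.zeros(d, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((F(x) - z_star) ** 2).sum()
    loss.backward()
    opt.step()

print("feature mismatch:", ((F(x) - z_star) ** 2).sum().item())
# May stay large if F is not injective -- exactly the question of this talk.
print("input mismatch:  ", ((x - x_star) ** 2).sum().item())
```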


SLIDE 3

Main Questions

1. How is information lost during propagation? → Pre-images of ReLU layers

2. Is the inverse mapping stable or unstable? → Singular values of the linearization

Related work: invertibility under random-weight assumptions [2, 3]; injectivity and stability of ReLU and pooling [4]

[2] Giryes et al. 2016: Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
[3] Arora et al. 2015: Why are deep nets reversible: a simple theory, with implications for training
[4] Bruna et al. 2014: Signal Recovery from Pooling Representations


SLIDE 4

Injectivity, Pre-images, Activation functions

Combinatorial conditions for injectivity under ReLU [5]

Definition (Retrieval, singleton pre-images)

Let A ∈ R^{m×n} and b ∈ R^m. Then (A, b) does retrieval under ReLU for x ∈ R^n if the pre-image of ReLU(Ax + b) is a singleton.

Remark: other activation functions such as ELU, leaky ReLU and tanh are injective; CReLU is injective if A is a frame [6].

[5] Bruna et al. 2014: Signal Recovery from Pooling Representations
[6] Shang et al. 2016: Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units


SLIDE 5

Equality and Inequality Systems

Given the layer output y = ReLU(Ax∗ + b), the pre-image of y is described by the system

  A|_{y>0} x + b|_{y>0} = y|_{y>0}
  A|_{y=0} x + b|_{y=0} ≤ 0.

Consider the two cases N(A|_{y>0}) = {0} (the equality system alone determines x) and N(A|_{y>0}) ≠ {0}. In the latter case, decompose x along the null space,

  A|_{y=0} ( P_{N(A|_{y>0})^⊥} x + P_{N(A|_{y>0})} x ) + b|_{y=0} ≤ 0,

and rewrite this as a reduced inequality system Ãx̃ + b̃ ≤ 0.
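To make the split concrete, here is a small NumPy sketch (my illustration; all sizes, names and the random data are arbitrary): form the equality and inequality subsystems from an observed output y and inspect the null space of the equality block.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 12, 5                           # illustrative sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star = rng.standard_normal(n)
y = np.maximum(A @ x_star + b, 0.0)    # observed layer output ReLU(A x* + b)

pos = y > 0
A_eq, b_eq, y_eq = A[pos], b[pos], y[pos]   # equality rows: A x + b = y
A_in, b_in = A[~pos], b[~pos]               # inequality rows: A x + b <= 0

# Null space of the equality block via SVD: x is pinned down on N(A_eq)^perp;
# the inequalities then decide what happens along N(A_eq).
_, s, Vt = np.linalg.svd(A_eq)
rank = int((s > 1e-10).sum())
null_basis = Vt[rank:].T                    # basis of N(A_eq); empty if rank == n

print("equality rows:", int(pos.sum()), "| inequality rows:", int((~pos).sum()))
print("dim N(A_eq):", null_basis.shape[1],
      "-> pre-image is a singleton" if null_basis.shape[1] == 0
      else "-> inequalities decide singleton / finite / infinite")
```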


SLIDE 6

Definition (Omnidirectional)

A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0.

Corollary

The following statements are equivalent:

1. A ∈ R^{m×n} is omnidirectional.
2. Every open linear half-space contains a row of A.
3. Ax ≤ 0 implies x = 0, where x ∈ R^n.
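A later slide (SLIDE 12) mentions checking omnidirectionality via linear programming. One way such a check could be set up (my sketch using scipy.optimize.linprog; the talk's exact formulation with the convex hull as side condition may differ) exploits statement 3: A fails to be omnidirectional iff the cone {x : Ax ≤ 0} contains a nonzero point, which can be probed coordinate by coordinate.

```python
import numpy as np
from scipy.optimize import linprog

def is_omnidirectional(A, tol=1e-9):
    """Check 'Ax <= 0  =>  x = 0' by maximizing/minimizing each coordinate
    of x over the cone {Ax <= 0} intersected with the box [-1, 1]^n."""
    m, n = A.shape
    zeros = np.zeros(m)
    for i in range(n):
        for sign in (+1.0, -1.0):
            c = np.zeros(n)
            c[i] = sign                # linprog minimizes c^T x
            res = linprog(c, A_ub=A, b_ub=zeros, bounds=[(-1, 1)] * n)
            if res.status == 0 and res.fun < -tol:
                return False           # found x != 0 with Ax <= 0
    return True

rng = np.random.default_rng(0)
# 20 Gaussian rows in R^3: 0 lies in the interior of their convex hull
# with overwhelming probability, so this should print True.
print(is_omnidirectional(rng.standard_normal((20, 3))))
# Identity rows: x = (-1, -1) satisfies Ax <= 0, so this prints False.
print(is_omnidirectional(np.array([[1.0, 0.0], [0.0, 1.0]])))
```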


SLIDE 7

Definition (Omnidirectional for point)

A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0. The pair of A ∈ R^{m×n} and b ∈ R^m is called omnidirectional for the point p ∈ R^n if b = −Ap and A is omnidirectional.



SLIDE 8

Theorem (Unique solutions of inequality system)

Let Ax + b ≤ 0 have a solution x₀. This solution is unique iff there exists an index set I of rows such that (A|_I, b|_I) is omnidirectional for x₀.

Realistic?




SLIDE 10

Pre-Image finite or infinite?

Theorem (Convex hull)

A ∈ R^{m×n} is omnidirectional iff 0 ∈ Conv(A)°, where Conv(A)° is the interior of the convex hull spanned by the rows of A.



SLIDE 12

Singleton / Finite / Infinite

Setup: 2-layer MLP on MNIST with (3500, 784) neurons

1. Count the number of positive outputs (> 784 ⇒ singleton)
2. Project onto the null space of the equality system
3. Check omnidirectionality via linear programming (convex hull as side condition)

[Figure: counts of finite vs. infinite pre-images against the number of positive outputs (roughly 300 to 1000); regimes marked infinite / in-/finite / singleton, with the singleton regime beyond 784 positive outputs. Legend: Finite, Infinite.]


SLIDE 13

Stability - Locally Linear

Theorem (Linear functions on convex polytopes [7])

The input space R^d of a ReLU network F is partitioned into convex polytopes P_F, where for every P ∈ P_F:

  F(x) = A_P x + b_P  for all x ∈ P.   (1)

[7] Raghu et al. 2017: On the Expressive Power of Deep Neural Networks
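Not from the slides, but a quick way to see this partition (a NumPy sketch; the network and grid are arbitrary choices of mine): each ReLU sign pattern fixes one affine piece, so counting distinct patterns over a sampled grid counts the polytopes the grid touches.

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.standard_normal((8, 2)), rng.standard_normal(8)  # 8 ReLU units on R^2

# Sample a grid over [-2, 2]^2 and record the ReLU sign pattern at each point.
xs = np.linspace(-2, 2, 200)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
patterns = (grid @ W.T + b > 0)                 # boolean activation patterns

# Each distinct pattern corresponds to one convex polytope P on which F is affine.
n_regions = len(np.unique(patterns, axis=0))
print("activation patterns (polytopes) touched by the grid:", n_regions)
```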


SLIDE 14

Stability - Simplifications

Assume: the polytope P with x ∈ P is known (for reconstructing x given an output z∗ of the network F).

Analyze: stability of the linearization via its singular values σ_min, σ_max:

  σ_min ‖x − x′‖₂ ≤ ‖A_P (x − x′)‖₂ ≤ σ_max ‖x − x′‖₂,  for x, x′ ∈ P ∩ N(A_P)^⊥.

Source: Raghu et al. 2017: On the Expressive Power of Deep Neural Networks


SLIDE 15

Stability - ReLU as Diagonal Matrix

The linearization A_P of a network with L layers can be written as [8]

  A_P = A_L D_{I_{L−1}} A_{L−1} ⋯ D_{I_1} A_1,

where I_l is the set of units that ReLU sets to zero in layer l and

  (D_I)_{ii} = 1, i ∉ I;  0, i ∈ I.

→ Removal of rows due to ReLU

[8] Wang et al. 2016: Analysis of deep neural networks with extended data Jacobian matrix
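A minimal NumPy sketch of this factorization (my illustration; the two-layer MLP, its weights and the perturbation size are arbitrary): read off the zeroed units from a forward pass, build D_I, and check that A_P is the local slope of the network on the polytope containing x.

```python
import numpy as np

rng = np.random.default_rng(0)
A1, b1 = rng.standard_normal((10, 5)), rng.standard_normal(10)
A2, b2 = rng.standard_normal((7, 10)), rng.standard_normal(7)
F = lambda v: A2 @ np.maximum(A1 @ v + b1, 0) + b2   # two-layer ReLU network

x = rng.standard_normal(5)
I = (A1 @ x + b1) <= 0                   # units removed by ReLU in layer 1
D = np.diag((~I).astype(float))          # (D_I)_ii = 1 if i not in I, else 0

A_P = A2 @ D @ A1                        # linearization on the polytope containing x

# For a perturbation small enough to keep the activation pattern fixed,
# F is affine with slope A_P.
eps = 1e-6 * rng.standard_normal(5)
print(np.allclose(F(x + eps) - F(x), A_P @ eps, atol=1e-9))   # True

print("singular values of A_P:", np.linalg.svd(A_P, compute_uv=False))
```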

SLIDE 16

Lemma (Removal of weakly correlated rows)

Let A ∈ R^{m×n} have rows a_j and let I ⊆ [m] index the rows removed by D_I. For a fixed k ∈ I, let a_k ∈ N(D_I A)^⊥ and assume the removed row is weakly correlated with the kept rows:

  |⟨a_j, a_k⟩| ≤ c ‖a_k‖₂ / √M  for all j ∉ I,

where M = m − |I| and c > 0 is a constant. Then the nonzero singular values σ_l of D_I A satisfy

  0 < σ_K := min{ σ_l : σ_l ≠ 0 } ≤ c.

Interpretation: if ReLU removes a row that still lies in the span of the kept rows but is only weakly correlated with them, D_I A acquires a small nonzero singular value, i.e. the linearization becomes ill-conditioned.
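A tiny numerical check of the lemma as reconstructed above (my construction; the matrix, index set and ε are illustrative): remove a row that lies in the span of the kept rows but is weakly correlated with them, then compare the smallest nonzero singular value of D_I A with the constant c.

```python
import numpy as np

eps = 1e-2
A = np.array([[1.0, 0.0],    # kept row a_1
              [1.0, eps],    # kept row a_2
              [0.0, 1.0]])   # removed row a_3 = (a_2 - a_1)/eps, in span of kept rows
I = np.array([2])            # indices removed by ReLU
keep = np.setdiff1d(np.arange(3), I)

# a_k lies in the row space of the kept rows, i.e. a_k in N(D_I A)^perp.
a_k = A[2]
M = len(keep)                                      # M = m - |I|
c = np.sqrt(M) * max(abs(A[j] @ a_k) for j in keep) / np.linalg.norm(a_k)

s = np.linalg.svd(A[keep], compute_uv=False)       # nonzero spectrum of D_I A
sigma_K = s[s > 1e-12].min()
print(f"sigma_K = {sigma_K:.2e} <= c = {c:.2e}:", sigma_K <= c)   # True
```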


SLIDE 17

Numerical Experiments

  • Convolutional networks (CNNs) fit the theoretical framework
  • Linearization via backpropagation w.r.t. the input
  • Full SVD for different layers/samples (the linearization is sample-dependent, since the network is nonlinear!)
  • Small CNN on CIFAR10 (architecture below; a sketch of the Jacobian computation follows the table)

Type         | kernel size | stride | # feature maps | # output units
Conv layer   | (3,3)       | (1,1)  | 32             | -
Conv layer   | (3,3)       | (2,2)  | 64             | -
Conv layer   | (3,3)       | (1,1)  | 64             | -
Conv layer   | (3,3)       | (1,1)  | 32             | -
Conv layer   | (3,3)       | (1,1)  | 32             | -
Conv layer   | (3,3)       | (2,2)  | 64             | -
Dense layer  | -           | -      | -              | 512
Dense layer  | -           | -      | -              | 10
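The slide says the linearization is obtained via backpropagation w.r.t. the input; here is a sketch of one way to do that (my code, assuming PyTorch; the stand-in net mimics the first two conv layers above, with a smaller input so the full SVD stays cheap).

```python
import torch

torch.manual_seed(0)

# Illustrative stand-in for the first layers of the CIFAR10 CNN above
# (input shrunk from 3x32x32 to 3x16x16 to keep the Jacobian small).
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, 3, stride=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 64, 3, stride=2), torch.nn.ReLU(),
)
x = torch.randn(3, 16, 16)

# Linearization A_P at x = Jacobian of the layer output w.r.t. the input.
J = torch.autograd.functional.jacobian(lambda v: net(v.unsqueeze(0)).flatten(), x)
A_P = J.reshape(J.shape[0], -1)            # flatten input dims -> (D, d) matrix

s = torch.linalg.svdvals(A_P)              # full singular-value spectrum
nz = s[s > 1e-10]
print("spectrum size:", len(s), "| nonzero:", len(nz))
print("condition on N(A_P)^perp:", (nz.max() / nz.min()).item())
```

Repeating this per layer and per sample yields spectra like those shown on the next slides.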

SLIDE 18

Effect of ReLU

[Figure: singular-value spectra of the linearization; x-axis: index of singular value (up to 3500), y-axis: singular value (log scale, 10^−4 to 10^2); legend: Layer 3, Layer 4, Layer 9, Layer 10.]


SLIDE 19

Decay over Layers

[Figure: decay of singular-value spectra over depth; x-axis: index of singular value (up to 3500), y-axis: singular value (log scale, 10^−4 to 10^2); legend: Layer 1 through Layer 6.]


SLIDE 20

Trade-off: Stability vs. Information Loss

[Figure, two panels: (left) number of singular values (log scale, 10^2 to 10^4) vs. size of the layer output (5000 to 35000); (right) condition number (log scale, 10^0 to 10^6) vs. layer index (1 to 11).]



SLIDE 22

Conclusion and Outlook

  • Approach to better understand the invertibility of deep ReLU networks
  • Conditions for whether the pre-image of a layer is a singleton, finite, or infinite
  • Stability analysis via the SVD of the linearization

Next steps:
  • Theory for CNNs, residual connections, ...
  • Dropping the linearity assumption
  • Connection to stability analysis in the context of adversarial examples
  • ...


SLIDE 23

Thank you for your attention!

Joint work with: Sören Dittmer, Pascal Fernsel, Peter Maass


SLIDE 24

References:

  • Boskamp, T. et al. (2016): A new classification method for MALDI imaging mass spectrometry data acquired on formalin-fixed paraffin-embedded tissue samples, BBA Proteins and Proteomics
  • Behrmann, J. et al. (2017): Deep learning for tumor classification in imaging mass spectrometry, Bioinformatics
  • Bruna, J. et al. (2014): Signal Recovery from Pooling Representations, ICML
  • Mahendran, A., Vedaldi, A. (2015): Understanding deep image representations by inverting them, CVPR
  • Raghu, M. et al. (2017): On the Expressive Power of Deep Neural Networks, ICML
