

  1. On the Invertibility of ReLU Networks
Jens Behrmann, joint work with: Sören Dittmer, Pascal Fernsel, Peter Maass
Faculty 03, Center for Mathematics / Computer Sciences, Industrial Mathematics, University of Bremen
Inverse Problems and Machine Learning, Caltech, February 9, 2018
Outline: Motivation, Uniqueness, Stability

  2. Motivation: Inverting a Network
Reconstruct the input x from its features¹:
z* ≈ F(x), with F : R^d → R^D an MLP or CNN,
x* ∈ R^d the input, z* ∈ R^D the features, z* = F(x*).
[Figure: reconstruction error plot omitted]
Further applications: inverse problems with learned forward operators, theoretical understanding, ...
¹ Mahendran et al. 2015: Understanding deep image representations by inverting them
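The inversion task on this slide can be sketched numerically: recover x from z* = F(x*) by gradient descent on ‖F(x) − z*‖². This is a minimal toy illustration, not the reconstruction method of the cited paper; the one-layer ReLU map, all sizes, and the step size are assumptions.

```python
import numpy as np

# Toy feature-inversion sketch: minimize ||F(x) - z*||^2 by gradient descent.
# F is a hypothetical 1-layer ReLU map; sizes are illustrative only.
rng = np.random.default_rng(0)
d, D = 5, 20                          # input dim d, feature dim D (D > d helps)
W = rng.standard_normal((D, d))
b = rng.standard_normal(D)

def F(x):
    return np.maximum(W @ x + b, 0.0)

x_star = rng.standard_normal(d)
z_star = F(x_star)

x = np.zeros(d)                       # start the reconstruction from zero
for _ in range(5000):
    pre = W @ x + b
    r = np.maximum(pre, 0.0) - z_star             # residual F(x) - z*
    grad = W.T @ ((pre > 0) * r)                  # chain rule through ReLU
    x -= 0.01 * grad

print(np.linalg.norm(x - x_star))     # small if enough units stay active
```

The ReLU zeroes the gradient of inactive units, so recovery hinges on how many units fire along the optimization path, which is exactly the information-loss question the talk addresses.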

  3. Main Questions
1. How is information lost during propagation? → Pre-images of ReLU layers
2. Is the inverse mapping stable or unstable? → Singular values of the linearization
Related work: invertibility under assumptions of random weights²,³; injectivity and stability of ReLU and pooling⁴
² Giryes et al. 2016: DNN with Random Gaussian Weights: A Universal Classification Strategy?
³ Arora et al. 2015: Why are deep nets reversible: a simple theory, with implications for training
⁴ Bruna et al. 2014: Signal Recovery from Pooling Representations

  4. Injectivity, Pre-images, Activation Functions
Combinatorial conditions for injectivity under ReLU⁵
Definition (Retrieval, singleton pre-images): Let A ∈ R^{m×n} and b ∈ R^m. Then (A, b) does retrieval under ReLU for x ∈ R^n if the pre-image of ReLU(Ax + b) is a singleton.
Remark: Other activation functions such as ELU, leaky ReLU, and tanh are injective; cReLU is injective if A is a frame⁶.
⁵ Bruna et al. 2014: Signal Recovery from Pooling Representations
⁶ Shang et al. 2016: Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units
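The cReLU remark can be checked directly: cReLU keeps both the positive and negative parts of Ax, so nothing is lost when A has full column rank. A minimal sketch with assumed toy sizes:

```python
import numpy as np

# cReLU(x) = (ReLU(Ax), ReLU(-Ax)): both signs survive, so x is recoverable
# whenever A has full column rank (a frame). Sizes are illustrative.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))          # 6x4, full column rank almost surely

def crelu(x):
    z = A @ x
    return np.concatenate([np.maximum(z, 0.0), np.maximum(-z, 0.0)])

x = rng.standard_normal(4)
y = crelu(x)

# Invert: the two halves recombine exactly to Ax, then solve least squares.
z = y[:6] - y[6:]                        # ReLU(Ax) - ReLU(-Ax) = Ax
x_rec = np.linalg.lstsq(A, z, rcond=None)[0]
print(np.allclose(x_rec, x))             # True: cReLU is injective here
```

A plain ReLU on the same A would discard the negative coordinates of Ax, which is exactly the information loss the rest of the talk quantifies.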

  5. Equality and Inequality Systems
The pre-image of y = ReLU(Ax + b) is described by
A|_{y>0} x + b|_{y>0} = y|_{y>0},
A|_{y=0} x + b|_{y=0} ≤ 0.
Consider the two cases N(A|_{y>0}) = {0} and N(A|_{y>0}) ≠ {0}:
A|_{y=0} (P_{N(A|_{y>0})^⊥} x + P_{N(A|_{y>0})} x) + b|_{y=0} ≤ 0.
Rewrite it into: Ax + b ≤ 0.
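The split into an equality system (active units) and an inequality system (inactive units) can be verified on a toy layer; sizes and weights below are assumptions for illustration:

```python
import numpy as np

# The pre-image of y = ReLU(Ax + b) satisfies:
#   equality on the rows where y > 0, inequality on the rows where y = 0.
rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
y = np.maximum(A @ x + b, 0.0)

pos = y > 0                  # equality rows: A|_{y>0} x + b|_{y>0} = y|_{y>0}
zer = ~pos                   # inequality rows: A|_{y=0} x + b|_{y=0} <= 0

assert np.allclose(A[pos] @ x + b[pos], y[pos])
assert np.all(A[zer] @ x + b[zer] <= 0)

# If the equality rows span R^n, they already pin x down to a single point.
print(np.linalg.matrix_rank(A[pos]))
```

When the equality rows have a nontrivial null space, the remaining freedom must be tested against the inequality system, which is where omnidirectionality (next slides) enters.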

  6. Definition (Omnidirectional)
A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0.
Corollary: The following statements are equivalent:
1. A ∈ R^{m×n} is omnidirectional.
2. Every open linear halfspace contains a row of A.
3. Ax ≤ 0 implies x = 0, where x ∈ R^n.

  7. Definition (Omnidirectional for a point)
A ∈ R^{m×n} is called omnidirectional if ∃! x : Ax ≤ 0.
A ∈ R^{m×n} and b ∈ R^m are called omnidirectional for the point p ∈ R^n if b = −Ap and A is omnidirectional.

  8. Theorem (Unique solutions of the inequality system)
Let Ax + b ≤ 0 have a solution x₀. Then this solution is unique iff there exists an index set I of rows such that (A|_I, b|_I) is omnidirectional for x₀.
Realistic?

  9. Pre-image: finite or infinite?

  10. Pre-image: finite or infinite?
Theorem (Convex hull): A ∈ R^{m×n} is omnidirectional iff 0 ∈ Conv(A)°, where Conv(A)° is the interior of the convex hull spanned by the rows of A.
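A small sketch of the equivalent halfspace characterization (statement 2 of the corollary): three unit rows at 120° in R² form a triangle with 0 strictly inside, so the matrix is omnidirectional. The random-direction check below is a heuristic illustration, not the LP test used later in the talk.

```python
import numpy as np

# Three unit vectors at 120 degrees: their convex hull is a triangle that
# contains 0 in its interior, so A is omnidirectional.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
A = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # 3x2

# Equivalent characterization: every open halfspace contains a row, i.e.
# for every direction x != 0 some row has positive inner product with x.
# (For these rows the worst case is cos(60deg) = 0.5 > 0.)
rng = np.random.default_rng(3)
dirs = rng.standard_normal((1000, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(np.all((A @ dirs.T).max(axis=0) > 0))   # True: Ax <= 0 forces x = 0
```

Dropping any one of the three rows leaves an open halfspace with no row in it, and the solution set of Ax ≤ 0 becomes an unbounded cone.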

  11. Singleton / Finite / Infinite
Setup: 2-layer MLP on MNIST with (3500, 784) neurons.
1. Count the number of positive outputs (> 784 positive outputs: singleton)
2. Project onto the null space of the equality system
3. Check for omnidirectionality via linear programming (convex hull as side condition)

  12. Singleton / Finite / Infinite
Same setup: 2-layer MLP on MNIST with (3500, 784) neurons; same three-step check.
[Histogram: counts of singleton, finite, and infinite pre-images vs. number of positive outputs (roughly 300 to 1000), with the threshold 784 marked]
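Step 1 of this experiment can be sketched with the talk's layer sizes. The weights below are random stand-ins, not the trained MNIST network, so the counts are only illustrative:

```python
import numpy as np

# Step 1: count positive outputs of a (3500, 784) ReLU layer. Random weights
# are a stand-in for the trained MLP; the sizes match the talk's setup.
rng = np.random.default_rng(4)
W = rng.standard_normal((3500, 784)) / np.sqrt(784)
b = 0.1 * rng.standard_normal(3500)
x = rng.standard_normal(784)

y = np.maximum(W @ x + b, 0.0)
n_pos = int(np.sum(y > 0))
print(n_pos)                 # with random weights, roughly half the units fire

# If the active rows span R^784, the equality system alone pins x down,
# so the pre-image is a singleton (steps 2 and 3 are then unnecessary).
full_rank = np.linalg.matrix_rank(W[y > 0]) == 784
print(full_rank)
```

Only when the active rows are rank-deficient does one need the null-space projection and the LP omnidirectionality test to decide between a finite and an infinite pre-image.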

  13. Stability: Locally Linear
Theorem (Linear functions on convex polytopes⁷): The input space R^d of a ReLU network F is partitioned into convex polytopes P_F, where for P ∈ P_F:
F(x) = A_P x + b_P for all x ∈ P.  (1)
⁷ Raghu et al. 2017: On the Expressive Power of Deep Neural Networks

  14. Stability: Simplifications
Assume: x ∈ P is known (for the reconstruction of x given an output z* of the network F).
Analyze: stability of the linearization via its singular values σ_min, σ_max. For x, x′ ∈ P ∩ N(A_P)^⊥:
σ_min ‖x − x′‖₂ ≤ ‖A_P (x − x′)‖₂ ≤ σ_max ‖x − x′‖₂.
[Figure source: Raghu et al. 2017: On the Expressive Power of Deep Neural Networks]
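The two-sided bound follows directly from the SVD of A_P. A quick numerical check with a random stand-in for the linearization (the condition x, x′ ∈ N(A_P)^⊥ holds automatically here because a tall Gaussian matrix has trivial null space almost surely):

```python
import numpy as np

# Check: sigma_min ||x - x'|| <= ||A_P (x - x')|| <= sigma_max ||x - x'||.
# A_P is a random stand-in for the linearization on one polytope.
rng = np.random.default_rng(6)
A_P = rng.standard_normal((8, 5))
s = np.linalg.svd(A_P, compute_uv=False)   # singular values, descending
smax, smin = s[0], s[-1]                   # 8x5 Gaussian: full rank a.s.

x, x2 = rng.standard_normal(5), rng.standard_normal(5)
d = np.linalg.norm(x - x2)
print(smin * d <= np.linalg.norm(A_P @ (x - x2)) <= smax * d)   # True
```

A small σ_min means two far-apart inputs can map to nearly identical features, i.e. reconstruction from z* is ill-conditioned on that polytope.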

  15. Stability: ReLU as a Diagonal Matrix
The linearization A_P of a network with L layers can be written as⁸
A_P = A_L D_{I_{L−1}} A_{L−1} ··· D_{I_1} A_1, where D_ii = 1 if i ∉ I and D_ii = 0 if i ∈ I.
→ Removal of rows due to ReLU
⁸ Wang et al. 2016: Analysis of deep neural networks with extended data Jacobian matrix
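The factorization can be built explicitly for a small net: each ReLU becomes a 0/1 diagonal matrix fixed by the activation pattern at x. A two-layer, bias-free sketch with assumed sizes:

```python
import numpy as np

# A_P = A_2 D_{I_1} A_1: the ReLU is replaced by a diagonal 0/1 matrix
# determined by which units are active at x. Bias-free toy network.
rng = np.random.default_rng(7)
A1, A2 = rng.standard_normal((10, 4)), rng.standard_normal((3, 10))

def F(x):
    return A2 @ np.maximum(A1 @ x, 0.0)

x = rng.standard_normal(4)
D = np.diag((A1 @ x > 0).astype(float))   # D_ii = 1 iff unit i is active (i not in I)
A_P = A2 @ D @ A1                         # linearization on x's polytope

print(np.allclose(A_P @ x, F(x)))         # True: without biases, b_P = 0
```

Each zero on the diagonal of D deletes a row of A1 from the product, which is exactly the "removal of rows due to ReLU" that drives the singular-value decay in the next slides.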

  16. Lemma (Removal of weakly correlated rows)
Let A ∈ R^{m×n} with rows a_j and I ⊆ [m]. For a fixed k ∈ I let a_k ∈ N(D_I A)^⊥. Moreover, let
|⟨a_j, a_k⟩| ≤ c ‖a_k‖² / √M for all j ∉ I,
where M = m − |I| and c > 0 is a constant. Then for the singular values σ_l ≠ 0 of D_I A:
0 < σ_K = min{σ_l : σ_l ≠ 0} ≤ c.

  17. Numerical Experiments
Convolutional networks (CNNs) fit the theoretical framework.
Linearization via backpropagation w.r.t. the input.
Full SVD for different layers / samples (nonlinear!).
Small CNN on CIFAR10:
Type         kernel size  stride  # feature maps  # output units
Conv layer   (3,3)        (1,1)   32              -
Conv layer   (3,3)        (2,2)   64              -
Conv layer   (3,3)        (1,1)   64              -
Conv layer   (3,3)        (1,1)   32              -
Conv layer   (3,3)        (1,1)   32              -
Conv layer   (3,3)        (2,2)   64              -
Dense layer  -            -       -               512
Dense layer  -            -       -               10

  18. Effect of ReLU
[Figure: singular value spectra (log scale, 10^-4 to 10^2) vs. singular value index (0 to 3500) for layers 3, 4, 9, and 10]

  19. Decay over Layers
[Figure: singular value spectra (log scale, 10^-4 to 10^2) vs. singular value index (0 to 3500) for layers 1 through 6]
