SLIDE 1

Invertible Residual Networks

Jens Behrmann* Will Grathwohl* Ricky T. Q. Chen David Duvenaud Jörn-Henrik Jacobsen*

(*equal contribution)

SLIDE 2

What are Invertible Neural Networks?

Invertible Neural Networks (INNs) are bijective function approximators which have a forward mapping and an inverse mapping (in symbols below).
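In symbols (notation ours; the slide's formulas did not survive extraction):

```latex
F_\theta : \mathbb{R}^d \to \mathbb{R}^d \ \text{bijective}, \qquad
F_\theta^{-1}\big(F_\theta(x)\big) = x, \qquad
F_\theta\big(F_\theta^{-1}(z)\big) = z .
```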

[Figure: a non-invertible mapping vs. an invertible mapping]

SLIDE 3

Why Invertible Networks?

  • Mostly known because of Normalizing Flows

– Training via maximum likelihood and exact likelihood evaluation

[Figure: generated samples from GLOW (Kingma et al. 2018)]

SLIDE 4

Why Invertible Networks?

(See also the workshop: Invertible Networks and Normalizing Flows)

  • Generative modeling via invertible mappings with exact likelihoods (Dinh et al. 2014, Dinh et al. 2016, Kingma et al. 2018, Ho et al. 2019) – Normalizing Flows

  • Mutual information preservation
  • Analysis and regularization of invariance (Jacobsen et al. 2019)
  • Memory-efficient backprop (Gomez et al. 2017)
  • Analyzing inverse problems (Ardizzone et al. 2019)


SLIDE 5

Invertible Networks use Exotic Architectures

  • Dimension partitioning and coupling layers (Dinh et al. 2014/2016, Gomez et al. 2017, Jacobsen et al. 2018, Kingma et al. 2018)

– Transforms one part of the input at a time
– Choice of partitioning is important


SLIDE 6

Invertible Networks use Exotic Architectures

  • Dimension partitioning and coupling layers (Dinh et al. 2014/2016, Gomez et al. 2017, Jacobsen et al. 2018, Kingma et al. 2018)

– Transforms one part of the input at a time (see the additive-coupling example below)
– Choice of partitioning is important

  • Invertible dynamics via Neural ODEs (Chen et al. 2018, Grathwohl et al. 2019)

– Requires numerical integration
– Hard to tune and often slow due to the need for an ODE solver

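For concreteness, the additive coupling layer of Dinh et al. 2014 (our worked example): split the input as $x = (x_1, x_2)$; then

```latex
y_1 = x_1, \quad y_2 = x_2 + m(x_1)
\qquad\Longrightarrow\qquad
x_1 = y_1, \quad x_2 = y_2 - m(y_1) .
```

The layer is invertible for any network $m$, but each layer transforms only one part of the input.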

SLIDE 7

Why do we move away from standard architectures?

  • Partitioning, coupling layers, and ODE-based approaches move further away from standard architectures

– Many new design choices are necessary and not yet well understood

  • Why not use the most successful discriminative architecture?
  • Use the connection between ResNets and Euler integration of ODEs (Haber et al. 2018); see the identity below

[Figure: ResNet architecture]
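The connection, written out (a standard identity, not recovered from the slide): a residual block is one explicit Euler step of an ODE with unit step size:

```latex
\underbrace{x_{t+1} = x_t + g_t(x_t)}_{\text{residual block}}
\qquad\longleftrightarrow\qquad
\underbrace{x(t+h) = x(t) + h\, f\big(x(t), t\big)}_{\text{explicit Euler step},\ h = 1} .
```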

SLIDE 8

Making ResNets invertible

Theorem (sufficient condition for an invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer. Then $F$ is invertible if $\mathrm{Lip}(g) < 1$, where $\mathrm{Lip}(g)$ is the Lipschitz constant of $g$.


SLIDE 9

Making ResNets invertible

Theorem (sufficient condition for an invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer. Then $F$ is invertible if $\mathrm{Lip}(g) < 1$, where $\mathrm{Lip}(g)$ is the Lipschitz constant of $g$.

Invertible Residual Networks (i-ResNet)


SLIDE 10

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer. Then $F$ is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$


SLIDE 11

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer. Then $F$ is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$
⇒ Use the fixed-point iteration: $x^{0} := y$, $x^{k+1} := y - g(x^{k})$


SLIDE 12

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer. Then $F$ is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$
⇒ Use the fixed-point iteration: $x^{0} := y$, $x^{k+1} := y - g(x^{k})$

⇒ Guaranteed convergence to $x$ if $g$ is contractive (Banach fixed-point theorem)

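For completeness, the quantitative version of this step (the standard Banach bound, our addition): with $L := \mathrm{Lip}(g) < 1$, the iterates converge linearly at rate $L$,

```latex
\lVert x^{k} - x \rVert \;\le\; \frac{L^{k}}{1-L}\, \lVert x^{1} - x^{0} \rVert .
```

This is why the convergence rate on the following slides depends on the Lipschitz constant and not on the dimension.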

SLIDE 13

Inverting i-ResNets

  • Inversion method from the proof
  • Fixed-point iteration:

– Init: $x^{0} := y$
– Iteration: $x^{k+1} := y - g(x^{k})$


SLIDE 14

Inverting i-ResNets

  • Inversion method from the proof
  • Fixed-point iteration:

– Init: $x^{0} := y$
– Iteration: $x^{k+1} := y - g(x^{k})$

  • Rate of convergence depends on the Lipschitz constant
  • In practice: the cost of the inverse is 5-10 forward passes (see the sketch below)

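A minimal PyTorch sketch of this inversion (function names, toy sizes, and the crude rescaling are ours; in practice the Lipschitz condition is enforced by spectral normalization and the iteration is run to a tolerance):

```python
import torch

def inverse_residual_layer(g, y, n_iter=10):
    """Invert F(x) = x + g(x) via the fixed-point iteration from the proof.
    Assumes Lip(g) < 1."""
    x = y                        # init: x^0 := y
    for _ in range(n_iter):      # iterate: x^{k+1} := y - g(x^k)
        x = y - g(x)
    return x

# Usage: g is a contractive residual branch (ELU is 1-Lipschitz).
g = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ELU(), torch.nn.Linear(4, 4))
with torch.no_grad():
    for W in (g[0].weight, g[2].weight):                  # crude rescaling so that
        W.mul_(0.7 / torch.linalg.matrix_norm(W, ord=2))  # ||W||_2 = 0.7, Lip(g) <= 0.49
    x = torch.randn(1, 4)
    y = x + g(x)                                          # forward pass F(x)
    x_rec = inverse_residual_layer(g, y, n_iter=50)       # generous, for a tight check
    print(torch.allclose(x, x_rec, atol=1e-5))            # True
```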

SLIDE 15

How to build i-ResNets

  • Satisfy the Lipschitz condition via a data-independent upper bound: for $g = W_n \circ \phi \circ \dots \circ \phi \circ W_1$ with contractive nonlinearities $\phi$ ($\mathrm{Lip}(\phi) \le 1$, e.g. ELU), $\mathrm{Lip}(g) \le \prod_i \lVert W_i \rVert_2$


SLIDE 16

How to build i-ResNets

  • Satisfy the Lipschitz condition via a data-independent upper bound: $\mathrm{Lip}(g) \le \prod_i \lVert W_i \rVert_2$ for contractive nonlinearities
  • Spectral normalization (Miyato et al. 2018, Gouk et al. 2018):

– Approximate the largest singular value of each $W_i$ via power iteration and rescale so that $\lVert W_i \rVert_2 \le c < 1$ (see the sketch below)


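A sketch of the normalization step (simplified; `c` and `n_iter` are illustrative choices. PyTorch's built-in `torch.nn.utils.spectral_norm` normalizes to 1, whereas i-ResNets need a coefficient $c < 1$, so we write it out):

```python
import torch

def normalize_spectral(W, c=0.9, n_iter=5):
    """Rescale the weight matrix W so that ||W||_2 <= c < 1.
    The largest singular value is approximated by power iteration."""
    u = torch.randn(W.shape[0])
    for _ in range(n_iter):
        v = torch.nn.functional.normalize(W.t() @ u, dim=0)  # ~ right singular vector
        u = torch.nn.functional.normalize(W @ v, dim=0)      # ~ left singular vector
    sigma = torch.dot(u, W @ v)       # estimate of sigma_1(W)
    if sigma > c:                     # only shrink, never inflate
        W = W * (c / sigma)
    return W
```

Applying this to every $W_i$ gives $\mathrm{Lip}(g) \le \prod_i \lVert W_i \rVert_2 \le c^{n} < 1$ when the nonlinearities are contractive.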

SLIDE 18

Validation

  • Reconstructions

[Figure: CIFAR10 data, reconstructions from an i-ResNet, and reconstructions from a standard ResNet]

SLIDE 19

Classification Performance

  • Competitive performance
  • But what do we get additionally?

Generative models via Normalizing Flows


SLIDE 20

Maximum-Likelihood Generative Modeling with i-ResNets

  • We can define a simple generative model as $x = F^{-1}(z)$ with $z \sim \mathcal{N}(0, I)$

[Figure: $F$ maps between the data distribution and a Gaussian distribution]


SLIDE 21

Maximum-Likelihood Generative Modeling with i-ResNets

  • We can define a simple generative model as $x = F^{-1}(z)$ with $z \sim \mathcal{N}(0, I)$
  • Maximization (and evaluation) of the likelihood via change of variables, $\ln p_x(x) = \ln p_z(F(x)) + \ln\lvert\det J_F(x)\rvert$ … if $F$ is invertible

[Figure: $F$ maps between the data distribution and a Gaussian distribution]


SLIDE 22

Maximum-Likelihood Generative Modeling with i-ResNets

  • Maximization (and evaluation) of the likelihood via change of variables, $\ln p_x(x) = \ln p_z(F(x)) + \ln\lvert\det J_F(x)\rvert$ … if $F$ is invertible
  • Challenges:

– Flexible invertible models
– Efficient computation of the log-determinant

[Figure: $F$ maps between the data distribution and a Gaussian distribution]

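Written out for the Gaussian base distribution $z \sim \mathcal{N}(0, I_d)$ (a standard expansion, our addition):

```latex
\ln p_x(x)
= \ln p_z\big(F(x)\big) + \ln\big\lvert \det J_F(x) \big\rvert
= -\tfrac{1}{2}\, \lVert F(x) \rVert_2^2 - \tfrac{d}{2} \ln(2\pi)
  + \ln\big\lvert \det J_F(x) \big\rvert .
```

Both challenges are visible here: $F$ must be invertible (i-ResNets), and the log-determinant must be computed efficiently (next slide).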

SLIDE 23

Efficient Estimation of Likelihood

  • The likelihood involves the log-determinant of the Jacobian, $\ln\lvert\det J_F(x)\rvert$
  • Previous approaches:

– Exact computation of the log-determinant by constraining the architecture to triangular Jacobians (Dinh et al. 2016, Kingma et al. 2018)
– ODE solver and estimation of only the trace of the Jacobian (Grathwohl et al. 2019)

  • We propose an efficient estimator for i-ResNets based on trace estimation and truncation of a power series (see the sketch below)

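A minimal sketch of the estimator for a single residual block $F(x) = x + g(x)$ (a single Hutchinson probe and a fixed truncation length are our simplifications; the paper analyzes the truncation bias). It uses the identity $\ln\lvert\det(I + J_g)\rvert = \mathrm{tr}\big(\ln(I + J_g)\big) = \sum_{k \ge 1} \frac{(-1)^{k+1}}{k}\,\mathrm{tr}\big(J_g^k\big)$, which converges because $\lVert J_g \rVert_2 < 1$:

```python
import torch

def log_det_estimate(g, x, n_terms=5):
    """Stochastic estimate of ln|det(I + J_g(x))| via a truncated power
    series and one Hutchinson trace probe, tr(A) ~ E[v^T A v]."""
    x = x.requires_grad_(True)
    gx = g(x)
    v = torch.randn_like(x)        # Hutchinson probe vector
    w = v                          # w holds (J_g^T)^k v across iterations
    log_det = torch.zeros(())
    for k in range(1, n_terms + 1):
        # vector-Jacobian product: w <- J_g(x)^T w
        w = torch.autograd.grad(gx, x, grad_outputs=w, retain_graph=True)[0]
        # ((J^T)^k v) . v = v^T J^k v ~ tr(J_g^k)
        log_det = log_det + (-1) ** (k + 1) / k * torch.sum(w * v)
    return log_det
```

Summing this over all residual blocks gives $\ln\lvert\det J_F(x)\rvert$ for the whole network; training through the estimator additionally requires `create_graph=True` in the `grad` call.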

SLIDE 24

Generative Modeling Results

[Figure: data samples vs. samples from GLOW]

SLIDE 25

Generative Modeling Results

[Figure: data samples vs. samples from GLOW and i-ResNets]

SLIDE 26

Generative Modeling Results

[Table: quantitative comparison of GLOW (Kingma et al. 2018), FFJORD (Grathwohl et al. 2019), and i-ResNet]

SLIDE 27

i-ResNets Across Tasks

  • i-ResNet is an architecture which works well in both discriminative and generative modeling
  • i-ResNets are generative models which use the best discriminative architecture
  • Promising for:

– Unsupervised pre-training
– Semi-supervised learning


SLIDE 28

Drawbacks

  • Iterative inverse

– Fast convergence in practice
– Rate depends on the Lipschitz constant, not on the dimension

  • Requires estimation of the log-determinant

– Due to the free-form Jacobian
– Properties of i-ResNets allow the design of an efficient estimator


SLIDE 29

Conclusion

  • Simple modification makes ResNets invertible
  • Stability is guaranteed by construction
  • New class of likelihood-based generative models

– without structural constraints

  • Excellent performance in discriminative/generative tasks

– with one unified architecture

  • Promising approach for:

– unsupervised pre-training
– semi-supervised learning
– tasks which require invertibility


SLIDE 30

See us at Poster #11 (Pacific Ballroom)


Paper: Code:

Follow-up work: Residual Flows for Invertible Generative Modeling

Invertible Networks and Normalizing Flows, workshop on Saturday (contributed talk)