Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows


Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows. ICML 2020. Rob Cornish, Anthony Caterini, George Deligiannidis, Arnaud Doucet. University of Oxford. July 12-18, 2020.



Relaxing Bijectivity Constraints with Continuously Indexed Normalising Flows

ICML 2020 Rob Cornish, Anthony Caterini, George Deligiannidis, Arnaud Doucet

University of Oxford

July 12-18, 2020

University of Oxford Continuously Indexed Flows July 12-18, 2020 1 / 18


Motivation

The following densities were learned using a Gaussian prior with a 10-layer Residual Flow [Chen et al., 2019] (0.5M parameters) trained to convergence.

Figure 1: Darker regions indicate lower density. Data shown in black.


Why Does This Occur?

Normalising Flows (NFs) define the following process:

    $Z \sim P_Z, \quad X := f(Z),$

where $f$ is a diffeomorphism. Hence the support of $X$ will share the same topological properties as the support of $Z$, i.e.

  • Number of connected components
  • Number of “holes”
  • How they are “knotted”
  • etc.
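For reference, the standard change-of-variables identity behind this construction is sketched below. It does not appear on the slide; $p_Z$ and $p_X$ (the densities of $Z$ and $X$) and $\mathrm{D}f^{-1}$ (the Jacobian of the inverse map) are notation assumed here.

```latex
p_X(x) = p_Z\bigl(f^{-1}(x)\bigr)\,\bigl|\det \mathrm{D}f^{-1}(x)\bigr|
```

Because $f$ and $f^{-1}$ are both continuous, $f$ maps the support of $Z$ homeomorphically onto the support of $X$, which is why the topological properties listed above carry over.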


Problem

This suggests a problem when the support of the prior $P_Z$ is simple (e.g. a Gaussian): we usually can’t then reproduce the target exactly. Moreover, to approximate the target closely, our flow must approach non-invertibility.


Our Proposal: Continuously Indexed Flows

Continuously indexed flows (CIFs) instead use the process

    $Z \sim P_Z, \quad U \mid Z \sim P_{U|Z}(\cdot \mid Z), \quad X := F(Z; U),$

where $U$ is a continuous index variable, and each $F(\cdot; u)$ is a normalising flow. Any existing normalising flow can be used to construct $F$. A continuous index means the density of $X$ is no longer tractable, but the model can be trained via a natural ELBO objective instead.


Benefits

Intuitively, CIFs can “clean up” mass that would otherwise be misplaced by a single bijection.

Figure 2: 10-layer Residual Flow (top) and Continuously-Indexed Residual Flow (bottom). Both use 0.5M parameters.


Going Deeper

What happens when we model a complicated target using a normalising flow?

Theorem: If the prior $Z$ has non-homeomorphic support to a target $X^\star$, then a sequence of flows $f_n(Z) \to X^\star$ in distribution only if

    $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\} \to \infty.$
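A one-dimensional toy case, not taken from the talk, may make the theorem concrete. Assume the prior is uniform on [0, 1] and the target is uniform on [-2, -1] ∪ [1, 2], so the two supports are not homeomorphic (one interval versus two). The hypothetical family `f_eps` below is the inverse CDF of a density that leaves mass `eps` in the gap [-1, 1]. It is a bijection from [0, 1] onto [-2, 2] whose pushforward approaches the target as `eps` shrinks, and its Lipschitz constant is 2 / eps.

```python
import numpy as np

def knots(eps):
    """Breakpoints of the piecewise-linear map f_eps: [0, 1] -> [-2, 2].

    Mass (1 - eps) / 2 is sent to each of [-2, -1] and [1, 2]; mass eps to the gap.
    """
    a = (1.0 - eps) / 2.0
    zs = np.array([0.0, a, a + eps, 1.0])   # breakpoints on the prior side
    xs = np.array([-2.0, -1.0, 1.0, 2.0])   # breakpoints on the target side
    return zs, xs

def f_eps(z, eps):
    """Evaluate the toy flow at z (hypothetical example, not from the slides)."""
    zs, xs = knots(eps)
    return np.interp(z, zs, xs)

def bi_lipschitz(eps):
    """max{Lip f_eps, Lip f_eps^{-1}}, read off from the segment slopes."""
    zs, xs = knots(eps)
    slopes = np.diff(xs) / np.diff(zs)
    return max(slopes.max(), (1.0 / slopes).max())

for eps in (0.1, 0.01, 0.001):
    print(eps, bi_lipschitz(eps))
```

The closer the pushforward gets to the disconnected target, the steeper the map has to be across the gap, which is the behaviour the theorem describes.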


Implications for Residual Flows

For residual flows [Chen et al., 2019],

    $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\} \le \max\{1 + \kappa, (1 - \kappa)^{-1}\}^L < \infty,$

where $\kappa \in (0, 1)$ is fixed and $L$ is the number of layers. Hence, when the supports of $Z$ and $X^\star$ are not homeomorphic, the previous theorem guarantees we cannot have $f_n(Z) \to X^\star$ in distribution regardless of training time, neural network size, etc.
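As a quick arithmetic check of how this bound behaves, the sketch below evaluates it for illustrative values. The choice κ = 0.9 is an assumption made for the example, not a value reported in the talk; L = 10 matches the 10-layer model in the figures. The function name is hypothetical.

```python
def residual_flow_lipschitz_bound(kappa: float, num_layers: int) -> float:
    """Upper bound max{1 + kappa, 1 / (1 - kappa)} ** L on max{Lip f, Lip f^{-1}}."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    # Each residual layer id + g with Lip g <= kappa contributes at most this factor.
    per_layer = max(1.0 + kappa, 1.0 / (1.0 - kappa))
    return per_layer ** num_layers

# Illustrative values only (kappa = 0.9 is an assumption for this example).
print(residual_flow_lipschitz_bound(kappa=0.9, num_layers=10))
```

The bound can be large, but it is finite for every fixed κ and L, and a finite bound is exactly what the theorem excludes for a convergent sequence.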


Implications for Other Flows

For most other flows, $\max\{\operatorname{Lip} f_n, \operatorname{Lip} f_n^{-1}\}$ is unconstrained [Behrmann et al., 2020]. However, we can still only have $f_n(Z) = X^\star$ exactly if the supports of $Z$ and $X^\star$ are homeomorphic. It seems reasonable to hope for better performance if we can generalise our model class so that $f_n(Z) = X^\star$ is at least possible.


Continuously Indexed Flows

Recap: Continuously-indexed flows (CIFs) use the process

    $Z \sim P_Z, \quad U \mid Z \sim P_{U|Z}(\cdot \mid Z), \quad X := F(Z; U),$

where $U$ is a continuous index variable, and each $F(\cdot; u)$ is a normalising flow. This is compatible with all existing normalising flows: take

    $F(z; u) = f\left( e^{-s(u)} \odot z - t(u) \right),$

where $f$ is a standard flow.
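A minimal NumPy sketch of this construction and its inverse follows. The particular `s`, `t` and `f` are hypothetical stand-ins (in practice they would be learned networks and an existing flow), and taking `u` to have the same shape as `z` is also an assumption. Only the form of F, a flow applied to an index-dependent rescaling and shift of z, comes from the slide.

```python
import numpy as np

def s(u):
    # Hypothetical log-scale map; an assumption for illustration only.
    return np.tanh(u)

def t(u):
    # Hypothetical shift map; an assumption for illustration only.
    return 0.5 * u

def f(y):
    # Stand-in for "a standard flow": an elementwise bijection of the real line.
    return np.sinh(y)

def f_inv(x):
    return np.arcsinh(x)

def F(z, u):
    """F(z; u) = f(exp(-s(u)) * z - t(u)), applied elementwise."""
    return f(np.exp(-s(u)) * z - t(u))

def F_inv(x, u):
    """Inverse of F(. ; u) for a fixed index u."""
    return np.exp(s(u)) * (f_inv(x) + t(u))

rng = np.random.default_rng(0)
z = rng.standard_normal(3)
u = rng.standard_normal(3)
x = F(z, u)
assert np.allclose(F_inv(x, u), z)  # each F(. ; u) is a bijection in z
```

For every fixed `u` the map is invertible in `z`, while different values of `u` send the same `z` to different points, which is the extra freedom the index provides.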


Multi-layer CIFs

An $L$-layer CIF is obtained by

    $Z_0 \sim P_{Z_0},$
    $U_1 \sim P_{U_1|Z_0}(\cdot \mid Z_0), \quad Z_1 = F_1(Z_0; U_1),$
    $\cdots$
    $U_L \sim P_{U_L|Z_{L-1}}(\cdot \mid Z_{L-1}), \quad X = F_L(Z_{L-1}; U_L).$

Figure 3: Graphical multi-layer CIF generative model.
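The generative process on this slide can be sketched as ancestral sampling. Everything concrete here is an assumption for illustration: the Gaussian choices for the prior and for the index distribution, the hypothetical helper `sample_u`, and the simple per-layer map `F`, which uses the rescale-and-shift form with the base flow set to the identity. For brevity all layers share one `F`; in general each layer has its own.

```python
import numpy as np

def F(z, u):
    # One layer of the form exp(-s(u)) * z - t(u), with the hypothetical
    # choices s(u) = tanh(u), t(u) = u and the base flow f = identity.
    return np.exp(-np.tanh(u)) * z - u

def sample_u(z, rng):
    # Hypothetical index distribution: a Gaussian whose mean depends on z.
    return 0.1 * z + rng.standard_normal(z.shape)

def sample_cif(num_layers, dim, rng):
    """Draw X from an L-layer CIF; also return the indices U_1, ..., U_L."""
    z = rng.standard_normal(dim)      # Z_0 drawn from the prior
    us = []
    for _ in range(num_layers):
        u = sample_u(z, rng)          # U_l drawn given Z_{l-1}
        z = F(z, u)                   # Z_l = F_l(Z_{l-1}; U_l)
        us.append(u)
    return z, us                      # X = Z_L

x, us = sample_cif(num_layers=3, dim=2, rng=np.random.default_rng(0))
print(x, len(us))
```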


Training and inference

The marginal $p_X$ is intractable, but the joint $p_{X,U_{1:L}}$ has a closed form. Given an inference model $q_{U_{1:L}|X}$, we can use the ELBO for training:

    $\mathcal{L}(x) := \mathbb{E}_{u_{1:L} \sim q_{U_{1:L}|X}(\cdot \mid x)} \left[ \log \frac{p_{X,U_{1:L}}(x, u_{1:L})}{q_{U_{1:L}|X}(u_{1:L} \mid x)} \right] \le \log p_X(x).$

At test time, we can estimate $\log p_X(x)$ to arbitrary precision using an $m$-sample IWAE estimate with $m \gg 1$.
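To illustrate the ELBO and the IWAE estimate on a case where log p_X(x) is known, the sketch below uses a toy one-layer CIF chosen for this example (it is not from the talk): Z ~ N(0, 1), U ~ N(0, 1) independent of Z, and F(z; u) = z + u, so that X ~ N(0, 2). All function names are hypothetical. With `m = 1` the estimator reduces to a single-sample ELBO estimate.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def log_joint(x, u):
    # Toy CIF (assumption): F(z; u) = z + u, so z = x - u and the
    # log-determinant of the inverse Jacobian is zero.
    z = x - u
    return log_normal(z, 0.0, 1.0) + log_normal(u, 0.0, 1.0)

def iwae_log_px(x, m, q_mean, q_var, rng):
    """m-sample IWAE estimate of log p_X(x) with Gaussian q(u | x)."""
    u = q_mean + np.sqrt(q_var) * rng.standard_normal(m)
    log_w = log_joint(x, u) - log_normal(u, q_mean, q_var)
    top = log_w.max()                       # stabilised log-mean-exp
    return top + np.log(np.mean(np.exp(log_w - top)))

x = 0.7
rng = np.random.default_rng(0)
exact = log_normal(x, 0.0, 2.0)             # known marginal of the toy model
crude = iwae_log_px(x, m=1, q_mean=0.0, q_var=1.0, rng=rng)
fine = iwae_log_px(x, m=100_000, q_mean=0.0, q_var=1.0, rng=rng)
print(exact, crude, fine)
```

In this toy model the exact posterior of U given X = x is N(x / 2, 1 / 2). Passing `q_mean = x / 2` and `q_var = 0.5` makes every importance weight equal to p_X(x), so the estimate is exact for any `m`.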


Inference model

To obtain an efficient inference model $q_{U_{1:L}|X}$, we exploit the conditional independence structure of $p_{U_{1:L}|X}$ from the forward model:

    $Z_L = X,$
    $U_L \sim q_{U_L|Z_L}(\cdot \mid Z_L), \quad Z_{L-1} = F_L^{-1}(Z_L; U_L),$
    $\cdots$
    $U_1 \sim q_{U_1|Z_1}(\cdot \mid Z_1), \quad Z_0 = F_1^{-1}(Z_1; U_1).$

In other words

    $q_{U_{1:L}|X}(U_{1:L} \mid X) := \prod_{\ell=1}^{L} q_{U_\ell|Z_\ell}(U_\ell \mid Z_\ell).$

This naturally shares weights between the forward and inverse models, since the same $F_\ell$ are used in both cases.
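A sketch of this backward pass is given below. Everything concrete is an assumption for illustration: the unit-variance Gaussian `q` with hypothetical mean `0.5 * z`, and a single shared layer `F` of the rescale-and-shift form with the base flow set to the identity. The function returns the sampled indices, the recovered Z_0, and the log of the factorised q accumulated as a sum over layers.

```python
import numpy as np

def F(z, u):
    # Hypothetical layer: exp(-s(u)) * z - t(u) with s = tanh, t(u) = u.
    return np.exp(-np.tanh(u)) * z - u

def F_inv(x, u):
    # Inverse of F(. ; u) for a fixed index u.
    return np.exp(np.tanh(u)) * (x + u)

def q_sample_and_logpdf(z, rng):
    # Hypothetical q(U_l | Z_l): unit-variance Gaussian centred at 0.5 * z.
    mean = 0.5 * z
    u = mean + rng.standard_normal(z.shape)
    logpdf = -0.5 * np.sum((u - mean) ** 2 + np.log(2.0 * np.pi))
    return u, logpdf

def infer(x, num_layers, rng):
    """Sample U_L, ..., U_1 given X = x; return (us, z0, log q(u_{1:L} | x))."""
    z, log_q, us = x, 0.0, []
    for _ in range(num_layers):
        u, lp = q_sample_and_logpdf(z, rng)   # U_l drawn given Z_l
        z = F_inv(z, u)                       # Z_{l-1} = F_l^{-1}(Z_l; U_l)
        log_q += lp                           # log of the product over layers
        us.append(u)
    return us[::-1], z, log_q                 # us ordered as U_1, ..., U_L

x = np.array([0.3, -1.2])
us, z0, log_q = infer(x, num_layers=3, rng=np.random.default_rng(0))

# Replaying the forward model with the sampled indices recovers x.
z = z0
for u in us:
    z = F(z, u)
assert np.allclose(z, x)
```

The same `F` appears in both directions, so an implementation along these lines needs no separate decoder: the inverse pass reuses the forward model's parameters.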


Intuition

Intuitively, the additional flexibility afforded by $P_{U|Z}$ allows a CIF to “clean up” mass that would be misplaced by a single bijection.

Proposition: Under mild conditions on the target and $F$, there exists $P_{U|Z}$ such that the model $X$ has the same support as the target.

Proposition: If $F(z; \cdot)$ is surjective for each $z$, there exists $P_{U|Z}$ such that $X$ matches the target distribution exactly.


Comparison with related models

CIFs may be understood as a hybrid between standard normalising flow and VAE density models.

[Figure: graphical models over X, Z and U for the NF, VAE and CIF cases.]

In all cases $X = F(Z; U)$ for some family of bijections $F$.


Experimental Results

Table 1: Test set bits per dimension. Lower is better.

                     MNIST    CIFAR-10
  ResFlow (small)    1.074    3.474
  ResFlow (big)      1.018    3.422
  CIF-ResFlow        0.922    3.334

Note that these ResFlows were smaller than those used by Chen et al. [2019]. We obtained similar improvements on several other problems and flow models.


Thank you!

Figure 4: Joint work with Anthony Caterini, George Deligiannidis, and Arnaud Doucet


References

Rob Cornish, Anthony L. Caterini, George Deligiannidis, and Arnaud Doucet. Relaxing bijectivity constraints with continuously-indexed normalising flows. In International Conference on Machine Learning, 2020.

Tian Qi Chen, Jens Behrmann, David K. Duvenaud, and Jörn-Henrik Jacobsen. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, pages 9913–9923, 2019.

Jens Behrmann, Paul Vicol, Kuan-Chieh Wang, Roger B. Grosse, and Jörn-Henrik Jacobsen. On the invertibility of invertible neural networks, 2020.
