Constructive universal high-dimensional distribution generation



slide-1
SLIDE 1

Constructive universal high-dimensional distribution generation through deep ReLU networks

Dmytro Perekrestenko

July 2020

joint work with Stephan Müller and Helmut Bölcskei
slide-2
SLIDE 2

Motivation

Deep neural networks are widely used as generative models for complex data such as images and natural language.

Many generative network architectures are based on transforming low-dimensional distributions into high-dimensional ones, e.g., the Variational Autoencoder and the Wasserstein Autoencoder.

This talk answers the question of whether there is a fundamental limitation in going from a low dimension to a higher one.
slide-3
SLIDE 3

Our contribution

This talk will show that there is no such limitation.

slide-4
SLIDE 4

Generation of multi-dimensional distributions from U[0, 1]

Classical approaches transform distributions of the same dimension, e.g., the Box-Muller method [Box and Muller, 1958].

[Bailey and Telgarsky, 2018] show that deep ReLU networks can transport $U[0,1]$ to $U[0,1]^d$.
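As an illustration of the classical same-dimension case, here is a minimal sketch of the Box-Muller transform, which maps two independent uniform samples to two independent standard normal samples. The function name `box_muller`, the seed, and the sample size are illustrative choices, not taken from the slides.

```python
import numpy as np

def box_muller(u1, u2):
    """Map two independent U(0,1) samples to two independent N(0,1) samples."""
    r = np.sqrt(-2.0 * np.log(u1))   # radius from the first uniform
    theta = 2.0 * np.pi * u2         # angle from the second uniform
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(0)
u1 = 1.0 - rng.random(100_000)       # shift [0, 1) to (0, 1] so log(u1) is finite
u2 = rng.random(100_000)
z1, z2 = box_muller(u1, u2)
```

Input and output both have dimension 2 here, which is the restriction the dimension-increasing constructions in this talk remove.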

slide-5
SLIDE 5

Neural networks

A map $\Phi : \mathbb{R}^{N_0} \to \mathbb{R}^{N_L}$ given by

$$\Phi := W_L \circ \rho \circ W_{L-1} \circ \rho \circ \cdots \circ \rho \circ W_1$$

is called a neural network (NN).

  • Affine maps: $W_\ell(x) = A_\ell x + b_\ell$, $W_\ell : \mathbb{R}^{N_{\ell-1}} \to \mathbb{R}^{N_\ell}$, $\ell \in \{1, 2, \dots, L\}$
  • Non-linearity or activation function: $\rho$ acts component-wise
  • Network connectivity: $M(\Phi)$, the total number of non-zero parameters in the $W_\ell$
  • Depth of network or number of layers: $L(\Phi) := L$

We denote by $\mathcal{N}_{d,d'}$ the set of all ReLU networks with input dimension $N_0 = d$ and output dimension $N_L = d'$.
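The definition above can be sketched in a few lines of NumPy. This assumes the activation is the ReLU, i.e. max(x, 0) applied component-wise; the helper names `network`, `connectivity`, and `depth`, and the toy network computing x -> (x, |x|), are illustrative and not from the slides.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def network(layers):
    """Build Phi = W_L o rho o ... o rho o W_1 from layers = [(A_1, b_1), ..., (A_L, b_L)]."""
    def phi(x):
        for A, b in layers[:-1]:
            x = relu(A @ x + b)      # affine map followed by the activation
        A, b = layers[-1]
        return A @ x + b             # no activation after the last affine map
    return phi

def connectivity(layers):
    """M(Phi): total number of non-zero entries in all A_l and b_l."""
    return sum(int(np.count_nonzero(A)) + int(np.count_nonzero(b)) for A, b in layers)

def depth(layers):
    """L(Phi): number of affine maps."""
    return len(layers)

# Toy network with input dimension 1 and output dimension 2: x -> (x, |x|).
layers = [
    (np.array([[1.0], [-1.0]]), np.zeros(2)),           # x -> (x, -x), then ReLU
    (np.array([[1.0, -1.0], [1.0, 1.0]]), np.zeros(2)), # (a, b) -> (a - b, a + b)
]
phi = network(layers)
```

For this toy network, `connectivity(layers)` counts six non-zero weights and `depth(layers)` is 2.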

slide-6
SLIDE 6

Histogram distributions

Histogram distribution in $\mathcal{E}[0,1]^1_n$, with $d = 1$, $n = 5$.

Histogram distribution in $\mathcal{E}[0,1]^2_n$, with $d = 2$, $n = 4$.

slide-7
SLIDE 7

Our goal

Transport $U[0,1]$ to an approximation of any given distribution supported on $[0,1]^d$. For illustration purposes we look at $d = 2$.

slide-8
SLIDE 8

ReLU networks and histograms

Takeaway message: For any histogram distribution there exists a ReLU net that generates it from a uniform input. This net realizes an inverse cumulative distribution function ($\mathrm{cdf}^{-1}$).
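A minimal sketch of why this works in one dimension: the cdf of a histogram distribution is piecewise linear, so its inverse is piecewise linear too, and such a function is realizable by a ReLU network. The code below uses `np.interp` as a stand-in for that network. It assumes equal-width bins with strictly positive masses; the weights are hypothetical.

```python
import numpy as np

def histogram_inverse_cdf(weights):
    """Inverse cdf of a 1-D histogram distribution on [0, 1] with equal-width bins.

    Assumes all bin masses are strictly positive and sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    cdf_knots = np.concatenate(([0.0], np.cumsum(w)))  # cdf values at the bin edges
    cdf_knots[-1] = 1.0                                # guard against round-off
    edges = np.linspace(0.0, 1.0, len(w) + 1)          # bin edges in [0, 1]
    return lambda u: np.interp(u, cdf_knots, edges)    # piecewise linear in u

weights = [0.1, 0.4, 0.2, 0.2, 0.1]                    # hypothetical masses, n = 5
inv_cdf = histogram_inverse_cdf(weights)

u = (np.arange(100_000) + 0.5) / 100_000               # evenly spaced stand-in for U[0,1]
samples = inv_cdf(u)
freq, _ = np.histogram(samples, bins=5, range=(0.0, 1.0))
freq = freq / len(u)                                   # empirical bin masses
```

The empirical bin masses in `freq` should reproduce `weights` up to discretization error.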

slide-9
SLIDE 9

The key ingredient to dimension increase

Sawtooth function $g : [0,1] \to [0,1]$,

$$g(x) = \begin{cases} 2x, & \text{if } x < \tfrac{1}{2}, \\ 2(1 - x), & \text{if } x \ge \tfrac{1}{2}, \end{cases}$$

let $g_1(x) = g(x)$, and define the "sawtooth" function of order $s$ as the $s$-fold composition of $g$ with itself according to

$$g_s := \underbrace{g \circ g \circ \cdots \circ g}_{s}, \quad s \ge 2.$$

NNs realize the sawtooth as $g(x) = 2\rho(x) - 4\rho(x - 1/2) + 2\rho(x - 1)$.
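The ReLU realization of the sawtooth and its s-fold composition can be checked directly. This is a small sketch assuming the activation is max(x, 0); the name `sawtooth` and the evaluation grid are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def g(x):
    """One tooth, written as a ReLU network with three hidden neurons."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def sawtooth(x, s):
    """g_s: the s-fold composition of g with itself."""
    for _ in range(s):
        x = g(x)
    return x

x = np.linspace(0.0, 1.0, 9)   # multiples of 1/8
tooth = g(x)                   # rises from 0 to 1 on [0, 1/2], falls back to 0 on [1/2, 1]
teeth = sawtooth(x, 3)         # alternates between 0 and 1 on this grid
```

Each extra composition doubles the number of teeth, so `sawtooth(x, s)` has 2**(s-1) teeth while the network depth only grows linearly in s.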

slide-10
SLIDE 10

Related work

Theorem ([Bailey and Telgarsky, 2018, Th. 2.1], case $d = 2$). There exists a ReLU network $\Phi : x \mapsto (x, g_s(x))$, $\Phi \in \mathcal{N}_{1,d}$, with connectivity $M(\Phi) \le Cs$ for some constant $C > 0$, and of depth $L(\Phi) \le s + 1$, such that

$$W(\Phi \# U[0,1],\, U[0,1]^2) \le \frac{\sqrt{2}}{2^s}.$$

Main proof idea: the space-filling property of the sawtooth function.
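The space-filling property can be illustrated numerically. The sketch below pushes an evenly spaced grid (a deterministic stand-in for U[0,1] samples) through x -> (x, g_s(x)) and measures how much mass lands in each cell of a 4 x 4 grid on the unit square. Uniform cell masses are only a coarse proxy for closeness in Wasserstein distance, and the grid size and s are illustrative choices.

```python
import numpy as np

def g(x):
    relu = lambda t: np.maximum(t, 0.0)
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def sawtooth(x, s):
    for _ in range(s):
        x = g(x)
    return x

s = 4
N = 2**16
x = (np.arange(N) + 0.5) / N     # evenly spaced stand-in for U[0,1]
y = sawtooth(x, s)               # second coordinate of Phi(x) = (x, g_s(x))

# Fraction of the points falling in each cell of a 4 x 4 grid on [0,1]^2.
counts, _, _ = np.histogram2d(x, y, bins=4, range=[[0, 1], [0, 1]])
cell_mass = counts / N
```

Every cell receives mass 1/16, although the points lie on a one-dimensional curve.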

slide-11
SLIDE 11

Generalization of the space-filling property

slide-12
SLIDE 12

Approximating 2D distributions

$$M : x \mapsto (x, f(g_s(x)))$$

Generating a histogram distribution via the transport map $(x, f(g_s(x)))$. Left: the function $f(x)$; center: $f(g_4(x))$; right: a heatmap of the resulting histogram distribution.
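A minimal sketch of this transport map, assuming that f is the inverse cdf of a 1-D histogram in the y-direction (the slide only shows f as a plot, so this choice is an assumption). The bin masses, s, and the grid size are hypothetical.

```python
import numpy as np

def g(x):
    relu = lambda t: np.maximum(t, 0.0)
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def sawtooth(x, s):
    for _ in range(s):
        x = g(x)
    return x

# Hypothetical target: bin masses of a 1-D histogram in the y-direction (n = 4).
weights = np.array([0.5, 0.25, 0.125, 0.125])
cdf_knots = np.concatenate(([0.0], np.cumsum(weights)))
edges = np.linspace(0.0, 1.0, len(weights) + 1)
f = lambda u: np.interp(u, cdf_knots, edges)   # assumed f: piecewise-linear inverse cdf

s = 6
N = 2**16
x = (np.arange(N) + 0.5) / N                   # evenly spaced stand-in for U[0,1]
y = f(sawtooth(x, s))                          # second coordinate of M(x)

counts, _, _ = np.histogram2d(x, y, bins=4, range=[[0, 1], [0, 1]])
cell_mass = counts / N                         # rows index x-cells, columns y-cells
```

Each row of `cell_mass` should be a quarter of `weights`: the first coordinate stays uniform while the second follows the target histogram.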

slide-13
SLIDE 13

Approximating 2D distributions (cont'd)

$$M : x \mapsto \Big( f_{\mathrm{marg}}(x),\ \sum_{i=0}^{n-1} f_i\big(g_s(n f_{\mathrm{marg}}(x) - i)\big) \Big)$$

Generating a general 2-D histogram distribution. Left: the function $f_1 = f_3$; center: $\sum_{i=0}^{3} f_i(g_3(4x - i))$; right: a heatmap of the resulting histogram distribution. The function $f_0 = f_2$ is depicted on the left in Figure 3.
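The following sketch implements a map of this form under stated assumptions, none of which are spelled out on the slide:

  • f_marg is the inverse cdf of the x-marginal.
  • f_i is the inverse cdf of the conditional distribution of y given that x lies in the i-th cell, with f_i(0) = 0.
  • g is extended by zero outside [0, 1], which its ReLU formula does automatically.

The table `P` of cell masses is hypothetical and must have strictly positive entries.

```python
import numpy as np

def g(x):
    relu = lambda t: np.maximum(t, 0.0)
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def sawtooth(x, s):
    for _ in range(s):
        x = g(x)
    return x

def inverse_cdf(weights):
    """Piecewise-linear inverse cdf of a 1-D histogram with positive bin masses."""
    w = np.asarray(weights, dtype=float)
    knots = np.concatenate(([0.0], np.cumsum(w)))
    knots[-1] = 1.0                                  # guard against round-off
    edges = np.linspace(0.0, 1.0, len(w) + 1)
    return lambda u: np.interp(u, knots, edges)

def transport_map(P, s):
    """Build x -> (f_marg(x), sum_i f_i(g_s(n * f_marg(x) - i))) for cell masses P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    marg = P.sum(axis=1)                             # masses of the x-marginal
    f_marg = inverse_cdf(marg)
    f_cond = [inverse_cdf(P[i] / marg[i]) for i in range(n)]
    def M(x):
        X = f_marg(x)
        # Only the term of the cell containing X is non-zero: g_s vanishes outside [0, 1].
        Y = sum(f_cond[i](sawtooth(n * X - i, s)) for i in range(n))
        return X, Y
    return M

P = np.array([[0.0625, 0.1875],                      # hypothetical 2 x 2 cell masses
              [0.375,  0.375]])
M = transport_map(P, s=5)

N = 2**16
x = (np.arange(N) + 0.5) / N                         # evenly spaced stand-in for U[0,1]
X, Y = M(x)
counts, _, _ = np.histogram2d(X, Y, bins=2, range=[[0, 1], [0, 1]])
cell_mass = counts / N
```

The empirical `cell_mass` should match `P`. The parameter s does not change the cell masses; it controls how evenly the mass is spread inside each cell.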

slide-14
SLIDE 14

Generating histogram distributions with NNs

Theorem. For every distribution $p_{X,Y}(x, y)$ in $\mathcal{E}[0,1]^2_n$, there exists a $\Psi \in \mathcal{N}_{1,2}$ with connectivity $M(\Psi) \le C_1 n^2 + C_2 n s$, for some constants $C_1, C_2 > 0$, and of depth $L(\Psi) \le s + 3$, such that

$$W(\Psi \# U[0,1],\, p_{X,Y}) \le \frac{2\sqrt{2}}{n 2^s}.$$

  • The error decays exponentially in $s$, and hence in depth, and as $1/n$ in $n$.
  • The connectivity is in $O(n^2)$, which is of the same order as the number of parameters of $\mathcal{E}[0,1]^2_n$, namely $n^2 - 1$.
  • The special case $n = 1$ coincides with [Bailey and Telgarsky, 2018, Th. 2.1].
slide-15
SLIDE 15

Histogram approximation

Theorem. Let $p_{X,Y}$ be a 2-dimensional $L$-Lipschitz-continuous pdf of finite differential entropy on its support $[0,1]^2$. Then, for every $n > 0$, there exists a $\tilde{p}_{X,Y} \in \mathcal{E}[0,1]^2_n$ such that

$$W(p_{X,Y}, \tilde{p}_{X,Y}) \le \frac{1}{2} \| p_{X,Y} - \tilde{p}_{X,Y} \|_{L^1([0,1]^2)} \le \frac{L\sqrt{2}}{2n}.$$
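A numerical sanity check of the L1 part of this bound, as a sketch under assumptions: the histogram approximation is taken to be the cell average of the pdf (the slide does not state the construction), and the Lipschitz constant is taken with respect to the Euclidean norm. The test density p(x, y) = x + y and the grid resolution `m` are illustrative.

```python
import numpy as np

def histogram_approximation_error(pdf, n, m=32):
    """Return (0.5 * ||p - p_tilde||_L1, cell_means), with p_tilde the cell-average histogram.

    The integral is approximated on a fine grid with m points per cell and axis.
    """
    N = n * m
    t = (np.arange(N) + 0.5) / N                     # midpoints of the fine grid
    X, Y = np.meshgrid(t, t, indexing="ij")
    p = pdf(X, Y)
    cell_means = p.reshape(n, m, n, m).mean(axis=(1, 3))
    p_tilde = np.repeat(np.repeat(cell_means, m, axis=0), m, axis=1)
    half_l1 = 0.5 * np.abs(p - p_tilde).mean()       # mean over the unit square = integral
    return half_l1, cell_means

pdf = lambda x, y: x + y       # integrates to 1 on [0,1]^2; gradient norm is sqrt(2)
L = np.sqrt(2.0)               # Lipschitz constant w.r.t. the Euclidean norm (assumed)
n = 8
half_l1, cell_means = histogram_approximation_error(pdf, n)
bound = L * np.sqrt(2.0) / (2 * n)
```

For this smooth density the measured `half_l1` sits well below `bound`, as expected from a worst-case estimate.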

slide-16
SLIDE 16

Universal approximation

Theorem. Let $p_{X,Y}$ be an $L$-Lipschitz continuous pdf supported on $[0,1]^2$. Then, for every $n > 0$, there exists a $\Phi \in \mathcal{N}_{1,2}$ with connectivity $M(\Phi) \le C_1 n^2 + C_2 n s$ for some constants $C_1, C_2 > 0$, and of depth $L(\Phi) \le s + 3$, such that

$$W(\Phi \# U[0,1],\, p_{X,Y}) \le \frac{L\sqrt{2}}{2n} + \frac{2\sqrt{2}}{n 2^s}.$$

Takeaway message: ReLU networks have no fundamental limitation in going from low dimension to a higher one.
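To get a feel for how the two terms of the bound trade off, the right-hand side can be evaluated for a few values of n and s. The helper name `wasserstein_bound` and the parameter values are illustrative only.

```python
import math

def wasserstein_bound(L, n, s):
    """Right-hand side of the bound: histogram term plus network term."""
    histogram_term = L * math.sqrt(2) / (2 * n)      # pdf -> histogram approximation
    network_term = 2 * math.sqrt(2) / (n * 2**s)     # histogram -> ReLU network output
    return histogram_term + network_term

for n, s in [(10, 5), (100, 5), (100, 10)]:
    print(n, s, wasserstein_bound(1.0, n, s))
```

For fixed n, increasing s only shrinks the second term, so the bound cannot drop below L * sqrt(2) / (2n); both n and s have to grow to drive it to zero.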

slide-17
SLIDE 17

References I

Bailey, B. and Telgarsky, M. J. (2018). Size-noise tradeoffs in generative networks. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 6489–6499. Curran Associates, Inc.

Box, G. E. P. and Muller, M. E. (1958). A note on the generation of random normal deviates. Ann. Math. Statist., 29(2):610–611.