Unsupervised Learning and Inverse Problems with Deep Neural Networks - PowerPoint PPT Presentation



SLIDE 1

Unsupervised Learning and Inverse Problems with Deep Neural Networks

Joan Bruna, Stéphane Mallat, Ivan Dokmanic, Martin de Hoop École Normale Supérieure

www.di.ens.fr/data

SLIDE 2

Deep Convolutional Networks

A deep network maps the input x(u) to Φ(x) = ρ L_J ... ρ L_1 x, which approximates f_M(x); each L_j is a linear operator and ρ(u) is a scalar non-linearity: max(u, 0) or |u| or ...
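The cascade Φ(x) = ρ L_J ... ρ L_1 x can be sketched in a few lines of numpy; the depth, filter lengths and random filters here are illustrative, not taken from the slides:

```python
import numpy as np

def relu(u):
    # scalar non-linearity rho(u) = max(u, 0); rho(u) = |u| also fits the slide
    return np.maximum(u, 0.0)

def L(x, h):
    # one linear operator L_j: a circular convolution with filter h, via FFT
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=x.size)))

def phi(x, filters, rho=relu):
    # Phi(x) = rho L_J ... rho L_1 x
    for h in filters:
        x = rho(L(x, h))
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=64)                        # input signal x(u)
filters = [rng.normal(size=5) for _ in range(3)]
out = phi(x, filters)
```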

SLIDE 3

Dimensionality Reduction: Multiscale

Interactions of d variables x(u): pixels, particles, agents...

SLIDE 4

Deep Convolutional Trees

Cascade of convolutions with no channel connections:

x(u) → L_1 → ρ → ... → L_j → ρ → ... → L_J → Φ(x), with output y = f̃(x).

SLIDE 5

Scale Separation with Wavelets

• Wavelet filter ψ(u), rotated and dilated (real and imaginary parts):

  ψ_{2^j,θ}(u) = 2^{−j} ψ(2^{−j} r_θ u)

  The supports of |ψ̂_λ(ω)|² tile the frequency plane (ω₁, ω₂).

• Wavelet transform:

  Wx = ( x ⋆ φ_{2^J}(u) , x ⋆ ψ_{2^j,θ}(u) )_{j≤J, θ}

  where φ_{2^J} computes an average and the ψ_{2^j,θ} carry the higher frequencies, with

  x ⋆ ψ_{2^j,θ}(u) = ∫ x(v) ψ_{2^j,θ}(u − v) dv .

  W preserves the norm: ‖Wx‖² = ‖x‖².

SLIDE 6

Fast Wavelet Filter Bank

|W_1| computes x ⋆ φ_{2^J} and |x ⋆ ψ_{2^j,θ}| over scales 2^0, 2^1, ..., 2^J.

SLIDE 7

Wavelet Filter Bank

The modulus ρ(α) = |α| is applied to each wavelet convolution of x(u): |x ⋆ ψ_{2^j,θ}|, cascaded through |W_1| across scales 2^0, 2^1, 2^2, ..., 2^J.

SLIDE 8

Wavelet Scattering Network

The cascade ρ W_1 ρ W_2 ... ρ W_J applied to x, with ρ(α) = |α|, creates interactions across scales λ_k (from 2^0 to 2^J):

S_J x = { | ... ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ... ⋆ ψ_{λm}| ⋆ φ_{2^J} }

for example ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J} at order 2.

SLIDE 9

Scattering Properties

S_J x = ... |W_3| |W_2| |W_1| x, i.e.

S_J x = ( x ⋆ φ_{2^J} , |x ⋆ ψ_{λ1}| ⋆ φ_{2^J} , ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J} , |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J} , ... )_{λ1,λ2,λ3,...}

Theorem: For appropriate wavelets, a scattering is

• contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability);
• norm preserving: ‖S_J x‖ = ‖x‖, since ‖W_k x‖ = ‖x‖ implies ‖|W_k x| − |W_k x′|‖ ≤ ‖x − x′‖;
• translation invariant and deformation stable: if D_τ x(u) = x(u − τ(u)) then

  lim_{J→∞} ‖S_J D_τ x − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖ ,

  using the Lemma: ‖[W_k, D_τ]‖ = ‖W_k D_τ − D_τ W_k‖ ≤ C ‖∇τ‖_∞.
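As a sketch, scattering coefficients of orders 0, 1 and 2 with a global average (2^J = N) and simple Fourier band-pass filters; with circular convolutions and a full average, the output is exactly invariant to circular translations. The filters are illustrative, and a real implementation would keep only λ2 > λ1 at order 2:

```python
import numpy as np

def bandpass_bank(N, J):
    # Shannon-type dyadic band-pass filters, defined in the Fourier domain
    a = np.abs(np.fft.fftfreq(N))
    return [((a >= 2.0 ** (-j - 1)) & (a < 2.0 ** (-j))).astype(float)
            for j in range(J)]

def scattering(x, psi_hats):
    # Orders 0, 1, 2 with a global average (2^J = N):
    # mean(x), mean(|x * psi_1|), mean(||x * psi_1| * psi_2|)
    S = [x.mean()]
    for p1 in psi_hats:
        u1 = np.abs(np.fft.ifft(np.fft.fft(x) * p1))
        S.append(u1.mean())
        for p2 in psi_hats:
            u2 = np.abs(np.fft.ifft(np.fft.fft(u1) * p2))
            S.append(u2.mean())
    return np.array(S)

N = 128
psi_hats = bandpass_bank(N, 4)
x = np.random.default_rng(1).normal(size=N)
S = scattering(x, psi_hats)
S_shift = scattering(np.roll(x, 17), psi_hats)
# global averaging makes S invariant to circular translations
```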

SLIDE 10

Digit Classification: MNIST

Supervised: a linear classifier applied to S_J x computes y = f(x). The representation is invariant to translations, linearises small deformations, is invariant to specific deformations, and separates different patterns. No learning.

Classification errors (Joan Bruna):

Training size | Conv. Net. (LeCun et al.) | Scattering
50000         | 0.4%                      | 0.4%

SLIDE 11

Part II - Unsupervised Learning

Unsupervised learning: approximate the probability distribution p(x) of X ∈ R^d given P realisations {x_i}_{i≤P}, with potentially P = 1.

SLIDE 12

Stationary Processes

For a stationary vector X:

S_J X = ( X ⋆ φ_{2^J}(t) , |X ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t) , ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}(t) , |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}(t) , ... )_{λ1,λ2,λ3,...}

SLIDE 13

Ergodicity and Moments

E(SX) = ( E(X) , E(|X ⋆ ψ_{λ1}|) , E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|) , E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|) , ... )_{λ1,λ2,λ3,...}

With "weak" ergodicity conditions, S_J X concentrates around E(SX) (central limit theorem).

SLIDE 14

Generation of Random Processes

• Reconstruction: compute X̃ which satisfies S_J X̃ = S_J X, with random initialisation and gradient descent.
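A toy version of this reconstruction, with two translation-invariant moments standing in for the scattering vector S_J (the representation, step size and iteration count are placeholders, not from the slides): gradient descent drives Φ(x̃) to the target μ from a random initialisation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=256)   # one observed realisation

def phi(x):
    # stand-in for S_J: empirical mean and second moment
    return np.array([x.mean(), (x ** 2).mean()])

mu = phi(X)                                    # target statistics
x = rng.normal(size=256)                       # random initialisation
n = x.size
for _ in range(2000):
    d = phi(x) - mu
    # analytic gradient of the loss ||phi(x) - mu||^2
    grad = 2.0 * d[0] / n + 4.0 * d[1] * x / n
    x -= 1.0 * grad
```

After the descent, x̃ matches the target statistics while remaining a "random-looking" signal, which is the point of the synthesis.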

SLIDE 15

Texture Reconstructions

(Joan Bruna)

(Panels: Ising-critical, Turbulence 2D.)

SLIDE 16

Representation of Audio Textures

(Joan Bruna)

(Spectrogram panels over time t and frequency ω: Original, Paper, Cocktail Party, Applause, Gaussian in time.)

SLIDE 17

Max Entropy Canonical Models

• A representation Φ(x) = {φ_k(x)}_{k≤K} with x ∈ R^d defines moment constraints

  μ_k = E(φ_k(X)) = ∫ φ_k(x) p(x) dx .

  Maximising the entropy H(p) = −∫ p(x) log p(x) dx under these constraints gives

  p(x) = Z^{−1} exp( − Σ_k θ_k φ_k(x) ) .
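A standard worked example, not on the slide: with the two constraints φ₁(x) = x and φ₂(x) = x², the maximum-entropy density is the Gaussian matching those moments.

```latex
p(x) = Z^{-1} \exp\left(-\theta_1 x - \theta_2 x^2\right)
     = \frac{1}{\sqrt{2\pi\sigma^2}}\,
       \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right),
\qquad
\theta_2 = \frac{1}{2\sigma^2},\quad
\theta_1 = -\frac{m}{\sigma^2},
```

with m = μ₁ and σ² = μ₂ − μ₁² fixed by the moment constraints.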

SLIDE 18

Ergodic Microcanonical Model

SLIDE 19

Uniform Distribution on Balls

• Sphere in R^d:

  Φx = d^{−1/2} ‖x‖₂ = ( d^{−1} Σ_{k=1}^{d} |x(k)|² )^{1/2} = μ

• Simplex in R^d:

  Φx = d^{−1} ‖x‖₁ = d^{−1} Σ_{k=1}^{d} |x(k)| = μ
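Sampling the uniform measure on such a sphere is direct, because the standard Gaussian distribution is rotation invariant; a minimal sketch (d and μ are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(1)
d, mu = 512, 3.0

# Uniform sample on the sphere {x in R^d : d^{-1/2} ||x||_2 = mu}:
# a standard Gaussian vector has a rotation-invariant direction,
# so rescaling it onto the sphere gives the uniform distribution there.
g = rng.normal(size=d)
x = g * (mu * np.sqrt(d) / np.linalg.norm(g))
```

For the simplex constraint, normalising i.i.d. exponential variables plays the analogous role.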

SLIDE 20

Scattering Representation

• Scattering coefficients of order 0, 1 and 2:

  Φx = { d^{−1} Σ_u x(u) , d^{−1} ‖x ⋆ ψ_{λ1}‖₁ , d^{−1} ‖|x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}‖₁ }
SLIDE 21

Microcanonical Scattering

SLIDE 22

Scattering Approximations

X̃ is computed from μ = E(ΦX).

SLIDE 23

Ergodic Microcanonical Model

SLIDE 24

Singular Ergodic Processes

SLIDE 25

Scattering Ising

SLIDE 26

Stochastic Geometry: Cox Process

SLIDE 27

Non-Ergodic Mixture

SLIDE 28

Non-Ergodic Microcanonical Mixture

SLIDE 29

Scattering Multifractal Processes

• Scattering coefficients of order 0, 1 and 2.

SLIDE 30

Scattering: Ising at Critical Temperature

SLIDE 31

Failures of Audio Synthesis

(Panels: Original, Time Scattering.)

  • J. Anden and V. Lostanlen
SLIDE 32

Time-Frequency Translation Group

Time-frequency wavelet convolutions, over time t and log λ:

|x ⋆ ψ_λ| ⋆ φ_J   and   ||x ⋆ ψ_λ| ⋆ ψ_α ⋆ ψ_β| ⋆ φ_J

• J. Anden and V. Lostanlen
SLIDE 33

Joint Time-Frequency Scattering

(Panels: Original, Time Scattering, Time/Freq Scattering.)

• J. Anden and V. Lostanlen
SLIDE 34

Part III - Supervised Learning

• L_j is a linear combination of convolutions and subsampling, with a sum across channels:

  x_j = ρ L_j x_{j−1} ,   x_j(u, k_j) = ρ( Σ_k x_{j−1}(·, k) ⋆ h_{k_j,k}(u) )

  starting from x(u), through x_1(u, k_1), x_2(u, k_2), ..., x_J(u, k_J), followed by classification.

What is the role of channel connections?
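A minimal numpy sketch of one such layer with channel connections (circular convolutions via FFT; the shapes, filter sizes and random filters are illustrative):

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)  # rho(u) = max(u, 0)

def conv_layer(x_prev, h, rho=relu):
    # x_j(u, k_j) = rho( sum_k x_{j-1}(., k) * h_{k_j, k}(u) )
    # x_prev: (N, K_in) signal, h: (K_out, K_in, T) filters, circular conv.
    N, K_in = x_prev.shape
    K_out = h.shape[0]
    Xh = np.fft.fft(x_prev, axis=0)                      # (N, K_in)
    out = np.empty((N, K_out))
    for kj in range(K_out):
        Hh = np.fft.fft(h[kj], n=N, axis=1).T            # (N, K_in)
        # sum across input channels, then the pointwise non-linearity
        out[:, kj] = rho(np.real(np.fft.ifft((Xh * Hh).sum(axis=1))))
    return out

rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 1))                            # x(u), one channel
x1 = conv_layer(x0, rng.normal(size=(4, 1, 5)))          # x_1(u, k_1)
x2 = conv_layer(x1, rng.normal(size=(8, 4, 5)))          # x_2(u, k_2)
```

Removing the sum over k (one input channel per output channel) recovers the "convolutional tree" of SLIDE 4.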

SLIDE 35

Environmental Sound Classification

UrbanSound8k: 10 classes (air conditioner, car horns, children playing, dog barks, drilling, engine at idle, ...), 8k training examples; class-wise average error:

MFCC audio descriptors           0.39
time scattering                  0.27
ConvNet (Piczak, MLSP 2015)      0.26
time-frequency scattering        0.20

Supervised: a linear classifier applied to S_J x computes y = f(x); no learning in the representation.

• J. Anden and V. Lostanlen

SLIDE 36

Inverse Scattering Transform

(Joan Bruna)

• Given S_J x, we want to compute x̃ such that S_J x̃ = S_J x, i.e.

  ( x̃ ⋆ φ_{2^J} , |x̃ ⋆ ψ_{λ1}| ⋆ φ_{2^J} , ... , || ... |x̃ ⋆ ψ_{λ1}| ⋆ ... | ⋆ ψ_{λm}| ⋆ φ_{2^J} )_{λ1,...,λm}
  = ( x ⋆ φ_{2^J} , |x ⋆ ψ_{λ1}| ⋆ φ_{2^J} , ... , || ... |x ⋆ ψ_{λ1}| ⋆ ... | ⋆ ψ_{λm}| ⋆ φ_{2^J} )_{λ1,...,λm}

• If x(u) is a Dirac, a straight edge or a sinusoid, then x̃ is equal to x up to a translation.

We shall use m = 2.

SLIDE 37

Sparse Shape Reconstruction

(Joan Bruna)

With a gradient descent algorithm, for original images of N² pixels:

• m = 1, 2^J = N: reconstruction from O(log₂ N) scattering coefficients.
• m = 2, 2^J = N: reconstruction from O(log₂² N) scattering coefficients.

SLIDE 38

Multiscale Scattering Reconstructions

Original images of N² pixels, and scattering reconstructions at 2^J = 16, 32, 64, 128 = N, from 1.4 N² and 0.5 N² coefficients.

SLIDE 39

III - Inverse Problems

• Observations x related to y by the operator F. Best linear method, the least squares estimate (linear interpolation):

  ŷ = ( Σ̂_x^† Σ̂_{xy} ) x

SLIDE 40

Super-Resolution

• Best linear method: least squares estimate (linear interpolation):

  ŷ = ( Σ̂_x^† Σ̂_{xy} ) x

• State-of-the-art methods:
  – Dictionary-learning super-resolution.
  – CNN-based: just train a CNN to regress from low-res to high-res.
  – They cleverly optimise a fundamentally unstable metric criterion:

    Θ* = arg min_Θ Σ_i ‖F(x_i, Θ) − y_i‖² ,   ŷ = F(x, Θ*)
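The least-squares baseline above can be estimated from data with empirical covariances; a sketch on a synthetic linear problem (dimensions and noise level are arbitrary, and the transpose only handles shape conventions):

```python
import numpy as np

rng = np.random.default_rng(2)
dx, dy, n = 8, 16, 5000

# Synthetic ground truth: y is a noisy linear function of x.
x = rng.normal(size=(n, dx))
B = rng.normal(size=(dy, dx))
y = x @ B.T + 0.01 * rng.normal(size=(n, dy))

Sigma_x = x.T @ x / n            # empirical covariance of x
Sigma_xy = x.T @ y / n           # empirical cross-covariance
# least squares estimate: y_hat = (Sigma_x^dagger Sigma_xy)^T x
A = (np.linalg.pinv(Sigma_x) @ Sigma_xy).T
y_hat = x @ A.T
```

This is the baseline the slides call linear interpolation; the CNN criterion Θ* replaces the linear map A with a learned non-linear regressor.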

SLIDE 41

Scattering Super-Resolution

With the operator F relating x and y:

S_{L,J} x = ( x ⋆ φ_{2^J}(u) , |x ⋆ ψ_{j1,k1}| ⋆ φ_{2^J}(u) , ||x ⋆ ψ_{j1,k1}| ⋆ ψ_{j2,k2}| ⋆ φ_{2^J}(u) )_{L≤j1,j2≤J}

• Linear estimation in the scattering domain: estimate S_{L−α,J} x from S_{L,J} x.
• No phase estimation: potentially worse PSNR.
• Good image quality because of deformation stability.
SLIDE 42

Super-Resolution Results

(Panels: Original, Best Linear Estimate, Scattering Estimate, state-of-the-art.)

• J. Bruna, P. Sprechmann
SLIDE 43

Super-Resolution Results

(Panels: Original, Best Linear Estimate, Scattering Estimate, state-of-the-art.)

• J. Bruna, P. Sprechmann
SLIDE 44

Super-Resolution Results

(Panels: Original, Best Linear Estimate, Scattering Estimate, state-of-the-art.)

• J. Bruna, P. Sprechmann
SLIDE 45

Super-Resolution Results

(Panel A: Original, Low-Resolution, Scattering, TV Regularization, l1 Regularization.)

• I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 46

Tomography Results

(Panels B, C: Original, Low-Resolution, Scattering, TV Regularization.)

• I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 47

Conclusions

• Deep convolutional networks have spectacular high-dimensional and generic approximation capabilities.
• New stochastic models of images for inverse problems.
• Outstanding mathematical problem to understand deep nets:
  – How to learn representations for inverse problems?

Understanding Deep Convolutional Networks, arXiv 2016.