SLIDE 1
Unsupervised Learning and Inverse Problems with Deep Neural Networks
Joan Bruna, Stéphane Mallat, Ivan Dokmanic, Martin de Hoop
École Normale Supérieure, www.di.ens.fr/data
SLIDE 2
SLIDE 3
Multiscale Dimensionality Reduction
Interactions of d variables x(u): pixels, particles, agents...
SLIDE 4
Deep Convolutional Trees
Cascade of convolutions: no channel connections
x(u) → ρL1 → ... → ρLj → ... → ρLJ → Φ(x) → y = f̃(x)
SLIDE 5
- Wavelet filter ψ(u), rotated and dilated:
  ψ_{2^j,θ}(u) = 2^{-j} ψ(2^{-j} r_θ u)
  [Figure: real and imaginary parts of ψ; support of |ψ̂_λ(ω)|² in the (ω1, ω2) plane]
- Wavelet transform:
  Wx = ( x ⋆ φ_{2^J}(u), x ⋆ ψ_{2^j,θ}(u) )_{j≤J, θ}
  with x ⋆ ψ_{2^j,θ}(u) = ∫ x(v) ψ_{2^j,θ}(u − v) dv
  φ_{2^J}: average; ψ_{2^j,θ}: higher frequencies
- Preserves norm: ‖Wx‖² = ‖x‖²
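The transform above can be sketched in 1D with FFT-based circular convolutions; the Gaussian frequency profiles and scale parameters below are illustrative assumptions, not the Morlet filters of the slides:

```python
import numpy as np

def filter_bank(N, J):
    """Fourier-domain filters: one low-pass phi_{2^J} and J band-passes psi_{2^j}.
    Gaussian profiles are an illustrative stand-in for Morlet wavelets."""
    omega = np.fft.fftfreq(N) * 2 * np.pi
    phi = np.exp(-(omega ** 2) * 2 ** (2 * J) / 2)              # local average at scale 2^J
    psis = [np.exp(-((np.abs(omega) - np.pi / 2 ** j) ** 2) * 2 ** (2 * j) / 2)
            for j in range(J)]                                   # band-pass centered at pi/2^j
    return phi, psis

def wavelet_transform(x, J=4):
    """Wx = (x * phi_{2^J}, {x * psi_{2^j}}_{j<J}) via FFT (circular convolutions)."""
    phi, psis = filter_bank(len(x), J)
    xf = np.fft.fft(x)
    low = np.fft.ifft(xf * phi).real
    bands = [np.fft.ifft(xf * p).real for p in psis]
    return low, bands

low, bands = wavelet_transform(np.random.default_rng(0).standard_normal(256))
```

With these real, symmetric frequency profiles the outputs are real; a tight Littlewood-Paley frame would be needed for exact norm preservation.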
SLIDE 6
Fast Wavelet Filter Bank
[Figure: cascade of modulus filter banks |W1| computing |x ⋆ ψ_{2^j,θ}| from scale 2^0 to 2^J]
SLIDE 7
Wavelet Filter Bank
ρ(α) = |α|
[Figure: x(u) cascaded through filter banks |W1|, producing |x ⋆ ψ_{2^j,θ}| at scales 2^0, 2^1, 2^2, ..., 2^J]
SLIDE 8
Wavelet Scattering Network
x → ρW1 → ρW2 → ... → ρWJ
ρ(α) = |α| creates interactions across scales.
S_J x = { |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ...| ⋆ ψ_{λm}| ⋆ φ_{2^J} }_{λ1,...,λm}
Second order: ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}
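The modulus cascade can be sketched the same way in 1D. Here Gaussian band-pass filters and a global mean stand in for the actual wavelets and the averaging by φ_{2^J}; all parameters are illustrative:

```python
import numpy as np

def band_filters(N, J):
    # Illustrative Gaussian band-pass filters psi_{2^j} in the Fourier domain
    omega = np.fft.fftfreq(N) * 2 * np.pi
    return [np.exp(-((np.abs(omega) - np.pi / 2 ** j) ** 2) * 2 ** (2 * j) / 2)
            for j in range(J)]

def scattering(x, J=3):
    """Orders 0, 1, 2 of S_J x: iterate the wavelet modulus rho(a) = |a|,
    then average (a global mean stands in for convolution with phi_{2^J})."""
    psis = band_filters(len(x), J)
    wmod = lambda u, p: np.abs(np.fft.ifft(np.fft.fft(u) * p))   # |u * psi|
    S = [x.mean()]                                               # order 0
    for j1, p1 in enumerate(psis):
        u1 = wmod(x, p1)
        S.append(u1.mean())                                      # order 1: averaged |x * psi_{l1}|
        for p2 in psis[j1 + 1:]:                                 # only increasing scales l2 > l1
            S.append(wmod(u1, p2).mean())                        # order 2
    return np.array(S)

coeffs = scattering(np.random.default_rng(0).standard_normal(128))
```

For J = 3 this yields 1 + 3 + 3 = 7 coefficients; restricting to λ2 > λ1 reflects that the modulus pushes energy toward lower frequencies.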
SLIDE 9
Scattering Properties
S_J x = ... |W3| |W2| |W1| x (cascade of modulus filter banks)
S_J x = { x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}, |||x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}, ... }_{λ1,λ2,λ3,...}

Theorem: For appropriate wavelets, a scattering transform is
- contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability)
- norm preserving: ‖S_J x‖ = ‖x‖, since ‖W_k x‖ = ‖x‖ implies ‖|W_k x| − |W_k x′|‖ ≤ ‖x − x′‖
- translation invariant and stable to deformations: if D_τ x(u) = x(u − τ(u)), then
  lim_{J→∞} ‖S_J D_τ x − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖
Lemma: ‖[W_k, D_τ]‖ = ‖W_k D_τ − D_τ W_k‖ ≤ C ‖∇τ‖_∞
SLIDE 10
Digit Classification: MNIST (LeCun et al.)
x → S_J x → supervised linear classifier → y = f(x)
- Invariant to translations; linearises small deformations
- Invariant to specific deformations; separates different patterns
- No learning in the representation
Classification errors (Joan Bruna):
Training size | Conv. Net. | Scattering
50000 | 0.4% | 0.4%
SLIDE 11
Part II- Unsupervised Learning
Unsupervised learning: approximate the probability distribution p(x) of X ∈ R^d, given P realisations {x_i}_{i≤P}, potentially with P = 1.
SLIDE 12
Stationary Processes
X(t): stationary vector
S_J X = { X ⋆ φ_{2^J}(t), |X ⋆ ψ_{λ1}| ⋆ φ_{2^J}(t), ||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ φ_{2^J}(t), |||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}| ⋆ φ_{2^J}(t), ... }_{λ1,λ2,λ3,...}
SLIDE 13
Ergodicity and Moments
E(S_J X) = { E(X), E(|X ⋆ ψ_{λ1}|), E(||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}|), E(|||X ⋆ ψ_{λ1}| ⋆ ψ_{λ2}| ⋆ ψ_{λ3}|), ... }_{λ1,λ2,λ3,...}
Estimated from a single realisation under "weak" ergodicity conditions (central limit theorem).
SLIDE 14
Generation of Random Processes
- Reconstruction: compute X̃ which satisfies S_J X̃ = S_J X, with random initialisation and gradient descent.
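A minimal sketch of this moment-matching synthesis, with a toy two-dimensional Φ (empirical mean and power) standing in for the scattering moments, so the gradient is analytic:

```python
import numpy as np

def phi(x):
    # Toy statistics standing in for the scattering moments E(S_J X):
    # the empirical mean and the empirical power.
    return np.array([x.mean(), (x ** 2).mean()])

def generate(mu, d=512, steps=3000, lr=2.0, seed=0):
    """Random initialisation, then gradient descent on ||phi(x) - mu||^2,
    as in the slide's reconstruction scheme (with a toy phi)."""
    x = np.random.default_rng(seed).standard_normal(d)
    for _ in range(steps):
        m, q = phi(x)
        # analytic gradient of (m - mu_0)^2 + (q - mu_1)^2 w.r.t. x
        grad = 2 * (m - mu[0]) / d + 4 * (q - mu[1]) * x / d
        x -= lr * grad
    return x

target = np.array([0.3, 2.0])
x_tilde = generate(target)
```

With a richer Φ (e.g. actual scattering coefficients) the same loop requires automatic differentiation, but the structure is identical.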
SLIDE 15
Texture Reconstructions
Joan Bruna
Ising-critical Turbulence 2D
SLIDE 16
Representation of Audio Textures
Joan Bruna
[Spectrogram panels (time t, frequency ω): original vs. reconstructions; examples include paper, cocktail party, applause; "Gaussian in time" model]
SLIDE 17
Max Entropy Canonical Models
- A representation Φ(x) = {φ_k(x)}_{k≤K}, with x ∈ R^d
- Moment constraints: μ_k = E(φ_k(X)) = ∫ φ_k(x) p(x) dx
- Maximum entropy H(p) = −∫ p(x) log p(x) dx ⇒ p(x) = Z^{−1} exp( −Σ_k θ_k φ_k(x) )
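A standard worked instance of this principle (textbook material, not from the slides): constraining the first two moments, φ_1(x) = x and φ_2(x) = x², yields a Gaussian maximum-entropy density:

```latex
% Constraints E(X) = m, E(X^2) = m^2 + \sigma^2 give
p(x) = Z^{-1} \exp\!\big(-\theta_1 x - \theta_2 x^2\big),
% a Gaussian; matching the two moments fixes the multipliers:
\theta_2 = \frac{1}{2\sigma^2}, \qquad \theta_1 = -\frac{m}{\sigma^2}, \qquad
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{(x-m)^2}{2\sigma^2}\Big).
```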
SLIDE 18
Ergodic Microcanonical Model
SLIDE 19
Uniform Distribution on Balls
- Sphere in R^d: Φx = d^{−1/2} ‖x‖_2 = ( d^{−1} Σ_{k=1}^d |x(k)|² )^{1/2} = μ
- Simplex in R^d: Φx = d^{−1} ‖x‖_1 = d^{−1} Σ_{k=1}^d |x(k)| = μ
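A quick numerical check of the sphere case (assuming i.i.d. standard Gaussian coordinates): d^{−1/2}‖x‖₂ concentrates around μ = 1 as d grows, so the Gaussian model and the microcanonical uniform-on-sphere model become close in high dimension:

```python
import numpy as np

# For X with d i.i.d. N(0,1) coordinates, Phi(x) = d^{-1/2} ||x||_2 concentrates
# around mu = 1: the Gaussian measure is then nearly supported on the
# microcanonical set {x : Phi(x) = mu}.
rng = np.random.default_rng(0)
vals = {d: np.linalg.norm(rng.standard_normal(d)) / np.sqrt(d)
        for d in (10, 1000, 100000)}
for d, v in vals.items():
    print(d, round(v, 4))
```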
SLIDE 20
Scattering Representation
- Scattering coefficients of order 0, 1 and 2:
Φx = { d^{−1} Σ_u x(u), d^{−1} ‖x ⋆ ψ_{λ1}‖_1, d^{−1} ‖|x ⋆ ψ_{λ1}| ⋆ ψ_{λ2}‖_1 }
SLIDE 21
Microcanonical Scattering
SLIDE 22
Scattering Approximations
μ = E(ΦX): approximate X by X̃ satisfying Φ X̃ = μ.
SLIDE 23
Ergodic Microcanonical Model
SLIDE 24
Singular Ergodic Processes
SLIDE 25
Scattering Ising
SLIDE 26
Stochastic Geometry: Cox Process
SLIDE 27
Non-Ergodic Mixture
SLIDE 28
Non-Ergodic Microcanonical Mixture
SLIDE 29
Scattering Multifractal Processes
- Scattering coefficients of order 0, 1 and 2:
SLIDE 30
Scattering Ising at Critical Temperature
SLIDE 31
Failures of Audio Synthesis
Original Time Scattering
- J. Andén and V. Lostanlen
SLIDE 32
Time-Frequency Translation Group
|x ⋆ ψ_λ(t)| ⋆ φ_{2^J}  →  ||x ⋆ ψ_λ| ⋆ ψ_α(t) ⋆ ψ_β(log λ)| ⋆ φ_{2^J}
Time-frequency wavelet convolutions, along time t and log-frequency log λ.
- J. Andén and V. Lostanlen
SLIDE 33
Joint Time-Frequency Scattering
Original Time Scattering Time/Freq Scattering
- J. Andén and V. Lostanlen
SLIDE 34
Part III- Supervised Learning

x(u) → x_1(u, k_1) → x_2(u, k_2) → ... → x_J(u, k_J) → classification

- L_j is a linear combination of convolutions and subsampling:
  x_j = ρ L_j x_{j−1},  with  x_j(u, k_j) = ρ( Σ_k x_{j−1}(·, k) ⋆ h_{k_j,k}(u) )
  (sum across channels)
What is the role of channel connections ?
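A minimal sketch of one such layer in 1D, using FFT circular convolutions and ρ = |·|; the shapes and random filters are illustrative:

```python
import numpy as np

def conv_layer(x, h, rho=np.abs):
    """One layer x_j = rho(L_j x_{j-1}): 1D circular convolutions summed
    across input channels.
    x: (K_in, N) input channels; h: (K_out, K_in, N) filters."""
    xf = np.fft.fft(x, axis=-1)                      # (K_in, N)
    hf = np.fft.fft(h, axis=-1)                      # (K_out, K_in, N)
    y = np.fft.ifft((hf * xf[None]).sum(axis=1))     # sum across channels k
    return rho(y.real)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 64))        # 3 input channels
h = rng.standard_normal((5, 3, 64))      # 5 output channels
x1 = conv_layer(x0, h)                   # shape (5, 64)
```

Dropping the sum over k (one filter per channel) gives the channel-disconnected "convolutional tree" of the earlier slides; the cross-channel sum is exactly what the question asks about.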
SLIDE 35
Environmental Sound Classification
UrbanSound8k: 10 classes, 8k training examples; class-wise average error
(air conditioner, car horns, children playing, dog barks, drilling, engine at idle, ...)

MFCC audio descriptors: 0.39
time scattering: 0.27
ConvNet (Piczak, MLSP 2015): 0.26
time-frequency scattering: 0.20
- J. Andén and V. Lostanlen
x → S_J x → supervised linear classifier → y = f(x); no learning in the representation.
SLIDE 36
Inverse Scattering Transform

S_J x = { x ⋆ φ_{2^J}, |x ⋆ ψ_{λ1}| ⋆ φ_{2^J}, ..., |||x ⋆ ψ_{λ1}| ⋆ ...| ⋆ ψ_{λm}| ⋆ φ_{2^J} }_{λ1,...,λm}

- Given S_J x, we want to compute x̃ such that S_J x̃ = S_J x. We shall use m = 2.
- If x(u) is a Dirac, a straight edge or a sinusoid, then x̃ is equal to x up to a translation. (Joan Bruna)
SLIDE 37
Sparse Shape Reconstruction (Joan Bruna)
With a gradient descent algorithm. Original images of N² pixels:
- m = 1, 2^J = N: reconstruction from O(log₂ N) scattering coefficients
- m = 2, 2^J = N: reconstruction from O(log₂² N) scattering coefficients
SLIDE 38
Multiscale Scattering Reconstructions
[Original images of N² pixels; scattering reconstructions at 2^J = 16, 32, 64, 128 = N, with 1.4 N² and 0.5 N² coefficients]
SLIDE 39
III- Inverse Problems
y → F → x
- Best linear method: least squares estimate (linear interpolation): ŷ = ( Σ̂_x† Σ̂_{xy} ) x
SLIDE 40
Super-Resolution
- Best linear method: least squares estimate (linear interpolation): ŷ = ( Σ̂_x† Σ̂_{xy} ) x
- State-of-the-art methods:
  – Dictionary-learning super-resolution
  – CNN-based: just train a CNN to regress from low-res to high-res:
    Θ* = argmin_Θ Σ_i ‖F(x_i, Θ) − y_i‖²,  ŷ = F(x, Θ*)
  – They cleverly optimize a fundamentally unstable metric criterion.
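A toy numpy sketch of the least-squares baseline; the smooth-signal model, the sizes, and the 2x-averaging operator F are illustrative assumptions:

```python
import numpy as np

# Toy "super-resolution": y in R^8 are smooth high-res signals, x = F y their
# 2x low-res version (pairwise averages). The best linear estimate is the
# least-squares regression y_hat = Sigma_xy^T Sigma_x^+ x, with covariances
# estimated from training pairs.
rng = np.random.default_rng(1)
n, dy = 5000, 8
idx = np.arange(dy)
C = np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2) + 1e-6 * np.eye(dy)
Y = rng.standard_normal((n, dy)) @ np.linalg.cholesky(C).T   # smooth, zero-mean y
X = 0.5 * (Y[:, ::2] + Y[:, 1::2])                            # low-res x = F y

Sigma_x = X.T @ X / n                    # empirical Sigma_x
Sigma_xy = X.T @ Y / n                   # empirical Sigma_xy
A = np.linalg.pinv(Sigma_x) @ Sigma_xy   # y_hat = A^T x for a single x
Y_hat = X @ A                             # estimates for all samples at once
mse = np.mean((Y - Y_hat) ** 2)
```

The estimator interpolates optimally in mean-square, but (as the slide notes) such metric criteria are unstable: fine details uncorrelated with x cannot be recovered linearly.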
SLIDE 41
Scattering Super-Resolution
y → F → x

S_{L,J} x = { x ⋆ φ_{2^J}(u), |x ⋆ ψ_{j1,k1}| ⋆ φ_{2^J}(u), ||x ⋆ ψ_{j1,k1}| ⋆ ψ_{j2,k2}| ⋆ φ_{2^J}(u) }_{L ≤ j1, j2 ≤ J}

- Linear estimation in the scattering domain: S_{L,J} x → S_{L−α,J} x
- No phase estimation: potentially worse PSNR
- Good image quality because of deformation stability
SLIDE 42
Super-Resolution Results
Original Linear Estimate Scattering state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 43
Super-Resolution Results
Original Best Linear Estimate Scattering Estimate state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 44
Super-Resolution Results
Original Best Linear Estimate Scattering Estimate state-of-the-art
- J. Bruna, P. Sprechmann
SLIDE 45
Super-Resolution Results
[Panel A: Original, Low-Resolution, Scattering, TV Regularization, l1 Regularization]
- I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 46
Tomography Results
[Panels B, C: Original, Low-Resolution, Scattering, TV Regularization]
- I. Dokmanic, J. Bruna, M. De Hoop
SLIDE 47
Conclusions
- Deep convolutional networks have spectacular high-dimensional and generic approximation capabilities.
- New stochastic models of images for inverse problems.
- Understanding deep networks remains an outstanding mathematical problem.