SLIDE 1 Scattering Bricks to Build Invariants
Joan Bruna, Joakim Andén, Stéphane Mallat, Laurent Sifre, Irène Waldspurger
SLIDE 2 High Dimensional Classification
- Considerable variability in each class: not low-dimensional
- Euclidean distances are meaningless on raw data
- Need to find informative invariants.
[figure: Caltech 101 sample images: Anchor, Joshua Tree, Beaver, Lotus, Water Lily]
SLIDE 8 Curse of Dimensionality
- Analysis in high dimension: x ∈ R^d with d ≥ 10^6.
- Points are far away in high dimension d:
  - 10 points cover [0, 1] at a distance 10^−1
  - 100 points for [0, 1]^2 [figure: 10 × 10 grid of points covering the unit square]
  - need 10^d points over [0, 1]^d: impossible if d ≥ 20
- Points are concentrated in the 2^d corners:
  lim_{d→∞} (volume of the sphere of radius r) / (volume of [0, r]^d) = 0
⇒ Euclidean metrics are not appropriate on raw data.
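This vanishing volume ratio is easy to check numerically. The sketch below is my own illustration (not from the slides), using the closed-form volume of the d-dimensional ball:

```python
from math import gamma, pi

def ball_to_cube_ratio(d):
    """Volume of the d-ball of radius r over the volume of [0, r]^d.

    vol(ball) = pi^(d/2) r^d / Gamma(d/2 + 1), so the r^d factors cancel
    and the ratio decays super-exponentially as d grows.
    """
    return pi ** (d / 2) / gamma(d / 2 + 1)

for d in (2, 10, 20, 50):
    print(d, ball_to_cube_ratio(d))
```

The ratio already drops below 3% at d = 20, which is the slide's point: uniform coverings of the cube are hopeless in high dimension.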
SLIDE 9 Supervised Linear Classification
- Representation: Φx = {Φ_n x}_n
- Kernel learning with a Euclidean metric: classify from ⟨Φx, w⟩ = Σ_n w_n Φ_n x ≥ T.
- For any two classes C_k and C_l, find w so that ⟨Φx, w⟩ is nearly invariant within each class and different between C_k and C_l.
SLIDE 12 Deep Neural Networks (Hinton, LeCun)
- Hierarchical invariance: x → linear W1 → non-linearity ρ → W2 → ρ → ... → Φ(x) → linear classifier → y, with wavelets and pooling at each stage.
Open questions:
- Role of reconstruction?
- Why wavelets?
- Role of sparsity?
- How to do unsupervised learning?
- What non-linearities?
- Why cascading?
SLIDE 16 Translations and Deformations
- Invariance to translations: a two-dimensional group, R^2.
- Deformations are actions of diffeomorphisms: an infinite group. Each digit is invariant to a specific set of small deformations.
- Patterns are translated and deformed.
- Textures are stationary (translation-invariant) processes, with deformations.
SLIDE 29 Translation Invariance
- Translation orbits are two-dimensional; deformation orbits are high-dimensional.
- Φ is invariant to translations and "linearizes" deformations.
- Supervised learning: the projection P_{V_k^⊥} is nearly invariant to deformations and discriminant.
- Specific deformation invariance must be learned.
SLIDE 33 Stable Translation Invariants
- Invariance to translations x_c(t) = x(t − c): ∀c ∈ R, Φ(x_c) = Φ(x).
- Examples: registration, or the Fourier modulus Φ(x) = |x̂(ω)|, for which Φ(x_c) = |x̂_c(ω)| = |x̂(ω)|.
SLIDE 37 Stable Translation Invariants
- Invariance to translations x_c(t) = x(t − c): ∀c ∈ R, Φ(x_c) = Φ(x).
- Lipschitz stability to deformations x_τ(t) = x(t − τ(t)): small deformations of x ⇒ small modifications of Φ(x), measured by the deformation size sup_t |τ′(t)|:
  ∀τ, ‖Φ(x_τ) − Φ(x)‖ ≤ C sup_t |τ′(t)| ‖x‖.
- Registration invariants are not stable, and Fourier modulus invariants are not stable either.
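The Fourier-modulus instability can be seen numerically: a translation leaves |x̂| exactly unchanged, while a tiny dilation τ(t) = εt moves a high-frequency carrier by ε·ξ, which is large relative to the spectral width. A quick illustration of my own (all parameters hypothetical):

```python
import numpy as np

N = 4096
t = np.arange(N)
xi = 0.5                                    # high carrier frequency (rad/sample)
window = np.exp(-0.5 * ((t - N / 2) / (N / 8)) ** 2)
x = window * np.cos(xi * t)

xc = np.roll(x, 50)                         # translation by 50 samples
eps = 0.02
xtau = np.interp((1 - eps) * t, t, x)       # dilation x_tau(t) = x((1-eps)t), tau(t) = eps*t

def fmod(v):
    return np.abs(np.fft.fft(v))            # Fourier modulus invariant

nrm = np.linalg.norm(fmod(x))
d_trans = np.linalg.norm(fmod(x) - fmod(xc)) / nrm    # ~0: exactly invariant
d_dilat = np.linalg.norm(fmod(x) - fmod(xtau)) / nrm  # order 1, despite sup|tau'| = 0.02
print(d_trans, d_dilat)
```

The deformation here is Lipschitz-small (sup_t |τ′(t)| = 0.02), yet the Fourier modulus moves by a distance comparable to the signal norm: no constant C can satisfy the stability bound.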
SLIDE 40 Wavelet Transform
- Complex wavelet: ψ(t) = ψ^a(t) + i ψ^b(t)
- Dilated: ψ_λ(t) = 2^−j ψ(2^−j t) with λ = 2^−j.
- x ⋆ ψ_λ(t) = ∫ x(u) ψ_λ(t − u) du
- Wx = ( x ⋆ φ(t), x ⋆ ψ_λ(t) )_{t,λ}
- Unitary: ‖Wx‖² = ‖x‖².
[figure: x̂(ω) with the squared filters |ψ̂_λ(ω)|² covering the frequency axis and the low-pass |φ̂(ω)|²]
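A filter bank with exact energy conservation is easy to build in the Fourier domain. The filters below are real Littlewood–Paley-style bands of my own construction (not the complex wavelets of the slides), chosen so that |φ̂|² + Σ_j |ψ̂_j|² = 1 pointwise, which makes ‖Wx‖² = ‖x‖² hold exactly:

```python
import numpy as np

def filter_bank(N, J):
    """Squared filters summing exactly to 1: |phi_hat|^2 + sum_j |psi_hat_j|^2 = 1."""
    omega = 2 * np.pi * np.fft.fftfreq(N)
    a = [np.exp(-(omega ** 2) * 4.0 ** j) for j in range(J + 1)]   # dyadic low-passes
    psi2 = [1.0 - a[0]] + [a[j] - a[j + 1] for j in range(J)]      # band-passes >= 0
    return [np.sqrt(p) for p in psi2], np.sqrt(a[J])               # psi_hat's, phi_hat

def wavelet_transform(x, psis, phi):
    xh = np.fft.fft(x)
    bands = [np.fft.ifft(xh * p).real for p in psis]   # x * psi_lambda (real filters)
    return bands, np.fft.ifft(xh * phi).real           # x * phi

rng = np.random.default_rng(1)
x = rng.normal(size=1024)
psis, phi = filter_bank(1024, J=5)
bands, low = wavelet_transform(x, psis, phi)
energy = sum((b ** 2).sum() for b in bands) + (low ** 2).sum()
print(energy, (x ** 2).sum())   # equal: ||Wx||^2 = ||x||^2
```

The telescoping construction a_j − a_{j+1} guarantees non-negative band-pass profiles, so the unitarity check is exact up to floating-point error.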
SLIDE 45 Image Wavelet Transform
- Complex wavelet: ψ(t) = ψ^a(t) + i ψ^b(t), t = (t1, t2)
- Rotated and dilated: ψ_λ(t) = 2^−j ψ(2^−j r t) with λ = (2^j, r)
- Wx = ( x ⋆ φ(t), x ⋆ ψ_λ(t) )_{t,λ}
- Unitary: ‖Wx‖² = ‖x‖².
[figure: real and imaginary parts of the rotated and dilated wavelets; squared filters |ψ̂_λ(ω)|² tiling the (ω1, ω2) frequency plane]
SLIDE 49 Wavelet Translation Invariance
- x ⋆ ψ_λ1(t) = x ⋆ ψ^a_λ1(t) + i x ⋆ ψ^b_λ1(t)
- The modulus |x ⋆ ψ_λ1(t)| = ( |x ⋆ ψ^a_λ1(t)|² + |x ⋆ ψ^b_λ1(t)|² )^{1/2} is a regular envelope (pooling).
- The average |x ⋆ ψ_λ1| ⋆ φ(t) is invariant to small translations relative to the support of φ.
- In the limit of a constant window: lim_{φ→1} |x ⋆ ψ_λ1| ⋆ φ(t) = ∫ |x ⋆ ψ_λ1(u)| du = ‖x ⋆ ψ_λ1‖_1.
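A first-order coefficient |x ⋆ ψ_λ1| ⋆ φ can be computed with two FFT convolutions. The sketch below (my own filters: a hypothetical analytic Gabor band-pass and a Gaussian low-pass) checks both exact translation covariance and near-invariance to a shift small relative to the support of φ:

```python
import numpy as np

N = 1024
omega = 2 * np.pi * np.fft.fftfreq(N)

# Analytic band-pass psi_hat (bump on omega > 0, so |x * psi| is a smooth
# envelope) and a low-pass phi_hat averaging over roughly 1/0.02 = 50 samples.
psi_hat = np.exp(-0.5 * ((omega - 0.8) / 0.15) ** 2) * (omega > 0)
phi_hat = np.exp(-0.5 * (omega / 0.02) ** 2)

def first_order(x):
    env = np.abs(np.fft.ifft(np.fft.fft(x) * psi_hat))   # |x * psi_lambda1|
    return np.fft.ifft(np.fft.fft(env) * phi_hat).real   # |x * psi_lambda1| * phi

rng = np.random.default_rng(2)
x = rng.normal(size=N)
shift = 5                                # small compared to the support of phi
s, sc = first_order(x), first_order(np.roll(x, shift))

cov = np.linalg.norm(np.roll(s, shift) - sc) / np.linalg.norm(s)  # exact covariance
inv = np.linalg.norm(s - sc) / np.linalg.norm(s)                  # near-invariance
print(cov, inv)
```

Covariance holds to machine precision because convolution and pointwise modulus both commute with shifts; invariance is only approximate, improving as the support of φ grows relative to the shift.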
SLIDE 52 Recovering Lost Information
- The high frequencies of |x ⋆ ψ_λ1| lost by the averaging |x ⋆ ψ_λ1| ⋆ φ are in its wavelet coefficients:
  W|x ⋆ ψ_λ1| = ( |x ⋆ ψ_λ1| ⋆ φ(t), |x ⋆ ψ_λ1| ⋆ ψ_λ2(t) )_{t,λ2}
- Translation invariance by time-averaging the amplitude:
  ∀λ1, λ2: ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(t)
SLIDE 56 Deep Convolution Network
x → |W1| → |x ⋆ ψ_λ1| → |W2| → ||x ⋆ ψ_λ1| ⋆ ψ_λ2| → |W3| → |||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ ψ_λ3| → ...
with averaged outputs x ⋆ φ, |x ⋆ ψ_λ1| ⋆ φ, ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ at each layer.
SLIDE 57 Scattering Vector
Network output:
Sx = ( x ⋆ φ(u), |x ⋆ ψ_λ1| ⋆ φ(u), ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(u), |||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ ψ_λ3| ⋆ φ(u), ... )_{u, λ1, λ2, λ3, ...}
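The whole cascade is a short recursion: propagate envelopes |u ⋆ ψ_λ| to the next layer and average each with φ on the way out. A compact sketch with hypothetical filters of my own choosing (a real implementation would keep only frequency-decreasing paths λ2 < λ1 and subsample; this keeps all paths for simplicity):

```python
import numpy as np

def gabor_bank(N, centers, bw):
    """Analytic band-pass filters psi_hat: Gaussian bumps on omega > 0."""
    omega = 2 * np.pi * np.fft.fftfreq(N)
    return [np.exp(-0.5 * ((omega - c) / bw) ** 2) * (omega > 0) for c in centers]

def conv(u, h_hat):
    return np.fft.ifft(np.fft.fft(u) * h_hat)

def scattering(x, psis, phi_hat, order=2):
    """Sx = (x*phi, |x*psi_l1|*phi, ||x*psi_l1|*psi_l2|*phi, ...)."""
    coeffs = [conv(x, phi_hat).real]                 # order 0: x * phi
    layer = [x]
    for _ in range(order):
        nxt = []
        for u in layer:
            for p in psis:
                env = np.abs(conv(u, p))             # modulus non-linearity
                coeffs.append(conv(env, phi_hat).real)  # averaged output
                nxt.append(env)                      # propagated deeper
        layer = nxt
    return np.array(coeffs)

N = 512
omega = 2 * np.pi * np.fft.fftfreq(N)
phi_hat = np.exp(-0.5 * (omega / 0.03) ** 2)
psis = gabor_bank(N, centers=[0.4, 0.9, 1.9], bw=0.2)

x = np.random.default_rng(3).normal(size=N)
Sx = scattering(x, psis, phi_hat)
print(Sx.shape)   # (13, 512): 1 zeroth-, 3 first- and 9 second-order paths
```

Each output row is one scattering path (λ1, λ2, ...) sampled in u, matching the indexing of Sx on the slide.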
SLIDE 61 Amplitude Modulation
[figure: first-order windowed scattering |x ⋆ ψ_λ1(t)| and |x ⋆ ψ_λ1| ⋆ φ(t) over (t, log λ1) at small and large scales, and second-order windowed scattering ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(t) over (t, log λ2) for λ1 corresponding to the 1977 Hz carrier; band #75, 18 Hz amplitude modulation]
SLIDE 65 Cascade of Contractions
- Cascade of contractive operators: ‖|W_k|x − |W_k|x′‖ ≤ ‖x − x′‖ with ‖|W_k|x‖ = ‖x‖.
x → |W1| → |W2| → |W3| → ..., with outputs x ⋆ φ, |x ⋆ ψ_λ1| ⋆ φ, ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ, ...
SLIDE 68 Scattering Properties
Sx = ( x ⋆ φ(u), |x ⋆ ψ_λ1| ⋆ φ(u), ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(u), |||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ ψ_λ3| ⋆ φ(u), ... )_{u, λ1, λ2, λ3, ...}
Theorem: For appropriate wavelets, a scattering is
- contractive: ‖Sx − Sy‖ ≤ ‖x − y‖
- norm-preserving: ‖Sx‖ = ‖x‖
- stable to deformations x_τ(t) = x(t − τ(t)): ‖Sx − Sx_τ‖ ≤ C sup_t |∇τ(t)| ‖x‖
⇒ linear discriminative classification from Φx = Sx
SLIDE 73 Linearized Classification (Joan Bruna)
- Each class X_k is represented by a scattering centroid E(SX_k) and an affine space model A_k = E(SX_k) + V_k, with V_k computed with PCA.
- A point x is classified from the distance of Sx to the models A_k.
MNIST database. [figure: classes X1, X2 mapped by S onto affine models A1 = E(SX1) + V1 and A2 = E(SX2) + V2]
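A sketch of this classification rule on synthetic vectors standing in for scattering coefficients (the data, dimensions, and PCA rank are hypothetical; this is the geometry of the method, not the MNIST pipeline itself):

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_affine_model(S, dim):
    """A_k = E(S X_k) + V_k, with V_k spanned by the top principal directions."""
    mu = S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S - mu, full_matrices=False)
    return mu, Vt[:dim]                # centroid and orthonormal basis of V_k

def dist_to_model(s, model):
    mu, V = model
    r = s - mu
    r_perp = r - V.T @ (V @ r)         # P_{V_k^perp}(s - mu)
    return np.linalg.norm(r_perp)

# Hypothetical "scattering vectors": each class lies near its own
# 5-dimensional affine subspace of R^20, with well-separated centroids.
d = 20
S1 = rng.normal(0, 1, (100, 5)) @ rng.normal(0, 1, (5, d)) + 3.0
S2 = rng.normal(0, 1, (100, 5)) @ rng.normal(0, 1, (5, d)) - 3.0
models = [fit_affine_model(S, dim=5) for S in (S1, S2)]

# Classify by nearest affine model (train = test here, just to show the geometry).
test = np.vstack([S1[:10], S2[:10]])
labels = np.array([0] * 10 + [1] * 10)
pred = np.array([np.argmin([dist_to_model(s, m) for m in models]) for s in test])
print("accuracy:", (pred == labels).mean())
```

The point of the affine model is that distances are measured after projecting out V_k, the within-class deformation directions that S has linearized.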
SLIDE 74 Scattering Moments
- Expected scattering of a stationary process X(t):
  S̄X = ( E(X), E(|X ⋆ ψ_λ1|), E(||X ⋆ ψ_λ1| ⋆ ψ_λ2|), E(|||X ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ ψ_λ3|), ... )_{λ1, λ2, λ3, ...}
- The windowed scattering transform
  SX(t) = ( X ⋆ φ(t), |X ⋆ ψ_λ1| ⋆ φ(t), ||X ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(t), ... )_{λ1, λ2, λ3, ...}
  is an estimator of the expected scattering of X(t).
SLIDE 77 Textures with Same Spectrum
x(t): stationary process.
[figure: two textures x(t) with the same Fourier power spectrum over (ω1, ω2); their first-order wavelet scattering |x ⋆ ψ_λ1| ⋆ φ and second-order scattering ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ (window size = image size) differ]
SLIDE 79 Sounds with Same Spectrum
X: stationary process.
[figure: two sounds with the same Fourier spectrum over a 2 s window (band #51); first-order windowed scattering |x ⋆ ψ_λ1|(t) and |x ⋆ ψ_λ1| ⋆ φ(t) over (t, log λ1) at small and large scales, and second-order windowed scattering ||x ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ φ(t) over (t, log λ2) for λ1 corresponding to 2000 Hz]
SLIDE 80 Representation of Random Processes
- S̄X = ( E(X) = E(U_0X), E(|X ⋆ ψ_λ1|) = E(U_1X), E(||X ⋆ ψ_λ1| ⋆ ψ_λ2|) = E(U_2X), E(|||X ⋆ ψ_λ1| ⋆ ψ_λ2| ⋆ ψ_λ3|) = E(U_3X), ... )_{λ1, λ2, λ3, ...}
- An expected scattering is a non-complete representation.
Theorem (Boltzmann): The distribution p(x) which satisfies ∫_{R^N} U_m x p(x) dx = E(U_mX) and maximizes the entropy −∫ p(x) log p(x) dx can be written
  p(x) = (1/Z) exp( Σ_{m=1}^∞ λ_m · U_m x ).
SLIDE 97 Synthesis from Second Order (Joan Bruna, Joakim Andén)
- Maximum entropy estimation of X(t):
  - Gaussian model from N power spectrum coefficients.
  - Scattering model from (log2 N)²/2 first- and second-order coefficients.
[audio examples, Original vs. Gaussian Model vs. Scattering Moments: JackHammer, Water, Applause, Paper, Cocktail Party]
- Not good for everything: learn from mistakes.
SLIDE 98 Classification of Textures
CUReT database, 61 classes. Supervised linear classifier (PCA/SVM): x → Sx → y.

Training per class | Fourier Spectr. | Histogr. Features | Scattering
46                 | 1%              | 1%                | 0.2%
SLIDE 104 Wavelet Transform on a Group (Laurent Sifre)
- Roto-translation group G = {g = (r, t) ∈ SO(2) × R²}, acting on images by (r, t).x(u) = x(r^−1(u − t)).
- Group convolution: X ⊛ φ(g) = ∫_G X(g′) φ(g′^−1 g) dg′
- First layer |W1|, over translations: x → ( x ⋆ φ(t), |x ⋆ ψ_{2^j r}(t)| = X_j(r, t) ).
- Second layer |W2|, over the scalo-roto-translation group with X(2^j, r, t), plus renormalization:
  W2X = ( X ⊛ φ(g), X ⊛ ψ_λ2(g) )_{λ2, g}, giving X ⊛ φ(2^j, r, t) and |X ⊛ ψ_λ2(2^j, r, t)|.
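The group convolution X ⊛ φ(g) = ∫_G X(g′) φ(g′⁻¹g) dg′ can be made concrete on a small discrete stand-in for the roto-translation group: 90° rotations with cyclic translations (a toy choice of mine; the slides use SO(2) × R²). Its defining property, left-equivariance, then holds exactly:

```python
import numpy as np

R = 4  # rotations by multiples of 90 degrees, so grid rotations are exact

def rot_vec(k, v):
    """Rotate an integer grid vector v by k * 90 degrees."""
    x, y = v
    for _ in range(k % R):
        x, y = -y, x
    return x, y

def group_conv(X, h, N):
    """(X conv h)(g) = sum_{g'} X(g') h(g'^-1 g) on the toy group C4 x Z_N^2.

    With the product (r1,t1)(r2,t2) = (r1+r2, t1 + rot_{r1} t2), one has
    g'^-1 g = (r - r', rot_{-r'}(t - t')).  Translations are cyclic (mod N).
    """
    out = np.zeros((R, N, N))
    for r in range(R):
        for t1 in range(N):
            for t2 in range(N):
                acc = 0.0
                for rp in range(R):
                    for s1 in range(N):
                        for s2 in range(N):
                            u1, u2 = rot_vec(-rp, (t1 - s1, t2 - s2))
                            acc += X[rp, s1, s2] * h[(r - rp) % R, u1 % N, u2 % N]
                out[r, t1, t2] = acc
    return out

def left_translate(X, rh, th, N):
    """(L_h X)(g) = X(h^-1 g) for h = (rh, th)."""
    out = np.zeros_like(X)
    for r in range(R):
        for t1 in range(N):
            for t2 in range(N):
                u1, u2 = rot_vec(-rh, (t1 - th[0], t2 - th[1]))
                out[r, t1, t2] = X[(r - rh) % R, u1 % N, u2 % N]
    return out

# Group convolution is equivariant: (L_h X) conv psi = L_h (X conv psi).
rng = np.random.default_rng(6)
N = 4
X, psi = rng.normal(size=(R, N, N)), rng.normal(size=(R, N, N))
h = (1, (2, 1))
lhs = group_conv(left_translate(X, *h, N), psi, N)
rhs = left_translate(group_conv(X, psi, N), *h, N)
print(np.abs(lhs - rhs).max())  # ~0: exact equivariance
```

Averaging the equivariant output over the group (the role of φ here) then produces roto-translation-invariant coefficients, which is what the cascade exploits.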
SLIDE 105 Rotation and Scaling Invariance (Laurent Sifre)
UIUC database, 25 classes. Scattering classification errors:

Training per class | Translation | Transl. + Rotation | + Scaling
20                 | 20%         | 2%                 | 0.6%
SLIDE 108 Unsupervised Learning
- Unsupervised learning of a representation Φ : x ∈ R^d → Φx ∈ R^D from unlabeled examples {x_i}:
  - model the {x_i}_i as realizations of a random vector X ∈ R^d with density p(x)
  - estimate a representation ΦX of p(x)
- Classical approaches: clustering and Gaussian mixture models, i.e. decomposition in ellipsoids: not feasible in high dimensions.
SLIDE 114 Generalized Scattering
- X_1 = |W_1(X_0 − E(X_0))| with W_1^* W_1 = Id
- ∀m: X_m = |W_m(X_{m−1} − E(X_{m−1}))| with W_m^* W_m = Id
- Expected scattering transform: SX = {E(X_m)}_{m∈N} represents the probability density of X.
Theorem: ‖SX‖² = E(‖X‖²) and ‖SX − SY‖² ≤ E(‖X − Y‖²).
[diagram: cascade X_0 → subtract E(X_0) → |W_1| → X_1 → subtract E(X_1) → |W_2| → X_2 → subtract E(X_2) → |W_3| → X_3]
SLIDE 121 Unsupervised Learning by Scattering
- Deep scattering performs adaptive space contractions |W_1|, |W_2|, |W_3|, ..., with normalization.
- Squeeze the space while minimizing the data volume reduction.
Proposition: The data volume reduction at layer m is
  E(‖X_{m−1} − E(X_{m−1})‖²) − E(‖X_m − E(X_m)‖²) = ‖E(X_m)‖²
⇒ for all m, minimize ‖E(X_m)‖.
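The Proposition is easy to check numerically: because W_m is orthogonal and the modulus preserves norms pointwise, the identity holds exactly even for empirical means. A sketch with an arbitrary hypothetical data distribution of my own:

```python
import numpy as np

rng = np.random.default_rng(5)

# Realizations of X_{m-1} (rows) and an orthogonal W_m (so W_m^* W_m = Id).
n, d = 2000, 16
Xprev = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + rng.normal(size=d)
Wm, _ = np.linalg.qr(rng.normal(size=(d, d)))

Xm = np.abs((Xprev - Xprev.mean(0)) @ Wm.T)  # X_m = |W_m(X_{m-1} - E X_{m-1})|

var_prev = ((Xprev - Xprev.mean(0)) ** 2).sum(1).mean()  # E||X_{m-1} - E X_{m-1}||^2
var_m = ((Xm - Xm.mean(0)) ** 2).sum(1).mean()           # E||X_m - E X_m||^2
reduction = var_prev - var_m
print(reduction, (Xm.mean(0) ** 2).sum())  # equal: reduction = ||E(X_m)||^2
```

This is why minimizing ‖E(X_m)‖ at each layer minimizes the loss of data volume: the reduction is exactly the squared norm of the extracted invariant.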
SLIDE 127 Sparse Layerwise Learning
- X_m = |W_m(X_{m−1} − E(X_{m−1}))| with W_m^* W_m = Id.
- Given X_{m−1} − E(X_{m−1}), compute W_m by minimizing
  ‖E(X_m)‖ = ‖E( |W_m(X_{m−1} − E(X_{m−1}))| )‖,
  an l1 norm across realizations.
⇒ W_m defines a sparse representation of X_{m−1} − E(X_{m−1}).
[diagram: cascade X_0 → subtract E(X_0) → |W_1| → X_1 → subtract E(X_1) → |W_2| → X_2 → subtract E(X_2) → |W_3| → X_3]
SLIDE 128 Supervised Linear Classifiers
- Binary classification: y(x) = 0 or 1.
[diagram: x → subtract E(X_0) → |W_1| → subtract E(X_1) → ... → |W_m| → subtract E(X_m) → expected scattering S → Φx → linear classifier]
SLIDE 129 Problem: Clutter
- Average pooling is OK without clutter.
- Clutter requires detection: max pooling.
[figure: action class examples with clutter]
SLIDE 130 Conclusion
- Wavelets are good because of their stability to deformations and their sparsity.
- Scattering defines contractive deep neural networks which can be analyzed mathematically.
- Unsupervised learning can be optimized with sparse contractions.
- What about max pooling and clutter?
- Analysis of non-linear PDEs: turbulence and Navier-Stokes.
- Papers and software: www.di.ens.fr/scattering