SLIDE 1

Scattering Bricks to Build Invariants

Joan Bruna, Joakim Andén, Stéphane Mallat,
Laurent Sifre, Irène Waldspurger

École Normale Supérieure
SLIDE 2

High Dimensional Classification: CalTech 101

Example classes: Anchor, Joshua Tree, Beaver, Lotus, Water Lily

  • Considerable variability in each class: not low-dimensional
  • Euclidean distances are meaningless on raw data
  • Need to find informative invariants.

SLIDE 8

Curse of Dimensionality

  • Analysis in high dimension: x ∈ R^d with d ≥ 10^6.
  • Points are far away in high dimension d:
  • 10 points cover [0, 1] at a distance 10^{-1}
  • 100 points are needed for [0, 1]^2
  • 10^d points are needed for [0, 1]^d: impossible if d ≥ 20
  ⇒ Euclidean metrics are not appropriate on raw data.

[Figure: a 10 × 10 grid of points covering the unit square]

Points are concentrated in the 2^d corners:

lim_{d→∞} (volume of the sphere of radius r) / (volume of [0, r]^d) = 0
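A quick numerical check of this limit (a sketch, using the closed-form volume of the d-ball):

```python
from math import gamma, pi

def ball_to_cube_ratio(d):
    """vol(unit ball in R^d) / vol([0,1]^d) = pi^(d/2) / Gamma(d/2 + 1)."""
    return pi ** (d / 2) / gamma(d / 2 + 1)

for d in (1, 2, 5, 20, 50):
    print(d, ball_to_cube_ratio(d))
# The ratio tends to 0 as d grows: the cube's volume migrates into its
# 2^d corners, far from any inscribed ball.
```

At d = 20 the ratio is already below 0.03, and by d = 50 it is below 10^{-10}.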

SLIDE 10

Supervised Linear Classification

  • How to construct Φ ?

Representation: Φx = {Φn x}n

x → Φ → ⟨Φx, w⟩ ≥ T → class ?

Kernel learning replaces the Euclidean metric: ⟨Φx, w⟩ = Σn wn Φn x. For any two classes Ck and Cl, it finds w so that ⟨Φx, w⟩ is nearly invariant within each class and different on Ck and Cl.

SLIDE 12

Deep Neural Networks (Hinton, LeCun)

Hierarchical invariance, with pooling at each stage (the first stage may be wavelets):

x → W1 (linear) → ρ (non-linear) → pooling → W2 → ρ → pooling → ... → Φ(x) → linear classifier → y

  • Why cascading ?
  • What non-linearities ?
  • Why wavelets ?
  • Role of sparsity ?
  • Role of reconstruction ?
  • How to do unsupervised learning ?

SLIDE 16

Translations and Deformations

  • Patterns are translated and deformed
  • Textures are stationary (translation invariant) processes, with deformations

Invariance to translations: two-dimensional group R^2.
Deformations are actions of diffeomorphisms: an infinite group.
Each digit is invariant to a specific set of small deformations.

SLIDE 29

Translation Invariance

Φ maps translation orbits (two-dimensional) and deformation orbits (high-dimensional):

  • Φ is invariant to translations
  • Φ "linearizes" deformations
  • Specific deformation invariance must then be learned.

Supervised learning: the projection PV⊥k is nearly invariant to deformations and discriminant.

SLIDE 33

Stable Translation Invariants

  • Invariance to translations xc(t) = x(t − c):

∀c ∈ R , Φ(xc) = Φ(x) .

  • Registration aligns Φ(x) and Φ(xc).
  • Fourier modulus: Φ(x) = |x̂(ω)| satisfies Φ(xc) = |x̂c(ω)| = |x̂(ω)|.
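As a quick numerical check (a sketch, not from the slides): the Fourier modulus of a circularly shifted signal equals that of the original, since a shift only multiplies the DFT by a unit-modulus phase:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)      # arbitrary signal
xc = np.roll(x, 17)               # circular translation by c = 17 samples

# A shift multiplies the DFT by exp(-2*pi*i*k*c/N), so |x_c^| = |x^|.
phi_x = np.abs(np.fft.fft(x))
phi_xc = np.abs(np.fft.fft(xc))

print(np.allclose(phi_x, phi_xc))  # True
```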

SLIDE 37

Stable Translation Invariants

  • Invariance to translations xc(t) = x(t − c): ∀c ∈ R , Φ(xc) = Φ(x).
  • Lipschitz stability to deformations xτ(t) = x(t − τ(t)):

small deformations of x ⇒ small modifications of Φ(x):

∀τ , ‖Φ(xτ) − Φ(x)‖ ≤ C sup_t |τ′(t)| ‖x‖ ,

where sup_t |τ′(t)| is the deformation size.

  • Registration is not stable, and Fourier invariants are not stable either.
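A small dilation moves Fourier energy far in frequency, so the Fourier modulus violates this Lipschitz bound; a sketch (the signal and deformation are chosen for illustration):

```python
import numpy as np

N = 1024
t = np.arange(N) / N
eps = 0.02                                        # tau(t) = eps*t, sup_t |tau'(t)| = 0.02
x = np.cos(2 * np.pi * 200 * t)                   # high-frequency signal
x_tau = np.cos(2 * np.pi * 200 * (1 - eps) * t)   # x(t - tau(t)): frequency shifted to 196

# Relative change of the Fourier modulus vs the deformation size:
d = np.linalg.norm(np.abs(np.fft.fft(x)) - np.abs(np.fft.fft(x_tau)))
rel = d / np.linalg.norm(np.fft.fft(x))
print(rel, eps)  # rel ~ 1.41: the change dwarfs the deformation size 0.02
```

The spectral spikes at 200 and 196 cycles do not overlap, so the Fourier moduli are nearly orthogonal even though the deformation is tiny.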

SLIDE 40

Wavelet Transform

  • Complex wavelet: ψ(t) = ψ^a(t) + i ψ^b(t)
  • Dilated: ψλ(t) = 2^{-j} ψ(2^{-j} t) with λ = 2^{-j}.
  • Convolution: x ⋆ ψλ(t) = ∫ x(u) ψλ(t − u) du
  • Wavelet transform: Wx = ( x ⋆ φ(t) , x ⋆ ψλ(t) )_{t,λ}

Unitary: ‖Wx‖^2 = ‖x‖^2.

[Figure: the filters |ψ̂λ(ω)|^2 covering the frequency axis ω, with |φ̂(ω)|^2 at low frequencies, and the wavelets ψλ(t)]

SLIDE 45

Image Wavelet Transform

  • Complex wavelet: ψ(t) = ψ^a(t) + i ψ^b(t) , t = (t1, t2)
  • Rotated and dilated: ψλ(t) = 2^{-j} ψ(2^{-j} r t) with λ = (2^j, r)
  • Wavelet transform: Wx = ( x ⋆ φ(t) , x ⋆ ψλ(t) )_{t,λ}

Unitary: ‖Wx‖^2 = ‖x‖^2.

[Figure: real and imaginary parts of the rotated and dilated wavelets, and the filters |ψ̂λ(ω)|^2 tiling the frequency plane (ω1, ω2)]

SLIDE 46

Wavelet Translation Invariance

x ⋆ ψλ1(t) = x ⋆ ψ^a_λ1(t) + i x ⋆ ψ^b_λ1(t)

SLIDE 49

Wavelet Translation Invariance

  • The modulus |x ⋆ ψλ1(t)| = ( |x ⋆ ψ^a_λ1(t)|^2 + |x ⋆ ψ^b_λ1(t)|^2 )^{1/2} is a regular envelope (pooling).
  • The average |x ⋆ ψλ1| ⋆ φ(t) is invariant to translations that are small relative to the support of φ:

lim_{φ→1} |x ⋆ ψλ1| ⋆ φ(t) = ∫ |x ⋆ ψλ1(u)| du = ‖x ⋆ ψλ1‖_1
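A numerical sketch of this averaging (the band-pass filter shape is an illustrative stand-in for ψλ1): with a global average playing the role of φ, the averaged wavelet modulus is exactly invariant to circular shifts:

```python
import numpy as np

N = 512
rng = np.random.default_rng(1)
x = rng.standard_normal(N)

# Analytic band-pass filter (illustrative psi_lambda), defined in the Fourier domain.
omega = np.arange(N)
psi_hat = np.exp(-0.5 * ((omega - 60) / 10.0) ** 2)

def averaged_modulus(sig):
    # |x * psi| followed by a global average (phi = averaging over all t)
    env = np.abs(np.fft.ifft(np.fft.fft(sig) * psi_hat))
    return env.mean()

print(np.isclose(averaged_modulus(x), averaged_modulus(np.roll(x, 37))))  # True
```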

SLIDE 52

Recovering Lost Information

  • The averaging |x ⋆ ψλ1| ⋆ φ(t) loses the high frequencies of |x ⋆ ψλ1|, which are in the wavelet coefficients:

W|x ⋆ ψλ1| = ( |x ⋆ ψλ1| ⋆ φ(t) , |x ⋆ ψλ1| ⋆ ψλ2(t) )_{t,λ2}

  • Translation invariance by time-averaging the amplitude:

∀λ1, λ2 , | |x ⋆ ψλ1| ⋆ ψλ2 | ⋆ φ(t)

SLIDE 56

Deep Convolution Network

x → |W1| → { x ⋆ φ , |x ⋆ ψλ1| } → |W2| → { |x ⋆ ψλ1| ⋆ φ , ||x ⋆ ψλ1| ⋆ ψλ2| } → |W3| → { ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ , |||x ⋆ ψλ1| ⋆ ψλ2| ⋆ ψλ3| }

SLIDE 57

Scattering Vector

Network output:

Sx = ( x ⋆ φ(u) , |x ⋆ ψλ1| ⋆ φ(u) , ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ(u) , |||x ⋆ ψλ1| ⋆ ψλ2| ⋆ ψλ3| ⋆ φ(u) , ... )_{u,λ1,λ2,λ3,...}
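The cascade above can be sketched end to end. A minimal order-2 scattering in numpy (the dyadic Gaussian band-pass filters and the global average standing in for φ are illustrative choices); with φ a global average it is exactly invariant to circular shifts:

```python
import numpy as np

N = 256
omega = np.arange(N)
# Illustrative analytic band-pass filters at dyadic center frequencies.
centers = [8, 16, 32, 64]
psis = [np.exp(-0.5 * ((omega - c) / (c / 2.0)) ** 2) for c in centers]

def scatter(x):
    """Order 0-2 scattering coefficients with phi = global average."""
    x_hat = np.fft.fft(x)
    S = [x.mean()]                                   # order 0: x * phi
    for p1 in psis:
        u1 = np.abs(np.fft.ifft(x_hat * p1))         # |x * psi_l1|
        S.append(u1.mean())                          # order 1
        u1_hat = np.fft.fft(u1)
        for p2 in psis:
            u2 = np.abs(np.fft.ifft(u1_hat * p2))    # ||x * psi_l1| * psi_l2|
            S.append(u2.mean())                      # order 2
    return np.array(S)

rng = np.random.default_rng(2)
x = rng.standard_normal(N)
print(np.allclose(scatter(x), scatter(np.roll(x, 41))))  # True
```

Every step is a circular convolution, a pointwise modulus, or a global mean, and each commutes with circular shifts.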

SLIDE 61

Amplitude Modulation

[Figure: first-order windowed scattering, log(λ1) vs t, at small and large scales, and second-order windowed scattering, log(λ2) vs t, at large scale, for a signal with components at 18 Hz (band #75) and 1977 Hz: |x ⋆ ψλ1(t)|, |x ⋆ ψλ1| ⋆ φ(t), and ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ(t) for λ1 = 1977 Hz]

SLIDE 65

Cascade of Contractions

x → |W1| → |W2| → |W3| → ... producing x ⋆ φ , |x ⋆ ψλ1| ⋆ φ , ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ , ...

  • Cascade of contractive operators:

‖ |Wk|x − |Wk|x′ ‖ ≤ ‖x − x′‖ with ‖ |Wk|x ‖ = ‖x‖ .

SLIDE 68

Scattering Properties

Theorem: For appropriate wavelets, a scattering is

  • contractive: ‖Sx − Sy‖ ≤ ‖x − y‖
  • norm preserving: ‖Sx‖ = ‖x‖
  • stable to deformations xτ(t) = x(t − τ(t)): ‖Sx − Sxτ‖ ≤ C sup_t |∇τ(t)| ‖x‖

⇒ linear discriminative classification from Φx = Sx
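Contractiveness follows because each layer is a norm-preserving linear operator followed by a pointwise modulus; a sketch with a random orthogonal matrix standing in for the wavelet transform:

```python
import numpy as np

rng = np.random.default_rng(3)
W, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # orthogonal: ||Wx|| = ||x||

def layer(x):
    return np.abs(W @ x)   # |W|: linear isometry followed by pointwise modulus

x = rng.standard_normal(64)
y = rng.standard_normal(64)

# ||a| - |b|| <= |a - b| pointwise, and W preserves norms, so the layer contracts:
assert np.linalg.norm(layer(x) - layer(y)) <= np.linalg.norm(x - y) + 1e-12
assert np.isclose(np.linalg.norm(layer(x)), np.linalg.norm(x))
print("contractive and norm-preserving")
```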

SLIDE 73

Linearized Classification (Joan Bruna)

  • Each class Xk is represented by a scattering centroid E(SXk)
  • Affine space model Ak = E(SXk) + Vk, with Vk computed with PCA.
  • A test point x is classified from the distance of Sx to the affine spaces A1, A2, ...

MNIST database: classes X1, X2 with centroids E(SX1), E(SX2).
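A sketch of such an affine-model classifier (on synthetic features; in the slides the features would be Sx, and the helper names are mine): each class is a centroid plus a PCA subspace, and a point is assigned to the class minimizing the residual ‖PV⊥k (z − E(SXk))‖:

```python
import numpy as np

def fit_affine(Z, dim):
    """Centroid + top-`dim` PCA directions of the feature matrix Z (n x d)."""
    mu = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    return mu, Vt[:dim]              # rows of Vt[:dim] span V_k

def residual(z, model):
    mu, V = model
    r = z - mu
    return np.linalg.norm(r - V.T @ (V @ r))   # component orthogonal to V_k

rng = np.random.default_rng(4)
d = 20
# Two synthetic classes: different centroids, low-dimensional scatter.
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, d))
B = 5.0 + rng.standard_normal((100, 3)) @ rng.standard_normal((3, d))
models = [fit_affine(A, 3), fit_affine(B, 3)]

z = B[0]  # a point from the second class
pred = int(np.argmin([residual(z, m) for m in models]))
print(pred)  # 1 (the second model)
```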

SLIDE 74

Scattering Moments

The windowed scattering transform of a stationary process X(t),

SX(t) = ( X ⋆ φ(t) , |X ⋆ ψλ1| ⋆ φ(t) , ||X ⋆ ψλ1| ⋆ ψλ2| ⋆ φ(t) , |||X ⋆ ψλ1| ⋆ ψλ2| ⋆ ψλ3| ⋆ φ(t) , ... )_{λ1,λ2,λ3,...} ,

is an estimator of the expected scattering of X(t):

SX = ( E(X) , E(|X ⋆ ψλ1|) , E(||X ⋆ ψλ1| ⋆ ψλ2|) , E(|||X ⋆ ψλ1| ⋆ ψλ2| ⋆ ψλ3|) , ... )_{λ1,λ2,λ3,...}

SLIDE 77

Textures with Same Spectrum

x(t): stationary process.

[Figure: two textures with the same Fourier power spectrum over (ω1, ω2); their wavelet scattering coefficients |x ⋆ ψλ1| ⋆ φ and ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ, computed with window size = image size, distinguish them]

SLIDE 78

Sounds with Same Spectrum (J. McDermott)

X: stationary process.

[Figure: two sounds with the same Fourier spectrum over a 2 s window; first-order windowed scattering, log(λ1) vs t, at small and large scales (band #51), and second-order windowed scattering, log(λ2) vs t, at large scale: |x ⋆ ψλ1(t)|, |x ⋆ ψλ1| ⋆ φ(t), ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φ(t) for λ1 = 2000 Hz]


SLIDE 80

Representation of Random Processes

  • An expected scattering is a non-complete representation:

SX = ( E(X) , E(|X ⋆ ψλ1|) , E(||X ⋆ ψλ1| ⋆ ψλ2|) , ... )_{λ1,λ2,λ3,...} = { E(Um X) }m

Theorem (Boltzmann): The distribution p(x) which satisfies

∫_{R^N} Um x p(x) dx = E(Um X)

and maximizes the entropy −∫ p(x) log p(x) dx can be written

p(x) = (1/Z) exp( −Σm λm · Um x ) .

SLIDE 97

Synthesis from Second Order (Joan Bruna, Joakim Andén)

  • Maximum entropy estimation of X(t):
  • Gaussian model from N power spectrum coefficients.
  • Scattering model from (log2 N)^2 / 2 first- and second-order moments.

[Audio examples, J. McDermott textures: original vs Gaussian model vs scattering moments, for JackHammer, Water, Applause, Paper, Cocktail Party]

Not good for everything: learn from mistakes.

SLIDE 98

Classification of Textures (J. Bruna)

CUReT database, 61 classes. Supervised linear classifier on Sx: PCA/SVM.

Classification errors:

  Training per class | Fourier Spectr. | Histogr. Features | Scattering
  46                 | 1%              | 1%                | 0.2%

SLIDE 104

Wavelet Transform on a Group (Laurent Sifre)

  • Roto-translation group G = {g = (r, t) ∈ SO(2) × R^2}, acting by (r, t).x(u) = x(r^{-1}(u − t))
  • First layer: x → |W1| → x ⋆ φ(t) and |x ⋆ ψ_{2^j r}(t)| = Xj(r, t), a function on the roto-translation group (with renormalization, X(2^j, r, t) on the scalo-roto-translation group).
  • Averaging on G: X ⊛ φ(g) = ∫_G X(g′) φ(g′^{-1} g) dg′
  • Wavelet transform on G: W2 X = ( X ⊛ φ(g) , X ⊛ ψλ2(g) )_{λ2,g}
  • Second layer |W2| outputs Xj ⊛ φ(r, t) and |Xj ⊛ ψλ2(r, t)|.

SLIDE 105

Rotation and Scaling Invariance (Laurent Sifre)

UIUC database, 25 classes. Scattering classification errors:

  Training per class | Translation | Transl. + Rotation | + Scaling
  20                 | 20%         | 2%                 | 0.6%

SLIDE 107

Unsupervised Learning

  • Unsupervised learning of Φ from unlabeled examples {xi}:
  • model the {xi}i as realizations of a random vector X ∈ R^d
  • estimate a representation ΦX of the density p(x)

x ∈ R^d → Φ → Φx ∈ R^D

  • Classical approaches: clustering and Gaussian mixture models decompose p(x) into ellipsoids, which is not feasible in high dimensions.

SLIDE 114

Generalized Scattering

  • Expected scattering transform: SX = {E(Xm)}_{m∈N} represents the probability density of X, where

X1 = |W1(X0 − E(X0))| with W1* W1 = Id ,
∀m , Xm = |Wm(Xm−1 − E(Xm−1))| with Wm* Wm = Id .

Theorem:
‖SX‖^2 = E(‖X‖^2)
‖SX − SY‖^2 ≤ E(‖X − Y‖^2)

SLIDE 121

Unsupervised Learning by Scattering

  • Deep scattering performs adaptive space contractions (with normalization at each |Wm|)
  • Squeeze the space while minimizing the data volume reduction

Proposition: The data volume reduction at layer m is

E(‖Xm−1 − E(Xm−1)‖^2) − E(‖Xm − E(Xm)‖^2) = ‖E(Xm)‖^2

⇒ for all m, minimize ‖E(Xm)‖ .
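The proposition combines the isometry of |Wm| with the bias-variance identity E‖Z‖^2 = E‖Z − E(Z)‖^2 + ‖E(Z)‖^2. A numerical sketch with a random unitary matrix standing in for Wm (empirical means stand in for expectations):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 32, 2000

# Random complex unitary W_m, so ||Wz|| = ||z|| and the modulus keeps the norm.
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
W, _ = np.linalg.qr(A)

X_prev = rng.standard_normal((n, d)) * rng.gamma(2.0, size=(n, d))
Z = X_prev - X_prev.mean(axis=0)          # X_{m-1} - E(X_{m-1}), per realization
X_m = np.abs(Z @ W.T)                     # X_m = |W_m (X_{m-1} - E(X_{m-1}))|

# Volume reduction at layer m equals the squared norm of the mean of X_m:
lhs = (np.linalg.norm(Z, axis=1) ** 2).mean() \
    - (np.linalg.norm(X_m - X_m.mean(axis=0), axis=1) ** 2).mean()
rhs = np.linalg.norm(X_m.mean(axis=0)) ** 2
print(np.isclose(lhs, rhs))  # True
```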

SLIDE 127

Sparse Layerwise Learning

Xm = |Wm(Xm−1 − E(Xm−1))| with Wm* Wm = Id.

  • Given Xm−1 − E(Xm−1), we compute Wm by minimizing

‖E(Xm)‖_1 = Σk E( |Wm(Xm−1 − E(Xm−1))|_k ) ,

an l1 norm across realizations.

⇒ Wm defines a sparse representation of Xm−1 − E(Xm−1).
SLIDE 128

Supervised Linear Classifiers

The cascade E(X0), |W1|, E(X1), ..., |Wm|, E(Xm) outputs Φx = Sx, followed by a linear classifier.

Binary classification: y(x) = 0 or 1.

SLIDE 129

Problem: Clutter

  • Average pooling is OK without clutter
  • Clutter requires detection: max pooling

[Figure: examples from action classes with clutter]

SLIDE 130

Conclusion

  • Wavelets are good for deformations and provide sparsity
  • Scattering defines contractive deep neural networks which can be analyzed mathematically
  • Unsupervised learning can be optimized with sparse contractions
  • What about max pooling and clutter ?
  • Analysis of non-linear PDEs: turbulence and Navier-Stokes
  • Papers and software: www.di.ens.fr/scattering