Harmonic Analysis of Deep Convolutional Networks
Yuan YAO HKUST Based on Mallat and Bolcskei talks etc.
Acknowledgement: a follow-up course at HKUST: https://deeplearning-math.github.io/

High Dimensional Natural Image Classification
Natural images are very high dimensional, e.g. d = 10^6 pixels.

[Figure: sample image categories — Anchor, Joshua Tree, Beaver, Lotus, Water Lily.]

Curse of dimensionality: covering the data space with local neighborhoods is impossible if d ≥ 20 ⇒ Euclidean metrics are not appropriate on raw data. In high dimension, points drawn from a cube are concentrated in its 2^d corners, and the sphere carries a vanishing fraction of the volume:

$$\lim_{d\to\infty} \frac{\mathrm{vol}(\text{sphere of radius } r)}{\mathrm{vol}([0, r]^d)} = 0.$$
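This limit can be checked directly: the r^d factors cancel, leaving the closed form π^{d/2} / Γ(d/2 + 1), which a small illustrative script can tabulate:

```python
import math

# Ratio vol(d-ball of radius r) / vol([0, r]^d) = pi^(d/2) / Gamma(d/2 + 1);
# the r^d factors cancel, and the ratio tends to 0 as d grows.
def ball_to_cube_ratio(d: int) -> float:
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

for d in (2, 5, 10, 20, 50):
    print(d, ball_to_cube_ratio(d))
```

Already at d = 20 the ratio is below 3%, and it decays super-exponentially afterwards.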
One-hidden-layer network: with units acting on linear projections

$$w_n \cdot x = \sum_k w_{k,n}\, x_k,$$

the approximation takes the form

$$\tilde f(x) = \sum_{n=1}^{M} a_n\, \rho(w_n \cdot x - b_n).$$

With the rectifier ρ(u) = max(u, 0), in dimension d = 1 a sum of shifted rectifiers on a grid of spacing ε,

$$\tilde f(x) = \sum_n a_n\, \rho(x - n\epsilon),$$

is piecewise linear and can interpolate f at the grid points nε. In dimension d the same construction applies to one-dimensional projections:

$$\tilde f(x) = \sum_n a_n\, \rho(w_n \cdot x - n\epsilon).$$

[Figure: f(x) and its one-dimensional projection p_u(x) along a direction u.]
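The one-dimensional construction can be illustrated numerically: place one rectifier per grid point and fit the coefficients a_n by least squares (the fitting procedure is an illustrative choice, not from the slides):

```python
import numpy as np

# Approximate f(x) = sin(x) on [0, 2*pi] by a sum of shifted ReLUs
# a_n * rho(x - n*eps), with coefficients fitted by least squares.
def relu(u):
    return np.maximum(u, 0.0)

eps = 0.1
knots = np.arange(0.0, 2 * np.pi, eps)          # grid points n*eps
xs = np.linspace(0.0, 2 * np.pi, 2000)
f = np.sin(xs)

# Design matrix: a constant offset plus one ReLU unit per knot.
A = np.column_stack([np.ones_like(xs)] + [relu(xs - k) for k in knots])
coef, *_ = np.linalg.lstsq(A, f, rcond=None)
approx = A @ coef

print("max error:", np.abs(approx - f).max())
```

The resulting approximant is piecewise linear with breakpoints at the knots, and its error shrinks like ε² for a smooth target.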
A representation z = Φ(x) replaces the raw metric ‖x − x′‖ by ‖Φ(x) − Φ(x′)‖, on which a linear classifier can operate. Two requirements:

- Discriminative: Φ(x) ≠ Φ(x′) if f(x) ≠ f(x′); quantitatively, ‖Φ(x) − Φ(x′)‖ ≥ C₁ |f(x) − f(x′)|.
- Linearizing: |f(x) − f(x′)| ≤ C ‖Φ(x) − Φ(x′)‖, which is equivalent to the existence of an f̃ with f(x) = f̃(Φ(x)) that is Lipschitz: |f̃(z) − f̃(z′)| ≤ C ‖z − z′‖. If Φ also reduces the dimension to d₀, the approximation rate becomes ‖f − f_M‖ ≤ C M^{−1/d₀}.
Deep convolutional networks cascade linear convolutions with a non-linear scalar neuron ρ(u) = max(u, 0) (or ρ(u) = |u|), building hierarchical invariants and linearizing class variability before a final linear classification stage:

$$x(u) \to x_1(u, k_1) \to x_2(u, k_2) \to \cdots \to x_J(u, k_J) \to \text{classification}$$

Each layer applies a linear operator L_j — convolutions summed across channels — followed by ρ:

$$x_j = \rho\, L_j\, x_{j-1}, \qquad x_j(u, k_j) = \rho\Big( \sum_k x_{j-1}(\cdot, k) \star h_{k_j, k}(u) \Big).$$
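The layer equation above can be sketched in numpy for 1-D signals, with random (untrained, purely illustrative) filters h_{k_j,k}:

```python
import numpy as np

# One layer x_j = rho(L_j x_{j-1}): convolve each input channel with a
# filter, sum across channels, apply the rectifier.
rng = np.random.default_rng(0)

def conv_layer(x, h):
    """x: (K_in, N) input channels; h: (K_out, K_in, S) filter bank."""
    k_out, k_in, _ = h.shape
    out = np.stack([
        sum(np.convolve(x[k], h[j, k], mode="same") for k in range(k_in))
        for j in range(k_out)
    ])
    return np.maximum(out, 0.0)        # rho(u) = max(u, 0)

x0 = rng.standard_normal((3, 64))      # 3 channels, 64 samples
h1 = rng.standard_normal((8, 3, 5))    # 8 output channels, support 5
x1 = conv_layer(x0, h1)
print(x1.shape)  # (8, 64)
```

Stacking several such calls reproduces the cascade x → x₁ → … → x_J; only the classifier and the filters would be learned in practice.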
What should the representation Φ(x) be invariant to? Classes are (approximately) preserved under groups of transformations, products g = g_1^{p_1} g_2^{p_2} \cdots g_n^{p_n} acting on x(u). Translations form a small group; deformations (diffeomorphisms)

$$x'(u) = x(u - \tau(u))$$

form a huge group.

[Figure: orbits g_1^p · x and g_1^p · x′ of two signals and their images Φ(g_1^p · x), Φ(g_1^p · x′); morphing video of Philipp Scott Johnson.]

We want Φ to contract such orbits while keeping different classes separated.
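To see why deformations defeat the raw Euclidean metric, a small numpy experiment (illustrative, not from the slides): a smooth displacement field of amplitude 0.002 moves a high-frequency signal nearly out of phase with itself.

```python
import numpy as np

# A tiny smooth deformation x(t - tau(t)) of a high-frequency signal
# produces a large Euclidean distance, even though the warp is visually small.
t = np.linspace(0, 1, 2048, endpoint=False)
x = np.cos(2 * np.pi * 200 * t)            # high-frequency signal
tau = 0.002 * np.sin(2 * np.pi * t)        # small smooth displacement field
x_def = np.interp(t - tau, t, x, period=1.0)

rel = np.linalg.norm(x - x_def) / np.linalg.norm(x)
print(rel)  # order 1 despite the tiny displacement
```

The displacement is 0.4 of the signal's period, so locally the two signals decorrelate; the raw metric sees them as far apart.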
A first attempt: the Fourier modulus. With

$$\hat x(\omega) = \int x(t)\, e^{-i\omega t}\, dt,$$

a translation x_c(t) = x(t − c) gives \hat x_c(\omega) = e^{-ic\omega}\, \hat x(\omega), so the modulus is invariant to translations:

$$\Phi(x) = |\hat x| = |\hat x_c|.$$

But it is unstable to deformations: already for a small dilation τ(t) = εt, the difference | |\hat x_\tau(\omega)| − |\hat x(\omega)| | is big at high frequencies, so ‖|\hat x| − |\hat x_\tau|‖ is not controlled by ‖∇τ‖_∞ ‖x‖.
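The translation invariance of the Fourier modulus is easy to verify numerically with numpy's FFT (circular shifts play the role of translations):

```python
import numpy as np

# A circular translation multiplies each Fourier coefficient by a phase
# e^{-ic omega}, so the Fourier modulus is exactly translation invariant.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x_shift = np.roll(x, 17)                      # x_c(t) = x(t - c), c = 17

same = np.allclose(np.abs(np.fft.fft(x)), np.abs(np.fft.fft(x_shift)))
print(same)  # True
```

The instability only appears under warps τ(t) that vary with t, which mix frequencies instead of merely rotating phases.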
Wavelet transform. A mother wavelet ψ is rotated and dilated:

$$\psi_\lambda(t) = 2^{-j}\, \psi(2^{-j} r\, t), \qquad \lambda = (2^j, r).$$

[Figure: real and imaginary parts of the wavelets; Littlewood–Paley picture of |\hat\psi_\lambda(\omega)|^2 tiling the frequency plane (ω₁, ω₂), with |\hat\phi(\omega)|^2 covering the low frequencies of \hat x(\omega).]

The wavelet transform collects a low-pass average and band-pass channels:

$$Wx = \big( x \star \phi(t),\; x \star \psi_\lambda(t) \big)_{t, \lambda}.$$

It is unitary: ‖Wx‖² = ‖x‖².
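The unitarity ‖Wx‖² = ‖x‖² can be checked numerically for any filter bank normalized to satisfy the Littlewood–Paley condition |\hat\phi(\omega)|² + Σ_λ |\hat\psi_\lambda(\omega)|² = 1 (the Gaussian profiles below are illustrative, not the slides' wavelets):

```python
import numpy as np

# Build band-pass profiles plus a low-pass, then normalize so that the
# squared transfer functions sum to 1 at every frequency (Littlewood-Paley).
N = 256
omega = np.fft.fftfreq(N) * 2 * np.pi
raw = [np.exp(-omega**2 / 0.1)]                               # low-pass phi
raw += [np.exp(-(np.abs(omega) - xi)**2 / 0.5) for xi in (0.5, 1.0, 2.0)]
norm = np.sqrt(sum(r**2 for r in raw))
filters = [r / norm for r in raw]                             # sum of squares = 1

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
xf = np.fft.fft(x)
energy = sum(np.linalg.norm(np.fft.ifft(xf * f))**2 for f in filters)
print(np.isclose(energy, np.linalg.norm(x)**2))  # True
```

Energy preservation follows from Parseval: the channels partition the spectrum's energy exactly.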
In contrast to the Fourier modulus, wavelet coefficients of a deformed signal move by an amount controlled by sup_t |∇τ(t)|:

- Wavelets are uniformly stable to deformations
- Wavelets are sparse representations of functions
- Wavelets separate multiscale information
- Wavelets can be locally translation invariant
First wavelet transform modulus:

$$|x \star \psi_{\lambda_1}(t)| = \Big| \int x(u)\, \psi_{\lambda_1}(t - u)\, du \Big|,$$

an envelope varying at a scale of order 1/λ₁. This envelope |x ⋆ ψ_{λ₁}| is then analyzed with a second wavelet ψ_{λ₂}.
A complex wavelet has real and imaginary parts, x ⋆ ψ_{λ₁}(t) = x ⋆ ψ^a_{λ₁}(t) + i\, x ⋆ ψ^b_{λ₁}(t), and the modulus acts as a pooling across the pair:

$$|x \star \psi_{\lambda_1}(t)| = \sqrt{ |x \star \psi^a_{\lambda_1}(t)|^2 + |x \star \psi^b_{\lambda_1}(t)|^2 }.$$

The averaged envelope |x ⋆ ψ_{λ₁}| ⋆ φ(t) is locally translation invariant relative to the support of φ; as the averaging window spreads out,

$$\lim_{\phi \to 1} |x \star \psi_{\lambda_1}| \star \phi(t) = \int |x \star \psi_{\lambda_1}(u)|\, du = \|x \star \psi_{\lambda_1}\|_1 .$$

The averaging loses the variations of the envelope; they are recovered by a second wavelet transform modulus:

$$W_2\, |x \star \psi_{\lambda_1}| = \Big( |x \star \psi_{\lambda_1}| \star \phi_{2^J}(t),\; \big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}(t)\, \big| \Big)_{t, \lambda_2},$$

yielding second-order coefficients, ∀ λ₁, λ₂: ||x ⋆ ψ_{λ₁}| ⋆ ψ_{λ₂}| ⋆ φ(t).
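A numpy sketch of the first-order coefficient |x ⋆ ψ_{λ₁}| ⋆ φ, using an illustrative complex Gabor-type wavelet (not the exact construction above): for a pure tone at the wavelet's center frequency, the modulus envelope is nearly constant, i.e. locally translation invariant.

```python
import numpy as np

# |x * psi| * phi for a 1-D tone, with a complex Gabor-like wavelet psi
# and a box averaging window phi.
N = 512
t = np.arange(N)
x = np.cos(2 * np.pi * 0.1 * t)                      # test tone

s = np.arange(-32, 33)
xi, sigma = 2 * np.pi * 0.1, 8.0
psi = np.exp(1j * xi * s) * np.exp(-s**2 / (2 * sigma**2))   # complex wavelet

y = np.convolve(x, psi, mode="same")                 # x * psi (complex)
env = np.abs(y)                                      # modulus pools re/im parts
env_avg = np.convolve(env, np.ones(64) / 64, mode="same")    # |x*psi| * phi

mid = env_avg[128:-128]                              # discard boundary effects
print(mid.std() / mid.mean())
```

The near-zero relative ripple shows that the modulus has demodulated the oscillation into a smooth envelope, which the averaging then makes translation invariant over the support of φ.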
[Figure: wavelet modulus images |x ⋆ ψ_{2^j,θ}(u)| of an image x(u), with ρ(α) = |α|, shown at scales 2^0, 2^1, 2^2, …, 2^J and orientations θ.]
The wavelet modulus operator

$$|W|x = \big( x \star \phi(t),\; |x \star \psi_\lambda(t)| \big)_{t,\lambda}$$

is non-linear (ρ(u) = |u|), whereas

$$Wx = \big( x \star \phi(t),\; x \star \psi_\lambda(t) \big)_{t,\lambda}$$

is linear with ‖Wx‖ = ‖x‖. The modulus is contractive because for (a, b) ∈ ℂ², ||a| − |b|| ≤ |a − b|; hence each layer is non-expansive and norm preserving:

$$\big\| |W_k|x - |W_k|x' \big\| \le \|x - x'\|, \qquad \big\| |W_k|x \big\| = \|x\|.$$
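The elementary inequality behind this contraction can be checked numerically:

```python
import numpy as np

# Check | |a| - |b| | <= |a - b| on random complex pairs; this is the
# pointwise inequality that makes the wavelet modulus non-expansive.
rng = np.random.default_rng(1)
a = rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000)
b = rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000)

lhs = np.abs(np.abs(a) - np.abs(b))
rhs = np.abs(a - b)
print(bool((lhs <= rhs + 1e-12).all()))  # True
```

Because W is an isometry and the modulus is pointwise non-expansive, the composition |W| inherits the bound ‖|W|x − |W|x′‖ ≤ ‖x − x′‖.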
Cascading the wavelet modulus operators |W₁|, |W₂|, |W₃|, … on x yields the scattering coefficients

$$x \star \phi,\quad |x \star \psi_{\lambda_1}| \star \phi,\quad \big|\, |x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}\, \big| \star \phi,\ \ldots$$
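Putting the pieces together, a minimal order-2 scattering cascade in numpy (illustrative Gabor filters, not a calibrated implementation of the construction above):

```python
import numpy as np

# Order-0/1/2 scattering sketch: band-pass filter, modulus, average, repeat.
rng = np.random.default_rng(0)

def gabor(xi, sigma, half=32):
    s = np.arange(-half, half + 1)
    return np.exp(1j * xi * s) * np.exp(-s**2 / (2 * sigma**2))

psis = [gabor(xi, sigma=6.0) for xi in (0.4, 0.8, 1.6)]   # wavelet bank
phi = np.ones(32) / 32                                    # averaging window

def scatter(x, psis, phi):
    """Return order-0, order-1 and order-2 scattering coefficient signals."""
    s0 = [np.convolve(x, phi, mode="same")]               # x * phi
    u1 = [np.abs(np.convolve(x, p, mode="same")) for p in psis]
    s1 = [np.convolve(u, phi, mode="same") for u in u1]   # |x*psi1| * phi
    s2 = [np.convolve(np.abs(np.convolve(u, p, mode="same")), phi, mode="same")
          for u in u1 for p in psis]                      # ||x*psi1|*psi2| * phi
    return s0 + s1 + s2

coeffs = scatter(rng.standard_normal(256), psis, phi)
print(len(coeffs))  # 1 + 3 + 9 = 13 coefficient signals
```

Each |Wₖ| stage is non-expansive, so the whole cascade is stable to additive noise and, with true wavelets, to deformations.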