The Assignment Flow
Christoph Schnörr
Ruben Hühnerbein, Stefania Petra, Fabrizio Savarino, Alexander Zeilmann, Artjom Zern, Matthias Zisler
Image & Pattern Analysis Group, Heidelberg University
Mathematics of Imaging: W1
SLIDE 1
SLIDE 2
Machine learning and …
- signal & image proc.
- computer vision
- math. imaging
- inverse problems
- statistics
- …
SLIDE 3
Machine learning and …
- signal & image proc.
- computer vision
- math. imaging
- inverse problems
- statistics
- …
[Figure axes: predictive accuracy, descriptive accuracy]
deep networks: debugging, heat maps, …
SLIDE 4
Machine learning and …
[Figure axes: predictive accuracy, descriptive accuracy]
deep networks: debugging, heat maps, …
“interpretable ML!” “explainable AI!”
SLIDE 5
Machine learning and …
[Figure axes: predictive accuracy, descriptive accuracy]
deep networks, graphical models
Geman & Geman (T-PAMI 1984) … Kappes et al. (IJCV 2015)
SLIDE 6
Machine learning and …
[Figure axes: predictive accuracy, descriptive accuracy]
deep networks, graphical models
SLIDE 7
Machine learning and …
[Figure axes: predictive accuracy, descriptive accuracy]
deep networks, assignment flow, graphical models
assignment flow:
- smoothness
- hierarchical structure
- math. framework
SLIDE 8
SLIDE 9
Outline
- set-up: assignment flow & supervised labeling
- unsupervised labeling
- label evolution
- label learning from scratch
- parameter estimation (control)
- outlook
JMIV’17 SIIMS’18
SLIDE 10
Set-up: assignment flow & supervised labeling
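Features $f_i \in (\mathcal{X}, d)$ (metric space) at the nodes $i \in \mathcal{V}$ of a graph; labels (prototypes) $g_1, \dots, g_m \in \mathcal{X}$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Assignment vector: $W_i = \big(\Pr(g_1 \mid f_i), \dots, \Pr(g_m \mid f_i)\big) \in \mathcal{S}$, $\quad W \in \mathcal{W}$
[Figure: simplex with vertices $g_1, g_2, g_3$ and a point $W_i$; $\mathcal{S}$ is the relative interior of the probability simplex, equipped with the Fisher-Rao metric]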
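The Fisher-Rao geometry on $\mathcal{S}$ can be made tangible with a closed-form distance. The sketch below assumes the normalisation $g_p(u, v) = \sum_j u_j v_j / p_j$ (other conventions differ by a constant factor), under which $d_{FR}(p, q) = 2\arccos\sum_j \sqrt{p_j q_j}$. The function name is hypothetical and the snippet is only illustrative.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two points of the probability simplex.

    Assumes the metric g_p(u, v) = sum_j u_j * v_j / p_j, under which the map
    p -> 2 * sqrt(p) is an isometry onto part of a sphere of radius 2.
    """
    bc = sum(math.sqrt(pj * qj) for pj, qj in zip(p, q))  # Bhattacharyya coefficient
    bc = min(1.0, bc)  # guard against rounding slightly above 1
    return 2.0 * math.acos(bc)

barycenter = [1 / 3, 1 / 3, 1 / 3]
near_vertex = [0.98, 0.01, 0.01]
print(fisher_rao_distance(barycenter, near_vertex))
print(fisher_rao_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # two vertices: pi
```

Distances grow towards the boundary of the simplex, which is one way to see why assignments near a vertex are "expensive" to move.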
SLIDE 11
Set-up: assignment flow & supervised labeling
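Features $f_i \in (\mathcal{X}, d)$ (metric space); labels (prototypes) $g_1, \dots, g_m$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Flow with likelihood vector $L_i(W_i)$:
$\dot W_i = \Pi_{W_i}\big(L_i(W_i)\big) = \Pi_{W_i}\Big(\exp_{W_i}\big(-\tfrac{1}{\rho} D_i\big)\Big)$
[Figure: simplex with vertices $g_1, g_2, g_3$ and a point $W_i$; Fisher-Rao metric on $\mathcal{S}$; annotation: feature]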
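A minimal sketch of the likelihood vector and the resulting velocity, under two assumptions not spelled out on the slide: the lifting map is taken as $\exp_p(u) = \frac{p\, e^{u}}{\langle p, e^{u}\rangle}$ (componentwise product and exponential), and $\Pi_p$ is taken as the replicator map $\Pi_p(u) = p \odot (u - \langle p, u\rangle \mathbb{1})$. Function names and numbers are hypothetical.

```python
import math

def lift(p, u):
    """Assumed lifting map exp_p(u) = p * e^u / <p, e^u>."""
    m = max(u)  # shift for numerical stability; cancels in the normalisation
    z = [pj * math.exp(uj - m) for pj, uj in zip(p, u)]
    s = sum(z)
    return [zj / s for zj in z]

def likelihood(w_i, d_i, rho):
    """L_i(W_i) = exp_{W_i}(-D_i / rho)."""
    return lift(w_i, [-dj / rho for dj in d_i])

def replicator(p, u):
    """Assumed form of Pi_p: p * (u - <p, u> 1); the result sums to zero."""
    mean = sum(pj * uj for pj, uj in zip(p, u))
    return [pj * (uj - mean) for pj, uj in zip(p, u)]

w_i = [1 / 3] * 3          # barycenter of the simplex
d_i = [0.2, 1.0, 3.0]      # hypothetical distances d(f_i, g_j)
L_i = likelihood(w_i, d_i, rho=0.5)
velocity = replicator(w_i, L_i)
print([round(x, 3) for x in L_i])
print([round(x, 3) for x in velocity])
```

At the barycenter the likelihood is simply a softmax of $-D_i/\rho$; the velocity is tangent to the simplex (its entries sum to zero) and points towards the closest label.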
SLIDE 12
Set-up: assignment flow & supervised labeling
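Features $f_i \in (\mathcal{X}, d)$ (metric space); labels (prototypes) $g_1, \dots, g_m$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Regularisation with control parameters $w_{ik}$:
$\mathcal{G}^w_i(W) = \mathrm{Exp}_{W_i}\Big(\sum_{k\in\mathcal{N}_i} w_{ik}\,\mathrm{Exp}^{-1}_{W_i}(W_k)\Big)$
Similarity vector: $S_i(W) = \mathcal{G}^w_i\big(L(W)\big)$
[Figure: simplex with vertices $g_1, g_2, g_3$ and a point $W_i$; annotations: scale, feature]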
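The Riemannian mean inside $\mathcal{G}^w_i$ has no closed form. A common approximation, assumed here rather than taken from the slide, replaces $\mathrm{Exp}$ by the lifting map $\exp_p(u) = \frac{p\, e^{u}}{\langle p, e^{u}\rangle}$. If the weights $w_{ik}$ sum to one, the base point then cancels and $S_i$ reduces to the normalised weighted geometric mean of the neighbouring likelihood vectors. Names and data are hypothetical.

```python
import math

def geometric_mean(vectors, weights):
    """Normalised weighted geometric mean of points on the simplex.

    Equals exp_p(sum_k w_k exp_p^{-1}(q_k)) for the lifting map and any base
    point p, provided the weights sum to one (assumption).
    """
    n = len(vectors[0])
    log_mean = [sum(w * math.log(q[j]) for q, w in zip(vectors, weights))
                for j in range(n)]
    m = max(log_mean)  # stability shift, cancels in the normalisation
    z = [math.exp(v - m) for v in log_mean]
    s = sum(z)
    return [zj / s for zj in z]

# Hypothetical likelihood vectors L_k of a pixel i and its two neighbours.
neighbourhood = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
uniform = [1 / 3] * 3                     # uniform weights w_ik
S_i = geometric_mean(neighbourhood, uniform)
print([round(x, 3) for x in S_i])
```

Non-uniform weights $w_{ik}$ change which neighbours dominate the mean; this is the handle that the parameter estimation part of the talk learns.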
SLIDE 13
Set-up: assignment flow & supervised labeling
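Features $f_i \in (\mathcal{X}, d)$ (metric space); labels (prototypes) $g_1, \dots, g_m$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Assignment flow on the assignment manifold $\mathcal{W} = \mathcal{S} \times \dots \times \mathcal{S}$:
$\dot W = \Pi_W\big(S(W)\big), \quad W(0) = \mathbb{1}_{\mathcal{W}}$ (barycenter), $\quad W(t) \in (\mathcal{W}, g_{FR})$
[Figure annotations: scale, feature]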
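Putting the pieces together, here is a toy run of the flow on a 1D signal with two scalar labels. It is a sketch under stated assumptions, not the authors' implementation: $d(f, g) = |f - g|$, a three-point neighbourhood, uniform weights $w_{ik}$, the lifting map $\exp_p(u) = \frac{p\, e^{u}}{\langle p, e^{u}\rangle}$ in place of $\mathrm{Exp}$, and an explicit geometric Euler step $W_i \leftarrow \exp_{W_i}\big(h\, S_i(W)\big)$. The step size `h`, `rho` and the iteration count are hypothetical choices.

```python
import math

def lift(p, u):
    """Assumed lifting map exp_p(u) = p * e^u / <p, e^u>."""
    m = max(u)
    z = [pj * math.exp(uj - m) for pj, uj in zip(p, u)]
    s = sum(z)
    return [zj / s for zj in z]

def assignment_flow(features, labels, rho=0.1, h=0.5, steps=100):
    n, m = len(features), len(labels)
    D = [[abs(f - g) for g in labels] for f in features]  # data term D_i
    W = [[1.0 / m] * m for _ in range(n)]                 # W(0): barycenter
    for _ in range(steps):
        # likelihood vectors L_i(W_i) = exp_{W_i}(-D_i / rho)
        L = [lift(W[i], [-d / rho for d in D[i]]) for i in range(n)]
        W_next = []
        for i in range(n):
            nbrs = [k for k in (i - 1, i, i + 1) if 0 <= k < n]  # neighbourhood N_i
            w = 1.0 / len(nbrs)                                  # uniform weights w_ik
            # S_i: normalised weighted geometric mean of the neighbouring L_k
            log_mean = [sum(w * math.log(L[k][j]) for k in nbrs) for j in range(m)]
            S_i = lift([1.0 / m] * m, log_mean)
            # explicit geometric Euler step: W_i <- exp_{W_i}(h * S_i)
            W_next.append(lift(W[i], [h * s for s in S_i]))
        W = W_next
    return W

features = [0.0, 0.1, 0.05, 0.9, 1.0, 0.95]  # hypothetical noisy 1D signal
W = assignment_flow(features, labels=[0.0, 1.0])
labeling = [max(range(2), key=lambda j: row[j]) for row in W]
print(labeling)
```

Each row of `W` moves towards a vertex of the simplex, so the labeling is read off by rounding once the flow has (nearly) converged.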
SLIDE 14
Illustration
[Figure: illustration; parameters: ρ, scale]
SLIDE 15
Illustration
[Figure panels: local, 3 × 3, 5 × 5]
SLIDE 16
Assignment flow: geometric integration
ODEs on manifolds, Lie group methods (Iserles et al.’05, Hairer et al. ’06)
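Group action $\Lambda\colon G \times M \to M$, with
$\lambda(v, p) = \Lambda\big(\exp_G(v), p\big), \qquad (\lambda_* v)_p = \frac{d}{dt}\Lambda\big(\exp_G(tv), p\big)\Big|_{t=0}$
ODE on $M$: $\dot y = \big(\lambda_* f(t, y)\big)_y, \quad y(0) = p$
Solution: $y(t) = \lambda\big(v(t), p\big)$, where $\dot v = \big(\mathrm{dexp}_G^{-1}\big)_v\Big(f\big(t, \lambda(v, p)\big)\Big), \quad v(0) = 0$
[Diagram: maps between $\mathfrak{g}$, $\mathfrak{X}(M)$, $G$ and $M$ via $\lambda_*$, $\exp_G$, $\exp_M$, $\lambda$, $\Lambda$, $f$]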
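To make the construction concrete: the simplest Lie group integrator (Lie-Euler, i.e. one explicit Euler step for $v$ from $v(0) = 0$, where $\mathrm{dexp}^{-1}_0$ is the identity) reads $y_{n+1} = \Lambda\big(\exp_G(h\, f(t_n, y_n)), y_n\big)$. The toy problem below, $G = SO(2)$ acting on the unit circle $M = S^1$ with a hypothetical angular velocity, shows the key property: each step stays on $M$ by construction, whereas a classical Euler step in $\mathbb{R}^2$ drifts off.

```python
import math

def act(theta, y):
    """Lambda(exp_G(theta * J), y): rotation of y in R^2 by angle theta (SO(2) action)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * y[0] - s * y[1], s * y[0] + c * y[1])

def omega(y):
    """Hypothetical angular velocity; the ODE on S^1 is y' = omega(y) * J y."""
    return 1.0 + 0.5 * y[0]

def lie_euler(y0, h, steps):
    y = y0
    for _ in range(steps):
        y = act(h * omega(y), y)  # y_{n+1} = Lambda(exp_G(h f(y_n)), y_n)
    return y

def classical_euler(y0, h, steps):
    y = y0
    for _ in range(steps):
        w = omega(y)
        y = (y[0] - h * w * y[1], y[1] + h * w * y[0])  # y + h * omega(y) * J y
    return y

y_lie = lie_euler((1.0, 0.0), h=0.1, steps=200)
y_classical = classical_euler((1.0, 0.0), h=0.1, steps=200)
print(math.hypot(*y_lie))        # stays on the unit circle up to rounding
print(math.hypot(*y_classical))  # has drifted off the circle
```

Higher-order RKMK schemes replace the single Euler step for $v$ by a Runge-Kutta step, with $\mathrm{dexp}^{-1}$ truncated to the required order.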
SLIDE 17
Assignment flow: geometric integration
- nonlinear assignment flow
- linear assignment flow
- geometric RKMK (Runge-Kutta-Munthe-Kaas)
- implicit geometric Euler
- embedded RKMK
- adaptive RK
- exponential integrator
- numerical experiments
Parametrisation, for any $W_0 = W(0)$:
$W(t) = \exp_{W_0}\big(V(t)\big), \qquad \dot V = \Pi_{T_0}\, S\big(\exp_{W_0}(V)\big), \quad V(0) = 0$
(Zeilmann et al., arXiv’18)
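Why the parametrisation $W(t) = \exp_{W_0}(V(t))$ is convenient can be checked numerically, assuming the lifting map $\exp_p(u) = \frac{p\, e^{u}}{\langle p, e^{u}\rangle}$ and taking $\Pi_{T_0}$ as the orthogonal projection onto $T_0 = \{v : \sum_j v_j = 0\}$. Since $\exp_p(u + v) = \exp_{\exp_p(u)}(v)$, one explicit Euler step on $V$ equals a multiplicative update of $W$. The vector `S` is a hypothetical stand-in for $S(\exp_{W_0}(V))$, not the actual similarity map.

```python
import math

def lift(p, u):
    """Assumed lifting map exp_p(u) = p * e^u / <p, e^u>."""
    m = max(u)
    z = [pj * math.exp(uj - m) for pj, uj in zip(p, u)]
    s = sum(z)
    return [zj / s for zj in z]

def project_t0(u):
    """Orthogonal projection onto T_0 = {v : sum_j v_j = 0}."""
    mean = sum(u) / len(u)
    return [uj - mean for uj in u]

W0 = [0.5, 0.3, 0.2]
u = [0.4, -1.0, 0.3]     # hypothetical tangent data
v = [1.5, 0.2, -0.7]

# Composition rule: exp_{W0}(u + v) == exp_{exp_{W0}(u)}(v)
lhs = lift(W0, [a + b for a, b in zip(u, v)])
rhs = lift(lift(W0, u), v)

# One explicit Euler step on V with step size h ...
h, V = 0.1, u
S = [0.6, 0.3, 0.1]      # stand-in for S(exp_{W0}(V))
V_next = [a + h * b for a, b in zip(V, project_t0(S))]
# ... equals a multiplicative update of W = exp_{W0}(V)
W_next_a = lift(W0, V_next)
W_next_b = lift(lift(W0, V), [h * s for s in S])
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
print(max(abs(a - b) for a, b in zip(W_next_a, W_next_b)))
```

The projection onto $T_0$ does not change the result because adding a constant to all components of the argument cancels in the normalisation of the lifting map.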
SLIDE 18
Properties (more general viewpoint)
- elementary state space: statistical manifold $(\mathcal{W}, g)$
- information geometry: dualistic structure $(g, \nabla, \nabla^*)$, $\quad Z\, g(X, Y) = g(\nabla_Z X, Y) + g(X, \nabla^*_Z Y)$ (Amari & Chentsov)
- scale: small / cooperative & large / competitive
- distances: adaptive distances $D \to D(W)$, $W \in \mathcal{W}$
SLIDE 19
Outline
- set-up: assignment flow & supervised labeling
- unsupervised labeling
- label evolution
- label learning from scratch
- parameter estimation (control)
- outlook
(Zern et al., GCPR’18) (submitted)
SLIDE 20
Label evolution
Data (features) $f_i$ and labels $g_j$ on the data (feature) manifold $\mathcal{M}$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Labels $g_j$: adapt online instead of fixing them by preprocessing!
SLIDE 21
Label evolution
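assignment flow:
$\dot W_i(t) = \Pi_{W_i(t)}\big(S_i(W(t))\big), \quad W_i(0) = \mathbb{1}_{\mathcal{S}}, \quad i \in I$
label flow (“m”: label as Riemannian means):
$\dot m_j(t) = \alpha \sum_{i\in I} \nu_{j|i}\big(M(t)\big)\, \widehat{g}^{-1} d_j D\big(f_i, m_j(t)\big), \quad m_j(0) = m_{j0}, \quad \alpha > 0$
with
$\nu_{j|i}(M) = \dfrac{L^\sigma_{ij}(W_i; M)}{\sum_{k\in I} L^\sigma_{kj}(W_k; M)}, \qquad L^\sigma_{ij}(W_i; M) = \dfrac{W_{ij}\, e^{-\frac{1}{\sigma} D(f_i, m_j)}}{\sum_{l\in J} W_{il}\, e^{-\frac{1}{\sigma} D(f_i, m_l)}}, \qquad \sigma > 0$
[Figure annotations: coupling, spatial regularisation, time scale, divergence measure]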
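The coupling weights can be evaluated directly from their definition. In this sketch the divergences $D(f_i, m_j)$ are a hypothetical precomputed matrix, so the manifold-specific divergence plays no role. By construction, for each label $j$ the weights $\nu_{j|i}$ sum to one over $i \in I$, so the label flow moves $m_j$ along a $\nu$-weighted combination of per-datum directions. Function names are hypothetical.

```python
import math

def soft_assignments(W, Dfm, sigma):
    """L^sigma_ij(W_i; M) = W_ij e^{-D_ij/sigma} / sum_l W_il e^{-D_il/sigma}."""
    L = []
    for w_i, d_i in zip(W, Dfm):
        z = [w * math.exp(-d / sigma) for w, d in zip(w_i, d_i)]
        s = sum(z)
        L.append([zj / s for zj in z])
    return L

def label_weights(W, Dfm, sigma):
    """nu_{j|i}(M) = L^sigma_ij / sum_k L^sigma_kj, returned as nu[i][j]."""
    L = soft_assignments(W, Dfm, sigma)
    col = [sum(L[k][j] for k in range(len(L))) for j in range(len(L[0]))]
    return [[L[i][j] / col[j] for j in range(len(col))] for i in range(len(L))]

# Hypothetical: 4 data points, 2 labels m_1, m_2
W = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
Dfm = [[0.1, 2.0], [0.3, 1.5], [1.2, 0.4], [2.0, 0.1]]   # D(f_i, m_j)
nu = label_weights(W, Dfm, sigma=0.5)
for j in range(2):
    print(j, [round(nu[i][j], 3) for i in range(4)])
```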
SLIDE 22
Label evolution
SO(3)-valued data
SLIDE 23
Label evolution
[Figure: Euclidean color space; panels: supervised (200 labels), unsupervised (few labels)]
SLIDE 24
Label evolution
positive-definite manifold (dim = 120); supervised: 200 labels
$\mathcal{P}_d \ni F_i = \int h(x_i - y)\, \big[(f - \mathbb{E}_i[f]) \otimes (f - \mathbb{E}_i[f])\big](y)\, dy$
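A discrete version of the descriptor $F_i$ for a 1D signal, assuming a normalised Gaussian window for $h$, the same window for the local mean $\mathbb{E}_i[f]$, and a small ridge $\varepsilon I$ to keep $F_i$ strictly positive definite; all three are assumptions, and the names are hypothetical. For orientation, $d \times d$ symmetric matrices form a space of dimension $d(d+1)/2$, which equals 120 for $d = 15$.

```python
import math

def gaussian_window(n, i, width):
    """Normalised discrete window h(x_i - y) over positions y = 0..n-1 (assumed Gaussian)."""
    w = [math.exp(-0.5 * ((y - i) / width) ** 2) for y in range(n)]
    s = sum(w)
    return [v / s for v in w]

def local_covariance(features, i, width=1.5, eps=1e-6):
    """Discrete F_i = sum_y h(x_i - y) (f(y) - E_i[f]) (f(y) - E_i[f])^T + eps * I."""
    n, d = len(features), len(features[0])
    h = gaussian_window(n, i, width)
    mean = [sum(h[y] * features[y][a] for y in range(n)) for a in range(d)]  # E_i[f]
    F = [[0.0] * d for _ in range(d)]
    for y in range(n):
        r = [features[y][a] - mean[a] for a in range(d)]
        for a in range(d):
            for b in range(d):
                F[a][b] += h[y] * (r[a] * r[b])
    for a in range(d):
        F[a][a] += eps  # small ridge so that F_i is strictly positive definite (assumption)
    return F

# Hypothetical 1D signal with 2-dimensional features per position
signal = [[0.0, 1.0], [0.2, 0.9], [0.9, 0.1], [1.0, 0.0], [0.8, 0.3]]
F2 = local_covariance(signal, i=2)
print(F2)
```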
SLIDE 25
Label evolution
[Figure panels: supervised (200 labels), few labels, few labels]
SLIDE 26
Outline
- set-up: assignment flow & supervised labeling
- unsupervised labeling
- label evolution
- label learning from scratch
- parameter estimation (control)
- outlook
(Zern et al., GCPR’18) (submitted)
SLIDE 27
Label learning from scratch
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Distance matrix of the data: $D = \big(d(f_i, f_k)\big)_{i,k \in I}$
Assignment vector: $W_i = \big(\Pr(g_1 \mid f_i), \dots, \Pr(g_m \mid f_i)\big)$ ?
Each datum is a label?
SLIDE 28
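Label learning from scratch
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Distance matrix of the data: $D = \big(d(f_i, f_k)\big)_{i,k \in I}$
Assignment vector: $W_i = \big(\Pr(g_1 \mid f_i), \dots, \Pr(g_m \mid f_i)\big)$ ?
Each datum is a label?
Bayes' rule, with $W \in \mathcal{W}$ and uniform prior $P(f_k) = \frac{1}{|I|}$, $k \in I$:
$Q(f_i \mid f_k) = \dfrac{P(f_k \mid f_i)\, P(f_i)}{\sum_{l\in I} P(f_k \mid f_l)\, P(f_l)}$
Marginalize over “data labels”:
$A_{ji}(W) = \sum_{k\in I} Q(f_j \mid f_k)\, P(f_k \mid f_i) = \big(W\, C(W)^{-1} W^\top\big)_{ji}$
Self-affinity matrix $A(W)$: symmetric, non-negative, doubly stochastic, parametrised by assignments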
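The stated properties of $A(W)$ can be checked on a small example. The sketch reads $P(f_k \mid f_i) = W_{ik}$ and uses the uniform prior, so that Bayes' rule gives $Q(f_j \mid f_k) = W_{jk} / \sum_l W_{lk}$ and hence $C(W) = \mathrm{Diag}(W^\top \mathbb{1})$. This identification of $C(W)$ is our assumption, since the slide does not define it; the matrix `W` is hypothetical.

```python
def self_affinity(W):
    """A(W) = W C(W)^{-1} W^T with C(W) = Diag(W^T 1) (assumed reading)."""
    n = len(W)
    col = [sum(W[l][k] for l in range(n)) for k in range(n)]  # sum_l W_lk
    return [[sum(W[j][k] * W[i][k] / col[k] for k in range(n)) for i in range(n)]
            for j in range(n)]

# Hypothetical row-stochastic self-assignment of 3 data points
W = [[0.7, 0.2, 0.1],
     [0.3, 0.6, 0.1],
     [0.1, 0.1, 0.8]]
A = self_affinity(W)
for row in A:
    print([round(a, 3) for a in row])
print([round(sum(row), 12) for row in A])   # rows sum to 1
```

Double stochasticity follows in one line: $\sum_i A_{ji} = \sum_k W_{jk} \frac{\sum_i W_{ik}}{\sum_l W_{lk}} = \sum_k W_{jk} = 1$, and symmetry carries it over to the columns.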
SLIDE 29
Label learning from scratch
objective: spatially regularised data self-assignment
$\min_{W \in \mathcal{W}} E(W), \qquad E(W) = \langle D, A(W) \rangle \quad$ (generalizes $\langle D, W \rangle$)
SLIDE 30
Label learning from scratch
objective: spatially regularised data self-assignment
$\min_{W \in \mathcal{W}} E(W), \qquad E(W) = \langle D, A(W) \rangle \quad$ (generalizes $\langle D, W \rangle$)
approach: redefine the likelihood vectors
$L(W) = \exp_W\big(-\tfrac{1}{\rho} \nabla E(W)\big) \in \mathcal{W}$
unsupervised self-assignment flow: $\dot W(t) = \Pi_{W(t)}\big(S(W(t))\big)$
single parameter: scale
SLIDE 31
Label learning from scratch
objective: spatially regularised data self-assignment
$\min_{W \in \mathcal{W}} E(W), \qquad E(W) = \langle D, A(W) \rangle \quad$ (generalizes $\langle D, W \rangle$)
approach: redefine the likelihood vectors
$L(W) = \exp_W\big(-\tfrac{1}{\rho} \nabla E(W)\big) \in \mathcal{W}$
unsupervised self-assignment flow: $\dot W(t) = \Pi_{W(t)}\big(S(W(t))\big)$
single parameter: scale
result:
- spatially regularised discrete optimal transport
- approaches a low-rank manifold
- labels and their number emerge from data as latent variables
[Figure annotations: $\langle D, A(W) \rangle$, $g_k$, $f_i$]
SLIDE 32
Unsupervised self-assignment flow
$S^1$-valued data
SLIDE 33
Outline
- set-up: assignment flow & supervised labeling
- unsupervised labeling
- label evolution
- label learning from scratch
- parameter estimation (control)
- outlook
(submitted)
SLIDE 34
Recall: regularisation & control parameters
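Features $f_i \in (\mathcal{X}, d)$ (metric space); labels (prototypes) $g_1, \dots, g_m$
Distance vector (data term): $D_i = \big(d(f_i, g_1), \dots, d(f_i, g_m)\big)$
Regularisation with control parameters $w_{ik}$:
$\mathcal{G}^w_i(W) = \mathrm{Exp}_{W_i}\Big(\sum_{k\in\mathcal{N}_i} w_{ik}\,\mathrm{Exp}^{-1}_{W_i}(W_k)\Big)$
Similarity vector: $S_i(W) = \mathcal{G}^w_i\big(L(W)\big)$
[Figure: simplex with vertices $g_1, g_2, g_3$ and a point $W_i$; annotations: scale, feature]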
SLIDE 35
Motivation
all patch-weights $\Omega = \{w_{ik} : k \in \mathcal{N}_i,\ i \in I\} \in \mathcal{P}$ ($\mathcal{P}$ = weight manifold)
[Figure panels: uniform weights, predicted weights after learning]
SLIDE 36
Approach
all patch-weights $\Omega = \{w_{ik} : k \in \mathcal{N}_i,\ i \in I\} \in \mathcal{P}$
linear assignment flow:
$W(t) = \mathrm{Exp}_{W_0}\big(V(t)\big), \qquad \dot V = \Pi_{W_0}\big(S(W_0) + dS_{W_0} V\big), \quad V(0) = 0$
SLIDE 37
Approach
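all patch-weights $\Omega = \{w_{ik} : k \in \mathcal{N}_i,\ i \in I\} \in \mathcal{P}$
linear assignment flow:
$W(t) = \mathrm{Exp}_{W_0}\big(V(t)\big), \qquad \dot V = \Pi_{W_0}\big(S(W_0) + dS_{W_0} V\big), \quad V(0) = 0$
objective, with ground-truth assignments (labelings) $W^*$:
$E\big(V(T)\big) = D_{KL}\Big(W^*, \exp_{\mathbb{1}_{\mathcal{W}}}\big(V(T)\big)\Big)$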
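A sketch of the objective for a toy labeling, assuming that $D_{KL}$ sums the Kullback-Leibler divergences of the rows, that $\exp_{\mathbb{1}_{\mathcal{W}}}$ is the lifting map at the barycenter (a row-wise softmax), and that $W^*$ is one-hot. With a one-hot $W^*$ each row contributes $-\log W_{i,\ell_i}$ for the true label $\ell_i$. Names and values are hypothetical.

```python
import math

def lift(p, u):
    """Assumed lifting map exp_p(u) = p * e^u / <p, e^u>."""
    m = max(u)
    z = [pj * math.exp(uj - m) for pj, uj in zip(p, u)]
    s = sum(z)
    return [zj / s for zj in z]

def kl_objective(W_star, V):
    """Sum over pixels of KL(W*_i || exp_1(V_i)); terms with W*_ij = 0 contribute 0."""
    total = 0.0
    for w_star, v in zip(W_star, V):
        w = lift([1.0 / len(v)] * len(v), v)   # W_i = exp_{1_S}(V_i)
        total += sum(a * math.log(a / b) for a, b in zip(w_star, w) if a > 0)
    return total

W_star = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical ground-truth labeling
V_good = [[3.0, -3.0], [-3.0, 3.0]]    # tangent vectors (rows sum to zero)
V_bad = [[-1.0, 1.0], [1.0, -1.0]]
print(kl_objective(W_star, V_good), kl_objective(W_star, V_bad))
```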
SLIDE 38
Approach
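all patch-weights $\Omega = \{w_{ik} : k \in \mathcal{N}_i,\ i \in I\} \in \mathcal{P}$
linear assignment flow:
$W(t) = \mathrm{Exp}_{W_0}\big(V(t)\big), \qquad \dot V = \Pi_{W_0}\big(S(W_0) + dS_{W_0} V\big), \quad V(0) = 0$
objective, with ground-truth assignments (labelings) $W^*$:
$E\big(V(T)\big) = D_{KL}\Big(W^*, \exp_{\mathbb{1}_{\mathcal{W}}}\big(V(T)\big)\Big)$
parameter estimation problem:
$\min_{\Omega \in \mathcal{P}} E\big(V(T)\big) \quad \text{s.t.} \quad \dot V(t) = f\big(V(t), \Omega\big),\ t \in [0, T], \quad V(0) = 0_{|I| \times |J|}$
training data for parameter prediction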
SLIDE 39
Approach
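all patch-weights $\Omega = \{w_{ik} : k \in \mathcal{N}_i,\ i \in I\} \in \mathcal{P}$
linear assignment flow:
$W(t) = \mathrm{Exp}_{W_0}\big(V(t)\big), \qquad \dot V = \Pi_{W_0}\big(S(W_0) + dS_{W_0} V\big), \quad V(0) = 0$
objective, with ground-truth assignments (labelings) $W^*$:
$E\big(V(T)\big) = D_{KL}\Big(W^*, \exp_{\mathbb{1}_{\mathcal{W}}}\big(V(T)\big)\Big)$
parameter estimation problem:
$\min_{\Omega \in \mathcal{P}} E\big(V(T)\big) \quad \text{s.t.} \quad \dot V(t) = f\big(V(t), \Omega\big),\ t \in [0, T], \quad V(0) = 0_{|I| \times |J|}$
parameter prediction: $\widehat{w}\colon \mathcal{F}_i \to \mathcal{P}, \quad (f_k)_{k \in \mathcal{N}_i} \mapsto (w_{ik})_{k \in \mathcal{N}_i}, \quad i \in I$
[Figure panels: training data for parameter prediction, novel data]
definition & meaning of what the network learns!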
SLIDE 40
Parameter estimation algorithm
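$\dot\Omega = -\nabla_{\mathcal{P}} E\big(V(T, \Omega)\big) = -\Pi_\Omega\Big(\tfrac{d}{d\Omega} E\big(V(T, \Omega)\big)\Big), \quad \Omega(0) = \mathbb{1}_{\mathcal{P}}$
geometric integration (parameter manifold)
recurring structure
optimise then discretise vs. discretise then optimise?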
SLIDE 41
Parameter estimation algorithm
$\dot\Omega = -\nabla_{\mathcal{P}} E\big(V(T, \Omega)\big) = -\Pi_\Omega\Big(\tfrac{d}{d\Omega} E\big(V(T, \Omega)\big)\Big), \quad \Omega(0) = \mathbb{1}_{\mathcal{P}}$
gradient flow (weight parameter manifold)
gradient of $E\big(V(T, \Omega)\big)$ s.t. $\dot V = f(V, \Omega)$:
- differentiate, then discretize: adjoint system
- discretize, then differentiate: nonlinear program, sensitivity
recurring structure
Either way yields the same solution! Key aspect: symplectic integrator for the joint system
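The claim that both routes agree can be illustrated on a scalar toy problem $\dot v = -\omega v + 1$, $E = \tfrac12 (v(T) - v^*)^2$ (hypothetical, standing in for $\dot V = f(V, \Omega)$). Explicit Euler is used forward; the backward recursion below is the exact gradient of the discretised problem (discretise, then differentiate), and the forward/backward pair can be read as a symplectic Euler scheme for the joint state-adjoint system. A central finite difference serves as an independent check. All names are hypothetical.

```python
def simulate(omega, h=0.01, steps=200, v0=0.0):
    """Explicit Euler for the toy ODE v' = f(v, omega) = -omega * v + 1."""
    vs = [v0]
    for _ in range(steps):
        v = vs[-1]
        vs.append(v + h * (-omega * v + 1.0))
    return vs

def objective(omega, target=0.5, **kw):
    return 0.5 * (simulate(omega, **kw)[-1] - target) ** 2

def adjoint_gradient(omega, target=0.5, h=0.01, steps=200):
    """Exact gradient of the discretised problem via the discrete adjoint recursion."""
    vs = simulate(omega, h=h, steps=steps)
    lam = vs[-1] - target                  # lambda_N = dE/dv_N
    grad = 0.0
    for n in range(steps - 1, -1, -1):
        grad += lam * h * (-vs[n])         # lambda_{n+1} * h * df/domega(v_n)
        lam = lam * (1.0 + h * (-omega))   # lambda_n = lambda_{n+1} (1 + h df/dv)
    return grad

omega, eps = 2.0, 1e-6
g_adj = adjoint_gradient(omega)
g_fd = (objective(omega + eps) - objective(omega - eps)) / (2 * eps)
print(g_adj, g_fd)
```

With the adjoint variable $\lambda$ and $H(v, \lambda) = \lambda f(v, \omega)$, the forward step $v_{n+1} = v_n + h\, \partial_\lambda H$ and the backward step $\lambda_n = \lambda_{n+1} + h\, \partial_v H(v_n, \lambda_{n+1})$ are exactly the two halves of symplectic Euler.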
SLIDE 42
Adaptive regularization
image class: non-curvilinear letters; image features: binary patches
SLIDE 43
Adaptive regularization
[Figure: uniform vs. adaptive weights; panels: sanity check, novel curvilinear structure]
SLIDE 44
Adaptive regularization
noisy random Voronoi images
[Figure panels: training image, test image, Ω* (sample of patches), uniform weights, adaptive weights, weight deviation from uniform]
SLIDE 45
Outlook
two-grid structure: patch assignments and pixel assignments
- unsupervised label learning
- patch-dictionary-based prior statistics
- model reduction & optimal control
- parameter estimation: from data-driven $\widehat{w}(\mathcal{F}_{\mathcal{O}_i})$ to model-driven, dynamic adaptive parameter estimation $\widehat{w}\big(\mathcal{F}_{\mathcal{O}_i}, W_{\mathcal{O}_i}(t)\big)$
SLIDE 46
Publications
- prior work: IPA group, Heidelberg, https://ipa.math.uni-heidelberg.de
- geometric integration: arXiv:1810.06970, submitted to SI
- label learning from scratch: submitted to conference
- parameter estimation: full TRs on arXiv soon
- synopsis: handbook: Variational Methods for Nonlinear Geometric Data and Applications, ~July’19
SLIDE 47